OK, so you’ve got a few cluster users who, if allowed to run wild, will fill your scheduler’s queue with thousands or even millions of jobs. You can’t let them do it, because putting a million (or even several thousand) jobs into the queue creates a lot of scheduling overhead and slows everybody down. Worse, because your scheduler has so much work to do, it actually hurts your cluster utilization. Nitro can help alleviate these problems. I’d like to highlight a few of the types of users that Nitro can help you with:
Steve is assigned to work his way through your extensive catalog of products and run numerous simulations of each product to see if it can be made stronger and cheaper while using less material. To do this he needs to run 50,000 simulations with various permutations, which he batches up every Tuesday. And every Tuesday, right after he submits his 50,000 jobs, all of your users start complaining that it’s taking a very long time to submit new jobs, get status, or see their jobs make any progress in the queue. In fact, you’ve noticed that cluster utilization actually drops by 10% starting on Tuesday and gradually climbs back to the normal 93% by Thursday afternoon, when most of Steve’s jobs have completed. You’ve talked to Steve about submitting only 10,000 jobs per week, but he’s having a hard time combining the jobs and separating the results – it’s costing him time and slowing the company down. Steve’s initiative is saving the company millions of dollars, so you want to accommodate him as best you can.
Nitro solves this problem by letting Steve create one task file with all of the simulation command lines in it, so he can submit one (that’s right, ONE) job each week. No more scheduling nightmares: just one single job, and Nitro takes care of completing the tasks one after another on as many nodes as you want to dedicate to Steve’s job. There’s no wasted time between tasks. Where the resource manager normally has to finish a job, run the job cleanup script, and wait for another job to be assigned, Nitro just keeps blasting tasks to the allocated nodes as fast as the nodes can finish them.
Machine Gun Kelly
Kelly runs Monte Carlo simulations, millions of them, every week. She’s constantly probing corporate bond offerings against various economic scenarios, looking for the best bonds to include in the fund that she manages. She starts several thousand jobs a week, one job per offering, but she’s complaining that she can’t keep up with all of the bonds she’d like to evaluate. You’ve been working with Kelly to try to automate the submission of each new bond offering, but every offering becomes another job that adds scheduling overhead.
Nitro offers the capability to create a base pool of nodes, then send batches of tasks to them and get results through an API. This way, as soon as a new bond offering is identified and its Monte Carlo simulation configured, it can be submitted to the Nitro pool and executed with no scheduling overhead. If a burst of activity calls for more resources, nodes can easily be added to the Nitro session, then removed after a period of time or when activity drops below a threshold.
Cade has thousands of long-running jobs (typically about 4 days each) that he submits throughout the week. Scheduling performance is good, and he’s a steady cluster customer with a constant stream of jobs. Still, you noticed that scheduler response (and the mood of your other cluster customers) improved noticeably while he was on vacation. Yes, Nitro can even help with Cade! Just because tasks are very long doesn’t mean that Nitro can’t run them. In fact, Nitro can consolidate Cade’s thousands of jobs into one file that he simply appends to whenever he has a new job to run, and Nitro will pick up the new task and put it into its queue. Cade’s job doesn’t ever have to end, and now all of your other cluster customers are always completely content, not just when Cade is on vacation!