3.5. Selecting a Task Scheduler

We have seen in the previous examples how the radical.pilot.TaskManager matches submitted tasks to pilots for execution. When the task manager is constructed, it can be configured to use a specific scheduling policy for this matching. The following policies are implemented:

  • rp.SCHEDULER_ROUND_ROBIN: alternate tasks between all available pilots. This policy leads to a static and fair, but not necessarily load-balanced, task assignment.
  • rp.SCHEDULER_BACKFILLING: dynamic task scheduling based on pilot capacity and availability. This is the more sophisticated of the two schedulers and balances load well, but it incurs some scheduling overhead.
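
To make "fair, but not necessarily load-balanced" concrete, the following is a minimal sketch of a round-robin assignment in plain Python. It does not use RADICAL-Pilot and is not RP's implementation; the function name round_robin_assign, the pilot names, and the task runtimes are all hypothetical and chosen for illustration only.

```python
from itertools import cycle


def round_robin_assign(task_runtimes, pilots):
    """Statically hand tasks to pilots in turn, ignoring task size and pilot state."""
    assignment = {pilot: [] for pilot in pilots}
    for runtime, pilot in zip(task_runtimes, cycle(pilots)):
        assignment[pilot].append(runtime)
    return assignment


# Hypothetical workload: task runtimes in seconds, alternating long and short.
task_runtimes = [10, 1, 10, 1, 10, 1]
assignment = round_robin_assign(task_runtimes, ["pilot.0", "pilot.1"])

# Every pilot receives the same number of tasks ...
counts = {pilot: len(tasks) for pilot, tasks in assignment.items()}
# ... but the total work per pilot can differ widely.
load = {pilot: sum(tasks) for pilot, tasks in assignment.items()}

print(counts)  # {'pilot.0': 3, 'pilot.1': 3}
print(load)    # {'pilot.0': 30, 'pilot.1': 3}
```

Both pilots get three tasks each, yet one of them carries ten times the work of the other. This is the kind of imbalance a static policy cannot correct.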

An important factor in task scheduling is pilot startup time: pilot jobs may sit in batch queues for a long time or pass through quickly, depending on their size and resource usage, resource policies, etc. A static assignment of tasks cannot take that into account, and the first pilot may have finished all its work before a second pilot even comes up.

This is what the backfilling scheduler tries to address: it schedules tasks only once a pilot is available, and only as many as that pilot can execute at any point in time. As this requires close communication between pilot and scheduler, the backfilling scheduler incurs a runtime overhead for each task, so it is only advisable for heterogeneous workloads and/or pilot setups, and for long-running tasks.
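
The effect of pilot startup time can be illustrated with a small simulation. This is a simplified sketch, not RP's scheduler: it assumes each pilot runs one task at a time, ignores the per-task scheduling overhead mentioned above, and uses hypothetical function names and numbers. It compares a static round-robin assignment against a late-binding policy that gives each task to whichever pilot becomes free first.

```python
import heapq


def makespan_static(task_runtimes, pilot_start_times):
    """Bind all tasks to pilots up front (round robin), then run them in order."""
    finish = list(pilot_start_times)
    for i, runtime in enumerate(task_runtimes):
        finish[i % len(finish)] += runtime
    return max(finish)


def makespan_late_binding(task_runtimes, pilot_start_times):
    """Give each task to the pilot that becomes available first."""
    free_at = list(pilot_start_times)  # time at which each pilot can take a task
    heapq.heapify(free_at)
    end = 0
    for runtime in task_runtimes:
        start = heapq.heappop(free_at)  # earliest available pilot
        done = start + runtime
        end = max(end, done)
        heapq.heappush(free_at, done)
    return end


# Hypothetical setup: eight 10-second tasks, two pilots.
# The second pilot waits 60 seconds in the batch queue.
task_runtimes = [10] * 8
pilot_start_times = [0, 60]

static = makespan_static(task_runtimes, pilot_start_times)
late = makespan_late_binding(task_runtimes, pilot_start_times)

print(static)  # 100: half of the tasks wait for the late pilot
print(late)    # 70: the first pilot works through tasks while the second is queued
```

In this toy setup the static assignment leaves the first pilot idle after 40 seconds, while the late-binding policy keeps it busy until the second pilot arrives. A real backfilling scheduler additionally pays a communication cost per task, which this sketch deliberately leaves out.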

04_scheduler_selection.py shows an example scheduler selector, with the following diff to the previous multi-pilot example:


The example selects Round Robin scheduling for two pilots, and Backfilling for three or more.
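
The selection rule can be sketched as a small function. This is an illustration, not the code from 04_scheduler_selection.py: the function name select_scheduler is hypothetical, the two string constants are stand-ins for rp.SCHEDULER_ROUND_ROBIN and rp.SCHEDULER_BACKFILLING so that the sketch runs without RADICAL-Pilot, and the handling of a single pilot (Round Robin) is an assumption, since the text above only covers two or more pilots.

```python
# Stand-ins for rp.SCHEDULER_ROUND_ROBIN and rp.SCHEDULER_BACKFILLING.
# The string values are placeholders, not RP's actual constant values.
SCHEDULER_ROUND_ROBIN = "round_robin"
SCHEDULER_BACKFILLING = "backfilling"


def select_scheduler(n_pilots):
    """Pick a scheduling policy from the number of pilots (hypothetical helper)."""
    if n_pilots < 1:
        raise ValueError("at least one pilot is required")
    if n_pilots <= 2:
        # Assumption: a single pilot is treated like the two-pilot case.
        return SCHEDULER_ROUND_ROBIN
    return SCHEDULER_BACKFILLING


for n in (2, 3, 8):
    print(n, select_scheduler(n))

# With RADICAL-Pilot installed, the selected rp.SCHEDULER_* constant would then
# be handed to the task manager on construction, presumably along the lines of:
#   tmgr = rp.TaskManager(session=session, scheduler=policy)
# The exact keyword should be checked against the RP API documentation.
```

The loop prints round_robin for two pilots and backfilling for three and eight pilots.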
