RADICAL-Pilot (RP) is a Pilot Job system written in Python. It lets a user run large numbers of computational tasks (called ComputeUnits) concurrently on one or more remote ComputePilots, which RP starts transparently on a range of distributed resources such as HPC clusters and clouds.
In the Pilot Job model, a user’s application acquires a part (slice) of a resource and schedules ComputeUnits directly into that slice, rather than submitting each one through the system’s job scheduler. In many cases this can drastically shorten overall execution time, because the individual ComputeUnits don’t have to wait in the system’s scheduler queue but execute directly on the ComputePilots.
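The effect on execution time can be illustrated with a toy model. The sketch below is not RP code and rests on simplifying assumptions that are not part of the description above: all tasks take the same time, every batch submission waits the same fixed time in the queue, and a fixed number of tasks can run at once. All function and parameter names are hypothetical.

```python
import math


def makespan_per_task_submission(n_tasks, task_runtime, queue_wait, concurrent_jobs):
    """Toy estimate: every task is its own batch job.

    Assumes tasks are submitted in waves of `concurrent_jobs`, and that each
    wave pays the full queue wait before its tasks start running.
    """
    waves = math.ceil(n_tasks / concurrent_jobs)
    return waves * (queue_wait + task_runtime)


def makespan_with_pilot(n_tasks, task_runtime, queue_wait, pilot_cores):
    """Toy estimate: one pilot job holds `pilot_cores` cores.

    Assumes the pilot waits in the queue once, then runs single-core tasks
    back to back in waves of `pilot_cores` with no further queueing.
    """
    waves = math.ceil(n_tasks / pilot_cores)
    return queue_wait + waves * task_runtime


if __name__ == "__main__":
    # Illustrative numbers only (seconds); not measurements.
    params = dict(n_tasks=1000, task_runtime=60, queue_wait=600)
    print(makespan_per_task_submission(concurrent_jobs=100, **params))
    print(makespan_with_pilot(pilot_cores=100, **params))
```

With 1000 one-minute tasks, a ten-minute queue wait, and 100 tasks running at a time, this model gives 6600 seconds for per-task submission and 1200 seconds with a pilot. Real queue waits vary widely, so the numbers only illustrate the mechanism.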
ComputeUnits can be sequential, multi-threaded (e.g. OpenMP), or multi-process parallel (e.g. MPI) executables.
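These three kinds of executable differ mainly in how many cores they need and whether they must be launched through MPI. The following sketch uses a hypothetical `UnitSketch` dataclass to make that distinction concrete. It is an illustration only and is not RADICAL-Pilot's own ComputeUnit description API; the executable names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class UnitSketch:
    """Hypothetical stand-in for a task description (not the RP API)."""
    executable: str
    arguments: list = field(default_factory=list)
    cores: int = 1          # number of cores the task occupies
    uses_mpi: bool = False  # True if the task must be started by an MPI launcher

    def kind(self):
        """Classify the task by its resource requirements."""
        if self.uses_mpi:
            return "mpi"
        if self.cores > 1:
            return "multi-threaded"
        return "sequential"


# One placeholder task of each kind.
units = [
    UnitSketch(executable="/bin/date"),
    UnitSketch(executable="./omp_solver", cores=8),
    UnitSketch(executable="./mpi_solver", cores=64, uses_mpi=True),
]

kinds = [u.kind() for u in units]
```

Here `kinds` evaluates to `["sequential", "multi-threaded", "mpi"]`. A pilot system needs this kind of information to decide how many cores to reserve for a task and which launch method to use.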