3.1. Getting Started

In this section we walk you through the basics of using RADICAL-Pilot (RP). We describe how to launch a local ComputePilot and use a UnitManager to schedule and run ComputeUnits (i.e., tasks) on local and remote resources.

Note

The reader is assumed to be familiar with the general concepts of RP, as described in RADICAL-Pilot (RP) - Overview.

Note

This chapter assumes that the reader has successfully installed RADICAL-Pilot and configured access to the resources on which to execute the code examples (see chapter Installation).

Note

We colloquially refer to RADICAL-Pilot as RP, ComputePilot as pilot, and ComputeUnit as unit.

Download the basic code example 00_getting_started.py. The text below explains the most important sections of that code, showing the expected output from the execution of the example. Please look carefully at the code comments as they explain some aspects of the code which we do not explicitly cover in the text.

3.1.1. Loading the RP Module and Following the Application Execution

In order to use RADICAL-Pilot, you need to import the radical.pilot module in your Python script or application. Note that we use the rp abbreviation for the module name:

import radical.pilot as rp

All code examples in this guide use the reporter facility of RADICAL-Utils to print well-formatted runtime and progress information. You can control that output with the RADICAL_REPORT environment variable, which can be set to TRUE or FALSE to enable or disable reporter output. We assume the setting to be TRUE when referencing any output in this chapter.

import os

os.environ['RADICAL_REPORT'] = 'TRUE'

import radical.pilot as rp
import radical.utils as ru

report = ru.Reporter(name='radical.pilot')
report.title('Getting Started (RP version %s)' % rp.version)

3.1.2. Creating a Session

A radical.pilot.Session is the root object for all other objects in RADICAL-Pilot. radical.pilot.PilotManager and radical.pilot.UnitManager instances are always attached to a Session, and their lifetime is controlled by the session.

A Session also encapsulates the connection(s) to a back-end MongoDB server, which facilitates the communication between the RP application and the remote pilot jobs. More information about how RADICAL-Pilot uses MongoDB can be found in the RADICAL-Pilot (RP) - Overview section.

To create a new Session, you need to provide the URL of a MongoDB server. If no MongoDB URL is specified on session creation, RP attempts to use the value specified via the RADICAL_PILOT_DBURL environment variable.

os.environ['RADICAL_PILOT_DBURL'] = 'mongodb://<host>:<port>/<db_name>'

session = rp.Session()
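
Because session creation depends on the MongoDB URL, it can help to check the environment before calling rp.Session(). The following is a minimal sketch and not part of the RP API: the helper name require_dburl is hypothetical, and it only checks that the variable is set and uses the mongodb:// scheme shown above.

```python
import os


def require_dburl(env=None):
    """Return the MongoDB URL from RADICAL_PILOT_DBURL or raise a clear error.

    `require_dburl` is a hypothetical helper for illustration; it is not
    provided by RADICAL-Pilot.
    """
    # allow a plain dict to be passed in, which simplifies testing
    env = os.environ if env is None else env
    url = env.get('RADICAL_PILOT_DBURL', '')
    if not url.startswith('mongodb://'):
        raise RuntimeError('RADICAL_PILOT_DBURL must be set to a '
                           'mongodb://<host>:<port>/<db_name> URL')
    return url
```

Calling require_dburl() just before rp.Session() turns a missing variable into an immediate, readable error.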

Warning

Always call radical.pilot.Session.close() before your application exits, so that all lingering pilots are terminated. You can use the function argument cleanup=True to delete the entries of the session from the database. If you need to retain those data, use the function argument download=True.
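
One way to make sure the session is closed even when the application fails is a try/finally block. The sketch below is an illustration under assumptions: the function name run_in_session and its parameters are hypothetical, and the session is created through a callable so that the pattern can be tried without RP installed. In an RP script, make_session would be rp.Session.

```python
def run_in_session(make_session, work):
    """Run `work(session)` and always close the session afterwards.

    `make_session` is any zero-argument callable that returns an object with
    a `close(cleanup=...)` method. The names used here are hypothetical.
    """
    session = make_session()
    try:
        return work(session)
    finally:
        # cleanup=True deletes the session's entries from the database,
        # as described in the warning above
        session.close(cleanup=True)
```

The finally clause runs whether work() returns normally or raises, so the pilots of the session are not left behind by an exception in the application code.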

3.1.3. Creating ComputePilots

Pilots are created via a radical.pilot.PilotManager, by passing a radical.pilot.ComputePilotDescription. The most important elements of the ComputePilotDescription are:

  • resource: a label which specifies the target resource, either local or remote, on which to run the pilot, i.e., the machine on which the pilot executes;
  • cores: the number of CPU cores the pilot is expected to manage, i.e., the size of the pilot;
  • runtime: the number of minutes the pilot is expected to be active, i.e., the runtime of the pilot.

Depending on the specific target resource and use case, other properties need to be specified. In our user guide examples, we use a separate config.json file to store a number of properties per resource label, to simplify the code of the examples. The examples themselves then accept one or more resource labels, and create the pilots on those resources:

# read the config
config = ru.read_json('%s/config.json' % os.path.dirname(os.path.abspath(__file__)))

# use the resource specified as an argument, fall back to localhost
try:
    resource = sys.argv[1]
except IndexError:
    resource = 'local.localhost'

# create a pilot manager in the session
pmgr = rp.PilotManager(session=session)

# define an [n]-core pilot that runs for [x] minutes
pdesc = rp.ComputePilotDescription({
        'resource'      : resource,
        'runtime'       : 10,                         # pilot runtime (min)
        'cores'         : config[resource]['cores'],  # pilot size
        'project'       : config[resource]['project'],
        'queue'         : config[resource]['queue'],
        'access_schema' : config[resource]['schema']
})

# submit the pilot for launching
pilot = pmgr.submit_pilots(pdesc)

For a list of available resource labels, see List of Pre-Configured Resources (not all of those resources are configured for the user guide examples). For further details on the pilot description, please check the API Documentation.
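
The lookups of the form config[resource]['cores'] in the example above imply one config.json entry per resource label. As an illustration only, such a file might be structured as shown below. The key names are taken from the example, all values are placeholders, and the standard json module stands in for ru.read_json.

```python
import json
import os
import tempfile

# Placeholder values; only the key names are taken from the example above.
example_config = {
    'local.localhost': {
        'cores'  : 2,
        'project': None,
        'queue'  : None,
        'schema' : None,
    }
}

# Write the file to a temporary directory and read it back.
path = os.path.join(tempfile.mkdtemp(), 'config.json')
with open(path, 'w') as fout:
    json.dump(example_config, fout, indent=4)

with open(path) as fin:
    config = json.load(fin)

# Look up the per-resource settings the same way the example does.
resource = 'local.localhost'
print(config[resource]['cores'])
```

Keeping these settings in a file means that the same script can target another resource by changing only the label passed on the command line.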

Note

Pilots terminate when the application calls radical.pilot.Session.close() or radical.pilot.Pilot.cancel(). The argument terminate=False of radical.pilot.Session.close() lets the pilot run for its full requested runtime, possibly after the Python application has exited.

3.1.4. Submitting ComputeUnits

Each ComputeUnit is similar to an operating system process, consisting of an executable, a list of arguments, and an environment along with some runtime requirements.

Analogous to pilots, a unit is described via a radical.pilot.ComputeUnitDescription object. This object has two mandatory properties:

  • executable - the executable to launch
  • cores - the number of cores required by the executable

Our example creates 128 units, each running the executable /bin/date:

n    = 128   # number of units to run
cuds = list()
for i in range(0, n):
    # create a new CU description, and fill it.
    cud = rp.ComputeUnitDescription()
    cud.executable = '/bin/date'
    cuds.append(cud)
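
Since a unit also carries a list of arguments, the units of a workload need not be identical. The sketch below builds descriptions through a factory callable so that it can be run without RP installed; in an RP script the factory would be rp.ComputeUnitDescription. The helper name make_descriptions and the use of /bin/echo are illustrative assumptions, and we assume that arguments and cores can be set as attributes in the same way as executable above.

```python
from types import SimpleNamespace


def make_descriptions(n, factory=SimpleNamespace):
    """Build `n` unit descriptions that each echo their own index.

    `factory` creates one empty description object; SimpleNamespace is a
    stand-in so that the sketch runs without RADICAL-Pilot.
    """
    descriptions = []
    for i in range(n):
        cud = factory()
        cud.executable = '/bin/echo'
        cud.arguments = ['unit', str(i)]   # differs from unit to unit
        cud.cores = 1                      # cores required by the executable
        descriptions.append(cud)
    return descriptions


cuds = make_descriptions(4)
print([' '.join([c.executable] + c.arguments) for c in cuds])
```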

Units are executed by pilots. The radical.pilot.UnitManager class is responsible for routing those units from the application to the available pilots. The UnitManager accepts ComputeUnitDescriptions as we created above and assigns them, according to some scheduling algorithm, to the set of available pilots for execution (pilots are made available to a UnitManager via the add_pilots call):

# create a unit manager, submit units, and wait for their completion
umgr = rp.UnitManager(session=session)
umgr.add_pilots(pilot)
umgr.submit_units(cuds)
umgr.wait_units()
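
The routing step can be pictured with a toy example. The following sketch distributes units over pilots in round-robin order. It only illustrates what assigning units to pilots means and is not RP's scheduler implementation; all names in it are hypothetical.

```python
from itertools import cycle


def round_robin(units, pilots):
    """Assign each unit to the next pilot in turn; return {pilot: [units]}."""
    if not pilots:
        raise ValueError('at least one pilot is required')
    assignment = {pilot: [] for pilot in pilots}
    # cycle() repeats the pilots endlessly; zip() stops after the last unit
    for unit, pilot in zip(units, cycle(pilots)):
        assignment[pilot].append(unit)
    return assignment


plan = round_robin(['u0', 'u1', 'u2', 'u3', 'u4'], ['pilot.a', 'pilot.b'])
print(plan)
```

With a single pilot, as in the example of this section, every unit ends up on that pilot.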

3.1.5. Running the Example

Note

Remember to set RADICAL_PILOT_DBURL in your environment (see chapter Installation).

Running the example should result in an output similar to the one shown below:

[Figure: output of the 00_getting_started.py example (../_images/00_getting_started.png)]

The runtime of the example can vary significantly. Typically, the first run on any resource for a specific user is the longest because RP needs to set up a Python virtualenv for the pilot. Subsequent runs may update that virtualenv, or may install additional components as needed, but that should take less time than its creation. Creating the virtualenv should take a few minutes on the first execution, depending on your network connectivity, the connectivity of the target resource, and the location of the MongoDB service.

3.1.6. What’s Next?

The next section (Obtaining Unit Details) describes how an application can inspect completed units to extract information about states, exit codes, and standard output and error.