Describing Tasks

The notion of tasks is fundamental in RADICAL-Pilot as tasks define the work to be executed on a supported HPC platform. This notebook will guide the user through the various task types available in RADICAL-Pilot, and how to specify their respective payload. It will also show some means to inspect tasks after (successful or failed) execution.

Warning: We assume that you are familiar with deploying, configuring and using RADICAL-Pilot, for example by taking the getting started introduction tutorial.

Warning: All examples in this notebook are executed locally on a GNU/Linux host. The host needs to have MPI installed - OpenMPI, MPICH, MVAPICH or any other MPI flavor is supported as long as it provides a standards-compliant mpiexec command. See the documentation of your GNU/Linux distribution on how to install MPI.

Let’s quickly check that an MPI launch method is installed.

[1]:
import radical.utils as ru

mpi_lm_exists = bool(ru.which(['mpirun', 'mpiexec']))

First, some preparatory work for the tutorial. We import some modules and set some variables. Note that we import radical.pilot as rp so as to abbreviate future API calls.

[2]:
import os
import sys
import pprint

# do not use animated output in notebooks
os.environ['RADICAL_REPORT_ANIME'] = 'False'

import radical.pilot as rp

# determine the path of the currently active virtualenv to simplify some examples below
ve_path = os.path.dirname(os.path.dirname(ru.which('python3')))
display(ve_path)
'/home/docs/checkouts/readthedocs.org/user_builds/radicalpilot/envs/stable'

Initial setup and Pilot Submission

As shown in the introductory tutorials, we will first configure the reporter output, then set up a RADICAL-Pilot session, create pilot and task manager instances, and run a small local pilot with 32 cores and 1 GPU assigned to it.

[3]:
# configure reporter output
report = ru.Reporter(name='radical.pilot')
report.title('Tutorial: Describing Tasks (RP version %s)' % rp.version)

# create session and managers
session = rp.Session()
pmgr    = rp.PilotManager(session)
tmgr    = rp.TaskManager(session)

# submit a pilot
pilot = pmgr.submit_pilots(rp.PilotDescription({'resource'     : 'local.localhost',
                                                'runtime'      : 60,
                                                'cores'        : 32,
                                                'gpus'         : 1,
                                                'exit_on_error': True}))

# add the pilot to the task manager and wait for the pilot to become active
tmgr.add_pilots(pilot)
pilot.wait(rp.PMGR_ACTIVE)
report.info('pilot state: %s' % pilot.state)

================================================================================
 Tutorial: Describing Tasks (RP version 3f5bd0a@HEAD)
================================================================================

pilot state: PMGR_ACTIVE

Task execution

At this point we have the system set up and ready to execute our workload. To do so, we describe the tasks that comprise the workload and submit them for execution. The goal of this tutorial is to introduce the various attributes available for describing tasks, to explain the execution process in some detail, and to describe how completed or failed tasks can be inspected.

RP Executable Tasks vs. Raptor Tasks

RADICAL-Pilot is, in the most general sense, a pilot-based task execution backend. Its implementation focuses on executable tasks, i.e., on tasks which are described by an executable, its command line arguments, in- and output files, and its execution environment.

A more general task execution engine called ‘Raptor’ is also provided as part of RADICAL-Pilot. Raptor can additionally execute function tasks, i.e., tasks which are defined by a function code entry point, function parameters and return values. The tutorial you are reading right now focuses on executable tasks; the additional task types supported by Raptor are the topic of the tutorial Raptor: executing Python functions at scale.

Task Descriptions

The rp.TaskDescription class is, as the name suggests, the basis for all task descriptions in RADICAL-Pilot. Its most important attribute is mode: for executable tasks the mode must be set to rp.TASK_EXECUTABLE, which is the default setting.

Executable tasks have exactly one additional required attribute: executable, i.e., the name of the executable. That can be either an absolute path to the executable on the file system of the target HPC platform, or a plain executable name which is known at runtime in the task’s execution environment (we will cover the execution environment setup further below).

[4]:
# create a minimal executable task
td   = rp.TaskDescription({'executable': '/bin/date'})
task = tmgr.submit_tasks(td)

The task will be scheduled for execution on the pilot we created above. We now wait for the task to complete, i.e., to reach one of the final states DONE, CANCELED or FAILED:

[5]:
tmgr.wait_tasks()
[5]:
['DONE']

Congratulations, you successfully executed a RADICAL-Pilot task!

Task Inspection

Once completed, we can inspect the tasks for details of their execution: we print a summary for all tasks and then inspect one of them in more detail. The output shows a number of task attributes which can be set by the task description. Those are specifically:

  • uid: a unique string identifying the task. If not defined in the task description, RP will generate an ID which is unique within the scope of the current session.

  • name: a common name for the task which has no meaning to RP itself but can be used by the application to identify or classify certain tasks. The task name is not required to be unique.

  • metadata: any user defined data. The only requirement is that the data are serializable via msgpack, which RP internally uses as serialization format. Note that metadata are communicated along with the task itself and, as such, they should usually be very small bits of data to not deteriorate performance.
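
For illustration, these attributes can be set directly in the task description. The following is a minimal sketch; the uid, name and metadata values are arbitrary examples:

td = rp.TaskDescription({'executable': '/bin/date',
                         'uid'       : 'task.my_date.000000',    # illustrative uid, unique within the session
                         'name'      : 'date_check',             # illustrative name, uniqueness not required
                         'metadata'  : {'experiment': 'demo'}})  # small, msgpack-serializable user data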

Which task attributes are useful is highly application dependent: you may not need most of them in your specific application. For example, task.stdout and task.stderr provide a quick and easy way to scan the task results without the need for explicit data staging, and task.task_sandbox is useful if your application employs out-of-band data management and needs access to the task output files.

[6]:
report.plain('uid             : %s\n' % task.uid)
report.plain('tmgr            : %s\n' % task.tmgr.uid)
report.plain('pilot           : %s\n' % task.pilot)
report.plain('name            : %s\n' % task.name)
report.plain('executable      : %s\n' % task.description['executable'])
report.plain('state           : %s\n' % task.state)
report.plain('exit_code       : %s\n' % task.exit_code)
report.plain('stdout          : %s\n' % task.stdout.strip())
report.plain('stderr          : %s\n' % task.stderr)
report.plain('return_value    : %s\n' % task.return_value)
report.plain('exception       : %s\n' % task.exception)
report.plain('\n')
report.plain('endpoint_fs     : %s\n' % task.endpoint_fs)
report.plain('resource_sandbox: %s\n' % task.resource_sandbox)
report.plain('session_sandbox : %s\n' % task.session_sandbox)
report.plain('pilot_sandbox   : %s\n' % task.pilot_sandbox)
report.plain('task_sandbox    : %s\n' % task.task_sandbox)
report.plain('client_sandbox  : %s\n' % task.client_sandbox)
report.plain('metadata        : %s\n' % task.metadata)
uid             : task.000000
tmgr            : tmgr.0000
pilot           : pilot.0000
name            :
executable      : /bin/date
state           : DONE
exit_code       : 0
stdout          : Thu Apr 18 05:30:15 UTC 2024
stderr          :
return_value    : None
exception       : None

endpoint_fs     : file://localhost/
resource_sandbox: file://localhost/home/docs/radical.pilot.sandbox
session_sandbox : file://localhost/home/docs/radical.pilot.sandbox/rp.session.ab32e0e6-fd44-11ee-85e9-0242ac110002
pilot_sandbox   : file://localhost/home/docs/radical.pilot.sandbox/rp.session.ab32e0e6-fd44-11ee-85e9-0242ac110002/pilot.0000/
task_sandbox    : file://localhost/home/docs/radical.pilot.sandbox/rp.session.ab32e0e6-fd44-11ee-85e9-0242ac110002/pilot.0000/task.000000/
client_sandbox  : /home/docs/checkouts/readthedocs.org/user_builds/radicalpilot/checkouts/stable/docs/source/tutorials
metadata        : {'exec_pid': [9032], 'rank_pid': [9039], 'launch_pid': 9025}

All applications can fail, often for reasons beyond the user’s control. A task is no different: it can fail as well. Many non-trivial applications will need a way to handle failing tasks. Detecting the failure is the first and necessary step to do so, and RP makes that part easy: RP’s task state model defines that a failing task will immediately go into FAILED state, and that state information is available as the task.state property.

Note: Depending on when the failure happens, the task may also have a value for the task.stderr property, which enables further inspection of the causes of the failure. task.stderr will only be available if the task reached the EXECUTING state before failing. See the task state model for more information.

Let us submit a new set of tasks and inspect the failure modes. We will scan /bin/date for acceptable single letter arguments:

[7]:
import string
letters = string.ascii_lowercase + string.ascii_uppercase

report.progress_tgt(len(letters), label='create')

tds = list()
for letter in letters:
    tds.append(rp.TaskDescription({'executable': '/bin/date',
                                   'arguments': ['-' + letter]}))
    report.progress()

report.progress_done()

tasks = tmgr.submit_tasks(tds)
create: ########################################################################

This time, we wait only for the newly submitted tasks. We then find which ones succeeded and check their resulting output. Spoiler alert: We will find 3 valid single-letter options.

[8]:
tmgr.wait_tasks([task.uid for task in tasks])

for task in tasks:
    if task.state == rp.DONE:
        print('%s: %s: %s' % (task.uid, task.description['arguments'], task.stdout.strip()))

task.000021: ['-u']: Thu Apr 18 05:30:16 UTC 2024
task.000035: ['-I']: 2024-04-18
task.000044: ['-R']: Thu, 18 Apr 2024 05:30:16 +0000

By changing the state we check for from rp.DONE to rp.FAILED, we can inspect the error messages for the various tested flags (in task.stderr):

[9]:
tmgr.wait_tasks([task.uid for task in tasks])

for task in tasks:
    if task.state == rp.FAILED:
        print('%s: %s: %s' % (task.uid, task.description['arguments'], task.stderr.strip()))
task.000001: ['-a']: /bin/date: invalid option -- 'a'
Try '/bin/date --help' for more information.
task.000002: ['-b']: /bin/date: invalid option -- 'b'
Try '/bin/date --help' for more information.
task.000003: ['-c']: /bin/date: invalid option -- 'c'
Try '/bin/date --help' for more information.
task.000004: ['-d']: /bin/date: option requires an argument -- 'd'
Try '/bin/date --help' for more information.
task.000005: ['-e']: /bin/date: invalid option -- 'e'
Try '/bin/date --help' for more information.
task.000006: ['-f']: /bin/date: option requires an argument -- 'f'
Try '/bin/date --help' for more information.
task.000007: ['-g']: /bin/date: invalid option -- 'g'
Try '/bin/date --help' for more information.
task.000008: ['-h']: /bin/date: invalid option -- 'h'
Try '/bin/date --help' for more information.
task.000009: ['-i']: /bin/date: invalid option -- 'i'
Try '/bin/date --help' for more information.
task.000010: ['-j']: /bin/date: invalid option -- 'j'
Try '/bin/date --help' for more information.
task.000011: ['-k']: /bin/date: invalid option -- 'k'
Try '/bin/date --help' for more information.
task.000012: ['-l']: /bin/date: invalid option -- 'l'
Try '/bin/date --help' for more information.
task.000013: ['-m']: /bin/date: invalid option -- 'm'
Try '/bin/date --help' for more information.
task.000014: ['-n']: /bin/date: invalid option -- 'n'
Try '/bin/date --help' for more information.
task.000015: ['-o']: /bin/date: invalid option -- 'o'
Try '/bin/date --help' for more information.
task.000016: ['-p']: /bin/date: invalid option -- 'p'
Try '/bin/date --help' for more information.
task.000017: ['-q']: /bin/date: invalid option -- 'q'
Try '/bin/date --help' for more information.
task.000018: ['-r']: /bin/date: option requires an argument -- 'r'
Try '/bin/date --help' for more information.
task.000019: ['-s']: /bin/date: option requires an argument -- 's'
Try '/bin/date --help' for more information.
task.000020: ['-t']: /bin/date: invalid option -- 't'
Try '/bin/date --help' for more information.
task.000022: ['-v']: /bin/date: invalid option -- 'v'
Try '/bin/date --help' for more information.
task.000023: ['-w']: /bin/date: invalid option -- 'w'
Try '/bin/date --help' for more information.
task.000024: ['-x']: /bin/date: invalid option -- 'x'
Try '/bin/date --help' for more information.
task.000025: ['-y']: /bin/date: invalid option -- 'y'
Try '/bin/date --help' for more information.
task.000026: ['-z']: /bin/date: invalid option -- 'z'
Try '/bin/date --help' for more information.
task.000027: ['-A']: /bin/date: invalid option -- 'A'
Try '/bin/date --help' for more information.
task.000028: ['-B']: /bin/date: invalid option -- 'B'
Try '/bin/date --help' for more information.
task.000029: ['-C']: /bin/date: invalid option -- 'C'
Try '/bin/date --help' for more information.
task.000030: ['-D']: /bin/date: invalid option -- 'D'
Try '/bin/date --help' for more information.
task.000031: ['-E']: /bin/date: invalid option -- 'E'
Try '/bin/date --help' for more information.
task.000032: ['-F']: /bin/date: invalid option -- 'F'
Try '/bin/date --help' for more information.
task.000033: ['-G']: /bin/date: invalid option -- 'G'
Try '/bin/date --help' for more information.
task.000034: ['-H']: /bin/date: invalid option -- 'H'
Try '/bin/date --help' for more information.
task.000036: ['-J']: /bin/date: invalid option -- 'J'
Try '/bin/date --help' for more information.
task.000037: ['-K']: /bin/date: invalid option -- 'K'
Try '/bin/date --help' for more information.
task.000038: ['-L']: /bin/date: invalid option -- 'L'
Try '/bin/date --help' for more information.
task.000039: ['-M']: /bin/date: invalid option -- 'M'
Try '/bin/date --help' for more information.
task.000040: ['-N']: /bin/date: invalid option -- 'N'
Try '/bin/date --help' for more information.
task.000041: ['-O']: /bin/date: invalid option -- 'O'
Try '/bin/date --help' for more information.
task.000042: ['-P']: /bin/date: invalid option -- 'P'
Try '/bin/date --help' for more information.
task.000043: ['-Q']: /bin/date: invalid option -- 'Q'
Try '/bin/date --help' for more information.
task.000045: ['-S']: /bin/date: invalid option -- 'S'
Try '/bin/date --help' for more information.
task.000046: ['-T']: /bin/date: invalid option -- 'T'
Try '/bin/date --help' for more information.
task.000047: ['-U']: /bin/date: invalid option -- 'U'
Try '/bin/date --help' for more information.
task.000048: ['-V']: /bin/date: invalid option -- 'V'
Try '/bin/date --help' for more information.
task.000049: ['-W']: /bin/date: invalid option -- 'W'
Try '/bin/date --help' for more information.
task.000050: ['-X']: /bin/date: invalid option -- 'X'
Try '/bin/date --help' for more information.
task.000051: ['-Y']: /bin/date: invalid option -- 'Y'
Try '/bin/date --help' for more information.
task.000052: ['-Z']: /bin/date: invalid option -- 'Z'
Try '/bin/date --help' for more information.
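
Once failed tasks are identified, how to handle them is up to the application. The snippet below is a minimal, purely illustrative retry sketch (for the invalid-flag tasks above, a resubmission would of course fail again):

# collect fresh copies of the descriptions of all failed tasks
retry_tds = list()
for t in tasks:
    if t.state == rp.FAILED:
        d = dict(t.description)
        d.pop('uid', None)            # drop any uid so that RP assigns new unique IDs
        retry_tds.append(rp.TaskDescription(d))

# resubmit and wait for the retries (illustrative only)
if retry_tds:
    retries = tmgr.submit_tasks(retry_tds)
    tmgr.wait_tasks([t.uid for t in retries])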

MPI Tasks and Task Resources

So far, we have run single-core tasks. The most common way for applications to utilize multiple cores and nodes on HPC machines is to use MPI as a communication layer, which coordinates multiple application processes, i.e., MPI ranks. In fact, the notion of ranks is central to RP’s TaskDescription class. All MPI ranks will be near-exact copies of each other: they run in the same work directory and the same environment, are defined by the same executable and arguments, get the same amount of resources allocated, etc. Notable exceptions are:

  • rank processes may run on different nodes;

  • rank processes can communicate via MPI;

  • each rank process obtains a unique rank ID.

It is up to the underlying MPI implementation to determine the exact value of the process’ rank ID. The MPI implementation may also set a number of additional environment variables for each process.

It is important to understand that only applications which make use of MPI should have more than one rank – otherwise identical copies of the same application instance are launched which will compute the same results, thus wasting resources for all ranks but one. Worse: I/O-routines of these non-MPI ranks can interfere with each other and invalidate those results.

Also: applications with a single rank cannot make effective use of MPI - depending on the specific resource configuration, RP may launch those tasks without providing an MPI communicator.

The following rank-related attributes are supported by RADICAL-Pilot:

  • ranks: the number of MPI ranks (application processes) to start

  • cores_per_rank: the number of cores each rank can use for spawning additional threads or processes

  • gpus_per_rank: the number of GPUs each rank can utilize

  • mem_per_rank: the size of memory (in Megabytes) which is available to each rank

  • lfs_per_rank: the amount of node-local file storage which is available to each rank

  • threading_type: how to inform the application about available resources to run threads on

    • rp.OpenMP: define OMP_NUM_THREADS in the task environment

  • gpu_type: how to inform the application about available GPU resources

    • rp.CUDA: define CUDA_VISIBLE_DEVICES in the task environment
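
For illustration, several of these attributes can be combined in a single task description. The sketch below is not executed in this notebook, and my_mpi_app is a placeholder for an actual MPI application:

td = rp.TaskDescription({'executable'    : './my_mpi_app',   # placeholder for a real MPI binary
                         'ranks'         : 4,                # four MPI processes
                         'cores_per_rank': 2,                # OMP_NUM_THREADS will be set to 2
                         'threading_type': rp.OpenMP,
                         'gpus_per_rank' : 1,                # one GPU per rank
                         'gpu_type'      : rp.CUDA})         # CUDA_VISIBLE_DEVICES will be defined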

The next example uses the radical-pilot-hello.sh command as a test to report on rank creation.

Note: No core pinning is performed on localhost. Thus, tasks see all CPU cores as available to them. However, the THREADS information still reports the correct number of assigned CPU cores.

Note: If there is no MPI launch method installed, then we will proceed with a single rank.

[10]:
tds = list()
for n in range(4):
    ranks = (n + 1) if mpi_lm_exists else 1
    tds.append(rp.TaskDescription({'executable'    : ve_path + '/bin/radical-pilot-hello.sh',
                                   'arguments'     : [n + 1],
                                   'ranks'         : ranks,
                                   'cores_per_rank': (n + 1),
                                   'threading_type': rp.OpenMP}))
    report.progress()

report.progress_done()

tasks = tmgr.submit_tasks(tds)
tmgr.wait_tasks([task.uid for task in tasks])

for task in tasks:
    print('--- %s:\n%s\n' % (task.uid, task.stdout.strip()))
....

--- task.000053:
0 : PID     : 10638
0 : NODE    : build-24104019-project-13481-radicalpilot
0 : CPUS    : 00
0 : GPUS    :
0 : RANK    : 0
0 : THREADS : 1
0 : SLEEP   : 1

--- task.000054:
1 : PID     : 10671
1 : NODE    : build-24104019-project-13481-radicalpilot
1 : CPUS    : 00
1 : GPUS    :
1 : RANK    : 1
1 : THREADS : 2
1 : SLEEP   : 2
0 : PID     : 10673
0 : NODE    : build-24104019-project-13481-radicalpilot
0 : CPUS    : 00
0 : GPUS    :
0 : RANK    : 0
0 : THREADS : 2
0 : SLEEP   : 2

--- task.000055:
1 : PID     : 10603
1 : NODE    : build-24104019-project-13481-radicalpilot
1 : CPUS    : 00
1 : GPUS    :
1 : RANK    : 1
1 : THREADS : 3
1 : SLEEP   : 3
0 : PID     : 10613
0 : NODE    : build-24104019-project-13481-radicalpilot
0 : CPUS    : 00
0 : GPUS    :
0 : RANK    : 0
0 : THREADS : 3
0 : SLEEP   : 3
2 : PID     : 10614
2 : NODE    : build-24104019-project-13481-radicalpilot
2 : CPUS    : 00
2 : GPUS    :
2 : RANK    : 2
2 : THREADS : 3
2 : SLEEP   : 3

--- task.000056:
1 : PID     : 10503
1 : NODE    : build-24104019-project-13481-radicalpilot
1 : CPUS    : 00
1 : GPUS    :
1 : RANK    : 1
1 : THREADS : 4
1 : SLEEP   : 4
0 : PID     : 10504
0 : NODE    : build-24104019-project-13481-radicalpilot
0 : CPUS    : 00
0 : GPUS    :
0 : RANK    : 0
0 : THREADS : 4
0 : SLEEP   : 4
2 : PID     : 10525
2 : NODE    : build-24104019-project-13481-radicalpilot
2 : CPUS    : 00
2 : GPUS    :
2 : RANK    : 2
2 : THREADS : 4
2 : SLEEP   : 4
3 : PID     : 10559
3 : NODE    : build-24104019-project-13481-radicalpilot
3 : CPUS    : 00
3 : GPUS    :
3 : RANK    : 3
3 : THREADS : 4
3 : SLEEP   : 4

Task Data Management

The TaskDescription supports diverse means to specify the task’s input/output data and data-related properties:

  • stdout: path of the file to store the task’s standard output in

  • stderr: path of the file to store the task’s standard error in

  • input_staging: list of file staging directives to stage task input data

  • output_staging: list of file staging directives to stage task output data

Let us run an example task which uses those 4 attributes: we run a word count on /etc/passwd (which we stage as an input file) and store the result in an output file (which we fetch back). We will also stage back the files in which standard output and standard error are stored (although in this simple example both are expected to be empty).

[11]:

td = rp.TaskDescription({'executable'    : '/bin/sh',
                         'arguments'     : ['-c', 'cat input.dat | wc > output.dat'],
                         'stdout'        : 'task_io.out',
                         'stderr'        : 'task_io.err',
                         'input_staging' : [{'source': '/etc/passwd',
                                             'target': 'input.dat'}],
                         'output_staging': [{'source': 'output.dat',
                                             'target': '/tmp/output.test.dat'},
                                            {'source': 'task_io.out',
                                             'target': '/tmp/output.test.out'},
                                            {'source': 'task_io.err',
                                             'target': '/tmp/output.test.err'}]})
task = tmgr.submit_tasks(td)
tmgr.wait_tasks([task.uid])

# let's check the resulting output files
print(ru.sh_callout('ls -la /tmp/output.test.*', shell=True)[0])
print(ru.sh_callout('cat /tmp/output.test.dat')[0])
-rw-r--r-- 1 docs docs 24 Apr 18 05:30 /tmp/output.test.dat
-rw-r--r-- 1 docs docs  0 Apr 18 05:30 /tmp/output.test.err
-rw-r--r-- 1 docs docs  0 Apr 18 05:30 /tmp/output.test.out

     24      34    1265

RADICAL-Pilot data staging capabilities go beyond what is captured in the example above:

  • Data can be transferred, copied, moved and linked;

  • data can refer to absolute paths, or be specified relative to the system’s root file system, to RP’s resource sandbox, session sandbox, pilot sandbox or task sandbox;

  • data staging can be performed not only for tasks, but also for the overall workflow (for example, when many tasks share the same input data).
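
As a sketch of what such directives can look like, a task-level staging directive may use an explicit action and sandbox-relative locations. Here shared_input.dat is a hypothetical file assumed to already reside in the pilot sandbox; see the data staging tutorial linked below for the authoritative list of actions and location schemas:

td = rp.TaskDescription({'executable'   : '/bin/wc',
                         'arguments'    : ['input.dat'],
                         'input_staging': [{'source': 'pilot:///shared_input.dat',  # relative to the pilot sandbox
                                            'target': 'task:///input.dat',          # relative to the task sandbox
                                            'action': rp.LINK}]})                   # link instead of copying the data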

Find a detailed explanation of RADICAL-Pilot data staging capabilities in our Data Staging with RADICAL-Pilot tutorial.

Task Execution Environment

On HPC platforms, it is common to provide application executables via environment modules. But task execution environments are also frequently used for scripting languages such as Python (e.g., virtualenv, venv or conda). RADICAL-Pilot supports the setup of the task execution environment in the following ways:

  1. environment dictionary

  2. use pre_exec directives to customize task specific environments

  3. prepare and reuse named environments for tasks

We will cover these options in the next three examples.

Environment Dictionary

Environment variables can be set explicitly in the task description via the environment attribute. When that attribute is not specified, tasks will be executed in the default environment that the pilot found on the compute nodes. If the attribute environment is defined, then the default environment will be augmented with the settings specified in environment. Useful variables to export might be PATH, LD_LIBRARY_PATH, etc., or any application specific environment variables used by your tasks.

Note: As demonstrated below, a number of custom environment variables are always provided, such as the various sandbox locations known to RADICAL-Pilot.

[12]:
td = rp.TaskDescription({'executable' : '/bin/sh',
                         'arguments'  : ['-c', 'printf "FOO=$FOO\nBAR=$BAR\nSHELL=$SHELL\n"; env | grep RP_ | sort'],
                         'environment': {'FOO': 'foo', 'BAR': 'bar'}
                        })
task = tmgr.submit_tasks(td)
tmgr.wait_tasks([task.uid])
print(task.stdout)
[... CONTENT SHORTENED ...]
sandbox/rp.session.ab32e0e6-fd44-11ee-85e9-0242ac110002//pilot.0000//gtod
RP_PILOT_ID=pilot.0000
RP_PILOT_SANDBOX=/home/docs/radical.pilot.sandbox/rp.session.ab32e0e6-fd44-11ee-85e9-0242ac110002//pilot.0000/
RP_PROF=/home/docs/radical.pilot.sandbox/rp.session.ab32e0e6-fd44-11ee-85e9-0242ac110002//pilot.0000//prof
RP_PROF_TGT=/home/docs/radical.pilot.sandbox/rp.session.ab32e0e6-fd44-11ee-85e9-0242ac110002//pilot.0000//task.000058/task.000058.prof
RP_RANK=0
RP_RANKS=1
RP_REGISTRY_ADDRESS=tcp://172.17.0.2:10002
RP_RESOURCE=local.localhost
RP_RESOURCE_SANDBOX=/home/docs/radical.pilot.sandbox
RP_SESSION_ID=rp.session.ab32e0e6-fd44-11ee-85e9-0242ac110002
RP_SESSION_SANDBOX=/home/docs/radical.pilot.sandbox/rp.session.ab32e0e6-fd44-11ee-85e9-0242ac110002/
RP_TASK_ID=task.000058
RP_TASK_NAME=task.000058
RP_TASK_SANDBOX=/home/docs/radical.pilot.sandbox/rp.session.ab32e0e6-fd44-11ee-85e9-0242ac110002//pilot.0000//task.000058
RP_VENV_PATH=/home/docs/radical.pilot.sandbox/ve.local.localhost.3f5bd0a@HEAD
RP_VENV_TYPE=venv

Environment Setup with pre_exec

The pre_exec attribute of the task description specifies a set of shell commands which will be executed before the task’s executable is launched. pre_exec is commonly used to prepare the task’s runtime environment, for example to:

  • Load a system module;

  • export some environment variable;

  • run a shell script or shell commands;

  • activate some virtual environment.

The example shown below activates the virtual environment this notebook is running in (whose path is stored in ve_path) so that it is usable for the task itself. We run another pre_exec command to install the pyyaml module in it. The actual task will then run pip show to check if that module is indeed available.

Warning: The first pre_exec command assumes that this is a virtual environment, not a Conda environment. You may need to change that command if your notebook runs in a Conda environment.

[13]:
td = rp.TaskDescription({'pre_exec'   : ['. %s/bin/activate' % ve_path,
                                         'pip install pyyaml'],
                         'executable' : '/bin/sh',
                         'arguments'  : ['-c', 'which python3; pip show pyyaml'],
                        })
task = tmgr.submit_tasks(td)
tmgr.wait_tasks([task.uid])
print(task.stdout)
Requirement already satisfied: pyyaml in /home/docs/checkouts/readthedocs.org/user_builds/radicalpilot/envs/stable/lib/python3.7/site-packages (6.0.1)
/home/docs/checkouts/readthedocs.org/user_builds/radicalpilot/envs/stable/bin/python3
Name: PyYAML
Version: 6.0.1
Summary: YAML parser and emitter for Python
Home-page: https://pyyaml.org/
Author: Kirill Simonov
Author-email: xi@resolvent.net
License: MIT
Location: /home/docs/checkouts/readthedocs.org/user_builds/radicalpilot/envs/stable/lib/python3.7/site-packages
Requires:
Required-by: myst-parser

Environment Setup with named_env

When the same environment is used for many tasks, the collective sum of the pre_exec activities can create a significant runtime overhead, both on the shared filesystem and on the system load. named_env addresses that problem: applications can prepare a task environment once and then use the named_env attribute to activate it for a task. This process incurs very little system load and runtime overhead and is thus the recommended way to set up task environments which are shared among many tasks. Any setup step which needs to run individually for each task, such as the creation of task specific input files, should still go into the task’s pre_exec directives.

Note: If you don’t need to create a new environment, but want to ensure that tasks use the same environment in which the RP agent runs (rp), then you can set it for each task: td.named_env = 'rp'.

[14]:

pilot.prepare_env(env_name='test_env',
                  env_spec={'type' : 'venv',
                            'setup': ['psutil']})

td = rp.TaskDescription({'executable': '/bin/sh',
                         'arguments' : ['-c', 'which python3; pip list | grep psutil'],
                         'named_env' : 'test_env'})
task = tmgr.submit_tasks(td)
tmgr.wait_tasks([task.uid])
print(task.stdout)
/home/docs/radical.pilot.sandbox/rp.session.ab32e0e6-fd44-11ee-85e9-0242ac110002/pilot.0000/env/rp_named_env.test_env/bin/python3
psutil             5.9.8

[15]:
report.header('finalize')
session.close()

--------------------------------------------------------------------------------
finalize