5. API Reference

5.1. Sessions and Security Contexts

5.1.1. Sessions

class radical.pilot.Session(dburl=None, uid=None, cfg=None, _primary=True)[source]

A Session is the root object of all RP objects in an application instance: it holds radical.pilot.PilotManager and radical.pilot.UnitManager instances which in turn hold radical.pilot.ComputePilot and radical.pilot.ComputeUnit instances, and several other components which operate on those stateful entities.

__init__(dburl=None, uid=None, cfg=None, _primary=True)[source]

Creates a new session. A new Session instance is created and stored in the database.

Arguments:
  • dburl (string): The MongoDB URL. If none is given, RP uses the environment variable RADICAL_PILOT_DBURL. If that is not set, an error will be raised.
  • cfg (str or dict): a named or instantiated configuration to be used for the session.
  • uid (string): Create a session with this UID. Session UIDs MUST be unique - otherwise they will lead to conflicts in the underlying database, resulting in undefined behaviours (or worse).
  • _primary (bool): only sessions created by the original application process (via rp.Session(), will connect to the DB. Secondary session instances are instantiated internally in processes spawned (directly or indirectly) by the initial session, for example in some of it’s components. A secondary session will inherit the original session ID, but will not attempt to create a new DB collection - if such a DB connection is needed, the component needs to establish that on its own.
add_resource_config(resource_config)[source]

Adds a new ru.Config to the session’s dictionary of known resources, or accept a string which points to a configuration file.

For example:

rc = ru.Config("./mycluster.json")
rc.job_manager_endpoint = "ssh+pbs://mycluster
rc.filesystem_endpoint  = "sftp://mycluster
rc.default_queue        = "private"

session = rp.Session()
session.add_resource_config(rc)

pd = rp.ComputePilotDescription()
pd.resource = "mycluster"
pd.cores    = 16
pd.runtime  = 5 # minutes

pilot = pm.submit_pilots(pd)
as_dict()[source]

Returns a Python dictionary representation of the object.

close(cleanup=False, terminate=True, download=False)[source]

Closes the session. All subsequent attempts access objects attached to the session will result in an error. If cleanup is set to True, the session data is removed from the database.

Arguments:
  • cleanup (bool): Remove session from MongoDB (implies * terminate)
  • terminate (bool): Shut down all pilots associated with the session.
  • download (bool): Fetch pilot profiles and database entries.
closed

Returns the time of closing

connected

Return time when the session connected to the DB

created

Returns the UTC date and time the session was created.

get_pilot_managers(pmgr_uids=None)[source]

returns known PilotManager(s).

Arguments:

  • pmgr_uids [string]: unique identifier of the PilotManager we want
Returns:
get_resource_config(resource, schema=None)[source]

Returns a dictionary of the requested resource config

get_unit_managers(umgr_uids=None)[source]

returns known UnitManager(s).

Arguments:

  • umgr_uids [string]: unique identifier of the UnitManager we want
Returns:
inject_metadata(metadata)[source]

Insert (experiment) metadata into an active session RP stack version info always get added.

list_pilot_managers()[source]

Lists the unique identifiers of all radical.pilot.PilotManager instances associated with this session.

Returns:
list_resources()[source]

Returns a list of known resource labels which can be used in a pilot description. Not that resource aliases won’t be listed.

list_unit_managers()[source]

Lists the unique identifiers of all radical.pilot.UnitManager instances associated with this session.

Returns:

5.1.2. Security Contexts

class radical.pilot.Context(ctype, thedict=None)[source]
__init__(ctype, thedict=None)[source]

ctype: string ret: None

classmethod from_dict(thedict)[source]

Creates a new object instance from a string. c._from_dict(x.as_dict) == x

5.2. Pilots and PilotManagers

5.2.1. PilotManagers

class radical.pilot.PilotManager(session, cfg='default')[source]

A PilotManager manages rp.ComputePilot instances that are submitted via the radical.pilot.PilotManager.submit_pilots() method.

It is possible to attach one or more Using Local and Remote HPC Resources to a PilotManager to outsource machine specific configuration parameters to an external configuration file.

Example:

s = rp.Session(database_url=DBURL)

pm = rp.PilotManager(session=s)

pd = rp.ComputePilotDescription()
pd.resource = "futuregrid.alamo"
pd.cpus = 16

p1 = pm.submit_pilots(pd)  # create first  pilot with 16 cores
p2 = pm.submit_pilots(pd)  # create second pilot with 16 cores

# Create a workload of 128 '/bin/sleep' compute units
compute_units = []
for unit_count in range(0, 128):
    cu = rp.ComputeUnitDescription()
    cu.executable = "/bin/sleep"
    cu.arguments = ['60']
    compute_units.append(cu)

# Combine the two pilots, the workload and a scheduler via
# a UnitManager.
um = rp.UnitManager(session=session, scheduler=rp.SCHEDULER_ROUND_ROBIN)
um.add_pilot(p1)
um.submit_units(compute_units)

The pilot manager can issue notification on pilot state changes. Whenever state notification arrives, any callback registered for that notification is fired.

NOTE: State notifications can arrive out of order wrt the pilot state model!

__init__(session, cfg='default')[source]

Creates a new PilotManager and attaches is to the session.

Arguments:
  • session [rp.Session]: The session instance to use.
  • cfg (dict or string): The configuration or name of configuration to use.
Returns:
  • A new PilotManager object [rp.PilotManager].
as_dict()[source]

Returns a dictionary representation of the PilotManager object.

cancel_pilots(uids=None, _timeout=None)[source]

Cancel one or more rp.ComputePilots.

Arguments:
  • uids [string or list of strings]: The IDs of the compute pilot objects to cancel.
close(terminate=True)[source]

Shut down the PilotManager and all its components.

Arguments:
  • terminate [bool]: cancel non-final pilots if True (default)
get_pilots(uids=None)[source]

Returns one or more compute pilots identified by their IDs.

Arguments:
  • uids [string or list of strings]: The IDs of the compute pilot objects to return.
Returns:
  • A list of rp.ComputePilot objects.
list_pilots()[source]

Returns the UIDs of the rp.ComputePilots managed by this pilot manager.

Returns:
  • A list of rp.ComputePilot UIDs [string].
register_callback(cb, cb_data=None, metric='PILOT_STATE')[source]

Registers a new callback function with the PilotManager. Manager-level callbacks get called if the specified metric changes. The default metric PILOT_STATE fires the callback if any of the ComputePilots managed by the PilotManager change their state.

All callback functions need to have the same signature:

def cb(obj, value, cb_data)

where object is a handle to the object that triggered the callback, value is the metric, and data is the data provided on callback registration.. In the example of PILOT_STATE above, the object would be the pilot in question, and the value would be the new state of the pilot.

Available metrics are:

  • PILOT_STATE: fires when the state of any of the pilots which are managed by this pilot manager instance is changing. It communicates the pilot object instance and the pilots new state.
submit_pilots(descriptions)[source]

Submits on or more rp.ComputePilot instances to the pilot manager.

Arguments:
  • descriptions [rp.ComputePilotDescription or list of rp.ComputePilotDescription]: The description of the compute pilot instance(s) to create.
Returns:
  • A list of rp.ComputePilot objects.
uid

Returns the unique id.

wait_pilots(uids=None, state=None, timeout=None)[source]

Returns when one or more rp.ComputePilots reach a specific state.

If pilot_uids is None, wait_pilots returns when all ComputePilots reach the state defined in state. This may include pilots which have previously terminated or waited upon.

Example:

# TODO -- add example

Arguments:

  • pilot_uids [string or list of strings] If pilot_uids is set, only the ComputePilots with the specified uids are considered. If pilot_uids is None (default), all ComputePilots are considered.

  • state [string] The state that ComputePilots have to reach in order for the call to return.

    By default wait_pilots waits for the ComputePilots to reach a terminal state, which can be one of the following:

    • rp.rps.DONE
    • rp.rps.FAILED
    • rp.rps.CANCELED
  • timeout [float] Timeout in seconds before the call returns regardless of Pilot state changes. The default value None waits forever.

5.2.2. ComputePilotDescription

class radical.pilot.ComputePilotDescription(from_dict=None)[source]

A ComputePilotDescription object describes the requirements and properties of a radical.pilot.Pilot and is passed as a parameter to radical.pilot.PilotManager.submit_pilots() to instantiate and run a new pilot.

Note

A ComputePilotDescription MUST define at least resource, cores and runtime.

Example:

pm = radical.pilot.PilotManager(session=s)
pd = radical.pilot.ComputePilotDescription()
pd.resource = "local.localhost"
pd.cores    = 16
pd.runtime  = 5 # minutes

pilot = pm.submit_pilots(pd)
resource

[Type: string] [`mandatory`] The key of a Using Local and Remote HPC Resources entry. If the key exists, the machine-specifc configuration is loaded from the configuration once the ComputePilotDescription is passed to radical.pilot.PilotManager.submit_pilots(). If the key doesn’t exist, a radical.pilot.pilotException is thrown.

access_schema

[Type: string] [`optional`] The key of an access mechanism to use. The valid access mechanism are defined in the resource configurations, see Using Local and Remote HPC Resources. The first one defined there is used by default, if no other is specified.

runtime

[Type: int] [mandatory] The maximum run time (wall-clock time) in minutes of the ComputePilot.

sandbox

[Type: string] [optional] The working (“sandbox”) directory of the ComputePilot agent. This parameter is optional. If not set, it defaults to radical.pilot.sandox in your home or login directory.

Warning

If you define a ComputePilot on an HPC cluster and you want to set sandbox manually, make sure that it points to a directory on a shared filesystem that can be reached from all compute nodes.

cores

[Type: int] [mandatory] The number of cores the pilot should allocate on the target resource.

NOTE: for local pilots, you can set a number larger than the physical machine limit when setting RADICAL_PILOT_PROFILE in your environment.

gpus

[Type: int] [optional] The number of gpus the pilot should allocate on the target resource.

memory

[Type: int] [optional] The total amount of physical memory the pilot (and related to it job) requires. This parameter translates into TotalPhysicalMemory at radical.saga.job.Description.

queue

[Type: string] [optional] The name of the job queue the pilot should get submitted to . If queue is defined in the resource configuration (resource) defining queue will override it explicitly.

project

[Type: string] [optional] The name of the project / allocation to charge for used CPU time. If project is defined in the machine configuration (resource), defining project will override it explicitly.

candidate_hosts

[Type: list] [optional] The list of names of hosts where this pilot is allowed to start on.

cleanup

[Type: bool] [optional] If cleanup is set to True, the pilot will delete its entire sandbox upon termination. This includes individual ComputeUnit sandboxes and all generated output data. Only log files will remain in the sandbox directory.

layout

[Type: str or dict] [optional] Point to a json file or an explicit (dict) description of the pilot layout: number and size partitions and their configuration.

5.2.3. Pilots

class radical.pilot.ComputePilot(pmgr, descr)[source]

A ComputePilot represent a resource overlay on a local or remote resource.

Note

A ComputePilot cannot be created directly. The factory method
radical.pilot.PilotManager.submit_pilots() has to be used instead.

Example:

pm = radical.pilot.PilotManager(session=s)
pd = radical.pilot.ComputePilotDescription()

pd.resource = "local.localhost"
pd.cores    = 2
pd.runtime  = 5 # minutes

pilot = pm.submit_pilots(pd)
as_dict()[source]

Returns a Python dictionary representation of the object.

cancel()[source]

Cancel the pilot.

description

Returns the description the pilot was started with, as a dictionary.

Returns:
  • description (dict)
log

Returns a list of human readable [timestamp, string] tuples describing various events during the pilot’s lifetime. Those strings are not normative, only informative!

Returns:
  • log (list of [timestamp, string] tuples)
pilot_sandbox

Returns the full sandbox URL of this pilot, if that is already known, or ‘None’ otherwise.

Returns:
  • A string
pmgr

Returns the pilot’s manager.

Returns:
register_callback(cb, metric='PILOT_STATE', cb_data=None)[source]

Registers a callback function that is triggered every time the pilot’s state changes.

All callback functions need to have the same signature:

def cb(obj, state)

where object is a handle to the object that triggered the callback and state is the new state of that object. If ‘cb_data’ is given, then the ‘cb’ signature changes to

def cb(obj, state, cb_data)

and ‘cb_data’ are passed along.

resource

Returns the resource tag of this pilot.

Returns:
  • A resource tag (string)
resource_details

Returns agent level resource information

session

Returns the pilot’s session.

Returns:
stage_in(directives)[source]

Stages the content of the staging directive into the pilot’s staging area

stage_out()[source]

fetch staging_output.tgz from the pilot sandbox, and store in $PWD

state

Returns the current state of the pilot.

Returns:
  • state (string enum)
stderr

Returns a snapshot of the pilot’s STDERR stream.

If this property is queried before the pilot has reached ‘DONE’ or ‘FAILED’ state it will return None.

Returns:
  • stderr (string)
stdout

Returns a snapshot of the pilot’s STDOUT stream.

If this property is queried before the pilot has reached ‘DONE’ or ‘FAILED’ state it will return None.

Returns:
  • stdout (string)
uid

Returns the pilot’s unique identifier.

The uid identifies the pilot within a PilotManager.

Returns:
  • A unique identifier (string).
wait(state=None, timeout=None)[source]

Returns when the pilot reaches a specific state or when an optional timeout is reached.

Arguments:

  • state [list of strings] The state(s) that pilot has to reach in order for the call to return.

    By default wait waits for the pilot to reach a final state, which can be one of the following:

    • radical.pilot.states.DONE
    • radical.pilot.states.FAILED
    • radical.pilot.states.CANCELED
  • timeout [float] Optional timeout in seconds before the call returns regardless whether the pilot has reached the desired state or not. The default value None never times out.

5.3. ComputeUnits and UnitManagers

5.3.1. UnitManager

class radical.pilot.UnitManager(session, cfg='default', scheduler=None)[source]

A UnitManager manages radical.pilot.ComputeUnit instances which represent the executable workload in RADICAL-Pilot. A UnitManager connects the ComputeUnits with one or more Pilot instances (which represent the workload executors in RADICAL-Pilot) and a scheduler which determines which ComputeUnit gets executed on which Pilot.

Example:

s = rp.Session(database_url=DBURL)

pm = rp.PilotManager(session=s)

pd = rp.ComputePilotDescription()
pd.resource = "futuregrid.alamo"
pd.cores = 16

p1 = pm.submit_pilots(pd) # create first pilot with 16 cores
p2 = pm.submit_pilots(pd) # create second pilot with 16 cores

# Create a workload of 128 '/bin/sleep' compute units
compute_units = []
for unit_count in range(0, 128):
    cu = rp.ComputeUnitDescription()
    cu.executable = "/bin/sleep"
    cu.arguments = ['60']
    compute_units.append(cu)

# Combine the two pilots, the workload and a scheduler via
# a UnitManager.
um = rp.UnitManager(session=session, scheduler=rp.SCHEDULER_ROUND_ROBIN)
um.add_pilot(p1)
um.submit_units(compute_units)

The unit manager can issue notification on unit state changes. Whenever state notification arrives, any callback registered for that notification is fired.

NOTE: State notifications can arrive out of order wrt the unit state model!

__init__(session, cfg='default', scheduler=None)[source]

Creates a new UnitManager and attaches it to the session.

Arguments:
  • session [radical.pilot.Session]: The session instance to use.
  • cfg (dict or string): The configuration or name of configuration to use.
  • scheduler (string): The name of the scheduler plug-in to use.
Returns:
add_pilots(pilots)[source]

Associates one or more pilots with the unit manager.

Arguments:

as_dict()[source]

Returns a dictionary representation of the UnitManager object.

cancel_units(uids=None)[source]

Cancel one or more radical.pilot.ComputeUnits.

Note that cancellation of units is immediate, i.e. their state is immediately set to CANCELED, even if some RP component may still operate on the units. Specifically, other state transitions, including other final states (DONE, FAILED) can occur after cancellation. This is a side effect of an optimization: we consider this acceptable tradeoff in the sense “Oh, that unit was DONE at point of cancellation – ok, we can use the results, sure!”.

If that behavior is not wanted, set the environment variable:

export RADICAL_PILOT_STRICT_CANCEL=True
Arguments:
  • uids [string or list of strings]: The IDs of the compute units objects to cancel.
close()[source]

Shut down the UnitManager and all its components.

get_pilots()[source]

Get the pilots instances currently associated with the unit manager.

Returns:
get_units(uids=None)[source]

Returns one or more compute units identified by their IDs.

Arguments:
  • uids [string or list of strings]: The IDs of the compute unit objects to return.
Returns:
list_pilots()[source]

Lists the UIDs of the pilots currently associated with the unit manager.

Returns:
list_units()[source]

Returns the UIDs of the radical.pilot.ComputeUnit managed by this unit manager.

Returns:
register_callback(cb, cb_data=None, metric=None, uid=None)[source]

Registers a new callback function with the UnitManager. Manager-level callbacks get called if the specified metric changes. The default metric UNIT_STATE fires the callback if any of the ComputeUnits managed by the PilotManager change their state.

All callback functions need to have the same signature:

def cb(obj, value)

where object is a handle to the object that triggered the callback, value is the metric, and data is the data provided on callback registration.. In the example of UNIT_STATE above, the object would be the unit in question, and the value would be the new state of the unit.

If ‘cb_data’ is given, then the ‘cb’ signature changes to

def cb(obj, state, cb_data)

and ‘cb_data’ are passed unchanged.

If ‘uid’ is given, the callback will invoked only for the specified unit.

Available metrics are:

  • UNIT_STATE: fires when the state of any of the units which are managed by this unit manager instance is changing. It communicates the unit object instance and the units new state.
  • WAIT_QUEUE_SIZE: fires when the number of unscheduled units (i.e. of units which have not been assigned to a pilot for execution) changes.
remove_pilots(pilot_ids, drain=False)[source]

Disassociates one or more pilots from the unit manager.

After a pilot has been removed from a unit manager, it won’t process any of the unit manager’s units anymore. Calling remove_pilots doesn’t stop the pilot itself.

Arguments:

  • drain [boolean]: Drain determines what happens to the units which are managed by the removed pilot(s). If True, all units currently assigned to the pilot are allowed to finish execution. If False (the default), then non-final units will be canceled.
scheduler

Returns the scheduler name.

submit_units(descriptions)[source]

Submits on or more radical.pilot.ComputeUnit instances to the unit manager.

Arguments:
Returns:
uid

Returns the unique id.

wait_units(uids=None, state=None, timeout=None)[source]

Returns when one or more radical.pilot.ComputeUnits reach a specific state.

If uids is None, wait_units returns when all ComputeUnits reach the state defined in state. This may include units which have previously terminated or waited upon.

Example:

# TODO -- add example

Arguments:

  • uids [string or list of strings] If uids is set, only the ComputeUnits with the specified uids are considered. If uids is None (default), all ComputeUnits are considered.

  • state [string] The state that ComputeUnits have to reach in order for the call to return.

    By default wait_units waits for the ComputeUnits to reach a terminal state, which can be one of the following:

    • radical.pilot.rps.DONE
    • radical.pilot.rps.FAILED
    • radical.pilot.rps.CANCELED
  • timeout [float] Timeout in seconds before the call returns regardless of Pilot state changes. The default value None waits forever.

5.3.2. ComputeUnitDescription

class radical.pilot.ComputeUnitDescription(from_dict=None)[source]

A ComputeUnitDescription object describes the requirements and properties of a radical.pilot.ComputeUnit and is passed as a parameter to radical.pilot.UnitManager.submit_units() to instantiate and run a new unit.

Note

A ComputeUnitDescription MUST define at least an executable or kernel – all other elements are optional.

executable

The executable to launch (string). The executable is expected to be either available via $PATH on the target resource, or to be an absolute path.

default: None

cpu_processes
number of application processes to start on CPU cores

default: 0

cpu_threads
number of threads each process will start on CPU cores

default: 1

cpu_process_type
process type, determines startup method (POSIX, MPI)

default: POSIX

cpu_thread_type
thread type, influences startup and environment (POSIX, OpenMP)

default: POSIX

gpu_processes
number of application processes to start on GPU cores

default: 0

gpu_threads
number of threads each process will start on GPU cores

default: 1

gpu_process_type
process type, determines startup method (POSIX, MPI)

default: POSIX

gpu_thread_type
thread type, influences startup and environment (POSIX, OpenMP, CUDA)

default: POSIX

lfs(local file storage)
amount of data (MB) required on the local file system of the node

default: 0

name

A descriptive name for the compute unit (string). This attribute can be used to map individual units back to application level workloads.

default: None

arguments

The command line arguments for the given executable (list of strings).

default: []

environment

Environment variables to set in the environment before execution (dict).

default: {}

sandbox

(Attribute) This specifies the working directory of the unit. That directory MUST be relative to the pilot sandbox. It will be created if it does not exist. By default, the sandbox has the name of the unit’s uid.

stdout

The name of the file to store stdout in (string).

default: STDOUT

stderr

The name of the file to store stderr in (string).

default: STDERR

input_staging

The files that need to be staged before execution (list of staging directives, see below).

default: {}

output_staging

The files that need to be staged after execution (list of staging directives, see below).

default: {}

pre_exec

Actions (shell commands) to perform before this task starts (list of strings). Note that the set of shell commands given here are expected to load environments, check for work directories and data, etc. They are not expected to consume any significant amount of CPU time or other resources! Deviating from that rule will likely result in reduced overall throughput.

No assumption should be made as to where these commands are executed (although RP attempts to perform them in the unit’s execution environment).

No assumption should be made on the specific shell environment the commands are executed in.

Errors in executing these commands will result in the unit to enter FAILED state, and no execution of the actual workload will be attempted.

default: []

post_exec

Actions (shell commands) to perform after this task finishes (list of strings). The same remarks as on pre_exec apply, inclusive the point on error handling, which again will cause the unit to fail, even if the actual execution was successful.

default: []

kernel

Name of a simulation kernel which expands to description attributes once the unit is scheduled to a pilot and resource. TODO: explain in detail, referencing ENMDTK.

default: None

restartable

If the unit starts to execute on a pilot, but cannot finish because the pilot fails or is canceled, the unit can be restarted.

default: False

tags

Configuration specific tags which influence unit scheduling and execution.

metadata

User defined metadata.

default: None

cleanup

If cleanup (a bool) is set to True, the pilot will delete the entire unit sandbox upon termination. This includes all generated output data in that sandbox. Output staging will be performed before cleanup.

Note that unit sandboxes are also deleted if the pilot’s own cleanup flag is set.

default: False

pilot

If specified as string (pilot uid), the unit is submitted to the pilot with the given ID. If that pilot is not known to the unit manager, an exception is raised.

The Staging Directives are specified using a dict in the following form:

staging_directive = {
‘source’ : None, # see ‘Location’ below ‘target’ : None, # see ‘Location’ below ‘action’ : None, # See ‘Action operators’ below ‘flags’ : None, # See ‘Flags’ below ‘priority’: 0 # Control ordering of actions (unused)

}

source and target locations can be given as strings or ru.URL instances. Strings containing :// are converted into URLs immediately. Otherwise they are considered absolute or relative paths and are then interpreted in the context of the client’s working directory.

RP accepts the following special URL schemas:

  • client:// : relative to the client’s working directory
  • resource://: relative to the RP sandbox on the target resource
  • pilot:// : relative to the pilot sandbox on the target resource
  • unit:// : relative to the unit sandbox on the target resource

In all these cases, the hostname element of the URL is expected to be empty, and the path is always considered relative to the locations specified above (even though URLs usually don’t have a notion of relative paths).

RP accepts the following action operators:

  • rp.TRANSFER: remote file transfer from source URL to target URL.
  • rp.COPY : local file copy, ie. not crossing host boundaries
  • rp.MOVE : local file move
  • rp.LINK : local file symlink
rp.CREATE_PARENTS: create the directory hierarchy for targets on the fly rp.RECURSIVE : if source is a directory, handle it recursively

5.3.3. ComputeUnit

class radical.pilot.ComputeUnit(umgr, descr)[source]

A ComputeUnit represent a ‘task’ that is executed on a ComputePilot. ComputeUnits allow to control and query the state of this task.

Note

A unit cannot be created directly. The factory method rp.UnitManager.submit_units() has to be used instead.

Example:

umgr = rp.UnitManager(session=s)

ud = rp.ComputeUnitDescription()
ud.executable = "/bin/date"

unit = umgr.submit_units(ud)
as_dict()[source]

Returns a Python dictionary representation of the object.

cancel()[source]

Cancel the unit.

description

Returns the description the unit was started with, as a dictionary.

Returns:
  • description (dict)
exit_code

Returns the exit code of the unit, if that is already known, or ‘None’ otherwise.

Returns:
  • exit code (int)
metadata

Returns the metadata field of the unit’s description

name

Returns the unit’s application specified name.

Returns:
  • A name (string).
pilot

Returns the pilot ID of this unit, if that is already known, or ‘None’ otherwise.

Returns:
  • A pilot ID (string)
register_callback(cb, cb_data=None, metric=None)[source]

Registers a callback function that is triggered every time a unit’s state changes.

All callback functions need to have the same signature:

def cb(obj, state)

where object is a handle to the object that triggered the callback and state is the new state of that object. If ‘cb_data’ is given, then the ‘cb’ signature changes to

def cb(obj, state, cb_data)

and ‘cb_data’ are passed unchanged.

session

Returns the unit’s session.

Returns:
state

Returns the current state of the unit.

Returns:
  • state (string enum)
stderr

Returns a snapshot of the executable’s STDERR stream.

If this property is queried before the unit has reached ‘DONE’ or ‘FAILED’ state it will return None.

Returns:
  • stderr (string)
stdout

Returns a snapshot of the executable’s STDOUT stream.

If this property is queried before the unit has reached ‘DONE’ or ‘FAILED’ state it will return None.

Returns:
  • stdout (string)
uid

Returns the unit’s unique identifier.

The uid identifies the unit within a UnitManager.

Returns:
  • A unique identifier (string).
umgr

Returns the unit’s manager.

Returns:
unit_sandbox

Returns the full sandbox URL of this unit, if that is already known, or ‘None’ otherwise.

Returns:
  • A URL (radical.utils.Url).
wait(state=None, timeout=None)[source]

Returns when the unit reaches a specific state or when an optional timeout is reached.

Arguments:

  • state [list of strings] The state(s) that unit has to reach in order for the call to return.

    By default wait waits for the unit to reach a final state, which can be one of the following:

    • rp.states.DONE
    • rp.states.FAILED
    • rp.states.CANCELED
  • timeout [float] Optional timeout in seconds before the call returns regardless whether the unit has reached the desired state or not. The default value None never times out.