3.6. Staging Task Input Data

The vast majority of applications operate on data, and many of those read input data from files. Since RP provides an abstraction above the resource layer, it can run a task on any pilot the application created (see Selecting a Task Scheduler). To ensure that the task finds the data it needs on the resource where it runs, RP provides a mechanism to stage input data automatically.

For each task, the application can specify:

  • source: which data files need to be staged;
  • target: which path the staged files should have in the context of the task execution;
  • action: how the data should be staged.

If the source and target file names are the same and the action is the default rp.TRANSFER, you can specify task input data simply as a list of file names (more complex staging directives are discussed in a later example):

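import radical.pilot as rp

# run `wc -c` on the staged file to count its bytes
cud = rp.TaskDescription()
cud.executable     = '/usr/bin/wc'
cud.arguments      = ['-c', 'input.dat']
# shorthand: stage 'input.dat' under the same name with the default action
cud.input_staging  = ['input.dat']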
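
To make the relation between the shorthand list and the three fields (source, target, action) more concrete, the following is a minimal sketch that runs without RP installed. The helper expand_shorthand is hypothetical and not part of RP, and the TRANSFER string is only a stand-in for rp.TRANSFER; how RP expands the shorthand internally may differ.

```python
# Stand-in for rp.TRANSFER so the sketch runs without RP (assumption).
TRANSFER = 'Transfer'


def expand_shorthand(names, action=TRANSFER):
    """Hypothetical helper: build one explicit directive per file name.

    Source and target are identical, matching the shorthand case
    described in the text.
    """
    return [{'source': name, 'target': name, 'action': action}
            for name in names]


# The list given to input_staging in the example above
directives = expand_shorthand(['input.dat'])
for directive in directives:
    print(directive)
```

Each resulting dictionary names the same three pieces of information the application would otherwise spell out per file.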

05_task_input_data.py contains an example application that uses the code block above. It does not otherwise differ from our earlier examples, except that it creates input.dat on the fly.
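
Creating the input file on the fly could look roughly like the sketch below. It is not taken from 05_task_input_data.py: the helper create_input, the file content, and the use of a temporary directory are assumptions made for illustration. The returned byte count is the number that `wc -c` should report for the staged file.

```python
import os
import tempfile


def create_input(path, content):
    """Hypothetical helper: write `content` to `path`, return its size in bytes."""
    # Binary mode avoids platform-specific newline translation.
    with open(path, 'wb') as fout:
        fout.write(content.encode('utf-8'))
    return os.path.getsize(path)


# A temporary directory keeps the sketch from touching the working directory.
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, 'input.dat')

n_bytes = create_input(input_path, 'hello staging\n')
print(n_bytes)
```

Comparing this number with the task's output is a quick way to check that the staged file arrived intact.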

3.6.1. Running the Example

The output of this example is straightforward, but it confirms that the file staging happened as planned. You will likely notice, though, that the code runs significantly longer than the earlier examples because of the file staging overhead. Sharing Task Input Data discusses how file staging can be optimized for tasks that share the same input data.


3.6.2. What’s Next?

The obvious next step is to handle output data. Staging Task Output Data addresses exactly this and provides more detail on the different modes of data staging; Sharing Task Input Data then introduces RP’s capability to share data between different tasks.