{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Configuration System\n", "\n", "RADICAL-Pilot (RP) uses a configuration system to set the control and management parameters used to initialize its components and to define the resource entry points of the target platform.\n", "\n", "It includes:\n", "\n", "* [Run description](#Run-description)\n", "    * Resource label for a target platform configuration file;\n", "    * Project allocation name (i.e., account/project) - _specific to HPC platforms_;\n", "    * Job queue name (i.e., queue/partition) - _specific to HPC platforms_;\n", "    * Amount of resources (e.g., `cores`, `gpus`, `memory`) to allocate for the runtime period;\n", "    * Mode to access the target platform (e.g., `local`, `ssh`) - _optional, default is \"local\"_.\n", "* [Target platform description](#Platform-description)\n", "    * Batch system (e.g., `SLURM`, `LSF`, etc.);\n", "    * Provided launch methods (e.g., `SRUN`, `MPIRUN`, etc.);\n", "    * Environment setup (including package manager, working directory, etc.);\n", "    * Entry points: batch system URL, file system URL.\n", "\n", "## Run description\n", "\n", "Each RP application must describe at least one pilot, which is done by instantiating an [rp.PilotDescription](../apidoc.rst#radical.pilot.PilotDescription) object. Among that object's attributes, `pd.resource` is mandatory; it is referred to as a resource label (or platform ID) and corresponds to a target platform configuration file (see the section [Platform description](#Platform-description)). 
Users need to know which ID corresponds to the HPC platform on which they want to run their RP application.\n", "\n", "### Allocation parameters\n", "\n", "Every run should explicitly state the project name (i.e., allocation account), the preferred queue for job submission, and the amount of required resources, unless it runs on _localhost_ without accessing any batch system.\n", "\n", "```python\n", "import radical.pilot as rp\n", "\n", "pd = rp.PilotDescription({\n", "    'resource': 'ornl.frontier',  # platform ID\n", "    'project' : 'XYZ000',         # allocation account\n", "    'queue'   : 'debug',          # optional (default value is in the platform description)\n", "    'cores'   : 32,               # amount of CPU slots\n", "    'gpus'    : 8,                # amount of GPU slots\n", "    'runtime' : 15                # maximum runtime for a pilot (in minutes)\n", "})\n", "```\n", "\n", "### Resource access schema\n", "\n", "The resource access schema (`pd.access_schema`) defines a set of endpoints for job submission and file system access. Access schemas are provided as part of a platform description; if a platform provides more than one, users can select a specific one in [rp.PilotDescription](../apidoc.rst#radical.pilot.PilotDescription). Check which schemas are available for each target platform:\n", "\n", "* Launching an RP application **from the target platform**:\n", "    * `local` (_default_) - run the application from a login node of the machine, from a compute node within an interactive session, or from within a batch script.\n", "* Launching an RP application **from outside the target platform**:\n", "    * `ssh` - use the SSH protocol and an SSH client to access the platform remotely.\n", "    * `gsissh` - use GSI-enabled SSH to access the platform remotely.\n", "\n", "
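The selection rules listed above can be sketched as a small helper. This is a minimal sketch, not part of the RP API: `pick_access_schema` and `pd_attrs` are hypothetical names, and the sketch assumes that the platform description of the chosen resource actually provides the returned schema (check it before relying on `ssh` or `gsissh`).\n",
"\n",
"```python\n",
"# Hypothetical helper (not part of RP): map where the application is\n",
"# launched from to one of the access schemas listed above.\n",
"def pick_access_schema(on_target_platform, use_gsi=False):\n",
"    if on_target_platform:\n",
"        # login node, compute node in an interactive session, or batch script\n",
"        return 'local'\n",
"    # remote launch; assumes the platform description provides this schema\n",
"    return 'gsissh' if use_gsi else 'ssh'\n",
"\n",
"\n",
"# Attributes for a pilot launched from outside the target platform.\n",
"pd_attrs = {\n",
"    'resource'     : 'ornl.frontier',\n",
"    'project'      : 'XYZ000',\n",
"    'cores'        : 32,\n",
"    'runtime'      : 15,\n",
"    'access_schema': pick_access_schema(on_target_platform=False),\n",
"}\n",
"\n",
"# With RP installed, pass the attributes on as in the allocation example:\n",
"# pd = rp.PilotDescription(pd_attrs)\n",
"```\n",
"\n",
"In this sketch, switching between a local and a remote launch changes only `pd_attrs['access_schema']`; the resource label and allocation parameters stay the same.\n",
"\n",
"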