Polaris (ALCF/ANL)
Platform user guide
General description
Resource manager -
PBSPRO
Launch methods (per platform ID)
anl.polaris
-MPIEXEC
Configuration per node (560 nodes in total)
32 CPU cores, each core has 2 threads (
SMT=2
)4 GPUs (NVIDIA A100)
512 GiB of memory
Note
RADICAL-Pilot provides a possibility to manage the -l
option (resource
selection qualifier) for PBSPRO
and sets the default values in a
corresponding configuration file. For the cases, when it is needed to have a
different setup, please, follow these steps:
mkdir -p ~/.radical/pilot/configs
cat > ~/.radical/pilot/configs/resource_anl.json <<EOF
{
"polaris": {
"system_architecture": {"options": ["filesystems=grand:home",
"place=scatter"]}
}
}
EOF
Note
Binding MPI ranks to GPUs:
If you want to control GPUs assignment per task, then the following code
snippet provides an example of setting CUDA_VISIBLE_DEVICES
for each MPI
rank on Polaris:
import radical.pilot as rp
td = rp.TaskDescription()
td.pre_exec.append('export CUDA_VISIBLE_DEVICES=$((3 - $PMI_LOCAL_RANK % 4))')
td.gpu_type = '' # reset GPU type, thus RP will not set "CUDA_VISIBLE_DEVICES"
Setup execution environment
Python virtual environment
Create a virtual environment with venv
:
export PYTHONNOUSERSITE=True
python3 -m venv ve.rp
source ve.rp/bin/activate
OR create a virtual environment with conda
:
module load conda; conda activate
conda create -y -n ve.rp python=3.9
conda activate ve.rp
# OR clone base environment
# conda create -y -p $HOME/ve.rp --clone $CONDA_PREFIX
# conda activate $HOME/ve.rp
Install RADICAL-Pilot after activating a corresponding virtual environment:
pip install radical.pilot
# OR in case of conda environment
conda install -c conda-forge radical.pilot
Launching script example
Launching script (e.g., rp_launcher.sh
) for the RADICAL-Pilot application
includes setup processes to activate a certain execution environment and
launching command for the application itself. In this example we use virtual
environment with conda
.
#!/bin/sh
# - pre run -
module load conda
eval "$(conda shell.posix hook)"
conda activate ve.rp
export RADICAL_PROFILE=TRUE
# for debugging purposes
export RADICAL_LOG_LVL=DEBUG
# - run -
python <rp_application>
Execute launching script as ./rp_launcher.sh
or run it in the background:
nohup ./rp_launcher.sh > OUTPUT 2>&1 </dev/null &
# check the status of the script running:
# jobs -l
Monitoring page: https://status.alcf.anl.gov/#/polaris
Note
If you find any inaccuracy in this description, please, report back to us by opening a ticket.