Polaris (ALCF/ANL)
Platform user guide
General description
Resource manager - PBSPRO
Launch methods (per platform ID)
anl.polaris - MPIEXEC
Configuration per node (560 nodes in total)
32 CPU cores, each core has 2 threads (SMT=2)
4 GPUs (NVIDIA A100)
512 GiB of memory
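The per-node figures above determine how many nodes a resource request occupies. The following is a minimal sketch of that arithmetic, not part of RADICAL-Pilot: the helper name nodes_needed is hypothetical, and it assumes that requested cores are counted as physical cores (32 per node) rather than hardware threads.

```python
import math

# Per-node figures taken from the platform description above.
CORES_PER_NODE = 32
GPUS_PER_NODE = 4
TOTAL_NODES = 560


def nodes_needed(cores=0, gpus=0):
    """Return the number of whole nodes that cover the requested cores and GPUs.

    Hypothetical helper: assumes cores are physical cores, not SMT threads.
    """
    by_cores = math.ceil(cores / CORES_PER_NODE)
    by_gpus = math.ceil(gpus / GPUS_PER_NODE)
    nodes = max(by_cores, by_gpus, 1)  # at least one node is always allocated
    if nodes > TOTAL_NODES:
        raise ValueError(f"request needs {nodes} nodes, only {TOTAL_NODES} exist")
    return nodes
```

For example, nodes_needed(cores=64, gpus=4) gives 2, because 64 cores span two nodes even though 4 GPUs would fit on one.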
Note
RADICAL-Pilot can manage the -l option (resource selection qualifier) for
PBSPRO and sets its default values in the corresponding configuration file.
If you need a different setup, follow these steps:
mkdir -p ~/.radical/pilot/configs
cat > ~/.radical/pilot/configs/resource_anl.json <<EOF
{
"polaris": {
"system_architecture": {"options": ["filesystems=grand:home",
"place=scatter"]}
}
}
EOF
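The same user-level configuration file can also be generated from Python, which may be convenient when the options are computed by a script. This is a minimal sketch using only the standard library; the function name write_polaris_config is hypothetical, and the file name and JSON layout simply mirror the shell commands above.

```python
import json
from pathlib import Path


def write_polaris_config(config_dir, options):
    """Write resource_anl.json with the given PBSPRO "-l" options for Polaris.

    Hypothetical helper: reproduces the layout of the shell heredoc above.
    """
    config_dir = Path(config_dir).expanduser()
    config_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`

    config = {"polaris": {"system_architecture": {"options": list(options)}}}

    path = config_dir / "resource_anl.json"
    path.write_text(json.dumps(config, indent=4) + "\n")
    return path
```

A call equivalent to the shell commands would be write_polaris_config("~/.radical/pilot/configs", ["filesystems=grand:home", "place=scatter"]).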
Note
Binding MPI ranks to GPUs:
If you want to control GPU assignment per task, the following code snippet
shows an example of setting CUDA_VISIBLE_DEVICES for each MPI rank on
Polaris:
import radical.pilot as rp
td = rp.TaskDescription()
td.pre_exec.append('export CUDA_VISIBLE_DEVICES=$((3 - $PMI_LOCAL_RANK % 4))')
td.gpu_type = '' # reset GPU type so that RP does not set "CUDA_VISIBLE_DEVICES"
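The shell expression in pre_exec assigns GPUs in reverse order of the node-local rank. The sketch below reproduces that arithmetic in plain Python so the resulting mapping can be inspected without running a job; the function name gpu_for_local_rank is hypothetical and is not part of RADICAL-Pilot.

```python
def gpu_for_local_rank(local_rank, gpus_per_node=4):
    """Mirror the shell expression $((3 - $PMI_LOCAL_RANK % 4)).

    The modulo binds tighter than the subtraction, so the result is
    (gpus_per_node - 1) - (local_rank % gpus_per_node).
    """
    return (gpus_per_node - 1) - local_rank % gpus_per_node


# Mapping for the first eight node-local ranks
mapping = {rank: gpu_for_local_rank(rank) for rank in range(8)}
print(mapping)  # {0: 3, 1: 2, 2: 1, 3: 0, 4: 3, 5: 2, 6: 1, 7: 0}
```

Ranks 0 to 3 receive GPUs 3 to 0, and the pattern repeats for every further group of four ranks on the same node.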
Setup execution environment
Python virtual environment
Create a virtual environment with conda:
module use /soft/modulefiles; module load conda
conda create -y -n ve.rp python=3.9
conda activate ve.rp
# OR clone base environment
# conda activate base
# conda create -y -p $HOME/ve.rp --clone $CONDA_PREFIX
# conda activate $HOME/ve.rp
Install RADICAL-Pilot after activating the corresponding virtual environment:
conda install -c conda-forge radical.pilot
Launching script example
The launching script (e.g., rp_launcher.sh) for a RADICAL-Pilot application
sets up and activates the required execution environment and then runs the
launching command for the application itself.
#!/bin/sh
# - pre run -
module use /soft/modulefiles; module load conda
eval "$(conda shell.posix hook)"
conda activate ve.rp
export RADICAL_PROFILE=TRUE
# for debugging purposes
export RADICAL_LOG_LVL=DEBUG
export RADICAL_REPORT=TRUE
# - run -
python <rp_application>
Execute the launching script as ./rp_launcher.sh, or run it in the background:
nohup ./rp_launcher.sh > OUTPUT 2>&1 </dev/null &
# check the status of the running script:
# jobs -l
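If the launching script is started from another Python program rather than from an interactive shell, a comparable detached start can be sketched with the standard library. This is an illustration under assumptions, not a RADICAL-Pilot facility: the function name launch_detached is hypothetical, and start_new_session is used here as a rough stand-in for nohup (the child gets its own session instead of ignoring the hangup signal).

```python
import subprocess


def launch_detached(cmd, log_path):
    """Start cmd in its own session, with stdout and stderr sent to log_path.

    Hypothetical helper, roughly comparable to:
        nohup <cmd> > <log_path> 2>&1 </dev/null &
    """
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # same role as </dev/null
            stdout=log,                 # same role as > OUTPUT
            stderr=subprocess.STDOUT,   # same role as 2>&1
            start_new_session=True,     # detach from the caller's session
        )
    # The child keeps its own copy of the log descriptor after this point.
    return proc
```

A call such as launch_detached(["./rp_launcher.sh"], "OUTPUT") returns immediately; the returned object offers poll() to check whether the script is still running.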
Monitoring page: https://status.alcf.anl.gov/#/polaris
Note
If you find any inaccuracy in this description, please report it to us by opening a ticket.