Perlmutter (NERSC)
Platform user guide
General description
Resource manager: SLURM

Launch methods (per platform ID):
- nersc.perlmutter*: SRUN

Configuration per node (per platform ID):
- nersc.perlmutter (3,072 nodes):
  - 128 CPU cores, each core has 2 threads (SMT=2)
  - 512 GiB of memory
- nersc.perlmutter_gpu (1,792 nodes in total):
  - 64 CPU cores, each core has 2 threads (SMT=2)
  - 4 GPUs (NVIDIA A100):
    - 1,536 nodes with 40 GiB of HBM per GPU
    - 256 nodes with 80 GiB of HBM per GPU
  - 256 GiB of memory
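As a quick sanity check of the figures above, the derived per-partition totals (a simple arithmetic sketch):

```python
# Derived figures from the node specifications above.
cpu_cores, smt = 128, 2
hw_threads_per_cpu_node = cpu_cores * smt        # hardware threads per CPU node

gpu_nodes_40g, gpu_nodes_80g = 1_536, 256
total_gpu_nodes = gpu_nodes_40g + gpu_nodes_80g  # matches the 1,792 total above

print(hw_threads_per_cpu_node, total_gpu_nodes)
```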
Note
Perlmutter uses the --constraint option in SLURM to specify node features
(SLURM constraints). RADICAL-Pilot allows such features to be provided within
a corresponding configuration file. For example, Perlmutter allows requesting
up to 256 GPU nodes, which have 80 GiB of GPU-attached memory instead of 40 GiB
(Specify a constraint during resource allocation), thus the corresponding
configuration should be updated as follows:
mkdir -p ~/.radical/pilot/configs
cat > ~/.radical/pilot/configs/resource_nersc.json <<EOF
{
"perlmutter_gpu": {
"system_architecture": {"options": ["gpu", "hbm80g"]}
}
}
EOF
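The same configuration file can also be written programmatically, e.g., from a setup script. A minimal Python sketch (path and file name follow the convention shown above):

```python
import json
import os

# Write the user-level resource configuration for Perlmutter's GPU partition
# (same content as the shell snippet above).
cfg = {
    'perlmutter_gpu': {
        'system_architecture': {'options': ['gpu', 'hbm80g']}
    }
}

cfg_dir = os.path.expanduser('~/.radical/pilot/configs')
os.makedirs(cfg_dir, exist_ok=True)

with open(os.path.join(cfg_dir, 'resource_nersc.json'), 'w') as fout:
    json.dump(cfg, fout, indent=4)
```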
Setup execution environment
Python virtual environment
Create a virtual environment with venv:
export PYTHONNOUSERSITE=True
module load python
python3 -m venv ve.rp
source ve.rp/bin/activate
OR create a virtual environment with conda:
module load python
conda create -y -n ve.rp python=3.9
conda activate ve.rp
Install RADICAL-Pilot after activating the corresponding virtual environment:
pip install radical.pilot
# OR in case of conda environment
conda install -c conda-forge radical.pilot
Launching script example
A launching script (e.g., rp_launcher.sh) for a RADICAL-Pilot application
includes the setup steps that activate the execution environment and the
command that launches the application itself.
#!/bin/sh
# - pre run -
module load python
source ve.rp/bin/activate
export RADICAL_PROFILE=TRUE
# for debugging purposes
export RADICAL_LOG_LVL=DEBUG
# - run -
python <rp_application>
Execute the launching script as ./rp_launcher.sh, or run it in the background:
nohup ./rp_launcher.sh > OUTPUT 2>&1 </dev/null &
# check the status of the background job:
# jobs -l
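For context, the <rp_application> above is a regular Python script that uses the radical.pilot API. Its exact contents are application-specific; purely as an illustration, the pilot and task descriptions such a script passes to rp.PilotDescription and rp.TaskDescription might look like the following (all field values, including the project name, are hypothetical placeholders, not recommendations):

```python
# Illustrative description dicts for a RADICAL-Pilot application on Perlmutter.
# Adjust project, queue, node count, and runtime for your NERSC allocation.
pilot_description = {
    'resource': 'nersc.perlmutter_gpu',  # platform ID from this guide
    'project' : 'mXXXX',                 # placeholder NERSC project/account
    'queue'   : 'regular',
    'runtime' : 30,                      # minutes
    'nodes'   : 2,
}

task_description = {
    'executable'   : '/bin/date',
    'ranks'        : 1,
    'gpus_per_rank': 1,
}

print(pilot_description['resource'])
```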
Note
If you find any inaccuracy in this description, please report it back to us by opening a ticket.