Perlmutter (NERSC)

Platform user guide

https://docs.nersc.gov/systems/perlmutter/

General description

  • Resource manager - SLURM

  • Launch methods (per platform ID)

    • nersc.perlmutter* - SRUN

  • Configuration per node (per platform ID)

    • nersc.perlmutter (3,072 nodes)

      • 128 CPU cores, each core has 2 threads (SMT=2)

      • 512 GiB of memory

    • nersc.perlmutter_gpu (1,792 nodes in total)

      • 64 CPU cores, each core has 2 threads (SMT=2)

      • 4 GPUs (NVIDIA A100)

        • 1,536 nodes with 40 GiB of HBM per GPU

        • 256 nodes with 80 GiB of HBM per GPU

      • 256 GiB of memory
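
The per-node figures above determine how many whole nodes a pilot request occupies. The following is a minimal sketch in plain Python, not part of RADICAL-Pilot: NODE_SPECS copies the numbers from the list, while the helper name pilot_size and the layout of the returned dictionary are hypothetical. The keys resource, nodes, cores and gpus are chosen to resemble fields of a pilot description, but that mapping is an assumption and should be checked against the RADICAL-Pilot version in use.

```python
import math

# Per-node resources, copied from the list above (physical cores, not SMT threads).
NODE_SPECS = {
    "nersc.perlmutter":     {"cores": 128, "gpus": 0},
    "nersc.perlmutter_gpu": {"cores": 64,  "gpus": 4},
}


def pilot_size(platform, cores=0, gpus=0):
    """Return the whole-node allocation that covers the requested cores and GPUs."""
    spec = NODE_SPECS[platform]
    if gpus and not spec["gpus"]:
        raise ValueError(f"{platform} has no GPUs")
    by_cores = math.ceil(cores / spec["cores"])
    by_gpus = math.ceil(gpus / spec["gpus"]) if spec["gpus"] else 0
    nodes = max(by_cores, by_gpus, 1)  # always request at least one node
    return {
        "resource": platform,
        "nodes": nodes,
        "cores": nodes * spec["cores"],
        "gpus": nodes * spec["gpus"],
    }


if __name__ == "__main__":
    # 10 GPUs need 3 GPU nodes (4 GPUs each), which also provide 192 cores.
    print(pilot_size("nersc.perlmutter_gpu", cores=100, gpus=10))
```

Rounding up to whole nodes reflects the per-node configuration listed above; whether a given queue allows sharing a node is a separate question that this sketch does not cover.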

Note

Perlmutter uses the SLURM --constraint option to select node features (see SLURM constraint). RADICAL-Pilot lets you provide such features in the corresponding configuration file. For example, Perlmutter lets you request up to 256 GPU nodes that have 80 GiB of GPU-attached memory instead of 40 GiB (see Specify a constraint during resource allocation). To use these nodes, update the configuration as follows:

mkdir -p ~/.radical/pilot/configs
cat > ~/.radical/pilot/configs/resource_nersc.json <<EOF
{
    "perlmutter_gpu": {
        "system_architecture": {"options": ["gpu", "hbm80g"]}
    }
}
EOF
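
A hand-edited JSON file is easy to get wrong, so it can help to parse it once before starting a pilot. The sketch below uses only the Python standard library; the function name check_resource_config is hypothetical, and the check covers only the structure shown above, not everything RADICAL-Pilot may accept in a resource configuration.

```python
import json
from pathlib import Path

# Location used in the shell snippet above.
DEFAULT_CONFIG = Path.home() / ".radical" / "pilot" / "configs" / "resource_nersc.json"


def check_resource_config(path=DEFAULT_CONFIG, platform="perlmutter_gpu"):
    """Parse the user-level config and return the constraint options for `platform`."""
    with open(path) as f:
        config = json.load(f)  # raises json.JSONDecodeError on malformed JSON
    options = config[platform]["system_architecture"]["options"]
    if not isinstance(options, list) or not all(isinstance(o, str) for o in options):
        raise ValueError("'options' must be a list of strings")
    return options


if __name__ == "__main__":
    if DEFAULT_CONFIG.exists():
        print(check_resource_config())
    else:
        print(f"no user config at {DEFAULT_CONFIG}")
```

A missing key raises KeyError, which points at the level of the structure that differs from the example.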

Setup execution environment

Python virtual environment

Using Python at NERSC

Create a virtual environment with venv:

export PYTHONNOUSERSITE=True
module load python
python3 -m venv ve.rp
source ve.rp/bin/activate

OR create a virtual environment with conda:

module load python
conda create -y -n ve.rp python=3.9
conda activate ve.rp

Install RADICAL-Pilot after activating the corresponding virtual environment:

pip install radical.pilot
# OR, for a conda environment
conda install -c conda-forge radical.pilot

Launching script example

The launching script (e.g., rp_launcher.sh) for a RADICAL-Pilot application contains the setup steps that activate the execution environment and the command that launches the application itself.

#!/bin/bash

# - pre run -
module load python
source ve.rp/bin/activate

export RADICAL_PROFILE=TRUE
# for debugging purposes
export RADICAL_LOG_LVL=DEBUG

# - run -
python <rp_application>

Execute the launching script as ./rp_launcher.sh, or run it in the background:

nohup ./rp_launcher.sh > OUTPUT 2>&1 </dev/null &
# check the status of the running script:
#   jobs -l

Note

If you find any inaccuracy in this description, please report it to us by opening a ticket.