Frontier (OLCF/ORNL)

Platform user guide

https://docs.olcf.ornl.gov/systems/frontier_user_guide.html

General description

  • Resource manager - SLURM

  • Launch methods (per platform ID)

    • ornl.frontier - SRUN

    • ornl.frontier_flux - FLUX

  • Configuration per node (9,408 nodes in total)

    • 64 CPU cores, each core has 2 threads (SMT=2)

    • 8 GPUs (4 AMD MI250X, each with 2 Graphics Compute Dies presented as separate GPUs)

    • 512 GiB of memory
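
As a quick sanity check when sizing a pilot, the per-node figures above can be multiplied out to whole-system totals. The sketch below only restates the numbers from the list; the helper name system_totals is hypothetical and is not part of RADICAL-Pilot.

```python
# Per-node figures taken from the list above.
NODES = 9408
CORES_PER_NODE = 64
SMT = 2
GPUS_PER_NODE = 8
MEM_GIB_PER_NODE = 512


def system_totals(nodes=NODES):
    """Return aggregate resources for the given number of nodes."""
    return {
        "cores": nodes * CORES_PER_NODE,
        "hw_threads": nodes * CORES_PER_NODE * SMT,
        "gpus": nodes * GPUS_PER_NODE,
        "mem_gib": nodes * MEM_GIB_PER_NODE,
    }


if __name__ == "__main__":
    for name, value in system_totals().items():
        print(f"{name}: {value:,}")
```

Calling system_totals(nodes=2) gives the resources of a two-node pilot before any cores are set aside for system processes.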

Note

Frontier uses the --constraint option in SLURM to specify node features (SLURM constraint). RADICAL-Pilot lets you provide such features in a corresponding configuration file. For example, to set the NVMe constraint (NVMe Usage), run the following commands:

mkdir -p ~/.radical/pilot/configs
cat > ~/.radical/pilot/configs/resource_ornl.json <<EOF
{
    "frontier": {
        "system_architecture": {"options": ["nvme"]}
    }
}
EOF
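
To make the link between the "options" list and SLURM more concrete, the sketch below builds the kind of --constraint argument that such a list corresponds to. It is an illustration only: the helper name slurm_constraint_flag is hypothetical, and this guide does not show how RADICAL-Pilot itself translates "options" into SLURM arguments. The only SLURM behavior assumed is that features joined with "&" are combined with a logical AND.

```python
def slurm_constraint_flag(options):
    """Build a SLURM --constraint argument from a list of node features.

    Features are joined with '&', which SLURM interprets as logical AND.
    Returns an empty string when no features are requested.
    """
    features = [opt.strip() for opt in options if opt.strip()]
    if not features:
        return ""
    return "--constraint=" + "&".join(features)


if __name__ == "__main__":
    # The "options" value from the configuration example above.
    print(slurm_constraint_flag(["nvme"]))
```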

Note

RADICAL-Pilot follows the default setting of Frontier SLURM core specialization, which reserves one core from each L3 cache region, leaving 56 allocatable cores out of the available 64.
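
The arithmetic behind "56 out of 64" can be sketched as follows. The code assumes that each L3 cache region covers 8 consecutive core IDs and that the first core of each region is the reserved one; this matches the "blocked_cores" list used later in this guide, but the helper names are hypothetical and not part of RADICAL-Pilot.

```python
TOTAL_CORES = 64   # physical cores per node (from this guide)
CORES_PER_L3 = 8   # assumption: 8 consecutive core IDs share one L3 region


def default_blocked_cores(total=TOTAL_CORES, per_l3=CORES_PER_L3):
    """Return the first core ID of every L3 region."""
    return list(range(0, total, per_l3))


def allocatable_cores(blocked, total=TOTAL_CORES):
    """Return the core IDs that remain available to user tasks."""
    blocked_set = set(blocked)
    return [core for core in range(total) if core not in blocked_set]


if __name__ == "__main__":
    blocked = default_blocked_cores()
    print("blocked cores:    ", blocked)
    print("allocatable cores:", len(allocatable_cores(blocked)))
```

With an empty blocked list, allocatable_cores([]) returns all 64 core IDs.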

If you need to change the core specialization to use all 64 cores (i.e., constraining all system processes to core 0), then follow these steps:

mkdir -p ~/.radical/pilot/configs
cat > ~/.radical/pilot/configs/resource_ornl.json <<EOF
{
   "frontier": {
      "system_architecture" : {"blocked_cores" : []}
   }
}
EOF

If you need to change only the SMT level (to 1) while keeping the default core specialization (8 cores reserved for system processes), then follow these steps:

mkdir -p ~/.radical/pilot/configs
cat > ~/.radical/pilot/configs/resource_ornl.json <<EOF
{
   "frontier": {
      "system_architecture" : {"smt"           : 1,
                               "blocked_cores" : [0, 8, 16, 24, 32, 40, 48, 56]}
   }
}
EOF

Note

Changes in the "system_architecture" parameters can be combined.
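
Each of the shell snippets above overwrites the same resource_ornl.json file, so combined settings need to go into a single file. The sketch below builds one such configuration with the three keys shown in this guide ("options", "smt", "blocked_cores") and writes it as JSON. The function names are hypothetical, and the particular combination of values is only an example.

```python
import json
from pathlib import Path


def combined_frontier_config():
    """Return one config dict that merges the settings shown in this guide."""
    return {
        "frontier": {
            "system_architecture": {
                "options": ["nvme"],
                "smt": 1,
                "blocked_cores": [0, 8, 16, 24, 32, 40, 48, 56],
            }
        }
    }


def write_config(path, config):
    """Write config as JSON to path, creating parent directories if needed."""
    path = Path(path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4) + "\n")
    return path


if __name__ == "__main__":
    # Print only; call write_config() explicitly to create the file.
    print(json.dumps(combined_frontier_config(), indent=4))
```

To install it, call write_config("~/.radical/pilot/configs/resource_ornl.json", combined_frontier_config()); like the shell snippets, this replaces any existing file at that path.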

Setup execution environment

Python virtual environment

Create a virtual environment with venv:

export PYTHONNOUSERSITE=True
module load cray-python
python3 -m venv ve.rp
source ve.rp/bin/activate

Install RADICAL-Pilot after activating the virtual environment:

pip install radical.pilot
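
A short check like the one below can confirm that the interpreter in use really is the one from the virtual environment and that the package is importable. It is a minimal sketch using only the Python standard library; the helper names are hypothetical.

```python
import importlib.util
import os
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-created environment."""
    return sys.prefix != sys.base_prefix


def module_available(name):
    """True when the module `name` can be found by this interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "radical") is missing.
        return False


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("PYTHONNOUSERSITE set:      ", bool(os.environ.get("PYTHONNOUSERSITE")))
    print("radical.pilot importable:  ", module_available("radical.pilot"))
```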

Note

Frontier does not provide virtual environments with conda.

Launching script example

A launching script (e.g., rp_launcher.sh) for a RADICAL-Pilot application contains the setup steps that activate the execution environment, followed by the command that launches the application itself.

#!/bin/bash

# - pre run -
module load cray-python
source ve.rp/bin/activate

export RADICAL_PROFILE=TRUE
# for debugging purposes
export RADICAL_LOG_LVL=DEBUG
export RADICAL_REPORT=TRUE

# - run -
python <rp_application>

Make the launching script executable (chmod +x rp_launcher.sh), then execute it as ./rp_launcher.sh or run it in the background:

nohup ./rp_launcher.sh > OUTPUT 2>&1 </dev/null &
# check the status of the script running:
#   jobs -l

Note

If you find any inaccuracy in this description, please report it to us by opening a ticket.