Frontier (OLCF/ORNL)
Platform user guide
General description
Resource manager - SLURM
Launch methods (per platform ID)
ornl.frontier - SRUN
ornl.frontier_flux - FLUX
Configuration per node (9,408 nodes in total)
64 CPU cores, each core has 2 threads (SMT=2)
8 GPUs (AMD MI250X)
512 GiB of memory
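The platform IDs listed above are what an application passes as the pilot resource. The following is a minimal sketch of building a pilot description as a plain dictionary for either launch method. The helper name frontier_pilot_description and the project value ABC123 are hypothetical, and the keys project, nodes and runtime are assumed to match the pilot description attributes of the installed RADICAL-Pilot version.

```python
# Platform IDs and their launch methods, as listed above.
LAUNCH_METHODS = {
    "ornl.frontier": "SRUN",
    "ornl.frontier_flux": "FLUX",
}


def frontier_pilot_description(project, nodes, runtime, use_flux=False):
    """Return a dict describing a pilot on Frontier (hypothetical helper).

    project : allocation/project name (placeholder, supply your own)
    nodes   : number of compute nodes to request (assumed key name)
    runtime : requested walltime in minutes (assumed key name)
    """
    resource = "ornl.frontier_flux" if use_flux else "ornl.frontier"
    return {
        "resource": resource,
        "project": project,
        "nodes": nodes,
        "runtime": runtime,
    }


description = frontier_pilot_description("ABC123", nodes=2, runtime=30)
print(description["resource"], "->", LAUNCH_METHODS[description["resource"]])
```

In an application, such a dictionary would typically be passed to radical.pilot.PilotDescription before the pilot is submitted.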
Note
Frontier uses the --constraint option in SLURM to specify node features (SLURM constraint).
RADICAL-Pilot lets you provide such features in a resource configuration file. For example, to set the NVMe constraint (NVMe Usage), follow these steps:
mkdir -p ~/.radical/pilot/configs
cat > ~/.radical/pilot/configs/resource_ornl.json <<EOF
{
"frontier": {
"system_architecture": {"options": ["nvme"]}
}
}
EOF
Note
RADICAL-Pilot follows the default setting of Frontier SLURM core specialization, which reserves one core from each L3 cache region, leaving 56 allocatable cores out of the available 64.
If you need to change the core specialization to use all 64 cores (i.e., constraining all system processes to core 0), then follow these steps:
mkdir -p ~/.radical/pilot/configs
cat > ~/.radical/pilot/configs/resource_ornl.json <<EOF
{
"frontier": {
"system_architecture" : {"blocked_cores" : []}
}
}
EOF
If you need to change only the SMT level (SMT=1) but keep the default setting (8 cores for system processes), then follow these steps:
mkdir -p ~/.radical/pilot/configs
cat > ~/.radical/pilot/configs/resource_ornl.json <<EOF
{
"frontier": {
"system_architecture" : {"smt" : 1,
"blocked_cores" : [0, 8, 16, 24, 32, 40, 48, 56]}
}
}
EOF
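The core counts in the settings above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes 8 L3 cache regions of 8 cores each, with the first core of each region reserved (which matches the "blocked_cores" list shown above); the function names are hypothetical.

```python
CORES_PER_NODE = 64       # from the node configuration
CORES_PER_L3_REGION = 8   # assumption: 8 L3 regions of 8 cores each


def blocked_cores(cores=CORES_PER_NODE, region=CORES_PER_L3_REGION):
    """Return the first core index of each L3 region."""
    return list(range(0, cores, region))


def allocatable(cores, blocked, smt):
    """Return (usable cores, usable hardware threads) for one node."""
    usable = cores - len(blocked)
    return usable, usable * smt


default = blocked_cores()
print(default)                          # [0, 8, 16, 24, 32, 40, 48, 56]
print(allocatable(64, default, smt=2))  # (56, 112) default setting
print(allocatable(64, default, smt=1))  # (56, 56)  SMT level set to 1
print(allocatable(64, [], smt=2))       # (64, 128) no blocked cores
```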
Note
Changes in the "system_architecture" parameters can be combined.
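As an illustration of combining parameters, the sketch below merges the NVMe constraint with the SMT=1 setting into a single "system_architecture" entry and prints the resulting JSON. The helper combine_system_architecture is hypothetical; the keys and values are the ones used in the examples above. The printed output would be saved as ~/.radical/pilot/configs/resource_ornl.json.

```python
import json


def combine_system_architecture(*fragments):
    """Merge several "system_architecture" fragments into one config dict.

    Later fragments override earlier ones if they set the same key.
    """
    merged = {}
    for fragment in fragments:
        merged.update(fragment)
    return {"frontier": {"system_architecture": merged}}


nvme = {"options": ["nvme"]}
smt1 = {"smt": 1, "blocked_cores": [0, 8, 16, 24, 32, 40, 48, 56]}

config = combine_system_architecture(nvme, smt1)
print(json.dumps(config, indent=4))
```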
Setup execution environment
Python virtual environment
Create a virtual environment with venv:
export PYTHONNOUSERSITE=True
module load cray-python
python3 -m venv ve.rp
source ve.rp/bin/activate
Install RADICAL-Pilot after activating the virtual environment:
pip install radical.pilot
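To confirm that the installation landed in the virtual environment, a quick check can be run with the Python interpreter from ve.rp. This is a minimal sketch using only the standard library; the function names are hypothetical.

```python
import sys
from importlib import metadata


def in_virtualenv():
    """Return True if the interpreter runs inside a venv."""
    return sys.prefix != sys.base_prefix


def installed_version(package="radical.pilot"):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("virtual environment active:", in_virtualenv())
print("radical.pilot version:", installed_version())
```

If the second line prints None, the package was most likely installed with a different interpreter than the one in the activated environment.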
Note
Frontier does not provide virtual environments with conda.
Launching script example
The launching script (e.g., rp_launcher.sh) for a RADICAL-Pilot application sets up and activates the execution environment and then runs the launching command for the application itself.
#!/bin/bash
# - pre run -
module load cray-python
source ve.rp/bin/activate
export RADICAL_PROFILE=TRUE
# for debugging purposes
export RADICAL_LOG_LVL=DEBUG
export RADICAL_REPORT=TRUE
# - run -
python <rp_application>
Execute the launching script as ./rp_launcher.sh or run it in the background:
nohup ./rp_launcher.sh > OUTPUT 2>&1 </dev/null &
# check the status of the script running:
# jobs -l
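The same background launch can be approximated from Python, which may be convenient when the launcher is started by another tool. This is a minimal sketch, not an exact equivalent of nohup: it starts the command in a new session so that closing the terminal does not interrupt it, and redirects its output to a log file. The function name launch_in_background is hypothetical.

```python
import subprocess


def launch_in_background(command, log_path="OUTPUT"):
    """Start `command` detached from the terminal, logging to `log_path`.

    stdin is closed (like </dev/null) and stderr is merged into stdout
    (like 2>&1). Returns the subprocess.Popen handle.
    """
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # new session, detached from the terminal
        )
    # The child keeps its own copy of the log file descriptor.
    return proc


# Example (not executed here):
#   proc = launch_in_background(["./rp_launcher.sh"])
#   proc.poll()  # None while the script is still running
```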
Note
If you find any inaccuracy in this description, please report it to us by opening a ticket.