{ "cells": [ { "cell_type": "markdown", "id": "a2f88f94-378f-4bde-bdeb-2f4ce810c908", "metadata": {}, "source": [ "# Executing MPI Tasks with RAPTOR\n", "\n", "This notebook will walk you through setting up and using the RAPTOR subsystem to execute **MPI function** tasks. To execute MPI functions with RAPTOR, we need to specify the type and size of the worker to be deployed by RAPTOR. The primary purpose of using RAPTOR to execute MPI functions is RAPTOR's capabilities to construct and deliver heterogeneous (different ranks) private MPI communicators during the execution time to the function. In the example below, we will execute an MPI function that requires 4 MPI ranks, and for that, we will deploy a single master and worker.\n", "\n", "
\n", "\n", "__Warning:__ We assume you have already worked through our [RAPTOR](raptor.ipynb) tutorial.\n", "\n", "
" ] }, { "cell_type": "markdown", "id": "ed034a4d-bf9b-432a-94bb-2f4e471c39bc", "metadata": {}, "source": [ "## Prepare an RP pilot to host RAPTOR\n", "We will launch a pilot with sufficient resources to run both the RAPTOR master (using 1 core) and a single worker instance (using 10 cores):" ] }, { "cell_type": "code", "execution_count": null, "id": "41ac75bb-7c24-4d8d-beaf-8dcae1cc6982", "metadata": {}, "outputs": [], "source": [ "%env RADICAL_REPORT=TRUE\n", "# do not use animated output in notebooks\n", "%env RADICAL_REPORT_ANIME=FALSE" ] }, { "cell_type": "code", "execution_count": null, "id": "1c1aa520-67ac-4322-b89e-678d1e48d64e", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "import radical.pilot as rp\n", "import radical.utils as ru\n", "\n", "# determine the path of the currently active ve to simplify some examples below\n", "ve_path = os.path.dirname(os.path.dirname(ru.which('python3')))\n", "\n", "# create session and managers\n", "session = rp.Session()\n", "pmgr = rp.PilotManager(session)\n", "tmgr = rp.TaskManager(session)\n", "\n", "# submit a pilot\n", "pilot = pmgr.submit_pilots(rp.PilotDescription({'resource' : 'local.localhost',\n", " 'cores' : 4,\n", " 'runtime' : 30,\n", " 'exit_on_error': False}))\n", "\n", "# add the pilot to the task manager and wait for the pilot to become active\n", "tmgr.add_pilots(pilot)\n", "pilot.wait(rp.PMGR_ACTIVE)" ] }, { "cell_type": "markdown", "id": "6f1b35ed-4cac-4a0e-b530-ad71c3a4850a", "metadata": {}, "source": [ "The pilot is now in an `ACTIVE` state, and the resource is acquired." ] }, { "cell_type": "markdown", "id": "688bdced-6a4e-4cde-a767-77747f1fbdfd", "metadata": {}, "source": [ "## Prepare RAPTOR environment\n", "RAPTOR mainly depends on `mpi4py` to launch its masters and workers. Thus, to use RAPTOR, it is required to have `mpi4py` installed in the used Python/conda environment. RP offers two options: creating a Python/conda environment with `mpi4py` installed in it, or using an existing environment by specifying the path of the environment. In this tutorial, we will instruct RP to create a new Python virtual environment named `raptor_ve`:" ] }, { "cell_type": "code", "execution_count": null, "id": "ce790e04-c4a1-47b5-8d6f-7138447aa5e6", "metadata": {}, "outputs": [], "source": [ "pilot.prepare_env(env_name='raptor_ve',\n", " env_spec={'type' : 'venv',\n", " 'path' : '/tmp/raptor_ve',\n", " 'setup': ['radical.pilot',\n", " 'mpi4py']})\n", "print('raptor_ve is ready')" ] }, { "cell_type": "markdown", "id": "7e6da09c-6bd4-42fc-a27d-0b54914e723f", "metadata": {}, "source": [ "The Python virtual environment is now ready, and the pilot can launch the masters/workers and executes the tasks." ] }, { "cell_type": "markdown", "id": "a840e669-a101-4131-a771-ff6dda933436", "metadata": {}, "source": [ "Create a master and MPI worker by specifiying the `raptor_class: MPIWorker` in the worker description below. This value will instruct RAPTOR to start the MPI worker rather than the `DefaultWorker`. Note that we also specified the number of `ranks` (cores) in the worker description to a number **larger** than the required number of ranks for the designated MPI task function in order for the function to be executed on that worker.\n", "\n", "
\n", "\n", "__Warning:__ The number of master(s) and worker(s) depends on the workload specifications and the use case that you are trying to execute. Sometimes, you might be required to deploy two masters and workers instead of 1 for more efficient load balancing. \n", "\n", "
" ] }, { "cell_type": "code", "execution_count": null, "id": "ff3a35fd-c289-42ce-80da-2d6cc9eb945a", "metadata": {}, "outputs": [], "source": [ "raptor_descr = {'mode' : rp.RAPTOR_MASTER,\n", " 'named_env' : 'raptor_ve'}\n", "worker_descr = {'mode' : rp.RAPTOR_WORKER,\n", " 'ranks' : 2,\n", " 'named_env' : 'raptor_ve',\n", " 'raptor_class': 'MPIWorker'}\n", "\n", "raptor = pilot.submit_raptors([rp.TaskDescription(raptor_descr)])[0]\n", "raptor.submit_workers([rp.TaskDescription(worker_descr),\n", " rp.TaskDescription(worker_descr)])" ] }, { "cell_type": "markdown", "id": "30306b21-0bdb-49d2-8280-52e91bfe351e", "metadata": {}, "source": [ "Define a Python function that requires running in an MPI environment and use the `rp.pythontask` decorator so the function can be serialized by RP. Note that this function takes a `comm` argument, representing the MPI communicator that this function needs." ] }, { "cell_type": "code", "execution_count": null, "id": "7ea76f13-6770-4ae0-a0e4-d6b49cc17120", "metadata": {}, "outputs": [], "source": [ "@rp.pythontask\n", "def func_mpi(comm, msg, sleep=2):\n", " import time\n", " print('hello %d/%d: %s' % (comm.rank, comm.size, msg))\n", " time.sleep(sleep)\n", " return 'func_mpi retval'" ] }, { "cell_type": "markdown", "id": "252c0b5a-b71f-4217-b162-900b4c3021b7", "metadata": {}, "source": [ "Create a corresponding [rp.TaskDescription](../apidoc.rst#radical.pilot.TaskDescription) to `func_mpi` and specify the function object and the number of `ranks` required to run the function within (the number of ranks also represents the size of the `comm` object passed to the function during the execution time)" ] }, { "cell_type": "code", "execution_count": null, "id": "2981e8d6-8c35-49e7-b205-53eb51d798b3", "metadata": {}, "outputs": [], "source": [ "# create a minimal MPI function task\n", "td = rp.TaskDescription({'mode' : rp.TASK_FUNCTION,\n", " 'ranks' : 2,\n", " 'function': func_mpi(None, msg='mpi_task.0')})\n", "mpi_task = raptor.submit_tasks([td])[0]" ] }, { "cell_type": "markdown", "id": "c99d190a-b214-4a2a-98da-a0b8b2f0ac8e", "metadata": {}, "source": [ "Wait for the task status to be reported back." ] }, { "cell_type": "code", "execution_count": null, "id": "a9e67c02-96ec-4b08-a620-3f186e7e9522", "metadata": {}, "outputs": [], "source": [ "tmgr.wait_tasks([mpi_task.uid])\n", "print('id: %s [%s]:\\n out: %s\\n ret: %s\\n'\n", " % (mpi_task.uid, mpi_task.state, mpi_task.stdout, mpi_task.return_value))" ] }, { "cell_type": "code", "execution_count": null, "id": "7419ade7-3bf1-4ab2-8ab1-a671fe3746b9", "metadata": {}, "outputs": [], "source": [ "session.close()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 5 }