{ "cells": [ { "cell_type": "markdown", "id": "67521807", "metadata": {}, "source": [ "# Executing Tasks with RAPTOR\n", "\n", "This notebook introduces you to RAPTOR, a high-throughput RADICAL-Pilot's subsystem that executes **function** tasks and non-MPI executable tasks at scale on [supported HPC platforms](../supported.rst). This tutorial will guide you through the setup and configuration of RAPTOR and through the specification and execution of a variety of task types.\n", "\n", "
\n", "\n", "__Warning:__ We assume you have already worked through our [Getting Started](../getting_started.ipynb) and [Describing Tasks](describing_tasks.ipynb) tutorials.\n", "\n", "
\n", "\n", "
\n", "\n", "__Warning:__ All examples in this notebook are executed locally on your machine. You need to have installed MPI before executing these examples. RADICAL-Pilot and RAPTOR support OpenMPI, MPICH, MVAPICH or any other MPI flavor that provides a standards compliant `mpiexec` command.\n", "\n", "
\n", "\n", "## When Using RAPTOR\n", "\n", "Use RAPTOR when you want to concurrently execute free/serialized Python functions, Python class methods and shell commands. For example, you want to concurrently execute up to 10^5 machine learning functions across thousands of GPUs and CPUs. RAPTOR supports single, multi-process and MPI Python functions.\n", "\n", "You should also use RAPTOR when your application requires non-MPI tasks which execute for less than 5 minutes. You could use RADICAL-Pilot without RAPTOR for that workload but you would incur into high scheduling and launching overheads.\n", "\n", "## What is RAPTOR\n", "\n", "RAPTOR is a subsystem of RP, thus you have to execute RP in order to use RAPTOR. RAPTOR launches a configurable number of masters and workers on the resources you acquired via RP. Once up and running, each RAPTOR's master\n", "will receive task execution requests from RP. In turn, each master will dispatch those requests to the workers which are optimized to execute small, short-running tasks at scale.\n", "\n", "Different from RP's 'normal' task, RAPTOR can execute a variety of task types:\n", "\n", "- executables: similar to RP's native task execution, but without MPI support\n", "- free Python functions\n", "- Python class methods\n", "- serialized Python functions\n", "- plain Python code\n", "- shell commands\n", "\n", "Importantly, all function invocations can make use of MPI by defining multiple ranks. \n", "\n", "RAPTOR has a number of advanced capabilities, such as:\n", "\n", "- new task types can be added by applications\n", "- the RAPTOR Master class can be overloaded by applications\n", "- the RAPTOR Worker class can be overloaded by applications\n", "- Master and Worker layout can be tuned in a variety of ways\n", "- different Worker implementations are available with different capabilities and scaling properties\n", "- workload execution can mix RAPTOR task execution and 'normal' RP task execution\n", " \n", "Those topics will not be covered in this basic tutorial.\n" ] }, { "cell_type": "markdown", "id": "5958a7c2", "metadata": {}, "source": [ "\n", "## Prepare a RP pilot to host RAPTOR\n", "\n", "We will launch a pilot with sufficient resources to run both the raptor master (using 1 core) and two worker instances (using 8 cores each):\n" ] }, { "cell_type": "code", "execution_count": null, "id": "55e8cb08-6ca0-453e-a7c9-e2a7de331839", "metadata": {}, "outputs": [], "source": [ "%env RADICAL_REPORT=TRUE\n", "# do not use animated output in notebooks\n", "%env RADICAL_REPORT_ANIME=FALSE" ] }, { "cell_type": "code", "execution_count": null, "id": "c8b8387d", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "import radical.pilot as rp\n", "import radical.utils as ru\n", "\n", "# determine the path of the currently active ve to simplify some examples below\n", "ve_path = os.path.dirname(os.path.dirname(ru.which('python3')))\n", "\n", "# create session and managers\n", "session = rp.Session()\n", "pmgr = rp.PilotManager(session)\n", "tmgr = rp.TaskManager(session)\n", "\n", "# submit a pilot\n", "pilot = pmgr.submit_pilots(rp.PilotDescription({'resource' : 'local.localhost',\n", " 'cores' : 17,\n", " 'runtime' : 60,\n", " 'exit_on_error': False}))\n", "\n", "# add the pilot to the task manager and wait for the pilot to become active\n", "tmgr.add_pilots(pilot)\n", "pilot.wait(rp.PMGR_ACTIVE)" ] }, { "cell_type": "markdown", "id": "50b5bb36", "metadata": {}, "source": [ "\n", "We now have an active pilot with sufficient resource and can start executing the RAPTOR master and worker instances. Both master and worker need to run in an environment which has `radical.pilot` installed, so we place it in the pilot agent environment `rp` (the [rp.TaskDescription.named_env](../apidoc.rst#radical.pilot.TaskDescription.named_env) attribute is covered by the tutorial [Describing Tasks](describing_tasks.ipynb)):\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b1c41c91", "metadata": {}, "outputs": [], "source": [ "raptor_descr = {'mode' : rp.RAPTOR_MASTER,\n", " 'named_env': 'rp'}\n", "worker_descr = {'mode' : rp.RAPTOR_WORKER,\n", " 'named_env': 'rp'}\n", "\n", "raptor = pilot.submit_raptors([rp.TaskDescription(raptor_descr)])[0]\n", "raptor.submit_workers([rp.TaskDescription(worker_descr), \n", " rp.TaskDescription(worker_descr)])" ] }, { "cell_type": "markdown", "id": "1ce411cb", "metadata": {}, "source": [ "### Task execution\n", "\n", "At this point we have the pilot set up and running, we started the master task, and the master will upon initialization start the worker tasks: the RAPTOR overlay is now ready to execute a Python function. Note that a function must be decorated with the `rp.pythontask` decorator for it to be callable as a raptor function.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "71eb4ff2", "metadata": {}, "outputs": [], "source": [ "# function for raptor to execute\n", "@rp.pythontask\n", "def msg(val: int):\n", " if (val % 2 == 0):\n", " print('Regular message')\n", " else:\n", " print(f'This is a very odd message: {val}')\n", "\n", "# create a minimal function task\n", "tasks = raptor.submit_tasks([\n", " rp.TaskDescription({'mode' : rp.TASK_FUNCTION,\n", " 'function': msg(3)}),\n", " rp.TaskDescription({'mode' : rp.TASK_FUNCTION,\n", " 'function': msg(8)})\n", "])" ] }, { "cell_type": "markdown", "id": "03112275", "metadata": {}, "source": [ "The task will be scheduled for execution on the pilot we created above. We now wait for the task to complete, i.e., to reach one of the final states `DONE`, `CANCELED` or `FAILED`:" ] }, { "cell_type": "code", "execution_count": null, "id": "5f2ea29b", "metadata": {}, "outputs": [], "source": [ "tmgr.wait_tasks([t.uid for t in tasks])\n", "for task in tasks:\n", " print('id: %s [%s]:\\n out: %s\\n ret: %s\\n'\n", " % (task.uid, task.state, task.stdout.strip(), task.return_value))" ] }, { "cell_type": "code", "execution_count": null, "id": "3fdd9fe6", "metadata": {}, "outputs": [], "source": [ "session.close()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }