Supporting Extremely Heterogeneous Computing in HPC, AI, and Data Analytics
MCL aims at abstracting the low-level hardware details of a system, supporting the execution of complex workflows that consist of multiple, independent applications (e.g., a scientific simulation coupled with in-situ analysis, or AI frameworks that analyze the results of a physics simulation), and performing efficient, asynchronous execution of computation tasks. MCL is not meant to be the programming model employed by domain scientists to implement their algorithms, but rather to support several high-level Domain-Specific Languages (DSLs) and programming-model runtimes. Currently, MCL supports OpenMP, OpenACC [5], TACO [6], MPI, and pthreads. Work is in progress to support AI frameworks, such as TensorFlow, and other DSLs for chemistry applications.
An MCL application consists of a sequence of tasks that need to be executed on the available computing resources. The MCL programming
model Application Programming Interface (API) allows users to specify tasks and to express control dependencies among them. Once submitted, tasks are scheduled for
execution on a specific device by the MCL scheduler, according to the scheduling algorithm in use.
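This submit-then-wait pattern can be sketched as C-flavored pseudocode. The function names, flags, and signatures below are illustrative approximations modeled on MCL's task API, not its exact interface:

```c
/* C-flavored pseudocode sketch of the MCL task model.
 * All names and signatures here are assumptions for illustration. */
mcl_init(num_workers, flags);                 /* set up the MCL runtime */

mcl_handle *t = mcl_task_create();            /* describe a new task */
mcl_task_set_kernel(t, "vadd.cl", "vector_add", /*nargs=*/3);
mcl_task_set_arg(t, 0, a, size, MCL_ARG_INPUT  | MCL_ARG_BUFFER);
mcl_task_set_arg(t, 1, b, size, MCL_ARG_INPUT  | MCL_ARG_BUFFER);
mcl_task_set_arg(t, 2, c, size, MCL_ARG_OUTPUT | MCL_ARG_BUFFER);

/* Asynchronous submission: the scheduler picks a device
 * (CPU, GPU, FPGA, ...) according to the policy in use. */
mcl_exec(t, global_size, local_size, MCL_TASK_ANY);

mcl_wait(t);                                  /* block until completion */
mcl_finit();                                  /* tear down the runtime */
```

The key point of the model is that the application only describes tasks and their data; device selection and asynchronous execution are delegated entirely to the scheduler.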
MCL leverages the OpenCL library and API to interface with computing devices and to express computational kernels. Normally, users do not need to directly
write OpenCL kernels, as they are automatically generated by the higher-level DSL compiler (e.g., TACO), though directly writing OpenCL kernels and
implementing an algorithm using the MCL API is certainly possible. OpenCL allows MCL to execute the same computational kernel on different computing
devices, including CPUs, GPUs, and FPGAs, as well as some of the novel AI engines, such as the NVIDIA Deep Learning Accelerator (DLA).
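As a concrete illustration, a hand-written kernel of the kind MCL dispatches might look like the following standard OpenCL C vector addition (an example for exposition, not code from MCL's distribution):

```c
// OpenCL C kernel: element-wise vector addition.
// The same kernel source can be dispatched by MCL to a CPU,
// a GPU, or an FPGA device without modification.
__kernel void vector_add(__global const float *a,
                         __global const float *b,
                         __global float *c)
{
    int i = get_global_id(0);   // one work-item per output element
    c[i] = a[i] + b[i];
}
```

Because OpenCL defers compilation of such kernels to the target device's driver, a single kernel source is portable across all device classes MCL supports.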
MCL has been shown to effectively leverage heterogeneous computing resources [2], scaling up to complex multi-device systems and down to efficient embedded
systems. Code developed on a laptop computer seamlessly scales to powerful multi-GPU workstations without any modification, automatically achieving 5-17x
speedups on an 8-GPU node.