RUNTIME and COMMUNICATOR

Chenhan D. Yu edited this page Jan 8, 2017 · 4 revisions

This page describes the runtime system of HMLP. We will support matrix-based dependency detection so that automatic parallelization on heterogeneous architectures is possible.

Runtime

Definition

Our runtime system is defined on a shared-memory platform (with exceptions for GPUs) that can have p heterogeneous workers. While many primitive calls are created in high-level applications, the runtime system employs dynamic scheduling to exploit the computing resources. Programmers can manually create their own tasks (as extensions of the HMLP task), describing the dependencies with the API. For certain supported primitives, it is also possible to let the runtime figure out the dependencies and the parallelism by itself [SuperMatrix].

Templates

Usage

Communicator

Besides task parallelism, a tree-based communicator is an infrastructure used internally in BLIS to exploit nested-loop-based parallelism. In HMLP we have a simplified version of the communicator for "internal" use (of course, you can do whatever you want with it). Although we did not invent the communicator, the idea clearly originates from the MPI standard. Depending on how the nested loops are parallelized, the root communicator is subdivided into sub-communicators. The grouping is usually associated with the locality of the physical cores. Synchronization can be performed within each communicator to block only a group of threads. For more details, see [Tylor IPDPS'15].
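The splitting of a root communicator can be sketched as below. This is an assumed illustration, not the actual HMLP or BLIS implementation: the `Comm` struct and the `Split`/`GroupId` names are hypothetical. The split mirrors `MPI_Comm_split` with contiguous groups, so adjacent ranks (which usually share a socket or cache) land in the same sub-communicator, matching the locality-aware grouping described above.

```cpp
// Hypothetical sketch (illustrative names, not the actual HMLP/BLIS API):
// a communicator groups threads; Split() divides its members into equal
// contiguous sub-communicators, so neighboring ranks stay together.
struct Comm
{
  int size;  // number of threads in this communicator
  int rank;  // this thread's id within the communicator

  // Split into ngroups sub-communicators of size/ngroups threads each
  // (assumes size is divisible by ngroups).
  Comm Split( int ngroups ) const
  {
    int subsize = size / ngroups;
    return Comm { subsize, rank % subsize };
  }

  // Which sub-communicator (group) this thread lands in after Split().
  int GroupId( int ngroups ) const
  {
    return rank / ( size / ngroups );
  }
};
```

For instance, splitting an 8-thread root communicator into 2 groups gives each thread a sub-communicator of size 4; a barrier on that sub-communicator would then block only those 4 threads rather than all 8.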

Algorithms and Implementation

Limitations