A framework for high-level specification of data distributions in data-parallel application programs has been conceived. [As used here, "distributions" signifies means to express locality (more specifically, locations of specified pieces of data) in a computing system composed of many processor and memory components connected by a network.] Inasmuch as distributions exert a great effect on the performances of application programs, it is important that a distribution strategy be flexible, so that distributions can be adapted to the requirements of those programs. At the same time, for the sake of productivity in programming and execution, it is desirable that users be shielded from such error-prone, tedious details as those of communication and synchronization.

The Items Named in the Boxes are the elements involved in defining a distribution.

As desired, the present framework enables a user to refine a distribution type and adjust it to optimize the performance of an application program and conceals, from the user, the low-level details of communication and synchronization. The framework provides for a reusable, extensible, data-distribution design, denoted the design pattern, that is independent of a concrete implementation. The design pattern abstracts over coding patterns that have been found to be commonly encountered in both manually and automatically generated distributed parallel programs.

The following description of the present framework is necessarily oversimplified to fit within the space available for this article. Distributions are among the elements of a conceptual data-distribution machinery, some of the other elements being denoted domains, index sets, and data collections (see figure). Associated with each domain is one index set and one distribution. A distribution class interface (where "class" is used in the object-oriented- programming sense) includes operations that enable specification of the mapping of an index to a unit of locality. Thus, "Map(Index)" specifies a unit, while "LocalLayout(Index)" specifies the local address within that unit. The distribution class can be extended to enable specification of commonly used distributions or novel user-defined distributions.

A data collection can be defined over a domain. The term "data collection" in this context signifies, more specifically, an abstraction of mappings from index sets to variables. Since the index set is distributed, the addresses of the variables are also distributed.

This work was done by Mark James, Hans Zima, and Roxana Diaconescua of Caltech for NASA's Jet Propulsion Laboratory.

The software used in this innovation is available for commercial licensing. Please contact Karina Edmonds of the California Institute of Technology at (626) 395-2322. Refer to NPO-42538.

This Brief includes a Technical Support Package (TSP).
Reusable, Extensible High-Level Data- Distribution Concept

(reference NPO-42538) is currently available for download from the TSP library.

Don't have an account? Sign up here.