This reference architecture defines an object-oriented implementation of domains, arrays, and distributions written in the Chapel programming language. It primarily addresses domains and arrays with regular index sets; low-level implementation details are beyond the scope of this discussion. The theoretical foundations are based on the work "A Semantic Framework for Domains, Arrays, and Distributions in Chapel" by Hans Zima. The architecture defines a complete set of object-oriented operators for distributing the data of arrays over regular arithmetic index sets. What is unique is that these operators allow arbitrary regions of an array to be fragmented and distributed across multiple processors, while a single point of access gives the programmer the illusion that all the elements are collocated on a single processor.

Today's massively parallel High Productivity Computing Systems (HPCS) are characterized by a modular structure, with a large number of processing and memory units connected by a high-speed network. Locality of access and load balancing are primary concerns in these systems, which are typically used for high-performance scientific computation. Data distributions address these concerns by providing a range of methods for spreading large data sets across the components of a system.

Over the past two decades, many languages, systems, tools, and libraries have been developed to support distributions. Because the performance of data-parallel applications is directly influenced by the distribution strategy, users often resort to low-level programming models that allow fine-tuning of the distribution aspects affecting performance but are also tedious and error-prone. This technology presents a reusable design of a data-distribution framework for data-parallel high-performance applications.

Distributions are a means to express locality in systems composed of large numbers of processor and memory components connected by a network. Since distributions have a great effect on the performance of applications, it is important that the distribution strategy be flexible, so that its behavior can change depending on the needs of the application. At the same time, high-productivity concerns require that the user be shielded from error-prone, tedious details such as communication and synchronization.
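
The idea of a single point of access over fragmented data can be illustrated with a minimal sketch. It is written in Python rather than Chapel and is not the operator set defined in the Technical Support Package. The class name BlockDistributedArray, its methods, and the choice of a one-dimensional block distribution are all assumptions made for illustration, and the "processors" are simulated as plain lists inside one process.

```python
class BlockDistributedArray:
    """Hypothetical 1-D array over the regular index set lo..hi (inclusive).

    The elements are split into contiguous blocks, one fragment per
    simulated processor. Callers index with global indices only.
    """

    def __init__(self, lo, hi, num_procs, fill=0):
        if hi < lo or num_procs < 1:
            raise ValueError("need lo <= hi and num_procs >= 1")
        self.lo, self.hi, self.num_procs = lo, hi, num_procs
        n = hi - lo + 1
        # Block size is ceil(n / num_procs); trailing fragments may be
        # shorter than a full block, or empty.
        self.block = -(-n // num_procs)
        self.fragments = [
            [fill] * max(0, min(self.block, n - p * self.block))
            for p in range(num_procs)
        ]

    def locate(self, i):
        """Map a global index to (owning processor, offset in its fragment)."""
        if not self.lo <= i <= self.hi:
            raise IndexError(i)
        return divmod(i - self.lo, self.block)

    def __getitem__(self, i):
        p, off = self.locate(i)
        return self.fragments[p][off]

    def __setitem__(self, i, value):
        p, off = self.locate(i)
        self.fragments[p][off] = value


# Usage: the caller never sees which fragment holds an element.
a = BlockDistributedArray(1, 10, num_procs=4)
for i in range(1, 11):
    a[i] = i * i
```

In a real distributed setting, the lookup in locate would decide whether an access is local or requires communication; here it only selects a list, which is enough to show how the global index space hides the fragmentation.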

This program was written by Mark James of Caltech for NASA's Jet Propulsion Laboratory. For further information, access the Technical Support Package (TSP) free online at www.techbriefs.com/tsp under the Software category.

This software is available for commercial licensing. Please contact Karina Edmonds of the California Institute of Technology at (626) 395-2322. Refer to NPO-42506.



This Brief includes a Technical Support Package (TSP). "Implementing Access to Data Distributed on Many Processors" (reference NPO-42506) is currently available for download from the TSP library.


This article first appeared in the August, 2006 issue of NASA Tech Briefs Magazine (Vol. 30 No. 8).


Overview

The document titled "Implementing Access to Data Distributed on Many Processors" (NPO-42506) is a technical support package from NASA's Jet Propulsion Laboratory that outlines a framework for managing large data sets across multiple processors using an object-oriented approach in the Chapel programming language. It addresses the challenges of data distribution in massively parallel High Productivity Computing Systems (HPCS), which are characterized by modular structures with numerous processing and memory units interconnected by high-speed networks.

The primary focus of the document is on the implementation of domains and arrays with regular index sets, emphasizing the need for efficient data distribution strategies that enhance performance in high-performance scientific computations. The document introduces a complete set of object-oriented operators that facilitate data distributions, allowing for fragmented regions of arrays to be distributed across multiple processors while providing a unified access point. This design gives programmers the illusion that all elements are collocated on a single processor, simplifying the programming model.

The document highlights the importance of locality of access and load balancing in distributed systems, as these factors significantly influence the performance of data-parallel applications. It notes that while many languages, systems, tools, and libraries have been developed over the past two decades to support data distributions, users often resort to low-level programming models for fine-tuning, which can be tedious and error-prone. Therefore, the proposed framework aims to shield users from these complexities, allowing them to focus on application development without getting bogged down by the intricacies of communication and synchronization.
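
The tension between locality and load balancing can be made concrete with a small sketch. The block and cyclic distributions used here are standard textbook schemes and are not named in the brief. The function names, the problem size, and the cost model (element i costs i + 1 units of work, as in a triangular loop nest) are hypothetical and chosen only for illustration.

```python
def block_owner(i, n, p):
    """Owner of element i when n elements are split into p contiguous blocks."""
    block = -(-n // p)  # ceil(n / p)
    return i // block


def cyclic_owner(i, n, p):
    """Owner of element i when elements are dealt out round-robin."""
    return i % p


def work_per_proc(owner, n, p, cost):
    """Total work assigned to each processor under a given owner mapping."""
    totals = [0] * p
    for i in range(n):
        totals[owner(i, n, p)] += cost(i)
    return totals


def local_neighbor_fraction(owner, n, p):
    """Fraction of adjacent pairs (i, i + 1) that live on the same processor."""
    same = sum(owner(i, n, p) == owner(i + 1, n, p) for i in range(n - 1))
    return same / (n - 1)


# Hypothetical workload: 12 elements, 4 processors, rising cost per element.
N, P = 12, 4


def cost(i):
    return i + 1


block_work = work_per_proc(block_owner, N, P, cost)
cyclic_work = work_per_proc(cyclic_owner, N, P, cost)
block_locality = local_neighbor_fraction(block_owner, N, P)
cyclic_locality = local_neighbor_fraction(cyclic_owner, N, P)
```

Under these assumptions, the block mapping keeps most neighboring elements together but concentrates the expensive elements on the last processor, while the cyclic mapping evens out the work at the cost of separating every pair of neighbors. A flexible distribution strategy lets the application choose between such trade-offs without rewriting its array accesses.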

Additionally, the document assumes familiarity with Hans Zima's work, "A Semantic Framework for Domains, Arrays, and Distributions in Chapel," as it builds upon the concepts discussed in that paper. The framework presented is designed to be reusable and flexible, adapting to the varying needs of different applications while ensuring high productivity.

In summary, this technical support package describes a reusable framework for distributing array data across many processors in high-performance computing environments, keeping the programming model simple while leaving the distribution strategy adaptable to the application.