A Data Type for Efficient Representation of Other Data Types
- Tuesday, 29 July 2008
Some obstacles to programming of parallel computers are removed.
A self-organizing, monomorphic data type denoted a sequence has been conceived to address certain concerns (summarized below) that arise in programming parallel computers. [“Sequence” as used here should not be confused with “sequence” as the word is commonly understood or with “sequence” as used elsewhere to denote another, polymorphic data type that is also relevant to computer programming.] A sequence in the present sense can be regarded abstractly as a vector, set, bag, queue, or other construct. A sequence is defined in terms of the behavior of the operators that can be applied to it without any foreknowledge of the underpinnings of its representation or particular implementation.
Heretofore, in programming a parallel computer, it has been necessary for the programmer to state explicitly, at the outset, what parts of the program and the underlying data structures must be represented in parallel form. Not only is this requirement not optimal from the perspective of implementation; it entails an additional requirement that the programmer have intimate understanding of the underlying parallel structure. Often, it is not possible to have such understanding because hardware and software are designed simultaneously. The present sequence data type overcomes both the implementation and parallel-structure obstacles. In so doing, the sequence data type provides unified means by which the programmer can represent a data structure for natural and automatic decomposition to a parallel computing architecture.
Sequences exhibit the behavioral and structural characteristics of vectors, but the underlying representations are automatically synthesized from combinations of programmers’ advice and execution use metrics. Sequences can vary bidirectionally between sparseness and density, making them excellent choices for many kinds of algorithms. The novelty and benefit of this behavior lies in the fact that it can relieve programmers of the details of implementations.
The creation of a sequence enables decoupling of a conceptual representation from an implementation. In essence, a sequence is a fundamental extension of a vector. In the most general case, the length and internal structure of a sequence can be changed during run time, enabling the efficient addition and removal of elements around given positions. Because sequences are not subject to predefined limits in length, they can be used equally to store small and large collections of elements.
Sequences have efficient representations in both time and space for given patterns of use. When the use pattern of a sequence is simple, then the user has the option of causing its basic operations to be coded in line for maximal efficiency.
The underlying representation of a sequence is a hybrid of representations composed of vectors, linked lists, connected blocks, and hash tables. The internal structure of a sequence can automatically change from time to time on the basis of how it is being used. Those portions of a sequence where elements have not been added or removed can be as efficient as vectors. As elements are inserted and removed in a given portion, then different methods are utilized to provide both an access and memory strategy that is optimized for that portion and the use to which it is put.
This work was done by Mark James of Caltech for NASA’s Jet Propulsion Laboratory.
In accordance with Public Law 96-517, the contractor has elected to retain title to this invention. Inquiries concerning rights for its commercial use should be addressed to:
Innovative Technology Assets Management
Mail Stop 202-233
4800 Oak Grove Drive
Pasadena, CA 91109-8099
Refer to NPO-41090, volume and number of this NASA Tech Briefs issue, and the page number.