The Object Oriented Data Technology group at NASA's Jet Propulsion Laboratory is developing software for locating data, especially scientific data, stored in various formats on heterogeneous computer systems at different locations. The software is intended to exploit and extend advances in Internet software and in distributed object-oriented software to overcome the technological obstacles to integration of heterogeneous computing environments. The approach taken in this development involves refocusing effort on the development of metadata, which would be used to describe the available data resources and to support interoperability of computing systems. The software would manage a hierarchical conglomerate of data-set-resource definitions that would make it possible for application programs to locate the data that they require, without advance knowledge of which computer data systems and catalogs to search. This software would utilize the Extensible Markup Language (XML) and the Common Object Request Broker Architecture (CORBA) to support interchange of data among heterogeneous sources. CORBA would enable over-the-wire exchange of XML-based profiles that would contain descriptions of data stored in remote computer systems.
This program was written by Daniel Crichton, John Hughes, Sean Kelly, and Jason Hyon of Caltech for NASA's Jet Propulsion Laboratory. For further information, access the Technical Support Package (TSP) free on-line at www.nasatech.com/tsp under the Information Sciences category.
NPO-21045
This Brief includes a Technical Support Package (TSP).

Software for Locating Heterogeneous Data in Different Places (reference NPO-21045) is currently available for download from the TSP library.
Overview
The document discusses the development of a resource location service designed to enhance the accessibility and interoperability of scientific data across various disciplines, particularly in the context of NASA's Jet Propulsion Laboratory (JPL). The service is implemented entirely in Java, utilizing CORBA and XML to create a flexible architecture that can easily integrate new data systems. This architecture is structured around a standard application programming interface (API) that allows for the creation of generic science analysis tools capable of retrieving and correlating data from multiple sources.
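One way to picture the standard API is as a single interface that every participating data system implements, plus a locator that forwards each query to all registered systems and merges the answers. The Java sketch below is only an illustration under stated assumptions, not the JPL code: the names `ResourceProfile`, `ProfileServer`, `ResourceLocator`, and `InMemoryServer` are hypothetical, and plain in-memory calls stand in for the CORBA transport.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical description of one data resource; not the actual JPL profile class. */
class ResourceProfile {
    final String system;  // name of the data system that holds the resource
    final String title;   // human-readable title of the data set

    ResourceProfile(String system, String title) {
        this.system = system;
        this.title = title;
    }
}

/** Hypothetical common interface that every participating data system would implement. */
interface ProfileServer {
    List<ResourceProfile> find(String keyword);
}

/** Forwards one query to every registered server and merges the answers in registration order. */
class ResourceLocator {
    private final List<ProfileServer> servers = new ArrayList<>();

    void register(ProfileServer server) {
        servers.add(server);
    }

    List<ResourceProfile> locate(String keyword) {
        List<ResourceProfile> matches = new ArrayList<>();
        for (ProfileServer server : servers) {
            matches.addAll(server.find(keyword));
        }
        return matches;
    }
}

/** Toy in-memory server standing in for a remote catalog (assumption: no real CORBA call is made). */
class InMemoryServer implements ProfileServer {
    private final String system;
    private final List<String> titles;

    InMemoryServer(String system, List<String> titles) {
        this.system = system;
        this.titles = titles;
    }

    @Override
    public List<ResourceProfile> find(String keyword) {
        // Case-insensitive substring match on the title.
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<ResourceProfile> found = new ArrayList<>();
        for (String title : titles) {
            if (title.toLowerCase(Locale.ROOT).contains(needle)) {
                found.add(new ResourceProfile(system, title));
            }
        }
        return found;
    }
}
```

With this shape, a generic analysis tool calls only `ResourceLocator.locate` and never needs to know which catalogs exist; adding a new data system amounts to registering one more `ProfileServer` implementation.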
The framework employs an n-tier architecture, which separates the user interface, domain logic, and storage layers, facilitating a more organized approach to data management. A key feature of the service is its ability to handle metadata, which describes the various data resources within a distributed environment. The document emphasizes the importance of a common metadata interchange language, identifying XML as an ideal solution due to its expressiveness, simplicity, and wide acceptance as an Electronic Data Interchange (EDI) standard.
The development of the XML Extensible Profile Language (X2PL) is highlighted as a significant step in creating a generic structure for capturing metadata from diverse domains. This allows for improved interoperability among different data systems, which traditionally require users to navigate unique tools and interfaces for each system. The document outlines the challenges faced in querying heterogeneous data systems and the necessity for a unified approach to metadata development.
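The X2PL vocabulary is not reproduced in this summary, so the Java sketch below uses an invented profile layout purely to illustrate how an XML profile can carry metadata about a remote resource. The element names `profile`, `resourceId`, `title`, and `location`, the sample values, and the class `ProfileReader` are all placeholders rather than part of X2PL; the parsing itself uses the JDK's standard DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ProfileReader {
    /** Illustrative profile; element names and values are placeholders, not the real X2PL vocabulary. */
    static final String SAMPLE =
        "<profile>"
      + "<resourceId>example-dataset-1</resourceId>"
      + "<title>Example image collection</title>"
      + "<location>catalog.example.org</location>"
      + "</profile>";

    /** Returns the child elements of the root as name -> text, in document order. */
    static Map<String, String> read(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace and comment nodes; keep only element children.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            return fields;
        } catch (Exception e) {
            // Wrap checked parser and I/O exceptions so callers see one unchecked failure type.
            throw new IllegalArgumentException("Not a readable profile: " + e.getMessage(), e);
        }
    }
}
```

Because the reader treats every child element generically, the same code would accept profiles from different domains without change, which is the kind of domain-independent structure the paragraph above attributes to X2PL.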
The document also mentions the possibility of making the query service accessible over HTTP, so that HTML pages could send XML queries and render the results directly, further streamlining data retrieval for end users.
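An HTTP front end of this kind would need at least two text transformations: wrapping a user's search term in an XML query, and turning the returned titles into HTML. The Java sketch below shows only those two steps and makes no network call. The class `QueryGateway` and the `query` and `keyword` element names are assumptions made for illustration; the actual query format is not given in this summary.

```java
import java.util.List;

class QueryGateway {
    /** Escapes the characters that are special in XML and HTML text content. */
    static String escape(String text) {
        // The ampersand must be replaced first so later entities are not double-escaped.
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    /** Wraps a keyword in a placeholder XML query document (element names are invented). */
    static String toQueryXml(String keyword) {
        return "<query><keyword>" + escape(keyword) + "</keyword></query>";
    }

    /** Renders matching resource titles as an HTML list for display in a browser. */
    static String toHtml(List<String> titles) {
        StringBuilder html = new StringBuilder("<ul>");
        for (String title : titles) {
            html.append("<li>").append(escape(title)).append("</li>");
        }
        return html.append("</ul>").toString();
    }
}
```

Escaping on both the way in and the way out keeps user-supplied search terms and catalog-supplied titles from being interpreted as markup.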
The overall goal of the resource location service is to provide a robust infrastructure that supports data discovery and mining techniques, allowing researchers to uncover new relationships within the data. The framework is adaptable and not limited to planetary science; it has potential applications in other fields such as healthcare, defense, and business.
In summary, the document presents a comprehensive overview of a sophisticated data management solution that addresses the complexities of accessing and integrating scientific data from disparate sources, ultimately aiming to enhance research capabilities across various disciplines.

