The NASA Earth Observing System Data and Information System (EOSDIS) is a data-centric system designed to process, archive, and distribute data from NASA’s Earth observation missions, and to provide specialized services to users. The major components of EOSDIS are 12 Distributed Active Archive Centers (DAACs), 14 Science Investigator-led Processing Systems (SIPSs), and the EOS Clearing House (ECHO). The DAACs play an important role within EOSDIS. They are organized by discipline, and each has a User Working Group (UWG) tailored to the DAAC’s missions and objectives.
The DAACs face the challenge of keeping up with the volume of incoming observational artifacts. Each artifact passes through a common workflow: it must be verified, its metadata harvested, and it must then be cataloged, archived, and distributed. The DAACs are also required to export all cataloged metadata to ECHO, and all ingestion and archival metrics to the EOSDIS Metrics System (EMS). A horizontally scaled data system is needed to support the growing volume of Earth observational artifacts while still providing traceability and manageability for data center operations.
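The per-artifact workflow described above (verify, harvest metadata, catalog, archive, distribute) can be pictured as an ordered chain of stages. The following is a minimal sketch under stated assumptions, not HORIZON's implementation: the `Product` and `IngestPipeline` classes, the use of a CRC32 checksum for verification, and the catalog ID and archive path formats are all hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.zip.CRC32;

// Hypothetical product: a payload plus the checksum supplied by the data provider.
final class Product {
    final String name;
    final byte[] payload;
    final long expectedCrc;
    final Map<String, String> metadata = new LinkedHashMap<>();
    final List<String> completedSteps = new ArrayList<>();

    Product(String name, byte[] payload, long expectedCrc) {
        this.name = name;
        this.payload = payload;
        this.expectedCrc = expectedCrc;
    }
}

// Runs the generic workflow steps in insertion order and records each completed step.
final class IngestPipeline {
    private final Map<String, UnaryOperator<Product>> stages = new LinkedHashMap<>();

    IngestPipeline addStage(String step, UnaryOperator<Product> stage) {
        stages.put(step, stage);
        return this;
    }

    Product run(Product product) {
        Product p = product;
        for (Map.Entry<String, UnaryOperator<Product>> e : stages.entrySet()) {
            p = e.getValue().apply(p);        // a stage may throw to stop the workflow
            p.completedSteps.add(e.getKey()); // simple traceability trail
        }
        return p;
    }

    static IngestPipeline defaultPipeline() {
        return new IngestPipeline()
            .addStage("verify", p -> {
                // Assumption: verification means comparing a CRC32 checksum.
                CRC32 crc = new CRC32();
                crc.update(p.payload);
                if (crc.getValue() != p.expectedCrc) {
                    throw new IllegalStateException("checksum mismatch: " + p.name);
                }
                return p;
            })
            .addStage("harvest", p -> {
                p.metadata.put("sizeBytes", Long.toString(p.payload.length));
                return p;
            })
            .addStage("catalog", p -> {
                p.metadata.put("catalogId", "cat-" + p.name); // placeholder ID scheme
                return p;
            })
            .addStage("archive", p -> {
                p.metadata.put("archivePath", "/archive/" + p.name); // placeholder location
                return p;
            })
            .addStage("distribute", p -> p); // stand-in for publishing the product
    }
}
```

In this design the required exports to ECHO and EMS could be added as further stages without touching the others, and each product carries a record of the steps it completed.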
While the types of observational artifacts handled by each DAAC vary, the workflow for ingestion, inventory, and archiving can be generalized. The team set out to develop a new, extensible data management workflow system that can be applied to any project requiring a highly distributed, high-performance, and portable solution for data archival and distribution. HORIZON is a portable data management and workflow system that is extensible to various domain-specific data management needs.
HORIZON is the base framework for version 4 of the Physical Oceanographic Distributed Active Archive Center (PO.DAAC) Data Management and Archive System (DMAS), and for version 0.4 of The Imagery Exchange (TIE) within NASA’s Global Imagery Browse Services (GIBS). HORIZON uses Apache ZooKeeper (ZK) as its job tracker. ZooKeeper serves as a facade between the job assigner (the Job Manager) and the workers (the Ingest Servers). Because the two sides coordinate only through ZooKeeper, HORIZON can scale out its operation when necessary.
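The facade arrangement can be illustrated without a ZooKeeper server. The sketch below is a hypothetical, in-memory stand-in and does not use the ZooKeeper client API: `InMemoryJobTracker` and `IngestServer` are illustrative names. It mimics ZooKeeper's sequential node naming (a 10-digit zero-padded counter) so that jobs are handed out oldest first, and it shows that the Job Manager side only submits jobs while each worker only claims them.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for the job tracker; NOT the ZooKeeper API.
final class InMemoryJobTracker {
    private final AtomicLong sequence = new AtomicLong();
    // Zero-padded names sort lexicographically in numeric order, like sequential znodes.
    private final ConcurrentSkipListMap<String, String> pending = new ConcurrentSkipListMap<>();
    private final Map<String, String> claimedBy = new ConcurrentHashMap<>();

    // Job Manager side: enqueue a job under a sequential node name.
    String submit(String jobPayload) {
        String node = String.format("job-%010d", sequence.incrementAndGet());
        pending.put(node, jobPayload);
        return node;
    }

    // Worker side: atomically remove the oldest pending job, so no two workers get the same one.
    Optional<Map.Entry<String, String>> claim(String workerId) {
        Map.Entry<String, String> job = pending.pollFirstEntry();
        if (job == null) {
            return Optional.empty();
        }
        claimedBy.put(job.getKey(), workerId);
        return Optional.of(job);
    }

    String ownerOf(String node) { return claimedBy.get(node); }

    int pendingCount() { return pending.size(); }
}

// Hypothetical worker: knows only the tracker, never the Job Manager.
final class IngestServer {
    private final String id;
    private final InMemoryJobTracker tracker;

    IngestServer(String id, InMemoryJobTracker tracker) {
        this.id = id;
        this.tracker = tracker;
    }

    // Claims one job and returns its node name, or null when no work is pending.
    String workOnce() {
        return tracker.claim(id).map(Map.Entry::getKey).orElse(null);
    }
}
```

Scaling out in this model means constructing more `IngestServer` instances against the same tracker; the submitting side is unchanged.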
HORIZON is an extensible framework for data management systems. It is packaged with the following components: SIP/AIP, Ingest, Job Manager, Significant Event, Security, Inventory, API, Archive Tools, and Operator Tool. The Open Archival Information System (OAIS) specification defines a framework that spans information capture through information distribution. HORIZON’s implementation of the Submission Information Package (SIP) and Archival Information Package (AIP) defines the standard message exchange protocol between HORIZON components and services.
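As a rough illustration of SIP/AIP message exchange, the sketch below models a SIP describing incoming files and an AIP describing where those files were archived. Every record, field, and the URI layout here is a hypothetical assumption for illustration; none of it is taken from HORIZON's actual message schema.

```java
import java.time.Instant;
import java.util.List;
import java.util.Objects;

// Hypothetical message shapes; field names are illustrative, not HORIZON's schema.
record SipFile(String name, long sizeBytes, String checksum) {}

record Sip(String productType, String productName, List<SipFile> files) {
    Sip {
        Objects.requireNonNull(productType);
        files = List.copyOf(files); // defensive immutable copy
    }
}

record AipFile(SipFile source, String archiveUri) {}

enum AipStatus { ARCHIVED, FAILED }

record Aip(Sip sip, List<AipFile> files, AipStatus status, Instant archivedAt) {}

// Turns an incoming SIP into an AIP that records each file's archive location.
final class Archiver {
    private final String archiveRoot;

    Archiver(String archiveRoot) {
        this.archiveRoot = archiveRoot;
    }

    Aip archive(Sip sip, Instant now) {
        // Assumed layout: <root>/<productType>/<fileName>
        List<AipFile> archived = sip.files().stream()
            .map(f -> new AipFile(f, archiveRoot + "/" + sip.productType() + "/" + f.name()))
            .toList();
        return new Aip(sip, archived, AipStatus.ARCHIVED, now);
    }
}
```

Keeping the original SIP inside the AIP is one way to preserve the link between what was submitted and what was stored; a real implementation would also need to handle failures and serialization.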
HORIZON is a highly portable framework developed in Java. It can be scaled down to run all services and components on a single machine, or scaled up to run on hundreds of machines. It is the only data management system in EOSDIS with such flexibility and performance. It provides the operator with a single interface showing what is running through the system at any given time, and offers traceability and dynamic updates of job states. HORIZON offers a complete end-to-end solution, from data ingestion to archival. It provides message notification and tracking, as well as self-monitoring and tuning.
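Job-state traceability of the kind described above can be sketched as a small state machine that rejects invalid transitions, keeps a timestamped history for each job, and can summarize how many jobs are in each state for an operator view. The state names, allowed transitions, and the `TrackedJob` class are hypothetical assumptions, not HORIZON's actual job model.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical job states; HORIZON's real states may differ.
enum JobState {
    PENDING, ASSIGNED, INGESTING, ARCHIVED, FAILED;

    // Allowed next states; terminal states allow none.
    Set<JobState> next() {
        return switch (this) {
            case PENDING -> EnumSet.of(ASSIGNED, FAILED);
            case ASSIGNED -> EnumSet.of(INGESTING, PENDING, FAILED); // back to PENDING = reassign
            case INGESTING -> EnumSet.of(ARCHIVED, FAILED);
            case ARCHIVED, FAILED -> EnumSet.noneOf(JobState.class);
        };
    }
}

final class TrackedJob {
    record Transition(JobState from, JobState to, Instant at) {}

    final String id;
    private JobState state = JobState.PENDING;
    private final List<Transition> history = new ArrayList<>();

    TrackedJob(String id) { this.id = id; }

    JobState state() { return state; }

    List<Transition> history() { return List.copyOf(history); }

    // Applies a state change only if it is allowed, recording it for traceability.
    void moveTo(JobState target, Instant at) {
        if (!state.next().contains(target)) {
            throw new IllegalStateException(id + ": " + state + " -> " + target + " not allowed");
        }
        history.add(new Transition(state, target, at));
        state = target;
    }

    // Operator-style summary: number of jobs currently in each state.
    static Map<JobState, Integer> summarize(Collection<TrackedJob> jobs) {
        Map<JobState, Integer> counts = new EnumMap<>(JobState.class);
        for (TrackedJob j : jobs) {
            counts.merge(j.state(), 1, Integer::sum);
        }
        return counts;
    }
}
```

Rejecting illegal transitions at the point of update keeps the recorded history consistent, so the summary an operator sees reflects only valid job progressions.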
This work was done by Thomas Huang, Nga T. Quach, Michael E. Gangl, Christian Alarcon, and Cristina M. De Cesare of Caltech for NASA’s Jet Propulsion Laboratory. This software is available for license through the Jet Propulsion Laboratory, and you may request a license at: https://download.jpl.nasa.gov/ops/request/request_introduction.cfm. NPO-49540