The NASA Physical Oceanography Distributed Active Archive Center (PO.DAAC) is NASA’s designated data center for information relevant to the physical state of the ocean. Its core data management and workflow system, the Data Management and Archive System (DMAS), processes hundreds of thousands of data products each day, around the clock. Its inventory holds over 800 datasets, several million granules, and millions of files. PO.DAAC needs a solution that helps users quickly identify relevant oceanographic data artifacts. It also needs to export metadata according to the ISO-19115, FGDC, and GCMD specifications. Building such a solution directly on its Oracle database raises four issues. First, it is difficult to maintain, since the SQL must be updated whenever a schema changes or new search criteria are needed. Second, multi-table joins perform poorly. Third, additional indexes can improve query performance, but they slow down updates. Fourth, exposing the operational database as the direct backend of a publicly accessible service layer would leave Oracle vulnerable to Denial of Service (DoS) attacks, which could halt the already very busy DMAS operational environment.
The Extensible Data Gateway Environment (EDGE) solves all four issues by leveraging indexed search technology and dynamically configurable response templates. EDGE uses a high-performance indexed search solution to index all PO.DAAC inventory data. Rather than requiring users to learn a new search syntax, EDGE provides a Web service interface that implements the OpenSearch specification. It uses a template engine to dynamically generate metadata responses that conform to the ISO-19115, FGDC, and GCMD specifications.
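The template-driven approach can be illustrated with a minimal sketch that uses Python’s standard `string.Template` in place of whatever template engine EDGE actually uses. The record fields (`entry_id`, `title`) and the three tiny XML fragments are hypothetical simplifications; real ISO-19115, FGDC, and GCMD DIF records carry many more required elements and namespaces. The point it shows is that one indexed record can be rendered into several metadata formats without any format-specific SQL.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical, heavily simplified templates, for illustration only.
# Real ISO-19115, FGDC, and GCMD (DIF) records are far richer.
TEMPLATES = {
    "iso": Template(
        "<MD_Metadata><fileIdentifier>$entry_id</fileIdentifier>"
        "<title>$title</title></MD_Metadata>"
    ),
    "fgdc": Template(
        "<metadata><idinfo><citation><citeinfo>"
        "<title>$title</title></citeinfo></citation></idinfo></metadata>"
    ),
    "gcmd": Template(
        "<DIF><Entry_ID>$entry_id</Entry_ID>"
        "<Entry_Title>$title</Entry_Title></DIF>"
    ),
}


def render_metadata(record: dict, fmt: str) -> str:
    """Render one indexed dataset record into the requested metadata format."""
    template = TEMPLATES.get(fmt)
    if template is None:
        raise ValueError(f"unsupported metadata format: {fmt}")
    # Escape values so characters such as '&' or '<' remain valid XML.
    safe = {key: escape(str(value)) for key, value in record.items()}
    return template.substitute(safe)


if __name__ == "__main__":
    record = {"entry_id": "EXAMPLE_DATASET", "title": "Sea Surface Temperature & Salinity"}
    for fmt in ("iso", "fgdc", "gcmd"):
        print(render_metadata(record, fmt))
```

In this sketch, supporting a new output format means adding a template entry rather than writing and maintaining new SQL, which is the maintainability benefit the text describes.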
EDGE uses the Tornado Web Server as the platform for the OpenSearch and metadata specifications, and as a proxy service for integrating with other data services. It uses Apache Solr as a fast indexed search backend. Inventory data stored in PO.DAAC’s Oracle server is incrementally indexed every 15 minutes, so Solr can answer searches across all the data it manages without querying Oracle. Because public queries are served from Solr rather than Oracle, the core data-management backend is shielded from Denial of Service (DoS) attacks on the service layer. To further ensure reliable service, EDGE’s Solr deployment uses a master/slave model: the master instance handles indexing, one slave instance serves the PO.DAAC Web portal, and another slave instance serves the OpenSearch and metadata export Web services.
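The incremental indexing and role split can be sketched as follows, under several assumptions: the Solr host names, the core name `dataset`, and the row fields (`id`, `title`, `last_modified`) are all hypothetical; an in-memory list of timezone-aware rows stands in for the Oracle query; and the code only builds the request rather than sending it. It shows the shape of the design: each 15-minute run selects rows changed since the previous run and targets only the master, while read traffic is routed to the slave dedicated to each client.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical endpoints for the master/slave split; not EDGE's real hosts.
SOLR_ROLES = {
    "indexing": "http://solr-master.example:8983/solr/dataset",
    "portal": "http://solr-slave1.example:8983/solr/dataset",
    "opensearch": "http://solr-slave2.example:8983/solr/dataset",
}

INDEX_INTERVAL = timedelta(minutes=15)


def next_run(last_run: datetime) -> datetime:
    """Time of the next incremental indexing pass."""
    return last_run + INDEX_INTERVAL


def select_changed(rows, since):
    """Keep only inventory rows modified after the previous indexing run."""
    return [row for row in rows if row["last_modified"] > since]


def build_update(rows, since):
    """Return (url, body) for a Solr JSON update sent to the master only."""
    docs = [
        {
            "id": row["id"],
            "title": row["title"],
            # Solr date fields expect UTC in the form YYYY-MM-DDThh:mm:ssZ.
            "last_modified": row["last_modified"]
            .astimezone(timezone.utc)
            .strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
        for row in select_changed(rows, since)
    ]
    url = SOLR_ROLES["indexing"] + "/update?commit=true"
    return url, json.dumps(docs)


def search_url(client: str) -> str:
    """Route read traffic to the slave dedicated to that client, never the master."""
    if client not in ("portal", "opensearch"):
        raise ValueError(f"no read replica assigned to client: {client}")
    return SOLR_ROLES[client] + "/select"


if __name__ == "__main__":
    last_run = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    rows = [
        {"id": "a", "title": "Old granule", "last_modified": last_run - timedelta(minutes=5)},
        {"id": "b", "title": "New granule", "last_modified": last_run + timedelta(minutes=3)},
    ]
    print(build_update(rows, last_run))
    print(search_url("opensearch"))
```

Keeping writes on the master and reads on separate slaves means a traffic spike from one public client affects neither indexing nor the other client in this sketch.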