The Simple, Scalable, Script-based, Science Processing (S4P) Archive (S4PA) is a disk-based data-archiving system for remote sensing data, built on the data-driven framework of S4P. The system handles the transfer of new data, data preprocessing, metadata generation, and data archival, and it provides services such as data access control, data subscription, metadata publication, and data recovery. The data are archived on readily available disk drives, with FTP (File Transfer Protocol) and HTTP (Hypertext Transfer Protocol) as the primary modes of data access. S4PA includes a graphical user interface for monitoring and reconfiguring system operation, a tool for deploying the system, and various other tools that help manage the data ingest and archiving process. These tools cover data replication, auxiliary file backup, database merge, storage of dataset README documents in CVS (Concurrent Versions System), an interface for machine search, and deployment of S4PA instances from configuration stored in CVS.
Use of disk storage for data archival enables optional online access to data, faster data recovery, and online data services. The system consists of: (1) a data poller that detects the presence of data on a remote server, supporting the FTP, SFTP (Secure File Transfer Protocol), and bbFTP protocols; (2) a data transfer component that transfers the data, including any associated data, such as browse files; (3) a preprocessor that processes the data, including data file renaming, before archiving and metadata generation; (4) a storage component that stores the data, including any associated data, in a user-accessible location; (5) a subscription handler that notifies users of data arrival; (6) a metadata publisher that makes metadata available to users and applications, such as ECHO, Mirador, and Giovanni; (7) components for continuous verification of data integrity, auxiliary file backup, data recovery, and database merge; (8) a transient archive that manages data deletion and re-publication; (9) a machine-to-machine search capability and an HTTP interface; (10) a reconciliation component that reconciles S4PA data holdings with applications, such as ECHO and Mirador, that publish S4PA data; and (11) replication of data holdings to protect against system failure.
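The flow through the first few components (poll, preprocess, store, verify, notify) can be illustrated with a minimal Python sketch. This is not S4PA code: every function name here is hypothetical, a local directory stands in for the remote server, lowercasing the file name stands in for a real renaming rule, and a SHA-256 digest stands in for whatever integrity check the system actually uses.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def poll(incoming, seen):
    """Return files in `incoming` that have not been handled yet (hypothetical poller)."""
    new_files = sorted(
        p for p in incoming.iterdir() if p.is_file() and p.name not in seen
    )
    seen.update(p.name for p in new_files)
    return new_files


def preprocess(path):
    """Pick an archive name and build a minimal metadata record."""
    data = path.read_bytes()
    archive_name = path.name.lower()  # assumed renaming rule, for illustration only
    metadata = {
        "name": archive_name,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    return archive_name, metadata


def store(path, archive, archive_name):
    """Copy the file into the archive directory and return its new location."""
    archive.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(path, archive / archive_name))


def verify(stored, metadata):
    """Re-read the archived copy and compare its digest with the recorded one."""
    return hashlib.sha256(stored.read_bytes()).hexdigest() == metadata["sha256"]


def run_once(incoming, archive, seen, subscribers):
    """One pass: poll, preprocess, store, verify, then notify subscribers."""
    catalog = []
    for path in poll(incoming, seen):
        archive_name, metadata = preprocess(path)
        stored = store(path, archive, archive_name)
        if not verify(stored, metadata):
            raise IOError(f"integrity check failed for {stored}")
        for notify in subscribers:
            notify(metadata)
        catalog.append(metadata)
    return catalog


def demo():
    """Ingest one sample file, then poll again to show nothing is re-ingested."""
    with tempfile.TemporaryDirectory() as tmp:
        incoming = Path(tmp) / "incoming"
        archive = Path(tmp) / "archive"
        incoming.mkdir()
        (incoming / "GRANULE_001.DAT").write_bytes(b"example payload")
        seen, notices = set(), []
        first = run_once(incoming, archive, seen, [notices.append])
        second = run_once(incoming, archive, seen, [notices.append])
        return first, second, notices
```

Calling `demo()` ingests the sample file on the first pass and finds nothing new on the second. Keeping each stage as a separate function mirrors the component list above, so a stage such as the poller could be replaced (for example, by one that talks to a real FTP server) without touching the others.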
This work was done by Christopher Lynnes of Goddard Space Flight Center; Guang-Dih Lei, Edward Seiler, C. Wrandle Barth, and Mahabaleshwara Hegde of ADNET Systems, Inc.; and Lei Fang of Global Science & Technology Inc. GSC-15877-1