NASA scientists are uniquely positioned to research and understand the processes affecting the Earth’s climate. To study these processes, scientists must address the Big Data challenges posed by working with massive amounts of observational data and climate model output. The Advanced Data Analytics Platform (ADAPT) is a cyberinfrastructure resource specifically designed to reduce the friction between scientists and data. The system includes a high-performance storage cloud surrounded by large-scale compute resources. A very high-performing network enables fast access to the data stored within ADAPT. Furthermore, the system allows users to bring their applications to the data and define the environment in which those applications run. The science results can then be stored for future analysis or shared through static and dynamic data services within ADAPT without having to move the data or make additional copies. The agility, flexibility, and extensibility of the system make it well suited for NASA scientists who need to produce science results quickly by analyzing large data sets.
ADAPT combines several recent innovations from the information technology community, many of which have been researched within NASA’s Center for Climate Simulation (NCCS). These technologies include virtualized high-speed InfiniBand networks, a high-performance file system and object storage environment, and virtual systems deployed in a cyberinfrastructure specifically designed for data-intensive science applications.
At the center of the resource is a large object storage environment that combines computation with data storage capabilities. This resource will allow users to access the object storage environment much like a traditional file system, while also providing the capability to perform data-proximal processing, where the analytics are performed on the data storage node through technologies such as the Hadoop Distributed File System (HDFS). Surrounding the storage is a cloud of high-performance compute resources with many processing cores and large memory, coupled to the storage through an InfiniBand network. Through the use of technologies such as Single Root Input/Output Virtualization (SR-IOV), virtual systems can be provisioned on the compute resources with extremely high-speed network connectivity to the storage and to other virtual systems.
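The idea of data-proximal processing described above can be illustrated with a minimal Python sketch. It is not ADAPT's implementation and does not use HDFS or any ADAPT interface; the `StorageNode` class, the node names, and the sample values are hypothetical and exist only for illustration. The sketch assumes that each storage node can reduce its own locally held blocks to a small summary, so that only the summaries, rather than the raw data, need to cross the network.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StorageNode:
    """Hypothetical storage node that holds blocks of a data set locally."""
    name: str
    blocks: List[List[float]]

    def local_partial_mean(self) -> Tuple[float, int]:
        """Runs on the node itself: reduce local blocks to a (sum, count) pair."""
        total = 0.0
        count = 0
        for block in self.blocks:
            total += sum(block)
            count += len(block)
        return total, count


def global_mean(nodes: List[StorageNode]) -> float:
    """Combine the small per-node summaries into one global mean."""
    partials = [node.local_partial_mean() for node in nodes]
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no data available on any node")
    return total / count


# Illustrative values only (for example, temperatures in kelvin).
nodes = [
    StorageNode("node-a", [[280.0, 282.0], [284.0]]),
    StorageNode("node-b", [[290.0, 294.0]]),
]
print(global_mean(nodes))
```

In this sketch each node returns only two numbers regardless of how much data it stores, which is the property that makes processing near the data attractive for very large data sets.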
ADAPT’s most important feature is an architecture that enables large-scale data analysis by combining storage, compute, networking, and cloud computing capabilities. The ability to bring the scientist’s application to the data and define the environment in which that application runs greatly reduces the friction among the scientist, the data, and the high-performance system. This will enable scientists to quickly create analysis applications, port them to a very large resource, and access extremely large model and observational data sets.
This work was done by Daniel Duffy, Scott Sinno, Hoot Thompson, Garrison Vaughan, John Schnase, Mark McInerney, and Phil Webster of Goddard Space Flight Center.