Physics Mining of Multi-Source Data Sets
- Created on Thursday, 01 March 2012
Powerful new parallel data mining algorithms can produce diagnostic and prognostic numerical models and analyses from observational data. By fusing synoptic imagery with time-series measurements, these techniques yield higher-resolution measures of environmental parameters than were previously available. The techniques are general: they apply to raster, vector, and scalar observational data, and can be used in all Earth- and environmental-science domains. Because they are highly automatable and parallel, they scale to large spatial domains and are well suited to change and gap detection. This makes it possible to analyze spatial and temporal gaps in information, and supports within-mission re-planning to optimize the allocation of observational resources.
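The brief does not say how gaps are detected. As a minimal illustration of the temporal case only, the sketch below scans a regularly sampled series and reports runs of missing samples. The function name `find_gaps`, the convention that `None` or NaN marks a missing sample, and the `min_length` threshold are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def find_gaps(values: Sequence[Optional[float]], min_length: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) for runs of missing samples.

    Hypothetical helper: a sample counts as missing if it is None or NaN.
    Runs shorter than min_length samples are ignored.
    """
    gaps = []
    start = None  # index where the current run of missing samples began
    for i, v in enumerate(values):
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        if missing and start is None:
            start = i
        elif not missing and start is not None:
            if i - start >= min_length:
                gaps.append((start, i))
            start = None
    # Close a gap that runs to the end of the series.
    if start is not None and len(values) - start >= min_length:
        gaps.append((start, len(values)))
    return gaps


series = [1.2, None, None, 1.5, float("nan"), 1.7, 1.8, None]
print(find_gaps(series))
print(find_gaps(series, min_length=2))
```

Reported gaps could then be ranked by length or position to decide where additional observations are most needed, in the spirit of the within-mission re-planning described above.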
The basis of the innovation is the extension of MineTool, a recently developed set of data mining algorithms, to multivariate time-series data. MineTool is unique in that it automates the various steps of the data mining process, making it suitable for autonomous analysis of large data sets. Unlike techniques such as artificial neural networks, which yield a black-box solution, MineTool always produces an analytical model in parametric form that expresses the output in terms of the input variables. The derived equation can then be used to gain insight into the physical relevance and relative importance of the model’s parameters and coefficients. This is referred to as “physics mining” of data. MineTool’s capabilities are extended to include both supervised and unsupervised algorithms and to handle multi-type data sets, and the algorithms are parallelized.
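To make the contrast with a black-box model concrete, here is a minimal sketch of fitting an explicit parametric equation to data. It is not MineTool’s algorithm, which the brief does not detail: it swaps in ordinary least squares over a hand-picked set of candidate terms. The functions `fit_parametric` and `solve`, the term names, and the synthetic data are all hypothetical, and only the standard library is used so the block is self-contained (NumPy’s `numpy.linalg.lstsq` would be the usual choice in practice).

```python
from typing import Callable, Dict, List, Sequence

# A candidate basis term maps one input row (x0, x1, ...) to a number.
Term = Callable[[Sequence[float]], float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_parametric(rows: Sequence[Sequence[float]], y: Sequence[float],
                   terms: Dict[str, Term]) -> Dict[str, float]:
    """Least-squares fit of y = sum_k c_k * term_k(x); returns {term name: c_k}."""
    names = list(terms)
    design = [[float(terms[name](x)) for name in names] for x in rows]
    k = len(names)
    # Normal equations: (A^T A) c = A^T y
    ata = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return dict(zip(names, solve(ata, aty)))


# Synthetic, noise-free data generated from y = 2 + 3*x0 - 0.5*x0*x1.
rows = [(x0, x1) for x0 in range(1, 5) for x1 in range(1, 4)]
y = [2 + 3 * x0 - 0.5 * x0 * x1 for x0, x1 in rows]
terms: Dict[str, Term] = {
    "const": lambda x: 1.0,
    "x0": lambda x: x[0],
    "x1": lambda x: x[1],
    "x0*x1": lambda x: x[0] * x[1],
}
coeffs = fit_parametric(rows, y, terms)
print("y =", " + ".join(f"{c:.3f}*{name}" for name, c in coeffs.items()))
```

Because the data were generated without an `x1` term, its fitted coefficient comes out near zero; reading off coefficients like this is the kind of insight into relative parameter importance the brief attributes to parametric models. A fuller approach would also search over which terms to include rather than fixing them in advance.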
The innovations include: (1) physics mining algorithms, which derive analytical relations and physical models from observational data; (2) automated, parallel algorithms that scale to large spatial domains and are well suited to change and gap detection; (3) local versus global modeling, which generates locally optimal models for a specific geospatial region, accounting for its unique setting and conditions; (4) fusion of multi-source, multi-type data, combining synoptic imagery with independent time-series measurements to yield higher-resolution measures than were previously available; and (5) calculation of a Palmer’s Drought Severity Index analogue.
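Item (3) contrasts one global model with locally fitted ones. The minimal sketch below assumes a simple straight-line model and two hypothetical regions (“coast” and “inland”) whose synthetic data follow different relationships; the names `fit_line` and `rmse` and all data are illustrative, not from the brief. Each regional fit uses only that region’s samples, so the fits are independent and could be distributed across workers, which is one way the parallelism of item (2) might be realized; the sketch itself runs them sequentially.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[str, float, float]  # (region id, input x, observed y)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(samples: Sequence[Sample],
         model_for: Callable[[str], Tuple[float, float]]) -> float:
    """Root-mean-square error, predicting each sample with its region's model."""
    total = 0.0
    for region, x, y in samples:
        a, b = model_for(region)
        total += (y - (a + b * x)) ** 2
    return math.sqrt(total / len(samples))


# Hypothetical regions with different input-output relationships.
samples: List[Sample] = (
    [("coast", x, 1 + 2 * x) for x in range(5)]
    + [("inland", x, 5 - x) for x in range(5)]
)

# Global model: a single line for the whole domain.
global_model = fit_line([s[1] for s in samples], [s[2] for s in samples])

# Local models: group samples by region, then fit each group independently.
by_region: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
for region, x, y in samples:
    by_region[region][0].append(x)
    by_region[region][1].append(y)
local_models = {r: fit_line(xs, ys) for r, (xs, ys) in by_region.items()}

print("global RMSE:", round(rmse(samples, lambda r: global_model), 3))
print("local RMSE:", round(rmse(samples, lambda r: local_models[r]), 3))
```

The single global line fits neither region well, while each local line reproduces its region exactly here because the synthetic data are noise-free; with real observations the gap would be smaller but the same comparison applies.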
Successful completion of this project will lead to a major breakthrough in climate studies in particular, and in the analysis of multi-source data applied to the hydrologic cycle, with relevance to climate change impacts and resource management.
This work was done by John Helly, Homa Karimabadi, and Tamara Sipes of SciberQuest, Inc. for Goddard Space Flight Center. GSC-15802-1