A computer program reduces data generated by NASA Earth-science missions into representative clusters characterized by centroids and membership information, thereby reducing the large volume of data to a level more amenable to analysis. The program effects an autonomous data-reduction/clustering process to produce a representative distribution and joint relationships of the data, without assuming a specific type of distribution and relationship and without resorting to domain-specific knowledge about the data.

The program combines a data-reduction algorithm, entropy-constrained vector quantization (ECVQ), with an optimization algorithm, differential evolution (DE). Together, the two algorithms generate the Pareto front of clustering solutions, which presents the trade-off between the quality of the reduced data and the degree of reduction.
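As a rough illustration of the ECVQ idea, the sketch below clusters points with the usual entropy-penalized assignment rule: each point goes to the cluster that minimizes squared distance plus a penalty proportional to the negative log of the cluster's probability. This is a minimal sketch under stated assumptions, not the program's actual code: the function name `ecvq`, the parameter `lam` (the penalty weight), the random initialization, and the convergence test are all hypothetical choices made for the example.

```python
import math
import random


def ecvq(points, k, lam, iters=50, seed=0):
    """Cluster `points` (a list of equal-length tuples) with an entropy penalty.

    Each point is assigned to the cluster minimising
        squared_distance(point, centroid) + lam * (-log2(cluster probability)).
    Returns (centroids, probs, distortion, entropy_bits).
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    probs = [1.0 / k] * k
    n = len(points)

    for _ in range(iters):
        # Assignment step: penalised nearest-centroid rule.
        members = [[] for _ in centroids]
        for p in points:
            costs = [
                sum((a - b) ** 2 for a, b in zip(p, c)) - lam * math.log2(q)
                for c, q in zip(centroids, probs)
            ]
            members[costs.index(min(costs))].append(p)

        # Update step: drop empty clusters, then recompute centroids and
        # cluster probabilities from the surviving memberships.
        kept = [m for m in members if m]
        new_centroids = [[sum(col) / len(m) for col in zip(*m)] for m in kept]
        new_probs = [len(m) / n for m in kept]
        converged = new_centroids == centroids
        centroids, probs = new_centroids, new_probs
        if converged:
            break

    # Mean squared error of the reduced representation.
    distortion = sum(
        sum((a - b) ** 2 for a, b in zip(p, c))
        for c, m in zip(centroids, kept)
        for p in m
    ) / n
    # Shannon entropy (in bits) of the cluster-membership distribution.
    entropy = -sum(q * math.log2(q) for q in probs)
    return centroids, probs, distortion, entropy
```

With `lam = 0` the rule reduces to ordinary k-means assignment. Larger values of `lam` favor populous clusters, so sparse clusters tend to empty out and the entropy of the result drops at the cost of higher distortion.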

Earlier data-reduction programs of this kind use only a clustering algorithm whose parameters users tune by hand. In the present program, autonomous optimization of those parameters by DE replaces manual tuning. Thus, the program determines the best set of clustering solutions without human intervention.
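To give a feel for how DE searches a parameter space without manual tuning, here is a minimal sketch of the classic single-objective DE/rand/1/bin scheme. It is an illustration only: the program described here uses DE to build a Pareto front over two objectives, which this sketch does not attempt, and the function name, the defaults (`pop_size`, `f`, `cr`, `generations`), and the clipping of trial vectors to the search box are assumptions made for the example.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.7, cr=0.9,
                           generations=200, seed=0):
    """Minimise `objective` over the box `bounds` with DE/rand/1/bin.

    `bounds` is a list of (low, high) pairs, one per parameter.
    Returns (best_vector, best_score). Defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: pick three distinct members other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Binomial crossover; one coordinate always comes from the mutant.
            forced = rng.randrange(dim)
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            # Greedy selection: keep the trial only if it is no worse.
            s = objective(trial)
            if s <= scores[i]:
                pop[i], scores[i] = trial, s

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

In a tuning setting, `objective` would wrap a clustering run and return a quality score for the parameters passed in; that wrapper is hypothetical and not specified by the source.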

This program was written by Seungwon Lee, Amy J. Braverman, and Alexandre Guillaume of Caltech for NASA’s Jet Propulsion Laboratory. For more information, download the Technical Support Package (free white paper) at www.techbriefs.com/tsp under the Information Sciences category.

This software is available for commercial licensing. Please contact Karina Edmonds of the California Institute of Technology at (626) 395-2322. Refer to NPO-45583.



This Brief includes a Technical Support Package (TSP).
Reducing the Volume of NASA Earth-Science Data (reference NPO-45583) is currently available for download from the TSP library.


This article first appeared in the March, 2010 issue of NASA Tech Briefs Magazine (Vol. 34 No. 3).



Overview

The document is a Technical Support Package from NASA's Jet Propulsion Laboratory (JPL) detailing a feasibility study on an autonomous data reduction process aimed at managing vast amounts of Earth-science data. The primary objective is to explore performance metrics for reducing data volume while maintaining quality, with a focus on atmospheric and Earth-science datasets.

The study emphasizes the importance of balancing two competing objectives: distortion and entropy. To find that balance, the researchers analyze the Pareto front, the set of optimal solutions in which neither objective can be improved without degrading the other. The document discusses why the curvature of the Pareto front is hard to determine: the front is sparse and noisy. To address this, the authors propose a new performance metric based on the distance of a solution from an ideal line connecting two extreme solutions, one with zero distortion and the other with zero entropy. The solution farthest from this line is considered the best, representing the "sweet spot" in the trade-off between distortion and entropy.
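The selection rule described above can be sketched in a few lines. The example filters a set of (distortion, entropy) pairs down to its Pareto front and then picks the member with the largest perpendicular distance from the line through the two end points of the front. It is a hedged illustration: the function names are hypothetical, both objectives are assumed to be minimized, and the objectives are used unscaled because the summary does not say whether the study normalizes them.

```python
import math


def pareto_front(solutions):
    """Keep the (distortion, entropy) pairs that no other pair dominates.

    Both objectives are assumed to be minimised. The result is sorted by
    increasing distortion.
    """
    front = []
    for s in solutions:
        dominated = any(
            o[0] <= s[0] and o[1] <= s[1] and o != s for o in solutions
        )
        if not dominated:
            front.append(s)
    return sorted(set(front))


def best_tradeoff(front):
    """Return the front member farthest from the line joining the extremes."""
    # After sorting, the first entry has the lowest distortion and the
    # last entry has the lowest entropy.
    (x1, y1), (x2, y2) = front[0], front[-1]
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        return front[0]  # degenerate front with a single point

    def distance(p):
        # Perpendicular distance from p to the line through the extremes.
        return abs((x2 - x1) * (y1 - p[1]) - (x1 - p[0]) * (y2 - y1)) / length

    return max(front, key=distance)
```

For example, `best_tradeoff(pareto_front(candidates))` would return the "sweet spot" for a list `candidates` of (distortion, entropy) tuples. Because the rule relies only on distances to a straight line, it does not need an estimate of the front's curvature.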

The autonomous data reduction process demonstrated in the study shows that it is feasible to perform data reduction without the need for manual tuning and monitoring of clustering solutions. The final solution derived from the Pareto front effectively balances reduction quality and the amount of data retained, ensuring that neither objective is significantly compromised.

The document also includes references to relevant literature, such as works on Entropy-Constrained Vector Quantization and Differential Evolution, which underpin the methodologies employed in the study. It highlights the potential applications of this research in broader technological and scientific contexts, emphasizing the significance of efficient data management in the field of Earth sciences.

Overall, the Technical Support Package outlines a systematic approach to developing an autonomous data reduction process, providing insights into performance metrics and optimization strategies that can enhance the handling of large datasets in NASA's Earth-science missions. The findings contribute to ongoing efforts to improve data processing capabilities, ultimately supporting more effective scientific research and exploration.