Detecting an Extreme Minority Class in Hyperspectral Data Using Machine Learning
- Created: Saturday, 31 May 2014
Automated classifiers can detect surface sulfur in orbital remote sensing observations.
NASA’s Jet Propulsion Laboratory, Pasadena, California
Orbital remote sensing provides a powerful way to efficiently survey targets for features of interest in inaccessible regions of the Earth as well as on other planets. One such feature of astrobiological relevance is surface sulfur deposits, which may be present on icy moons such as Europa. All hyperspectral instruments face the difficult task of spectral feature selection (finding the spectral bands that matter), and the task is hardest for instruments that operate in the previously unstudied environments encountered on planetary missions. This software demonstrates how manually annotated labels can enable automated feature discovery that boosts science return.
This work evaluates the ability of automated classifiers to detect sulfur in remote sensing observations by the Hyperion spectrometer on the EO-1 (Earth Observing-1) spacecraft. A data-driven machine learning solution was required because it is not possible to reliably detect sulfur in hyperspectral data by simply matching observations to sulfur lab spectra, as is common for in-situ mineral imaging. Several methods (manual and automated) were evaluated to select the most relevant attributes (spectral bands) for successful sulfur detection.
Data were taken from a terrestrial analog for Europa: a northern island where biological activity has produced surface sulfur deposits on ice. This software uses machine learning algorithms to automatically discover the spectral features that best discriminate interesting spectra (exposed sulfur deposits) from uninteresting surrounding spectra, such as observations of ice and rock. Unique aspects of this study include ground validation using terrestrial sulfuric ice springs, modern machine learning feature detection using a Support Vector Machine (SVM), and real-time execution onboard the EO-1 spacecraft. A primary technical innovation is the ability to handle a severe imbalance between extremely rare positive examples (sulfur deposit locations) and a plethora of negative examples.
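The article does not say how the imbalance between rare sulfur pixels and abundant background pixels was handled. One common option, shown here purely as an illustrative assumption, is to weight each class inversely to its frequency so that the few positive examples carry as much total influence during training as the many negatives. The function name, labels, and pixel counts below are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes receive large weights and common classes small ones,
    so every class contributes the same total weight to a training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical scene (not from the study): 10 sulfur pixels among 9,990 others.
labels = ["sulfur"] * 10 + ["background"] * 9990
weights = balanced_class_weights(labels)

for cls, weight in sorted(weights.items()):
    print(f"{cls}: weight {weight:.4f}")
```

With these made-up counts, each sulfur pixel is weighted roughly a thousand times more heavily than each background pixel, and the two classes end up with equal total weight. Such weights could then be passed to any classifier that accepts per-class or per-sample weights.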
The experimental results show that a classifier can be trained to successfully detect sulfur-bearing pixels in data collected by the Hyperion instrument onboard EO-1 while accommodating the particular computational challenges and constraints imposed by the onboard environment. The best results were achieved by using Recursive Feature Elimination (RFE) to select the 12 most discriminative bands (a limitation imposed by the EO-1 hardware), modeling the problem using four classes by decomposing “sulfur” into “bright sulfur” and “dark sulfur” populations, and employing Pair-Wise Expectation Maximization (PWEM) to filter out likely mislabeled items from the training set. Automated feature selection is effective on this problem. RFE tended to select bands similar, but not identical, to those chosen manually by an expert, and RFE’s bands often yielded higher accuracy as well as fewer false positive detections.
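The general idea behind RFE can be sketched in a few lines: train a linear classifier, drop the band whose weight has the smallest magnitude, and repeat until the desired number of bands remains. The sketch below is not the flight software or the authors' implementation. It assumes a two-class problem, a simple bias-free linear SVM trained by Pegasos-style subgradient descent, and synthetic data. All function names, parameters, and the choice of 24 synthetic bands are hypothetical; only the target of 12 bands comes from the article.

```python
import random


def train_linear_svm(X, y, epochs=20, lam=0.01, seed=0):
    """Fit a bias-free linear SVM (hinge loss + L2 penalty) by subgradient descent.

    X is a list of feature lists, y holds labels in {-1, +1}.
    Returns the learned weight vector as a list of floats.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink the weights (gradient of the L2 penalty).
            w = [(1.0 - eta * lam) * wj for wj in w]
            # If the example violates the margin, step toward it.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def recursive_feature_elimination(X, y, n_keep):
    """Repeatedly retrain and discard the band with the smallest |weight|."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        del remaining[weakest]
    return remaining  # original band indices, in ascending order


def make_synthetic_pixels(n_pixels=120, n_bands=24, informative=(3, 11, 19), seed=1):
    """Create noise spectra in which only a few bands depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_pixels):
        label = rng.choice([-1, 1])
        row = [rng.gauss(0.0, 1.0) for _ in range(n_bands)]
        for j in informative:
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_synthetic_pixels()
selected_bands = recursive_feature_elimination(X, y, n_keep=12)
print("Selected band indices:", selected_bands)
```

Retraining after every elimination matters because the weights of the surviving bands can shift once a correlated band is removed. A production version would more likely rely on an established library, handle all four classes, and incorporate class weighting for the rare sulfur examples.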