Modern datasets of retrievals from space-based missions contain the target results, but these are often accompanied by hundreds or thousands of other retrieved parameters or facts about each retrieval (e.g., pressure, temperature, spectral intensities). Many retrieval attempts fail because of complex or contaminated soundings, wasting precious computational time. This algorithm generates a filter, based on all available metadata about a run, that predicts whether the retrieval will converge.
Modern missions will generate so much data that only 6% of the record is planned to be processed by the existing slow, CPU-intensive retrieval algorithm. The filter permits “sounding selection,” which avoids attempting retrievals that would inevitably fail and thus waste CPU cycles.
Unlike linear regression, Fisher analysis, or other standard machine learning techniques that examine the “bulk” of the data to create a “fit,” this method uses a genetic algorithm that establishes upper and lower thresholds for each input feature. These thresholds are optimized against a training dataset and then reduced to the smallest set of rules that generates the same filter output. This reduction makes the mechanics of the filter’s operation easier to interpret scientifically.
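The threshold-based approach can be illustrated with a minimal Python sketch. It is not the author's implementation: the brief does not specify the genetic operators, the fitness measure, or the rule-reduction procedure, so everything here is an assumption made for illustration. The function names (passes, fitness, mutate, crossover, evolve, prune), the accuracy-based fitness, the population settings, and the greedy pruning step are all hypothetical. The greedy pruning in particular finds a reduced rule set, not necessarily the smallest one.

```python
import random

def passes(sample, thresholds):
    """Return True if every feature lies within its (low, high) bounds."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(sample, thresholds))

def fitness(thresholds, samples, labels):
    """Assumed fitness: fraction of samples whose predicted outcome
    (pass = expected to converge) matches the recorded label."""
    hits = sum(passes(s, thresholds) == bool(y) for s, y in zip(samples, labels))
    return hits / len(samples)

def mutate(thresholds, ranges, rng, scale=0.1):
    """Jitter one randomly chosen bound, keeping low <= high."""
    new = [list(t) for t in thresholds]
    i = rng.randrange(len(new))      # which feature
    j = rng.randrange(2)             # 0 = lower bound, 1 = upper bound
    span = ranges[i][1] - ranges[i][0]
    new[i][j] += rng.gauss(0, scale * span)
    new[i].sort()
    return [tuple(t) for t in new]

def crossover(a, b, rng):
    """Take each feature's (low, high) pair from one parent or the other."""
    return [rng.choice(pair) for pair in zip(a, b)]

def evolve(samples, labels, generations=200, pop_size=30, seed=0):
    """Hypothetical genetic search for per-feature thresholds."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    # Start from bounds that span the observed range of each feature.
    ranges = [(min(s[i] for s in samples), max(s[i] for s in samples))
              for i in range(n_features)]
    population = [list(ranges)]
    population += [mutate(ranges, ranges, rng) for _ in range(pop_size - 1)]

    def score(t):
        return fitness(t, samples, labels)

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[:pop_size // 2]   # the best half survives unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), ranges, rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=score)

def prune(thresholds, samples):
    """Greedily open up any feature's bounds whose removal leaves the
    filter output on the given samples unchanged."""
    baseline = [passes(s, thresholds) for s in samples]
    pruned = list(thresholds)
    for i in range(len(pruned)):
        trial = list(pruned)
        trial[i] = (float("-inf"), float("inf"))
        if [passes(s, trial) for s in samples] == baseline:
            pruned = trial
    return pruned
```

In this sketch, each sample is a tuple of metadata values for one sounding and each label records whether that retrieval converged. Calling evolve(samples, labels) returns one (low, high) pair per feature, and prune(best, samples) replaces redundant pairs with unbounded ones, so the pairs that remain are the rules a scientist would inspect.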
This work was done by Lukas Mandrake of Caltech for NASA’s Jet Propulsion Laboratory.