Clustering/machine learning methods are used to structure data for prioritization, mapping, and downlinking.
Many current and future NASA missions can collect far more data than can be transmitted to Earth. Communications are limited by distance, visibility constraints, and competing mission downlinks, and long missions carrying high-resolution, multispectral imaging devices easily produce data exceeding the available bandwidth. To address this, computationally efficient algorithms were developed for analyzing science imagery onboard the spacecraft. These algorithms autonomously cluster the data into classes of similar imagery. Instead of the full dataset, the spacecraft can then downlink selected representatives of each class together with a map classifying the imaged terrain, which reduces the volume of downlinked data. A range of approaches was examined, including k-means clustering using image features based on color, texture, temporal order, and spatial arrangement.
Several unique challenges influenced design decisions for automatic image analysis. First, onboard processing is limited in spaceflight applications. Avionics computers must satisfy strict radiation and energy constraints, and their resources are shared between continuous autonomous control and data processing. Computational constraints mandate a simple approach to image analysis in which statistical properties of the image serve as proxies for the actual content.
A second challenge is the diversity of surface features an aerobot (aerial robot) might encounter. An aerobot would be in constant motion but difficult to control because of unpredictable atmospheric currents, so it would be hard to schedule image targets in advance or to anticipate which features of interest will appear. This favors an “unsupervised” approach that makes few assumptions about image content and instead discovers interesting and representative samples based on the intrinsic properties of the data. Clustering is one common unsupervised approach; it partitions a dataset into discrete categories of items with similar properties.
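The clustering and representative-selection idea can be illustrated with a minimal k-means sketch. It is an illustration under stated assumptions, not the flight implementation: the function names (kmeans, representatives), the two-element feature vectors, and the choice of "image closest to the cluster centroid" as the representative are all hypothetical choices made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iterations=50, seed=0):
    """Plain k-means: returns (labels, centroids). Hypothetical helper."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def representatives(vectors, labels, centroids):
    """For each cluster, the index of the vector closest to its centroid."""
    reps = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            reps[c] = min(
                members, key=lambda i: squared_distance(vectors[i], centroid)
            )
    return reps


# Made-up feature vectors for six images forming two obvious groups.
features = [
    [0.10, 0.20], [0.15, 0.22], [0.12, 0.18],
    [0.90, 0.80], [0.88, 0.85], [0.92, 0.79],
]
labels, centroids = kmeans(features, k=2)
reps = representatives(features, labels, centroids)
print("cluster label per image:", labels)
print("representative image index per cluster:", reps)
```

In this sketch, the list of labels plays the role of the classification map, and only the images indexed by reps would need to be downlinked in full.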
The image features fall into four groups, or themes, based on the properties they describe: color, edge, frequency, and time. The edge and frequency features correlate with image texture, color captures basic color statistics, and time describes the temporal order in which the images were collected.
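As a rough illustration of how the four themes might be turned into a feature vector, the sketch below computes one simple statistic per theme. The specific statistics are assumptions chosen for brevity and are not taken from the original work: per-channel mean and standard deviation for color, mean absolute neighbor difference for edges, and the share of spectral energy in the upper half of a one-dimensional DFT for frequency. Images are assumed to be small lists of rows of (r, g, b) tuples with values in [0, 1], and all function names are hypothetical.

```python
import cmath
import statistics


def to_gray(image):
    """Average the RGB channels of each pixel."""
    return [[sum(px) / 3.0 for px in row] for row in image]


def color_features(image):
    """Color theme: mean and standard deviation of each RGB channel."""
    feats = []
    for ch in range(3):
        values = [px[ch] for row in image for px in row]
        feats += [statistics.fmean(values), statistics.pstdev(values)]
    return feats


def edge_feature(gray):
    """Edge theme: mean absolute difference between neighboring pixels."""
    diffs = []
    for y, row in enumerate(gray):
        for x, value in enumerate(row):
            if x + 1 < len(row):
                diffs.append(abs(row[x + 1] - value))
            if y + 1 < len(gray):
                diffs.append(abs(gray[y + 1][x] - value))
    return statistics.fmean(diffs)


def frequency_feature(gray):
    """Frequency theme: share of non-DC spectral energy in the upper half
    of the band, from a 1-D DFT of the column-mean intensity profile."""
    profile = [statistics.fmean(col) for col in zip(*gray)]
    n = len(profile)
    energy = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            p * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, p in enumerate(profile)
        )
        energy.append(abs(coeff) ** 2)
    total = sum(energy)
    if total < 1e-12:  # effectively flat profile: no texture energy
        return 0.0
    return sum(energy[len(energy) // 2:]) / total


def feature_vector(image, timestamp):
    """Concatenate color (6), edge (1), frequency (1), and time (1)."""
    gray = to_gray(image)
    return color_features(image) + [
        edge_feature(gray), frequency_feature(gray), float(timestamp)
    ]


# Two made-up 4x4 test images: uniform gray, and alternating columns.
flat = [[(0.5, 0.5, 0.5)] * 4 for _ in range(4)]
striped = [
    [(1.0, 1.0, 1.0) if x % 2 else (0.0, 0.0, 0.0) for x in range(4)]
    for _ in range(4)
]
flat_vec = feature_vector(flat, timestamp=0)
striped_vec = feature_vector(striped, timestamp=1)
print(flat_vec)
print(striped_vec)
```

The uniform image yields zero edge and frequency features, while the striped image scores high on both, which is the kind of statistical proxy for content that a constrained onboard processor can afford.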
The main feature of this innovation is the use of clustering/machine learning methods to structure data for prioritization, mapping, and downlink. The effectiveness of clustering rests on the quality of the feature vectors describing each set of data. Features that are redundant, or that have little variance, reduce the effectiveness of clustering. Dimensionality reduction techniques such as principal component analysis (PCA) can transform a high-dimensional feature space into a lower-dimensional space of new, uncorrelated features that retain most of the variance in the original data. Ideal clusterings contain compact clusters that are spread far apart from one another.
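The effect of PCA on redundant and low-variance features can be shown with a small sketch. To stay dependency-free it finds the principal components by power iteration with deflation on the covariance matrix, which is a swapped-in textbook method and not necessarily what the original work used; a numerical library would normally do this with an eigendecomposition or SVD. The data values and function names are made up for illustration: the third feature is nearly a scaled copy of the first, and the second is nearly constant.

```python
import math
import random


def center(data):
    """Subtract the per-feature mean from every row."""
    means = [sum(col) / len(data) for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenpair(matrix, iterations=200, seed=0):
    """Dominant eigenvalue and unit eigenvector by power iteration."""
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in matrix]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
    value = sum(
        v * sum(m * w for m, w in zip(row, vec)) for v, row in zip(vec, matrix)
    )
    return value, vec


def pca(data, n_components):
    """Project data onto its leading principal components."""
    centered = center(data)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        value, vec = top_eigenpair(cov)
        components.append(vec)
        variances.append(value)
        # Deflation: remove the found component before the next search.
        cov = [
            [cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
            for i in range(d)
        ]
    projected = [
        [sum(x * c for x, c in zip(row, comp)) for comp in components]
        for row in centered
    ]
    return projected, variances


# Made-up features: column 3 is roughly 2 x column 1; column 2 barely varies.
data = [
    [1.0, 0.50, 2.0],
    [2.0, 0.51, 4.1],
    [3.0, 0.49, 5.9],
    [4.0, 0.50, 8.0],
    [5.0, 0.50, 10.1],
]
projected, variances = pca(data, n_components=2)
cov_full = covariance(center(data))
total_variance = sum(cov_full[i][i] for i in range(len(cov_full)))
explained = variances[0] / total_variance
print("variance along each component:", variances)
print("share of variance in first component:", explained)
```

With these inputs a single component carries nearly all of the variance, so the three original features could be replaced by one or two uncorrelated ones before clustering.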