A novel technique has been developed for anomaly detection in rocket engine test stand (RETS) data. The objective was to develop a system that post-processes a CSV file containing the time-series sensor readings and activities from a rocket engine test, and detects any anomalies that occurred during the test. The output consists of the names of the sensors that show anomalous behavior, together with the start and end time of each anomaly.
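As an illustration of the output format described above, the following minimal sketch collapses per-sample anomaly flags into start and end times for each sensor. The function names, the flag representation, and the sensor names used in any example call are assumptions made for illustration; the article does not describe how the actual system stores its intermediate results.

```python
def anomaly_intervals(times, flags):
    """Collapse per-sample anomaly flags into (start_time, end_time) pairs."""
    intervals = []
    start = None      # start time of the anomalous run in progress, if any
    prev_t = None     # timestamp of the previous sample
    for t, flagged in zip(times, flags):
        if flagged and start is None:
            start = t                          # an anomalous run begins here
        elif not flagged and start is not None:
            intervals.append((start, prev_t))  # run ended at the previous sample
            start = None
        prev_t = t
    if start is not None:                      # run lasts until the end of the test
        intervals.append((start, prev_t))
    return intervals


def report(times, flags_by_sensor):
    """Map each sensor name to its anomaly intervals, omitting clean sensors."""
    out = {}
    for name, flags in flags_by_sensor.items():
        found = anomaly_intervals(times, flags)
        if found:
            out[name] = found
    return out
```

Calling `report(times, {"T2": [...]})` with one Boolean list per (hypothetical) sensor name returns a dictionary that lists only the sensors with at least one anomalous interval.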

To significantly reduce the involvement of domain experts, several data-driven approaches have been proposed in which models are acquired automatically from the data, thus bypassing the cost and effort of building system models. Many supervised learning methods can efficiently learn operational and fault models, given large amounts of both nominal and fault data. However, for domains such as RETS data, the amount of anomalous data actually available is relatively small, which makes most supervised learning methods ineffective; in general, they have met with limited success in anomaly detection.

The fundamental problem with existing approaches is that they assume the data are independent and identically distributed (iid), an assumption that is violated in typical RETS data. None of these techniques naturally exploits the temporal information inherent in time-series data from the sensor networks. The sensor readings are correlated not only at the same time, but also across time, yet these approaches have not explicitly identified and exploited such correlations. Given these limitations of model-free methods, there has been renewed interest in model-based methods, specifically graphical methods that explicitly reason temporally. The Gaussian Mixture Model (GMM) in a Linear Dynamic System approach assumes that the multi-dimensional test data is a mixture of multivariate Gaussians, and fits a given number of Gaussian clusters with the help of the well-known Expectation Maximization (EM) algorithm. The parameters thus learned are used for calculating the joint distribution of the observations. However, the GMM assumption is essentially an approximation, which suggests that non-parametric density estimators may be a viable alternative. This is the key idea underlying the new approach.
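To make the contrast with a fixed number of Gaussian clusters concrete, the sketch below evaluates a Gaussian kernel density estimate, one common non-parametric density estimator. It is offered only as an illustration of the general idea: the article does not say which estimator the system uses, and the function name, the one-dimensional setting, and the bandwidth parameter are all assumptions.

```python
import math


def gaussian_kde_logpdf(x, samples, bandwidth):
    """Log-density of x under a 1-D Gaussian kernel density estimate.

    Assumption for illustration: one Gaussian kernel of width `bandwidth`
    is centred on every nominal sample, so no cluster count has to be chosen.
    """
    n = len(samples)
    # Log of each (unnormalized) kernel contribution.
    logs = [-0.5 * ((x - s) / bandwidth) ** 2 for s in samples]
    # Log-sum-exp keeps the result finite even far from all samples.
    m = max(logs)
    lse = m + math.log(sum(math.exp(v - m) for v in logs))
    # Normalize: average over n kernels, each with its Gaussian constant.
    return lse - math.log(n * bandwidth * math.sqrt(2.0 * math.pi))
```

With nominal samples that form two well-separated groups, this estimate assigns low density to the gap between the groups as well as to values far from both, without the number of groups having to be specified in advance.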

Since this approach was model-based, it was possible to automatically learn a model of nominal behavior from tests that were marked nominal. Particle filtering and machine learning were applied to capture the model of nominal operations, and voting techniques were used in conjunction with particle filtering to detect anomalies in test runs. Experiments on test stand sensor data show successful detection of a known anomaly in the test data, while producing almost no false positives.
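The following minimal sketch shows how particle filtering and voting can be combined for a single sensor. It is not the authors' implementation: the random-walk dynamics, the Gaussian observation model, the surprise threshold, the sliding-window majority vote, and every name and parameter value are assumptions chosen to keep the example short. The only quantity learned from a nominal run here is the typical sample-to-sample change.

```python
import math
import random


def learn_step_sigma(nominal):
    """RMS sample-to-sample change in a nominal run (assumed dynamics scale)."""
    diffs = [b - a for a, b in zip(nominal, nominal[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def surprise_scores(obs, step_sigma, obs_sigma, n_particles=200, seed=0):
    """Bootstrap particle filter with an assumed random-walk model.

    Returns, per sample, the negative log of the average particle weight,
    i.e. the negative log predictive likelihood up to an additive constant.
    """
    rng = random.Random(seed)
    particles = [obs[0]] * n_particles
    scores = [0.0]  # the first sample only initializes the filter
    for y in obs[1:]:
        # Predict: move every particle by the learned nominal step noise.
        particles = [p + rng.gauss(0.0, step_sigma) for p in particles]
        # Weight: unnormalized Gaussian likelihood of the reading.
        weights = [math.exp(-0.5 * ((y - p) / obs_sigma) ** 2) for p in particles]
        total = sum(weights)
        scores.append(-math.log(total / n_particles + 1e-300))
        # Resample only when some particle explains the reading; otherwise
        # keep predicting from the nominal model so the anomaly stays visible.
        if total > 0.0:
            particles = rng.choices(particles, weights=weights, k=n_particles)
    return scores


def vote(flags, window=5, min_votes=3):
    """Confirm sample t when at least min_votes of the last `window` flags are set."""
    return [sum(flags[max(0, t - window + 1): t + 1]) >= min_votes
            for t in range(len(flags))]


# Synthetic example (not RETS data): a slow ramp with a 10-sample offset.
nominal = [0.01 * t for t in range(200)]
test_run = list(nominal)
for t in range(100, 110):
    test_run[t] += 5.0

sigma = learn_step_sigma(nominal)
scores = surprise_scores(test_run, sigma, obs_sigma=0.05)
flags = [s > 20.0 for s in scores]  # hypothetical surprise threshold
confirmed = vote(flags)
```

In this sketch the vote suppresses isolated flagged samples at the cost of a short delay before an anomaly is confirmed; a real system would also need a principled way to set the threshold, for example from the scores observed on nominal runs.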

A novel combination of particle filtering, machine learning, and voting techniques was developed to detect anomalies in sensor network data. Although most of the subsystems are tightly integrated into the system, the following two can also be used standalone for other tasks. First, a novel, efficient (but approximate) correlation clustering method was developed; it is currently used for sensor selection, but it can also be used to visualize sensor correlations as an aid to manual analysis. Second, the system detects sensors that are overactive (large variance) or underactive (low variance) between commands, which effectively gives a high-level map of the effect of commands on sensor groups. This map may be used as an aid to visual/manual analysis.
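The two standalone subsystems can be illustrated with a rough sketch. The article does not specify the clustering algorithm or the variance criteria, so everything below is an assumption: a single-pass greedy grouping by absolute Pearson correlation stands in for the approximate correlation clustering, and fixed variance thresholds stand in for the overactive/underactive test. All function names, thresholds, and sensor names are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def greedy_correlation_groups(series_by_sensor, threshold=0.9):
    """Approximate grouping: each sensor joins the first group whose
    representative (first member) it correlates with strongly enough."""
    groups = []
    for name, series in series_by_sensor.items():
        for group in groups:
            representative = series_by_sensor[group[0]]
            if abs(pearson(series, representative)) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no match: start a new group
    return groups


def activity_between_commands(readings, command_idx, low, high):
    """Label each inter-command segment of one sensor by its variance.

    `command_idx` holds the sample indices at which commands were issued;
    `low` and `high` are hypothetical variance thresholds.
    """
    bounds = [0] + sorted(command_idx) + [len(readings)]
    labels = []
    for a, b in zip(bounds, bounds[1:]):
        segment = readings[a:b]
        if len(segment) < 2:
            continue  # too short to estimate a variance
        var = statistics.pvariance(segment)
        if var > high:
            label = "overactive"
        elif var < low:
            label = "underactive"
        else:
            label = "normal"
        labels.append((a, b, label))
    return labels
```

The greedy pass compares each sensor only against one representative per group rather than against every other sensor, which is what makes it cheap but approximate; the resulting groups depend on the order in which sensors are visited.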

This work was done by Wanda Solano of Stennis Space Center, and Bikramjit Banerjee and Landon Kraemer of The University of Southern Mississippi. For more information, call the SSC Center Chief Technologist at 228-688-1929. SSC-00379