Generic, object-oriented fault models, built according to causal-directed graph theory, have been integrated into an overall software architecture dedicated to monitoring and predicting the health of mission-critical systems. Processing over the generic fault models is triggered by event detection logic that is defined according to the specific functional requirements of the system and its components. Once triggered, the fault models provide an automated way to perform both upstream root cause analysis (RCA) and downstream impact analysis, that is, prediction of downstream effects. The methodology has been applied to integrated system health management (ISHM) implementations at NASA SSC’s Rocket Engine Test Stands (RETS).
Previous SSC ISHM systems have focused on integrating distributed smart sensor data into a centralized object model, and on providing high-level, rule-based reasoning for RETS health monitoring. The SSC ISHM did not include advanced health monitoring techniques such as correlation of events at the system level, automated fault diagnosis, failure prediction, root cause analysis, or predictive analysis. The key functional enhancement targeted for ISHM by this project has been the development of an automated, generic, fault-tree-based RCA module designed to enable these additional capabilities. Choosing a generic, model-based diagnostic methodology permits a more complete assessment of system health and enables advanced techniques for isolating root causes and predicting the onset of failure. The objective was to create a library of reusable fault models and correlation logic for use across multiple programs.
The domain-specific insight necessary to perform the design and implementation tasks at SSC has been acquired through scheduled discussions with RETS test engineers and scientists. Where possible, these enhancements were validated using real-time operational data as well as historical data.
A discrete number of generic failure modes can typically be identified for many of the components within an ISHM system model. Failure modes are distinct mechanisms by which the components can fail. From these failure modes, it is possible to construct a fault model: a directed graph that depicts the causal relationships between the component failure modes and any of the observable (or measurable) downstream effects. The nodes in the fault model represent these measurable effects, and the directed connections between the nodes characterize both their causal relationships and any constraints that apply to them. Within an ISHM system, such generic fault models can be traversed by software upstream to determine the causes of abnormal system behavior, and downstream to predict its impacts. While traversing all applicable fault models upon receipt of detected events, ISHM software can also perform the tests needed to diagnose and isolate the root causes of problems, ruling out other possible explanations that are not substantiated by event data.
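The two traversals described above can be illustrated with a minimal sketch. It is not the ISHM implementation: the class name `FaultModel`, its methods, and the valve and leak failure modes are hypothetical names chosen for illustration. Edge constraints are omitted, and the isolation rule (keep a candidate only if every effect it predicts has been observed) is a simplifying assumption.

```python
from collections import defaultdict, deque


class FaultModel:
    """Hypothetical causal directed graph: an edge cause -> effect."""

    def __init__(self):
        self.effects = defaultdict(set)  # node -> direct downstream effects
        self.causes = defaultdict(set)   # node -> direct upstream causes

    def add_cause(self, cause, effect):
        self.effects[cause].add(effect)
        self.causes[effect].add(cause)

    @staticmethod
    def _reach(start, edges):
        # Breadth-first search; returns every node reachable from start.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def downstream_impacts(self, node):
        """Impact analysis: all effects the given node can lead to."""
        return self._reach(node, self.effects)

    def candidate_root_causes(self, event):
        """Upstream RCA: nodes behind the event that have no cause of their own."""
        nodes = self._reach(event, self.causes) | {event}
        return {n for n in nodes if not self.causes.get(n)}

    def isolate_root_causes(self, event, observed):
        """Assumed rule: drop candidates that predict an unobserved effect."""
        observed = set(observed) | {event}
        return {c for c in self.candidate_root_causes(event)
                if self.downstream_impacts(c) <= observed}


# Illustrative model; the node names are invented for this example.
model = FaultModel()
model.add_cause("valve_stuck_closed", "low_flow")
model.add_cause("low_flow", "low_downstream_pressure")
model.add_cause("pipe_leak", "low_downstream_pressure")
model.add_cause("pipe_leak", "falling_tank_level")

event = "low_downstream_pressure"
print(model.candidate_root_causes(event))
print(model.isolate_root_causes(event, observed={"low_flow"}))
print(model.downstream_impacts("valve_stuck_closed"))
```

In this example a detected low-pressure event has two candidate root causes. Because no falling tank level has been observed, the leak hypothesis is ruled out and only the stuck valve remains. A fuller model would attach constraints to the edges and would distinguish an effect that is absent from one that simply has not been measured.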
This work was done by Jorge F. Figueroa of Stennis Space Center; Mark G. Walker and Ravi Kapadia of General Atomics; and Jonathan Morris of Jacobs Technology. Inquiries concerning rights for the commercial use of this invention should be addressed to:
Mark G. Walker
3550 General Atomics Court
San Diego, CA 92121
SSC-00319