Model-Based Fault Diagnosis: Performing Root Cause and Impact Analyses in Real Time
- Created: Wednesday, 01 February 2012
The methodology and its required interfaces have been implemented to become a commercial product for integrated systems health management.
Generic, object-oriented fault models, built according to causal-directed graph theory, have been integrated into an overall software architecture dedicated to monitoring and predicting the health of mission-critical systems. Processing over the generic fault models is triggered by event detection logic that is defined according to the specific functional requirements of the system and its components. Once triggered, the fault models provide an automated way to perform both upstream root cause analysis (RCA) and downstream impact analysis, that is, prediction of downstream effects. The methodology has been applied to integrated system health management (ISHM) implementations at the Rocket Engine Test Stands (RETS) of NASA’s Stennis Space Center (SSC).
Previous SSC ISHM systems have focused on integrating distributed smart sensor data into a centralized object model, and on providing high-level, rule-based reasoning for RETS health monitoring. The SSC ISHM did not include advanced health monitoring techniques such as correlation of events at the system level, automated fault diagnosis, failure prediction, root cause analysis, or predictive analysis. The key functional enhancement targeted for ISHM by this project has been the development of an automated, generic, fault-tree-based RCA module designed to enable these additional capabilities. Choosing a generic, model-based diagnostic methodology allows a more complete assessment of system health and enables advanced techniques for isolating root causes and predicting the onset of failure. The objective was to create a library of reusable fault models and correlation logic for use across multiple programs.
The domain-specific insight necessary to perform the design and implementation tasks at SSC has been acquired through scheduled discussions with RETS test engineers and scientists. Where possible, these enhancements were validated using real-time operational data as well as historical data.
A discrete number of generic failure modes can typically be identified for many of the components within an ISHM system model. Failure modes are the distinct mechanisms by which a component can fail. From these failure modes, it is possible to construct a fault model: a directed graph that depicts the causal relationships between the component failure modes and any observable (or measurable) downstream effects. The nodes in the fault model represent these measurable effects, and the directed connections between the nodes characterize both their causal relationships and any constraints that apply. Within an ISHM system, software can traverse such generic fault models upstream to determine the causes of abnormal system behavior, and downstream to predict the resulting impacts. While traversing all applicable fault models upon receipt of detected events, ISHM software can also perform the tests necessary to diagnose and isolate the root causes of problems, ruling out other possible explanations that are not substantiated by event data.
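The traversal described above can be illustrated with a minimal Python sketch. It is not the ISHM implementation: the component names, the `CAUSES` graph, and the functions `predict_impact` and `root_causes` are hypothetical, and the rule used to discard a candidate cause (any of its predicted effects has been observed to be absent) is only one simple way to rule out explanations that event data do not substantiate.

```python
from collections import deque

# Hypothetical fault model: an edge A -> B means "A can cause B".
# Node names are illustrative only, not taken from the SSC models.
CAUSES = {
    "valve_stuck_closed": ["low_downstream_flow"],
    "pipe_leak": ["low_downstream_flow", "low_line_pressure"],
    "low_downstream_flow": ["low_tank_fill_rate"],
    "low_line_pressure": [],
    "low_tank_fill_rate": [],
}


def invert(graph):
    """Build the reversed graph (effect -> possible direct causes)."""
    reversed_graph = {node: [] for node in graph}
    for cause, effects in graph.items():
        for effect in effects:
            reversed_graph[effect].append(cause)
    return reversed_graph


EFFECT_TO_CAUSES = invert(CAUSES)


def reachable(graph, start):
    """Breadth-first search; returns every node reachable from start,
    excluding start itself."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor != start and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def predict_impact(event):
    """Downstream traversal: effects that may follow from the event."""
    return reachable(CAUSES, event)


def root_causes(event, observations):
    """Upstream traversal followed by a consistency test.

    observations maps a node name to True (seen) or False (confirmed
    absent); nodes missing from the mapping are treated as unknown.
    """
    upstream = reachable(EFFECT_TO_CAUSES, event)
    # Candidate roots are upstream nodes that have no causes of their own.
    candidates = {n for n in upstream if not EFFECT_TO_CAUSES[n]}
    surviving = set()
    for candidate in candidates:
        predicted = reachable(CAUSES, candidate)
        # Discard a candidate if one of its predicted effects is known absent.
        if not any(observations.get(e) is False for e in predicted):
            surviving.add(candidate)
    return surviving


if __name__ == "__main__":
    event = "low_downstream_flow"
    evidence = {"low_downstream_flow": True, "low_line_pressure": False}
    print("Possible impacts:", predict_impact(event))
    print("Surviving root causes:", root_causes(event, evidence))
```

In this toy model, a detected low-flow event has two candidate causes upstream. The leak would also lower line pressure, so a normal pressure reading rules it out and leaves the stuck valve as the only remaining explanation. Keeping the graph separate from the traversal functions mirrors the idea of reusable, generic fault models: the same functions could be applied to a different graph without change.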
This work was done by Jorge F. Figueroa of Stennis Space Center; Mark G. Walker and Ravi Kapadia of General Atomics; and Jonathan Morris of Jacobs Technology. Inquiries concerning rights for the commercial use of this invention should be addressed to:
Mark G. Walker
3550 General Atomics Court
San Diego, CA 92121