The FAILSAFE project is developing concepts and prototype implementations for software health management in mission-critical, real-time embedded systems. The project unites features of the industry-standard ARINC 653 Avionics Application Software Standard Interface and JPL’s Mission Data System (MDS) technology (see figure). The ARINC 653 standard establishes requirements for the services provided by partitioned, real-time operating systems. The MDS technology provides a state analysis method, canonical architecture, and software framework that facilitates the design and implementation of software-intensive complex systems. The MDS technology has been used to provide the health management function for an ARINC 653 application implementation. In particular, the focus is on showing how this combination enables reasoning about, and recovering from, application software problems.

The FAILSAFE model-based health management concept is depicted in the block diagram.
The demonstration consists of two distinct applications running in the ARINC 653 system: a target application and the FAILSAFE model-based health monitoring application. The target application is a high-level simulation of the Shuttle Abort Control System (ACS), developed specifically for this task. It is a two-partition application, with one partition allocated to the sequencing behavior and one partition allocated to the application I/O. The health monitor application executes in its own partition. The three application partitions communicate via ARINC 653 ports and message queues, which are specified in the system module.xml configuration file. Real-time system data is provided to the health monitor through ARINC 653 sampling ports, which allow the health monitor application to intercept any traffic crossing the ports of interest.
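The sampling-port behavior described above can be illustrated with a small model. The following Python sketch is an assumption-laden toy, not the FAILSAFE code or the ARINC 653 APEX interface: the class `SamplingPort`, the port name `ACS_STATUS`, the message contents, and the 0.1-second refresh period are all hypothetical. It shows only the two properties that matter here: a sampling port holds the most recent message rather than a queue, and reading it does not consume the message, so a monitor can observe the same data as the intended consumer.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SamplingPort:
    """Toy model of a sampling port: holds only the most recent message."""
    name: str
    refresh_period: float              # seconds a sample is considered fresh
    _message: Optional[bytes] = None
    _written_at: float = 0.0

    def write(self, message: bytes, now: float) -> None:
        # A new write overwrites the previous sample; nothing is queued.
        self._message = message
        self._written_at = now

    def read(self, now: float) -> Tuple[Optional[bytes], bool]:
        # Reading does not consume the sample, so several readers
        # (for example an I/O partition and a health monitor) see the same data.
        if self._message is None:
            return None, False
        valid = (now - self._written_at) <= self.refresh_period
        return self._message, valid


# Hypothetical usage: time is passed in explicitly to keep the sketch deterministic.
port = SamplingPort(name="ACS_STATUS", refresh_period=0.1)
port.write(b"SEQ_STEP=3", now=0.00)
io_view = port.read(now=0.05)       # the intended consumer
hm_view = port.read(now=0.05)       # the health monitor sees the same sample
stale_view = port.read(now=0.50)    # same message, but flagged as no longer fresh
```

The validity flag mirrors the idea that a sampling-port read reports whether the message is still within its refresh period. A monitor could treat a stale sample as evidence that the producing partition has stopped updating.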

This task was turned into a goal-based function that, working in concert with the software health manager, aims to work around software and hardware problems in order to maximize abort performance. To make the demonstration compelling for current aerospace initiatives, a number of requirements derived from NASA’s Constellation Program have additionally been imposed on the prototype.

Lastly, the ARINC 653 standard imposes a number of requirements on the system integrator for developing the requisite error handler process. Under ARINC 653, the health monitoring (HM) service is invoked by an application calling the application error service, or by the operating system or hardware detecting a fault. It is these HM and error process details that are implemented with the MDS technology, showing how a state-analytic approach is appropriate for identifying fault determination details, and showing how the framework supports acting upon state estimation and control features in order to achieve safety-related goals.
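The two invocation paths and the estimate-then-act pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the FAILSAFE error handler, the MDS framework, or the ARINC 653 API: the names `ErrorSource`, `ErrorReport`, and `HealthMonitor`, the partition names, the fault threshold, and the three recovery actions are all hypothetical. The sketch only shows errors from an application, the operating system, or the hardware arriving at one handler, which updates a simple health estimate and then selects a response.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ErrorSource(Enum):
    APPLICATION = "application"        # raised through the application error service
    OPERATING_SYSTEM = "os"            # detected by the operating system
    HARDWARE = "hardware"              # detected by the hardware


@dataclass
class ErrorReport:
    source: ErrorSource
    partition: str
    detail: str


@dataclass
class HealthMonitor:
    """Toy error handler: estimate partition health, then pick an action."""
    fault_threshold: int = 2           # hypothetical tuning value
    fault_counts: Dict[str, int] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def report(self, err: ErrorReport) -> str:
        # Estimation step: update the believed health of the partition.
        count = self.fault_counts.get(err.partition, 0) + 1
        self.fault_counts[err.partition] = count
        # Control step: choose a response (action names are illustrative only).
        if err.source is ErrorSource.HARDWARE:
            action = "switch_to_backup"
        elif count >= self.fault_threshold:
            action = "restart_partition"
        else:
            action = "log_and_continue"
        self.log.append(f"{err.partition}: {err.detail} -> {action}")
        return action


# Hypothetical usage covering all three error sources.
hm = HealthMonitor()
a1 = hm.report(ErrorReport(ErrorSource.APPLICATION, "sequencer", "step timeout"))
a2 = hm.report(ErrorReport(ErrorSource.OPERATING_SYSTEM, "sequencer", "deadline missed"))
a3 = hm.report(ErrorReport(ErrorSource.HARDWARE, "io", "sensor fault"))
```

Keeping the estimation step separate from the control step is the point of the sketch: the handler first records what it believes about each partition and only then decides how to respond. A real system would replace the fault counter with a proper state model and the string actions with recovery services.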

This work was done by Gregory A. Horvath, David A. Wagner, and Hui Ying Wen of Caltech and Matthew Barry of Kestrel Technology for NASA’s Jet Propulsion Laboratory.

This software is available for commercial licensing. Please contact Daniel Broderick of the California Institute of Technology. Refer to NPO-46981.