2008

Algorithm-Based Fault Tolerance Integrated With Replication

In a proposed approach to programming and using commercial off-the-shelf (COTS) computing equipment, algorithm-based fault tolerance (ABFT) would be combined with replication to obtain high degrees of fault tolerance without incurring excessive costs. The basic idea is to integrate the two techniques so that the algorithmic portions of a computation are protected by ABFT and the logical portions by replication.
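The brief does not say which ABFT schemes would be used. Checksum-protected matrix multiplication (the Huang-Abraham scheme) is a widely cited example and shows what "protecting an algorithmic portion" can mean. Here A and B are the input matrices, C = AB, and e is a column vector of ones (notation assumed for this illustration). A column checksum row is appended to A and a row checksum column to B, and the product then carries its own checksums:

$$
A_c = \begin{bmatrix} A \\ e^{T}A \end{bmatrix}, \qquad
B_r = \begin{bmatrix} B & Be \end{bmatrix}, \qquad
A_c B_r = \begin{bmatrix} AB & ABe \\ e^{T}AB & e^{T}ABe \end{bmatrix}
        = \begin{bmatrix} C & Ce \\ e^{T}C & e^{T}Ce \end{bmatrix}
$$

A single corrupted element $c_{ij}$ breaks exactly the checksum of row $i$ and the checksum of column $j$, which locates it. It can then be corrected as $c_{ij} = (Ce)_i - \sum_{k \neq j} c_{ik}$. The extra cost is one row and one column of arithmetic instead of a full second computation.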

ABFT is an extremely efficient, inexpensive, high-coverage technique for detecting and mitigating faults in computer systems that perform algorithmic computations. However, it does not protect against errors in the logical operations surrounding the algorithms. Replication is a generally applicable, high-coverage technique for protecting general computations from faults. However, it is inefficient and costly because it requires additional computation time or additional computational circuitry (and, hence, additional mass and power).

The goal of integrating ABFT with replication is to optimize the fault-tolerance aspect of a computing-system design. The less-efficient, more-expensive technique would be applied only to those computations that cannot be protected by the more-efficient, less-expensive technique. Fault tolerance would not have to be addressed explicitly when writing an application program for such a system. Instead, ABFT and replication would be managed by middleware containing hooks.
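The division of labor described above can be sketched in Python, assuming two stand-in techniques that the brief does not itself specify. The algorithmic kernel is protected by Huang-Abraham checksum matrix multiplication. A logical decision step is protected by time redundancy: it runs three times on the same processor and a majority vote picks the result. All names (`abft_matmul`, `replicate`, `faulty_kernel`, `select_mode`) and the floating-point tolerances are hypothetical. The middleware hooks mentioned in the brief are not modeled.

```python
import math
from collections import Counter
from typing import Callable, List, TypeVar

T = TypeVar("T")
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: stands in for an 'algorithmic' kernel."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def _close(x: float, y: float) -> bool:
    # Tolerance values are illustrative assumptions for floating-point checksums.
    return math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-9)


def abft_matmul(a: Matrix, b: Matrix,
                kernel: Callable[[Matrix, Matrix], Matrix] = matmul) -> Matrix:
    """Checksum-protected A @ B (Huang-Abraham style); corrects one bad data element."""
    n, p = len(a), len(b[0])
    a_c = [row[:] for row in a] + [[sum(col) for col in zip(*a)]]  # append column-sum row
    b_r = [row[:] + [sum(row)] for row in b]                       # append row-sum column
    c_f = kernel(a_c, b_r)                                          # (n+1) x (p+1) product
    bad_rows = [i for i in range(n) if not _close(sum(c_f[i][:p]), c_f[i][p])]
    bad_cols = [j for j in range(p)
                if not _close(sum(c_f[i][j] for i in range(n)), c_f[n][j])]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # Rebuild the faulty element from its row checksum and the other row entries.
        c_f[i][j] = c_f[i][p] - sum(c_f[i][k] for k in range(p) if k != j)
    elif len(bad_rows) + len(bad_cols) > 1:
        raise RuntimeError("ABFT: uncorrectable error pattern")
    # A lone mismatch in only one direction means a checksum element, not data, was hit.
    return [row[:p] for row in c_f[:n]]


def replicate(fn: Callable[..., T], *args, copies: int = 3) -> T:
    """Time-redundant execution with majority voting, for logical (non-ABFT) code."""
    results = [fn(*args) for _ in range(copies)]
    value, votes = Counter(results).most_common(1)[0]  # results must be hashable
    if votes <= copies // 2:
        raise RuntimeError("Replication: no majority among replicas")
    return value


def faulty_kernel(a: Matrix, b: Matrix) -> Matrix:
    c = matmul(a, b)
    c[0][1] += 5.0  # simulated upset in one data element of the product
    return c


def select_mode(reading: float, limit: float) -> str:
    # Hypothetical logical step around the algorithm: a threshold decision.
    return "SAFE" if reading > limit else "NOMINAL"


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = abft_matmul(a, b, kernel=faulty_kernel)
    print(c)                                      # [[19.0, 22.0], [43.0, 50.0]]
    print(replicate(select_mode, c[1][1], 45.0))  # SAFE
```

The costs differ as the brief describes. The checksum scheme adds one extra row and column to the matrix work, while `replicate` triples the run time of whatever it wraps. That is why replication would be reserved for the logical steps that ABFT cannot cover. Hardware replication (extra circuitry running in parallel) would be the spatial alternative to this re-execution approach.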

This work was done by Raphael Some and David Rennels of Caltech for NASA’s Jet Propulsion Laboratory.

The software used in this innovation is available for commercial licensing. Please contact Karina Edmonds of the California Institute of Technology at (626) 395-2322. Refer to NPO-43842.
