Algorithm-Based Fault Tolerance Integrated With Replication

A proposed approach to programming and using commercial off-the-shelf computing equipment would combine algorithm-based fault tolerance (ABFT) with replication to obtain high degrees of fault tolerance without incurring excessive costs. The basic idea is to integrate the two techniques so that the algorithmic portions of a computation are protected by ABFT and the logical portions by replication.
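To make the ABFT half of this idea concrete, the following minimal sketch shows one well-known ABFT scheme: checksum-protected matrix multiplication. The brief does not say which ABFT scheme the authors used, so this choice is an assumption made for illustration, as are the function names and the `fault` parameter (a hypothetical fault injector). Integer matrices are used so that the checksum comparisons are exact; floating-point data would need a tolerance.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def with_column_checksum(a):
    """Append a row that holds the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]

def with_row_checksum(b):
    """Append a column that holds the sum of each row of b."""
    return [row + [sum(row)] for row in b]

def abft_matmul(a, b, fault=None):
    """Multiply a by b under checksum protection.

    fault is a hypothetical (row, col, delta) triple used only to
    inject an error into the checksummed product for demonstration.
    Returns (product, status).
    """
    n, m = len(a), len(b[0])
    # The product of the checksummed inputs is (n+1) x (m+1) and
    # carries its own row and column checksums.
    c = matmul(with_column_checksum(a), with_row_checksum(b))
    if fault is not None:
        i, j, delta = fault
        c[i][j] += delta
    bad_rows = [i for i in range(n) if sum(c[i][:m]) != c[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c[i][j] for i in range(n)) != c[n][j]]
    if not bad_rows and not bad_cols:
        status = "ok"
    elif len(bad_rows) == 1 and len(bad_cols) == 1:
        # A single bad data element sits where the failing row and
        # column intersect; rebuild it from the row checksum.
        i, j = bad_rows[0], bad_cols[0]
        c[i][j] = c[i][m] - (sum(c[i][:m]) - c[i][j])
        status = "corrected"
    elif len(bad_rows) + len(bad_cols) == 1:
        # Only a checksum entry was hit; the data block is intact.
        status = "checksum_fault"
    else:
        raise RuntimeError("multiple errors detected; recompute")
    return [row[:m] for row in c[:n]], status

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
clean, clean_status = abft_matmul(a, b)
fixed, fixed_status = abft_matmul(a, b, fault=(0, 1, 100))
```

The extra work is one checksum row and one checksum column, which is far less than recomputing the whole product. That difference is the reason ABFT is the cheaper of the two techniques for algorithmic code.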

ABFT is an extremely efficient, inexpensive, high-coverage technique for detecting and mitigating faults in computer systems used for algorithmic computations, but it does not protect against errors in the logical operations surrounding the algorithms. Replication is a generally applicable, high-coverage technique for protecting general computations from faults, but it is inefficient and costly because it requires additional computation time or additional computational circuitry (and, hence, additional mass and power). The goal of the proposed integration of ABFT with replication is to optimize the fault-tolerance aspect of the design of a computing system by using the less-efficient, more-expensive technique to protect only those computations that cannot be protected by the more-efficient, less-expensive technique. It would not be necessary to address the fault-tolerance issue explicitly in writing an application program to be executed in such a system. Instead, ABFT and replication would be managed by middleware containing hooks.
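As a rough illustration of the replication half, the sketch below wraps a logical (non-algorithmic) step in a decorator that runs it several times and takes a majority vote. This uses the "additional computation time" form of replication. The decorator stands in for a middleware hook; the names `replicated`, `choose_kernel`, and `calls` are hypothetical and are not taken from the software described here. The simulated fault on the second call is also an assumption made for demonstration.

```python
from collections import Counter
from functools import wraps

def replicated(copies=3):
    """Hypothetical middleware hook: run a step `copies` times and vote."""
    def decorator(func):
        @wraps(func)
        def voter(*args, **kwargs):
            results = [func(*args, **kwargs) for _ in range(copies)]
            value, votes = Counter(results).most_common(1)[0]
            if votes * 2 <= copies:
                raise RuntimeError("replicas disagree: no majority")
            return value
        return voter
    return decorator

calls = {"n": 0}  # demo-only state used to inject one transient fault

@replicated(copies=3)
def choose_kernel(rows, cols):
    """Logical step: branch on problem size (results must be hashable)."""
    calls["n"] += 1
    answer = "blocked" if rows * cols > 10_000 else "direct"
    if calls["n"] == 2:
        answer = "corrupted"  # simulated fault in the second replica
    return answer

decision = choose_kernel(200, 200)  # majority vote masks the bad replica
```

The application function itself contains no fault-tolerance logic; the protection is attached from outside, which is in the spirit of managing it in middleware rather than in the application program.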

This work was done by Raphael Some and David Rennels of Caltech for NASA’s Jet Propulsion Laboratory.

The software used in this innovation is available for commercial licensing. Please contact Karina Edmonds of the California Institute of Technology at (626) 395-2322. Refer to NPO-43842.
