The Formal Linear Algebra Recovery Environment is a computer program for high-performance, fault-tolerant matrix multiplication. The program is based on an extension of the prior theory and practice of fault-tolerant matrix-matrix multiplication of the form C = AB. This extension provides low-overhead methods for detecting errors not only in C but also in A and/or B. These methods detect all errors as long as, in a given case, only one entry in A, B, or C is corrupted. The program also provides a low-overhead roll-back approach for correcting errors once they are detected. Computational experiments have demonstrated that the methods implemented in this program work well in practice while imposing an acceptably low level of overhead relative to high-performance matrix-multiplication methods that do not afford fault tolerance.
This program was written by Daniel Katz, Edwin Tisdale, Enrique Quintana-Ortí, John Gunnels, and Robert van de Geijn of Caltech for NASA's Jet Propulsion Laboratory.
This software is available for commercial licensing. Please contact Don Hart of the California Institute of Technology at (818) 393-3425. Refer to NPO-30395.
This Brief includes a Technical Support Package (TSP).

Software for Fault-Tolerant Matrix Multiplication (reference NPO-30395) is currently available for download from the TSP library.
Overview
The document discusses the "Formal Linear Algebra Recovery Environment," a software program developed collaboratively by NASA's Jet Propulsion Laboratory (JPL) and the University of Texas. This program focuses on high-performance, fault-tolerant matrix multiplication, specifically the operation C = AB, where A and B are input matrices, and C is the resulting matrix.
The software extends earlier work on algorithmic fault tolerance. It provides low-overhead methods for detecting errors not only in the output matrix C but also in the input matrices A and B. The underlying theory guarantees that every error is detected provided that, in a given case, only one entry in A, B, or C is corrupted. This capability matters most in error-prone environments such as outer space, where environmental radiation can cause Single Event Upsets (SEUs).
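To make the idea of low-overhead detection concrete, here is a minimal Python sketch of a classic checksum test for C = AB. It is an illustration under stated assumptions, not the program's actual method: the function names (`matmul`, `checksums_ok`), the all-ones checksum vectors, and the tolerance `tol` are all hypothetical choices. The sketch only checks that C is consistent with the A and B it is given; the program's extension for detecting errors in A and B is not described here and is not reproduced.

```python
def matmul(A, B):
    """Plain product C = AB for matrices stored as lists of rows."""
    k, m = len(B), len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(k)) for j in range(m)]
            for row in A]


def _close(x, y, tol):
    # Relative comparison, so floating-point rounding is not flagged as an error.
    return abs(x - y) <= tol * max(1.0, abs(x), abs(y))


def checksums_ok(A, B, C, tol=1e-9):
    """Return True if C passes a row-sum and a column-sum checksum test."""
    n, k, m = len(A), len(B), len(B[0])

    # Right-sided test: the row sums of C should equal A times the row sums of B.
    b_sums = [sum(row) for row in B]
    expected_rows = [sum(A[i][p] * b_sums[p] for p in range(k)) for i in range(n)]
    actual_rows = [sum(row) for row in C]

    # Left-sided test: the column sums of C should equal the column sums of A times B.
    a_sums = [sum(A[i][p] for i in range(n)) for p in range(k)]
    expected_cols = [sum(a_sums[p] * B[p][j] for p in range(k)) for j in range(m)]
    actual_cols = [sum(C[i][j] for i in range(n)) for j in range(m)]

    return (all(_close(x, y, tol) for x, y in zip(actual_rows, expected_rows))
            and all(_close(x, y, tol) for x, y in zip(actual_cols, expected_cols)))


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = matmul(A, B)
    print(checksums_ok(A, B, C))   # True: C is consistent with A and B
    C[0][1] += 1                   # simulate a single corrupted entry in C
    print(checksums_ok(A, B, C))   # False: the corruption changes a row sum
```

For n-by-n matrices the two tests cost on the order of n^2 operations, against n^3 for the product itself, which is why checks of this kind add little overhead for large matrices.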
In addition to error detection, the software incorporates a low-overhead roll-back approach for correcting detected errors, enhancing its reliability. Empirical results from computational experiments demonstrate that the implemented methods perform effectively in practice, imposing an acceptable level of overhead compared to traditional high-performance matrix multiplication methods that do not include fault tolerance.
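One way to picture a roll-back scheme is sketched below in Python. This is an assumed design for illustration only: the panel-by-panel blocking, the retry limit, and every name (`multiply_with_rollback`, `block`, `max_retries`, `fault`) are hypothetical and are not taken from the program. The product is computed a few rows at a time; each panel is verified with a row-sum checksum, and a panel that fails is discarded and recomputed, so only a small part of the work is repeated.

```python
def multiply_with_rollback(A, B, block=2, max_retries=3, tol=1e-9, fault=None):
    """Compute C = AB panel by panel, redoing any panel whose checksum fails.

    `fault` is a hypothetical hook, called as fault(panel, start, attempt),
    that exists only so a demonstration can inject an error.
    """
    k, m = len(B), len(B[0])
    b_sums = [sum(row) for row in B]      # row sums of B, computed once
    C = []
    for start in range(0, len(A), block):
        rows = A[start:start + block]
        # Checksum each verified panel must match: A_panel times row sums of B.
        expected = [sum(r[p] * b_sums[p] for p in range(k)) for r in rows]
        for attempt in range(max_retries):
            panel = [[sum(r[p] * B[p][j] for p in range(k)) for j in range(m)]
                     for r in rows]
            if fault is not None:
                fault(panel, start, attempt)
            actual = [sum(row) for row in panel]
            if all(abs(x - y) <= tol * max(1.0, abs(x), abs(y))
                   for x, y in zip(actual, expected)):
                C.extend(panel)           # panel verified; keep it
                break
            # Checksum mismatch: drop this panel and recompute it (roll back).
        else:
            raise RuntimeError("panel starting at row %d failed verification" % start)
    return C
```

A smaller `block` means less work is lost per detected error but more checksum comparisons overall; the right balance would depend on the error rate and the hardware, which this sketch does not model.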
The software is characterized as a portable, standard, high-performance numerical library, making it suitable for various applications. It was developed by a team of researchers, including John A. Gunnels, Daniel S. Katz, Enrique S. Quintana-Ortí, Edwin R. Tisdale, and Robert A. van de Geijn, and is available for commercial licensing.
The document also includes a notice regarding the sponsorship of the work by NASA and clarifies that references to specific commercial products do not imply endorsement by the U.S. Government or JPL. The software's development is part of ongoing efforts to enhance computational reliability in critical applications, particularly in aerospace and other high-stakes environments.
Overall, the "Formal Linear Algebra Recovery Environment" represents a significant advancement in fault-tolerant computing, providing robust solutions for error detection and correction in matrix operations, which are fundamental to many scientific and engineering applications.

