Error-detecting counters have been proposed as parts of fault-tolerant finite state machines that could be implemented in field-programmable gate arrays (FPGAs) and application-specific integrated circuits that perform sequential logic functions. The use of error-detecting counters would complement the fault-tolerant coding schemes described in "Fault-Tolerant Coding for State Machines" (NPO-41050), in this issue on page 55. Counters are often used in state machines when it is necessary to represent large numbers of states, to count clock cycles between certain states, or both. For a state machine to be reliable, its counters must be as free of faults as its other parts.
A primary source of an erroneous count is a single-event upset (SEU), a radiation-induced change in the bit stored in one flip-flop of a counter circuit. A counter according to the proposal would be able to detect SEUs, provided that (1) only one SEU occurs during a given count and (2) one SEU induces only one change in a count. The one-SEU condition can be satisfied with high probability if the level of radiation is low enough, and the probability that one SEU would induce multiple count changes is very small.
In one conventional approach, counter circuitry is duplicated to enable detection of an error. That approach requires about twice as much logic circuitry as a simple counter does. The approach now proposed requires less circuitry than duplication does.
The basic idea of the proposal is to monitor the count for monotonicity; that is, to determine whether the main counter has counted strictly upward in binary arithmetic, one least-significant bit at a time. Instead of duplicate or triplicate counters, there would be (1) a main counter, (2) a small auxiliary counter, and (3) circuitry that would examine bits in the main and auxiliary counts to determine whether monotonicity has been violated. The verification of monotonicity could involve encoding of counts by several schemes (including, possibly, the coding schemes of the cited prior NASA Tech Briefs article) that differ in complexity and in the degree of fault tolerance they afford. What all these schemes have in common is that (1) detection of an inconsistency between the main and auxiliary counts would be deemed to signify a violation of monotonicity in one of the counts and (2) assuming that at most one SEU occurred, the inconsistency would remain until completion of the counts.
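As an illustration only, the consistency check can be sketched in software with a residue scheme. This is an assumption made for the example, not a scheme named in the proposal: the auxiliary counter is taken to be a 2-bit counter that counts modulo 3 in step with the main counter. A single bit flip changes the main count by plus or minus a power of two, which is never a multiple of 3, so the two counts disagree from the upset onward, provided the main counter does not wrap around during the count. The names run_counter and consistent and their parameters are hypothetical.

```python
def run_counter(steps, upset_at=None, upset_bit=0, upset_aux=False):
    """Simulate a main counter and an assumed mod-3 auxiliary counter.

    If upset_at is a cycle number, one bit (upset_bit) is flipped in the
    main count, or in the auxiliary count when upset_aux is True, just
    before that cycle's increment. This models a single SEU.
    Returns the final (main, aux) pair.
    """
    main, aux = 0, 0
    for cycle in range(steps):
        if cycle == upset_at:
            if upset_aux:
                aux ^= 1 << upset_bit    # SEU in the auxiliary counter
            else:
                main ^= 1 << upset_bit   # SEU in the main counter
        main += 1                        # main counter: plain binary increment
        aux = (aux + 1) % 3              # auxiliary counter: counts modulo 3
    return main, aux


def consistent(main, aux):
    """True when the main count agrees with the auxiliary residue."""
    return main % 3 == aux


if __name__ == "__main__":
    print(consistent(*run_counter(100)))                            # fault-free run
    print(consistent(*run_counter(100, upset_at=10, upset_bit=4)))  # SEU in main
    print(consistent(*run_counter(100, upset_at=10, upset_bit=1,
                                  upset_aux=True)))                 # SEU in auxiliary
```

In this model the fault-free run stays consistent, while a single upset in either counter leaves a mismatch that is still present at the end of the count, which is the persistence property the schemes are said to share.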
Assuming that no more than one SEU could occur during a count, it would not be necessary to verify monotonicity on each clock cycle; instead, it would suffice to verify monotonicity only when the count reached critical values (these could be values at which outputs are affected). If the count has not reached a critical value legitimately (if it has skipped or repeated values), then the circuitry for detecting violation of monotonicity would flag the value reached as being in error. Then other parts of the state machine would take the corrective action appropriate for the error.
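Checking only at a critical value can be sketched in the same way. The example below again assumes a mod-3 auxiliary counter, which is an illustrative choice and not a scheme specified in the proposal, and the critical value 40, the name run_until_critical, and its parameters are hypothetical. The consistency check is evaluated only at the moment the main count equals the critical value; a count that arrives there after skipping or repeating values is flagged.

```python
CRITICAL = 40  # hypothetical count at which an output would change


def run_until_critical(upset_at=None, upset_bit=0, max_cycles=1000):
    """Count until the main counter equals CRITICAL, then check once.

    A single SEU may be injected into the main counter by giving the
    cycle (upset_at) and bit position (upset_bit) to flip.
    Returns (cycles_elapsed, error_flag), or None if the critical value
    is never reached within max_cycles (a case this sketch does not handle).
    """
    main, aux = 0, 0
    for cycle in range(max_cycles):
        if cycle == upset_at:
            main ^= 1 << upset_bit       # single-event upset in the main counter
        main += 1
        aux = (aux + 1) % 3              # assumed auxiliary counter, modulo 3
        if main == CRITICAL:
            # Monotonicity is verified only here, not on every clock cycle.
            error = (main % 3) != aux
            return cycle + 1, error
    return None


if __name__ == "__main__":
    print(run_until_critical())                          # legitimate arrival
    print(run_until_critical(upset_at=10, upset_bit=2))  # values skipped
    print(run_until_critical(upset_at=10, upset_bit=3))  # values repeated
```

In the fault-free case the critical value is reached after exactly 40 cycles with no error flagged. With an upset at cycle 10, the count reaches 40 either early (bit 2 flips from 0 to 1, skipping values) or late (bit 3 flips from 1 to 0, repeating values), and the single check at the critical value flags both cases.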
This work was done by Gary Burke of Caltech for NASA's Jet Propulsion Laboratory.