The complexity of military and aerospace systems is growing, with more components, interfaces, power, bandwidth, processing, features, and data, and these systems are being networked to form even more complex "systems of systems." Modern network-centric systems can contain hundreds or even thousands of electronic modules.

The reliability, or Mean Time Between Failures (MTBF), of an electronic system is inversely related to the number of components in the system. Each component has a statistical failure rate, and the sum of all component failure rates determines the system failure rate. Large, complex systems will typically experience some, and possibly many, component failures over time. System engineers should embrace a new mindset in which frequent component failures in large networked systems are the "normal" operating condition rather than the "fault" condition.
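As a rough illustration of this series-system arithmetic, the sketch below (Python) sums per-component failure rates to estimate a system failure rate and its MTBF. The module types, counts, and failure rates are hypothetical and chosen only to show the calculation.

```python
# Series-system reliability estimate: the system failure rate is the sum of the
# component failure rates, and the system MTBF is its reciprocal.
# All rates below are hypothetical, in failures per 10^6 operating hours.

component_failure_rates = {
    "processor_module": 12.0,   # assumed rate per module
    "switch_module": 8.0,
    "power_supply": 25.0,
    "fan_tray": 30.0,
    "backplane": 2.0,
}
module_counts = {
    "processor_module": 200,
    "switch_module": 4,
    "power_supply": 8,
    "fan_tray": 6,
    "backplane": 1,
}

# Series model: any single component failure counts as a system failure.
system_rate = sum(rate * module_counts[name]
                  for name, rate in component_failure_rates.items())

mtbf_hours = 1e6 / system_rate  # convert back from per-million-hours
print(f"System failure rate: {system_rate:.1f} failures per 10^6 hours")
print(f"Estimated system MTBF: {mtbf_hours:.0f} hours")
```

The point of the exercise is that even when each module is individually quite reliable, multiplying its failure rate by hundreds of installed modules drives the system MTBF down sharply.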

Figure 1. Simple example of a dual-star xTCA architecture platform.
Furthermore, a large system usually dissipates more heat and power, and module reliability is also inversely related to operating temperature. Simply put, modules operating at higher temperatures tend to fail more often. A rule of thumb is that the failure rate of an electronic module approximately doubles for every 10 to 20°C rise in temperature. Therefore, cooling is critical to system reliability. Other factors that contribute to system reliability include ESD protection, redundancy, fault localization, and fault isolation.
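That rule of thumb can be written as a simple scaling law. The sketch below assumes a doubling interval in the 10 to 20°C range and uses a hypothetical reference failure rate and temperatures; it is meant only to show how quickly the rate grows with temperature.

```python
def scaled_failure_rate(base_rate, base_temp_c, temp_c, doubling_interval_c=15.0):
    """Scale a failure rate using the 'doubles every 10-20 C' rule of thumb.

    base_rate: failure rate observed at base_temp_c (any consistent unit)
    doubling_interval_c: temperature rise that doubles the rate (assumed 15 C here)
    """
    return base_rate * 2 ** ((temp_c - base_temp_c) / doubling_interval_c)

# Hypothetical module: 10 failures per 10^6 hours at 40 C.
for temp in (40, 55, 70, 85):
    rate = scaled_failure_rate(10.0, 40.0, temp)
    print(f"{temp} C: {rate:.1f} failures per 10^6 hours")
```

With a 15°C doubling interval, a module running at 85°C instead of 40°C fails roughly eight times as often, which is why cooling has such leverage on system reliability.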

High Availability

Availability is typically defined as the ratio MTBF/(MTBF + MTTR), where MTTR is the Mean Time To Repair. In simpler terms, availability is the percentage of time that a system is available for normal operation, or conversely the percentage of time that the system is not broken or under repair. For example, a system that is down for one day of the year is 99.7% available, assuming 24x7 operation. A system that is down for only one hour of the year is about 99.99% available.
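To make the arithmetic concrete, the short sketch below computes availability both from MTBF and MTTR and from annual downtime; the MTBF and MTTR figures in the last line are illustrative, not drawn from any particular system.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, assuming 24x7 operation

def availability(mtbf_hours, mttr_hours):
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def availability_from_downtime(downtime_hours_per_year):
    """Availability computed from total annual downtime."""
    return 1.0 - downtime_hours_per_year / HOURS_PER_YEAR

# Examples from the text: one day and one hour of downtime per year.
print(f"1 day/year down:  {availability_from_downtime(24):.2%}")   # ~99.73%
print(f"1 hour/year down: {availability_from_downtime(1):.2%}")    # ~99.99%

# Hypothetical MTBF/MTTR pair for comparison.
print(f"MTBF 50,000 h, MTTR 4 h: {availability(50_000, 4):.4%}")
```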