Ethernet is currently the incumbent backplane technology across a wide range of storage, wireless, wireline, military, industrial, and other embedded applications as developers move away from proprietary implementations in an effort to reduce development time and cost while increasing performance and functionality. However, as data rates increase, it has become apparent that many high-performance applications exceed the limits of this traditional protocol. Designing an efficient embedded backplane interconnect with excellent performance requires addressing a number of key design challenges, including header efficiency, protocol processing efficiency, effective bandwidth, and quality of service while strictly managing cost. To meet these challenges, many developers are turning to RapidIO® technology as an alternative to Ethernet.

Many of the differences between Ethernet and RapidIO technologies stem from their initial design constraints. Ethernet was designed to connect a large number of endpoints with a flexible and extensible architecture, leading to the choice of a simple header and support for a single transaction type. As Ethernet was originally intended to connect computer workstations, hardware is only required to identify packet boundaries, necessitating a relatively large software stack to manage protocol processing. While this serves well in LAN and WAN applications because of the presence of powerful processors, this hardware/software trade-off imposes a formidable performance bottleneck in high-speed embedded applications.

Figure 1. Many embedded Ethernet applications use TCP/IP to handle packet loss because of off-the-shelf software support. While higher-layer protocols simplify application development, they also add substantial overhead — 40 bytes in the case of TCP/IP — reducing overall bandwidth efficiency.

RapidIO technology was originally conceived as a next-generation front-side bus for high-speed embedded processors. The value of a front-side bus that could also function as a system-level interconnect was recognized early in the specification's development. As a consequence, RapidIO technology was designed with a focus on embedded in-the-box and chassis control plane applications, emphasizing reliability with minimal latency, limited software impact, protocol extensibility, and simplified switches while achieving effective data rates from 667 Mbps to 30 Gbps. Protocol processing takes place in hardware and supports read/write operations, messaging, data streaming, QoS, data plane extensions, and protocol encapsulation, to name a few of its capabilities.

Header Efficiency

Ethernet's extreme flexibility is one of the main sources of its inefficiency. Use of a simple, generalized header enables Ethernet to add higher-level services as new protocol layers, but even basic services require additional header fields. Ethernet's firm requirement for backwards compatibility also introduces inefficiencies such as maintaining the preamble and inter-frame gap (IFG) fields required for the original half-duplex shared coax PHY, but for which more recently defined PHYs have less need. Because there is no opportunity to optimize the overall header, duplication across multiple headers increases parsing complexity and latency. While processing of Ethernet headers can be optimized by building custom protocol stacks on top of Ethernet Layer 2, the cost of supporting custom stacks across multiple vendors or even multiple generations of the same vendor's hardware can become prohibitive. As a consequence, many systems take a performance hit to utilize UDP or TCP/IP rather than communicate directly through the more efficient lower layers.

Figure 2. While Ethernet is a fairly flexible standard supporting many optional layer 3+ protocols, lack of a single, uniform implementation results in a diversity of actual implementations and, as a consequence, increased Ethernet stack complexity. Unique partitioning of processing between hardware and software ends up tying developers to vendor-specific implementations.

RapidIO technology was designed with optimization of the header in mind. Embedded backplanes need to support significantly fewer endpoints than LANs, so a smaller address field can be utilized (i.e., only one or two bytes compared to Ethernet's MAC address of six bytes plus four bytes when IP is used). The specification provides header support for common services such as read and write transaction types that require additional header layers in Ethernet. Redundant fields are removed, duplicate addressing schemes avoided, and fields compressed where possible.

Protocol Efficiency

Ethernet offers only best-effort service unless higher-layer protocols are employed to handle packet loss. Many embedded Ethernet applications use IP with TCP to handle packet loss because software support is widely available and understood. TCP/IP, while simplifying application development, adds significantly to the overall Ethernet header, introducing 40 bytes of overhead (see Figure 1). UDP can also be used when this overhead is too high but then reliable delivery, if required, must be implemented in a proprietary fashion.
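To make the cost of this overhead concrete, the following minimal sketch estimates how much of each frame's wire time carries application data. It assumes standard Ethernet framing (8 bytes of preamble and start-of-frame delimiter, a 14-byte MAC header, a 4-byte FCS, a 12-byte minimum inter-frame gap, and a 46-byte minimum payload) and option-free IPv4 and TCP headers. The function name `wire_efficiency` is hypothetical, and VLAN tags, acknowledgments, and retransmissions are ignored.

```python
# Fixed per-frame costs on the wire, in bytes (standard Ethernet framing).
PREAMBLE_SFD = 8      # 7-byte preamble + 1-byte start-of-frame delimiter
MAC_HEADER = 14       # destination MAC (6) + source MAC (6) + EtherType (2)
FCS = 4               # frame check sequence
IFG = 12              # minimum inter-frame gap
TCP_IP_HEADERS = 40   # 20-byte IPv4 header + 20-byte TCP header, no options
MIN_L2_PAYLOAD = 46   # shorter payloads are padded up to this size
MAX_L2_PAYLOAD = 1500 # standard (non-jumbo) maximum payload


def wire_efficiency(app_bytes: int, use_tcp_ip: bool = True) -> float:
    """Return the fraction of one frame's wire time spent on application data."""
    if app_bytes <= 0:
        raise ValueError("app_bytes must be positive")
    l2_payload = app_bytes + (TCP_IP_HEADERS if use_tcp_ip else 0)
    if l2_payload > MAX_L2_PAYLOAD:
        raise ValueError("payload does not fit in a single standard frame")
    # Short frames are padded, so padding also counts as overhead.
    l2_payload = max(l2_payload, MIN_L2_PAYLOAD)
    on_wire = PREAMBLE_SFD + MAC_HEADER + l2_payload + FCS + IFG
    return app_bytes / on_wire


if __name__ == "__main__":
    for size in (16, 64, 256, 1024, 1460):
        print(f"{size:5d} B  raw L2: {wire_efficiency(size, False):.1%}"
              f"  TCP/IP: {wire_efficiency(size, True):.1%}")
```

Running the loop shows the pattern described here: the smaller the transfer, the larger the share of the link consumed by headers, and the TCP/IP penalty is most visible for the short, control-oriented payloads typical of backplane traffic.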

Overall, Ethernet is a fairly flexible standard supporting many optional layer 3+ protocols developed by different standards organizations for a broad range of applications. However, because no single, unified specification is uniformly implemented, this has resulted in a diversity of actual implementations and, as a consequence, increased Ethernet stack complexity. Even necessary technology such as TCP/IP offload engines (TOE) is plagued with many proprietary implementations, each with its own unique partitioning of processing between hardware and software and a vendor-specific Ethernet stack that ties down developers (see Figure 2). There isn't even a standard SERDES PHY yet for 1 or 10 Gigabit Ethernet backplanes.

The RapidIO specification achieves many of its advantages because it offers a single, uniform protocol with consistent layering managed by a single standards organization. RapidIO technology also guarantees delivery by providing end-to-end error checking, retrying link errors, not allowing switches to drop packets, and supporting virtual channels. Since RapidIO technology directly implements the protocol in hardware, headers can be processed in a straightforward and less processor-intensive manner than equivalent Ethernet implementations that utilize partial hardware offload and custom stacks.

Effective Bandwidth

Ethernet supports a payload size from 46 to 1500 bytes (carried in frames of 64 to 1518 bytes, or over 9000 bytes with jumbo frames), and its efficiency is best with a maximum payload, although this comes at the cost of increased latency jitter. RapidIO technology transports 1 to 256 bytes, balancing large payload jitter against small payload inefficiency.

For control plane applications that cannot tolerate packet loss, an Ethernet fabric must be significantly over-provisioned to avoid packet loss and limit associated latency and jitter. Given 25-35% usage for many applications, this translates to a sustainable effective throughput for layer 2 traffic of ≈250 Mbps for 1 GE and 2.5 Gbps for 10 GE, depending on average packet size. Note that performance is defined not by PHY symbol rate, but rather by the effective rate at which a protocol reliably transports data. Additionally, even with over-provisioning, end-to-end latency can still run in milliseconds since traffic must traverse multiple software stacks.
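The over-provisioning arithmetic can be sketched as follows. The 25-35% utilization range comes from the text; the helper name `sustainable_throughput_mbps` is hypothetical, and the calculation deliberately ignores the dependence on average packet size noted above.

```python
def sustainable_throughput_mbps(line_rate_mbps: float, utilization: float) -> float:
    """Throughput left when a link is only loaded to the given utilization."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return line_rate_mbps * utilization


if __name__ == "__main__":
    for name, rate_mbps in (("1 GE", 1_000.0), ("10 GE", 10_000.0)):
        low = sustainable_throughput_mbps(rate_mbps, 0.25)
        high = sustainable_throughput_mbps(rate_mbps, 0.35)
        print(f"{name}: roughly {low:.0f} to {high:.0f} Mbps sustainable")
```

The lower bound of the range gives the ≈250 Mbps and 2.5 Gbps figures quoted for 1 GE and 10 GE.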

By implementing protocol processing in hardware, RapidIO technology greatly reduces effective latency in comparison to Ethernet and can deliver substantially higher fabric utilization in complex topologies. For control plane applications, link-level error correction minimizes latency jitter caused by soft errors, potentially reducing end-to-end latency below 500 ns.

Throughput is also affected by overhead for operations such as reading, writing, and messaging. Ethernet RDMA provides read and write operations, but as a layer 4 protocol, its high overhead is not well-suited for small control-oriented load/store operations. TCP/IP services resemble RapidIO messaging, but where RapidIO messaging supports convenient 4 KB messages and is often fully implemented in hardware, TCP/IP supports 64 KB messages that are much more dependent upon software for processing. Additionally, the RapidIO standard defines a protocol for keeping caches coherent across the interconnect, something that is impractical to implement in Ethernet due to low header efficiency, high latency, and inconsistent levels of hardware support.
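As a rough illustration of how the message sizes relate to packet payloads, the sketch below counts the packets needed to carry a message. It assumes a message is simply split into maximum-size payloads (256 bytes for RapidIO, per the payload range given earlier) and ignores per-packet header costs; the function name `packets_needed` is hypothetical.

```python
def packets_needed(message_bytes: int, max_payload_bytes: int) -> int:
    """Number of packets required to carry a message of the given size."""
    if message_bytes <= 0 or max_payload_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: a partial final payload still needs a packet.
    return (message_bytes + max_payload_bytes - 1) // max_payload_bytes


if __name__ == "__main__":
    # A full 4 KB message split into 256-byte payloads.
    print(packets_needed(4096, 256))
```

Under these assumptions a full 4 KB message fits in 16 packets, a small, bounded number that hardware can track without software involvement.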

Extensive packet handling by Ethernet switches also increases overall complexity and processing load. When IP packets are routed, numerous fields must be updated and the FCS recalculated. RapidIO switches typically only update the AckID field, which is not covered by the CRC and so does not force a recalculation.
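The benefit of excluding a link-local field from the checksum can be shown with a minimal sketch. Everything about the packet layout here is an assumption for illustration and is not taken from the article: the AckID is modeled as the top six bits of the first byte, and the checksum is a CRC-16 with the CCITT polynomial seeded with 0xFFFF, computed with Python's `binascii.crc_hqx`. The helper names `packet_crc` and `set_ackid` are hypothetical.

```python
import binascii

# Assumption: the link-local bits occupy the top 6 bits of the first byte.
LINK_LOCAL_MASK = 0xFC


def packet_crc(packet: bytes) -> int:
    """CRC-16 over the packet with the link-local bits treated as zero."""
    if not packet:
        raise ValueError("packet must not be empty")
    first = packet[0] & ~LINK_LOCAL_MASK & 0xFF
    return binascii.crc_hqx(bytes([first]) + packet[1:], 0xFFFF)


def set_ackid(packet: bytes, ackid: int) -> bytes:
    """Return a copy of the packet carrying a new link-local AckID."""
    if not 0 <= ackid < 64:
        raise ValueError("ackid must fit in 6 bits")
    first = (ackid << 2) | (packet[0] & 0x03)
    return bytes([first]) + packet[1:]


if __name__ == "__main__":
    original = set_ackid(bytes.fromhex("00112233445566778899"), 5)
    forwarded = set_ackid(original, 17)   # a switch assigns a new AckID
    # The stored CRC stays valid, so the switch need not recompute it.
    print(packet_crc(original) == packet_crc(forwarded))
```

Because the masked bits never enter the calculation, a switch can rewrite them on every hop and forward the original CRC unchanged, whereas any change to a covered field would force a recalculation.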

Quality of Service

Quality of Service (QoS) is essential for many backplane applications. While Ethernet through TCP/IP can support millions of individual streams and differentiate traffic by port number and protocol fields, no universally used class of service (CoS) field exists. The RapidIO standard defines up to six flows that can be considered prioritized classes of service. Through the use of Type 9 encapsulation and virtual channels, millions of streams can be differentiated as well.

QoS is also affected by Ethernet's best-effort service, which commonly manages congestion by dropping packets, leading to latency jitter. Since flow control belongs to upper-layer protocols such as TCP, congestion cannot be promptly managed to prevent packet loss. This lack of short-term, link-level flow control requires larger endpoint receive buffers to avoid overruns. Further exacerbating latency is the fact that error detection and recovery occur at the system level rather than the link level. Thus, timeouts exist only at layer 3 and above and are managed by offload hardware in the best case or software in the worst case, resulting in much longer timeouts and significantly increased latency jitter. Additionally, no standards exist for hardware-based recovery such as retries or timeouts, and so Ethernet drops packets for a significant period of time before a failure is detected. While there are protocols such as bidirectional forwarding detection that exchange packets to detect failures, these continuously impose a bandwidth overhead that depends on the responsiveness required.

Per its spec, all RapidIO networks must provide a minimum level of prioritized service to implement logical layer ordering and deadlock avoidance. This also improves average latency because packets marked with a higher relative priority must make forward progress since they might be responses. Optionally, switches can reorder packets from different flows and offer head-of-line blocking avoidance as well as other QoS features. With multiple flow control mechanisms, RapidIO networks allow congestion to be managed before there is a significant impact on network performance.

The RapidIO standard also defines a link-layer protocol for error recovery and various hardware-based link-to-link and end-to-end timeout mechanisms, enabling virtually all errors to recover at the link level without software intervention, substantially lowering latency jitter. Also, because RapidIO technology links carry valid traffic at all times, a broken link is promptly detected locally at the link level. As a result, failure rates, defined as undetected packet or control symbol errors, are significantly less than the hard failure rate of the devices on either end of the link, depending upon operating conditions.

Cost

From a silicon standpoint, it might seem that the often touted high-volume cost economies of Ethernet would give Ethernet a significant advantage over RapidIO. While this might be true for 4- to 8-port GE switches used in LANs, Ethernet switches for use in many embedded applications require more specialized functionality such as VLAN QoS and SERDES PHYs, significantly reducing the number of accommodating vendors, overall shipping volume and, therefore, cost economies. Additionally, RapidIO technology in general assumes a maximum backplane or board-level channel of 100 cm using copper traces on FR4-like materials. Ethernet PHYs for twisted-copper pair must support channels 100 times longer than those of RapidIO technology and, since Ethernet cabling assumes bundling with many other similar pairs, they must tolerate significantly more crosstalk. Together, these result in significantly higher PHY complexity than is actually required for backplane applications. As a result, some switches provide a non-standard SERDES PHY for these applications.

Computing true PHY cost must also be done carefully. For example, one commercially available RapidIO endpoint supporting messaging was only 25% larger than a straightforward Gigabit Ethernet controller without full TOE capabilities. Likewise, a four-lane RapidIO SERDES is about 50% larger than a single XAUI lane. This suggests that the silicon complexity required for endpoints is comparable.

For its part, RapidIO technology offers 2.5 times more effective bandwidth per link than GE. Yet, the cost per port of a 16-port RapidIO switch is competitive with, or better than, that of similar GE switches. For applications requiring more than 1 Gbps, the only alternative for Ethernet is 10 Gbps. Today, RapidIO technology offers higher effective bandwidth for payloads of less than 1024 bytes at lower cost, even without taking into consideration the cost imposed on Ethernet endpoints to process protocol stacks at 10 Gbps. Such processing also has a significant impact on power consumption, as the GHz-class processor needed to terminate each GE link adds watts to the power budget.

Virtually any application layer service can be supported by either Ethernet or RapidIO. The difference between the two technologies, however, resides in their individual inefficiencies and the level of hardware processing supported. Ethernet has a long history in the LAN which, because of backwards compatibility, header and protocol inefficiencies, software dependence, complex and proprietary offloading mechanisms, and lack of implementation standards, makes it a less than ideal choice for backplane applications. The RapidIO standard was specifically designed to provide optimized performance for embedded applications.

This article was written by Greg Shippen, System Architect, Freescale Semiconductor's Digital Systems Division (Austin, TX).



This article first appeared in the September, 2008 issue of Embedded Technology Magazine.
