Networking originated from the need to share information. Many of us accomplish this on a daily basis through conversation. For example, think about the typical office: you work side-by-side with your colleagues, but also have a manager who periodically checks on the work being produced. You have both peer-to-peer and supervisory communication taking place.

Figure 1. The Open Systems Interconnect (OSI) TCP/IP stack is made up of 7 layers.
When it comes to networking equipment, different kinds of hardware are needed, yet the goals of communication stay the same: keep it simple and, especially in terms of hardware and software, keep it inexpensive. Another major factor affecting those goals is timeliness; responses should be received within a reasonable period of time after the inquiry. Keeping that timing predictable creates a deterministic system. Some early arrangements of deterministic networks took the form of the token ring and the token bus.

The token ring, standardized as IEEE 802.5, was established long ago to give each terminal in a large network some allotted time to get work done. There is a single token, and whichever terminal node holds it may broadcast and receive. The token must be passed between the nodes, giving each its turn. Typically, there is a token rotation and a token hold time: the rotation is the order in which the token is passed among the terminal nodes, while the hold time is how long each node gets to do its requested job. In a more complex environment, where several media access units (MAUs) pass the token around their ring, there may very well be several terminal nodes connected to any one MAU, and those terminal nodes must share their MAU's turn.
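To make the rotation and hold-time mechanics concrete, here is a minimal sketch of token passing. The node count, rotation order, hold time, and the do_work() stub are all chosen purely for illustration:

```c
/* Minimal token-rotation sketch: each node in a fixed rotation order
 * holds the token for a configured hold time, then passes it on.
 * Node IDs, times, and do_work() are illustrative only. */
#include <stdio.h>

#define NUM_NODES     4
#define HOLD_TIME_MS  10   /* token hold time per node */

/* Placeholder for the node's transmit/receive work. */
static void do_work(int node, int budget_ms)
{
    printf("node %d transmits for up to %d ms\n", node, budget_ms);
}

int main(void)
{
    int rotation[NUM_NODES] = {0, 1, 2, 3};  /* token rotation order */
    int token = 0;                           /* index into rotation[] */

    for (int cycle = 0; cycle < 2; cycle++) {    /* two full rotations */
        for (int i = 0; i < NUM_NODES; i++) {
            int node = rotation[token];
            do_work(node, HOLD_TIME_MS);         /* only the holder transmits */
            token = (token + 1) % NUM_NODES;     /* pass the token along */
        }
    }
    return 0;
}
```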

The token bus is very similar to the token ring, in that only one terminal node has the token at any point in time and every node gets the token at a predetermined time. The rotation order and hold time are usually preconfigured when the token bus is set up by the network manager. The network manager that sets these arrangements up is connected to the same token bus. Its primary function is to set each node's token rotation and time sequence during network initialization, and to continuously monitor network traffic as a diagnostic tool. The IEEE 802.4 token bus standard, on which the Manufacturing Automation Protocol (MAP) was built, was a popular communications networking standard installed in many factories where deterministic traffic could be predicted and placed on a network.

Embedded Applications

Networking in the embedded space is used to replace legacy serial communication, connect subsystems (peer-to-peer), connect subordinates to the supervisor, deliver captured information to storage, enable timely interrogation of stored information, and bridge information boundaries between systems seamlessly.

Rise of Ethernet

Ethernet arrived on the scene in 1980 and became fully standardized in 1985. It quickly became popular as a low-cost standard. Why was it so inexpensive? It was used heavily with many terminals in the office environment, and the sheer number of those terminals drove the price down. It was based on multi-drop technology: running one long cable and allowing nodes to be appended fairly easily. Using the nondeterministic Carrier Sense Multiple Access with Collision Detection (CSMA/CD) protocol, performance varied between “well-behaved nodes” and “bandwidth hogs.” Well-behaved nodes knew enough to broadcast on the cable and then detach to allow others a chance to transmit. Bandwidth hogs would seize the cable and stay on, preventing others from broadcasting or receiving. The CSMA/CD protocol is what makes both behaviors possible.
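The retry rule behind CSMA/CD is truncated binary exponential backoff: after each collision, a node waits a random number of slot times before trying again, and the random range doubles with every collision. The sketch below illustrates that rule under stated simplifications; collision_detected() is a random stub standing in for what a real MAC does in hardware:

```c
/* Sketch of CSMA/CD's truncated binary exponential backoff.
 * collision_detected() is a stub; real MACs implement this in silicon. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define SLOT_TIME_US 51    /* ~512 bit times at 10 Mbps */
#define MAX_ATTEMPTS 16    /* give up after 16 tries (excessive collisions) */

static int collision_detected(void) { return rand() % 4 == 0; } /* stub */

static int send_frame(void)
{
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        /* ... carrier sense, then transmit ... */
        if (!collision_detected())
            return 0;                        /* frame got through */

        int n = attempt + 1;
        int exp = n < 10 ? n : 10;           /* range is capped at 2^10 slots */
        int k = rand() % (1 << exp);         /* back off 0..2^exp - 1 slots */
        printf("collision %d: backing off %d slot times\n", n, k);
        usleep((useconds_t)k * SLOT_TIME_US);
    }
    return -1;                               /* excessive collisions */
}

int main(void) { return send_frame() == 0 ? 0 : 1; }
```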

Ethernet evolved over time, using different cables at varying lengths and node counts, starting with 10Base5. When 10GBase-T was developed, it used a full-duplex, point-to-point mode of transmission between only two nodes. This mode is very high speed, with no interference or determinism issues. Likewise, 40GBase-T transmission is also full-duplex point-to-point, though the supported distances start to shorten a bit. Our focus here is on 10Gb Ethernet.

System Integration and Standardization

When looking at system integration goals, there are a variety of issues one can face, but the most important is standardization. The Open Systems Interconnect (OSI) standard was designed so that multiple parties could participate, communicate, and share information by implementing a specific combination of hardware and software. The hardware is the physical connection to the medium, while the software executes and manages the packet exchange. The ultimate objective is reliable connectivity to get the job done. Network availability, or performance, varies with speed and load; as CSMA/CD demonstrated, there can be real performance issues. The question that needs asking is: “Is the performance sufficient to get my job done?” There are pros and cons to every structure.

The Open Systems Interconnect (OSI) TCP/IP stack is made up of 7 layers (see Figure 1). The lowest is the Physical Layer of fiber or copper, or possibly wireless today; this interface is the means by which a node communicates on the medium. The next layer is the Data Link (MAC) Layer, where station addresses are used to pass information from one node to another. The third layer is the Network Layer, which routes traffic across multiple bridges and multiple networks. Above that, the Transport Layer ensures that information sent from a station address on one network is delivered to a station address on another network. Next, the Session Layer separates the environment for each particular application or user. The Presentation Layer then ensures that information coming from the Session Layer is put into the proper format for the Application Layer to use. Lastly, the Application Layer is where the work is done, whether you are sending emails, controlling machinery, or collecting information.
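One way to see how the lower layers cooperate is to look at how each one prepends its own header to the data handed down from above. The C structs below are a simplified, illustrative view of that encapsulation; the field layouts are trimmed and not wire-accurate:

```c
/* Illustrative view of layered encapsulation: each layer prepends its
 * own header. Fields are simplified, not wire-accurate. */
#include <stdint.h>

struct eth_header {          /* Layer 2: Data Link (MAC) */
    uint8_t  dest_mac[6];    /* station address of the receiving node */
    uint8_t  src_mac[6];     /* station address of the sending node */
    uint16_t ethertype;      /* 0x0800 = IPv4 payload follows */
};

struct ip_header {           /* Layer 3: Network */
    uint8_t  version_ihl;
    uint8_t  tos;
    uint16_t total_length;
    uint32_t src_addr;       /* routing across networks uses these */
    uint32_t dest_addr;
};

struct tcp_header {          /* Layer 4: Transport */
    uint16_t src_port;       /* ports tie delivery to an application */
    uint16_t dest_port;
    uint32_t seq;            /* sequence/ack numbers give reliable delivery */
    uint32_t ack;
};

/* On the wire: | eth | ip | tcp | payload from layers 5-7 | */
```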

Resource requirements for the OSI model depend on the goal each layer is trying to achieve. Layers 1 and 2 don't require nearly as much as layers 3, 4, 5, or 6. A tremendous amount of logic must be executed in the upper layers, and that can chew up a lot of CPU time and memory depending on system architecture and bus speeds. The faster the computer and its data buses, the more seamless the information transfer will be; moving data is where the majority of resources are consumed.

Implementation Scenarios

There are many implementation scenarios combining the hardware interface and the intensive OSI TCP/IP software protocol. A logical first step is using the CPU to drive the OSI stack. Today, computers come in single- and multi-core architectures. In a single-core computer, the logic to execute the TCP/IP stack may sit in an executable part of that computer. Merely residing there occupies some memory, and actually driving the stack consumes a lot of CPU resources. The faster the stack needs to execute, the less time is available to execute the application on that computer. When a multi-core CPU is used, one of the simplest things to do is move the OSI TCP/IP stack software to a core all by itself. In this way, the dedicated core executes the communications protocol on its own, leaving the remaining cores to do whatever other work needs to be done at that node.
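On a Linux-based multi-core node, dedicating a core to the stack can be as simple as pinning the stack's thread with a CPU affinity mask. The sketch below assumes a hypothetical run_tcpip_stack() thread function and reserves core 1 for it:

```c
/* Minimal Linux sketch: pin a (hypothetical) protocol-stack thread
 * to core 1, leaving the other cores free for application work. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *run_tcpip_stack(void *arg)   /* stand-in for the stack loop */
{
    (void)arg;
    /* ... poll the NIC and run the protocol state machines here ... */
    return NULL;
}

int main(void)
{
    pthread_t stack_thread;
    cpu_set_t cpus;

    pthread_create(&stack_thread, NULL, run_tcpip_stack, NULL);

    CPU_ZERO(&cpus);
    CPU_SET(1, &cpus);                     /* core 1 reserved for the stack */
    int rc = pthread_setaffinity_np(stack_thread, sizeof(cpus), &cpus);
    if (rc != 0)
        fprintf(stderr, "could not pin stack thread: %d\n", rc);

    pthread_join(stack_thread, NULL);
    /* Application threads would run on the remaining cores. */
    return 0;
}
```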

Increasing the performance of an existing computer system architecture is another popular option. Increasing CPU memory is always a good start. Another option is the OSI Enhanced Performance Architecture (EPA), a collapsed form of the stack that bypasses the middle layers to reduce protocol overhead for time-critical traffic.

Use of a silicon stack to offload work is another way of increasing performance. A silicon stack is basically an auxiliary CPU whose sole purpose is to process communications. Silicon stacks provide additional capabilities such as IPv4/IPv6, iWARP RDMA, iSCSI, FCoE, TCP DDP, and full TCP offload. The elegance of the silicon stack is that the entire OSI TCP/IP stack, and more, can be implemented without impacting application logic performance.

Silicon Stack Performance

Figure 2. The TCP offload engine can provide up to four extremely high-speed ports executing in parallel with the 1Gb Ethernet port(s) on the SBC.
The software stack is limited to roughly 40 MBps, whereas the silicon stack can sustain 250 MBps on 1Gb Ethernet and 2500 MBps on 10Gb Ethernet. Host CPU overhead for the silicon stack is extremely low, since the silicon stack is in essence a parallel engine to the CPU. Overhead for the software stack, by contrast, is comparatively high, because it competes for CPU resources at an increasing rate as communication speed rises. Latency is the time it takes for a transmission to start after all of its parameters have been configured; here too the silicon stack outperforms the software stack, since it makes only minimal use of CPU resources. Determinism is the variation in that latency across transmitted and received packets, and again the silicon stack wins due to its limited CPU impact. As for reliability under load, the silicon stack shows no noticeable change in performance, while the software stack degrades as resources are shared with executing applications.
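Those latency and determinism figures are straightforward to characterize empirically: time many transmissions, then report the mean (latency) and the spread (jitter). A rough sketch, with send_packet() as a stand-in for whichever stack is under test:

```c
/* Rough sketch of measuring latency and its variation (determinism)
 * for a send path; send_packet() is a stand-in for either stack. */
#include <stdio.h>
#include <time.h>

#define SAMPLES 1000

static void send_packet(void) { /* software or silicon stack send */ }

static long elapsed_ns(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1000000000L + (b.tv_nsec - a.tv_nsec);
}

int main(void)
{
    long min = 0, max = 0, sum = 0;

    for (int i = 0; i < SAMPLES; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        send_packet();
        clock_gettime(CLOCK_MONOTONIC, &t1);

        long ns = elapsed_ns(t0, t1);
        sum += ns;
        if (i == 0 || ns < min) min = ns;
        if (ns > max) max = ns;
    }
    /* Mean is the latency; max - min is the jitter (determinism). */
    printf("mean %ld ns, jitter %ld ns\n", sum / SAMPLES, max - min);
    return 0;
}
```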

TCP Offload Engine

How is a TCP offload engine (TOE) integrated into an embedded system? Many single-board computers (SBCs) today have XMC sites that can be used to plug in an XMC form-factored TCP offload engine, which can potentially support up to four 10Gb Ethernet ports. The SBC will most likely have one or more 1Gb Ethernet ports as well, and those will be driven by the software stack executing on the SBC itself. In the context of the overall system, the TCP offload engine can provide up to four extremely high-speed ports executing in parallel with the 1Gb Ethernet port(s) on the SBC itself (see Figure 2).
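If the TOE's ports are exposed to the operating system as ordinary network interfaces (an assumption; some offload engines use vendor-specific APIs instead), application traffic can be steered onto an offloaded port while other sockets stay on the SBC's 1Gb port. A minimal Linux sketch, with "eth2" as an assumed name for one of the TOE ports:

```c
/* Hypothetical sketch: steer a socket onto a TOE port ("eth2" is an
 * assumed interface name) while unbound sockets use the SBC's 1GbE. */
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    const char ifname[] = "eth2";   /* assumed name of a TOE 10GbE port */
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE,
                             ifname, sizeof(ifname)) < 0)
        perror("bind to TOE port");

    /* connect()/send() on fd would now use the offloaded 10GbE path. */
    return 0;
}
```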

Looking at this in block-diagram form (see Figure 3), one can see the single-board computer (SBC) and the bus on the SBC. The SBC can process packet information to perform all kinds of functions. Then there is the TOE, connected by an XMC connector back to the SBC. The TOE card can be used as a switch, and information can be routed from one network to another. It could be routed from one 10Gb port through the SBC for some packet work and modification there, and then shipped off across the bus. It could also handle information coming in for processing and then going out on the 1Gb Ethernet port. The capability provided by adding a TOE is access to 10Gb Ethernet ports on an SBC without impacting the processing power of the SBC.

Special FPGA Packet Processing

Figure 3. The capability provided by adding a TOE is being able to access 10Gb Ethernet ports on an SBC without impacting the processing power of the SBC.
A process may require special packetizing of information or manipulation of special packets. With an FPGA in one of the XMC sites, this work can be offloaded so that not all of the processing falls on the SBC.

For instance, suppose images are being captured and arriving over Gigabit Ethernet, and the user has to take two images and overlay them. This overlay could be done right within the FPGA, with the result sent back to the CPU for any additional processing and then across the bus to some other location.
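In software terms, the overlay is just a per-pixel combination of two equally sized frames; in the FPGA, the same loop becomes a pipelined datapath fed by the incoming Ethernet stream. A simple sketch of the operation, with the frame size and 50/50 blend ratio chosen arbitrarily:

```c
/* Sketch of the overlay step the FPGA would perform: a per-pixel
 * 50/50 blend of two equally sized grayscale frames. The frame size
 * and blend ratio are illustrative. */
#include <stddef.h>
#include <stdint.h>

#define WIDTH  640
#define HEIGHT 480

/* Average corresponding pixels of frames a and b into out. In the
 * FPGA this loop becomes a pipelined hardware datapath. */
static void overlay(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    for (size_t i = 0; i < (size_t)WIDTH * HEIGHT; i++)
        out[i] = (uint8_t)(((uint16_t)a[i] + b[i]) / 2);
}

int main(void)
{
    static uint8_t a[WIDTH * HEIGHT], b[WIDTH * HEIGHT], out[WIDTH * HEIGHT];
    overlay(a, b, out);   /* frames would come from the GigE capture path */
    return 0;
}
```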

Heightened Expectations

The biggest trend today is sensors. There are more options and better-quality sensors for use in almost every embedded application, and they are being designed to provide better situational awareness. To obtain a better understanding of what is going on in the world, it is necessary to gather and analyze more data. More data spills over into increased storage requirements, and all of the system architecture issues that go along with them, whether you are processing that information or moving it from place to place. No one wants to hear excuses that the data cannot be processed; with sophisticated algorithms and faster CPUs available, expectations for better results have only been amplified.

The data transport infrastructure in any system is critically important. It needs to assist with data collection, shift that data to analytical engines in a timely fashion, and get the data to storage devices. This is often where Gigabit and 10Gb Ethernet networking solutions make that possible. Even so, networking expectations are very high regarding both time limits for the delivery of information and the determinism of that delivery. Technology continues to move forward, and engineers concentrate on using the latest of it. Communications technology needs to meet the requirements of today, as well as make it possible to keep ahead of new specification requirements as the future may dictate.

This article was contributed by Acromag, Wixom, MI.