With each new generation of FPGA devices, Xilinx continues to push the performance envelope to match the ever-increasing requirements of target applications. The recent announcement of the Virtex-6 is no exception. More processing power, lower power consumption and updated interface features to match the latest technology I/O requirements are all part of the new devices. While it might be easy to assume that faster, bigger, more powerful is better, it’s important to understand how the latest FPGA innovations actually deliver this higher performance to best match the device to the specific requirements of the application.
Logic Cells, Slices and Blocks
Virtex FPGAs follow a naming convention that includes the size of the device in the name. Specifically, the approximate number of logic cells contained in the part is included in the part number. For example, a Virtex-6 LX240T device contains approximately 240,000 logic cells, while a Virtex-5 SX95T contains approximately 95,000 logic cells. Sounds simple, and it is, but just comparing the number of logic cells can be misleading.
Each logic cell contains a lookup table, which implements combinational functions such as AND, OR, NAND, and addition. Flip-flops and the connections to the adjacent cells are also implemented in the logic cell. Multiple logic cells are grouped together to create a single unit, called a slice.
As the architecture of the Virtex has evolved, the number of logic cells in a slice has changed: a Virtex-4 slice consists of approximately two logic cells, while Virtex-5 and Virtex-6 slices consist of approximately six logic cells.
The next step up on the architectural hierarchy is the CLB (Configurable Logic Block). Here again, the development of more powerful CLBs has changed the relationship between slices and CLBs: a Virtex-4 CLB consists of four slices, while Virtex-5 and Virtex-6 CLBs consist of two slices. As a result, a Virtex-4 CLB contains approximately eight logic cells, while a Virtex-5 or Virtex-6 CLB contains approximately 12. Figure 1 compares these parameters in the three Xilinx generations.
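The CLB figures above are simply the product of the two ratios. A minimal Python sketch of that arithmetic follows; the table holds the approximate values quoted in this article, not datasheet figures, and the names HIERARCHY and cells_per_clb are chosen here purely for illustration.

```python
# Approximate hierarchy ratios as quoted in the article (not datasheet values).
HIERARCHY = {
    "Virtex-4": {"cells_per_slice": 2, "slices_per_clb": 4},
    "Virtex-5": {"cells_per_slice": 6, "slices_per_clb": 2},
    "Virtex-6": {"cells_per_slice": 6, "slices_per_clb": 2},
}


def cells_per_clb(family: str) -> int:
    """Logic cells in one CLB = logic cells per slice x slices per CLB."""
    ratios = HIERARCHY[family]
    return ratios["cells_per_slice"] * ratios["slices_per_clb"]


for family in HIERARCHY:
    print(f"{family}: about {cells_per_clb(family)} logic cells per CLB")
```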
So why did Xilinx design FPGA logic in this hierarchical organization instead of just creating a flat plane of interconnected logic cells? The multilevel design of modern FPGA devices creates a balance between interconnect speed and interconnect flexibility.
The fastest connections exist between logic cells. Connections between slices are slower and connections between CLBs are even slower. Going in the other direction, connections between CLBs are the most flexible and general purpose, slice connections are a bit less flexible, and connections between logic cells are more limited.
With each new generation of FPGAs comes higher component density in the form of more logic cells. Figure 2 graphs the logic cell densities of various devices from the last three Virtex generations. For each generation, Xilinx offers a range of different density devices within a single package type. To focus the scope of this comparison, all of the devices compared are available in the same 35mm x 35mm BGA (Ball-Grid Array) package.
Since Virtex-4 CLBs comprise eight logic cells, while Virtex-5 and Virtex-6 CLBs comprise 12, the increase in logic cells between Virtex-4 and Virtex-5 actually translates to a decrease in CLBs because each CLB in the Virtex-5 requires more logic cells. While the Virtex-5 CLBs are more powerful than their Virtex-4 counterparts, there are still fewer of them to use so the overall performance increase is somewhat reduced. What is clear from this graph is that the Virtex-6 represents a significant increase in density from the Virtex-5 family.
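To see how a larger logic cell count can still mean fewer CLBs, consider the rough sketch below. The Virtex-4 and Virtex-5 device sizes are hypothetical round numbers picked only to show the effect; the 240,000 figure is the approximate LX240T size mentioned earlier. The function name approx_clbs is likewise just an illustration.

```python
# Approximate logic cells per CLB, taken from the article's ratios.
CELLS_PER_CLB = {"Virtex-4": 8, "Virtex-5": 12, "Virtex-6": 12}


def approx_clbs(family: str, logic_cells: int) -> int:
    """Rough CLB count: total logic cells divided by cells per CLB."""
    return logic_cells // CELLS_PER_CLB[family]


# The first two sizes are hypothetical; only the Virtex-6 value is from the text.
examples = [
    ("Virtex-4", 100_000),
    ("Virtex-5", 110_000),
    ("Virtex-6", 240_000),
]

for family, cells in examples:
    print(f"{family}: {cells:,} logic cells -> about {approx_clbs(family, cells):,} CLBs")
```

In this made-up case the Virtex-5 part has 10% more logic cells than the Virtex-4 part but roughly a quarter fewer CLBs, which is the effect described above.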
Geometries, Speed and Power
So how are more logic cells packed into the same size package with each new generation? As you might expect, by shrinking the physical size of the logic. IC geometries are measured in nm (nanometers). The progression from Virtex-4 through Virtex-6 has been from 90nm to 65nm to 40nm. An additional benefit of shrinking transistors is an increase in switching rates, which translates to faster clock speeds. Virtex-4 runs at 500 MHz, Virtex-5 runs at 550 MHz and Virtex-6 achieves a 600 MHz clock rate.
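As a back-of-the-envelope check, the sketch below estimates how much more logic could fit in the same area if density scaled with the square of the feature size, and converts the quoted clock rates to clock periods. The square-law scaling is an idealized assumption introduced here; real density gains depend on routing, memory, I/O and many other factors.

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Idealized area scaling: shrinking both dimensions gives (old/new)^2."""
    return (old_nm / new_nm) ** 2


def clock_period_ns(clock_mhz: float) -> float:
    """Clock period in nanoseconds for a clock rate given in MHz."""
    return 1000.0 / clock_mhz


# Process nodes and clock rates as quoted in the article.
nodes = [("Virtex-4", 90, 500), ("Virtex-5", 65, 550), ("Virtex-6", 40, 600)]

for (prev, prev_nm, _), (curr, curr_nm, _) in zip(nodes, nodes[1:]):
    gain = ideal_density_gain(prev_nm, curr_nm)
    print(f"{prev} -> {curr}: about {gain:.2f}x ideal density gain")

for family, _, mhz in nodes:
    print(f"{family}: {mhz} MHz -> {clock_period_ns(mhz):.2f} ns clock period")
```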
Unfortunately, whether it’s lunch or shrinking transistors, nothing comes for free. Leakage current tends to increase exponentially as the transistors shrink, increasing the static power, even when the transistors aren’t switching. To compensate, Xilinx has introduced a series of power saving design techniques. Depending on the mode the FPGA is operating in, a power savings of between 20% and 40% can be achieved on the Virtex-6 when compared to comparable Virtex-4 devices. Again, as densities increase and more logic cells are packed in the same size device, these power savings become imperative.
DSPs and Memory
In addition to CLBs, Virtex FPGAs contain DSP slices. These are dedicated multiplier, multiply-accumulator, or multiply-adder blocks. The DSP slices are responsible for the majority of the processing horsepower of FPGAs. Like the CLBs, the DSPs benefit from a compound performance increase with each new generation: improvements in the actual DSP architecture; increases in operational speed from 500 MHz to 550 MHz to 600 MHz with the latest generation; and increasing density allowing more DSP slices to be included in the same size package. While the largest Virtex-4 device includes 512 DSP slices, the Virtex-6 tops out at an impressive 2016.
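The compound effect of more slices and a faster clock can be roughed out as a peak multiply-accumulate rate. The sketch below assumes every DSP slice completes one multiply-accumulate per clock at the maximum quoted rate, which is a theoretical ceiling rather than a measured result. Virtex-5 is left out because no DSP slice count is given for it here.

```python
def peak_gmacs(dsp_slices: int, clock_mhz: float) -> float:
    """Theoretical peak in billions of MACs per second,
    assuming one MAC per DSP slice per clock cycle."""
    return dsp_slices * clock_mhz / 1000.0


# Slice counts and clock rates as quoted in the article for the largest devices.
v4 = peak_gmacs(512, 500)
v6 = peak_gmacs(2016, 600)

print(f"Virtex-4: {v4:.1f} GMAC/s peak")
print(f"Virtex-6: {v6:.1f} GMAC/s peak")
print(f"Ratio: about {v6 / v4:.1f}x")
```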
New to the improved Virtex-6 DSP slice (DSP48E1) is a 25-bit pre-adder positioned before the more traditional multiply-accumulator (MAC) stage. The pre-adder is ideal for implementing functions like filters, which are ubiquitous in radar and communications systems. Previous FPGA families required building the filters in CLBs, which operated slower and consumed logic that might be best used for other functions.
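One common reason a pre-adder helps with filters, offered here as an illustration rather than something the article spells out, is the symmetric FIR filter: two samples that share the same coefficient can be added first and multiplied once, roughly halving the number of multiplies. The following Python sketch models that arithmetic in software only; the function names and test data are made up for the example, and it says nothing about how the hardware is actually configured.

```python
def fir_direct(samples, coeffs):
    """Direct-form FIR: one multiply per tap per output sample."""
    n_taps = len(coeffs)
    out = []
    for n in range(n_taps - 1, len(samples)):
        acc = 0
        for k in range(n_taps):
            acc += coeffs[k] * samples[n - k]
        out.append(acc)
    return out


def fir_symmetric_preadd(samples, coeffs):
    """Symmetric FIR: add the two samples that share a coefficient
    (the pre-add), then multiply the sum once."""
    if coeffs != coeffs[::-1]:
        raise ValueError("coefficients must be symmetric")
    n_taps = len(coeffs)
    half = n_taps // 2
    out = []
    for n in range(n_taps - 1, len(samples)):
        acc = 0
        for k in range(half):
            pre_add = samples[n - k] + samples[n - (n_taps - 1 - k)]
            acc += coeffs[k] * pre_add  # one multiply covers two taps
        if n_taps % 2:  # odd length: the centre tap has no partner
            acc += coeffs[half] * samples[n - half]
        out.append(acc)
    return out


coeffs = [1, 3, 5, 3, 1]                # symmetric 5-tap example
samples = [2, -1, 4, 0, 7, 3, -2, 5]    # arbitrary test data

# Both forms give identical results; the second uses 3 multiplies per output, not 5.
assert fir_direct(samples, coeffs) == fir_symmetric_preadd(samples, coeffs)
print(fir_symmetric_preadd(samples, coeffs))
```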
All Virtex FPGAs include integrated memory blocks (Block RAM) for implementing anything from random access storage to dual-port architectures, to FIFOs depending on the application. For the 35mm x 35mm package we’ve been comparing, Block RAM has increased from a maximum of approximately 7 megabits to 8 megabits between the Virtex-4 and Virtex-5; it then took a sizable leap to a maximum of 38 megabits for the Virtex-6.
Connecting It All Together
Through the last few generations of Virtex devices, BGA ball pitch has remained the same at 1mm, which means there is 1mm spacing between the BGA balls. In a 35mm x 35mm device, this turns out to be a grid of between 1136 and 1156 balls, depending on the device. Because of this, I/O density hasn’t really seen an increase, but both the number of supported I/O signal types and I/O speed have increased. The general purpose I/O, SelectIO, is used for everything from connecting devices like A/Ds and D/As to creating parallel and serial data buses and implementing memory interfaces. The Virtex-6 family is compatible with the latest QDRII+ and DDR3 technology and Xilinx provides examples for implementing interfaces to these devices.
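The ball counts follow from the package size and pitch. The sketch below assumes a square grid with a 1mm margin between the outermost balls and the package edge; that margin is an assumption made here because it reproduces the 1156 figure, not a value taken from a package drawing.

```python
def balls_per_side(package_mm: float, pitch_mm: float, edge_margin_mm: float) -> int:
    """Ball positions along one side of a square grid.
    edge_margin_mm is an assumed gap between the outer row and the package edge."""
    usable_mm = package_mm - 2 * edge_margin_mm
    return int(round(usable_mm / pitch_mm)) + 1


side = balls_per_side(package_mm=35, pitch_mm=1, edge_margin_mm=1)
full_grid = side * side

print(f"{side} x {side} grid = {full_grid} ball positions")
print(f"A 1136-ball package would leave {full_grid - 1136} positions unpopulated")
```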
A key interface feature of all of the current Virtex generations is gigabit serial transceivers. Originally named RocketIO MGTs, they are now the GTX and GTH transceivers in the Virtex-6, operating at 6.5 Gbits/sec and 11 Gbits/sec, respectively. Gigabit serial connections provide an essential high-speed interface for moving data on and off the FPGA. These interfaces can be used to implement different protocols, such as PCI Express, Serial RapidIO and Xilinx’s own Aurora, a license-free, lightweight protocol ideal for fast point-to-point data connections. Like the SelectIO, gigabit transceivers have remained similar in number, a maximum of between 16 and 20 on the 35mm square devices of the generations we’ve been comparing. One exception to this is the specialized high bandwidth families beginning with the Virtex-5. To satisfy applications with complex data routing and switching requirements, Xilinx introduced the Virtex-5 TXT family with up to 48 GTX transceivers and the Virtex-6 HXT family with up to 48 GTX and 24 GTH transceivers.
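To put the transceiver counts in perspective, the sketch below totals the raw serial line rate for a few configurations, using the Virtex-6 rates quoted above. It assumes every transceiver runs at its maximum rate, counts one direction only, and ignores line-coding and protocol overhead, so usable bandwidth would be lower.

```python
# Per-transceiver line rates in Gbits/sec, as quoted in the article for Virtex-6.
LINE_RATE_GBPS = {"GTX": 6.5, "GTH": 11.0}


def aggregate_gbps(transceivers: dict) -> float:
    """Sum of raw line rates, one direction, before any encoding overhead."""
    return sum(LINE_RATE_GBPS[kind] * count for kind, count in transceivers.items())


# Counts from the article: 16 to 20 on the 35mm parts, 48 GTX + 24 GTH on HXT.
print(f"16 GTX: {aggregate_gbps({'GTX': 16}):.0f} Gbits/sec")
print(f"20 GTX: {aggregate_gbps({'GTX': 20}):.0f} Gbits/sec")
print(f"HXT, 48 GTX + 24 GTH: {aggregate_gbps({'GTX': 48, 'GTH': 24}):.0f} Gbits/sec")
```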
With PCI Express rapidly becoming more prevalent in systems from desktop PCs to targeted digital signal processing subsystems, Xilinx has included integrated PCI Express cores designed to support the gigabit serial transceivers. Virtex-6 supports PCI Express Base Specification 2.0 in x1 through x8 configurations.
Tapping Into Performance
Up to this point we’ve been comparing the relative merits of the three most recent Virtex generations, but how does this performance map into actual applications?
A key benefit of FPGAs is the ability to create a high-speed board interface inside the FPGA, tightly coupled to the processing, to stream data on and off the board. A common board form factor used in military applications is the PMC/XMC module. In the case of PMCs this is a PCI or PCI-X interface; for XMCs PCI Express is most common. Unfortunately these interfaces do require a considerable amount of FPGA resources. For both the Virtex-4 and Virtex-5 devices, a practical solution was to use two FPGAs on the module: one dedicated to processing data and one to implement the board interface. With the higher density of the Virtex-6 devices, it now becomes practical to include processing and interface in one device, simplifying board design and reducing the cost of the PMC/XMC module.
A typical PMC/XMC module used in software defined radio systems is the Pentek Model 7151 Quad 200 MHz, 16-Bit A/D. A key feature of this module is the FPGA based digital down converter. Instantiated in the largest SXT Virtex-5 part, the maximum number of channels achievable is 256. As this architecture migrates to the Virtex-6 SXT family, we will see a 2 to 3 times increase in channel count, an immediate, tangible improvement in performance and a large cost savings as the number of boards in a system can be reduced.
In radar systems and high bandwidth recording systems, where 10- or 12-bit A/Ds are operating at greater than 1 GHz sample rates, a Gen1 PCI Express interface can quickly become a bottleneck. Even in a single channel application, data streams can exceed 2 GBytes/sec. The Virtex-6 integrated Gen2 PCI Express interface helps alleviate this obstacle with transfer rates up to 4 GBytes/sec in an x8 lane configuration.
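The 2 GBytes/sec and 4 GBytes/sec figures can be reproduced from the PCI Express lane rates. The sketch below uses 2.5 GT/s per lane for Gen1 and 5 GT/s for Gen2 with 8b/10b encoding, and assumes each 10- or 12-bit A/D sample is stored in a 16-bit word. Results are per direction and ignore packet and protocol overhead, so sustained throughput would be somewhat lower.

```python
# Raw signalling rate per lane in GT/s for each PCI Express generation.
GT_PER_SEC = {1: 2.5, 2: 5.0}


def pcie_gbytes_per_sec(gen: int, lanes: int) -> float:
    """Peak one-direction bandwidth after 8b/10b encoding, before protocol overhead."""
    raw_gbits = GT_PER_SEC[gen] * lanes
    payload_gbits = raw_gbits * 8 / 10   # 8 data bits carried per 10 line bits
    return payload_gbits / 8             # bits to bytes


def adc_gbytes_per_sec(sample_rate_ghz: float, bytes_per_sample: int = 2) -> float:
    """Data rate of one A/D channel; 2 bytes per sample is an assumption."""
    return sample_rate_ghz * bytes_per_sample


stream = adc_gbytes_per_sec(1.0)
print(f"A/D stream at 1 GHz: {stream:.1f} GBytes/sec")
for gen in (1, 2):
    link = pcie_gbytes_per_sec(gen, lanes=8)
    print(f"Gen{gen} x8: {link:.1f} GBytes/sec, headroom {link - stream:.1f} GBytes/sec")
```

With these assumptions a single 1 GHz channel already consumes the entire Gen1 x8 link, while Gen2 x8 leaves room to spare.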
Each new generation of FPGA is enabled by a range of technical advances. These range from broad improvements like power reduction and device density to the intricacies of including a pre-adder in the DSP block to streamline filter design. But even from just looking at the few metrics compared in this article, it’s easy to see the ongoing progression of FPGA technology and why FPGAs continue to be a preferred platform for digital signal processing.