High-precision SAD5 stereo computations can be performed in an FPGA (field-programmable gate array) at much higher speeds than are possible in a conventional CPU (central processing unit), but doing so consumes large amounts of FPGA resources, and that consumption grows with image size. Of the two key resources in an FPGA, Slices and BRAM (block RAM), Slice usage in the new algorithm scales linearly with image size, while BRAM usage scales quadratically. An approach was developed to trade latency for BRAM by sub-windowing the image vertically into overlapping strips and stitching the outputs together to create a single continuous disparity output.
In stereo, the general rule of thumb is that the disparity search range should be about 1/10 of the image width. In the new algorithm, BRAM usage scales linearly with disparity search range and linearly again with line width. Doubling the image width, say from 640 to 1,280 pixels, would therefore quadruple BRAM usage in the previous design: 2× for line width and 2× again for disparity search range.
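The scaling argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' resource model: it assumes BRAM usage is simply proportional to line width times search range, and the function name `relative_bram` and its parameters are hypothetical.

```python
def relative_bram(image_width, base_width=640, range_divisor=10):
    """BRAM usage relative to a baseline image width.

    Assumption: BRAM is proportional to line_width * search_range,
    with search_range = image_width / range_divisor (the 1/10 rule of thumb).
    The result is a unitless ratio, not a BRAM count.
    """
    def units(width):
        search_range = width // range_divisor  # disparity search range in pixels
        return width * search_range            # arbitrary units

    return units(image_width) / units(base_width)


# Doubling the width from 640 to 1,280 pixels quadruples the estimate.
print(relative_bram(1280))
```

Under these assumptions the ratio depends only on the square of the width ratio, which is the quadratic BRAM growth described above.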
The minimum strip width is twice the disparity search range, and a strip of that width produces an output strip whose width equals the disparity search range. Assuming a disparity search range of 1/10 of the image width, 10 sequential runs at the minimum strip size therefore produce a full output image.
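The strip layout can be sketched in software as a list of column ranges. This is an illustrative sketch under stated assumptions, not the FPGA implementation: the function name `plan_strips` is hypothetical, the overlap is assumed to equal the search range and to lie on the left side of each strip (clamped at the image edge), and the extra margin needed by the SAD window itself is ignored.

```python
def plan_strips(image_width, search_range, strip_width=None):
    """Return a list of (in_start, in_end, out_start, out_end) column ranges.

    All ranges are end-exclusive. Assumptions (not from the source):
    each strip overlaps its left neighbour by `search_range` columns,
    so a strip of width S yields S - search_range output columns.
    """
    if search_range <= 0:
        raise ValueError("search_range must be positive")
    if strip_width is None:
        strip_width = 2 * search_range  # minimum strip size given in the text
    if strip_width < 2 * search_range:
        raise ValueError("strip_width must be at least twice the search range")

    out_width = strip_width - search_range
    strips = []
    out_start = 0
    while out_start < image_width:
        out_end = min(out_start + out_width, image_width)
        in_start = max(out_start - search_range, 0)  # clamp at the left image edge
        strips.append((in_start, out_end, out_start, out_end))
        out_start = out_end
    return strips


# 1280-pixel-wide image, search range of 1/10 the width, minimum strips.
strips = plan_strips(1280, 128)
print(len(strips))
print(strips[1])
```

With these inputs the plan contains 10 strips, matching the count given in the text, and the output ranges tile the full image width without gaps. Passing a wider strip, for example `plan_strips(1280, 128, strip_width=768)`, yields two overlapping halves, which needs fewer runs but a longer line buffer per run.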
This approach allowed the innovators to fit 1280×960 SAD5 stereo disparity into fewer than 80 BRAMs and 52k Slices on a Virtex 5LX330T, or 25% and 24% of the device's resources, respectively. Using a 100-MHz clock, this build would perform stereo at 39 Hz.
Of particular interest to JPL is that a flight-qualified version of the Virtex 5 exists: it could produce stereo results, even for very large image sizes, three orders of magnitude faster than they could be computed on the PowerPC 750 flight computer. The work covered in the report allows the stereo algorithm to run on much larger images than before while using much less BRAM. This opens up the choice of a smaller flight FPGA (which saves power and space) or of running other algorithms in addition to SAD5 on the same FPGA.
This work was done by Carlos Y. Villalpando and Arin C. Morfopoulos of Caltech for NASA’s Jet Propulsion Laboratory. For more information, download the Technical Support Package (free white paper) at www.techbriefs.com/tsp under the Semiconductors & ICs category.
The software used in this innovation is available for commercial licensing. Please contact Daniel Broderick of the California Institute of Technology at
This Brief includes a Technical Support Package (TSP).

SAD5 Stereo Correlation Line-Stripping in an FPGA (reference NPO-47245) is currently available for download from the TSP library.
Overview
The document discusses the development and implementation of the SAD5 Stereo Correlation Line-Stripping technique in an FPGA (Field Programmable Gate Array) by researchers at NASA's Jet Propulsion Laboratory (JPL). The primary challenge addressed is the significant resource consumption of stereo SAD5 computations, which can exceed the capacity of FPGAs when processing large images. Key resources in an FPGA include slices, which determine the number of operations, and BRAM (Block RAM), which serves as internal data storage.
The previous implementation of SAD stereo in an FPGA was limited, as no commercially available FPGA could accommodate the resource requirements for full-resolution images (1280x960). The new approach, termed Line Width Striping, involves splitting the input image into two halves with an overlap, allowing the SAD5 engine to process each side sequentially. This method reduces the BRAM count by 60% while maintaining the necessary computational capabilities, although it does not decrease the number of slices used.
The document outlines three tests conducted to validate the accuracy of the disparity results produced by the SAD5 algorithm: the Spike test, Slope test, and Wall test. Each test involved feeding pairs of images into the system to produce known disparity outputs, confirming the algorithm's effectiveness.
The Line Width Striping method introduces a trade-off between bandwidth and BRAM usage, making it feasible to process larger images without exceeding FPGA limitations. Latency with this method is approximately 20% worse than when processing at full resolution, but the method makes much larger stereo images practical to handle, which is crucial for applications such as obstacle detection in autonomous vehicles and missions involving NASA rovers or Entry, Descent, and Landing scenarios.
The research is positioned as a valuable contribution to future NASA projects, particularly in the context of high-speed autonomous vehicles where long-range stereo computation is essential. The document concludes by emphasizing the potential applications of this technology in various aerospace endeavors, highlighting its significance in advancing stereo vision capabilities in constrained environments.

