The term “smart camera” has been used for decades, but there has been little agreement on its actual definition. This article will use the term “smart” as the Merriam-Webster dictionary defines it: the ability to apply knowledge to manipulate one’s environment. “Smart” is also used as an adjective, as in smart weapon and smart electrical grid, to denote greater versatility.
A smart camera is a camera with embedded processing for intelligent response. Intelligence also implies learning, an area of much interest, but one beyond the current scope of this article.
Historically, machine vision has relied on cameras sending images to a computer, and this is still the most widely employed method today. However, most of the software those computers run was written for serial computation, which limits the intelligence of the system: data flows linearly through the CPU. Since about 2012 most computers have shipped with multicore processors capable of parallel computing, but software will remain serial unless it is specifically written for parallelism.
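As a minimal C++ sketch of the difference (the buffer size and the per-pixel operation are illustrative, not drawn from any particular application), the same work can be written serially or split explicitly across cores:

// Minimal sketch contrasting a serial loop with an explicitly parallel one
// over the same pixel buffer. Buffer size and the brighten operation are
// illustrative only.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

void brighten_range(std::vector<std::uint8_t>& px, std::size_t begin, std::size_t end)
{
    for (std::size_t i = begin; i < end; ++i)
        px[i] = static_cast<std::uint8_t>(std::min(255, px[i] + 32));
}

int main()
{
    std::vector<std::uint8_t> pixels(1920 * 1080, 100);

    // Serial: one core walks the whole buffer.
    brighten_range(pixels, 0, pixels.size());

    // Parallel: multiple cores are used only because the work is split explicitly.
    const std::size_t n_threads = 4, chunk = pixels.size() / n_threads;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < n_threads; ++t)
        workers.emplace_back(brighten_range, std::ref(pixels), t * chunk,
                             (t + 1 == n_threads) ? pixels.size() : (t + 1) * chunk);
    for (auto& w : workers) w.join();
    return 0;
}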
Computers rely upon their multicore processors working on data held in the same memory that also houses the program code. The bus between the CPU and this memory limits performance; this is the so-called von Neumann bottleneck, or Memory Wall, and it refers to the bandwidth limitation between memory and CPU. One can raise clock frequencies to process more instructions per unit time, but there is a limit to how fast clocks can be pushed before power consumption, heat and energy loss become excessive.
The human brain doesn’t work this way. The brain tightly couples memory with processing.
Most machine vision systems today rely upon cameras sending their images over cables to the computer. These cables are bottlenecks to moving the large amounts of data produced by the ever-higher resolutions and frame rates of newly introduced sensors. New video protocols have been developed to increase bandwidth: USB3, NBASE-T, Camera Link HS, CoaXPress and fiber-optic interfaces. All except the USB3 and GigE interfaces require an interface card to move image data into memory.
Sensor manufacturers have produced chips with remarkable resolutions and frame rates; however, many camera suppliers are unable to fully utilize a sensor’s performance because they are limited by the bandwidth of the particular interface to the computer.
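A back-of-the-envelope calculation shows how quickly sensor output can outrun common links (the sensor figures and link bandwidths below are rough, hypothetical values, not taken from any specific product):

// Rough data-rate check: a hypothetical 12 MP, 10-bit sensor at 300 frames
// per second versus the approximate usable bandwidth of USB3 (~400 MB/s) and
// a 10 Gb/s link (~1000 MB/s). All figures are illustrative approximations.
#include <cstdio>

int main()
{
    const double megapixels     = 12.0;   // hypothetical sensor resolution
    const double bits_per_pixel = 10.0;
    const double frames_per_sec = 300.0;

    const double mbytes_per_frame = megapixels * 1e6 * bits_per_pixel / 8.0 / 1e6;
    const double mbytes_per_sec   = mbytes_per_frame * frames_per_sec;   // ~4500 MB/s

    std::printf("Sensor output: about %.0f MB/s\n", mbytes_per_sec);
    std::printf("USB3 (~400 MB/s) carries roughly %.0f%% of it\n",
                100.0 * 400.0 / mbytes_per_sec);
    std::printf("A 10 Gb/s link (~1000 MB/s) carries roughly %.0f%%\n",
                100.0 * 1000.0 / mbytes_per_sec);
    return 0;
}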
To mimic the most intelligent being on Earth, the human being, we must try to adopt similar architectures. All sentient beings have their eyes closely coupled to their brains, so a smart camera should have its sensor closely coupled to its processor. Our brains house both memory and processing elements, so this tight coupling of sensor to memory to processing should be realized in a truly smart camera.
3D-stacked DRAM chips are shortening the memory-to-processor distance. Two competing approaches, HMC (Hybrid Memory Cube) and HBM (High Bandwidth Memory), both rely on through-silicon vias (TSVs) to connect the stacked memory.
The use of 3D memory stacks helps to eliminate the Memory Wall problem, but what about the processing bottlenecks and the bandwidth limitation between image sensor and computer?
One technique is to use preprocessing routines to reduce the data sent to the CPU, and Field Programmable Gate Arrays (FPGAs) are a good way to do this. An FPGA is a “sea of gates” chip that can be programmed by the user after manufacturing, that is, in the field, even into a general-purpose processor. Essentially all machine vision cameras contain FPGAs for image sensor control and interface formatting; using FPGAs reduces the development time needed to create camera models with different sensors or interfaces. However, almost no vendors allow user access to the FPGA for reprogramming.
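To illustrate the kind of data reduction such preprocessing provides, here is a hedged C++ model of a step that, in a camera, would be implemented in the FPGA fabric; the function name and threshold are illustrative:

// Binarize each 8-bit pixel against a threshold and pack 8 results per output
// byte, cutting the data sent to the host CPU by a factor of eight. In a real
// camera this logic would run in the FPGA fabric as data streams off the sensor.
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<std::uint8_t> binarize_and_pack(const std::vector<std::uint8_t>& pixels,
                                            std::uint8_t threshold)
{
    std::vector<std::uint8_t> packed((pixels.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        if (pixels[i] >= threshold)
            packed[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));  // mark "bright" pixel
    }
    return packed;  // 1 bit per pixel instead of 8
}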
To bring more intelligence to the “edge”, near the sensor, Xilinx and Altera have added ARM processors alongside their FPGAs on the same chip. Since acquiring Altera in 2015, Intel has paired its FPGAs with Skylake Xeon processors.
Embedded vision is already common in the consumer industry, most notably in smartphones using MIPI (Mobile Industry Processor Interface). Smartphones, however, are dedicated to specific purposes, with controls tuned to create pleasing images, and they do not allow user programming. Creating a smartphone carries high integration costs, but the large volumes allow a low per-unit cost (helped by financing through telecom contracts). In contrast, machine vision has lower integration costs but higher per-unit costs, since the market is more limited. Unfortunately, machine vision is still behind the curve.
An example of embedded imaging is the Samsung Galaxy S8 smartphone, which uses the Qualcomm Snapdragon 835 to integrate numerous functions: DSP, modem, audio, GPU (graphics processing unit), ARM CPU and image processing.
Another embedded vision approach is the NVIDIA Jetson platform, a GPU-based system that provides high-performance, low-power computing for deep learning and computer vision. However, implementing complete machine vision applications on it still requires programming expertise across a set of at least seven different tools and libraries.
Until smart cameras can learn on their own, they will be intelligent only insofar as users program them to create solutions. What impedes the machine vision marketplace from using all these wonderful new systems? It is the difficulty of programming everything.
A concentrated effort to simplify the programming environment is underway. Intel is developing tools that can be used whether the CPUs and FPGAs are discrete or reside together in the same chip. Intel’s Acceleration Stack uses OpenCL to accelerate application development for Intel CPUs and FPGAs. OpenCL is an open, royalty-free standard for writing C-based programs that target CPUs, GPUs and FPGAs.
GPUs were designed to render high-quality images for display, essential for the gaming market. Programming languages like OpenCL allow the GPU to serve as a more general-purpose processor for non-graphics applications.
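As a hedged illustration of what such code looks like, here is a minimal OpenCL C kernel held in a C++ string so it could be handed to the OpenCL runtime; the kernel and variable names are hypothetical, and the host-side setup (platform, device, queue) is omitted:

// A minimal OpenCL C kernel, stored as a C++ string so it can be passed to
// clCreateProgramWithSource(). The same kernel source can be compiled for a
// CPU, a GPU, or (via an offline compiler) an FPGA.
static const char* kInvertKernelSrc = R"CLC(
__kernel void invert_u8(__global const uchar* src, __global uchar* dst)
{
    // Each work item handles one pixel; the runtime launches them in parallel.
    size_t i = get_global_id(0);
    dst[i] = (uchar)(255 - src[i]);
}
)CLC";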
Why FPGAs? They are user-programmable, reconfigurable, real-time, low-power, cyber-secure devices that operate in parallel and are inexpensive compared with custom ASIC (Application Specific Integrated Circuit) designs. They are highly suitable for embedded vision in low SWAP (size, weight and power) applications, and they are ideal for image processing and camera control. FPGAs are more power-efficient than GPUs and much faster for certain operations.
Since GPUs were originally designed for consumer graphics, functional safety was not designed into them, yet applications such as ADAS (Advanced Driver-Assistance Systems) require it. GPU suppliers must redesign their devices to ensure this reliability. FPGAs, on the other hand, have long been used in applications requiring functional safety, such as ADAS, medical devices and aerospace controls.
Learning how to program CPUs is one thing; learning how to program FPGAs used to be quite another, requiring years of experience with gate architectures and bit-level manipulation. That has changed. Products now exist that reduce or eliminate this requirement, allowing FPGAs to be programmed without prior knowledge of hardware description languages such as Verilog or VHDL.
Xilinx offers a development environment with “GPU-like and familiar embedded application development and runtime experiences for C, C++ and/or OpenCL development”. Their “reVision” product leverages C/C++ and OpenCL together with OpenCV libraries. It targets their Zynq devices, combining the software programmability of an ARM processor with the hardware programmability of an FPGA.
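The sketch below, using standard OpenCV calls in C++, shows the kind of pipeline such stacks aim to accelerate; whether and how a particular stack offloads each stage to the FPGA is tool-specific, and the file names are placeholders:

// Small OpenCV pipeline: convert to grayscale, blur, extract edges.
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat frame = cv::imread("frame.png");       // placeholder input image
    cv::Mat gray, blurred, edges;

    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);        // color -> grayscale
    cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 1.5); // noise reduction
    cv::Canny(blurred, edges, 50, 150);                   // edge detection

    cv::imwrite("edges.png", edges);
    return 0;
}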
Intel has introduced its HLS Compiler, which “is a high-level synthesis (HLS) tool that takes in untimed C++ as input and generates production-quality register transfer level (RTL) code that is optimized for Intel FPGAs.” However, neither HLS nor reVision automatically synchronizes parallel threads; that is still a task the user must undertake.
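For a sense of what “untimed C++” means, here is a hedged sketch of the kind of plain loop such a tool can map to RTL; the function and parameter names are illustrative, and the vendor-specific pragmas or attributes that control pipelining and unrolling are omitted because they differ from tool to tool:

// A 3-tap horizontal box filter over one image line, written as ordinary
// untimed C++. An HLS tool can turn a loop like this into pipelined hardware.
#include <cstddef>
#include <cstdint>

void box3_line(const std::uint8_t* in, std::uint8_t* out, std::size_t width)
{
    if (width == 0) return;
    for (std::size_t x = 1; x + 1 < width; ++x) {
        // Each iteration reads three neighbors and writes one averaged pixel.
        std::uint16_t sum = static_cast<std::uint16_t>(in[x - 1] + in[x] + in[x + 1]);
        out[x] = static_cast<std::uint8_t>(sum / 3);
    }
    // Edge pixels are passed through unchanged.
    out[0] = in[0];
    out[width - 1] = in[width - 1];
}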
A relatively new tool, still in an alpha stage of development, is Hastlayer, which “is a tool that can automatically turn your .NET software into a hardware implementation”; however, its current implementation still requires a host PC.
The above synthesis tools still require traditional programming expertise, perhaps at the level of a DSP specialist. Moreover, C-based or .NET-based code is still written for sequential execution, not parallelism, forfeiting one of the most powerful features of GPUs and FPGAs. All of these approaches still require user expertise to take advantage of parallelism.
However, there are attempts to eliminate conventional programming altogether with graphically based creation tools. National Instruments states that the “LabVIEW FPGA Module extends the LabVIEW graphical development platform to target FPGAs on NI reconfigurable I/O (RIO) hardware.”
Arguably, the most advanced simplification of FPGA programming through graphical hardware synthesis is Silicon Software’s award-winning, ten-year-old Visual Applets. This graphical flowchart method lets non-programmers draw on numerous examples and iconized routines to build FPGA implementations. Built-in diagnostics, simulation and FPGA resource analysis make it a powerful but simple way to program FPGAs.
Indeed, it’s time to rethink using FPGAs for smart cameras.
This article was written by Rex A. Lee, Ph.D., CEO/President, Pyramid Imaging, Inc. (Tampa, FL). For more information, contact Dr. Lee at