Roboticists and artificial intelligence (AI) researchers know there is a problem in how current systems sense and process the world: they still pair sensors, such as digital cameras designed for recording images, with computing devices, such as graphics processing units (GPUs), originally designed to accelerate video-game graphics.
This means AI systems perceive the world only after visual information has been recorded and transmitted from sensor to processor. Yet much of what a camera sees is irrelevant to the task at hand, such as the detail of leaves on roadside trees as an autonomous car drives past. Today, sensors capture all of this information in meticulous detail and send it down the pipeline, clogging the system with irrelevant data, consuming power, and adding processing time.
Researchers have taken inspiration from the way natural systems process the visual world: a human's eyes and brain work together to make sense of a scene, and in some cases the eyes themselves perform processing that helps the brain filter out what is not relevant. The researchers implemented convolutional neural networks (CNNs), a form of AI algorithm for visual understanding, directly on the image plane. The CNNs can classify frames thousands of times per second without ever recording the images or sending them down the processing pipeline. The team demonstrated classification of handwritten numbers, hand gestures, and even plankton.
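To make the idea concrete, the sketch below implements the core CNN operation, a 2D convolution followed by a ReLU, in plain Python. This is purely illustrative: the function names, the tiny frame, and the edge-detecting kernel are our own examples, not the SCAMP implementation, which executes this kind of operation in-pixel and in parallel rather than in nested loops.

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution (cross-correlation, as in most CNN layers).

    Pure-Python sketch over lists of lists; illustrative only.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out

def relu(fmap):
    """Per-pixel non-linearity: negative responses are clipped to zero."""
    return [[max(0, v) for v in row] for row in fmap]

# A 4x4 frame containing a bright vertical bar, and a 2x2 kernel that
# responds to vertical edges (bright-to-dark transitions).
frame = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
edge_kernel = [
    [1, -1],
    [1, -1],
]

# The feature map lights up only where the bar's right edge sits.
feature_map = relu(conv2d(frame, edge_kernel))
```

A full classifier stacks several such convolution layers and reads off a label from the final activations; the point of the on-sensor approach is that only that label, not the frame, ever leaves the chip.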
The research suggests a future of dedicated intelligent AI cameras: visual systems that simply send high-level information, such as the type of object or event taking place in front of the camera, to the rest of the system. This approach would make systems far more efficient and secure, as no images need to be recorded.
The work builds on SCAMP, a camera-processor chip that the team describes as a Pixel Processor Array (PPA). A PPA embeds a processor in every pixel, and these processors can communicate with their neighbours to process image data in truly parallel form, which is ideal for CNNs and many other vision algorithms.
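The PPA computation style can be sketched as follows: each pixel holds its own register, and neighbour communication happens through whole-array shifts, so a convolution becomes a short program of shifts and per-pixel adds executed everywhere at once. The simulation below is our own minimal model of that idea, not SCAMP's actual instruction set; the function names and the 3x1 blur example are assumptions made for illustration.

```python
def shift(grid, dy, dx, fill=0):
    """Every pixel reads the value of its neighbour at offset (dy, dx).

    On a PPA this is a single array-wide instruction; pixels at the
    border read `fill`.
    """
    h, w = len(grid), len(grid[0])
    return [
        [grid[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else fill
         for x in range(w)]
        for y in range(h)
    ]

def add(a, b):
    """Per-pixel addition, conceptually executed in all pixels at once."""
    return [[va + vb for va, vb in zip(ra, rb)] for ra, rb in zip(a, b)]

# Example: a 3x1 vertical blur with kernel [1, 1, 1]^T becomes just two
# shifts and two adds; a single bright pixel smears into a vertical streak.
frame = [
    [0, 0, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
blurred = add(frame, add(shift(frame, -1, 0), shift(frame, 1, 0)))
```

Because every shift and add touches all pixels simultaneously, the cost of a kernel on a PPA scales with the number of kernel taps, not with the number of pixels, which is why the architecture suits convolutional workloads so well.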
Integration of sensing, processing, and memory at the pixel level not only enables high-performance, low-latency systems but also promises low-power, highly efficient hardware. SCAMP devices can be implemented with footprints similar to current camera sensors but with the ability to have a general-purpose, massively parallel processor right at the point of image capture.