To extend the usefulness of small unmanned ground vehicles (UGVs) to a wider range of missions, techniques are being developed to enable high-speed teleoperated control. Our goal is to quadruple the speed of teleoperated UGVs compared with currently deployed models. The key limitation is not mechanical; it is the operator's ability to maintain situational awareness and control at higher speeds. To address these issues, we are developing technologies for immersive teleoperation and driver-assist behaviors.

Our immersive teleoperation system uses a head-mounted display and head-aimed cameras to provide the operator with the illusion of being in the vehicle itself. Driver-assist behaviors will reduce the cognitive load on the operator by automatically avoiding obstacles while maintaining a specified heading or following a building wall or street. We’ve demonstrated immersive teleoperation on the iRobot Warrior UGV and a high-speed surrogate UGV.
Small UGVs such as the iRobot PackBot have revolutionized the way in which soldiers fight wars. A typical UGV transmits video from an onboard camera back to an operator control unit (OCU) that displays the video on a computer screen. In a manner similar to playing a first-person shooter video game, the operator teleoperates the UGV using a joystick, gamepad, or other input device to control vehicle motion. While this teleoperation method works well at slow speeds in simple environments, viewing the world through a fixed camera limits the operator's situational awareness. Even joystick-controlled cameras that pan and tilt can be distracting to operate while driving the vehicle. This is one of the reasons why small UGVs have been limited to traveling at slow speeds.
Faster small UGVs would be useful in a wide range of military operations. When an infantry squad storms a building held by insurgents, speed is essential to maintain the advantage of surprise. When a dismounted infantry unit patrols a city on foot, the soldiers need a UGV that can keep up. However, driving at high speeds through complex urban environments is difficult for any vehicle, and small UGVs face additional challenges. Small UGVs must steer around obstacles that larger vehicles can simply drive over; for example, a bump that would be absorbed by a large vehicle's suspension can send a small, fast-moving UGV flying into the air.
For the Stingray Project, funded by the US Army Tank-Automotive Research, Development and Engineering Center (TARDEC), iRobot Corporation (Bedford, MA) and Chatten Associates (West Conshohocken, PA) developed technologies that enable teleoperation of small UGVs at high speeds through urban terrain. Our approach combines immersive telepresence with semi-autonomous driver-assist behaviors, which command the vehicle to safely maneuver according to the driver’s intent.
In Phase I of the project, we mounted a Chatten Head-Aimed Remote Viewer (HARV) on an iRobot Warrior UGV prototype (Figure 1) and a surrogate, small UGV based on a high-speed, gas-powered, radio-controlled car platform. The operator wears a head-mounted display and a head tracker (Figure 2). The display shows the video from the HARV’s camera, which is mounted on a pan/tilt/roll gimbal. The HARV tracks the operator’s head position and turns the camera to face in the same direction.
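The core of this head-slaving loop is conceptually simple: read the tracker, clamp the angles to the gimbal's mechanical limits, and command the gimbal at a fixed rate. The Python sketch below illustrates the idea; the callables, update rate, and axis limits are illustrative assumptions, not the HARV's actual interface.

```python
import time

def slave_gimbal_to_head(read_head_orientation, send_gimbal_setpoint,
                         rate_hz=60.0, limits=(170.0, 60.0, 30.0)):
    """Continuously slave a pan/tilt/roll gimbal to a head tracker.

    read_head_orientation: callable returning (pan, tilt, roll) in degrees.
    send_gimbal_setpoint:  callable accepting (pan, tilt, roll) in degrees.
    Both callables, the update rate, and the axis limits are illustrative
    assumptions, not the HARV's actual interface.
    """
    period = 1.0 / rate_hz
    while True:
        pan, tilt, roll = read_head_orientation()
        # Clamp each axis to the gimbal's mechanical range before commanding it.
        clamped = tuple(max(-lim, min(lim, angle))
                        for angle, lim in zip((pan, tilt, roll), limits))
        send_gimbal_setpoint(*clamped)
        time.sleep(period)
```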

For Phase II, we increased the Warrior UGV's top speed by developing a high-speed, wheeled version of the Warrior (Figure 3). To help the driver control the vehicle at this speed, we used LIDAR to determine the orientation of features such as street boundaries, building walls, and tree lines. With these capabilities, operators can drive the UGV at much higher speeds.
Head-Aimed Remote Viewer
Testing of remotely operated ground vehicles has shown that head-aimed vision improves teleoperation mission performance by 200 to 400%, depending on the task. In general, the more complex the task, the greater the relative advantage provided by head-aimed vision.
Chatten Associates developed the ruggedized HARV for operating a small robotic ground vehicle. Previous experiments have shown that only head-aimed vision can provide sufficient awareness for teleoperation at these speeds. The operator wears a head-mounted display and a head tracker. As the operator turns his head, the HARV gimbal automatically turns the cameras to face the corresponding direction. This provides a far more immersive experience than aiming the camera with a joystick (even with the same head-mounted display).
We successfully integrated the HARV with a Warrior UGV prototype. The HARV is powered by a DC-to-DC converter that provides regulated 24V from the UGV’s unregulated 48V system voltage. The operator drives the UGV via joystick control using the prototype’s R/C control interface. Analog video is transmitted back to the operator using a 2.4-GHz transmitter. Digital commands to update the camera position based on the operator’s head position are transmitted on a separate channel at 900 MHz.
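As an illustration of what such a digital command channel might carry, the sketch below packs a pan/tilt/roll update into a small framed message with a checksum, as one might write to a serial radio modem. The framing, units, and checksum are hypothetical; the actual Stingray packet format is not described here.

```python
import struct

# Hypothetical framing for head-position updates on the 900-MHz digital link;
# the actual Stingray packet format is not published.
SYNC = 0xA5

def pack_gimbal_command(pan_deg, tilt_deg, roll_deg):
    """Pack pan/tilt/roll (degrees) into a fixed-size frame: a sync byte,
    three little-endian int16 angles in 0.01-degree units, and a one-byte
    additive checksum."""
    body = struct.pack('<3h',
                       int(round(pan_deg * 100)),
                       int(round(tilt_deg * 100)),
                       int(round(roll_deg * 100)))
    checksum = (SYNC + sum(body)) & 0xFF
    return bytes([SYNC]) + body + bytes([checksum])

# Example: a frame for pan=45.5 deg, tilt=-10 deg, roll=0 deg, ready to be
# written to the radio modem's serial port.
frame = pack_gimbal_command(45.5, -10.0, 0.0)
```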
Using this setup, we successfully teleoperated the UGV through both open and wooded terrain, and over grass, asphalt, snow, and concrete curbs. We were able to drive the UGV at the prototype's current maximum speed while making turns and avoiding obstacles. The operators controlled the robot non-line-of-sight at full speed through a grove of trees without any difficulty. Based on post-run operator evaluations, head-aiming increased situational awareness by nearly an order of magnitude over the fixed-camera solution.
High-Speed Surrogate UGV
To gather experimental data on high-speed teleoperation, we used a surrogate high-speed UGV (Figure 3) consisting of a 1/5-scale, radio-controlled, gas-powered Ford GT with a top speed of 50 mph. This car is 36" long and 21" wide. We removed the outer shell of the car and integrated the HARV and a roll cage with the surrogate UGV. In this configuration, the surrogate reached an estimated top speed of 30 mph. The vehicle had three radios: a 75-MHz pulse-width-modulated radio for vehicle steering, throttle, and braking; a 1.7-GHz analog radio for NTSC video; and a 2.4-GHz digital radio for gimbal control. The 75-MHz radio had a maximum range of about 250 feet.

In all of our experiments, the operators wore the HARV head-mounted display and controlled the vehicle using the handheld R/C proportional controller (with a small wheel controlling steering and a trigger controlling throttle and brakes). In the first set of experiments, the gimbal was locked down so that its orientation was fixed with respect to the vehicle. In the second set of experiments, the HARV gimbal was enabled, and the orientation was controlled by the operator’s head motions.
Our experiments showed that both operators were able to successfully navigate a slalom course at high speed using the HARV in both fixed and head-aimed modes. In a course that more closely resembles an urban environment, other experiments have shown that head-aiming results in a substantial improvement in operator driving performance versus a fixed camera.
Driver-Assist Behaviors
We developed a set of semi-autonomous driver-assist behaviors to help the operator control the Stingray UGV at high speeds. These behaviors take commands from the OCU, process data from the sensors (LIDAR, GPS, INS), and send motion commands to the UGV. These behaviors reduce the cognitive load and allow the operator to focus attention on other tasks. For example, the operator could order the UGV to drive down a street, while the operator scans the environment for potential threats.
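The sketch below illustrates one step of a simplified heading-hold behavior of this kind: steer proportionally toward the operator's commanded heading while biasing away from nearby obstacles in the LIDAR scan. The gains, thresholds, and structure are illustrative assumptions, not the Stingray implementation, which also fuses GPS and INS data.

```python
import math

def heading_hold_step(desired_heading_rad, current_heading_rad,
                      lidar_ranges, lidar_angles,
                      clearance_m=1.5, gain=1.0):
    """One control step of a simplified heading-hold / obstacle-avoid behavior.

    Illustrative sketch only: gains and thresholds are assumptions, and the
    real Stingray behaviors are not published in this form.
    Returns a steering command in radians (positive = left).
    """
    # Proportional steering toward the operator's commanded heading,
    # with the heading error wrapped into [-pi, pi].
    error = math.atan2(math.sin(desired_heading_rad - current_heading_rad),
                       math.cos(desired_heading_rad - current_heading_rad))
    steer = gain * error

    # Bias the command away from any obstacle closer than the clearance
    # threshold in the forward +/-45 degree sector of the scan.
    for r, a in zip(lidar_ranges, lidar_angles):
        if abs(a) < math.radians(45) and r < clearance_m:
            steer += (clearance_m - r) * (-1.0 if a > 0 else 1.0)
    return steer
```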
The perimeter- and street-following behaviors use the Hough Transform to detect the location and orientation of walls and street features. This computer vision technique transforms image point coordinates into votes in the parameter space of possible lines; each point casts a vote for every line that passes through it. By finding the strongest peaks in the parameter space, the algorithm determines the parameterized equations of the dominant lines in the image. In our system, the Hough Transform processes range data from the LIDAR and calculates the strongest line orientations and offsets relative to the robot's current position.
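A minimal version of this line-extraction step can be written directly against a planar LIDAR scan, as sketched below. The bin resolutions and the single-peak return are illustrative choices, not the values used on Stingray.

```python
import numpy as np

def strongest_line(ranges, angles, rho_res=0.1, theta_bins=180):
    """Find the dominant line in a planar LIDAR scan with a Hough transform.

    ranges/angles: polar scan (meters, radians) in the robot frame.
    Returns (rho, theta): the line's offset and orientation relative to the
    robot. Resolutions here are illustrative, not Stingray's actual values.
    """
    ranges = np.asarray(ranges, dtype=float)
    angles = np.asarray(angles, dtype=float)

    # Convert the scan to Cartesian points in the robot frame.
    xs = ranges * np.cos(angles)
    ys = ranges * np.sin(angles)

    # Each point votes for every (rho, theta) line passing through it:
    # rho = x*cos(theta) + y*sin(theta).
    thetas = np.linspace(0.0, np.pi, theta_bins, endpoint=False)
    rhos = xs[:, None] * np.cos(thetas) + ys[:, None] * np.sin(thetas)

    rho_max = float(ranges.max())
    n_rho = int(2 * rho_max / rho_res) + 1
    accumulator = np.zeros((n_rho, theta_bins), dtype=int)
    rho_idx = np.clip(((rhos + rho_max) / rho_res).astype(int), 0, n_rho - 1)
    for j in range(theta_bins):
        np.add.at(accumulator[:, j], rho_idx[:, j], 1)

    # The strongest accumulator peak corresponds to the dominant wall or
    # street edge in the scan.
    i, j = np.unravel_index(accumulator.argmax(), accumulator.shape)
    return i * rho_res - rho_max, thetas[j]
```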
Enhanced Situational Awareness
Adding cameras as inset views into the main image of a head-aimed vision system can greatly increase situational awareness. With a 360° overlay, any motion in the panoramic image that is inconsistent with the motion of the remainder of the panoramic view will instinctively cue the operator's peripheral view processing to turn and look at the movement. Presenting image data this way takes maximum advantage of the operator's native visual processing.
Similarly, the rear-view camera functionality takes advantage of operators’ long-term experience with driving cars. Operators reversing the robot out of a tight passage will be able to use the main display to constantly look left and right to check side clearances, while using the inset image from the fixed rear-facing camera to gauge alignment and rear clearances.
The HARV was modified to add five very small video cameras, similar to those used in cell phones. Four of the cameras are mounted at 90° spacing around the perimeter of the gimbal on the pan yoke ring, which moves with the pan axis. These four cameras therefore maintain a fixed relationship with the viewing angle of the main display: what is shown to the left in the panoramic image is to the left of where the HARV is currently looking, and vice versa. The fifth camera is mounted in the base tube, facing to the rear and slightly down to provide a good view for backing up.
Low-power video-processing electronics merge the video from the four 90° cameras into a panoramic image and then overlay it on the NTSC video stream without adding any latency to the main video image. A simple operator interface selects the different view modes.
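In software, the same compositing step might look like the sketch below: scale the four side views into a 360° strip, roll the strip so it stays registered to the gimbal's pan angle, and paste it into the main frame. This is an illustration of the idea only; on the HARV the merge is performed in the low-power video hardware.

```python
import numpy as np

def compose_panorama_overlay(main_frame, side_frames, pan_deg,
                             strip_height=80):
    """Composite a 360-degree strip from four 90-degree cameras and inset it
    across the top of the main head-aimed image.

    Illustrative sketch: side_frames is a list of four HxWx3 arrays ordered
    front, right, rear, left; pan_deg is the gimbal's current pan angle, used
    to keep the strip aligned with the main view direction.
    """
    # Downscale each side camera to the strip height by simple subsampling.
    scaled = []
    for img in side_frames:
        step = max(1, img.shape[0] // strip_height)
        scaled.append(img[::step, ::step][:strip_height])
    strip = np.hstack(scaled)

    # Roll the strip so the column under the center of the overlay always
    # corresponds to the direction the HARV is currently looking.
    shift = int(strip.shape[1] * (pan_deg % 360) / 360.0)
    strip = np.roll(strip, -shift, axis=1)

    # Paste the strip across the top of the main frame (cropping if needed).
    out = main_frame.copy()
    h = strip.shape[0]
    w = min(strip.shape[1], out.shape[1])
    out[:h, :w] = strip[:, :w]
    return out
```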
Low-Latency Digital Video
Vision is a critical teleoperation interface, but video requires very high communications bandwidth. Video latency can also critically impact mission performance and reduce operator effectiveness. The general figure of merit in the automotive simulation community is that visual latency needs to be below 100 ms in order for a car to be controllable. Driving performance degrades with any measurable latency. For this reason, we have used analog video and analog radios to transmit the video, with a digital protocol being developed for the HARV system.
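A quick calculation puts that budget in perspective, using the surrogate UGV's 30-mph top speed described above:

```python
# Distance the vehicle travels during the video latency budget: at 30 mph,
# even 100 ms of latency moves it well over a meter before the operator
# sees the result on the display.
MPH_TO_MPS = 0.44704
for latency_s in (0.05, 0.10, 0.25):
    print(f"{latency_s * 1000:.0f} ms latency -> "
          f"{30 * MPH_TO_MPS * latency_s:.2f} m traveled at 30 mph")
```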
To make the video signal more robust against the types of signal dropouts experienced in the field, the video can be double-encoded, with a second low-resolution version carried behind the high-resolution video. If macroblocks within the high-resolution video are lost, the low-resolution version shows through in those spots until new keyframe macroblocks arrive to fill in the detail.
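The sketch below illustrates that fallback logic at the macroblock level: wherever a high-resolution block is missing, a nearest-neighbor upscale of the corresponding low-resolution region is substituted until fresh data arrives. The block size and data layout are assumptions for illustration, not the actual codec.

```python
import numpy as np

def reconstruct_frame(hi_blocks, lo_frame, lost, block=16):
    """Fill lost high-resolution macroblocks from a low-resolution backup
    stream (illustrative sketch of the double-encoding idea; the block size
    and layout are assumptions, not the actual Stingray codec).

    hi_blocks: dict mapping (row, col) block indices to block x block x 3 arrays.
    lo_frame:  the low-resolution frame, half size in each dimension.
    lost:      set of (row, col) indices whose high-res data was dropped.
    """
    rows = max(r for r, _ in hi_blocks) + 1
    cols = max(c for _, c in hi_blocks) + 1
    out = np.zeros((rows * block, cols * block, 3), dtype=np.uint8)
    for (r, c), data in hi_blocks.items():
        if (r, c) in lost:
            # Nearest-neighbor upscale of the matching low-resolution region.
            patch = lo_frame[r * block // 2:(r + 1) * block // 2,
                             c * block // 2:(c + 1) * block // 2]
            data = patch.repeat(2, axis=0).repeat(2, axis=1)
        out[r * block:(r + 1) * block, c * block:(c + 1) * block] = data
    return out
```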
This article was written by Brian Yamauchi of iRobot Corporation, Bedford, MA, and Kent Massey of Chatten Associates, West Conshohocken, PA.