The development of a robotic random bin picking system that translates to a real-world factory application has received attention for more than 30 years, and has been called by some the “Search for the Holy Grail.” Random bin picking refers to a system where vision-enabled robots locate, grasp, and move single parts from a bin of jumbled or randomly piled parts. In recent years, advances in processing speed, new algorithms, and significant engineering have combined to solve limited versions of the problem, but the more general problem remained unsolved.

Semi-Random vs. Really Random

Many of the limited solutions handle what the machine vision industry refers to as “semi-random” problems, where parts are not totally randomized but loosely arranged in a bin. In some cases, parts are roughly stacked, are flat 2D parts, or arrive in a single layer in the bin, which reduces variation in part orientation and height. Semi-random picking techniques typically include pattern matching, 2D or 2.5D part location, and a standard approach trajectory for grasping parts.
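The pattern-matching step used in semi-random picking can be illustrated with normalized cross-correlation (NCC), one standard way to score how well a stored template matches an image patch. What follows is a minimal brute-force sketch. It assumes grayscale images stored as nested lists, and the function names are hypothetical. Production vision libraries use much faster implementations. NCC subtracts each patch's mean and divides by its contrast, so a uniform change in brightness or gain does not change the score.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # assumption: grayscale image as rows of pixel values


def ncc(image: Image, template: Image, top: int, left: int) -> float:
    """Normalized cross-correlation between the template and the equally sized
    image patch whose top-left corner is (top, left). Result lies in [-1, 1]."""
    th, tw = len(template), len(template[0])
    patch = [image[top + r][left + c] for r in range(th) for c in range(tw)]
    tmpl = [template[r][c] for r in range(th) for c in range(tw)]
    mean_p = sum(patch) / len(patch)
    mean_t = sum(tmpl) / len(tmpl)
    num = sum((p - mean_p) * (t - mean_t) for p, t in zip(patch, tmpl))
    den = math.sqrt(sum((p - mean_p) ** 2 for p in patch)
                    * sum((t - mean_t) ** 2 for t in tmpl))
    # A flat (constant) patch carries no pattern information; score it as 0.
    return num / den if den > 0 else 0.0


def match_template(image: Image, template: Image) -> Tuple[float, int, int]:
    """Hypothetical helper: slide the template over every valid position and
    return (best_score, row, col) of the highest-scoring patch."""
    th, tw = len(template), len(template[0])
    best = (-2.0, -1, -1)
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            score = ncc(image, template, top, left)
            if score > best[0]:
                best = (score, top, left)
    return best


if __name__ == "__main__":
    template = [[0.0, 9.0, 0.0],
                [9.0, 9.0, 9.0],
                [0.0, 9.0, 0.0]]
    # Embed a brighter copy (gain 2, offset 10) of the template in a uniform image.
    image = [[10.0] * 8 for _ in range(8)]
    for r in range(3):
        for c in range(3):
            image[3 + r][4 + c] = 2.0 * template[r][c] + 10.0
    score, row, col = match_template(image, template)
    print(f"best match at row {row}, col {col}, score {score:.3f}")
    # prints: best match at row 3, col 4, score 1.000
```

This locates a part only in 2D and only at the template's orientation. That is adequate when parts lie flat in a single layer but not for a jumbled pile.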

Truly randomized bins, where parts lie underneath or on top of one another in any position, create an additional set of problems. Parts jumbled on top of one another increase the number of possible part orientations, reduce the visibility of the features used to recognize individual parts, and require more potential grasp paths to be considered. Real-world solutions need to overcome all of these obstacles and, in addition to solving the vision problem, ensure that the robot avoids colliding with other parts or the bin itself.

All vision guidance solutions must cope with difficult problems caused by potentially harsh lighting conditions. Random bin picking must additionally deal with increased shadowing and specularities caused by reflections from other parts, variation in a part's appearance depending on its pose, varying degrees of occlusion by other parts in the pile, and a shortage of salient features for recognition. That shortage is typical of rough or unfinished parts inexpensive enough to be tossed together in a bin. Another particularly difficult problem is cascading layers of parts, in which parts partially lie on top of one another in a weaving formation. Here the vision system must be able to recognize a part as safe to grasp even while it is under other parts.

In addition to vision issues, there are significant challenges in planning grasp paths that keep the robot gripper from colliding with the bin and the rest of the pile within the robot’s range of movement. Robot tools must be developed to grip parts from various positions without colliding with parts above, below, or next to the target part. To be commercially viable, the system must also complete the task at least as fast as current manual or semi-manual systems.
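The collision constraint can be made concrete with a deliberately simplified sketch. It makes several assumptions:

- The bin is described by a top-down heightmap with one surface height per grid cell.
- The gripper approaches straight down.
- The gripper has a square footprint.

The function names and the `radius` and `clearance` parameters are hypothetical. The sketch also tries candidates in a simple highest-first order. Real planners check full 3D gripper geometry along the entire approach path.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]  # assumption: heightmap[row][col] = surface height in the bin


def footprint_is_clear(heightmap: Grid, row: int, col: int,
                       radius: int, clearance: float) -> bool:
    """Hypothetical check for a vertical approach to cell (row, col): the square
    gripper footprint must stay inside the bin, and every neighbouring cell must
    sit at least `clearance` below the grasp point."""
    rows, cols = len(heightmap), len(heightmap[0])
    grasp_z = heightmap[row][col]
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            if not (0 <= r < rows and 0 <= c < cols):
                return False  # footprint would hit the bin wall
            if (r, c) != (row, col) and heightmap[r][c] > grasp_z - clearance:
                return False  # a neighbouring part is too close to the grasp height
    return True


def pick_grasp(heightmap: Grid, candidates: List[Tuple[int, int]],
               radius: int, clearance: float) -> Optional[Tuple[int, int]]:
    """Try candidate cells from highest to lowest; return the first whose
    footprint is clear, or None if every candidate would collide."""
    ordered = sorted(candidates, key=lambda rc: heightmap[rc[0]][rc[1]], reverse=True)
    for r, c in ordered:
        if footprint_is_clear(heightmap, r, c, radius, clearance):
            return (r, c)
    return None


if __name__ == "__main__":
    hm = [[0.0] * 6 for _ in range(6)]
    hm[0][0] = 9.0  # tallest part, but pressed against the bin wall
    hm[2][2] = 8.0  # tall part with a neighbour almost as high
    hm[2][3] = 7.5
    hm[4][1] = 5.0  # lower part with clear surroundings
    print(pick_grasp(hm, [(0, 0), (2, 2), (4, 1)], radius=1, clearance=1.0))  # prints (4, 1)
```

In this example the tallest part is skipped because of the bin wall, and the second because of its neighbour. The lower, unobstructed part is chosen.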

Methods and Approaches

Research focused on visually recognizing and locating 3D objects uses either 2D data from a single image or 3D data from stereo images or range scanners. Methods can be subdivided into model-based, appearance-based, and 3D data approaches.

Model-based approaches have difficulty extracting features under harsh lighting conditions. Typical parts do not contain many features, which limits the accuracy of a model-based fit to noisy image data. Appearance-based approaches have trouble segmenting the object for recognition and handling occlusion, and may not provide a 3D pose accurate enough for grasping.
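The simplest case of model-based fitting shows why a small number of features matters. The task is to estimate a part's 2D pose (rotation angle and translation) from a few matched feature points by least squares. The sketch assumes the model-to-image correspondences are already known, although in practice finding them is the hard feature-extraction step. The function names are hypothetical. With only a handful of points, each noisy feature carries a large share of the fit, and `rms_residual` can be used to inspect how well the pose explains the data.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fit_rigid_2d(model: List[Point], observed: List[Point]) -> Tuple[float, float, float]:
    """Least-squares rotation angle and translation (theta, tx, ty) mapping model
    feature points onto their observed image positions (correspondences given)."""
    n = len(model)
    mx = sum(p[0] for p in model) / n
    my = sum(p[1] for p in model) / n
    ox = sum(q[0] for q in observed) / n
    oy = sum(q[1] for q in observed) / n
    s_cos = s_sin = 0.0
    for (px, py), (qx, qy) in zip(model, observed):
        ax, ay = px - mx, py - my  # model point relative to model centroid
        bx, by = qx - ox, qy - oy  # observed point relative to observed centroid
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)  # angle that best aligns the centred point sets
    c, s = math.cos(theta), math.sin(theta)
    tx = ox - (c * mx - s * my)  # translation carries the rotated model centroid
    ty = oy - (s * mx + c * my)  # onto the observed centroid
    return theta, tx, ty


def rms_residual(model: List[Point], observed: List[Point],
                 theta: float, tx: float, ty: float) -> float:
    """Root-mean-square distance between transformed model points and observations."""
    c, s = math.cos(theta), math.sin(theta)
    err = 0.0
    for (px, py), (qx, qy) in zip(model, observed):
        ex = c * px - s * py + tx - qx
        ey = s * px + c * py + ty - qy
        err += ex * ex + ey * ey
    return math.sqrt(err / len(model))


if __name__ == "__main__":
    # Four corner features of a hypothetical rectangular part, in model coordinates.
    model = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
    c0, s0 = math.cos(0.5), math.sin(0.5)
    observed = [(c0 * x - s0 * y + 10.0, s0 * x + c0 * y - 3.0) for x, y in model]
    theta, tx, ty = fit_rigid_2d(model, observed)
    print(f"{theta:.3f} {tx:.3f} {ty:.3f}")  # prints 0.500 10.000 -3.000
```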

Approaches using 3D data must contend with lighting effects that disrupt stereo reconstruction, and with specularities that create spurious data for both stereo and laser range finders. Once the 3D data is generated, the issues of segmentation and representation remain. On the representation side, the models used are often more complex than in the 2D case and contain more free parameters, which can be difficult to fit to noisy data.
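Spurious points from specular reflections are one reason robust estimators are applied to 3D data. The sketch below uses RANSAC, a widely used robust-fitting technique, chosen here as an illustration rather than as the method of any particular system. It fits a plane `z = a*x + b*y + c` to a point cloud that contains outliers. With only three free parameters, the plane stands in for the richer part models mentioned above. The point format, tolerance, and iteration count are assumptions, and residuals are measured vertically for simplicity.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
Plane = Tuple[float, float, float]  # coefficients (a, b, c) of z = a*x + b*y + c


def plane_from_points(p1: Point3, p2: Point3, p3: Point3) -> Optional[Plane]:
    """Plane through three points, or None if they are collinear or the plane is vertical."""
    ux, uy, uz = p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2]
    vx, vy, vz = p3[0] - p1[0], p3[1] - p1[1], p3[2] - p1[2]
    # Normal vector n = u x v.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    if abs(nz) < 1e-12:
        return None
    a, b = -nx / nz, -ny / nz
    return a, b, p1[2] - a * p1[0] - b * p1[1]


def ransac_plane(points: Sequence[Point3], iterations: int = 200,
                 tol: float = 0.01, seed: int = 0) -> Tuple[Optional[Plane], List[int]]:
    """Repeatedly fit a plane to 3 random points; keep the one with the most
    points within `tol` (vertical distance). Returns (plane, inlier indices)."""
    rng = random.Random(seed)
    best_plane: Optional[Plane] = None
    best_inliers: List[int] = []
    for _ in range(iterations):
        i, j, k = rng.sample(range(len(points)), 3)
        plane = plane_from_points(points[i], points[j], points[k])
        if plane is None:
            continue
        a, b, c = plane
        inliers = [n for n, (x, y, z) in enumerate(points)
                   if abs(z - (a * x + b * y + c)) <= tol]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


if __name__ == "__main__":
    # 100 points on z = 0.1x - 0.2y + 5, plus 20 spurious high readings.
    surface = [(float(x), float(y), 0.1 * x - 0.2 * y + 5.0)
               for x in range(10) for y in range(10)]
    spurious = [(i % 10 + 0.5, (3 * i) % 10 + 0.5, 40.0 + i) for i in range(20)]
    (a, b, c), inliers = ransac_plane(surface + spurious)
    print(f"a={a:.3f} b={b:.3f} c={c:.3f}, inliers={len(inliers)}")
    # prints: a=0.100 b=-0.200 c=5.000, inliers=100
```

A least-squares fit over all 120 points would be pulled toward the spurious readings. The consensus step ignores them because no plane through them is supported by many other points.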