Everywhere we look we are bombarded with 3D. It’s in movies, in-home entertainment, digital camcorders, gaming systems, laptops, and even in our labs. What is it about 3D that is so fascinating and what are some of the technology drivers that are determining its future?
The concept of 3D is not a new idea. In fact, Greek mathematician Euclid is credited with discovering the principles of binocular vision in the 4th century BC. Stereopsis, or "depth sense," was discovered by Sir Charles Wheatstone in 1838, and the stereoscope, popularized by Oliver Wendell Holmes, was in wide use from the late 1800s to the 1930s. The first 3D movie appeared in 1922, and 3D films became more common in the 1950s, when theater audiences donned red and blue tinted paper glasses to view early big screen 3D movies. Chances are you have clicked through 3D scenes on a View-Master, first introduced in 1939 at the New York World’s Fair.
While all of these provided some sense of dimensionality, fun effects, and entertainment, none provided a genuinely immersive or realistic environment. Today there is a convergence of technologies that is making 3D more readily acceptable and affordable, not just for the mainstream consumer market but for research, surveillance, inspection, process control, and a wide variety of medical applications.
Perception vs Reality
What we experience in the real world through our own eyes and mind is quite different from the stereoscopic images created through a camera and display. The majority of our 3D perception occurs for objects within 20 feet (6.1 meters). For objects beyond this distance our depth perception relies on factors such as relative scale, horizon lines, and other visual cues. When we view an object in the real world, the axes of our eyes also rotate naturally to meet at that object; this is called convergence. The angle of convergence varies with the distance between our eyes and the object that is the center of our focus: it is smaller when we focus on an object in the distance and larger when we focus on one nearer to us. When creating 3D effects, the cameras and monitors need to artificially produce this convergence angle disparity in order to fool our brain into perceiving an object at an artificial distance relative to the display screen (Figure 1).
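The relationship between viewing distance and convergence angle can be sketched with simple trigonometry. The snippet below is only an illustration: it assumes the object sits straight ahead of the viewer, it assumes an eye spacing of 7 cm, and the function name is made up for this example.

```python
import math

def convergence_angle_deg(distance_m, eye_separation_m=0.07):
    """Angle between the two lines of sight when both eyes fixate a point
    straight ahead at distance_m (symmetric geometry, assumed 7 cm spacing)."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    # Each eye rotates inward by atan(half the eye spacing / distance).
    return math.degrees(2.0 * math.atan(eye_separation_m / (2.0 * distance_m)))

# Near objects need far more convergence than distant ones.
for d in (0.5, 2.0, 6.1, 50.0):
    print(f"{d:5.1f} m -> {convergence_angle_deg(d):.2f} degrees")
```

The angle falls off quickly with distance, which fits the observation that most stereoscopic depth perception happens within about 20 feet.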
Three critical components are required to simulate 3D: content creation, processing, and display. Each plays a role in helping us recreate what our own eyes and brain do naturally. Achieving the highest level of realism involves a combination of technologies working together and delivered at a cost point which is viable for the intended application. It is true that 3D images can be completely synthesized within a software environment for research tools, computerized design platforms, and entertainment. This article, however, will discuss capturing images stereoscopically using video cameras to produce content.
Creating a realistic and immersive 3D experience depends on high quality content. This is best achieved through the use of digital high definition (HD) video cameras which feature high resolution CCD or CMOS sensors, wide dynamic range, good sensitivity, digital shutter, and adjustable color matrix. HD video is a specific format, typically a 16:9 aspect ratio with a 1920 (horizontal) x 1080 (vertical) pixel matrix. This format can be either interlaced (1080i), whereby each video frame is displayed as two alternating fields of odd and even video lines, or progressive (1080p), whereby the video lines are displayed sequentially, one after another.
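As a rough illustration of the interlaced case, the sketch below splits a 1080-line frame into its odd and even fields and weaves them back together. The frame is modeled as a plain list of lines, and the function names are invented for this example; real video pipelines also have to handle field order and motion between fields, which are ignored here.

```python
def split_into_fields(frame):
    """Return (odd_field, even_field), counting video lines from 1."""
    odd_field = frame[0::2]   # lines 1, 3, 5, ...
    even_field = frame[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field

def weave(odd_field, even_field):
    """Interleave two fields back into one full frame."""
    frame = []
    for i in range(max(len(odd_field), len(even_field))):
        if i < len(odd_field):
            frame.append(odd_field[i])
        if i < len(even_field):
            frame.append(even_field[i])
    return frame

frame = [f"line {n}" for n in range(1, 1081)]  # stand-in for 1080 video lines
odd, even = split_into_fields(frame)
print(len(odd), len(even))        # 540 540
print(weave(odd, even) == frame)  # True
```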
In the United States, the Advanced Television Systems Committee (ATSC) defines several HD and standard definition formats for broadcasters. The most common HD format for cable and broadcasters is 720p, which corresponds to 1280 x 720 progressive. When 3D content is broadcast by cable and satellite, the resolution is reduced by half to 1920 x 540, or 540p, to transmit the left and right views within the HD broadcast signal. By contrast, Blu-ray Disc™ players are capable of delivering 3D content in full resolution 1920 x 1080 format.
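The halving is plain arithmetic on the frame dimensions. The sketch below assumes two common ways of fitting both views into one HD frame, stacking them vertically or placing them side by side; the packing names and the function are illustrative and not taken from any broadcast specification.

```python
def per_eye_resolution(frame_width, frame_height, packing):
    """Pixels available to each eye when both views share one frame."""
    if packing == "top-and-bottom":
        return frame_width, frame_height // 2   # vertical resolution halved
    if packing == "side-by-side":
        return frame_width // 2, frame_height   # horizontal resolution halved
    if packing == "full":
        return frame_width, frame_height        # one full frame per eye
    raise ValueError(f"unknown packing: {packing}")

print(per_eye_resolution(1920, 1080, "top-and-bottom"))  # (1920, 540)
print(per_eye_resolution(1920, 1080, "side-by-side"))    # (960, 1080)
print(per_eye_resolution(1920, 1080, "full"))            # (1920, 1080)
```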
Stereoscopic camera alignment is critical to producing a 3D image which minimizes geometric distortion, cross talk, and ghosting. In the early days of stereo photography, the two camera views were offset about 7 cm to approximate the spacing between human eyes, or interocular distance. In addition, one needed to consider the alignment, whether parallel or convergent, and to establish the zero parallax point in relation to the display screen plane (Figure 2). There are tools to assist in calculating these angles based on the desired zero parallax point, the target display size, and the distance between the camera and objects positioned at the zero parallax point. Camera alignment shapes how the 3D effect is experienced on a display or projected on a screen. Objects which appear to come out from the screen, the in-your-face, snap-your-head-back shots, make use of negative parallax, or “outie”. Objects which recede into the background, away from the screen, use positive parallax, also called an “innie”. Through post-production processing it is possible to achieve both effects within a single image.
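To make the sign convention concrete, here is a rough sketch of how on-screen parallax maps to perceived distance under a simple similar-triangles viewing model. It assumes a viewer centered in front of the screen, an eye spacing of 7 cm (borrowing the camera offset above), and parallax measured on the screen as the right-eye image position minus the left-eye image position. The function names are hypothetical.

```python
import math

def perceived_distance_m(parallax_m, viewing_distance_m, eye_separation_m=0.07):
    """Distance from the viewer to the fused point, by similar triangles."""
    if viewing_distance_m <= 0 or eye_separation_m <= 0:
        raise ValueError("distances must be positive")
    if parallax_m >= eye_separation_m:
        # Lines of sight are parallel or diverging; treat as infinitely far.
        return math.inf
    return viewing_distance_m * eye_separation_m / (eye_separation_m - parallax_m)

def describe_parallax(parallax_m):
    if parallax_m < 0:
        return "negative parallax: in front of the screen"
    if parallax_m > 0:
        return "positive parallax: behind the screen"
    return "zero parallax: on the screen plane"

# Viewer seated 2 m from the screen.
for p in (-0.07, 0.0, 0.035):
    print(f"{p:+.3f} m  {describe_parallax(p)}  "
          f"perceived at {perceived_distance_m(p, 2.0):.2f} m")
```

Under these assumptions, zero parallax places the object exactly at the screen distance, negative values pull it toward the viewer, and positive values push it behind the screen.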
The convergence angle, in combination with the interocular distance during filming, can have a significant impact on the amount of 3D effect. As the convergence angle and interocular distance increase, the 3D effect becomes more extreme. Too much convergence, however, could result in geometric distortion which may prove difficult to correct in post-processing, if required.
Due to the availability of new software tools, many filmmakers shoot with zero convergence (parallel alignment) and adjust the two camera views in software after capture, setting the zero parallax point and convergence angles to gain better control of the 3D effect. These adjustments are easier to make in software than removing the geometric distortion introduced by too much convergence. For live 3D events a middle ground is typically used: a small convergence angle targeted at the key reference point in the scene.
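One common way to set the zero parallax point after a parallel shoot is to slide one view horizontally relative to the other, often called horizontal image translation. The sketch below is a minimal illustration, assuming images are plain lists of pixel rows and that a feature has already been matched between the two views. The function names are illustrative and do not come from any particular tool.

```python
def shift_row(row, shift, fill=0):
    """Shift a row of pixels horizontally; positive shift moves content right."""
    width = len(row)
    if shift >= 0:
        return [fill] * min(shift, width) + row[: max(width - shift, 0)]
    return row[min(-shift, width):] + [fill] * min(-shift, width)

def shift_image(image, shift, fill=0):
    """Apply the same horizontal shift to every row of an image."""
    return [shift_row(row, shift, fill) for row in image]

def shift_for_zero_parallax(x_left, x_right):
    """Shift for the right view so a matched feature lands at the same
    column in both views, placing it on the screen plane."""
    return x_left - x_right

# The feature (value 9) sits at column 2 in the left view, column 4 in the right.
left = [[0, 0, 9, 0, 0, 0]]
right = [[0, 0, 0, 0, 9, 0]]
s = shift_for_zero_parallax(2, 4)
aligned_right = shift_image(right, s)
print(s, aligned_right[0].index(9))  # -2 2
```

In practice the blank columns created by the shift are cropped from both views and the result is rescaled, which this sketch leaves out.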
3D camera rigs come in a number of configurations, including side-by-side (Figure 3) and beamsplitter (Figure 4). Side-by-side would seem to be the most natural method to recreate the positioning of our eyes. However, because of the lenses, optics, and the distances between the cameras and the target object, it may be impossible to bring the two cameras close enough together using only a side-by-side arrangement.
In a beamsplitter rig, the cameras are mounted on different planes, perpendicular to one another with a 50/50 mirror at 45° to each camera. These rigs allow the two cameras to be placed in a wide range of positions relative to each other to accommodate any situation — from fully superimposed to several inches of separation. In both of these configurations, if the lenses are zoomed into the shot, the cameras are synchronized to pull closer together to maintain the appropriate alignment as the relationship and distances to the target object change.
Lower cost 3D cameras, which feature integrated lenses, are capable of producing 3D HD video images and for their price they are practical in some consumer applications. However, these systems lack the controls necessary to accommodate different conditions and typically exclude the lens options, sensitivity, and dynamic range of professional broadcast cameras. Digital cinema cameras use larger format sensors operating at 2K (2048×1080), 4K (4096×2160), and higher resolutions to support theatrical production, editing, and projection. These formats are impractical outside of digital cinema because the corresponding large format displays are not cost effective for home and industrial applications.
Processing is required to properly format stereoscopic images so they can be presented in a way that reveals the 3D effect. The formats will depend on the playback method and could include anaglyph, left/right side-by-side, or top and bottom. In some instances, two separate video files are sent through a multiplexer or MUX box which formats the two independent signals into a viewable 3D composite signal. Blu-ray Disc players will automatically output the content in an appropriate 3D format as required by your display. There are also a number of software tools to convert 2D images into 3D content.
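The three playback formats mentioned above can be sketched in a few lines. This is a simplified illustration only: frames are modeled as lists of rows of (red, green, blue) tuples, resolution is halved by simply dropping every other column or row (real encoders filter before decimating), and the anaglyph assumes a red/cyan split with red taken from the left view. The function names are invented for this example.

```python
def pack_side_by_side(left, right):
    """Keep every other column of each view, then place them left | right."""
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]

def pack_top_and_bottom(left, right):
    """Keep every other row of each view, then stack left over right."""
    return left[::2] + right[::2]

def anaglyph(left, right):
    """Red channel from the left view, green and blue from the right view."""
    return [
        [(l[0], r[1], r[2]) for l, r in zip(l_row, r_row)]
        for l_row, r_row in zip(left, right)
    ]

# Two tiny 4 x 4 test frames: left is pure red, right is pure cyan.
left = [[(255, 0, 0)] * 4 for _ in range(4)]
right = [[(0, 255, 255)] * 4 for _ in range(4)]

sbs = pack_side_by_side(left, right)
tab = pack_top_and_bottom(left, right)
print(len(sbs), len(sbs[0]))     # 4 4  (same frame size, half width per view)
print(len(tab), len(tab[0]))     # 4 4  (same frame size, half height per view)
print(anaglyph(left, right)[0][0])  # (255, 255, 255)
```

Both packed formats keep the original frame size, which is why they fit an ordinary HD signal at the cost of per-eye resolution.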
Display technology has a huge impact on viewing 3D images and, although it is beyond the scope of this article, a few comments on this critical component are appropriate. There are significant differences in technologies and standards used between home systems and movie theaters. 3D digital cinema depends on one or more Digital Cinema Initiatives (DCI) compliant projectors operating at up to 4K video matrix and 48 frames per second (fps), polarizing filters, and special reflective screens which can be viewed using inexpensive passive-polarized glasses. At home, most 3D displays use active-shutter technology requiring expensive LCD glasses which are RF-linked to the TV, switching each eye on and off up to 120 times per second in synchronization with the monitor. In this way full HD content can be displayed. Passive glasses technology for home use cuts resolution in half. Future enhancements will improve HD resolution by implementing new active screen technology that retains full HD display while permitting the use of passive glasses. Glasses-free, autostereoscopic technology is becoming available, but currently is most successfully implemented on smaller displays such as the viewfinders on 3D camcorders, portable gaming terminals, and laptops. Large format autostereoscopic displays have been introduced to mixed reviews due to the presence of 3D viewing zones and reduced resolution compared to other 3D display technology.
Beyond entertainment, 3D has practical implementation in various applications including surgical imaging (Figure 5), remotely controlled underwater inspection systems (Figure 6), surgical microscopy, industrial inspection, and process control. There are several commercially-available, FDA-approved, stereoscopic video systems for surgery. Applying 3D in surgery offers surgeons improved depth perception which is important given the increasing use of minimally invasive and single-port surgical techniques. With these procedures, the surgical field-of-view is narrowed, making it more challenging to discern depth and the orientations of medical instruments and anatomical structures. While the addition of stereoscopic imaging helps physicians improve their visualization, it remains dependent on the surgeon’s experience and training to produce the full benefits of these technologies.
What is in store for the future? The latest wave of 3D, since the theatrical release of James Cameron’s Avatar in 2009, seems to be in its infancy. We understand how to capture 3D content and with new schemes emerging we can convert single camera 2D into 3D. With further advancements in processing and new display technologies which eliminate glasses, 3D will not be just an option, but will quickly become a standard part of our entertainment and visual media environment.