Re-designing the camera for computational photography
Although digital sensors have become almost ubiquitous, the majority of cameras that contain them have changed little in form over the past century. Once a focused 2D image is captured, digitization allows efficient storage for future editing and sharing, but little else. Computational cameras aim to break down the wall between the optics that capture a photo and the post-processing used to enhance it, with designs that jointly optimize both. The initial images from these cameras appear blurry or even unrecognizable, but contain much useful information for the computer that immediately processes them. The post-processed image is clear and sharp, and can also provide measurements of, for example, an object's 3D location, spectral properties (color spectrum), or material composition.
Many optical systems have been developed to extract these unseen clues from a scene of interest, some requiring little or no post-processing computation at all. For example, clever illumination can provide a direct indication of object depth, as demonstrated by Microsoft's Kinect camera. Multispectral imagery can even be created with an unmodified camera, such as a regular point-and-shoot device: combining several images of the same scene, each taken through a different filter placed over the lens, captures more than the three standard color bands (red, green, and blue).
The aim of many computational cameras is to add these useful functionalities to a conventional 2D image captured in a single snapshot. Such an ambitious goal inherently requires modification of the optical setup, which can often be realized by adding simple patterned elements to regular camera designs. The patterned optical elements used in current computational cameras fall into two general classes. The first includes elements that are placed at the camera aperture stop, typically referred to as pupil masks, that globally modify the entire image. The second consists of elements, placed very close to the image sensor, which locally modify regions of pixels, much like the Bayer filter pattern in most of today's color cameras. The post-processing of a computational image taken with cameras like these comes in a variety of forms, ranging from simple deconvolution and pixel ‘re-binning’ (recombining data from adjacent sensors to create one pixel of data) to more complex, sparse recovery procedures.
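As a rough illustration of the simplest post-processing step mentioned above, the following sketch applies Wiener deconvolution, one common form of deconvolution, to an image blurred by a known PSF. It is not the method of any particular camera described here. The function names, the Gaussian stand-in PSF, the circular-convolution blur model, and the `noise_to_signal` constant are all assumptions made for the example.

```python
import numpy as np


def gaussian_psf(shape, sigma):
    """Hypothetical stand-in PSF: a normalized Gaussian centred in the array."""
    rows, cols = shape
    y = np.arange(rows) - rows // 2
    x = np.arange(cols) - cols // 2
    yy, xx = np.meshgrid(y, x, indexing="ij")
    psf = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return psf / psf.sum()


def blur(image, psf):
    """Simulate capture as a circular convolution of the scene with the PSF."""
    otf = np.fft.fft2(np.fft.ifftshift(psf))  # move the PSF centre to index [0, 0]
    return np.real(np.fft.ifft2(np.fft.fft2(image) * otf))


def wiener_deconvolve(blurred, psf, noise_to_signal=1e-4):
    """Estimate the sharp image from a blurred one, given the PSF.

    noise_to_signal is an assumed constant regularizer; it keeps the
    filter from amplifying frequencies that the PSF has nearly removed.
    """
    otf = np.fft.fft2(np.fft.ifftshift(psf))
    wiener = np.conj(otf) / (np.abs(otf) ** 2 + noise_to_signal)
    return np.real(np.fft.ifft2(wiener * np.fft.fft2(blurred)))


# Toy scene: a bright square on a dark background.
scene = np.zeros((32, 32))
scene[12:20, 12:20] = 1.0
psf = gaussian_psf(scene.shape, sigma=1.5)
captured = blur(scene, psf)
restored = wiener_deconvolve(captured, psf)
```

In a real system the PSF would come from calibration or from the mask design, and the regularizer would be chosen according to the sensor noise.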
Pupil mask design is a constantly evolving area of research, with various mask patterns proposed to extract image depth,1 extend a camera's depth of field,2 or offer super-resolution,3 among other enhanced functionalities. Each mask alters the camera's 3D point-spread function (PSF) to better present the information to be extracted during post-processing. We have demonstrated a method to optimally design any desired PSF intensity pattern in 3D (see Figure 1).4
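To make the link between a pupil mask and the 3D PSF concrete, the sketch below uses the standard scalar Fourier-optics model, in which the PSF intensity at each defocus plane is the squared magnitude of the Fourier transform of the pupil function multiplied by a quadratic defocus phase. It only evaluates the PSF of a given mask; it is not the optimal design method of reference 4. The function name, grid size, and defocus values (in waves at the pupil edge) are assumptions for illustration.

```python
import numpy as np


def psf_stack(mask, defocus_waves):
    """Return PSF intensity slices, one per defocus value.

    mask          : square complex (or real) array giving the pupil mask
                    transmission on a grid spanning [-2, 2) in normalized
                    pupil coordinates (the aperture is the unit disk, so
                    the grid is zero-padded by a factor of two).
    defocus_waves : iterable of defocus values, in waves at the pupil edge.
    """
    n = mask.shape[0]
    coords = np.linspace(-2.0, 2.0, n, endpoint=False)
    x, y = np.meshgrid(coords, coords)
    rho2 = x**2 + y**2
    aperture = (rho2 <= 1.0).astype(float)

    slices = []
    for w20 in defocus_waves:
        # Pupil function: aperture * mask * quadratic defocus phase.
        pupil = aperture * mask * np.exp(2j * np.pi * w20 * rho2)
        field = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))
        intensity = np.abs(field) ** 2
        slices.append(intensity / intensity.sum())  # unit energy per slice
    return np.stack(slices)


# Example: an unmasked (clear) pupil, in focus and at one wave of defocus.
n = 64
clear_mask = np.ones((n, n), dtype=complex)
stack = psf_stack(clear_mask, defocus_waves=[0.0, 1.0])
```

Replacing `clear_mask` with a patterned amplitude or phase array shows how a given mask redistributes light through focus, which is the quantity a mask designer would try to shape.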
Sensor-based coding can help obtain different angular perspectives of an object (its ‘light field’), which is closely related to detecting the phase of an incoming wavefront. Once captured, interesting effects like digital refocusing can be achieved in post-processing. Periodic arrays of small lenses or pinholes provide a simple way to extract these varied perspectives from a single image. More complex periodic pattern designs can lead to phase detection5 or pixel-level optical transfer function design with background noise reduction6,7 (see example in Figure 2).
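The lens-array case can be sketched as follows. This is a minimal example under simplifying assumptions: each lenslet covers an exact square block of `n_angles` by `n_angles` pixels, blocks are aligned with the pixel grid, and refocusing is done by the common shift-and-add approach with integer shifts that wrap around at the borders. The function names and parameters are hypothetical.

```python
import numpy as np


def subaperture_views(raw, n_angles):
    """Rearrange a lenslet image into angular views.

    raw      : 2D sensor image; each n_angles x n_angles block of pixels
               is assumed to sit under one lenslet.
    returns  : array lf[u, v, s, t], where (u, v) indexes the viewing
               angle and (s, t) indexes the lenslet position.
    """
    rows, cols = raw.shape
    s, t = rows // n_angles, cols // n_angles
    blocks = raw[: s * n_angles, : t * n_angles].reshape(s, n_angles, t, n_angles)
    return blocks.transpose(1, 3, 0, 2)


def refocus(lf, shift_per_view):
    """Shift-and-add refocusing.

    Each view is shifted in proportion to its angular offset from the
    central view and the results are averaged. shift_per_view selects the
    synthetic focal plane (0 reproduces the plane the camera focused on).
    """
    n_u, n_v = lf.shape[:2]
    out = np.zeros(lf.shape[2:], dtype=float)
    for u in range(n_u):
        for v in range(n_v):
            du = int(round(shift_per_view * (u - (n_u - 1) / 2)))
            dv = int(round(shift_per_view * (v - (n_v - 1) / 2)))
            out += np.roll(lf[u, v], (du, dv), axis=(0, 1))
    return out / (n_u * n_v)


# Toy sensor image: 6 x 6 pixels, 2 x 2 pixels under each lenslet.
raw = np.arange(36, dtype=float).reshape(6, 6)
lf = subaperture_views(raw, n_angles=2)
image = refocus(lf, shift_per_view=1.0)
```

The output has one pixel per lenslet, which makes the trade-off explicit: angular information is gained at the cost of spatial resolution.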
Finally, pupil- and sensor-based coding can be combined. For example, we can obtain a multispectral image in a single snapshot by inserting a variable filter at the pupil and a periodic array near the sensor (see Figure 3).8 In this way, 27 spectral channels are directly captured at the expense of the image's spatial resolution.
A large degree of flexibility is gained when dynamic optical elements are used to improve the computational image capture process. Although research is still in its initial stages, we have developed a framework to optimally design the 3D PSF formation of a dynamic pupil mask, made with a small LCD screen in the camera lens.9 The screen's pattern changes during the exposure of one image to shape any desired 3D intensity pattern near the sensor. We have also demonstrated the extraction of mixed spatial, angular, and temporal scene content using a pupil element that changes over time.10 This design captures multiple frames of a scene's light field, allowing one to digitally refocus on a moving object, or create an image with varying spatial resolution. Likewise, compressive sensing is possible with variable sensor-based elements11 (i.e., pixel-level optical control) that can also be used for object tracking and deblurring. These early results suggest how future cameras can greatly benefit from dynamic, adaptive optical elements in their specific imaging tasks.
In general, since a computational camera captures and processes optical data to measure something besides a simple 2D image, its light-capturing optics and post-processing procedures must be jointly optimized. Our future work will focus on applying the novel camera designs described to image otherwise undetectable features, such as biomedical, microscopic, or ultrafast phenomena.
This research is supported in part by a National Defense Science and Engineering Graduate Fellowship.
Massachusetts Institute of Technology