A Binocular Camera Platform for Active Vision
A binocular camera head mounted on a robotic arm is being used to conduct research in Active Vision at the GRASP Laboratory in the Department of Computer and Information Science, University of Pennsylvania. Research with the head is currently conducted by Dr. Brian C. Madden and doctoral student Ulf Cahn von Seelen.
Once and Future Systems
Working with HelpMate Robotics Inc. (formerly TRC) and the Universities of Rochester, Maryland and Massachusetts, we developed the specifications for the binocular camera platform that became the HelpMate BiSight head in 1994. The goal of this successful SBIR was (and is) to benefit research by sharing a common hardware base among the participating groups. We have obtained the 2-axis version (independent pan) of the BiSight head and are using it for vergence and version control of two remote-head CCD cameras and motorized lenses. The SONY cameras (XC77-RR) weigh 65 grams each and have a b/w 2/3" sensor (768x493 pixels) with an electronic aperture. The Fujinon lenses (H10x11E-MPX31) weigh 530 grams and have motorized zoom (11 to 110 mm), focus (1.2 m to infinity) and aperture (F1.9 to F22). Both the 10:1 zoom and the focus have potentiometer feedback. External lenses (1 or 2 diopter) can be added to obtain shorter working distances.
Control of the two pan axes and the zoom and focus functions of the motorized lenses is accomplished by a programmable multi-axis controller (PMAC, Delta Tau Data Systems) that is based on a Motorola DSP chip (56001) and is incorporated into the BiSight system. Up to 8 axes can be simultaneously controlled (enough for the 4 optical and 4 mechanical degrees of freedom available on the pan/tilt TRC head).
In the first generation of the tracking system (January, 1995), frames from the two cameras were digitized in alternation by a single framegrabber (Data Translation DT1451) mounted on the same SPARC IPX platform (Sun Microsystems) as the Delta Tau controller. By thresholding through the framegrabber lookup table and reading a subsampled image into memory, it was possible to compute, at field rate, the number of pixels in the image above a criterion luminance value and their centroid. We based our visual servoing on these measures and were able to track a bright object against a dark background.
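As a rough illustration of this field-rate measure (a minimal sketch in C; the subsampled image dimensions and the threshold are assumptions for the example, not the DT1451 settings), the following counts the pixels above a criterion luminance and accumulates their centroid:

/* Sketch of the first-generation visual servoing measure: count the
 * pixels above a criterion luminance in a subsampled field and compute
 * their centroid.  Dimensions and threshold are illustrative only. */
#include <stdio.h>

#define WIDTH     192     /* subsampled field width  (assumed) */
#define HEIGHT    120     /* subsampled field height (assumed) */
#define THRESHOLD 200     /* criterion luminance     (assumed) */

static void bright_centroid(unsigned char img[HEIGHT][WIDTH],
                            long *count, double *cx, double *cy)
{
    long n = 0, sx = 0, sy = 0;
    int x, y;

    for (y = 0; y < HEIGHT; y++)
        for (x = 0; x < WIDTH; x++)
            if (img[y][x] > THRESHOLD) {   /* thresholding step */
                n  += 1;
                sx += x;
                sy += y;
            }

    *count = n;
    *cx = n ? (double)sx / n : 0.0;        /* centroid of the bright pixels */
    *cy = n ? (double)sy / n : 0.0;
}

int main(void)
{
    static unsigned char img[HEIGHT][WIDTH];  /* dark field */
    long n;
    double cx, cy;

    img[60][96] = 255;                        /* one bright pixel as a stand-in target */
    bright_centroid(img, &n, &cx, &cy);
    printf("%ld bright pixels, centroid (%.1f, %.1f)\n", n, cx, cy);
    return 0;
}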
A second IPX platform ran RCCL/RCI and controlled the robotic arm (PUMA 560). In this way our computational load was distributed among the two SPARCstations and the PMAC controller.
To go beyond the domain of tracking a single bright object against a dark background, we are shifting our image processing to a network of 9 TMS320C40 DSP modules, including framegrabber and convolution boards. The network is connected to a SPARCstation 10 running Solaris 2.4 and 3L Parallel C (PennEyes - A Binocular Active Vision System (370KB), Technical Report MS-CIS-95-37/GRASP LAB 396). The C40 boards provide simultaneous capture of the two camera images as well as windowing, conditioning and convolution capabilities at field rate. With this new configuration, we have extended our problem domain to the tracking of arbitrary targets against arbitrary backgrounds.
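The technical report cited above describes the actual PennEyes pipeline; purely as an illustration of the kind of field-rate windowed computation involved, the sketch below (in C, with invented window and template sizes, and not the system's code) localizes an arbitrary template inside a search window by minimizing a sum of squared differences:

/* Illustrative only (not the PennEyes code): locate a small template inside
 * a search window by minimizing the sum of squared differences.
 * All sizes are invented for the example. */
#include <limits.h>
#include <stdio.h>

#define WIN 64    /* search window side (assumed) */
#define TPL 16    /* template side      (assumed) */

static void ssd_match(unsigned char win[WIN][WIN],
                      unsigned char tpl[TPL][TPL],
                      int *best_x, int *best_y)
{
    long best = LONG_MAX;
    int x, y, u, v;

    for (y = 0; y + TPL <= WIN; y++)
        for (x = 0; x + TPL <= WIN; x++) {
            long ssd = 0;
            for (v = 0; v < TPL; v++)
                for (u = 0; u < TPL; u++) {
                    long d = (long)win[y + v][x + u] - tpl[v][u];
                    ssd += d * d;
                }
            if (ssd < best) { best = ssd; *best_x = x; *best_y = y; }
        }
}

int main(void)
{
    static unsigned char win[WIN][WIN], tpl[TPL][TPL];
    int x, y, bx = 0, by = 0;

    for (y = 0; y < TPL; y++)            /* place a bright square at (20, 12) */
        for (x = 0; x < TPL; x++) {
            tpl[y][x] = 255;
            win[12 + y][20 + x] = 255;
        }
    ssd_match(win, tpl, &bx, &by);
    printf("best match at (%d, %d)\n", bx, by);
    return 0;
}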
(19.8MB MPEG4 movie, 8 minutes)
Current Research
To date we have assembled a system composed of a high-performance binocular platform mounted on a robotic arm, coupled with a DSP network capable of field-rate computation of visual error. The purpose of the system is to provide full 3-dimensional redundant tracking of a moving object. The goal is to move toward an object-centered representation in the presence of relative motion between the object and the observer. The camera pan axes maintain vergence on the object while the whole head is panned and tilted by the robot arm. The fast camera vergence and version movements are complemented by the relatively slow head (neck) rotation, which continually recenters the cameras in the middle of their pan range. Similarly, the fast tilt of the head/arm joint is supplemented by a slower vertical movement of the arm, which moves the head up or down and so brings the tilt back into the horizontal plane. A unique degree of freedom is the lens zoom, which is used to keep the image of the tracked object at a constant size. A slower horizontal movement of the arm brings the head closer to or farther from the object so that the zoom can return to the middle of its magnification range.
In this way, the system provides a full set of redundant but slower degrees of freedom backing up vergence, tilt, and zoom, keeping these fast degrees of freedom ready to respond to rapid changes. In addition, the combination of the redundancies in all three degrees of freedom allows for more than animated movement: the redundancies make it possible to assume alternative configurations when the tracker is confronted with a singularity in the plant or an obstacle in the world.
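The fast/slow redundancy can be pictured as two nested loops: the fast axis servos on the visual error while the slow axis gradually takes over the deflection, returning the fast axis to the middle of its range. The toy simulation below (in C, with made-up gains and a one-dimensional camera-pan/neck-pan pair) is only meant to illustrate this recentering pattern; the same scheme applies to tilt versus arm height and zoom versus arm distance:

/* Toy illustration of the fast/slow redundancy: a fast axis (camera pan)
 * absorbs the visual error while a slow axis (neck pan) takes over the
 * deflection, recentering the fast axis.  Gains and geometry are invented. */
#include <stdio.h>

int main(void)
{
    double target = 30.0;       /* target bearing in degrees (assumed)   */
    double fast   = 0.0;        /* camera pan: fast but limited range    */
    double slow   = 0.0;        /* neck pan: slow but wide range         */
    const double k_fast = 0.8;  /* fast-loop gain (assumed)              */
    const double k_slow = 0.1;  /* slow-loop (recentering) gain (assumed) */
    int t;

    for (t = 0; t < 50; t++) {
        double error = target - (slow + fast);  /* visual error seen by the camera  */
        fast += k_fast * error;                 /* fast axis absorbs the error      */
        slow += k_slow * fast;                  /* slow axis takes over the offset, */
        fast -= k_slow * fast;                  /* returning the fast axis toward 0 */
        printf("t=%2d  fast=%6.2f  slow=%6.2f  error=%6.2f\n",
               t, fast, slow, error);
    }
    return 0;
}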
Tracking is not an end in itself. We are interested in creating utilities that simplify subsequent processing; hence, we measure performance not in terms of the accuracy with which the sensors are positioned but in terms of the quality of the image properties required by the subsequent processes. Even though capture is maintained and the target does not leave the field of view of the cameras, there is often considerable displacement of the target. Although less costly in image quality than loss of capture, this motion relative to the cameras also degrades the resulting image: the finite integration time of the camera smears the image. Performance is also degraded by other external influences such as specular reflections. Not all paths and configurations are equal in terms of the resulting image quality, even when the positional errors are comparable.
The same error signal that will be sent to the positioning system during the next cycle can be used to establish a window around the target in the current image. This subimage is the result of the tracking process that is sent to the higher-level mechanisms. This electronic redundancy provides a cushion that alters the dynamic requirements of the positioning system. To the extent that the consequences of low-amplitude, high-frequency slip (such as blur) can be tolerated, the positioning system can be designed to compensate for the high-amplitude, lower-frequency elements of the target's motion.
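A minimal sketch of this electronic windowing (frame size taken from the sensor above; the window size and the names are our own choices for the example) clamps a window centered on the predicted target position so that it stays inside the frame:

/* Sketch of placing the output window around the predicted target position.
 * Window size is illustrative. */
#include <stdio.h>

#define FRAME_W 768
#define FRAME_H 493
#define WIN_W   128    /* output window width  (assumed) */
#define WIN_H   128    /* output window height (assumed) */

static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Given the predicted target centroid (the same error signal that will be
 * sent to the positioning system), return the top-left corner of the
 * subwindow to cut out of the current frame. */
static void place_window(double cx, double cy, int *wx, int *wy)
{
    *wx = clamp((int)(cx - WIN_W / 2), 0, FRAME_W - WIN_W);
    *wy = clamp((int)(cy - WIN_H / 2), 0, FRAME_H - WIN_H);
}

int main(void)
{
    int wx, wy;
    place_window(400.0, 250.0, &wx, &wy);
    printf("window origin (%d, %d)\n", wx, wy);
    return 0;
}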
(381kB MPEG movie; 27 seconds elapsed time, digitized at 7 frames/second)
In this sequence, the current estimate of the visual servoing metric is used to position a window on the target. The combined mechanical and electronic redundancies result in exceptional stability of the GRASP logo as the target moves in three dimensions.
One goal of the research we are pursuing is to provide an answer to the following question: given knowledge of the plant, the requirements of the subsequent task(s), and velocity and acceleration bounds on the target, how great a magnification (how fine a target resolution) is consistent with the guaranteed maintenance of acquisition?
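A back-of-the-envelope version of such a bound, under assumptions that are entirely ours and not a result of the project: if the closed-loop delay is T, the target can drift at most d = vT + aT^2/2 before the next correction, and keeping it within the half field of view at working distance D with sensor width w requires a focal length f no larger than Dw/(2d). The sketch below (in C, with illustrative numbers) computes this limit:

/* Back-of-the-envelope bound on the usable focal length (magnification)
 * given target velocity/acceleration bounds and the servo delay.
 * All numbers are illustrative assumptions, not measured system values. */
#include <stdio.h>

int main(void)
{
    const double v = 0.5;      /* target speed bound, m/s   (assumed) */
    const double a = 1.0;      /* acceleration bound, m/s^2 (assumed) */
    const double T = 0.040;    /* closed-loop delay, s      (assumed) */
    const double D = 2.0;      /* working distance, m       (assumed) */
    const double w = 0.0088;   /* 2/3" sensor width, m (approx.)      */

    double d     = v * T + 0.5 * a * T * T;  /* worst-case drift per cycle      */
    double f_max = D * w / (2.0 * d);        /* focal length that still keeps   */
                                             /* the drift inside the half-field */
    printf("worst-case drift %.4f m, max focal length %.1f mm\n",
           d, f_max * 1000.0);
    return 0;
}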
Ulf M. Cahn von Seelen
Ulf Cahn von Seelen received an MS in Computer Science (Diplom-Informatiker) from the University of Karlsruhe, Germany. He is currently pursuing a Ph.D. in Computer Science in the GRASP Lab at the University of Pennsylvania. His thesis work is on performance evaluation and optimization of active vision systems.
Brian C. Madden
Brian Madden received a BSEE at Tufts University, an MSEE at Northeastern University, and an MA and Ph.D. at the Center for Visual Science at the University of Rochester. His thesis was a computational model of visual acuity. He was a Postdoctoral Fellow in the Computer Science Department at Rochester following his graduate work. On the industrial side, he was a hardware design engineer for Digital Equipment Corporation and worked on the design of programming languages at IBM's Watson Research Center. He has also worked on image quality research at Eastman Kodak and Xerox Corporations. He has been a Research Associate at the GRASP Lab since December, 1992.
July 23, 1996