Brian C. Madden, Ph. D.
Department of Dermatology
Box 697
601 Elmwood Avenue
Rochester, NY 14642
USA
(585) 275-4526
Brian_Madden@urmc.rochester.edu
( ↑ Yes, that is an underscore, ... don’t ask.)
Proprietor
I am an electrical engineer with a 40+ year history of research in vision. Interesting opportunities of all sorts have come and gone over the years, but somehow I have always found my way back to study vision. The experiments and systems I’ve developed over the last decade stemmed from a desire to extend this study into medical imaging. That desire resulted in the creation of The Skin Appearance Laboratory here in the department of dermatology. The facilities of the laboratory and the work being accomplished there are described in the Research section. What brought me here is summarized below.
Additional material is available by following the thumbnail image links.
Firing up the WABAC Machine:
RCA, Aerospace Communications and Controls Division
In the summer after my junior year in high school as part of the
Thayer Academy Summer Science Program, I worked at a division of RCA that was
an Apollo Program subcontractor. There I wrote programs to simulate heat
diffusion of the Lunar Rover electronics to account for convective, conductive
and radiative transfer under ambient conditions covering both transit and lunar
environments. The hard part wasn’t encoding the algorithm. The hard part was minimizing
the number of reversals of the magnetic data tape. The RCA 301 computer used
the sort of vacuum-buffered magnetic tape drives that ran the risk of
stretching the tape every time the tape direction was reversed. A stretch in
the tape would very likely cause an end-of-record mark to be missed and crash
the long-running simulation.
After gluing together so many space rocket kits growing up, it was a real treat to see physical models of proposed NASA hardware in the lab. Some projects fade from memory and you often don’t know what happened to them. I always know just where to look for this one: overhead in the Apollo 15-17 lunar parking lots. And with the new Lunar Reconnaissance Orbiter images, we should all be getting a fresh look at these historic vehicles.
Tufts University
I obtained
my first exposure to vision research working on eye movement psychophysics in
Sam McLaughlin’s lab at Tufts. Sam was interested in the non-surgical treatment
of strabismus in children. Using red and green anaglyph glasses and a custom
projector in a dark room, exercises could be performed to strengthen the
alignment capability of the children’s eyes. In support of this work, he
developed eye movement models that investigated the effects of parametric
readjustment. It is a pleasure to see his theory having a renaissance in eye
movement circles today, now forty years on.
I remember when Sam gave me an early Campbell and Robson linear systems physiology paper and told me I should read it ‘because these guys got it right’. Working in this laboratory was my first real exposure to empirical science in the wild (there were no answers in the back of the book), and it is what got me hooked on vision.
Digital Equipment Corporation
DEC was a wonderful place to work. For
a design engineer it was an environment where it was possible to do creative
engineering. This was made possible in no small part because (back then) the
company was run by engineers. George Fligg was an experienced electrical engineer who also worked in Control Products, and he often served as a mentor. He would give me advice such as: never work for a company that doesn't do what you do (at 22, it took me years to fully appreciate the implications of this), and wear serviceable pants, because you never know what you might need to crawl under (this point became evident straightaway in a time when engineers wore jackets and ties and all the floors were soaked with lanolin from the days when the Assabet Mills site housed the largest woolen mill in the world). He would acknowledge a
neat design but then make me understand that while an impulse created by
running the output of a D-flop back into the reset might reduce chip count, it
wasn’t a function the device specification supported and therefore couldn’t be
used in a product.
After cutting my teeth creating some M-Series modules, I worked on the hardware design for a computerized direct numerical control (DNC) system for machine tools. The minicomputer that controlled everything, a box the size of a medium-sized suitcase with flashing lights and toggle switches on the front panel, was a PDP-8/L. The DNC could drive two Bridgeport milling machines doing circular interpolation while timesharing with the parts programmer creating new designs on the teletype console. All this with 4K of core (okay, they were 12-bit bytes). When hooked up to two massive Behrens punch presses on the production floor moving at 300 inches per minute, it was a force of nature. This led to the first project for which I assumed responsibility, a redesign of the PDP-14. Digital Equipment Corporation was indeed a wonderful place to work, but nothing lasts forever (DEC, Compaq, Hewlett Packard ... sigh).
Adams-Smith, Inc.
Adams-Smith
built a variety of custom digital instrumentation products for video, control
and measurement applications. Working there gave me an inside view of a small
(four-man) startup. It was an education in what it took to design without a net.
There wasn’t much of a margin for error. A bad mistake could have sunk the
company. As my education continued, I eventually learned to interpret (but
never to properly speak) Australian.
On an historical note, I was given the chance at Adams-Smith to design with the Intel 4004 a few months after it was announced. I remember liking the device simply because it would reduce the physical size of the product significantly. Who knew? It wasn’t like the thing came with a label: This device will change the world as you know it.
IBM Research
At IBM I
worked in the Experimental Systems Group at Yorktown Heights on the study of
programming language design. The goal was to examine naturally occurring
programs and to analyze them using the framework of Fillmore’s case grammar. I
ended up collecting as many versions as I could find of recipes for Quiche
Lorraine, Beef Stroganoff and Sukiyaki. The recipes were parsed to extract
actions, objects and modifiers. The analysis was extended to 50 different recipes
from the Joy of Cooking and the resulting cooking lexicon was presented to
undergraduates at a local university so their judgments of word associations
could be used to extract the intrinsic relations present in particular recipes,
and for cooking in general.
While at IBM, I also worked on Query By Example, which later, much to my surprise, evolved into a product and showed up in an IBM Super Bowl ad. By then, however, I’d left and gone back to school to study vision. I had come to understand that intuition about language was not my strong suit (and the philistines at Yorktown Heights had paved over the clay tennis courts).
University of Rochester
The Center for Visual Science was a
place where you could learn biological vision from soup to nuts (especially if
you were a graduate student there as long as I was). Among the many things I
worked on, from Necturus eyecup preparations to ideal
spatial patterns to
psychophysical scaling of beauty and beer, there are two projects that stand out in my memory.
What started as a proposal for a physiologically based binocularity metric developed into a characterization of nonlinear cell responses in cat visual cortex with fellow graduate student Mike Mancini. In addition to traditional bar, edge and grating stimuli, we applied Wiener kernel analysis to cells in V1 and V2. Once, we held onto a cell for ten hours. We ran every test we had, three times over. To the best of my knowledge, these experiments produced the first white noise analysis to overcome the increased inhibition that is evident in mammalian cortical recordings.
In some ways this technique gave a very different view of V1 cell responses from that in the popular canon. For example, ‘nonlinear’ Y-cells exhibited a substantial first-order kernel. One could interpret the nonlinear full-wave rectified subunit activity as a way for those cells to raise more of the linear component of the response above the threshold cutoff, a form of Spekreijse’s linearization process. It certainly would seem much easier to remove a nonlinear DC shift at a later stage than to cobble together two matched, mirror-symmetric, thresholded ‘linear’ X-cells for every orientation, spatial frequency and phase at every position across the visual field. The validity of any such classification depends on how you define ‘nonlinear’, and on what in fact constitutes an essential, irreversibly distorting nonlinearity in the function of those cells when viewed in the context of cortical visual processing as a whole.
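For the curious, the first-order kernel in question is usually estimated by reverse correlation of the white-noise stimulus against the recorded response (the Lee and Schetzen cross-correlation method). The sketch below, in Python with made-up variable names and bin counts, illustrates that estimate; it is not the analysis code we actually ran.

import numpy as np

def first_order_kernel(stimulus, response, lags=64):
    # Estimate the first-order (linear) Wiener kernel by cross-correlating a
    # zero-mean white-noise stimulus with the recorded response (e.g. binned
    # spike counts), following the Lee and Schetzen method.
    stimulus = np.asarray(stimulus, dtype=float)
    response = np.asarray(response, dtype=float)
    stimulus = stimulus - stimulus.mean()
    var = stimulus.var()
    n = len(stimulus)
    kernel = np.zeros(lags)
    for tau in range(lags):
        # average stimulus value tau samples before each response sample,
        # normalized by the stimulus variance
        kernel[tau] = np.dot(response[tau:], stimulus[:n - tau]) / ((n - tau) * var)
    return kernel

Higher-order kernels come from the analogous higher-order cross-correlations; those are what expose the rectified subunit activity mentioned above.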
In
my thesis, I proposed a model of spatial visual acuity based on data obtained
from animal electrophysiology and human psychophysics. The model succeeded in explaining the substantial differences observed among the various
measures of spatial acuity. According to my theory, the visual system can
localize stimuli only down to labeled regions that are, at least, several
minutes of arc wide. This labeling forms the substrate of the absolute position
sense. Other measures of spatial acuity (relative positional judgments) have
limits that are one (two-line resolution) and two (localization hyperacuity)
orders of magnitude finer because they allow the coarse positional labeling to
be supplemented by the detection of changes in contrast within limited bands of
spatial frequency. It is the use of an intensive dimension (contrast) as a supplement to a labeled dimension (location), drawing on visual abilities that are not part of the absolute position sense, that elicits this exceptional spatial performance.
The failure of traditional theories to reconcile the differences in observers' ability to detect positional displacements with some form of contrast sensitivity measure occurs because the discrimination sensitivity of an array of contrast-sensitive filters varies with the configuration of the different acuity targets as well as with changes in position. This combination of sources of stimulation precludes any fixed, obvious transformation between contrast and position. Transformations that vary with stimulus configuration are still able to support the relative positional judgments of two-line resolution and localization hyperacuity tasks as long as the responses are monotonic with position over the range of positional variations being compared. Hyperacuity stimuli get an extra boost in sensitivity because the minutes-of-arc separation of their features produces undulating spatial frequency spectra that, when positionally displaced, stimulate bandpass filters operating at threshold performance. By incorporating this interdimensional synergy between positional labeling and contrast sensitivity, I was able to model the wide range of interactions present in the two-line resolution and hyperacuity literature. The model required nothing more than coarse, minutes-of-arc positional labeling and the same bandpass contrast sensitivity filters required for the detection and discrimination of sinewave gratings.
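A toy numerical illustration of that synergy (the filter parameters and stimulus geometry here are invented for the example, not taken from the thesis): an odd-symmetric bandpass filter centered between two thin lines gives no response when the pair is centered, yet a displacement of only ten seconds of arc leaves a clearly unbalanced response.

import numpy as np

def odd_gabor(x_arcmin, freq_cpd=8.0, sigma_arcmin=4.0):
    # Odd-symmetric bandpass filter profile over position (arcmin).
    return np.exp(-x_arcmin**2 / (2 * sigma_arcmin**2)) * \
           np.sin(2 * np.pi * freq_cpd * x_arcmin / 60.0)

def pair_response(offset_arcsec, separation_arcmin=4.0):
    # Filter response to two thin lines straddling the filter center,
    # with the pair displaced as a whole by offset_arcsec.
    shift = offset_arcsec / 60.0
    lines = np.array([-separation_arcmin / 2, separation_arcmin / 2]) + shift
    return odd_gabor(lines).sum()

print(pair_response(0.0))    # ~0: the two line responses cancel exactly
print(pair_response(10.0))   # a 10 arcsec shift leaves a clear unbalanced response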
While all this research was winding up, and prior to my defense, I had the good fortune to be invited to give a presentation at the Rank Prize Funds Symposium on Biological and Engineering Aspects of Visual Hyperacuity and Depth Perception that took place in January, 1984. Looking out over the hall at the University of Cambridge, I discovered that practically every living researcher cited in my thesis was there. With this experience behind me and the right guidance, the actual defense was somewhat anticlimactic.
Following my defense, I started a postdoctoral fellowship in the computer science department at the University of Rochester. There I had the opportunity to begin my studies in computer vision and robotics. In particular, I began work on translating the results I’d learned in biological vision to computational vision.
Xerox Research
At
Xerox, the research involved the study of visual psychophysics examined through
the lens of xerography. There I learned about the relation between the
structure of images, the devices that made them, and the visual system. In particular,
I learned the practical effects of an imaging chain from incident light to
perception. The research looked at how tiny piles of charged toner form letters and images and how they appeared, not only at threshold performance levels of the human visual system, but also how those percepts varied at suprathreshold contrast levels.
Kodak Health, Safety and Human Factors
At Kodak,
the project was to study the imaging characteristics of film. In this work, a
different portion of the imaging chain was examined – image formation on slide
film and the interaction of film and projection systems. I examined film
structure by exposing and developing thousands of slide images of different
contrast test patterns. Combinations of sinewave and squarewave gratings of different spatial frequencies were obtained under various exposure conditions using the 4,096-line horizontal-resolution Vernier target display I had designed for my research at the university. After the film was developed and brought back to Kodak, its structure was examined using a densitometer the size of a piano. After 100 years in business, Kodak undertook this project as part of an effort to break out of the thinking that arose from its vertically organized business structure and to optimize more than individual components of the photographic process.
Xerox Development
Back
at Xerox, I worked on introducing an area sensor into the imaging chain for a
simulation of a laser scanner. This modification allowed the perceptual
consequences of different product designs to be examined prior to fabrication. In
particular, it allowed the study of the magnitude of distortions caused by
positional noise during scanning where different sized regions of the image
were simultaneously captured. The goal of the simulation was to increase scan
efficiency while moving the spatial frequency properties of any artifacts to
regions of lesser visual sensitivity.
Boeing, Helicopter Division, Advanced Computing Technology
At
Helicopters, I worked as an artificial intelligence specialist. There I wrote
Boeing’s 5-year technology forecasts for image processing and for robotics. On
the plus side, going down to NIST three days a week to work on Y14.5
dimensioning and tolerancing standards was both an education and a pleasure. I
also designed a pilot study that tested whether a Cartesian robotic gantry
controlled by a vision sensor could be used to assist in automated mark-up
during composite assembly.
There was so much opportunity at Helicopters, yet there was also a very deep and underappreciated need for the application of technology to the manufacturing process. It must be said that Dilbert’s world was alive and well at Boeing – pointy-haired bosses, cubicles and all. It is easy to understand the stories of Thomas Pynchon, who worked at Boeing earlier on, making a tent on his desk with D-size blueprints so as to have a place to hide. Fortunately for me there was Haim, who kept me both caffeinated and sane.
University of Pennsylvania, Grasp Lab
Extended
Intensity Range Imaging (EIRI) came from work 15 years ago in the robotics lab
at the University of Pennsylvania. It was based on a model of light adaptation
I developed 15 years earlier for my thesis (but didn’t need to use). The
adaptation model was based on physiological data of photoreceptor responses,
much of it surprisingly overlooked in the psychophysics literature. Its
implementation morphed over the years from Data General to PDP-11 assembly
language to APL, FORTRAN, C, and finally to Matlab. EIRI happened along one day
while I was playing with the adaptation model on a workstation in the lab. The
idea came into existence during the few hundred milliseconds between someone
looking over my shoulder and asking ‘What good is that?’, and the response ‘Well,
you could ...’.
The technique involved capture of multiple images, each at a different exposure setting. The set of images obtained in this way could then be fused into a single representation with the pixels taken from multiple images, each pixel adjusted in value to reflect the variation in their capture sensitivities. Pixel values in the composite representation were selected from the input image with the highest sensitivity for which that pixel was not saturated. This fusion resulted in the most accurate (smallest quantization) composite view, an image with an arbitrarily large dynamic range. The result was effectively a floating point image. Such a capability certainly would be adequate to accommodate even the million-to-one luminance difference present between the representations of reflectance in dark shadows and on luminous sources, both common conditions in physical scenes (q.v. thumbnail image link above). The development of composite capture was the leading edge of the move toward representing the dynamic range of real scenes on more limited paper and electronic display modalities. It was an accommodation to the equipment of the day, but the need for artifice in the redistribution of luminance values will lessen as capture and display technologies catch up with the performance imposed by viewer demand and provided by innovation.
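In today's terms it was exposure-bracket fusion. A minimal sketch of the selection rule just described, in Python with illustrative names, not the code actually used:

import numpy as np

def fuse_exposures(images, sensitivities, saturation=0.95):
    # images: the same scene captured at different exposure settings, scaled to [0, 1]
    # sensitivities: relative exposure (e.g. exposure time) for each image
    # For each pixel, take the value from the most sensitive capture in which
    # that pixel is not saturated, then divide by the sensitivity so all values
    # share one radiometric scale -- effectively a floating point image.
    order = np.argsort(sensitivities)[::-1]          # most sensitive first
    fused = np.full(np.shape(images[0]), np.nan)
    for i in order:
        img = np.asarray(images[i], dtype=float)
        usable = (img < saturation) & np.isnan(fused)
        fused[usable] = img[usable] / sensitivities[i]
    # pixels saturated in every capture fall back to the least sensitive one
    least = order[-1]
    fallback = np.asarray(images[least], dtype=float) / sensitivities[least]
    return np.where(np.isnan(fused), fallback, fused)

Fusing this way keeps quantization error small everywhere, since each pixel is read from the capture that made the most of its available range.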
The
gestation of PennEyes was somewhat longer than it took to conceive of EIRI. The
binocular platform had its roots in a DoD-funded grant to HelpMate Robotics, Inc.
in partnership with several universities with the goal of deploying similar
hardware to multiple laboratories. With a standard unit built with
state-of-the-art components, it was intended that more time could be spent
researching applications. The head was fabricated and distributed by HelpMate. Although
the binocular platform incorporated two CCD cameras and two lenses with
motorized zoom, focus and aperture, it was still light enough to be supported
by a PUMA 560 arm.
PennEyes was designed to be a three-dimensional visual servo. The binocular platform provided high performance rotation for each camera. The primary vergence and version capabilities were exceptional, with a peak velocity of 1000 degrees per second and a peak acceleration of 12,000 degrees per second squared. The head itself was supported by a six-degree-of-freedom robotic arm. One of PennEyes’ main characteristics was the capacity for 3D tracking redundancy. In a tracking application, performance was optimized through the initial activation of the most responsive positioning axis combined with supplemental or compensatory movements of the slower axes. Depending on the task requirements, supplemental motions could be used to increase performance while compensatory motions could be used to keep the device centered in its range to better accommodate future tracking requirements.
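A rough sketch of this supplemental/compensatory allocation for a single fast/slow axis pair; the gains, limits and names are invented for illustration and are not the PennEyes control law:

def redundant_step(error_deg, pan_deg, arm_deg,
                   pan_gain=0.8, arm_gain=0.05, pan_limit=30.0):
    # The fast camera (pan) axis absorbs the instantaneous tracking error,
    # while the slower arm axis drifts the whole head in the same direction
    # so the pan axis is continually re-centered in its range.
    pan_step = pan_gain * error_deg            # supplemental motion
    arm_step = arm_gain * pan_deg              # compensatory re-centering
    new_pan = max(-pan_limit, min(pan_limit, pan_deg + pan_step - arm_step))
    new_arm = arm_deg + arm_step
    return new_pan, new_arm                    # gaze direction ~ new_pan + new_arm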
In addition to the degrees of freedom in the motion of the head and arm, PennEyes also incorporated optical and electronic degrees of freedom for tracking. With the motorized control of lens zoom, it was possible to compensate for relative motion in depth between the head and target and thereby maintain a nearly constant target size in the acquired images. By stabilizing the target in the displayed image frame by electronically shifting the target, it was possible to compensate for tracking errors in all three dimensions. While this capability enabled the position of the moving target to be stationary in the image, that very stabilization had the effect of enhancing the subjective visibility of the tracking error distortions such as motion and defocus blur. Other changes such as specularities and luminance variations were also decidedly more apparent with the stabilized targets than when the tracking error and background context were left in. Alternate path and configuration choices were found to affect tracking and image quality metrics very differently.
A different type of redundant tracking involving a robotic arm was developed for a manipulator on a hovering subsea craft for the Deep Submergence Lab at Woods Hole Oceanographic Institution. The goal was to track moving objects and to combine that information with the arm kinematics to calibrate the lenses and cameras on the vehicle. Once calibrated, an imaging system with multiple cameras could then be used to provide stereo vision and to locate objects in the real world, wet or dry.
Part of the effort involved enhancing existing algorithms that computed the location of the centroid of a spherical object mounted on a robotic arm until it could be consistently located in an image to within 0.01 pixel. With this level of visual performance, the system was able to track a moving target with an accuracy of better than 3 millimeters even in the presence of occlusion of any one of the three cameras – a common occurrence in the ocean.
Postscript: A casual comment led to a wager that took the better part of a summer to achieve, but in the end the centroid localization was improved by another order of magnitude, to 0.001 pixel.
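The enhanced algorithms themselves are not reproduced here, but the core of sub-pixel localization is simple enough to sketch. This is a generic intensity-weighted centroid in Python, my own illustration, without the calibration and occlusion handling the Woods Hole system required:

import numpy as np

def subpixel_centroid(image, threshold_frac=0.1):
    # Intensity-weighted centroid of a bright blob, returned to sub-pixel
    # precision as (x, y) in pixel coordinates.
    img = np.asarray(image, dtype=float)
    img = np.where(img > threshold_frac * img.max(), img, 0.0)  # drop background
    total = img.sum()
    if total == 0.0:
        raise ValueError("no target above threshold")
    ys, xs = np.indices(img.shape)
    return (xs * img).sum() / total, (ys * img).sum() / total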
The
Binocular Camera Platform Home Page grew out of the PennEyes project and
provided a resource to a growing community of roboticists. To maximize
visibility, it was located on The Computer Vision Homepage website maintained
at CMU. The principal resource was a fairly complete listing of the motorized
binocular camera platforms that appeared in the literature. The listing
contained a description of the major components of each head and any available
email contacts for the associated researchers. The site also listed sources for
the various components required to build a platform – cameras, lenses, position
controllers, software, and commercial systems. Of all places, I found a perfect
candidate to be the patron saint of mobile robotic heads, St. Denis, on a door
of the Notre Dame Cathedral in Paris while on the way to a robotics conference
in the French Alps courtesy of the EU.
Building
on the Active Vision paradigm, Hany Farid and I proposed a technique to use
controllable sensors in a manner that would provide both an efficient and
effective method of telepresence. The paper describes in detail the problems of occlusion, lighting and focus. A solution is presented that acquires the necessary images with active vision and generates the required scene, using focus ranging to compute depth information and image stitching and warping to compose the views. It is demonstrated that a sparse array of active cameras could provide enough information to allow a remote viewer to become perceptually immersed in a distant scene to a degree at which applications such as telemedicine become viable.
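As an aside on the focus-ranging step, a minimal depth-from-focus sketch picks, for each pixel, the lens focus setting that maximizes a local sharpness measure. The measure and the structure below are assumptions for the illustration, not the algorithm from the paper:

import numpy as np

def sharpness(img):
    # Squared Laplacian (finite differences) as a simple local focus measure.
    lap = (-4.0 * img[1:-1, 1:-1] + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return lap ** 2

def depth_from_focus(stack, focus_distances):
    # stack: grayscale frames of the same scene at different lens focus settings
    # focus_distances: the distance each frame was focused at
    measures = np.stack([sharpness(np.asarray(f, dtype=float)) for f in stack])
    best = measures.argmax(axis=0)               # sharpest frame per pixel
    return np.asarray(focus_distances)[best]     # coarse per-pixel depth map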