Structure from Motion using Differential Invariants of Optical Flow.
Computer vision is concerned with inferring information about the three-dimensional (3D) world from two-dimensional (2D) images. The human visual system is adept at discerning quantities such as depth and motion, helping us interact with our environment without needing to come into direct contact with it [15]. It is hence desirable to emulate this proficiency observed in nature. Of particular interest is the role that visual motion plays in extracting information about our surroundings. Even in the absence of stereo, the monocular observer can still determine scene structure through movement; this in fact forms the basis of active vision [6]. The focus of this research is on image motion or optical flow analysis, especially the extraction of the first-order differential invariants of image velocity using a correlative filtering method.
Optical flow is an approximation of a scene's 2D motion field, typically derived from an image sequence. Two-dimensional image motion fields comprise the projection of the 3D velocities of objects in 3D space onto a 2D image plane. These velocities may result from viewer movement (or ego-motion), movement of scene objects, or a combination of both. Optical flow is only an approximation of the image motion field since assumptions are made about the lighting and texturing of surfaces. Where assumptions such as static light sources or adequate scene texture do not hold, optical flow can be present in places of zero motion and vice versa [15].
There has been much work done in the area of optical flow computation [3]. Most of the recently published literature in this field focuses on improving the robustness and accuracy of existing optical flow algorithms rather than developing new approaches [4,8,2,23]. In general, optical flow determination algorithms can be categorized into three approaches: intensity-based methods, energy-based methods, and correlation-based methods [5].
Gradient methods such as that by Sobey and Srinivasan [21,22] compute image velocities by calculating the spatial and temporal derivatives of image intensities, assuming that images are differentiable. Typically they involve finding the solution to an overdetermined system of linear equations where one constraint is the optical flow constraint, defined in Equation 1,

$$ I_x u + I_y v + I_t = 0 \qquad (1) $$

where $I_x$, $I_y$, and $I_t$ are the partial derivatives of the image intensity $I(x, y, t)$ and $(u, v)$ is the image velocity.
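As a minimal sketch of the gradient-based idea, the snippet below applies the optical flow constraint at every pixel of a small window and solves the resulting overdetermined linear system in the least-squares sense. This is a generic Lucas-Kanade-style local estimator chosen for illustration, not the method of Sobey and Srinivasan; the function name `estimate_window_velocity` is an assumption, and the estimate is only meaningful for displacements well under one pixel per frame.

```python
import numpy as np

def estimate_window_velocity(frame0, frame1):
    """Least-squares image velocity (u, v) for one window, in pixels per frame."""
    frame0 = np.asarray(frame0, dtype=float)
    frame1 = np.asarray(frame1, dtype=float)
    # Spatial derivatives of the mean frame and a two-frame temporal derivative.
    mean = 0.5 * (frame0 + frame1)
    I_y, I_x = np.gradient(mean)   # axis 0 is y (rows), axis 1 is x (columns)
    I_t = frame1 - frame0
    # One constraint I_x*u + I_y*v = -I_t per pixel: an overdetermined system.
    A = np.stack([I_x.ravel(), I_y.ravel()], axis=1)
    b = -I_t.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

A window that contains intensity variation in only one direction makes the system rank deficient (the aperture problem), which is why practical gradient methods add further constraints.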
Correlation or matching methods require regions or features to be tracked between images of an image sequence. Such methods are appealing when accurate differentiation of the image is impractical due to noise. Thus the recovery of the motion field is similar to solving the correspondence problem where point trajectories are interpreted as instantaneous velocity vectors. From this information, scene reconstruction can be treated as a classical projective geometry problem. However, unlike the previous methods, matching methods need to assume rigid body motion, and encounter problems when the scene contains several moving objects or occlusions [5].
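The matching idea can be sketched as an exhaustive block search: a patch from the first image is compared against shifted candidates in the second, and the shift with the smallest sum of squared differences is taken as the displacement. The function name `match_patch`, the patch size, and the search range are illustrative assumptions, and the search is restricted to integer displacements.

```python
import numpy as np

def match_patch(frame0, frame1, top, left, size=8, search=4):
    """Return the integer displacement (dx, dy) minimising the sum of squared differences."""
    patch = frame0[top:top + size, left:left + size]
    best, best_disp = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > frame1.shape[0] or c + size > frame1.shape[1]:
                continue  # candidate falls outside the image
            cand = frame1[r:r + size, c:c + size]
            ssd = float(np.sum((cand - patch) ** 2))
            if ssd < best:
                best, best_disp = ssd, (dx, dy)
    return best_disp
```

Because no image derivatives are taken, the result degrades gracefully under pixel noise, at the cost of a search whose size grows with the square of the maximum displacement.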
A further approach to optical flow computation is to use a phase representation, as done by Fleet [14]. Most algorithms, by assuming that image intensity is not time varying, approximate image motion as pure image translation. Fleet argues that the dynamics of an image's phase contours are a better approximation to the motion field. It is proposed that a phase-based approach need not assume pure image translation and performs well under image contrast variations and geometric deformation due to perspective.
Once optical flow has been determined, its applications are numerous. The most common uses for optical flow fields are ego-motion determination [7,12,24] and 3D scene reconstruction. Further applications include motion detection [27], object segmentation [20], motion compensation, and tracking [25,13]. There are, however, further properties of motion fields themselves, such as their divergence, vorticity, and deformation, that also provide us with extensive 3D scene information.
It has been shown that the change in shape of objects caused by relative motion between an object and observer can be decomposed into divergence, curl, and deformation components [16,17]. These three components can be geometrically interpreted as isotropic expansion, rigid rotation, and shear distortion. Divergence, curl, and deformation are called the first-order differential invariants of image motion fields, termed thus because their values are independent of the choice of coordinate system and viewer rotations about the projection centre. Moreover, these properties are directly related to 3D scene structure and ego-motion, and can be determined through their effect on scene geometry. Through these relationships, these three quantities can be used to derive information about surface orientations and time-to-contact, two quantities which are useful in obstacle avoidance problems [18,26].
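For reference, the three invariants can be written in terms of the first-order partial derivatives of the image velocity (u, v): divergence is u_x + v_y, curl is v_x - u_y, and the deformation magnitude is the length of the vector (u_x - v_y, u_y + v_x). The sketch below evaluates these directly by finite differences on a dense flow field. The function name `differential_invariants` is an assumption, and the sign of the curl depends on the orientation chosen for the image axes.

```python
import numpy as np

def differential_invariants(u, v, spacing=1.0):
    """Per-pixel divergence, curl and deformation magnitude of the flow (u, v)."""
    u_y, u_x = np.gradient(u, spacing)   # derivatives along rows (y) and columns (x)
    v_y, v_x = np.gradient(v, spacing)
    div = u_x + v_y                      # isotropic expansion
    curl = v_x - u_y                     # rigid rotation
    deformation = np.hypot(u_x - v_y, u_y + v_x)  # shear magnitude
    return div, curl, deformation
```

Direct differentiation of this kind amplifies noise in the flow field, which is one motivation for estimating the invariants over a region instead.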
Cipolla and Blake [10,11] have done work on how to derive surface orientation and time-to-contact from divergence and deformation information. They measured the change in the apparent area of objects to compute the divergence and deformation. This was done with B-spline snakes, which could prove problematic in the absence of trackable features. Work has also been done by Nelson and Aloimonos [19] on the use of divergence for obstacle avoidance, deriving it mathematically but needing to use many images over time to produce good results. The time-to-crash detector implemented by Ancona [1] utilizes optical flow rather than using divergence as in Cipolla and Blake [9].
Although there has been much work in optical flow determination, as well as research into the usefulness of the geometrical properties of optical flow (such as divergence, curl, and deformation), there is little work on how to connect the two. Apart from Cipolla's closed-curve tracking method, there are no other well-documented methods for deriving the differential invariants of image velocity. This research is therefore aimed at finding new ways to determine the differential invariants of optical flow fields from input optical flow data. It is hypothesized that this can be done using a simple filter correlation technique similar to that of signal deconstruction. Some preliminary experimentation using small images and filters has produced promising results. It is hoped that this new method can take advantage of the extensive research in motion field determination, whilst contributing to research in areas such as structure from motion.
| Date | Task |
| Apr 2004 | Complete literature review |
| Jul | Complete implementations of popular optical flow algorithms |
| Nov | Complete comparisons of these algorithms |
| Dec | Design of appropriate div, curl, and def filters |
| Jan 2005 | Start implementation of div, curl, and def extraction algorithms |
| Feb | Experimentation over different datasets |
| Apr | Investigate the effect of scale and noise |
| Oct | Start experimentation with scene reconstruction |
| Jan 2006 | Start thesis composition |
| Jun | Start thesis review |
| Aug | Thesis submission |
The first stage of this project will involve a comparison of the many optical flow determination algorithms found in contemporary literature. Implementation of these popular algorithms is needed for the evaluation of their performance over different data sets. Issues of importance include how these algorithms handle boundary discontinuities from occlusions, multiple moving objects, and transparency. Both real and synthetic data will be used to assess the accuracy and reliability of these algorithms whose optical flow outputs are needed for the next stage in the research.
The main part of the research involves the design of suitable filters for correlation with the output motion field data from the algorithms selected above. Filters need to be designed for each component of divergence, curl, and deformation. Correlation calculations determine the similarity between the filter and the underlying motion field. This way we can deduce how many 'units' of, say, divergence exist in a particular motion field. This is an alternative way to determine differential invariants without needing to track curves or areas, and uses existing optical flow techniques.
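One possible form such filters could take is sketched below, purely as an illustration: each filter is a small template flow field (pure expansion, pure rotation, and two shear patterns), and the response is the normalised correlation between the template and a window of the measured flow. The template shapes, the normalisation, and the names `make_filters` and `filter_responses` are assumptions made for this sketch, not the filters that this research will design.

```python
import numpy as np

def make_filters(radius):
    """Template flow fields (Tu, Tv) for expansion, rotation and two shear patterns."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    return {
        "div": (x, y),      # isotropic expansion
        "curl": (-y, x),    # rigid rotation
        "def1": (x, -y),    # shear along the image axes
        "def2": (y, x),     # shear along the diagonals
    }

def filter_responses(u, v, row, col, radius=3):
    """Correlate the flow window centred at (row, col) with each template."""
    win = (slice(row - radius, row + radius + 1), slice(col - radius, col + radius + 1))
    uw, vw = u[win], v[win]
    out = {}
    for name, (tu, tv) in make_filters(radius).items():
        # Scaled so that, for an affine flow, the response equals the invariant
        # itself (for example u_x + v_y for "div").
        out[name] = 2.0 * np.sum(uw * tu + vw * tv) / np.sum(tu ** 2 + tv ** 2)
    return out
```

With this scaling the deformation magnitude is the length of the vector (`def1`, `def2`), and a constant translation added to the flow leaves all four responses unchanged because each template sums to zero over the symmetric window.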
There are, however, several challenges that need to be overcome in designing correct filters. The first is the issue of scale. It will be necessary to define what a 'unit' of divergence, curl, or deformation is, and then to create a bank of filters of different scales to correctly deduce the differential invariants. Once the filter scales have been decided, a method will be needed to combine these filters so that they can be applied to motion fields.
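To illustrate why scale matters, the sketch below applies a hypothetical expansion template at several window radii to the same point of a flow field. The template, its normalisation, and the function names are assumptions for illustration only; no rule for combining the scales is proposed here, so the per-scale responses are simply returned.

```python
import numpy as np

def expansion_response(u, v, row, col, radius):
    """Divergence estimate from correlating one window with the template (x, y)."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    win = (slice(row - radius, row + radius + 1), slice(col - radius, col + radius + 1))
    return 2.0 * np.sum(u[win] * x + v[win] * y) / np.sum(x ** 2 + y ** 2)

def expansion_over_scales(u, v, row, col, radii=(1, 2, 4, 8)):
    """Responses of a bank of expansion filters, one per window radius."""
    return {r: expansion_response(u, v, row, col, r) for r in radii}
```

For an affine flow every radius returns the same value, so a 'unit' of divergence is consistent across the bank. When the divergence varies across the image, larger windows average over more of that variation and the responses differ, which is the ambiguity a combination rule would have to resolve.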
Something that has always troubled motion field analysis, and indeed almost any real-world application, is the presence of noise. The effect of noise will be first encountered in the extraction of motion fields from image sequences, and then once again when correlating the filters with noisy motion fields. An analysis is needed to examine the robustness and noise resistance of a correlative method for determining divergence, curl, and deformation, and any improvements it offers over existing curve and area tracking methods.
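A first, very simplified look at noise sensitivity can be sketched with a Monte Carlo experiment: a hypothetical expansion template is correlated with flow windows containing only noise, and the spread of the responses is compared with the value predicted for independent noise. The assumption of independent, zero-mean Gaussian noise on each flow vector is made here for convenience and is unlikely to hold for real optical flow, where errors are spatially correlated; the function name and the template are likewise illustrative.

```python
import numpy as np

def divergence_noise_std(radius, sigma=1.0, trials=2000, seed=0):
    """Empirical and predicted std of the expansion response to pure-noise flow."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    norm = np.sum(x ** 2 + y ** 2)
    # Independent Gaussian noise on both flow components, one window per trial.
    nu = rng.normal(0.0, sigma, size=(trials,) + x.shape)
    nv = rng.normal(0.0, sigma, size=(trials,) + x.shape)
    responses = 2.0 * np.sum(nu * x + nv * y, axis=(1, 2)) / norm
    # For independent noise the response variance is 4 * sigma**2 / norm.
    predicted = 2.0 * sigma / np.sqrt(norm)
    return responses.std(), predicted
```

Under this noise model the spread of the response falls quickly as the window grows, so noise resistance and spatial resolution pull in opposite directions when choosing a filter scale.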
The reconstruction of 3D scene structure requires the calculation of ego-motion and surface orientations. These quantities can be found from the divergence, curl, and deformation of the motion fields by first calculating the slant and tilt of surface normals. A possible issue when dealing with tilt will be defining an 'origin' for the camera coordinate system from which to measure tilt. This is not a problem for slant since it depends only on the viewing direction.
My supervisor and I have conducted literature searches and found no existing research that duplicates this project.