ROBOTICS AND VISION RESEARCH GROUP
Completed Research Theses
An Investigation into Energy Feature Detectors - Svetha Venkatesh (PhD) This work involved the study of the local energy feature detection model and its comparison with other feature detectors. In particular, projection properties were compared, and it was found that energy edge detectors are true projections; thus, in contrast to the Marr-Hildreth operation, for example, the energy feature detector was proved to be idempotent with respect to projection. The use of this detector in classifying 1D and 2D features was also examined.
Intelligent Robot Navigation - Mark Nelson (PhD) For an autonomous mobile robot to perform tasks and, more importantly, to survive in a dynamic environment, its control system must be capable of intelligent behaviour. This research attempted to achieve a better understanding of intelligent behaviour through the development of an intelligent navigational control system for a mobile robot. Questions investigated were: 1) What are the components of such a system? 2) How do these components interact? 3) How is such a system developed so that intelligent behaviour may evolve?
On the Use of Kinematic Redundancy in Robots - Lisa Beckley (MSc) Robot manipulators generally follow fixed motion commands and in most applications, potential obstacles are removed from the manipulator's workspace. This research involved the study of the control of kinematically redundant manipulators. The primary method of control used the pseudoinverse of the Jacobian matrix, which allows the robot to reconfigure away from obstacles. By extending the Jacobian matrix we were able to control the manipulator when the end effector is driven beyond its maximum reach or through singularities. Genetic algorithms were employed to generate a set of possible obstacle-avoiding configurations that satisfied a specified end effector position, and to generate configuration space mappings of obstacles.
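The pseudoinverse control described above can be illustrated with a minimal sketch. It assumes a planar three-link arm with a two-dimensional end effector position task, and it uses the standard null-space form qdot = J+ v + (I - J+ J) z; the abstract does not give the thesis's exact formulation, so the arm, the function names, and the secondary motion z are all illustrative assumptions.

```python
import math

def jacobian(thetas, lengths):
    """2 x n position Jacobian of a planar serial arm with revolute joints."""
    n = len(thetas)
    cum, total = [], 0.0
    for t in thetas:
        total += t
        cum.append(total)  # absolute orientation of each link
    J = [[0.0] * n for _ in range(2)]
    for j in range(n):
        # Joint j moves every link from j outward.
        J[0][j] = -sum(lengths[i] * math.sin(cum[i]) for i in range(j, n))
        J[1][j] = sum(lengths[i] * math.cos(cum[i]) for i in range(j, n))
    return J

def pseudoinverse(J):
    """Right pseudoinverse J^T (J J^T)^-1 of a full-rank 2 x n matrix."""
    n = len(J[0])
    a = sum(J[0][k] * J[0][k] for k in range(n))
    b = sum(J[0][k] * J[1][k] for k in range(n))
    d = sum(J[1][k] * J[1][k] for k in range(n))
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("Jacobian is singular at this configuration")
    inv = [[d / det, -b / det], [-b / det, a / det]]
    return [[J[0][k] * inv[0][c] + J[1][k] * inv[1][c] for c in range(2)]
            for k in range(n)]

def joint_velocity(J, v, z):
    """Joint rates that track end effector velocity v and use z for reconfiguration."""
    Jp = pseudoinverse(J)
    n = len(J[0])
    primary = [Jp[k][0] * v[0] + Jp[k][1] * v[1] for k in range(n)]
    # End effector motion that z alone would cause.
    Jz = [sum(J[r][k] * z[k] for k in range(n)) for r in range(2)]
    # Subtract it, leaving only the part of z that lies in the null space of J.
    null = [z[k] - (Jp[k][0] * Jz[0] + Jp[k][1] * Jz[1]) for k in range(n)]
    return [primary[k] + null[k] for k in range(n)]

def end_effector_velocity(J, qdot):
    """Forward map J qdot, used here to check the result."""
    return [sum(J[r][k] * qdot[k] for k in range(len(qdot))) for r in range(2)]
```

With v set to zero, the returned joint rates reshape the arm while the end effector stays still, which is the freedom a redundant manipulator can spend on moving its links away from an obstacle.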
Exploring and Utilising Features in Natural Images - Brian Aw (PhD) Edge detection, or more generally, feature detection, has been widely known as an important process in early vision analysis. The phase congruency feature model, proposed by Morrone and Owens in 1987, predicts the accurate location of features in images perceived by the human visual system. To achieve efficient computation, the phase congruency model is implemented using the local energy model. This thesis has shown the discrepancy between the two models and has concluded that the local energy model is a Taylor approximation of the phase congruency model. The conditions which the 1-dimensional implementation of both the phase congruency model and the local energy model must satisfy to achieve stability in the location of feature points with respect to the chosen pixel grid orientation in an image are also studied in this thesis. Unlike the phase congruency model and the local energy model which concentrate on locating feature points, this thesis takes the detection of image features a further step by examining the luminance behaviour of a feature in the local neighbourhood of the point detected by the models. A neural network has been devised to perform the analysis. Various results are obtained. A graded step feature has been found to be the predominant feature in most images. Features are also similar across a class of images and across spatial scales. While not all feature types are present at all scales of an image, features of a more complex image are found to adequately represent features of a simpler image. The last item of the above-mentioned results has a useful application in the area of image compression and reconstruction. Pointers to a common catalogue of features are maintained for every feature point of an image, and reconstruction is done via the coded pointers. 
The encoded features are found to contain such rich information that even the smooth, featureless regions can be reasonably reconstructed through the propagation of the luminance function from the feature regions. Comparisons with the well-known JPEG image compression technique have been made. The results show the different emphasis of the two techniques: the compression technique proposed in this thesis concentrates its resources in the feature regions whereas the JPEG method tries to achieve overall minimum error rate. Therefore the compression ratio attainable by the method proposed in this thesis depends on the complexity of features in an image. In an (almost) extreme case where the image contains only a small black character in the centre of a white background of 256x256 pixels, this method achieves a compression ratio of 182:1 with error free reconstruction at the character's features, while the JPEG method obtains a ratio of 56:1 with 33% of error at the features.
Feature-Based Stereo Vision on a Mobile Platform - Du Huynh (PhD) It is commonly known that stereopsis is the primary way for humans to perceive depth. Although, with one eye, we can still interact very well with our environment and perform highly skilled tasks by using other visual cues such as occlusion and motion, the resultant effect of the absence of stereopsis is that the relative depth information between objects is essentially lost. While humans fuse the images seen by the left and right eyes in a seemingly easy way, the major problem -- the correspondence of features -- that needs to be solved in all binocular stereo systems of machine vision is not trivial. In this thesis, line segments and corners are chosen to be the features to be matched because they typically occur at object boundaries, surface discontinuities, and across surface markings. Polygonal regions are also selected since they are known to be well-configured and are, very often, associated with salient structures in the image. The use of these high level features, although helping to diminish matching ambiguities, does not completely resolve the matching problem when the scene contains repetitive structures. The spatial relationships between the feature matching pairs enforced in the stereo matching process, as proposed in this thesis, are found to provide even stronger support for correct feature matching pairs and, as a result, incorrect matching pairs can be largely eliminated. Getting global and salient 3D structures has been an important prerequisite for environmental modelling and understanding. While research on postprocessing the 3D information obtained from stereo has been attempted, the strategy presented in this thesis for retrieving salient 3D description is transferring the prominent information extracted from the 2D images to the 3D scene. 
Thus, the matching of two prominent 2D polygonal regions yields a prominent 3D region, and the inter-relation between two 2D region matching pairs is passed on and taken as a relationship between two 3D regions. Humans, when observing and interacting with the environment, do not confine themselves to the observation and then the analysis of a single image. Similarly, stereopsis can be vastly improved with the introduction of additional stereo image pairs. Eye, head, and body movements provide essential mobility for an active change of viewpoints, the disocclusion of occluded objects, the avoidance of obstacles, and the performance of any necessary tasks at hand. This thesis presents a mobile stereo vision system that has its eye movements provided by a binocular head support and stepper motors, and its body movements provided by a mobile platform, the Labmate. With a viewer-centered coordinate system proposed in this thesis, the computation of the 3D information observed at each individual viewpoint, the merging of the 3D information at consecutive viewpoints for environmental reconstruction, and strategies for movement control are discussed in detail. Download PhD thesis - 2.7M gzipped postscript.
Surface Modelling and Surface Following for Robots Equipped with Range Sensors - Chris Pudney (PhD) The construction of surface models from sensor data is an important part of perceptive robotics. When the sensor data are obtained from fixed sensors, the problem of occlusion arises. To overcome occlusion, sensors may be mounted on a robot that moves the sensors over the surface. In this thesis the sensors are single-point range finders. The range finders provide a set of sensor points, that is, the surface points detected by the sensors. The sets of sensor points obtained during the robot's motion are used to construct a surface model. The surface model is used in turn in the computation of the robot's motion, so surface modelling is performed on-line, that is, the surface model is constructed incrementally from the sensor points as they are obtained. A planar polyhedral surface model is used that is amenable to incremental surface modelling. The surface model consists of a set of model segments, where a neighbour relation allows model segments to share edges. Also sets of adjacent shared edges may form corner vertices. Techniques are presented for incrementally updating the surface model using sets of sensor points. Various model segment operations are employed to do this: model segments may be merged, fissures in model segment perimeters are filled, and shared edges and corner vertices may be formed. Details of these model segment operations are presented. The robot's control point is moved over the surface model at a fixed distance. This keeps the sensors around the control point within sensing range of the surface, and keeps the control point from colliding with the surface. The remainder of the robot body is kept from colliding with the surface by using redundant degrees-of-freedom. The goal of surface modelling and surface following is to model as much of the surface as possible. 
The incomplete parts of the surface model (non-shared edges) indicate where sections of surface that have not been exposed to the robot's sensors lie. The direction of the robot's motion is chosen such that the robot's control point is directed to non-shared edges, and then over the unexposed surface near the edge. These techniques have been implemented and results are presented for a variety of simulated robots combined with real range sensor data. Download PhD thesis - 270K gzipped postscript.
Noncontact 3D Biological Shape Measurement - Youmei Ge (MSc) Many clinically important applications require measurements on a large portion of the human body surface that may not be visible from a single view. For example, a single view may be insufficient for the measurement of a complete facial surface for facial plastic surgery. Observing breast surfaces from multiple views is needed for accurate breast volume measurement. On the other hand, most 3D vision systems only recover 3D data from a single viewpoint, and the recovered 3D data are often incomplete due to the occlusion problem and thus cannot uniquely define the surface. A unique and more complete description of the surface is necessary for most applications such as measuring area or volume and finding the best 3D registration between corresponding surfaces. This thesis describes a structured light based system for fast and noncontact 3D measurement of the human body from multiple views. A particular application of our system is the study of human lactation through measuring the breast surface and volume. Fast, accurate, non-contact, and biologically safe measurement is the key requirement in our application. We use structured light to fulfill the requirement. Based on SHAPE, a single view structured light system developed at Monash University, our system for breast volume measurement generates more complete 3D information on object surfaces by observing the object from more than one viewpoint. The breast volume is computed using the integrated data from all views. We present a simple method that performs 3D measurement from multiple views simultaneously. Combined with a camera and a projector, a mirror is used in the method to create an additional viewpoint to recover the occluded regions that are illuminated by the light source but were previously invisible to the camera. Images from the two views, one directly seen by the camera and the other seen via the mirror, are taken simultaneously. 
We develop the method for the purpose of achieving more complete measurements without increasing image capture time, which is very useful in situations where both speed and accuracy are important. The complete 3D description of the surface of objects requires the acquisition of several images from different vantage viewpoints. Each image contains information on the part of the object that is visible from its viewpoint. A very important task consists in the integration of the information present in each view. We have developed a two view system to achieve a more complete breast volume measurement. The system uses a stationary sensor at each view. Our system can largely eliminate the occlusion regions produced by a single view system, and all data from different views are integrated into an object-centered coordinate system and resampled by a single parametric grid. The system has been used to accurately measure short-term changes in breast volume for lactating mothers. Currently, the system is also used to observe the breast volume change of pregnant women over many weeks' time. Download MSc thesis - 615K gzipped postscript.
Visual Guidance of Robot Motion - Li Fang Gu (MSc) Future robots are expected to cooperate with humans in daily activities. Efficient cooperation requires new techniques for transferring human skills to robots. This thesis presents an approach on how a robot can extract and replicate a motion by observing how a human instructor conducts it. In this way, the robot can be taught without any explicit instructions and the human instructor does not need any expertise in robot programming. A system has been implemented which consists of two main parts. The first part is data acquisition and motion extraction. Vision is the most important sensor with which a human can interact with the surrounding world. Therefore two stereo cameras are used to capture the image sequences of a moving rigid object. In order to compress the incoming images from the cameras and extract 3D motion information of the rigid object, feature detection and tracking are applied to the images. Corners are chosen as the main features because they are more stable under perspective projection and during motion. A reliable corner detector is implemented and a new corner tracking algorithm is proposed based on smooth motion constraints. With both spatial and temporal constraints, 3D trajectories of a set of points on the object can be obtained and the 3D motion parameters of the object can be reliably calculated by the algorithm proposed in this thesis. Once the 3D motion parameters are available through the vision system, the robot should be programmed to replicate this motion. Since we are interested in smooth motion and the similarity between two motions, the task of the second part of our system is therefore to extract motion characteristics and to transfer these to the robot. It can be proven that the characteristics of a parametric cubic B-spline curve are completely determined by its control points, which can be obtained by the least-square fitting method, given some data points on the curve. 
Therefore a parametric cubic B-spline curve is fitted to the motion data and its control points are calculated. Given the robot configuration, the obtained control points can be scaled, translated, and rotated so that a motion trajectory can be generated for the robot to replicate the given motion in its own workspace with the required smoothness and similarity, although the absolute motion trajectories of the robot and the instructor can be different. All the above modules have been integrated and results of an experiment with the whole system show that the approach proposed in this thesis can extract motion characteristics and transfer these to a robot. A robot arm has successfully replicated a human arm movement with similar shape characteristics by our approach. In conclusion, such a system collects human skills and intelligence through vision and transfers them to the robot. Therefore, a robot with such a system can interact with its environment and learn by observation. Download MSc thesis - 1.4M gzipped postscript.
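The reason control points are a convenient carrier for motion transfer can be shown with a small sketch. It assumes a uniform cubic B-spline in 2D (the thesis works with 3D motion, and its knot choice is not stated in the abstract), and the helper names are hypothetical. Because the four basis weights sum to one, scaling, rotating, and translating the control points gives exactly the scaled, rotated, and translated curve.

```python
import math

def bspline_point(ctrl, seg, u):
    """Point on segment `seg` of a uniform cubic B-spline, with u in [0, 1]."""
    b0 = (1 - u) ** 3 / 6
    b1 = (3 * u ** 3 - 6 * u ** 2 + 4) / 6
    b2 = (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6
    b3 = u ** 3 / 6
    pts = ctrl[seg:seg + 4]
    # zip(*pts) groups the x values together, then the y values.
    return tuple(b0 * p0 + b1 * p1 + b2 * p2 + b3 * p3
                 for p0, p1, p2, p3 in zip(*pts))

def sample_curve(ctrl, per_segment=5):
    """Sample every segment of the spline defined by the control points."""
    return [bspline_point(ctrl, seg, i / per_segment)
            for seg in range(len(ctrl) - 3)
            for i in range(per_segment)]

def similarity(p, scale, angle, shift):
    """Scale, rotate, then translate a 2D point."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = p
    return (scale * (c * x - s * y) + shift[0],
            scale * (s * x + c * y) + shift[1])

# Hypothetical control points standing in for a fitted instructor motion.
instructor_ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 1.0), (6.0, 0.5)]

# Map the control points into an assumed robot workspace.
robot_ctrl = [similarity(p, 0.5, math.pi / 6, (2.0, -1.0)) for p in instructor_ctrl]
robot_curve = sample_curve(robot_ctrl)
```

Transforming five control points is therefore enough to move the whole trajectory; the sampled curve never has to be refitted in the robot's workspace.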
The Detection of 2D Image Features via Local Energy - Ben Robbins (PhD) Accurate detection and localization of two-dimensional (2D) image features (or `key-points') is important for vision tasks such as structure from motion, stereo matching, and line labeling. 2D image features are ideal for these vision tasks because 2D image features are high in information and yet they occur sparsely in typical images. Several methods for the detection of 2D image features have already been developed. However, it is difficult to assess the performance of these methods because no one has produced an adequate definition of corners that encompasses all types of 2D luminance variations that make up 2D image features. The fact that there does not exist a consensus on the definition of 2D image features is not surprising given the confusion surrounding the definition of 1D image features. The general perception of 1D image features has been that they correspond to `edges' in an image and so are points where the intensity gradient in some direction is a local maximum. The Sobel, Canny and Marr-Hildreth operators all use this model of 1D features, either implicitly or explicitly. However, other profiles in an image also make up valid 1D features, such as spike and roof profiles, as well as combinations of all these feature types. Spikes and roof profiles can also be found by looking for points where the rate of change of the intensity gradient is locally maximal, as Canny did in defining a `roof-detector' in much the same manner that he developed his `edge-detector'. While this allows the detection of a wider variety of 1D feature profiles, it comes no closer to the goal of unifying these different feature types to an encompassing definition of 1D features. The introduction of the local energy model of image features by Morrone and Owens in 1987 provided a unified definition of 1D image features for the first time. 
They postulated that image features correspond to points in an image where there is maximal phase congruency in the frequency domain representation of the image. That is, image features correspond to points of maximal order in the phase domain of the image signal. These points of maximal phase congruency correspond to step-edge, roof, and ramp intensity profiles, and combinations thereof. They also correspond to the Mach bands perceived by humans in trapezoidal feature profiles. This thesis extends the notion of phase congruency to 2D image features. As 1D image features correspond to points of maximal 1D order in the phase domain of the image signal, this thesis contends that 2D image features correspond to maximal 2D order in this domain. These points of maximal 2D phase congruency include all the different types of 2D image features, including grey-level corners, line terminations, blobs, and a variety of junctions. Early attempts at 2D feature detection were simple `corner detectors' based on a model of a grey-level corner in much the same way that early 1D feature detectors were based on a model of step-edges. Some recent attempts have included more complex models of 2D features, although this is basically a more complex a priori judgement of the types of luminance profiles that are to be labeled as 2D features. This thesis develops the 2D local energy feature detector based on a new, unified definition of 2D image features that marks points of locally maximum 2D order in the phase domain representation of the image as 2D image features. The performance of an implementation of 2D local energy is assessed, and compared to several existing methods of 2D feature detection. This thesis also shows that in contrast to most other methods of 2D feature detection, 2D local energy is an idempotent operator. The extension of phase congruency to 2D image features also unifies the detection of image features. 
1D and 2D image features correspond to 1D and 2D order in the phase domain representation of the image respectively. This definition imposes a hierarchy of image features, with 2D image features being a subset of 1D image features. This ordering of image features has been implied ever since 1D features were used as candidate points for 2D feature detection by Kitchen and others. Local energy enables the extraction of both 1D and 2D image features in a consistent manner; 2D image features are extracted from the 1D image features using the same operations that are used to extract 1D image features from the input image. The consistent approach to the detection of image features presented in this thesis allows the hierarchy of primitive image features to be naturally extended to higher order image features. These higher order image features can then also be extracted from higher order image data using the same hierarchical approach. This thesis shows how local energy can be naturally extended to the detection of 1D (surface) and higher order image features in 3D data sets. Results are presented for the detection of 1D image features in 3D confocal microscope images, showing superior performance to the 3D extension of the Sobel operator. Download PhD thesis - 2.9M gzipped postscript.
Invariant Measures of Image Features From Phase Information - Peter Kovesi (PhD) If reliable and general computer vision techniques are to be developed it is crucial that we find ways of characterizing low-level image features with invariant quantities. For example, if edge significance could be measured in a way that was invariant to image illumination and contrast, higher-level image processing operations could be conducted with much greater confidence. However, despite their importance, little attention has been paid to the need for invariant quantities in low-level vision for tasks such as feature detection or feature matching. This thesis develops a number of invariant low-level image measures for feature detection, local symmetry/asymmetry detection, and for signal matching. These invariant quantities are developed from representations of the image in the frequency domain. In particular, phase data is used as the fundamental building block for constructing these measures. Phase congruency is developed as an illumination and contrast invariant measure of feature significance. This allows edges, lines and other features to be detected reliably, and fixed thresholds can be applied over wide classes of images. Points of local symmetry and asymmetry in images give rise to special arrangements of phase, and these too can be characterized by invariant measures. Finally, a new approach to signal matching that uses correlation of local phase and amplitude information is developed. This approach allows reliable phase based disparity measurements to be made, overcoming many of the difficulties associated with scale-space singularities. Download directory list PhD thesis - 14.7M gzipped postscript (in 10 parts).
Mobile Robot Navigation in a Semi-Structured Indoor Environment Using Passive Visual Beacons and Active Sensors - Rajiv Ellepola (MSc) In this thesis, a new technique for mobile robot navigation through a semi-structured, dynamic indoor environment is presented. The robot navigates through the environment using passive visual beacons mounted on the ceiling of the building. The overhead visual beacons are tracked by an upward pointing CCD camera mounted on the mobile robot. By mounting the beacons on the ceiling of the building, one ensures that the beacons lie in a fixed plane relative to the robot. This enables the position of the robot to be determined from the location of the beacons in the camera image. A network of coded beacons in the building are represented as a graph, allowing robot destinations to be specified by a user, and paths planned automatically by the robot. Ultrasonic sensors mounted around the perimeter of the robot are used to sense the environment of the robot. When obstacles are detected in the path of the robot, it is necessary for the robot to navigate around the obstacle to return to its original path of motion. However interpretation of ultrasonic sensor signals is difficult due to the wide beam width and false readings as a result of specular reflection. Specular reflection can also induce crosstalk errors in the sensor readings. A technique for minimizing these errors is presented in this thesis. When the robot detects an obstacle in its path of travel, a recovery procedure needs to be activated to avoid collision, and on completion of the recovery procedure the original motion of the robot has to be resumed. In this thesis, a robot motion control strategy known as recursive motion planning is implemented for the motion control of the mobile robot. Keywords: Mobile Robot, Visual Beacons, Ultrasonic Sensors, Obstacle Avoidance, Crosstalk, Robot Navigation, Recursive Motion Planning.
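Representing the beacon network as a graph makes path planning a shortest-path search. The sketch below assumes Dijkstra's algorithm over a weighted adjacency dictionary; the abstract does not say which planner the thesis uses, and the beacon names and distances are made up for illustration.

```python
import heapq

def plan_path(graph, start, goal):
    """Return (beacon sequence, total distance), or (None, inf) if unreachable."""
    dist = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for nxt, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(queue, (nd, nxt))
    if goal not in dist:
        return None, float("inf")
    # Walk the predecessor links back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[goal]

# Hypothetical ceiling beacons; edge weights are corridor distances in metres.
beacons = {
    "B1": {"B2": 4.0, "B3": 2.0},
    "B2": {"B1": 4.0, "B4": 5.0},
    "B3": {"B1": 2.0, "B4": 8.0, "B5": 10.0},
    "B4": {"B2": 5.0, "B3": 8.0, "B5": 2.0},
    "B5": {"B3": 10.0, "B4": 2.0},
    "B6": {},  # a beacon with no recorded connections
}

route, length = plan_path(beacons, "B1", "B5")
```

A user would only name the destination beacon; the robot then follows the returned sequence, using each beacon's position in the camera image to correct its own position along the way.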
Visual Recognition of Hand Motion - Eun Jung Holden (PhD) Hand gesture recognition has been an active area of research in recent years, being used in various applications from deaf sign recognition systems to human-machine interaction applications. The gesture recognition process, in general, may be divided into two stages: the motion sensing, which extracts useful data from hand motion; and the classification process, which classifies the motion sensing data as gestures. The existing vision based gesture recognition systems extract 2-D shape and trajectory descriptors from the visual input, and classify them using various classification techniques from maximum likelihood estimation to neural networks, finite state machines, fuzzy associative memory, or hidden Markov models. This thesis presents the framework of the vision based Hand Motion Understanding (HMU) system that recognises static and dynamic Australian Sign Language (Auslan) signs by extracting and classifying 3-D hand configuration data from the visual input. The HMU system is a pioneering gesture recognition system that uses a combination of a 3-D hand tracker for motion sensing, and an adaptive fuzzy expert system for classification. The HMU 3-D hand tracker extracts 3-D hand configuration data that consists of 21 parameters representing the degrees-of-freedom of the hand from the visual input from a single camera. It does this with the aid of a colour-coded glove. The tracker uses a model-based motion tracking algorithm that makes incremental corrections to the 3-D model parameters to re-configure the model to fit the hand posture appearing in the images through the use of a Newton style optimisation technique. Finger occlusions are handled to a certain extent, by recovering the missing hand features in the images through the use of a prediction algorithm. 
The HMU classifier, then, recognises the sequence of 3-D hand configuration data as a sign by using an adaptive fuzzy expert system where the sign knowledge can be used as inference rules. The classification is performed in two stages. Firstly, for each image, the classifier recognises Auslan basic hand postures that categorise the Auslan signs like alphabets in English. Secondly, the sequence of Auslan basic hand postures that are appearing in the image sequence is analysed and recognised as a sign. Both the posture and sign recognition are performed by the same adaptive fuzzy inference engine. The HMU knowledge base consists of 22 Auslan basic hand postures, and 22 signs. For evaluation, 44 motion sequences (2 for each of the 22 signs) are recorded and among them, 22 randomly chosen sequences are used for testing and the rest are used for training. The evaluation shows that before training the HMU system correctly recognised 20 out of 22 signs and for the three failed cases, the system did not produce output. After training, with the same 22 test sequences, the HMU system recognised 21 signs correctly and for two failed cases did not produce any output. Download PhD thesis - 2.1M gzipped postscript.
Clustering with Genetic Algorithms - Rowena Cole (MSc)
Clustering is the search for those partitions that reflect the structure of an object set.
Traditional clustering algorithms search only a small sub-set of all possible clusterings (the solution space) and consequently, there is no guarantee that the solution found will be optimal.
We report here on the application of Genetic Algorithms (GAs) - stochastic search algorithms touted as effective search methods for large and complex spaces - to the problem of clustering.
GAs which have been made applicable to the problem of clustering (by adapting the representation, fitness function, and developing suitable evolutionary operators) are known as Genetic Clustering Algorithms (GCAs). Download MSc thesis - 452K gzipped postscript.
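A toy genetic clustering algorithm shows how the three adaptations named above fit together. Everything specific here is an assumption rather than the thesis's design: the representation is one cluster label per object, the fitness is the within-cluster sum of squares (lower is better), and the operators are one-point crossover and random label mutation with truncation selection.

```python
import random

def within_cluster_ss(points, labels, k):
    """Sum of squared distances from each point to its cluster mean (1D data)."""
    total = 0.0
    for c in range(k):
        members = [p for p, label in zip(points, labels) if label == c]
        if members:
            mean = sum(members) / len(members)
            total += sum((p - mean) ** 2 for p in members)
    return total

def genetic_clustering(points, k=2, pop_size=40, generations=200,
                       mutation_rate=0.2, seed=0):
    """Evolve label vectors; returns the lowest-cost labelling found."""
    rng = random.Random(seed)
    n = len(points)

    def cost(individual):
        return within_cluster_ss(points, individual, k)

    # Representation: individual[i] is the cluster label of points[i].
    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[:pop_size // 2]  # best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)         # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally reassign an object to a random cluster.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return min(population, key=cost)

# Two well-separated groups of hypothetical 1D measurements.
data = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
best = genetic_clustering(data)
```

The search is stochastic, so on harder data a run may stop at a good but non-optimal partition; the fixed seed only makes this sketch repeatable.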
Reinforcement Learning and Approximation Complexity - Matthew McDonald (PhD) Many tasks can easily be posed as the problem of responding to the states of an external world with actions that maximise the reward received over time. Algorithms that reliably solve such problems exist. However, their worst-case complexities are typically more than proportional to the size of the state space in which a task is to be performed. Many simple tasks involve enormous numbers of states, which can make the application of such algorithms impractical. This thesis examines reinforcement learning algorithms which effectively learn to perform tasks by constructing mappings from states to suitable actions. In problems involving large numbers of states, these algorithms usually must construct approximate, rather than exact solutions, and the primary issue examined in this thesis is the way in which the complexity of constructing adequate approximations scales as the size of a state space increases. The vast majority of reinforcement learning algorithms operate by constructing estimates of the long-term value of states and using these estimates to select actions. The potential effects of errors in such estimates are examined and shown to be severe. Empirical results are presented which suggest that minor errors are likely to result in significant losses in many problems, and where such losses are most likely to occur. The complexity of constructing estimates accurate enough to prevent significant losses is also examined empirically and shown to be substantial. The difficulties associated with constructing usefully accurate value estimates suggest that algorithms which avoid constructing such estimates and instead construct direct mappings from state descriptions to actions may be more reliable in practice. A novel algorithm based on this hypothesis is described and shown to outperform analogous value-based approaches using a collection of several tasks and approximation methods. 
Although the performance of the new algorithm is better than that of value-based analogues, it suffers from similar limitations. I argue that much of the complexity associated with these approaches is related to the fact that they are designed to construct optimal solutions. In many problems, optimal solutions may be difficult to construct, and, as the relationship between the quality of value estimates and their usefulness is only loose, the complexity of constructing approximations to an optimal solution that are sufficiently accurate to be useful may also be high. For many problems, it may be substantially simpler to construct pessimistic estimates that systematically avoid over-estimating state values, and therefore provide assurance of a certain, typically sub-optimal, level of performance. An algorithm for constructing pessimistic value estimates for deterministic problems involving goal states is proposed, and experimental results are described which demonstrate that the complexity of the new algorithm may be markedly lower in practice than that of similar non-pessimistic algorithms when only minimally adequate solutions are required. Empirical results are also presented which show that when value estimates must be constructed from samples of a problem's states, estimates constructed by the pessimistic algorithm are often significantly more reliable than those constructed by comparable non-pessimistic algorithms. Finally, extensions to the pessimistic algorithm are described that enable it to provide bounds on the sub-optimality of its estimates, deal with non-deterministic tasks and tasks without goal states, and problems for which an a priori model of the world is not available. Download PhD thesis - 481K gzipped postscript.
Local Energy Feature Tracing in Digital Images and Volumes - Michael Robins (PhD) Digital image feature detectors often comprise two stages of processing: an initial filtering phase and a secondary search stage. The initial filtering is designed to accentuate specific feature characteristics or suppress spurious components of the image signal. The second stage of processing involves searching the results for various criteria that will identify the locations of the image features. The local energy feature detection scheme combines the squares of the signal convolved with a pair of filters that are in quadrature with each other. The resulting local energy value is proportional to phase congruency, which is a measure of the local alignment of the phases of the signal's constituent Fourier components. Points of local maximum phase alignment have been shown to correspond to visual features in the images. The local energy calculation accentuates the location of many types of image features, such as lines, edges and ramps, and estimates of local energy can be calculated in multi-dimensional image data by rotating the quadrature filters to several orientations. The second stage search criterion for local energy is to locate the points that lie along the ridges in the energy map that connect the points of local maxima. In three-dimensional data the relatively higher energy values will form films between connecting filaments and tendrils. This thesis examines the use of recursive spatial domain filtering to calculate local energy. A quadrature pair of filters, based on the first derivative of the Gaussian function and its Hilbert transform, is rotated in space using a kernel of basis functions to obtain various orientations of the filters. The kernel is designed to be separable and each term is implemented using a recursive digital filter. 
Once local energy has been calculated, the ridges and surfaces of high energy values are determined using a flooding technique. Starting from the points of local minima we perform an ablative skeletonisation of the higher energy values. The topology of the original set is maintained by examining and preserving the topology of the neighbourhood of each point when considering it for removal. This combination of homotopic skeletonisation and sequential processing of each level of energy values results in a well-located, thinned and connected tracing of the ridges. The thesis contains examples of the local energy calculation using steerable recursive filters and the ridge tracing algorithm applied to two- and three-dimensional images. Details of the algorithms are contained in the text and details of their computer implementation are provided in the appendices. Download PhD thesis - 2.6M gzipped postscript.
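The local energy idea described in this abstract can be illustrated with a minimal one-dimensional sketch. It is not the thesis's method: instead of steerable recursive Gaussian-derivative filters, it assumes the quadrature pair is simply the mean-removed signal and its Hilbert transform, computed with a naive discrete Fourier transform. The function names `dft` and `local_energy` are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short signals)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def local_energy(signal):
    """Return even^2 + odd^2 at each sample of a 1D periodic signal.

    Assumption: the even component is the mean-removed signal and the odd
    component is its Hilbert transform, obtained from the analytic signal.
    """
    n = len(signal)
    mean = sum(signal) / n
    spectrum = dft([s - mean for s in signal])
    # Analytic signal: keep DC (and Nyquist), double positive frequencies,
    # zero the negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0
    analytic = dft([w * c for w, c in zip(weights, spectrum)], inverse=True)
    return [z.real ** 2 + z.imag ** 2 for z in analytic]


# A periodic step edge: energy peaks at the samples next to the jumps.
step = [-1.0] * 32 + [1.0] * 32
energy = local_energy(step)
peak = max(range(len(energy)), key=lambda i: energy[i])
```

For the step signal above, the energy is largest at the samples adjacent to the two jumps (the signal is treated as periodic, so the wrap-around is also an edge) and falls to the squared amplitude in the middle of each plateau.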
A Multiresolution Time-Frequency Analysis and Interpretation of Musical Rhythm - Leigh Smith (PhD) A fundamental notion of music is its organisation in time, namely rhythm. It is clear this time ordering is crucial for listeners to interpret music. This notion has defied formal description. The construction of computational models of music has shown considerable problems representing musical time, in particular, representing structure over time spans longer than short motives and the effects of expressive timing, its rubato. The new approach investigated here is to represent rhythm in terms of time varying frequencies of events. To do so, musical rhythm is cast into signal processing terms of amplitude and frequency modulation of rectified sound signals. This approach reinterprets a musician's performance in terms of a conceived rhythmic signal. The actual rhythm performed is then a sampling of that conceived signal. This is demonstrated as also applicable to MIDI signals. Morlet and Grossmann's wavelets are used to produce the time-frequency representation of the rhythmic signal. These have the property of determining frequency change over time well. When applied to rhythms, they explicitly represent the multiple time scales (music theoretical strata) as spectral components of a rhythmic signal. The application of this multiresolution analysis to example rhythms reveals and explains duration, agogic and intensity accents, accelerando and decelerando, rubato and grouping. The multiresolution analysis is then applied to the well-known problem of foot-tapping to a performed rhythm. Using a correlation of frequency modulation ridges extracted using techniques of stationary phase, dilation scale derivatives and local phase congruency, the tactus rate of the performed rhythm is automatically identified, and from that, a foot-tap rhythm is synthesised. 
This approach accounts for expressive timing and is demonstrated on rhythms exhibiting asymmetrical rubato and grouping. The research demonstrates the ability to use frequency domain analytical tools such as wavelets to produce formal measures of performed rhythms which match concepts from musicology and music cognition. This approach then forms the basis for future research in cognitive models of rhythm based on interpretation of the time-frequency components. Applications for this research include interpretation of rubato for note-event time editing, automated transcription and quantization. A future application is a real-time version for interactive performance and accompaniment.
On Evolving Modular Neural Networks - Rameri Salama (PhD) The basis of this thesis is the presumption that while neural networks are useful structures that can be used to model complex, highly non-linear systems, current methods of training the neural networks are inadequate in some problem domains. Genetic algorithms have been used to optimise both the weights and architectures of neural networks, but these approaches do not treat the neural network in a sensible manner. In this thesis, I define the basis of computation within a neural network as a single neuron and its associated input connections. Sets of these neurons, stored in a matrix representation, comprise the building blocks that are transferred during one or more epochs of a genetic algorithm. I develop the concept of a Neural Building Block and two new genetic algorithms are created that utilise this concept. The first genetic algorithm utilises the micro neural building block (micro-NBB): a unit consisting of one or more neurons and their input connections. The micro-NBB is a unit that is transmitted through the process of crossover (and hence requires the introduction of a new crossover operator). However, the micro-NBB cannot be stored as a reusable component, and must exist only as the product of the crossover operator. The macro neural building block (macro-NBB) is utilised in the second genetic algorithm, and encapsulates the idea that fit neural networks contain fit sub-networks that need to be preserved across multiple epochs. A macro-NBB is a micro-NBB that exists across multiple epochs. This necessitates the use of a genetic store, and a new operator to introduce macro-NBBs back into the population at random intervals. Once the theoretical presentation is completed, the newly developed genetic algorithms are used to evolve weights for a variety of architectures of neural networks to demonstrate the feasibility of the approach. 
Comparison of the new genetic algorithm with other approaches is very favourable on two problems: a multiplexer problem, and a robot control problem. Download PhD thesis - 4.4M gzipped postscript.
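To make the idea of a neuron-level building block concrete, the following is a minimal sketch of a crossover that exchanges whole neurons (a row of input weights) rather than individual weights. It is an illustration only and not the operator defined in the thesis: the function name `neuron_crossover`, the row-per-neuron layout, and the 50/50 donor choice are all assumptions.

```python
import random


def neuron_crossover(parent_a, parent_b, rng):
    """Build a child layer by copying each neuron whole from one parent.

    Each parent is a list of rows; row i holds the input weights of neuron i.
    A neuron and its input connections are never split (assumed layout).
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same number of neurons")
    child = []
    for row_a, row_b in zip(parent_a, parent_b):
        donor = row_a if rng.random() < 0.5 else row_b
        child.append(list(donor))  # copy so the child can be mutated safely
    return child


layer_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
layer_b = [[-1.0, -2.0], [-3.0, -4.0], [-5.0, -6.0]]
child = neuron_crossover(layer_a, layer_b, random.Random(0))
```

Every row of `child` is an intact copy of the corresponding neuron from one parent, which is the property that distinguishes this style of operator from weight-level crossover.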
A Temporal 3D-Registration Framework for Computer-Integrated Surgery - Bruce Backman (PhD) Traditionally, volumetric modalities such as CT and MRI have provided static snapshots of anatomy, enabling insight into the progression of disease and the severity of injury. Recently, 3D-registration algorithms, originating in the neurosurgical field, have been used to merge these images, resulting in richer visualizations. However, in situations where trauma patients are unable to be moved or are at risk of infection, there have been comparatively few advances. This thesis presents a 3D-registration framework that supports longitudinal study of morphologic changes in surface images of the upper body based on an optical technique, structured light imaging. The framework incorporates soft-tissue deformation modeling to allow coordinate frame determination and specific point tracking required for applications of Computer-Integrated Surgery. The framework is implemented in three stages using a coarse-fine approach that separately addresses the different sources of registration error commonly found in temporal registration applications. The coarse stage defines seven thoracic fiducials that form a rigid body. A special anthropomorphic stand is designed and used to enforce a rigid body assumption. Experimental results show the fiducials to have precision of approximately 2 mm. The medium stage incorporates the novel use of ultraviolet light as a surface registration technique. UV is used to avoid error caused when the projected light stripes interfere with the marker material, a common problem with external landmarks and optical assessment systems. A semi-automatic algorithm for identifying the centre of the fiducials is given and shown to be highly accurate, to within 1 pixel precision compared to the visually assessed centre. The movement of these fiducials is also modelled at the extremes of the respiratory cycle, with individual fiducials moving between 5 and 17 mm. 
A least-squares algorithm is implemented to bring surfaces together based on their fiducial locations and rigid-body motion. This algorithm results in RMS error of approximately 1.17 ± 0.45 mm. The fine stage involves finding point correspondences in changed regions between a base surface and a comparison surface acquired at a different time, given the rigid body registration from the previous stages. Five algorithmic variants are assessed using two simulations of thoracic swelling. The results do not show statistical significance between variants but do indicate visually some promising results. An application of this framework could be the near real-time guidance of the FAROArm, a precision measuring instrument commonly used in Computer-Integrated Surgery, to these points. This would facilitate the collection of functional information of clinical interest while maintaining positional congruence with data acquired at a different time point. Thesis temporarily unavailable at the request of the Author. Copies may be requested by sending an email to bruce[-at-]csse.uwa.edu.au
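The least-squares rigid-body step mentioned in the abstract above can be sketched as follows. This is a simplified two-dimensional illustration under stated assumptions, not the thesis's algorithm: the thesis works with 3D fiducials, whereas this sketch uses the closed-form 2D solution, and the names `fit_rigid_2d` and `rms_error` are hypothetical.

```python
import math


def fit_rigid_2d(source, target):
    """Least-squares rotation angle and translation mapping source onto target.

    Points are (x, y) tuples with known one-to-one correspondence.
    """
    n = len(source)
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(q[0] for q in target) / n
    ty = sum(q[1] for q in target) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(source, target):
        ax, ay = px - sx, py - sy      # centred source point
        bx, by = qx - tx, qy - ty      # centred target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)     # angle maximising the alignment
    c, s = math.cos(theta), math.sin(theta)
    shift = (tx - (c * sx - s * sy), ty - (s * sx + c * sy))
    return theta, shift


def rms_error(source, target, theta, shift):
    """Root-mean-square residual after applying the fitted motion."""
    c, s = math.cos(theta), math.sin(theta)
    total = 0.0
    for (px, py), (qx, qy) in zip(source, target):
        mx = c * px - s * py + shift[0]
        my = s * px + c * py + shift[1]
        total += (mx - qx) ** 2 + (my - qy) ** 2
    return math.sqrt(total / len(source))


# Made-up fiducial positions, moved by a known rotation and translation.
fiducials = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0), (4.0, 7.0)]
angle, offset = 0.3, (2.0, -1.0)
moved = [(math.cos(angle) * x - math.sin(angle) * y + offset[0],
          math.sin(angle) * x + math.cos(angle) * y + offset[1])
         for x, y in fiducials]
theta, shift = fit_rigid_2d(fiducials, moved)
residual = rms_error(fiducials, moved, theta, shift)
```

With noise-free correspondences the fitted motion matches the applied one and the residual is effectively zero; with measured fiducials the residual plays the role of the RMS registration error quoted in the abstract.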
An Efficient Algorithm for Extracting Boolean Functions from Linear Threshold Gates, and a Synthetic Decompositional Approach to Extracting Boolean Functions From Feedforward Neural Networks with Arbitrary Transfer Functions - Lawrence Peh (PhD) Artificial neural networks are universal function approximators that represent functions subsymbolically by weights, thresholds and network topology. Naturally, the representation remains the same regardless of the problem domain. Suppose a network is applied to a symbolic domain. It is difficult for a human to dynamically construct the symbolic function from the neural representation. It is also difficult to retrain networks on perturbed training vectors, to resume training with different training sets, to form a new neuron by combining trained neurons, and to reason with trained neurons. Even the original training set does not provide a symbolic representation of the function implemented by the trained network because the set may be incomplete or inconsistent, and the training phase may terminate with residual errors. The symbolic information in the network would be more useful if it is available in the language of the problem domain. Algorithms that translate the subsymbolic neural representation to a symbolic representation are called extraction algorithms. I argue that extraction algorithms that operate on single-output, layered feedforward networks are sufficient to analyse the class of multiple-output networks with arbitrary connections, including recurrent networks. The translucency dimension of the ADT taxonomy for feedforward networks classifies extraction approaches as pedagogical, eclectic, or decompositional. Pedagogical and eclectic approaches typically use a symbolic learning algorithm that takes the network's input-output behaviour as its raw data. Both approaches construct a set of input patterns and observe the network's output for each pattern. 
Eclectic and pedagogical approaches construct the input patterns respectively with and without reference to the network's internal information. These approaches are suitable for approximating the network's function using a probably-approximately-correct (PAC) or similar framework, but they are unsuitable for constructing the network's complete function. Decompositional approaches use internal information from a network more directly to produce the network's function in symbolic form. Decompositional algorithms have two components. The first component is a core extraction algorithm that operates on a single neuron that is assumed to implement a symbolic function. The second component provides the superstructure for the first. It consists of a decomposition rule for producing such neurons and a recomposition rule for symbolically aggregating the extracted functions into the symbolic function of the network. This thesis makes contributions to both components for Boolean extraction. I introduce a relatively efficient core algorithm called WSX based on a novel Boolean form called BNF. The algorithm has a worst-case complexity of $O(\frac{2^n}{\sqrt{n}})$ for a neuron with $n$ inputs, but in all cases, its complexity can also be expressed as $O(l)$ with an $O(n)$ precalculation phase, where $l$ is the length of the extracted expression in terms of the number of symbols it contains. I extend WSX for approximate extraction (AWSX) by introducing an interval about the neuron's threshold. Assuming that the input patterns far from the threshold are more symbolically significant to the neuron than those near the threshold, AWSX ignores the neuron's mappings for the symbolically less significant input patterns, remapping them as convenient for efficiency. In experiments, this dramatically decreased extraction time while retaining most of the neurons' mappings for the training set. Synthetic decomposition is this thesis' contribution to the second component of decompositional extraction. 
Classical decomposition decomposes the network into its constituent neurons. By extracting symbolic functions from these neurons, classical decomposition assumes that the neurons implement symbolic functions, or that approximating the subsymbolic computation in the neurons with symbolic computation does not significantly affect the network's symbolic function. I show experimentally that this assumption does not always hold. Instead of decomposing a network into its constituent neurons, synthetic decomposition uses constraints in the network that have the same functional form as neurons that implement Boolean functions; these neurons are called synthetic neurons. I present a starting point for constructing synthetic decompositional algorithms, and proceed to construct two such algorithms, each with a different strategy for decomposition and recomposition. One of the algorithms, ACX, works for networks with arbitrary monotonic transfer functions, so long as an inverse exists for the functions. It also has an elegant geometric interpretation that leads to meaningful approximations. I also show that ACX can be extended to layered networks with any number of layers. Download PhD thesis - 397K gzipped postscript.
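As a concrete picture of what a core extraction algorithm produces, the sketch below extracts a Boolean expression from a single linear threshold gate by brute-force enumeration. It is explicitly not WSX and does not use BNF; it assumes non-negative weights (so the gate is monotone), binary inputs, and an output of 1 when the weighted sum reaches the threshold. The names `minimal_true_sets` and `to_dnf` are hypothetical.

```python
from itertools import combinations


def minimal_true_sets(weights, threshold):
    """Return the minimal sets of inputs that switch the gate on.

    Subsets are visited in order of increasing size, so any subset that
    contains an already-found set is redundant and skipped.
    """
    if any(w < 0 for w in weights):
        raise ValueError("this sketch assumes non-negative weights")
    found = []
    for size in range(len(weights) + 1):
        for combo in combinations(range(len(weights)), size):
            chosen = set(combo)
            if any(prev <= chosen for prev in found):
                continue  # implied by a smaller term
            if sum(weights[i] for i in combo) >= threshold:
                found.append(chosen)
    return found


def to_dnf(terms):
    """Render the extracted terms as a disjunction of conjunctions."""
    return " OR ".join(
        "(" + " AND ".join(f"x{i}" for i in sorted(term)) + ")"
        for term in terms
    )


terms = minimal_true_sets([2, 1, 1], threshold=2)
expression = to_dnf(terms)
# expression == "(x0) OR (x1 AND x2)"
```

The enumeration visits up to 2^n subsets, which is why the efficiency of the core algorithm matters for neurons with many inputs.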
Reconstruction for Visualisation of Discrete Data Fields Using Wavelet Signal Processing - Bernard Cena (PhD) The reconstruction of a function and its derivative from a set of measured samples is a fundamental operation in visualisation. Multiresolution techniques, such as wavelet signal processing, are instrumental in improving the performance and algorithm design for data analysis, filtering and processing. This dissertation explores the possibilities of combining traditional multiresolution analysis and processing features of wavelets with the design of appropriate filters for reconstruction of sampled data. On the one hand, a multiresolution system allows data feature detection, analysis and filtering. Wavelets have already been proven successful in these tasks. On the other hand, a choice of a discrete filter which converges to a continuous basis function under iteration permits efficient and accurate function representation by providing a "bridge" from the discrete to the continuous. A function representation method capable of both multiresolution analysis and accurate reconstruction of the underlying measured function would make a valuable tool for scientific visualisation. The aim of this dissertation is not to try to outperform existing filters designed specifically for reconstruction of sampled functions. The goal is to design a wavelet filter family which, while retaining properties necessary to perform multiresolution analysis, possesses features to enable the wavelets to be used as efficient and accurate "building blocks" for function representation. The application to visualisation is used as a means of practical demonstration of the results. Wavelet and visualisation filter design is analysed in the first part of this dissertation and a list of wavelet filter design criteria for visualisation is collated. 
Candidate wavelet filters are constructed based on a parameter space search of the $BC$-spline family and direct solution of equations describing filter properties. Further, a biorthogonal wavelet filter family is constructed based on point and average interpolating subdivision and using the lifting scheme. The main feature of these filters is their ability to reconstruct arbitrary degree piecewise polynomial functions and their derivatives using measured samples as direct input into a wavelet transform. The lifting scheme provides an intuitive, interval-adapted, time-domain filter and transform construction method. A generalised factorisation for arbitrary primal and dual order point and average interpolating filters is a result of the lifting construction. The proposed visualisation filter family is analysed quantitatively and qualitatively in the final part of the dissertation. Results from wavelet theory are used in the analysis which allow comparisons among wavelet filter families and between wavelets and filters designed specifically for reconstruction for visualisation. Lastly, the performance of the constructed wavelet filters is demonstrated in the visualisation context. One-dimensional signals are used to illustrate reconstruction performance of the wavelet filter family from noiseless and noisy samples in comparison to other wavelet filters and dedicated visualisation filters. The proposed wavelet filters converge to basis functions capable of reproducing functions that can be represented locally by arbitrary order piecewise polynomials. They are interpolating, smooth and provide asymptotically optimal reconstruction in the case when samples are used directly as wavelet coefficients. The reconstruction performance of the proposed wavelet filter family approaches that of continuous spatial domain filters designed specifically for reconstruction for visualisation. 
This is achieved in addition to retaining multiresolution analysis and processing properties of wavelets.
Download PhD thesis - 3.0M gzipped postscript.
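The lifting construction mentioned in this abstract can be illustrated with a minimal single-level sketch. It is not one of the filters designed in the thesis: it assumes the simplest point-interpolating case (a linear predict step followed by an update step), an even-length signal, and periodic boundaries rather than interval-adapted ones. The function names are hypothetical.

```python
def lifting_forward(samples):
    """One level of a linear-interpolating lifting transform.

    Returns (smooth, detail). Assumes an even number of samples and
    periodic wrap-around at the boundaries.
    """
    even, odd = samples[0::2], samples[1::2]
    m = len(even)
    # Predict: each odd sample from the average of its even neighbours.
    detail = [odd[i] - 0.5 * (even[i] + even[(i + 1) % m]) for i in range(m)]
    # Update: adjust the even samples so the coarse signal keeps the mean.
    smooth = [even[i] + 0.25 * (detail[(i - 1) % m] + detail[i])
              for i in range(m)]
    return smooth, detail


def lifting_inverse(smooth, detail):
    """Undo lifting_forward by running the steps backwards with flipped signs."""
    m = len(smooth)
    even = [smooth[i] - 0.25 * (detail[(i - 1) % m] + detail[i])
            for i in range(m)]
    odd = [detail[i] + 0.5 * (even[i] + even[(i + 1) % m]) for i in range(m)]
    samples = []
    for e, o in zip(even, odd):
        samples.extend([e, o])
    return samples


ramp = [float(i) for i in range(8)]
smooth, detail = lifting_forward(ramp)
restored = lifting_inverse(smooth, detail)
```

For the linear ramp, the detail coefficients vanish wherever the prediction does not wrap around the boundary, which is the polynomial reproduction property in its simplest form, and the inverse transform restores the samples.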
Automated Facial Metrology - David O'Mara (PhD) Automated facial metrology is the science of objective and automatic measurement of the human face. There are many reasons for measuring the human face. Psychologists are interested in determining how humans perceive beauty, and how this is related to facial symmetry. Biologists are interested in the relationship between symmetry and biological fitness. Anthropologists, surgeons, forensic experts, and security professionals can also benefit from automated facial metrology. This thesis investigates the concept of automated facial metrology, presenting original techniques for segmenting 3D range and colour images of the human head, measuring the bilateral symmetry of n-dimensional point data (with particular emphasis on measuring the human head), and extracting the 2D profile of the face from 3D data representing the head. Two facial profile analysis techniques are also presented that are incremental improvements over existing techniques. Extensive literature reviews of skin colour modelling, symmetry detection, symmetry measurement, and facial profile analysis are also included in this thesis. It was discovered during this research that bilateral symmetry detection using principal axes is not appropriate for detecting the mid-line of the human face. An original mid-line detection technique that does not use symmetry, and is superior to the symmetry-based technique, was developed as a direct result of this discovery. There is disagreement among researchers about the effect of ethnicity on skin colour. Some researchers claim that people from different ethnic groups have the same skin chromaticity (hue, saturation), while other researchers claim that different ethnic groups have different skin colours. It is shown in this thesis that people from apparently different ethnic groups can have skin chromaticity that is within the same Gaussian distribution. 
The chromaticity-based skin colour model used in this thesis has been chosen from the many models previously used by other researchers, and its applicability to skin colour modelling has been justified. It is proven in this thesis that the Mahalanobis distance to the skin colour distribution is Gaussian in both the chromatic and normalised-rg colour spaces. Most facial profile analysis techniques use either tangency or curvature to locate anthropometric features along the profile. Techniques based on both approaches have been implemented and compared. Neither approach is clearly superior to the other, but the results indicate that a hybrid technique, combining both approaches, could provide significant improvements. The areas of research most relevant to facial metrology are reviewed in this thesis and original contributions are made to the body of knowledge in each area. The techniques, results, literature reviews, and suggestions presented in this thesis provide a solid foundation for further research and hopefully bring the goal of automated facial metrology a little closer to being achieved. Download PhD thesis - 118Mb PDF.
Computational Physiology of the Human Breast - Paul Taylor (PhD) For the past three decades radiologists have disagreed whether the radiographic patterns of breast parenchymal tissue represent a risk factor for the later development of breast cancer. Some early epidemiological studies devised categories of mammographic patterns that were strongly correlated with subsequent breast cancer; this link appeared to act independently of other biological risk factors. Subsequent research led to conflicting conclusions, with reports of little or no influence on breast cancer risk. The conflicting results have been attributed to methodological differences in the studies - due primarily to the lack of an objective, quantitative measurement of parenchymal pattern. 
Less attention has been paid to the limitations imposed by the mammographic imaging process - its reliance on pattern analysis in two dimensions, based on images that are projections of tissue while it is subject to compression and distortion. The relationship between the geometry of breast tissue in three dimensions and the risk of breast cancer remains largely unexplored. In order to gain a better understanding of mammographic patterns and their putative role in breast cancer risk assessment, this dissertation contends that it is necessary to develop a mathematical model of mammary tissue pattern formation during normal breast development. The problem is to construct a model that is both physiologically plausible, given our current knowledge of breast biochemistry, and anatomically realistic in the predictions it makes about the gross architecture of breast tissue. The main hypothesis of this dissertation is that normal growth of tissue in the breast can be modelled by a simple non-linear dynamic system, which represents the biochemical interactions between epithelial and stromal breast tissue. The complex three-dimensional tissue structures created by this model are explained as an emergent property of a self-organising developmental process. This thesis contends that the shape of the tissue generated by this process can be described using fractal geometry. Given that a model of breast morphogenesis can be developed, it is interesting to speculate whether the model can be used to generate synthetic mammograms that exhibit parenchymal patterns similar to those found in real mammograms. In particular, it would help to resolve the argument concerning the existence of mammographic parenchymal risk factors if the patterns contained in the synthetic mammograms were found to be similar to the patterns in real mammograms of women identified as being at a high risk of developing breast cancer. 
This thesis introduces the EidolaBreast, a mathematical model of the normal development of the female breast following puberty. The model represents the biochemical factors that activate and inhibit the growth of the breast epithelium. The input parameters are divided into two categories: endocrine controls that act systemically on the breast through the action of sex hormones; and local interactions between epithelial cells, connective tissue cells, and the extra-cellular matrix. In order to solve the model numerically it was converted to discrete space and time domains. This formulation of the model employs a three-dimensional array of elements, or voxels, to represent the concentration of morphogens and tissue composition. Epithelial structures are allowed to grow through the voxel array under the control of rules that express the elongation, branching, and self-avoidance properties of glandular tissue found in real breasts. Computer software was developed to solve the EidolaBreast model, given initial and boundary values for the independent parameters and the selection of a voxel size and time step. The output generated by the software is a voxel array composed of cubical cells. A visualisation function was developed in order to display the array in three dimensions. An animation component is included in the software to depict temporal changes in the values of the voxel elements. Three strategies for testing the model were identified. First, a mathematical slicing operation is applied to the voxel array in order to create synthetic tissue sections. The patterns in these sections can then be compared with stained sections of real breast tissue. Second, the three-dimensional shape of the cells in the array occupied by epithelial tissue can be compared with three-dimensional data obtained by imaging or dissection of real, whole breasts. 
However, in this case there is a paucity of experimental data available from real breasts. A method for obtaining these data using the reconstruction of series of sections is presented in this thesis. Third, the contents of the array can be projected onto a two-dimensional plane in order to create a synthetic mammogram. This is based on a model of the geometrical and photographic properties of image formation in real mammography. To carry out this test a model, EidolaMammo, of the mammographic image formation process was developed. The model includes a representation of the elastic deformation of the breast as it is compressed. The implementation of this model in computer software permits the prime factors of the exposure to be selected as they would on a real mammography machine. In each of the tests it was found that the breast tissue patterns predicted by the model resembled the range of patterns observed experimentally in normal breasts. To quantify the accuracy of the predictions various statistical texture measurements were computed, including the power spectrum and fractal dimension. The measurements confirmed the similarity of the tissue patterns between the model and real breasts. In order to assess the sensitivity of the EidolaBreast model, two case-control studies were conducted using real images obtained from mammographic screening. In the first study a group of women using hormone-replacement therapy (HRT) to alleviate menopausal symptoms were selected. Sequential changes in the breast parenchyma between successive screening rounds were measured and compared with the synthetic mammograms generated by EidolaMammo. This study concluded that HRT results in a denser parenchymal pattern and that this pattern can be simulated by appropriate changes in the input parameters to the EidolaBreast model. 
In the second study the parenchymal patterns of women who had been diagnosed with breast cancer were first compared to an age-matched control group selected from screening and then to the synthetic mammograms. This study concluded that the mammographic patterns in the cancer cases differed from the patterns in the matched controls and the synthetic mammograms. The findings of this thesis mean that we can now explain how the complex geometry of tissue in the human breast grows and changes in response to the reproductive and hormonal influences throughout a woman's life. The morphology of the mammary gland is demonstrated to arise from a self-organisational process that can be simulated by a computational model. The model generates patterns that are similar to the patterns in real mammograms and can therefore serve as a useful tool in investigating the changes in these patterns that might precede detectable signs of malignant disease in the breast.
Video Analysis in MPEG Compressed Domain - Lifang Gu (PhD) The amount of digital video has been increasing dramatically due to the technology advances in video capturing, storage, and compression. The usefulness of vast repositories of digital information is limited by the effectiveness of the access methods, as shown by the Web explosion. The key issues in addressing the access methods are those of content description and of information space navigation. While textual documents in digital form are somewhat self-describing (i.e., they provide explicit indices, such as words and sentences that can be directly used to categorise and access them), digital video does not provide such an explicit content description. In order to access video material in an effective way, without looking at the material in its entirety, it is therefore necessary to analyse and annotate video sequences, and provide an explicit content description targeted to the user needs. 
Digital video is a very rich medium, and the characteristics in which users may be interested are quite diverse, ranging from the structure of the video to the identity of the people who appear in it, their movements and dialogues and the accompanying music and audio effects. Indexing digital video, based on its content, can be carried out at several levels of abstraction, beginning with indices like the video program name and name of subject, to much lower level aspects of video like the location of edits and motion properties of video. Manual video indexing requires the sequential examination of the entire video clip in order to annotate it. This is a time-consuming, subjective, and expensive process. As a result, there is an urgent need for tools to automate the indexing process. In response to such needs, various video analysis techniques from the research fields of image processing and computer vision have been proposed to parse, index and annotate the massive amount of digital video data. However, most of these video analysis techniques have been developed for uncompressed video. Since most video data are stored in compressed formats for efficiency in storage and transmission, it would be necessary to perform decompression on compressed video before such analysis techniques can be applied. Two consequences of having to first decompress before processing are incurring computation time for decompression and requiring extra auxiliary storage. To save on the computational cost of decompression and lower the overall size of the data which must be processed, this study attempts to make use of features available in compressed video data and proposes several video processing techniques operating directly on compressed video data. Specifically, techniques of processing MPEG-1 and MPEG-2 compressed data have been developed to help automate the video indexing process. 
This includes the tasks of video segmentation (shot boundary detection), camera motion characterisation, and highlights extraction (detection of skin-colour regions, text regions, moving objects and replays) in MPEG compressed video sequences. The approach of performing analysis on the compressed data has the advantages of dealing with a much reduced data size and is therefore suitable for computationally-intensive low-level operations. Experimental results show that most analysis tasks for video indexing can be carried out efficiently in the compressed domain. Once intermediate results, which are dramatically reduced in size, are obtained from the compressed domain analysis, partial decompression can be applied to perform more accurate processing to extract high level semantic information.
Download
PhD thesis - 692K gzipped pdf.
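As an illustration of the kind of low-level task described in the abstract above, the following is a minimal sketch of shot boundary detection on reduced-size data. It assumes that the DC coefficients of each frame have already been parsed from the MPEG stream into plain lists of values in the range 0 to 255, and it uses a generic histogram-difference test. This is not necessarily the technique developed in the thesis, and the function names, bin count and threshold are all hypothetical.

```python
from typing import List, Sequence


def histogram(dc_values: Sequence[int], bins: int = 16, max_value: int = 256) -> List[float]:
    """Normalised histogram of one frame's DC coefficients (assumed in 0..max_value-1)."""
    counts = [0] * bins
    for v in dc_values:
        # Map the value to a bin and clamp out-of-range input to the end bins.
        idx = max(0, min(int(v) * bins // max_value, bins - 1))
        counts[idx] += 1
    total = float(len(dc_values)) or 1.0  # avoid division by zero for empty frames
    return [c / total for c in counts]


def detect_shot_boundaries(dc_frames: Sequence[Sequence[int]],
                           threshold: float = 0.5,
                           bins: int = 16) -> List[int]:
    """Return every index i at which a cut is declared between frame i-1 and frame i."""
    boundaries: List[int] = []
    previous = None
    for i, frame in enumerate(dc_frames):
        current = histogram(frame, bins)
        if previous is not None:
            # L1 distance between two normalised histograms lies in [0, 2].
            difference = sum(abs(a - b) for a, b in zip(current, previous))
            if difference > threshold:
                boundaries.append(i)
        previous = current
    return boundaries
```

Because each frame contributes only one value per coded block rather than one per pixel, a test of this kind touches far less data than its pixel-domain equivalent, which is the advantage the abstract attributes to compressed-domain analysis.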
An Investigation into Trapezoidation and its Application to Geographic Information Systems and Computer Graphics - Gian Lorenzetto (PhD) Decomposition, the breaking up of a polygon into elementary parts, is an essential first step for almost all operations on simple polygons. This approach allows for a more efficient algorithm, as the processing of each elementary part is simpler and more efficient than the processing of the whole polygon. The most common form of decomposition for polygons in 2D is trapezoidation, the breaking up of a polygon into trapezoids. The primary focus of this thesis is the trapezoidation process. Specifically, we show a new approach to the trapezoidation of simple polygons in 2D that is both simple and practical, and we improve upon the previous best algorithm for the trapezoidation of nested simple polygons in 2D. We also explore the previously unknown practical benefits to computer graphics of 2D trapezoidation in 3D environments. We present a new O(n) time heuristic-based algorithm for the trapezoidation of simple polygons in 2D, where n is the number of vertices defining the polygon. Our approach is as simple as the O(n log n) algorithms used in practice, but with better algorithmic performance. The solution is based on exploiting geometric characteristics of simple polygons. Importantly, we show that the algorithm will always produce a correct trapezoidation. We also provide an upper bound of O(n^2) for worst-case performance. However, we show that for many simple polygons the algorithm runs in linear time. In particular, we show that our algorithm runs in linear time for a large number of polygons found in Geographic Information Systems (GIS). An interesting problem within simple polygon trapezoidation is that of decomposing nested simple polygons. Such polygons are often encountered in GIS. To date only one algorithm has been presented for the trapezoidation of nested simple polygons in which both the enclosing and nested polygons are decomposed. This algorithm runs in O(n^2 log n) time.
Specifically, it requires an expensive O(n^2) post-processing stage. We present a new approach to the problem employing a novel data structure for storing polygon edges. It is this structure that allows us to avoid any costly post-processing stage, and consequently our algorithm runs in O(n log n) time, an improvement by a factor of n over the previous best algorithm. We show the performance of our algorithm using data sets typical of those found in GIS. The trapezoidation of simple polygons defined in 3D produces a spatial scene representation similar to an octree or binary space partitioning (BSP) tree. Although a 3D trapezoidation has been considered for use in computer graphics, its practical value is unknown. Given the extensive use of octree and BSP data structures in computer graphics, it is interesting to investigate 3D trapezoidation in a similar context. Specifically, we show its use in walkthrough visualisations of architectural scenes. We present a simple method for constructing a 3D trapezoidation and consider its use for visibility computation. We draw on the work performed in portal-based rendering systems and in particular the past work in cell-to-cell visibility algorithms. We show how such visibility computation may be adapted for use with a 3D trapezoidation. We also present a novel visibility algorithm for walkthrough-style visualisations based on this adaptation.
The Recovery of 3-D Structure Using Visual Texture Patterns - Angeline Loh (PhD) One common task in Computer Vision is the estimation of three-dimensional surface shape from two-dimensional images. This task is important as a precursor to higher level tasks such as object recognition - since the shape of an object gives clues to what the object is - and object modeling for graphics. Many visual cues have been suggested in the literature to provide shape information, including the shading of an object, its occluding contours (the outline of the object that slants away from the viewer) and its appearance from two or more views. If the image exhibits a significant amount of texture, then this too may be used as a shape cue. Here, 'texture' is taken to mean the pattern on the surface of the object, such as the dots on a pear, or the tartan pattern on a tablecloth. This problem of estimating the shape of an object based on its texture is referred to as shape-from-texture and it is the subject of this thesis. One motivation for studying shape-from-texture is the fact that, according to psychophysical experiments, texture plays an important role in the human perception of shape. It would be useful if computers could mimic this behaviour. Some advantages of using texture as a cue are that it allows shape to be recovered from static monocular images (rather than multiple views) and that textures are ubiquitous in the world around us. Another reason for studying texture as a shape cue is to use it in combination with other cues for a more robust solution. During the past three decades, there has been much work in shape-from-texture. This thesis contributes to the existing body of work by providing three new algorithms: two are shape-from-texture algorithms that solve the problem under different sets of assumptions regarding the texture and viewing geometry; the other algorithm solves the low-level task of estimating the transformation between patches of texture. These three algorithms are described in more detail below.
The first shape-from-texture method is fast and direct, and works with homogeneous and stationary textures viewed orthographically, with the frontal texture known. The method is based on the fact that as textures are foreshortened due to the relative orientation of the surface patch to the viewer, the second spectral moments do not change about the axis orthogonal to the tilt axis. From this, the tilt axis may be identified, which in turn leads to an estimation of the slant angle of the surface patch. In this way the orientation of each surface patch may be solved. A number of issues affecting the ideal behaviour of the system are explored, including the scaling property of the Fourier transform, windowing schemes, illumination and blur. A property of the method is that occluding boundaries are estimated to curve away from the viewer. The new method is compared to a recent and well-known method from the literature that uses the same assumptions regarding texture and viewing geometry. This thesis demonstrates that the new method has many advantages over the existing method; among other things it is more robust, does not exhibit any ambiguities other than the unavoidable tilt ambiguity of ±π, and never returns complex, and hence unusable, values. The second shape-from-texture method aims to solve the problem in one of its most general forms; the texture is not assumed to be isotropic, homogeneous, stationary or viewed orthographically. In addition, the frontal texture is not assumed to be known a priori, or from a known set, or even present in the image. Instead it is assumed that the surface is smooth and covered in identical texture elements; this assumption allows the entire surface to be recovered via a consistency constraint. The key idea is that a consistent, integrable surface will be produced only if the correct transformation from an arbitrary reference texel to the frontal texel is estimated.
It is shown that a Levenberg-Marquardt search can estimate the frontal texture efficiently. The methods described in this thesis have been quantitatively tested on the entire set of Brodatz textures and also on real images. This thesis also investigates the relationship between shape-from-texture and structure-from-motion. If the camera is stationary and the moving object is planar, or nearly so, then the superposition of the images of the moving object produces a texture, the structure of which can be solved. The second shape-from-texture algorithm was adapted to demonstrate an example of such a structure-from-motion reconstruction. The other algorithm that is presented in this thesis estimates the transformation between patches of the same texture, viewed from different orientations. Previous methods for doing this are shown to be non-robust or to have limitations if the change between the two texture patches is not incremental. The new method overcomes the drawbacks of these previous methods, as well as being robust to blurring and illumination variations. The work in this thesis is likely to have an impact in a number of ways. The second shape-from-texture algorithm provides one of the most general solutions to the problem. On the other hand, if the assumptions of the first shape-from-texture algorithm are met, this algorithm provides an extremely usable method, in that users should be able to input images of textured objects and click on the frontal texture to quickly reconstruct a fairly good estimation of the surface. Lastly, the algorithm for estimating the transformation between textures can be used as a part of many shape-from-texture algorithms, as well as being useful in other areas of Computer Vision. This thesis gives two examples of other applications for the method: re-texturing an object and placing objects in a scene.
Download
PhD thesis - PDF
Representations and Matching Techniques for 3D Free-form Object and Face Recognition - Ajmal Mian (PhD) The aim of visual recognition is to identify objects in a scene and estimate their pose. Object recognition from 2D images is sensitive to illumination, pose, clutter and occlusions. Object recognition from range data, on the other hand, does not suffer from these limitations. An important paradigm of recognition is model-based, whereby 3D models of objects are constructed offline and saved in a database, using a suitable representation. During online recognition, a similar representation of a scene is matched with the database for recognizing objects present in the scene. A 3D model of a free-form object is constructed offline from its multiple range images (views) acquired from different viewpoints. These views are registered in a common coordinate basis by establishing correspondences between them, followed by their integration into a seamless 3D model. Establishing automatic correspondences between overlapping views is the major problem in 3D modeling. This problem becomes more challenging when the views are unordered and hence there is no a priori knowledge about which view pairs overlap. The main challenges in the online recognition phase are the presence of clutter due to unwanted objects and noise, and the presence of occluding objects. This thesis addresses the above challenges and investigates novel representations and matching techniques for 3D free-form rigid object and non-rigid face recognition. A robust representation based on third order tensors is presented. The tensor representation quantizes local surface patches of an object into three-dimensional grids. Each grid is defined in an object centered local coordinate basis, which makes the tensors invariant to rigid transformations. This thesis presents a novel multiview correspondence algorithm which automatically establishes correspondences between unordered views of a free-form object with O(N) complexity.
It also presents a novel algorithm for 3D free-form object recognition and segmentation in complex scenes containing clutter and occlusions. The combination of the strengths of the tensor representation and the customized use of a 4D hash table for matching constitute the basic ingredients of these algorithms. This thesis demonstrates the superiority of the tensor representation in terms of descriptiveness compared to an existing competitor, namely spin images. It also demonstrates that the proposed correspondence and recognition algorithms outperform spin image recognition in terms of accuracy and efficiency. The tensor representation is extended to automatic and pose invariant 3D face recognition. As the face is a non-rigid object, expressions can significantly change its 3D shape. Therefore, the last part of this thesis investigates representations and matching techniques for automatic 3D face recognition which are robust to facial expressions. A number of novelties are proposed in this area along with their extensive experimental validation using the largest available 3D face database. These novelties include a region-based matching algorithm for 3D face recognition, a 2D and 3D multimodal hybrid face recognition algorithm, fully automatic 3D nose ridge detection, fully automatic normalization of 3D and 2D faces, a low cost rejection classifier based on a novel Spherical Face Representation, and finally, automatic segmentation of the expression insensitive regions of a face.
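The idea in the abstract above, quantizing a local surface patch into a three-dimensional grid defined in an object centered basis, can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis implementation: the grid stores point counts per cell as a stand-in for whatever quantity the thesis records, the local basis is supplied by the caller rather than derived from the surface, and the 4D hash table used for matching is not reproduced. All names, the bin size and the grid dimension are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

Vec = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def to_local(p: Vec, origin: Vec, axes: Sequence[Vec]) -> Vec:
    """Express point p in the local basis (an origin plus three orthonormal axes)."""
    d = (p[0] - origin[0], p[1] - origin[1], p[2] - origin[2])
    return (dot(d, axes[0]), dot(d, axes[1]), dot(d, axes[2]))


def patch_tensor(points: Iterable[Vec], origin: Vec, axes: Sequence[Vec],
                 bin_size: float = 1.0, grid_dim: int = 4) -> Dict[Cell, int]:
    """Quantize points into a grid_dim^3 grid centred on the local origin."""
    half = grid_dim * bin_size / 2.0
    counts: Counter = Counter()
    for p in points:
        local = to_local(p, origin, axes)
        if all(-half <= c < half for c in local):  # ignore points outside the grid
            cell = tuple(int((c + half) // bin_size) for c in local)
            counts[cell] += 1
    return dict(counts)


def rotate(p: Vec) -> Vec:
    """Hypothetical rotation: 90 degrees about the z axis."""
    return (-p[1], p[0], p[2])


def rigid_move(p: Vec) -> Vec:
    """Hypothetical rigid motion: the rotation above followed by a translation."""
    r = rotate(p)
    return (r[0] + 10.0, r[1] + 20.0, r[2] + 30.0)


points = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (0.25, 0.75, 0.5)]
origin = (0.0, 0.0, 0.0)
axes = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# Tensor of the patch in its original pose.
tensor_a = patch_tensor(points, origin, axes)

# Move the patch and its local basis together, then recompute the tensor.
tensor_b = patch_tensor([rigid_move(p) for p in points],
                        rigid_move(origin),
                        [rotate(a) for a in axes])

print(tensor_a == tensor_b)
```

Since the basis moves with the surface, the local coordinates of every point are unchanged by the rigid motion, so both poses yield the same grid. This is the invariance property the abstract attributes to the tensor representation.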
Copyright (C) 2000, Department of Computer Science & Software Engineering, The University of Western Australia. Unauthorised duplication or modification of this page and its contents is prohibited. Last Modified 01-08-2000