|
Computer Science
4th Year Projects in 2000
|
Dr Cara MacNish (cara@cs.uwa.edu.au)
Cara obtained her
BE(Hons) in Electronic Engineering at UWA in 1987. This was followed
by a short period working as a research and development engineer for
ACET Ltd, before taking up the Prince of Wales Studentship at Trinity
College, Cambridge, in 1988. In 1992 she obtained her Ph.D. in
Logic-based Inference Systems from the University of Cambridge, and in
March 1992 took up a Lectureship in Computer Science at the University
of York. She returned to Australia in 1995 to join the Computer
Science Department at UWA, and in 2000 obtained her Chain Saw
Operator's Certification.
Cara's research interests include logic
and artificial intelligence, automated planning, machine learning,
iterative optimisation, software specification and Computer Science
education.
For more information see the Intelligent
Computing Research Group
web site.
Iterative Techniques for Speech Reconstruction
Systems for speech recognition and natural language understanding may
have to deal with corrupted speech sequences for many reasons. For
example:
- Casual spoken utterances rarely conform to the strict
grammatical rules of written language.
- Speech is often recorded in noisy environments.
- Spoken language is ambiguous - different sequences of words
can produce the same sounds, and/or the same visual clues.
The last case is particularly pertinent to the Department's Lipreading
Project, which is attempting to reconstruct speech from
visual information. Here, even if the information is captured
perfectly, there will still be ambiguities, since the same mouth
patterns may produce different sounds.
The aim of this project is to investigate the use of distributed
iterative search techniques, such as Genetic Algorithms, as a way of
completing or correcting incomplete or incorrect speech sequences. The
idea is to find new sequences that are as "similar" as possible to the
recorded sequence while being phonetically feasible (and, ultimately,
grammatically and semantically correct).
Intelligent Techniques for Code Analysis and Feedback
Evolutionary Optimisation and Test Sequence Generation
The datlab system is a Java system for analysing data
structure implementations submitted by 2nd year students. The system,
and some preliminary work on evolutionary test sequence
generation, is described in [
MacNish00].
The main approach used by the datlab system for testing
student code is to run sequences of test methods on the code, and
compare the resulting output and the errors thrown with those of a
"master copy" that is known to work correctly. While these test
sequences can be generated pseudorandomly, there are a number of
problems with this approach. First, errors may only arise from fairly
specific sequences of operations. Long pseudorandom sequences are
therefore needed to give a high probability of uncovering
errors. Since the response time of the system is also important with
many students using the system at once, there are practical limits on
the size of test sequences used, and on some occasions erroneous code
will pass unnoticed.
Secondly, it is important to be able to provide useful feedback to
students. One way in which this has been done is to provide students
with the sequence of operations that brought about the failure of the
submitted code. They can then be encouraged to work through the
sequence against a diagrammatic representation of the data structure
to identify where its state is corrupted. (The corruption does not, in
general, occur in the last method called - this method simply
reveals the corrupted state.) If long test sequences are used this
process becomes unwieldy and impractical.
Finally, the long pseudorandom sequences provide little feedback to
the lecturer about the types of errors frequently being made by
students. This information could be very useful for addressing common
misunderstandings in lectures and tutorials.
What we seek, therefore, is a way of generating test sequences
that fulfill the following goals.
- The sequences should uncover "most" errors - in particular,
frequently occuring classes of errors should be found. Note that
frequency of occurrence of an error has little correlation with
frequency of pseudorandom generation of a sequence that uncovers
the error. (Project students may wish to explore this correlation.)
- Errors should be uncovered by "short" sequences, which can then
be used by students to investigate the behaviour of their code
and by lecturers to characterise common misunderstandings amoung
students.
The aim of these projects is to explore the development and use of
iterative optimisation techniques (or "evolutionary" techniques) for
test sequence generation. This provides an interesting
application area for iterative optimisation techniques for a number of
reasons:
- The size of the search space for test sequences is large. A
traditional (for example "breadth first") search will have a
branching factor equal to the number of different method calls
in the program being tested. (Any solution proposed should be
tested for performance against traditional search techniques.)
- The search space is erratic. A small change in a test
sequence can have a large (discontinuous) change in the fitness
of the sequence - a single change to a method in a sequence
that finds an error within a small number of operations may
produce a sequence that does not find an error at all.
- Because there are many different errors that can be made in
target programs (that is, many different "goals" for the
optimisation algorithm), the algorithm must work towards a
collection of good solutions. The difficulty is to ensure
that good solution(s) for one class of error are not abandoned
in favour of good solutions for a new error (this is the problem
of ``forgetting'' that occurs in many iterative optimisation
techniques) while at the same time not maintaining a
proliferation of good solutions to the same class of errors.
At least two projects can be offered in this area, focussing on the
following optimisation techniques:
- Genetic Algorithms
Genetic algorithms
[Davis91,Goldberg89]
are based on the biological process of evolution and natural
selection, and are now a well-established technique for iteratively
improving solutions in search spaces that are not "well-behaved". A
population of individuals (encoded solutions) is "evolved" by
crossover and mutation operations, chosen randomly according to some
parameters, with the aim of finding successively improved results
(solutions with an improved fitness value).
- Particle Swarm Optimisers
Particle swarm optimisers are a less well-known technique that
originally grew out of attempts to model the behaviour of flocks
of birds and schools of fish. Each individual "flies" through the
search space, influenced by its own best position to date and the
best position found by the "flock", ideally converging on a
near-optimal solution. A lack of understanding of the mathematics
of the optimiser led the original authors to cap velocity in order
to obtain convergence, thus giving behaviour that appeared closer
to a swarm of particles, hence the (somewhat misleading) name.
Return to the 4th year project list