Why I Use MATLAB for Forensic Image Processing

It is a general rule that any evidence that is to be presented in court must be open to scrutiny and be testable.

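In the case where processed digital images are presented in court we have to establish two things: the authenticity of the original images, and the validity of the processing that has been applied to them.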

The House of Lords Science and Technology - Fifth Report: Digital Images as Evidence considers the issues involved in establishing the authenticity of digital images in considerable detail and provides some useful guidelines to follow. However, the report also considers some of the issues involved in the processing of images and does raise some concerns.

The area where I see most concern is where images are processed using proprietary software, for example Photoshop, where the implementation details of any image processing algorithm are inaccessible. Some algorithms are quite complex and the results may be sensitive to subtleties in the implementation. An image that has been extensively processed using proprietary software may well be challenged in court. Refuting that challenge will be difficult if one does not have access to the source code.

Another area of difficulty is recording all the steps used in the image enhancement process. Some applications, such as Photoshop, provide an extensive 'history' recording process. However, this sort of facility is designed primarily to let one undo processes, or revert to an earlier step in the sequence of processes. The history recording may not allow access to the particular parameter values used in any individual step.

The Advantages of MATLAB

Recording of the processing used

MATLAB is a general purpose programming language. When it is used to process images one generally writes function files or script files to perform the operations. These files form a formal record of the processing used and ensure that the final results can be tested and replicated by others should the need arise.
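As a minimal sketch of what such a record might look like, the script below applies three simple steps to a small synthetic image. The file name, the choice of steps and every parameter value are assumptions made purely for illustration; the synthetic data stands in for an image that would normally be read from file.

```matlab
% enhance_exhibit.m  (hypothetical script name)
% Every operation and parameter value is written out explicitly, so the
% file itself documents exactly what was done to the image.

% Synthetic 8x8 test data stands in for the evidence image. In real use
% this line would read the image from file, for example with imread.
im = uint8(magic(8) * 3);

% Step 1: convert to double precision in the range [0, 1]
im = double(im) / 255;

% Step 2: gamma correction (illustrative value)
gammaValue = 0.6;
im = im .^ gammaValue;

% Step 3: 3x3 mean filter (illustrative smoothing step)
kernel = ones(3) / 9;
im = conv2(im, kernel, 'same');

% Step 4: convert back to 8 bit for output
result = uint8(round(255 * im));
```

Anyone given this file and the original image can rerun it and check that they obtain the same result.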

Access to implementation details

MATLAB provides many functions for image processing and other tasks. Most of these functions are written in the MATLAB language and are publicly readable as plain text files. Thus the implementation details of these functions are accessible and open to scrutiny. The defense can examine the processing used in complete detail, and any challenges raised can be responded to in an informed way by the prosecution. This makes MATLAB very different from applications such as Photoshop.

It should be noted that some MATLAB functions cannot be viewed. These are generally lower level functions that are computationally expensive and are hence provided as 'builtin' functions running as native code. These functions are heavily used and tested and can be relied on with considerable confidence.
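One way to check which category a given function falls into is sketched below. It uses the standard exist and which functions; the two function names tested are arbitrary examples, and the category each one reports may differ between MATLAB versions.

```matlab
% Report whether each function is a readable source file or a builtin.
names = {'fliplr', 'zeros'};     % example functions chosen for illustration
codes = zeros(1, numel(names));

for k = 1:numel(names)
    codes(k) = exist(names{k});  % 2 means a file, 5 means a builtin function
    switch codes(k)
        case 2
            fprintf('%s: source file at %s\n', names{k}, which(names{k}));
        case 5
            fprintf('%s: builtin, runs as native code\n', names{k});
        otherwise
            fprintf('%s: other kind of item, or not found\n', names{k});
    end
end

% The source of a readable function can be printed at the command line:
%   type fliplr
```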

Numerical accuracy

Another advantage of MATLAB is that it allows one to ensure maximal numerical precision in the final result.

In general, image files store data to 8 bit precision. This corresponds to a range of integer values from 0-255. A pixel in a colour image may be represented by three 8 bit numbers, each representing the red, green and blue components as an integer value between 0 and 255. Typically this is ample precision for representing normal images.

However, as soon as one reads this image data into memory and starts to process it, it is very easy to generate values that lie outside the range 0-255. For example, to double the contrast of an image one multiplies the intensity values by 2. An image value of 200 will become 400 and numerical overflow will result. How this is dealt with will vary between image processing programs. Some may truncate the results to an integer in the range 0-255; others may perform the mathematical operations in floating point arithmetic and then rescale the final results to an integer in the range 0-255.
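The arithmetic in this example is easy to reproduce. In MATLAB itself, 8 bit integer arithmetic saturates at the ends of the range, whereas the same operation in double precision keeps the true value. A minimal sketch:

```matlab
v = uint8(200);             % an 8 bit pixel value

saturated = v * 2;          % integer arithmetic: result is clamped to 255
exact = double(v) * 2;      % floating point arithmetic: result is 400

fprintf('uint8 result:  %d\n', saturated);
fprintf('double result: %d\n', exact);
```

Every pixel value above 127 ends up as 255 in the integer version, so the differences between those pixels are lost.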

It is here that numerical precision, and hence image fidelity, may be lost. Some image processing algorithms result in some pixel values with very large magnitudes (positive or negative). Typically these large values occur at points in the image where intensity discontinuities occur; the edges of the image are common sources of this problem. When this image with widely varying values is rescaled to integers in the range 0-255, much of this range may be used just to represent the few pixels with the large values. The bulk of the image data may then have to be represented within a small range of integer values, say from 0-50. Clearly this represents a considerable loss of image information. If another process is then applied to this image the problems can then accumulate. Trying to establish the extent of this problem, if any, is hard if one is using proprietary software.
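The effect can be illustrated with made-up numbers. In the sketch below, 99 values spread evenly between 0 and 1 represent the bulk of the image data and a single value of 20 represents an extreme pixel; these figures are assumptions chosen only to make the arithmetic easy to follow.

```matlab
% 99 ordinary values in [0, 1] plus one extreme value
data = [linspace(0, 1, 99), 20];

% Naive linear rescaling of the full data range onto 0-255
lo = min(data);
hi = max(data);
scaled = uint8(round(255 * (data - lo) / (hi - lo)));

bulkMax = max(scaled(1:99));               % largest level used by the bulk
levelsUsed = numel(unique(scaled(1:99)));  % distinct levels used by the bulk

fprintf('Bulk of data occupies levels 0-%d (%d distinct levels)\n', ...
        bulkMax, levelsUsed);
```

Here the single extreme value takes level 255, while the other 99 values are squeezed into levels 0 to 13.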

Because MATLAB is a general programming language, one has complete control over the precision with which data is represented. An image can be read into memory and the data cast into double precision floating point values. All image processing steps can then be performed in double precision floating point arithmetic, and at no intermediate stage does one need to rescale the results to integers in the range 0-255. Only at the final point, when the image is to be displayed and/or written to file, does it need to be rescaled. Here one can use histogram truncation to eliminate extreme pixel values so that the bulk of the image data is properly represented.
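A rough sketch of this workflow follows. The synthetic image, the two processing steps and the 2% truncation fraction are all assumptions for illustration, and the truncation shown is a simple percentile clip rather than any particular published routine.

```matlab
% Synthetic 8 bit image stands in for data read from file
raw = uint8(magic(10));

% Cast once to double; all later steps stay in double precision
im = double(raw);

% Two illustrative steps that push values outside 0-255 or below zero
im = 2 * im;                                  % double the contrast
sharpen = [0 -1 0; -1 5 -1; 0 -1 0];          % simple sharpening kernel
im = conv2(im, sharpen, 'same');

% Final step only: truncate the extreme 2% at each end, then rescale
frac = 0.02;                                  % hypothetical truncation fraction
sorted = sort(im(:));
n = numel(sorted);
lowCut  = sorted(max(1, round(frac * n)));
highCut = sorted(min(n, round((1 - frac) * n)));

clipped = min(max(im, lowCut), highCut);      % clamp extreme values
out = uint8(round(255 * (clipped - lowCut) / (highCut - lowCut)));
```

Because the clipping happens only once, at the end, the intermediate results are never quantised, and the few extreme values no longer dictate how the 0-255 range is used.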

Advanced algorithms

MATLAB is a scientific programming language and provides strong mathematical and numerical support for the implementation of advanced algorithms. It is for this reason that MATLAB is widely used by the image processing and computer vision community. New algorithms are very likely to be implemented first in MATLAB; indeed, they may only be available in MATLAB.

Conclusion

MATLAB may not be as user friendly as an application like Photoshop. However, being a general purpose programming language, it provides many important advantages for forensic image processing.

Peter Kovesi