DVQ (digital video quality) is a metric for evaluating the visual quality of digitized video images. Other video-quality metrics have been proposed, but each appears to suffer from one of two shortcomings: (1) it is based on mathematical models that are not related closely enough to the characteristics of human perception, in which case it may not measure visual quality accurately; or (2) it entails such large amounts of memory or computation that the contexts in which it can be applied are restricted. In contrast, DVQ was developed to incorporate mathematical models of human visual processing while remaining computationally efficient, so that accurate metrics can be computed in real time with modest computational resources.

Figure: Test and Reference Sequences of digitized video images are processed to generate a measure of the visual quality of the test sequence relative to the reference sequence.

DVQ incorporates aspects of early visual processing, including dynamic adaptation to changing brightness, luminance and chromatic channels, spatial and temporal filtering, spatial-frequency channels, dynamic contrast masking, and summation of probabilities. Among the most complex and time-consuming elements of other proposed metrics are spatial-filtering operations that implement the multiple band-pass spatial filters characteristic of human vision. In DVQ, spatial filtering is accelerated by use of the discrete cosine transform (DCT); this choice affords a powerful advantage because efficient hardware and software for computing the DCT are widely available and because, in many potential applications, DCT coefficients have already been computed as part of image-data compression.
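As a rough illustration of this DCT-based spatial decomposition, the sketch below applies an 8 x 8 blockwise DCT to a single-channel frame. The block size, the SciPy-based DCT call, and all variable names are illustrative assumptions, not details taken from DVQ itself.

```python
# Minimal sketch of a blockwise 2-D DCT, assuming 8x8 blocks and SciPy's dctn.
import numpy as np
from scipy.fft import dctn

BLOCK = 8  # assumed block size, matching common image-compression practice

def blockwise_dct(frame: np.ndarray) -> np.ndarray:
    """Split a single-channel frame into BLOCK x BLOCK tiles and apply a 2-D DCT
    to each tile; returns an array of shape (rows, cols, BLOCK, BLOCK)."""
    h, w = frame.shape
    h -= h % BLOCK  # crop to a whole number of blocks
    w -= w % BLOCK
    tiles = frame[:h, :w].reshape(h // BLOCK, BLOCK, w // BLOCK, BLOCK).swapaxes(1, 2)
    return dctn(tiles, axes=(-2, -1), norm="ortho")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coeffs = blockwise_dct(rng.random((64, 96)))
    print(coeffs.shape)  # (8, 12, 8, 8)
```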

DVQ is defined by, and computed in, the process illustrated in the figure. The input to the process is a pair of color video image sequences, of which one is denoted the reference sequence and the other is denoted the test sequence. The first step of the process consists of various sampling, cropping, and color transformations that serve to restrict processing to a region of interest and to represent colors in the sequences in a perceptual color space [e.g., in terms of L (a standard measure of brightness) and chromaticity coordinates (standard measures of hue and saturation) specified by the Commission Internationale de l'Eclairage (CIE)].
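The sketch below illustrates one such color transformation, assuming sRGB input converted to CIE L* lightness and (u', v') chromaticity coordinates; the exact color space, primaries, and white point used in DVQ are not specified here, so these choices are assumptions made only to show the general form of the step.

```python
# Minimal sketch of a perceptual color transformation: sRGB -> CIE XYZ -> (L*, u', v').
import numpy as np

SRGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                        [0.2126, 0.7152, 0.0722],
                        [0.0193, 0.1192, 0.9505]])

def srgb_to_perceptual(rgb: np.ndarray) -> np.ndarray:
    """rgb: (..., 3) sRGB values in [0, 1]; returns (..., 3) of (L*, u', v')."""
    # Undo the sRGB transfer function to obtain linear-light RGB.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = linear @ SRGB_TO_XYZ.T
    x, y, z = xyz[..., 0], xyz[..., 1], xyz[..., 2]
    # CIE L* lightness relative to a white point with Y_n = 1.
    t = np.clip(y, 1e-6, None)
    f = np.where(t > (6 / 29) ** 3, np.cbrt(t), t / (3 * (6 / 29) ** 2) + 4 / 29)
    lstar = 116.0 * f - 16.0
    # CIE 1976 (u', v') chromaticity coordinates.
    denom = x + 15.0 * y + 3.0 * z + 1e-6
    return np.stack([lstar, 4.0 * x / denom, 9.0 * y / denom], axis=-1)
```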

The sequences are then subjected to blocking and DCT, the results of which are transformed to local contrast (the ratio of each DCT amplitude to the mean, or DC, amplitude of its block). The next step is a temporal-filtering operation, in which the temporal part of a contrast-sensitivity function (CSF) is implemented as a recursive discrete second-order filter. The outputs of the temporal-filtering operation are converted to just-noticeable differences by dividing each DCT coefficient by its respective visual threshold; this implements the spatial part of the CSF.
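The sketch below illustrates these three steps for one channel of blockwise DCT coefficients. The filter coefficients and the threshold matrix are placeholders standing in for the temporal and spatial parts of the CSF, which are not tabulated here.

```python
# Minimal sketch of local contrast, temporal filtering, and JND conversion.
import numpy as np

def local_contrast(dct_blocks: np.ndarray) -> np.ndarray:
    """dct_blocks: (frames, rows, cols, 8, 8). Divide each coefficient by the
    DC (block-mean) coefficient of its own block."""
    dc = dct_blocks[..., :1, :1]
    return dct_blocks / np.maximum(np.abs(dc), 1e-6)

def temporal_filter(contrast: np.ndarray, b=(0.2, 0.2, 0.0), a=(1.0, -0.5, 0.1)):
    """Recursive discrete second-order filter applied along the frame axis.
    The (b, a) coefficients are arbitrary stand-ins for the temporal CSF."""
    out = np.zeros_like(contrast)
    for n in range(contrast.shape[0]):
        out[n] = b[0] * contrast[n]
        if n >= 1:
            out[n] += b[1] * contrast[n - 1] - a[1] * out[n - 1]
        if n >= 2:
            out[n] += b[2] * contrast[n - 2] - a[2] * out[n - 2]
    return out

def to_jnd(filtered: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """Divide each coefficient by its visual threshold (spatial CSF), expressing
    the result in just-noticeable-difference units; thresholds is an assumed
    (8, 8) matrix that broadcasts over all blocks and frames."""
    return filtered / thresholds
```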

In the next step, the two sequences are subtracted. The resulting difference sequence is subjected to a contrast-masking operation, which also depends upon the reference sequence.
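A sketch of the differencing and masking step follows. The max-with-one form and the masking exponent are common choices in DCT-domain masking models and are assumptions here, used only to show how the masked difference can depend on the reference sequence.

```python
# Minimal sketch of differencing followed by contrast masking by the reference.
import numpy as np

def masked_difference(test_jnd: np.ndarray, ref_jnd: np.ndarray,
                      exponent: float = 0.7) -> np.ndarray:
    """Subtract the two sequences in JND units, then attenuate each difference
    where the reference itself has high suprathreshold contrast (masking)."""
    diff = test_jnd - ref_jnd
    mask = np.maximum(1.0, np.abs(ref_jnd)) ** exponent
    return diff / mask
```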

Finally, the masked differences can be pooled in various ways to summarize the perceptual error. As used here, "pooling" signifies summing over one or more of six dimensions that represent, specifically, image frames, color channels, rows of blocks, columns of blocks, horizontal spatial frequencies, and vertical spatial frequencies. The pooled error can then be converted to a measure of visual quality.
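As one way to picture this step, the sketch below collapses chosen dimensions of the masked-difference array. The Minkowski (beta-norm) summation and the error-to-quality mapping are assumptions, shown only to make the data flow concrete.

```python
# Minimal sketch of pooling the masked differences and converting to a quality score.
import numpy as np

# Assumed axis layout of `masked`: (frames, color channels, block rows,
# block columns, vertical frequencies, horizontal frequencies)
def pool(masked: np.ndarray, axes=(0, 2, 3, 4, 5), beta: float = 4.0) -> np.ndarray:
    """Minkowski pooling over the chosen axes; pooling over all six axes yields
    a single perceptual-error number for the whole sequence."""
    return (np.abs(masked) ** beta).sum(axis=axes) ** (1.0 / beta)

def quality(pooled_error: np.ndarray) -> np.ndarray:
    """One simple (assumed) mapping from pooled error to a quality score in
    (0, 1]: zero error maps to 1, larger errors approach 0."""
    return 1.0 / (1.0 + pooled_error)
```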

This work was done by Andrew B. Watson, James Hu, and John F. McGowan III of Ames Research Center. For further information, access the Technical Support Package (TSP) free on-line at www.nasatech.com/tsp under the Information Sciences category.

This invention is owned by NASA, and a patent application has been filed. Inquiries concerning nonexclusive or exclusive license for its commercial development should be addressed to the Patent Counsel, Ames Research Center, (650) 604-5104.

Refer to ARC-14236.