Three categories of methods exist for processing an original data sequence to generate information about the data for the purposes of integrity measurement, ownership demonstration, and authentication: digital watermarking, data hashing, and error detection coding. In the present context, watermarking is often used for ownership demonstration and authentication; data hashing is often used for integrity measurement; and error detection coding, like data hashing, measures data integrity. However, error detection coding is normally performed as part of a data transmission protocol and therefore addresses transmission errors rather than tampering.
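To make the distinction between integrity hashing and error detection coding concrete, the following Python sketch computes a SHA-256 digest (standard `hashlib` module) and a CRC-32 checksum (`zlib.crc32`) for a short message. Both detect an accidental single-bit error, but CRC-32 is an affine function over XOR for equal-length inputs, so its behavior under deliberate modification is predictable without any secret, which is one reason it addresses transmission errors rather than tampering. The message contents and helper names are illustrative only.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Data hash: a cryptographic digest suited to integrity measurement."""
    return hashlib.sha256(data).hexdigest()


def crc32(data: bytes) -> int:
    """Error detection code: CRC-32, as used in many transmission formats."""
    return zlib.crc32(data) & 0xFFFFFFFF


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length sequences."""
    return bytes(x ^ y for x, y in zip(a, b))


original = b"registration data for the original sequence"
corrupted = bytearray(original)
corrupted[5] ^= 0x01  # a single flipped bit, as from a noisy channel
corrupted = bytes(corrupted)

# Both codes flag an accidental single-bit error.
print(sha256_hex(original) != sha256_hex(corrupted))  # True
print(crc32(original) != crc32(corrupted))            # True

# For equal-length inputs, CRC-32 satisfies crc(a^b^c) == crc(a)^crc(b)^crc(c).
# This linear structure makes CRC collisions easy to construct on purpose;
# no comparable structure is known for SHA-256.
a, b, c = original, corrupted, b"x" * len(original)
print(crc32(xor_bytes(xor_bytes(a, b), c)) == crc32(a) ^ crc32(b) ^ crc32(c))  # True
```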
An important step in authenticating data and demonstrating data ownership is some form of registration process: a process, recognized and accepted by the community of users, in which information about the original data is stored and later presented for comparison. It is often important that this information is also time-stamped, properly associated with the original data, and stored securely. This identifying information about the original data can be referred to as registration data. In the case of digital watermarking, for example, the watermark, the watermark embedding method, and the watermark recovery method can all become part of the registration data.
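A registration record could be modeled roughly as below. This is a minimal sketch under stated assumptions: the field names, the in-memory `registry` dictionary, and the use of a SHA-256 digest as the identifying information are hypothetical choices for illustration. A real registration process would also need trusted time-stamping and secure, access-controlled storage, neither of which a Python dictionary provides.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RegistrationRecord:
    """Hypothetical registration data for one original data sequence."""
    data_id: str             # associates the record with the original data
    fingerprint: str         # identifying information (a SHA-256 digest stands in here)
    method: str              # how the fingerprint was produced
    registered_at: datetime  # UTC time stamp


def register(data_id: str, data: bytes, registry: dict) -> RegistrationRecord:
    """Create a time-stamped record and store it under data_id."""
    record = RegistrationRecord(
        data_id=data_id,
        fingerprint=hashlib.sha256(data).hexdigest(),
        method="sha256",
        registered_at=datetime.now(timezone.utc),
    )
    # A dict is used only for illustration; it is neither secure nor persistent.
    registry[data_id] = record
    return record


def verify(data_id: str, candidate: bytes, registry: dict) -> bool:
    """Compare a candidate sequence against its registered fingerprint."""
    return hashlib.sha256(candidate).hexdigest() == registry[data_id].fingerprint


registry: dict = {}
register("report-v1", b"original contents", registry)
print(verify("report-v1", b"original contents", registry))  # True
print(verify("report-v1", b"altered contents", registry))   # False
```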
The Detector and Extractor of File-prints (DEF) is a high-fidelity, robust, and easy-to-implement method for data protection and change detection that uses naturally occurring digital watermarks.
This process makes it possible to measure the integrity and origin of a data sequence without embedding a watermark, and, unlike data hashing, it supports similarity measurements. DEF is applicable to any binary data and can be thought of as a visual data hash, sometimes referred to as a data fingerprint. This efficient technique is especially suitable for files larger than 10,000 bytes.
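The practical difference between hash comparison and similarity measurement can be illustrated with two small binary images standing in for fingerprints. The comparison metric used here (the fraction of agreeing pixels, i.e. one minus the normalized Hamming distance) and the function name `fingerprint_similarity` are assumptions; the text does not specify how DEF compares fileprints.

```python
import hashlib

import numpy as np


def fingerprint_similarity(fp_a: np.ndarray, fp_b: np.ndarray) -> float:
    """Fraction of pixels at which two equal-shape binary images agree (1.0 = identical)."""
    if fp_a.shape != fp_b.shape:
        raise ValueError("fingerprints must have the same shape")
    return float(np.mean(fp_a == fp_b))


reference = np.array([[1, 0, 1, 1],
                      [0, 0, 1, 0]], dtype=bool)
suspect = reference.copy()
suspect[1, 3] = True  # one of eight pixels differs

# A hash comparison can only report "same" or "different".
same_hash = hashlib.sha256(reference.tobytes()).digest() == hashlib.sha256(suspect.tobytes()).digest()
print(same_hash)  # False

# A similarity score instead grades how close the two images are.
print(fingerprint_similarity(reference, reference))  # 1.0
print(fingerprint_similarity(reference, suspect))    # 0.875
```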
Building on concepts developed for signal detection, DEF generates binary images that are unique to a given data file. These visual hashes, or fileprints, are based on spectral characteristics of the data, and no auxiliary data is embedded into the host data being protected. Because it relies on spectral characteristics, DEF is also designed to survive common distortions such as lossy compression, digital/analog conversion, and copying.
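As a rough illustration of how a binary image can be derived from the spectral characteristics of arbitrary bytes, the sketch below treats a file as a one-dimensional signal, computes a Hann-windowed short-time Fourier transform with NumPy, applies log-amplitude compression, and thresholds each frequency row at its median. Every parameter (frame length, hop size, window, threshold rule) and the function name `fileprint` are hypothetical; the source does not disclose DEF's actual procedure, so this should be read as a generic spectral fingerprint, not as DEF itself.

```python
import numpy as np


def fileprint(data: bytes, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Hypothetical fileprint: a binary time-frequency image of a byte sequence."""
    # Treat the bytes as a signal and remove the mean so the DC term does not dominate.
    x = np.frombuffer(data, dtype=np.uint8).astype(np.float64)
    if x.size < frame_len:
        raise ValueError("data is shorter than one analysis frame")
    x -= x.mean()

    # Short-time Fourier transform over overlapping Hann-windowed frames.
    n_frames = 1 + (x.size - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([x[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, frame_len // 2 + 1)

    # Log-amplitude compression; a small floor avoids log(0).
    # Scaling by 20 for decibels would not change the binary result below.
    log_amp = np.log10(magnitude + 1e-12)

    # Binarize each frequency bin against its own median over time.
    threshold = np.median(log_amp, axis=0, keepdims=True)
    return (log_amp > threshold).T  # rows: frequency bins, columns: frames


rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=20_000).astype(np.uint8).tobytes()
fp = fileprint(data)
print(fp.shape, fp.dtype)  # (129, 155) bool
```

With these parameters a 20,000-byte input yields a 129 × 155 image (frequency bins by frames); longer inputs simply add columns.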
The key components of DEF, time-frequency representations and log-amplitude compression, are well-established techniques commonly found in signal processing systems. DEF adds a single computation step that can easily be implemented in software, in programming languages such as Java or C++, or in hardware on an FPGA or ASIC, and it brings significant improvements to downstream detection and classification processes.
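For reference, the standard discrete short-time Fourier transform and a decibel-scale log-amplitude compression can be written as follows. These are textbook forms offered as an assumption; the text does not state which time-frequency representation, window $w$, hop size $H$, frame length $N$, or floor $\varepsilon$ DEF uses. Here $m$ indexes frames and $k$ indexes frequency bins.

```latex
X(m,k) = \sum_{n=0}^{N-1} x[n + mH]\, w[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, \dots, N-1

L(m,k) = 20 \log_{10}\!\bigl(\lvert X(m,k) \rvert + \varepsilon\bigr)
```

With $\varepsilon$ negligible, amplitudes of 1, 10, 100, and 1000 map to 0, 20, 40, and 60 dB, so a 1000:1 amplitude range is compressed into a 60 dB span and weak spectral components remain visible alongside strong ones.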