A typical data warehouse may contain thousands of tables. Validating the data quality of all data warehouse content is an unmanageable and unmaintainable task if handled in an ad hoc fashion. This software seeks to ensure that a data warehouse accurately reflects the contents of the business systems that feed it, in turn supporting the accuracy of business reports. Data anomalies can occur due to breaks in processes, unexpected changes to business rules, upgrades, unanticipated data relationships, or inadequate testing. Inspecting the data for anomalous conditions after each batch load allows IT to respond quickly to such issues and to deliver a higher-quality product to end users.

This software provides a mechanism for detecting data quality issues in the data warehouse as soon as the data is loaded. It validates that reporting data is complete and correct by comparing data warehouse data, both detail and aggregate, against the source systems. The software is configurable so that, as the data warehouse is expanded with new content, validations for that content can be added by a developer or operator with only a few minutes of effort.
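As a rough illustration of the kind of post-load comparison described above (the released software's implementation is not published; the function, table, and column names here are hypothetical), a single check might confirm that an aggregate computed in the source system ties out against the corresponding warehouse aggregate. The sketch below uses in-memory SQLite purely so it is self-contained; in practice the two connections would point at the source system and the warehouse.

```python
import sqlite3


def totals_match(source_conn, warehouse_conn, source_sql, warehouse_sql, tolerance=0.0):
    """Return True if the two aggregate queries agree within the given tolerance."""
    (src_total,) = source_conn.execute(source_sql).fetchone()
    (dwh_total,) = warehouse_conn.execute(warehouse_sql).fetchone()
    return abs(src_total - dwh_total) <= tolerance


if __name__ == "__main__":
    # Stand-in source system and warehouse, loaded with matching data.
    src = sqlite3.connect(":memory:")
    dwh = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders (amount REAL)")
    src.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (20.0,)])
    dwh.execute("CREATE TABLE fact_orders (amount REAL)")
    dwh.executemany("INSERT INTO fact_orders VALUES (?)", [(10.0,), (20.0,)])

    ok = totals_match(
        src, dwh,
        "SELECT SUM(amount) FROM orders",
        "SELECT SUM(amount) FROM fact_orders",
    )
    print("aggregate tie-out:", "PASS" if ok else "FAIL")
```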

The configuration allows for different types of validations that can be tailored to the type of data being validated; for example, validating that aggregates tie out at various levels of granularity, or that lists of values are in sync. Configuration is flexible and metadata-driven, allowing known exceptions to be handled and custom validation scripts to be created within the framework when necessary.
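One way such metadata-driven configuration could look is sketched below; the field names, check types, and SQL are illustrative assumptions rather than the software's actual schema. A small driver could loop over entries like these after each batch load, dispatching each check type to a comparison routine and skipping any known exceptions.

```python
# Hypothetical validation metadata: each entry describes one check to run
# after a batch load. Table names, keys, and check types are illustrative.
VALIDATIONS = [
    {
        "name": "daily_revenue_tie_out",
        "type": "aggregate_compare",  # aggregates must tie out within a tolerance
        "source_sql": "SELECT SUM(amount) FROM orders WHERE order_date = :d",
        "warehouse_sql": "SELECT SUM(amount) FROM fact_orders WHERE order_date = :d",
        "tolerance": 0.01,
        "known_exceptions": ["2015-12-31"],  # dates excluded from the check
    },
    {
        "name": "region_codes_in_sync",
        "type": "value_list_compare",  # distinct value lists must match exactly
        "source_sql": "SELECT DISTINCT region_code FROM regions",
        "warehouse_sql": "SELECT DISTINCT region_code FROM dim_region",
    },
]
```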

This work was done by Janet A. Sorrentino of Caltech for NASA's Jet Propulsion Laboratory. The software is available for license through the Jet Propulsion Laboratory. NPO-49624