Analyzing data piecemeal is usually uninformative. Analysts need tools to simultaneously evaluate multiple pieces of data that are related by a common thread. Identifying that common thread and retrieving the relevant data can be tedious and time-consuming. To identify the data needed for a given analysis, analysts search hard drives, share drives, and thumb drives, poring over data in spreadsheets, documents, and log files. They search for data related by common characteristics in their metadata (extra information about the data itself). Once the datasets have been discovered and collected, the analyst must convert them to a common form that permits them to be processed together, while investing significant time in ensuring that the content and metadata are believable and, ultimately, trustworthy.

An example of the flow for an Echo-supported project. In the first step, the TimeRecords database is assigned units; the result of that change is represented by the open square. Next, a subset of time is selected, and both the change and its result are captured. At a later step, the analysis branches and a different process begins.

Traditional analysis software focuses on just the data; the metadata is left in the hands of the analyst. Results are produced using scripts that the analyst uses to combine data in different ways and perform mathematical operations on the data. While the analyst doing the work may have a good handle on the process, many of the decisions and other contextual information that led to the result are only present in the analyst's mind. How the result of the analysis is produced is invisible to other analysts or the users (customers) of the result.

Echo is a data analysis environment that packages data with metadata. In Echo, metadata helps users find data, manage and organize analysis workflows, and accurately report results as relevant, usable information.

Echo keeps track of every change made and who made it, and archives each intermediate step in an analysis. The figure shows the path that one analysis has taken. At the bottom left is the original data. A change is made (indicated by the triangle) that assigns units, and the result is indicated by the open square. An additional operation is performed to select a subset of the data within a time window, producing the next result, indicated by the shaded square. The process continues until the analyst produces the desired final result.
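The provenance chain described above can be illustrated with a minimal sketch. This is not Echo's actual API; the `Record`, `apply_step`, and `history` names below are hypothetical, chosen only to show the idea of each change producing a new, immutable record that remembers what was done, who did it, when, and which record it came from.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional

@dataclass(frozen=True)
class Record:
    data: tuple                        # the dataset at this step (immutable)
    metadata: dict                     # units, description, etc.
    operation: str                     # what change produced this record
    analyst: str                       # who made the change
    timestamp: str                     # when it was made
    parent: Optional["Record"] = None  # previous step in the chain

def apply_step(parent: Record, operation: str, analyst: str,
               fn: Callable, **meta) -> Record:
    """Apply a change and archive it as a new record linked to its parent."""
    return Record(
        data=tuple(fn(parent.data)),
        metadata={**parent.metadata, **meta},
        operation=operation,
        analyst=analyst,
        timestamp=datetime.now(timezone.utc).isoformat(),
        parent=parent,
    )

def history(record: Record) -> list:
    """Walk back through the chain to reconstruct how a result was made."""
    steps = []
    while record is not None:
        steps.append(record.operation)
        record = record.parent
    return list(reversed(steps))

# Original data: (time, value) pairs with no units assigned yet.
raw = Record(data=((0, 1.0), (5, 2.0), (12, 4.0)), metadata={},
             operation="import", analyst="alice",
             timestamp=datetime.now(timezone.utc).isoformat())

# Step 1: assign units (the "triangle" change; result is the open square).
with_units = apply_step(raw, "assign units", "alice",
                        lambda d: d, units={"time": "s", "value": "V"})

# Step 2: select a time window (result is the shaded square).
windowed = apply_step(with_units, "select time window 0-10 s", "alice",
                      lambda d: [r for r in d if r[0] <= 10])

print(history(windowed))
# -> ['import', 'assign units', 'select time window 0-10 s']
```

Because each record keeps a reference to its parent, every intermediate result (`raw`, `with_units`) remains archived and can be reviewed or visualized on its own.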

The history of each step along the way is automatically captured, and any intermediate result can be reviewed and visualized. Other analysts can modify individual steps in an analysis workflow to produce new results. This capability enables collaboration among different parties, allowing them to share results with transparency, and compare different analysis approaches.
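The branching behavior can also be sketched briefly. Again, this assumes a hypothetical step-recording function rather than Echo's real interface: two analysts share the same archived steps up to a point, then diverge, and both branches retain provenance back to the original data.

```python
def step(parent, operation, fn):
    """Record one change as a node linked to its parent (hypothetical API)."""
    return {"op": operation, "data": fn(parent["data"]), "parent": parent}

root = {"op": "import", "data": [1.0, 2.0, 4.0], "parent": None}
calibrated = step(root, "calibrate", lambda d: [x * 0.5 for x in d])

# Analyst A and analyst B branch from the same intermediate result:
branch_a = step(calibrated, "sum", lambda d: [sum(d)])
branch_b = step(calibrated, "max", lambda d: [max(d)])

# Both branches share provenance back to the original import, so the
# two approaches can be compared with full transparency.
assert branch_a["parent"] is branch_b["parent"]
print(branch_a["data"], branch_b["data"])  # -> [3.5] [2.0]
```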

With Echo, the data and metadata are encapsulated, reducing the potential for error, and improving the access to all of the information required to perform an analysis and document the results. Echo was originally developed in support of systems qualification for the weapons program; however, the software can be used in a wide variety of applications. Scientists and data analysts can use the common framework of Echo to manage and analyze many types of datasets, including diagnostic imagery in the area of medical science, and geospatial measurements used in climate science.

For more information, contact Kathleen McDonald, Richard P. Feynman Center for Innovation; 505-665-9090.


This article first appeared in the February 2018 issue of Tech Briefs Magazine.
