In efforts to automatically capture important data from scientific papers, computer scientists at the National Institute of Standards and Technology (NIST) have developed a method to accurately detect small, geometric objects such as triangles within dense, low-quality plots contained in image data. Employing a neural network approach designed to detect patterns, the NIST model has many possible applications in modern life.
NIST’s neural network model captured 97% of objects in a defined set of test images, locating the objects’ centers to within a few pixels of manually selected locations. The researchers took the data from journal articles dating as far back as the early 1900s in a database of metallic properties at NIST’s Thermodynamics Research Center (TRC). Often the results were presented only in graphical format, sometimes drawn by hand and degraded by scanning or photocopying. The researchers wanted to extract the locations of data points to recover the original, raw data for additional analysis. Until now such data have been extracted manually.
The images present data points with a variety of different markers, mainly circles, triangles, and squares, both filled and open, of varying size and clarity. Such geometrical markers are often used to label data in a scientific graph. Text, numbers, and other symbols, which can falsely appear to be data points, were manually removed from a subset of the figures with graphics editing software before training the neural networks.
Accurately detecting and localizing the data markers was a challenge for several reasons. The markers are inconsistent in clarity and exact shape; they may be open or filled and are sometimes fuzzy or distorted. Some circles appear extremely circular, for example, whereas others do not have enough pixels to fully define their shape. In addition, many images contain very dense patches of overlapping circles, squares and triangles.
The researchers sought to create a network model that identified plot points at least as accurately as manual detection — within 5 pixels of the actual location on a plot the size of several thousand pixels per side.
NIST researchers adopted a network architecture originally developed by German researchers for analyzing bio-medical images, called U-Net. First the image dimensions are contracted to reduce spatial information, and then layers of feature and context information are added to build up precise, high-resolution results.
To help train the network to classify marker shapes and locate their centers, the researchers experimented with four ways of marking the training data with masks, using different-sized center markings and outlines for each geometric object.
The researchers found that adding more information to the masks, such as thicker outlines, increased the accuracy of classifying object shapes but reduced the accuracy of pinpointing their locations on the plots. In the end, the researchers combined the best aspects of several models to get the best classification and smallest location errors. Altering the masks turned out to be the best way to improve network performance, more effective than other approaches such as small changes at the end of the network.
The network’s best performance — an accuracy of 97% in locating object centers — was possible only for a subset of images in which plot points were originally represented by very clear circles, triangles, and squares. The performance is good enough for the TRC to use the neural network to recover data from plots in newer journal papers.