The V-FASTR (VLBA Fast Transient Experiment) system was motivated by the desire to monitor the radio sky for interesting transient events. To be confident that no interesting extragalactic event is missed, every V-FASTR candidate requires human review and evaluation. Candidates consist of pulsar pulses, spurious correlated radio frequency interference (RFI), and other potentially unknown phenomena. However, the number of candidates generated by V-FASTR each day ranges from zero to thousands, depending on the observational target and environmental conditions. On busy days, reviewing every candidate would take more time than the human reviewers have available.

A candidate event classifier was developed to reduce the reviewing burden by automatically tagging events that can be confidently classified as pulses from a known pulsar or as RFI (artifacts). The remaining candidates consist of pulses without a known origin or explanation. They are the ones that most require human intervention.

The V-FASTR candidate classifier computes ten features to describe each candidate. It employs a trained random forest classifier to predict the class for each new candidate (“pulse,” “artifact,” or “none”), then consults a database of known pulsars to further refine its predictions. If a candidate classified as “pulse” matches a known pulsar (in location on the sky and dispersion measure), it is re-classified as “pulsar.” If there is no known match for a “pulse,” it is re-classified as “good candidate” since it may indicate the discovery of a new pulsar or other unknown phenomena. Predictions that are sufficiently confident are added to the metadata associated with the candidate and used to reduce the number of candidates that require human review.
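The refinement step described above can be illustrated with a minimal Python sketch. It is not the V-FASTR implementation: it assumes the random forest has already produced a label and a confidence for each candidate, and every name, field, and threshold here (KnownPulsar, Candidate, refine, max_sep_deg, max_dm_diff, min_confidence) is hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KnownPulsar:
    """One entry of a (hypothetical) known-pulsar catalog."""
    name: str
    ra_deg: float   # right ascension, degrees
    dec_deg: float  # declination, degrees
    dm: float       # dispersion measure, pc cm^-3


@dataclass
class Candidate:
    """A candidate plus the raw output of the trained classifier."""
    ra_deg: float
    dec_deg: float
    dm: float
    label: str         # "pulse", "artifact", or "none"
    confidence: float  # classifier confidence for `label`, in [0, 1]


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = (math.radians(x) for x in (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def refine(cand: Candidate, catalog: List[KnownPulsar],
           max_sep_deg: float = 0.5,      # assumed sky-position tolerance
           max_dm_diff: float = 5.0,      # assumed dispersion-measure tolerance
           min_confidence: float = 0.9,   # assumed confidence cutoff
           ) -> Optional[str]:
    """Return a tag for the candidate's metadata, or None if the
    prediction is not confident enough to record."""
    if cand.confidence < min_confidence:
        return None
    if cand.label != "pulse":
        return cand.label
    # A "pulse" is compared against the catalog in position and DM.
    for pulsar in catalog:
        sep = angular_separation_deg(cand.ra_deg, cand.dec_deg,
                                     pulsar.ra_deg, pulsar.dec_deg)
        if sep <= max_sep_deg and abs(cand.dm - pulsar.dm) <= max_dm_diff:
            return "pulsar"
    return "good candidate"
```

In this sketch, a confident "pulse" that lies close to a catalog entry in both sky position and dispersion measure is tagged "pulsar," an unmatched one is tagged "good candidate," and a low-confidence prediction returns None so that the candidate is left untagged for human review. The tolerances and the confidence cutoff are placeholders; the values used operationally are not stated here.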

The classifier algorithm used for this work is not novel; the novelty lies in its application to data triage for radio astronomy scientific investigations. This is the first demonstration of an operational machine learning system associated with a radio astronomy facility that can accurately filter and classify the large volume of candidate detections so as to greatly reduce the human time that must be invested in reviewing the candidates.

This kind of system will be vital to future instruments and observatories that collect more data than humans can feasibly review. The Square Kilometre Array (SKA) is expected to increase data collection by orders of magnitude. The same challenges arise for optical astronomy, e.g., the Palomar Transient Factory and the James Webb Space Telescope.

The team would like to acknowledge the contributions of Walter Brisken, Sarah Burke-Spolaor, Adam T. Deller, Divya Palaniswamy, David R. Thompson, Steven J. Tingay, and Randall B. Wayth. These members of the science team provided the data labels that aided in training the classifier.

This work was done by Benyang Tang and Kiri L. Wagstaff of Caltech for NASA’s Jet Propulsion Laboratory. This software is available for license through the Jet Propulsion Laboratory; a license may be requested at https://download.jpl.nasa.gov/ops/request/request_introduction.cfm. NPO-49870