The Intermediate Palomar Transient Factory (iPTF) is a visible-spectrum astronomy survey aimed at detecting “transient” events such as supernovae. Every night, a telescope at the Palomar observatory collects images of various regions of the sky and compares them to a set of reference images taken on previous nights. The image comparison is done via a subtraction process. The reference images are subtracted from the new nightly images, and any remaining light sources are flagged as candidate transient events.
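The subtraction step described above can be illustrated with a deliberately simplified sketch. It assumes the science and reference images are already aligned and on the same flux scale, which a real pipeline has to ensure before subtracting; this is not the iPTF pipeline itself. The function name `find_candidates`, the `n_sigma` threshold, and the synthetic images are all hypothetical.

```python
import numpy as np

def find_candidates(science, reference, n_sigma=5.0):
    """Subtract the reference from the science image and flag bright residuals.

    Assumes both images are aligned and flux-matched (an idealization).
    Returns the difference image and an array of (row, col) pixel positions.
    """
    diff = science.astype(float) - reference.astype(float)
    # Robust noise estimate from the median absolute deviation (MAD),
    # scaled so that it matches the standard deviation for Gaussian noise.
    med = np.median(diff)
    sigma = 1.4826 * np.median(np.abs(diff - med))
    # Keep only pixels that are significantly brighter than the background.
    mask = (diff - med) > n_sigma * sigma
    return diff, np.argwhere(mask)

# Synthetic data: a noisy reference, a second noisy exposure of the same
# field, and one injected point source standing in for a transient.
rng = np.random.default_rng(0)
reference = rng.normal(100.0, 1.0, size=(64, 64))
science = reference + rng.normal(0.0, 1.0, size=(64, 64))
science[20, 30] += 50.0

diff, candidates = find_candidates(science, reference)
print(candidates)
```

Any misalignment or flux mismatch between the two images would leave residuals that this thresholding also flags, which is how subtraction artifacts end up among the candidates.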
Because there are errors in the subtraction pipeline, the system occasionally identifies “bogus” candidates that are not real events of interest, but merely artifacts of the subtraction process. In fact, an overwhelming majority of candidates are bogus, and there are too many nightly candidates for astronomers to manually review. Thus, a machine learning classification system is trained to differentiate between “real” and “bogus” candidates. Only the candidates judged most likely to be real by this real/bogus classifier are presented to astronomers for review.
In addition to the real/bogus classifier, improvements to the subtraction pipeline are also being made to reduce the number of bogus candidates that the classifier must process. Unfortunately, when a proposed improvement is implemented, it is difficult to determine its effect on the population of bogus candidates. Bogus candidates fall into several distinct types, and determining whether a pipeline change had the desired effect requires knowing the proportion of each type within the population of all bogus candidates. If done manually, this process would take many hours of tedious work.
To reduce the manual effort required to characterize the various classes of bogus candidates, a Web-based labeling interface for classifying candidates was developed. As the user provides labels to various candidates, an active learning algorithm suggests other candidates for the user to label that would maximally improve its ability to correctly classify the unlabeled candidates. Thus, the system quickly improves its ability to correctly classify candidates with minimal human effort.
Each candidate is rendered as a 100 × 100-pixel PNG thumbnail image showing the telescope’s field of view around the candidate. Clicking on a candidate image brings up a page with more detailed information, including the reference image, the new “science” image taken on the night when the candidate was detected, and the subtraction between these two images. This page also allows a user to assign the candidate to one of the classes that have already been identified, or to create a new class and assign the candidate to it. The system is initialized with no classes, so the first candidate a user labels is always assigned to a new class.
Given all of the labels assigned by a user, the system constructs a classifier to predict the classes of the unlabeled candidates. The classifier is a random forest built on the same features that the real/bogus system uses to classify candidates. To guide the user toward the most informative candidates, the system presents a list of candidates (in the form of the PNG thumbnail images) ranked in order of informativeness, which is derived from the classifier’s confidence that the candidate belongs to each class. A user can click on any of these candidates to assign a label. The classifier is retrained after each new label is assigned, and retraining typically takes only a few seconds.
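The ranking step can be sketched as follows. The text does not say exactly how informativeness is computed from the class confidences, so this example assumes one common choice, the margin between the two most probable classes: a small margin means the classifier is torn between two classes. The function name `rank_by_margin` and the probability values are hypothetical; in practice the matrix would come from the trained classifier (for example, the per-class probabilities a random forest produces for each unlabeled candidate).

```python
import numpy as np

def rank_by_margin(proba):
    """Order candidate indices from most to least ambiguous.

    proba: array of shape (n_candidates, n_classes) holding the
    classifier's predicted probability of each class per candidate.
    """
    proba = np.asarray(proba, dtype=float)
    if proba.shape[1] < 2:
        # With a single known class there is no margin to compare,
        # so keep the candidates in their original order.
        return np.arange(proba.shape[0])
    # Sort each row ascending and keep the two largest probabilities.
    top_two = np.sort(proba, axis=1)[:, -2:]
    margin = top_two[:, 1] - top_two[:, 0]
    # Smallest margin first: these are the most informative to label.
    return np.argsort(margin, kind="stable")

# Hypothetical class probabilities for four unlabeled candidates.
proba = np.array([
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.50, 0.49, 0.01],
    [0.70, 0.20, 0.10],
])
order = rank_by_margin(proba)
print(order.tolist())
```

Candidates near the front of this ordering are the ones shown first to the user; after each new label the classifier is retrained and the ranking recomputed.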
The system also allows the user to view a summary of all of the classes currently discovered in the data. Each class is represented by 10 thumbnail images of candidates from the class. The user can click on a class to see all of the candidates currently assigned to the class. Within the class view, right-clicking on a candidate “confirms” its label, adding another labeled example to the classifier. If a user identifies a mislabeled example within a class, he or she can left-click on the candidate to reassign it to the correct class. Thus, the system allows the user, in a relatively short time (on the order of a couple of hours), to label 1,000 candidates that are then used to train a classifier that predicts the labels of the remaining candidates.
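Once every candidate has a confirmed or predicted class, the per-class proportions needed to evaluate a pipeline change reduce to a simple tally. The sketch below is illustrative only: the function name `class_proportions` and the class names are hypothetical, and the label lists are made-up examples rather than survey results.

```python
from collections import Counter

def class_proportions(labels):
    """Return the fraction of candidates that fall into each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

# Hypothetical class labels for the bogus candidates produced by two
# versions of the subtraction pipeline.
before = class_proportions(["edge", "edge", "cosmic_ray", "misalignment"])
after = class_proportions(["edge", "cosmic_ray", "cosmic_ray", "misalignment"])

# Change in each class's share of the bogus population.
for name in sorted(set(before) | set(after)):
    change = after.get(name, 0.0) - before.get(name, 0.0)
    print(f"{name}: {change:+.2f}")
```

Comparing the two sets of proportions shows which types of bogus candidate a pipeline change suppressed and which it left untouched.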
This work was done by Gary B. Doran of Caltech for NASA’s Jet Propulsion Laboratory. This software is available for license through the Jet Propulsion Laboratory, and you may request a license at: https://download.jpl.nasa.gov/ops/request/request_introduction.cfm. NPO-49903