This brief describes a method for filling in missing flux values in time-domain optical and radio astronomical survey data, the time series known as “light curves.” The method builds a priori astronomical knowledge into the imputation. It assumes that each missing value in an astronomical time series is either Missing At Random (MAR) or missing because the flux of the source fell below the instrument’s sensitivity threshold, termed a Threshold Removed Observation (TRO).

The imputation technique consists of two stages: (1) classification of each flagged missing observation as MAR or TRO, and (2) imputation of the missing value according to its classification. Once the unlabeled set of missing observations has been classified, standard linear interpolation is applied to infer the MAR values, and cubic spline interpolation to infer the TRO values.
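The second stage can be illustrated with a minimal sketch. It assumes the classification has already been done and that missing samples are marked with None. The function names (linear_interp, natural_cubic_spline, impute) are hypothetical, and the natural boundary condition on the spline is an assumption, since the brief says only "cubic spline."

```python
from bisect import bisect_right


def linear_interp(tk, fk, x):
    """Piecewise-linear interpolation through (tk[i], fk[i]); clamps outside the range."""
    if x <= tk[0]:
        return fk[0]
    if x >= tk[-1]:
        return fk[-1]
    i = bisect_right(tk, x) - 1
    w = (x - tk[i]) / (tk[i + 1] - tk[i])
    return (1 - w) * fk[i] + w * fk[i + 1]


def natural_cubic_spline(tk, fk):
    """Return a callable natural cubic spline through (tk[i], fk[i]); needs >= 2 points."""
    n = len(tk)
    h = [tk[i + 1] - tk[i] for i in range(n - 1)]
    # Thomas algorithm for the second derivatives m, with m[0] = m[n-1] = 0 (assumed).
    cp, dp, m = [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        rhs = 6 * ((fk[i + 1] - fk[i]) / h[i] - (fk[i] - fk[i - 1]) / h[i - 1])
        denom = 2 * (h[i - 1] + h[i]) - h[i - 1] * cp[i - 1]
        cp[i] = h[i] / denom
        dp[i] = (rhs - h[i - 1] * dp[i - 1]) / denom
    for i in range(n - 2, 0, -1):
        m[i] = dp[i] - cp[i] * m[i + 1]

    def evaluate(x):
        # Pick the segment containing x; end segments are reused for extrapolation.
        i = min(max(bisect_right(tk, x) - 1, 0), n - 2)
        a, b = tk[i + 1] - x, x - tk[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6 * h[i])
                + (fk[i] / h[i] - m[i] * h[i] / 6) * a
                + (fk[i + 1] / h[i] - m[i + 1] * h[i] / 6) * b)

    return evaluate


def impute(times, flux, labels):
    """Fill None entries of flux: linear interpolation for 'MAR', cubic spline for 'TRO'."""
    tk = [t for t, f in zip(times, flux) if f is not None]
    fk = [f for f in flux if f is not None]
    spline = natural_cubic_spline(tk, fk)
    filled = []
    for t, f, label in zip(times, flux, labels):
        if f is not None:
            filled.append(f)
        elif label == "MAR":
            filled.append(linear_interp(tk, fk, t))
        else:
            filled.append(spline(t))
    return filled


# Toy light curve sampled from f(t) = 2t + 1 with two gaps (illustrative data only).
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
flux = [1.0, 3.0, None, 7.0, None, 11.0]
labels = [None, None, "MAR", None, "TRO", None]
print(impute(times, flux, labels))
```

Because the toy data are exactly linear, both interpolators recover the removed values (5.0 and 9.0). On real light curves the two would generally differ, which is the reason for treating the two classes separately.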

This work was done by Colorado J. Reed of the University of Iowa, and Umaa D. Rebbapragada, Kiri L. Wagstaff, and David R. Thompson of Caltech for NASA’s Jet Propulsion Laboratory. NPO-48544



This Brief includes a Technical Support Package (TSP).
"Missing Value Imputation in Astronomical Time-Series Data" (reference NPO-48544) is currently available for download from the TSP library.


This article first appeared in the August, 2014 issue of NASA Tech Briefs Magazine (Vol. 38 No. 8).



Overview

The document, "Missing Value Imputation in Astronomical Time-Series Data," presents an approach to handling missing data in astronomical light curves, which often contain incomplete observations for reasons that include instrument sensitivity thresholds. The work was carried out by researchers from the University of Iowa and Caltech for NASA's Jet Propulsion Laboratory, and it develops imputation techniques that incorporate domain knowledge from astronomy.

The authors introduce a constrained form of Expectation-Maximization (EM) clustering to classify each missing value as either Missing At Random (MAR) or a Threshold Removed Observation (TRO), that is, one lost because the flux fell below the sensitivity threshold. The method uses the characteristics of high-flux light curves, which are less likely to contain TRO values, to infer the nature of missing data in lower-flux light curves. Imputation then uses linear interpolation for MAR values and third-order (cubic) spline interpolation for TRO values.
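The summary does not give the features or the exact form of the constraint, so the following is only a rough sketch of one way a constrained EM classifier could look. It assumes a single hypothetical feature per missing observation (for example, the mean flux of its nearest observed neighbors), a two-component one-dimensional Gaussian mixture, and a constraint that pins gaps from high-flux light curves to the MAR component. None of these choices is taken from the authors' method.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def constrained_em(x, pinned_mar, n_iter=50, min_var=1e-6):
    """Two-component 1-D Gaussian mixture fitted by EM.

    x          : one feature value per missing observation (hypothetical feature)
    pinned_mar : True where the gap comes from a high-flux light curve; such
                 points are held in the MAR component (the assumed constraint)
    Returns the MAR responsibility of every point. Needs at least one unpinned point.
    """
    # Initialise: pinned points are MAR; unpinned points above the median start as MAR.
    free = sorted(v for v, p in zip(x, pinned_mar) if not p)
    split = free[len(free) // 2]
    r = [1.0 if p or v > split else 0.0 for v, p in zip(x, pinned_mar)]
    for _ in range(n_iter):
        # M-step: weight, mean, and variance of the MAR and TRO components.
        params = []
        for resp in (r, [1.0 - ri for ri in r]):
            w = max(sum(resp), 1e-12)  # guard against an empty component
            mean = sum(ri * xi for ri, xi in zip(resp, x)) / w
            var = max(sum(ri * (xi - mean) ** 2 for ri, xi in zip(resp, x)) / w, min_var)
            params.append((w / len(x), mean, var))
        (w_m, mu_m, v_m), (w_t, mu_t, v_t) = params
        # E-step: update responsibilities, leaving pinned points fixed at MAR.
        new_r = []
        for xi, pinned in zip(x, pinned_mar):
            if pinned:
                new_r.append(1.0)
                continue
            p_m = w_m * gaussian_pdf(xi, mu_m, v_m)
            p_t = w_t * gaussian_pdf(xi, mu_t, v_t)
            new_r.append(p_m / (p_m + p_t) if p_m + p_t > 0 else 0.5)
        r = new_r
    return r


def classify(x, pinned_mar):
    """Label each missing observation 'MAR' or 'TRO' from its MAR responsibility."""
    return ["MAR" if ri >= 0.5 else "TRO" for ri in constrained_em(x, pinned_mar)]
```

Pinning is one simple way to let the high-flux curves shape the MAR component while the remaining points are assigned softly. A real system would likely use several features and a richer constraint.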

The document outlines the methodology, including how simulated MAR and TRO values are created: a flux sensitivity threshold is applied to produce TRO values, and further values are removed at random to produce MAR values. It also discusses the evaluation of the imputation techniques, comparing the cluster-based approach with traditional methods. Results indicate that while the clustering technique achieves high recall for the MAR class, it does not consistently outperform standard imputation techniques in terms of reconstruction error.
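The simulation and the two evaluation quantities mentioned above can be sketched as follows. The function names, the mar_fraction parameter, and the use of root-mean-square error as the reconstruction error are illustrative assumptions, not details from the document.

```python
import math
import random


def simulate_missing(flux, threshold, mar_fraction, seed=0):
    """Remove values from a complete light curve.

    Values below `threshold` become TRO; each remaining value is removed with
    probability `mar_fraction` and becomes MAR. Returns (observed, truth), where
    observed holds None at removed positions and truth holds 'TRO', 'MAR', or None.
    """
    rng = random.Random(seed)
    observed, truth = [], []
    for f in flux:
        if f < threshold:
            observed.append(None)
            truth.append("TRO")
        elif rng.random() < mar_fraction:
            observed.append(None)
            truth.append("MAR")
        else:
            observed.append(f)
            truth.append(None)
    return observed, truth


def recall(truth, predicted, label):
    """Fraction of observations truly of class `label` that were predicted as `label`."""
    relevant = [p for t, p in zip(truth, predicted) if t == label]
    if not relevant:
        return float("nan")
    return sum(p == label for p in relevant) / len(relevant)


def rmse(true_flux, imputed_flux, truth):
    """Root-mean-square reconstruction error over the removed observations only."""
    errors = [(a - b) ** 2
              for a, b, t in zip(true_flux, imputed_flux, truth) if t is not None]
    if not errors:
        return float("nan")
    return math.sqrt(sum(errors) / len(errors))
```

Keeping the true class labels from the simulation makes it possible to score the classifier (recall) and the imputation (reconstruction error) separately, which matches the distinction drawn in the results.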

Figures included in the document illustrate the simulation of missing values, accuracy metrics, and comparisons of various imputation methods. The authors emphasize the significance of incorporating domain-specific knowledge into statistical methods, as traditional techniques often assume that missing data is purely random, which may not be appropriate in the context of astronomical observations.

The document concludes by highlighting the importance of addressing missing data in time-series analyses, as many statistical methods require evenly sampled data. By presenting a system that integrates domain knowledge into the imputation process, the authors aim to improve the accuracy and reliability of analyses conducted on astronomical light curves.

Overall, the work provides a framework for handling missing values in astronomical time series that distinguishes randomly missing observations from those removed by the instrument's sensitivity threshold, and it reports where this domain-informed approach does and does not improve on standard imputation.