Confidence-based Feature Acquisition (CFA) is a novel, supervised learning method for deciding which missing feature values to acquire when data is missing at both training (learning) and test (deployment) time. To train a machine learning classifier, data is encoded with a series of input features describing each item. In some applications, the training data may have missing values for some of the features, which can be acquired at a given cost. A relevant JPL example is Mars rover exploration, in which the features are obtained from a variety of different instruments with different power consumption and integration time costs. The challenge is to decide which features will lead to increased classification performance and are therefore worth acquiring (paying the cost).
To solve this problem, CFA, which is made up of two algorithms (CFA-train and CFA-predict), has been designed to greedily minimize total acquisition cost (during training and testing) while aiming for a specific accuracy level (specified as a confidence threshold). With this method, it is assumed that there is a non-empty subset of features that are “free;” that is, every instance in the data set includes these features initially for zero cost. It is also assumed that the feature acquisition (FA) cost associated with each feature is known in advance, and that the FA cost for a given feature is the same for all instances. Finally, CFA requires that the base-level classifiers produce not only a classification, but also a confidence (or posterior probability).
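The assumptions above can be captured in a small setup object: a known, instance-independent cost per feature; a non-empty zero-cost ("free") subset; and a target confidence threshold. This is a minimal sketch with illustrative names (`CFASetup`, the instrument-like feature names) that are not drawn from the original method description.

```python
# Sketch of the CFA problem setup; all names here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CFASetup:
    """Fixed, instance-independent acquisition costs per feature.

    Features with zero cost form the "free" subset that every instance
    includes initially; all other features must be purchased at their
    listed cost.
    """
    feature_costs: dict          # feature name -> acquisition cost
    confidence_threshold: float  # target confidence for stopping

    def free_features(self):
        # The non-empty subset of zero-cost features assumed by CFA.
        return [f for f, c in self.feature_costs.items() if c == 0.0]

    def costly_features(self):
        # Costly features, ordered cheapest-first for the cascade.
        return sorted((f for f, c in self.feature_costs.items() if c > 0.0),
                      key=lambda f: self.feature_costs[f])

setup = CFASetup({"spectrum": 0.0, "camera": 0.0, "drill": 5.0, "laser": 2.0},
                 confidence_threshold=0.9)
print(setup.free_features())    # zero-cost features
print(setup.costly_features())  # costly features, cheapest first
```

Ordering costly features cheapest-first is one plausible way to fix the sequence F1, F2, ... used by the cascade; the original description does not specify the ordering criterion.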
CFA trains an ensemble of classifiers M0 ... Mf that use successively larger subsets of the features to classify instances. M0 uses only the “free” (zero-cost) features, and each subsequent model Mi additionally incorporates costly features F1 through Fi. CFA reduces FA cost in that model Mi is trained only on instances that cannot be classified with sufficient confidence by model Mi–1; therefore, values for feature Fi are acquired only for the instances that require them. At test time, each test instance is successively classified by M0, M1, M2 ... until its classification is sufficiently confident (i.e., until the confidence of the prediction reaches the confidence threshold). Again, features are acquired for the new instance only as required. In an empirical comparison with an existing method (Cost-Sensitive Naive Bayes) that makes acquisition decisions only at test time (and therefore requires that all training items be fully acquired), CFA achieves the same or higher level of performance at a cost reduced by at least an order of magnitude.
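The two-phase cascade described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the base learner here is a toy nearest-centroid classifier whose confidence is the relative margin between the two nearest class centroids, and the function names (cfa_train, cfa_predict) are assumptions standing in for the CFA-train and CFA-predict algorithms.

```python
# Minimal sketch of the CFA cascade; the base learner and all names are
# illustrative stand-ins, not the original implementation.

class CentroidClassifier:
    """Toy base learner: classify by nearest class centroid; confidence is
    the relative margin between the two nearest centroids (1.0 = certain,
    0.5 = a tie)."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_conf(self, x):
        dists = sorted((sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, label)
                       for label, c in self.centroids.items())
        best, second = dists[0], dists[1]
        conf = second[0] / (best[0] + second[0] + 1e-12)
        return best[1], conf

def cfa_train(X, y, feature_sets, threshold):
    """CFA-train sketch: fit models on successively larger feature subsets.

    Each later model is fit only on instances the previous model could not
    classify with confidence >= threshold, so costly features need to be
    acquired only for those instances."""
    models, active = [], list(range(len(y)))
    for cols in feature_sets:
        sub = [[X[i][c] for c in cols] for i in active]
        clf = CentroidClassifier().fit(sub, [y[i] for i in active])
        models.append((cols, clf))
        active = [i for i in active
                  if clf.predict_conf([X[i][c] for c in cols])[1] < threshold]
        if not active:          # every instance is confidently classified
            break
    return models

def cfa_predict(models, x, costs, threshold):
    """CFA-predict sketch: run M0, M1, ... until the prediction is confident
    enough, paying for each costly feature only when its model needs it."""
    acquired, spent = set(), 0.0
    for cols, clf in models:
        spent += sum(costs[c] for c in cols if c not in acquired)
        acquired.update(cols)
        label, conf = clf.predict_conf([x[c] for c in cols])
        if conf >= threshold:
            break
    return label, spent

# Tiny demo: feature 0 is free but noisy; feature 1 costs 5.0 but separates
# the classes well.
X = [[0.4, 0.0], [0.2, 0.1], [0.5, 1.0], [0.8, 0.9]]
y = [0, 0, 1, 1]
costs = {0: 0.0, 1: 5.0}
models = cfa_train(X, y, feature_sets=[[0], [0, 1]], threshold=0.8)
print(cfa_predict(models, [0.35, 0.05], costs, 0.8))  # confident at M0, cost 0.0
print(cfa_predict(models, [0.55, 0.9], costs, 0.8))   # needs M1, cost 5.0
```

The demo shows the intended cost behavior: an easy instance stops at the free model M0 and pays nothing, while an ambiguous instance escalates to M1 and pays only for the feature that model requires.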