The challenge is addressed as an “active learning” problem.
This work deals with the challenge of online adaptive data collection in a time series. A remote sensor or explorer agent adapts its rate of data collection in order to track anomalous events while obeying constraints on time and power. This problem is challenging because the agent has limited visibility (all its datapoints lie in the past) and limited control (it can only decide when to collect its next datapoint). This problem is treated from an information-theoretic perspective, fitting a probabilistic model to collected data and optimizing the future sampling strategy to maximize information gain. The performance characteristics of stationary and nonstationary Gaussian process models are compared.Self-throttling sensors could benefit environmental sensor networks and monitoring as well as robotic exploration. Explorer agents can improve performance by adjusting their data collection rate, preserving scarce power or bandwidth resources during uninteresting times while fully covering anomalous events of interest. For example, a remote earthquake sensor could conserve power by limiting its measurements during normal conditions and increasing its cadence during rare earthquake events. A similar capability could improve sensor platforms traversing a fixed trajectory, such as an exploration rover transect or a deep space flyby. These agents can adapt observation times to improve sample coverage during moments of rapid change.
An adaptive sampling approach couples sensor autonomy, instrument interpretation, and sampling. The challenge is addressed as an “active learning” problem, which already has extensive theoretical treatment in the statistics and machine learning literature. A statistical Gaussian process (GP) model is employed to guide sample decisions that maximize information gain. Nonstationary (e.g., time-varying) covariance relationships permit the system to represent and track local anomalies, in contrast with current GP approaches.
Most common GP models are “stationary,” e.g., the covariance relationships are time-invariant. In such cases, information gain is independent of previously collected data, and the optimal solution can always be computed in advance. Information-optimal sampling of a stationary GP time series thus reduces to even spacing, and such models are not appropriate for tracking localized anomalies. Additionally, GP model inference can be computationally expensive.