Outlier or anomaly detection refers to the task of identifying abnormal or inconsistent patterns in a dataset. While outliers may seem undesirable, identifying them has many potential applications, including fraud and intrusion detection, medical research, and safety-critical vehicle health management. Outliers can be detected using supervised, semi-supervised, or unsupervised techniques. Unsupervised techniques do not require labeled instances to detect outliers. Supervised techniques require labeled instances of both normal and abnormal operation data, first to build a model (e.g., a classifier) and then to test whether an unknown data point is normal or an outlier. The model can be probabilistic, such as Bayesian inference, or deterministic, such as decision trees, Support Vector Machines (SVMs), and neural networks. Semi-supervised techniques require labeled instances of normal data only; hence, they are more widely applicable than fully supervised ones. These techniques build a model of normal data and then flag as outliers the points that do not fit the model.
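The semi-supervised recipe described above can be sketched with a simple, hypothetical example: fit a model to labeled normal data only (here a Gaussian model, chosen purely for illustration; it is not the method discussed in this section), then flag any point that lies too far from that model as an outlier.

```python
import numpy as np

# Illustrative semi-supervised outlier detection: the model of "normal"
# behavior is a Gaussian fit to labeled normal instances only, and a test
# point is flagged if its Mahalanobis distance exceeds a chosen threshold.
rng = np.random.default_rng(0)
normal_data = rng.normal(loc=0.0, scale=1.0, size=(500, 2))  # normal instances

mean = normal_data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(normal_data, rowvar=False))

def is_outlier(x, threshold=3.5):
    """Flag x as an outlier if it does not fit the model of normal data."""
    d = x - mean
    mahalanobis = np.sqrt(d @ cov_inv @ d)
    return mahalanobis > threshold

print(is_outlier(np.array([0.2, -0.1])))  # near the normal cloud -> False
print(is_outlier(np.array([8.0, 8.0])))   # far from the normal cloud -> True
```

The threshold of 3.5 standard-deviation-equivalents is an arbitrary choice for this sketch; in practice it would be tuned, and a richer model (e.g., a one-class SVM) would replace the Gaussian.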
v-Anomica is a novel anomaly detection technique that can be trained on very large data sets with reduced running time compared to the benchmark one-class SVM algorithm. It achieves this by building its model incrementally.
The idea behind v-Anomica is to train the machine so that it provides a close approximation to the exact model using fewer training points, without losing much of the accuracy of the classical approach. The proposed algorithm was tested on a variety of continuous data sets under different conditions. Under static conditions (i.e., the properties of the dataset do not change over time), the developed procedure closely preserves the accuracy of the standard one-class SVM while reducing both the training time and the test time by a factor of 15 to 20.