The NN and the SVM help each other “learn” in an iterative process.
A computational method and system based on a hybrid of an artificial neural network (NN) and a support vector machine (SVM) (see figure) has been conceived as a means of maximizing or minimizing an objective function, optionally subject to one or more constraints. Such maximization or minimization could be performed, for example, to solve a data-regression or data-classification problem or to optimize a design associated with a response function. A response function can be considered as a subset of a response surface, which is a surface in a vector space of design and performance parameters. A typical example of a design problem that the method and system can be used to solve is that of an airfoil, for which a response function could be the spatial distribution of pressure over the airfoil. In this example, the response surface would describe the pressure distribution as a function of the operating conditions and the geometric parameters of the airfoil.
The use of NNs to analyze physical objects in order to optimize their responses under specified physical conditions is well known. NN analysis is suitable for multidimensional interpolation of data that lack structure and enables the representation and optimization of a succession of numerical solutions of increasing complexity or increasing fidelity to the real world. NN analysis is especially useful in helping to satisfy multiple design objectives. Feedforward NNs can be used to make estimates based on nonlinear mathematical models. One difficulty associated with use of a feedforward NN arises from the need for nonlinear optimization to determine connection weights among input, intermediate, and output variables. It can be very expensive to train an NN in cases in which it is necessary to model large amounts of information.
Less widely known (in comparison with NNs) are support vector machines (SVMs), which were originally applied in statistical learning theory. In terms that are necessarily oversimplified to fit the scope of this article, an SVM can be characterized as an algorithm that (1) effects a nonlinear mapping of input vectors into a higher-dimensional feature space and (2) involves a dual formulation of governing equations and constraints. One advantageous feature of the SVM approach is that an objective function (which one seeks to minimize to obtain coefficients that define an SVM mathematical model) is convex, so that unlike in the cases of many NN models, any local minimum of an SVM model is also a global minimum.
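The two characteristics named above can be illustrated with a small numerical sketch. It is not taken from the invention: the degree-2 feature map `phi`, the matching kernel `poly_kernel`, and the random sample points are all assumptions chosen for illustration. The sketch shows that inner products in the higher-dimensional feature space can be computed directly from the inputs, and that the resulting matrix of inner products has no negative eigenvalues (to within rounding), which is the property that makes the dual objective convex.

```python
import numpy as np

def phi(x):
    """Map a 2-D input to a 3-D feature space of degree-2 monomials (illustrative choice)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly_kernel(x, z):
    """The same inner product, computed without leaving the input space."""
    return float(np.dot(x, z)) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))  # six hypothetical 2-D input vectors

# Matrix of feature-space inner products, built two ways.
gram_explicit = np.array([[phi(a) @ phi(b) for b in X] for a in X])
gram_kernel = np.array([[poly_kernel(a, b) for b in X] for a in X])

# A matrix of inner products is positive semidefinite, so its smallest
# eigenvalue is zero or positive up to floating-point rounding.
min_eigenvalue = np.linalg.eigvalsh(gram_explicit).min()
```

In a conventional SVM the user must pick `phi` (or the kernel) in advance; that choice is the a priori specification discussed in the next paragraph.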
In the SVM approach as practiced heretofore, underlying feature-space coordinates or functions must be specified. In the NN approach as practiced heretofore, resampling of data is needed to implement a process, known in the art as model hybridization, in which a superior neural network is generated from the synaptic-connection weight vectors of multiple neural networks that yield local minima with acceptably low errors. What is needed is a machine-learning algorithm that combines the desirable features of the NN and SVM approaches and does not require intimate a priori familiarity with operational details of the object to be optimized. Preferably, the algorithm should automatically provide a characterization of many or all of the aspects in feature space needed for the analysis.
A hybrid NN/SVM system (see figure) accepts inputs in the form of parameter values, which are regarded as independent coordinates in an input vector space. In the construction of the SVM, the input coordinates are mapped into a feature space of appropriately greater dimensionality, wherein the coordinates include computed combinations (e.g., powers and/or polynomials) of the input-space coordinates. The NN is initially programmed with random synaptic-connection weights and used to construct inner products for the SVM. The inner products are, in turn, used to compute Lagrange multipliers. A training error associated with the connection weights and Lagrange multipliers is calculated. If the training error is too large, one or more connection weights are changed and all of the foregoing steps (except the initial programming with random weights) are repeated. If the training error is not too large, the connection weights and the Lagrange multipliers are accepted as optimal.
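The iteration just described can be sketched in a few lines of Python. This is a minimal illustration under several assumptions, not the patented implementation: the NN is reduced to a single tanh layer, the Lagrange multipliers are obtained from a ridge-regularized linear system (in the style of kernel ridge regression) rather than from a constrained SVM quadratic program, and a weight is changed by a random perturbation that is kept only if the training error falls. The names `nn_features`, `solve_multipliers`, and `train_hybrid`, and all parameter values, are hypothetical.

```python
import numpy as np

def nn_features(X, W, b):
    """Outputs of a single tanh layer, used as feature-space coordinates."""
    return np.tanh(X @ W + b)

def solve_multipliers(K, y, ridge=1e-3):
    """Dual coefficients from a ridge-regularized linear system.
    Assumption: this stands in for the SVM quadratic program."""
    return np.linalg.solve(K + ridge * np.eye(len(y)), y)

def train_hybrid(X, y, n_features=8, tol=1e-3, max_iter=500, step=0.3, seed=0):
    rng = np.random.default_rng(seed)
    # Initial programming with random connection weights (done once).
    W = rng.normal(size=(X.shape[1], n_features))
    b = rng.normal(size=n_features)

    def evaluate(W_candidate):
        Phi = nn_features(X, W_candidate, b)
        K = Phi @ Phi.T                      # inner products built by the NN
        alpha = solve_multipliers(K, y)      # multipliers from the inner products
        err = float(np.sqrt(np.mean((K @ alpha - y) ** 2)))  # RMS training error
        return alpha, err

    alpha, err = evaluate(W)
    history = [err]
    for _ in range(max_iter):
        if err <= tol:                       # error small enough: accept
            break
        # Change one connection weight, then repeat the foregoing steps.
        W_try = W.copy()
        i, j = rng.integers(W.shape[0]), rng.integers(W.shape[1])
        W_try[i, j] += step * rng.normal()
        alpha_try, err_try = evaluate(W_try)
        if err_try < err:                    # keep the change only if it helps
            W, alpha, err = W_try, alpha_try, err_try
        history.append(err)
    return W, b, alpha, history

# Hypothetical 1-D regression data for a quick trial.
X = np.linspace(-1.0, 1.0, 30).reshape(-1, 1)
y = np.sin(3.0 * X[:, 0])
W, b, alpha, history = train_hybrid(X, y)
```

Because a weight change is accepted only when it lowers the error, the values in `history` never increase. A practical system would presumably use a more directed weight update than this random search.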
An important advantage of this system over a conventional SVM is that the feature-space coordinates, which would otherwise have to be specified a priori, are determined by the NN subsystem. Moreover, the feature-space coordinates are generated by the NN subsystem to correspond to the data at hand; in other words, the feature space provided by the NN subsystem evolves to match or correspond to the data. A feature space that evolves in this manner is referred to as “data-adaptive.” The feature-space coordinates generated by the NN subsystem can be easily augmented with additional feature-space coordinates (combinations of parameters) and kernel functions provided by the user.
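One simple way to picture the augmentation mentioned above is to append user-chosen columns to the NN-generated feature matrix. The sketch below is an assumption about how that could be done, not a description of the patented system; the sample data, the layer size, and the two user-chosen combinations are hypothetical. It also shows that the inner products of the augmented features equal the sum of the NN inner products and the user inner products, so adding user features amounts to adding a user kernel.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))                  # five hypothetical 2-D inputs
W = rng.normal(size=(2, 4))                  # random NN connection weights
b = rng.normal(size=4)

phi_nn = np.tanh(X @ W + b)                  # coordinates generated by the NN
phi_user = np.column_stack([
    X[:, 0] * X[:, 1],                       # user-chosen parameter combination
    X[:, 0] ** 2,                            # user-chosen power
])

phi_all = np.hstack([phi_nn, phi_user])      # augmented feature space
K_all = phi_all @ phi_all.T                  # inner products of augmented features
K_sum = phi_nn @ phi_nn.T + phi_user @ phi_user.T  # NN kernel plus user kernel
```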
This work was done by Man Mohan Rai of Ames Research Center.
This invention has been patented by NASA (U.S. Patent No. 6,961,719). Inquiries concerning rights for the commercial use of this invention should be addressed to the Ames Technology Partnerships Division at (650) 604-2954. Refer to ARC-14586.