Salience Assignment for Multiple-Instance Data and Its Application to Crop Yield Prediction
- Created on Monday, 01 November 2010
Automated mapping of crops saves on survey time and improves map accuracy.
An algorithm was developed to generate crop yield predictions from orbital remote sensing observations, by analyzing thousands of pixels per county and the associated historical crop yield data for those counties. The algorithm determines which pixels contain which crop. Since each known yield value is associated with thousands of individual pixels, this is a “multiple instance” learning problem.
Because the growth of an individual crop is related to its resulting yield, this relationship has been leveraged to identify pixels that are individually related to corn, wheat, cotton, and soybean yield. The pixels that have the strongest relationship to a given crop’s yield values are most likely to contain fields with that crop. Remote sensing time series data (a new observation every 8 days) were examined for each pixel; each series contains information about that pixel’s growth curve, peak greenness, and other relevant features.
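To make the per-pixel features concrete, the following is a minimal sketch of how a single pixel's 8-day greenness series might be summarized. The function name `summarize_growth_curve`, the choice of a season-long integral as an extra feature, and the dictionary keys are illustrative assumptions; the article names only the growth curve and peak greenness and does not specify how the features were computed.

```python
def summarize_growth_curve(greenness, interval_days=8):
    """Summarize one pixel's greenness time series (hypothetical feature set).

    greenness: list of greenness values, one per observation.
    interval_days: spacing between observations (8 days in the article).
    """
    if not greenness:
        raise ValueError("need at least one observation")

    # Index of the first observation that reaches the maximum greenness.
    peak_index = max(range(len(greenness)), key=lambda i: greenness[i])

    # Trapezoidal area under the curve, a rough proxy for total growth.
    # This feature is an assumption, not one named in the article.
    integral = sum(
        (a + b) / 2.0 * interval_days
        for a, b in zip(greenness, greenness[1:])
    )

    return {
        "peak_greenness": greenness[peak_index],
        "day_of_peak": peak_index * interval_days,  # days since first observation
        "season_integral": integral,
    }
```

Each pixel would then be represented by a short feature vector of this kind rather than by its raw time series.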
An alternating-projection (AP) technique was used to first estimate the “salience” of each pixel with respect to the given target (crop yield), and then those estimates were used to build a regression model that relates input data (remote sensing observations) to the target. This is achieved by constructing an exemplar for each crop in each county that is a weighted average of all the pixels within the county; the pixels are weighted according to the salience values. The new regression model estimate then informs the next estimate of the salience values. By iterating between these two steps, the algorithm converges to a stable estimate of both the salience of each pixel and the regression model. The salience values indicate which pixels are most relevant to each crop under consideration.
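The alternation between the two steps can be sketched as follows. This is an illustration under stated assumptions, not the published algorithm: the regression step here is ordinary least squares on the salience-weighted exemplars, and the salience step is a simple softmax-style heuristic that favors pixels whose individual predictions land close to the county's known yield. The article does not give the actual salience update, and the names `fit_salience_regression` and `temperature` are hypothetical.

```python
import numpy as np


def fit_salience_regression(bags, targets, n_iters=20, temperature=1.0):
    """Alternate between exemplar regression and salience re-estimation.

    bags: list of arrays, one per county, each of shape (n_pixels, n_features).
    targets: one known yield value per county.
    Returns (weights, intercept, saliences).
    """
    targets = np.asarray(targets, dtype=float)

    # Start with uniform salience: every pixel counts equally.
    saliences = [np.full(len(X), 1.0 / len(X)) for X in bags]

    for _ in range(n_iters):
        # Step 1: build one exemplar per county as the salience-weighted
        # average of its pixels, then fit a linear model to the exemplars.
        exemplars = np.stack([w @ X for w, X in zip(saliences, bags)])
        design = np.column_stack([exemplars, np.ones(len(bags))])
        coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
        weights, intercept = coef[:-1], coef[-1]

        # Step 2 (assumed heuristic): give higher salience to pixels whose
        # own predicted yield is close to the county's known yield.
        updated = []
        for X, y in zip(bags, targets):
            err = (X @ weights + intercept - y) ** 2
            scores = np.exp(-(err - err.min()) / temperature)
            updated.append(scores / scores.sum())  # normalize to sum to 1
        saliences = updated

    return weights, intercept, saliences
```

The sketch runs a fixed number of iterations rather than testing for convergence, and the heuristic salience update used here carries no convergence guarantee of its own.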
This approach produces better estimates than an existing “primary instance” (PI) approach does. The PI approach assumes that each county contains a single canonical pixel for each crop (corn, cotton, soybean, etc.) and that the rest of the pixels in that county are noisy observations of the true one.
This work could ultimately provide automated mapping of crops that are being grown, which could save agencies such as the U.S. Department of Agriculture a significant amount of money that is currently devoted to surveying fields to produce summaries of how much of each crop is being grown. Reliable early estimates of the likely volume of production can significantly affect crop prices throughout the season.