Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere have developed a technique that enables deep-learning models to efficiently adapt to new sensor data directly on an edge device.
Their on-device training method, PockEngine, determines which parts of a huge machine-learning model need to be updated to improve accuracy, and only stores and computes with those specific pieces. It performs the bulk of these computations while the model is being prepared, before runtime, which minimizes computational overhead and boosts the speed of the fine-tuning process.
“On-device fine-tuning can enable better privacy, lower costs, customization ability, and also lifelong learning, but it is not easy. Everything has to happen with a limited number of resources. We want to be able to run not only inference but also training on an edge device. With PockEngine, now we can,” said senior author Song Han, an associate professor at MIT.
Deep-learning models are based on neural networks, which comprise many interconnected layers of nodes, or “neurons,” that process data to make a prediction. When the model is run, a process called inference, a data input (such as an image) is passed from layer to layer until the prediction (perhaps the image label) is output at the end. During inference, each layer’s intermediate output no longer needs to be stored once the next layer has consumed it.
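To make that concrete, here is a minimal inference sketch in PyTorch; the tiny model and random input are hypothetical placeholders, not part of the researchers’ system.

    import torch
    import torch.nn as nn

    # A small stand-in network, used only for illustration.
    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
    x = torch.randn(1, 64)  # e.g., a flattened sensor reading

    # No gradients are needed for inference, so each layer's intermediate
    # output can be discarded as soon as the next layer has consumed it.
    with torch.no_grad():
        prediction = model(x).argmax(dim=-1)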
But during training and fine-tuning, the model undergoes a process known as backpropagation, during which the output is compared to the correct answer, and then the model is run in reverse. Each layer is updated so that the model’s output moves closer to the correct answer, and, unlike in inference, each layer’s intermediate results must be kept in memory for this reverse pass.
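A generic fine-tuning step, again a rough PyTorch sketch rather than PockEngine’s own code, shows why training costs more than inference:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    x, labels = torch.randn(8, 64), torch.randint(0, 10, (8,))

    output = model(x)               # forward pass: intermediate results are kept
    loss = loss_fn(output, labels)  # compare the output to the correct answers
    loss.backward()                 # backpropagation: run the model in reverse
    optimizer.step()                # nudge every layer toward the correct answer
    optimizer.zero_grad()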
PockEngine speeds up the fine-tuning process and cuts down on the amount of computation and memory required. The system first fine-tunes each layer, one at a time, on a certain task and measures the accuracy improvement after each individual layer. In this way, PockEngine identifies the contribution of each layer, as well as trade-offs between accuracy and fine-tuning cost, and automatically determines the percentage of each layer that needs to be fine-tuned.
“This method matches the accuracy very well compared to full backpropagation on different tasks and different neural networks,” Han added.
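One way to picture this layer-by-layer analysis, as a simplified sketch rather than the team’s actual algorithm, is to unfreeze one layer at a time, fine-tune briefly, and record the accuracy gain. The fine_tune and evaluate helpers and the baseline_accuracy value below are hypothetical stand-ins for a real training loop and validation pass.

    import copy
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
    original_state = copy.deepcopy(model.state_dict())
    contributions = {}

    for name, layer in model.named_children():
        if not list(layer.parameters()):          # skip layers with no weights (e.g., ReLU)
            continue
        model.load_state_dict(original_state)     # restart from the same weights each time
        for p in model.parameters():
            p.requires_grad_(False)                # freeze every layer
        for p in layer.parameters():
            p.requires_grad_(True)                 # unfreeze only this layer
        fine_tune(model, steps=100)                # hypothetical brief training loop
        contributions[name] = evaluate(model) - baseline_accuracy  # hypothetical eval

    # Layers (or fractions of layers) with the best accuracy-per-cost trade-off
    # are the ones scheduled for updating on the device.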
PockEngine deletes bits of code to remove unnecessary layers or pieces of layers, creating a pared-down graph of the model to be used during runtime. It then performs other optimizations on this graph to further improve efficiency.
“It is like before setting out on a hiking trip. At home, you would do careful planning — which trails are you going to go on, which trails are you going to ignore. So then at execution time, when you are actually hiking, you already have a very careful plan to follow,” Han says.
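In standard frameworks, the runtime effect of that advance planning can be approximated by freezing layers so they drop out of the backward computation. The sketch below illustrates the idea with the same hypothetical toy model; PockEngine itself makes these decisions at compile time and strips the unused pieces from the graph before the model ever reaches the device.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
    x, labels = torch.randn(8, 64), torch.randint(0, 10, (8,))

    # Freeze the first layer; it no longer appears in the backward pass.
    for p in model[0].parameters():
        p.requires_grad_(False)

    loss = nn.CrossEntropyLoss()(model(x), labels)
    loss.backward()   # gradients are computed only for the unfrozen parameters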
When the researchers applied PockEngine to deep-learning models on different edge devices, it performed on-device training up to 15 times faster, without any drop in accuracy. PockEngine also significantly slashed the amount of memory required for fine-tuning.
The team also applied the technique to the large language model Llama-V2. With large language models, the fine-tuning process involves providing many examples, and it’s crucial for the model to learn how to interact with users, Han says. The process is also important for models tasked with solving complex problems or reasoning about solutions.
In the future, the researchers want to use PockEngine to fine-tune even larger models designed to process text and images together.