Training and searching a single neural network architecture can emit roughly 626,000 pounds of carbon dioxide, nearly the lifetime emissions of the average US car, including its manufacturing. The problem becomes even more severe in the deployment phase, where deep neural networks must run on diverse hardware platforms, each with different properties and computational resources.

Researchers have developed a new automated artificial intelligence (AI) system for training and running certain neural networks. By improving the system's computational efficiency in several key ways, it can cut the carbon emissions involved, in some cases down to the low triple digits of pounds. The system, a once-for-all (OFA) network, trains one large neural network comprising many pre-trained subnetworks of different sizes that can be tailored to diverse hardware platforms without retraining. This dramatically reduces the energy usually required to train a specialized neural network for each new platform, which can include billions of Internet of Things (IoT) devices. Using the system to train a computer-vision model, the process required roughly 1/1,300 of the carbon emissions of today's state-of-the-art neural architecture search approaches, while reducing inference time by 1.5 to 2.6 times.

The system builds on a recent AI advance called AutoML (automatic machine learning), which eliminates manual network design. AutoML systems automatically search massive design spaces for network architectures tailored, for instance, to specific hardware platforms.
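To give a concrete sense of what such a design space can look like, here is a minimal Python sketch. The dimensions and value ranges are purely illustrative assumptions, not the actual search space of any particular system.

```python
import itertools

# Illustrative (hypothetical) design space for a convolutional network:
# each block can vary in depth, channel width, and kernel size.
DESIGN_SPACE = {
    "depth_per_stage": [2, 3, 4],           # layers in each stage
    "width_multiplier": [0.75, 1.0, 1.25],  # channel scaling factor
    "kernel_size": [3, 5, 7],               # convolution kernel sizes
}

def enumerate_architectures(space):
    """Yield every combination of architectural choices as a dict."""
    keys = list(space.keys())
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

# Even this tiny space has 3 * 3 * 3 = 27 candidate settings per block;
# realistic spaces multiply such choices across dozens of blocks.
print(sum(1 for _ in enumerate_architectures(DESIGN_SPACE)))  # 27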

But there is still a training efficiency issue: each model has to be selected and then trained from scratch for its target platform. The researchers' AutoML system instead trains a single, large OFA network that serves as a "mother" network, nesting an extremely large number of subnetworks that are sparsely activated from the mother network. The OFA network shares all of its learned weights with its subnetworks, meaning they come essentially pre-trained. Thus, each subnetwork can operate independently at inference time without retraining.
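The weight-sharing idea can be illustrated with a toy PyTorch sketch: a subnetwork is simply a slice of the mother layer's weight tensors, so it needs no new parameters and no retraining. The class and argument names below are hypothetical, and this is a sketch of the general technique rather than the researchers' implementation.

```python
import torch
import torch.nn.functional as F

class SharedLinear(torch.nn.Module):
    """A 'mother' fully connected layer whose smaller subnetworks
    are slices of the same weight matrix (toy weight-sharing example)."""

    def __init__(self, max_in=512, max_out=512):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(max_out, max_in) * 0.01)
        self.bias = torch.nn.Parameter(torch.zeros(max_out))

    def forward(self, x, active_out=512):
        # A subnetwork uses only the first `active_out` output units
        # and only as many input weights as the incoming features provide.
        in_features = x.shape[-1]
        w = self.weight[:active_out, :in_features]
        b = self.bias[:active_out]
        return F.linear(x, w, b)

layer = SharedLinear()
x = torch.randn(1, 256)                  # a smaller input than the maximum
small_output = layer(x, active_out=128)  # a "small" subnetwork, no retraining
print(small_output.shape)                # torch.Size([1, 128])
```

Because the small slice reuses weights already learned by the full layer, extracting it costs nothing beyond the slicing itself; that is the sense in which every subnetwork comes pre-trained.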

One OFA network can comprise more than 10 quintillion (a 1 followed by 19 zeroes) architectural settings, likely covering every platform that will ever be needed. Training the OFA network and then searching it is far more efficient than spending hours training a separate neural network for each platform. Moreover, OFA does not compromise accuracy or inference efficiency; it delivers state-of-the-art ImageNet accuracy on mobile devices.
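To suggest how a specialized subnetwork might be chosen for a particular device, here is a simplified Python sketch of a random search under a latency budget. The functions predict_accuracy and predict_latency_ms are hypothetical stand-ins for the kind of learned predictors such a system could use, and all numbers are made up.

```python
import random

# Hypothetical stand-ins for predictors estimating how a candidate
# subnetwork would perform on a target device.
def predict_accuracy(config):
    # Toy proxy: more depth and larger kernels score slightly higher.
    return 0.6 + 0.01 * config["depth"] + 0.005 * config["kernel_size"]

def predict_latency_ms(config):
    # Toy proxy: cost grows with depth and kernel size.
    return 2.0 * config["depth"] + 0.5 * config["kernel_size"]

def search_subnetwork(latency_budget_ms, n_samples=1000, seed=0):
    """Randomly sample subnetwork configurations and keep the most
    accurate one that fits within the device's latency budget."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        config = {
            "depth": rng.choice([2, 3, 4]),
            "kernel_size": rng.choice([3, 5, 7]),
        }
        if predict_latency_ms(config) > latency_budget_ms:
            continue
        acc = predict_accuracy(config)
        if best is None or acc > best[0]:
            best = (acc, config)
    return best

print(search_subnetwork(latency_budget_ms=8.0))
```

Because the search only queries cheap predictors and slices an already-trained network, specializing for a new device avoids the full training run that a per-platform approach would require.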

For more information, contact Abby Abazorius at abbya@mit.edu; 617-253-2709.