Since the first paper studying the environmental impact of this technology was published three years ago, there has been a growing movement among researchers to self-report the energy used and emissions generated by their work. Having the right numbers is an important step towards change, but actually collecting those numbers can be a challenge.
“You can’t improve what you can’t measure,” says Jesse Dodge, a scientist at the Allen Institute for AI in Seattle. “The first step for us, if we want to make progress in reducing emissions, is that we have to get a good measurement.”
To that end, the Allen Institute recently collaborated with Microsoft, the AI company Hugging Face, and three universities to create a tool that measures the power consumption of any machine learning program running on Azure, Microsoft’s cloud service. In addition, Azure users building new models can see the total electricity consumed by graphics processing units (GPUs), computer chips specialized for performing calculations in parallel, during every phase of their project, from model selection to training and deployment. Microsoft is the first major cloud provider to give users access to information about the energy impact of their machine learning programs.
While there are already tools that measure energy use and emissions from machine learning algorithms running on local servers, those tools don’t work when researchers use cloud services provided by companies like Microsoft, Amazon and Google. Those services don’t give users direct visibility into the GPU, CPU, and memory resources their activities are consuming—and existing tools like Carbontracker, Experiment Tracker, EnergyVis, and CodeCarbon need those values to provide accurate estimates.
The new Azure tool, which debuted in October, currently reports energy consumption, not emissions. So Dodge and other researchers figured out how to map energy use to emissions and presented a follow-up paper on that work at FAccT, a major computer science conference, in late June. The researchers used a service called WattTime to estimate, based on the zip codes of the cloud servers, the emissions from running 11 machine learning models.
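The mapping from energy use to emissions can be illustrated with simple arithmetic: multiply the energy consumed in each time interval by the carbon intensity of the local grid during that interval, then add up the results. The Python sketch below is only an illustration under that assumption. The function name and the numbers are hypothetical, and it does not reproduce the researchers' method or call the WattTime service.

```python
def estimate_emissions_g(energy_kwh, intensity_g_per_kwh):
    """Sum energy (kWh) times grid carbon intensity (gCO2/kWh) per interval."""
    if len(energy_kwh) != len(intensity_g_per_kwh):
        raise ValueError("need one carbon-intensity value per energy reading")
    return sum(e * c for e, c in zip(energy_kwh, intensity_g_per_kwh))


# Hypothetical hourly readings for a four-hour training job.
energy = [0.5, 0.5, 0.5, 0.5]            # kWh drawn in each hour
intensity = [400.0, 300.0, 200.0, 500.0]  # gCO2 per kWh on the grid in each hour

print(estimate_emissions_g(energy, intensity))  # 700.0 grams of CO2
```

The same job would emit less if more of its hours fell in periods with low carbon intensity, which is why the location and timing of the server matter.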
They found that emissions can be significantly reduced if researchers use servers in specific geographic locations and at specific times of the day. Emissions from training small machine learning models can be reduced by up to 80% if training starts at times when more renewable electricity is available on the grid, while emissions from large models can be reduced by over 20% if training is paused when the supply of renewable electricity is low and restarted when more is available.
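The two timing strategies described here, choosing a cleaner start time and pausing during dirtier hours, can be sketched in a few lines of Python. This is a simplified illustration, not the researchers' code: it assumes a hypothetical hourly forecast of grid carbon intensity and a job that draws constant power, and the function names are invented for the example.

```python
def best_start(intensity, duration):
    """Start hour of the contiguous window with the lowest total intensity."""
    if not 0 < duration <= len(intensity):
        raise ValueError("duration must fit inside the forecast")
    totals = [sum(intensity[s:s + duration])
              for s in range(len(intensity) - duration + 1)]
    return min(range(len(totals)), key=totals.__getitem__)


def pause_and_resume(intensity, duration):
    """Hours to run when the job may pause: the `duration` cleanest hours."""
    if not 0 < duration <= len(intensity):
        raise ValueError("duration must fit inside the forecast")
    ranked = sorted(range(len(intensity)), key=lambda h: intensity[h])
    return sorted(ranked[:duration])


# Hypothetical forecast of grid carbon intensity (gCO2/kWh) for eight hours.
forecast = [500, 200, 450, 300, 250, 400, 150, 550]

print(best_start(forecast, 3))        # 4: hours 4-6 total 800
print(pause_and_resume(forecast, 3))  # [1, 4, 6]: these hours total 600
```

In this toy forecast, running immediately (hours 0 to 2) would total 1,150, so both strategies help, and pausing helps more because the job is free to skip the dirtier hours in between.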