Google is an inspiration for data center builders and managers, especially the green-minded, so it’s worth taking a look at just how much machine learning is behind its latest developments. In 2014, Google announced that machine learning was its go-to strategy for improving data center efficiency, and now it is sharing more as the strategy is put into use. One year ago, Google was quiet about the fact that its power usage effectiveness (PUE), the ratio of a facility’s total energy use to the energy used by its IT equipment, had come to a standstill. However, one of the company’s engineers took on efficiency improvements as a side project. Those efforts have snowballed into machine learning driving PUE improvements to a degree never seen before.
Joe Kava, Google’s Vice President of Datacenter Operations, told The Platform during a Googleplex tour that the company’s engineers, as they often do, solved a big problem. Other data centers are already copying the solution thanks to recently published research that details it. Many cloud builders were taught that measuring everything and using that data to drive outcomes was the only approach, whether for selling goods, improving efficiency, or tackling new challenges. Google still measures everything it can, especially in its data centers, but that doesn’t make the approach easy. Pinpointing the right data and then mining it for a better PUE is no small task. It is especially tricky because there is no fixed finish line: a facility can always try to push its PUE closer to the theoretical floor of 1.0.
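For readers new to the metric, here is a minimal sketch of how PUE is calculated: total facility energy divided by the energy consumed by IT equipment alone. The function name and the energy figures are made up for illustration and are not Google’s numbers.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical example: 1,000 kWh goes to servers, storage and networking,
# and another 120 kWh goes to cooling, power conversion and lighting.
example_pue = pue(total_facility_kwh=1120.0, it_equipment_kwh=1000.0)
print(f"PUE = {example_pue:.2f}")
```

A value of 1.0 would mean every kilowatt-hour reaches the IT equipment, so the closer to 1.0, the less energy is spent on overhead.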
Breaking the Plateau
A plateau doesn’t necessarily mean poor performance. It’s a bit like fitness: you can be at your goal weight with the muscle mass and body fat of a professional athlete and still ask how to get even better. Kava explains that even though Google’s numbers are “awesome already, it had been flat for several quarters. So I put a challenge to the team and asked them what they could do about it. This gets back to that mentality that you continue to measure, implement, improve and remeasure.” The engineer who came up with the answer is Jim Gao, who thought of using machine learning to sift through the billions of data points Google had collected from its data centers.
Gao looked at pump speeds, outside weather conditions, cooling tower fan speeds and more. Of course, it’s nearly impossible for a human to analyze all of this information. Instead, he came up with a machine learning model to do the legwork for him. Kava says, “We call him Boy Genius for a reason, and he went off and taught himself how to program and create machine learning algorithms, and trained the algorithm with billions of data points. We found that there are roughly 19 variables that are really important in the operating of the data center that would help to minimize PUE.” Juggling nineteen variables at once is a tall order, and impractical without machine learning. Kava says, “I don’t know about you, but I can’t visualize a 19 variable matrix. That is why it is so hard to look at the data and know that it is clear that you have to do this or that to get a better PUE. But it’s actually pretty trivial for a machine to run those millions of combinations and permutations.”
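To get a feel for Kava’s point about combinations, here is a rough sketch. It first counts how many settings exist if each of the 19 variables could take only a handful of values, then runs a brute-force search over a tiny grid. Everything except the figure of 19 is an assumption: `toy_pue` is a made-up stand-in for a trained model, and the variable names, ranges and coefficients are hypothetical rather than Google’s.

```python
from itertools import product

NUM_VARIABLES = 19  # the figure Kava quotes


def grid_size(levels_per_variable: int, num_variables: int = NUM_VARIABLES) -> int:
    """Number of distinct settings if every variable has the same number of levels."""
    return levels_per_variable ** num_variables


for levels in (2, 3, 5):
    print(f"{levels} levels per variable -> {grid_size(levels):,} combinations")


def toy_pue(pump_speed: int, fan_speed: int, outside_temp_c: int) -> float:
    """Made-up stand-in for a trained model (hypothetical coefficients)."""
    return (1.10
            + 0.00001 * (pump_speed - 60) ** 2
            + 0.00002 * (fan_speed - 40) ** 2
            + 0.001 * outside_temp_c)


# Brute-force search over two controllable settings at a fixed outside temperature.
pump_speeds = range(40, 81, 10)  # hypothetical percent of maximum speed
fan_speeds = range(20, 61, 10)   # hypothetical percent of maximum speed
best = min(product(pump_speeds, fan_speeds),
           key=lambda setting: toy_pue(setting[0], setting[1], outside_temp_c=20))
print("Best (pump, fan) setting on the toy grid:", best)
```

Even with only three possible values per variable, the full grid runs past a billion combinations, which is why checking them by hand is not realistic.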
Started from the Bottom…
Once Gao had built the machine learning model around the 19 variables, Google checked its predictions against the PUEs actually measured in its data centers to see how accurate it was. A little fine-tuning was necessary to reach 99.6 percent accuracy, but ultimately Gao’s model was a huge success. It was found that in certain situations only a key subset of the variables should be used. That can be a challenge, but it was also discovered that only a machine can carry out this analysis accurately, which is worth keeping in mind for researchers thinking about their own job security down the road.
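A comparison of this kind can be sketched in a few lines. The article does not say how the accuracy figure was defined, so this example assumes it means 100 percent minus the mean absolute percentage error between predicted and measured PUE. The sample readings are invented for illustration.

```python
def mean_abs_pct_error(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error between measured and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    total = sum(abs(a - p) / a for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Invented readings: measured PUE versus what a model predicted.
measured_pue = [1.10, 1.12, 1.15, 1.11]
predicted_pue = [1.104, 1.116, 1.155, 1.106]

error_pct = mean_abs_pct_error(measured_pue, predicted_pue)
accuracy_pct = 100.0 - error_pct  # assumed definition of "accuracy"
print(f"Mean absolute error: {error_pct:.2f}%  Accuracy: {accuracy_pct:.2f}%")
```

In practice the comparison would be run on readings the model never saw during training, so that the score reflects how well it predicts rather than how well it memorizes.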
Now, Gao’s model has become a full-fledged tool used by Google data center managers around the world. It’s been deployed at five sites since 2015, which has led to 15 percent better PUE overall (and up to 25 percent at one location). Gao’s full research is now available online, and a number of data centers have used it to achieve similar results. Google does not open-source its algorithms, of course, in order to keep its competitive edge, but the details it does choose to reveal are invaluable to others. Simply knowing that a plateau can be broken has helped others in the IT world get motivated, and who knows what improvements can be made from Gao’s accomplishment?