Data centres pack a lot of cool technology in them. That technology also makes them hot, literally, considering the amount of heat it dissipates. Now, combine that with a cooling crisis and you truly have a recipe for disaster.
Here’s what happened recently in a data centre I run. An unfortunate turn of events caused both the primary and backup cooling systems to fail together, leaving the entire server area without any cooling at all. As you can imagine, the room temperature rose sharply. You can see from the AHU Air Temperature graph that the return air temperature climbed from about 26 degrees Celsius to peak at 50 degrees Celsius! That happened in about 1.5 hours.
(Actually, 1.5 hours isn’t all that fast. We ran a test at a new data centre where a simulated chilled water supply failure brought rack temperatures from a normal 22 degrees Celsius to 50 degrees Celsius within 2 minutes.)
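To put those two rates of rise side by side, here is a quick back-of-envelope comparison (a minimal sketch; the figures are taken straight from the two incidents above):

```python
# Back-of-envelope comparison of the two temperature rises described above.

def rise_rate(start_c, end_c, minutes):
    """Average rate of temperature rise in degrees Celsius per minute."""
    return (end_c - start_c) / minutes

# Production outage: return air went from ~26 C to 50 C in ~1.5 hours.
outage_rate = rise_rate(26, 50, 90)   # ~0.27 C/min, or ~16 C/hour

# Simulated chilled water failure test: racks went from 22 C to 50 C in 2 minutes.
test_rate = rise_rate(22, 50, 2)      # 14 C/min

print(f"Outage: {outage_rate:.2f} C/min, Test: {test_rate:.1f} C/min")
print(f"The test racks heated up roughly {test_rate / outage_rate:.0f}x faster.")
```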
This Aisle Temperature/Humidity graph shows how the room temperature along the hot aisle also climbed to 50 degrees Celsius during that period. We believe the temperature probe probably has a sensing range of up to 50 degrees Celsius, and that the actual room temperature at the peak was likely around 54 degrees Celsius.
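If the probe really does max out at 50 degrees Celsius, the graph would simply flat-line at that value no matter how hot the aisle actually got. A tiny sketch of that clipping effect (the sensor limit is an assumption about this particular probe, and 54 degrees is our estimate):

```python
SENSOR_MAX_C = 50.0  # assumed upper limit of the aisle temperature probe

def reported_temperature(actual_c, sensor_max=SENSOR_MAX_C):
    """A saturating sensor reports at most its maximum readable value."""
    return min(actual_c, sensor_max)

# Anything at or above the limit shows up as a flat 50 C on the graph.
for actual in (48.0, 50.0, 52.0, 54.0):
    print(actual, "->", reported_temperature(actual))
```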
The heat in the room was tremendous. All the variable-speed fans in the servers were running at full speed, creating such a din that you literally had to shout to be heard in the room. Metal parts such as door handles, the turnstile and grilles were actually hot to the touch. The entire data centre was like a walk-in oven!
It took about 40 minutes to bring the room temperature back to normal levels.
Notice in the Incoming Power Supply graphs for DC1 and DC2 (power to IT equipment and miscellaneous equipment other than cooling systems) that the power consumption actually climbed during the cooling outage. This probably reflects the servers ramping up their fan speeds as they tried to cope with the heat. Subsequently the power consumption dipped, because many servers began their thermal shutdown. The normal power consumption for this facility is about 96 kW. That 96 kW comprises DC1 + DC2, which power all IT equipment, lighting, control systems and other miscellaneous systems, except for the cooling systems (which are captured under DC3). Oh, by the way, DC1, DC2 and DC3 are the names of the three Data Centre power sources; they are not Direct Current power sources 🙂
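For a rough sense of why the room heats up so quickly when cooling is lost: nearly all of that IT power ends up as heat in the air. Here is a minimal sketch of the warm-up rate, assuming the full 96 kW goes into the room air and using a purely hypothetical room volume (the real volume isn't stated here, and this ignores the thermal mass of the racks, floor and walls):

```python
# Rough estimate of how fast an uncooled room warms up under a constant IT load.
# The room volume below is a hypothetical figure for illustration only.

IT_LOAD_W = 96_000          # DC1 + DC2 load, from the incoming power supply graphs
ROOM_VOLUME_M3 = 1_000      # hypothetical server room volume (assumption)
AIR_DENSITY = 1.2           # kg/m^3, at roughly room conditions
AIR_SPECIFIC_HEAT = 1_005   # J/(kg*K)

air_mass_kg = ROOM_VOLUME_M3 * AIR_DENSITY
heat_capacity_j_per_k = air_mass_kg * AIR_SPECIFIC_HEAT

# dT/dt = P / (m * c_p), if all the heat stayed in the air
rise_per_second = IT_LOAD_W / heat_capacity_j_per_k
print(f"~{rise_per_second * 60:.1f} C per minute if the air alone absorbed the heat")
```

In practice the racks, floor and building fabric soak up a good share of that heat, which helps explain why the return air took about 1.5 hours, rather than minutes, to hit 50 degrees Celsius.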
We have been plagued by a spate of cooling problems, with the same failure scenario recurring three times within five days. The first time, a bunch of our co-located customers’ servers died, and to us it was like “wow, how interesting”. The second time it happened, one of my own network servers shut down, and I thought “OK, this is getting annoying”.
Now, the third time was the worst yet. The record-high temperatures brought about widespread shutdowns across our hardware. Damn, we have a serious problem.
I’ve gone through a few data centre projects. One thing I’ve learnt: electricity and UPSes are simple… it is cooling that is complicated. Just for interest’s sake, our latest data centre project brings chilled water right into the IT server area, directly into rack-based cooling systems. These are Rittal Liquid Cooling Packages (LCP), essentially a standard 19″ equipment rack with an air-conditioning unit attached to its side. Here, we designed for high-density racks with up to 20 kW of load per rack. I’ll blog more about this another time.