While on the other side of the world Amazon’s AWS was battling electrical storms, our humble Disaster Recovery (DR) Drill was battling small electrical fires this morning. Every DR Drill we run has had some form of excitement or other. We conduct one drill for each facility once a year (or, at least, that’s what we’re supposed to do). Every time, something quite unexpected happens.
In the past, most problems were IT related. Switches, routers, or storage systems would lose their configuration. Servers would not come back up after power was restored. More recently, we have begun to run into problems that are more facility related.
Today’s drill saw some electrical problems. As we reached the stage where we resumed power and attempted to restore normal operations, we suddenly noticed a distinct smell. It was like burning rubber. In a data centre, this usually means there is an electrical short-circuit, or perhaps a small electrical fire. Soon, it was not just the smell we noticed; we could also clearly see that the air was filling up with smoke. The fumes were also getting pungent.
The DR Drill had to be suspended. We activated emergency power off on the power distribution system, and evacuated everyone. There was no FM200 gas discharge because we had disabled the system for the DR Drill.
It took some time to clear the smoke before we could re-enter the facility to determine its source. The offending rack was identified and taken off-line. The resume and restore phases of the DR Drill were all set to continue.
Then, as we were still in the midst of restoring power for the second time, I thought I noticed the air getting cloudy again. Initially I thought it was my imagination. It could simply have been some latent smoke from the earlier incident being re-circulated. But a few moments later, we all agreed the smell was back, and soon we could see new smoke pouring out of another location. Oh dear, it’s happening all over again. Fancy that happening a second time in the same drill.
In case you’re wondering, there was no real danger to people. We knew what we were doing. But obviously we had some defective equipment.
The gadget you see in the photo at the top is the VESDA alarm panel. VESDA, or Very Early Smoke Detection Apparatus, is actually more of a brand name. The generic name for it is HSSD – High-Sensitivity Smoke Detection. These are quite “high-tech” things if you’ve not heard of them before.
HSSD systems draw air through a series of small puncture holes in a network of pipes run through the area under protection, and bring all this air into a sampling chamber where smoke particles are counted. They are extremely sensitive. The commissioning test for such systems usually involves a smoldering piece of wire (smoldering only, not actually burning, absolutely no flames), hidden inside, say, an IT rack and placed in a corner of the room. The HSSD must be able to detect the smoke.
We seem to have been quite unlucky to be hit with two electrical faults at about the same time. There was yet another issue after that, albeit one with much less impact. One of our power distribution units was persistently signaling a fault to our fault management system. This is supposed to indicate that there is a fault or trip condition somewhere, but we could not find any such condition.
It’s been an eventful day.