The Board Committee of Inquiry (BCOI) investigating the fire at SingTel’s Bukit Panjang Exchange has just released its 37-page report, detailing its findings and recommendations. HardwareZone has succinctly summarised the report in a few key points, so if you’re too busy to read the original report, check out their article.
We’ve heard it reported before that the fire at the exchange’s cable chamber was likely caused by an unauthorised blowtorch. The official report explains that the unauthorised blowtorch is twice as hot as SingTel’s approved blowtorch, and that the excess heat likely caused localised overheating, which in turn led to a slow-burning fire that went unnoticed before the workers left for lunch. Fire detection systems were suppressed during the works and were not reactivated during the lunch break, which delayed the discovery of the fire.
None of this is very surprising. What worries me is how much disruption came from a single fire incident. It’s easy to see that this cable chamber could be a critical single point of failure. Critical infrastructure services should therefore incorporate enough path diversity that there are no single points of failure.
However, it appears that SingTel did not give enough attention to this matter. Self-healing fibre rings were found to be temporarily folded such that the overlapping portions became a single point of failure. Some cabinets have a single entranceway for redundant fibre rings, which negates the redundancy because a single cut at the entranceway would disrupt services completely. There is also a lack of equipment diversity (i.e. redundant equipment to protect against equipment failure).
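To make the single-entranceway problem concrete, here is a minimal sketch of how one might check a topology for single points of failure: take each intermediate element out in turn and see whether the cabinet can still reach the exchange. The topologies and node names (`cabinet`, `entranceway`, `ring_east` and so on) are made up for illustration and are not taken from the report.

```python
from collections import deque


def reachable(links, src, dst, removed=None):
    """Breadth-first search: is dst reachable from src when `removed` is down?"""
    adjacency = {}
    for a, b in links:
        if removed in (a, b):
            continue  # skip every link that touches the failed element
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def single_points_of_failure(links, src, dst):
    """Return intermediate nodes whose loss alone cuts src off from dst."""
    nodes = {n for link in links for n in link} - {src, dst}
    return sorted(n for n in nodes if not reachable(links, src, dst, removed=n))


# Hypothetical cabinet: two fibre rings, but both enter through one entranceway.
shared_entrance = [
    ("cabinet", "entranceway"),
    ("entranceway", "ring_east"),
    ("entranceway", "ring_west"),
    ("ring_east", "exchange"),
    ("ring_west", "exchange"),
]

# Hypothetical cabinet: each ring has its own physically separate entrance.
diverse_entrance = [
    ("cabinet", "entrance_a"),
    ("cabinet", "entrance_b"),
    ("entrance_a", "ring_east"),
    ("entrance_b", "ring_west"),
    ("ring_east", "exchange"),
    ("ring_west", "exchange"),
]

print(single_points_of_failure(shared_entrance, "cabinet", "exchange"))   # ['entranceway']
print(single_points_of_failure(diverse_entrance, "cabinet", "exchange"))  # []
```

Note that the exchange is treated as an endpoint here, so this check says nothing about exchange diversity. It also works on a logical diagram only; two links that look separate on paper but share a duct or cable chamber would need to be modelled as one element to show up.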
Beyond SingTel, business customers do not entirely understand or provision for their connectivity resiliency requirements. For example, while SingTel offers enterprise services with both exchange and path diversity (at extra cost, of course), few business customers take up the service.
In a nutshell, both SingTel and their business customers aren’t paying enough attention to designing their critical infrastructure to be resilient against a simple single point of failure. We are talking about a single fire incident here, affecting a single cable chamber. According to the report, SingTel’s own services affected by this fire include:
- 15% of “mio Voice” services nationwide
- 11% of “mio TV” services nationwide
- 8.6% of broadband services nationwide
- 186 mobile base stations
- 22% of Layer 3 MPLS circuits nationwide
- 4.6% of international leased circuits nationwide
The impact on services of other service providers that use the Bukit Panjang Exchange was not reported.
It is easy to design a resilient, redundant infrastructure service. Actually implementing it, and keeping it maintained that way, is the challenge.