What happens when your Uninterruptible Power Supply fails? Kind of ironic, yes?
No, we are not talking about dead batteries. Or a power outage that lasts too long. We are talking about a UPS that failed. And brought down an entire agency. 2,500 standalone PCs and another 15,000 users wondering why they can't get anywhere.
It happened. Why? A comedy of errors and ignorance. Of a general services group that apparently does not understand that any electronic device has a power-on wattage and a running wattage. Of laymen who do not understand that air conditioning is not needed when there is no equipment running. And of a maintenance group that does its job well by testing the failover to a generator once a week.
It happened again, but this time, thanks to some very heroic efforts by the people left out of the loop, no one noticed except the employees. The public never did. Just the employees.
Time was when a building coordinator could dictate all that there was about a building. They were the experts. They KNEW the stuff, and us techs were just ignorant slobs.
But in this day of 24x7 operations, when no outage is tolerable, they are long overdue for an update. And some re-education on startup wattage and running wattage.
This comedy of errors started 4 years ago when this general services person rightly decided that if the equipment was running, then the AC had to be running. But then incorrectly decided that since the equipment was on the UPS, the AC must be too.
Bad choice. The AC does not have data, so an interruption of a few seconds (10-15) until the generator kicked in would not have been fatal. So they overloaded the UPS. So much so that when the AC compressors (plural) all decided to kick on at the same time, it flipped the UPS over to bypass mode. That is raw power - as in all the sags and surges.
Half an hour later, the maintenance schedule kicked in and tested the generator's power-on. Since the UPS was in bypass, everything went down hard! And when the generator kicked in and everything came back up, the startup wattage blew the circuit on the UPS. The network was in effect dead. Computers were on, but no switches, servers, or routers. Dead.
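For anyone who wants the arithmetic behind startup wattage versus running wattage, here is a rough sketch in Python. Every figure in it (the UPS capacity, the wattages, the load names) is made up for illustration and is not the agency's actual equipment; the point is only that a load list can fit comfortably while running and still blow past the UPS rating the moment several things start at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    name: str
    running_watts: float  # steady-state draw
    startup_watts: float  # peak draw at power-on


def peak_watts(loads, starting):
    """Total draw when the loads named in `starting` are powering on
    and every other load is already running."""
    return sum(
        load.startup_watts if load.name in starting else load.running_watts
        for load in loads
    )


def fits(loads, starting, capacity_watts):
    """True if the UPS can carry this moment without overloading."""
    return peak_watts(loads, starting) <= capacity_watts


# Hypothetical numbers, chosen only to illustrate the failure mode.
UPS_CAPACITY_WATTS = 10_000
it_gear = [Load("network-and-servers", 6_000, 7_000)]
compressors = [
    Load("compressor-1", 1_500, 4_500),
    Load("compressor-2", 1_500, 4_500),
]
everything = it_gear + compressors
compressor_names = {load.name for load in compressors}
all_names = {load.name for load in everything}

# Everything already running: fits under the rating.
steady_ok = fits(everything, set(), UPS_CAPACITY_WATTS)
# Both compressors kick on together while the IT gear runs: overload.
compressors_ok = fits(everything, compressor_names, UPS_CAPACITY_WATTS)
# Power returns and everything starts at once: overload again.
cold_start_ok = fits(everything, all_names, UPS_CAPACITY_WATTS)
# IT gear alone on the UPS, even at startup: fits.
it_only_ok = fits(it_gear, {"network-and-servers"}, UPS_CAPACITY_WATTS)
```

With these assumed numbers the load looks fine on paper as long as you only add up running wattage. It is the simultaneous startup that does the damage, which is why the AC belongs on the generator and not on the UPS.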
We did manage to get everything back on in 2 hours. And we are taking steps to make sure it does not happen again. And spending the bucks that should have been spent to take the AC off the UPS. And fortunately, the customers never noticed, as the problem was identified early in the morning and the bypass was set up by 8 a.m.
All involved with the recovery did their best, and it was enough. This time.
But the lesson learned? Building people do not know IT systems, and unless they start asking for help when designing the backup systems, they are going to get fried. This time, they are excused for the age-old reason of "I did not know". But that is not going to fly in the future. This was their warning. I hope they heed it. Somehow I doubt they will.
What do you do when the UPS fails? The power company never did. It was our failure to make sure they understood how power works. Now they know.
We will not accept that excuse again.