La Fibre

Discussion started by: butler_fr on 20 April 2018 at 12:01:09

Title: Cogent Toulouse PoP incident - 20/04/18
Posted by: butler_fr on 20 April 2018 at 12:01:09
Power incident at the Cogent Toulouse PoP since around 11:20.
Posted by: Hugues on 20 April 2018 at 14:15:46
It's down at AppliWave too. (No impact)


Posted by: butler_fr on 20 April 2018 at 16:40:57
Everything came back up at around 12:30.
1h10 of outage and no information so far...
Posted by: Hugues on 20 April 2018 at 16:42:32
I'd like the RFO when you get it ;-)
Posted by: JulienOHAYON on 20 April 2018 at 19:30:47
Quote from Hugues: It's down at AppliWave too. (No impact)


Indeed!
Posted by: butler_fr on 23 April 2018 at 15:00:07
Cogent's reply:

On 20 April 2018, we experienced a power outage at our Cogent datacenter in Toulouse.

The timestamps of the power outage were:

Start: 11:19 CEST
End: 12:41 CEST

During a standard operation to replace a defective converter (48V/230V), we encountered some unforeseen problems:

- As part of the process to isolate the faulty converter, some breakers had to be opened so that the work could be carried out in accordance with the mandatory safety regulations. When this was done, the AC feed to the TGBT (main low-voltage switchboard) was cut, impacting the main breaker switch. The UPS batteries carried the load for approximately 25-30 minutes until they were depleted.
- We tried to restore power to the TGBT by returning all breakers to their original positions, but this did not work.
- In this kind of event, there are two workarounds to restore the AC power feed to the TGBT:
o Use an output taken directly from the UPS.
o Manually bypass the main breaker (this requires a special procedure and tools).
It was decided to proceed with the manual bypass of the main breaker, as it would help isolate the root cause of the fault without the risk of cutting the AC power again once it was restored.
- With the main breaker bypassed, we were able to diagnose a faulty static bypass switch. Because of the nature of the alarms it raised, this faulty part had originally led us to believe that the 48V/230V converter was defective. It also prevented the generator from taking the load correctly, as the transfer did not work.
- After AC power was restored, our Field technician worked with some customers on site to make sure their racks had power again.
- The defective bypass switch has been replaced and the power infrastructure is back to normal. UPS backup has been restored and the batteries are recharged.

Our Field country manager audited the incident, and we identified room for improvement in our recovery process. The following measures are already being implemented:

- Field team retrained in the bypass processes.
- Audit of the tools required to carry out these emergency bypass processes.
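For comparison, the Start/End window in Cogent's report can be checked against the roughly 1h10 posted earlier in the thread (about 11:20 to 12:30). A minimal Python sketch, assuming every time is CEST on the same day; the helper names are purely illustrative:

```python
from datetime import datetime


def outage_minutes(start: str, end: str) -> int:
    """Minutes between two HH:MM times, assumed to be on the same day and timezone."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def as_hours_minutes(minutes: int) -> str:
    """Format a minute count in the thread's own style, e.g. 70 -> '1h10'."""
    return f"{minutes // 60}h{minutes % 60:02d}"


# Window from Cogent's report (Start 11:19, End 12:41 CEST)
official = outage_minutes("11:19", "12:41")
# Approximate window from the forum posts (~11:20 to ~12:30)
forum = outage_minutes("11:20", "12:30")

for label, minutes in (("Cogent report", official), ("forum estimate", forum)):
    print(f"{label}: {as_hours_minutes(minutes)} ({minutes} min)")
```

With these inputs it prints 1h22 (82 min) for the report and 1h10 (70 min) for the forum estimate, so the official window is about 12 minutes longer than the figure posted on the thread.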