Author Topic: Cogent Toulouse PoP incident - 20/04/18  (Read 3261 times)

butler_fr

  • Bbox ADSL customer
  • Moderator
  • *
  • Posts: 3,605
  • Orange FTTH
Cogent Toulouse PoP incident - 20/04/18
« on: 20 April 2018, 12:01:09 »
Power incident at the Cogent PoP in Toulouse since about 11:20.

Hugues

  • AS2027 MilkyWan
  • Moderator
  • *
  • Posts: 12,423
  • Lyon (69) / St-Bernard (01)
    • Twitter
Cogent Toulouse PoP incident - 20/04/18
« Reply #1 on: 20 April 2018, 14:15:46 »
It's down at AppliWave too. (No impact)

butler_fr

  • Bbox ADSL customer
  • Moderator
  • *
  • Posts: 3,605
  • Orange FTTH
Cogent Toulouse PoP incident - 20/04/18
« Reply #2 on: 20 April 2018, 16:40:57 »
Everything came back up at around 12:30.
1h10 of outage and no information so far...

Hugues

  • AS2027 MilkyWan
  • Moderator
  • *
  • Posts: 12,423
  • Lyon (69) / St-Bernard (01)
    • Twitter
Cogent Toulouse PoP incident - 20/04/18
« Reply #3 on: 20 April 2018, 16:42:32 »
I'd like to see the RFO when you get it ;-)

JulienOHAYON

  • AS29075 Official Ielo
  • Expert
  • *
  • Posts: 199
  • Paris (75)
    • ielo
Cogent Toulouse PoP incident - 20/04/18
« Reply #4 on: 20 April 2018, 19:30:47 »
Quote
It's down at AppliWave too. (No impact)

Indeed!

butler_fr

  • Bbox ADSL customer
  • Moderator
  • *
  • Posts: 3,605
  • Orange FTTH
Cogent Toulouse PoP incident - 20/04/18
« Reply #5 on: 23 April 2018, 15:00:07 »
Cogent's response:

Quote
On 20th April of 2018, we experienced a power outage at our Cogent datacenter in Toulouse.

The timestamps of the power outage were:

Start: 11:19 CEST
End: 12:41 CEST

During a standard operation to replace a defective converter (48V/230V), we found some unforeseen problems:

- As part of the process to isolate the faulty converter, some breakers had to be opened to ensure the work can be done as per the mandatory safety regulations. When this was done, the AC power downwards the TGBT was cut, impacting the main breaker switch. UPS batteries took the load for approximately 25-30 minutes until they got depleted.
- We tried to restore the power to the TGBT reverting the position for all breakers, but it didn't work.
- Under this kind of events there are 2 workarounds to restore AC power feed to the TGBT:
  o Use an output taken directly from the UPS.
  o Try to bypass manually the main breaker (that requires a special process and tools)
  It was decided to proceed with the manual bypass of the main breaker, as it would help to isolate the root cause of the fault without risk of cutting the AC power again once it was restored.
- With the main breaker bypassed, it was possible to diagnose that there was a static bypass switch faulty. This faulty part led us to think originally that the 48/220 V converter was faulty, due to the nature of alarms raised, and also prevented the generator to take the load correctly, as the transfer did not work.
- After restoring the AC power, our Field technician worked with some customers on site to ensure their racks have recovered the power.
- The defective bypass switch has been replaced, and power infrastructure has been normalized. UPS backup is recovered and batteries are loaded again.

Our Field country manager did an audit of the incident and we have found some room of improvement in our recovery process that is already being implemented:

- Field team retrained in the bypass processes.
- Audit of the required tools to undertake these emergency bypass processes.
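
For anyone who wants to line up the figures, here is a rough Python sketch of the arithmetic. It only uses the times quoted in this thread (about 11:20 to about 12:30 as observed here, 11:19 to 12:41 CEST in the RFO) and the 25-30 minute UPS autonomy stated in the RFO. The helper `at` and all variable names are made up for illustration, and the "window without any power" is an assumption: it supposes the UPS picked up the load exactly at 11:19 and that the equipment stayed up until the batteries were depleted, which the RFO does not state explicitly and which does not obviously match an incident observed from about 11:20.

```python
from datetime import datetime, timedelta

DAY = "2018-04-20"  # date of the incident; all times are local (CEST)


def at(hhmm: str) -> datetime:
    """Parse a wall-clock time on the day of the incident (hypothetical helper)."""
    return datetime.strptime(f"{DAY} {hhmm}", "%Y-%m-%d %H:%M")


# Approximate times observed in the thread, and exact times from the RFO.
forum_start, forum_end = at("11:20"), at("12:30")
rfo_start, rfo_end = at("11:19"), at("12:41")

forum_outage = forum_end - forum_start
rfo_outage = rfo_end - rfo_start

# ASSUMPTION: the UPS took the load at rfo_start and held for 25 to 30 minutes.
ups_min, ups_max = timedelta(minutes=25), timedelta(minutes=30)
depleted_earliest = rfo_start + ups_min
depleted_latest = rfo_start + ups_max

# Under that assumption, time with neither mains nor UPS power.
dark_min = rfo_end - depleted_latest
dark_max = rfo_end - depleted_earliest

print("Outage as observed in the thread:", forum_outage)
print("Outage per the RFO:              ", rfo_outage)
print("UPS depleted between", depleted_earliest.strftime("%H:%M"),
      "and", depleted_latest.strftime("%H:%M"))
print("Window without any power:", dark_min, "to", dark_max)
```

The RFO window is 12 minutes longer than the 1h10 estimated in the thread, which is unsurprising given that the thread times are rounded.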