Network outage

Resolved
Full unavailability

Started 11 months ago

Timeline 6

  • Ongoing
    Sep 23, 2024 17:55 UTC+0 (11 months ago)

    There is currently a network disruption in Germany. Our technicians are working hard on a solution.

  • Investigating
    Sep 23, 2024 18:16 UTC+0 (11 months ago)

    The network connection has been restored. Our technicians are currently investigating how the problem occurred and what measures can be taken to prevent it in the future.


  • Investigating
    Sep 23, 2024 18:22 UTC+0 (11 months ago)
    The impact of this issue has been revised to Degradation

    Our technicians can no longer detect any problems. However, investigations into the incident are still ongoing. We will contact you as soon as we have more information.


  • Observing
    Sep 23, 2024 20:51 UTC+0 (11 months ago)
    The impact of this issue has been revised to No Impact

    The issue was fully resolved at around 10:04 pm CEST. We are now observing the situation and preparing the RFO, which will be provided tomorrow.
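
    The timeline stamps on this page are given in UTC+0, while the update texts quote CEST (UTC+2). As a minimal sketch, assuming a fixed +2 hour offset for the dates in question, the stamps can be converted as follows. The helper name `utc_to_cest` is illustrative and not part of any tooling referenced here.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CEST is modelled as a fixed UTC+2 offset, which holds for late September 2024.
CEST = timezone(timedelta(hours=2), "CEST")


def utc_to_cest(stamp: str) -> datetime:
    """Parse a timeline stamp such as 'Sep 23, 2024 17:55' (UTC+0) and return it in CEST."""
    parsed = datetime.strptime(stamp, "%b %d, %Y %H:%M").replace(tzinfo=timezone.utc)
    return parsed.astimezone(CEST)


# The first timeline entry, posted at 17:55 UTC+0.
first_report = utc_to_cest("Sep 23, 2024 17:55")
print(first_report.strftime("%H:%M %Z"))  # 19:55 CEST
```

    The first timeline entry (17:55 UTC+0) therefore corresponds to 19:55 CEST.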


  • Observing
    Sep 24, 2024 09:23 UTC+0 (11 months ago)

    On Monday, September 23, 2024 at around 18:30 CEST, one of our edge routers (ER2) in the fra1 datacenter (Frankfurt/Main, Telehouse Germany) experienced a linecard failure. Thanks to the redundant design of our routing infrastructure, this failure had no impact on the external reachability of the site at that time, as the other edge router (ER1) maintained connectivity.


    To restore redundancy, our on-call engineering team was activated and a network technician was sent to the datacenter to supervise the orderly restart of the affected device on site.


    Shortly after the technician arrived at the datacenter, at around 19:55 CEST, a similar failure occurred on the remaining router, ER1, which was handling the site's external connectivity without any redundancy at that time. This second failure resulted in a complete loss of external network connectivity.


    At 20:06 CEST (approx. 11 minutes later), the technician was able to put ER1 back into provisional operation. This restored external network availability. However, the other edge router (ER2) was still out of service.


    After we investigated the incident together with the manufacturer's support team, the logs identified a software error as the cause of the failure. As both edge routers were originally put into operation 1-2 hours apart, the problem occurred on both devices after the same number of power-on hours.
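
    The power-on-hours explanation can be checked against the two failure times given above (ER2 at around 18:30 CEST, ER1 at around 19:55 CEST). The following is a minimal sketch, assuming a fixed UTC+2 offset for CEST. The names `failure_gap` and `gap_matches_commissioning` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CEST is modelled as a fixed UTC+2 offset.
CEST = timezone(timedelta(hours=2), "CEST")

# Approximate failure times as stated in this report.
er2_failure = datetime(2024, 9, 23, 18, 30, tzinfo=CEST)
er1_failure = datetime(2024, 9, 23, 19, 55, tzinfo=CEST)

failure_gap = er1_failure - er2_failure


def gap_matches_commissioning(gap: timedelta,
                              low: timedelta = timedelta(hours=1),
                              high: timedelta = timedelta(hours=2)) -> bool:
    """Return True if the gap between failures lies within the 1-2 hour commissioning offset."""
    return low <= gap <= high


print(failure_gap, gap_matches_commissioning(failure_gap))
```

    The two failures are 85 minutes apart, which lies within the 1-2 hour offset between the original commissioning times and is consistent with a fault triggered after a fixed number of power-on hours.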


    To resolve the software problem permanently, the edge router that was still out of operation (ER2) was updated to a new firmware version. The update, including the associated configuration adjustments, was completed at around 21:40 CEST. To return ER2 to redundant operation, a configuration adjustment on the active edge router (ER1) was necessary. However, because of the existing software problem, this adjustment caused a total failure of ER1, which made external network connectivity unavailable again for approx. 4 minutes.


    At 21:53 CEST, external connectivity was restored via the now updated ER2. ER1 was then restarted in order to fully restore redundant operation. At 22:05 CEST, all peering and transit connections at the fra1 (Telehouse) site were active again and network connectivity was once more fully redundant.


    Currently, the two edge routers are running different software versions so that we can verify that the ER2 update does not cause any complications. Internal data traffic at the site was not affected at any time. The total downtime of 17 minutes affected only the external accessibility of the systems at the affected location.


    Within the next 48 hours, after successfully verifying the functionality of ER2, we will update the firmware on ER1 in an emergency maintenance window. Thanks to the restored redundancy, we do not expect any loss of external accessibility during this maintenance work. We will inform you separately about the upcoming maintenance.


  • Resolved
    Sep 25, 2024 22:02 UTC+0 (11 months ago)

    Tonight the firmware of ER1 was updated to the same version as ER2. There was no service impact at any time. We are closing this incident as we do not expect any further issues.