Resolved: DC3HAM Network Issue on 19th July 2022

Dear customers,

We are experiencing an outage in our colocation DC3.HAM. We are currently investigating the root cause and will update you as soon as possible.

We apologize for any inconvenience.

Update 21:50: Lumen confirmed a problem in their network and started working on a fix. We have escalated the issue to Lumen management.

Update Lumen 22:42: As this network fault is impacting multiple clients, the event has increased visibility with Lumen leadership. As such, client trouble tickets associated with this fault have been automatically escalated to higher priority.

Update Lumen 00:20: Further troubleshooting has isolated the trouble to a local provider's network. The local provider has dispatched a field team. Work is underway to obtain an estimated time of arrival.

Update 03:30: Lumen has restored connectivity; all systems are reachable again.

DC3HAM: Lumen Network Issue on 12th April 2022

UPDATE 2022-04-12 15:15 CEST

Lumen has finally succeeded in reconnecting their datacenter in Hamburg, which hosts our colocation DC3.HAM.
We have since checked and verified all systems.
All systems, including our ticket system, are back and available.

UPDATE 2022-04-12 14:35 CEST

We are checking and verifying all systems and monitoring.


UPDATE 2022-04-12 14:30 CEST

***************************************************

[CUSTOMER UPDATE] EMEA Service Desk (ScLo)

[SUMMARY OF WORK]

The Lumen NOC advises some services have begun to clear and the local provider continues to repair the remaining damaged fiber cable.

Checking on local network connections.

***************************************************


UPDATE 2022-04-12 11:48 AM CEST

***************************************************

[CUSTOMER UPDATE] EMEA Service Desk (ScLo)

[SUMMARY OF WORK]

The cause of the service interruption has been identified as force majeure in Dortmund, Germany. Fibre maintainers are on site and work is ongoing.

We continue to push for an ETR (estimated time to restore).

***************************************************


UPDATE 2022-04-12 09:56 AM CEST

Our colocation DC3HAM is still not available. The carrier Lumen is working on the problem.
Unfortunately, our ticket system is also affected; we remain reachable by e-mail.
We continue to push for an ETR.


Dear customers,

We are experiencing an outage in our colocation DC3.HAM, apparently caused by our provider Lumen. We are in escalation contact with Lumen about this and will update you as soon as possible.

We apologize for any inconvenience.

DC3HAM Network Outage

Lumen finally succeeded around 01:30 a.m. on Saturday in reconnecting their datacenter in Hamburg, which hosts our colocation DC3.HAM.
We have since checked and verified all systems.
All systems, including our ticket system, are back and available.
We will follow up with Lumen on an incident report.

——————————————————————————————-

Colocation DC3 appears to be back online; we are checking related systems on the MCON side.

——————————————————————————————-

*** CASCADED EXTERNAL NOTES 2022-03-12 00:05:37 GMT From CASE: 23317190 – SM Parent
***************************************************
[CUSTOMER UPDATE] EMEA Service Desk ()

[SUMMARY OF WORK]
Good Morning

We are now seeing the services restored, we have asked the vendor to provide a full RFO.

We will keep you updated with all progress.

Kind Regards
Lumen

[PLAN OF ACTION]
Investigating RFO

[TIME – NOW] 2022-03-12 00:05 (UTC)
***************************************************

UPDATE 2022-03-11 23:56 CET

***************************************************

[CUSTOMER UPDATE] EMEA Service Desk (ScLo)

[SUMMARY OF WORK]

Good Afternoon

The Colt partner has confirmed the completion of splicing; however, Colt customers' services are still down. The partner has been requested to recheck the splicing. We will keep you updated on our progress via this e-mail address. Thank you.

We will continue to push for an ETR.

[PLAN OF ACTION]

[TIME – NOW] 2022-03-11 20:58 (UTC)

[UPDATE ETA]

***************************************************


UPDATE 2022-03-11 19:06 CET

***************************************************

[CUSTOMER UPDATE] EMEA Service Desk (ScLo)

[SUMMARY OF WORK]

Good Afternoon

Please be advised that the local provider's last-mile field engineers are continuing their repair efforts.

We continue to push for an ETR.

[PLAN OF ACTION]

[TIME – NOW] 2022-03-11 17:58 (UTC)

[UPDATE ETA] 2022-03-11 19:15 (UTC)

***************************************************


UPDATE 2022-03-11 18:10 CET

***************************************************

[CUSTOMER UPDATE] EMEA Service Desk (ScLo)

[SUMMARY OF WORK]

Good Afternoon

Please be advised that we are pushing for an ETR from the local provider. They have stated that cable repair preparation is ongoing and will provide further updates.

[PLAN OF ACTION]

[TIME – NOW] 2022-03-11 16:40 (UTC)

[UPDATE ETA] 2022-03-11 17:40 (UTC)

***************************************************


UPDATE 2022-03-11 15:59 CET

***************************************************

[CUSTOMER UPDATE] EMEA Service Desk (ScLo)

[SUMMARY OF WORK]

Good Afternoon

Please be advised that our local provider's last-mile partner has confirmed that the situation is complex, as the damage location is occupied by heavy construction machinery that must be cleared before digging work can begin. Civil work is ongoing and an expected time of restoration is awaited.

[PLAN OF ACTION]

We will follow up with a further update in the next 2 hours.

[TIME – NOW] 2022-03-11 14:42 (UTC)

[UPDATE ETA] 2022-03-11 16:42 (UTC)

***************************************************

Next update by: 2022-03-11 16:45 GMT


UPDATE 2022-03-11 14:54 CET

***************************************************

[CUSTOMER UPDATE] EMEA Service Desk (ScLo)

[SUMMARY OF WORK]

Good Afternoon

Please be advised that we continue to work with the local provider on updates and progress in relation to this case. We have requested an urgent update and confirmation of when service will be restored, as the original ETR provided has now passed. We will aim to provide a further update in the next 60 minutes.

[PLAN OF ACTION]

Await local provider update and update customer once feedback is received.

[TIME – NOW] 2022-03-11 13:43 (UTC)

[UPDATE ETA] 2022-03-11 14:43 (UTC)

***************************************************

Next update by: 2022-03-11 14:45 GMT


UPDATE 2022-03-11 14:20 CET

***************************************************

[CUSTOMER UPDATE] EMEA Service Desk (ScLo)

[SUMMARY OF WORK]

Good Afternoon

Please be advised that we can confirm engineers continue to work to restore service. As advised previously, we have been given an ETR of 13:00 GMT, and this still stands at this time; however, the damage to the fibre was extensive, so this may be pushed back. We will aim to provide a further update in the next 60 minutes.

[PLAN OF ACTION]

Await local provider update and update customer once feedback is received.

[TIME – NOW] 2022-03-11 12:20 (UTC)

[UPDATE ETA] 2022-03-11 13:20 (UTC)

***************************************************

Next update by: 2022-03-11 13:20 GMT


UPDATE 2022-03-11 08:18 AM CET

***************************************************

[CUSTOMER UPDATE] EMEA Service Desk (ScLo)

[SUMMARY OF WORK]

Good Morning

Please be advised that we can confirm that engineers are on site and working to restore service. The fibre break is located in Wuppertal, Germany. The local provider has confirmed that the ETR for completion of the work is 13:00 GMT.

We will aim to provide a further update around 12:00-12:30 GMT to confirm whether we are still on target for the ETR; once we have this confirmation, we will forward it to you.

[PLAN OF ACTION]

Chase the local provider around 12:30 to confirm we are still on target for the ETR provided; once confirmed, update the customer.

[TIME – NOW] 2022-03-11 09:42 (UTC)

***************************************************

Next update by: 2022-03-11 12:15 GMT


UPDATE 2022-03-11 08:18 AM CET

***************************************************

[CUSTOMER UPDATE] EMEA Service Desk ()

[SUMMARY OF WORK]

Good Morning

We are still awaiting testing from the vendor. We will keep you updated on all progress.

Kind Regards

Lumen

[PLAN OF ACTION]

Investigating

[TIME – NOW] 2022-03-11 07:15 (UTC)

***************************************************


UPDATE 2022-03-11 07:19 AM CET

***************************************************

[CUSTOMER UPDATE] EMEA Service Desk (AsHa)

[SUMMARY OF WORK]

Good Afternoon,

Field engineers are actively repairing the fault and we shall update you as soon as information is available.

Kind Regards,

Lumen

[PLAN OF ACTION]

Engage Local Carrier

[TIME – NOW] 2022-03-11 06:17 (UTC)

***************************************************


UPDATE 2022-03-11 03:37 AM CET

***************************************************

[CUSTOMER UPDATE] EMEA Service Desk (AsHa)

[SUMMARY OF WORK]

Good Afternoon,

Our local carrier has informed us their engineers are expected to arrive at the affected location at 04:30 GMT. We will update you on their findings at that time.

Kind Regards

Lumen

[PLAN OF ACTION]

Engage Local Carrier

[TIME – NOW] 2022-03-11 02:23 (UTC)

***************************************************


UPDATE 2022-03-11 02:39 AM CET


[CUSTOMER UPDATE] EMEA Service Desk (AsHa)

[SUMMARY OF WORK]

Good Afternoon,

There is an issue in our partner's network on a link between Dortmund and Düsseldorf, Germany. We will inform you of any information as it becomes available.

Kind Regards

Lumen

[PLAN OF ACTION]

Engage Local Carrier

[TIME – NOW] 2022-03-11 01:36 (UTC)

***************************************************

Next update by: 2022-03-11 02:40 GMT


UPDATE 2022-03-11 02:22 AM CET
No update from the support desk of datacenter carrier Lumen.
The escalation level has been raised.


UPDATE 2022-03-11 00:56 AM CET
A ticket has been raised with the support desk of datacenter carrier Lumen.

***************************************************

[CUSTOMER UPDATE] EMEA Service Desk (AsHa)

[SUMMARY OF WORK]

Good Afternoon,

Your services are affected by a major outage in our local carrier's network. We are engaging them and will update you accordingly.

Kind Regards,

Lumen

[PLAN OF ACTION]

Engage Local Carrier

[TIME – NOW] 2022-03-10 23:57 (UTC)

**********************************************

Next update by: 2022-03-11 01:00 GMT


UPDATE 2022-03-10 11:48 PM CET
We have just identified that the uplinks of our datacenter carrier Lumen are down right now.
That said, we are working with their support to find a quick solution.
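A first sanity check for "uplinks are down" can be run from a host inside the colocation by probing the carrier-facing endpoints over TCP. The sketch below is a generic illustration, not our actual tooling; the address 192.0.2.1 (TEST-NET) and BGP port 179 are placeholders, not real next-hops:

```python
import socket

def check_uplinks(targets, timeout=2.0):
    """Report 'up'/'down' per (host, port) pair by attempting a TCP connection."""
    status = {}
    for host, port in targets:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                status[(host, port)] = "up"
        except OSError:  # refused, unreachable, or timed out
            status[(host, port)] = "down"
    return status

if __name__ == "__main__":
    # Placeholder peer: substitute your carrier-facing addresses and ports.
    print(check_uplinks([("192.0.2.1", 179)], timeout=1.0))
</antml_code_fence>

A check like this distinguishes "our uplink next-hop is gone" from "everything beyond the carrier is gone", which is useful before opening a ticket.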


UPDATE 2022-03-10 11:00 PM CET
The network is currently not working as expected, so sites and services are not available right now.
We are working hard to resolve this issue as quickly as possible.

Resolved: Outage DC3HAM

Dear Customer,

A trivial change on our external firewall, followed by a (normal) sync to the secondary device, caused both firewalls to enter a “disabled” state and stop forwarding packets.

Deactivating and reactivating the High Availability configuration solved the problem.

The connection interruption lasted from 11:08 to 11:41 CEST; we apologize for any inconvenience.

We are in contact with the developers to find the root cause of this issue.


Resolved: DC3HAM Storage Outage

RESOLVED

UPDATE 2021-08-28 02:59PM CEST
On Friday, 27th August, at 19:30, a redundant storage cluster in our colocation DC3.HAM failed during normal operations.
After on-site analysis, we found that the storage had stopped all services due to a suspected split-brain error.
As a result of the storage cluster failure, virtual servers running on VMware could not run properly.

Repair of the storage cluster was started immediately after analysis and finished around 5 a.m. on 28th August.
After storage recovery, all running virtual servers were restarted and checked; all production systems were up and running by 09:45 a.m. on 28th August.

We are continuing to analyze the root cause.
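As background on why a suspected split brain stops all I/O: clustered storage normally protects itself with a quorum rule, and halting is the safe reaction when no partition can prove it holds a majority of votes. A minimal illustration of that rule follows; this is a generic sketch, not our storage vendor's actual implementation:

```python
def has_quorum(votes_present: int, votes_total: int) -> bool:
    """A cluster partition may keep serving I/O only if it holds a strict
    majority of votes; anything else must halt to avoid two sides writing
    independently (the split-brain scenario)."""
    return votes_present > votes_total // 2

# 3-node cluster split 2/1: the majority side keeps serving, the minority halts.
print(has_quorum(2, 3), has_quorum(1, 3))  # True False
# 2-node cluster split 1/1: neither side has a majority, so both halt --
# unless the check misfires, in which case both keep writing (split brain).
print(has_quorum(1, 2))  # False
</antml_code_fence>

This is also why even-sized clusters usually add a tie-breaker (witness or quorum disk): a clean 50/50 split otherwise leaves no side able to continue.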

UPDATE 2021-08-28 10:25AM CEST
Most systems are back; we are working to fix the remaining problems, mainly on the QA system.

UPDATE 2021-08-27 07:30PM CEST
The VM storage cluster is currently not working as expected, so sites and services are not available right now.
We are working hard to resolve this issue as quickly as possible.

Resolved: DC3HAM Network Outage

Update 04:26 PM CEST:

RESOLVED

Commercial power was restored, thus restoring services to a stable state.

We will provide details about the exact root cause once we receive them from our provider Lumen.

Update 04:12 PM CEST:

Transport NOC reports the main power breakers have been reset and commercial power is restored to the location. The team is working to turn individual breakers on one at a time to restore equipment. Services will begin to restore as each breaker is energized.

Update 02:59 PM CEST:

Field Operations have arrived on site and determined a commercial power failure to be the cause of impact to services. The local power provider has been engaged to assist with restoration efforts.

Update 02:05 PM CEST:

Lumen is still working on the issue.

There is a major network event in Frankfurt, Germany, that affects our services on a global scale; depending on routing, our services in our colocation in Hamburg might not be reachable. The provider Lumen is working on the issue; we will update this post as soon as we know more.

Resolved: Connectivity issues in AWS AZ in Region Frankfurt

https://status.aws.amazon.com/

7:24 AM PDT Starting at 5:07 AM PDT we experienced increased connectivity issues for some instances, degraded performance for some EBS volumes and increased error rates and latencies for the EC2 APIs in a single Availability Zone (euc1-az3) in the EU-CENTRAL-1 Region. By 6:03 AM PDT, API error rates had returned to normal levels, but some Auto Scaling workflows continued to see delays until 6:35 AM PDT. By 6:10 AM PDT, the vast majority of EBS volumes with degraded performance had been resolved as well, and by 7:05 AM PDT, the vast majority of affected instances had been recovered, some of which may have experienced a power cycle. A small number of remaining instances are hosted on hardware which was adversely affected by this event and require additional attention. We continue to work to recover all affected instances and have opened notifications for the remaining impacted customers via the Personal Health Dashboard. For immediate recovery, we recommend replacing any remaining affected instances if possible.
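For EBS-backed instances, AWS's advice to replace remaining affected instances can often be satisfied with a stop/start cycle, since a fresh start typically lands the instance on different underlying hardware. The sketch below only assembles the AWS CLI calls rather than executing them; the instance ID is a placeholder:

```python
# Assemble (but do not execute) the AWS CLI calls for a stop/start replacement
# of an impaired EBS-backed instance. i-0123456789abcdef0 is a placeholder ID.
instance_id = "i-0123456789abcdef0"
steps = [
    f"aws ec2 stop-instances --instance-ids {instance_id}",
    f"aws ec2 wait instance-stopped --instance-ids {instance_id}",
    f"aws ec2 start-instances --instance-ids {instance_id}",
]
for cmd in steps:
    print(cmd)
</antml_code_fence>

Note that a reboot is not sufficient here: only a full stop releases the underlying host, which is why the sketch waits for `instance-stopped` before starting again.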

6:29 AM PDT We continue to make progress in resolving the connectivity issues affecting some instances in a single Availability Zone (euc1-az3) in the EU-CENTRAL-1 Region. The increased error rates and latencies for the RunInstances and CreateSnapshot APIs have been resolved, as well as the degraded performance for some EBS volumes within the affected Availability Zone. We continue to work on the remaining EC2 instances that are still impaired as a result of this event, some of which may have experienced a power cycle. While we do not expect any further impact at this stage, we would recommend continuing to utilize other Availability Zones in the EU-CENTRAL-1 region until this issue has been resolved.
6:05 AM PDT We are seeing increased error rates and latencies for the RunInstances and CreateSnapshot APIs, and increased connectivity issues for some instances in a single Availability Zone (euc1-az3) in the EU-CENTRAL-1 Region. We have resolved the networking issues that affected the majority of instances within the affected Availability Zone, but continue to work on some instances that are experiencing degraded performance for some EBS volumes. Other Availability Zones are not affected by this issue. We would recommend failing away from the affected Availability Zone until this issue has been resolved.
5:29 AM PDT We are investigating increased error rates and latencies for the EC2 APIs and connectivity issues for some instances in a single Availability Zone in the EU-CENTRAL-1 Region

While performing regular updates, we encountered an issue on AWS (see below).
We are publishing this because it might impact further steps during our regular patching.

https://status.aws.amazon.com/
1:24 PM PDT We are investigating connectivity issues for some EC2 instances in a single Availability Zone (euc1-az1) in the EU-CENTRAL-1 Region.

https://status.aws.amazon.com/
6:54 PM PDT RESOLVED: The connectivity issues and API errors for some EC2 instances in a single Availability Zone (euc1-az1) in the EU-CENTRAL-1 Region have been resolved and the service is operating normally.