Resolved: DC3HAM Storage Outage


UPDATE 2021-08-28 02:59PM CEST
On Friday, 27th August, at 19:30, a redundant storage cluster in our colocation DC3.HAM failed during normal operations.
After on-site analysis, we found that the storage had stopped all services due to a suspected split-brain error.
As a result of the storage cluster failure, virtual servers running on VMware could not run properly.

The repair of the storage cluster started immediately after the analysis and finished around 5:00 a.m. on 28th August.
After the storage recovery, all running virtual servers were restarted and checked. All production systems have been up and running since 09:45 a.m. on 28th August.

We are continuing to analyze the root cause.

UPDATE 2021-08-28 10:25AM CEST
Most systems are back. We are working to fix the remaining problems, mainly on the QA system.

UPDATE 2021-08-27 07:30PM CEST
The VM storage cluster is currently not working as expected,
so sites and services are not available right now.
We are working at full speed to resolve this issue as quickly as possible.