We had a major incident tonight that took all US communities offline for more than 6 hours - you can find an incident link to this here.
We are deeply sorry for this unacceptable lapse in service. At inSided, we realize that incidents happen - however, in this case, our out-of-office-hours response was not good enough. US customers were left with no service, no acknowledgment from us, and no way to contact us. We are treating this with the highest possible priority and will, of course, learn from it to prevent any repeat in the future. We have outlined several points we can act upon quickly to reduce a significant point of failure. Beyond that, we also need to improve our internal incident escalation process should anything like this happen again out of hours.
Brief overview:
We had a total downtime of 6 hours and 47 minutes, which impacted all US-hosted customers and took all of their communities completely offline.
This occurred due to a critical failure in our US-based infrastructure. Mitigating actions were taken late, due to lapses in the incident procedures in place.
The end result was that between 00:00 UTC and 06:47 UTC (01:00-07:47 CET; 4pm-10:47pm Pacific) all users saw an unbranded, technical 502 error message.
This post-mortem gives an overview of the timeline, incident breakdown, actions taken and actions planned - it can be found here: https://status.insided.com/incidents/25j27dc1n57d