We are seeing sporadic issues with users needing to log in multiple times to be authenticated.

Incident Report for StrongDM

Postmortem

A recent update revealed an underlying issue with our replica database. Internal alerts flagged the problem, and the incident was declared when StrongDM received customer support tickets.

To resolve the issue and prevent recurrence, our Infrastructure team made improvements to query monitoring and adjusted how we manage replica lag within RDS.

Incident Timeline:

Sep 5, 20:49 UTC - Change deployed
Sep 6, 13:50 UTC - First problem report from internal alerts
Sep 6, 15:15 UTC - Tickets from two customers, incident declared, replica disabled
Sep 6, 15:59 UTC - Incident resolved

Posted Sep 12, 2024 - 20:42 UTC

Resolved

The incident is considered resolved, as we have seen no additional errors. We will perform an internal post-mortem/RCA and an incident after-action review next week.

Posted Sep 06, 2024 - 15:59 UTC

Update

The US Control Plane was experiencing intermittent issues affecting all users, both with authentication and with listing available resources. The issue presented as users being required to authenticate multiple times before they were allowed into the AdminUI or the SDM Client. We have remediated the source of the issue and are continuing to monitor for any additional errors.

Posted Sep 06, 2024 - 15:39 UTC

Update

We are continuing to monitor for any further issues.

Posted Sep 06, 2024 - 15:38 UTC

Monitoring

The issue has been identified and a fix has been implemented. Normal operations should resume. We will continue to monitor and provide further updates here.

Posted Sep 06, 2024 - 15:31 UTC

Update

We are continuing to investigate this issue.

Posted Sep 06, 2024 - 15:27 UTC

Investigating

We are currently investigating this issue and will update here with more information.

Posted Sep 06, 2024 - 15:19 UTC

This incident affected: US Control Plane (Admin UI).