Connections through SDM nodes temporarily disrupted

Incident Report for StrongDM

Postmortem

Summary:

During a routine system update on October 2, a server release was pushed to production that changed how tokens were validated, which caused connections to resources to fail. Our internal canary deployment caught the issue and alerted the team, which had begun rolling back the change when we started receiving reports of customer impact. Two compounding issues caused this outage.

Cause:

The issue took some time to manifest in our canary environment. By the time the system alerted, the build had already been deemed healthy and promoted to production. This premature promotion was the ultimate root cause of the failure.

Impact:

  • Disruption to resource connections.
  • Users who logged in during the window needed to re-authenticate.
  • No unauthorized access or security exposure.

Remediation & Prevention:

  • Extending the delay between staging and production deployments for better early detection.
  • Updating infrastructure to manage environment
Posted Oct 09, 2025 - 22:30 UTC

Resolved

This incident has been resolved. A root cause analysis (RCA) will be provided within a week.

If you logged in to the StrongDM service during this short outage, your authentication cannot be verified with our control plane. Please log out and log back in to refresh your authentication with a valid token.
Posted Oct 02, 2025 - 22:14 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Oct 02, 2025 - 21:36 UTC

Investigating

The problem has been identified and a rollback is underway.
More details will follow.
Posted Oct 02, 2025 - 21:23 UTC
This incident affected: US Control Plane (API), UK Control Plane (API), and EU Control Plane (API).