StrongDM Status - Incident History

SDM is aware of an outage involving AWS.

2024-10-07T20:00:29Z

Oct 7, 20:00 UTC
Resolved - Customers may experience intermittent access to the admin UI and delays in receiving logs, which could impact auditing and reporting.

For more information, please visit AWS Health: https://health.aws.amazon.com/health/status

We will provide further updates as they become available.

Updates will impact Reports Dashboards

2024-09-26T22:30:02Z

Sep 26, 22:30 UTC
Completed - The scheduled maintenance has been completed.

Sep 26, 22:00 UTC
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.

Sep 26, 16:28 UTC
Scheduled - During this 30-minute maintenance, reports dashboards will be stale or inaccurate while we update our backend data replication tasks. If reports are still not updating after the maintenance window ends, please reach out to StrongDM Support.

Update will impact Reports Dashboard

2024-09-10T22:30:02Z

Sep 10, 22:30 UTC
Completed - The scheduled maintenance has been completed.

Sep 10, 22:00 UTC
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.

Sep 10, 18:47 UTC
Scheduled - The maintenance will last 30 minutes. During that time, report dashboards may be inaccurate. If reports are still not updating after the maintenance window ends, please reach out to StrongDM Support.

We are seeing sporadic issues with users needing to log in multiple times to be authenticated.

2024-09-06T15:59:23Z

Sep 6, 15:59 UTC
Resolved - The incident is considered resolved, as we have seen no additional errors. We will be performing an internal post-mortem/RCA and an incident after-action review next week.

Sep 6, 15:39 UTC
Update - The US Control Plane was experiencing intermittent issues with authentication and with listing available resources, affecting all users. The issue presented as users being required to authenticate multiple times before being allowed into the AdminUI or the SDM Client. We have remediated the source of the issue and are continuing to monitor for any additional errors.

Sep 6, 15:38 UTC
Update - We are continuing to monitor for any further issues.

Sep 6, 15:31 UTC
Monitoring - The issue has been identified and a fix has been implemented. Normal operations should resume. We will continue to monitor and provide further updates here.

Sep 6, 15:27 UTC
Update - We are continuing to investigate this issue.

Sep 6, 15:19 UTC
Investigating - We are currently investigating this issue and will update here with more information.

PostgreSQL DB Maintenance

2024-08-24T03:00:04Z

Aug 24, 03:00 UTC
Completed - The scheduled maintenance has been completed.

Aug 24, 01:00 UTC
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.

Jul 29, 21:40 UTC
Scheduled - We wanted to inform you of an upcoming brief update to our PostgreSQL database that is scheduled for Friday, August 23rd between 6:00 PM and 8:00 PM Pacific Time. This update is part of our commitment to continually enhance the security and performance of our services.

Service Impact:
During the update window, you may experience a brief service outage lasting approximately 2 minutes if you attempt to use your StrongDM connection while the database is being modified. This includes automated queries that run during the outage window. You should not have to take any action to resume normal usage after the update.
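The maintenance window is announced in Pacific Time, while the status entries for it are timestamped in UTC. The following is a minimal sketch of the conversion, assuming Pacific Daylight Time (UTC-7) applies on August 23rd. The names `PDT` and `to_utc` are illustrative and are not part of any StrongDM tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Pacific Daylight Time (UTC-7) is in effect on August 23.
PDT = timezone(timedelta(hours=-7), "PDT")


def to_utc(local_time: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC."""
    return local_time.astimezone(timezone.utc)


# Announced window: Friday, August 23rd, 6:00 PM to 8:00 PM Pacific.
start_utc = to_utc(datetime(2024, 8, 23, 18, 0, tzinfo=PDT))
end_utc = to_utc(datetime(2024, 8, 23, 20, 0, tzinfo=PDT))

print(start_utc.isoformat(), "to", end_utc.isoformat())
```

Under that assumption, the window corresponds to Aug 24, 01:00 UTC through Aug 24, 03:00 UTC, which lines up with the "In progress" and "Completed" entries for this maintenance.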

Authentication Errors

2024-08-01T21:28:49Z

Aug 1, 21:28 UTC
Resolved - This incident is resolved. We will update this page with a more detailed report of the causes.

Aug 1, 20:53 UTC
Monitoring - A fix has been implemented and we are monitoring the results.

Aug 1, 20:27 UTC
Identified - The issue has been identified and a fix is being implemented.

Aug 1, 20:00 UTC
Investigating - We have been alerted to an outage involving clients not being able to log back in after a session outage.

StrongDM is investigating the issue and further updates will be posted as we have them.

UPDATE: StrongDM Important Service Interruption Notice - Scheduled PostgreSQL Update on Friday, May 31st from 6pm - 8pm Pacific

2024-06-01T03:00:02Z

Jun 1, 03:00 UTC
Completed - The scheduled maintenance has been completed.

Jun 1, 01:00 UTC
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.

Apr 24, 20:15 UTC
Scheduled - We wanted to inform you of an upcoming major update to our PostgreSQL database that is scheduled for Friday, May 31st from 6:00 PM to 8:00 PM Pacific Time. This update is part of our commitment to continually enhance the security and performance of our services.

Service Impact:
During the update window, you may experience a brief service outage lasting approximately 5 minutes if you attempt to use your StrongDM connection while the database is upgrading. Following this, there may also be a temporary period of inaccuracy in your Analytics charts as the data repopulates.

Why This Is Necessary:
This update will introduce improvements and new features that will enhance the functionality and reliability of StrongDM services. It is essential for maintaining the highest level of service quality and security for your PostgreSQL data.

What You Need to Do:
Please ensure that any critical operations or activities scheduled during this time are adjusted to avoid any potential disruptions. We recommend reviewing and planning around this maintenance window to minimize any impact on your operations.
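One way to review scheduled operations against the window is a simple interval overlap check. This is a sketch only: the window bounds are taken from the UTC timestamps of the "In progress" and "Completed" entries for this maintenance, and the job times and the `overlaps_window` helper are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Maintenance window in UTC (May 31st, 6:00 PM to 8:00 PM Pacific).
WINDOW_START = datetime(2024, 6, 1, 1, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 6, 1, 3, 0, tzinfo=timezone.utc)


def overlaps_window(job_start: datetime, job_end: datetime) -> bool:
    """Return True if the job's time span intersects the maintenance window."""
    return job_start < WINDOW_END and job_end > WINDOW_START


# Hypothetical job: a 10-minute task starting at 01:30 UTC on June 1.
job_start = datetime(2024, 6, 1, 1, 30, tzinfo=timezone.utc)
job_end = job_start + timedelta(minutes=10)

if overlaps_window(job_start, job_end):
    print("Job overlaps the maintenance window; consider rescheduling it.")
```

A job that starts exactly when the window ends is treated as not overlapping.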

High DB Load after upgrade.

2024-06-01T02:00:00Z

Jun 1, 02:00 UTC
Resolved - Between 9pm and 10:50pm CDT, some customers saw failed connections and authentication issues due to high load on the db after the scheduled upgrade. These issues should be cleared at this time.

Engineering continues to monitor.

SDM Partial Login Outage

2024-05-13T18:20:00Z

May 13, 18:20 UTC
Resolved - Login from app.strongdm.com is currently routing you to 'page not found'.
The issue has been identified and a fix is being implemented.

SDM Partial Login Outage

2024-05-09T16:43:16Z

May 9, 16:43 UTC
Resolved - A partial outage impacting logins to the SDM app occurred from 11:17am to 11:25am CDT.

The interruption was due to an improperly migrated table in our db prior to a version change. A rollback was initiated and functionality should be restored.

Login Failure via app.strongdm.com

2024-05-06T21:59:27Z

May 6, 21:59 UTC
Resolved - This incident has been resolved.

May 6, 21:54 UTC
Monitoring - The issue has been resolved and login should be working at this time.

May 6, 21:49 UTC
Identified - Login from app.strongdm.com is currently routing you to 'page not found'.
The issue has been identified and a fix is being implemented.

strongDM Analytics Maintenance.

2024-04-19T02:00:02Z

Apr 19, 02:00 UTC
Completed - The scheduled maintenance has been completed.

Apr 19, 01:00 UTC
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.

Apr 18, 19:09 UTC
Scheduled - strongDM analytics may show incorrect data briefly due to maintenance on its backend database during this window.

DB Maintenance

2024-04-18T01:30:02Z

Apr 18, 01:30 UTC
Completed - The scheduled maintenance has been completed.

Apr 18, 01:00 UTC
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.

Apr 17, 19:52 UTC
Scheduled - SDM will be doing maintenance on our production db during the 30-minute window specified. Downtime is expected to be no more than a few minutes.

We do not expect that users or service accounts will be disconnected as they were during the outage on April 15th.

Authentication outage

2024-04-15T20:20:42Z

Apr 15, 20:20 UTC
Resolved - The incident affecting authentication within the StrongDM Platform has been resolved and Engineering staff are continuing to monitor the health of the Platform. Further status updates will be provided as needed.

Apr 15, 20:10 UTC
Investigating - StrongDM is currently experiencing errors with authentication to our Platform. We are currently investigating and will be providing updates.

SAML-based authentications were failing.

2024-03-12T22:00:00Z

Mar 12, 22:00 UTC
Resolved - Timeline of the incident:
March 12, 22:09 UTC: SDM revoked a set of older encryption keys.
March 13, 00:14 UTC: A signing certificate used to verify SAML-based authentications fell out of cache and was retrieved again, but decrypting it failed because it had been encrypted with the revoked keys.
March 13, 10:12 UTC: SDM was alerted to failures authenticating via SAML, affecting SSO logins. This also affected access to Snowsight resources, which use SAML for authentication.
March 13, 11:24 UTC: The issue was escalated and SDM began restoring the relevant revoked keys.
March 13, 12:07 UTC: The revoked keys were restored and the issue was resolved.
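
The elapsed time between each event in the timeline above and the resolution can be worked out with plain datetime arithmetic. This is an illustrative sketch: the timestamps are copied from the timeline, while the event labels and the `utc` helper are made up for the example.

```python
from datetime import datetime, timezone


def utc(month: int, day: int, hour: int, minute: int) -> datetime:
    """Build a 2024 UTC timestamp from the fields used in the timeline."""
    return datetime(2024, month, day, hour, minute, tzinfo=timezone.utc)


events = [
    ("keys revoked", utc(3, 12, 22, 9)),
    ("certificate decryption failed", utc(3, 13, 0, 14)),
    ("SDM alerted", utc(3, 13, 10, 12)),
    ("issue escalated", utc(3, 13, 11, 24)),
    ("keys restored", utc(3, 13, 12, 7)),
]

resolved_at = events[-1][1]

# Time from each earlier event until the keys were restored.
before_resolution = {label: resolved_at - when for label, when in events[:-1]}

for label, elapsed in before_resolution.items():
    print(f"{label}: {elapsed} before resolution")
```

By this arithmetic, the keys were restored 13 hours 58 minutes after the revocation, 11 hours 53 minutes after the certificate decryption failed, and 1 hour 55 minutes after SDM was alerted.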

A bug was pushed out impacting some customers using drivers related to MySQL.

2024-01-08T17:00:00Z

Jan 8, 17:00 UTC
Resolved - At 17:00 UTC an sdm-cli update containing a bug began rolling out. The bug caused some drivers to stop functioning, primarily those that are aliases for mysql:
* aurora-mysql
* aurora-postgres
* clustrix
* cockroach
* greenplum
* maria
* memsql
* singlestore
* a mongo replica-set variant

We were alerted to this problem at 19:38 UTC and executed a rollback for those who had already received the update through the slow rollout. That rollback will be complete at 20:38 UTC.

Routing Issues

2023-09-08T16:57:16Z

Sep 8, 16:57 UTC
Resolved - We made an enhancement to our routing protocol for compatibility with a new operating mode, which caused the current routing system to fail. The change has since been rolled back, and the issue has been confirmed resolved. The service should be functioning normally at this time.

Sep 8, 16:45 UTC
Investigating - Some organizations may experience connection issues. We are currently investigating the issue and will update here with additional information.

AWS Maintenance Outage

2023-08-11T04:00:00Z

Aug 11, 04:00 UTC
Resolved - On August 11th, 2023, from 11:14pm to 11:15pm Pacific, the StrongDM production database was taken offline for required maintenance by our service provider, which caused a brief interruption of service.

We apologize for the inconvenience.

System email provider is currently inoperable

2023-04-20T16:26:20Z

Apr 20, 16:26 UTC
Resolved - We have implemented a resolution with our system email provider and the issue should now be resolved.

Apr 20, 15:43 UTC
Investigating - We are having an issue with our system email provider. This impacts password reset emails and other account-related emails. We are currently working towards a resolution and will provide further updates as soon as possible.

System Outage

2023-04-19T16:30:00Z

Apr 19, 16:30 UTC
Resolved - We released a server change this morning which increased the amount of data in each report generated by the reports library. This caused an increase in database activity that eventually led to operations timing out, resulting in a brief outage in some server operations, primarily related to authentication. The outage lasted from 16:21 UTC to 16:29 UTC, at which point the issue was resolved.

During this time period, end user queries completed successfully and no logging data was lost.

We deployed a fix that prevents future impact on the database from operations related to the report library.

System email provider is currently down

2023-01-05T03:02:35Z

Jan 5, 03:02 UTC
Resolved - Our email provider has provided a fix and we have verified that the issue is resolved. We are closing the incident as our provider has closed it on their side.

Jan 5, 00:21 UTC
Identified - We are having an issue with our system email provider. This impacts password reset emails and other account-related emails. We are currently working with the provider and staying up to date on their progress, and will provide further updates as soon as possible.

Network issue preventing access to StrongDM

2022-12-05T21:22:44Z

Dec 5, 21:22 UTC
Resolved - AWS resolved the issue on their side. Our indicators show our service is fully operational for all our customers.
The issue has been resolved.

Dec 5, 21:10 UTC
Monitoring - AWS are beginning to see signs of recovery. Currently our indications are that StrongDM service is back to normal but we continue monitoring it closely.

Dec 5, 21:01 UTC
Identified - AWS confirmed there is an issue which is impacting Internet connectivity for the US-EAST-2 Region. AWS is working on fixing the issue.
We will continue to follow up and let you know once the issue is resolved in AWS.

Dec 5, 20:24 UTC
Update - We are continuing to investigate this issue.

Dec 5, 20:21 UTC
Investigating - We have noticed a network issue preventing some customers and users from accessing the StrongDM client and portal. We are investigating the issue and you can follow the status of our investigation at https://strongdm.statuspage.io/

Okta SCIM sync issue

2022-10-27T20:05:50Z

Oct 27, 20:05 UTC
Resolved - Our system was running into issues connecting to Okta SCIM, which may have caused disruptions in user creation and deletion. Users created or deleted on the Okta side may not have synced to SDM. Logging of IPs might also have been affected.

strongDM Service Degradation

2022-05-27T12:00:00Z

May 27, 12:00 UTC
Resolved - We experienced a slight degradation in services from a version promotion yesterday, which resulted in a higher connection count across the strongDM fleet. This led to some servers exceeding their maximum connection limit. Some users were briefly unable to access strongDM resources as a result. This release has been rolled back to a previous version to mitigate stress on the fleet.

Heavy Database Load

2021-10-18T19:49:40Z

Oct 18, 19:49 UTC
Resolved - This incident has been resolved.

Oct 18, 18:21 UTC
Monitoring - The high database load condition has been resolved. Engineering continues to investigate root cause, and we are monitoring all systems actively.

Oct 18, 18:09 UTC
Update - The database load has been resolved and the engineering team continues to investigate the root cause.

Oct 18, 18:07 UTC
Investigating - strongDM is experiencing heavy database load which may manifest in service degradation. The strongDM engineering team is actively investigating and we will update this incident as more information becomes available.