Final Update: Tuesday, 24 November 2015 00:19 UTC
We've confirmed that all systems are back to normal with no customer impact as of 11/23, 23:05 UTC. Our logs show the incident started on 11/23, 17:31 UTC, and that during the 5 hours and 34 minutes it took to resolve the issue, up to 18% of customers may have experienced failures or timeouts when accessing their data in Application Insights.
- Root Cause: The incident was caused by a failure in a back-end service, which made telemetry unavailable for query through the Azure portal.
- Lessons Learned: We understand the triggers for the issue and are investigating configuration changes to the service platform that will prevent a recurrence or contain its impact.
- Incident Timeline: 5 Hours & 34 minutes - 11/23, 17:31 UTC through 11/23, 23:05 UTC
-Application Insights Service Delivery Team
Update: Monday, 23 November 2015 23:12 UTC
We are continuing to work toward full recovery. Until the issue is fully resolved, fewer than 3% of customers may experience intermittent data access issues.
- Work Around: none
- Next Update: Before 11/24 01:30 UTC
Update: Monday, 23 November 2015 20:48 UTC
The root cause has been isolated to a transient failure in a back-end service, which was impacting customer queries and alert configuration. The service self-recovered, but we temporarily adjusted several settings to help speed the recovery. Alert configuration is now working as expected: customers are able to create, update, and delete alerts, and alerts should again include the correct test name. A few customers may still experience intermittent data access failures.
- Work Around: none
- Next Update: Before 11/23 23:00 UTC
Initial Update: Monday, 23 November 2015 18:44 UTC
We are aware of issues within Application Insights and are actively investigating. Roughly 18% of customers may experience UX failures when viewing their data.
- Work Around: none
- Next Update: Before 11/23 20:00 UTC
-Application Insights Service Delivery Team