Final Update: Friday, 10/2/2015 19:01 UTC
We’ve confirmed that all systems are back to normal with no customer impact as of 10/2, 18:40 UTC. Our logs show the incident started on 10/1, 20:35 UTC and that it took to resolve the issue 1% of customers experienced data gaps across all data types.
• Root Cause: The failure was due to deployment in production that caused race around condition for one of the service instance among multiple instances. As a mitigation , reboot of impacted instances fixed the issue .
• Lessons Learned: 1. We plan to implement multiple checks to detect race around condition and immediate mitigation. 2. Improvement on our monitoring system to detect the issue fast.
• Incident Timeline: 21 Hours & 55 minutes - 10/1, 20:35 UTC through 10/2, 18:40 UTC
We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.
-Application Insights Service Delivery Team
Initial Update: Friday, 10/2/2015 18:20 UTC
We are aware of issues within Application Insights and are actively investigating. Some customers may experience Data Gaps. The following data types are affected: Availability, Customer Event, Dependency, Exception, Metric, Page Load, Page View, Performance Counter, Request, Trace.
• Work Around:
• Next Update: Before 20:30 UTC
We are working hard to resolve this issue and apologize for any inconvenience.
-Application Insights Service Delivery Team
Clik here to view.
