Final Update: Saturday, 7/25/2015 10:18 UTC
We’ve confirmed that all systems are back to normal with no customer impact as of 7/25, 10:00 UTC. Our logs show the incident started on 7/24, 18:40 UTC, and that during the ~15 hours it took to resolve the issue, 20% of customers were impacted. However, this issue caused a small amount of trace data to be dropped from the current query system, so a subset of customers will still see partial data returned when they query against it. We plan to replay this data soon (no ETA yet), after which all data will be accessible. Customers will also see a data gap in the availability report for the initial impact window (7/24, 18:40 to 20:00 UTC).
• Root Cause: The failure was caused by Microsoft Azure Storage downtime, which triggered an imbalance across our data nodes.
• Incident Timeline: 15 Hours & 20 minutes - 7/24, 18:40 UTC through 7/25, 10:00 UTC
We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.
-Application Insights Service Delivery Team
Update: Saturday, 7/25/2015 05:23 UTC
We are still working to fully mitigate the issue. System health has been partially restored, and the impact has been reduced to a subset of customers hosted on the impacted data nodes. At this moment we don't have an ETA for full recovery, but we will provide an update as we progress.
• Next Update: Before 17:00 UTC
-Application Insights Service Delivery Team
Update: Saturday, 7/25/2015 00:08 UTC
We continue to restore system health. The initial issue that was preventing users from accessing data has been completely fixed, and at present the impact is limited to data latency. Our system is recovering, but at a slow processing rate. The DevOps team is looking into further options to mitigate the situation as soon as possible.
• Next Update: Before 05:00 UTC
-Application Insights Service Delivery Team
Update: Friday, 7/24/2015 21:05 UTC
The root cause has been isolated to storage service errors in Microsoft Azure Storage, which were impacting Application Insights. To address this, the Microsoft Azure team has applied a mitigation and our services are recovering quickly. Some customers may experience data latency until the issue is completely resolved.
• Next Update: Before 7/25 00:00 UTC
-Application Insights Service Delivery Team
Initial Update: Friday, 7/24/2015 19:12 UTC
We are aware of issues within Application Insights and are actively investigating. Some customers may experience data access issues for all data types.
• Work Around: None
• Next Update: Before 21:00 UTC
We are working hard to resolve this issue and apologize for any inconvenience.
-Application Insights Service Delivery Team