To increase familiarity with the Stream Analytics Query Language, we are kicking off a blog series that will feature a new query pattern every week from Common Stream Analytics Query Patterns. In addition to the examples and explanations found in the documentation, posts will also include a sample data file that you can use to test out the query directly in an ASA job.
This week’s pattern is below and you can read about more patterns here. Looking to solve a problem that you don’t see captured yet? Let us know via our msdn forum or by tweeting us @AzureStreaming.
Description: Determine the length of time that a given condition was true
Example Scenario: Suppose you have a data stream of vehicles crossing a toll bridge. A bug occurred causing all cars to have an incorrect weight (over 20,000 pounds) for a period of time. Determine the duration that this bug impacted car data.
Input:
MAKE | TIME | WEIGHT |
Honda | 2015-01-01T00:00:01.0000000Z | 2000 |
Toyota | 2015-01-01T00:00:02.0000000Z | 25000 |
Honda | 2015-01-01T00:00:03.0000000Z | 26000 |
Toyota | 2015-01-01T00:00:04.0000000Z | 25000 |
Honda | 2015-01-01T00:00:05.0000000Z | 26000 |
Toyota | 2015-01-01T00:00:06.0000000Z | 25000 |
Honda | 2015-01-01T00:00:07.0000000Z | 26000 |
Toyota | 2015-01-01T00:00:08.0000000Z | 2000 |
Test File:
Download from GitHub: CarWeights.json
Query
SELECT
PrevGood.Time AS StartFault,
ThisGood.Time AS Endfault,
DATEDIFF(second, PrevGood.Time, ThisGood.Time)AS FaultDuraitonSeconds
FROM
Input AS ThisGood TIMESTAMPBY Time
INNERJOIN Input AS PrevGood TIMESTAMPBY Time
ONDATEDIFF(second, PrevGood, ThisGood)BETWEEN 1 AND 3600
AND PrevGood.Weight < 20000
INNERJOIN Input AS Bad TIMESTAMPBY Time
ONDATEDIFF(second, PrevGood, Bad)BETWEEN 1 AND 3600
ANDDATEDIFF(second, Bad, ThisGood)BETWEEN 1 AND 3600
AND Bad.Weight >= 20000
LEFTJOIN Input AS MidGood TIMESTAMPBY Time
ONDATEDIFF(second, PrevGood, MidGood)BETWEEN 1 AND 3600
ANDDATEDIFF(second, MidGood, ThisGood)BETWEEN 1 AND 3600
AND MidGood.Weight < 20000
WHERE
ThisGood.Weight < 20000
AND MidGood.Weight ISNULL
Output:
STARTFAULT | ENDFAULT | FAULTDURATIONSECONDS |
2015-01-01T00:00:01.0000000Z | 2015-01-01T00:00:08.0000000Z | 7 |
2015-01-01T00:00:01.0000000Z | 2015-01-01T00:00:08.0000000Z | 7 |
2015-01-01T00:00:01.0000000Z | 2015-01-01T00:00:08.0000000Z | 7 |
2015-01-01T00:00:01.0000000Z | 2015-01-01T00:00:08.0000000Z | 7 |
2015-01-01T00:00:01.0000000Z | 2015-01-01T00:00:08.0000000Z | 7 |
2015-01-01T00:00:01.0000000Z | 2015-01-01T00:00:08.0000000Z | 7 |
Explanation: We are looking for 2 “good” events with a series of consecutive “bad” events (weight over 20,000) in between. This is implemented using 2 JOINs and validating that we get good -> bad -> good by checking the weight and comparing the time stamps. We can now compute the duration between the beginning and end good events which gives us the duration of the bug.