I spent last Friday & Saturday writing some SQL code for a client to make finding historical data “outages” quicker and easier. They wanted to know all of the time spans where the data quality wasn’t good. I was thinking it was a little like a needle-in-a-haystack problem, but it’s not. It’s harder than that. It’s more like measuring every piece of hay in the haystack.
Here’s the short version of how the code works. Before you say it sounds really inefficient, there’s a trick below that helped speed things up. The SQL code pulls back raw historical data and inspects it for when the data quality changes. For every time span where the data quality stays the same, it records an entry in a database table with the tag name from the historian, the data quality, the start time of the span, and the end time of the span.
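For concreteness, the span table looks something like this. The table and column names are my own sketch, not the client’s actual schema:

    -- Hypothetical span table (my names, not the client's schema):
    -- one row per stretch of time where a tag's quality stayed the same.
    CREATE TABLE QualitySpans (
        TagName   nvarchar(256) NOT NULL,  -- tag name in the historian
        Quality   int           NOT NULL,  -- OPC quality for the span
        StartTime datetime      NOT NULL,  -- when this quality first appeared
        EndTime   datetime      NOT NULL   -- when the quality next changed
    );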
Doesn’t sound too bad until you look at the size of the pile of data. I had to run this on a year & a half of data across 4000 tags.
The trick: I used ActiveFactory Query to play around with several ways of pulling the raw data. The best way I found was to use the delta retrieval mode with a 100% value deadband. That tells the InSQL backend that is processing the data in the history files to only return a row when the value changes by 100% of its range. The trick is that it always returns a value when the quality changes (technically, it’s when the OPC quality changes, not the quality detail). This dramatically speeds up the retrieval of analog tags. It doesn’t help a whole lot on boolean tags, but 80-90% of the tags I had to process were analogs. FYI, the reason it doesn’t help on booleans is that a change from 0 to 1 (or 1 to 0) is 100% of the range. That can and does happen with analog tags too, but it’s a far rarer event.
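Here’s a minimal sketch of the kind of query I mean, using the standard InSQL History extension table. The tag name and time range are placeholders, not the client’s data:

    -- Sketch of a delta-mode query with a 100% value deadband.
    -- 'SomeAnalogTag' and the dates are placeholders.
    SELECT TagName, DateTime, Value, Quality, QualityDetail
    FROM History
    WHERE TagName = 'SomeAnalogTag'
      AND DateTime >= '2008-01-01 00:00:00'
      AND DateTime <  '2008-01-02 00:00:00'
      AND wwRetrievalMode = 'Delta'  -- only return rows when something changes
      AND wwValueDeadband = 100      -- suppress value changes under 100% of range

The deadband suppresses almost every value-only change on an analog tag, so what comes back is mostly quality changes plus the occasional full-range value swing.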
To catch the cases where full-range value changes still came through in the raw retrieval, there is some code that processes all of the raw rows and culls out the non-quality changes.
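I won’t reproduce the original code here, but the culling step amounts to something like this sketch, which assumes the raw results were staged into a temp table with names of my own choosing:

    -- Sketch: keep only the rows where the quality actually changed.
    -- #RawRows is my name for a staging table, not the original code's.
    WITH Ordered AS (
        SELECT TagName, DateTime, Quality,
               ROW_NUMBER() OVER (PARTITION BY TagName ORDER BY DateTime) AS rn
        FROM #RawRows
    )
    SELECT cur.TagName, cur.DateTime, cur.Quality
    FROM Ordered AS cur
    LEFT JOIN Ordered AS prev
      ON prev.TagName = cur.TagName
     AND prev.rn = cur.rn - 1
    WHERE prev.rn IS NULL               -- first row for a tag starts a span
       OR prev.Quality <> cur.Quality;  -- quality changed, so a new span begins

Pairing each surviving row with the next one for the same tag then gives the start and end times for each span.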
Even with the trick, it still took about two days of execution time to sift through all of that data. My rough estimate is that there are somewhere around 50 billion raw data points. A single day’s data only takes a few minutes to sift through and cache off, but those minutes add up across a year and a half of days. I would love to hear if anybody has any other tricks for doing this kind of mining expedition.