Harder Than a Needle in a Haystack
I spent last Friday & Saturday writing some SQL code for client to make finding historical data “outages” quicker and easier. They wanted to know where all of the time spans where the quality wasn’t good. I was thinking it was a little like a needle in a haystack problem, but it’s not. It’s harder than that. It’s more like measuring every piece of hay in the haystack.
Here’s the short version of how the code works. Before you say it sounds really inefficient, there’s a trick below that helped speed things up. The SQL code pulls back raw historical data and inspects it for when the data quality changes. For every time span where the data quality is the same, I would record an entry in a database with the tag name in the historian, the data quality, the start time of the time span, and the end time of the time span.
Doesn’t sound too bad until you look at the size of the pile of data. I had to run this on a year & a half of data across 4000 tags.
The trick: I used ActiveFactory Query to play around with several ways of pulling the raw data. The best way I found was to use the delta retrieval mode with a 100% value deadband. What that does is allow the InSQL backend that is processing the data in the files to only return results when the value changes 100% of its range. The trick is that it always returns a values when the quality changes (Technically, its when the OPC quality changes, not the quality detail). This dramatically speeds up the retrieval of analog tags. Doesn’t help a whole lot on boolean tags, but 80-90% of the tags I had to process were analogs. FYI, the reason it doesn’t help on boolean is a change from 0 to 1 (or 1 to 0) is 100% of the range. It can and does happen with analog tags too, but it’s a far rarer event.
To catch all of the cases where value changes came through on the raw data retrieval, there is some code to process all of the raw rows to cull out the non-quality changes.
It still took about two days of execution time to sift through all of that data. My rough estimate is that there are somewhere around 50 billion raw data points. It only takes a few minutes to sort through a single day’s data and cache it off. I would love to hear if anybody has any other tricks for doing this kind of mining expedition.


We did a web page to show gaps in data across three servers. They all had issues, making srarhes for ‘time in a given state’ difficult. Some of it turned out to be network and vmvare issues, but we never got 100% which was a bit dissapointing
I used sinatra and activerecord with ruby, and a daily scheduled task to collect data for approx 15 tags for a month
I’m curious what the VMWare issues were. My experience is that issues are rarely a direct result of running on VMWare.
[...] his latest blog entry, David Goodman from Avid Solutions talks about ways of analyzing gaps in Wonderware Historian data. [...]
Wonderware’s new slogan “We Wonderware the data is”….
Hey Guys…. I told you when we started this blog we’d be brutally honest sometime
- Andyt
Using the “100%” value deadband is a clever way to do this–great job with that. In cases where you need to check a lot of tags over a long time period (as in your scenario), Historian will be a little more efficient querying lots of tags (even all 4000) over shorter time periods (say, a history block duration). For example, query all tags looking for the gaps during a day, then repeat for another day instead of querying one tag for the whole year and then repeating for another tag.