14 Myths About Historians & SQL Databases

tranh2 · April 1, 2014, 3:46pm

Hey Induction/Ignition Users!

I was watching a video about SQL Databases in SCADA (I assume for “Historian” uses as well as other uses in Ignition SCADA).
youtube.com/watch?v=QMku2PRjKLg

I also came across an article 14 Myths about Historians. I just wanted to start a discusssion about this and see if any of these apply or are of concern with Ignition SCADA since it uses SQL databases.

“14 Myths About Historians
Introduction
Relational databases are ideal for many applications, but are not the best solution for timeseries
data. Historian
applications have been around a long time and through the years various fallacies and ‘myths’ have developed
and perpetuate without substance or evidence. This paper will discuss the merits and shortfalls of relational
databases as well as uncover and challenge these myths in the context of specialized process historian
applications. This paper will also present some basics of historian applications, explain some important
considerations for an objective evaluation of historian solutions, and properly set expectations for the value a
historian can bring.
Highthroughput
systems are not the same as process historians. While examples can be found of the New York
Stock Exchange trading systems and other highthroughput
applications which use relational databases, it can not
be implied from these examples that Relational Database systems are suitable for everything. While Microsoft
SQL Server or Oracle databases are capable of storing timeseries
data, there are some very significant problems
that should be considered when contemplating their use for process history and industrial applications.
Myth #1: Storage is so cheap that efficiency does not matter
It is common to dramatically underestimate how much data a typical process actually generates. A modest 5,000
tag historian which logs data every second, generates 157 billion distinct values per year. Even when stored
efficiently in 8 bytes per value, storage requirements would be around 1 terabyte per year. In some tests when
comparing SQL Server storage requirements for timeseries
data and necessary indices with those of
Wonderware Historian the difference was 50:1. Even though storage prices are falling, 50 terabytes of data a
year is still a lot. It should also be recognized that it is not merely a matter of having enough disk space to hold
that much data; it is frequently a requirement that historical data be robustly protected, which multiplies the
amount of storage required for backups or disk mirroring. Some industries impose mandatory regulatory
obligations to maintain several years of data, which further magnify the overall storage requirement.
Myth #2: Relational databases are fast enough
As hardware price performance has improved, relational databases have benefited. However, relational
databases are designed to protect referential integrity around “transactions” that may update multiple table
values in unison, which adds significant overhead. For example, on highend
hardware (running 64 Itanium
processors) SQL Server 2008 established a world record 1126 transactions per second. Granted such transactions
are not the same as those required for a historian, but even such highend
hardware would be taxed to store
5,000 values per second if each value was a transaction. This means a frontend
buffering application must
collect the data and stream numerous values into the database as a single transaction. Other databases without
full transactional support, such as MySQL’s freeware MyISAM storage engine, can support higher throughputs, but
still require a frontend
buffer to achieve adequate throughput for all but the tiniest historian applications.
Of course, the reason to store data in the first place is so that it is available to retrieve later. Naturally, that
makes retrieval performance quite important, too. Particularly in generalpurpose
solutions like a relational
database, it is possible to organize data so that it is either efficient to store (higher throughput) or efficient to
retrieve (fast retrieval), but not both. Efficient retrieval of timeseries
data from a general purpose database
requires use of a “clustered index,” something not available, for example, in the higherthroughput
MyISAM
storage engine.
In contrast, purposebuilt
storage engines designed specifically for timeseries
data leverage the knowledge of
how data is collected and consumed to store it efficiently for both—this would not be possible if the data were
more generalized.
Myth #3: Using a relational database as a historian is a new revelation
There are frequently new ideas about how to apply existing technology—3M PostIt
® notes were a famous
application of an existing notsosticky
adhesive. However, companies have attempted using a simple relational
database schema as a replacement for purposebuilt
timeseries
storage for over a decade. In spite of this “new
idea” of using a relational database, the market for dedicated historian solutions has continued to grow
significantly.
Myth #4: You can only use SQL to query data in relational databases
Though relational databases have many advantages over alternative technologies, what catapulted them to
prominence was the power of SQL. SQL standardized and fundamentally changed how users can extract value
from their data from being a complex programming exercise to being a relatively simple and flexible language to
describe the data of interest.
Fortunately, it is very practical to adapt SQL to nonrelational
data stores and gain the tremendous benefits and
power of SQL without some of the inherent limitations of a relational database.
Myth #5: There is nothing special about timeseries
data
With all the power of SQL to query data, some may claim that relational databases are just as good at retrieving
timeseries data as they are for transactional data. It is certainly true that SQL gives great flexibility, but it is based
on some fundamental assumptions that do not apply to timeseries
data: (a) there is no inherent order in the
data records (in fact, timeseries
data is ordered by time), (b) all the data is explicitly stored (in fact, most
historian data only represents samples from a continuum of the real data), all data is of equal significance.
These two differences are significant. For example, if an instrument reports a value time stamped at
“7:59:58.603” and a user queries a relational database for the value at “8:00:00.000,” no data will be returned
since there is no records stored for precisely that time—the database does not recognize that time is a
continuum. Similarly, if a temperature was “21.0 ºC” and twominutes
later was “23.0 ºC”, it has no inherent
ability to infer that halfway between these samples the temperature was approximately “22.0 ºC”.
In historian applications, steadystate
operations are rarely most significant. The only way for a client application
to find exceptions is to query all of the data for a measurement, which can place a heavy load on the overall
system: server, network and client. In contrast, historians generally have means of filtering out insignificant data
(based on comparing sequential records) to radically reduce the volume of data that must be delivered to client
applications.
Myth #6: Managing timeseries
data in a relational database is trivial
Relational databases are designed to accumulate massive amounts of data. However, as the amount of data
grows, so do query execution times, the size of backups, and numerous other routine operations. To alleviate this
performance problem of evergrowing
tables, database administrators must routinely purge data from the
database. In any database that protects transactional integrity, this purge operation must suspend normal
database updates—that is a problem for historian applications running 24/7/365. To even make the purge
operation tolerable requires minimizing the amount of data maintained in the database.
In the event purged data is needed later (for example, in response to an audit or some regulatory demands),
restoring the data is nontrivial.
The generally recommended practice is to restore a full database backup that
included the needed data either to a separate system dedicated for this purpose, or to take your production
system offline and use it. This is even more problematic if the required data is not available within a single
database backup—for example, if you only maintain the last 30 days of data in the online database and the audit
requires 90 days of data, you must either manually merge all the data into a single database, have three
systems, each with an isolated 30day
window, or examine each backup serially.
True historians, on the other hand, are designed to both handle the rapid growth in data and to provide simple
means of taking subsets of the data offline and online.
Myth #7: The only options are fully relational or fully proprietary historian
solutions
While it is true that most historian solutions either use fully proprietary technology to address the inherent
limitations of relational database or fully leverage relational database to reduce their own engineering costs,
Wonderware Historian actually delivers the best of both worlds. It relies on a solid relational schema for
managing all the relatively static configuration data, but extends the native transactional storage engine and
query processor of Microsoft® SQL Server with proprietary extensions to address their limitations for timeseries
data.
Building on Microsoft SQL Server delivers a solution that is easier to secure and manage than fully proprietary
solutions, but without compromising on the fundamental capabilities required in a historian.
Myth #8: There is nothing special about industrial applications
True historians provide facilities for dealing with the demanding, realworld
realities of industrial applications that
are outside the realm of pure relational databases. How do you intend to make use of the data? Do you need to
convert rates into quantities for reporting? If so, that is quite complex with a SQL query. Is your instrumentation
and data collection 100 percent reliable, or do you sometimes have to “make do” with data that includes
instrument errors? While generalpurpose
databases can certainly store data, they are not designed to
incorporate notions of “data quality” into calculations and are not able to simply perform routine timeseries
calculations such an integral calculation that are commonly needed.
Myth #9: Relational database applications are not historians
A “historian” addresses several related functions: continuously collecting realtime
data, storing noteworthy
subsets of that data, and providing a means of extracting meaningful information from that data. Whereas
“historian” describes an application, “relational database” names a technology. Though there are certainly some
significant challenges in using a relational database technology for timeseries
data, just because an application
uses one does not mean it is not a historian—it can still be a historian, just a fairly basic one. Rather than
focusing on the underlying technology choices (relational databases or proprietary files), focus on the functionality
needed—that requires an understanding of the
overall application and involves much more than simply storing data.
Myth #10: All data is equal in importance and validity
To a relational database, a stored value is precisely that, a value, and is always assumed to be valid—if it is not,
it is up to someone to correct it. In collecting millions of samples from thousands of data points from around a
process, it is inevitable that some information is incorrect or missing. There may be issues with measurement
equipment where values are out of range, communications was lost, or the data was simply erroneous.
In a plant historian, a stored data point not only has an associated value and time stamp, it also has an indication
of the data quality. Storing a data point from an instrument, outside of the instrument’s normal operating range,
for example, will cause a specific series of quality indicators to be stored with the value. These indicators are not
simply separate columns in the database, but an inherent property of the sample. They can be retrieved, included
in calculations and used to alert operations or engineering personnel to a potential anomaly.
When summarizing the values (for example, calculating an average temperature over the last hour), a historian
must be able to reflect this data quality in calculation results, optionally filter out suspect data, and be able to
extrapolate when data is missing or deemed invalid. If these realworld
aberrations are not handled correctly,
resulting reports, business system integration, and decision making will be incorrectly skewed. Relational
databases alone do not provide these capabilities.
Myth #11: Storing data in a relational database makes it easy to query
Although support for SQL queries is one of the huge benefits of using a relational database, that doesn’t
necessarily mean a particular database design is easy to query via SQL—some designs are even so convoluted
that they cannot effectively use manuallycreated
queries and must, instead, rely on complex, programmatically
generated queries. This can make sense from the perspective of managing the storage and from the standpoint
of portability, but it largely neutralizes all the inherent advantage of using a relational database.
Myth #12: Storing data in a relational database makes it integrated
Putting two Excel spreadsheet files into the same folder does not make them “integrated” in any sense, even
though they might both include production data. Similarly, taking information from an ERP (enterprise resource
planning) system and historian and storing both in a relational database does not make it “integrated”. Certainly
having all that data in a common technology is the requisite first step, but it is only that and often it is by far the
simplest.
Myth #13: Using a relational database is cheaper than a purposebuilt
historian
Before you assume you cannot afford a serious historian solution, make sure you understand your real needs and
explore the options—you might be very pleasantly surprised, even based only on the license costs. When you
factor in the lower ownership costs and value of a solution adapted to your real needs, the purposebuilt
solution
will likely be a lot cheaper.
Myth #14: Only largescale
continuous processes need a historian
The original commercial historian systems began in the 1980s in oil refining, paper mills and other continuous
processes since these were the only industries that could justify the cost of minicomputers required to run them.
With the rapid adoption of Windows NT in the 1990s, the cost of the computers dropped significantly and opened
the door for lower priced solutions developed specifically for the new platform, such as Wonderware®
IndustrialSQL Server (now, “Wonderware Historian”). These new solutions quickly demonstrated their cost
effectiveness outside of the traditionally DCSoriented
continuous process industries.
Summary
This white paper discussed several reasons why a process historian is superior for plant information acquisition
and retrieval when compared to a relational database system. However, this does not mean that commercial
software has no place in an industrial environment. Today, process information is needed both outside of the
plant environment and inside the business systems section of an enterprise. And, there is no better way to
provide this interface between the plant data and the enterprise systems than a commercially accepted, standard
interface. Wonderware Historian
can integrate a commercially available product (Microsoft SQL Server) with an open, standard query interface
(SQL) to provide open access to plant historical data. This interface can easily be used by the IT department for
reporting or integrating into the ERP systems.
Wonderware Historian offers all the capabilities discussed within this paper and more. Trusted and in use in over
25,000 installations worldwide, Wonderware Historian empowers plant operations and enterprise business users
alike, delivering the right information to the right person, and leaves database management where it belongs, in
the enterprise IT department and not on the plant floor.”

nathan · April 1, 2014, 6:18pm

That’s quite the “Wonderware Historian” ad. I haven’t used the product, but I don’t disagree that it could be made to satisfy those 14 myth conditions. There are many implementations out there that do not.

You’ll find that Ignition supports those “specialized” features out of the box without the additional licensing requirements of a dedicated historian. SQLTags Historian, which is a standard feature within Ignition, also satisfies those requirements. It stores based on value changes using a deadband, making the naive multiplicative calculations unnecessary, particularly the assumption that each snapshot of each data point is a separate transaction. It supports partitioning for performance and the ability to prune data and runs on several RDBMS systems including: SQL Server, Oracle, and MySQL.

I think the point of the white paper is that your standard, IT supportable SQL database can capably function as a “Historian” without significant cost or complexity. That’s not to say that there is anything wrong with a “Dedicated Historian”, or that an SQL database can work miracles when configured improperly.

tranh2 · April 1, 2014, 11:17pm

Hi Nathan.

Reason I posted it is because I’m not familiar with SQL databases at all other than knowing it stores data. It was interesting to read all the negatives about relative databases but their latest iteration of their SCADA software has migrated to Microsoft SQL for their Historian purposes which contradicted their own article.

But I can see what you’re saying when you mention that the white paper is probably only referring to using relative SQL databases as a historian without any tweaks or configuration for that type of use as a negative.

Thanks for the reply!

Colby.Clegg · April 1, 2014, 11:52pm

Hi,

I think the paper makes a lot of perfectly valid points. However, when you tally everything up at the end of the day, for most applications that we come across, their points just don’t hold much weight. I’m not sure if this paper is directly combating some of the pieces we do on SQL, or if they’re just trying to keep their expensive products relevant, but to some degree it doesn’t matter. We don’t make money off of databases. We can query their historian just fine. So absolutely, take their points into consideration, and if it seems like you should use a historian, you absolutely should, and you can use it along with Ignition.

Why do we push SQL databases so hard? Again, we don’t sell them… we don’t make money off of it. We could have simply focused on supporting OPC-HDA (a standard for interfacing with historians), but we didn’t… because it turns out people like using sql database for their projects! It’s really hard to describe without writing tons of words, but basically, if I had to summarize quickly, we’ve found:

The way Ignition lets you store and use data in multiple ways in databases makes it possible to do lots of different things quickly.
SQL databases are understood by IT departments, so they don’t even blink when people talk about storing data to them. This moves things along faster, and lets you get things done faster.
It’s easier to find people who understand databases in order to maintain them than it is to find people who understand historians. As much as they want to talk about how “you need a full time dba to administer it”, the reality is I can walk down to the book store and choose from dozens of books on database administration, whereas I can’t find one on their historian. Again, the point is: the technology is more familiar, so there is more information, and thus it is more likely that you’re going to be able to do what you want more quickly.

Those aren’t selling points, they’re just our experience over the years. Honestly there are lots of times that I wish some of our customers would use historians instead. But on the flip side, those times are not nearly as frequent as they’d have you believe.

I was going to try to comment on each point, but that’s too much. I think the main point is… the bulk of their criticisms would be valid for a plain db, but Ignition handles most of them.

“Only 1126 transaction a second! Unless you…”, yes, and Ignition does do those extra things. I’ve seen 30,000 values a second going into a db, on a laptop with a slow harddrive!
“Sure you could, if you used a CLUSTERED INDEX” (scary!) … and Ignition does this for you.
“But what about interpolation!” - Ignition does this for you.
“It’s not just the value, the QUALITY matters too!” - right, which is why we store it… and use it…
“Before you assume you cannot afford a serious historian solution…”. Haha. Oh, maybe they’re right… I just haven’t found the customer yet who said “yeah, when we looked at it, wonderware historian was just the better deal!”. I’ll keep looking though.

I think their most valid criticisms are on the points of maintenance. However, in practical terms, I don’t see too many of our customers actually having a really hard time with this. And when it gets to be a big enough system to be a problem, they usually have the expertise available to deal with it (because, surprise, they have other large data systems too).

Regards,

nathan · April 2, 2014, 5:42am

Great points Colby!

For reference, the Wonderware white paper is attached with link below.

http://global.wonderware.com/EN/PDF%2520Library/WhitePaper_Wonderware_MythsAboutHistorians_201212.pdf

OP - I had mistakenly thought that you were countering points on one of the earlier IA whitepapers. I believe the 2012 Wonderware Whitepaper was written to position their Historian product, probably in light of some 2009/10 information that IA put out questioning the need for a dedicated historian, such as this. In any case, as of 2014, I believe we collectively agree on the basic requirement set to store historical data.

tranh2 · April 2, 2014, 1:05pm

Hello Nathan,

The link you posted doesn’t seem to work. I’ve linked it again below.

global.wonderware.com/EN/PDF%20L … 201212.pdf

With respect to the responses, I appreciate all of them and they are great answers. I haven’t used ignition for development yet and I’ve been a long time user of wonderware. But I did manage to play around with the Ignition online demos and I’m familiar with Java and Python programming. After playing with the demo and the way technology is moving towards mobile/cloud computing, web enabled devices, web based technology, it felt natural to me that Ignition was on the right path for future SCADA technology.

Now to find a machine to install Ignition on!

Kyle_Chase · April 2, 2014, 9:00pm

To put it all into perspective, we have a “test” historian, in which we are inputting 11000 tags every 2 seconds, and it has been running for 3 months now. The database currently has 2.5 billion rows of data. The size on disk, is only 19 GB. Bringing back 8 hours of data for a 50 tags takes about 1.5 seconds, so you can be the judge of what your requirements are, but I know I’ll be sticking to Ignition.

nmudge · April 2, 2014, 9:08pm

Kyle, what database system is your “test” historian using?

Kevin.Herron · April 2, 2014, 9:13pm

MySQL/Maria + TokuDB

nmudge · April 2, 2014, 9:32pm

Are you using MariaDB or MySQL? Or both?

Kyle_Chase · April 2, 2014, 10:19pm

MariaDB with the TokuDB storage engine for any “historical tables” (Historian, Alarm Journal, Audit Log), which any BI tables are running XtraDB (Which replaced standard InnoDB in the latest version). My test system has 3 VMs, one for Ignition, one for MySQL, and one for MariaDB. Using the tag history splitter to write data to both databases. MariaDB is showing a 10x compression, with faster inserts, and roughly 1/25 the disk IO. The only thing you lose using TokuDB, is foreign keys, which aren’t used by the historian any ways. The MariaDB historian returns data quickly, usually under 1 second, while the MySQl historian usually takes about 30 seconds for the same set of data, sometime more.

As for production servers, we have 3 in production right now, with similar results.

nathan · April 3, 2014, 5:23am

Very impressive - I hadn’t been following the MySQL fork (MariaDB).

Would you mind describing your setup? What OS is your DB running on? Have you tried it on Windows? What kind of hardware specs? Are you writing to a local disk/SAN/NAS? How do you support/configure different storage engines (per table, database, or do you have another RDBMS instance)?

Kyle_Chase · April 3, 2014, 6:58am

The VM host is a AMD A10-7850K, with 16 GB Ram. The VMs run on a 120 GB SSD, however, all data is stored on a 1 TB 7200 RPM HD. Nothing special, pretty weak hardware as far as a server is concerned. VM host software is VMWare Workstation 10. Moving to a class 1 hypervisor would increase performance slightly, but this is also my gaming machine, so thats out of the question right now.

All three VMs are running on Ubuntu 13.10, x64 arch. EXT4 file system, with no tuning. The Ignition VM is running Ignition 7.6.4, with the tag history splitter module installed. The MySQL VM is running MySQL 5.6.14. The MariaDB VM is running MariaDB 10.0.0.8, with the TokuDB storage engine enabled. The DB VMs have 4 GB RAM respectively, and the Ignition VM has 2 GB. All VMs have 2 cores, however, the host itself only has 4 cores. All database data is stored to the local 1 TB 7200 RPM drive, each database with it’s own virtual disk.

The connection to the MySQL database is the standard connection, but the JDBC driver has been updated to 5.1.29. The connection to the MariaDB database has a custom translator, to force the table to TokuDB, to set the compression level, and to enable clustered keys.

Also, each database has 2 connections, each with a ignition and historian schema.

On top of better compression and faster reads, inserts are also much faster, as less IO is required. MariaDB does use slightly more CPU however, as it compresses and decompresses the index and table data on the fly.

Would love to test on bigger hardware. If anyone is interested on setting up a bigger machine for a test case, we would love to give it a shot. We are currently using MariaDB in a few production systems, but they are either cloud hosted, or desktop class machines.

nmudge · April 3, 2014, 7:13am

Great data Kyle, thank you.

mazeyrat · April 3, 2014, 7:52am

It’s great to hear from you Kyle,
Hope a packaged IA or Kymera Solution or a procedure to configure will be great as soon the proof of concept is ok !

tranh2 · April 3, 2014, 1:12pm

Damn those are very impressive numbers Kyle! Would definitely like to see a procedure or a document of items/settings/setup/software of the setup. I’ll be starting off installing ignition on a small scale on my macbook to play around with. Eventually I hope I can get my company to pick up a server to run a setup similar to yours (VM’s).

pdibenedetto · April 3, 2014, 1:13pm

I think this not will be possible, considering there is not exists a TokuDB version for windows and not are plans for do it.

Kyle, what do you think if you have the same specs in everything, but you not have the possibility to use tokudb. I guess that could be a degrade in performance (what approximate ratio do you think?) and substantially in compression ratio, for sure.

Without doing tests, my guess it that the big thing is TokuDB, that allows a good relation in compression mantaining performance and is designed for bigs write frequencys (wich is the case for a "historian"). It's a product with a new (patented) indexing mechanism, pointing for a solution for "big data".

Given that you have more experience in MySql, MariaDB and TokuDB and I'm only guessing: what do you think about it?

Kyle_Chase · April 3, 2014, 6:18pm

TokuDB is not available for Windows. We also have systems running on standard MySQL installs. You definitely lose speed and size to InnoDB. InnoDB scales better than MyISAM for the historian, even though its ACID compliant. Also, if you care about the data you are saving, never use MyISAM.

As for what I think about it, you would definitely want to understand the technologies before moving away from a standard MySQL install. TokuDB needs some tuning before moving into production, but to be fair, so does using standard MariaDB/MySQL.

We probably won’t be posting a document, as each install is it’s own set-up. You can get away with a standard MariaDB install and enable TokuDB. For more complex installs, however, we do offer our consulting services if you do require assistance. The last thing we want is people losing data because something out of the box was not configured properly.

tranh2 · April 8, 2014, 6:55pm

Hey guys,

I’d just like to return to this for some clarification on the TokuDB/MariaDB/MySQL and datbase topics. I’ve been reading up on these pieces of software to try to learn about them and how Ignition interacts with those items. I’m not familar with SQL and databases.

So what is the difference between TokuDB/MariaDB/MySQL. From my readings with wiki TokuDB is a storage engine while MariaDB and MySQL are RDBMS. So does that mean that MySQL and MariaD are responsible for setting up data structures, reading, writing, deleting data, updating data etc to and from MariaDB? So Ignition Interacts with MariaDB or MySQL and MariaDB/MySQL in turns interacts with TokuDB?

Thanks!

nathan · April 8, 2014, 9:47pm

Tranh2,
You’re correct. MySQL and MariaDB are different database (RDBMS) systems. Users often distinguish between these as brands or products, Microsoft has MSSQL, Oracle the company has Oracle the database, etc. The storage engine is an important internal component that casual users are often unaware of. Some engines cannot be changed but can be tweaked. Others like MySQL and MariaDB (a MySQL fork) support very different options.

The bottom line is this - with very little expertise, you can download a free version of MySQL, a trial version of Ignition, and have Ignition “do all the work” - set up tables, create the underlying queries, etc. You’ll be able to log quite a bit of PLC data on a modest laptop, no SQL knowledge required. Roughly speaking, I’d guess that you’d get better performance out the box than systems that you may have used that don’t use standard databases (…for me, using legacy RSView32 back in the day, logging to flat text files). Ignition provides much of the same “purpose built” features that the white paper references.

For those projects with intense system requirements, a well configured database (and good hardware) goes a long ways. In those cases, one might hire someone like Kyle for consultation. I always recommend diving in and learning what you can and are interested in - about databases, sql, configuration, etc.

[quote=“tranh2”]Hey guys,

I’d just like to return to this for some clarification on the TokuDB/MariaDB/MySQL and datbase topics. I’ve been reading up on these pieces of software to try to learn about them and how Ignition interacts with those items. I’m not familar with SQL and databases.

So what is the difference between TokuDB/MariaDB/MySQL. From my readings with wiki TokuDB is a storage engine while MariaDB and MySQL are RDBMS. So does that mean that MySQL and MariaD are responsible for setting up data structures, reading, writing, deleting data, updating data etc to and from MariaDB? So Ignition Interacts with MariaDB or MySQL and MariaDB/MySQL in turns interacts with TokuDB?

Thanks![/quote]