FactorySQL/PMI hosted arch with multiplant connectivity

One of our clients having plants at 9 locations and a corporate office (connected via WAN) is looking for the following features:

  1. Local Data Logging and Analysis : Graphics,Alarms, Charts, Trends, KPI, Reporting at 9 Locations. Data Points at each Location : Approx 2000. Data Archival Rate : 5-10s. Data History Length - Upto 5 years.
  2. Central Data Analysis (Graphics,Alarms, Charts, Trends, KPI, Reporting ) of the data integrated from all the 9 Locations. Here the user should be able to have data from different locations combined for performance comparision.
  3. Possibility to have the central data analysis for history data even if WAN is down. (meaning we also want to store required data at the central location)
    These are the main requirements. Can you suggest the best archietecture for this requirement.

We can think of the following architectures:

  1. FactorySQL running at each location logging data to the remote central database. Factory PMI Gateway installed at the central location.
    Disadvantage : Not possible to see data locally when WAN is down.
  2. Factory SQL/Factory PMI running at each location logging data to the local database. Factory PMI Gateway installed at the central location which connects to these different local databases of the 9 Locations.
    Disadvantage : Not possible to see data centrally when WAN is down.
  3. Factory SQL/Factory PMI running at each location logging data to the local database as well as (only selected data) to remote central database .Factory PMI Gateway installed at the central location.This way it is possible to have data both locally and centrally. I am not sure if this architecture is possible?

I would like you to suggest us a better architecture and which architeture to choose for our requirements.

Other questions:

  1. Are there case studies of an installation similar to our requirements?
  2. Is it possible to verify whether this architecture will perform with 9 locations with the number of points and archival rates we require?
  3. What problems do you foresee and your recommendations for the same?
  4. The client (who is one of the big cement manufacturer in India) shall go for this system against Wonderware and Intellution. So we should be absolutely convinced that
    we can meet the requirements of the client and there will be no performance problems or any other issues.

Vipul Shah
eGenietech Software Services,
444/445, Mastermind I,
Royal Palms Estate,
Aarey Colony,
Mumbai 400 065
Web : www.egenietech.com

Logging synchronously at a fixed 10 second time rate, you’re looking at the following:
6 times/min * 60 min/hr * 24 hr/day * 365 day/yr = 3 million sets of 2000 data values for each of the 9 locations per year. You can accomplish this with Inductive Automation software. However, there are numerous techniques to log a smaller volume of the useful data. From an IT perspective, working with and managing large amounts of data can be a hassle. As a project developer, setting up reports, graphs, and views work much butter with less (if possible).

To address your courses of action:

  1. FactorySQL would “cache” the data when the WAN link is down. Go with 2 or 3 instead if users depend on the system for any time sensitive part of your process.
    2 & 3. FactorySQL can log any amount of data to numerous databases, meaning that maintaining a local and remote copy of your database(s) is possible. FactoryPMI has a feature called “retargeting” where a user is seamlessly redirected to a different project, which could be on another FactoryPMI Gateway.

Based on the above features, there are several seamless combinations of where the project and data are being hosted. First you have basics to consider - is your WAN link fast and robust? Are the servers at each location in a suitable setup (HVAC, clean, good power source, etc)? How many users will be accessing each sites data and frequently? Next decide ideal “normal” operation - you can make it seamless to users to work from a central location or “jump between sites”, they won’t know the difference. Keep in mind that you’ll need matching usernames/passwords between gateways for this to work. Also, if you run duplicated projects you’ll have to update both when you make changes.

You seem to have a good handle on the considerations. Based on FSQL “caching”, FSQL being able to log to multiple sources, and FPMI retargeting, you should be able to come up with a solution that fits. There are many “good” ways of doing this.

If it were me (with some assumptions about your network and project)…I’d:

  1. Figure out how to capture the relevant info with much less data with: triggered conditions (max/min values from PLC at a ms resolution), store statistically calculated values, use
    deadbands, circular logs, etc.
  2. Run a local copy of FSQL writing ONLY to a local SQL database. Run a local copy of FPMI with retargeting. Users from any of the 10 locations, or a VPN connection over the internet will run the project from each site. Assuming a similar query load for each location, this balances the servers doing the work. It’s only big reads that matter - writing the data is negligible. You will need a data backup/archiving plan.
  3. (Optional) Run a central FSQL/FPMI setup that runs identical operations as each of the sites - operating from a central database. It will communicate directly with the PLCs from all sites. This will serve as a “hot” running backup of the project, and have a copy of the logged data (minus when the WAN link was down). You won’t need to back up the data (you will want the configuration) since it’s just a duplicate of the sites. Every time you make significant changes to FSQL/FPMI at a site, you’ll incorporate those into your central system. This is easy to do if you export/import the entire project at a time. You’ll want to have separate database schemas and FPMI projects hosted on that server - the FSQL heirarchy can be consolidated. I would slow down polling rates since that causes WAN traffic.
  4. Create a common part of the FPMI project from each location, like a welcome screen, header, or sidebar that allows them to retarget to other locations and a less obvious way to get to the online “backup” (central) system. I like maps or pictures with hover over areas that represent different locations.

In any case, authorized users should be able to pull up any plant from anywhere without knowing the difference from where it’s being hosted.

One other point - many other vendors, Wonderware included, are very concerned with compressing the data prior to logging it. That technique does save space, which tends to be a minor issue with modern hardware. If you’re interested in using your data it necessarily adds another layer of parsing and decoding to the mix - hopefully they work some magic with their indexing scheme. Also your raw data can’t be accessed directly under such a scheme. This is applicable for big reads as mentioned earlier, such as: graphs, reports, or any other data mining. With any vendor, any way you cut it, searching through millions of records is going to be a highly computationally intensive operation.

There are better ways of accomplishing this task at the database level. For example, the MySQL Archive engine will store your data in a tight compressed format (not indexed for reads, though). MyISAMPack will compress a MyISAM table, make reads faster, and make it read only (good for archiving). This problem has been worked with commercial web sites that have immense amounts of data and service orders of magnitude more hits than any SCADA operation.

Now techniques that control what you actually log are another story…

Nathan,Thanks for the detailed reply.
Carrying the above discussion forward we need your advice on the following:-

  1. Choice of the Database
    The customer also have put a q rement for advance reporting similar to that available in Business Intelligence tools.
    In this case would it be more advisable to go for MSSQL considering the powerful reporting options available with it or we can achieve this using MySQL as the Database with the Report Plug-In itself.

  2. Combining data from multiple plants for comparison.
    Considering that there is a FactorySQL / Factory PMI combo with its own project at each site with its own local database.At the Central Head Office we just have a Fanctory PMI gateway with its own project.

In this scenario:-

As it is a database centric architecure-its obvious that the Central Factory PMI will connect to the databases of all the nine plants for both realtime and historical data.
The client is considering a 128 KBPS leased line will this support the above architecture.

What techniques can be used in the Cental HO project to combine the data from multiple plants in a single trend or a graphics page or a report. This will enable to compare the performances of the multiple plants.

Nathan, maybe the following information will also be helpful.
At each plant there is already a local SCADA (Siemens WinCC) which is used for operation and control.

This new system is supposed to a supervisory system on top of the plant SCADA (only read only) for the top management, both inside and outside the plant, to have a overview of the plants and their performances when on the move or the net.

I’m jumping in a bit late here, but I thought your original architecture #3 is the way to go. Keep it simple. Have the whole stack (FactorySQL, database, FactoryPMI) at each location in case WAN links go down. Have FactorySQL log select data additionally to a central database. Have a FactoryPMI gateway there for the central BI reporting and analysis, which could also retarget to the projects of any of the 9 plants.

As for data rate/size, your requirements are not at all unusual. Even logging at the faster rates, generating 6 million records/year and storing 5 years is not asking too much. Now we get into deciding what database to use, and the answer, emphatically, is that you should use the database that you or your customer (whoever is going to be maintaining it) is most comfortable with. Both MySQL or MSSQL can handle the data you’re describing.

Lastly, you asked if there are any case studies of similar setups. The answer is yes. The case study of Bronco Wine Company (available here: inductiveautomation.com/company/casestudies/) describes a very similar system.

Hope this helps,

All our consensus was moving towards the way you suggested.
Some more queries:-

  1. We hear lot of dicussion on FactorySQL with MySQL or MSSQLServer. What about Oracle ? Have any tests been done with it? Can we use it in the current configuration.

  2. Since this is the first time we are attempting a over the clound implementation, We also need to suggest on the WAN link for the whole architecture. Now keeping in mind that each site pushes critical data to the Central database server, (let us say 100 pints at 1 Sec ) , what kind of band width are we looking at ? What latency is acceptable? Are there any preferences like RF, Leased Line etc. I know its like going deep into networking, maybe you can suggest on the past experience / case studies on what worked out well?

  3. Till now we have been extensively doing process reporting using Stord Procedures and Reporting Tools like Crystal / SQL Reporting Services. Is this comparable with what we can do with the Factory-PMI Report plugin?

For eg. in the current scenario, we can avoid the MSSQL Server Licenses ( As we intended to do the reporting using Reporting Services) and consider MySQL and Report-Plugin which will be cheaper and also provide a more integrated look to the end user.

It is true that the majority of our users tend to use either MySQL or MSSQL, but we do have a handful of users on Oracle, PostgreSQL, and DB2.

I'm happy to offer my opinions on this matter, but be aware however that our involvement in the design of inter-facility WAN networking is somewhat limited. That said, I have a strong bias towards wired technologies (and against wireless like vs RF). As far as the bandwidth needed, it greatly depends on the design of your project - things like wide vs tall schema design, and the use of block groups can affect the bandwidth needed for your data logging. However, 128k leased lines sound a bit slow. The customer in the case study I pointed you to use 1.5Mbit T1 connections between their sites. They are also doing a lot more than just logging data between sites - they routinely have FactoryPMI client applications running across the WAN connections.

Yes, it is comparable. I'm not intimately familiar with Crystal Reports, so I can't say that we match their features one-to-one, but they are similar products.

Hope this helps,

Another comment about the leased line - your customer will probably end up paying more money for that “dedicated” point to point connection than they would a broadband connection. Why not use a site to site VPN over an Internet connection? You could probably even get a dedicated Internet connection for that purpose on the cheap.

Yeah, I second recommendation for broadband internet + site-to-site VPN vs a dedicated line.