Best practices for monitoring performance of scale-out enterprise architecture

Our team is building a scale-out architecture for our client. The platform is expected to grow to the following characteristics:

  • Single geographical site
  • Multiple Kepware servers, potential for 500k to 1M tags or more
  • Potential for over 1,000 concurrent active Perspective client sessions
  • Canary historian for tag data
  • MS SQL DB for application data
  • Most client views will interact with the DB and have few bindings to live tags
  • Ignition will interface with SAP via the JCo connector, with other enterprise integration points expected as well
  • Separate DEV/TEST/PROD gateways for Ignition, with redundancy in PROD

We are currently using a single Ignition PROD gateway that serves both Perspective clients and tags via KEPServerEX. As we scale out, we expect to split Ignition into a front-end and a back-end gateway, then add additional front ends behind a load balancer to handle the users, and additional back ends as needed to handle tags. EAM will be in the mix as well, of course.
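For context on the split: once the tags live on a back-end gateway, the front ends would reach them through a remote tag provider over the Gateway Network, so Perspective bindings and scripts just reference the remote provider by name. A minimal sketch of what I have in mind, assuming a remote provider named BackendTags (a placeholder) has been configured on each front end:

```python
# Runs on a front-end gateway (Jython). Assumes a remote tag provider named
# "BackendTags" pointing at the back-end gateway has been set up over the
# Gateway Network -- the provider and tag paths are placeholders.
paths = [
    "[BackendTags]Line1/Furnace/Temperature",
    "[BackendTags]Line1/Furnace/Pressure",
]
values = system.tag.readBlocking(paths)  # one batched read, not two round trips
logger = system.util.getLogger("scaleout")
for path, qv in zip(paths, values):
    logger.info("%s = %s (%s)" % (path, qv.value, qv.quality))
```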

I understand that Ignition's capacity for concurrent users depends heavily on application design. To that end, we plan to be very careful about how chatty the clients are with the DB and how much tag binding exists per page.
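For example, where several components on a view need related data, the idea is to consolidate them into one named query round trip rather than a separate query binding per component. A rough sketch (the named query path and parameter are hypothetical placeholders):

```python
# Perspective view/transform script (Jython). One round trip for everything
# the view needs, instead of one query binding per component.
# "dashboard/summary" and "lineId" are placeholders for illustration.
def getDashboardData(lineId):
    ds = system.db.runNamedQuery("dashboard/summary", {"lineId": lineId})
    # Fan the single result set out to the components that need it,
    # e.g. via custom properties on the view.
    return ds
```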

That said, I'm looking for best practices and recommendations on how and what to monitor in order to trigger each scale-out increment. Rather than waiting for things to break or for the user experience to suffer, I'd like to start baselining performance up front, historizing the performance metrics, and possibly alarming on various thresholds so we can anticipate the system's growth needs.
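For example, I'm picturing something like a gateway timer script that samples the gateway's built-in [System] performance tags and persists them for trending. The tag paths below are from memory and the table/datasource names are placeholders, so treat this as a sketch to verify against the tag browser:

```python
# Gateway timer script (e.g. every 60 s). Samples built-in [System]
# performance tags and stores them for baselining/trending.
# Exact [System] paths vary by Ignition version -- verify in the tag browser,
# and add others (threads, DB pool, Perspective session counts) as available.
PATHS = [
    "[System]Gateway/Performance/CPU Usage",
    "[System]Gateway/Performance/Memory Usage",
    "[System]Gateway/Performance/Memory Utilization",
]
logger = system.util.getLogger("perf-baseline")
sample = {}
for path, qv in zip(PATHS, system.tag.readBlocking(PATHS)):
    if qv.quality.isGood():
        sample[path] = qv.value
    else:
        logger.warn("Bad quality for %s: %s" % (path, qv.quality))

# Persist to the application DB so we can trend and alarm on it.
# Table, column, and datasource names are hypothetical placeholders.
system.db.runPrepUpdate(
    "INSERT INTO perf_baseline (t_stamp, metrics_json) VALUES (CURRENT_TIMESTAMP, ?)",
    [system.util.jsonEncode(sample)],
    "AppDB",
)
```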

Is there a white paper on this? What system metrics should we be looking at?

And by the way - any red flags on expecting to handle 1,000 users with Perspective?

Yeah, RAM.

See this discussion:


Per your post, I think vCPUs are the larger concern. 1,000 clients would need roughly 125-250 vCPUs. By comparison, even if you need 250 GB of RAM, that's easily doable in a couple of servers, but getting that many vCPUs will take more physical servers than the RAM will.
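Back-of-napkin math behind those numbers, assuming roughly 4-8 Perspective sessions per vCPU and ~250 MB per session (both are assumptions to refine with your own load tests, not vendor guidance):

```python
# Rough sizing arithmetic -- the per-session figures are assumptions;
# load-test your actual views to refine them.
sessions = 1000
sessions_per_vcpu = (8, 4)     # assumed best/worst case
mb_per_session = 250           # assumed

vcpu_low = sessions // sessions_per_vcpu[0]   # 125
vcpu_high = sessions // sessions_per_vcpu[1]  # 250
ram_gb = sessions * mb_per_session / 1024.0   # ~244 GB
print("vCPUs: %d-%d, RAM: ~%.0f GB" % (vcpu_low, vcpu_high, ram_gb))
```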