Chasing CPU Spikes - User Source Cache Updater?

Hey folks,

I’m chasing some CPU spikes on this gateway, hoping you can help me determine a fix.

Gateway specs:

  • Ignition 8.0.7
  • Virtual machine, Windows Server 2012
  • 2 sockets, 1 core each socket (Intel Xeon E5-2543 v4 3.40GHz)
  • 7.9gb ram
  • 3-5 Clients at a time
  • SQL Hosted on separate server

I noticed the CPU was frequently running high / pegged while working in the designer. After some snooping around, I see that it appears to run for about 60 seconds around 20-30% CPU, then spikes to 60-80% for about 20 seconds.

See screenshots for peak usage and trend.

I disabled all gateway scripting to rule it out, still the same result.
After inspecting the threads in the gateway, it appears to be associated with this thread in particular:

gateway-usersources-cache-updater-1, id=234:

Full thread:

Thread [gateway-usersources-cache-updater-1] id=234, (RUNNABLE) (native)
  app//org.sqlite.core.NativeDB.prepare_utf8(Native Method)
  app//org.sqlite.core.NativeDB.prepare(NativeDB.java:134)
  app//org.sqlite.core.DB.prepare(DB.java:257)
  app//org.sqlite.core.CorePreparedStatement.<init>(CorePreparedStatement.java:47)
  app//org.sqlite.jdbc3.JDBC3PreparedStatement.<init>(JDBC3PreparedStatement.java:30)
  app//org.sqlite.jdbc4.JDBC4PreparedStatement.<init>(JDBC4PreparedStatement.java:19)
  app//org.sqlite.jdbc4.JDBC4Connection.prepareStatement(JDBC4Connection.java:35)
  app//org.sqlite.jdbc3.JDBC3Connection.prepareStatement(JDBC3Connection.java:241)
  app//com.inductiveautomation.ignition.gateway.localdb.sqlite.SingleConnectionDatasource$ProtectedConnection.prepareStatement(SingleConnectionDatasource.java:251)
  app//com.inductiveautomation.ignition.gateway.localdb.DelegatingDataSource$DelegatingConnection.prepareStatement(DelegatingDataSource.java:265)
  app//simpleorm.sessionjdbc.SQueryExecute.prepareStatement(SQueryExecute.java:129)
  app//simpleorm.sessionjdbc.SQueryExecute.executeQuery(SQueryExecute.java:58)
  app//simpleorm.sessionjdbc.SSessionJdbc.queryNoFlush(SSessionJdbc.java:525)
  app//simpleorm.sessionjdbc.SSessionJdbc.query(SSessionJdbc.java:534)
  app//com.inductiveautomation.ignition.gateway.authentication.impl.ADInternalHybridUserSource.findInternalUser(ADInternalHybridUserSource.java:168)
  app//com.inductiveautomation.ignition.gateway.authentication.impl.ADInternalHybridUserSource$UserSearchHandler.create(ADInternalHybridUserSource.java:403)
  app//com.inductiveautomation.ignition.gateway.authentication.impl.ADInternalHybridUserSource$UserSearchHandler.create(ADInternalHybridUserSource.java:366)
  app//com.inductiveautomation.ignition.gateway.authentication.impl.LDAPHelper.search(LDAPHelper.java:311)
  app//com.inductiveautomation.ignition.gateway.authentication.impl.LDAPHelper.search(LDAPHelper.java:273)
  app//com.inductiveautomation.ignition.gateway.authentication.impl.ADInternalHybridUserSource.getUsers(ADInternalHybridUserSource.java:177)
  app//com.inductiveautomation.ignition.gateway.authentication.UserSourceWrapper.updateCache(UserSourceWrapper.java:158)
  app//com.inductiveautomation.ignition.gateway.authentication.UserSourceManagerImpl$UpdateCacheTask.run(UserSourceManagerImpl.java:400)
  java.base@11.0.5/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
  java.base@11.0.5/java.util.concurrent.FutureTask.run(Unknown Source)
  java.base@11.0.5/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  java.base@11.0.5/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  java.base@11.0.5/java.lang.Thread.run(Unknown Source)

The timing and name seem to be consistent with the Cache Validation Timeout setting, at 60000ms in out AD/Internal Hybrid user source. User source has ~2,500 users. Other user sources are default and OPC. Hoping you all can help me to tease out how to fix this.

Much appreciated.

Any thoughts here? Is there something I can do to reduce this CPU load from the AD connection?

Can you get a couple of full thread dumps while that status page shows the cache updater thread taking up a lot of CPU?

How often is this happening? all the time or periodically?

There’s about 60 seconds between every CPU spike event. The events last around 15 seconds. Effectively 20% of server run-time.

I have 3 thread dumps attached.

Ignition-thread_dump20200123-093745.txt (143.6 KB)
Ignition-thread_dump20200123-093753.txt (140.4 KB)
Ignition-thread_dump20200123-093915.txt (143.9 KB)

Hmm. I don’t have any suggestions for you. We might have to do some profiling here and see if there’s something obviously inefficient in the code that grows exponentially as user count increases. Not sure what else could be going on.

At this point you should call or email support to get a ticket started so they can track this for real and maybe get a backup or any other information needed to reproduce from you.

Sounds good, thanks Kevin.

Just wanted to update the thread. Working with support, it seems as though this is an AD Hybrid issue w/ the number of users. Working with IT to implement some filtering and see if performance improves. In the meantime, a pure AD connection works quickly and without CPU spike.

I think I had suggested adding an index to a table to a support rep - did that end up making its way to you?

Kevin,

It did make its way back to me, thank you. However, it does not appear to have resolved the performance issues - we still see the same CPU spikes. This particular site has 29 users in the internal user database (via AD-Hybrid connection), but is pulling in ~2500 from the AD server.

I’m working with the client to get the proper filtering implemented, but in the meantime we have a pure AD connection working without performance problems.

Kevin,

We are having the same issue on all of our 8.0.9 gateways, did you guys ever determine the cause of this or a potential fix?

I never heard anything further on this from either @AIO or whoever their support rep was.

Do you also have a large number of users? Do the thread dumps look similar?

@Kevin.Herron

The ticket I had was #113271, if that gives you something to reference.

@dstauft

The summary, as I recall it:

  • Unfiltered AD-Hybrid causing major slow downs
  • Added index to internal AD-Hybrid table, no fix
  • We shifted to a local AD server, which helped, but still experiencing the issue
  • Recommendation to filter AD

Unfortunately, I do not have any access to / knowledge of their AD hierarchy. Filtering AD was part of the next phase of the project, which has since been put on hold. I slowed down the AD-Hybrid refresh rate, but the server still experiences these spikes.

Hope this helps!

Kevin,

We have around 1500 AD users, below is the detail on the thread that is causing the spikes…

Thread [gateway-usersources-cache-updater-1] id=1059, (TIMED_WAITING for java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b3b114b)

java.base@11.0.5/jdk.internal.misc.Unsafe.park(Native Method)
java.base@11.0.5/java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
java.base@11.0.5/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
java.base@11.0.5/java.util.concurrent.LinkedBlockingQueue.poll(Unknown Source)
java.naming@11.0.5/com.sun.jndi.ldap.LdapRequest.getReplyBer(Unknown Source)
java.naming@11.0.5/com.sun.jndi.ldap.Connection.readReply(Unknown Source)
java.naming@11.0.5/com.sun.jndi.ldap.LdapClient.getSearchReply(Unknown Source)
java.naming@11.0.5/com.sun.jndi.ldap.LdapClient.search(Unknown Source)
java.naming@11.0.5/com.sun.jndi.ldap.LdapCtx.doSearch(Unknown Source)
java.naming@11.0.5/com.sun.jndi.ldap.LdapCtx.searchAux(Unknown Source)
java.naming@11.0.5/com.sun.jndi.ldap.LdapCtx.c_search(Unknown Source)
java.naming@11.0.5/com.sun.jndi.toolkit.ctx.ComponentDirContext.p_search(Unknown Source)
java.naming@11.0.5/com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search(Unknown Source)
java.naming@11.0.5/com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search(Unknown Source)
java.naming@11.0.5/javax.naming.directory.InitialDirContext.search(Unknown Source)
app//com.inductiveautomation.ignition.gateway.authentication.impl.LDAPHelper.search(LDAPHelper.java:305)
app//com.inductiveautomation.ignition.gateway.authentication.impl.LDAPHelper.search(LDAPHelper.java:273)
app//com.inductiveautomation.ignition.gateway.authentication.impl.ADInternalHybridUserSource.getUsers(ADInternalHybridUserSource.java:177)
app//com.inductiveautomation.ignition.gateway.authentication.UserSourceWrapper.updateCache(UserSourceWrapper.java:158)
app//com.inductiveautomation.ignition.gateway.authentication.UserSourceManagerImpl$UpdateCacheTask.run(UserSourceManagerImpl.java:400)
java.base@11.0.5/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
java.base@11.0.5/java.util.concurrent.FutureTask.run(Unknown Source)
java.base@11.0.5/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.base@11.0.5/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.base@11.0.5/java.lang.Thread.run(Unknown Source)