I made a gateway script that was intended to write some tags to a database on a time base. The script was evidently poorly written as it took quite some time to execute and faulted my database connection. I deleted the script, re-established the database connection, and thought life goes on, I'll clean this up and come back to it later. Well, not my database keeps randomly faulting every few hours now and taking up to 30 minutes to re-establish itself. I had never encountered a single Database fault until I broke it, and now I can't seem to put pandora back in her box.
java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object
at org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:142)
at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:734)
at com.inductiveautomation.ignition.gateway.datasource.DatasourceImpl.getConnectionInternal(DatasourceImpl.java:300)
at com.inductiveautomation.ignition.gateway.datasource.DatasourceManagerImpl.getConnectionImpl(DatasourceManagerImpl.java:168)
at com.inductiveautomation.ignition.gateway.datasource.DatasourceImpl.getConnection(DatasourceImpl.java:293)
at com.inductiveautomation.ignition.gateway.db.namedquery.NamedQueryExecutor$NamedQueryInstance.runPrepQuery(NamedQueryExecutor.java:475)
at com.inductiveautomation.ignition.gateway.db.namedquery.NamedQueryExecutor$NamedQueryInstance.execute(NamedQueryExecutor.java:399)
at com.inductiveautomation.ignition.gateway.db.namedquery.NamedQueryExecutor.execute(NamedQueryExecutor.java:173)
at com.inductiveautomation.ignition.gateway.db.namedquery.GatewayNamedQueryManager.execute(GatewayNamedQueryManager.java:95)
at com.inductiveautomation.ignition.common.db.namedquery.SecuredNamedQueryManager.execute(SecuredNamedQueryManager.java:78)
at com.inductiveautomation.perspective.gateway.binding.query.QueryBinding.execute(QueryBinding.java:121)
at com.inductiveautomation.perspective.gateway.binding.ValueCache$CachedValue.lambda$fetchValue$0(ValueCache.java:119)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at com.inductiveautomation.perspective.gateway.threading.BlockingWork$BlockingWorkRunnable.run(BlockingWork.java:58)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:321)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:428)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:232)
at org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:134)
... 17 common frames omitted
There were no Slow Query Warns until the moment I did this, now a simple Query looking at 200 data points will randomly take 3 minutes. There was an absolutely direct correlation the moment I wrote a script that bogged it down, and this happening. Except I entirely deleted said script. At first, I just disabled the event, then deleted.
It seemed to get a lot better after opening the Database connection in the Gateway, clicking edit, and just re-saving it. No edits, but just a re-save. I didn't see any more slow queries or faults for a bit, then over the weekend it must have happened again checking the logs. Without any changes this morning Queries are executing very quickly, no issues, no warns. Just another event over the weekend.
Prior to re-saving the database connection, however, it was happening non-stop.
So this project was largely un-touched for quite some time, zero issues. I wrote a crappy gateway script scheduled for 3 times a day. It broke my DB connection during testing, I disabled it. Database connections re-established after some time but then started going in and out non-stop. I deleted it entirely, re-saved the connection in the gateway as an attempt to just "re-synch" things so to speak, things got immediately way better, but checking logs today I noticed an event over the weekend.
Did your broken script use system.util.invokeAsynchronous() at all? Perhaps with an infinite loop? If so, those are probably still trying to run, because deleting a script doesn't delete any threads already running it.
Delete the script and restart your gateway to clean it all out.
It did not, but I did notice after deleting it some running scripts were still shown in the gateway now that you mention it, that I manually hit the end or stop button on. I cannot recall exact button name as it was last week.
I think it was under Status, Diagnostics, running scripts.
Again, hard to remember exactly where I went. But I tested said horrible script out a few times, and no error was thrown in the logs, but it wasn't updating my table, so I was confused, then I found in there multiple long running, hung up instances of the script that I manually stopped.
Was the first time anything I did, didn't do one of two things. Throw an error in the log, or just work, so naturally I just repeated what I was doing over and over expecting a different result.
Well, getting close to 24 hours after manually stopping and restarting the gateway and not a single other occurrence. One oddity, however, is that CPU usage never really exceeded 1-3%. Now it is regularly floating between 1-7% with very quick, momentary spikes to 20-30% followed by an immediate decline back to 1-7%. I am not stating this so much to say as I think it is an issue, but certainly an interesting development. Before if I expanded it out, I almost never saw a spike. Now I can count about 5 in the 30% range, and about another 5 in the 15% range. Just an odd shift in behavior of the system with no change other than restarting the gateway. I would imagine, the behavior now, is what would be considered more normal.