Very regularly we see our charts flash red, which visually highlights a connection problem… a second later it resolves. The longer the uptime, the higher the frequency… it has been a real PITA for a while, as there seemed to be no rhyme or reason to it… restarting Ignition makes it go away for a while, but then it comes back.
Sure enough, the logs show “Error executing Historical Tag Read” errors matching the chart issues. There are other errors and symptoms as well… but, as it turns out, this one is enough to quantify the problem.
I have now done a whole lot of tests and have managed to narrow it down to the point where I can reproduce it at will.
- We are running 7.9.0.
- I have already updated the JDBC driver to the latest version just in case (no change in behaviour, but as our DB was a later version than the JDBC driver, it was probably prudent)
- Our connection pool is set to unlimited, but in reality it never runs >15 connections, and typically <5
- If you restart the DB connection, all is good… no gremlins, everything as expected
- If you run SQL that fails (e.g. some scripts I am debugging in the Designer), the problem starts
- If I restart the DB connection, the problem goes away
So it seems that defunct connections are being retained in the pool as valid connections, so every time a defunct connection is selected from the pool, the SQL fails (hence the intermittency).
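To make the hypothesis concrete, here is a minimal Python 3 simulation. It is not Ignition's actual pooling code: the `Connection`, `Pool`, `failure_rate` and `demo` names are hypothetical, and it assumes that one failed statement leaves its connection unusable and that the pool hands out idle connections at random. With no check on borrow, one dead connection out of five makes roughly one query in five fail; checking each connection before use (the idea behind a pool's validation query / test-on-borrow style settings) removes the failures.

```python
import random


class Connection:
    """Hypothetical stand-in for a pooled JDBC connection."""

    def __init__(self, conn_id):
        self.conn_id = conn_id
        self.broken = False

    def execute(self, sql):
        if self.broken:
            raise RuntimeError("connection %d is defunct" % self.conn_id)
        if "BAD" in sql:
            # Assumption: a failing statement leaves this connection unusable.
            self.broken = True
            raise RuntimeError("query failed on connection %d" % self.conn_id)
        return "ok"

    def is_valid(self):
        # Plays the role of a validation query such as SELECT 1.
        return not self.broken


class Pool:
    def __init__(self, size, validate_on_borrow=False, seed=0):
        self.idle = [Connection(i) for i in range(size)]
        self.next_id = size
        self.validate_on_borrow = validate_on_borrow
        self.rng = random.Random(seed)

    def borrow(self):
        # Assumption: any idle connection may be handed out.
        conn = self.idle.pop(self.rng.randrange(len(self.idle)))
        if self.validate_on_borrow and not conn.is_valid():
            # Drop the dead connection and open a fresh one instead.
            conn = Connection(self.next_id)
            self.next_id += 1
        return conn

    def give_back(self, conn):
        # No check here: a broken connection goes straight back into the pool.
        self.idle.append(conn)

    def run(self, sql):
        conn = self.borrow()
        try:
            return conn.execute(sql)
        finally:
            self.give_back(conn)


def failure_rate(pool, n_queries=1000):
    failures = 0
    for _ in range(n_queries):
        try:
            pool.run("SELECT value FROM history")
        except RuntimeError:
            failures += 1
    return failures / n_queries


def demo(validate_on_borrow):
    pool = Pool(size=5, validate_on_borrow=validate_on_borrow)
    try:
        pool.run("BAD SQL from a script being debugged")
    except RuntimeError:
        pass  # the one deliberate failure that poisons a connection
    return failure_rate(pool)


print("no validation:", demo(False))
print("validate on borrow:", demo(True))
```

Real pools typically reuse idle connections in LIFO or FIFO order rather than at random, so the exact pattern will differ, but any dead connection left in rotation produces the same kind of intermittent failure.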
This can also be seen in the large number of quarantined events in the store and forward system… As it turns out, you cannot re-run the quarantine successfully while the bug is present… every so often a query hits the bad connection and adds to the list.
If, however, you reset the database connection (simply opening its settings in the web interface and re-saving does the trick), you can then clear the quarantine successfully.
It also explains why we sometimes see spotty data collection… e.g. instead of getting nice lines in the charts, we see sporadic missing data which gets worse with time (likely as more and more connections become defunct)… this probably builds up in the store and forward, which was maxed out when I looked at it.
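As a rough back-of-the-envelope model, assume k defunct connections out of n, each query picking a connection uniformly at random, and one query per quarantined record (all simplifying assumptions; the function names are hypothetical). Then a fraction k/n of historian reads and writes fail, which grows as connections die, and a replay of m quarantined records only succeeds outright with probability (1 - k/n)^m.

```python
def query_failure_probability(defunct, pool_size):
    """Chance a single query lands on a defunct connection, assuming
    uniform random selection from the pool."""
    return defunct / pool_size


def replay_all_succeed_probability(defunct, pool_size, records):
    """Chance that replaying `records` quarantined items (one query each)
    never touches a defunct connection."""
    return (1 - query_failure_probability(defunct, pool_size)) ** records


# Pool of 5 connections, 100 quarantined records, 0..4 dead connections.
for defunct in range(5):
    p_fail = query_failure_probability(defunct, 5)
    p_clean = replay_all_succeed_probability(defunct, 5, 100)
    print(defunct, p_fail, p_clean)
```

With one dead connection out of five, 0.8^100 is about 2 × 10^-10, so a full replay is effectively never clean; with zero dead connections it always is, which fits what happens after re-saving the connection.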
One would love to see a solution to this.