Runtime State out of sync trending issues

Hi,

Great, that gives me some useful data to work with. I’ve found a number of good resources on GC tuning that might be helpful. HOWEVER…

Most of the time, when we’re talking about GC pauses, we’re talking about BIG heap sizes (10 GB). I just noticed that your entire VM only has 2 GB of RAM. From what I can see, Ignition is probably set at 1 GB, and may be using more. With that kind of memory, there is really no reason why the garbage collector should take so much time.

I think that something must be going on behind the scenes: either memory is being paged, or the VM server is over-provisioned. How difficult would it be for you to increase the RAM allocated to that VM to maybe 4 GB? I can’t say for sure that this would help, especially if the delay is in the VM manager and not the instance itself, but I think right now you’re probably running pretty slim on memory.
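In the meantime, if you want to see what the gateway's JVM is actually working with, here's a rough Jython sketch; it isn't anything built into Ignition, just the standard Java management beans called from a script. The idea would be to drop it into a gateway-scoped script (a gateway timer script, for example) so it reports the gateway's JVM; output from gateway scripts should end up in the wrapper log. Run from the Designer it would report the client JVM instead, which isn't what we want here.

```python
# Rough sketch: report the heap ceiling and cumulative GC time for the JVM
# this script runs in. Uses only the standard java.lang.management beans.
from java.lang import Runtime
from java.lang.management import ManagementFactory

mb = 1024.0 * 1024.0
rt = Runtime.getRuntime()
print "Heap max: %.0f MB, allocated: %.0f MB, free within allocated: %.0f MB" % (
    rt.maxMemory() / mb, rt.totalMemory() / mb, rt.freeMemory() / mb)

for gc in ManagementFactory.getGarbageCollectorMXBeans():
    # getCollectionTime() is the total milliseconds this collector has spent
    # collecting since the JVM started; a big jump between runs of this
    # script means a long pause happened in between.
    print "%s: %d collections, %d ms total" % (
        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime())
```

The number to keep an eye on is the heap max, plus whatever the OS and any other processes on that VM need, compared against the 2 GB the VM actually has; if those don't fit, the OS starts paging and the GC pauses balloon.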

I’ve learned a few interesting things about the GC that we might be able to use to improve things, but I don’t think that it’s going to solve the root problem.

Also, for the subscription proxy messages: this is unfortunately a problem that pops up from time to time, and has to do with subscription bookkeeping. It’s not a critical error; everything will keep working fine, but unfortunately once the message starts, it won’t stop until the gateway is restarted. ANY info you can gather or observe regarding what causes this to start (closing a client? leased tags? client disconnect?) would be helpful. Something has changed recently: we’ve had this reported three times or so in the last two weeks, but we can’t get a lead on replicating it (and that’s about the only way we can fix this type of issue).

Regards,

Thanks, Colby.

I’ll see what I can do to get our IT guys to assign some more RAM to the VM and see if that helps at all. Maybe they can do some voodoo magic and tell me whether it’s trying to use hard drive space as RAM instead of just RAM. I’ll let you know what I can find out.

For the scan class error, I went back through the logs and picked out the time it started happening, and about an hour before that. Nothing obvious from the logs:

ScanClassError20130305.txt (4.14 KB)

I also checked the audit log, and there was a client logout around that time, but not at the exact time, so I don’t know if that would be a culprit or not:


Hi,

In my last message, I was referring to the specific error you posted in your previous message:

The error "No scan class information found for tags" is different, and easier: tag history queries are being executed for tags that don't have history enabled (often caused by bad or initial indirect historical binding). I'll admit the message could be improved, but I'm fairly certain that's what is happening.

To get more info, you could briefly turn the "History.SQLTags" logger to "debug", which will print out the parameters of each query, including all tag paths, and then watch what is being sent for the queries that result in that message.
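If you want to poke at it from your side too, a quick throwaway sketch like the one below, run from the Designer's script playground, fires the same kind of tag history query that a binding would. The tag path in it is just a placeholder; substitute the paths one of your history bindings is actually using. If a path you feed it isn't set up for history, you should see that same "No scan class information found" message show up on the gateway.

```python
# Throwaway sketch: run a tag history query against a specific path and
# watch whether the gateway logs "No scan class information found for tags".
# The tag path below is only a placeholder - use one of your real paths.
from java.util import Calendar

cal = Calendar.getInstance()
end = cal.getTime()
cal.add(Calendar.HOUR, -1)               # query the last hour
start = cal.getTime()

paths = ["[default]SomeFolder/SomeTag"]  # placeholder path
ds = system.tag.queryTagHistory(paths, start, end)
print "Rows returned:", ds.getRowCount()
```

Between that and the History.SQLTags debug output, the paths that show up right before the message should point you at the binding that's sending them.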

Regards,

Ahhh jeez… sorry, I think I started going brain dead after staring at all of the garbage collector messages in the logs…

ProxyError20130305.txt (232 KB)

There are no logins or logouts during that time, nor any lost communications that I can see from my alarm logs.

Wow, interesting.

That’s good, though; it gives us something to look at… I’ll let you know if we find anything from it.

Thanks,

I had our IT department allocate more RAM to this VM, and so far it looks like it’s helped: the GC has gone from taking 11+ seconds down to 4.5-5 seconds. I’ll keep monitoring it, but so far the extra RAM seems to have helped the issue.

GCMoreRam.txt (6.93 KB)

Hey Colby,

Just an FYI, I’ve been watching the logs for a week now and it’s been working great since the RAM increase. That must have been the root issue. Thanks for your help on this! :thumb_left: