Hello, I am back. This is kind of a several-in-one post, mainly for context, so here is the info dump. I have a script that takes a tag path and tag information and pulls in essentially all of the information we would want from a tag. It is loosely based on the one used in Ignition's Data Center demo but adapted for our use. It runs in an embedded view, placed in either a container or a flex repeater, that shows tag values, engineering units, and either the tag name or a custom label if one was given.
The issues I am running into are:
A script that grabs just the last portion of the tag path, to pass in as a parameter and use as the label for the datapoint box, is causing blocked threads on the server. After only one or two pages have been loaded, CPU usage jumps to 90%+. It mainly seems to happen when the script fails to get the tag path, or when the tag doesn't exist because the OPC connection is down or the node doesn't exist.
When a lot of information is displayed on one page, it tends to cause a huge slowdown on the system. That in itself is expected when 100+ tags are displayed through scrollable containers and the like. It is an issue, but not the main one, and I am sure it ties into the first problem.
There isn't a great way to track down specific blocked threads on the gateway so I can see exactly which scripts are doing what and when, other than purposely trying to cause a block (which I did; the result is attached at the end of the post).
I also wanted to know if there is a script function that would let me kill blocked threads, say on a timer or on page exit/startup. Of course, preventing blocked threads is best, I know, but this is for the worst-case scenario.
This is the thread log I grabbed from this morning after a fresh restart of the server and purposely caused a thread block using a tagpath with a faulty node.
Here is the code I use that is tied to a parameter in an embedded view that is then sent to a view and used as a label for the name.
retries = 0
max_retries = 5
result = value
#keeptrying = True
while retries <= max_retries: #keeptrying is True
    try:
        result = value.rsplit("/", 1)[1]
        if result: # if result is not Null
            return result
    except IndexError: # In case the split fails due to missing "/"
        self.refreshBinding(self.params.tagPath)
    if retries == max_retries:
        result = None
        return result
    else:
        retries + 1
It seems that for some reason the while loop was never completing. Why would that be? If it found a tag path and split it, then it would end; if it didn't, it should still increment the retry counter regardless. Once the counter reaches 50, or whatever I set it to, the while loop should end, but it doesn't. Is there something weird about while loops in Ignition, or am I just misunderstanding something here?
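One thing visible in the script as posted: its last line is `retries + 1`, which computes a value and discards it, so `retries` stays at 0 and neither `retries <= max_retries` nor `retries == max_retries` ever changes. The only exit is a successful split. A minimal plain-Python sketch of the same loop shape with `retries += 1` (the helper name `count_attempts` is hypothetical, and the `refreshBinding` call is left out so it runs outside Ignition):

```python
def count_attempts(value, max_retries=5):
    """Hypothetical helper mirroring the posted loop, but with an
    in-place increment so the retry limit is actually reached."""
    retries = 0
    while retries <= max_retries:
        parts = value.rsplit("/", 1)
        if len(parts) == 2 and parts[1]:
            # Split succeeded and the last segment is non-empty
            return parts[1], retries
        # `retries + 1` alone would leave `retries` unchanged;
        # `+=` rebinds the name to the new value.
        retries += 1
    return None, retries
```

Note that retrying on the same unchanged string cannot produce a different result, so even the bounded version only burns a few iterations before giving up.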
Alongside that, I am not the only person here running scripts that may or may not be the human pinnacle of coding, so is there a way to clear blocked threads? Whether that's a system utility script or maybe even a module we can purchase? It's something I would like to bring up with our team before we commission this onsite, as restarting the gateway is a hassle and not an ideal solution if the CPU and memory buildup continues to be an issue.
In general, just avoid while loops entirely. The error condition is catastrophic, and they're rarely actually needed.
If the intent of the while loop here is "keep doing this thing because I'm waiting for a tag value change to arrive" (it's a little unclear from your description) then that's, essentially, just a wrong thing to even try to do.
Everything in Ignition should be written in an "event driven" manner; you shouldn't be locking up threads waiting on conditions to be met, even if you're using a sentinel on the loop. Instead, you should rely on the platform's facilities to deliver events to you, and react to those updates accordingly.
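As a rough illustration of the event-driven shape described above, here is a toy sketch in plain Python. `ToyTag`, `subscribe`, and `publish` are hypothetical names invented for this example and are not Ignition APIs; the point is only that the handler runs when a value arrives, and no thread sits in a loop waiting for it.

```python
class ToyTag(object):
    """Hypothetical stand-in for a value source that pushes updates."""

    def __init__(self):
        self._listeners = []
        self.value = None

    def subscribe(self, callback):
        # Register a function to call on every value change
        self._listeners.append(callback)

    def publish(self, new_value):
        # Deliver the new value to every subscriber
        self.value = new_value
        for callback in self._listeners:
            callback(new_value)


labels = []

def on_tag_path_change(new_path):
    # React only when there is something to react to; otherwise do nothing
    if new_path:
        labels.append(new_path.rpartition("/")[2])


tag = ToyTag()
tag.subscribe(on_tag_path_change)
tag.publish(None)                          # empty update: handler returns quietly
tag.publish("[default]Area/Pump1/Speed")   # real update: label is derived
```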
Like I said, it's a little unclear to me exactly what you're doing, so I can't immediately offer a better solution but I can guarantee you that there is one.
Yeah it is like you said, I wanted it to try again until it saw a value being populated. The only other thing I suppose I can think of to ensure that I get the information is to use the expression structure (since you can use the wait for function) but when I went to mess around with it I could not figure out how to reference the single value within the script.
Is the diagnostics page the only way to terminate the threads? I was hoping to be able to do it from a script in the designer as well, since that page is a little slow to respond when the CPU is already maxed out and has 1000+ threads running from previous iterations of the broken code.
First off, maybe I'm misunderstanding what you're trying to do, but it seems like a script is entirely the wrong way to go about it. Typically you should be passing a base tag path to instances for performance reasons and using indirect bindings off of that base path. There wouldn't be a script needed at all. That is unless I'm just not understanding what you're trying to accomplish.
Right, it passes the tag path through and in the view it would take the tagpath and split the ending portion given there isn't a custom label for it already in place from a different param source.
You should be able to use an expression binding to do this if you're just trying to strip off some characters or extract a specific part of the tag path.
Yes, I have considered it, but when experimenting with it I got mixed results, the same as with this script option. I opted for the script for no other reason than that I could parse it more easily than the expression. If you have an example I can use that splits off the very end of a tag path, keeping only the portion after the last "/" the way rsplit does, I would appreciate it.
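For reference, the "keep only what follows the last `/`" logic can be written so that it never raises and never needs a retry. This is a minimal plain-Python sketch, not an Ignition expression; the helper name `last_path_segment` and the choice to strip a leading `[provider]` prefix are assumptions made for illustration.

```python
def last_path_segment(tag_path, default=""):
    """Return the text after the last "/" in a tag path.

    Falls back to `default` for None, empty, or slash-terminated input
    instead of raising.
    """
    if not tag_path:
        return default
    text = str(tag_path).strip()
    # Assumption: drop a leading "[provider]" prefix if one is present
    if text.startswith("[") and "]" in text:
        text = text.split("]", 1)[1]
    # rpartition returns the whole string as the last element when
    # there is no "/", so no IndexError is possible here
    segment = text.rpartition("/")[2]
    return segment or default
```

Because `rpartition` always returns three parts, there is no exception path to handle, which removes the reason the original script had for catching `IndexError` at all.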
I also got rid of the while loop, and it seems I am having the same problems as before with the same kind of error in the thread and nearly maxed out CPU usage.
I mean, it is the same error, except the log shows py64 rather than py54, and I am unsure whether that matters.
I am pretty sure it is this script specifically that is causing these problems but I am just unsure of why it would do it and cause such a hang up?
I mean it's pretty barebones at this point.
def transform(self, value, quality, timestamp):
    result = value.tagpath
    #retries = 0
    #max_retries = 10 #Used to be used for retry attempts
    try:
        value = value.tagpath
        value = str(value)
        result = value.rsplit("/", 1)[1]
        if result: # if result is not Null
            return result
    except IndexError: # In case the split fails due to missing "/"
        self.refreshBinding(self.view.params.tagPath)
Is there a way to track down these specific threads in the designer or application by the ID or name or anything? At this point I think I just need to know what is the same between these failures and why they are hanging up because on all of the views I have created the embedded views work perfectly and I can see the scripts open and close within a second or seconds on the monitoring page on the gateway web page.
Having it in 2 expressions helps break it up for easier troubleshooting and allows you to visually see where a value might not be what you are expecting.
It hangs because you are calling refreshBinding on the source of your transform, which essentially forms an endless loop for as long as the source tag path is unpopulated, empty, or null.
Here is the ordering:
View initializes with a blank/null value in tag path -> transform runs, fails to get the end of the path, and triggers the tag path to re-evaluate -> tag path re-evaluates to empty -> transform runs, fails to get the end of the path, and triggers the tag path to re-evaluate... and you get where this goes.
Ignore the urge to refresh the binding. If you use an expression binding, it will update on its own once the value changes.
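The cycle described in this reply can be imitated with a toy model in plain Python. `ToyBinding` is a hypothetical class written for this illustration and says nothing about how Ignition evaluates bindings internally; `max_runs` exists only so the demonstration itself stops.

```python
class ToyBinding(object):
    """Toy model of a binding whose transform may ask for a refresh."""

    def __init__(self, source_value, refresh_on_failure, max_runs=1000):
        self.source_value = source_value
        self.refresh_on_failure = refresh_on_failure
        self.max_runs = max_runs  # safety cap for the demo only
        self.runs = 0

    def transform(self, value):
        # Same idea as the posted script: last segment, or a failure
        if not value or "/" not in value:
            return None
        return value.rsplit("/", 1)[1]

    def evaluate(self):
        while self.runs < self.max_runs:
            self.runs += 1
            result = self.transform(self.source_value)
            # A refresh re-evaluates the same unchanged source, so a
            # failed transform simply fails again on the next pass
            if result is not None or not self.refresh_on_failure:
                return result
        return None


looping = ToyBinding("", refresh_on_failure=True)
looping.evaluate()

quiet = ToyBinding("", refresh_on_failure=False)
quiet.evaluate()
```

With an empty source, `looping` runs until the cap is hit while `quiet` runs once and waits for the next real change, which is the behavior the advice above is steering toward.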
I hope everyone had a good weekend. Here is what I ended up with:
I mostly did what @ryan.white says here: I changed at least the beginning portion of my script and took out the refreshBinding call. After that, instead of a regular expression binding or an indirect tag binding, I used an expression structure (since the tag path is a param that has already been defined) with the "Wait On All" option, so a value has to be populated before the script even executes. This has stopped all new locked threads from appearing and has kept my CPU usage from spiraling out of control from having 5000 locked threads.
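A sketch of what the body of such a transform might reduce to once the refresh call is gone, written as a plain Python function so it can be run anywhere. The `tagpath` key matches the name used in the script posted earlier; `resolve_label` and its `custom_label` argument are hypothetical, and treating the structure as a dictionary is an assumption.

```python
def resolve_label(struct, custom_label=None):
    """Pick the label to display for a datapoint.

    Prefers the custom label, then the last tag path segment, and
    returns None when neither is available yet.
    """
    if custom_label:
        return custom_label
    tag_path = (struct or {}).get("tagpath")
    if not tag_path:
        # Nothing to show yet: return quietly, with no retry and no refresh
        return None
    return str(tag_path).rpartition("/")[2] or None
```

Inside a transform this would be called as something like `return resolve_label(value)`, leaving re-evaluation to the binding itself when its inputs change.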
I am also in contact with someone from Ignition to get a little more clear why exactly some of these loops can lock down even if they have a fail condition or maximum retry attempt. If I get anything substantial I will update this post here. Thanks for the support everyone!