Transactional Group Records Getting Stuck In Store & Forward

Background:

I’m making an API call using Ignition's scripting module; the response is a JSON structure with roughly 200 items. I’m splitting the structure up and inserting the items as rows into a SQL Server database on the same SQL instance as my Ignition DB.

=================================================================

Problem:

1. I initially used a sleep function to throttle the rate of outgoing calls and limit transactions to SQL. The sleep caused clock drift on the gateway, which desynced Ignition from the backend SQL instance, and everything started piling up in Store & Forward.

a. This approach is discouraged.

b. Recovery was to disable/re-enable the SQL connection and/or bounce the gateway.

2. With the sleep function removed, the script runs fine, but Transactional Group records start getting stuck in Store & Forward.

a. In this case I suspect a performance spike on the SQL side, since I can't throttle the connection.

b. The SQL server is still responsive; other services are not impacted.

c. Checked query performance; each query runs at roughly 0.0386 ms.

=================================================================

Problem Statement:

Gateway clock drift / SQL performance spikes seem to trap Transactional Group records in Store & Forward.

1. Not sure how to notify the group when this happens.

2. Is manual recovery required?

=================================================================

Test Case:

1. To get more detail on the issue, I tried to replicate it locally on my PC with local instances of Ignition and SQL.

2. I set up a transactional group and locked a DB on the SQL instance to see if records would get stuck in Store & Forward… no luck.

• I ran the same script locally with no issue on the Ignition instance.

• Mind you, the scale is a lot smaller.

=================================================================

Question:

1. For the use case of inserting (X) number of records from an API response, what is the best way to throttle input on the SQL side?

• I don't believe invokeLater works on the gateway scripting side.

• I can't delay the active thread without causing clock drift on the gateway.

• I could cache results with one script and process the records with a secondary script (see the sketch after this list). I've been actively discouraged from creating files for this approach.

    – If a file were made, I could potentially do a SQL merge to reduce the burden on the SQL side.

• I could potentially limit the scope via the API call, but given the number of records, the preference would be a delay of roughly 50 ms per call, not seconds.

2. For the Ignition Gateway, do I need to be concerned about long-running scripts adding to the clock drift?

a. I thought the gateway was multithreaded, but it was acting as if it was executing scripts sequentially on a single thread.
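
For the caching option above, a rough sketch of what it could look like without writing files, assuming a project library module that a gateway timer script calls on a short interval (table, column, and database names are placeholders, not my actual setup):

# Project library sketch (placeholder names). The API script calls
# cache_api_rows() once; a gateway timer script calls process_next_chunk()
# every few seconds, so no single execution sleeps or holds a DB
# connection for long. Note: module-level state is cleared whenever the
# project library is saved.
_pending_rows = []

def cache_api_rows(rows):
	""" Store the parsed API rows for later processing. """
	_pending_rows.extend(rows)

def process_next_chunk(chunk_size=50):
	""" Insert one small chunk per timer tick. """
	if not _pending_rows:
		return
	chunk = _pending_rows[:chunk_size]
	del _pending_rows[:chunk_size]
	for row in chunk:
		system.db.runPrepUpdate(
			"INSERT INTO My_Table (col_a, col_b) VALUES (?, ?)",
			[row['col_a'], row['col_b']],
			database='My_Database')

The timer script itself would just call the library function (e.g. myLib.process_next_chunk(), name hypothetical), so the throttling lives in the timer interval rather than in a sleep.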

From what I can find, the recommendation is a controlled merge into SQL via a batch insert into a temp SQL table. Alternatively, I could delete the existing records and just batch insert.

I had some issues getting Ignition to run the bulk insert as a named query.

I ended up settling on exporting the data set to a CSV on a shared network location. From there, a SQL Job performs the bulk insert using the file, with the frequency set at once a week. As long as I space out when the data sets are created on the Ignition side, this should greatly reduce the time the script takes to run and limit the number of requests on the SQL side to one per group.
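
For reference, a rough sketch of the CSV export step on the Ignition side; the share path, file name, and column handling are placeholders, and the actual BULK INSERT runs in the scheduled SQL job:

import csv
import os

def export_api_csv(api_data, columns, share_path=r'\\fileserver\ignition_exports'):
	""" Write the API rows to a CSV that a scheduled SQL job will BULK INSERT.
		Column order must match the job's staging table / format definition.
	"""
	file_path = os.path.join(share_path, 'api_snapshot.csv')
	f = open(file_path, 'wb')  # Jython 2.7: binary mode for the csv module
	try:
		writer = csv.writer(f)
		writer.writerow(columns)  # header row; skip with FIRSTROW = 2 in BULK INSERT
		for row in api_data:
			writer.writerow([row.get(c, '') for c in columns])
	finally:
		f.close()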

I think this will work.

Seems you found a solution, but I will share my solution as another option.

The issue on my end stemmed from each row insert being done as a separate query and the delay of waiting for the response between each query.

My only reason for using a separate table before merging was that I needed to upsert (update if the id already exists, insert if it does not), and Microsoft SQL Server does not have a dedicated upsert command (only MERGE). If you only need to insert rows, I would modify this to insert directly into the correct table and skip the stored procedure call.

I would not call it perfect, but it reduced the runtime of the script from 10-15 minutes to under a minute (probably faster, been a while since I tested).

Script Function:

import sys

def api_sql_write(table_name, api_data):
	# type: (str, list[dict]) -> None
	""" Inserts rows into Temp table, then calls stored procedure to merge with actual table.
		
		Args:
			table_name (str): Name of SQL table to be updated. Note: this should not be the Temp table.
			api_data (list[dict]): Array of dictionaries to be upserted. Each dictionary represents one row.
		
		Returns:
			None
	"""
	# Standardized logger format (ProjectName.function_name)
	logger = system.util.getLogger('{}.{}'.format(system.util.getProjectName(), sys._getframe().f_code.co_name))
	if api_data == []:
		logger.debug("No data to insert into {}.".format(table_name))
		return
	
	base_query = """ INSERT INTO Temp_{} """.format(table_name)
	
	# Get Column Names from SQL
	columns = system.db.runQuery(""" SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{}'""".format(table_name), database='My_Database').getColumnAsList(0)
	
	if columns == []:
		logger.warn('No columns found for table {}. Are you sure you provided the correct table name?'.format(table_name))
		return

	# Filters api_data to only include keys that are in the column list from sql.
	api_data = [{k: d[k] for k in columns if k in d} for d in api_data]
	
	# Convert the column list to a SQL string. Note: this assumes every row has the
	# same keys after filtering; a row missing a column would break the parameter count below.
	columns = '({})'.format(', '.join(['"{}"'.format(item) for item in api_data[0].keys()]))
	num_of_columns = len(api_data[0].keys())
	
	base_query += columns + ' VALUES '
	
	# Our query will be run as a prepared statement to avoid issues with special characters in strings.
	# Create a row in the form (?, ?, ..., ?, ?) based on the number of columns in the dataset.
	# We concatenate a number of these onto our query based on the number of rows in the dataset.
	row_template = '({})'.format(', '.join(['?' for x in range(num_of_columns)]))
	
	# This is a limit of either Microsoft SQL Server or the Java SQL Driver.  To get around it we will break the insert into chunks.
	sql_parameter_limit = 2100 - 1
	row_limit = sql_parameter_limit//num_of_columns
	
	# If dataset is small enough, process the simple way. Otherwise break it into chunks.
	if len(api_data) <= row_limit:
		rows = ', '.join([row_template for row in api_data])
		query = base_query + rows
		args = [value for row in api_data for value in row.values()]
		system.db.runPrepUpdate(query, args, database='My_Database')
	else:
		
		for index in range(0, len(api_data), row_limit):
			# Avoid overindexing.
			end_index = index + row_limit
			if end_index > len(api_data):
				end_index = len(api_data)
			
			# Create rows string in format (?, ?, ..., ?, ?), (?, ?, ..., ?, ?), ..., (?, ?, ..., ?, ?), (?, ?, ..., ?, ?)
			# This is concatenated on the end of our query.
			rows = ', '.join([row_template for row in api_data[index:end_index]])
			query = base_query + rows
			
			# Create arguments array
			args = [value for row in api_data[index:end_index] for value in row.values()]
			
			# Run Insert Query
			system.db.runPrepUpdate(query, args, database='My_Database')
	
	# The daily and shift summary tables do not merge on the id column.  For now, I just created separate stored procedures to handle them.
	if 'DailySummary' in table_name:
		sp = 'API_DailySummary_Merge_sp'
	elif 'Shift_Summary' in table_name:
		sp = 'API_ShiftSummary_Merge_sp'
	else:
		sp = 'API_ID_Merge_sp'
	
	# Run stored procedure to merge data from the temp table into the actual table.
	system.db.runUpdateQuery("EXEC	[dbo].[{}] @target_table = N'{}'".format(sp, table_name), database='My_Database')

Stored Procedure:

ALTER PROCEDURE [dbo].[API_ID_Merge_sp] 
	-- Add the parameters for the stored procedure here
	@target_table NVARCHAR(128) = ''
AS
BEGIN
	-- SET NOCOUNT ON added to prevent extra result sets from
	-- interfering with SELECT statements.
	SET NOCOUNT ON;

    -- Insert statements for procedure here
	DECLARE @sql NVARCHAR(MAX);
	DECLARE @columns NVARCHAR(MAX);
	DECLARE @columndefinitions NVARCHAR(MAX);
	DECLARE @updateColumns NVARCHAR(MAX);
	DECLARE @source_table NVARCHAR(128) = 'Temp_' + @target_table;
	
	-- Generate the column list dynamically
	SELECT  @columns = STRING_AGG('[' + c.name + ']', ', '),
			@columndefinitions = STRING_AGG('[' + c.name + ']' + ' ' + t.name, ', '),
			@updateColumns = STRING_AGG('target.' + '[' + c.name + ']' + ' = source.' + '[' + c.name + ']', ', ')
	FROM sys.columns c
	JOIN sys.types t ON c.user_type_id = t.user_type_id
	WHERE c.object_id = OBJECT_ID(@target_table) AND c.is_computed = 0


	-- Construct the dynamic SQL for the MERGE statement
	SET @sql = N'
	DECLARE @ProcessedRows TABLE (performed_action NVARCHAR(10), id bigint);

	MERGE ' + @target_table + ' AS target
	USING ' + @source_table + ' AS source
	ON target.id = source.id
	WHEN MATCHED THEN
		UPDATE SET ' + @updateColumns + '
	WHEN NOT MATCHED BY TARGET THEN
		INSERT (' + @columns + ')
		VALUES (' + @columns + ')
	OUTPUT $action, inserted.id
	INTO @ProcessedRows;

	DELETE FROM ' + @source_table + ' 
	WHERE id IN (SELECT id FROM @ProcessedRows);
	'
	-- Execute the dynamic SQL
	EXEC sp_executesql @sql
END

200 records to SQL Server is NOT a heavy load by any means.

I would script this and either use a gateway tag change script to trigger it, or schedule it with the API call, and then use the bulk insert code from this thread, specifically the part utilizing prepInsertQMarks.

I gather around 4,000± tag values every 15 minutes and write the values to rows in a database table that is 4 columns wide. The entire thing is done in around 8 seconds, and mind you, this is also gathering those data points from 15 different gateways.
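
Not the exact prepInsertQMarks code from that thread, but a rough sketch of the same multi-row prepared-insert idea, with placeholder table and column names:

def bulk_insert(rows, db='My_Database'):
	""" Insert all rows in one prepared statement instead of one query per row.
		Placeholder table/columns; mind SQL Server's ~2100-parameter limit for large batches.
	"""
	if not rows:
		return 0
	cols = ['col_a', 'col_b', 'col_c', 'col_d']
	row_marks = '({})'.format(', '.join(['?'] * len(cols)))
	query = 'INSERT INTO My_Table ({}) VALUES {}'.format(
		', '.join(cols), ', '.join([row_marks] * len(rows)))
	args = [row[c] for row in rows for c in cols]
	return system.db.runPrepUpdate(query, args, database=db)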

Agreed, a bulk insert on a scheduled interval is preferred. I have been trying to propose this with the team locally, but it has more or less fallen on deaf ears. The problem is that the solution we tried to implement works in other languages; in previous systems we'd handle the insert in one thread, and for anything under 1k records we'd normally just insert individually, offsetting each insert by around 50 ms or longer, since we weren't as concerned about execution time.

The issue I ran into here looks structural. Upon further research, it appears the issue may be with Python itself and how it handles threads.

Python itself is multithreaded… but not in the traditional sense. While resource utilization is spread across multiple threads, each thread is processed by the global interpreter sequentially. It's the Global Interpreter Lock (GIL) problem: GlobalInterpreterLock - Python Wiki

Performance is hence limited as a safety factor. It would appear that some of these scripts caused just enough of a delay in the global interpreter's queue to trigger a safety mechanism in registered transactional groups, resulting in records being trapped in Store & Forward and requiring manual intervention to recover.

If I had a choice, I'd have the data acquisition in a separate Windows service outside of Ignition. But since my options are limited, the compromise I found was just to do a flat export of the API response in CSV format. From there, a scheduled SQL job performs a bulk insert on the data set.

=================================================================

Two takeaways from this:

  1. I need to modify structures to bulk insert records if I cannot piecemeal them across multiple calls.
  2. Transactional Groups are sensitive; greater care has to be taken with execution times.

*The ideal case would be Edge devices with direct SQL connectivity to eliminate this and make the environment more robust. Optimizing performance will resolve the issue, but depending on the use cases down the line, I suspect this will become an issue again as-is.

Jython does not have any GIL. Jython's native collection and map objects are thread-safe. Algorithms built with them may not be (and are typically not thread-safe).

I had tried a similar approach, but was running into the same issue populating the temp table. It caused a similar desync with the transactional groups.

The only way I could make things functional was to export the data set to a CSV and then perform a bulk insert. From there I could do a controlled merge on the data, but to be honest, the data was just a snapshot from another system and could be volatile. More or less, I could clear the existing entries and push in data from the current snapshot.

Consider switching to a DB brand that:

  1. Supports the SQL standard ON CONFLICT clause, or similar non-standard functions (aka UPSERT).

  2. Supports arrays in JDBC and the SQL standard UNNEST function to convert arrays to table-like objects.

( * Cough * PostgreSQL * Cough * )


It looks like you’re correct. To be frank, I’m a little bit glad that I was wrong.

Upon further research, though, it seems like it may still operate in a similar manner, resulting in an outcome similar to the GIL. Is it true that Ignition has a limited number of threads? Some forum posts suggest roughly 20-50 in the real world. I would hope it could dynamically size the number of threads based on hardware specifications; if not, you could run into limits on the number of operations per instance.

I'm a bit confused: with a manual trigger, realistically I'd only be locking one thread. If traffic is dynamically allocated, it should be able to route work among the remaining channels, and unless under extremely high load this shouldn't cause an issue. Is what I'm experiencing thread exhaustion, then, where somewhere during execution the load peaks and results in a cascading failure?

Unfortunately, I do not have a choice on the SQL side.

That being said, in this case there was a method to do something similar. More or less, I exported the data set to CSV format in a shared location; from there I can trigger the bulk insert from a stored procedure on the SQL Server side. I had tried making the bulk insert statement a named query but ran into some issues, so I ended up dividing the process into two steps to reduce execution time and share the resource usage between the two servers.

The method works pretty well; I use it on another C# system that deals with 150k-200k records per hour.

No. No limit to # of threads. But since calls into jython generally come from events in Ignition, and those tend to operate with limited thread pools, it can appear to have limited threads. (But abuse system.util.invokeAsynchronous() and you will suffer.)
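
To illustrate the non-abusive pattern: hand the whole batch to a single background thread per run, so the triggering event returns immediately. A rough sketch, where api_sql_write is the batched function shared earlier in the thread and the trigger/logger names are just examples:

def run_api_load(api_data):
	""" Called from a gateway event (tag change, scheduled script, etc.).
		One dedicated background thread per run -- not one per row.
	"""
	def _worker(rows=api_data):
		logger = system.util.getLogger('ApiLoad')
		try:
			api_sql_write('DailySummary', rows)  # batched insert from earlier in the thread
		except Exception as e:
			logger.error('Bulk insert failed: {}'.format(e))
	system.util.invokeAsynchronous(_worker)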

Not clear. Your script, SQL, and execution pattern are not trivially understood.

Sigh. You should share issues like this with those who mandate MS products.

My SQL execution pattern is pretty straightforward.

  1. Call the API requesting information; depending on the request, the size of the JSON structure varies from 10 to roughly 2000 records at most.
  2. With the JSON response, I'd loop through the contents, calling a stored proc on the SQL Server side that checks if the record exists: if it does, it updates; if it doesn't, it inserts the record. Execution time is roughly 0.00386 ms.
  3. Once all records have been processed, the script ends. Timing-wise, the script only runs once a week.

2a* Concerns were raised with regard to the stored procedure's performance… Upon review, we did not need to update existing records. Instead, the records could be deleted prior to the script executing, and the stored proc could be changed to a straight insert statement. This did not resolve the problem.

  • To rule out issues with the delete as well, we trialed removing it and trying straight inserts.
  • The same issue arose with the transactional groups.

2b* A concern was raised that a higher volume of requests could potentially result in a denial of service on the SQL side… This does not appear to be the case, as the server is still responsive and no other systems are impacted. We do see a small spike in processor time when the script executes.

===================================================================

Additional information: A less optimized version of the script had been running on Sundays for around a year without issues, while production was down. The main difference is that the insert queries were defined directly in Ignition rather than as stored procedures on the SQL Server. This past weekend, though, someone scheduled a poorly optimized query, causing a performance spike on the SQL side. Seemingly this pushed SQL Server processor time from the typical range of 10-20% to roughly 50-65%.

The script in question then caused the same issue with the transactional groups. It appears that if the script executes while the server is above roughly 40% processor time, the issue arises.

Do you know, by chance, if Ignition processes Named Queries quicker than server-side stored procedures? That could explain why one performs a bit better than the other. That being said, it appears the issue is with the (number of calls × response time) against the SQL instance.

  • Another recommendation that was raised was to increase memory on the SQL Server side to improve performance.
  • But if I better optimize the script, that may not be needed, as it should remove the potential spike. That being said, I'd like a better idea of the limitation I'm hitting so I can avoid it in the future.

I would not expect NQs to be quicker. Be aware that vanilla JDBC does not expect to run SQL scripts, just single statements. That it works with some JDBC drivers (like MS SQL Server) is vendor-specific behavior.

Using ON CONFLICT in an insert operation sourcing from UNNEST allows the entire operation (many records) to be executed properly in one statement with one round trip to the database. Competence.
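
A rough sketch of what that pattern looks like from Ignition scripting, assuming a PostgreSQL datasource named 'PG_Database' and a hypothetical table my_table(id bigint primary key, val text); arrays are passed here as text array literals and cast server-side, so real string data would need proper array-literal escaping:

def pg_upsert(rows):
	""" Upsert many rows in one statement / one round trip using UNNEST + ON CONFLICT. """
	ids = '{' + ','.join(str(r['id']) for r in rows) + '}'
	vals = '{' + ','.join('"{}"'.format(r['val']) for r in rows) + '}'
	query = """
		INSERT INTO my_table (id, val)
		SELECT * FROM unnest(?::bigint[], ?::text[]) AS t(id, val)
		ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val
	"""
	system.db.runPrepUpdate(query, [ids, vals], database='PG_Database')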

I don't have any particular advice for your current problem. You simply do not have the tools at hand to do this efficiently.

When you say transactional groups, are you referring to transactions on the SQL Server side or Transaction Groups in the SQL Bridge Module on the Ignition side?

Actually, now that I reread the post, am I understanding correctly that you are running the API call and the SQL insert from a script and while that script is running you see unrelated transaction groups getting stuck in the store and forward?

Have you checked the gateway’s memory usage when you are getting into this state?

Have you checked your database connection status page to see if you are using all available connections? May just need to increase the available number of connections.

That may’ve been the issue. Other people had poorly performing queries running in the environment. The normal state is 0-1 connections in use, but watching it, roughly every 60 seconds something spikes and we hit 7/8 connections used. If my long-running script ran during that window, we'd be maxed out and things would start to fail. I suspect someone is running something in my environment I'm unaware of; a deeper dive is needed on my end.

The other problem may be that we did not separate the database the transactional groups log to from these side projects. I suspect that if I set up a second connection to the same SQL Server, just with a different default DB, I can limit the number of transactions on the main connection. In both cases, hardware-wise we should be fine.

So this may be a case of connection exhaustion under peak load as opposed to thread exhaustion. Better housekeeping is needed on my end.

You can increase the number of connections in the configuration for the database connection instead of adding another one; it depends on whether you want dedicated connections for this script, separate from the transaction groups.

Probably still worth figuring out what is using all of the connections.

Yeah, in this case there were calls to a separate DB in the instance and some unoptimized reporting going on. Effectively, in a lot of cases, non-critical transactions running on the primary connection.

I could just set the default to -1 to increase the number of connections more or less based on available hardware, but by making a different datasource I can still cap the number of connections. What should that cap be? No idea. But we can more or less define that threshold based on SQL Server performance. I could also set a default of 8 connections and then reduce it over time with a time study on peak traffic.

But this route makes a lot more sense. On my end, the script I was using was fairly optimized and had historically worked… I was unaware other people in the instance had been able to mature some of their collection systems. I tried troubleshooting with local support, but did not have much luck.

Thank you very much!