Issues Moving Dataset to DB

The script is still in development, so there's a lot of hardcoding that I'll fix once it works.

Getting this error and I'm at a standstill on what to try next:

def valueChanged(tag, tagPath, previousValue, currentValue, initialChange, missedEvents):
	def populate_dbtable_from_pydataset(pds, dbTable, dbConn, maxRecs):
		
		cols = pds.getColumnCount()
		rows = pds.getRowCount()
		
		#clear db table each time
		queryStr = "TRUNCATE TABLE {:s}".format(dbTable)
		system.db.runUpdateQuery(queryStr, dbConn)
		
		# no inserts if no data
		if (rows >0):
			
			# set up insert query string
			
			#pholders = "({:s})".format(",".join("?"*cols))
			pholders = "({:s})".format("Seq_Number","Next_Seq_Number","External_Serialized","Manual_Process","Bypass_Task","BOM_PN,Deviation_PN","Deviation_Number","Deviation_Experation","Work_Instructions","Picture_Name","Tool","Job","PV1_Name","PV1_UL","PV1_LL","PV2_Name","PV2_UL","PV2_LL","PV3_Name","PV3_UL","PV3_LL","Doc_Name")
			queryStr = "INSERT INTO {:s} VALUES {:s}".format(dbTable, "{pholders}")

			# indices for first N records
			minIdx = 0
			maxIdx = maxRecs
	
			# DB inserts in batches of N
			while (maxIdx < rows):
				queryArgs = []
				queryData = []
				for row in pds[minIdx:maxIdx]:
					queryData.extend(row)
					queryArgs.append(pholders)
			
				#queryStr1 = queryStr.format(pholders=",".join(queryArgs))
				queryStr1 = queryStr.format(pholders(queryArgs))
				
				# insert N rows in DB
				system.db.runPrepUpdate(queryStr1, queryData, dbConn)
		
				#advance and leftover records
				minIdx = rows
				MaxIdx += maxRecs
		
		# handle any leftover records
		maxIdx = rows
		queryArgs = []
		queryData = []
		for row in pds[minIdx:maxIdx]:
			queryData.extend(row)
			queryArgs.append(pholders)
			
		queryStr2 = query.format(pholders=",".join(queryArgs))
		
		# insert remaining rows into DB
		system.db.runPrepUpdate(queryStr2, queryData, dbConn)
		
	ds = system.tag.read("[.]MESv4_Data/500 WIs Dataset").value
	pds = system.dataset.toPyDataSet(ds)
	dbTable = "WI_Test3"
	dbConn = "LOMESSQL01_BE35k"
	maxRecs = 100
	populate_dbtable_from_pydataset(pds, dbTable, dbConn, maxRecs)

Just updated the picture with the proper error.

system.db.runUpdateQuery(query, dbConn)

Where is the variable query set?
I think you need this to be:

system.db.runUpdateQuery(queryStr, dbConn)

Yeah, I realized that after posting. The error and script are updated.

pholders in line 17 ends up equal to "(Seq_Number)".
Since pholders is a string, you can't call pholders() later in the script.

Also, this looks like it is built in a Tag Event Script.
This is definitely not the place for this type of script, as it will clog up your tag update threads with high-latency DB calls.

Looking at your script:

Line 17:

pholders = "({:s})".format("Seq_Number","Next_Seq_Number","External_Serialized","Manual_Process","Bypass_Task","BOM_PN,Deviation_PN","Deviation_Number","Deviation_Experation","Work_Instructions","Picture_Name","Tool","Job","PV1_Name","PV1_UL","PV1_LL","PV2_Name","PV2_UL","PV2_LL","PV3_Name","PV3_UL","PV3_LL","Doc_Name")
queryStr = "INSERT INTO {:s} VALUES {:s}".format(dbTable, "{pholders}")

This looks almost like you are copy/pasting Ch@GPT code into your project.
Your two strings at this point contain:

`queryStr =  "INSERT INTO dbtable VALUES {pholders}"`
`pholders = "(Seq_Number)"`
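
For illustration, str.format() only fills the placeholders it has and silently ignores any extra positional arguments, which is exactly why pholders collapses to a single column name:

# A single {:s} placeholder consumes only the first argument; the rest are dropped.
"({:s})".format("Seq_Number", "Next_Seq_Number", "External_Serialized")
# -> '(Seq_Number)'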

What are you expecting to be in these variables at this point?

Built this out of an Inductive University video. I will definitely move it out of a tag event; this is just for testing using real values.

pholders are my column names in the DB

Oy! Running DB queries and updates from a tag valueChange event is extremely unwise. Lock up all of your tags kind of unwise.

You really should be doing such complex work in a project's gateway tag change event, where it won't impact anything else.


Tag Change events in a project and tag valueChange events have completely different arguments. You will have to redo it significantly.

A project tag change event can also easily use a project library function, which would make your situation much easier to handle. (Particularly if you float the library script window.)
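
For example, a hedged sketch of what the gateway-side event could shrink to (wiLoader.populateWITable is a hypothetical project library function name, and I'm assuming the initialChange flag available to Ignition 8 gateway tag change scripts):

# Gateway Event Scripts > Tag Change (sketch): delegate everything to a project
# library function so no slow DB work runs in the tag event itself.
if not initialChange:
	wiLoader.populateWITable()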

Hey Phil,

This is on an isolated gateway I use just for testing. Once (if) I get this to work, I'll move it to gateway events.

pholders = "({:s})".format("Seq_Number","Next_Seq_Number","External_Serialized","Manual_Process","Bypass_Task","BOM_PN,Deviation_PN","Deviation_Number","Deviation_Experation","Work_Instructions","Picture_Name","Tool","Job","PV1_Name","PV1_UL","PV1_LL","PV2_Name","PV2_UL","PV2_LL","PV3_Name","PV3_UL","PV3_LL","Doc_Name")
queryStr = "INSERT INTO {:s} VALUES {:s}".format(dbTable, "{pholders}")

Dump this code; it's statically coded column names anyway, so just write the query:

queryStr = 'INSERT INTO {:s} ("Seq_Number","Next_Seq_Number","External_Serialized","Manual_Process","Bypass_Task","BOM_PN","Deviation_PN","Deviation_Number","Deviation_Experation","Work_Instructions","Picture_Name","Tool","Job","PV1_Name","PV1_UL","PV1_LL","PV2_Name","PV2_UL","PV2_LL","PV3_Name","PV3_UL","PV3_LL","Doc_Name") VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)'.format(dbTable)

Then delete line 33:
queryStr1 = queryStr.format(pholders(queryArgs))

Change line 36:
system.db.runPrepUpdate(queryStr, queryData, dbConn)
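
Putting those two changes together, a hedged sketch of how the batch body could read (untested; the key point is that the placeholder count in queryStr has to match the number of values you pass, so the simplest reading is one prepared insert per row of the slice):

# queryStr now carries one row's worth of '?' placeholders, so each row of the
# batch gets its own prepared insert with that row's values.
for row in pds[minIdx:maxIdx]:
	system.db.runPrepUpdate(queryStr, list(row), dbConn)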

Update: I wasn't able to get the previous script to run, so I built a new one. This is working, but I'm looking for recommendations on best practices for implementation and optimization, so there's less hardcoding of values.

The process is:
1. new data is uploaded to a dataset
2. user clicks a button to commit the new data to the DB
3. run the script somewhere

Here's the code:

ds = system.tag.read("[default]ABS/Jason Dev/Dev Station/MESv4_Data/500 WIs Dataset").value
# Convert to PyDataset
pds = system.dataset.toPyDataSet(ds)

dbTable = "WI_Test4"
dbConn = "LOMESSQL01_BE35k"

clearTable = "TRUNCATE TABLE {:s}".format(dbTable)
system.db.runUpdateQuery(clearTable, dbConn)

# Loop through the rows in the dataset, and insert them into the database.
for row in pds:
	column1 = row['Seq_Number']
	column2 = row['Next_Seq_Number']
	column3 = row['External_Serialized']
	column4 = row['Manual_Process']
	column5 = row['Bypass_Task']
	column6 = row['BOM_PN']
	column7 = row['Deviation_PN']
	column8 = row['Deviation_Number']
	column9 = row['Deviation_Experation']
	column10 = row['Work_Instructions']
	column11 = row['Picture_Name']
	column12 = row['Tool']
	column13 = row['Job']
	column14 = row['PV1_Name']
	column15 = row['PV1_UL']
	column16 = row['PV1_LL']
	column17 = row['PV2_Name']
	column18 = row['PV2_UL']
	column19 = row['PV2_LL']
	column20 = row['PV3_Name']
	column21 = row['PV3_UL']
	column22 = row['PV3_LL']
	column23 = row['Doc_Name']

	query = "INSERT INTO "+str(dbTable)+" (Seq_Number,Next_Seq_Number,External_Serialized,Manual_Process,Bypass_Task,BOM_PN,Deviation_PN,Deviation_Number,Deviation_Experation,Work_Instructions,Picture_Name,Tool,Job,PV1_Name,PV1_UL,PV1_LL,PV2_Name,PV2_UL,PV2_LL,PV3_Name,PV3_UL,PV3_LL,Doc_Name) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
  
	
	system.db.runPrepUpdate(query, [column1, column2, column3, column4, column5, column6, column7, column8, column9, column10, column11, column12, column13, column14, column15, column16, column17, column18, column19, column20, column21, column22, column23], dbConn)

A good note: the number of columns is constant, but the number of rows is variable. Hence the loop.

If you can guarantee the columns in your dataset exactly match your columns in the database, you can script the creation of basically the entire query.

There are a few different list comprehensions here, and I'm leaning heavily on Python's built-in join function; I only did a few columns in my example dataset to prove the concept. By far the most important gain for efficiency is doing as few operations against the database as possible.
Besides being easier on the DB (by ingesting all the rows at once, you minimize the pain of rebuilding indexes, holding table locks, etc.), it's also much more efficient in terms of network time, which is essentially always orders of magnitude slower than CPU time on the local machine where you're processing the data.
I'm doing this with a multiple-row insert statement, that is, VALUES (?, ?), (?, ?).

ds = system.dataset.toDataset(
	["Seq_Number", "Next_Seq_Number", "External_Serialized"],
	[
		[123, 456, 111],
		[456, 789, 222],
		[789, 999, 333]
	]
)

pds = system.dataset.toPyDataSet(ds)

dbTable = "WI_Test4"
dbConn = "LOMESSQL01_BE35k"

#clearTable = "TRUNCATE TABLE {:s}".format(dbTable)
#system.db.runUpdateQuery(clearTable, dbConn)

# build up a multi-value-insert statement, in the form:
# (?, ?, ?), (?, ? ,?)
values_statement = ", ".join(
	"(" + ", ".join("?" for _ in range(len(row))) + ")"
	for row in pds
)

query = """
	INSERT INTO {}
	({}) 
	VALUES {}
""".format(dbTable, ", ".join(pds.columnNames), values_statement)

print query

# flatten the pydataset to pass into the runPrepUpdate function as one long list
flat_values = [
	value 
	for row in pds
	for value in row
]

print flat_values
#system.db.runPrepUpdate(query, flat_values, dbConn)

You also have to 'flatten' the pydataset (which acts like a list of lists) into a single long list to pass it into runPrepUpdate as a single set of arguments.

Note that depending on your DB flavor and how many rows you're trying to insert at once, you may run into issues with the number of parameters you're allowed to send in one statement. As long as you're below ~1000 values you should be fine on every DB I'm aware of, though.

I obviously commented out the actual DB operations (and also haven't tested the full script) but this is what it outputs with my test dataset:

>>> 

	INSERT INTO WI_Test4
	(Seq_Number, Next_Seq_Number, External_Serialized) 
	VALUES (?, ?, ?), (?, ?, ?), (?, ?, ?)

[123, 456, 111, 456, 789, 222, 789, 999, 333]

MS SQL Server's limit is 2100 parameters; I learned that the hard way. That also led to the creation of this helper function:

# logger needs to be defined somewhere at module scope (the name is arbitrary)
logger = system.util.getLogger("datasetToDB")

def calculateRequiredSQLExecutionCount(columnCount, recordCount, maximumParams=2100):
	"""
	Calculates the number of sql queries required to offload data of specified
	length into a table with specified number of columns

	Arguments:
		columnCount: Number of columns in target db table
		recordCount: Number of records(rows) to be written
		maximumParams: Per-statement parameter limit (2100 for MS SQL Server)
	Returns:
		Number of records(rows) to process per execution,
		Number of executions required to write all records
	"""

	requiredExecutions = int((recordCount * columnCount) / maximumParams) + 1
	chunkSize = int(recordCount / requiredExecutions) + 1
	logger.debug(str(chunkSize))

	return chunkSize, requiredExecutions
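
For example, with the 23 columns from earlier in the thread and, say, 500 rows (the row count is mine, just to show the arithmetic):

# 23 columns * 500 rows = 11500 parameters, well over the 2100 cap.
chunkSize, requiredExecutions = calculateRequiredSQLExecutionCount(23, 500)
# chunkSize = 84 rows per statement (84 * 23 = 1932 parameters per execution)
# requiredExecutions = 6 statements to cover all 500 rows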

Mostly worked; the issue is null values, which I expect to have many of.

Update: looking through the logs, I hit that 2100 number and the server shut it down.

If you use a competent database backend, you can do essentially unlimited rows per batch with just one ? placeholder per column.

{ Sorry, Jason, MS SQL Server is not a competent database. }
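
As a hedged illustration of the one-placeholder-per-column idea (my assumption: a PostgreSQL-style UNNEST bulk insert, shown for only three columns; whether the list-to-array binding works as written depends on the JDBC driver, so treat it as a sketch, not the method being referred to):

# Assuming a PostgreSQL backend: one '?' per column, each bound to an array of
# that column's values, so the row count isn't tied to a parameter cap.
query = """
	INSERT INTO WI_Test4 (Seq_Number, Next_Seq_Number, External_Serialized)
	SELECT * FROM UNNEST(?::int[], ?::int[], ?::int[])
"""
# one list per column, each holding that column's value for every row
columnArrays = [[row[i] for row in pds] for i in range(3)]
#system.db.runPrepUpdate(query, columnArrays, dbConn)   # array binding is driver-dependent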


Then you're going to have to 'batch' the inserts into smaller sets. I'd divide that 2100 by the number of columns in your dataset to know how many rows of the pydataset to take at one time.
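
A hedged sketch of that batching, reusing the multi-row VALUES construction from the earlier example (untested; pds, dbTable, and dbConn as defined above):

# Keep each multi-row insert under MS SQL Server's 2100-parameter cap.
MAX_PARAMS = 2100
colCount = len(pds.columnNames)
rowsPerBatch = MAX_PARAMS // colCount   # e.g. 2100 // 23 = 91 rows per statement
totalRows = pds.getRowCount()

for start in xrange(0, totalRows, rowsPerBatch):
	chunk = list(pds[start:min(start + rowsPerBatch, totalRows)])
	values_statement = ", ".join(
		"(" + ", ".join("?" for _ in range(colCount)) + ")"
		for _ in chunk
	)
	query = "INSERT INTO {} ({}) VALUES {}".format(
		dbTable, ", ".join(pds.columnNames), values_statement
	)
	flat_values = [value for row in chunk for value in row]
	system.db.runPrepUpdate(query, flat_values, dbConn)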