Python script optomisation

Hello, I have a script to open a log file, search for a string, and if found insert the string into a database.

There is a log file generated per day of the month for each test cell. There are 16 test cells. Each log file holds around 650K results but I am not interested in a lot of them. I am currently experiencing a bottle neck however whenever I exceed 4 search strings or 10 test cells which results in ~38K db inserts.

If I search 2 strings in 3 cells it takes 15 seconds and generates 4.6k records.
3 in 3 takes 22 secs with 9.2k results.
4 in 3 takes 30s with 13k results
4 in 8 takes 60s with 28k results

For any of these looking at performance in task manager the 8 threads are happily ticking over peaking at maybe 15%.

Once I exceed 38K results the 8 threads go to 95% and I have to kill the j2plauncher process, no error messages are displayed.

Anyway, here’s the code as it stands, many thanks to althepal for getting me started on this;

cells = range(153,162)

day = "01"


string_1 = "NOREAD"
string_2 = "cu to rotate to slide"
string_3 = "center unit to rotate to robot"
string_4 = "closing gripper 2"

strings = [string_1, string_2, string_3, string_4]

for cell in cells:
	filepath = "\\\selcfts" + str(cell) + "\\SEL\\Logs\\" + day + "_Handling.log"
	for string in strings:
		file = open(filepath)
		for line in file:
			if line.find(string) != -1:
				system.db.runPrepUpdate("INSERT INTO logdata (time, string, cell, date) VALUES (?,?,?,?)", [line[:12], string, cell, "01/02/2016"])
		file.close()

There is a problem with 152’s network path, I can’t remote browse to it. Tried taking it off and back on the domain which fixed it for 30 mins but then reverted to non-browseable. The end goal is cells in range 150-165 on this site. I need to add some form of error handling, because if I do range(150,166) it fails on 152 and therefore doesn’t do any more. There is a chance that when this script runs at midnight every day that one of the cells could be powered down. Not sure how to go about this?

I have increase the max ram in the .conf file to 8GB. Are there any other reasons why I can’t exceed 38k entries? The sql server is MS SQL MS 2012.

[quote=“the_msp”]Anyway, here’s the code as it stands, many thanks to althepal for getting me started on this;[code]
cells = range(153,162)

day = “01”

string_1 = “NOREAD”
string_2 = “cu to rotate to slide”
string_3 = “center unit to rotate to robot”
string_4 = “closing gripper 2”

strings = [string_1, string_2, string_3, string_4]

for cell in cells:
filepath = “\\selcfts” + str(cell) + “\SEL\Logs\” + day + “_Handling.log”
for string in strings:
file = open(filepath)
for line in file:
if line.find(string) != -1:
system.db.runPrepUpdate(“INSERT INTO logdata (time, string, cell, date) VALUES (?,?,?,?)”, [line[:12], string, cell, “01/02/2016”])
file.close()[/code][/quote]I can’t speak to your remote browsing or SQL server issues, but your code has one huge speed killer: you are opening and reading each log file separately for every search string. Try it this way instead:for cell in cells: filepath = "\\\selcfts" + str(cell) + "\\SEL\\Logs\\" + day + "_Handling.log" file = open(filepath) for line in file: for string in strings: if line.find(string) != -1: system.db.runPrepUpdate("INSERT INTO logdata (time, string, cell, date) VALUES (?,?,?,?)", [line[:12], string, cell, "01/02/2016"]) file.close()

Thanks Phil, I tried both methods, for a 28k result set my method took 59 seconds and yours took 43 seconds. I originally did have just one file open, but it was slow because I was doing multiple search strings concurrently (no for loop).

Regarding the out of memory, I ran this on my laptop and got the following error;

[quote]

[quote]Traceback (most recent call last):
File “event:propertyChange”, line 8, in
java.lang.OutOfMemoryError: Java heap space

java.lang.OutOfMemoryError: java.lang.OutOfMemoryError: Java heap space

at org.python.core.Py.JavaError(Py.java:495)
at org.python.core.Py.JavaError(Py.java:488)
at org.python.core.PyReflectedFunction.__call__(PyReflectedFunction.java:188)
at com.inductiveautomation.ignition.common.script.ScriptManager$ReflectedInstanceFunction.__call__(ScriptManager.java:428)
at org.python.core.PyObject.__call__(PyObject.java:404)
at org.python.core.PyObject.__call__(PyObject.java:408)
at org.python.pycode._pyx15.f$0(<event:propertyChange>:8)
at org.python.pycode._pyx15.call_function(<event:propertyChange>)
at org.python.core.PyTableCode.call(PyTableCode.java:165)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.Py.runCode(Py.java:1275)
at com.inductiveautomation.ignition.common.script.ScriptManager.runCode(ScriptManager.java:624)
at com.inductiveautomation.factorypmi.application.binding.action.ActionAdapter.runActions(ActionAdapter.java:168)
at com.inductiveautomation.factorypmi.application.binding.action.ActionAdapter.access$000(ActionAdapter.java:40)
at com.inductiveautomation.factorypmi.application.binding.action.ActionAdapter$ActionRunner.run(ActionAdapter.java:280)
at java.awt.event.InvocationEvent.dispatch(Unknown Source)
at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
at java.awt.EventQueue.access$500(Unknown Source)
at java.awt.EventQueue$3.run(Unknown Source)
at java.awt.EventQueue$3.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
at java.awt.EventQueue.dispatchEvent(Unknown Source)
at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.run(Unknown Source)

Caused by: java.lang.OutOfMemoryError: Java heap space

Ignition v7.8.0 (b2015101414)
Java: Oracle Corporation 1.8.0_73
[/quote][/quote]

Try this:

[code]def multiRowPrepUpdate(query, valuesString, values, split, database = None):
import system, app
import java
import sys

baserow = 0
while baserow < len(values):
	subrow = 0

	# Grab the number of rows to insert
	if baserow+split < len(values):
		insertRows = split
	else:
		insertRows = len(values) - baserow

	# Build the query
	multiRowQuery = query
	multiRowQuery += " "
	multiRowQuery += (("%s,"%valuesString) * insertRows)[:-1]

	# Grab the values
	multiValues = getList(values, baserow, insertRows)

	numColumns = len(valuesString.split(","))




	# Run the query
	system.db.runPrepUpdate(multiRowQuery, multiValues, database)


						
	# Increment the count
	baserow = baserow + insertRows

def getList(values, start, count):
returnList = []
i = 0
while i < count:
for item in values[start + i]:
returnList.append(item)
i += 1
return returnList

cells = range(153,162)

day = “01”

string_1 = “NOREAD”
string_2 = “cu to rotate to slide”
string_3 = “center unit to rotate to robot”
string_4 = “closing gripper 2”

strings = [string_1, string_2, string_3, string_4]

valuesToInsert = []
query = “”"
INSERT INTO logdata (time, string, cell, date) VALUES
“”"

for cell in cells:
i = 0
valuesToInsert = []
filepath = “\\selcfts%d\SEL\Logs\%s_Handling.log”%(cell, day)

try:
	file = open(filepath)
	for line in file:
		valuesToInsert.extend( [[line[:12],s,cell,"01/02/2016"] for s in strings if s in line])

	file.close()
	multiRowPrepUpdate(query,"(?,?,?,?)",valuesToInsert, 500, "LocalSQL")
	print "FileData: ", filepath, len(valuesToInsert)
except:
	print "Error reading ", filepath[/code]

Wow, thank you. 80k entries in 1min 14 seconds. For cells 153-165. If I include 152, it errors out on all cells (output shows error in file path for 150, 151, 152, etc) and no results get input to db.

In work again tomorrow, I’ll try and hack a solution.

How did you arrive at the figure of 500 for, what I assume is 500 inserts per time? I tried 100, 1000, 10,000 just to see the effect but 500 is best.

Regards,

Patrick

Well it appears I jumped the gun a little on the filepath error. I was running it in the script console which generated the error, but if I run it on an action performed button script, it runs perfectly for all cells including 152. Just for testing, I added 12 unique strings for all cells. It took 5 minutes 25 seconds to do 2.1M SQL entries. This is on a quad core (w/ht) i7 with 8GB ram, but the ram stayed stable and did not increase as the script ran. Speed isn’t a major issue anyway, as this script is going to run at 00:15 every day and look at yesterdays log files.

Next step is to copy this into a gateway tag change event script, triggered by a timer script;

[code]from time import localtime, strftime
curDay = strftime("%d", localtime())
curTime = strftime("%H:%M:", localtime())

if curTime == “00:15:”:
system.tag.write(‘FT/Trigger_logfile_analysis’, 1)

if curTime == “07:15:”:
system.tag.write(‘FT/Trigger_logfile_analysis’, 0)[/code]

But I need it to run for yesterdays date, not todays. There is an expression that can do this:

dateArithmetic(now(), -1, "days")

which handles potential start of month issues nicely, but is there any way to achieve this in python?

I don’t have time to test it, but I think I can point you in the right direction.

Take a look at the datetime python module. It has a timedelta object that I think can be used similarly to the expression that you mention.

Look here for documentation on it: https://docs.python.org/2/library/datetime.html#module-datetime

I picked 500 because it wasn’t too big or too small.

To further increase the speed, you can try using invokeasynchronous for each file. Or run each file on a separate thread.

Have you thought about piping everything through the store and forward system. Using system.db.runSFPrepUpdate might be a good start.

I haven’t Kyle, I will look into it.

Thanks

I got the date code sorted. Is there anything in this script that prevents it from not running as a gateway event script? I.e. any of the functions not having the correct scope. As far as I can see they are all scoped “all”.

When I try and run it nothing happens, if I try a simpler script such as the delete from db command it executes so I know my trigger is working.

[code]def multiRowPrepUpdate(query, valuesString, values, split, database = None):
import system, app
import java
import sys

baserow = 0
while baserow < len(values):
	subrow = 0

	# Grab the number of rows to insert
	if baserow+split < len(values):
		insertRows = split
	else:
		insertRows = len(values) - baserow

	# Build the query
	multiRowQuery = query
	multiRowQuery += " "
	multiRowQuery += (("%s,"%valuesString) * insertRows)[:-1]

	# Grab the values
	multiValues = getList(values, baserow, insertRows)

	numColumns = len(valuesString.split(","))




	# Run the query
	system.db.runPrepUpdate(multiRowQuery, multiValues, database)


						
	# Increment the count
	baserow = baserow + insertRows

def getList(values, start, count):
returnList = []
i = 0
while i < count:
for item in values[start + i]:
returnList.append(item)
i += 1
return returnList

cells = range(150,166)

#get the date and month of yesterday

from java.util import Date, Calendar
now = Calendar.getInstance()
now.add(Calendar.DATE, -1)
yesterday = Date(now.getTimeInMillis())
yday = system.db.dateFormat(yesterday,“dd”)
month = system.db.dateFormat(yesterday,“MM”)

#day = event.source.parent.getComponent(‘Dropdown’).selectedStringValue

string_1 = “NOREAD;Status:NOREAD;”
string_2 = “cu to rotate to slide”
string_3 = “center unit to rotate to robot”
string_4 = “Slide in position”
string_5 = “closing gripper 2”
string_6 = “LEC Output 2 reset”
string_7 = “LEC Output 1 Set”
string_8 = “LEC Output 2 set”
string_9 = “LEC Output 1 reset”
string_10 = “LEC Output 0 set”
string_11 = “LEC Output 0 reset”
string_12 = “Slide transfer in”

strings = [string_1]
#strings = [string_1, string_2, string_3, string_4, string_5, string_6, string_7, string_8, string_9, string_10, string_11, string_12]

valuesToInsert = []
query = “”"
INSERT INTO logdata (time, string, cell, date) VALUES
“”"

for cell in cells:
i = 0
valuesToInsert = []
filepath = “\\selcfts%d\SEL\Logs\%s_Handling.log”%(cell, yday)

try:
	file = open(filepath)
	for line in file:
		valuesToInsert.extend( [[line[:12],s,cell, month + "/" + yday + "/2016"] for s in strings if s in line])

	file.close()
	multiRowPrepUpdate(query,"(?,?,?,?)",valuesToInsert, 500, "MsSQL")
	print "FileData: ", filepath, len(valuesToInsert)
except:
	print "Error reading ", filepath[/code]

Thanks

@the_msp, so you’re opening 15 files from the network. How large are these files and what does one look like?

The files are ~30MB each, around 650K lines per day, and look like this:

00:00:00.176- LEC Busy output reset 00:00:00.176- Slide state unload cell-->load cell 00:00:00.245- Slide loading cell: waiting for turn unit rotating g1 to chamber 00:00:01.024- Slide starting move to requested position 00:00:01.025- Slide loading cell: Moving to drum position 00:00:01.103- Slide starting move to requested position 00:00:01.112- LEC Busy output set 00:00:01.114- LEC Output 2 reset 00:00:01.120- LEC Output 1 set 00:00:01.120- LEC Output 2 set 00:00:01.215- LEC Output 1 reset 00:00:01.215- LEC Output 2 reset 00:00:01.265- LEC Output 1 set 00:00:01.265- LEC Output 2 set 00:00:01.832- Slide in position 00:00:01.883- LEC Busy output reset 00:00:01.883- Slide loading cell: waiting for gripper 1 to open 00:00:01.953- ChamberTransferIn: waiting for slide clear 00:00:01.961- Gripper open sensor set 00:00:01.961- Slide loading cell: Allowing gripper to fully open before moving clear 00:00:01.966- Gripper closed sensor reset 00:00:02.429- Slide starting move to requested position 00:00:02.429- Slide loading cell: Moving clear of cell 00:00:02.553- Slide starting move to requested position 00:00:02.585- Slide starting move to requested position 00:00:02.663- LEC Busy output set 00:00:02.666- LEC Output 1 reset 00:00:02.666- LEC Output 2 reset 00:00:02.671- LEC Output 2 set 00:00:02.841- Gripper part present sensor reset 00:00:03.147- Slide in position 00:00:03.179- LEC Busy output reset 00:00:03.179- Slide loading cell: cell loaded 00:00:03.179- Slide state load cell-->transfer in 00:00:03.287- Slide transfer in: opening gripper 1 00:00:03.287- Slide starting move to requested position 00:00:03.287- Slide transfer in: moving to lift position 00:00:03.353- Slide transfer in: moving to lift position 00:00:03.353- ChamberTransferIn: transfer in complete 00:00:03.365- Slide starting move to requested position 00:00:03.365- Slide transfer in: moving to lift position 00:00:03.443- Slide starting move to requested position 00:00:03.443- Slide transfer in: moving to lift position 00:00:03.464- LEC Busy output set 00:00:03.466- LEC Output 2 reset 00:00:03.472- LEC Output 0 set 00:00:03.472- LEC Output 1 set 00:00:03.567- LEC Output 0 reset 00:00:03.567- LEC Output 1 reset 00:00:03.608- Slide in position 00:00:03.617- LEC Output 0 set 00:00:03.617- LEC Output 1 set 00:00:03.636- LEC Busy output reset 00:00:03.753- ChamberTest: closing door 00:00:03.755- Slide transfer in: lowering lift 00:00:03.940- Door open sensor reset

Are you stuck to doing this in Ignition?
This seems like a task better suited for a file watcher application…
Have you tried regular expressions instead of parsing string?
Depending on what you need to do with the data later on I personally would use biztalk and either shove it into a db or splunk.

I’m not stuck to Ignition but to me it makes sense as the data is displayed live in a client and is also e-mail reported from Ignition. I haven’t tried regular expressions.

Any input on why this won’t execute as a gateway event script?

Thanks

[quote=“the_msp”]Any input on why this won’t execute as a gateway event script?[/quote]Gateway scripts often won’t report errors. Use system.util.getLogger() and a try/except block to catch them and send them to the gateway console. That’ll let you see all of your exceptions. You might also want to send intermediate reports to the log in the midst of your script to see progress, much like you’d put ‘print’ statements in a client script.

In my opinion, 1 minute delay creates a bad user experience.

/c

So seems to be that this line;

multiRowPrepUpdate

is the cause. Running it in the script console, or on a gateway event, won’t write anything to the db. It will on an action performed button in a client for some reason. I could change this to system.db.runPrepUpdate, as my script in the 1st post does execute and write to the db, but then I’d be back to my original issue once I exceed 30k entries, and also the filepath error.

So I tried the store and forward suggestion, and also combining the try/except for filepath

here is the script, works well, a little slower but we have a 7 hour window to gather the data.

[code]if newValue.value:
cells = range(150,166)

#get the date and month of yesterday
from java.util import Date, Calendar
now = Calendar.getInstance()
now.add(Calendar.DATE, -1)
yesterday = Date(now.getTimeInMillis())
yday = system.db.dateFormat(yesterday,"dd")
month = system.db.dateFormat(yesterday,"MM")

#build list of strings to search for
string_1 = "NOREAD"
string_2 = "cu to rotate to slide"
string_3 = "center unit to rotate to robot"
string_4 = "closing gripper 2"
string_5 = "closing gripper 2"
string_6 = "LEC Output 2 reset"
string_7 = "LEC Output 1 Set"
string_8 = "LEC Output 2 set"
string_9 = "LEC Output 1 reset"
string_10 = "LEC Output 0 set"
string_11 = "LEC Output 0 reset"
string_12 = "Slide transfer in"

strings = [string_1]
#strings = [string_1, string_2, string_3, string_4, string_5, string_6, string_7, string_8, string_9, string_10, string_11, string_12]

for cell in cells:
	filepath = "\\\selcfts" + str(cell) + "\\SEL\\Logs\\" + yday + "_Handling.log"
	try:
		file = open(filepath)
		print "Success reading ", filepath
		for line in file:
			for string in strings:
				if line.find(string) != -1:
					system.db.runSFPrepUpdate("INSERT INTO logdata (time, string, cell, date) VALUES (?,?,?,?)", [line[:12], string, cell, month + "/" + yday + "/2016"], datasources=["MsSQL"])
		file.close()
	except:
		print "Error reading ", filepath[/code]

Chris, sorry I meant other data is live, this data will never be live as we can’t look at the log files when the cells are writing to them.