I often need to write multiple rows from a dataset into SQL.
For example, I have a table where operators enter a part number and how many were produced during that shift. If multiple types of parts were produced, there will be multiple entries in the table.
Currently, the code I've been writing looks something like the following:
for row in pydataset:
    params = {'part_number': row[0],
              'part_quantity': row[1]}
    system.db.runNamedQuery("InsertIntoLogs", params)
Is there any advantage to moving this into a single transaction?
E.g., something like:
tx = system.db.beginNamedQueryTransaction()
for row in pydataset:
    params = {'part_number': row[0],
              'part_quantity': row[1]}
    system.db.runNamedQuery("InsertIntoLogs", params, tx)
system.db.commitTransaction(tx)
system.db.closeTransaction(tx)
The documentation page for beginNamedQueryTransaction does not provide much information about what it actually accomplishes, other than that the queries are run atomically.
My questions are:
- How is this atomicity accomplished? Does it effectively place a lock on the table for the entire run time between when I begin the transaction and close it, meaning no other queries can access it? (EDIT: it seems like atomicity in SQL means "either completely happens, or doesn't happen at all". My understanding of this word comes from Java, where it has more to do with concurrency and intermediate states)
- If I was inserting ten thousand rows into SQL, would running ten thousand individual named queries cause an issue? Would transactions make this worse or better? Alternatively, I could build a prep query with ten thousand lines.
- If for some reason one of my queries fails due to an SQL error (breaks some constraint), or I have an error in my Python (divide by 0), it would be nice to write no data to SQL, rather than get halfway through writing and then be left in a weird state. From some basic testing, it seems like unless commitTransaction is called, the transaction is discarded automatically (likely after it times out). Should I do anything else to handle this, such as a try/except?
How is this atomicity accomplished? Does it effectively place a lock on the table for the entire run time between when I begin the transaction and close it, meaning no other queries can access it?
Long answer: No Dirty Reads: Everything you always wanted to know about SQL isolation levels (but were too afraid to ask)
Short answer: things that read the table while a transaction is running won't see data from that transaction
Longer answer, if you really want the nitty gritty: Syllabus | Database Systems | Electrical Engineering and Computer Science | MIT OpenCourseWare
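If you want to see or control that behavior explicitly, you can pass an isolation level when you open the transaction. Rough sketch only: the connection name and timeout below are placeholders, and the exact parameter list depends on whether you call it from a project or gateway scope.
# READ_COMMITTED means other queries only see these rows once
# commitTransaction has run; connection name and timeout are placeholders.
tx = system.db.beginNamedQueryTransaction(
    "MyDatabase",                  # database connection name (placeholder)
    system.db.READ_COMMITTED,      # isolation level constant
    10000                          # ms before an idle transaction is closed
)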
If I was inserting ten thousand rows into SQL, would running ten thousand individual named queries cause an issue?
Not on the SQL side, in either direction. Start worrying when you hit 1m rows, is my advice. (Even for SQLite).
EDIT: this is assuming sequential writes, not 10k concurrent writes.
If for some reason one of my queries fails due to an SQL error (breaks some constraint), or I have an error in my Python (divide by 0), it would be nice to write no data to SQL, rather than get halfway through writing and then be left in a weird state.
Transactions do this, though you should probably put an explicit rollback in the except block rather than relying on the timeout.
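Roughly what I'd write, reusing your loop (the important addition is system.db.rollbackTransaction):
tx = system.db.beginNamedQueryTransaction()
try:
    for row in pydataset:
        params = {'part_number': row[0],
                  'part_quantity': row[1]}
        system.db.runNamedQuery("InsertIntoLogs", params, tx)
    # Only commit once every insert has succeeded
    system.db.commitTransaction(tx)
except Exception:
    # Any SQL or scripting error: discard everything written so far
    system.db.rollbackTransaction(tx)
    raise
finally:
    # Always release the transaction, committed or not
    system.db.closeTransaction(tx)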
Here's a somewhat simplified version of what I use to bulk insert.
The idea is simple: feed it the name of the table you want to insert into, and a list containing a dict for each row you want to insert.
def build_insert_string(table, data, columns=None):
    """
    Build the query string and the values list for a multi-insert query.
    Use with `system.db.runPrepUpdate`.
    params:
        table (string): The table to insert into
        data (list of dicts): A list containing a dict for each row to insert, where the keys are the column names
        columns (list of strings): The columns to be inserted. If None, they're deduced from the first row's keys.
    return:
        a tuple where:
            - the first element is the query string formatted for a `runPrepUpdate` (with question marks as placeholders)
            - the second element is the list of values
    """
    if not data:
        return None, None
    if columns is None:
        columns = data[0].keys()
    marks = "({})".format(','.join("?" for _ in columns))
    marks = ",".join(marks for _ in data)
    col_names = ','.join(columns)
    q = "insert into {} ({}) values {}".format(table, col_names, marks)
    values = [row[c] for row in data for c in columns]
    return q, values
Typical use looks like this:
q, v = build_insert_string('some_table', data)
system.db.runPrepUpdate(q, v)
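One caveat if the dataset gets really big: most drivers cap the number of bound parameters in a single statement (SQL Server is around 2100, for example), so at some point you'd want to split the data into chunks and run one prep update per chunk. A rough sketch on top of build_insert_string, with an arbitrary chunk size:
def insert_in_chunks(table, data, chunk_size=500):
    # Insert rows a chunk at a time to stay under the driver's
    # bound-parameter limit for a single statement.
    for start in range(0, len(data), chunk_size):
        q, v = build_insert_string(table, data[start:start + chunk_size])
        if q is not None:
            system.db.runPrepUpdate(q, v)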