Help with SQL Query to write data from one DB table to another for a monthly report

Real quick crash course:

A primary key is key that uniquely identifies a record in a database table. Sort of like a mathematical function (in the strictest definition of a function) where every input maps to exactly one output, and no two inputs will give you the same output.

In the most basic form, one that’s widely used, this would be an id column in a table. So this means if you want record 5, you do SELECT * FROM table WHERE id = 5, and that is the ONLY way you can see record 5 (when only using the id to get stuff). The uniqueness implies other things as well. Assume we have that record with an id of 5. Then if you try INSERT INTO table(id, someColumn) VALUES (5, 'someData') the database will reject this - because then you would have two records with the same primary key - this is not allowed. If it was, then doing SELECT * FROM table WHERE id=5 would give two records and it would not longer UNIQUELY identify the record.

So now with the definition of a primary key out of the way, now its worth noting that you do not need to only use an id column as your primary key. In fact using an id column is often referred to as a surrogate key because although it helps provide a key to the table, that key does not reflect any real world meaning.

The alternative to the surrogate key is what is called a natural key. In your table example, you said you wanted one record per day, so if you did SELECT * FROM tableC WHERE dateColumn='2022-07-20' - this is what should uniquely identify the record. This is called a natural key because there’s actual semantic meaning to it - you only have one record count per day due to some business logic or rational (whereas id’s being a primary key is kind of just learned and repeated as suggested by @pturmel cargo-cult comment).

The reason you’d want to use a natural key in this manner is the data uniqueness and constraint it provides for you. Imagine if your tableC used id as the primary key - you could have multiple records where dateColumn='2022-07-20' and then when you’re selecting - which one is the “right” one? You’d have to do further investigation. If you had dateColumn as your primary key - this would not ever be possible.

Using natural keys then lends itself to then leveraging other database tools and syntax that otherwise make no sense with a surrogate key, like INSERT ... ON DUPLICATE KEY UPDATE. This statement allows you to try to insert into a table and if there is already a dateColumn valued ‘2022-07-20’ then gives you the opportunity to then just update the existing record, kind of a two for one.

I don’t personally use this syntax (though maybe I should start) but you can do the same logic explicitly with python in Ignition as I outline here - Insert into database only if the entry doesn't exist - #2 by bkarabinchak.psi

Duplicate data is probably the most common cause of a database being in an unexpected state and leaving people scratching their heads and getting your primary keys correctly defined is the easiest most powerful way to prevent it.

Highly recommend googling and reading up on: primary key, natural key, surrogate key, and how to make a primary key that consists of multiple columns as in the real world, its usually a few pieces of information that uniquely identify a record.

3 Likes