Help with SQL Query to write data from one DB table to another for a monthly report

Hi guys,

I need to create a monthly report that uses data from 2 existing database tables - Table A and Table B. My plan is to create a new table - Table C and write data to it from Tables A & B on the daily. Table C would then be used to generate the monthly report. It will have 12 columns (Column A will be the date and Columns B thru L will hold report data from Tables A & B). All 3 tables will reside in the same database.

  • Step 1: Create New Table C
  • Step 2: Insert Table A data into Columns G thru L of Table C
    • Data will be inserted at the end of each day (11:59:59 PM)
    • Need to use aggregate functions to determine daily Min, Max, and Avg values

Draft code:

INSERT INTO Table C (Column_G, Column_H, Column_I, Column_J, Column_K, Column_L)

SELECT

DATEPART(Year, time_stamp) AS YEAR

DATEPART(Month, time_stamp) AS MONTH

DATEPART(Day, time_stamp) AS DAY

Min(Results_PSI) AS Column_G,

Max(Results_PSI) AS Column_H,

Avg(Results_PSI) AS Column_I,

Min(Results_TMP) AS Column_J,

Max(Results_TMP) AS Column_K,

Avg(Results_TMP) AS Column_L,

FROM Table A

GROUP BY DATEPART(Year, time_stamp), DATEPART(Month, time_stamp), DATEPART(Day, time_stamp),

ORDER BY YEAR, MONTH, DAY

  • Step 2: Update Table C data entry (row) from Step 1 by inserting data into Columns B thru F
    • This update/data insert will occur at 1:30am of the following day (data is representative of yesterday’s total)

Draft code:

INSERT INTO Table C (Column_B, Column_C, Column_D, Column_E, Column_F)

SELECT * FROM Table B

ORDER BY time_stamp

Questions:

  1. How do I specify the time ( in this case 11:59:59PM) for the data to be inserted into Table C in Step 1?

  2. In Step 2, how do I update the row of data (inserted in Step 1) with additional data from Table B without creating a separate row? Since the update will occur at 1:30am on the following day, the time stamp will be different than the one of the data that was inserted in Step 1.

  3. Any better/easier ways of accomplishing the above? I don’t think I can use transaction groups to do what I want to do.

Any help would be much appreciated.

Welcome to the forum.
Could you format the SQL code in your question? Hit the 🖉 edit link and then select a block of code and press the </> button to format it. It will preserve indentation and do some syntax highlighting for you. It will be much easier to read too. Thanks.

1 Like
  1. You can either supply the date as a parameter (which depends on how and where you are calling this query) or you can use your DB flavors version of GET_DATE() in the query.

  2. You should configure Table C with a record ID column to act as a key for the row. Usually this would be an integer value set to auto increment, though there are more complex ways to do it.

  3. I really dislike duplicating data. I would tend to just query the data and do the needed aggregation during report creation, there really is no need for Table C.

  4. No transaction groups aren’t what you’re looking for in this case.

1 Like

Strongly disagree. If a column or combination of columns is expected to uniquely identify a row, that should be your primary key. This application calls for one output row per day. The DATE column should be the primary key. As a side-benefit, you get optimized searches for dates and date spans with no extra overhead.

{Ignition's use of auto-increment columns is a legacy of old MySQL's inability to use multi-column primary keys. Escape from the cargo cult.}

2 Likes

I concur with this. If ONLY Ignition is going to be viewing it, just make a Named Query that does the column aggregation from A and B.

If other applications are also looking at this database and looking for this data, you might consider a view that looks at the correct columns of A and B.

The only reason I could see putting both into another table is if the query performance is super slow - but given this is for a monthly report, and not something you're running consistently, it would have to be really slow to justify it imo.

Definitely agree with @pturmel suggestion that you should make the natural key of the table your primary key, however you define that. People are definitely tied to just using surrogate keys of an id. I still do that when I'm not sure what the natural key of a table is but you should rectify that once you learn what the restrictions of the data should be. The database should be in charge of managing data uniqueness and data constraints full stop.

2 Likes

Thank you for the response. As you can probably see, I’m as beginner as they get when it comes to SQL queries so please bear with me. I’m not really sure how to exactly call the 2 queries I included in my writeup. They will not be triggered from a button or some other manual action. The idea is that these Named Queries run daily (automatically at the times I specified) resulting in a single row entry in a monthly report (sounds like Table C in not needed). Values from Table A get written at the end of a given day and values from Table B get written at 1:30am the following day, although they need to be written as previous day values. This (how to run queries on a schedule) is the part that I’m missing.

Thank you. Can please clarify what you mean by the DATE column being the “primary key”?

You tagged this as Ignition 80, so a Gateway timer script etc running with an If current time == trigger time, Then go and execute the query.

If you have the option of upgrading to 8.1, as of a certain version there are scheduled scripts.

To avoid the duplication of data, if you have the report module and that suits your needs, set it up to publish a report once per month, and in the query syntax of that report, do the selection from each day and group by day if you want it split like that. You can of course view the report live at any stage of the month and have it go -30 days from now, etc.

Real quick crash course:

A primary key is key that uniquely identifies a record in a database table. Sort of like a mathematical function (in the strictest definition of a function) where every input maps to exactly one output, and no two inputs will give you the same output.

In the most basic form, one that’s widely used, this would be an id column in a table. So this means if you want record 5, you do SELECT * FROM table WHERE id = 5, and that is the ONLY way you can see record 5 (when only using the id to get stuff). The uniqueness implies other things as well. Assume we have that record with an id of 5. Then if you try INSERT INTO table(id, someColumn) VALUES (5, 'someData') the database will reject this - because then you would have two records with the same primary key - this is not allowed. If it was, then doing SELECT * FROM table WHERE id=5 would give two records and it would not longer UNIQUELY identify the record.

So now with the definition of a primary key out of the way, now its worth noting that you do not need to only use an id column as your primary key. In fact using an id column is often referred to as a surrogate key because although it helps provide a key to the table, that key does not reflect any real world meaning.

The alternative to the surrogate key is what is called a natural key. In your table example, you said you wanted one record per day, so if you did SELECT * FROM tableC WHERE dateColumn='2022-07-20' - this is what should uniquely identify the record. This is called a natural key because there’s actual semantic meaning to it - you only have one record count per day due to some business logic or rational (whereas id’s being a primary key is kind of just learned and repeated as suggested by @pturmel cargo-cult comment).

The reason you’d want to use a natural key in this manner is the data uniqueness and constraint it provides for you. Imagine if your tableC used id as the primary key - you could have multiple records where dateColumn='2022-07-20' and then when you’re selecting - which one is the “right” one? You’d have to do further investigation. If you had dateColumn as your primary key - this would not ever be possible.

Using natural keys then lends itself to then leveraging other database tools and syntax that otherwise make no sense with a surrogate key, like INSERT ... ON DUPLICATE KEY UPDATE. This statement allows you to try to insert into a table and if there is already a dateColumn valued ‘2022-07-20’ then gives you the opportunity to then just update the existing record, kind of a two for one.

I don’t personally use this syntax (though maybe I should start) but you can do the same logic explicitly with python in Ignition as I outline here - Insert into database only if the entry doesn't exist - #2 by bkarabinchak.psi

Duplicate data is probably the most common cause of a database being in an unexpected state and leaving people scratching their heads and getting your primary keys correctly defined is the easiest most powerful way to prevent it.

Highly recommend googling and reading up on: primary key, natural key, surrogate key, and how to make a primary key that consists of multiple columns as in the real world, its usually a few pieces of information that uniquely identify a record.

3 Likes

Yes, I do have the reporting module. Does this mean I should be able to do everything I need to in the Reporting Module, without running separate gateway scripts?

Thanks for a detailed response. Sounds like the INSERT…ON DUPLICATE KEY UPDATE would do exactly what I need as far as updating an existing row of data but I would need a third table (Table C) to insert into. Some of the responses I received indicated Table C was not really needed.

Another thing that Primary Key’s (weather natural or surrogate) provide is an Index for the table. You can have Indexes that are not Primary Key’s, but all Primary Keys are an index. It’s a complicated topic and not really with in the scope of this particular discussion. Just know that generally having an Indexed table (particularly as the size of the table increases) will result in better performance.

Table C isn’t needed. You already have all of the data stored that you need. You know that at 1:30am every day the last data to produce your report will have been inserted into the database. So perhaps at 2am you query the data from Table A and B and do the aggregation. There is no need to insert that aggregation into another table.

Something like:

SELECT DATEPART(Year, [Table A].time_stamp) AS YEAR,
               DATEPART(Month, [Table A].time_stamp) AS MONTH,
               DATEPART(Day, [Table A].time_stamp) AS DAY,
               [Table B].Column_B,
               [Table B].Column_C,
               [Table B].Column_D,
               [Table B].Column_E,
               [Table B].Column_F,
               Min([Table A].Results_PSI) AS Min_PSI,
               Max([Table A].Results_PSI) AS Max_PSI,
               Avg([Table A].Results_PSI) AS Avg_PSI,
               Min([Table A].Results_TMP) AS Min_TMP,
               Max([Table A].Results_TMP) AS Max_TMP,
               Avg([Table A].Results_TMP) AS Avg_TMP
FROM [Table A] INNER JOIN [Table B] ON [Table A].time_stamp = [Table B].time_stamp
WHERE time_stamp >= @startDate AND time_stamp <= @endDate
GROUP BY DATEPART(Year, [Table A].time_stamp) AS YEAR, DATEPART(Month, [Table A].time_stamp) AS MONTH, DATEPART(Day, [Table A].time_stamp) AS DAY
ORDER BY YEAR,MONTH,DAY

Obviously, I’m lacking some knowledge about your data structure so this query will not work as is, but hopefully you can use it as an example. In theory this query combines your two queries into one and provides a WHERE clause that will allow it to aggregate the the data between the paramaters @startDate and @endDate.

Note that there are potentially better/more performant ways to write this query, and also that the @ parameter operator is a MSSQL thing and will not work in Ignition or other DB flavors.

In a named query parameters use the : operator.

WHERE time_stamp >= :startDate AND time_stamp <= :endDate

For queries in the Reporting module it varies depending on the query type.

1 Like

Thanks Irose, this is really helpful. I’m not sure I can query the data simultaneously from both Tables A and B at 2am since Table A values that need aggregation are the previous day’s values. If I query the data at 2am, won’t it only aggregate values that accumulated between 12:00am and 2:00am? Sorry if I’m misunderstanding something obvious here.

No, it will aggregate any data between startDate and endDate. Those dates and times are provided by “you” at query time. All you need to be concerned with is that all of the data you need is available.

Ok. As far as running this query on a (daily) schedule, is there a way to do this in the Reporting module or do I have to set up a gateway timer script? It is my understanding that the scheduling tab of the Reporting module is only used for issuing/generating the report. I’m running Ignition 8.0.

There is a difference. The query I provided is for generating the Aggregation, not for inserting data. The reporting module can run the query that aggregates the data, it can not insert data into the database. That will need to be done in a different place with different queries. A query to insert data into Table A, and A query to insert data into Table B. Those could certainly be run from some type of Gateway Event.

I highly recomend the videos at Inductive University.

1 Like

I already have transaction groups set up that insert data into Table A and Table B. The only caveat is that values from Table B have a time stamp of 1:30am of next day and need to be “logged” under the previous day in the report. So the data is there and available. The query example you provided is to be run (after being modified accordingly) daily at say 2am (as you suggested) and that’s why I asked the question about how to set up this query to run daily at 2am. Sorry for my confusion.

No worries.

You would design the report with this (modified) query as the data source. Using the parameters as appropriate for the type of Query Data source that you use.

Then you schedule the report to run, when you want. Doesn’t have to be 2:00am could be 6:00am. What is important is that you resolve the parameters so that the query returns the data that you need. When the report runs, it will run this query and use the returned data.

Can you elaborate what you mean by “using the parameters as appropriate for the type of Query Data source that you use”?

As far as scheduling the report, it sounds like I would just add a schedule in the Schedule Tab of the Reporting Module (daily, at say 2:00am) and that would run the query every time it executed? So if the Action I selected for my schedule is “New Save File Action”, I would have a separate report file every day? Or do I not need to select an Action and instead add another schedule that executes only once a month, with an Action to save to save file?

When you create a report you have several Tabs that allow for configuration of the report. The Data Tab is where you will configure your data sources for the report.

You can add several different types of queries. Of interest in this case are:

Named Query

If you have a Named Query, then you will configure it to accept parameters from the Named Query interface, and then add it to the data sources for the report in the Data Tab. When you add it, there you will be presented with a Parameters Table and the ability to set the parameters to either a constant value or the value of a data key or parameter.

In order to supply a dynamic value, you will need to add Parameters to the Report in the Data tab as well. You can provide an expression to be used as a default value for each parameter.

SQL Query

If you add a SQL Query to the report then you will use a ? as the place holder for parameters.

For instance the where clause would then look something like

WHERE time_stamp >= ? AND time_stamp <= ?

For each ? a Parameter text field will be added to the interface. Here you can provide an expression that will resolve to an appropriate value for that parameter. In this case you will need to keep track of which parameter does what. Not very difficult for simple queries, but as queries become more complex then it can be come harder to keep track of.

Basic SQL Query

If you add a Basic SQL Query then parameters are provided by wrapping them in { }. You can again add parameters to the report and access them here. You can also use the expression language directly here, which can lead to some confusion when reading it (IMHO). Using Named Report Parameters actually makes it fairly readable.

WHERE time_stamp >= {startDate} AND time_stamp <= {endDate}

Which type you use is up to you and what the rest of your application requires. For instance you could use a Named Query which allows for calling this query outside of a report with different that "standard" date ranges, but would also be used when the report runs.

Officially you don't have to schedule the report at all. If your users are only ever going to access this report from within and Ignition client of some sort, then you may not want to schedule it. If, however, you want to say distribute the report to a group of users over email, then that is what the Scheduling Tab is for.

All of this information and more is readily available through the manual Reporting - Ignition User Manual 8.1 - Ignition Documentation and the videos at IU as I posted earlier. I strongly encourage your to check those out, they are extremely helpful and free (two things not often found now days).