Machine states durations grouped by time in SQL best options

I have a table of machine state change events.

Initially I do two things I believe to be right for sure.
I get all the events in the range.
I get one event prior to the range.
I set the prior event tstamp to the range start.


Then it gets murky for me.

Problem:
States are active across the grouped time ranges.
If I truncate durations to the end of (shift, day, week, etc) then I won't be able to account for all the time.

Option 1:
I expect my best option is to create tstamps on the grouping edges.
What is a good way to do that, or is there a better solution?

Option 2:
Maybe I can get the previous state and tstamp, then set that tstamp to the start of the range in each grouping?

I thought I had a solution, but it did not work.

I think I need some way where I can detect a state overlaps a boundary.
Then I need to split it into two new rows.

Sounds like you've correctly identified the key factors here.

I usually use a table-generating function to produce the sequence of boundary timestamps, then either lag() or lead() on those to get grp_start_ts and grp_end_ts per period in the row together.

Separately (separate subquery), I gather the events as you describe, using lead() to get the ev_start_ts and ev_end_ts for each. Starting with the last event before the overall time span ensures that you have the event that spans that first boundary.

I then left join the groups subquery with the events subquery, capturing every event for a period where it overlaps. Like so:

... ON periods.grp_start_ts < events.ev_end_ts AND events.ev_start_ts < periods.grp_end_ts

This duplicates event rows into all periods where they overlap. This allows you to truncate to the period boundaries to get the right value in the period, without losing the overall event length.

Your outer query would sum the truncated durations, grouping by grp_start_ts.

If you have complex shift scheduling that you want to handle, consider using something like this for the period generation:

1 Like

Zach, I think you have the right idea in your problem statement. I had the same issue when I was trying to create this state chart and then sum up the "up time" as displayed on the left hand side.
The amount of data was not huge in my case (depending on typical time range selected) so I chose to: first create a SQL query like you mentioned that retreives the state values before and after the selected date range, then itterate over dataset using (Python) and update the timestamps of any records that are outside of the selected date range to match the selected start/end date depending on if record timestamp< start date or record timestamp> end date. Also, while I itterate, I sum the timestamps (total minutes I beleive) where the state of the machine is "up" or "running" and divide by total total selected date range minutes to get up time %. I could have done more in SQL but since I am using a Ignition historian tag for my states, it gets messy quick with the partitioned tables.

2 Likes

I appreciate learning both those techniques.

I tried another way, and it did not work at all.
Tried to derive the boundaries from existing tstamps, but it will not work as you can have multiple boundaries between the state changes, like if a machine is down a day.

I had some issues with using a table generator.
I did my best to adapt the idea of the table generator.



select 
  Line 
, tstamp
, case 	when tstamp < dateadd(hh,  7, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)) and leadTstamp > dateadd(hh,  7, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)) then dateadd(hh,  7, DATEADD(dd,DATEDIFF(dd,0,tstamp),0))
		when tstamp < dateadd(hh, 19, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)) and leadTstamp > dateadd(hh, 19, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)) then dateadd(hh, 19, DATEADD(dd,DATEDIFF(dd,0,tstamp),0))
		else leadTstamp end as leadTstamp
, State


from allLeadsSQ 

union 

select 
  Line  
, case 	when tstamp < dateadd(hh,  7, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)) and leadTstamp > dateadd(hh,  7, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)) then dateadd(hh,  7, DATEADD(dd,DATEDIFF(dd,0,tstamp),0))
		when tstamp < dateadd(hh, 19, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)) and leadTstamp > dateadd(hh, 19, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)) then dateadd(hh, 19, DATEADD(dd,DATEDIFF(dd,0,tstamp),0))
		else leadTstamp end as  tstamp
, leadTstamp
, State


from allLeadsSQ 
where tstamp < dateadd(hh,  7, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)) and leadTstamp > dateadd(hh,  7, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)) 

or( tstamp < dateadd(hh, 19, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)) and leadTstamp > dateadd(hh, 19, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)))


I adapted thinking this:

  1. Get the rangestart, previous state
  2. Get the tstamps and states
  3. Fetch all tstamps and truncate preboundaries
    unioned with post boundaries only

I think this is a variation of what Phil is doing, where I will use a union instead of a join.

This technique does not work.
Now I just have to figure out how to write the below such that tstamp is in UTC, and the 7 and 19 are in eastern easily. Maybe I can just subtract 5 hours to make sure the tstamp based shift end is the right day.

case 	when tstamp < dateadd(hh,  7, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)) and leadTstamp > dateadd(hh,  7, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)) then dateadd(hh,  7, DATEADD(dd,DATEDIFF(dd,0,tstamp),0))
		when tstamp < dateadd(hh, 19, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)) and leadTstamp > dateadd(hh, 19, DATEADD(dd,DATEDIFF(dd,0,tstamp),0)) then dateadd(hh, 19, DATEADD(dd,DATEDIFF(dd,0,tstamp),0))
		else leadTstamp end as  tstamp

Maybe for EST:

CONVERT(datetime, SWITCHOFFSET(	DATEADD(hh,  7,DATEADD(dd,DATEDIFF(dd,0,dateadd(hh,-5,tstamp)),0)) at time zone 'Eastern Standard Time'
, DATEPART(TZOFFSET,			DATEADD(hh,  7,DATEADD(dd,DATEDIFF(dd,0,dateadd(hh,-5,tstamp)),0)) at time zone 'Eastern Standard Time'
AT TIME ZONE 'UTC')))

I think that means that whatever time it is, the current day at 7am eastern is checked.
Or that tomorrow's 7am might also be checked.
I think my alternative is to convert the UTC tstamp to EST, then move the time to the shift end, and convert back.

Why are you not using a database's timezone-aware datetime type (datetime2 for MS SQL Server, timestamptz for PostgreSQL, timestamp for MariaDB) ? Then there's no conversion required on the query arguments passed as JDBC parameters.

1 Like

That is a great question.
Here is the answer as best that I understand it.

The datetime type was not set datetime2 from the start on the gateway.
The gateway is setup EST.
The table for states has timestamps in UTC.

I don't know if we can change the datetime type to datetime2 now.
I think it will break things.

I don't know enough about setting the datetime type to datetime2 to know for sure that I understand this.

I know that when I checked to see what the datetime type was set to currently, I think it was set to datetime. I think it has been that way a long time. I don't know how many things switching to datetime2 would impact.

I think this applied to me:

Or this one:

I think one way or another, I am kind of stuck.
I don't know for sure if nothing will break.

I do know that the Gateway has been running a long time without the datetime type being switched to datetime2.
I don't fully understand the implications of the change.

Maybe you can help me understand it more and if I am stuck or if it is a simple change. I think it sounds like the change would break things.

Am I stuck?

Consider adding a new column of type datetime2.

Then, from a connection in EST, do UPDATE the_table SET newDT2 = oldDT.

That should replicate the timestamps into the timezone-aware columns with the correct timezone.

Do some test queries using newDT2 to see if it behaves as desired. Exercise some inserts, too.

Or do all of your testing in a separate table, possibly with just a subset of the data. Exercise any ALTER TABLE commands to see that it works as you expect.

You don't have to just study the reference manuals. Test.

If it all works, start migrating by making all inserts populate both timestamp columns. Then making all queries reference just the new column. Then you can stop inserting to the old column. Then you can drop the old column.

2 Likes

How do I demo the improvement?

Is there a way that I can write a query that demonstrates the utility of the datetime2 type?

Or is it only functional if the db is configured with datetime type set to datetime2?

If you execute the SQL in your DB administrator application you should get a time to execute on the statusbar or the on-screen event logger. Compare the two.

Then create an index on the new column and check the execution time again.

How do create a table generating function to get the sequence of boundary tstamps?

I linked a PostgreSQL example (in my very first comment). I don't know how to do it in MS SQL Server off the top of my head, but it would likely be a recursive CTE.

1 Like

I was pretty confused about this. Took me a while to figure how to select, what to group, and how the join would work.
On left join though, I got a row in my events to repeat for each range from my days table.

I didn't do a table generating function yet. I did not want to use recursion.
It would be better to have such a table.

I joined the production days instead for now, which is not as good.

select 
  CONVERT(datetime, SWITCHOFFSET(	DATEADD(hh,  7,DATEADD(dd,DATEDIFF(dd,0,ReportDate),0))  at time zone 'Eastern Standard Time'
			, DATEPART(TZOFFSET,	DATEADD(hh,  7,DATEADD(dd,DATEDIFF(dd,0,ReportDate),0))  at time zone 'Eastern Standard Time' AT TIME ZONE 'UTC')))
			as startPeriodUTC
   
, CONVERT(datetime, SWITCHOFFSET(	DATEADD(hh, 19,DATEADD(dd,DATEDIFF(dd,0,ReportDate),0)) at time zone 'Eastern Standard Time'
			, DATEPART(TZOFFSET,	DATEADD(hh, 19,DATEADD(dd,DATEDIFF(dd,0,ReportDate),0))  at time zone 'Eastern Standard Time' AT TIME ZONE 'UTC')))
			as endPeriodUTC 

I also union-ed with between 19 and 31 hours for the second shift of two.


When I got the startPeriodUTC and endPeriodUTC columns in the row with my tstamps and leads though, I was able to simply truncate them like you said. It is really nice.

The convert, switchoffset() version was not as fast as the dateadd datepart at time zone version.

, isnull(lead(tstamp)over(partition by LineID order by tstamp),
	dateadd(hh,  7+DATEPART(HOUR, DATEADD(dd,DATEDIFF(dd,0,:dateEnd  ),0) AT TIME ZONE 'Eastern Standard Time'  AT TIME ZONE 'UTC')
		,  DATEADD(dd,DATEDIFF(dd,0,:dateEnd),0))
) as leadTstamp

Would option 2 perform better for me?

Option 1
Joining the shift boundaries and truncating tstamps after I have tstamps and leads

Option 2
Union a dummy state value at shift boundaries early

Or it is just too difficult to fetch the appropriate prior state value when multiple dummy values are in the state column in a row?