Machine states durations grouped by time in SQL best options

pturmel · March 5, 2024, 1:18pm

Sounds like you've correctly identified the key factors here.

I usually use a table-generating function to produce the sequence of boundary timestamps, then either lag() or lead() on those to get grp_start_ts and grp_end_ts per period in the row together.

Separately (separate subquery), I gather the events as you describe, using lead() to get the ev_start_ts and ev_end_ts for each. Starting with the last event before the overall time span ensures that you have the event that spans that first boundary.

I then left join the groups subquery with the events subquery, capturing every event for a period where it overlaps. Like so:

... ON periods.grp_start_ts < events.ev_end_ts AND events.ev_start_ts < periods.grp_end_ts

This duplicates event rows into all periods where they overlap. This allows you to truncate to the period boundaries to get the right value in the period, without losing the overall event length.

Your outer query would sum the truncated durations, grouping by grp_start_ts.

If you have complex shift scheduling that you want to handle, consider using something like this for the period generation: