SQL Query Group data in Table 2 by time events in Table 1

beckya · April 18, 2022, 5:38pm

I have 2 tables, and I am trying to group the data in Table 2 using the time between Table 1 events, start time being output 1 and end time being output 0. I could use some SQL query help on how to write the query. I want to be able to sum the outputs in Table 2 to show high level what occurred between the start and end time of each event in Table 1 and also display the details but separate based on Table 1 events.

Table 1

Table 2

dkhayes117 · April 18, 2022, 6:53pm

What SQL flavor are you using? MariaDB/MySQL, Postgres, MSSQL…

pturmel · April 18, 2022, 7:13pm

And what version of DB? MySQL v5 has limited tools for this kind of analysis.

beckya · April 18, 2022, 7:59pm

It’s a SQL Server version 15, MSSQL

beckya · April 20, 2022, 6:02pm

@pturmel or @dkhayes117 would you be able to help now that I provided the information you requested?

lrose · April 20, 2022, 6:54pm

Curious how this will work if you can have 0 output entries with no corresponding 1 entry in Table 1.

dkhayes117 · April 20, 2022, 6:55pm

I don’t know MSSQL, but I fiddled with it a bit in MySQL on www.db-fiddle.com. Take a look
DB Fiddle - SQL Database Playground

Create table table1 (
  output tinyint,
  t_stamp datetime
);

Create table table2 (
  output tinyint,
  t_stamp datetime
);

INSERT INTO table1 (output, t_stamp) VALUES (1, '2022-04-16 03:28:47');
INSERT INTO table1 (output, t_stamp) VALUES (0, '2022-04-16 03:29:43');
INSERT INTO table1 (output, t_stamp) VALUES (1, '2022-04-16 03:30:42');
INSERT INTO table1 (output, t_stamp) VALUES (0, '2022-04-16 03:31:39');
INSERT INTO table1 (output, t_stamp) VALUES (0, '2022-04-17 04:44:00');
INSERT INTO table1 (output, t_stamp) VALUES (0, '2022-04-17 04:44:00');
INSERT INTO table1 (output, t_stamp) VALUES (1, '2022-04-18 05:08:07');

INSERT INTO table2 (output, t_stamp) VALUES (1, '2022-04-16 03:28:48');
INSERT INTO table2 (output, t_stamp) VALUES (1, '2022-04-16 03:28:55');
INSERT INTO table2 (output, t_stamp) VALUES (1, '2022-04-16 03:29:03');
INSERT INTO table2 (output, t_stamp) VALUES (0, '2022-04-16 03:29:06');
INSERT INTO table2 (output, t_stamp) VALUES (1, '2022-04-16 03:29:08');
INSERT INTO table2 (output, t_stamp) VALUES (0, '2022-04-16 03:29:11');
INSERT INTO table2 (output, t_stamp) VALUES (1, '2022-04-16 03:29:13');

select start, end, 
	(select ifnull(sum(t2.output),0) from table2 t2 where t2.t_stamp between start and end) as output
from
  (select t_stamp as start, LEAD(t_stamp,1) over (partition by output) as end
  from table1 
  where output = 1) t1;

pturmel · April 20, 2022, 8:12pm

I presume you want to ignore repeat ones and zeros in the output column? If so, running periods from table1 would be something like this:

Select t_stamp as start_ts,
  lead(t_stamp, 1, current_timestamp) Over (Order by t_stamp) as end_ts
From (
  Select t_stamp, output
  From (
    Select t_stamp,
      output,
      lag(output, 1, 0) Over (Order By t_stamp) as lag_out
    From table1
  ) innerq
  Where output != lag_out
) changesq
Where output = 1
Order By start_ts

And similarly, downtime periods from table2 would be something like this:

Select t_stamp as start_ts,
  lead(t_stamp, 1, current_timestamp) Over (Order by t_stamp) as end_ts
From (
  Select t_stamp, output
  From (
    Select t_stamp,
      output,
      lag(output, 1, 1) Over (Order By t_stamp) as lag_out
    From table2
  ) innerq
  Where output != lag_out
) changesq
Where output = 0
Order By start_ts

You would make a join of these two queries where the time spans overlap.

In each of those, innerq obtains the previous record’s output value and then changesq throws away the repeat rows. The outermost Select then produces time spans and throws away the intervening opposite states.

beckya · April 21, 2022, 4:07pm

Thank you @pturmel, this is working so far. May I ask what type of join would you use to get the time spans to overlap?

pturmel · April 21, 2022, 4:55pm

I would probably use a left join something like this:

Select runs.start_ts as run_start,
  runs.end_ts as run_end,
  downs.start_ts as down_start,
  downs.end_ts as down_end
From (.... nested table1 stuff ....) runs
Left Join (.... nested table2 stuff ....) downs
On runs.start_ts < downs.end_ts And downs.start_ts < runs.end_ts

A run that is uninterrupted will show up as one row with nulls for down_start and down_end. A run with multiple stops (including overlapping) will show a row for each stop. If you want to add up downtime only within each run, you will need to clip down_start/end with a case statement to the limits of run_start/end before you use sum().