Calculating Total Time in Seconds

You aren't doing a sum() or count() or anything else that needs grouping, so ditch the group by clause.

ah, so when I do the sum, then I can group?

select myCode, Sum(Seconds)
from (
	select myCode, datediff(second, t_stamp, lead) as seconds
	from (
		SELECT myTable_ndx, myCode,
			t_stamp,
			lead(t_stamp) over (order by t_stamp) AS lead
		from myTable
	) subq
) subqtwo
group by myCode
order by sum(seconds) desc

IT'S ALIVE

I just need to go to a meeting and come back and order by date

omg thanks so much

And probably add the truncating for dates like you had


select stopCode, sum(seconds)
from (
	select stopCode, datediff(second, t_stamp, lead) as seconds
	FROM (
		SELECT mytable_ndx, stopCode,
			t_stamp,
			lead(t_stamp) over (order by t_stamp) AS lead
		from mytable
	) subq
	where t_stamp >= :beginTS and lead < :endLeadTS
) subqtwo
group by stopCode
order by sum(seconds) desc

This is working.
The date parameters seem to be working correctly.

I am not sure it is very efficient, as I could not seem to get the where clause into the innermost query.

edit
also, I need to look more at pturmel’s post, because his code was capturing accurate conditions at the beginning and end of the time range.

Can you do a case in the where clause?
how does current_timestamp work in that post?

  • They are excluding time spans that cross the boundaries. (You have reversed the comparisons compared to my recommendation.) Your count and time total will be off.
  • Using a window function's value in the inner where clause will bog this query down when your table grows large. But the query with the lead() function needs both boundaries on t_stamp to go fast, which is why I used the innermost queries to determine the bounds.

The layering in my example code was not accidental.


It provides a secondary clip for the situation where end_ts is beyond now, like might happen if you set the bounds to cover today. The final entry in your table is still "in progress", so its endpoint should be now, not midnight tonight.

@zacharyw.larson I have gotten into the habit where I test my queries with a large table, say 4-5 million rows, to test efficiency. This will typically reveal areas where I can optimize now, instead of months/years down the road.

Might be useful for you to try as well.

select myCode, sum(datediff(second,
	(CASE WHEN t_stamp < :beginTS THEN :beginTS ELSE t_stamp END),
	(CASE WHEN lead > :endLead THEN :endLead ELSE lead END))) as seconds
FROM (
	SELECT myCode,
		t_stamp,
		lead(t_stamp) over (order by t_stamp) AS lead
	from myTable
) subq
where lead >= :beginTS and t_stamp < :endLead
group by myCode
order by sum(datediff(second, t_stamp, lead)) desc

It isn’t throwing an error.
It looks right.

I think it captures the boundaries precisely at beginTS and endLead, which are parameters for the interval the query is to look at.

Yes, that should get the right answers. However, with no where clause in the inner query, it (the inner query) will always search the entire table. That will crush you when the table grows to any significant length.

Edit: Hmm. It will also get the wrong answer if there isn’t at least one row in the table after :endLead.


I am struggling to figure out how to put

where lead >= :beginTS and t_stamp < :endLead

inside the inner query

I might have fixed the case where there isn’t a row after :endLead:

(CASE WHEN lead > :endLead or lead is null THEN :endLead else lead END))) as seconds

removed null codes

where lead >= :beginTS and ( t_stamp  < :endLead ) and myCode is not null

You don't. You need a different set of comparisons. Look at my example more closely.

It isn't a null in myCode that is the problem, it is a null in lead when there's no row after.

I had problems in both

I am looking at your example lots

It has shown me a lot, and I will keep looking at it as you have recommended.
Thanks very much for your help.

edit
Well, now it is nearly identical to what you had posted.
Not sure why it didn’t work when I first tried it.

select stopCode, sum(datediff(second,
	(CASE WHEN t_stamp < :beginTS THEN :beginTS ELSE t_stamp END),
	(CASE WHEN lead > :endLead or lead is null THEN :endLead ELSE lead END)))
	as seconds
FROM (
	SELECT stopCode, t_stamp,
		coalesce(lead(t_stamp) OVER (ORDER BY t_stamp),
			CASE WHEN current_timestamp < :endLead THEN current_timestamp
				ELSE :endLead END) AS lead
	from mytable
	WHERE t_stamp >= coalesce((
		SELECT TOP 1 t_stamp
		FROM mytable
		WHERE t_stamp < :beginTS
		ORDER BY t_stamp DESC), '1900-01-01')
	AND t_stamp <= coalesce((
		SELECT TOP 1 t_stamp
		FROM mytable
		WHERE t_stamp > :endLead
		ORDER BY t_stamp), '2100-01-01')
) subq
where lead >= :beginTS and t_stamp < :endLead and stopCode is not null
group by stopCode
order by sum(datediff(second, t_stamp, lead)) desc

Let me break this part down for you:

	WHERE t_stamp >= coalesce((
		SELECT TOP 1 t_stamp
		FROM myTable
		WHERE t_stamp < :begin_ts
		ORDER BY t_stamp DESC), '1900-01-01')
	  AND t_stamp <= coalesce((
		SELECT TOP 1 t_stamp
		FROM myTable
		WHERE t_stamp > :end_ts
		ORDER BY t_stamp), '2100-01-01')

For lead() to work as we desire, it always needs two rows, the one we are “on”, and the next one. On the last row, lead is null because there’s no row after it. Which means, to compute durations at the boundaries of our desired time span, we have to include at least one row outside the time span on either end.
We can find the last row before our timespan like so:

SELECT TOP 1 t_stamp FROM myTable WHERE t_stamp < :begin_ts ORDER BY t_stamp DESC

We can find the first row after our timespan like so:

SELECT TOP 1 t_stamp FROM myTable WHERE t_stamp > :end_ts ORDER BY t_stamp

We are allowed to use ORDER BY in these subqueries because we are using TOP 1 to get just the first result. However, these two queries can return no row at all, which would be a null. So these are wrapped in the coalesce() function, supplying a far past date or a far future date when necessary.

Using these timestamps in the inner query gives that query all the rows in the desired timespan, plus one row just outside the timespan on either end. That gives the lead() function something to work with at the boundaries.
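The layered idea above can be sketched end-to-end. This is an illustrative sketch only, using Python with SQLite (which also supports lead(); datediff is replaced with julianday arithmetic, and scalar MIN/MAX stand in for the CASE clipping) — not the exact T-SQL from this thread:

```python
import sqlite3

# Illustrative sketch of the clipping technique (requires SQLite 3.25+ for
# window functions). Table and column names mirror the thread's examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (t_stamp TEXT, myCode INTEGER);
    INSERT INTO myTable VALUES
        ('2024-01-01 09:00:00', 1),  -- row just before the window
        ('2024-01-01 10:30:00', 2),  -- inside the window
        ('2024-01-01 11:00:00', 1),  -- inside the window
        ('2024-01-01 13:00:00', 3);  -- row just after the window
""")

begin_ts, end_ts = '2024-01-01 10:00:00', '2024-01-01 12:00:00'

# Each interval [t_stamp, lead) is clipped to [begin_ts, end_ts); seconds are
# computed from julianday (days) * 86400, rounded to avoid float fuzz.
rows = conn.execute("""
    SELECT myCode,
           SUM(CAST(ROUND((julianday(MIN(lead, :end_ts))
                         - julianday(MAX(t_stamp, :begin_ts))) * 86400) AS INT))
    FROM (
        SELECT myCode, t_stamp,
               lead(t_stamp) OVER (ORDER BY t_stamp) AS lead
        FROM myTable
    )
    WHERE lead >= :begin_ts AND t_stamp < :end_ts
    GROUP BY myCode
    ORDER BY 2 DESC
""", {"begin_ts": begin_ts, "end_ts": end_ts}).fetchall()
print(rows)  # [(1, 5400), (2, 1800)]
```

Note how the row starting at 09:00 contributes only the 10:00–10:30 slice to code 1, and the row starting at 11:00 is clipped at 12:00 — exactly the boundary behavior the TOP 1 subqueries exist to make possible.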


You might want to consider using a table with two dates, let’s call them date_beg and date_end, in addition to all the other data you want with the event. When an event starts insert a record into the table, filling in date_beg but leaving date_end null. When the event ends, update the record, putting the end time into date_end. The query to determine the duration becomes trivial at this point.

These events don't end. Your approach would require recording each timestamp in two places. Not good. Competent databases can do intervals with lead() or lag() with essentially zero cost, guaranteeing consistent results.

When you said “easily generated”, did you mean there is an easier way to get the time between timestamps and sum them up?

Or did you mean a query similar to discussed here used with
lead(t_stamp) over (partition by machine order by t_stamp)
to omit other machines if the table has many machines

I meant using lead(). If you are using a query to get multiple machines’ results at once, you would use partition by machine in the OVER clause so that lead() looks for the next timestamp for the matching machine. If you use a WHERE clause in the query to only return results for a single machine, you can omit that part of the OVER clause.

I didn’t mean “easy” from your point of view, but from the database point of view.
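To make the partition by behavior concrete, here is a tiny sketch (illustrative table and column names, SQLite standing in for the thread's database):

```python
import sqlite3

# With PARTITION BY machine, lead() restarts per machine, so each row's
# "next timestamp" comes from the same machine only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (machine TEXT, t_stamp TEXT);
    INSERT INTO events VALUES
        ('A', '2024-01-01 10:00:00'),
        ('B', '2024-01-01 10:05:00'),
        ('A', '2024-01-01 10:20:00');
""")
rows = conn.execute("""
    SELECT machine, t_stamp,
           lead(t_stamp) OVER (PARTITION BY machine ORDER BY t_stamp) AS lead
    FROM events
    ORDER BY machine, t_stamp
""").fetchall()
# Machine A's 10:00 row leads to A's 10:20 row, skipping B's 10:05 entirely;
# the last row of each partition gets a null lead.
print(rows)
```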

select myCode as Fault_Code, count(myCode) as Stop_Occurrences,
	sum(datediff(second,
		(CASE WHEN t_stamp < :beginTS THEN :beginTS ELSE t_stamp END),
		(CASE WHEN lead > :endLead or lead is null THEN :endLead ELSE lead END)))
		as Total_Seconds
FROM (
	SELECT myCode, t_stamp,
		coalesce(lead(t_stamp) OVER (ORDER BY t_stamp),
			CASE WHEN current_timestamp < :endLead THEN current_timestamp
				ELSE :endLead END) AS lead
	from myTable
	WHERE t_stamp >= coalesce((
		SELECT TOP 1 t_stamp
		FROM myTable
		WHERE t_stamp < :beginTS
		ORDER BY t_stamp DESC), '1900-01-01')
	AND t_stamp <= coalesce((
		SELECT TOP 1 t_stamp
		FROM myTable
		WHERE t_stamp > :endLead
		ORDER BY t_stamp), '2100-01-01')
) subq
where lead >= :beginTS and t_stamp < :endLead and myCode is not null
group by myCode
order by sum(datediff(second, t_stamp, lead)) desc

sample results

How do I append to the table to show the reason from a dictionary?
I have a dictionary with all the code matched to text for the reason.
For example in project library I have a script dictionary:

machineCode_Map = {
    42: "We lost the towel",
    66: "The drones betrayed us",
}

def get_machineCode(v):
    return machineCode_Map[v]

How can I append to the table to show the reason text next to the codes?

I know one way is for me to make a table of my dictionary, then join.
I have many machines though, and I was going to make this work for multiple machines.
I have made a transaction group already with Machine_ID columns.

Therefore, I am hoping that on the Perspective table component I can somehow attach the dictionary values based on the code in that row. Maybe this is unlikely.
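One scripted way to do this (a sketch only — the nested-dictionary layout, the `add_reasons` helper, and the row shape are assumptions for illustration, not Ignition API) is to decorate the query's result rows in a script transform, keeping one dictionary per machine:

```python
# Hypothetical per-machine reason dictionaries, keyed by machine name.
machineCode_Maps = {
    "blueMachine": {
        42: "We lost the towel",
        66: "The drones betrayed us",
    },
}

def add_reasons(rows, machine):
    """Return a copy of rows (list of dicts, e.g. from a query binding)
    with a 'Reason' column added by looking up each row's Fault_Code."""
    code_map = machineCode_Maps.get(machine, {})
    out = []
    for row in rows:
        row = dict(row)  # copy so the original result set is untouched
        row["Reason"] = code_map.get(row["Fault_Code"], "Unknown code")
        out.append(row)
    return out

result = add_reasons([{"Fault_Code": 42, "Total_Seconds": 120}], "blueMachine")
print(result)
```

Using `.get()` with a fallback means an unmapped code shows "Unknown code" instead of raising a KeyError, which matters once there are 60 machines with their own dictionaries.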

That's what I would do.


What if you had 60 machines with unique dictionaries?


where lead >= :beginTS and ( t_stamp  < :endLead ) and Code is not null and ( :machine=-1 )

I got this working to show me all the machines’ codes.

I was thinking like

where lead >= :beginTS and ( t_stamp  < :endLead ) and Code is not null 
and ( :machine=-1 or Line=Case when :machine=1 then 'blueMachine' else null end)