SQL help: organizing user-entered data combined with automatic data over shifts

data sample:

red	    Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	0	0
blue	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	65	7
blue	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	65	8
green	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	11	1
purple	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	11	3
purple	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	11	5
orange	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	12	8
orange	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	12	8
teal	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	14	9
teal	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	14	5
black	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	15	9
black	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	15	1
magenta	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	18	7
brown	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	19	9
gray	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	19	1
gray	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	19	1

I need only one row for each color in my query, with the values summed.

Ideal output:

Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	sumOfColumn	sumOfNextColumn

My query is complex.

It has joins, text filtering, and lots of summing with math.
I think if I could see how to group specific colors so they sum correctly, I can solve the rest of my query.

I think I need to use rank.

Updated 2022-04-11 16:50

Is SELECT DISTINCT what you want?


I will revise my top comment.

I should capture the additional need for summing.

I am able to apply a rank to the color column.
I can even get it to assign 1 to the highest value in that last column.

Right now I am worried about how much I join and union; a subquery with rank on top of that will be a pain.

I'm not sure if I can use DISTINCT, but I will try it out.

Sounds like you are building quite the complicated mousetrap. It may be worth taking another look at how you are getting from A to B. You can have separate queries and use scripting to combine the datasets. That may not be easier, but it might allow you to untangle your SQL.

Edit:
Also, if you want more specific answers, post your code.

I avoid posting company code.

I want to learn how to receive that input set, and then get that output.
Then I will try to adapt it to the complex version.

This is why my query is complex:
I get some standards from manual entry.
I get some automatic data from transaction groups.

How do we know what's redundant? Is it when all columns are the same, like the orange rows? Or is the sum already in here and you're looking for the max values?


The highest value in the time period, grouped by machine/color.

Or, said another way, I need to sum the maxes for each color during that time period.


Right now, when I sum, my data is wrong because, for example, my orange will have 5 entries in the sum of all machines over the time period. It said production was through-the-roof high, haha.

My current solution is to apply a rank to the data and use a subquery.
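Roughly what I mean, as a minimal sketch (colorData, val1, and val2 are placeholder names standing in for the sample data, not my real tables):

select startTime, endTime,
       sum(val1) as sumOfColumn, sum(val2) as sumOfNextColumn
from (
    -- rank each color's rows so its highest val2 gets rn = 1
    select color, startTime, endTime, val1, val2,
           row_number() over (partition by color, startTime, endTime order by val2 desc) as rn
    from colorData
) ranked
where rn = 1
group by startTime, endTime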

I was looking at this post:

One answer says the timing test on the CROSS APPLY is very fast, and I didn't see a subquery in it.
I have never used CROSS APPLY though, and when I tried it I got a bunch of errors.

Another answer got some votes using WITH TIES, but I haven't gotten around to researching that either.

I think I will research them both as time permits. The DISTINCT that josborn mentioned might work really well, but I have to research it more too.

Using rank for now, but the query is not finished. I clean up the data, then do the operations I was doing on the subquery. I don't think it is efficient.


My query might have more issues though.

I kind of do a query for each time period and UNION them together.
Something like:

SELECT 
DATEADD(hh,23,DATEADD(dd,DATEDIFF(dd,0,GETDATE()-1),0)) as 'Start',
DATEADD(hh,7,DATEADD(dd,DATEDIFF(dd,0,GETDATE()-0),0)) as 'End',
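To spell out what that date math does (these are SQL Server-style DATEADD/DATEDIFF calls):

-- DATEDIFF(dd, 0, GETDATE()-1)   -> whole days from day 0 (1900-01-01) to yesterday
-- DATEADD(dd, <that>, 0)         -> yesterday at midnight
-- DATEADD(hh, 23, <that>)        -> yesterday at 23:00
-- so 'Start' is 23:00 yesterday, and 'End' (the hh,7 version on GETDATE()-0) is 07:00 today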

Then I do a bunch of math selects.
Some are like sum(x/y); however, because x or y might be zero,
I have some isnull(), nullif(), and format(, 'p') in there.

Then I also do a join, because some values came from manual entry and some from transactions.
So it is like select, join, union, repeat for a few days of 8-hour shifts.

It returned 10 rows in under 0.5 s before implementing rank.
I might make a separate thread for trying to clean up the entry side.

Is there a better way I can write this division with summing?

-- totals, treating NULL sums as 0
isnull(sum(table.count),0)   as count,
isnull(sum(table.target),0)   as target,
-- quantity / target shown as a percentage; nullif() avoids divide-by-zero,
-- and the cast to float avoids integer division
format( isnull(sum(cast(isnull(table.quantity,0) as float))/nullif(sum(isnull(tableB.target,0)),0),0)  ,'P') as this,

Looks fine to me. Are you casting table.quantity as a float because target is a float?

@josborn
In order to get the decimal values on the division, for showing percentages.
Both are integers.
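An equivalent way to force decimal division without the cast would be multiplying by 1.0 first (untested sketch, same placeholder table names as above):

-- * 1.0 promotes the integer sum to decimal before dividing
format( isnull( sum(isnull(table.quantity,0)) * 1.0 / nullif(sum(isnull(tableB.target,0)),0), 0), 'P') as this,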

Can I do this:

select * from (
subquery here ) t

union

select * from t

What is the natural key of your table?

You can run something like

ALTER IGNORE TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);

and this would drop all redundant rows (based on the unique constraint you define).

You can play with this a bit: do a
INSERT INTO newTemporaryTable SELECT * FROM whateverYourRealTableIs

and run ALTER IGNORE TABLE newTemporaryTable ADD UNIQUE INDEX myindex (A, B, C, D);

so you can avoid touching the "real" data, and have this newTemporaryTable (or whatever you call it) be a version of the real data, without duplicates.
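Put together, the scratch-copy route would look something like this (ALTER IGNORE TABLE is MySQL-specific syntax and was dropped in MySQL 5.7, so treat this as a sketch; the table and column names are placeholders):

CREATE TABLE newTemporaryTable LIKE whateverYourRealTableIs;
INSERT INTO newTemporaryTable SELECT * FROM whateverYourRealTableIs;
-- duplicate rows (by A, B, C, D) are silently dropped while the unique index is built
ALTER IGNORE TABLE newTemporaryTable ADD UNIQUE INDEX myindex (A, B, C, D);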

I don't mean to keep harping on about UNIQUE constraints, lol. But they keep being a very viable option for what you need.


I might also add that anytime you are considering modifying data, particularly if that means dropping rows of data, you should consider using transactions. That way, if something happens, you can roll back to the original state.
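For example, something along these lines (T-SQL shown; the table name and the row-count check are hypothetical placeholders):

BEGIN TRANSACTION;

-- hypothetical clean-up statement
DELETE FROM newTemporaryTable WHERE someColumn IS NULL;

-- sanity check before committing: undo everything if far more rows were hit than expected
IF @@ROWCOUNT > 100
    ROLLBACK TRANSACTION;
ELSE
    COMMIT TRANSACTION;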


@bkarabinchak.psi
I appreciate your advice. You have taught me the benefits of indexes.
I have not put in indexes because I have not had time to test them the way I would like to.

@lrose
I appreciate the help.
I have questions about what you said, but I haven't had time to ask them yet.

I understand. Sometimes you inherit an index-less database and you can't exactly rework the entire system. So my other suggestion would be running a GROUP BY on the natural key of your table, using that to get the right IDs, then joining onto the full table for only the appropriate IDs. You can check out how this is done here: SQL - Select unique rows from a group of results - Stack Overflow

@bkarabinchak.psi
I have a giant Word document of my composing, revising, adding, and removing, and above that I have a list of items I want to do. I have added researching both of your index posts to it.
Above that is a priority list from the people above me. I don't mean to talk past you; I mean that I have put this on the list too. You keep giving me cool and interesting ways to improve things. I need to catch up, but I also need to meet deadlines, haha.


I could use some answers to this question.

The topic has drifted from the title I originally made; through a process of learning and discovery, the things I needed help with have kind of changed for this query.

In the same query, can you start referencing subqueries by just the subquery name?

Can I do this:

select * from (
subquery here ) t

union

select * from t

I will talk about whether I should do this next, because I have questions about timing and grouping.

I don't believe

select * from (
subquery here ) t

union

select * from t

would work - as the second query in your UNION is going to be looking directly at the database for a table named t. Treat each query in a UNION as a completely separate object. I'm also unsure what you are trying to accomplish with such a query - to double the data from the subquery?
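If the goal is to reuse one subquery by name in a few places, the usual tool for that is a CTE. A minimal sketch with placeholder names (not tailored to your actual query):

with t as (
    select color, max(val2) as maxVal2
    from someTable
    group by color
)
select color, maxVal2 from t
union all
select 'all colors', max(maxVal2) from t;

Both branches of the UNION reference the same named result set, which is what a derived table alias can't do.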

If you look at the second SO link I posted, there's a decent example as the top answer -

select  r.registration, r.recent, t.id, t.unittype
from    ( 
    select  registration, max([date]) recent
    from   someTable
    group by 
        registration
    ) r
left outer join someTable t
on  r.recent = t.[date]
and r.registration = t.registration 

Where the inner subquery with the GROUP BY is your chance to get the IDs (or some other matching column) of the rows you want to keep, and then you join that to the full table, so you only extract the appropriate rows.

The last answer is probably a bit less performant, but perhaps easier to read; you can also do something like this:

select registration,ID,date,unittype
from your_table
where (registration, date) IN (select registration,max(date)
                            from your_table
                            group by registration)  

The key here is using GROUP BY to find the rows you want to keep and then using that information to grab those rows from the full table.

I just reread your original question and I am a bit confused, as it seems like you might just want a GROUP BY/SUM?

I don't know what the columns are, but for these two rows -

blue	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	65	7
blue	Fri Apr 08 07:00:00 EDT 2022	Fri Apr 08 15:00:00 EDT 2022	65	8

Do you want to sum both of them? Or only grab the last one with the 65/8?

Thanks

I will work on it and post back with questions or findings.


About half a second for 10 rows.

If you can help me clean it up, I'd appreciate it;
if not, it seems to be working fine for now anyway.
Thanks for the help.

It is similar to this:

select 
DATEADD(hh,15,DATEADD(dd,DATEDIFF(dd,0,GETDATE()-3),0)) as 'Begin',
DATEADD(hh,23,DATEADD(dd,DATEDIFF(dd,0,GETDATE()-3),0)) as 'Conclude',
'2 Three Days Ago' as Shift,
Sum(qty1) as qty1, sum(qty2) as qty2,
format(isnull((cast(sum(qty1) as float) - sum(qty2))  / nullif( sum(qty2),0) ,0) ,'p') as thisNumber
from 
(SELECT  particularMachine,
Row_Number() over (partition by  particularMachine order by Hours desc) as particularMachine2,
isnull(autoTable.qty1Tools,0) as qty1, 
isnull(autoTable.qty2Tools,0) as qty2,
sum(Hours) as Hours
FROM autoTable left join manualTable on 
autoTable.Line=manualTable.particularMachine
and Format(autoTable.t_stamp, 'MM/dd/yy')= Format(manualTable.RecordedDate, 'MM/dd/yy')  
WHERE autoTable.Line like '%Archer%' and manualTable.Hours>0 
and autoTable.t_stamp BETWEEN DATEADD(hh,15,DATEADD(dd,DATEDIFF(dd,0,GETDATE()-3),0))
							AND DATEADD(hh,23,DATEADD(dd,DATEDIFF(dd,0,GETDATE()-3),0))
group by autoTable.qty1Tools, autoTable.qty2Tools,manualTable.Hours,particularMachine
)t
where particularMachine2<2

Union

select 
DATEADD(hh,7,DATEADD(dd,DATEDIFF(dd,0,GETDATE()-3),0)) as 'Begin',
DATEADD(hh,15,DATEADD(dd,DATEDIFF(dd,0,GETDATE()-3),0)) as 'Conclude',
'1 Three Days Ago' as Shift,
Sum(qty1) as qty1, sum(qty2) as qty2,
format(isnull((cast(sum(qty1) as float) - sum(qty2))  / nullif( sum(qty2),0) ,0) ,'p') as thisNumber
from 
(SELECT  particularMachine,
Row_Number() over (partition by  particularMachine order by Hours desc) as particularMachine2,
isnull(autoTable.qty1Tools,0) as qty1, 
isnull(autoTable.qty2Tools,0) as qty2,
sum(Hours) as Hours
FROM autoTable left join manualTable on 
autoTable.Line=manualTable.particularMachine
and Format(autoTable.t_stamp, 'MM/dd/yy')= Format(manualTable.RecordedDate, 'MM/dd/yy')  
WHERE autoTable.Line like '%Archer%' and manualTable.Hours>0 
and autoTable.t_stamp BETWEEN DATEADD(hh,7,DATEADD(dd,DATEDIFF(dd,0,GETDATE()-3),0))
							AND DATEADD(hh,15,DATEADD(dd,DATEDIFF(dd,0,GETDATE()-3),0))
group by autoTable.qty1Tools, autoTable.qty2Tools,manualTable.Hours,particularMachine
)t
where particularMachine2<2



order by Begin desc

What database type are you using? Are you writing this in your database's workbench?

Oh wait, you said clean it up as in the syntax? It runs fast enough for you?