SQL query help, find duplicates based on groups

jlandwerlen · November 21, 2022, 7:09pm

I'm having a hard time getting pointed in the right direction for how to write a SQL query. Using the data example, I would like to find devices where the elements are a perfect match, so in this case device 1 and 2. Can someone help me on keywords to help search for something that would work? In my actual table, Device is an Id, but I showed text for readability.

Device	Element
Device1	Element1
Device1	Element2
Device1	Element3
Device1	Element4
Device1	Element5
Device1	Element6
Device2	Element1
Device2	Element2
Device2	Element3
Device2	Element4
Device2	Element5
Device2	Element6
Device3	Element1
Device3	Element2
Device3	Element3
Device3	Element4
Device3	Element5
Device3	Element6
Device3	Element7

zacht · November 21, 2022, 7:25pm

What does the actual table look like? Is the data in the Element column also an id? It sounds like you're looking for a where clause in your SQL query.

Does this work?

SELECT device, element
FROM yourTableHere
WHERE device=element

jlandwerlen · November 21, 2022, 7:36pm

Id = INT (PK,NN, AI)
Element = VARCHAR
Device = INT

Element and device will not match

zacht · November 21, 2022, 7:42pm

What is the data in Element? Is it 'Element1, Element2, etc.' or something else?

Substring might help.

Gaurav.Shitole · November 21, 2022, 7:43pm

SELECT MAX(ID), Device, Element
FROM tblName
Group By ID, Device, Element

nminchin · November 21, 2022, 8:00pm

This won't work as you think. Grouping by all fields is effectively grouping by no fields

@jlandwerlen I don't quite understand what you want to return. If elements exactly match, I.e. if device 1 and 2 both have elements 1-6, devices 1 and 2 are returned. But what if devices 6 and 10 both have elements 1-4, return those as well?

jlandwerlen · November 21, 2022, 9:06pm

Yes, exact match. The closest I have come is this:

select device from elements
where element IN (select element from elements)
group by device
having count(*)=7

This only works if I know the count, obviously.

nminchin · November 21, 2022, 9:28pm

What's your criteria though for filtering which elements you want?

Maybe it would be more obvious if you used actual (or actual but modified if it's private) table data, and a larger array of table rows, and then show a couple examples of the result you'd like to see for different criteria

jlandwerlen · November 21, 2022, 9:46pm

Here is another example. The basic reason is we have a (long) list of tags given to us and I want to see which ones I can group together because they match (think UDT). So, in this example Motor 1 and 3 are a (full) match. Motor 2 is a one-off (it has one more than the others). There isn't any filtering.

Device	Tag
Motor1	Started
Motor1	Stopped
Motor1	Running
Motor2	Started
Motor2	Stopped
Motor2	Running
Motor2	Faulted
Motor3	Started
Motor3	Stopped
Motor3	Running

jlandwerlen · November 21, 2022, 9:52pm

I think I will just do a hybrid, since this doesn't need to be a production query, I will be lazy and simply loop through counts, up to the max count for a group. It's ugly but I'm just going tor functional at this point. Thanks for the help all.

nminchin · November 21, 2022, 9:54pm

Is this just for dev?

I do this kind of thing in Excel with a PIVOT table. There's a PIVOT function in MS SQL as well - what DB are you using? I know postgres doesn't have that function (at least from posts on this forum)

E.g.

Or for your first example:

Then you can either visually look for patterns (sorting by some columns helps), or alternatively and more accurately, binary encode each property and then sum them together if they have a 1. Then all the matching sums will have exactly the same properties

E.g.

The two SUM = 63 devices have exactly the same elements

You could do something similar in MS SQL with PIVOT, but I would go with Excel if it's a once off as then you can save the file for future reference

WillMT10 · November 22, 2022, 3:51am

It's not pretty, but this query may help:

SELECT GROUP_CONCAT(device ORDER BY device SEPARATOR ', ') AS identical_devices,
       device_elements
FROM
	(
		SELECT
			device,
			GROUP_CONCAT(element ORDER BY element SEPARATOR ', ') AS device_elements
		FROM
			your_table
		GROUP BY
			device
	) grouped_devices
GROUP BY device_elements;

Note: Written for MySQL. SQL Server (>=2017) would use STRING_AGG instead of GROUP_CONCAT.