SQL query help, find duplicates based on groups

I'm having a hard time getting pointed in the right direction for how to write a SQL query. Using the data example, I would like to find devices where the elements are a perfect match, so in this case device 1 and 2. Can someone help me on keywords to help search for something that would work? In my actual table, Device is an Id, but I showed text for readability.

Device Element
Device1 Element1
Device1 Element2
Device1 Element3
Device1 Element4
Device1 Element5
Device1 Element6
Device2 Element1
Device2 Element2
Device2 Element3
Device2 Element4
Device2 Element5
Device2 Element6
Device3 Element1
Device3 Element2
Device3 Element3
Device3 Element4
Device3 Element5
Device3 Element6
Device3 Element7

What does the actual table look like? Is the data in the Element column also an id? It sounds like you're looking for a where clause in your SQL query.

Does this work?

SELECT device, element
FROM yourTableHere
WHERE device=element

Id = INT (PK,NN, AI)
Element = VARCHAR
Device = INT

Element and device will not match

What is the data in Element? Is it 'Element1, Element2, etc.' or something else?

Substring might help.

SELECT MAX(ID), Device, Element
FROM tblName
Group By ID, Device, Element

This won't work as you think. Grouping by all fields is effectively grouping by no fields

@jlandwerlen I don't quite understand what you want to return. If elements exactly match, I.e. if device 1 and 2 both have elements 1-6, devices 1 and 2 are returned. But what if devices 6 and 10 both have elements 1-4, return those as well?

Yes, exact match. The closest I have come is this:

select device from elements
where element IN (select element from elements)
group by device
having count(*)=7

This only works if I know the count, obviously.

What's your criteria though for filtering which elements you want?

Maybe it would be more obvious if you used actual (or actual but modified if it's private) table data, and a larger array of table rows, and then show a couple examples of the result you'd like to see for different criteria

Here is another example. The basic reason is we have a (long) list of tags given to us and I want to see which ones I can group together because they match (think UDT). So, in this example Motor 1 and 3 are a (full) match. Motor 2 is a one-off (it has one more than the others). There isn't any filtering.

Device Tag
Motor1 Started
Motor1 Stopped
Motor1 Running
Motor2 Started
Motor2 Stopped
Motor2 Running
Motor2 Faulted
Motor3 Started
Motor3 Stopped
Motor3 Running

I think I will just do a hybrid, since this doesn't need to be a production query, I will be lazy and simply loop through counts, up to the max count for a group. It's ugly but I'm just going tor functional at this point. Thanks for the help all.

Is this just for dev?

I do this kind of thing in Excel with a PIVOT table. There's a PIVOT function in MS SQL as well - what DB are you using? I know postgres doesn't have that function (at least from posts on this forum)

E.g.

Or for your first example:

Then you can either visually look for patterns (sorting by some columns helps), or alternatively and more accurately, binary encode each property and then sum them together if they have a 1. Then all the matching sums will have exactly the same properties

E.g.

The two SUM = 63 devices have exactly the same elements

You could do something similar in MS SQL with PIVOT, but I would go with Excel if it's a once off as then you can save the file for future reference

It's not pretty, but this query may help:

SELECT GROUP_CONCAT(device ORDER BY device SEPARATOR ', ') AS identical_devices,
       device_elements
FROM
	(
		SELECT
			device,
			GROUP_CONCAT(element ORDER BY element SEPARATOR ', ') AS device_elements
		FROM
			your_table
		GROUP BY
			device
	) grouped_devices
GROUP BY device_elements;

Note: Written for MySQL. SQL Server (>=2017) would use STRING_AGG instead of GROUP_CONCAT.

3 Likes