Dataset find missing value

jlandwerlen · September 3, 2020, 8:09pm

Has anyone created a script to find missing values between the min and max of a dataset column? Basically, if a certain column has a min value of 1 and a max value of 100, what's the best way in a script to determine whether there are any missing values between those, with a base of 1?

jlandwerlen · September 3, 2020, 8:13pm

Never mind, Google was my friend:

lst = [1, 2, 4, 6, 7, 9, 10] 

for x in range(lst[0], lst[-1]+1):  
	if x not in lst:
		print x

Kevin.Herron · September 3, 2020, 8:20pm

Is your list always going to be sorted for you?

jlandwerlen · September 3, 2020, 8:25pm

Oh, good point. Maybe, maybe not. Yep, the code I stole won't work well unless the list is sorted.
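To see concretely why sorting matters: the first snippet uses lst[0] and lst[-1] as the bounds, which are only the min and max when the list is sorted. A quick check on a hypothetical unsorted list (the sketch uses `print(...)` with a single argument, which prints the same in Python 2 and 3):

```python
# Hypothetical check: the "first/last element" version on an UNSORTED list
lst = [4, 1, 2, 6, 7, 9, 10]

# range(lst[0], lst[-1] + 1) is range(4, 11) here, so 1..3 are never examined
naive = [x for x in range(lst[0], lst[-1] + 1) if x not in lst]

# After sorting, s[0] and s[-1] are the true min and max
s = sorted(lst)
fixed = [x for x in range(s[0], s[-1] + 1) if x not in s]

print(naive)  # [5, 8]
print(fixed)  # [3, 5, 8]
```

The unsorted version silently skips the missing 3, which is the kind of bug that is easy to miss in testing.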

jlandwerlen · September 3, 2020, 8:34pm

Any reason this approach is unwise?

lst = [4, 1, 2, 6, 7, 9, 10] 
sortList = sorted(lst)

for x in range(sortList[0], sortList[-1]+1):  
	if x not in sortList:
		print x

Kevin.Herron · September 3, 2020, 8:45pm

Should be fine unless you have an exceptionally large number of rows.
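Why size matters here: `x not in sortList` scans the list for every candidate value, so the work can grow with (width of the range) × (number of rows). If the row count ever does get large, one common fix is to test membership against a set, where each lookup is an average O(1) hash check. A minimal sketch, assuming integer values; `find_missing` is a hypothetical helper name:

```python
def find_missing(values):
    """Return the integers between min(values) and max(values) that are absent."""
    present = set(values)  # build once; each "in" check is then a hash lookup
    return [x for x in range(min(values), max(values) + 1) if x not in present]

print(find_missing([4, 1, 2, 6, 7, 9, 10]))  # [3, 5, 8]
```

At around 200 rows the plain list version is perfectly adequate; this only pays off for much larger columns.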

cmallonee · September 3, 2020, 8:47pm

That's actually preferable to lst.sort(), because sorted() returns a new list and leaves the original alone. The biggest problem you could encounter here is unexpected types in the column. As long as you're confident this will always receive integers, you're fine. If it ever encounters str representations ("2"), you'll get a TypeError.
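If the column might ever deliver numeric strings such as "2", one option is to normalize each value with int() before sorting. A minimal sketch, assuming every entry is either an int or a string of digits; `find_missing_coerced` is a hypothetical name:

```python
def find_missing_coerced(values):
    """Same idea as the sorted() loop, but converts each value with int() first.

    Assumption: entries are ints or digit strings like "2". Anything else
    (e.g. None or "abc") still raises an error, now from int() itself.
    """
    nums = sorted(int(v) for v in values)  # new sorted list; input untouched
    return [x for x in range(nums[0], nums[-1] + 1) if x not in nums]

print(find_missing_coerced([4, "1", 2, "6", 7, 9, 10]))  # [3, 5, 8]
```

Whether to coerce or to fail loudly is a judgment call: coercing hides bad data, while the TypeError at least tells you something upstream changed.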

jlandwerlen · September 3, 2020, 9:05pm

Thanks, guys. I'm confident it will always be integers, and the size will be fairly small: at most 200 rows or so.

JordanCClark · September 4, 2020, 10:06am

Using min() and max() might save you the sorting step. Also, you don't really need to add one to the upper end of the range, because we already know the max value is in the list. Three possible examples:

lst = [4, 1, 2, 6, 7, 8, 10] 

print 'For loop'
for x in range(min(lst), max(lst)):  
	if x not in lst:
		print x

print '---'

print 'List comprehension'
print [x for x in range(min(lst), max(lst)) if x not in lst]

print '---'

print 'Using sets'
print set(range(min(lst), max(lst))) - set(lst)

Output:

For loop
3
5
9
---
List comprehension
[3, 5, 9]
---
Using sets
set([3, 5, 9])
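One caveat with the set version: sets have no guaranteed iteration order, so the `set([3, 5, 9])` output is not promised to come back ascending. Wrapping the difference in sorted() gives an ordered list, and a guard for empty input avoids the ValueError that min() and max() raise on an empty sequence. A sketch, with `missing_sorted` as a hypothetical name:

```python
def missing_sorted(values):
    """Ascending list of integers in [min(values), max(values)) absent from values."""
    if not values:
        return []  # min() and max() raise ValueError on an empty sequence
    # The set difference is unordered; sorted() turns it into an ascending list
    return sorted(set(range(min(values), max(values))) - set(values))

lst = [4, 1, 2, 6, 7, 8, 10]
print(missing_sorted(lst))  # [3, 5, 9]
```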

jlandwerlen · September 4, 2020, 9:26pm

Thanks, I took your advice. I actually didn’t need min() as I always want to start at 1 (which I overlooked), so I ended up doing this…

for x in range(1, max(lst)):

It works well. Thanks everyone for your input, much appreciated.
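For reference, the final approach (fixed base of 1, upper bound max(lst)) could be wrapped up as a small function. A minimal sketch, assuming positive integer values; `missing_from_one` is a hypothetical name, and the set lookup is optional at ~200 rows:

```python
def missing_from_one(values):
    """Integers from 1 up to max(values) that do not appear in values.

    range() stops before max(values), which is fine because the maximum
    is in values by definition.
    """
    if not values:
        return []  # max() would raise ValueError on an empty sequence
    present = set(values)
    return [x for x in range(1, max(values)) if x not in present]

print(missing_from_one([4, 2, 6, 7, 10]))  # [1, 3, 5, 8, 9]
```

Unlike the min()-based versions, this one also reports gaps below the smallest value present (here, 1).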