Has anyone created a script to find missing values between the min and max of a dataset column? Basically, if a certain column has a min value of 1 and a max value of 100, what’s the best way in a script to determine whether any values are missing between those, counting in steps of 1?
Never mind, Google was my friend:
lst = [1, 2, 4, 6, 7, 9, 10]
for x in range(lst[0], lst[-1]+1):
    if x not in lst:
        print x
Is your list always going to be sorted for you?
Oh, good point. Maybe, maybe not. Yep, the code I stole won’t work well unless the list is sorted.
Any reason this approach is unwise?
lst = [4, 1, 2, 6, 7, 9, 10]
sortList = sorted(lst)
for x in range(sortList[0], sortList[-1]+1):
    if x not in sortList:
        print x
Should be fine unless you have an exceptionally large number of rows.
That’s actually preferable to lst.sort() because it leaves the original alone. The biggest problem you could encounter here is unexpected types in the column. As long as you’re confident this will always receive integers, then you’re fine. If it ever encounters str representations (“2”), then you’ll encounter a TypeError.
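If the column could ever hand back strings, one option is to coerce everything with int() up front, so a bad value fails early with a clear message instead of partway through the loop. This is only a sketch: to_ints is a made-up helper, not something from the code above, and the sample data is invented.

```python
def to_ints(values):
    """Convert each value with int(); raise ValueError if one cannot be converted."""
    converted = []
    for v in values:
        try:
            # Note: int() silently truncates floats (2.7 becomes 2).
            converted.append(int(v))
        except (TypeError, ValueError):
            raise ValueError("Expected an integer-like value, got %r" % (v,))
    return converted

# Hypothetical column that mixes ints with str representations.
lst = to_ints([4, "1", 2, "6", 7, 9, 10])
sortList = sorted(lst)
missing = [x for x in range(sortList[0], sortList[-1] + 1) if x not in sortList]
print(missing)
```

Whether silently accepting "2" as 2 is the right call depends on the data; raising on any non-int would be the stricter choice.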
Thanks guys. I’m confident it will always be integers and the size will be fairly small, at most 200 rows or so.
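At a couple of hundred rows the list scan is nothing to worry about. For a much larger column, the cost comes from "x not in sortList", which walks the list on every pass. A minimal sketch, assuming the values are plain integers, that builds a set once so each membership check no longer scans the whole list (the name present is just illustrative):

```python
lst = [4, 1, 2, 6, 7, 9, 10]

# Build the set once; membership checks on a set do not scan every element.
present = set(lst)

# Walk min..max inclusive and keep anything that is not in the data.
missing = [x for x in range(min(lst), max(lst) + 1) if x not in present]

print(missing)
```

No sorting is needed here, since min() and max() find the endpoints directly.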
Using min and max might save you a step. Also, you don’t really need to add one to the upper end of the range because we know the max value of the list already exists. Three possible examples:
lst = [4, 1, 2, 6, 7, 8, 10]
print 'For loop'
for x in range(min(lst), max(lst)):
    if x not in lst:
        print x
print '---'
print 'List comprehension'
print [x for x in range(min(lst), max(lst)) if x not in lst]
print '---'
print 'Using sets'
print set(range(min(lst), max(lst))) - set(lst)
Output:
For loop
3
5
9
---
List comprehension
[3, 5, 9]
---
Using sets
set([3, 5, 9])
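One caveat on the set version: a set has no guaranteed order, so the result is not promised to come back ascending. If order matters, wrapping the difference in sorted() gives a plain sorted list. A small sketch using the same sample data:

```python
lst = [4, 1, 2, 6, 7, 8, 10]

# Set difference finds the gaps; sorted() turns the unordered set into an ascending list.
missing = sorted(set(range(min(lst), max(lst))) - set(lst))

print(missing)
```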
Thanks, I took your advice. I actually didn’t need min() as I always want to start at 1 (which I overlooked), so I ended up doing this…
for x in range(1, max(lst)):
It works well. Thanks everyone for your input, much appreciated.
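For anyone landing here later, this is one way the final approach could be wrapped up as a function, assuming the values are positive integers and counting should always start at 1. The name find_missing and its start parameter are hypothetical, not from the code above. Starting from a fixed 1 instead of min() also catches gaps at the low end, such as a column that begins at 3.

```python
def find_missing(values, start=1):
    """Return the integers from start up to max(values) that are absent from values."""
    if not values:
        # Nothing to compare against; max() would raise on an empty sequence.
        return []
    present = set(values)
    # max(values) is known to be present, so the range can stop just short of it.
    return [x for x in range(start, max(values)) if x not in present]

# Sample data invented for illustration: 1 and 2 are missing at the low end.
print(find_missing([4, 3, 6, 7, 9, 10]))
```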