Jython filtering profiling

Just getting back to this: I was curious how well any would perform compared with a for..if, and for..if won hands down. Although I was a little surprised at how slow both were :confused:

from time import time

a = [0]
a.extend([0]*10000000)
a.append(1)

print 'Using any:'
for i in range(10):
	t = time()
	p = any([b == 1 for b in a])
	print time()-t

print ''
print 'Using for..if:'
for i in range(10):
	t = time()
	for c in a:
		if c == 1:
			break
	print time()-t

Results:

Using any:
3.72000002861
3.90000009537
4.01099991798
3.86999988556
3.65700006485
3.8029999733
4.21200013161
3.59899997711
3.67499995232
3.9960000515

Using for..if:
2.89499998093
2.45299983025
2.5640001297
2.35199999809
2.58299994469
3.492000103
2.80900001526
2.52099990845
2.44099998474
2.64800000191

Try dropping the square brackets. any can take a generator expression directly. With the square brackets, you're collecting to a list, then evaluating any; no chance to short-circuit.
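
A small sketch of the difference, written to run on both Python 2 and 3. The is_one helper and the call counter are made up for illustration, and the match is placed at the front of the list here so the short-circuit is visible:

```python
calls = [0]  # mutable counter so the helper can update it

def is_one(value):
    # Hypothetical helper: counts how many elements actually get tested.
    calls[0] += 1
    return value == 1

data = [1] + [0] * 1000  # the match is the very first element

# List comprehension: every element is tested before any() sees anything.
calls[0] = 0
found_list = any([is_one(b) for b in data])
list_calls = calls[0]

# Generator expression: any() stops pulling values at the first True.
calls[0] = 0
found_gen = any(is_one(b) for b in data)
gen_calls = calls[0]

print(list_calls)  # 1001
print(gen_calls)   # 1
```

With the match at the very end of the list, as in the benchmark above, both forms still test every element, so the remaining gain comes from not building the intermediate list.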

Without square brackets they're pretty close, though for still wins:

Using any:
0.674999952316
0.595999956131
0.587000131607
0.569000005722
0.625999927521
0.608999967575
0.606000185013
0.639999866486
0.697000026703
0.632999897003

Using for..if:
0.588000059128
0.618999958038
0.526999950409
0.52799987793
0.533999919891
0.526999950409
0.531999826431
0.53200006485
0.531000137329
0.523000001907

Ah, I must have missed that Victor didn’t include the brackets. I admit I’ve never bothered to read up on generators despite seeing them mentioned, but I will! Cheers

Using a generator is far quicker than using a list, but for..if still appears to win?

Using any:
1.62400007248
1.63999986649
1.66700005531
1.85400009155
1.54399991035
1.29500007629
0.995000123978
1.13599991798
1.11899995804
1.40300011635

Using for..if:
1.35600018501
0.944000005722
0.863999843597
0.843000173569
1.01399993896
1.00300002098
1.07899999619
0.926999807358
0.773999929428
0.802999973297

Perhaps I’m missing something but wouldn’t you want to test worst case? Doesn’t the for section still need to generate a list?

That’s probably down to inefficiencies in Jython & additional manipulation needed to make the generator expression.

On CPython 3.9.9, any is faster:

C:\Users\pgriffith\Downloads>python -m bench.py
Using any:
0.33617353439331055
0.3341856002807617
0.3485901355743408
0.3371281623840332
0.3347737789154053
0.3402841091156006
0.3347022533416748
0.34109067916870117
0.3483257293701172
0.3388524055480957

Using for..if:
0.39873743057250977
0.3966805934906006
0.3977961540222168
0.39963269233703613
0.39687132835388184
0.39591336250305176
0.4033031463623047
0.4000084400177002
0.39662909507751465
0.4020504951477051

Almost certainly not a difference that will matter, though; list iteration is rarely the actual bottleneck in the world of file and network IO.
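
For anyone who wants to repeat the comparison, here is a rough sketch using the standard timeit module instead of hand-rolled time() calls. The function names, list size, and repeat count are arbitrary choices of mine, not what was run above:

```python
import timeit

a = [0] * 1000000 + [1]  # match at the very end: worst case for both

def with_any():
    # Generator expression, so no intermediate list is built.
    return any(b == 1 for b in a)

def with_for():
    # Plain loop that stops at the first match.
    for c in a:
        if c == 1:
            return True
    return False

# repeat() returns one total time per run; the minimum is the least noisy.
any_times = timeit.repeat(with_any, number=1, repeat=5)
for_times = timeit.repeat(with_for, number=1, repeat=5)

print(min(any_times))
print(min(for_times))
```

The absolute numbers will depend on the interpreter and machine, so only the relative ordering on one setup is worth reading into.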

I thought I was testing worst case with the thing I’m looking for at the very end of the list :man_shrugging:

Both should be using an iterator object created from the initial list via the for loop's syntactic sugar.
The manual entry for any essentially contains the snippet of code in the second case:
https://docs.python.org/2/library/functions.html#any
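
For reference, the equivalent given in that manual entry looks roughly like this (renamed to any_equivalent here so it doesn't shadow the builtin):

```python
def any_equivalent(iterable):
    # Mirrors the pure-Python equivalent shown in the docs for any().
    for element in iterable:
        if element:
            return True
    return False

a = [0] * 10 + [1]
print(any_equivalent(b == 1 for b in a))  # True
print(any_equivalent(b == 2 for b in a))  # False
```

So the for..if loop and any over a generator do the same work in principle; the difference is where the loop runs and how the values are produced.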

Yeah, but in the generator you’re not. [b==1 for b in a] would generate a list like [1,1,1,....,1] so any would short circuit on the first element not the last.

For instance:

from time import time

a = [0]
a.extend([0] * 10000000)
a.append(1)

print 'Using any(b == 1):'
for i in range(3):
	t = time()
	p = any(b == 1 for b in a)
	print time() - t

print ''
print 'Using for..if:'
for i in range(3):
	t = time()
	for c in iter(b == a for b in a):
		if c:
			break
	print time() - t

results in:

Using any(b == 1):
1.50999999046
1.54399991035
1.47600007057

Using for..if:
1.90899991989
1.9430000782
1.87400007248

but:

from time import time

a = [0]
a.extend([0] * 10000000)
a.append(1)

print 'Using any(b == a):'
for i in range(3):
	t = time()
	p = any(b == a for b in a)
	print time() - t

print ''
print 'Using for..if:'
for i in range(3):
	t = time()
	for c in iter(b == a for b in a):
		if c:
			break
	print time() - t

results in:

Using any(b == a):
1.09200000763
1.04399991035
0.960000038147

Using for..if:
1.99799990654
1.97000002861
1.95200014114

But I could also be missing something.

this would generate a list:
[False,False,False,False,False,False,...,True] <--Edit: replaced 0's 1's with False/True

The b == a part is invalid: b will never equal a (a is a list, b is an integer). Maybe just a typo, though?

No?
If b = 1 for b in a were valid syntax, maybe, but that's not what happens:

a = [0] * 10
a.append(1)
print [b == 1 for b in a]
>>> 
[False, False, False, False, False, False, False, False, False, False, True]

Comparing b to a as you are seems strictly incorrect; you're comparing a single element (retrieved via the for loop) with the entire list.
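
A quick sketch of what that means in practice, using a short list so the output is easy to check:

```python
a = [0] * 10 + [1]

# An int never compares equal to a list, so every test is False
# and any() has to run through the whole list without a match.
no_match = any(b == a for b in a)

# Comparing against the value finds the match in the last position.
has_match = any(b == 1 for b in a)
position = [b == 1 for b in a].index(True)

print(no_match)   # False
print(has_match)  # True
print(position)   # 10
```

That would make the b == a timings a measurement of a full scan with no match at all, rather than an early exit.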

Nope, I’m just tired.

Coffee??? Anyone???

Remember when I said I was missing something, well it was understanding and sleep. :rofl:

Me? Where? I might have just made a mistake xD