TIL: constant folding in python.
September 10, 2022 · 6 mins · 1085 words
I’m working on a blog post about which option is faster x*x
or x**2
, and while writing it I discovered some interesting things about how python is implemented at a low level. So for today’s post I won’t be able to answer which option is faster, but I’ll explain you how python optimizes your code.
First experiment: literals vs symbols
To start let’s go back to the original question: what is faster: x*x
or x**2
? Before getting too rigorous I wanted to run some experiments to get some intuition about the problem, and surprisingly here’s where I found the first weird results. My first experiment was to check if data types had a big impact in this experiment, so I created this first snippet
import time
EXPERIMENTS = 1_000_000
ti = time.time()
for i in range(EXPERIMENTS):
2**2
tf = time.time()
print(f"int**int took {tf-ti:.5f} seconds")
ti = time.time()
for i in range(EXPERIMENTS):
2.**2.
tf = time.time()
print(f"float**float took {tf-ti:.5f} seconds")
And after running it I got
> int**int took 0.03004 seconds
> float**float took 0.03070 seconds
So apparently data types do not have a huge impact. However, the code didn’t look nice to me, so I decided to refactor it a little bit
import time
EXPERIMENTS = 1_000_000
def power_time(base, exponent):
ti = time.time()
for i in range(EXPERIMENTS):
base**exponent
tf = time.time()
return tf-ti
print(f"int**int took {power_time(2, 2):.5f} seconds")
print(f"float**float took {power_time(2., 2.):.5f} seconds")
Same logic as before but less lines of code. However, after running it I got
> int**int took 0.20140 seconds
> float**float took 0.05051 seconds
It makes sense that in the case of float**float
it takes a little bit more, since python needs to pass the arguments to the method and this takes time. But what the hell is happening in the case of int**int
?
After asking about it in several places, I discovered that the problem comes from using literals in the first case and using symbols in the second case. By refactoring the first case to
import time
EXPERIMENTS = 1_000_000
+ BASE = 2
+ EXPONENT = 2
ti = time.time()
for i in range(EXPERIMENTS):
- 2**2
+ BASE**EXPONENT
tf = time.time()
print(f"int**int took {tf-ti:.5f} seconds")
+ BASE = 2.
+ EXPONENT = 2.
ti = time.time()
for i in range(EXPERIMENTS):
- 2.**2.
+ BASE**EXPONENT
tf = time.time()
print(f"float**float took {tf-ti:.5f} seconds")
I got the same results in two cases. The problem didn’t come from data types or methods, but rather from literals and symbols. In the case of using literals, it was much more faster because a technique named constant folding was used by the interpreter.
Constant Folding
This post does not intend to be a deep dive about constant folding, so if you’re here to learn the internals of constant folding I recommend you this post. Here I will just share the intuitions I developed while reading about constant folding.
Basically, constant folding consists in the interpreter replacing constant expressions with their value. For example, the expression A = 10*10
is replaced by the parser by A = 100
.
You can check that by using the dis
module. This module allows you to check the python bytecode of your code. Basically, it allows you to check the “compiled” version of your code. If we run this example
import dis
dis.dis("A=123*10")
we’ll get
1 0 LOAD_CONST 0 (1230)
2 STORE_NAME 0 (A)
4 LOAD_CONST 1 (None)
6 RETURN_VALUE
where the first line indicated that a constant value 1230
has been loaded without having to evaluate 123*10
. This means that the interpreter has replaced 123*10
by 1230
before compiling the code. If instead be run the code
import dis
dis.dis("A=123;B=10;A*B")
we’ll get
1 0 LOAD_CONST 0 (10)
2 STORE_NAME 0 (A)
4 LOAD_CONST 0 (10)
6 STORE_NAME 1 (B)
8 LOAD_NAME 0 (A)
10 LOAD_NAME 1 (B)
12 BINARY_MULTIPLY
14 POP_TOP
16 LOAD_CONST 1 (None)
18 RETURN_VALUE
where we can see that the operation is not pre-computed, and it’ll be executed by the instruction 12 BINARY_MULTIPLY
.
Another interesting point is that the constants are not always folded. For example
import dis
dis.dis("2**64")
dis.dis("2**65")
returns
1 0 LOAD_CONST 0 (4294967296)
2 RETURN_VALUE
1 0 LOAD_CONST 0 (2)
2 LOAD_CONST 1 (65)
4 BINARY_POWER
6 RETURN_VALUE
As we can see, in the first case the constant is folded, but in the second one it’s not. One may ask now which are the rules that python uses to decide which constants are folded and which not, and the answer is surprising: there are not rules, it’s an implementation detail! This means that constant folding implementation can change between python versions and you shouldn’t rely on it. See this answer for more details about the implementation details of constant folding.