Slow Python code? A hands-on guide to speeding up your code by 30%

Yun Qi Hao:
The first-hand cloud information, the selected cloud enterprise case base of different industries, and the best practices extracted from many successful cases help you to make cloud decision!

Python is loved by programmers all over the world, but it is also criticized, and one of the reasons is that it runs slowly.

In fact, whether a particular program runs fast or slow (in any programming language) depends largely on the developers who wrote it and on their ability to write optimized, efficient code.

An author on Medium explains in detail how to speed up Python code by up to 30%, to show that slow code is usually a problem with the code itself rather than with Python.

Timing and profiling

Before we start optimizing anything, we need to find out which parts of the code slow down the whole program. Sometimes the bottleneck is obvious, but if you can't immediately tell where it is, here are some options for finding it:

Note: This is the program I will use for demonstration. It computes e to the power of x (taken from the Python documentation):


from decimal import *

def exp(x):
    getcontext().prec += 2
    i, lasts, s, fact, num = 0, 0, 1, 1, 1
    while s != lasts:
        lasts = s
        i += 1
        fact *= i
        num *= x
        s += num / fact
    getcontext().prec -= 2
    return +s


The simplest "profile"

First, the simplest and laziest option is the Unix time command.

~ $ time python3.8

real  0m11,058s
user 0m11,050s
sys 0m0,008s

This is enough if you only want to time the whole program, but usually that is not sufficient.

Most detailed analysis

At the other end of the spectrum is cProfile, which gives you too much information.

~ $ python3.8 -m cProfile -s time

         1297 function calls (1272 primitive calls) in 11.081 seconds

   Ordered by: internal time

   ncalls tottime percall cumtime percall filename:lineno(function)
        3   11.079    3.693   11.079    3.693
        1    0.000    0.000    0.002    0.002 {built-in method _imp.create_dynamic}
      4/1    0.000    0.000   11.081   11.081 {built-in method builtins.exec}
        6    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x9d12c0}
        6    0.000    0.000    0.000    0.000
       23    0.000    0.000    0.000    0.000
      245    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
        2    0.000    0.000    0.000    0.000 {built-in method marshal.loads}
       10    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1233(find_spec)
      8/4    0.000    0.000    0.000    0.000
       15    0.000    0.000    0.000    0.000 {built-in method posix.stat}
        6    0.000    0.000    0.000    0.000 {built-in method builtins.__build_class__}
        1    0.000    0.000    0.000    0.000
       48    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:57(_path_join)
       48    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:59(<listcomp>)
        1    0.000    0.000   11.081   11.081<module>)

Here we run the test script with the cProfile module and the -s time option, which sorts the rows by internal time (tottime). This gives us a lot of information; the lines shown above are only about 10% of the actual output. From it we can see that the exp function is the main culprit. Now we can get more specific with timing and profiling.
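The same kind of profile can also be collected from inside Python, which makes it easy to trim the report to the most expensive rows. Here is a minimal sketch; busy_work is a hypothetical stand-in workload, not part of the demo program above:

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical stand-in for a slow function."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = busy_work(200_000)
profiler.disable()

# Sort by internal time (tottime) and keep only the five most expensive rows.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```

Limiting the report with print_stats(5) avoids the wall of import-machinery rows that the command-line run produces.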

Timing specific functions

Now that we know where to focus, we may want to time slow functions without measuring the rest of the code. To do this, we can use a simple decorator:

import time
from functools import wraps

def timeit_wrapper(func):
    @wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # Alternatively, you can use time.process_time()
        func_return_val = func(*args, **kwargs)
        end = time.perf_counter()
        print('{0:<10}.{1:<8} : {2:<8}'.format(func.__module__, func.__name__, end - start))
        return func_return_val
    return wrapper

This decorator can then be applied to the function to be tested as follows:


@timeit_wrapper
def exp(x):
    ...  # body unchanged from the listing above

print('{0:<10} {1:<8} {2:^8}'.format('module', 'function', 'time'))
# ... followed by the calls to exp()

This gives us the following output:

~ $ python3.8
module function   time  
__main__ .exp      : 0.003267502994276583
__main__ .exp      : 0.038535295985639095
__main__ .exp      : 11.728486061969306

One thing to consider is what kind of time we actually want to measure. The time module provides both time.perf_counter and time.process_time. The difference is that perf_counter returns an absolute value that includes time during which your Python process is not running, so it can be affected by the load on the machine. process_time, on the other hand, returns only the CPU time (system plus user) of your own process and excludes time spent sleeping.
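The difference between the two clocks is easiest to see when the process sleeps. A minimal sketch (the 0.2 second sleep is an arbitrary value chosen for illustration):

```python
import time

wall_start = time.perf_counter()
cpu_start = time.process_time()

time.sleep(0.2)  # the process is idle here, so it uses almost no CPU time

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"perf_counter: {wall_elapsed:.3f}s")  # includes the time spent sleeping
print(f"process_time: {cpu_elapsed:.3f}s")   # counts only CPU time of this process
```

If you want to know how long the user waits, perf_counter is usually the better fit; if you want to know how much work your code did, process_time is less sensitive to whatever else the machine is doing.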

Speed up!

Now for the fun part: making your Python programs run faster. I'm mostly not going to show you tricks and code snippets that magically solve your performance problems. This is more about general ideas and strategies which, when applied, can have a big impact on performance, in some cases a speedup of up to 30%.

Use built-in data types

This one is fairly obvious. Built-in data types are very fast, especially compared to custom types such as trees or linked lists. That is mainly because the built-ins are implemented in C, a speed we can't match when coding in Python.
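As a rough illustration, here is a sketch that fills a hand-written linked-list stack and a built-in list with the same values. Node, LinkedStack and both fill functions are hypothetical names made up for this comparison, and the timings will vary by machine and interpreter version:

```python
import timeit


class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedStack:
    """Hand-written stack backed by a singly linked list."""

    def __init__(self):
        self.head = None
        self.size = 0

    def push(self, value):
        self.head = Node(value, self.head)
        self.size += 1


def fill_custom(n):
    stack = LinkedStack()
    for i in range(n):
        stack.push(i)
    return stack.size


def fill_builtin(n):
    stack = []
    for i in range(n):
        stack.append(i)
    return len(stack)


n = 50_000
custom_time = timeit.timeit(lambda: fill_custom(n), number=10)
builtin_time = timeit.timeit(lambda: fill_builtin(n), number=10)
print(f"custom linked stack: {custom_time:.3f}s")
print(f"built-in list:       {builtin_time:.3f}s")
```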

Use caching/memoization with lru_cache

I've already shown this in my previous blog post, but I think it's worth repeating with a simple example:

import functools
import time
# caching up to 12 different results
@functools.lru_cache(maxsize=12)
def slow_func(x):
    time.sleep(2) # Simulate long computation
    return x

slow_func(1) # ... waiting for 2 sec before getting result
slow_func(1) # already cached - result returned instantaneously!
slow_func(3) # ... waiting for 2 sec before getting result

The function above simulates a heavy computation with time.sleep. The first time it is called with the argument 1, it waits 2 seconds and only then returns the result. When it is called again with the same argument, the result is already cached, so the function body is skipped and the result is returned immediately. For more practical examples, see my previous blog posts.
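To check whether a cache is actually being hit, functions wrapped with functools.lru_cache expose a cache_info() method. A minimal sketch, where square is a hypothetical cheap function standing in for an expensive one:

```python
import functools


@functools.lru_cache(maxsize=12)
def square(x):
    """Hypothetical cheap function standing in for an expensive one."""
    return x * x


square(1)  # miss: computed and stored
square(1)  # hit: served from the cache
square(3)  # miss: new argument

info = square.cache_info()
print(info)  # CacheInfo(hits=1, misses=2, maxsize=12, currsize=2)
```

A low hit count relative to misses is a hint that caching is not buying you much for that function.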

Use local variables

This has to do with how fast variables are looked up in each scope. I say each scope because it is not just about local versus global variables. Lookup speed actually differs between a local variable in a function (fastest), a class-level attribute accessed through self (slower), and a global such as an imported function like time.time (slowest).

You can improve performance with seemingly unnecessary assignments like these:

# Example #1
class FastClass:
    def do_stuff(self):
        temp = self.value # this speeds up lookup in loop
        for i in range(10000):
            ... # Do something with `temp` here

# Example #2
import random
def fast_function():
    r = random.random
    for i in range(10000):
        print(r()) # calling `r()` here, is faster than global random.random()
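To see how much this matters on your own machine, a sketch like the following times both variants with timeit. The function names and the loop size are assumptions chosen for illustration, and the gap differs between Python versions:

```python
import math
import timeit


def global_lookup(n):
    total = 0.0
    for i in range(n):
        # Looks up the global `math`, then its attribute `sqrt`, on every iteration.
        total += math.sqrt(i)
    return total


def local_lookup(n):
    sqrt = math.sqrt  # resolve the name once and bind it to a local variable
    total = 0.0
    for i in range(n):
        total += sqrt(i)
    return total


n = 200_000
print("global:", timeit.timeit(lambda: global_lookup(n), number=10))
print("local: ", timeit.timeit(lambda: local_lookup(n), number=10))
```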

Use functions

This may seem counterintuitive, since calling a function puts more on the stack and adds overhead from the function return, but it relates to the previous point. If you put all of your code at the top level of a file instead of inside a function, it will run slower because every variable is a global. You can therefore speed up your code by wrapping it all in a main function and calling it once, like so:

def main():
    ...  # All your previously global code

main()
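The effect can be measured by running the same loop once as top-level code and once wrapped in a function. This is only a sketch: the loop body, the run helper and the use of compile/exec to simulate a module are assumptions made for the comparison.

```python
import timeit

# The same loop, once as module-level code and once wrapped in a function.
module_level = compile(
    "total = 0\n"
    "for i in range(200_000):\n"
    "    total += i\n",
    "<module_level>",
    "exec",
)
in_function = compile(
    "def main():\n"
    "    total = 0\n"
    "    for i in range(200_000):\n"
    "        total += i\n"
    "    return total\n"
    "total = main()\n",
    "<in_function>",
    "exec",
)


def run(code):
    namespace = {}
    exec(code, namespace)  # run with a fresh set of globals each time
    return namespace["total"]


print("module level:", timeit.timeit(lambda: run(module_level), number=10))
print("in function: ", timeit.timeit(lambda: run(in_function), number=10))
```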


Don't access attributes

Another thing that may slow your program down is the dot operator (.), which is used to access object attributes. This operator triggers a dictionary lookup through __getattribute__, which adds overhead. So how can we avoid, or at least limit, using it?

# Slow:
import re
def slow_func():
    for i in range(10000):
        re.findall(regex, line) # Slow!

# Fast:
from re import findall
def fast_func():
    for i in range(10000):
        findall(regex, line) # Faster!
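The same idea applies to methods on objects: a bound method can be looked up once before the loop. A minimal sketch with hypothetical function names; how big the difference is depends on the interpreter version, so measure before relying on it:

```python
import timeit


def with_dot(n):
    result = []
    for i in range(n):
        result.append(i)  # attribute lookup on every iteration
    return result


def without_dot(n):
    result = []
    append = result.append  # look the bound method up once
    for i in range(n):
        append(i)
    return result


n = 200_000
print("with dot:   ", timeit.timeit(lambda: with_dot(n), number=10))
print("without dot:", timeit.timeit(lambda: without_dot(n), number=10))
```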

Beware of strings

Operations on strings can get quite slow when run in a loop using the modulo operator (%s) or .format(). What better choice do we have? According to a tweet from Raymond Hettinger, the only thing we should be using is the f-string: it is the most readable, concise, and fastest method. Based on that tweet, here are the methods you can use, from fastest to slowest:

f'{s} {t}'  # Fast!
s + ' ' + t
' '.join((s, t))
'%s %s' % (s, t)
'{} {}'.format(s, t)
Template('$s $t').substitute(s=s, t=t) # Slow!
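Rather than taking the ordering on faith, you can time the six variants yourself. The sketch below uses two short sample strings and an arbitrary repeat count, both chosen for illustration; the ranking may shift with Python version and string length:

```python
import timeit
from string import Template

s, t = "hello", "world"

# Each entry builds the same string using one of the methods listed above.
candidates = {
    "f-string": lambda: f"{s} {t}",
    "concatenation": lambda: s + " " + t,
    "join": lambda: " ".join((s, t)),
    "% operator": lambda: "%s %s" % (s, t),
    "str.format": lambda: "{} {}".format(s, t),
    "Template": lambda: Template("$s $t").substitute(s=s, t=t),
}

for name, build in candidates.items():
    elapsed = timeit.timeit(build, number=100_000)
    print(f"{name:<14} {elapsed:.4f}s")
```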

Generators can be fast

Generators are not inherently faster: they were designed to allow lazy computation, which saves memory rather than time. However, the saved memory can make your program actually run faster. How? If you have a large data set and don't use generators (iterators), the data may overflow the CPU's L1 cache, which significantly slows down value lookups in memory.

When it comes to performance, it is very important that the CPU can keep the data it is working on as close as possible, that is, in the cache. You can watch Raymond Hettinger's talk, in which he mentions these issues.
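The memory side of this is easy to check. The sketch below compares the size of a list with the size of an equivalent generator object; it says nothing about CPU cache behavior, which is much harder to observe from Python, and the element count is an arbitrary choice:

```python
import sys

n = 1_000_000

squares_list = [i * i for i in range(n)]  # materializes every value up front
squares_gen = (i * i for i in range(n))   # produces values one at a time

# getsizeof reports the container itself, not the integers it refers to.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

print(f"list container: {list_size} bytes")
print(f"generator:      {gen_size} bytes")

# Both can be consumed the same way and yield the same result.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
```

Keep in mind that a generator can only be consumed once, so this trade only works when a single pass over the data is enough.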


The first rule of optimization is: don't do it. But if you really have to, I hope these tips help. Be careful when optimizing, though, because it may make your code hard to read and therefore hard to maintain, which can outweigh the benefits of the optimization.

Original release time: January 13, 2020
Author: Medium
This article comes from "Big Data Digest", an Alibaba Cloud Yun Qi Hao partner; follow "Big Data Digest" for more.

Tags: Python Big Data Programming Unix

Posted on Mon, 13 Jan 2020 06:58:28 -0500 by Shagrath