Yun Qi Hao: https://yqh.aliyun.com
The first-hand cloud information, the selected cloud enterprise case base of different industries, and the best practices extracted from many successful cases help you to make cloud decision!
Python is loved by programmers all over the world, but it also has its critics, and one of their most common complaints is that it runs slowly.
In fact, how fast a given program runs, in any programming language, depends largely on the skill of the developers who wrote it and on their ability to write optimized, efficient code.
An author on Medium explains in detail how to speed up Python code by up to 30%, arguing that slow code is not a problem with Python but with the code itself.
Timing and profiling
Before we start optimizing anything, we need to find out which parts of the code slow down the whole program. Sometimes the bottleneck is obvious, but if you don't know where it is, here are some options for finding it.
Note: This is the program I will use for demonstration. It computes e to the power of x (taken from the Python documentation):

    # slow_program.py
    from decimal import *

    def exp(x):
        getcontext().prec += 2
        i, lasts, s, fact, num = 0, 0, 1, 1, 1
        while s != lasts:
            lasts = s
            i += 1
            fact *= i
            num *= x
            s += num / fact
        getcontext().prec -= 2
        return +s

    exp(Decimal(150))
    exp(Decimal(400))
    exp(Decimal(3000))

The simplest "profile"
First of all, the simplest and laziest approach is the Unix time command.

    ~ $ time python3.8 slow_program.py

    real    0m11,058s
    user    0m11,050s
    sys     0m0,008s

This is enough if you only want to time the whole program, but usually that is not sufficient.
Most detailed analysis
At the other end of the spectrum is cProfile, which gives you too much information.

    ~ $ python3.8 -m cProfile -s time slow_program.py
             1297 function calls (1272 primitive calls) in 11.081 seconds

       Ordered by: internal time

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            3   11.079    3.693   11.079    3.693 slow_program.py:4(exp)
            1    0.000    0.000    0.002    0.002
          4/1    0.000    0.000   11.081   11.081
            6    0.000    0.000    0.000    0.000
            6    0.000    0.000    0.000    0.000 abc.py:132(__new__)
           23    0.000    0.000    0.000    0.000 _weakrefset.py:36(__init__)
          245    0.000    0.000    0.000    0.000
            2    0.000    0.000    0.000    0.000
           10    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1233(find_spec)
          8/4    0.000    0.000    0.000    0.000 abc.py:196(__subclasscheck__)
           15    0.000    0.000    0.000    0.000
            6    0.000    0.000    0.000    0.000
            1    0.000    0.000    0.000    0.000 __init__.py:357(namedtuple)
           48    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:57(_path_join)
           48    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:59(<listcomp>)
            1    0.000    0.000   11.081   11.081 slow_program.py:1(<module>)

Here, we run the test script with the cProfile module and the -s time argument, so that the rows are sorted by internal time (tottime). This gives us a lot of information. The lines you see above are about 10% of the actual output. They show that the exp function is the main culprit. Now we can get more specific about timing and profiling.
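If the full cProfile report is too noisy, one option is to run the profiler from inside Python and print only the top few rows. The following is a minimal sketch, not part of the original article: `profile_top` is a hypothetical helper name, and the `exp` function is copied from slow_program.py so that the block is self-contained. It uses a small argument so that it finishes quickly.

```python
import cProfile
import io
import pstats
from decimal import Decimal, getcontext


def exp(x):
    """Same series expansion as in slow_program.py."""
    getcontext().prec += 2
    i, lasts, s, fact, num = 0, 0, 1, 1, 1
    while s != lasts:
        lasts = s
        i += 1
        fact *= i
        num *= x
        s += num / fact
    getcontext().prec -= 2
    return +s


def profile_top(func, *args, limit=5):
    """Hypothetical helper: profile one call, keep only the top `limit` rows."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("tottime").print_stats(limit)  # sort by internal time
    return result, buffer.getvalue()


result, report = profile_top(exp, Decimal(150))
print(report)
```

Sorting by "tottime" matches the -s time ordering used on the command line, while the limit keeps the report short enough to read at a glance.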
Timing specific functions
Now that we know where to focus, we may want to time the slow function without measuring the rest of the code. To do this, we can use a simple decorator:

    import time
    from functools import wraps

    def timeit_wrapper(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()  # Alternatively, you can use time.process_time()
            func_return_val = func(*args, **kwargs)
            end = time.perf_counter()
            # Print module, function name and elapsed seconds in aligned columns
            print('{0:<10}.{1:<8} : {2:<8}'.format(func.__module__, func.__name__, end - start))
            return func_return_val
        return wrapper

This decorator can then be applied to the function to be tested as follows:

    @timeit_wrapper
    def exp(x):
        ...

    print('{0:<10} {1:<8} {2:^8}'.format('module', 'function', 'time'))
    exp(Decimal(150))
    exp(Decimal(400))
    exp(Decimal(3000))

This gives us the following output:

    ~ $ python3.8 slow_program.py
    module     function   time
    __main__  .exp      : 0.003267502994276583
    __main__  .exp      : 0.038535295985639095
    __main__  .exp      : 11.728486061969306

One thing to consider is what kind of time we actually want to measure. The time module provides two functions for this: time.perf_counter and time.process_time. The difference is that perf_counter measures elapsed time including periods when your Python process is not running (for example while it sleeps or waits to be scheduled), so it can be affected by the load on the machine. process_time, on the other hand, returns the sum of the user and system CPU time of the current process and does not include time spent sleeping.
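The difference between the two clocks is easiest to see on a call that mostly waits. The sketch below is an illustration rather than code from the original article: `measure` is a hypothetical helper, and time.sleep stands in for a function that spends its time off the CPU.

```python
import time


def measure(func, *args):
    """Hypothetical helper: return (wall_seconds, cpu_seconds) for one call."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return wall_elapsed, cpu_elapsed


# Sleeping takes wall-clock time but almost no CPU time.
wall, cpu = measure(time.sleep, 0.2)
print(f"wall clock: {wall:.3f}s, CPU time: {cpu:.3f}s")
```

For CPU-bound code such as exp the two numbers should be close to each other. For code that waits on I/O or sleeps, only perf_counter reflects what the user experiences.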
Speed up!
Now for the fun part: making your Python programs run faster. I'm not going to show you tricks or code snippets that will magically solve your performance problems. This section is more about general ideas and strategies that can have a large impact on performance when applied. In some cases, they can increase speed by up to 30%.
Use built-in data types
This one is fairly obvious. Built-in data types are very fast, especially compared to custom types such as trees or linked lists. This is mainly because the built-ins are implemented in C, and code written in Python cannot match that speed.
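As a rough illustration of this point, the sketch below compares a membership test on a hand-written linked list with the same test on a built-in list. The `Node` class and both helper functions are hypothetical names made up for this example. The absolute timings depend on the machine and Python version, so no expected numbers are given.

```python
import timeit


class Node:
    """Node of a hand-written singly linked list (illustrative only)."""
    __slots__ = ("value", "next")

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build_linked_list(values):
    """Build a linked list that preserves the order of `values`."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def linked_list_contains(head, target):
    """Linear search implemented in pure Python."""
    node = head
    while node is not None:
        if node.value == target:
            return True
        node = node.next
    return False


values = list(range(10_000))
head = build_linked_list(values)
builtin_list = values

# Both searches are linear, but the built-in one runs its loop in C.
custom = timeit.timeit(lambda: linked_list_contains(head, 9_999), number=100)
builtin = timeit.timeit(lambda: 9_999 in builtin_list, number=100)
print(f"linked list: {custom:.4f}s, built-in list: {builtin:.4f}s")
```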
Caching/memoization with lru_cache
I've already shown this in my previous blog post, but I think it's worth repeating with a simple example:

    import functools
    import time

    # caching up to 12 different results
    @functools.lru_cache(maxsize=12)
    def slow_func(x):
        time.sleep(2)  # Simulate long computation
        return x

    slow_func(1)  # ... waiting for 2 sec before getting result
    slow_func(1)  # already cached - result returned instantaneously!

    slow_func(3)  # ... waiting for 2 sec before getting result

The above function uses time.sleep to simulate a large number of calculations. The first time it is called with parameter 1, it waits for 2 seconds before returning the result. When called again, the result is cached, so it skips the body of the function and returns the result immediately. For more practical examples, see previous blog posts.
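To check whether the cache is actually being used, functions wrapped with functools.lru_cache expose cache_info() and cache_clear(). The sketch below uses a hypothetical `square` function as a stand-in for an expensive computation, so that it runs instantly:

```python
import functools


@functools.lru_cache(maxsize=12)
def square(x):
    """Stand-in for an expensive computation (hypothetical example)."""
    return x * x


square(1)  # miss: computed and stored
square(1)  # hit: served from the cache
square(3)  # miss: new argument

info = square.cache_info()
print(info)  # CacheInfo(hits=1, misses=2, maxsize=12, currsize=2)

square.cache_clear()  # drop all cached results, e.g. between test runs
```

Note that lru_cache keys the cache on the call arguments, so they must be hashable, and it only makes sense for functions whose result depends solely on those arguments.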
Use local variables
This has to do with the speed of variable lookup in each scope. I say each scope because it is not just a matter of local versus global variables. Lookup speed differs even between a function's local variables (fastest), class-level attributes such as self.name (slower), and globals such as the imported function time.time (slowest).
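The three kinds of lookup can be compared with a small timeit experiment. This is only a sketch: `Holder`, `sum_local`, `sum_global` and `GLOBAL_VALUE` are hypothetical names, and the size of the gap depends on the Python version (newer CPython releases cache some lookups, so the difference may be small).

```python
import timeit

GLOBAL_VALUE = 1


class Holder:
    def __init__(self):
        self.value = 1

    def sum_attribute(self, n):
        total = 0
        for _ in range(n):
            total += self.value  # attribute lookup on every iteration
        return total


def sum_global(n):
    total = 0
    for _ in range(n):
        total += GLOBAL_VALUE  # global lookup on every iteration
    return total


def sum_local(n):
    value = GLOBAL_VALUE  # copied into a local variable once
    total = 0
    for _ in range(n):
        total += value  # local lookup on every iteration
    return total


holder = Holder()
n = 100_000
for label, func in [("local", sum_local),
                    ("attribute", holder.sum_attribute),
                    ("global", sum_global)]:
    seconds = timeit.timeit(lambda: func(n), number=20)
    print(f"{label:<10} {seconds:.4f}s")
```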
You can improve performance by using seemingly unnecessary assignments, as follows:

    # Example #1
    class FastClass:

        def do_stuff(self):
            temp = self.value  # this speeds up lookup in loop
            for i in range(10000):
                ...  # Do something with `temp` here

    # Example #2
    import random

    def fast_function():
        r = random.random
        for i in range(10000):
            print(r())  # calling `r()` here, is faster than global random.random()

Use functions
This may seem counterintuitive, because calling a function puts more things on the stack and adds overhead from function returns, but it relates to the previous point. If you put all your code at the top level of a file instead of inside a function, it will run much slower because every variable is a global. Therefore, you can speed up your code by wrapping all of it in a main function and calling that function once, as follows:

    def main():
        ...  # All your previously global code

    main()

Avoid attribute access
Another thing that may slow your program down is the dot operator (.), which is used to access object attributes. This operator triggers an attribute lookup through __getattribute__, which adds extra overhead to your code. So how can we avoid, or at least limit, its use?

    # `regex` and `line` are assumed to be defined elsewhere

    # Slow:
    import re

    def slow_func():
        for i in range(10000):
            re.findall(regex, line)  # Slow!

    # Fast:
    from re import findall

    def fast_func():
        for i in range(10000):
            findall(regex, line)  # Faster!

Beware of strings
String operations can become very slow when they run in a loop using the modulus operator (%s) or .format(). What better choice do we have? According to a recent tweet from Raymond Hettinger, the only thing we should be using is the f-string, which is the most readable, concise and fastest method. According to that tweet, these are the methods you can use, ordered from fastest to slowest:

    f'{s} {t}'  # Fast!
    s + ' ' + t
    ' '.join((s, t))
    '%s %s' % (s, t)
    '{} {}'.format(s, t)
    Template('$s $t').substitute(s=s, t=t)  # Slow!

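The ordering above can be checked on your own machine with timeit. The sketch below is an illustration, not taken from the tweet: the sample strings and the `candidates` dictionary are assumptions made for this example, and the measured order may differ between Python versions.

```python
import timeit
from string import Template

s, t = "hello", "world"

# Each candidate builds the same string in a different way.
candidates = {
    "f-string": lambda: f"{s} {t}",
    "concatenation": lambda: s + " " + t,
    "join": lambda: " ".join((s, t)),
    "% formatting": lambda: "%s %s" % (s, t),
    "str.format": lambda: "{} {}".format(s, t),
    "Template": lambda: Template("$s $t").substitute(s=s, t=t),
}

# Sanity check: collect the string each method produces.
results = {name: func() for name, func in candidates.items()}

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=100_000)
    print(f"{name:<14} {seconds:.4f}s")
```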
Generators
Generators are not inherently faster, because they were designed for lazy computation, which saves memory rather than time. However, the memory saved can make your program actually run faster. How? If you have a large data set and don't use a generator (iterator), the data may overflow the CPU's L1 cache, which significantly slows down lookups of values in memory.
When it comes to performance, it is very important that the CPU can keep as much of the data it is working on as possible in the cache. You can watch Raymond Hettinger's talk, in which he mentions these issues.
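The memory side of this trade-off is easy to observe. The following sketch compares a list comprehension with the equivalent generator expression. The variable names are made up for this example, and sys.getsizeof reports only the size of the container object itself, not of the integers it refers to.

```python
import sys

n = 1_000_000

squares_list = [i * i for i in range(n)]  # all values materialized at once
squares_gen = (i * i for i in range(n))   # values produced one at a time

list_bytes = sys.getsizeof(squares_list)  # size of the list object itself
gen_bytes = sys.getsizeof(squares_gen)    # size of the generator object

print(f"list: {list_bytes} bytes, generator: {gen_bytes} bytes")

# Both give the same result when consumed.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
```

Keep in mind that a generator can be consumed only once, so this approach fits single-pass processing rather than repeated lookups.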
Conclusion
The first rule of optimization is not to do it. But if you really have to, I hope these tips help. Be careful when optimizing, though, because the result may be code that is hard to read and therefore hard to maintain, and that cost can outweigh the benefits of the optimization.
Original release time: January 13, 2020
Author: Medium
This article is from "Big Data Digest", a partner of Alibaba Cloud Yun Qi Hao. Follow "Big Data Digest" for more.