Searching for different Python data types
Language: Python 3.7.2
System: Win10 Ver. 10.0.17763
Subject: 004.01 search for different Python data types
Recently, while working on a data search-and-comparison task, I found that searching and comparing a large amount of data was unacceptably slow. I wanted immediate results, but had to wait several hours instead. Python is certainly not as fast as C or assembly language, but we still need to find ways to speed it up. Below are several methods, with the time each one needs to look up 10,000 values in a collection of 10,000 items. Although the last method, index_list_sort(), is much faster in these timings, I still don't think it is fast enough. Moreover, it only handles integer search. What about strings? What about substrings? If you know a better way, please let me know. Thank you!
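For the string and substring questions raised above, here is a minimal sketch, not a definitive answer. The sample list `words` and the helper names `find_exact`, `find_prefix` and `find_substring` are hypothetical and do not appear in the original code. It assumes the strings are unique. Exact matches can use a dict, prefix matches can use bisect on a sorted copy, and a general substring match falls back to a linear scan.

```python
import bisect

# Hypothetical sample data: any list of unique strings would do.
words = ["pear", "apple", "fig", "banana", "cherry"]

# Exact match: hash-based lookup works for strings just as it does for integers.
position = {word: i for i, word in enumerate(words)}


def find_exact(word):
    """Return the index of word in words, or -1 if it is absent."""
    return position.get(word, -1)


# Prefix match: binary search on a sorted copy of the data.
words_sorted = sorted(words)


def find_prefix(prefix):
    """Return all words that start with prefix, in sorted order."""
    # All words sharing a prefix are contiguous in the sorted list,
    # starting at the insertion point of the prefix itself.
    start = bisect.bisect_left(words_sorted, prefix)
    matches = []
    for word in words_sorted[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches


def find_substring(fragment):
    """Return all words containing fragment (linear scan, no index used)."""
    return [word for word in words if fragment in word]
```

Only the first two helpers avoid scanning the whole list. Speeding up arbitrary substring search would likely need a dedicated index structure, which is outside the scope of this sketch.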
Result:
0:00:04.734338 : index_sequence
0:00:01.139984 : index_list
0:00:00.330116 : index_np
0:00:00.233343 : index_np_sort
0:00:00.223401 : index_dict
0:00:00.213462 : index_set
0:00:00.007977 : index_list_sort
Code:
from datetime import datetime
import bisect
import random

import numpy as np

size = 10000
db = random.sample(range(size), size)    # unsorted data to search in
db_sort = sorted(db)                     # sorted copy for bisect
db_set = set(db)
db_dict = {db[i]: i for i in range(size)}
db_np = np.array(db)
db_np_sort = np.sort(db_np)              # sorted copy for np.searchsorted
value = [i for i in range(size)]         # values to look up

def call(func):
    # Call the function, measure its execution time, then print the duration and the function name
    start_time = datetime.now()
    func()
    print(datetime.now() - start_time, ':', func.__name__)

def do_something():
    # Simulated work done on every hit; its cost is included in each measured duration
    for i in range(1000):
        pass

def index_sequence():
    # Unsorted list, plain nested Python loops without any built-in search function
    for i in range(size):
        for j in range(size):
            if value[j] == db[i]:
                index = j
                do_something()
                break

def index_list():
    # Unsorted list, use list.index()
    for i in range(size):
        try:
            index = db.index(value[i])
        except ValueError:
            index = -1
        if index >= 0:
            do_something()

def index_np():
    # NumPy array with np.where()
    for i in range(size):
        result = np.where(db_np == value[i])
        if len(result[0]) != 0:
            do_something()

def index_np_sort():
    # Sorted NumPy array with np.searchsorted(), which requires sorted input
    for i in range(size):
        result = np.searchsorted(db_np_sort, value[i])
        # searchsorted returns an insertion point, so confirm the value is really there
        if result < size and db_np_sort[result] == value[i]:
            do_something()

def index_list_sort():
    # Sorted list with the bisect library, which requires sorted input
    for i in range(size):
        index = bisect.bisect_left(db_sort, value[i])
        # bisect_left returns an insertion point, so confirm the value is really there
        if index < size and db_sort[index] == value[i]:
            do_something()

def index_set():
    # Set search
    for i in range(size):
        if value[i] in db_set:
            do_something()

def index_dict():
    # Dictionary search
    for i in range(size):
        try:
            index = db_dict[value[i]]
        except KeyError:
            index = -1
        if index >= 0:
            do_something()

# Test execution time
call(index_sequence)
call(index_list)
call(index_np)
call(index_np_sort)
call(index_dict)
call(index_set)
call(index_list_sort)
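All of the functions above look up one value per loop iteration. One possible direction for more speed is to resolve all lookups in a single batch. The sketch below is only an illustration under stated assumptions: the helper names `lookup_dict` and `lookup_numpy` are hypothetical, `db` is assumed to be non-empty and to contain unique integers (as `random.sample` produces), and -1 marks a value that was not found.

```python
import numpy as np


def lookup_dict(db, values):
    """Return the index in db of each value, or -1 if absent (pure Python)."""
    # Build the value -> index mapping once, then answer every query from it.
    position = {item: i for i, item in enumerate(db)}
    return [position.get(v, -1) for v in values]


def lookup_numpy(db, values):
    """Return the index in db of each value, or -1 if absent (vectorized)."""
    db_np = np.asarray(db)
    queries = np.asarray(values)
    # Sort once and remember where each sorted element came from.
    order = np.argsort(db_np)
    sorted_db = db_np[order]
    # One searchsorted call handles every query at once.
    slots = np.searchsorted(sorted_db, queries)
    # Clip so that queries larger than every element still index safely.
    slots = np.minimum(slots, len(sorted_db) - 1)
    # searchsorted gives insertion points, so check for a real match.
    found = sorted_db[slots] == queries
    return np.where(found, order[slots], -1)
```

For example, `lookup_dict(db, value)` and `lookup_numpy(db, value).tolist()` should return the same list of positions for the `db` and `value` defined earlier. Whether the batch form is actually faster depends on the data size and on how much work is done per hit, so it is worth timing on real data.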