Experiment and analysis -- numpy.vectorize

What the function does

numpy.vectorize wraps an ordinary Python function so that it can be applied element by element to arrays, which saves writing explicit loops in Python. It does not necessarily make the code faster: the NumPy documentation notes that vectorize is provided mainly for convenience, not performance, and that the implementation is essentially a for loop. In addition, with the signature parameter, a function that originally processes vectors can be applied in batches to vectors of vectors (see the examples below), which is very convenient. The official documentation is here: https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html

Next, different kinds of addition serve as examples for experiment and analysis.

Each kind of addition is obtained by vectorizing the following function.

def add(a,b):
	return a + b

To make the actual calculation process visible, we add a line of output to the function:

def add(a,b):
	print("a:", a, "b:", b)
	return a + b

Example 1

Suppose we want the following element-wise addition:
$$[1,2,3] + [4,5,6] = [5,7,9]$$
With plain Python lists this does not work directly, because + concatenates lists. Calling the function as is gives:

>>> add([1,2,3],[4,5,6])
a: [1, 2, 3] b: [4, 5, 6]

[1, 2, 3, 4, 5, 6]

However, if we use numpy.vectorize to transform this function, we can realize vectorized addition:

>>> import numpy as np
>>> add_vectorized_func1 = np.vectorize(add)
>>> add_vectorized_func1([1,2,3],[4,5,6])
a: 1 b: 4
a: 1 b: 4
a: 2 b: 5
a: 3 b: 6

array([5, 7, 9])

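The vectorized function takes the elements of the two arrays pair by pair and adds them. The output contains one extra line, a: 1 b: 4. This is documented behavior: when the otypes argument is not given, numpy.vectorize calls the function once on the first elements of the inputs to determine the output data type, and then runs the actual computation. The result is unaffected.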
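The effect of otypes on the number of calls can be checked with a small sketch. It replaces the print statement with a list named calls (an assumption made here so that the calls can be counted) and passes otypes=[np.int64], which assumes integer inputs as in this example:

```python
import numpy as np

calls = []  # records every (a, b) pair that add() receives

def add(a, b):
    calls.append((a, b))
    return a + b

# Without otypes: one trial call on the first elements is made
# to infer the output dtype before the real element-wise loop.
np.vectorize(add)([1, 2, 3], [4, 5, 6])
calls_without_otypes = len(calls)

calls.clear()

# With otypes given, the output dtype is known up front.
result = np.vectorize(add, otypes=[np.int64])([1, 2, 3], [4, 5, 6])
calls_with_otypes = len(calls)

print(calls_without_otypes, calls_with_otypes)
print(result)
```

The first count is expected to be one higher than the second, which accounts for the duplicated line in the output above.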

Run time comparison

The following code was tested in a Jupyter notebook.

import numpy as np

def add(a, b):
    return a + b

n = int(1e7)
a, b = np.random.rand(n), np.random.rand(n)

res = np.zeros(n)
def test1_novec(a, b, res):
    for i in range(n):
        res[i] = add(a[i], b[i])
    return res

add_vectorized_func1 = np.vectorize(add)
def test1_vec(a, b, res):
    res = add_vectorized_func1(a, b)
    return res
%timeit test1_novec(a, b, res)
5.05 s ± 158 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit test1_vec(a, b, res)
2.2 s ± 339 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

It can be seen that for this example, the vectorized code is about 2.3 times faster than the loop (5.05 s versus 2.2 s).
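For plain element-wise addition, NumPy's own + operator already does the work in compiled code, so it is a natural baseline. The sketch below assumes a much smaller n than the 1e7 used above so that it runs quickly, checks that both approaches give the same values, and times them with the standard timeit module. No timings are quoted here, since they depend on the machine:

```python
import timeit
import numpy as np

def add(a, b):
    return a + b

n = 100_000  # assumption: smaller than the 1e7 above, to keep the run short
rng = np.random.default_rng(0)
a, b = rng.random(n), rng.random(n)

add_vectorized = np.vectorize(add)

# Both approaches should produce the same values.
same = np.allclose(add_vectorized(a, b), a + b)
print("same result:", same)

# Time three runs of each approach.
t_vec = timeit.timeit(lambda: add_vectorized(a, b), number=3)
t_native = timeit.timeit(lambda: a + b, number=3)
print(f"np.vectorize: {t_vec:.4f} s, a + b: {t_native:.4f} s (3 runs each)")
```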

With the signature parameter, more can be done, as in the following example.

Example 2

Suppose we want to add each element of [1,2,3] to the whole of [4,5,6], producing a two-dimensional array:
$$
\begin{aligned}
[1,2,3] + [4,5,6] &= \left[\begin{matrix}1 + [4,5,6] \\ 2 + [4,5,6] \\ 3 + [4,5,6]\end{matrix}\right] \\
&= \left[\begin{matrix}5&6&7 \\ 6&7&8 \\ 7&8&9\end{matrix}\right]
\end{aligned}
$$

In other words, on every call we want the a parameter of add to be a number and the b parameter to be an array, so that the return value of add is also an array.

We can express this with the signature parameter of numpy.vectorize. The signature is a string in which "()" stands for a scalar parameter (or return value), "(n)" for a one-dimensional array of length n, and "(n,m)" for a two-dimensional array of shape (n, m).

Our requirement can therefore be written as "(),(n)->(n)": on each call, the first argument passed to add is a number, the second is a one-dimensional array of length n, and the return value is also a one-dimensional array of length n. Using the same letter n states that the second argument and the return value have the same length. In my experiment this string could not contain any spaces; writing it without spaces is the safe choice, although this restriction may depend on the NumPy version.

The code and operation results are as follows:

>>> add_vectorized_func2 = np.vectorize(add, signature="(),(n)->(n)")
>>> add_vectorized_func2([1,2,3], np.array([4,5,6])) # Here, np.array is used to ensure that the addition of numbers and arrays can be realized
a: 1 b: [4 5 6]
a: 2 b: [4 5 6]
a: 3 b: [4 5 6]

array([[5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])

We can also relax the signature. For example, if we do not know the length of the array that add returns for a number and an array of length n, we can simply use a different letter, such as "(),(n)->(k)". The code and its output are as follows:

>>> add_vectorized_func2 = np.vectorize(add, signature="(),(n)->(k)")
>>> add_vectorized_func2([1,2,3], np.array([4,5,6])) # Here, np.array is used to ensure that the addition of numbers and arrays can be realized
a: 1 b: [4 5 6]
a: 2 b: [4 5 6]
a: 3 b: [4 5 6]

array([[5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])
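The signature notation is not limited to addition. As a rough sketch, assuming a hypothetical helper weighted_sum that is not part of the experiments here, the signature "(n),(n)->()" says that each call receives two vectors of the same length and returns a scalar:

```python
import numpy as np

def weighted_sum(values, weights):
    # Hypothetical helper: takes two 1-D arrays of equal length, returns a scalar.
    return float(np.sum(values * weights))

weighted_sum_vec = np.vectorize(weighted_sum, signature="(n),(n)->()")

values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])   # shape (2, 3): two vectors of length 3
weights = np.array([1.0, 0.0, 2.0])     # shape (3,): shared by every row

out = weighted_sum_vec(values, weights)
print(out.shape)  # (2,)
print(out)        # [ 7. 16.]
```

Here values holds two vectors and weights holds one, so the function is called once per row of values and the result has one entry per row.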

Run time comparison

def add(a, b):
    return a + b

n = int(1e4)
a, b = np.random.rand(n), np.random.rand(n)

res = np.zeros((n,n))
def test2_novec(a, b, res):
    for i in range(n):
        res[i] = add(a[i], b)
    return res

add_vectorized_func2 = np.vectorize(add, signature="(),(n)->(n)")
def test2_vec(a, b, res):
    res = add_vectorized_func2(a, b)
    return res
%timeit test2_novec(a, b, res)
133 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit test2_vec(a, b, res)
351 ms ± 40.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

For this example, the for loop is about 2.6 times faster than the np.vectorize version (133 ms versus 351 ms).
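For this particular operation, NumPy broadcasting gives the same two-dimensional result without any Python-level loop. A minimal sketch, using the small arrays from Example 2 and illustrative variable names, compares it with the signature-based version:

```python
import numpy as np

def add(a, b):
    return a + b

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

via_vectorize = np.vectorize(add, signature="(),(n)->(n)")(a, b)

# a[:, None] has shape (3, 1); adding b of shape (3,) broadcasts to (3, 3),
# so row i holds a[i] + b.
via_broadcast = a[:, None] + b

print(np.array_equal(via_vectorize, via_broadcast))
print(via_broadcast)
```

Broadcasting only covers functions that can be written with array operations; np.vectorize with a signature remains useful when the function body cannot be expressed that way.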

Example 3

Example 2 can be extended to higher dimensions. For example, suppose we want the following addition:
$$
\begin{aligned}
\left[\begin{matrix}1&1&1\\2&2&2\\3&3&3\end{matrix}\right] + \left[\begin{matrix}4&4&4\\5&5&5\\6&6&6\end{matrix}\right]
&= \left[\begin{matrix}
\left[\begin{matrix}1&1&1\end{matrix}\right] + \left[\begin{matrix}4&4&4\\5&5&5\\6&6&6\end{matrix}\right] \\
\left[\begin{matrix}2&2&2\end{matrix}\right] + \left[\begin{matrix}4&4&4\\5&5&5\\6&6&6\end{matrix}\right] \\
\left[\begin{matrix}3&3&3\end{matrix}\right] + \left[\begin{matrix}4&4&4\\5&5&5\\6&6&6\end{matrix}\right]
\end{matrix}\right] \\
&= \left[\begin{matrix}
\left[\begin{matrix}5&5&5\\6&6&6\\7&7&7\end{matrix}\right] \\
\left[\begin{matrix}6&6&6\\7&7&7\\8&8&8\end{matrix}\right] \\
\left[\begin{matrix}7&7&7\\8&8&8\\9&9&9\end{matrix}\right]
\end{matrix}\right]
\end{aligned}
$$

Here, on each call the first argument of add is a vector of length 3, the second is a 3x3 matrix, and the output is also a 3x3 matrix. The signature is therefore "(n),(m,n)->(m,n)".
The code and its output are as follows:

>>> add_vectorized_func3 = np.vectorize(add, signature="(n),(m,n)->(m,n)")
>>> add_vectorized_func3(
    np.array([[1,1,1],
             [2,2,2],
             [3,3,3]]), 
    np.array([[4,4,4],
             [5,5,5],
             [6,6,6]]
    )) # Here, np.array is used to ensure that the addition of array and array can be realized

a: [1 1 1] b: [[4 4 4]
 [5 5 5]
 [6 6 6]]
a: [2 2 2] b: [[4 4 4]
 [5 5 5]
 [6 6 6]]
a: [3 3 3] b: [[4 4 4]
 [5 5 5]
 [6 6 6]]
 
array([[[5, 5, 5],
        [6, 6, 6],
        [7, 7, 7]],

       [[6, 6, 6],
        [7, 7, 7],
        [8, 8, 8]],

       [[7, 7, 7],
        [8, 8, 8],
        [9, 9, 9]]])

Run time comparison

def add(a, b):
    return a + b

n = int(5e2)
a, b = np.random.rand(n,n), np.random.rand(n,n)

res = np.zeros((n,n,n))
def test3_novec(a, b, res):
    for i in range(n):
        res[i] = add(a[i], b)
    return res

add_vectorized_func3 = np.vectorize(add, signature="(n),(m,n)->(m,n)")
def test3_vec(a, b, res):
    res = add_vectorized_func3(a, b)
    return res
%timeit test3_novec(a, b, res)
450 ms ± 9.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit test3_vec(a, b, res)
702 ms ± 19.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

For this example too, the for loop is faster than the np.vectorize version, by a factor of about 1.6 (450 ms versus 702 ms).
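The three-dimensional result of Example 3 can also be written with broadcasting. The sketch below uses the 3x3 arrays from Example 3 and inserts a new axis into a, on the assumption that entry [i, j, k] of the result should equal a[i, k] + b[j, k]:

```python
import numpy as np

def add(a, b):
    return a + b

a = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3]])
b = np.array([[4, 4, 4],
              [5, 5, 5],
              [6, 6, 6]])

via_vectorize = np.vectorize(add, signature="(n),(m,n)->(m,n)")(a, b)

# a[:, None, :] has shape (3, 1, 3); adding b of shape (3, 3) broadcasts
# to (3, 3, 3), so block i holds a[i] + b.
via_broadcast = a[:, None, :] + b

print(via_broadcast.shape)
print(np.array_equal(via_vectorize, via_broadcast))
```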

Tags: Python

Posted on Sat, 06 Nov 2021 13:22:53 -0400 by alanho