20 handy, easy-to-use Python functions

Hello everyone. Today I'm sharing 20 Python functions (methods of the pandas library) that are useful in day-to-day work. They are not seen very often, but they are convenient to use and can greatly improve efficiency. The article is long, so feel free to bookmark it and come back to it later.

isin() method

The isin() method checks, element by element, whether each value in the dataset is contained in a given list

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
                  index=['A', 'B', 'C', 'D'],
                  columns=['one', 'two', 'three'])
df.isin([3, 5, 12])

output

     one    two  three
A  False  False   True
B  False   True  False
C  False  False  False
D  False  False   True

Each cell whose value is in the list (3, 5 or 12) becomes True, and every other cell becomes False
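
A common use of isin() is to build a boolean mask on a single column and keep only the matching rows. The sketch below reuses the same data and also shows the dict form, which checks a different list of values for each column. The names mask, matching_rows and per_column are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
                  index=['A', 'B', 'C', 'D'],
                  columns=['one', 'two', 'three'])

# Boolean mask on a single column, then keep only the matching rows
mask = df['one'].isin([1, 7])
matching_rows = df[mask]
print(matching_rows)

# A dict checks a different list of values for each column;
# columns that are missing from the dict come back as all False
per_column = df.isin({'one': [1, 4], 'three': [12]})
print(per_column)
```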

df.plot.area() method

Pandas can also draw a chart with a single line of code. Here all columns are drawn together as an area chart

df = pd.DataFrame({
    'sales': [30, 20, 38, 95, 106, 65],
    'signups': [7, 9, 6, 12, 18, 13],
    'visits': [20, 42, 28, 62, 81, 50],
}, index=pd.date_range(start='2021/01/01', end='2021/07/01', freq='M'))

ax = df.plot.area(figsize = (10, 5))

output (figure: area chart of sales, signups and visits; image not included)

df.plot.bar() method

Let's take a look at how to draw a bar chart with one line of code

df = pd.DataFrame({'label':['A', 'B', 'C', 'D'], 'values':[10, 30, 50, 70]})
ax = df.plot.bar(x='label', y='values', rot=20)

output (figure: bar chart of values by label; image not included)

We can also draw a grouped bar chart, with one group of bars per row and one bar per column

age = [0.1, 17.5, 40, 48, 52, 69, 88]
weight = [2, 8, 70, 1.5, 25, 12, 28]
index = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
df = pd.DataFrame({'age': age, 'weight': weight}, index=index)
ax = df.plot.bar(rot=0)

output (figure: grouped bar chart of age and weight; image not included)

We can also draw the same chart horizontally

ax = df.plot.barh(rot=0)

output (figure: horizontal bar chart of age and weight; image not included)

df.plot.box() method

A box plot can likewise be drawn with a single line of pandas code

data = np.random.randn(25, 3)
df = pd.DataFrame(data, columns=list('ABC'))
ax = df.plot.box()

output (figure: box plot of columns A, B and C; image not included)

df.plot.pie() method

Next is the pie chart

df = pd.DataFrame({'mass': [1.33, 4.87 , 5.97],
                   'radius': [2439.7, 6051.8, 6378.1]},
                  index=['Mercury', 'Venus', 'Earth'])
plot = df.plot.pie(y='mass', figsize=(8, 8))

output (figure: pie chart of mass for Mercury, Venus and Earth; image not included)

In addition, there are line charts, histograms, scatter plots, and so on. The steps are similar to the examples above. If you are interested, you can try them yourself.

items() method

The items() method in pandas iterates over the columns of the dataset and returns, for each column, a tuple of the column name and the column content (a Series), as shown in the following example

df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
                  'population': [1864, 22000, 80000]},
                  index=['panda', 'polar', 'koala'])
df

output

         species  population
panda       bear        1864
polar       bear       22000
koala  marsupial       80000

Then we use the items() method

for label, content in df.items():
    print(f'label: {label}')
    print(f'content: {content}', sep='\n')
    print("=" * 50)

output

label: species
content: panda         bear
polar         bear
koala    marsupial
Name: species, dtype: object
==================================================
label: population
content: panda     1864
polar    22000
koala    80000
Name: population, dtype: int64
==================================================

The column names and contents of the 'species' and 'population' columns are printed one after the other

iterrows() method

The iterrows() method iterates over the rows of the dataset and returns, for each row, the row index and the row content as a Series labelled with the column names. An example is as follows

for label, content in df.iterrows():
    print(f'label: {label}')
    print(f'content: {content}', sep='\n')
    print("=" * 50)

output

label: panda
content: species       bear
population    1864
Name: panda, dtype: object
==================================================
label: polar
content: species        bear
population    22000
Name: polar, dtype: object
==================================================
label: koala
content: species       marsupial
population        80000
Name: koala, dtype: object
==================================================
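
Because iterrows() packs each row into a Series, values from columns of different types are converted to a common dtype (object in the output above). When only the values are needed, itertuples() is a commonly used alternative that returns one namedtuple per row. A minimal sketch, reusing the same data; the summary list is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
                   'population': [1864, 22000, 80000]},
                  index=['panda', 'polar', 'koala'])

# Each row is a namedtuple: the row index is available as .Index,
# and each column as an attribute named after the column label
summary = []
for row in df.itertuples():
    summary.append(f'{row.Index}: {row.species}, {row.population}')

print(summary)
```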

insert() method

The insert() method inserts a new column at a given position in the dataset. It modifies the DataFrame in place. An example is as follows

df.insert(1, "size", [2000, 3000, 4000])
df

output

         species  size  population
panda       bear  2000        1864
polar       bear  3000       22000
koala  marsupial  4000       80000

It can be seen that column positions in a DataFrame also start from 0, so position 1 places the new column second
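
Two details of insert() are easy to miss: it changes the DataFrame itself and returns None, and it refuses a column label that already exists unless allow_duplicates=True is passed. A small sketch of both behaviours; the variables result and duplicate_rejected are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
                   'population': [1864, 22000, 80000]},
                  index=['panda', 'polar', 'koala'])

# insert() changes df itself and returns None
result = df.insert(1, 'size', [2000, 3000, 4000])
print(result)
print(list(df.columns))

# Inserting a label that already exists raises ValueError by default
try:
    df.insert(0, 'size', [1, 2, 3])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)
```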

assign() method

The assign() method can be used to add new columns to the dataset, as shown in the following example (run on the original df, before the size column was inserted)

df.assign(size_1=lambda x: x.population * 9 / 5 + 32)

output

         species  population    size_1
panda       bear        1864    3387.2
polar       bear       22000   39632.0
koala  marsupial       80000  144032.0

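As can be seen from the above example, we add a new column named 'size_1' to the dataset through an anonymous lambda function. We can also create more than one column in a single assign() call. Because assign() returns a new DataFrame instead of changing df, the result is assigned back to df here so that the new columns are available in the examples that follow

df = df.assign(size_1 = lambda x: x.population * 9 / 5 + 32,
               size_2 = lambda x: x.population * 8 / 5 + 10)
df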

output

         species  population    size_1    size_2
panda       bear        1864    3387.2    2992.4
polar       bear       22000   39632.0   35210.0
koala  marsupial       80000  144032.0  128010.0

eval() method

The eval() method evaluates an expression that is given as a string, such as

df.eval("size_3 = size_1 + size_2")

output

         species  population    size_1    size_2    size_3
panda       bear        1864    3387.2    2992.4    6379.6
polar       bear       22000   39632.0   35210.0   74842.0
koala  marsupial       80000  144032.0  128010.0  272042.0

We can also perform several operations at the same time by putting one expression on each line

df = df.eval('''
size_3 = size_1 + size_2
size_4 = size_1 - size_2
''')

output

         species  population    size_1    size_2    size_3   size_4
panda       bear        1864    3387.2    2992.4    6379.6    394.8
polar       bear       22000   39632.0   35210.0   74842.0   4422.0
koala  marsupial       80000  144032.0  128010.0  272042.0  16022.0

pop() method

The pop() method removes a given column from the dataset and returns the removed column as a Series

df.pop("size_3")

output

panda      6379.6
polar     74842.0
koala    272042.0
Name: size_3, dtype: float64

After this call, the original dataset no longer contains the 'size_3' column
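
To make the in-place behaviour concrete, the sketch below compares pop() with drop(), which returns a copy and leaves the original untouched. The DataFrame is a shortened version of the one above, and the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
                   'population': [1864, 22000, 80000],
                   'size_3': [6379.6, 74842.0, 272042.0]},
                  index=['panda', 'polar', 'koala'])

# pop() returns the column as a Series and removes it from df
removed = df.pop('size_3')
print(removed)
print(list(df.columns))

# drop() returns a copy without the column; df itself is not changed
without_population = df.drop(columns='population')
print(list(without_population.columns))
print(list(df.columns))
```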

truncate() method

The truncate() method selects rows by row index: it drops everything before and after the given index labels. An example is as follows

df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
                   'B': ['f', 'g', 'h', 'i', 'j'],
                   'C': ['k', 'l', 'm', 'n', 'o']},
                  index=[1, 2, 3, 4, 5])

output

   A  B  C
1  a  f  k
2  b  g  l
3  c  h  m
4  d  i  n
5  e  j  o

Let's try it with the truncate() method

df.truncate(before=2, after=4)

output

   A  B  C
2  b  g  l
3  c  h  m
4  d  i  n

The before and after parameters of truncate() are row index labels: the rows before label 2 and after label 4 are dropped, and the rows in between (both ends included) are kept
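
For a sorted index like this one, the same rows can also be selected with a label slice, and truncate() can work on column labels through its axis parameter. A minimal sketch of both, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
                   'B': ['f', 'g', 'h', 'i', 'j'],
                   'C': ['k', 'l', 'm', 'n', 'o']},
                  index=[1, 2, 3, 4, 5])

# truncate() on rows gives the same result as a label slice with .loc
by_truncate = df.truncate(before=2, after=4)
by_loc = df.loc[2:4]
print(by_truncate.equals(by_loc))

# axis='columns' truncates by column label instead of row label
first_two_columns = df.truncate(before='A', after='B', axis='columns')
print(list(first_two_columns.columns))
```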

count() method

The count() method counts the number of non-null values in each column. An example is as follows

df = pd.DataFrame({"Name": ["John", "Myla", "Lewis", "John", "John"],
                   "Age": [24., np.nan, 25, 33, 26],
                   "Single": [True, True, np.nan, True, False]})

output

    Name   Age Single
0   John  24.0   True
1   Myla   NaN   True
2  Lewis  25.0    NaN
3   John  33.0   True
4   John  26.0  False

We use the count() method to calculate the number of non null values in the dataset

df.count()

output

Name      5
Age       4
Single    4
dtype: int64
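
count() can also work row by row, and the complementary question (how many values are missing) is usually answered with isna().sum(). A short sketch on the same data; per_row and missing_per_column are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Myla", "Lewis", "John", "John"],
                   "Age": [24., np.nan, 25, 33, 26],
                   "Single": [True, True, np.nan, True, False]})

# axis=1 counts the non-null values in each row instead of each column
per_row = df.count(axis=1)
print(per_row)

# Number of missing values per column
missing_per_column = df.isna().sum()
print(missing_per_column)
```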

add_prefix() method / add_suffix() method

The add_prefix() and add_suffix() methods add a prefix and a suffix, respectively, to the labels of a dataset. For a Series the labels are the row index, while for a DataFrame they are the column names, as shown in the following example

s = pd.Series([1, 2, 3, 4])

output

0    1
1    2
2    3
3    4
dtype: int64

We apply the add_prefix() and add_suffix() methods to the Series

s.add_prefix('row_')

output

row_0    1
row_1    2
row_2    3
row_3    4
dtype: int64

Another example

s.add_suffix('_row')

output

0_row    1
1_row    2
2_row    3
3_row    4
dtype: int64

For a DataFrame, the add_prefix() and add_suffix() methods add a prefix and a suffix to the column names

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})

output

   A  B
0  1  3
1  2  4
2  3  5
3  4  6

Examples are as follows

df.add_prefix("column_")

output

   column_A  column_B
0         1         3
1         2         4
2         3         5
3         4         6

Another example

df.add_suffix("_column")

output

   A_column  B_column
0         1         3
1         2         4
2         3         5
3         4         6
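
Both methods return a new object and leave the original labels untouched. For a DataFrame the effect is the same as renaming the columns with a function, which the sketch below checks; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})

prefixed = df.add_prefix('column_')
# rename() with a function applied to every column label gives the same result
renamed = df.rename(columns=lambda name: f'column_{name}')
print(prefixed.equals(renamed))

# df itself keeps its original column names
print(list(df.columns))
```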

clip() method

The clip() method limits the values in the dataset to a range defined by thresholds. Any value outside the range is replaced by the nearest threshold

data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
df = pd.DataFrame(data)

output

   col_0  col_1
0      9     -2
1     -3     -7
2      0      6
3     -1      8
4      5     -5

Now we apply the clip() method

df.clip(lower = -4, upper = 4)

output

   col_0  col_1
0      4     -2
1     -3     -4
2      0      4
3     -1      4
4      4     -4

We can see that the parameters lower and upper set the lower and upper limits respectively. Values below the lower limit are replaced by the lower limit, and values above the upper limit are replaced by the upper limit.
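
The two limits are independent, so only one of them may be given, and a limit can also be a Series that supplies a different threshold for each row. A minimal sketch under those assumptions; the row_limits values are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]})

# Only a lower limit: negative values become 0, everything else is kept
non_negative = df.clip(lower=0)
print(non_negative)

# A Series with axis=0 applies a different upper limit to each row
row_limits = pd.Series([5, 0, 3, 0, 0])
clipped_per_row = df.clip(upper=row_limits, axis=0)
print(clipped_per_row)
```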

filter() method

The filter() method in pandas selects rows or columns by their labels (it does not look at the values in the dataset). An example is as follows

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
                  index=['A', 'B', 'C', 'D'],
                  columns=['one', 'two', 'three'])

output

   one  two  three
A    1    2      3
B    4    5      6
C    7    8      9
D   10   11     12

We use the filter() method to filter the data

df.filter(items=['one', 'three'])

output

   one  three
A    1      3
B    4      6
C    7      9
D   10     12

We can also use regular expressions to filter data

df.filter(regex='e$', axis=1)

output

   one  three
A    1      3
B    4      6
C    7      9
D   10     12

The axis parameter controls whether the row labels (axis=0) or the column labels (axis=1) are filtered

df.filter(like='B', axis=0)

output

   one  two  three
B    4    5      6
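
Since filter() only looks at labels, selecting rows by their values needs a boolean mask instead. The sketch below contrasts the two on the same data; by_label and by_value are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
                  index=['A', 'B', 'C', 'D'],
                  columns=['one', 'two', 'three'])

# filter() works on labels: keep the columns whose name contains 't'
by_label = df.filter(like='t', axis=1)
print(list(by_label.columns))

# Selecting by cell values is done with a boolean mask
by_value = df[df['one'] > 4]
print(list(by_value.index))
```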

first() method

When the row index of the dataset consists of dates, this method selects the rows that fall within an initial period of time

index_1 = pd.date_range('2021-11-11', periods=5, freq='2D')
ts = pd.DataFrame({'A': [1, 2, 3, 4, 5]}, index=index_1)
ts

output

            A
2021-11-11  1
2021-11-13  2
2021-11-15  3
2021-11-17  4
2021-11-19  5

We use the first() method to select, for example, the data of the first 3 days. Note that '3D' means 3 calendar days counted from the first index value, not 3 rows

ts.first('3D')

output

            A
2021-11-11  1
2021-11-13  2
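
In recent pandas releases (2.1 onward) first() is marked as deprecated, so a selection based on a date comparison may be preferable in new code. A minimal sketch that returns the same two rows; the cutoff variable is illustrative only.

```python
import pandas as pd

index_1 = pd.date_range('2021-11-11', periods=5, freq='2D')
ts = pd.DataFrame({'A': [1, 2, 3, 4, 5]}, index=index_1)

# Keep the rows whose timestamp lies within 3 days of the first timestamp
cutoff = ts.index[0] + pd.Timedelta(days=3)
first_three_days = ts.loc[ts.index < cutoff]
print(first_three_days)
```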


Tags: Python function pandas

Posted on Thu, 25 Nov 2021 18:57:26 -0500 by bubbasheeko