# 20 Python functions that are a pleasure to use

Hello everyone. Today I am sharing 20 Python functions that are handy in day-to-day work. They are not used very often, but they are convenient and can noticeably improve your efficiency. All of the examples assume that pandas and NumPy have been imported as `pd` and `np`.

### isin() method

The isin() method checks whether each value in the dataset is contained in a given list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
                  index=['A', 'B', 'C', 'D'],
                  columns=['one', 'two', 'three'])
df.isin([3, 5, 12])
```

Output:

```
     one    two  three
A  False  False   True
B  False   True  False
C  False  False  False
D  False  False   True
```

A cell is True if its value is in the list (3, 5 or 12) and False otherwise.
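A common use of such a Boolean result is row selection. The sketch below is an illustration only: the small frame and the names `mask` and `selected` are made up for this example, not taken from the original data.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 4, 7, 10], 'two': [2, 5, 8, 11]},
                  index=['A', 'B', 'C', 'D'])

# Boolean Series: True where column 'one' holds one of the wanted values
mask = df['one'].isin([4, 10])

# Keep only the rows where the mask is True
selected = df[mask]
print(selected)
```

Calling isin() on a single column returns a Series, which can be used directly as a row filter.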

### df.plot.area() method

Next, let's see how to draw a chart with a single line of pandas code. An area chart plots all columns at once.

```python
# freq='M' means month-end here; newer pandas versions prefer the alias 'ME'
df = pd.DataFrame({
    'sales': [30, 20, 38, 95, 106, 65],
    'signups': [7, 9, 6, 12, 18, 13],
    'visits': [20, 42, 28, 62, 81, 50],
}, index=pd.date_range(start='2021/01/01', end='2021/07/01', freq='M'))

ax = df.plot.area(figsize=(10, 5))
```

### df.plot.bar() method

Let's see how to draw a bar chart with one line of code.

```python
df = pd.DataFrame({'label': ['A', 'B', 'C', 'D'], 'values': [10, 30, 50, 70]})
ax = df.plot.bar(x='label', y='values', rot=20)
```

We can also draw a grouped bar chart, with one bar per column for each category.

```python
age = [0.1, 17.5, 40, 48, 52, 69, 88]
weight = [2, 8, 70, 1.5, 25, 12, 28]
index = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
df = pd.DataFrame({'age': age, 'weight': weight}, index=index)
ax = df.plot.bar(rot=0)
```

The same chart can also be drawn horizontally.

```python
ax = df.plot.barh(rot=0)
```

### df.plot.box() method

A box plot can also be drawn with a single line of pandas code.

```python
data = np.random.randn(25, 3)
df = pd.DataFrame(data, columns=list('ABC'))
ax = df.plot.box()
```

### df.plot.pie() method

Next is the pie chart.

```python
df = pd.DataFrame({'mass': [1.33, 4.87, 5.97]},
                  index=['Mercury', 'Venus', 'Earth'])
plot = df.plot.pie(y='mass', figsize=(8, 8))
```

In addition, there are line charts, histograms, scatter plots and so on. The steps are similar to the ones above. If you are interested, you can try them yourself.
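As a rough illustration of those other chart types, the sketch below reuses the age and weight data from the bar chart example. It assumes matplotlib is installed, and the `Agg` backend line is only there so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened

import pandas as pd

df = pd.DataFrame({'age': [0.1, 17.5, 40, 48, 52, 69, 88],
                   'weight': [2, 8, 70, 1.5, 25, 12, 28]},
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# Line chart: one line per column
ax_line = df.plot.line()

# Histogram: the value distribution of each column, overlaid
ax_hist = df.plot.hist(bins=5, alpha=0.5)

# Scatter plot: one point per row, with explicit x and y columns
ax_scatter = df.plot.scatter(x='age', y='weight')
```

Each call returns a matplotlib Axes object, so titles and labels can be adjusted afterwards in the usual way.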

### items() method

The items() method in pandas traverses each column of the dataset and returns, for each one, a tuple containing the column name and the column's content. An example is shown below.

```python
df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
                   'population': [1864, 22000, 80000]},
                  index=['panda', 'polar', 'koala'])
df
```

Output:

```
         species  population
panda       bear        1864
polar       bear       22000
koala  marsupial       80000
```

Then we use the items() method.

```python
for label, content in df.items():
    print(f'label: {label}')
    print(f'content: {content}')
    print("=" * 50)
```

Output:

```
label: species
content: panda         bear
polar         bear
koala    marsupial
Name: species, dtype: object
==================================================
label: population
content: panda     1864
polar    22000
koala    80000
Name: population, dtype: int64
==================================================
```

The column names and contents of the 'species' and 'population' columns are printed one after the other.
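Because each step yields a (name, Series) pair, items() fits naturally into a comprehension. The sketch below is only an illustration, and `unique_counts` is a name invented for it: it builds a dictionary with the number of distinct values per column.

```python
import pandas as pd

df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
                   'population': [1864, 22000, 80000]},
                  index=['panda', 'polar', 'koala'])

# Map each column label to the number of distinct values in that column
unique_counts = {label: content.nunique() for label, content in df.items()}
print(unique_counts)
```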

### iterrows() method

The iterrows() method traverses each row of the dataset and returns the row's index label together with the row's content as a Series keyed by column name. An example is shown below.

```python
for label, content in df.iterrows():
    print(f'label: {label}')
    print(f'content: {content}')
    print("=" * 50)
```

Output:

```
label: panda
content: species       bear
population    1864
Name: panda, dtype: object
==================================================
label: polar
content: species        bear
population    22000
Name: polar, dtype: object
==================================================
label: koala
content: species       marsupial
population        80000
Name: koala, dtype: object
==================================================
```
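Note that every row above comes back as a Series with dtype object, because a single Series has to hold both the text and the numeric column. When the original column types matter, itertuples() is a commonly used alternative. The sketch below is an illustration, with `rows` as a made-up helper list.

```python
import pandas as pd

df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
                   'population': [1864, 22000, 80000]},
                  index=['panda', 'polar', 'koala'])

rows = []
for row in df.itertuples():
    # Each row is a namedtuple: row.Index is the index label,
    # the remaining fields are named after the columns
    rows.append((row.Index, row.species, row.population))

print(rows)
```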

### insert() method

The insert() method inserts a column at a specific position in the dataset. An example is shown below.

```python
df.insert(1, "size", [2000, 3000, 4000])
df
```

Output:

```
         species  size  population
panda       bear  2000        1864
polar       bear  3000       22000
koala  marsupial  4000       80000
```

The new column appears at position 1, second from the left, which shows that column positions in a DataFrame also start from 0.
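One detail worth keeping in mind is that insert() changes the DataFrame in place and returns None, so its result should not be assigned back. A minimal sketch with a made-up two-column frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'c': [5, 6]})

# insert() modifies df directly; the return value is None
result = df.insert(1, 'b', [3, 4])

print(result)            # None
print(list(df.columns))  # ['a', 'b', 'c']
```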

### assign() method

The assign() method adds new columns to the dataset, as shown in the following example.

```python
df.assign(size_1=lambda x: x.population * 9 / 5 + 32)
```

Output:

```
         species  population    size_1
panda       bear        1864    3387.2
polar       bear       22000   39632.0
koala  marsupial       80000  144032.0
```

In the example above, we add a new column named 'size_1' to the dataset through an anonymous lambda function. We can also create more than one column in a single assign() call.

```python
df.assign(size_1=lambda x: x.population * 9 / 5 + 32,
          size_2=lambda x: x.population * 8 / 5 + 10)
```

Output:

```
         species  population    size_1    size_2
panda       bear        1864    3387.2    2992.4
polar       bear       22000   39632.0   35210.0
koala  marsupial       80000  144032.0  128010.0
```
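Unlike insert(), assign() leaves the original DataFrame untouched and returns a new one. Within one call, a later column can also build on an earlier one. The sketch below illustrates both points; the column names `doubled` and `quadrupled` are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({'population': [1864, 22000]})

# The second lambda can refer to 'doubled', created earlier in the same call
new_df = df.assign(doubled=lambda x: x.population * 2,
                   quadrupled=lambda x: x.doubled * 2)

print(list(df.columns))      # the original frame is unchanged
print(list(new_df.columns))
```

To keep the new columns, assign the result to a variable, for example `df = df.assign(...)`.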

### eval() method

The eval() method evaluates an operation written as a string. The examples below assume that df already contains the size_1 and size_2 columns, that is, the result of the assign() call above was stored back in df.

```python
df.eval("size_3 = size_1 + size_2")
```

Output:

```
         species  population    size_1    size_2    size_3
panda       bear        1864    3387.2    2992.4    6379.6
polar       bear       22000   39632.0   35210.0   74842.0
koala  marsupial       80000  144032.0  128010.0  272042.0
```

We can also perform several operations at the same time, one per line.

```python
df = df.eval('''
size_3 = size_1 + size_2
size_4 = size_1 - size_2
''')
```

Output:

```
         species  population    size_1    size_2    size_3   size_4
panda       bear        1864    3387.2    2992.4    6379.6    394.8
polar       bear       22000   39632.0   35210.0   74842.0   4422.0
koala  marsupial       80000  144032.0  128010.0  272042.0  16022.0
```
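By default, an assignment inside eval() returns a new DataFrame, which is why the second example stores the result back in df. Passing `inplace=True` updates the existing frame instead. A minimal sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'size_1': [1.0, 2.0], 'size_2': [10.0, 20.0]})

# Default behaviour: a new DataFrame is returned, df itself is unchanged
result = df.eval('size_3 = size_1 + size_2')
has_size_3_before = 'size_3' in df.columns

# With inplace=True the new column is written into df directly
df.eval('size_4 = size_2 - size_1', inplace=True)

print(result)
print(df)
```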

### pop() method

The pop() method removes a specific column from the dataset and returns it.

```python
df.pop("size_3")
```

Output:

```
panda      6379.6
polar     74842.0
koala    272042.0
Name: size_3, dtype: float64
```

After the call, the original dataset no longer contains the 'size_3' column.
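Both effects, the returned column and the in-place removal, can be checked with a small made-up frame. This is only a sketch, and `removed` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# pop() returns the column as a Series and deletes it from df
removed = df.pop('b')

print(removed.tolist())   # [3, 4]
print(list(df.columns))   # ['a']
```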

### truncate() method

The truncate() method selects a range of rows according to the row index. An example is shown below.

```python
df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
                   'B': ['f', 'g', 'h', 'i', 'j'],
                   'C': ['k', 'l', 'm', 'n', 'o']},
                  index=[1, 2, 3, 4, 5])
df
```

Output:

```
   A  B  C
1  a  f  k
2  b  g  l
3  c  h  m
4  d  i  n
5  e  j  o
```

Let's try the truncate() method.

```python
df.truncate(before=2, after=4)
```

Output:

```
   A  B  C
2  b  g  l
3  c  h  m
4  d  i  n
```

The before and after parameters of truncate() drop the rows before row index 2 and after row index 4 and keep the rows in between. Both boundaries are inclusive.
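truncate() also accepts an `axis` argument, so the same idea can be applied to column labels. The sketch below reuses the frame from the example; `cols` is a name made up for the illustration, and the column labels need to be sorted for this to work.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
                   'B': ['f', 'g', 'h', 'i', 'j'],
                   'C': ['k', 'l', 'm', 'n', 'o']},
                  index=[1, 2, 3, 4, 5])

# Keep only the columns from label 'A' up to and including label 'B'
cols = df.truncate(before='A', after='B', axis='columns')
print(cols)
```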

### count() method

The count() method counts the non-null values in each column. An example is shown below.

```python
df = pd.DataFrame({"Name": ["John", "Myla", "Lewis", "John", "John"],
                   "Age": [24., np.nan, 25, 33, 26],
                   "Single": [True, True, np.nan, True, False]})
df
```

Output:

```
    Name   Age Single
0   John  24.0   True
1   Myla   NaN   True
2  Lewis  25.0    NaN
3   John  33.0   True
4   John  26.0  False
```

We use the count() method to count the non-null values in each column of the dataset.

```python
df.count()
```

Output:

```
Name      5
Age       4
Single    4
dtype: int64
```
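The same method can count along the other direction. As a small sketch using the frame from the example (`per_row` is an illustrative name), passing `axis='columns'` gives the number of non-null values in each row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Myla", "Lewis", "John", "John"],
                   "Age": [24., np.nan, 25, 33, 26],
                   "Single": [True, True, np.nan, True, False]})

# Count non-null values across the columns, one result per row
per_row = df.count(axis='columns')
print(per_row)
```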

### add_prefix() and add_suffix() methods

The add_prefix() and add_suffix() methods add a prefix or a suffix to labels. For a Series, the prefix or suffix is added to the row index labels, while for a DataFrame it is added to the column names, as shown in the following examples.

```python
s = pd.Series([1, 2, 3, 4])
s
```

Output:

```
0    1
1    2
2    3
3    4
dtype: int64
```

We apply the add_prefix() and add_suffix() methods to the Series.

```python
s.add_prefix('row_')
```

Output:

```
row_0    1
row_1    2
row_2    3
row_3    4
dtype: int64
```

Another example:

```python
s.add_suffix('_row')
```

Output:

```
0_row    1
1_row    2
2_row    3
3_row    4
dtype: int64
```

For a DataFrame, the add_prefix() and add_suffix() methods add the prefix and suffix to the column names.

```python
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
df
```

Output:

```
   A  B
0  1  3
1  2  4
2  3  5
3  4  6
```

An example with a prefix:

```python
df.add_prefix("column_")
```

Output:

```
   column_A  column_B
0         1         3
1         2         4
2         3         5
3         4         6
```

And one with a suffix:

```python
df.add_suffix("_column")
```

Output:

```
   A_column  B_column
0         1         3
1         2         4
2         3         5
3         4         6
```
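For comparison, the same renaming can be written with rename() and a function, which is handy when the new label is more than a simple prefix or suffix. The sketch below is an illustration only; `prefixed` and `renamed` are made-up names.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Shortcut: prepend the same text to every column name
prefixed = df.add_prefix('column_')

# Equivalent result with rename() and a function applied to each name
renamed = df.rename(columns=lambda name: f'column_{name}')

print(list(prefixed.columns))
print(list(renamed.columns))
```

Both calls return a new DataFrame and leave df unchanged.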

### clip() method

The clip() method limits the values in the dataset to a given range. Values that fall outside the thresholds are replaced by the nearest threshold.

```python
data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
df = pd.DataFrame(data)
```

We then call clip() with a lower bound of -4 and an upper bound of 4.

```python
df.clip(lower=-4, upper=4)
```

Output:

```
   col_0  col_1
0      4     -2
1     -3     -4
2      0      4
3     -1      4
4      4     -4
```

The lower and upper parameters are the lower and upper limits respectively. Values below the lower limit are replaced by the lower limit, and values above the upper limit are replaced by the upper limit.
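Either limit can also be given on its own. As a small sketch on the same data (`clipped` is an illustrative name), passing only `lower=0` replaces the negative values with 0 and leaves everything else untouched.

```python
import pandas as pd

df = pd.DataFrame({'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]})

# Only a lower limit: values below 0 become 0, larger values are kept
clipped = df.clip(lower=0)
print(clipped)
```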

### filter() method

The filter() method in pandas selects rows or columns by their labels. It looks only at the labels, not at the values. An example is shown below.

```python
df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
                  index=['A', 'B', 'C', 'D'],
                  columns=['one', 'two', 'three'])
df
```

Output:

```
   one  two  three
A    1    2      3
B    4    5      6
C    7    8      9
D   10   11     12
```

We use the filter() method to select columns by name.

```python
df.filter(items=['one', 'three'])
```

Output:

```
   one  three
A    1      3
B    4      6
C    7      9
D   10     12
```

We can also use a regular expression, here selecting the columns whose name ends with "e".

```python
df.filter(regex='e$', axis=1)
```

Output:

```
   one  three
A    1      3
B    4      6
C    7      9
D   10     12
```

The axis parameter controls whether the labels are matched along the rows (axis=0) or the columns (axis=1).

```python
df.filter(like='B', axis=0)
```

Output:

```
   one  two  three
B    4    5      6
```
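The `like` argument performs a substring match on the labels, so it works on columns as well. A minimal sketch with the same frame, where `t_cols` is a name invented for the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
                  index=['A', 'B', 'C', 'D'],
                  columns=['one', 'two', 'three'])

# Keep the columns whose name contains the letter 't'
t_cols = df.filter(like='t', axis=1)
print(t_cols)
```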

### first() method

When the row index of the dataset consists of dates, this method selects the rows from the initial period of the data.

```python
index_1 = pd.date_range('2021-11-11', periods=5, freq='2D')
ts = pd.DataFrame({'A': [1, 2, 3, 4, 5]}, index=index_1)
ts
```

Output:

```
            A
2021-11-11  1
2021-11-13  2
2021-11-15  3
2021-11-17  4
2021-11-19  5
```

For example, we can use the first() method to select the data from the first 3 days.

```python
ts.first('3D')
```

Output:

```
            A
2021-11-11  1
2021-11-13  2
```

Only two rows are returned because the 3-day window starts at the first index value, 2021-11-11, and 2021-11-15 already falls outside it.
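As far as I know, first() is deprecated in recent pandas releases (from 2.1 onwards), so it may emit a warning or be unavailable in newer versions. The sketch below shows one possible way to get the same rows with a Boolean mask; `cutoff` and `first_3_days` are names made up for the example.

```python
import pandas as pd

index_1 = pd.date_range('2021-11-11', periods=5, freq='2D')
ts = pd.DataFrame({'A': [1, 2, 3, 4, 5]}, index=index_1)

# End of the window: 3 days after the first timestamp (exclusive)
cutoff = ts.index[0] + pd.Timedelta(days=3)

# Keep the rows whose date lies before the cutoff
first_3_days = ts.loc[ts.index < cutoff]
print(first_3_days)
```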

Tags: Python function pandas

Posted on Thu, 25 Nov 2021 18:57:26 -0500 by bubbasheeko