Python playing Excel: statistical functions

1, On statistical function

This is a student's three times examination result registration form. The actual goal is to find out the total score and average score. If you have other needs, you can refer to it.

import  pandas as pd
datas = pd.read_excel('students.xlsx',index_col='ID')
#It is convenient for data processing to separate the three transcripts
temp = datas[['test1','test2','test3']]
#Total score
datas['total']=temp.sum(axis=1)
#Average performance
datas['average']=temp.mean(axis=1)
#Save to Excel
datas.to_excel('students.xlsx')
print(datas)

There are many more statistical functions. Here, we just throw a brick to draw a jade. Other functions can refer to this method. Here, notice that axis=1, which means line by line operation, axis=0, which means column by column operation. Select according to the needs.

If I want to average all columns and add them to the last row of Excel, I can do this:

import  pandas as pd
import matplotlib.pyplot as plt
datas = pd.read_excel('students.xlsx')
temp = datas[['test1','test2','test3']]
datas['total']=temp.sum(axis=1)
datas['average']=temp.mean(axis=1)
#Average all columns and add to the last row of Excel
col_mean=datas[['test1','test2','test3','total','average']].mean(axis=0)
datas=datas.append(col_mean,ignore_index=True)
datas.to_excel('students.xlsx')
print(datas)

2, Data De duplication

Function 1: filter out duplicate data

  • Use the duplicated function to judge whether there are duplicates, return a set of bool values, and TRUE means duplicates
  • dupe=dupe[dupe==True] filters out duplicates, dupe.index Duplicate index
  • Using iloc functions datas.iloc[dupe.index ]Get duplicate column

Function 2: remove duplicate data

Using drop_ Duplicate. Remember to set the parameter: inplace=True

import  pandas as pd
datas = pd.read_excel('students.xlsx')
print('Source data:n',datas)
dupe = datas.duplicated(subset='name')
dupe=dupe[dupe==True]
print('Duplicate data:n',datas.iloc[dupe.index])
datas.drop_duplicates(subset='name',inplace=True)
print('Data after de duplication:n',datas)

*********************************************************************** 
  //Source data:
            name  test1  test2  test3  total    average
0   student_001     88     85     91    264  88.000000
1   student_003     59     56     77    192  64.000000
2   student_005     67     64     70    201  67.000000
3   student_007     78     75     81    234  78.000000
4   student_009     50     66     53    169  56.333333
5   student_011     90     87     84    261  87.000000
6   student_013     78     79     81    238  79.333333
7   student_015     76     75     79    230  76.666667
8   student_003     59     56     77    192  64.000000
9   student_005     67     64     70    201  67.000000
10  student_007     78     75     81    234  78.000000
//Duplicate data:
            name  test1  test2  test3  total  average
8   student_003     59     56     77    192     64.0
9   student_005     67     64     70    201     67.0
10  student_007     78     75     81    234     78.0
//Data after de duplication:
           name  test1  test2  test3  total    average
0  student_001     88     85     91    264  88.000000
1  student_003     59     56     77    192  64.000000
2  student_005     67     64     70    201  67.000000
3  student_007     78     75     81    234  78.000000
4  student_009     50     66     53    169  56.333333
5  student_011     90     87     84    261  87.000000
6  student_013     78     79     81    238  79.333333
7  student_015     76     75     79    230  76.666667

3, Rotate data sheet

I watched "houlang" in station B a few days ago. The video was really good. In my spare time, I grabbed all the bullet scenes in the video, and saved them in the Excel shown in the figure above after de duplication. The first line shows the subtitle, and the second line shows the number of times the subtitle appears in the video.

Requirement: rotate data sheet

import  pandas as pd
datas = pd.read_excel('<Data of the barrage of houlang.xlsx')
table=datas.transpose()
table.to_excel('<Data of the barrage of houlang.xlsx')

In fact, it's not hard to see that there are many useful functions in pandas. It's easy to play Excel after mastering them.

Don't panic. I have a set of learning materials, including 40 + E-books, 800 + teaching videos, involving Python foundation, reptile, framework, data analysis, machine learning, etc. I'm not afraid you won't learn! https://shimo.im/docs/JWCghr8prjCVCxxK/ Python learning materials

Pay attention to the official account [Python circle].

file

Tags: Programming Excel Python

Posted on Tue, 26 May 2020 23:02:08 -0400 by kjelle392