Python automated testing - use Pandas to process test data efficiently

1, Thinking

1. What is pandas?
  

  • Extremely powerful data analysis library

  • It can operate various data sets efficiently

    1. csv format file
    2. Excel file
    3. HTML file
    4. XML format file
    5. JSON format file
    6. Database operation

2. Classic interview questions

Through the interview questions, readers can think. If you encounter this question, how can you answer it?

2, Using pandas to manipulate Excel files

1. Installation

a. Install via Pypi

pip install pandas

b. Install through source code

git clone git://github.com/pydata/pandas.git

cd pandas

python setup.py install

2. Read data by column

Lemon in the case_ The contents of the cases.xlsx file are as follows:
  

import pandas as pd

# Read excel file

# Return a DataFrame object, multidimensional data structure

df = pd.read_excel('lemon_cases.xlsx', sheet_name='multiply')

print(df)

# 1. Read a column of data

# df["title"] returns a Series object to record the data in the title column

print(df["title"])

# The Series object can be converted to any sequence type and dict dictionary type

print(list(df['title']))    # Convert to list

# title is the property of the DataFrame object

print(list(df.title))    # Convert to list

print(tuple(df['title']))   # Convert to tuple

print(dict(df['title']))    # Convert to dictionary and key to numeric index

# 2. Read a cell data

# Excluding headers, specify column names and row indexes

print(df['title'][0])   # title column, excluding the first cell of the header

# 3. Read multi column data

print(df[["title", "actual"]])

3. Read data by line

import pandas as pd

# Read excel file

df = pd.read_excel('lemon_cases.xlsx', sheet_name='multiply')   # Return a DataFrame object, multidimensional data structure

print(df)

# 1. Read a row of data

# Excluding the header, the first index value is 0

# Get the first row of data and convert it into list, tuple and dict

print(list(df.iloc[0]))  # Convert to list

print(tuple(df.iloc[0]))  # Convert to tuple

print(dict(df.iloc[0]))  # Turn into a dictionary

print(dict(df.iloc[-1]))  # Negative indexes are also supported

# 2. Read a cell data

# Excluding header, specify row index and column index (or column name)

print(df.iloc[0]["l_data"])   # Specify row index and column name

print(df.iloc[0][2])    # Specify row and column indexes

# 3. Read multiple lines of data

print(df.iloc[0:3])

4.iloc and loc methods

import pandas as pd

  # Read excel file

  df = pd.read_excel('lemon_cases.xlsx', sheet_name='multiply')   # Return a DataFrame object, multidimensional data structure

  print(df)

  # 1.iloc method

  # iloc uses numeric indexes to read rows and columns

  # You can also use the iloc method to read a column

  print(df.iloc[:, 0])

  print(df.iloc[:, 1])

  print(df.iloc[:, -1])

  # Read multiple columns

  print(df.iloc[:, 0:3])

  # Read multiple rows and columns

  print(df.iloc[2:4, 1:4])

  print(df.iloc[[1, 3], [2, 4]])

  # 2.loc method

  # loc method, which is selected based on the tag name or index name

  print(df.loc[1:2, "title"])             # Multiple rows and one column

  print(df.loc[1:2, "title":"r_data"])    # Multi column and multi row

  # Select based on boolean type

  print(df["r_data"] > 5)  # A value greater than 5 in a column is True, otherwise it is False

  print(df.loc[df["r_data"] > 5])  # Put R_ If the data column is greater than 5, select the row in which it is located

  print(df.loc[df["r_data"] > 5, "r_data":"actual"])  # Put R_ Select data from the actual column

5. Read all data

import pandas as pd

# Read excel file

df = pd.read_excel('lemon_cases.xlsx', sheet_name='multiply')   # Return a DataFrame object, multidimensional data structure

print(df)

# The data read is the list type of nested list. This method is not recommended

print(df.values)

# List of nested Dictionaries

datas_list = []

for r_index in df.index:

datas_list.append(df.iloc[r_index].to_dict())

print(datas_list)

6. Write data

import pandas as pd

# Read excel file

df = pd.read_excel('lemon_cases.xlsx', sheet_name='multiply')   # Return a DataFrame object, multidimensional data structure

print(df)

df['result'][0] = 1000

print(df)

with pd.ExcelWriter('lemon_cases_new.xlsx') as writer:

df.to_excel(writer, sheet_name="New", index=False)

3, Use pandas to manipulate csv files

1. Read csv file

The contents of the data.log file in the case are as follows:

TestID,TestTime,Success

0,149,0

1,69,0

2,45,0

3,18,1

4,18,1
import pandas as pd

# Read csv file

# Method 1: use read_csv reading, columns are separated by commas by default (recommended method)

# a. First row column name information

csvframe = pd.read_csv('data.log')

# b. The first row has no column name information and is directly data

csvframe = pd.read_csv('data.log', header=None)

# c. The first row has no column name information. It is data directly, or you can specify a column name

csvframe = pd.read_csv('data.log', header=None, names=["Col1", "Col2", "Col3"])

# Method 2: read_table, you need to specify that the separator between columns is comma

csvframe = pd.read_table('data.log', sep=",")

2. Answer interview questions

i

mport pandas as pd

# 1. Read csv file

csvframe = pd.read_csv('data.log')

# 2. Select the row with Success 0

new_csvframe = csvframe.loc[csvframe["Success"] == 0]

result_csvframe = new_csvframe["TestTime"]

avg_result = round(sum(result_csvframe)/len(result_csvframe), 2)

print("TestTime The minimum value is:{}\nTestTime The maximum value is:{}\nTestTime The average value is:{}".

format(min(result_csvframe), max(result_csvframe), avg_result))

4, Summary

Pandas is widely used in the fields of data analysis and data visualization; It is very efficient in large-scale data and multi type data processing

There are also applications in the field of software testing. However, if you only use excel to store test data, you will feel like "killing a chicken is killing an ox with a knife" when using Pandas. It is recommended to use specific modules (such as openpyxl)

If you have any questions or want to discuss technology, etc.:

  • Software testing learning exchange group: 785128166
  • Software testing learning route + supporting resources, can pay attention to the official account: programmers two black, free access
  • Let's refuel together in the future!

Wonderful recommendation

Working in Ali for 6 years, the voice of a 29 year old female software testing engineer

Refuse the invitation of station B, from 3k a month to 47W a year. My experience is worth learning from every tester

Ali p8, the new comer of the company, read the APP and interface tests I did and gave me this document

Tags: Python Interview

Posted on Fri, 03 Sep 2021 17:15:37 -0400 by The Bat