Python common modules
openpyxl module is a Python library for reading and writing Excel 2010 documents. If you want to process Excel documents with earlier formats, you need to use additional libraries. openpyxl is a relatively comprehensive tool that can read and modify Excel documents at the same time.
5.1. Install openpyxl module
pip3 install openpyxl
Install directly in pycharm
5.2. Basic usage of openpyxl
To operate excel, you need to understand the basic concept of Excel. Columns in Excel are named after subtitles, and rows are named after numbers. For example, the coordinates of the first cell in the upper left corner are A1, the bottom is A2, and the right is B1.
There are three different levels of classes in openpyxl. Workbook is the abstraction of workbook, Worksheet is the abstraction of table, Cell is the abstraction of Cell. Each class contains many attributes and methods.
General scenarios for operating Excel:
To open or create an Excel, you need to create a Workbook object To get a table, you need to create a Workbook object, and then use the method of the object to get a Worksheet object If you want to get the data in the table, get the Worksheet object and then get the Cell object representing the Cell from it
5.2.1 Workbook object
A Workbook object represents an Excel document, so you should create a Workbook object before operating Excel. For creating a new Excel document, you can directly call the Workbook class. For an existing Excel document, you can use the load Workbook function of openpyxl module to read it. The function contains multiple parameters, but only the filename parameter is a required parameter. Filename is a file name or an open file object.
import openpyxl excel = openpyxl.Workbook() # Create local Workbook excel = openpyxl.load_workbook("abc.xlsx") # Load a workbook that already exists locally # Workbook needs to be saved after operation excel.save("hello.xlsx")
PS: Workbook is the same as load ﹣ workbook, and the returned workbook object is a workbook object.
The Workbook object provides many properties and methods, most of which are related to sheet. Some properties are as follows:
# attribute excel.active # Get the currently active Worksheet object excel.worksheets # Returns all Worksheet objects as a list excel.sheetnames # Get table name in workbook [name1, name2, name3] excel.encoding # Get the character set encoding of the document excel.properties # Get the metadata of the document, such as title, creator, creation date, etc # Set the currently active worksheet by index value excel.active = 0
# You can get the worksheet object by worksheet name excel_sheet = excel["Sheet1"]e
# Worksheet creation and deletion excel.remove(excel_sheet) # Delete a table with the parameter worksheet object # excel.remove_sheet() has been discarded sh excel.create_sheet() # Create an empty table with table name and index as parameters
5.2.2 Worksheet object
With the Worksheet object, we can get the attributes of the table, get the data in the cell, and modify the contents of the table. openpyxl provides a very flexible way to access the cells and data in the table. The commonly used Worksheet properties are as follows:
# attribute excel_sheet.title # Table title (readable and writable) excel_sheet.dimensions # The size of the table. Here, the size refers to the size of the table containing the data, that is, the coordinates of the upper left corner and the coordinates of the lower right corner excel_sheet.max_row # Maximum rows of table excel_sheet.min_row # Minimum rows of table excel_sheet.max_column # Maximum number of columns in the table excel_sheet.min_column # Minimum number of columns in the table excel_sheet.rows # Get cells by row (Cell object) - generator excel_sheet.columns # Get cells by column (Cell object) - generator excel_sheet.values # Get table contents (data) by row - generator
# Method excel_sheet.iter_rows() # Get all cells by row. The built-in attributes are (min row, Max row, min col, Max Col) excel_sheet.iter_cols() # Get all cells by column # Get by key value excel_sheet["A"] # Returns all cell cell objects in column A excel_sheet["1"] # Returns all cell objects in the first row excel_sheet["A1"] # Return the cell object
for row in excel2['abc'].iter_rows(min_row=2,max_row=4,min_col=2,max_col=4): print(row) (<Cell 'abc'.B2>, <Cell 'abc'.C2>, <Cell "abc".D2>) (<Cell 'abc'.B3>, <Cell 'abc'.C3>, <Cell "abc".D3>) (<Cell 'abc'.B4>, <Cell 'abc'.C4>, <Cell "abc".D4>)
Cell objects are relatively simple, and the commonly used attributes are as follows:
# attribute row # Cell row column # Column in which cell sits value # Cell value coordinate # Coordinates of cells
>>> excel2['abc'].cell(row=1,column=2).coordinate 'B1 >>> excel2['abc'].cell(row=1,column=2).value 'test' >>> excel2['abc'].cell(row=1,column=2).row 1 >>> excel2['abc'].cell(row=1,column=2).column 'B' ·