Preface
This article summarizes the third-party libraries and methods for working with Excel files using python.
Introduction to Common Libraries
1.xlrd
xlrd is a library that reads data and formatting information from Excel files and supports.xls and.xlsx files.
Address: http://xlrd.readthedocs.io/en/latest/
-
xlrd supports reading of.xls,.xlsx files
-
Save time and memory by setting the on_demand variable so that the open_workbook() function only loads those sheet s it needs (this method is not valid for.xlsx files).
-
The xlrd.Book object has an unload_sheet method that unloads the worksheet from memory, specified by the worksheet index or the worksheet name (this method is not valid for.xlsx files)
2.xlwt
xlwt is a library (such as.xls) for writing data and formatting information to old Excel files.
Address: https://xlwt.readthedocs.io/en/latest/
-
xlwt supports.xls file writing.
3.xlutils
xlutils is a library that handles Excel files and relies on xlrd and xlwt.
Address: http://xlutils.readthedocs.io/en/latest/
-
xlutils supports.xls files.
-
Supports Excel operations.
4.xlwings
xlwings is a library that allows you to call Python from Excel or Excel from python.
Address: http://docs.xlwings.org/en/stable/index.html
-
xlwings supports.xls read and.xlsx file read and write.
-
Supports Excel operations.
-
Support VBA.
4. Powerful converters can handle most data types, including numpy array s and pandas DataFrame s in both directions.
5.openpyxl
openpyxl is a library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.
Address: https://openpyxl.readthedocs.io/en/stable/
-
openpyxl supports reading and writing of.xlsx files.
-
Supports Excel operations.
-
Loading large.xlsx files can use read_only mode.
-
Writing large.xlsx files can use write_only mode.
6.xlsxwriter
xlsxwriter is a library for creating Excel.xlsx files.
Address: https://xlsxwriter.readthedocs.io/
-
xlswriter supports writing.xlsx files.
-
Support VBA.
-
Use memory optimization mode when writing large.xlsx files.
7.win32com
The win32com inventory is located in pywin32 and is a library for reading, writing and processing Excel files.
Address: http://pythonexcels.com/python-excel-mini-cookbook/
-
win32com supports reading and writing of.xls,.xlsx files, and writing of.xlsx files.
-
Supports Excel operations.
8.DataNitro
DataNitro is a plug-in embedded in Excel.
Address: https://datanitro.com/docs/
-
DataNitro supports reading and writing of.xls,.xlsx files.
-
Supports Excel operations.
-
Support VBA.
-
charge
9.pandas
pandas implements data input and output by reading and writing Excel files
Address: http://pandas.pydata.org/
-
pandas supports reading and writing of.xls,.xlsx files.
-
Supports loading only a single workpage for each table.
Reminders and Notes:
-
xlutils only supports xls files, i.e. versions below 2003;
-
win32com and DataNitro only support windows systems;
-
After successful installation of xlwings, install pypiwin32 or pywin32 package if the running prompt error "ImportError: no module named Win32 api";
-
win32com is not a stand-alone extension library, but is integrated into other libraries, which can be used by installing pypiwin32 or pywin32 packages.
-
DataNitro is an Excel plug-in that needs to be downloaded from the official website for installation.
Basic functions:
Due to different design purposes, each module usually focuses on a certain aspect of functionality and has its own strengths.
1.xlwings
With VBA, Excel can be programmed, powerful data input and analysis capabilities, and rich interfaces. With pandas/numpy/matplotlib, Excel data processing can be handled easily.
2.openpyxl
Simple and easy to use, wide range of functions, cell format/picture/table/formula/filter/annotation/file protection and so on, the chart function is one of the highlights, but the disadvantage is not good enough support for VBA.
3.pandas
Data processing is the foundation of pandas, and Excel serves as a container for pandas input/output data.
4.win32com
As you can see from the naming, this is an extension that handles windows applications, and Excel is only a small part of what the library can do.The library also supports many operations of office.It is important to note that this library does not exist separately and can be obtained by installing pypiwin32 or pywin32.
5.xlsxwriter
It has rich features, supports pictures/tables/charts/filters/formats/formulas, etc. Similar to openpyxl, it also supports VBA file import, mini-graphics, etc. The disadvantage is that it cannot open/modify existing files, which means xlsxwriter needs to start from scratch.
6.DataNitro
Embedded in Excel as a plug-in, it completely replaces VBA and uses python scripts in Excel.Since it's called python in Excel, working with other python libraries is a trivial matter.However, this is a paid plugin...
7.xlutils
Based on xlrd/xlwt, the old python package is a pioneer in this field. Its functional features are regular and moderate. The major disadvantage is that it only supports xls files.
Read-Write Test
test case
-
Use case 1. Read the entire table of the.xls file (the table has five pages, each with an integer of 2000 rows and 1200 columns).
-
Use Case 2. Read the entire table of the.xlsx file (the table has five pages, each with an integer of 2000 rows and 1200 columns).
-
Use case 3. Read the entire table of the.xls file (the table has a page break and the page has an integer of 2000 rows and 1200 columns).
-
Use case 4. Read the entire table of the.xlsx file (the table has a page break and the page has an integer of 2000 rows and 1200 columns).
-
Use case 5. Write the entire table of the.xls file (the table has five pages, each with an integer of 2000 rows and 1200 columns).
-
Use case 6. Write the entire table of the.xlsx file (the table has five pages, each with an integer of 2000 rows and 1200 columns).
-
Use case 7. Write the entire table of the.xls file (the table has a page break and the page has an integer of 2000 rows and 1200 columns).
-
Use case 8. Write the entire table of the.xlsx file (the table has a page break and the page has an integer of 2000 rows and 1200 columns).
test result
Note:
-
xlwt and pandas write a maximum of 256 columns per workpage, so the test case was modified to an integer of 2000 rows and 256 columns per page.
-
xlutils are read and write dependent on xlrd and xlwt and are not tested separately.
-
openpyxl tests two modes: normal load write and read_only/write_only.
-
DataNitro is charged and relies on Excel. It will not be tested this time.
performance comparison
Considering read and write performance alone, win32com has the best performance, followed by xlwings.
While openpyxl is powerful for working with Excel, its read and write performance is too bad, especially when writing large tables, which takes up a lot of memory (runs out of my 4G memory). Turning on read_only and write_only modes greatly improves its performance, especially when it improves reading performance so much that it takes little time (0.01 seconds is a little exaggeration).Page, but it's loaded.
Pandas regards Excel as a container for data read and write, and serves its powerful data analysis services. Therefore, read and write performance is moderate, but it is best compatible with Excel files, supports read and write.xls,.xlsx files, and supports a single workpage in a read-only table.There are also xlrd libraries that support this feature, but xlrd only supports reading, does not support writing, and has poor performance. You need to work with xlutils for Excel operations and save data using xlwt, while xlwt can only write to.xls files (the other library that can write to.xls files is pandas, and these two Excel files can only be written to 256 files at most)Column, the rest of the libraries, as far as I know, can only write to.xlsx files, with average performance.
xlsxwriter has a single function and is typically used to create.xlsx files with moderate write performance.Win32com has the best read and write performance, but the inventory is in pywin32's library, which does not have perfect documentation and takes a little bit of effort to use.xlwings has the same read and write performance as win32com. Powerful converters handle most data types, including two-dimensional numpy array s and pandas DataFrame s, making it easy to do data analysis.
Taken together, xlwings performs best, as its name implies, xlwings - Make Excel Fly!
Convenience comparison
This test is currently only for reading and writing Excel files, not related to Excel operation. From the convenience of reading and writing alone, the performance of each library is indispensable. However, win32com and xlwings libraries can operate in open Excel files in real time while the program is running, to achieve the visualization of the process, followed by xlwings data knotThe structure converter allows it to quickly add a two-dimensional data structure to an Excel file without having to relocate rows and columns of the data in the Excel file, so xlwings still wins over the convenience of reading and writing.
Code Samples
Finally, some demonstration codes are attached, so you can experience the use of different modules on your own.
6.1 xlwings base code
import xlwings as xw #Connect to excel workbook = xw.Book(r'path/myexcel.xlsx')#Connect excel file #Connect to the specified cell data_range = workbook.sheets('Sheet1').range('A1') #Write data data_range.value = [1,2,3] #Preservation workbook.save()
6.2 xlsxwriter base code
import xlsxwriter as xw #New excel workbook = xw.Workbook('myexcel.xlsx') #New Workbook worksheet = workbook.add_worksheet() #Write data worksheet.write('A1',1) #Close Save workbook.close()
6.3 xlutils basic code import xlrd #read data
import xlwt #Write data import xlutils #Operation excel #----xlrd Library #Open excel file workbook = xlrd.open_workbook('myexcel.xls') #Get Form worksheet = workbook.sheet_by_index(0) #Read data data = worksheet.cell_value(0,0) #----xlwt Library #New excel wb = xlwt.Workbook() #Add Workbook sh = wb.add_sheet('Sheet1') #Write data sh.write(0,0,'data') #Save File wb.save('myexcel.xls') #----xlutils Library #Open excel file book = xlrd.open_workbook('myexcel.xls') #Copy a copy new_book = xlutils.copy(book) #Get Workbook worksheet = new_book.getsheet(0) #Write data worksheet.write(0,0,'new data') #Preservation new_book.save()
6.4 win32com base code
import win32com.client as wc #Start Excel Application excel_app = wc.Dispatch('Excel.Application') #Connect excel workbook = excel_app.Workbooks.Open(r'e:/myexcel.xlsx' ) #Write data workbook.Worksheets('Sheet1').Cells(1,1).Value = 'data' #Close and Save workbook.SaveAs('newexcel.xlsx') excel_app.Application.Quit()
6.5 openpyxl base code
import openpyxl #New File workbook = openpyxl.Workbook() #Write to file sheet = workbook.activesheet['A1']='data' #Save File workbook.save('test.xlsx')
6.6 DataNitro Basic Code
#Single cell assignment Cell('A1').value = 'data' #Cell Area Assignment CellRange('A1:B2').value = 'data'
openpyxl uses
1. Create an excel file and write to different classes
# -*- coding: utf-8 -*- from openpyxl import Workbook import datetime import time #Create File Object wb = Workbook() #Get the first sheet ws = wb.active #In row 1, column A, write the number 23333 ws['A1'] = 23333 #Write in Chinese (unicode Chinese is also possible) ws['B1'] = "Hello!"+"Welcome openpyxl" #On the next line, write multiple cells ws.append([1, 2, 3]) #Write a current time ws['A2'] = datetime.datetime.now() #Write a custom time format ws['A3'] = time.strftime('%Y%m%d%H%M%S', time.localtime()).format(y='year', m='month', d='day', h='time', f='branch', s='second') #Save file, note file overwrite wb.save("test.xlsx") #Close Stream wb.close()
2. Create sheet s
from openpyxl import Workbook wb = Workbook() #Create a sheet ws1 = wb.create_sheet("sheet1") #Set the name of a sheet ws1.title = "sheet1 Title" #Set the insertion position of the sheet. Insert behind by default ws2 = wb.create_sheet("Mysheet", 0) ws2.title = "Mysheet" #Set the background color of the sheet's label ws1.sheet_properties.tabColor = "1072BA" #Get a sheet object print(wb["sheet1 Title"]) print(wb["Mysheet"]) #Get the names of all sheets, traverse sheet names print(wb.sheetnames) for i in wb.sheetnames: print(i) print("*"*50) for sheet in wb: print(sheet.title) #Assign cell A1 in sheet 1 to zeke wb["sheet1 Title" ]["A1"] = "zeke" #Copy a sheet source = wb["sheet1 Title"] target = wb.copy_worksheet(source) wb.save("test2.xlsx") wb.close()
3. Action Cells
# -*- coding: utf-8 -*- from openpyxl import Workbook wb = Workbook() #Create a sheet ws1 = wb.create_sheet("Sheet1") #Assign A1 cell to 123.11 ws1["A1"] = 123.11 #Assign B2 cells to Hello ws1["B2"] = "Hello" #Assign 10 to the cell in row 4, column 2 temp = ws1.cell(row=4, column=2, value=10) print(ws1["A1"].value) print(ws1["B2"].value) print(temp.value) wb.save("test3.xlsx") wb.close()
4. Manipulate existing files
# -*- coding: utf-8 -*- from openpyxl import Workbook from openpyxl import load_workbook #Open test5.xlsx file wb = load_workbook('test5.xlsx') #Guess Format Type wb.guess_types = True ws = wb.active ws["A1"] = "12%" print(ws["A1"].value) #Note that if the original file has some pictures or icons, they may be lost when saved wb.save("test5.xlsx") wb.close()
5. Operating batch cells
# -*- coding: utf-8 -*- from openpyxl import Workbook wb = Workbook() #Create a sheet ws1 = wb.create_sheet("Sheet") ws1["A1"] = 1 ws1["A2"] = 2 ws1["A3"] = 3 ws1["B1"] = 4 ws1["B2"] = 5 ws1["B3"] = 6 ws1["C1"] = 7 ws1["C2"] = 8 ws1["C3"] = 9 #Operation Single Column print(ws1["A"]) for cell in ws1["A"]: print(cell.value) #Get each value from column A to column C print(ws1["A:C"]) for column in ws1["A:C"]: for cell in column: print(cell.value) #Get each value from line 1 to line 3 row_range = ws1[1:3] print(row_range) for row in row_range: for cell in row: print(cell.value) print("*"*50) #From row 1 to row 3, from column 1 to column 3 for row in ws1.iter_rows(min_row=1, min_col=1, max_col=3, max_row=3): for cell in row: print(cell.value) #Get all rows print(ws1.rows) for row in ws1.rows: print(row) print("*"*50) #Get all columns print(ws1.columns) for col in ws1.columns: print(col) wb.save("test4.xlsx") wb.close()
6. Get all row (column) objects:
# coding=utf-8 from openpyxl import Workbook from openpyxl import load_workbook #Open test5.xlsx file wb = load_workbook('test5.xlsx') ws = wb.active rows = [] for row in ws.iter_rows(): rows.append(row) #All rows print(rows) #Get the first row print(rows[0]) #Get the cell object for the first row and column print(rows[0][0]) #Gets the value of the cell object in the first row and column print(rows[0][0].value) #Get the last line print rows[-1] print(rows[len(rows) - 1]) #Get the cell object for the last row and column print(rows[len(rows) - 1][len(rows[0]) - 1]) #Gets the value of the cell object in the last row and column print(rows[len(rows) - 1][len(rows[0]) - 1].value) cols = [] for col in ws.iter_cols(): cols.append(col) #All columns print(cols) #Get the first column print(cols[0]) #Get the cell object for the first row of the first column print(cols[0][0]) #Get the value of the first row of the first column print(cols[0][0].value) print("*" * 30) #Get the last column print(cols[len(cols) - 1] ) #Get the cell object for the last row of the last column print(cols[len(cols) - 1][len(cols[0]) - 1]) #Gets the value of the cell object in the last row of the last column print(cols[len(cols) - 1][len(cols[0]) - 1].value) wb.close()
Recommended reading (click to skip reading)
1. SpringBoot Content Aggregation
2. Interview Question Content Aggregation
3. Design Mode Content Aggregation