1 Introduction
1.1 text and binary files
Text file:
A text file stores ordinary "character" text and can be opened with the Notepad program. Python 3 strings use the Unicode character set. How many bytes each character occupies on disk depends on the file's encoding; for example, UTF-8 uses 1 to 4 bytes per character. Documents edited in Word, however, are not text files.
Binary file
Binary files store data as raw "bytes" and cannot be opened meaningfully with Notepad; special software is needed to decode them. Common examples are MP4 video files, MP3 audio files, JPG images and DOC documents.
1.2 modules related to file operations
2 file operations
2.1 text file operations
The steps are:
1 create file object
2 write data
3 close file object
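The three steps map onto open(), write() and close(). The sketch below is a minimal illustration, assuming a throwaway file named demo.txt (a hypothetical name) inside a temporary directory so nothing real is overwritten:

```python
import os
import tempfile

# Work in a temporary directory so the example leaves nothing behind
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

f = open(path, "w", encoding="utf-8")  # 1 create the file object
f.write("hello\n")                      # 2 write data
f.close()                               # 3 close the file object

# Read the file back to confirm the data reached the disk
with open(path, encoding="utf-8") as check:
    content = check.read()
print(content, end="")  # hello
```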
2.1.1 create file object
open(file_name[, mode])
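The common open modes are:
mode | description |
---|---|
r | read mode (the default) |
w | write mode; creates the file if it does not exist and overwrites its contents if it does |
a | append mode; creates the file if it does not exist and appends to the end if it does |
b | binary mode; combined with another mode, e.g. rb or wb |
+ | read/write mode; combined with another mode, e.g. r+ or a+ |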
Creation of text file objects and binary file objects:
If the mode does not include "b", a text file object is created by default, and its basic unit of processing is the "character". With binary mode "b", a binary file object is created, and its basic unit of processing is the "byte".
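To make the character/byte distinction concrete, here is a minimal sketch. The file name unit.txt is hypothetical and UTF-8 is assumed as the encoding; it writes one Chinese character and reads it back in both modes:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "unit.txt")  # hypothetical file name

with open(path, "w", encoding="utf-8") as f:
    f.write("中")  # a single character

with open(path, "r", encoding="utf-8") as f:  # text mode: the unit is the character
    as_text = f.read()
with open(path, "rb") as f:                   # binary mode ("b"): the unit is the byte
    as_bytes = f.read()

print(type(as_text), len(as_text))    # <class 'str'> 1
print(type(as_bytes), len(as_bytes))  # <class 'bytes'> 3
```

In UTF-8 this character takes three bytes, so the text object has length 1 while the binary object has length 3.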
2.1.2 write data
Relationship between common encodings:
Garbled Chinese text:
On Chinese-language Windows the default encoding is GBK, while on Linux it is UTF-8. When open() is called without an encoding argument, Python uses the operating system's default (locale) encoding, so on such a Windows system the file is read and written as GBK. A file saved in one encoding and read in the other shows garbled Chinese characters.
The problem is solved by specifying the file's encoding explicitly:
```python
f = open(r"b.txt", "w", encoding="utf-8")
f.write("Shang Xuetang\nBaizhan programmer\n")
f.close()
```
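The sketch below reproduces the garbled-text problem on any platform by reading a UTF-8 file with the wrong encoding. It assumes a temporary file; errors="replace" is used only so the mismatched read does not raise UnicodeDecodeError:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "b.txt")

# Write Chinese text with an explicit UTF-8 encoding
with open(path, "w", encoding="utf-8") as f:
    f.write("百战程序员\n")

# Reading back with the same encoding restores the text exactly
with open(path, "r", encoding="utf-8") as f:
    good = f.read()

# Reading with a different encoding (GBK here) produces garbled text;
# errors="replace" substitutes undecodable bytes instead of raising
with open(path, "r", encoding="gbk", errors="replace") as f:
    garbled = f.read()

print(good == "百战程序员\n")     # True
print(garbled == "百战程序员\n")  # False
```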
write()/writelines() writes data
write(a): writes the string a to a file
writelines(b): writes a list of strings to a file without adding line breaks
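```python
s = ["Gao Qi\n", "Gao Laosan\n", "Gao Laosi\n"]
f = open(r"b.txt", "w", encoding="utf-8")
f.writelines(s)  # each string already ends in "\n"; writelines() adds none
f.close()
```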
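A quick check that writelines() really adds no line breaks, assuming a temporary file (names.txt is a hypothetical name):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "names.txt")  # hypothetical file name

with open(path, "w", encoding="utf-8") as f:
    f.write("Gao Qi")                          # write() takes a single string
    f.writelines(["Gao Laosan", "Gao Laosi"])  # strings are written back to back
    f.write("\n")

with open(path, encoding="utf-8") as f:
    content = f.read()
print(repr(content))  # 'Gao QiGao LaosanGao Laosi\n'
```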
2.1.3 close() closes the file stream
Because the underlying file is managed by the operating system, a file object we open must be closed explicitly by calling close(). When close() is called, any buffered data is first written to the file (flush() can also be called to do just this step), then the file is closed and the file object's resources are released.
To make sure an open file object is closed in any case, closing is usually combined with the finally clause of the exception mechanism or with the with keyword.
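Closing in finally:
```python
f = None
try:
    f = open(r'my.txt', 'a')
    s = 'xxx'
    f.write(s)
except BaseException as e:
    print(e)
finally:
    if f is not None:  # open() itself may have failed, leaving nothing to close
        f.close()
```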
The with keyword (context manager) manages context resources automatically. No matter how control leaves the with block, the file is guaranteed to be closed properly, and the state from before the block was entered is restored once the block has finished.
Closing with with:
```python
s = ["Gao Qi\n", "Gao Laosan\n", "Gao Laowu\n"]
with open(r"d:\bb.txt", "w") as f:
    f.writelines(s)
```
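To see that the with statement closes the file even when an exception escapes the block, here is a minimal sketch. The ValueError is raised deliberately for illustration, and the file lives in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bb.txt")

try:
    with open(path, "w", encoding="utf-8") as f:
        f.write("Gao Qi\n")
        raise ValueError("simulated error inside the with block")
except ValueError:
    pass

print(f.closed)  # True: the file was closed despite the exception

# The buffered data was flushed when the file was closed
with open(path, encoding="utf-8") as g:
    content = g.read()
print(content, end="")  # Gao Qi
```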
2.1.4 file reading
There are three common methods:
1. read([size])
Reads size characters (bytes in binary mode) from the file and returns them. Without the size argument, the whole file is read.
At the end of the file, an empty string is returned.
2. readline()
Reads one line and returns it. At the end of the file, an empty string is returned.
3. readlines()
Stores each line of the text file as a string in a list and returns the list.
[Operation] when the file is small, read its whole content into the program at once
```python
with open(r"d:\bb.txt", "r") as f:
    print(f.read())
```
[Operation] read a file line by line
```python
with open(r"bb.txt", "r") as f:
    while True:
        fragment = f.readline()
        if not fragment:  # an empty string means end of file, so leave the loop
            break
        else:
            print(fragment, end="")
```
[Operation] use the file iterator (one line per step) to read a text file
```python
with open(r"d:\bb.txt", "r") as f:
    for a in f:
        print(a, end="")
```
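The behavior of the three reading methods, including the empty string at the end of the file, can be checked with a small sketch. The two-line file is created in a temporary directory purely for illustration:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bb.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("line one\nline two\n")

with open(path, encoding="utf-8") as f:
    first3 = f.read(3)           # 'lin' (3 characters)
    rest_of_line = f.readline()  # 'e one\n' (continues from the current position)
    remaining = f.readlines()    # ['line two\n']
    at_eof = f.readline()        # '' (empty string: end of file)

print(first3, repr(rest_of_line), remaining, repr(at_eof))
```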
2.1.5 exercise: add a line number to the end of each line of the text file
My approach:
```python
with open(r'123.txt', 'r') as f:
    txt = []
    for i in f:
        txt.append(i)
        print(i, end='')
for i in range(len(txt)):
    # drop the line's newline (if any), append " #<index>", then end the line again
    txt[i] = txt[i].rstrip('\n') + ' #' + str(i) + '\n'
with open(r'123.txt', 'w') as f:
    f.writelines(txt)
```
Reference answer:
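```python
with open(r'123.txt', 'r') as f:
    lines = f.readlines()
# enumerate() pairs each line with its line number, starting from 1
a = enumerate(lines, start=1)
# rstrip() with no argument removes all trailing whitespace, including "\n"
lines = [line.rstrip() + ' #' + str(index) + '\n' for index, line in a]
with open(r'123.txt', 'w') as f:
    f.writelines(lines)
```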
2.2 binary file operation
```python
f = open(r'a.jpg', 'wb')  # writable binary file object, overwrite mode
f = open(r"a.jpg", 'ab')  # writable binary file object, append mode
f = open(r"a.jpg", 'rb')  # readable binary file object
```
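The three binary modes can be exercised in one sketch, assuming a temporary file (a.bin is a hypothetical name). Binary mode reads and writes bytes objects, not strings:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "a.bin")  # hypothetical file name

with open(path, "wb") as f:  # overwrite: start from an empty file
    f.write(b"\x00\x01")
with open(path, "ab") as f:  # append: add bytes at the end
    f.write(b"\x02")
with open(path, "rb") as f:  # read: returns a bytes object, not a str
    data = f.read()

print(data)  # b'\x00\x01\x02'
```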
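copy an image
```python
with open('a.gif', 'rb') as f:
    with open('b.gif', 'wb') as w:
        # copy in fixed-size chunks instead of splitting binary data into "lines"
        while True:
            chunk = f.read(4096)
            if not chunk:  # empty bytes object: end of file
                break
            w.write(chunk)
```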
2.3 common attributes and methods of file objects
attribute | description |
---|---|
name | Returns the name of the file |
mode | Returns the open mode of the file |
closed | Returns True if the file is closed |
method | description |
---|---|
read([size]) | Read the contents of size bytes or characters from the file and return. If [size] is omitted, it will be read to the end of the file, that is, all contents of the file will be read at one time |
readline() | Read a line from a text file |
readlines() | Each line in the text file is treated as an independent string object, and these objects are returned in the list |
write(str) | Writes the string str contents to a file |
writelines(s) | Writes the string list s to the file without adding line breaks |
seek(offset [,whence]) | Moves the file pointer to a new position; offset is the number of bytes relative to whence. A positive offset moves toward the end, a negative one toward the start. whence can be 0: counted from the file header (default), 1: counted from the current position, 2: counted from the end of the file |
tell() | Returns the current position of the file pointer |
truncate([size]) | Resizes the file to at most size bytes, deleting everything after that point, regardless of where the pointer is. If size is omitted, the file is cut at the current pointer position |
flush() | Writes the contents of the buffer to the file without closing the file |
close() | Write the contents of the buffer to the file, close the file at the same time, and release the resources related to the file object |
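```python
with open("e.txt", "r", encoding="utf-8") as f:
    print("The file name is: {0}".format(f.name))
    print(f.tell())   # pointer position at the start: 0
    print("Read content: {0}".format(str(f.readline())))
    print(f.tell())   # pointer position after reading one line
    f.seek(0, 0)      # move the pointer back to the file header
    print("Read content: {0}".format(str(f.readline())))
```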
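The seek(), tell() and truncate() rows of the table can be tried out on a small file. This is a sketch under the assumption of a temporary file named seek_demo.bin (hypothetical); binary mode is used so that positions are plain byte counts:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "seek_demo.bin")  # hypothetical file name
with open(path, "wb") as f:
    f.write(b"abcdefghij")

with open(path, "r+b") as f:
    f.seek(3)          # whence defaults to 0: 3 bytes from the file header
    pos_a = f.tell()   # 3
    f.seek(2, 1)       # 2 bytes forward from the current position
    pos_b = f.tell()   # 5
    f.seek(-2, 2)      # 2 bytes back from the end of the file
    tail = f.read()    # b'ij'
    f.truncate(4)      # keep only the first 4 bytes, delete the rest
    f.seek(0)
    kept = f.read()    # b'abcd'

print(pos_a, pos_b, tail, kept)
```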
Note: for a file opened in text mode (without the "b" option), seek() only accepts non-zero offsets counted from the file header; relative to the current position or the end of the file, only an offset of 0 is allowed. A call such as seek(-2, 2) raises io.UnsupportedOperation with the message "can't do nonzero end-relative seeks". To seek like this, open the file in binary mode, for example by changing 'r+' to 'rb'.
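The restriction can be reproduced with a sketch like the following (temporary file, illustrative contents): text mode rejects seek(-2, 2) but still allows seek(0, 2), while binary mode accepts the same call:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "e.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("hello")

# Text mode: a non-zero offset relative to the end is rejected
error = None
with open(path, "r+", encoding="utf-8") as f:
    try:
        f.seek(-2, 2)
    except io.UnsupportedOperation as e:
        error = str(e)
    f.seek(0, 2)       # offset 0 from the end is still allowed
    end_pos = f.tell()

# Binary mode: end-relative seeks with any offset work
with open(path, "rb") as f:
    f.seek(-2, 2)
    tail = f.read()

print(error)  # can't do nonzero end-relative seeks
print(tail)   # b'lo'
```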
2.4 using pickle serialization
In Python, everything is an object, which is essentially a "memory block that stores data". Sometimes we need to save this memory-block data to the hard disk or send it to another computer over the network. This requires serializing and deserializing objects. Object serialization is widely used in distributed and parallel systems.
Serialization refers to converting objects into "serialized" data form, storing them on hard disk or transmitting them to other places through network. Deserialization refers to the reverse process of converting the read "serialized data" into objects.
We can use the functions in the pickle module to realize serialization and deserialization
pickle.dump(obj, file): obj is the object to be serialized, file is the file it is stored in
pickle.load(file): reads data from file and deserializes it into an object
```python
import pickle

# Serialize objects into a file
with open(r'234.dat', 'wb') as f:
    a1 = 'x'
    a2 = 234
    a3 = [20, 30]
    pickle.dump(a1, f)
    pickle.dump(a2, f)
    pickle.dump(a3, f)

# Deserialize the stored data back into objects (same file, same order)
with open(r'234.dat', 'rb') as f:
    a1 = pickle.load(f)
    a2 = pickle.load(f)
    a3 = pickle.load(f)
    print(a1)
    print(a2)
    print(a3)
```
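When several objects are dumped into one file, they are loaded back in the same order, one pickle.load() call per object. The sketch below (temporary file, illustrative data) keeps loading until pickle.load() raises EOFError, and also shows the in-memory pickle.dumps()/pickle.loads() pair:

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.dat")

with open(path, "wb") as f:  # pickle needs a binary file
    for obj in ["x", 234, [20, 30], {"name": "Gao Qi"}]:
        pickle.dump(obj, f)

loaded = []
with open(path, "rb") as f:
    while True:
        try:
            loaded.append(pickle.load(f))  # one object per call, in write order
        except EOFError:                   # raised when no objects remain
            break

print(loaded)  # ['x', 234, [20, 30], {'name': 'Gao Qi'}]

# dumps()/loads() do the same in memory, producing and consuming a bytes object
data = pickle.dumps([20, 30])
print(pickle.loads(data) == [20, 30])  # True
```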
2.5 csv files
Reading
```python
import csv

with open(r"d:\a.csv", newline="") as a:
    a_csv = csv.reader(a)  # reader object: iterating it yields each row as a list of strings
    headers = next(a_csv)  # the first row: a list holding the header fields
    print(headers)
    for row in a_csv:      # loop over the remaining rows
        print(row)
```
Writing
```python
import csv

headers = ["Job number", "full name", "Age", "address", "monthly salary"]
rows = [("1001", "Gao Qi", 18, "Xisanqi No. 1 hospital", "50000"),
        ("1002", "Gao Ba", 19, "Xisanqi No. 1 hospital", "30000")]
with open(r"d:\b.csv", "w", newline="") as b:  # newline="" avoids blank lines between rows on Windows
    b_csv = csv.writer(b)    # create a csv writer object
    b_csv.writerow(headers)  # write one row (the header)
    b_csv.writerows(rows)    # write several rows (the data)
```
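A round trip through the csv module shows one detail worth knowing: csv.reader returns every field as a string, even values such as 18 that were written as integers. A minimal sketch using a temporary file and a shortened version of the header list above:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "b.csv")
headers = ["Job number", "full name", "Age"]
rows = [("1001", "Gao Qi", 18), ("1002", "Gao Ba", 19)]

# newline="" lets the csv module control line endings itself
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(rows)

with open(path, newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    read_headers = next(reader)
    read_rows = list(reader)

print(read_headers)  # ['Job number', 'full name', 'Age']
print(read_rows)     # [['1001', 'Gao Qi', '18'], ['1002', 'Gao Ba', '19']]
```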
3 OS module
The os module lets us work with the operating system directly: we can run the operating system's executables and commands, and operate on files and directories.
3.1 calling operating system commands
os.system can help us call system commands directly
[operation] os.system opens Windows Notepad
```python
import os
os.system('notepad.exe')
```
[operation] os.system calls the ping command on Windows
```python
import os
os.system("ping www.baidu.com")
```
[operation] run the installed WeChat
```python
import os
# os.startfile opens a file with its associated program; it exists only on Windows
os.startfile(r"C:\Program Files (x86)\Tencent\WeChat\WeChat.exe")
```
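os.system() also returns the command's exit status, which lets a program tell whether the command succeeded. A minimal sketch, using echo only because both the Windows cmd shell and Unix shells provide it:

```python
import os

# Run the command through the system shell and keep its exit status
status = os.system("echo hello")
print(status)  # 0 means the command succeeded
```

On Unix the return value is a wait status rather than the plain exit code; os.waitstatus_to_exitcode() (Python 3.9+) converts it. A value of 0 means success on both platforms.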
3.2 files and directories
Common file operation methods of the os module:
Directory operation methods:
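The tables of file and directory functions did not survive in this copy. As a substitute, the sketch below exercises several commonly used os functions (getcwd, mkdir, makedirs, rename, listdir, remove, rmdir). It is not necessarily the same set the original tables listed, and everything happens inside a temporary directory:

```python
import os
import tempfile

base = tempfile.mkdtemp()  # scratch directory so nothing real is touched
cwd = os.getcwd()          # current working directory

os.makedirs(os.path.join(base, "a", "b"))  # create nested directories in one call
os.mkdir(os.path.join(base, "c"))          # create a single directory

old_path = os.path.join(base, "c", "1.txt")
with open(old_path, "w") as f:
    f.write("x")
new_path = os.path.join(base, "c", "2.txt")
os.rename(old_path, new_path)              # rename (or move) a file

names = sorted(os.listdir(base))                  # ['a', 'c']
c_contents = os.listdir(os.path.join(base, "c"))  # ['2.txt']

os.remove(new_path)                        # delete a file
os.rmdir(os.path.join(base, "c"))          # delete an empty directory
os.rmdir(os.path.join(base, "a", "b"))
remaining = os.listdir(base)               # ['a']
print(cwd, names, c_contents, remaining)
```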
3.3 os.path module
It provides path-related operations (path checks, path splitting, path joining, directory traversal).
method | describe |
---|---|
isabs(path) | Determine whether the path is an absolute path |
isdir(path) | Determine whether the path is a directory |
isfile(path) | Determine whether the path is a file |
exists(path) | Judge whether the file in the specified path exists |
getsize(filename) | Returns the size of the file |
abspath(path) | Return absolute path |
dirname(p) | Returns the directory part of the path |
getatime(filename) | Returns the last access time of the file |
getmtime(filename) | Returns the last modification time of the file |
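walk(top,func,arg) | Traverses directories recursively (Python 2 only; in Python 3 use os.walk(top)) |
join(path,*paths) | Joins multiple path components |
split(path) | Splits the path into a (head, tail) tuple: the directory part and the last component |
splitext(path) | Splits the path into a (root, ext) tuple, separating the file extension |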
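Several of the functions in the table can be tried together in a short sketch. It builds a hypothetical docs/report.txt inside a temporary directory and, since os.path.walk no longer exists in Python 3, uses os.walk for the recursive traversal:

```python
import os
import tempfile

base = tempfile.mkdtemp()
path = os.path.join(base, "docs", "report.txt")  # join() inserts the OS separator
os.makedirs(os.path.dirname(path))               # dirname(): the directory part
with open(path, "w") as f:
    f.write("12345")

is_abs = os.path.isabs(path)  # mkdtemp() gives an absolute path
print(os.path.isdir(os.path.dirname(path)), os.path.isfile(path))  # True True
print(os.path.exists(path))   # True
size = os.path.getsize(path)  # 5 (bytes)
head, tail = os.path.split(path)    # (directory part, 'report.txt')
root, ext = os.path.splitext(path)  # (path without extension, '.txt')

# os.walk yields (dirpath, dirnames, filenames) for every directory in the tree
walked = [(os.path.relpath(d, base), sorted(ds), sorted(fs)) for d, ds, fs in os.walk(base)]
print(walked)  # [('.', ['docs'], []), ('docs', [], ['report.txt'])]
```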
3.4 shutil module (copy and compression)
The shutil module is provided in the python standard library. It is mainly used to copy, move and delete files and folders; You can also compress and decompress files and folders.
The os module provides general operations on directories and files. The shutil module complements it with operations such as moving, copying, compressing and decompressing, which the os module does not provide.
[operation] copy files
```python
import shutil

# copy the file content
shutil.copyfile('1.txt', '1_copy.txt')
```
[operation] copy folder contents recursively (using the shutil module)
```python
import shutil

# The destination folder "music" must not exist yet; copytree() creates it.
# Files matching *.html or *.htm are skipped.
shutil.copytree("film/study", "music", ignore=shutil.ignore_patterns("*.html", "*.htm"))
```
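The copytree() call can be checked in a self-contained way by building a small film/study folder in a temporary directory (the file names are hypothetical) and confirming that the HTML files are skipped:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, "film", "study")
os.makedirs(src)
for name in ["notes.txt", "page.html", "old.htm"]:  # hypothetical files
    with open(os.path.join(src, name), "w") as f:
        f.write(name)

dst = os.path.join(base, "music")  # must not exist yet (unless dirs_exist_ok=True)
shutil.copytree(src, dst, ignore=shutil.ignore_patterns("*.html", "*.htm"))

copied = sorted(os.listdir(dst))
print(copied)  # ['notes.txt']
```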
[operation] compress all contents of a folder (using the shutil and zipfile modules)
```python
import shutil
import zipfile

# Compress everything under the "film/study" folder into music2/movie.zip
shutil.make_archive("music2/movie", "zip", "film/study")

# Compress several specified files into one zip file
z = zipfile.ZipFile("a.zip", "w")
z.write("1.txt")
z.write("2.txt")
z.close()
```
[operation] decompress the compressed package into a specified folder (using the zipfile module)
```python
import zipfile

# Decompress: extract every member of a.zip
z2 = zipfile.ZipFile("a.zip", "r")
z2.extractall("d:/")  # the folder the files are extracted into
z2.close()
```
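The compression and decompression steps can be combined into one runnable round trip. This sketch works in a temporary directory with two hypothetical files. It passes arcname to ZipFile.write() so the archive stores bare file names instead of full temporary paths, and finishes with shutil.make_archive()/shutil.unpack_archive() for a whole folder:

```python
import os
import shutil
import tempfile
import zipfile

base = tempfile.mkdtemp()
for name in ["1.txt", "2.txt"]:  # hypothetical files
    with open(os.path.join(base, name), "w") as f:
        f.write(name)

zip_path = os.path.join(base, "a.zip")
with zipfile.ZipFile(zip_path, "w") as z:  # the with block closes the archive
    z.write(os.path.join(base, "1.txt"), arcname="1.txt")
    z.write(os.path.join(base, "2.txt"), arcname="2.txt")

out_dir = os.path.join(base, "out")
with zipfile.ZipFile(zip_path, "r") as z:
    members = sorted(z.namelist())  # ['1.txt', '2.txt']
    z.extractall(out_dir)           # creates out_dir if needed
extracted = sorted(os.listdir(out_dir))

# shutil offers the same round trip for a whole folder
archive = shutil.make_archive(os.path.join(base, "movie"), "zip", out_dir)
shutil.unpack_archive(archive, os.path.join(base, "unpacked"))
unpacked = sorted(os.listdir(os.path.join(base, "unpacked")))
print(members, extracted, unpacked)
```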