Character encoding
Refer to Baidu Encyclopedia:
https://baike.baidu.com/item/%E5%AD%97%E7%AC%A6%E7%BC%96%E7%A0%81/8446880?fr=aladdin
File operation
'''
1 What is a file
    A file is an abstract unit that the operating system provides to users and applications for working with the hard disk.

2 Why use files
    The operating system turns the file reads and writes of a user or application into concrete hard disk operations,
    so users and applications can control complex hard disk access indirectly, simply by reading and writing files.
    Files make it possible to save data from memory to the hard disk permanently.
        user = input('>>>>: ')  # user = "Xiao Wang"

3 How to use files
    Basic steps of file operation:
        f = open(...)  # Open the file and get a file object f; f works like a remote control that sends instructions to the operating system
        f.read()       # Read or write: ask the operating system to read or write the file
        f.close()      # Close the file and release the operating system resources
    Context management:
        with open(...) as f:
            pass
'''

# Absolute path
f = open(r'/Users/jaidun/data/python_space/a.txt', encoding='utf-8')
print(f.read())
f.close()

# Relative path
# A file in the current directory
f = open(r'a.txt', encoding='utf-8')
f.close()
# ../ refers to the parent directory
f = open(r'../a.txt', encoding='utf-8')
print(f.read())
f.close()

# Important: be sure to close every open file before the program ends; this is easy to forget
# Reading after close() raises an error:
# f.close()
# print(f.read())

# Context management: with
# with open(...) as f:
#     pass
# The file is closed automatically when the with block ends
with open(r'a.txt', encoding='utf-8') as f:
    print(f.read())
# print(f.read())  # fails here: the file is already closed
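The point about closing files can be checked directly. The following is only a sketch: the temporary directory and the file name `demo.txt` are assumptions made for illustration and do not come from the notes above. It shows that leaving a `with` block closes the file, that `try`/`finally` does the same job by hand, and that reading a closed file raises `ValueError`.

```python
import os
import tempfile

# Hypothetical scratch file so the example does not touch real data
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, mode='wt', encoding='utf-8') as f:
    f.write('hello\n')

# Leaving the with block closed the file automatically
closed_after_with = f.closed

# Manual style: try/finally guarantees close() even if read() fails
f = open(path, encoding='utf-8')
try:
    content = f.read()
finally:
    f.close()

# Reading from a closed file raises ValueError
try:
    f.read()
    read_after_close_failed = False
except ValueError:
    read_after_close_failed = True

print(closed_after_with, repr(content), read_after_close_failed)
```

The `with` form is usually preferable because there is no `close()` call to forget.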
Other common file operations
1. File open modes
r: read-only mode (default)
w: write-only mode
a: append-only mode
2. How the file content is read and written (must be combined with r, w or a)
t: text mode (default). The encoding parameter should always be specified; if it is omitted, Python falls back to a platform-dependent default
Advantage: Python decodes the bytes read from the hard disk into Unicode strings before returning them
Emphasis: valid only for text files
b: binary mode. The encoding parameter must not be specified
Advantage: the bytes are returned unchanged, so they can be sent directly over the network
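A small sketch of the difference between t and b mode. The file name `modes.txt` and the temporary directory are assumptions for illustration. The same file is read once as text and once as bytes, and the last part shows that passing `encoding` together with b mode is rejected.

```python
import os
import tempfile

# Hypothetical scratch file
path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

with open(path, mode='wt', encoding='utf-8') as f:
    f.write('你好')

# t mode: the bytes on disk are decoded and a str comes back
with open(path, mode='rt', encoding='utf-8') as f:
    text = f.read()

# b mode: the raw bytes come back unchanged
with open(path, mode='rb') as f:
    raw = f.read()

# b mode combined with encoding is an error
try:
    open(path, mode='rb', encoding='utf-8')
    binary_with_encoding_ok = True
except ValueError:
    binary_with_encoding_ok = False

print(type(text), len(text))  # a str holding 2 characters
print(type(raw), len(raw))    # bytes; each of these characters takes 3 bytes in UTF-8
```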
Read only
# I. r: read-only mode (default)
# 1. If the file does not exist, an error is raised
# 2. If the file exists, the file pointer starts at the beginning of the file
with open('a.txt', mode='rt', encoding='utf-8') as f:
    res1 = f.read()
    print('111>>>', res1)
    # The first read consumed everything, so the second read returns an empty string
    res2 = f.read()
    print('222>>>', res2)

    # rt mode is readable
    print(f.readable())
    # rt mode is not writable
    print(f.writable())

    # read() loads the whole file at once, which is a poor fit for large files; read line by line instead
    f.seek(0)  # rewind, because the pointer is at the end after read()
    # Each line in the file already ends with a newline and print adds its own \n, so pass end=''
    print(f.readline(), end='')
    print(f.readline())

    # Loop over the file object with for
    for line in f:
        print(line, end='')

    f.seek(0)
    L = []
    for line in f:
        L.append(line)
    print(L)

    # The same in one line of code
    f.seek(0)
    print(f.readlines())
Write only
# II. wt: write-only mode
# 1. If the file does not exist, a new empty file is created
# with open('b.txt', mode='wt', encoding='utf-8') as f:
#     pass
# 2. If the file exists, its content is cleared and the file pointer starts at the beginning of the file
with open('b.txt', mode='wt', encoding='utf-8') as f:
    # The old content is cleared on open; whatever we write below becomes the new content
    # Not readable
    print(f.readable())
    # Writable
    print(f.writable())
    # f.read()  # would raise an error, because the file is not readable
    # Remember to write the line breaks yourself
    # f.write(string)
    f.write('Xiao Wang\n')
    f.write('king\n')
    f.write('Xiao Dai\n')
    # Write multiple lines at once
    f.write('111\n2222\n3333\n')
    # Write the contents of a list line by line
    info = ['sea\n', 'sea\n', 'sea\n']
    for line in info:
        f.write(line)
    # The same in one line of code
    # f.writelines(list)
    f.writelines(info)
Append mode
# III. at: append-only mode
# 1. If the file does not exist, a new empty file is created and the file pointer is at the end of the file (the beginning is also the end)
# with open('c.txt', mode='at', encoding='utf-8') as f:
#     pass
# 2. If the file exists, the file pointer moves to the end of the file
with open('c.txt', mode='at', encoding='utf-8') as f:
    # Not readable
    print(f.readable())
    # Writable
    print(f.writable())
    f.write('Miss Wang\n')
    f.write('Miss Dai\n')
    f.write('Miss Zhou\n')

with open('c.txt', mode='at', encoding='utf-8') as f:
    # Not readable
    print(f.readable())
    # Writable
    print(f.writable())
    f.write('Miss Dai\n')
    f.write('Miss Yang\n')
    f.write('Teacher Fu\n')
The difference between w mode and a mode
w mode:
While the file stays open, each write continues from the position where the previous write left the pointer, so consecutive writes do not overwrite each other.
Once the file is closed, opening it again in w mode clears the existing content.
a mode:
After the file is closed and opened again, writing resumes at the end of the file, so the previous content is not overwritten.
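The difference can be reproduced with a short sketch. The helper name `write_twice` and the file names are hypothetical, chosen only for this illustration. Each file is opened twice and two lines are written per opening.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
w_path = os.path.join(folder, 'w_demo.txt')  # hypothetical file names
a_path = os.path.join(folder, 'a_demo.txt')


def write_twice(path, mode):
    """Open path twice in the given mode, write two lines each time, return the final content."""
    for batch in (['1\n', '2\n'], ['3\n', '4\n']):
        with open(path, mode=mode, encoding='utf-8') as f:
            for line in batch:
                # Within one opening, each write continues after the previous one
                f.write(line)
    with open(path, mode='rt', encoding='utf-8') as f:
        return f.read()


w_result = write_twice(w_path, 'wt')  # the second opening cleared the first batch
a_result = write_twice(a_path, 'at')  # the second opening continued at the end

print(repr(w_result))
print(repr(a_result))
```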
Difference between t and b
t and b control how the file content is read and written, and must be combined with r, w or a.
t: text mode (default). The encoding parameter should always be specified; if it is omitted, Python falls back to a platform-dependent default
Advantage: Python decodes the bytes read from the hard disk into Unicode strings before returning them
Emphasis: valid only for text files
b: binary mode. The encoding parameter must not be specified
Advantage: the bytes are returned unchanged, so they can be sent directly over the network
# Limitation of t mode: it only works for text files
# Binary files need b mode
# Pictures and videos
with open('1.jpeg', mode='rb') as f:
    data = f.read()
    print(data)
    print(type(data))

with open('2.jpeg', mode='wb') as f1:
    f1.write(data)

# b mode can also be used on text files, but you have to decode and encode yourself
# decode: bytes into characters
# encode: characters into bytes

# When reading, decode the bytes into characters
with open('b pattern.txt', mode='rb') as f:
    data = f.read()
    print(data)
    print(data.decode('utf-8'))

# When writing, encode the characters into bytes first
with open('wb pattern.txt', mode='wb') as f:
    f.write('Xiao Hong\n'.encode('utf-8'))
    f.write('Xiao Wang\n'.encode('utf-8'))
    f.write('Xiao Dai\n'.encode('utf-8'))
Readable and writable
r+t mode
1. If the file does not exist, an error is raised
2. If the file exists, the file pointer starts at the beginning of the file
3. Compared with plain r mode, the file can also be written; call f.seek() first to control where the write lands
with open('Readable and writable r+t pattern.txt', mode='r+t', encoding='utf-8') as f:
    print(f.readable())
    print(f.writable())
    msg = f.readline()
    print(msg)
    f.write('xxxxxxxx')
w+t mode
1. If the file does not exist, a new empty file is created
2. If the file exists, its content is cleared and the file pointer starts at the beginning of the file
with open('Readable and writable w+t pattern.txt', mode='w+t', encoding='utf-8') as f:
    print(f.readable())
    print(f.writable())
    f.write('aaaaaaaa\n')
    f.write('bbbbbbbb\n')
    # Move the pointer: seek(number of bytes to move, reference point), where 0 means the beginning of the file
    # Move 0 bytes from the beginning
    f.seek(0, 0)
    print(f.readline())
    f.write('cccccccc\n')
a+t mode
When the file is opened a second time, writes still go to the end of the file
with open('Readable and writable a+t pattern.txt', mode='a+t', encoding='utf-8') as f:
    print(f.readable())
    print(f.writable())
    f.write('aaaaaaaa\n')
    f.write('bbbbbbbb\n')
    # Move the pointer: seek(number of bytes to move, reference point), where 0 means the beginning of the file
    # Move 0 bytes from the beginning
    f.seek(0, 0)
    print(f.readline())
    f.write('cccccccc\n')

# The t modes do not work for pictures and videos
# r+b, w+b and a+b follow the same rules as r+t, w+t and a+t
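To make the contrast between r+ and a+ concrete, here is a minimal sketch. The file names are assumptions for illustration. It relies on the usual behaviour that r+ writes at the current pointer position while a+ writes at the end of the file, even after the pointer has been moved back for reading. The Python documentation describes the append behaviour as applying on "some Unix systems", so treat the a+ result as typical rather than guaranteed everywhere.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
r_path = os.path.join(folder, 'r_plus.txt')  # hypothetical file names
a_path = os.path.join(folder, 'a_plus.txt')

# r+t: the pointer starts at the beginning, so the write overwrites existing text
with open(r_path, mode='wt', encoding='utf-8') as f:
    f.write('abcdef')
with open(r_path, mode='r+t', encoding='utf-8') as f:
    f.write('XY')
with open(r_path, mode='rt', encoding='utf-8') as f:
    r_plus_result = f.read()

# a+t: reading from the start is possible, but the write still goes to the end
with open(a_path, mode='wt', encoding='utf-8') as f:
    f.write('first\n')
with open(a_path, mode='a+t', encoding='utf-8') as f:
    f.seek(0, 0)
    first_line = f.readline()
    f.write('second\n')
with open(a_path, mode='rt', encoding='utf-8') as f:
    a_plus_result = f.read()

print(repr(r_plus_result))
print(repr(first_line), repr(a_plus_result))
```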
Pointer movement
seek() function
Pointer movement within a file
In t mode, f.read(n) reads n characters
In b mode, f.read(n) and all pointer movement are measured in bytes
Pointer operation
f.seek(offset, whence) takes two parameters:
offset: the number of bytes to move the pointer
whence: the reference point the offset is counted from
whence = 0: from the beginning of the file (default); can be used in both t and b modes
whence = 1: from the current position; a non-zero offset requires b mode
whence = 2: from the end of the file; a non-zero offset requires b mode
# t mode: read(n) counts characters
# with open('pointer movement.txt', mode='rt', encoding='utf-8') as f:
#     print(f.read(1))
#     print(f.read(1))
#     print(f.read(1))

# b mode: read(n) counts bytes
# with open('pointer movement.txt', mode='rb') as f:
#     # Two hexadecimal digits make up one byte (one digit covers 2**4 values, two cover 2**8)
#     # One byte is only a third of a Chinese character in UTF-8
#     print(f.read(1).decode('utf-8'))
#     print(f.read(1).decode('utf-8'))
#     print(f.read(3).decode('utf-8'))
#     print(f.read(3).decode('utf-8'))
#     print(f.read(3).decode('utf-8'))

# Pointer operation: f.seek(offset, whence), with the parameters described above

# In t mode read(n) counts characters, but the seek offset is still a number of bytes
with open('seek.txt', mode='rt', encoding='utf-8') as f:
    f.seek(2, 0)
    print(f.read(1))

# In b mode both the offset moved and the amount read are numbers of bytes
with open('seek.txt', mode='rb') as f:
    f.seek(5, 0)
    print(f.read(3).decode('utf-8'))

with open('seek.txt', mode='rb') as f:
    msg = f.read(5)
    # tell() returns the byte position of the cursor
    print(f.tell())
    f.seek(3, 1)
    print(f.read(3).decode('utf-8'))

with open('seek.txt', mode='rb') as f:
    f.seek(0, 2)
    print(f.tell())
    f.seek(-3, 2)
    print(f.read(3).decode('utf-8'))
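Since the content of `seek.txt` is not shown, the following self-contained sketch writes its own small file (a hypothetical name in a temporary directory) so the three reference points can be followed byte by byte. The sample text `abc你好` is an assumption: 3 ASCII bytes followed by two characters of 3 bytes each in UTF-8.

```python
import io
import os
import tempfile

# Hypothetical scratch file: 'abc' is 3 bytes, each Chinese character is 3 bytes in UTF-8
path = os.path.join(tempfile.mkdtemp(), 'seek_demo.txt')
with open(path, mode='wb') as f:
    f.write('abc你好'.encode('utf-8'))

with open(path, mode='rb') as f:
    f.seek(3, 0)                               # 3 bytes from the beginning
    from_start = f.read(3).decode('utf-8')     # pointer is now at byte 6
    f.seek(-6, 1)                              # 6 bytes back from the current position
    from_current = f.read(3).decode('utf-8')
    f.seek(-3, 2)                              # 3 bytes before the end
    from_end = f.read(3).decode('utf-8')
    size = f.seek(0, 2)                        # seek returns the new absolute position

with open(path, mode='rt', encoding='utf-8') as f:
    four_chars = f.read(4)                     # t mode counts characters, not bytes
    try:
        f.seek(1, 1)                           # non-zero relative seek is refused in t mode
        text_relative_seek_ok = True
    except io.UnsupportedOperation:
        text_relative_seek_ok = False

print(from_start, from_current, from_end, size)
print(four_chars, text_relative_seek_ok)
```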
Detect what is added at the end of the file
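import time

with open('history.txt', mode='rb') as f:
    f.seek(0, 2)
    while True:
        line = f.readline()
        # 0 bytes means the cursor is at the end of the file and nothing new has been added
        # The loop never ends, so the file stays open until the program is interrupted
        if len(line) != 0:
            print(line.decode('utf-8'), end='')
        else:
            time.sleep(0.5)  # wait briefly instead of spinning at full speed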
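The same idea can be written without an endless loop, which makes it easier to try out. This is only a sketch: `read_new_lines` is a hypothetical helper, not part of any library, and it assumes whole UTF-8 lines have been appended since the remembered byte offset.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (lines appended after byte offset, new byte offset).

    Assumes the appended data consists of complete UTF-8 encoded lines.
    """
    with open(path, mode='rb') as f:
        f.seek(offset, 0)
        data = f.read()
        new_offset = f.tell()
    return data.decode('utf-8').splitlines(), new_offset


# Hypothetical log file in a scratch directory
path = os.path.join(tempfile.mkdtemp(), 'history.txt')
with open(path, mode='wt', encoding='utf-8') as f:
    f.write('old line\n')

# Remember where the file ended, like f.seek(0, 2) followed by f.tell()
offset = os.path.getsize(path)

with open(path, mode='at', encoding='utf-8') as f:
    f.write('new 1\nnew 2\n')

lines, offset = read_new_lines(path, offset)
print(lines)
```

Calling `read_new_lines` again with the returned offset yields only what was added after this call.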
How to modify files
Method 1 of modifying files
1. Read the entire content of the file from the hard disk into memory
2. Make the modification in memory
3. Write the modified result from memory back to the hard disk, overwriting the original file
with open('File modification.txt', mode='rt', encoding='utf-8') as f:
    all_data = f.read()

# The data that was read is now held in the all_data variable
with open('File modification.txt', mode='wt', encoding='utf-8') as f1:
    f1.write(all_data.replace('Xiao Wang', 'king'))
Method 2 of modifying files
1. Open the source file in read mode and a temporary file in write mode
2. Read the source file piece by piece, modify each piece and write it to the temporary file, until the whole source file has been read
3. Delete the source file and rename the temporary file to the source file name
import os

with open('Document revision II.txt', mode='rt', encoding='utf-8') as read_f, \
        open('Temporary documents.txt', mode='wt', encoding='utf-8') as write_f:
    for line in read_f:
        write_f.write(line.replace('Xiao Dai', 'Xiao Yang'))

# Delete the source file
os.remove('Document revision II.txt')
# Rename the temporary file to the source file name
os.rename('Temporary documents.txt', 'Document revision II.txt')
Method 1:
Advantage: only one copy of the data exists on the hard disk while the file is being modified
Disadvantage: the whole file is held in memory, so it is not suitable for large files
Method 2:
Advantage: only one line of the source file is in memory at any time, so memory use stays low
Disadvantage: while the file is being modified, the source file and the temporary file coexist, so two copies of the data occupy the hard disk at the same time
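Method 2 can be wrapped in a small function. The sketch below is one possible variation, not the notes' own code: `replace_in_file` is a hypothetical helper, the `.tmp` suffix is an arbitrary choice, and it uses `os.replace` in place of the separate `os.remove` and `os.rename` calls, because `os.replace` overwrites the destination in a single step.

```python
import os
import tempfile


def replace_in_file(path, old, new, encoding='utf-8'):
    """Rewrite path line by line, replacing old with new, via a temporary file."""
    tmp_path = path + '.tmp'  # hypothetical temporary file name
    with open(path, mode='rt', encoding=encoding) as read_f, \
            open(tmp_path, mode='wt', encoding=encoding) as write_f:
        for line in read_f:
            # Only one line of the source file is in memory at a time
            write_f.write(line.replace(old, new))
    # Swap the temporary file into place, overwriting the source file
    os.replace(tmp_path, path)


# Usage with a scratch file
path = os.path.join(tempfile.mkdtemp(), 'names.txt')
with open(path, mode='wt', encoding='utf-8') as f:
    f.write('Xiao Dai\nXiao Wang\nXiao Dai\n')

replace_in_file(path, 'Xiao Dai', 'Xiao Yang')

with open(path, mode='rt', encoding='utf-8') as f:
    result = f.read()
print(result)
```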
Avoiding garbled text
# "Heaven has endowed me with talents for eventual use"
# Japanese
with open('text1.txt', mode='w', encoding='shift_jis') as f1:
    f1.write('生まれながらにしてわたくし私はかならず必ずやく役にたつ立つ')

# Reading it back with a different encoding fails or produces garbled text
# with open('text1.txt', mode='r', encoding='utf-8') as f1:
#     a = f1.read()
#     print(a)

# English
with open('text2.txt', mode='w', encoding='shift_jis') as f1:
    f1.write('I believe')

# This works only because plain English letters have the same bytes in shift_jis and utf-8
with open('text2.txt', mode='r', encoding='utf-8') as f2:
    a = f2.read()
    print(a)
!!! Two very important points to remember !!!
1. The core rule for avoiding garbled text: characters must be decoded with the same standard they were encoded with.
The standard here means the character encoding.
2. All characters held in memory are Unicode, without distinction. For example, when we open an editor
and type a character such as "你" ("you"), we cannot yet say that it is a Chinese character. At this point it is just a symbol.
The symbol may be used in several countries, and its appearance may differ depending on the input method we use.
Only when we save it to the hard disk or send it over the network
is it determined whether the character is stored as Chinese or as Japanese. That step is the conversion of Unicode into another encoding format.
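Both points can be seen without touching any file. This is a minimal sketch using in-memory strings, with `gbk`, `utf-8` and `shift_jis` chosen as example encodings. The same Unicode text becomes different bytes under different encodings, and decoding with the wrong one either fails or yields different characters.

```python
text = '你好'  # a Unicode string in memory, not tied to any encoding yet

# Encoding happens only when the text leaves memory
gbk_bytes = text.encode('gbk')
utf8_bytes = text.encode('utf-8')

# Rule 1: decode with the same encoding that was used to encode
same_codec = gbk_bytes.decode('gbk')

# Decoding GBK bytes as UTF-8 fails outright for this text
try:
    gbk_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decoding UTF-8 bytes as GBK gives garbled characters instead of the original
garbled = utf8_bytes.decode('gbk', errors='replace')

# Plain English letters have identical bytes in both encodings
ascii_matches = 'I believe'.encode('shift_jis') == 'I believe'.encode('utf-8')

print(len(gbk_bytes), len(utf8_bytes))
print(same_codec, decode_failed, garbled != text, ascii_matches)
```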