48. Serialization module

1. Serialization module

Today we look at serialization. What is it? In essence, serialization converts a data structure (such as a dictionary or a list) into a special sequence (a string or bytes). Some of you will ask: why do we need to convert to such a sequence? Haven't we already learned how to do that?

dic = {'name': 'Guo Baoyuan'}
ret = str(dic)

First of all, look closely: I said a special sequence, not the ordinary str string we use every day.

Why should there be a serialization module?

Secondly, what is the use of turning a data structure into this special sequence? That is the key to serialization, and the special sequence is very useful. For example, suppose your program needs a dictionary to store your personal information:

 dic = {'username':'Baoyuan', 'password': 123,'login_status': True}

Several places in your program need this dic data, for example during login and registration. Previously we kept dic as a global variable, but that is unreasonable. The data should be written to persistent storage; since we haven't learned databases yet, we store it in a file first and read the file wherever the program needs the data. Is there a problem? Yes: you can't write a dictionary directly to a file. It must first be converted into a string, and what you read back is also a string that merely looks like a dictionary (this can be shown in code).

So what can you do with the result of str(dic)? It cannot be safely converted back into a dict (eval could do it, but eval is dangerous and should not be used), so it is inconvenient. This is where the serialization module comes in. If the string you write to the file is a special serialized string, it can be converted back to the original data structure when you read it from the file. That is very useful.
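The difference between an ordinary str() string and a serialized string can be sketched as follows. This is a minimal illustration that reuses the dic from above; the variable names as_str, as_json and restored are only chosen for the example.

```python
import json

dic = {'username': 'Baoyuan', 'password': 123, 'login_status': True}

# str() produces Python's own text form: single quotes and True.
# That text is not valid JSON, so json.loads() cannot read it back.
as_str = str(dic)

# json.dumps() produces a serialized string that json.loads() can restore.
as_json = json.dumps(dic)
restored = json.loads(as_json)

print(as_str)           # {'username': 'Baoyuan', 'password': 123, 'login_status': True}
print(as_json)          # {"username": "Baoyuan", "password": 123, "login_status": true}
print(restored == dic)  # True
```

Note how the serialized form uses double quotes and a lowercase true, which is what makes it readable by a JSON parser.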

We start with json serialization, which differs from pickle serialization.

json serialization solves not only the problem of writing files but also the problem of network transmission. For example, if you want to send a list to another developer over the network, you can't send it directly. As we said before, data sent over the network must be of bytes type. However, bytes can only be converted to and from strings, not directly to and from other data structures. So the sender must convert list -> string -> bytes, and the receiver calls decode() to get the string back. This string cannot be the ordinary str() string we learned before, because it cannot be reliably converted back. It must be a special string that can be deserialized into a list. That is how developers exchange data over the network, and most data sent over the network takes this form: after you receive the special string, you deserialize it into the data type you need.
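The list -> string -> bytes round trip described above might look like the sketch below. No real socket is used here: passing the payload variable from the "sender" half to the "receiver" half stands in for the network, which is an assumption made to keep the example self-contained.

```python
import json

data = [1, 2, 3]

# Sender side: list -> JSON string -> bytes
payload = json.dumps(data).encode('utf-8')

# Receiver side: bytes -> JSON string -> list
received = json.loads(payload.decode('utf-8'))

print(payload)   # b'[1, 2, 3]'
print(received)  # [1, 2, 3]
```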

Let's briefly summarize the serialization module:

A serialization module converts a common data structure into a special sequence, and this special sequence can be deserialized back into the original structure. Its main purposes are reading and writing data in files and transmitting data over the network.

Python has three commonly used serialization modules:

json module: (key points)

  1. A data exchange format followed by many languages, that is, a special string that different languages all understand. (For example, a Python list [1, 2, 3] is converted into a special string by json, encoded into bytes and sent to a PHP developer. The PHP developer decodes the bytes into the special string and then deserializes it into the original array (list): [1, 2, 3].)

  2. json serialization supports only some Python data structures: dict, list, tuple, str, int, float, True, False, None (a tuple is written as a JSON array and comes back as a list).

pickle module:

  1. A data format specific to Python; it can only be used from Python.

  2. Supports nearly all Python data types, including instances of classes.

shelve module: works with serialized data through a dictionary-like interface (you can explore it after class).
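For readers who want a first look at shelve, here is a minimal sketch of its dictionary-like interface. The file name user_data and the stored values are hypothetical, and a temporary directory is used only so that the example leaves no files behind.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'user_data')  # hypothetical file name

    # Write: assign to keys as if the shelf were a dictionary.
    with shelve.open(path) as db:
        db['user'] = {'username': 'Baoyuan', 'login_status': True}
        db['scores'] = [90, 85]

    # Read: reopen the shelf and look the keys up again.
    with shelve.open(path) as db:
        user = db['user']
        keys = sorted(db.keys())

print(user)  # {'username': 'Baoyuan', 'login_status': True}
print(keys)  # ['scores', 'user']
```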

Of course, the most used serialization module is the json module. Next, let's talk about the json and pickle modules.

1.1 json module

The json module converts a supported data structure into a special string, and can deserialize that string back into the original structure.

As mentioned above, serialization has two main uses: as the intermediate step for network transmission, or as the intermediate step for file storage. Accordingly, the json module has four methods in two pairs:
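What "supported" means in practice can be seen in a small sketch. The dictionary below is made up for illustration; it shows that a tuple comes back as a list and that a non-string key comes back as a string, so the round trip does not always reproduce the input exactly.

```python
import json

data = {'t': (1, 2), 1: 'int key', 'flag': True, 'nothing': None}

text = json.dumps(data)
back = json.loads(text)

print(text)  # {"t": [1, 2], "1": "int key", "flag": true, "nothing": null}
print(back)  # {'t': [1, 2], '1': 'int key', 'flag': True, 'nothing': None}
```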

For network transmission: dumps, loads

For file write / read: dump, load


  1. Convert a dictionary to a string
import json
dic = {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
str_dic = json.dumps(dic)  # Serialization: converts the dictionary into a JSON string
print(type(str_dic), str_dic)  # <class 'str'> {"k1": "v1", "k2": "v2", "k3": "v3"}
# Note: strings inside the JSON string produced by json are always written with double quotes ""
  2. Convert a JSON string back to a dictionary
import json
dic2 = json.loads(str_dic)  # Deserialization: converts a dictionary in JSON string form back into a dictionary
# Note: strings inside the text passed to json.loads must be written with double quotes ""
print(type(dic2), dic2)  # <class 'dict'> {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
  3. Lists are also supported
list_dic = [1, ['a', 'b', 'c'], 3, {'k1': 'v1', 'k2': 'v2'}]
str_dic = json.dumps(list_dic)  # Nested data types work as well
print(type(str_dic), str_dic)  # <class 'str'> [1, ["a", "b", "c"], 3, {"k1": "v1", "k2": "v2"}]
list_dic2 = json.loads(str_dic)
print(type(list_dic2), list_dic2)  # <class 'list'> [1, ['a', 'b', 'c'], 3, {'k1': 'v1', 'k2': 'v2'}]


  1. Convert the object into a string and write it to a file
import json
dic = {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
with open('json_file.json', 'w') as f:  # the with block closes the file automatically
    json.dump(dic, f)  # dump takes a file handle, converts the dictionary into a JSON string and writes it to the file
# A .json file is an ordinary text file that stores a JSON string.
  2. Convert the JSON string stored in a file back to a dictionary
import json
with open('json_file.json') as f:
    dic2 = json.load(f)  # load takes a file handle and returns the data structure described by the JSON string in the file
print(type(dic2), dic2)  # <class 'dict'> {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}

Description of other parameters

ensure_ascii: when it is True (the default), all non-ASCII characters are written as \uXXXX escape sequences. Set ensure_ascii to False when dumping, and Chinese text stored in json is displayed as is.

separators: the separators, given as a tuple (item_separator, key_separator). The default is (', ', ': '), which means that items are separated by ", " and each key is separated from its value by ": ". Pass (',', ':') to get the most compact output.

sort_keys: when True, dictionaries are written with their keys in sorted order. See the source code for the remaining parameters.
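The effect of these three parameters can be compared side by side. The sample dictionary is made up for this sketch, and the variable names default, readable and compact are only labels for the three calls.

```python
import json

dic = {'lang': '中文', 'b': 2, 'a': 1}

default = json.dumps(dic)
readable = json.dumps(dic, ensure_ascii=False, sort_keys=True)
compact = json.dumps(dic, ensure_ascii=False, separators=(',', ':'))

print(default)   # {"lang": "\u4e2d\u6587", "b": 2, "a": 1}
print(readable)  # {"a": 1, "b": 2, "lang": "中文"}
print(compact)   # {"lang":"中文","b":2,"a":1}
```

All three strings deserialize to the same dictionary; the parameters change only how the text is written.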

json serialization stores multiple data into the same file

Storing several pieces of data in one file is a problem for json serialization. json.load expects the whole file to be a single piece of JSON data, so a file written with several dump calls cannot be read back. The problem can be solved, though. For example:

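# Storing multiple pieces of data in one file with json.dump
import json
dic1 = {'name': 'oldboy1'}
dic2 = {'name': 'oldboy2'}
dic3 = {'name': 'oldboy3'}
with open('serialize', encoding='utf-8', mode='a') as f:
    json.dump(dic1, f)
    json.dump(dic2, f)
    json.dump(dic3, f)

with open('serialize', encoding='utf-8') as f:
    ret = json.load(f)   # raises json.JSONDecodeError: the file holds more than one JSON value
    ret1 = json.load(f)
    ret2 = json.load(f)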

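The code above raises an error. Solution: serialize each object with dumps, write one JSON string per line, and read the file back line by line:

import json
dic1 = {'name': 'oldboy1'}
dic2 = {'name': 'oldboy2'}
dic3 = {'name': 'oldboy3'}
with open('serialize', encoding='utf-8', mode='w') as f:  # 'w' starts from an empty file
    str1 = json.dumps(dic1)
    f.write(str1 + '\n')
    str2 = json.dumps(dic2)
    f.write(str2 + '\n')
    str3 = json.dumps(dic3)
    f.write(str3 + '\n')

with open('serialize', encoding='utf-8') as f:
    for line in f:
        print(json.loads(line))  # each line is one complete JSON string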

1.2 pickle module

The pickle module converts nearly all Python data structures and objects into bytes, and can deserialize those bytes back into the original objects.

I just mentioned that pickle is a serialization module understood only by Python. If we compare serialization formats to languages for worldwide communication, json is like English, a standard followed everywhere (Python, Java, PHP, C, and so on), while pickle is like Chinese, which is the first language only of Chinese speakers (Python).

Because it is used only by Python, pickle supports nearly all Python data types, including the class instances we will talk about later. It serializes these data structures into special bytes and can deserialize them again. In use it is almost the same as json: it also has four methods in two pairs.

For network transmission: dumps, loads

For file write / read: dump, load


import pickle
dic = {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
str_dic = pickle.dumps(dic)
print(str_dic)  # bytes type

dic2 = pickle.loads(str_dic)
print(dic2)    # dict

# You can also serialize objects such as functions
import pickle
def func():
    print('in func')  # example body, assumed for illustration
ret = pickle.dumps(func)
print(ret, type(ret))  # e.g. b'\x80\x03c__main__\nfunc\nq\x00.' <class 'bytes'> (the exact bytes depend on the pickle protocol)
f1 = pickle.loads(ret)  # f1 refers to the func function (functions are pickled by name, not by code)
f1()  # Execute the func function
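Instances of classes are pickled in the same way. The sketch below uses two standard-library classes, date and Fraction, chosen only for illustration; instances of your own classes behave the same, provided the class definition can be imported when the data is loaded.

```python
import pickle
from datetime import date
from fractions import Fraction

# A dictionary holding instances that json cannot serialize by default.
record = {'joined': date(2021, 9, 30), 'ratio': Fraction(1, 3)}

data = pickle.dumps(record)
restored = pickle.loads(data)

print(type(data))               # <class 'bytes'>
print(restored == record)       # True
print(restored['joined'].year)  # 2021
```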


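dic = {(1,2):'oldboy',1:True,'set':{1,2,3}}
with open('pick serialize', mode='wb') as f:
    pickle.dump(dic, f)  # pickle writes bytes, so the file is opened in binary mode
with open('pick serialize', mode='rb') as f1:
    print(pickle.load(f1))  # tuple keys and sets are restored as they were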

pickle serialization stores multiple data into a file

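import pickle
dic1 = {'name':'oldboy1'}
dic2 = {'name':'oldboy2'}
dic3 = {'name':'oldboy3'}
with open('pick Multiple data', mode='wb') as f:
    pickle.dump(dic1, f)
    pickle.dump(dic2, f)
    pickle.dump(dic3, f)
with open('pick Multiple data', mode='rb') as f:
    while True:
        try:
            print(pickle.load(f))  # each call reads the next pickled object
        except EOFError:           # raised when nothing is left to read
            break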

At this point you may ask: since pickle is so powerful, why learn json? The reason is that json is a format recognized by all languages. If we convert a dictionary or list into a json file, Java or JavaScript code can use it too. If we serialize with pickle, other languages cannot understand the result. Therefore, if the content you serialize is a list or a dictionary, we highly recommend the json module. If you have to serialize other data types for some reason, and the data will be deserialized by Python later, then you can use pickle.
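A short sketch of the trade-off: the same data, here a dictionary containing a set (made up for this example), is rejected by json but handled by pickle. The flag json_ok is only a helper name for the demonstration.

```python
import json
import pickle

data = {'tags': {1, 2, 3}}

# json cannot serialize a set and raises TypeError.
try:
    json.dumps(data)
    json_ok = True
except TypeError as exc:
    json_ok = False
    print('json failed:', exc)

# pickle handles the set, but only Python can read the result.
restored = pickle.loads(pickle.dumps(data))

print(json_ok)           # False
print(restored == data)  # True
```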


Posted on Thu, 30 Sep 2021 16:36:43 -0400 by spaddict