Python 3 Standard Library: pickle Object Serialization

1. pickle object serialization

The pickle module implements an algorithm for turning an arbitrary Python object into a series of bytes. This process is also called serializing the object. The byte stream representing the object can be transmitted or stored, and later reconstructed to create a new object with the same characteristics.

1.1 Encoding and decoding data in strings

The first example uses dumps() to encode a data structure as a bytes object and print it to the console. It uses a data structure made up entirely of built-in types. Instances of any class can be pickled, as a later example shows.

import pickle
import pprint

data = [{'a': 'A', 'b': 2, 'c': 3.0}]
print('DATA:', end=' ')
pprint.pprint(data)

data_string = pickle.dumps(data)
print('PICKLE: {!r}'.format(data_string))

By default, the pickle is written in a binary format that is the most compatible choice when sharing data between Python 3 programs.
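The format is selected with the protocol argument of dumps() and dump(). A minimal sketch that encodes the same data with every protocol the running interpreter supports and checks that each one decodes back to an equal object; the byte counts printed will vary between Python versions.

```python
import pickle

data = [{'a': 'A', 'b': 2, 'c': 3.0}]

# DEFAULT_PROTOCOL is what dumps() uses when no protocol is given;
# HIGHEST_PROTOCOL is the newest format this interpreter can write.
print('default :', pickle.DEFAULT_PROTOCOL)
print('highest :', pickle.HIGHEST_PROTOCOL)

encoded = {}
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    encoded[proto] = pickle.dumps(data, protocol=proto)
    print('protocol {}: {} bytes'.format(proto, len(encoded[proto])))

# loads() detects the protocol from the data itself,
# so no protocol argument is needed when decoding.
for proto, payload in encoded.items():
    assert pickle.loads(payload) == data
```

Protocol 0 is the original text-based format; protocols 2 and later begin with a binary PROTO opcode.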

Once the data is serialized, it can be written to a file, socket, pipe, or other location. Later, the file can be read and the data unpickled to construct a new object with the same values.

import pickle
import pprint

data1 = [{'a': 'A', 'b': 2, 'c': 3.0}]
print('BEFORE: ', end=' ')
pprint.pprint(data1)

data1_string = pickle.dumps(data1)

data2 = pickle.loads(data1_string)
print('AFTER : ', end=' ')
pprint.pprint(data2)

print('SAME? :', (data1 is data2))
print('EQUAL?:', (data1 == data2))

The newly constructed object is equal to the original object, but not the same object.
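The same round trip works through a real file by using dump() and load(), which take a binary file object instead of returning or accepting bytes. A small sketch, assuming a temporary directory as the storage location and a hypothetical file name data.pickle:

```python
import os
import pickle
import tempfile

data1 = [{'a': 'A', 'b': 2, 'c': 3.0}]

# A temporary directory stands in for wherever the data would really be kept.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'data.pickle')  # hypothetical file name

    # dump() writes the pickle directly to a file opened in binary mode.
    with open(path, 'wb') as f:
        pickle.dump(data1, f)

    # load() reads it back and builds a brand-new object.
    with open(path, 'rb') as f:
        data2 = pickle.load(f)

print('SAME? :', data1 is data2)
print('EQUAL?:', data1 == data2)
```

As with dumps() and loads(), the reloaded list is equal to the original but is a different object.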

1.2 Working with Streams

In addition to dumps() and loads(), pickle provides convenience functions for working with file-like streams. Multiple objects can be written to a stream and then read back without knowing in advance how many objects were written or how large they are.

import io
import pickle

class SimpleObject:

    def __init__(self, name):
        self.name = name
        self.name_backwards = name[::-1]
        return

data = []
data.append(SimpleObject('pickle'))
data.append(SimpleObject('preserve'))
data.append(SimpleObject('last'))

# Simulate a file.
out_s = io.BytesIO()

# Write to the stream
for o in data:
    print('WRITING : {} ({})'.format(o.name, o.name_backwards))
    pickle.dump(o, out_s)
    out_s.flush()

# Set up a read-able stream
in_s = io.BytesIO(out_s.getvalue())

# Read the data
while True:
    try:
        o = pickle.load(in_s)
    except EOFError:
        break
    else:
        print('READ    : {} ({})'.format(
            o.name, o.name_backwards))

This example simulates streams with two BytesIO buffers. The first receives the pickled objects, and its value is fed into a second buffer that load() reads from. A simple database format can also use pickles to store objects; the shelve module is one such implementation.
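A shelf behaves like a persistent dictionary whose values are pickled on assignment and unpickled on access. A minimal sketch, assuming a temporary directory and a hypothetical shelf name example_shelf (the underlying dbm backend may create one or more files from that name):

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, 'example_shelf')  # hypothetical name

    # Keys must be strings; each value is pickled when it is stored.
    with shelve.open(filename) as db:
        db['record'] = {'name': 'pickle', 'tags': ['serialize', 'store']}

    # Reopening the shelf and reading the key unpickles the value.
    with shelve.open(filename) as db:
        record = db['record']

print(record)
```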

In addition to storing data, pickles are handy for interprocess communication. For example, os.fork() and os.pipe() can be used to set up worker processes that read job instructions from one pipe and write the results to another. The core code for managing the worker pool, sending jobs, and receiving responses can be reused, because the job and response objects do not have to be based on a particular class. When using pipes or sockets, do not forget to flush the output after dumping each object, to push the data through the connection to the other end. See the multiprocessing module for a reusable worker pool manager.
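The pattern described above can be sketched with one forked worker and two pipes. This is POSIX-only (os.fork() is not available on Windows), and the names worker() and run_jobs() as well as the "sum a list" job are hypothetical stand-ins. The parent writes every job before reading any result, which is fine for small messages; with large volumes the pipe buffers could fill up, so a real pool would interleave sending and receiving.

```python
import os
import pickle


def worker(job_fd, result_fd):
    """Child side: read pickled jobs until EOF, send back pickled results."""
    with os.fdopen(job_fd, 'rb') as jobs, os.fdopen(result_fd, 'wb') as results:
        while True:
            try:
                job = pickle.load(jobs)
            except EOFError:
                break
            pickle.dump({'job': job, 'total': sum(job)}, results)
            results.flush()  # push each response through the pipe


def run_jobs(job_list):
    """Parent side: fork one worker and exchange pickles over two pipes."""
    job_r, job_w = os.pipe()
    result_r, result_w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        os.close(job_w)
        os.close(result_r)
        try:
            worker(job_r, result_w)
        finally:
            os._exit(0)  # never return into the parent's code
    # Parent process: close the ends that belong to the child.
    os.close(job_r)
    os.close(result_w)
    with os.fdopen(job_w, 'wb') as out:
        for job in job_list:
            pickle.dump(job, out)
            out.flush()
    # Leaving the with block closed the job pipe, so the worker sees EOF.
    responses = []
    with os.fdopen(result_r, 'rb') as inp:
        while True:
            try:
                responses.append(pickle.load(inp))
            except EOFError:
                break
    os.waitpid(pid, 0)
    return responses


if __name__ == '__main__':
    for response in run_jobs([[1, 2, 3], [10, 20], []]):
        print(response)
```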

1.3 Problems Reconstructing Objects

When working with custom classes, the class being pickled must appear in the namespace of the process reading the pickle. Only the data for the instance is pickled, not the class definition. The class name is used to find the constructor that creates the new object when unpickling. The following example writes instances of a class to a file.

import pickle

class SimpleObject:

    def __init__(self, name):
        self.name = name
        l = list(name)
        l.reverse()
        self.name_backwards = ''.join(l)

if __name__ == '__main__':
    data = []
    data.append(SimpleObject('pickle'))
    data.append(SimpleObject('preserve'))
    data.append(SimpleObject('last'))

    with open('Test.py', 'wb') as out_s:
        for o in data:
            print('WRITING: {} ({})'.format(
                o.name, o.name_backwards))
            pickle.dump(o, out_s)

Running this script writes the pickled objects to the file Test.py, the name hard-coded in the open() call.

A naive attempt to load the resulting pickled objects fails:

import pickle

with open('Test.py', 'rb') as in_s:
    while True:
        try:
            o = pickle.load(in_s)
        except EOFError:
            break
        else:
            print('READ: {} ({})'.format(
                o.name, o.name_backwards))

This version fails because no SimpleObject class is available in its namespace, so pickle.load() cannot find the class (typically it raises an AttributeError).

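The revised version imports SimpleObject from the original script (the module name demo implies that script was saved as demo.py), and this time it runs successfully. With the import statement added at the end of the import list, the script can find the class and construct the objects. The if __name__ == '__main__': guard in the original script keeps this import from rerunning the code that writes the file.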

from demo import SimpleObject

The modified script now produces the expected results.
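Both halves of this behavior can be reproduced in a single script: the pickle refers to the class by name but carries none of its code, and loading fails until that name can be found again. A minimal sketch, assuming the code runs as a script so the class lives in __main__; deleting and restoring the name only imitates a reading script that lacks, and then imports, the class.

```python
import pickle


class SimpleObject:
    def __init__(self, name):
        self.name = name


data = pickle.dumps(SimpleObject('pickle'))

# The pickle names the class, but the class body is not stored:
# the method name __init__ never appears in the bytes.
print(b'SimpleObject' in data, b'__init__' in data)

# Hide the class, as a separate reading script without the import would.
saved_class = SimpleObject
del SimpleObject
load_error = None
try:
    pickle.loads(data)
except (AttributeError, pickle.UnpicklingError) as err:
    load_error = err
    print('FAILED:', err)

# Making the name visible again (what the import does) fixes loading.
SimpleObject = saved_class
restored = pickle.loads(data)
print('LOADED:', restored.name)
```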

1.4 Unpicklable Objects

Not all objects can be pickled. Sockets, file handles, database connections, and other objects whose runtime state depends on the operating system or another process may not be savable in a meaningful way. Objects with unpicklable attributes can define __getstate__() and __setstate__() to return a subset of the instance's state to be pickled.

The __getstate__() method must return an object containing the internal state of the object. A dictionary is a convenient way to represent that state, but the value can be any picklable object. The state is stored, and when the object is loaded from the pickle, the saved state is passed to __setstate__().

import pickle

class State:

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return 'State({!r})'.format(self.__dict__)

class MyClass:

    def __init__(self, name):
        print('MyClass.__init__({})'.format(name))
        self._set_name(name)

    def _set_name(self, name):
        self.name = name
        self.computed = name[::-1]

    def __repr__(self):
        return 'MyClass({!r}) (computed={!r})'.format(
            self.name, self.computed)

    def __getstate__(self):
        state = State(self.name)
        print('__getstate__ -> {!r}'.format(state))
        return state

    def __setstate__(self, state):
        print('__setstate__({!r})'.format(state))
        self._set_name(state.name)

inst = MyClass('name here')
print('Before:', inst)

dumped = pickle.dumps(inst)

reloaded = pickle.loads(dumped)
print('After:', reloaded)

This example uses a separate State object to hold the internal state of MyClass. When an instance of MyClass is loaded from a pickle, __setstate__() is passed a State instance, which it uses to initialize the object. Note that __init__() is not called during unpickling; only __setstate__() runs.
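A more typical use is dropping one unpicklable attribute and recreating it on load. The sketch below uses a hypothetical Counter class that holds a threading.Lock, which pickle refuses to serialize; __getstate__() returns a copy of the instance dictionary without the lock, and __setstate__() restores it and creates a fresh lock.

```python
import pickle
import threading


class Counter:
    """Hypothetical class that guards its count with a lock."""

    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()  # lock objects cannot be pickled

    def increment(self):
        with self.lock:
            self.count += 1

    def __getstate__(self):
        # Copy the instance dict and leave out the unpicklable lock.
        state = self.__dict__.copy()
        del state['lock']
        return state

    def __setstate__(self, state):
        # Restore the saved attributes, then give the copy its own lock.
        self.__dict__.update(state)
        self.lock = threading.Lock()


c = Counter()
c.increment()
c.increment()

restored = pickle.loads(pickle.dumps(c))
print('count:', restored.count)

# Pickling a lock directly fails, which is why it is excluded above.
lock_error = None
try:
    pickle.dumps(threading.Lock())
except (TypeError, pickle.PicklingError) as err:
    lock_error = err
    print('lock:', err)
```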

1.5 Circular References

The pickle protocol automatically handles circular references between objects, so complex data structures do not need any special handling. The following example builds a directed graph that contains several cycles:

import pickle

class Node:
    """A simple digraph
    """
    def __init__(self, name):
        self.name = name
        self.connections = []

    def add_edge(self, node):
        "Create an edge between this node and the other."
        self.connections.append(node)

    def __iter__(self):
        return iter(self.connections)

def preorder_traversal(root, seen=None, parent=None):
    """Generator function to yield the edges in a graph.
    """
    if seen is None:
        seen = set()
    yield (parent, root)
    if root in seen:
        return
    seen.add(root)
    for node in root:
        recurse = preorder_traversal(node, seen, root)
        for parent, subnode in recurse:
            yield (parent, subnode)

def show_edges(root):
    "Print all the edges in the graph."
    for parent, child in preorder_traversal(root):
        if not parent:
            continue
        print('{:>5} -> {:>2} ({})'.format(
            parent.name, child.name, id(child)))

# Set up the nodes.
root = Node('root')
a = Node('a')
b = Node('b')
c = Node('c')

# Add edges between them.
root.add_edge(a)
root.add_edge(b)
a.add_edge(b)
b.add_edge(a)
b.add_edge(c)
a.add_edge(a)

print('ORIGINAL GRAPH:')
show_edges(root)

# Pickle and unpickle the graph to create
# a new set of nodes.
dumped = pickle.dumps(root)
reloaded = pickle.loads(dumped)

print('\nRELOADED GRAPH:')
show_edges(reloaded)

The reloaded nodes are not the same objects, but the relationships between the nodes are preserved, and an object that is referenced several times is reloaded as a single copy. To verify both points, compare the id() values of the nodes before and after passing them through pickle.
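The same two properties show up with plain built-in containers. The sketch below pickles a list holding two references to one dictionary together with a list that contains itself. It also shows the limit of this behavior: each dumps() call starts with a fresh Pickler and an empty memo, so shared identity is only kept within a single pickle.

```python
import pickle

shared = {'value': 1}
pair = [shared, shared]      # two references to one dict
loop = []
loop.append(loop)            # a list that contains itself

pair2, loop2 = pickle.loads(pickle.dumps((pair, loop)))

# Within one pickle, the shared dict comes back as a single new object.
print(id(pair2[0]) == id(pair2[1]), pair2[0] is shared)
# The cycle is rebuilt and points at the new list, not the old one.
print(loop2[0] is loop2, loop2 is loop)

# Separate dumps() calls do not share memo entries, so pickling the
# same dict twice produces two independent (but equal) copies.
first = pickle.loads(pickle.dumps(shared))
second = pickle.loads(pickle.dumps(shared))
print(first is second, first == second)
```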


Posted on Sun, 22 Mar 2020 02:42:22 -0400 by wacook