Python 3 Standard Library: mmap Memory-Mapped Files

1. mmap: Memory-Map Files

Memory-mapping a file uses the operating system's virtual memory system to access the data on the file system directly, instead of using normal I/O functions. Memory mapping typically improves I/O performance because it does not involve a separate system call for each access, and it does not require copying data between buffers: the memory is accessed directly by both the kernel and the user application.

Memory-mapped files can be treated as mutable byte strings or as file-like objects, depending on the need. A mapped file supports the expected file API methods, such as close(), flush(), read(), readline(), seek(), tell(), and write(). It also supports the string API, with features such as slicing and methods like find().
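As a minimal sketch of both APIs on one mapping, the snippet below writes a few made-up lines to a temporary file (the sample data and the use of tempfile are assumptions, chosen so the example does not depend on lorem.txt) and then uses read(), readline(), seek(), tell(), slicing, and find() on the map.

```python
import mmap
import tempfile

# Hypothetical sample data written to a temporary file for this sketch.
data = b'first line\nsecond line\nthird line\n'

with tempfile.TemporaryFile() as f:
    f.write(data)
    f.flush()  # make sure the bytes reach the OS before mapping
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # File-style API: the mapping keeps its own position.
        first = m.readline()         # b'first line\n'
        pos_after_line = m.tell()    # 11
        m.seek(0)
        head = m.read(5)             # b'first'

        # String-style API: slicing and searching work on raw bytes.
        middle = m[11:17]            # b'second'
        where = m.find(b'third')     # byte offset of b'third'

print(first, pos_after_line, head, middle, where)
```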

All of the examples below use the text file lorem.txt, which contains a bit of Lorem Ipsum. For reference, the text of the file is listed here.

Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
Donec egestas, enim et consectetuer ullamcorper, lectus ligula rutrum leo,
a elementum elit tortor eu quam. Duis tincidunt nisi ut ante. Nulla
facilisi. Sed tristique eros eu libero. Pellentesque vel
arcu. Vivamus purus orci, iaculis ac, suscipit sit amet, pulvinar eu,
lacus. Praesent placerat tortor sed nisl. Nunc blandit diam egestas
dui. Pellentesque habitant morbi tristique senectus et netus et
malesuada fames ac turpis egestas. Aliquam viverra fringilla
leo. Nulla feugiat augue eleifend nulla. Vivamus mauris. Vivamus sed
mauris in nibh placerat egestas. Suspendisse potenti. Mauris
massa. Ut eget velit auctor tortor blandit sollicitudin. Suspendisse
imperdiet justo.

1.1 Reading Files

Use the mmap() function to create a memory-mapped file. The first argument is a file descriptor, either from the fileno() method of a file object or from os.open(). The caller is responsible for opening the file before calling mmap(), and for closing it when it is no longer needed.
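A sketch of the os.open() route, assuming a throwaway file created with tempfile.mkstemp() in place of lorem.txt: the caller opens the raw descriptor, hands it to mmap(), and closes it afterwards.

```python
import mmap
import os
import tempfile

# Create a small scratch file for the sketch (the path is temporary).
fd_tmp, path = tempfile.mkstemp()
os.write(fd_tmp, b'Lorem ipsum dolor sit amet')
os.close(fd_tmp)

# Open a raw descriptor with os.open() instead of a file object.
fd = os.open(path, os.O_RDONLY)
try:
    with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
        first_word = m[:5]
finally:
    # The caller, not mmap, is responsible for closing the descriptor.
    os.close(fd)
    os.remove(path)

print(first_word)  # b'Lorem'
```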

The second argument to mmap() is the size, in bytes, of the portion of the file to map. If the value is 0, the entire file is mapped. On Windows, if the size is larger than the current size of the file, the file is extended; on Unix, mmap() raises ValueError instead.
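To illustrate the size argument, this sketch maps only part of a 16-byte temporary file and then maps the whole file with a size of 0 (the file contents are hypothetical).

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b'0123456789abcdef')
    f.flush()

    # Map only the first 10 bytes of the 16-byte file.
    with mmap.mmap(f.fileno(), 10, access=mmap.ACCESS_READ) as m:
        partial_len = len(m)
        partial = m[:]

    # A size of 0 maps the whole file.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        full_len = len(m)

print(partial_len, partial, full_len)  # 10 b'0123456789' 16
```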

Both Unix and Windows support an optional keyword argument, access. Use ACCESS_READ for read-only access, ACCESS_WRITE for write-through (assignments to the memory go directly to the file), or ACCESS_COPY for copy-on-write (assignments to the memory are not written to the file).
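A short sketch of what ACCESS_READ means in practice, assuming CPython and a temporary file with made-up contents: assigning to a read-only map raises TypeError, and the data is left untouched.

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b'read-only data')
    f.flush()
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        try:
            m[0:4] = b'READ'
            rejected = False
        except TypeError:
            # CPython refuses to modify a read-only mapping.
            rejected = True
        unchanged = m[:4]

print(rejected, unchanged)  # True b'read'
```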

import mmap

with open('lorem.txt', 'r') as f:
    with mmap.mmap(f.fileno(), 0,
                   access=mmap.ACCESS_READ) as m:
        print('First 10 bytes via read :', m.read(10))
        print('First 10 bytes via slice:', m[:10])
        print('2nd   10 bytes via read :', m.read(10))

The file pointer tracks the last byte accessed through a read operation. In this example, the pointer moves ahead 10 bytes after the first read. Slicing neither uses nor moves the file pointer, so calling read() after the slice operation continues from where the first read stopped and returns bytes 11-20 of the file.
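The behavior of the pointer can be checked directly with tell(). This sketch uses a temporary file holding the start of the Lorem Ipsum text rather than lorem.txt itself.

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b'Lorem ipsum dolor sit amet, consectetuer')
    f.flush()
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        first = m.read(10)      # advances the pointer to 10
        after_read = m.tell()
        sliced = m[:10]         # slicing ignores the pointer
        after_slice = m.tell()  # still 10
        second = m.read(10)     # continues from byte 10

print(first, after_read, sliced, after_slice, second)
```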

1.2 Writing Files

To set up a memory-mapped file to receive updates, start by opening the file for updating with mode 'r+' (not 'w', which would truncate it) before mapping it. Then use any of the API methods that change the data, such as write() or assignment to a slice.
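A minimal sketch of the write() method on a temporary file (the contents are made up): write() overwrites bytes at the current position and advances the pointer, but it cannot grow the mapping, so writing past the end raises ValueError.

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b'hello world')
    f.flush()
    with mmap.mmap(f.fileno(), 0) as m:
        m.seek(6)
        m.write(b'WORLD')   # overwrites bytes 6-10 and moves the pointer to 11
        pos = m.tell()
        try:
            m.write(b'!')   # no room left: a mapping cannot grow by writing
            grew = True
        except ValueError:
            grew = False
        m.flush()           # ask the OS to write the changes back to the file
    f.seek(0)
    on_disk = f.read()

print(pos, grew, on_disk)  # 11 False b'hello WORLD'
```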

The following example uses the default access mode, ACCESS_WRITE, and assigns to a slice to modify part of a line in place.

import mmap
import shutil

# Copy the example file
shutil.copyfile('lorem.txt', 'lorem_copy.txt')

word = b'consectetuer'
reversed_word = word[::-1]  # same length, so the slice can be replaced in place
print('Looking for    :', word)
print('Replacing with :', reversed_word)

with open('lorem_copy.txt', 'r+') as f:
    with mmap.mmap(f.fileno(), 0) as m:
        print('Before:\n{}'.format(m.readline().rstrip()))
        m.seek(0)  # rewind

        loc = m.find(word)
        m[loc:loc + len(word)] = reversed_word
        m.flush()

        m.seek(0)  # rewind
        print('After :\n{}'.format(m.readline().rstrip()))

        f.seek(0)  # rewind
        print('File  :\n{}'.format(f.readline().rstrip()))

The word "consectetuer" in the middle of the first line is replaced, both in memory and in the file.

With the ACCESS_COPY access setting, changes are not written to the file on disk.

import mmap
import shutil

# Copy the example file
shutil.copyfile('lorem.txt', 'lorem_copy.txt')

word = b'consectetuer'
reversed_word = word[::-1]  # avoid shadowing the built-in reversed()

with open('lorem_copy.txt', 'r+') as f:
    with mmap.mmap(f.fileno(), 0,
                   access=mmap.ACCESS_COPY) as m:
        print('Memory Before:\n{}'.format(
            m.readline().rstrip()))
        print('File Before  :\n{}\n'.format(
            f.readline().rstrip()))

        m.seek(0)  # rewind
        loc = m.find(word)
        m[loc:loc + len(word)] = reversed_word  # changes memory only

        m.seek(0)  # rewind
        print('Memory After :\n{}'.format(
            m.readline().rstrip()))

        f.seek(0)
        print('File After   :\n{}'.format(
            f.readline().rstrip()))

In this example, the file handle and the mmap handle must be rewound separately, because the two objects maintain their internal state (including the current position) independently.
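To see that the two positions really are independent, this sketch (using a temporary file with made-up contents) reads a line through an ACCESS_COPY map, checks both tell() values, and confirms that the copy-on-write change never reaches the file.

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b'line one\nline two\n')
    f.flush()
    f.seek(0)  # the file object starts at the beginning
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
        m.readline()           # advances only the mmap's own pointer
        mmap_pos = m.tell()
        file_pos = f.tell()    # the file object has not moved
        m[0:4] = b'LINE'       # copy-on-write: changes private memory only
        in_memory = m[:8]
    f.seek(0)
    on_disk = f.read(8)

print(mmap_pos, file_pos, in_memory, on_disk)
```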

1.3 Regular Expression

Because a memory-mapped file can act like a byte string, it can be used with other modules that operate on strings, such as regular expressions. The following example finds all of the sentences that contain "nulla".

import mmap
import re

pattern = re.compile(rb'(\.\W+)?([^.]?nulla[^.]*?\.)',
                     re.DOTALL | re.IGNORECASE | re.MULTILINE)

with open('lorem.txt', 'r') as f:
    with mmap.mmap(f.fileno(), 0,
                   access=mmap.ACCESS_READ) as m:
        for match in pattern.findall(m):
            print(match[1].replace(b'\n', b' '))

Because the pattern contains two groups, findall() returns a sequence of tuples. The print() call takes the matching sentence (the second group) and replaces newlines with spaces, so that each result is printed on a single line.
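findall() returns only the matched text. When the location of each match is also needed, finditer() yields match objects whose start() offsets are byte positions in the map, which can be used directly for slicing. A minimal sketch, assuming a temporary file holding a few sentences of the sample text and a simplified pattern:

```python
import mmap
import re
import tempfile

# Byte pattern: a regular expression applied to an mmap must use bytes, not str.
pattern = re.compile(rb'nulla', re.IGNORECASE)

with tempfile.TemporaryFile() as f:
    f.write(b'Duis tincidunt nisi ut ante. Nulla\n'
            b'facilisi. Nulla feugiat augue eleifend nulla.')
    f.flush()
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # Each match's start() is a byte offset into the mapping.
        offsets = [match.start() for match in pattern.finditer(m)]
        words = [m[start:start + 5] for start in offsets]

print(offsets, words)
```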


Posted on Wed, 18 Mar 2020 23:49:29 -0400 by dkjohnson