MappedByteBuffer (illustration + second understanding + most complete in History)

Java NIO provides MappedByteBuffer, a way to operate on large files with high read and write performance. This article explains the internal implementation behind that performance.

memory management

Before delving into MappedByteBuffer, let's look at some terms of computer memory management:

  • MMU: memory management unit of CPU.
  • Physical memory: the memory space of the memory module.
  • Virtual memory: a memory management technique. It makes an application believe that it has contiguous available memory (a contiguous, complete address space), whereas in fact that memory is usually split into multiple physical memory fragments, some of which are temporarily stored on disk and swapped in when needed.
  • Swap space: disk space that the operating system reserves to back virtual memory. On Windows this is the pagefile.sys file. When physical memory is full, data that is temporarily unused is moved there.
  • Page fault: an interrupt raised by the MMU when a program attempts to access a page that is mapped in the virtual address space but not loaded into physical memory. If the operating system determines that the access is valid, it loads the relevant page from disk into physical memory.

Why is there a difference between virtual memory and physical memory?

If a running process needs more memory than the installed physical memory (for example, the machine has 256 MB of RAM but the program needs a 2 GB data area), not all of the data can be held in physical memory, and some of it must be placed on another medium such as a hard disk. When the process accesses that part of the data, it is brought into physical memory.

What are virtual memory addresses and physical memory addresses?

Assume that your computer is 32-bit, so its address bus is 32 bits wide and it can address the space from 0 to 0xFFFFFFFF (4 GB). If the computer has only 256 MB of physical memory, covering 0x00000000 to 0x0FFFFFFF, and your process generates an address outside that 256 MB range, how should the computer deal with it?

Before answering this question, explain the memory paging mechanism of the computer.

The computer divides the virtual address space (4 GB on a 32-bit machine) into pages and the physical address space (256 MB in this example) into page frames. Pages and page frames have the same size, so in this example there are more virtual pages than physical page frames.

The computer keeps a page table that maps virtual memory pages to physical page frames. More precisely, it maps page numbers to page frame numbers, and each mapped page corresponds to exactly one page frame.
The problem is that there are more virtual pages than physical page frames. Does that mean some virtual pages never get a corresponding physical address?

No. The operating system handles this through page faults. When a page that is not resident is accessed, the operating system picks a rarely used page frame, invalidates it, writes its contents to disk, loads the requested page into that frame, and updates the mapping in the page table. In this way every page can be brought in when it is needed.
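The replacement step described above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the class name PagingSimulator is hypothetical, the victim is chosen by a least-recently-used (LRU) policy, and nothing is actually written to disk. Real operating systems use more elaborate policies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of demand paging: a fixed number of page frames with LRU replacement. */
class PagingSimulator {
    private final int frameCount;
    // Access-ordered map (assumed LRU policy): virtual page number -> page frame number.
    private final LinkedHashMap<Integer, Integer> pageTable =
            new LinkedHashMap<>(16, 0.75f, true);
    private int pageFaults = 0;
    private int nextFreeFrame = 0;

    PagingSimulator(int frameCount) {
        this.frameCount = frameCount;
    }

    /** Touches a virtual page and returns the frame that holds it afterwards. */
    int access(int page) {
        Integer frame = pageTable.get(page); // get() also marks the page as recently used
        if (frame != null) {
            return frame;                    // page is resident: no fault
        }
        pageFaults++;                        // page is not resident: simulated page fault
        if (pageTable.size() < frameCount) {
            frame = nextFreeFrame++;         // a free frame is still available
        } else {
            // Evict the least recently used page and reuse its frame.
            Map.Entry<Integer, Integer> victim = pageTable.entrySet().iterator().next();
            frame = victim.getValue();
            pageTable.remove(victim.getKey());
        }
        pageTable.put(page, frame);          // update the mapping in the page table
        return frame;
    }

    int pageFaults() {
        return pageFaults;
    }

    boolean isResident(int page) {
        return pageTable.containsKey(page);
    }

    public static void main(String[] args) {
        PagingSimulator sim = new PagingSimulator(2);
        for (int page : new int[] {1, 2, 1, 3, 2}) {
            System.out.println("page " + page + " -> frame " + sim.access(page));
        }
        System.out.println("page faults: " + sim.pageFaults());
    }
}
```

With two frames and the access sequence 1, 2, 1, 3, 2, only the second access to page 1 is a hit; every other access has to load a page.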

Now let's look at what are virtual memory addresses and physical memory addresses:

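  • Virtual memory address: it is composed of a page number (matched against the page numbers in the page table) and an offset (the position of the data within the page; the page size determines how large the offset can be).
  • Physical memory address: it is composed of a page frame number and the same offset.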

For example, if a virtual address has a page number of 4 and an offset of 20, its addressing process is as follows:

First, look up page number 4 in the page table to find the corresponding page frame number (for example, 8). If the page is not in memory, the page fault mechanism loads it. The MMU then combines the page frame number and the offset into the real physical address, and finally the data in physical memory is accessed.
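The addressing example above (page number 4, offset 20, page frame 8) can be worked through in a few lines. This is a sketch that assumes a 4 KB page size; the class and method names are hypothetical, and the page table is just a map.

```java
import java.util.HashMap;
import java.util.Map;

/** Splits a virtual address into page number and offset, then translates it. */
class AddressTranslation {
    static final int OFFSET_BITS = 12;              // assumed page size: 2^12 = 4096 bytes
    static final long PAGE_SIZE = 1L << OFFSET_BITS;

    static long pageNumber(long virtualAddress) {
        return virtualAddress >>> OFFSET_BITS;      // high bits select the page
    }

    static long offset(long virtualAddress) {
        return virtualAddress & (PAGE_SIZE - 1);    // low bits locate the byte inside the page
    }

    /** Looks up the page frame and recombines it with the unchanged offset. */
    static long translate(long virtualAddress, Map<Long, Long> pageTable) {
        Long frame = pageTable.get(pageNumber(virtualAddress));
        if (frame == null) {
            // A real system would raise a page fault here and load the page.
            throw new IllegalStateException("page fault: page " + pageNumber(virtualAddress));
        }
        return (frame << OFFSET_BITS) | offset(virtualAddress);
    }

    public static void main(String[] args) {
        Map<Long, Long> pageTable = new HashMap<>();
        pageTable.put(4L, 8L);                      // page 4 lives in page frame 8

        long virtual = 4 * PAGE_SIZE + 20;          // page number 4, offset 20
        long physical = translate(virtual, pageTable);
        System.out.println(virtual + " -> " + physical); // prints "16404 -> 32788"
    }
}
```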

Use of basic MMap in Java

What is MappedByteBuffer? In terms of its inheritance structure, MappedByteBuffer extends ByteBuffer and internally maintains a logical address in a field named address.

The file channel class that connects memory with disk files is FileChannel.

This class was added to the JDK with NIO to unify access to external devices (files, network interfaces, and so on) and to make multi-threaded access to the same file safer.

Here it is used only to establish the mapping: it provides the channel between the mapped memory and the disk file.

FileChannel provides a map method to map files to virtual memory. Generally, the whole file can be mapped. If the file is large, segment mapping can be performed.

General steps:

  • First, get the file channel through RandomAccessFile.

  • Then, memory mapping is performed through channel to obtain a virtual memory area VMA

//Obtain FileChannel through RandomAccessFile.
try (FileChannel channel = new RandomAccessFile(decodePath, "rw").getChannel();) {

    //Memory mapping through channel to obtain a virtual memory area VMA
    MappedByteBuffer mapBuffer = channel.map(FileChannel.MapMode.PRIVATE, 0, length);
    ....

Parameters of channel.map method:

  • MapMode: the mapping type, that is, how the memory-mapped file is accessed. FileChannel defines three constants:

  1. MapMode.READ_ONLY: read only. Attempting to modify the resulting buffer throws a ReadOnlyBufferException.
  2. MapMode.READ_WRITE: read / write. Changes to the resulting buffer will eventually be written to the file; however, the changes are not necessarily visible to other programs that have mapped the same file.
  3. MapMode.PRIVATE: private. The buffer is readable and writable, but modifications are not written to the file; they change only the buffer itself. This behavior is called "copy on write".
  • Position: the starting position of the mapping within the file. It must be non-negative.
  • Length: the length of the mapped region, in bytes. It must be non-negative and no greater than Integer.MAX_VALUE.
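The difference between READ_ONLY and READ_WRITE can be checked with a small experiment. The sketch below is an illustration rather than part of the original demo project: the class MapModeDemo and its helper newSampleFile are hypothetical, and it works on a temporary file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.ReadOnlyBufferException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MapModeDemo {

    /** Hypothetical helper: creates a temporary file with the given ASCII content. */
    static Path newSampleFile(String content) {
        try {
            Path file = Files.createTempFile("mapmode-demo", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, content.getBytes(StandardCharsets.US_ASCII));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if a write to a READ_ONLY mapping is rejected. */
    static boolean readOnlyRejectsWrites(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer ro = ch.map(MapMode.READ_ONLY, 0, ch.size());
            try {
                ro.put(0, (byte) 'x');
                return false;
            } catch (ReadOnlyBufferException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Overwrites the first byte through a READ_WRITE mapping and returns the file content. */
    static String writeThroughMapping(Path file, byte value) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer rw = ch.map(MapMode.READ_WRITE, 0, ch.size());
            rw.put(0, value);
            rw.force(); // ask the OS to write the change to the storage device
            return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = newSampleFile("abcd");
        System.out.println("READ_ONLY rejects writes: " + readOnlyRejectsWrites(file));
        System.out.println("file after READ_WRITE put: " + writeThroughMapping(file, (byte) 'Z'));
    }
}
```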

Example 1: writing and reading a file through MappedByteBuffer

package com.crazymakercircle.iodemo.fileDemos;

import com.crazymakercircle.NioDemoConfig;
import com.crazymakercircle.util.IOUtil;
import com.crazymakercircle.util.Logger;

import java.io.*;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

/**
 * Created by Nien @ crazy maker circle
 */
public class FileMmapDemo {

    /**
     * Demo program entry function
     *
     * @param args
     */
    public static void main(String[] args) {
        doMmapDemo();
    }

    /**
     * Resolve the demo file path, then write and read the file through a mapping
     */
    public static void doMmapDemo() {
        String sourcePath = NioDemoConfig.MMAP_FILE_RESOURCE_SRC_PATH;
        String decodePath = IOUtil.getResourcePath(sourcePath);

        Logger.debug("decodePath=" + decodePath);
        mmapWriteFile(decodePath);
    }


    /**
     * Write data to the file through a mapping, then read it back and print it
     *
     * @param fileName file name
     */
    public static void mmapWriteFile(String fileName) {

        //Map 1 KB (1024 bytes) of the file
        int length = 1024;
        try (FileChannel channel = new RandomAccessFile(fileName, "rw").getChannel();) {

            //READ_WRITE mapping: changes to the buffer are written back to the file
            MappedByteBuffer mapBuffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, length);
            for (int i = 0; i < length; i++) {
                //Fill with the repeating sequence 'a'..'z', one byte per character
                mapBuffer.put((byte) ('a' + i % 26));
            }
            for (int i = 0; i < length; i++) {
                if (i % 50 == 0) System.out.println("");
                //Access like an array
                System.out.print((char) mapBuffer.get(i));
            }

            mapBuffer.force();

        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }
}

Output results

decodePath=/E:/refer/crazydemo/netty_redis_zookeeper_source_code/NioDemos/target/classes//mmap.demo.log 

abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx
yzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuv
wxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrst
uvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr
stuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnop
qrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmn
opqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijkl
mnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghij
klmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefgh
ijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef
ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcd
efghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzab
cdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx
yzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuv
wxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrst
uvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr
stuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnop
qrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmn
opqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijkl
mnopqrstuvwxyzabcdefghij

Example 2: writing to a private mapping through MappedByteBuffer

A private mapping is readable and writable, but modifications are not written to the file; they change only the buffer itself. This behavior is called "copy on write".

    /**
     * Write to a private (copy-on-write) mapping and print the buffer contents
     *
     */
    public static void mmapPrivate() {

        String sourcePath = NioDemoConfig.MMAP_FILE_RESOURCE_SRC_PATH;
        String decodePath = IOUtil.getResourcePath(sourcePath);

        Logger.debug("decodePath=" + decodePath);

        //Map 1 KB (1024 bytes) of the file
        int length = 1024;
        try (FileChannel channel = new RandomAccessFile(decodePath, "rw").getChannel();) {

            //PRIVATE mapping: changes stay in the buffer and are not written to the file
            MappedByteBuffer mapBuffer = channel.map(FileChannel.MapMode.PRIVATE, 0, length);
            for (int i = 0; i < length; i++) {
                //Fill with the repeating sequence 'a'..'z', one byte per character
                mapBuffer.put((byte) ('a' + i % 26));
            }
            for (int i = 0; i < length; i++) {
                if (i % 50 == 0) System.out.println("");
                //Access like an array
                System.out.print((char) mapBuffer.get(i));
            }

            //force() has no effect on a PRIVATE mapping
            mapBuffer.force();

        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

Run the program and then inspect the file: the content written to the buffer does not appear in the file.
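The copy-on-write behavior can also be verified programmatically instead of by opening the file. The following is a minimal sketch on a temporary file; the class name PrivateMappingDemo and its method are hypothetical and not part of the original demo project.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class PrivateMappingDemo {

    /**
     * Writes the given content to a temporary file, overwrites every byte
     * through a PRIVATE mapping, and returns what the file contains afterwards.
     */
    static String fileContentAfterPrivateWrite(String original) {
        try {
            Path file = Files.createTempFile("mmap-private", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, original.getBytes(StandardCharsets.US_ASCII));

            // A PRIVATE mapping requires a channel opened for reading and writing.
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.PRIVATE, 0, ch.size());
                for (int i = 0; i < buf.limit(); i++) {
                    buf.put(i, (byte) 'X');   // modifies only this buffer's private copy
                }
                System.out.println("buffer sees: " + (char) buf.get(0));
            }
            // Read the file through the normal file API, bypassing the mapping.
            return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("file contains: " + fileContentAfterPrivateWrite("hello"));
    }
}
```

The buffer reports the modified bytes while the file keeps its original content, which is exactly the separation that copy on write provides.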

Example 3: shared memory via MMap

Significance of shared memory in application development

Programmers who develop applications on UNIX systems are very familiar with IPC (interprocess communication) mechanisms.

IPC basically includes shared memory, semaphores, message queues, signal handling, and so on. It is an important and essential tool in application development.

Among the IPC mechanisms, shared memory is the key one. It has unique advantages for data sharing, fast system queries, dynamic configuration, and reducing resource consumption.

On UNIX systems, shared memory is divided into general shared memory and file-mapped shared memory, whereas Windows offers only file-mapped shared memory.

Therefore, Java applications can implement shared memory only through mapped files.

Shared memory scenario in Java

Shared memory is rarely mentioned in the Java language, but in some applications it is very useful.

For example, a distributed application system written in Java contains a large number of distributed shared objects. It is often necessary to query the status of these objects, either to check whether the system is running normally or to obtain their current statistics and state.

Doing this through network communication adds an extra burden to the application and requires additional programming that could be avoided.

With shared memory, the status data and statistics of the objects can be read directly from the shared memory, which avoids that overhead.

The use of shared memory has the following characteristics:

  • It can be opened and accessed by multiple processes;
  • While one process is performing a write operation, other processes cannot write;
  • Multiple processes can take turns writing to the same shared memory;
  • After a process has written to the memory, other processes can still access it, and they can see the updated content;
  • If a process exits abnormally while writing, the write prohibition on other processes is released automatically;
  • Compared with sharing files, data access is more convenient and efficient.

Implementation of shared memory in java

The MappedByteBuffer class, introduced in JDK 1.4, gives us a good way to implement shared memory.

The buffer is in effect a memory image of a disk file. Changes made through a READ_WRITE mapping are propagated to the file by the operating system (call force() to flush them to the storage device explicitly), which makes a mapped file a practical basis for shared memory.

To open a file and establish a file channel, you can use the getChannel method of the RandomAccessFile class.

This method directly returns a file channel.

Because the underlying file is opened as a random access file, it can be both read and written, and opening it does not destroy the contents of the mapped file. (If you open the file directly with a FileOutputStream in its default mode, the file is truncated to size 0 and all data is lost.)

Why can't FileOutputStream and FileInputStream meet the requirements of shared memory well?

Because each of these classes supports only one direction of access, it is much harder to implement free, simultaneous reading and writing with them.

How to ensure write mutex

Since only one process may write at a time, exclusive access can be guaranteed through a distributed lock.

On a single machine, there is a simpler way to achieve mutual exclusion:

  • Use a file lock.
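One point about file locks is worth illustrating before the reference code: a FileLock is held on behalf of the whole Java virtual machine, so it coordinates separate processes, not threads inside one process. The sketch below (hypothetical class FileLockDemo, temporary file) shows that a second overlapping lock request from the same JVM is rejected with OverlappingFileLockException, which is why threads in one JVM still need something like synchronized.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class FileLockDemo {

    /** Returns true if a second, overlapping lock request in the same JVM is rejected. */
    static boolean secondLockRejected() {
        try {
            Path file = Files.createTempFile("lock-demo", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Exclusive lock on bytes [0, 16).
                FileLock first = ch.tryLock(0, 16, false);
                try {
                    // Bytes [8, 24) overlap the region that is already locked.
                    ch.tryLock(8, 16, false);
                    return false;
                } catch (OverlappingFileLockException expected) {
                    return true;
                } finally {
                    if (first != null) {
                        first.release();
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second lock rejected: " + secondLockRejected());
    }
}
```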

Reference code for the application of shared memory in java

package com.crazymakercircle.iodemo.sharemem;

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.util.Properties;

import com.crazymakercircle.NioDemoConfig;
import com.crazymakercircle.util.IOUtil;


/**
 * Shared memory operation class
 */
public class ShareMemory {
    String sourcePath = NioDemoConfig.MEM_SHARE_RESOURCE_SRC_PATH;
    String decodePath = IOUtil.getResourcePath(sourcePath);

    int fsize = 1024;                          //Size of the mapped region, in bytes
    MappedByteBuffer mapBuf = null;         //Shared memory buffer
    FileChannel fc = null;                  //The corresponding file channel
    FileLock fl = null;                     //Unused field; write() and read() use local locks
    Properties p = null;                    //Unused field
    RandomAccessFile randomAccessFile = null;         //Random access file object


    public ShareMemory() {


        try {
            // Open the file for reading and writing ("rw"). If the file does not already exist, an attempt is made to create it.
            randomAccessFile = new RandomAccessFile(decodePath, "rw");
            //Get the corresponding file channel  
            fc = randomAccessFile.getChannel();
             //Map the file area of this channel directly to memory.
            mapBuf = fc.map(FileChannel.MapMode.READ_WRITE, 0, fsize);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * @param pos  The position where the locked region starts; must be non-negative
     * @param len  The number of bytes to write; must be non-negative
     * @param buff Data to write
     * @return the number of bytes written, or 0 on failure
     */
    public synchronized int write(int pos, int len, byte[] buff) {
        //Reject requests that fall outside the mapped region or the source array
        if (pos < 0 || len < 0 || len > buff.length || (long) pos + len > fsize) {
            return 0;
        }
        FileLock fl = null;
        try {
            //Acquire an exclusive lock on the given region of the file
            fl = fc.lock(pos, len, false);
            mapBuf.position(pos);
            //Write exactly len bytes, not the whole array
            mapBuf.put(buff, 0, len);
            return len;
        } catch (Exception e) {
            return 0;
        } finally {
            //Always release the lock, on success or failure
            if (fl != null) {
                try {
                    fl.release();
                } catch (IOException e1) {
                    System.out.println(e1.toString());
                }
            }
        }
    }

    /**
     * @param pos  The position where the locked region starts; must be non-negative
     * @param len  The maximum number of bytes to read; must be non-negative
     * @param buff Array that receives the data
     * @return the number of bytes read, or 0 on failure
     */
    public synchronized int read(int pos, int len, byte[] buff) {
        if (pos < 0 || len < 0 || pos >= fsize) {
            return 0;
        }
        FileLock fl = null;
        try {
            //Acquire a lock on the given region of the file
            fl = fc.lock(pos, len, false);
            mapBuf.position(pos);
            //Never read past the end of the mapping or of the target array
            len = Math.min(len, Math.min(mapBuf.remaining(), buff.length));
            if (len > 0) {
                mapBuf.get(buff, 0, len);
            }
            return len;
        } catch (Exception e) {
            return 0;
        } finally {
            //Always release the lock, on success or failure
            if (fl != null) {
                try {
                    fl.release();
                } catch (IOException e1) {
                    System.out.println(e1.toString());
                }
            }
        }
    }

    /**
     * Fallback cleanup in case closeSMFile() was never called. finalize() has been
     * deprecated since Java 9, so callers should invoke closeSMFile() explicitly.
     */
    @Override
    protected void finalize() throws Throwable {
        try {
            closeSMFile();
        } finally {
            super.finalize();
        }
    }

    /**
     * Turn off shared memory operation
     */
    public synchronized void closeSMFile() {
        if (fc != null) {
            try {
                fc.close();
            } catch (IOException e) {
                System.out.println(e.toString());
            }
            fc = null;
        }

        if (randomAccessFile != null) {
            try {
                randomAccessFile.close();
            } catch (IOException e) {
                System.out.println(e.toString());
            }
            randomAccessFile = null;
        }
        mapBuf = null;
    }


}  

Core principles of map process

Next, by analyzing the source code, we can understand the internal implementation of the map process.

  1. Obtain FileChannel through RandomAccessFile.
public final FileChannel getChannel() {
    synchronized (this) {
        if (channel == null) {
            channel = FileChannelImpl.open(fd, path, true, rw, this);
        }
        return channel;
    }
}

As can be seen from the above implementation, only one thread can initialize FileChannel due to synchronized.

The file is mapped into virtual memory through the FileChannel.map method, which returns a buffer wrapping the logical address. The implementation is as follows:

(Only the core code is retained.)
public MappedByteBuffer map(MapMode mode, long position, long size)  throws IOException {
        int pagePosition = (int)(position % allocationGranularity);
        long mapPosition = position - pagePosition;
        long mapSize = size + pagePosition;
        try {
            addr = map0(imode, mapPosition, mapSize);
        } catch (OutOfMemoryError x) {
            System.gc();
            try {
                Thread.sleep(100);
            } catch (InterruptedException y) {
                Thread.currentThread().interrupt();
            }
            try {
                addr = map0(imode, mapPosition, mapSize);
            } catch (OutOfMemoryError y) {
                // After a second OOME, fail
                throw new IOException("Map failed", y);
            }
        }
        int isize = (int)size;
        Unmapper um = new Unmapper(addr, mapSize, isize, mfd);
        if ((!writable) || (imode == MAP_RO)) {
            return Util.newMappedByteBufferR(isize,
                                             addr + pagePosition,
                                             mfd,
                                             um);
        } else {
            return Util.newMappedByteBuffer(isize,
                                            addr + pagePosition,
                                            mfd,
                                            um);
        }
}

As the code above shows, map ultimately completes the file mapping through the native function map0.

  1. If the first mapping attempt fails with an OutOfMemoryError, System.gc() is called to request garbage collection, the thread sleeps for 100 ms, and the mapping is tried again. If it fails a second time, an IOException is thrown.
  2. The MappedByteBuffer instance is created through the newMappedByteBuffer method, which actually returns a DirectByteBuffer instance. The implementation is as follows:
static MappedByteBuffer newMappedByteBuffer(int size, long addr, FileDescriptor fd, Runnable unmapper) {
    MappedByteBuffer dbb;
    if (directByteBufferConstructor == null)
        initDBBConstructor();
    dbb = (MappedByteBuffer)directByteBufferConstructor.newInstance(
          new Object[] { new Integer(size),
                         new Long(addr),
                         fd,
                         unmapper });
    return dbb;
}
// Obtain the non-public DirectByteBuffer constructor through reflection
private static void initDBBConstructor() {
    AccessController.doPrivileged(new PrivilegedAction<Void>() {
        public Void run() {
            Class<?> cl = Class.forName("java.nio.DirectByteBuffer");
                Constructor<?> ctor = cl.getDeclaredConstructor(
                    new Class<?>[] { int.class,
                                     long.class,
                                     FileDescriptor.class,
                                     Runnable.class });
                ctor.setAccessible(true);
                directByteBufferConstructor = ctor;
        }});
}

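Because DirectByteBuffer is a package-private class in java.nio, code in another package (such as sun.nio.ch, where FileChannelImpl lives) cannot call its constructor directly. The constructor is therefore looked up reflectively inside AccessController.doPrivileged and made accessible with setAccessible(true) before it is used for instantiation.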

DirectByteBuffer is a subclass of MappedByteBuffer, which implements direct operations on memory.
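The relationship between the two classes can be observed from application code. The sketch below uses a hypothetical class name and a temporary file: it maps a small region and checks that the returned buffer is a direct buffer. It also prints the runtime class name, which is an internal detail and may differ between JDK versions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedBufferTypeDemo {

    /** Maps 16 bytes of a temporary file and reports whether the buffer is direct. */
    static boolean mappedBufferIsDirect() {
        try {
            Path file = Files.createTempFile("mmap-type", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping beyond the current end of the file in READ_WRITE mode extends the file.
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 16);
                // The concrete class is a JDK implementation detail.
                System.out.println("runtime class: " + buf.getClass().getName());
                return buf.isDirect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("isDirect: " + mappedBufferIsDirect());
    }
}
```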

get procedure

The get method of MappedByteBuffer is finally implemented through the DirectByteBuffer.get method.

public byte get() {
    return ((unsafe.getByte(ix(nextGetIndex()))));
}
public byte get(int i) {
    return ((unsafe.getByte(ix(checkIndex(i)))));
}
private long ix(int i) {
    return address + (i << 0);
}

The map0() function returns an address, so the file can be operated on through that address without calling the read or write methods. Underneath, unsafe.getByte reads the data at the memory location address + offset.

  1. The first access to the memory region that address points to causes a page fault. The page fault handler checks whether the page is already held in memory by the kernel; if it is not (that is, this part of the file has never been read into memory), it reads the specified page of the file from disk into physical memory (outside the JVM heap).
  2. If there is not enough physical memory while the data is being loaded, temporarily unused physical pages are swapped out to disk through the virtual memory mechanism (swap).
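The two get variants shown above differ only in how the index is obtained: get() uses and advances the buffer position (nextGetIndex), while get(int) checks the given index and leaves the position alone (checkIndex). The sketch below illustrates this with a buffer from ByteBuffer.allocateDirect as a stand-in for a mapped file, which is an assumption made to keep the example file-free; the class name is hypothetical.

```java
import java.nio.ByteBuffer;

class GetDemo {

    /** Returns the buffer position after an absolute get and after a relative get. */
    static int[] positionsAfterReads() {
        ByteBuffer buf = ByteBuffer.allocateDirect(4);
        buf.put(new byte[] {10, 20, 30, 40});
        buf.flip();                       // position = 0, limit = 4

        int[] positions = new int[2];

        byte absolute = buf.get(2);       // absolute read: index is checked, position unchanged
        positions[0] = buf.position();

        byte relative = buf.get();        // relative read: reads at position, then advances it
        positions[1] = buf.position();

        System.out.println("get(2) = " + absolute + ", get() = " + relative);
        return positions;
    }

    public static void main(String[] args) {
        int[] p = positionsAfterReads();
        System.out.println("position after get(2): " + p[0] + ", after get(): " + p[1]);
    }
}
```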

performance analysis

At the code level, reading a file from disk into memory always involves copying data through the file system, and that copy is carried out by the file system and the hardware driver. In theory, the cost of that copy is the same in both cases.
Yet accessing a file on disk through memory mapping is more efficient than using the read and write system calls. Why?

  • read() is a system call. The data is first copied from disk into a buffer in kernel space and then copied from there into user space, so the data is copied twice;
  • map() also relies on a system call (mmap), but it copies no data itself. When a page fault occurs, the file data is loaded from disk into memory that is mapped directly into the user address space, so the data is copied only once.

Therefore, the read-write efficiency of memory mapping is higher than that of traditional read/write.
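The two access paths can be placed side by side. The sketch below reads the same file once through FileChannel.read into a heap buffer and once through a READ_ONLY mapping, and checks that both see identical data. It makes no timing claims; the class and helper names are hypothetical, and measuring the actual difference would need a proper benchmark on large files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ReadVsMapDemo {

    /** Hypothetical helper: temporary file whose byte at index i is (byte) i. */
    static Path newSampleFile(int size) {
        try {
            byte[] data = new byte[size];
            for (int i = 0; i < size; i++) {
                data[i] = (byte) i;
            }
            Path file = Files.createTempFile("read-vs-map", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, data);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sums all bytes using read(): each chunk is copied into the buffer we supply. */
    static long sumWithRead(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(8192);
            long sum = 0;
            while (ch.read(buf) != -1) {
                buf.flip();
                while (buf.hasRemaining()) {
                    sum += buf.get() & 0xFF;
                }
                buf.clear();
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sums all bytes through a mapping: pages are loaded on first access. */
    static long sumWithMap(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF;
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = newSampleFile(1000);
        System.out.println("read(): " + sumWithRead(file) + ", map(): " + sumWithMap(file));
    }
}
```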

summary

  1. MappedByteBuffer uses virtual memory, so the amount of memory mapped is not limited by the JVM's -Xmx parameter, but there is still a size limit: a single mapping can be at most Integer.MAX_VALUE bytes (about 2 GB).
  2. If a file exceeds that limit, the rest of the file can be mapped in further segments through the position parameter.
  3. MappedByteBuffer does perform well when processing large files, but it also has problems, such as memory usage and the uncertain time at which the mapping is released. The mapping is released only when the buffer is garbage collected, and that point in time is not deterministic.
    The Javadoc also states: "A mapped byte buffer and the file mapping that it represents remain valid until the buffer itself is garbage-collected."
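Segmented mapping through the position parameter can be sketched as follows. The example is deliberately tiny (a 1000-byte temporary file mapped in 300-byte segments) so that it runs anywhere; for a real large file the segment size would be much bigger. The class and helper names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SegmentedMapDemo {

    /** Hypothetical helper: temporary file whose byte at index i is (byte) i. */
    static Path newSampleFile(int size) {
        try {
            byte[] data = new byte[size];
            for (int i = 0; i < size; i++) {
                data[i] = (byte) i;
            }
            Path file = Files.createTempFile("segmented-map", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, data);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sums all bytes of the file, mapping at most segmentSize bytes at a time. */
    static long sumInSegments(Path file, int segmentSize) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long fileSize = ch.size();
            long sum = 0;
            for (long pos = 0; pos < fileSize; pos += segmentSize) {
                // The last segment may be shorter than segmentSize.
                long len = Math.min(segmentSize, fileSize - pos);
                MappedByteBuffer segment = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (segment.hasRemaining()) {
                    sum += segment.get() & 0xFF;
                }
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = newSampleFile(1000);
        System.out.println("sum over segments: " + sumInSegments(file, 300));
    }
}
```

The position passed to map does not need to be page-aligned by the caller; as the map source shown earlier indicates, the implementation aligns the start of the mapping itself and adjusts the returned buffer accordingly.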

Posted on Tue, 30 Nov 2021 05:09:54 -0500 by genius_supreme