Optimizing the compression of a 20 MB file from 30 seconds down to 1 second

The requirement: the front end uploads 10 photos, and after the back end processes them it packs them into a single archive and sends it back over the network. I had never worked with compressed files in Java before, so I found an example online, adapted it, and it worked. However, as the front-end images grew larger, the time consumed grew dramatically; in the end, compressing 20 MB of files took 30 seconds. The compression code was as follows.

 


 
public static void zipFileNoBuffer() {
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile))) {
        // start time
        long beginTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            try (InputStream input = new FileInputStream(JPG_FILE)) {
                zipOut.putNextEntry(new ZipEntry(FILE_NAME + i));
                int temp = 0;
                while ((temp = input.read()) != -1) {
                    zipOut.write(temp);
                }
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

 

Here I take a 2 MB image and compress it ten times. The printed result is as follows; the total time is about 30 seconds.

 


 
fileSize:20M
consum time:29599

 

#First optimization process - from 30 seconds to 2 seconds

 

The first thing that comes to mind when optimizing is to use a buffered stream (BufferedInputStream). In FileInputStream, the read() method reads only one byte at a time, as the source code describes.

 


 
/**
 * Reads a byte of data from this input stream. This method blocks
 * if no input is yet available.
 *
 * @return     the next byte of data, or <code>-1</code> if the end of the
 *             file is reached.
 * @exception  IOException  if an I/O error occurs.
 */
public native int read() throws IOException;

 

This is a native method that interacts with the operating system to read data from disk, and calling a native method for every single byte is very expensive. For example, suppose we have 30,000 bytes of data. With a plain FileInputStream we would make 30,000 native calls to fetch them all; with a buffer (assuming the buffer is large enough to hold 30,000 bytes) we only need one call, because the first read() pulls a whole block from disk into memory and then hands the bytes back one at a time from there.

  • BufferedInputStream internally wraps a byte array for storing data; its default size is 8192 bytes. A minimal sketch with an explicitly sized buffer is shown just below.
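As a minimal, hedged sketch (not from the original post; the file path is a placeholder), the snippet below copies a file through a BufferedInputStream with an explicitly sized buffer. Every read() call is served from the in-memory buffer, and the native disk read only happens when that buffer needs refilling.

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedReadSketch {
    public static void main(String[] args) throws IOException {
        // "sample.jpg" is a hypothetical path used only for illustration
        try (InputStream in = new BufferedInputStream(new FileInputStream("sample.jpg"), 64 * 1024)) {
            long total = 0;
            int b;
            // Each read() returns a byte from the 64 KB in-memory buffer;
            // the underlying native read is issued only once per buffer refill.
            while ((b = in.read()) != -1) {
                total++;
            }
            System.out.println("bytes read: " + total);
        }
    }
}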

 

The optimized code is as follows

 


 
public static void zipFileBuffer() {
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
            BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(zipOut)) {
        // start time
        long beginTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            try (BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(JPG_FILE))) {
                zipOut.putNextEntry(new ZipEntry(FILE_NAME + i));
                int temp = 0;
                while ((temp = bufferedInputStream.read()) != -1) {
                    bufferedOutputStream.write(temp);
                }
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

 

output

 


 
------Buffer
fileSize:20M
consum time:1808

 

As you can see, the buffered version is far more efficient than the first attempt.

 

#Second optimization process - from 2 seconds to 1 second

 

The buffered version already met my needs, but in the spirit of putting what I have learned into practice, I wanted to try optimizing further with NIO.

 

#Using Channel

 

Why Channel? Channel and ByteBuffer are new in NIO, and because their structure is closer to the way the operating system performs I/O, they are noticeably faster than traditional IO. A Channel is like a mine full of coal, and a ByteBuffer is the truck sent to the mine; in other words, our interaction with the data is really an interaction with the ByteBuffer.

 

Three classes in NIO can produce a FileChannel: FileInputStream, FileOutputStream, and RandomAccessFile, which can both read and write.
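As a quick hedged sketch (the file names below are placeholders, not constants from this post), here is how a FileChannel is obtained from each of the three classes:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;

public class FileChannelSources {
    public static void main(String[] args) throws IOException {
        // Read-only channel from a FileInputStream
        try (FileChannel readChannel = new FileInputStream("in.dat").getChannel()) {
            System.out.println("readable, size = " + readChannel.size());
        }
        // Write-only channel from a FileOutputStream
        try (FileChannel writeChannel = new FileOutputStream("out.dat").getChannel()) {
            System.out.println("writable, open = " + writeChannel.isOpen());
        }
        // Read/write channel from a RandomAccessFile opened in "rw" mode
        try (FileChannel rwChannel = new RandomAccessFile("data.dat", "rw").getChannel()) {
            System.out.println("read/write, position = " + rwChannel.position());
        }
    }
}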

 

The Channel version of the code is as follows.

 


 
public static void zipFileChannel() {
    // start time
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
            WritableByteChannel writableByteChannel = Channels.newChannel(zipOut)) {
        for (int i = 0; i < 10; i++) {
            try (FileChannel fileChannel = new FileInputStream(JPG_FILE).getChannel()) {
                zipOut.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
                fileChannel.transferTo(0, FILE_SIZE, writableByteChannel);
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Notice that instead of shuttling data through a ByteBuffer, we use the transferTo method, which connects the two channels directly.

 


 
This method is potentially much more efficient than a simple loop that reads from this channel and writes to the target channel. Many operating systems can transfer bytes directly from the filesystem cache to the target channel without actually copying them.

 

This is the description from the source code. Roughly, it says that transferTo is more efficient than looping to read from one Channel and write to another, because many operating systems can transfer the bytes straight from the file system cache to the target Channel without an actual copy step.

That copy step is the transfer between kernel space and user space.
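As a standalone, hedged illustration (file names are placeholders), transferTo can copy one file into another without any intermediate buffer appearing in our code; the operating system moves the bytes for us:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class TransferToSketch {
    public static void main(String[] args) throws IOException {
        try (FileChannel source = new FileInputStream("source.jpg").getChannel();
                FileChannel target = new FileOutputStream("copy.jpg").getChannel()) {
            // Hand the whole copy to the OS: no byte[] or ByteBuffer in user code
            long transferred = source.transferTo(0, source.size(), target);
            System.out.println("bytes transferred: " + transferred);
        }
    }
}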

You can see that the speed has improved again compared with the buffered version.

 


 
------Channel
fileSize:20M
consum time:1416

 

#Kernel space and user space

 

So why is moving data between kernel space and user space slow? First we need to understand what kernel space and user space are. To protect its core resources, a typical operating system divides execution into four rings; the lower the ring number, the higher the privilege. Ring0 is called kernel space and is used to access critical resources, while Ring3 is called user space.


User mode vs. kernel mode: a thread running in kernel space is in kernel mode, while a thread running in user space is in user mode.

So what happens when an application (which runs in user mode) needs to access core resources? It has to go through the interfaces the kernel exposes, which is called a system call. For example, when our application needs to access a file on disk, it invokes the open system call; the kernel then reads the file from disk and returns its contents to the application. The general flow is as follows.

 


 

#Direct and indirect buffers

 

Since all we want is to read a file from disk, do we really have to go through all this trouble? Is there a simpler way for the application to operate on the file directly, without the kernel acting as a relay? Yes: set up a direct buffer.

 

  • Indirect buffer: an indirect buffer relies on the kernel, as described above, to act as an intermediary; every transfer must pass through the kernel as a relay.

     


     

  • Direct buffer: a direct buffer does not need kernel space to relay and copy the data. Instead, it allocates a region of physical memory directly, and that region is mapped into both the kernel address space and the user address space; the application and the disk then exchange data through this directly allocated physical memory (a small API sketch follows below).

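In Java, direct buffers are exposed through ByteBuffer.allocateDirect. The following minimal sketch only contrasts the two allocation calls; it is an illustration, not code from the original post:

import java.nio.ByteBuffer;

public class DirectBufferSketch {
    public static void main(String[] args) {
        // Heap (indirect) buffer: backed by a byte[] inside the JVM heap
        ByteBuffer heapBuffer = ByteBuffer.allocate(8192);
        // Direct buffer: memory allocated outside the JVM heap, so the OS
        // can perform I/O against it without an extra copy into user space
        ByteBuffer directBuffer = ByteBuffer.allocateDirect(8192);

        System.out.println("heap buffer direct?   " + heapBuffer.isDirect());   // false
        System.out.println("direct buffer direct? " + directBuffer.isDirect()); // true
    }
}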

If direct buffers are so fast, why don't we use them for everything? Because direct buffers have the following disadvantages:

 

  1. They are less safe.

  2. They cost more, because the memory is not allocated inside the JVM heap; reclaiming it depends entirely on the garbage collector, and garbage collection is not under our control.

  3. Once the data has been written into the physical-memory buffer, the program loses control over it: when the data is finally written to disk is decided solely by the operating system, and the application can no longer intervene.

     

To sum up, the transferTo method effectively gives us a direct buffer, which is why its performance is so much better.

 

#Using memory-mapped files

 

Another new feature of NIO is the memory-mapped file. Why are memory-mapped files fast? As mentioned above, they are also a form of direct buffer in memory, so we interact with the data directly. The code is as follows.

 


 
// Version 4 uses a memory-mapped file
public static void zipFileMap() {
    // start time
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
            WritableByteChannel writableByteChannel = Channels.newChannel(zipOut)) {
        for (int i = 0; i < 10; i++) {
            zipOut.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
            // Map the file into memory
            MappedByteBuffer mappedByteBuffer = new RandomAccessFile(JPG_FILE_PATH, "r").getChannel()
                    .map(FileChannel.MapMode.READ_ONLY, 0, FILE_SIZE);
            writableByteChannel.write(mappedByteBuffer);
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

The output is as follows.

 


 
---------Map
fileSize:20M
consum time:1305

 

You can see that the speed is about the same as that of using Channel.

 

#Using Pipe

 

A Java NIO pipe is a one-way data connection between two threads. A Pipe has a source channel and a sink channel: data is written to the sink channel and read from the source channel. As the introduction in the source code explains, a thread writing bytes to the pipe may block until another thread reads those bytes from it; conversely, if there is no data to read, the reading thread blocks until the writing thread writes some, and so on until the channel is closed.

 


 
Whether or not a thread writing bytes to a pipe will block until another thread reads those bytes

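Before the full version, here is a minimal, self-contained sketch of the Pipe mechanism itself (the messages are purely illustrative): one asynchronous task writes into the sink channel while the main thread drains the source channel until the sink is closed.

import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

public class PipeSketch {
    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();

        // Writer side: push a few messages into the sink channel, then close it
        CompletableFuture.runAsync(() -> {
            try (Pipe.SinkChannel sink = pipe.sink()) {
                for (int i = 0; i < 3; i++) {
                    ByteBuffer msg = ByteBuffer.wrap(("message " + i + "\n").getBytes(StandardCharsets.UTF_8));
                    while (msg.hasRemaining()) {
                        sink.write(msg);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        // Reader side: drain the source channel; read() returns -1 once the sink is closed
        try (Pipe.SourceChannel source = pipe.source()) {
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            while (source.read(buffer) >= 0) {
                buffer.flip();
                System.out.print(StandardCharsets.UTF_8.decode(buffer));
                buffer.clear();
            }
        }
    }
}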

This is exactly the effect I want: one task compresses the files and writes the result into the pipe, while the main thread reads from the pipe and writes the archive out. The code is as follows.

 


 
// Version 5 uses Pipe
public static void zipFilePip() {
    long beginTime = System.currentTimeMillis();
    try (WritableByteChannel out = Channels.newChannel(new FileOutputStream(ZIP_FILE))) {
        Pipe pipe = Pipe.open();
        // Asynchronous task
        CompletableFuture.runAsync(() -> runTask(pipe));

        // Get the read channel
        ReadableByteChannel readableByteChannel = pipe.source();
        ByteBuffer buffer = ByteBuffer.allocate(((int) FILE_SIZE) * 10);
        while (readableByteChannel.read(buffer) >= 0) {
            buffer.flip();
            out.write(buffer);
            buffer.clear();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    printInfo(beginTime);
}

// Asynchronous task
public static void runTask(Pipe pipe) {
    try (ZipOutputStream zos = new ZipOutputStream(Channels.newOutputStream(pipe.sink()));
            WritableByteChannel out = Channels.newChannel(zos)) {
        System.out.println("Begin");
        for (int i = 0; i < 10; i++) {
            zos.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
            FileChannel jpgChannel = new FileInputStream(new File(JPG_FILE_PATH)).getChannel();
            jpgChannel.transferTo(0, FILE_SIZE, out);
            jpgChannel.close();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

 

#Summary

 

  • There are chances to learn everywhere. Sometimes a seemingly simple optimization leads you to study a whole range of different topics in depth. So when learning, we should not only know how something works but also understand why it works that way.

     

  • Unity of knowledge and practice: try to apply knowledge right after you learn it; that way it sticks firmly.

 

#Reference article

 

  • https://www.jianshu.com/p/f90866dcbffc

  • https://juejin.im/post/5af942c6f265da0b7026050c

  • Interesting talk about Linux operating system

  • JAVA NIO direct buffer and indirect buffer

