Reading a file line by line with FileChannel

What is FileChannel

FileChannel is a channel for reading, writing, mapping, and manipulating a file. In addition to the familiar read, write, and close operations of byte channels, this class defines the following file-specific operations:

  • Bytes can be read or written at absolute positions in the file in a way that does not affect the channel's current position.

  • A region of the file can be mapped directly into memory. For large files this is often much more efficient than invoking the usual read or write methods.

  • Updates to the file may be forced out to the underlying storage device, so that data is not lost in the event of a system crash.

  • Bytes can be transferred from the file to some other channel, and vice versa, in a way that many operating systems can optimize into a very fast transfer directly to or from the filesystem cache.

  • A region of the file may be locked against access by other programs.
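Each of the file-specific operations listed above corresponds to a real FileChannel method: read(ByteBuffer, long), map, force, transferTo, and tryLock. The minimal sketch below exercises each one once on a small scratch file. The class name FileChannelOps, the demo method, and the temp files are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class FileChannelOps {
    // Hypothetical demo: runs each file-specific operation once and returns one report line per step
    public static List<String> demo(Path src, Path dst) throws IOException {
        List<String> report = new ArrayList<>();
        Files.write(src, "hello\nworld\n".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(src, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Absolute read at offset 6: the channel's own position stays at 0
            ByteBuffer buf = ByteBuffer.allocate(5);
            ch.read(buf, 6);
            buf.flip();
            report.add(StandardCharsets.US_ASCII.decode(buf) + " pos=" + ch.position());

            // Map the first 5 bytes of the file into memory, read-only
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, 5);
            report.add("mapped first byte=" + (char) map.get(0));

            // Force updates out to the storage device (false: file metadata need not be forced)
            ch.force(false);

            // Channel-to-channel transfer, which the OS may perform without a user-space copy
            try (FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
                report.add("transferred=" + ch.transferTo(0, ch.size(), out));
            }

            // Exclusive lock on the whole file; null means another program holds an overlapping lock
            try (FileLock lock = ch.tryLock()) {
                report.add("locked=" + (lock != null));
            }
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("fc-demo", ".txt");   // hypothetical scratch files
        Path dst = Files.createTempFile("fc-copy", ".txt");
        demo(src, dst).forEach(System.out::println);
    }
}
```

On a local file this prints "world pos=0", "mapped first byte=h", "transferred=12", and "locked=true". Note that the absolute read leaves position() at 0, which is the point of the first bullet.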

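FileChannel works together with ByteBuffer: data is read into (or written from) an in-memory buffer in large blocks rather than one byte per call, which avoids repeating the same per-call overhead for every byte and can significantly speed up the processing of large files. A ByteBuffer can also be created with allocateDirect, which places its storage in native memory outside the Java heap instead of in a heap byte array; the JVM then makes a best effort to perform native I/O directly on it, although that memory is still released only when the buffer object itself is garbage-collected.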
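The block-wise pattern described above depends on the buffer's fill, flip, drain, clear cycle, which the implementation further down also uses. The sketch below shows that cycle on a direct buffer and the difference between heap and direct buffers. The class BufferCycle and its method drainAndCount are hypothetical names; the three bytes put into the buffer stand in for what a channel read would deliver.

```java
import java.nio.ByteBuffer;

public class BufferCycle {
    // Drains a buffer that has just been filled (e.g. by channel.read), counting '\n' bytes,
    // then clears it so the next read can refill it from the start
    public static int drainAndCount(ByteBuffer filled) {
        filled.flip();                    // limit = bytes written so far, position = 0
        int newlines = 0;
        while (filled.hasRemaining()) {
            if (filled.get() == '\n') newlines++;
        }
        filled.clear();                   // position = 0, limit = capacity; contents are not zeroed
        return newlines;
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(8);          // backed by a byte[] on the Java heap
        ByteBuffer direct = ByteBuffer.allocateDirect(8);  // storage in native memory outside the heap
        System.out.println("heap: isDirect=" + heap.isDirect() + " hasArray=" + heap.hasArray());
        System.out.println("direct: isDirect=" + direct.isDirect());

        direct.put((byte) 'a').put((byte) '\n').put((byte) 'b');  // stands in for channel.read
        System.out.println("newlines=" + drainAndCount(direct)
                + " position=" + direct.position() + " limit=" + direct.limit());
    }
}
```

This prints "heap: isDirect=false hasArray=true", "direct: isDirect=true", and "newlines=1 position=0 limit=8". Forgetting flip() would make the drain loop read the unused tail of the buffer instead of the freshly read bytes.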

In short, reading a large file in byte blocks through a FileChannel is efficient. Note, however, that a FileChannel cannot be switched to non-blocking mode; it always operates in blocking mode.
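The blocking-only behaviour follows from the type hierarchy: configureBlocking is declared on SelectableChannel, and FileChannel does not extend it, whereas socket channels do. The small reflection check below makes that visible; BlockingCheck and canBeNonBlocking are hypothetical names.

```java
import java.nio.channels.FileChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SocketChannel;

public class BlockingCheck {
    // configureBlocking(boolean) is declared on SelectableChannel; a channel type that
    // does not extend it has no non-blocking mode to switch on
    public static boolean canBeNonBlocking(Class<?> channelType) {
        return SelectableChannel.class.isAssignableFrom(channelType);
    }

    public static void main(String[] args) {
        System.out.println("FileChannel: " + canBeNonBlocking(FileChannel.class));     // false
        System.out.println("SocketChannel: " + canBeNonBlocking(SocketChannel.class)); // true
    }
}
```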

Sometimes, however, we need to read a file line by line, while FileChannel can only read bytes, so we have to detect the line breaks ourselves. Here is my implementation for your reference.

Implementation

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class LineCountTest {
    // Counts lines by scanning the raw bytes for '\n' (0x0A).
    // A last line without a trailing '\n' is not counted, unlike with readLine().
    public static void readLineByChannel(String path) throws IOException {
        long lineNumber = 0;
        int bufferSize = 1024 * 1024;  // size of each block read from the file: 1 MB
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
        // try-with-resources closes the channel even if read() throws
        try (FileChannel fileChannel = FileChannel.open(Paths.get(path), StandardOpenOption.READ)) {
            while (fileChannel.read(buffer) != -1) {  // -1 signals end of file
                buffer.flip();                         // switch from filling to draining
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {        // a line break ends the current line
                        lineNumber++;
                    }
                }
                buffer.clear();                        // make the whole buffer available for the next read
            }
        }
        System.out.println(lineNumber);
    }

    // Counts lines with BufferedReader.readLine(), which accepts "\n", "\r", or "\r\n" as terminators
    public static void readLineByBufferedReader(String path) throws IOException {
        long lineNumber = 0;
        // Closing the reader also closes the underlying FileInputStream
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path)))) {
            while (bufferedReader.readLine() != null) {
                lineNumber++;
            }
        }
        System.out.println(lineNumber);
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "Big file";  // path to the large test file
        long startTime = System.currentTimeMillis();
        readLineByChannel(path);
        System.out.println("readLineByChannel time (ms): " + (System.currentTimeMillis() - startTime));
        startTime = System.currentTimeMillis();
        readLineByBufferedReader(path);
        System.out.println("readLineByBufferedReader time (ms): " + (System.currentTimeMillis() - startTime));
    }
}
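readLineByChannel above only counts '\n' bytes; it never produces the text of a line. Below is a minimal sketch that does, assuming an ASCII-compatible encoding such as UTF-8, where the byte 0x0A only ever means '\n'. The bytes of a line that straddles two buffer fills are collected before decoding, so a multi-byte character split across reads is still decoded correctly; a trailing '\r' is dropped so CRLF files work, and a final line without a terminator is still delivered. ChannelLineReader and forEachLine are hypothetical names. Unlike BufferedReader.readLine(), a lone '\r' is not treated as a line break.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

public class ChannelLineReader {
    // Passes each line (without its terminator) to action and returns the number of lines
    public static long forEachLine(Path path, Charset cs, int bufferSize, Consumer<String> action)
            throws IOException {
        long lines = 0;
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);  // heap buffer, so array() is available
        byte[] arr = buffer.array();
        // Bytes of the current line, which may span several buffer fills
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            while (ch.read(buffer) != -1) {
                buffer.flip();
                int start = 0;
                for (int i = 0; i < buffer.limit(); i++) {
                    if (arr[i] == '\n') {
                        pending.write(arr, start, i - start);
                        emit(pending, cs, action);
                        lines++;
                        start = i + 1;
                    }
                }
                pending.write(arr, start, buffer.limit() - start);  // keep the unfinished tail
                buffer.clear();
            }
        }
        if (pending.size() > 0) {  // last line without a trailing newline
            emit(pending, cs, action);
            lines++;
        }
        return lines;
    }

    private static void emit(ByteArrayOutputStream pending, Charset cs, Consumer<String> action) {
        byte[] bytes = pending.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') len--;  // CRLF: drop the '\r'
        action.accept(new String(bytes, 0, len, cs));
        pending.reset();
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "Big file");
        long n = forEachLine(path, StandardCharsets.UTF_8, 1024 * 1024, line -> {
            // process each line here
        });
        System.out.println(n);
    }
}
```

Decoding only once a full line has been collected is what makes the buffer size irrelevant to correctness: a tiny buffer (even a few bytes) yields the same lines as a 1 MB one, just more slowly.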

Both methods read the same large file, one through FileChannel and one through BufferedReader, and count how many lines it has. Results from two runs (times in milliseconds):

Run   Line count   readLineByChannel (ms)   readLineByBufferedReader (ms)
1     169860474    27310                    24944
2     169860474    28677                    21229

The test file is 12 GB and has 169,860,474 lines, i.e. nearly 170 million. In practice the FileChannel approach brings no advantage here: the two are close, and BufferedReader was actually faster in both runs (by about 2.4 s in the first and 7.4 s in the second).
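The list at the top notes that mapping a region of a file into memory is often more efficient for large files than ordinary reads. For comparison, here is a sketch of a line counter built on map() instead of read(). A single MappedByteBuffer is indexed by int, so a 12 GB file has to be mapped in chunks; the 256 MB chunk size and the class name MappedLineCount are hypothetical, and whether this beats either measured method on a given machine is something to benchmark rather than assume.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedLineCount {
    // Hypothetical chunk size; must not exceed Integer.MAX_VALUE because buffers are int-indexed
    private static final long CHUNK = 256L * 1024 * 1024;

    // Counts '\n' bytes by mapping the file chunk by chunk, read-only
    public static long countNewlines(Path path, long chunkSize) throws IOException {
        long count = 0;
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += chunkSize) {
                long len = Math.min(chunkSize, size - pos);
                // The mapping stays valid after the channel closes and is released when GC reclaims it
                MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                for (int i = 0; i < len; i++) {  // absolute get(int) leaves the buffer position untouched
                    if (map.get(i) == '\n') count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "Big file");
        long start = System.currentTimeMillis();
        System.out.println(countNewlines(path, CHUNK));
        System.out.println("countByMapping time (ms): " + (System.currentTimeMillis() - start));
    }
}
```

Like readLineByChannel, this counts '\n' bytes, so it reports the same number as the table above only when the file ends with a newline.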


Posted on Sun, 21 Jun 2020 05:45:27 -0400 by schoolmommy