ReadStream and WriteStream in Node

Previously, I saw others use Koa to implement a file download by assigning a readable stream directly to ctx.body:

const fs = require('fs');
const { resolve } = require('path');
const Koa = require('koa');
const app = new Koa();

app.use(async ctx => {
  try {
    // Assign the readable stream to ctx.body; Koa will pipe it to the response
    ctx.body = fs.createReadStream(resolve(__dirname, 'test.json'));
  } catch (err) {
    ctx.body = err;
  }
});

app.listen(3000);

At the time that seemed surprising to me, because my understanding of streams was limited to listening for data events and concatenating the chunks:

const http = require('http');
const fs = require('fs');
const { resolve } = require('path');

http.createServer((req, res) => {
  const readable = fs.createReadStream(resolve(__dirname, 'test.json'));
  let data = '';
  readable.on('data', (chunk) => {
    data += chunk.toString();
  })
  readable.on('end', () => {
    res.end(data);
  })
}).listen(3000);

However, after reading the Koa source code, I found that the framework already encapsulates this:

// https://github.com/koajs/koa/blob/master/lib/application.js:256
function respond(ctx) {
  ...
  let body = ctx.body;
  if (body instanceof Stream) return body.pipe(res);
  ...
}

The framework adds a check when responding: res is a writable stream, so if the body is also a stream (in this case a readable stream), Koa uses body.pipe(res) to respond in a streaming fashion.

So what does the pipe method do, what problems does it solve, and what advantages do streams have over fs.readFile?

Disadvantages of fs.readFile

Reading a file with fs.readFile looks fine at first, but there is a hidden danger: it reads the whole file into memory at once. When the file is large, this consumes a lot of memory and can even exhaust the process, so it is not recommended:

const http = require('http');
const path = require('path');
const fs = require('fs').promises;

http.createServer(async (req, res) => {
  // Reads the entire file into memory at once;
  // a large file means a correspondingly large memory footprint
  const file = await fs.readFile(path.resolve(__dirname, 'test.json'));
  res.end(file);
}).listen(3000);

Advantages of Stream

There are four fundamental stream types in Node.js:

  • Writable: a stream that data can be written to, for example fs.createWriteStream();
  • Readable: a stream that data can be read from, for example fs.createReadStream();
  • Duplex: a stream that is both Writable and Readable;
  • Transform: a Duplex stream that can modify or transform the data as it is written and read;

Here we mainly introduce Readable and Writable.
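Of the four, Transform is probably the least intuitive. Purely as an illustration (this example is not part of the original post), a toy Transform stream that upper-cases whatever passes through it could look like this:

const { Transform } = require('stream');

// A Transform is a Duplex stream whose output is computed from its input
const upperCase = new Transform({
  transform(chunk, encoding, callback) {
    // Push the transformed chunk downstream
    callback(null, chunk.toString().toUpperCase());
  }
});

// Pipe stdin through the transform to stdout
process.stdin.pipe(upperCase).pipe(process.stdout);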

Streams can be seen almost everywhere in Node.js. For example, the following is a simple HTTP service:

const http = require('http');

http.createServer((req, res) => {
  let body = '';
  req.setEncoding('utf-8');
  
  req.on('data', (chunk) => {
    body += chunk;
  })
  
  req.on('end', () => {
    res.write(body);
    res.end();
  })
}).listen(3000);

In the above code, req is a Readable: the request body is read by listening for its data events. res is a Writable, responsible for writing data back as the response. write() and end() are APIs provided by Writable.

Why are streams better? Because a stream processes data chunk by chunk instead of all at once. The chunks are held in an internal buffer queue until they are consumed; if the buffer grows to the threshold configured by highWaterMark, the stream pauses reading or writing until the buffered data has been consumed.
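To make the chunked reading visible, here is a small sketch (it assumes a local text.txt; the tiny highWaterMark is chosen only so the chunking is easy to observe, the default for fs read streams is 64 KB):

const fs = require('fs');
const path = require('path');

// highWaterMark sets the size of the internal buffer, i.e. how much is read per chunk
const readable = fs.createReadStream(path.resolve(__dirname, 'text.txt'), {
  highWaterMark: 16 // bytes
});

readable.on('data', (chunk) => {
  console.log(`received ${chunk.length} bytes`);
});

readable.on('end', () => {
  console.log('done');
});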

The following code copies the contents of text.txt to dest-text.txt. Note that we do not read all of text.txt into memory at once; instead we write each piece as soon as it has been read, and repeat until the copy is complete. This uses memory far more efficiently and avoids the large memory footprint of reading everything up front.

const fs = require('fs');
const path = require('path');

const readable = fs.createReadStream(path.resolve(__dirname, 'text.txt'));
const writable = fs.createWriteStream(path.resolve(__dirname, 'dest-text.txt'));

readable.on('data', (chunk) => {
  writable.write(chunk);
})

readable.on('end', () => {
  writable.end();
})

Stream Back Pressure and Solutions

The previous section covered the main advantage of streams: they avoid the heavy memory usage caused by loading a large amount of data at once.

But there is still a problem with the code above. Data flows from the Readable to the Writable chunk by chunk rather than being read into memory in one go, and writing to disk is usually slower than reading from it. When the upstream produces data faster than the downstream can consume it, the data piles up; this is called "back pressure". A balancing mechanism is needed so that data flows smoothly from one end to the other.

Recall that the write(chunk) method of a Writable accepts a chunk of data to write to the stream. It returns true while the internal buffer is below the configured highWaterMark; otherwise it returns false, meaning the internal buffer is full or overflowing, which is back pressure showing itself. Let's handle the back pressure:

const fs = require('fs');
const path = require('path');

function _write(dest, chunk) {
  return new Promise(resolve => {
    if (dest.write(chunk)) {
      // Buffer not full, continue writing
      return resolve(null);
    }
    // Buffer space is full, pause writing data to stream
    // Until the drain event is triggered, it indicates that the data in the buffer has been emptied and can be written again
    dest.once('drain', resolve);
  })
}

function myCopy(src, dest) {
  return new Promise(async (resolve, reject) => {
    dest.on('error', reject);
    
    try {
      for await (const chunk of src) {
        // Combined with asynchronous iterators, there is no need to listen for data and end events
        // If the buffer is full, wait for the drain event to trigger a call to resolve
        // When the buffer has space to continue writing, read and write again
        await _write(dest, chunk);
      }
      resolve();
    } catch (err) {
      reject(err);
    }
  })
}

const readable = fs.createReadStream(path.resolve(__dirname, 'text.txt'));
const writable = fs.createWriteStream(path.resolve(__dirname, 'dest-text.txt'));
myCopy(readable, writable);

If we call write() without handling back pressure correctly, the internal buffer keeps growing: data that cannot be consumed in time stays in memory and is not released until it is eventually processed. While the backlog builds up, the process keeps consuming system memory, which can affect other tasks on the machine.
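To see this backlog directly, one can keep calling write() while deliberately ignoring its return value and then inspect the stream's internal state (a rough sketch; writableLength and writableHighWaterMark are standard Writable properties):

const fs = require('fs');
const path = require('path');

const writable = fs.createWriteStream(path.resolve(__dirname, 'dest-text.txt'));

// Deliberately ignore the return value of write(): every chunk that cannot be
// flushed to disk yet piles up in the internal buffer
for (let i = 0; i < 10000; i++) {
  writable.write('x'.repeat(1024));
}

console.log('highWaterMark:', writable.writableHighWaterMark); // 16384 (16 KB) by default
console.log('buffered bytes:', writable.writableLength);       // far beyond the high-water mark

writable.end();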

The Stream module of Node.js already does this work for us in methods such as pipe() and pipeline(). With these APIs we do not need to deal with back pressure ourselves:

const fs = require('fs');
const path = require('path');
const stream = require('stream');
const util = require('util');
// Convert API to Promise form
const pipeline = util.promisify(stream.pipeline);

// readable.pipe() method
const readable = fs.createReadStream(path.resolve(__dirname, 'text.txt'));
const writeable = fs.createWriteStream(path.resolve(__dirname, 'dest-text.txt'));
readable.pipe(writeable);

// stream.pipeline() method
(async () => {
  await pipeline(
    fs.createReadStream(path.resolve(__dirname, 'text.txt')),
    fs.createWriteStream(path.resolve(__dirname, 'dest-text.txt'))
  )
  console.log('Pipeline succeeded');
})()

The code above copies data from a Readable to a Writable. Likewise, if the back end needs to respond with the contents of a text file, it can be written as simply as this:

const http = require('http');
const fs = require('fs');
const path = require('path');

http.createServer((req, res) => {
  const readable = fs.createReadStream(path.resolve(__dirname, 'text.txt'));
  readable.pipe(res);
  // After using the pipe() method, you don't need to call res.end()
  // res.end();
}).listen(3000);
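One caveat worth noting: pipe() does not forward errors from the readable to res, so if text.txt is missing the request may simply hang. If that matters, stream.pipeline() can be used in the handler as well to get error handling and cleanup (a sketch under the same assumptions as above):

const http = require('http');
const fs = require('fs');
const path = require('path');
const { pipeline } = require('stream');

http.createServer((req, res) => {
  const readable = fs.createReadStream(path.resolve(__dirname, 'text.txt'));
  // pipeline() destroys both streams and reports the error if anything goes wrong
  pipeline(readable, res, (err) => {
    if (err) console.error('Pipeline failed:', err);
  });
}).listen(3000);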

References

Stream | Node.js official documentation

Why does commenting out on('data') leave the request hanging? - Understanding the two modes of Node.js Stream

Analysis of the usage and implementation of the pipe method in the Node.js Stream module
