Protecting the source code of a Node.js project

SaaS (Software as a Service) is a model for delivering software over the Internet. The service provider is fully responsible for building, maintaining and managing the software, which frees customers from this cumbersome work. For many small and medium-sized enterprises, SaaS is the best way to adopt advanced technology.

However, for large enterprises, the situation is different. Because they want product customization, stable functionality and control over their own data assets, they are often willing to accept a higher cost in order to deploy the services on their own hardware. This is commonly called private deployment (on-premises deployment).

In a private deployment, service providers must first ensure that their source code is not disclosed; otherwise the product can be copied and modified at will, and the losses outweigh the gains. For traditional back-end runtimes such as Java and .NET, the source code is compiled before it is deployed to the server, so the source itself is never shipped. Node.js, which is more and more widely used, runs the source code directly. Even after minification and obfuscation, the code can be restored to a large extent.

This article introduces a code protection scheme for Node.js, so that Node.js projects can also be deployed privately with confidence.

Principle

When V8 compiles JavaScript code, the parser generates an abstract syntax tree, from which bytecode is then generated. Node.js has a built-in module called vm. When creating an instance of vm.Script, you can obtain the bytecode of the code by passing the produceCachedData option to the constructor and setting it to true. For example:

const vm = require('vm');
const CODE = 'console.log("Hello world");'; // source code
const script = new vm.Script(CODE, {
  produceCachedData: true
});
const bytecodeBuffer = script.cachedData; // Bytecode

Moreover, this bytecode can run without the source code:

const anotherScript = new vm.Script(' '.repeat(CODE.length), {
  cachedData: bytecodeBuffer
});
anotherScript.runInThisContext(); // 'Hello world'

This code is not easy to understand, mainly because of the first argument passed in when creating the vm.Script instance:

  1. Since the bytecode of the source code is already in bytecodeBuffer, why pass in the first parameter?
  2. Why pass in spaces of the same length as the source code?

First, when creating a vm.Script instance, V8 checks whether the bytecode (cachedData) matches the source code (the first argument), so the first argument cannot be omitted. Second, this check is very simple: it only compares the length of the code, so you can "cheat" it by passing in spaces with the same length as the source code.

Careful readers will notice that, done this way, the bytecode is not completely separated from the source code, because the length of the source code is still required. In fact, there is a way around this. Since the length of the source code is checked, that length must be stored somewhere in the bytecode; otherwise there would be nothing to compare against. Looking at the relevant V8 source code, we find that the header of the bytecode holds this information:
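The length check can be observed through the cachedDataRejected property, which vm.Script sets whenever cachedData is supplied. The following is a minimal sketch, assuming the cache is produced and consumed by the same Node.js binary. It uses script.createCachedData(), the documented replacement for the produceCachedData option, and the variable names are illustrative only.

```javascript
const vm = require('vm');

const CODE = 'console.log("Hello world");';
const bytecodeBuffer = new vm.Script(CODE).createCachedData();

// Placeholder source with the same length as the original code.
const sameLength = new vm.Script(' '.repeat(CODE.length), {
  cachedData: bytecodeBuffer
});

// Placeholder source that is one character longer than the original code.
const wrongLength = new vm.Script(' '.repeat(CODE.length + 1), {
  cachedData: bytecodeBuffer
});

// false means V8 accepted the cache, true means it was discarded.
console.log('same length rejected:', sameLength.cachedDataRejected);
console.log('wrong length rejected:', wrongLength.cachedDataRejected);
```

When the cache is rejected, V8 silently falls back to compiling the placeholder string, so no exception is thrown; the property is the only signal.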

// The data header consists of uint32_t-sized entries:
// [0] magic number and (internally provided) external reference count
// [1] version hash
// [2] source hash
// [3] cpu features
// [4] flag hash

Entry [2], the source hash, is the length of the source code. However, because a Node.js Buffer is a Uint8Array, entry [2] of the uint32 array corresponds to bytes [8, 9, 10, 11] of the uint8 array.

[Figure: Uint8Array and Uint32Array]
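The mapping between uint32 entries and uint8 bytes can be checked with a DataView, which makes the byte order explicit. This is a sketch under two assumptions: the host is little endian (as in this article), and the header layout matches the comment quoted above, which may differ in other V8 versions.

```javascript
const vm = require('vm');

const CODE = 'console.log("Hello world");';
const bytecodeBuffer = new vm.Script(CODE).createCachedData();

// Copy the first 20 bytes (five uint32 entries) into a fresh buffer.
const headerBytes = Uint8Array.from(bytecodeBuffer.subarray(0, 20));

// Entry [2] of the uint32 view starts at byte offset 2 * 4 = 8.
const view = new DataView(headerBytes.buffer);
const sourceHash = view.getUint32(2 * 4, true); // true = little endian

console.log('bytes 8..11:', Array.from(headerBytes.subarray(8, 12)));
console.log('entry [2] equals source length:', sourceHash === CODE.length);
```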

Then extract the bytes at those positions:

const lengthBytes = bytecodeBuffer.slice(8, 12);

The results are similar to:

<Buffer 1b 00 00 00>

This byte order is called little endian: the low-order bytes are stored at the low memory addresses and the high-order bytes at the high memory addresses.

<Buffer 1b 00 00 00> means 0x0000001b, that is, decimal 27. It is calculated as follows:

firstByte + (secondByte * 256) + (thirdByte * 256**2) + (fourthByte * 256**3)

The code is as follows:

const length = lengthBytes.reduce((sum, number, power) => {
  return sum + number * Math.pow(256, power);
}, 0); // 27

In addition, there is a simpler method:

const length = bytecodeBuffer.readIntLE(8, 4); // 27
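As a quick check that the manual formula and the built-in reader agree, the sketch below decodes a second, made-up value. The helper decodeLittleEndian and the sample bytes are hypothetical. It also shows that readIntLE is a signed read, which only matters once the top bit of the value is set.

```javascript
// Manual little-endian decoding, following the formula above.
function decodeLittleEndian(bytes) {
  return Array.from(bytes).reduce((sum, byte, index) => sum + byte * 256 ** index, 0);
}

// A hypothetical header value: 0x34 + 0x12 * 256 = 4660 (0x1234).
const sample = Buffer.from([0x34, 0x12, 0x00, 0x00]);
const manual = decodeLittleEndian(sample);
const signedRead = sample.readIntLE(0, 4);
const unsignedRead = sample.readUInt32LE(0);
console.log(manual, signedRead, unsignedRead); // 4660 4660 4660

// The signed and unsigned readers only disagree once the top bit is set.
const topBit = Buffer.from([0x00, 0x00, 0x00, 0x80]);
const topBitSigned = topBit.readIntLE(0, 4);
const topBitUnsigned = topBit.readUInt32LE(0);
console.log(topBitSigned, topBitUnsigned); // -2147483648 2147483648
```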

To sum up, the code that runs the bytecode can be simplified as follows:

const length = bytecodeBuffer.readIntLE(8, 4);
const anotherScript = new vm.Script(' '.repeat(length), {
  cachedData: bytecodeBuffer
});
anotherScript.runInThisContext();

Compiling files

After clarifying the principle, try to compile a very simple project. The directory structure is as follows:

  • src/
    • lib.js
    • index.js
  • dist/
  • compile.js

The two files in the src directory are source code, and the contents are:

// lib.js
console.log('I am lib');
exports.add = function(a, b) {
  return a + b;
};

// index.js
console.log('I am index');
const lib = require('./lib');
console.log(lib.add(1, 2));

The dist directory holds the compiled code. compile.js is the file that performs the compilation. The process is simple: read each source file, compile it into bytecode and save it as a file (dist/*.jsc):

const path = require('path');
const fs = require('fs');
const vm = require('vm');
const glob = require('glob'); // third-party dependency

const srcPath = path.resolve(__dirname, './src');
const destPath = path.resolve(__dirname, './dist');

glob.sync('**/*.js', { cwd: srcPath }).forEach((filePath) => {
  const fullPath = path.join(srcPath, filePath);
  const code = fs.readFileSync(fullPath, 'utf8');
  const script = new vm.Script(code, {
    produceCachedData: true
  });
  fs.writeFileSync(
    path.join(destPath, filePath).replace(/\.js$/, '.jsc'),
    script.cachedData
  );
});
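The script above assumes that dist and any subdirectories already exist. If the source tree contains nested folders, one option is to create the destination directory before writing each file. The following is a sketch only: compileFile is a hypothetical helper, the file names are made up, and it runs in a temporary directory so that no real project is touched.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const vm = require('vm');

// Hypothetical helper: compile one source file and write its cache data,
// creating the destination directory first if it does not exist yet.
function compileFile(srcFile, destFile) {
  const code = fs.readFileSync(srcFile, 'utf8');
  const cachedData = new vm.Script(code, { filename: srcFile }).createCachedData();
  fs.mkdirSync(path.dirname(destFile), { recursive: true });
  fs.writeFileSync(destFile, cachedData);
  return cachedData.length;
}

// Try it in a throwaway directory with a nested source file.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'jsc-demo-'));
const srcFile = path.join(tmpDir, 'src', 'utils', 'math.js');
const destFile = path.join(tmpDir, 'dist', 'utils', 'math.jsc');
fs.mkdirSync(path.dirname(srcFile), { recursive: true });
fs.writeFileSync(srcFile, 'exports.add = (a, b) => a + b;');

const bytesWritten = compileFile(srcFile, destFile);
console.log('wrote', bytesWritten, 'bytes to', destFile);
```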

After running node compile, the bytecode files corresponding to the source code are generated in the dist directory. The next step is to run them. However, running node index.jsc directly does not work, because by default Node.js treats the target file as JavaScript source code.

At this point, you need special loading logic for jsc files. Create a new file main.js in the dist directory, as follows:

const Module = require('module');
const path = require('path');
const fs = require('fs');
const vm = require('vm');

// Register a loader for .jsc files
Module._extensions['.jsc'] = function(module, filename) {
  const bytecodeBuffer = fs.readFileSync(filename);
  const length = bytecodeBuffer.readIntLE(8, 4);
  const script = new vm.Script(' '.repeat(length), {
    cachedData: bytecodeBuffer
  });
  script.runInThisContext();
};

// Load the bytecode file
require('./index');

Run node dist/main. The jsc file is now loaded, but another error appears:

ReferenceError: require is not defined

This is a strange problem. In Node.js, require is a very basic function, so why is it undefined? It turns out that Node.js wraps the contents of a JS file in a function before compiling it. Taking index.js as an example, the wrapped code is as follows:

(function (exports, require, module, __filename, __dirname) {
  console.log('I am index');
  const lib = require('./lib');
  console.log(lib.add(1, 2));
});

The wrapping does not happen while compiling the bytecode, but before it. Therefore, compile.js must add the wrapper itself with Module.wrap (Module comes from require('module')):

const script = new vm.Script(Module.wrap(code), {
  produceCachedData: true
});

After wrapping, script.runInThisContext returns a function, and calling that function runs the module. The modified loader is as follows:

Module._extensions['.jsc'] = function(module, filename) {
  // ... (the code that reads the file and creates the script is unchanged)

  const compiledWrapper = script.runInThisContext();
  return compiledWrapper.apply(module.exports, [
    module.exports,
    id => module.require(id),
    module,
    filename,
    path.dirname(filename),
    process,
    global
  ]);
};
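The wrapper mechanism can be tried in isolation, without any bytecode involved. The sketch below wraps a one-line module, evaluates it to obtain the wrapper function, and calls it with stand-in arguments. The object fakeModule and the paths /tmp/demo.js and /tmp are hypothetical values chosen for illustration.

```javascript
const Module = require('module');
const vm = require('vm');

const source = 'exports.answer = 6 * 7;';
const wrapped = Module.wrap(source);
console.log(wrapped);

// Evaluating the wrapped source yields a function; the module body has not run yet.
const compiledWrapper = new vm.Script(wrapped).runInThisContext();

// Call the wrapper with stand-ins for the five module arguments.
const fakeModule = { exports: {} };
compiledWrapper.call(
  fakeModule.exports,
  fakeModule.exports, // exports
  require,            // require
  fakeModule,         // module
  '/tmp/demo.js',     // __filename (hypothetical)
  '/tmp'              // __dirname (hypothetical)
);

console.log(fakeModule.exports.answer); // 42
```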

Run node dist/main.js again, and another error appears:

SyntaxError: Unexpected end of input

This error is baffling at first, and it is hard to know where to start. However, a closer look at the console shows that two logs were printed before the error message:

I am index
I am lib

So the error occurs when lib.add is executed. The conclusion is that the logic outside the function runs normally, while the logic inside the function fails.

Recall how V8 compiles code. When parsing JavaScript, the top-level part is fully parsed to generate an abstract syntax tree and bytecode. The non-top-level part is only pre-parsed (syntax-checked); no syntax tree is generated for it, let alone bytecode. The non-top-level part, that is, the function bodies, is compiled only when the function is called.

So the problem is clear: the function body was not compiled into bytecode. Fortunately, this behavior can be changed:

const v8 = require('v8');
v8.setFlagsFromString('--no-lazy');

With the no-lazy flag set, run node compile again, and the function bodies are fully parsed as well. The final compile.js is as follows:

const path = require('path');
const fs = require('fs');
const vm = require('vm');
const Module = require('module');
const glob = require('glob');
const v8 = require('v8');
v8.setFlagsFromString('--no-lazy');

const srcPath = path.resolve(__dirname, './src');
const destPath = path.resolve(__dirname, './dist');

glob.sync('**/*.js', { cwd: srcPath }).forEach((filePath) => {
  const fullPath = path.join(srcPath, filePath);
  const code = fs.readFileSync(fullPath, 'utf8');
  const script = new vm.Script(Module.wrap(code), {
    produceCachedData: true
  });
  fs.writeFileSync(
    path.join(destPath, filePath).replace(/\.js$/, '.jsc'),
    script.cachedData
  );
});

The final dist/main.js is as follows:

const Module = require('module');
const path = require('path');
const fs = require('fs');
const vm = require('vm');
const v8 = require('v8');
v8.setFlagsFromString('--no-lazy');

Module._extensions['.jsc'] = function(module, filename) {
  const bytecodeBuffer = fs.readFileSync(filename);
  const length = bytecodeBuffer.readIntLE(8, 4);
  const script = new vm.Script(' '.repeat(length), {
    cachedData: bytecodeBuffer
  });

  const compiledWrapper = script.runInThisContext();
  return compiledWrapper.apply(module.exports, [
    module.exports,
    id => module.require(id),
    module,
    filename,
    path.dirname(filename),
    process,
    global
  ]);
};

require('./index');

bytenode

In fact, if you really need to compile JavaScript source code into bytecode, you don't need to write so much code yourself. There is already an npm package called bytenode that does all of this, and it handles details and compatibility better.

Problems with bytecode

Although source code can be protected by compiling it into bytecode, bytecode also has some problems:

  • JavaScript source code can run in the Node.js environment on any platform, but bytecode is platform-dependent: it can only run on the platform it was compiled on (for example, bytecode compiled on Windows cannot run on macOS). The header also carries a V8 version hash, so the Node.js version that runs the bytecode should match the one that compiled it.
  • After the source code is modified, it has to be compiled into bytecode again, which is cumbersome. Configuration such as the database server address and port number is better left uncompiled, in a source file that can be modified at any time.
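Because incompatible bytecode is rejected silently, a loader may want to fail loudly instead. The following is a minimal sketch, not bytenode's implementation: loadBytecode is a hypothetical helper, and the mismatch is simulated by corrupting the version hash (entry [1] of the header) of otherwise valid cache data.

```javascript
const vm = require('vm');

// Hypothetical helper: turn cache data back into a script, or throw
// if V8 refuses it (for example, a different V8 version or platform).
function loadBytecode(bytecodeBuffer) {
  const length = bytecodeBuffer.readUInt32LE(8);
  const script = new vm.Script(' '.repeat(length), {
    cachedData: bytecodeBuffer
  });
  if (script.cachedDataRejected) {
    throw new Error('Bytecode rejected: recompile it with this Node.js version on this platform');
  }
  return script;
}

// Cache data produced by the current runtime loads normally.
const good = new vm.Script('1 + 2;').createCachedData();
const loaded = loadBytecode(good);

// Simulate bytecode from another V8 build by flipping a byte of the version hash.
const foreign = Buffer.from(good);
foreign[4] ^= 0xff;

let rejected = false;
try {
  loadBytecode(foreign);
} catch (err) {
  rejected = true;
  console.log(err.message);
}
console.log('foreign bytecode rejected:', rejected);
```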

Postscript

As an astute reader may have guessed, this article was written in reverse order: the author first used bytenode to meet the requirement, and then studied how it works.

Tags: Javascript node.js SaaS

Posted on Thu, 02 Sep 2021 18:54:11 -0400 by chrille112