After a Month of Exploration: How to Manipulate ASTs as Naturally as Breathing

For a long time, front-end developers have had mixed feelings about compiler theory. Most feel that such deep theoretical knowledge is of no use when writing business code, and that because compiler theory is obscure and hard to learn, it does little to improve their front-end expertise. I don't think there is anything wrong with this view; I used to hold it myself. Still, several families of front-end frameworks and tool libraries are closely tied to compiler theory:

  • Tools represented by Babel, which mainly add support for newer ECMAScript syntax such as ?. and ??, handled by babel-plugin-optional-chaining [1] and babel-plugin-nullish-coalescing-operator [2] respectively. ESBuild and swc belong here too. Similarly, "supersets" such as Scss and Less are ultimately compiled to CSS. What characterizes these tools is that the source and the output sit at the same level: the goal is output that runs in a standard environment.
  • Tools represented by Vue, Svelte, and the newly born Astro, which compile custom file formats (.vue, .svelte, .astro) into JavaScript or other outputs. What characterizes these tools is that one source file may produce several kinds of output. For example, a Vue SFC is eventually built into HTML, CSS, and JavaScript.
  • Typical DSL implementations, which have no compiled output and are instead consumed directly by a dedicated engine, such as GraphQL (.graphql) and Prisma (.prisma), as well as more familiar ones such as HTML, SQL, Lex, and XML. These do not need to be compiled to JavaScript. For example, .graphql files are consumed directly by the GraphQL engines implemented in each language.
  • Language-level conversion, such as TypeScript, Flow, and CoffeeScript, as well as languages whose users are not necessarily front-end developers in the narrow sense, such as Zhang Hongbo's ReScript (formerly BuckleScript) and Dart.

Whichever case you look at, it seems hellishly difficult for front-end developers without a computer science background. In fact, the community has long been trying to lower the cost of working with ASTs, for example with Facebook's jscodeshift [3]. Compared with Babel's Visitor API, jscodeshift offers an imperative, chainable API that better matches how front-end developers think (it resembles Lodash and RxJS). Here is how each is used:

The examples come from an article by Shenguang [4]. Since jscodeshift and gogocode are not the focus of this article, the ready-made examples are used directly.

// Babel
const { declare } = require("@babel/helper-plugin-utils");

const noFuncAssignLint = declare((api, options, dirname) => {
  api.assertVersion(7);

  return {
    pre(file) {
      file.set("errors", []);
    },
    visitor: {
      AssignmentExpression(path, state) {
        const errors = state.file.get("errors");
        const assignTarget = path.get("left").toString();
        const binding = path.scope.getBinding(assignTarget);
        if (binding) {
          if (
            binding.path.isFunctionDeclaration() ||
            binding.path.isFunctionExpression()
          ) {
            const tmp = Error.stackTraceLimit;
            Error.stackTraceLimit = 0;
            errors.push(
              path.buildCodeFrameError("can not reassign to function", Error)
            );
            Error.stackTraceLimit = tmp;
          }
        }
      },
    },
    post(file) {
      console.log(file.get("errors"));
    },
  };
});

module.exports = noFuncAssignLint;

// jscodeshift
module.exports = function (fileInfo, api) {
  return api
    .jscodeshift(fileInfo.source)
    .findVariableDeclarators("foo")
    .renameTo("bar")
    .toSource();
};

Although the two snippets do not perform the same kind of operation, the difference between the API styles is still clear.

Then there is Alimama's gogocode [5], which wraps Babel in another layer to offer an imperative, chainable API similar to jscodeshift. Its API naming shows that it is aimed mainly at compiler-theory beginners: jscodeshift still has methods such as findVariableDeclarators, whereas gogocode reads just like find and replace:

$(code)
    .find("var a = 1")
    .attr("declarations.0.id.name", "c")
    .root()
    .generate();

This looks really simple, but it can also cause problems. Why does Babel adopt the Visitor API? GraphQL Tools [6] likewise uses a Visitor API for adding directives to a GraphQL schema, for example:

import { SchemaDirectiveVisitor } from "graphql-tools";

export class DeprecatedDirective extends SchemaDirectiveVisitor {
  visitSchema(schema: GraphQLSchema) {}
  visitObject(object: GraphQLObjectType) {}
  visitFieldDefinition(field: GraphQLField<any, any>) {}
  visitArgumentDefinition(argument: GraphQLArgument) {}
  visitInterface(iface: GraphQLInterfaceType) {}
  visitInputObject(object: GraphQLInputObjectType) {}
  visitInputFieldDefinition(field: GraphQLInputField) {}
  visitScalar(scalar: GraphQLScalarType) {}
  visitUnion(union: GraphQLUnionType) {}
  visitEnum(type: GraphQLEnumType) {}
  visitEnumValue(value: GraphQLEnumValue) {}
}

The Visitor API is declarative: we declare what to do with which kind of node. Suppose I want to add a new condition to the test of every matching if statement. When Babel traverses the AST (with @babel/traverse) and reaches an if statement, it sees that a handler is registered for that node type and runs it. The chainable APIs of jscodeshift and gogocode are imperative: we first have to obtain the AST node ourselves and then call the (wrapped) API provided for that node, which makes it easy to miss edge cases and produce unexpected results.

What about TypeScript's API? Most of the TypeScript Compiler API is public, which is enough for tools such as codemods and AST checkers. For example, here we use the native Compiler API to assemble a function:
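To make the declarative versus imperative contrast concrete, here is a minimal sketch on a toy AST. It does not use Babel or jscodeshift; the node shapes and the names AstNode, Visitor, traverse, and topLevelIfs are all made up for illustration.

```typescript
// Toy AST for illustration only; far simpler than Babel's node types.
type AstNode =
  | { type: "Program"; body: AstNode[] }
  | { type: "IfStatement"; test: string; consequent: AstNode[] }
  | { type: "ExpressionStatement"; expression: string };

// Declarative style: a visitor maps node types to handlers.
type Visitor = {
  [K in AstNode["type"]]?: (node: Extract<AstNode, { type: K }>) => void;
};

function childrenOf(node: AstNode): AstNode[] {
  switch (node.type) {
    case "Program":
      return node.body;
    case "IfStatement":
      return node.consequent;
    case "ExpressionStatement":
      return [];
  }
}

// The traversal owns the walk, so a registered node type is reached at any depth.
function traverse(node: AstNode, visitor: Visitor): void {
  const handler = visitor[node.type] as ((n: AstNode) => void) | undefined;
  if (handler) handler(node);
  for (const child of childrenOf(node)) traverse(child, visitor);
}

// Imperative style: we fetch the nodes ourselves. Looking only at top-level
// statements silently skips nested `if`s, an edge case that is easy to miss.
function topLevelIfs(root: AstNode): AstNode[] {
  return root.type === "Program"
    ? root.body.filter((n) => n.type === "IfStatement")
    : [];
}

const program: AstNode = {
  type: "Program",
  body: [
    {
      type: "IfStatement",
      test: "a",
      consequent: [{ type: "IfStatement", test: "b", consequent: [] }],
    },
    { type: "ExpressionStatement", expression: "run()" },
  ],
};

// Register what to do for every IfStatement: append an extra condition.
const rewrittenTests: string[] = [];
traverse(program, {
  IfStatement(node) {
    node.test = `(${node.test}) && enabled`;
    rewrittenTests.push(node.test);
  },
});

console.log(rewrittenTests.length, topLevelIfs(program).length);
```

In this sketch the visitor rewrites both if statements, including the nested one, while the hand-written top-level lookup finds only one. Real chainable APIs do offer deep searches, so this only illustrates the kind of boundary condition that becomes the caller's responsibility.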

import * as ts from "typescript";

function makeFactorialFunction() {
  const functionName = ts.factory.createIdentifier("factorial");
  const paramName = ts.factory.createIdentifier("n");
  const paramType = ts.factory.createKeywordTypeNode(
    ts.SyntaxKind.NumberKeyword
  );
  const paramModifiers = ts.factory.createModifier(
    ts.SyntaxKind.ReadonlyKeyword
  );
  const parameter = ts.factory.createParameterDeclaration(
    undefined,
    [paramModifiers],
    undefined,
    paramName,
    undefined,
    paramType
  );

  // n <= 1
  const condition = ts.factory.createBinaryExpression(
    paramName,
    ts.SyntaxKind.LessThanEqualsToken,
    ts.factory.createNumericLiteral(1)
  );

  const ifBody = ts.factory.createBlock(
    [ts.factory.createReturnStatement(ts.factory.createNumericLiteral(1))],
    true
  );

  const decrementedArg = ts.factory.createBinaryExpression(
    paramName,
    ts.SyntaxKind.MinusToken,
    ts.factory.createNumericLiteral(1)
  );

  const recurse = ts.factory.createBinaryExpression(
    paramName,
    ts.SyntaxKind.AsteriskToken,
    ts.factory.createCallExpression(functionName, undefined, [decrementedArg])
  );

  const statements = [
    ts.factory.createIfStatement(condition, ifBody),
    ts.factory.createReturnStatement(recurse),
  ];

  return ts.factory.createFunctionDeclaration(
    undefined,
    [ts.factory.createToken(ts.SyntaxKind.ExportKeyword)],
    undefined,
    functionName,
    undefined,
    [parameter],
    ts.factory.createKeywordTypeNode(ts.SyntaxKind.NumberKeyword),
    ts.factory.createBlock(statements, true)
  );
}

const resultFile = ts.createSourceFile(
  "func.ts",
  "",
  ts.ScriptTarget.Latest,
  false,
  ts.ScriptKind.TS
);

const printer = ts.createPrinter({ newLine: ts.NewLineKind.LineFeed });

const result = printer.printNode(
  ts.EmitHint.Unspecified,
  makeFactorialFunction(),
  resultFile
);

console.log(result);

The code above generates this function:

export function factorial(readonly n: number): number {
  if (n <= 1) {
    return 1;
  }
  return n * factorial(n - 1);
}

As you can see, the TypeScript Compiler API is imperative, but unlike jscodeshift its API is not chained; it is better described as compositional. We start from the identifiers, assemble the parameter, the if statement's condition and block, and the function's return statement, and finally put everything together with createFunctionDeclaration. The cost of using it is clearly not low: you need a solid understanding of concepts such as Expression, Declaration, and Statement (for example, which tokens the if statement above is assembled from), as well as of TypeScript's own AST nodes such as interfaces, type aliases, and decorators. You can inspect TypeScript's AST structure in real time in ts-ast-viewer [7].

This is where ts-morph [8] (formerly ts-simple-ast) comes in. It wraps the TypeScript Compiler API and greatly lowers the cost of use. For example, here is the example above rewritten with ts-morph:

import { Project } from "ts-morph";

const s = new Project().createSourceFile("./func.ts", "");

s.addFunction({
  isExported: true,
  name: "factorial",
  returnType: "number",
  parameters: [
    {
      name: "n",
      isReadonly: true,
      type: "number",
    },
  ],
  statements: (writer) => {
    writer.write(`
if (n <=1) {
  return 1;
}

return n * factorial(n - 1);
    `);
  },
}).addStatements([]);

s.saveSync();

console.log(s.getText());

Yes: to avoid the kind of assembly the TypeScript Compiler API requires, ts-morph does not provide an API for creating an IfStatement node by node. The most convenient way is to pass a writer function and write the statements directly as text.

This approach clearly has trade-offs. When creating function, class, and import declarations we can pass in their structure directly, but for the statements inside a function (or class method) ts-morph currently offers only this basic capability. That does save a lot of effort in many scenarios, but it also means ts-morph is a poor fit for very complex or demanding ones.

While writing this, I thought of a special example: Vite [9]. As we all know, Vite rewrites dependencies, turning each bare import into a proper import that actually resolves to code. For example, import console from 'console' is rewritten to import console from '/node_modules/console/src/index.js' (the exact path is determined by the package's main field, or by module for ESM packages). This logic relies mainly on magic-string and es-module-lexer: es-module-lexer yields the start and end positions of each import statement's module specifier within the file, and magic-string replaces that range with a path the browser can resolve (see, for example, importAnalysisBuild.ts [10]). This suggests a further insight: for code transformations that target only one specific scenario, such as import statements for Vite or decorators for InversifyJS and TypeDI, bringing out the full AST machinery is using a sledgehammer to crack a nut. Similarly, ts-morph works wonders when it operates only on coarse-grained AST nodes, such as an entire class structure.

In fact, there are other scenarios like this:
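The position-based approach can be sketched in a few lines. This is not Vite's implementation: a regular expression stands in for es-module-lexer, plain string slicing stands in for magic-string, and the names findImportSpecifiers and rewriteBareImports as well as the resolved path are hypothetical.

```typescript
interface SpecifierRange {
  start: number; // offset of the first character inside the quotes
  end: number; // offset just past the last character inside the quotes
  value: string;
}

// Hypothetical stand-in for a real lexer. The regex keeps the sketch short;
// it ignores comments, string contents, and dynamic import() calls.
function findImportSpecifiers(code: string): SpecifierRange[] {
  const pattern = /\bimport\s+(?:[^'"]*?\s+from\s+)?(['"])([^'"]+)\1/g;
  const ranges: SpecifierRange[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(code)) !== null) {
    const value = match[2];
    // match[0] ends with the closing quote, so step back over it.
    const end = match.index + match[0].length - 1;
    ranges.push({ start: end - value.length, end, value });
  }
  return ranges;
}

// A bare specifier is neither relative ("./x") nor absolute ("/x").
const isBare = (specifier: string): boolean =>
  !specifier.startsWith(".") && !specifier.startsWith("/");

function rewriteBareImports(
  code: string,
  resolve: (specifier: string) => string
): string {
  let result = code;
  const ranges = findImportSpecifiers(code).filter((r) => isBare(r.value));
  // Replace from the last range to the first so earlier offsets stay valid.
  for (const range of ranges.reverse()) {
    result =
      result.slice(0, range.start) + resolve(range.value) + result.slice(range.end);
  }
  return result;
}

const sampleInput = [
  "import console from 'console';",
  "import { helper } from './helper';",
].join("\n");

// The resolver below is made up for the demo; a real one would read package.json.
const sampleOutput = rewriteBareImports(
  sampleInput,
  (id) => `/node_modules/${id}/index.js`
);
console.log(sampleOutput);
```

Only the text between the quotes of the bare import is touched; the relative import and everything else in the file pass through unchanged, which is why no full AST is needed for this narrow task.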

  • I just want to pass in a file path and get back all the class names in the file, the module specifiers of its import statements (in import fs from 'fs', the module specifier is fs), which imports are named imports (import { spawn } from 'child_process'), and which are type-only imports (import type { options } from 'prettier'), and then act on them. For that, ts-morph is still more complex than I would like.
  • I want to learn about compilers, but I don't want to start with textbooks and systematic courses. I want to jump straight into practice and see what AST manipulation can do. Maybe that will make me more interested in studying it properly later?
  • I maintain an open source project and am about to ship a breaking change, so I want to provide a codemod that upgrades users' code to the new version. Common operations include updating import statements, updating JSX component props, and so on. Or, in a scaffold + template scenario, several of my templates differ only slightly, and rather than maintain multiple files I want to extract the common part and write the template-specific code dynamically through the AST. But! I have never studied compiler theory, and I don't want to spend time going through every ts-morph API.

After all this buildup, it's time to welcome today's protagonist: @ts-morpher [11], an additional layer of encapsulation on top of ts-morph. If the complexity of the TypeScript Compiler API is 10, then ts-morph is about 4 and @ts-morpher is below 1. As a front-end kid without a CS degree who never studied compiler theory or played with Babel, this is the idea I came up with when I needed to build AST checkers and codemods.

As we know, AST operations can usually be broken down neatly into several units (if you didn't know before, congratulations, now you do): get nodes, check nodes, modify nodes (step 1), modify nodes (step 2), save the source file. Each unit can be separated out on its own. If we could call single-purpose methods the way we do with Lodash, or string operators together the way we do with RxJS, AST operations would not seem so scary. Some may ask: why keep nesting wrappers, layer upon layer, like matryoshka dolls? All I can say is that, dolls or not, if it is easy to use, it does the job. I hide all the Declarations, Statements, and Assignments behind plain functions, like this (see the official website for more examples):

import { Project } from "ts-morph";
import path from "path";
import fs from "fs-extra";
import { createImportDeclaration } from "@ts-morpher/creator";
import { checkImportExistByModuleSpecifier } from "@ts-morpher/checker";
import { ImportType } from "@ts-morpher/types";

const sourceFilePath = path.join(__dirname, "./source.ts");

// removeSync (fs-extra) does not throw when the file is missing, unlike rmSync.
fs.removeSync(sourceFilePath);
fs.ensureFileSync(sourceFilePath);

const p = new Project();
const source = p.addSourceFileAtPath(sourceFilePath);

createImportDeclaration(source, "fs", "fs-extra", ImportType.DEFAULT_IMPORT);

createImportDeclaration(source, "path", "path", ImportType.NAMESPACE_IMPORT);

createImportDeclaration(
  source,
  ["exec", "execSync", "spawn", "spawnSync"],
  "child_process",
  ImportType.NAMED_IMPORT
);

createImportDeclaration(
  source,
  // First item will be regarded as default import, and rest will be used as named imports.
  ["ts", "transpileModule", "CompilerOptions", "factory"],
  "typescript",
  ImportType.DEFAULT_WITH_NAMED_IMPORT
);

createImportDeclaration(
  source,
  ["SourceFile", "VariableDeclarationKind"],
  "ts-morph",
  ImportType.NAMED_IMPORT,
  true
);

This series of method calls creates:

import fs from "fs-extra";
import * as path from "path";
import { exec, execSync, spawn, spawnSync } from "child_process";
import ts, { transpileModule, CompilerOptions, factory } from "typescript";
import type { SourceFile, VariableDeclarationKind } from "ts-morph";

Let's take a slightly more complicated example:

import { Project } from "ts-morph";
import path from "path";
import fs from "fs-extra";
import {
  createBaseClass,
  createBaseClassProp,
  createBaseClassDecorator,
  createBaseInterfaceExport,
  createImportDeclaration,
} from "@ts-morpher/creator";
import { ImportType } from "@ts-morpher/types";

const sourceFilePath = path.join(__dirname, "./source.ts");

// removeSync (fs-extra) does not throw when the file is missing, unlike rmSync.
fs.removeSync(sourceFilePath);
fs.ensureFileSync(sourceFilePath);

const p = new Project();
const source = p.addSourceFileAtPath(sourceFilePath);

createImportDeclaration(
  source,
  ["PrimaryGeneratedColumn", "Column", "BaseEntity", "Entity"],
  "typeorm",
  ImportType.NAMED_IMPORT
);

createBaseInterfaceExport(
  source,
  "IUser",
  [],
  [],
  [
    {
      name: "id",
      type: "number",
    },
    {
      name: "name",
      type: "string",
    },
  ]
);

createBaseClass(source, {
  name: "User",
  isDefaultExport: true,
  extends: "BaseEntity",
  implements: ["IUser"],
});

createBaseClassDecorator(source, "User", {
  name: "Entity",
  arguments: [],
});

createBaseClassProp(source, "User", {
  name: "id",
  type: "number",
  decorators: [{ name: "PrimaryGeneratedColumn", arguments: [] }],
});

createBaseClassProp(source, "User", {
  name: "name",
  type: "string",
  decorators: [{ name: "Column", arguments: [] }],
});

This code generates:

import { PrimaryGeneratedColumn, Column, BaseEntity, Entity } from "typeorm";

export interface IUser {
  id: number;

  name: string;
}

@Entity()
export default class User extends BaseEntity implements IUser {
  @PrimaryGeneratedColumn()
  id: number;

  @Column()
  name: string;
}
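Stepping back, the idea of splitting an AST operation into get, check, modify, and save units can be sketched without any library. The toy model below tracks only a file's import list, and every name in it (SourceModel, pipe, ensureNamedImport, renameSpecifier, print) is hypothetical rather than part of the @ts-morpher API.

```typescript
// Toy model of a source file: only its import list.
interface ImportEntry {
  specifier: string;
  defaultName?: string;
  named: string[];
}
interface SourceModel {
  imports: ImportEntry[];
}

type Step = (source: SourceModel) => SourceModel;

// Compose steps left to right, similar in spirit to chaining RxJS operators.
const pipe =
  (...steps: Step[]): Step =>
  (source) =>
    steps.reduce((acc, step) => step(acc), source);

// Check unit: is this module already imported?
const hasImport = (source: SourceModel, specifier: string): boolean =>
  source.imports.some((entry) => entry.specifier === specifier);

// Modify unit 1: add a named import unless the module is already imported.
const ensureNamedImport =
  (specifier: string, named: string[]): Step =>
  (source) =>
    hasImport(source, specifier)
      ? source
      : { imports: [...source.imports, { specifier, named }] };

// Modify unit 2: point an existing import at a different module.
const renameSpecifier =
  (from: string, to: string): Step =>
  (source) => ({
    imports: source.imports.map((entry) =>
      entry.specifier === from ? { ...entry, specifier: to } : entry
    ),
  });

// Save unit: print the model back to source text.
const print = (source: SourceModel): string =>
  source.imports
    .map((entry) => {
      const parts: string[] = [];
      if (entry.defaultName) parts.push(entry.defaultName);
      if (entry.named.length > 0) parts.push(`{ ${entry.named.join(", ")} }`);
      return parts.length > 0
        ? `import ${parts.join(", ")} from "${entry.specifier}";`
        : `import "${entry.specifier}";`;
    })
    .join("\n");

const migrate = pipe(
  renameSpecifier("fs", "fs-extra"),
  ensureNamedImport("child_process", ["spawn"])
);

const printed = print(
  migrate({ imports: [{ specifier: "fs", defaultName: "fs", named: [] }] })
);
console.log(printed);
```

Each unit has one responsibility and can be tested alone, and a codemod becomes a list of steps. A real implementation would operate on ts-morph nodes rather than on this simplified model.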

In essence there is nothing complicated here: @ts-morpher wraps ts-morph's chained API and provides create, read, update, and delete methods for common statement types:

  • Import, Export, and Class are currently supported. JSX (TSX) support is planned next.
  • @ts-morpher splits the create, read, update, and delete methods into separate packages. For example, the methods in @ts-morpher/helper retrieve declarations or their identifiers: you can get all the module specifiers imported in a file (fs in import fsMod from 'fs') or all the import declarations. You don't need to care what a declaration looks like; just hand it to @ts-morpher/checker and call checkImportType to see what type of import it is.

Why did I build this? Because in my current project I need to enforce some source-level constraints. For example, I want to force the entry files of every main application and sub-application to import a new SDK, such as import 'foo-error-reporter', and if the import is missing, I add it for you! Since not every sub-application and main application is under my control, this kind of enforced gate has to live in the CI pipeline. For that alone, ts-morph would probably be enough. But, sorry, I simply think AST operations can be simpler, so I built another layer myself.

It has 100% unit test coverage and 100+ methods, but it has not yet reached my ideal state, which would bring the complexity of AST operations below 0.5. My idea is to provide a visual playground where you click buttons to call methods and preview the transformation result in real time, and to build some common higher-level capabilities on top, such as merging the import statements of two files or batch-modifying JSX components.

This is what I gained from a month of tinkering with ASTs. I hope you gain something too~

References

[1] babel-plugin-optional-chaining: https://github.com/babel/babel/blob/main/packages/babel-plugin-proposal-optional-chaining
[2] babel-plugin-nullish-coalescing-operator: https://github.com/babel/babel/blob/main/packages/babel-plugin-proposal-nullish-coalescing-operator
[3] jscodeshift: https://github.com/facebook/jscodeshift
[4] Shenguang: https://www.zhihu.com/people/di-xu-guang-50
[5] gogocode: https://gogocode.io/
[6] GraphQL Tools: https://github.com/ardatan/graphql-tools
[7] ts-ast-viewer: https://ts-ast-viewer.com/#
[8] ts-morph: https://ts-morph.com/
[9] Vite: https://github.com/vitejs/vite
[10] importAnalysisBuild.ts: https://github.com/vitejs/vite/blob/545b1f13cec069bbae5f37c7540171128f439e7b/packages/vite/src/node/plugins/importAnalysisBuild.ts#L217
[11] @ts-morpher: https://ts-morpher-docs.vercel.app/

Posted on Tue, 23 Nov 2021 07:02:50 -0500 by kdoggy