Erlang/Elixir: Using Leex and Yecc to Parse a Domain-Specific Language (DSL)

Contents:

  • Leex lexical analyzer
  • Yecc parser
  • Project practice
  • Understanding
  • Reference material

The goal of this article is to parse the Telegram protocol definition language, TL.

Reading this article requires some familiarity with compiler principles.
Leex is a lexical analyzer generator implemented in Erlang; the lexer it generates takes a character stream as input and produces a token stream as output.
Yecc is a parser generator implemented in Erlang; the parser it generates takes a token stream as input and produces an AST.
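
Concretely (a generic sketch, not tied to this article's grammar; :my_lexer and :my_parser are hypothetical names standing in for whatever modules leex and yecc generate):

    # The lexer module exports string/1, which turns a charlist into a list of
    # token tuples of the form {category, line} or {category, line, value}.
    {:ok, tokens, _end_line} = :my_lexer.string('x = y;')

    # The parser module exports parse/1, which folds that token list into
    # whatever terms the grammar's actions build (the AST).
    {:ok, ast} = :my_parser.parse(tokens)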

Leex lexical analyzer

A leex lexical definition file (.xrl) consists of the following three parts:

  • Definitions.
    Defines character classes (macros) using regular expressions.

  • Rules.
    Defines how matched character sequences are converted into tokens.

  • Erlang code.
    Usually defines auxiliary Erlang functions for further processing of TokenChars.

Yecc parser

Yecc is an LALR-1 parser generator, similar to yacc. It takes a BNF grammar definition as input and generates Erlang code for the parser.

Composition of grammar rule files

A .yrl grammar rule file consists of four parts:

  • Nonterminals.
    What are nonterminals? Symbols that can be expanded into smaller grammatical units, such as a code block, a function body, or a control-flow construct:

        def test do
          Logger.info "This is a Nonterminals Code block"
        end

  • Terminals.
    Symbols that cannot be expanded any further, such as end, def, and variables.

  • Rootsymbol.
    Declares the start symbol, the root of the abstract syntax tree; it tells yecc where rule application in the .yrl grammar file begins.

  • Erlang code. (Optional)
    Transformation and helper functions used by the rule actions.

Project practice

This section walks through code generation for the Telegram binary protocol. The input is the following fragment of the TL schema:

    inputMediaEmpty#9664f57f = InputMedia;
    inputMediaUploadedPhoto#f7aff1c0 file:InputFile caption:string = InputMedia;
    inputMediaPhoto#e9bfb4f3 id:InputPhoto caption:string = InputMedia;
    inputMediaGeoPoint#f9c44144 geo_point:InputGeoPoint = InputMedia;
    inputMediaContact#a6e45987 phone_number:string first_name:string last_name:string = InputMedia;
    inputMediaUploadedDocument#1d89306d file:InputFile mime_type:string attributes:Vector<DocumentAttribute> caption:string = InputMedia;
    inputMediaUploadedThumbDocument#ad613491 file:InputFile thumb:InputFile mime_type:string attributes:Vector<DocumentAttribute> caption:string = InputMedia;
    inputMediaDocument#1a77f29c id:InputDocument caption:string = InputMedia;
    inputMediaVenue#2827a81a geo_point:InputGeoPoint title:string address:string provider:string venue_id:string = InputMedia;
    inputMediaGifExternal#4843b0fd url:string q:string = InputMedia;

For the complete protocol definition file, see scheme.tl.

Create a project

This section briefly describes how to set up a project that generates the lexer and parser from a lexical rule file (.xrl) and a grammar rule file (.yrl). First, create a project:

    mix new leex_yecc_example
    cd leex_yecc_example
    mkdir src

Compiling .xrl and .yrl files into Erlang modules by hand is tedious. Mix can generate the lexer and parser for you automatically, as long as you put the .xrl and .yrl files in the src subdirectory of the project root (e.g. leex_yecc_example/src). Running mix compile will then generate the corresponding .erl files for the lexer and parser.

Which compilers does Mix run by default?

    iex(1)> Mix.compilers()
    [:yecc, :leex, :erlang, :elixir, :app]
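
The src directory is picked up because Mix's :leex and :yecc compilers look for .xrl and .yrl files in the directories listed under the project's :erlc_paths option, which defaults to ["src"]. A minimal mix.exs sketch showing where that option lives (the extra "grammar" directory is purely hypothetical):

    # mix.exs (sketch): :erlc_paths already defaults to ["src"], so no change
    # is needed for the layout used in this article; this only shows where the
    # setting lives if you want to add another directory.
    defmodule LeexYeccExample.MixProject do
      use Mix.Project

      def project do
        [
          app: :leex_yecc_example,
          version: "0.1.0",
          # "grammar" is a hypothetical extra directory; "src" is the default
          erlc_paths: ["src", "grammar"],
          deps: []
        ]
      end
    end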

Create the lexical rule file

lexer.xrl

    Definitions.

    D                   = [0-9]
    NONZERODIGIT        = [1-9]
    O                   = [0-7]
    HEX                 = [0-9a-fA-F]
    UPPER               = [A-Z]
    LOWER               = [a-z]
    EQ                  = (=)
    COLON               = (:)
    SHARP               = (#)
    WHITESPACE          = [\s\t]
    TERMINATOR          = \n|\r\n|\r
    COMMA               = ,
    ...
    ComplexType         = ({LOWER}+\.{Capital}|{Capital})
    VectorPrimitiveType = (V|v)(ector)(<)({PrimitiveType})(>)
    VectorComplexType   = (V|v)(ector)(<)({ComplexType})(>)
    ...

    Rules.

    {COMMA}           : skip_token.
    {WHITESPACE}      : skip_token.
    {TERMINATOR}      : skip_token.
    {MtpName}#{MtpId} : {token, {mtp_name, TokenLine, split_msg_type(TokenChars)}}.
    (flags:#)         : {token, {flags_sharp_token, TokenLine, TokenChars}}.
    ...
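
After mix compile, the generated lexer module (named after the file, i.e. :lexer) can be tried out from iex -S mix. A sketch; the exact token categories and values depend on the rules elided above:

    # leex-generated modules export string/1, which takes an Erlang string
    # (a charlist in Elixir) and returns {:ok, tokens, end_line}.
    {:ok, tokens, _end_line} = :lexer.string('inputMediaEmpty#9664f57f = InputMedia;')

    # tokens is a list of tuples such as {:mtp_name, line, value}; whitespace,
    # commas and line terminators are dropped by the skip_token rules.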

Create the grammar rule file

parser.yrl

    Nonterminals grammer field_items field_item .

    Terminals mtp_name mtp_id mtp_sharp flags_sharp_token field_name ... eq_token .

    Rootsymbol grammer.

    grammer -> mtp_name field_items eq_token return_type_vector_primitive : ['$1', '$2', '$3', '$4'].
    grammer -> mtp_name field_items eq_token return_type_vector_complex   : ['$1', '$2', '$3', '$4'].
    grammer -> mtp_name field_items eq_token return_type_complex          : ['$1', '$2', '$3', '$4'].
    ...
    field_items -> field_item             : ['$1'].
    field_items -> field_item field_items : ['$1' | '$2'].
    field_item  -> flags_sharp_token      : ['$1'].
    field_item  -> field_name field_primitive_type : [unwrap('$1'), unwrap('$2')].
    ...

    Erlang code.

    unwrap({Type, _, V}) -> {Type, V}.
    strip_tail(S) -> lists:sublist(S, 1, length(S)-1).

Calling the parse function of the generated parser module produces the required AST. The AST is then traversed, and code generation is performed for each kind of symbol.
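
A rough sketch of that step, assuming the project's generated :lexer and :parser modules are compiled (the TLCodegen module and its generate/1 clauses are hypothetical placeholders, not the article's actual code-generation templates):

    defmodule TLCodegen do
      # Run one TL definition through the generated lexer and parser,
      # then walk the resulting AST and emit something for each element.
      def build(line) when is_binary(line) do
        {:ok, tokens, _end_line} = :lexer.string(String.to_charlist(line))
        {:ok, ast} = :parser.parse(tokens)

        ast
        |> List.flatten()
        |> Enum.map(&generate/1)
      end

      # Code generation dispatches on the tagged tuples built by the grammar
      # actions above; these clauses are placeholders for the real templates.
      defp generate({:mtp_name, _line, value}), do: {:constructor, value}
      defp generate({tag, value}), do: {tag, value}
      defp generate(other), do: other
    end

    TLCodegen.build("inputMediaEmpty#9664f57f = InputMedia;")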

Understanding

Leex generates a lexical analyzer. Its purpose is to take input, apply the lexical rules, recognize the tokens in the input, and convert them into a form from which the parser can build an AST.

The final generated code is shown in the screenshot at the beginning of the article.
