This article describes how to parse Telegram's protocol definition language, TL (Type Language), using Leex and Yecc.
It assumes some familiarity with compiler fundamentals such as lexical analysis and parsing.
Leex is a lexical analyzer generator that ships with Erlang/OTP. The scanner it generates takes a stream of characters as input and produces a stream of tokens.
Yecc is a parser generator, also part of Erlang/OTP. The parser it generates takes a stream of tokens as input and produces an abstract syntax tree (AST).
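To make "a stream of tokens" concrete, here is a hand-written Elixir sketch of a scanner's job. It is not leex. It tries its rules in order (first match wins, whereas leex picks the longest match), it handles single-line input only, and its category names (mtp_name, ident, colon, eq, semicolon) are hypothetical. It does emit tokens in the {Category, Line, Value} tuple shape that yecc-generated parsers consume.

```elixir
# Hand-written sketch of a scanner (NOT leex): characters in, tokens out.
# Token categories are hypothetical; tuples follow the {Category, Line, Value}
# shape that yecc-generated parsers consume. Rules are tried in order
# (first match wins), unlike leex, which picks the longest match.
defmodule TokenSketch do
  defp rules do
    [
      {:mtp_name, ~r/\A[a-z][A-Za-z0-9_]*#[0-9a-f]+/},
      {:ident, ~r/\A[A-Za-z][A-Za-z0-9_]*/},
      {:colon, ~r/\A:/},
      {:eq, ~r/\A=/},
      {:semicolon, ~r/\A;/},
      {:skip, ~r/\A[ \t]+/}
    ]
  end

  # Single-line input only, so every token is reported on line 1.
  def tokenize(input) when is_binary(input), do: scan(input, [])

  defp scan("", acc), do: {:ok, Enum.reverse(acc)}

  defp scan(input, acc) do
    case Enum.find_value(rules(), &match_rule(&1, input)) do
      nil -> {:error, {:unexpected, String.first(input)}}
      {:skip, text} -> scan(drop(input, text), acc)
      {category, text} -> scan(drop(input, text), [{category, 1, text} | acc])
    end
  end

  # Returns {category, matched_text} if the rule matches at the start of input.
  defp match_rule({category, regex}, input) do
    case Regex.run(regex, input) do
      [text] -> {category, text}
      nil -> nil
    end
  end

  # Remove the matched prefix from the input.
  defp drop(input, text) do
    binary_part(input, byte_size(text), byte_size(input) - byte_size(text))
  end
end
```

For example, tokenizing "inputMediaGifExternal#4843b0fd url:string q:string = InputMedia;" returns {:ok, tokens}, where tokens starts with {:mtp_name, 1, "inputMediaGifExternal#4843b0fd"}, {:ident, 1, "url"}, {:colon, 1, ":"} and ends with {:semicolon, 1, ";"}.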
Leex lexical analyzer
A Leex lexical rule file (.xrl) consists of the following three sections:
Definitions.
Symbol definitions: named regular expressions (macros) that define character classes.
Rules.
Token rules: each rule pairs a regular expression with Erlang code that turns the matched characters into a token, or skips them.
Erlang code.
Token conversion: auxiliary Erlang functions, usually called from the rule actions to further process TokenChars.
Yecc parser
Yecc is an LALR(1) parser generator, similar to yacc. It takes a BNF grammar definition as input and generates the Erlang source code of a parser.
Structure of a grammar rule file
A .yrl grammar rule file consists of four sections:
Nonterminals.
Nonterminals are symbols that can be expanded into sequences of smaller symbols, such as a code block, a function body, or a control-flow construct. For example, the following Elixir function definition as a whole is a nonterminal:
    def test do
      Logger.info "This is a Nonterminals Code block"
    end
Terminals.
Terminals are symbols that cannot be expanded any further, such as the keywords def and end, or a variable name.
Rootsymbol.
The root of the abstract syntax tree: it tells yecc which nonterminal in the .yrl file the grammar rules start from.
Erlang code. (optional)
Transformation functions called from the rule actions.
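The four sections can be seen working together in a minimal, self-contained sketch. It writes a toy grammar to a temporary file, runs :yecc.file/2 to generate the parser source, compiles and loads that source, and then calls the generated parse/1 on hand-made tokens. The grammar, the module name tl_sketch_parser and the token categories are hypothetical and much simpler than a full TL grammar.

```elixir
# Sketch: generate, compile and load a toy yecc parser at runtime, then call
# its parse/1 on hand-made tokens. Grammar, module name (tl_sketch_parser)
# and token categories are hypothetical.
defmodule YeccSketch do
  @grammar ~S"""
  Nonterminals combinator fields field.
  Terminals mtp_name ident colon eq semicolon.
  Rootsymbol combinator.

  combinator -> mtp_name fields eq ident semicolon : {value_of('$1'), '$2', value_of('$4')}.
  combinator -> mtp_name eq ident semicolon : {value_of('$1'), [], value_of('$3')}.
  fields -> field : ['$1'].
  fields -> field fields : ['$1' | '$2'].
  field -> ident colon ident : {value_of('$1'), value_of('$3')}.

  Erlang code.

  value_of({_Category, _Line, Value}) -> Value.
  """

  def load! do
    dir = System.tmp_dir!()
    yrl_path = Path.join(dir, "tl_sketch_parser.yrl")
    erl = dir |> Path.join("tl_sketch_parser.erl") |> String.to_charlist()
    File.write!(yrl_path, @grammar)
    # yecc writes the parser's Erlang source to `erl`; the module name
    # is taken from that file name.
    {:ok, _} = :yecc.file(String.to_charlist(yrl_path), parserfile: erl)
    # Compile the generated source in memory and load the module.
    {:ok, mod, bin} = :compile.file(erl, [:binary])
    {:module, ^mod} = :code.load_binary(mod, erl, bin)
    mod
  end
end
```

With the tokens for inputMediaGifExternal#4843b0fd url:string q:string = InputMedia; the loaded parser returns {:ok, {"inputMediaGifExternal#4843b0fd", [{"url", "string"}, {"q", "string"}], "InputMedia"}}. Punctuation tokens can be two-element tuples such as {:colon, 1}, because yecc only needs the category and the line.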
Project walkthrough
This section walks through generating code for Telegram's binary protocol from its TL schema. Here is an excerpt of the schema:
    inputMediaEmpty#9664f57f = InputMedia;
    inputMediaUploadedPhoto#f7aff1c0 file:InputFile caption:string = InputMedia;
    inputMediaPhoto#e9bfb4f3 id:InputPhoto caption:string = InputMedia;
    inputMediaGeoPoint#f9c44144 geo_point:InputGeoPoint = InputMedia;
    inputMediaContact#a6e45987 phone_number:string first_name:string last_name:string = InputMedia;
    inputMediaUploadedDocument#1d89306d file:InputFile mime_type:string attributes:Vector<DocumentAttribute> caption:string = InputMedia;
    inputMediaUploadedThumbDocument#ad613491 file:InputFile thumb:InputFile mime_type:string attributes:Vector<DocumentAttribute> caption:string = InputMedia;
    inputMediaDocument#1a77f29c id:InputDocument caption:string = InputMedia;
    inputMediaVenue#2827a81a geo_point:InputGeoPoint title:string address:string provider:string venue_id:string = InputMedia;
    inputMediaGifExternal#4843b0fd url:string q:string = InputMedia;
For the complete protocol definition file, see scheme.tl.
Create a project
This section briefly describes how to set up a project that generates a lexer and a parser from a lexical rule file (.xrl) and a grammar rule file (.yrl). First, create the project:
    mix new leex_yecc_example
    cd leex_yecc_example
    mkdir src
Compiling .xrl and .yrl files into Erlang modules by hand is tedious. Mix can do it for you: put the .xrl and .yrl files in the src subdirectory of the project root (for example, leex_yecc_example/src), and running mix compile generates the corresponding lexer and parser .erl files and compiles them.
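For comparison, here is a sketch of the manual route for a single .xrl file: :leex.file/2 generates an .erl file, :compile.file/2 turns it into a BEAM binary, and :code.load_binary/3 loads it. The lexer rules, the module name tl_sketch_lexer and the token categories are hypothetical, and this is not claimed to be exactly what Mix runs internally.

```elixir
# Sketch of the manual steps for a .xrl file:
#   .xrl --(:leex.file/2)--> .erl --(:compile.file/2)--> BEAM --(:code.load_binary/3)--> module
# The lexer rules, module name (tl_sketch_lexer) and token categories are hypothetical.
defmodule LeexSketch do
  @lexer ~S"""
  Definitions.

  LOWER = [a-z]
  IDENT = [A-Za-z0-9_]
  HEX = [0-9a-f]
  SHARP = (#)
  COLON = (:)
  EQ = (=)
  SEMI = (;)
  WHITESPACE = [\s\t\r\n]

  Rules.

  {LOWER}{IDENT}*{SHARP}{HEX}+ : {token, {mtp_name, TokenLine, TokenChars}}.
  [A-Za-z]{IDENT}* : {token, {ident, TokenLine, TokenChars}}.
  {COLON} : {token, {colon, TokenLine}}.
  {EQ} : {token, {eq, TokenLine}}.
  {SEMI} : {token, {semicolon, TokenLine}}.
  {WHITESPACE}+ : skip_token.

  Erlang code.
  """

  def load! do
    dir = System.tmp_dir!()
    xrl_path = Path.join(dir, "tl_sketch_lexer.xrl")
    erl = dir |> Path.join("tl_sketch_lexer.erl") |> String.to_charlist()
    File.write!(xrl_path, @lexer)
    # Step 1: leex generates the scanner's Erlang source.
    {:ok, _} = :leex.file(String.to_charlist(xrl_path), scannerfile: erl)
    # Step 2: compile that source to a BEAM binary in memory.
    {:ok, mod, bin} = :compile.file(erl, [:binary])
    # Step 3: load the binary so the generated string/1 can be called.
    {:module, ^mod} = :code.load_binary(mod, erl, bin)
    mod
  end
end
```

The generated module's string/1 takes a charlist and returns {:ok, tokens, end_line}. Because leex prefers the longest match, "inputMediaGifExternal#4843b0fd" becomes a single mtp_name token rather than an ident followed by an error.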
Which file types can Mix compile? Ask it in IEx:
    iex(1)> Mix.compilers()
    [:yecc, :leex, :erlang, :elixir, :app]
Create the lexical rule file
lexer.xrl
    Definitions.

    D = [0-9]
    NONZERODIGIT = [1-9]
    O = [0-7]
    HEX = [0-9a-fA-F]
    UPPER = [A-Z]
    LOWER = [a-z]
    EQ = (=)
    COLON = (:)
    SHARP = (#)
    WHITESPACE = [\s\t]
    TERMINATOR = \n|\r\n|\r
    COMMA = ,
    ...
    ComplexType = ({LOWER}+\.{Capital}|{Capital})
    VectorPrimitiveType = (V|v)(ector)(<)({PrimitiveType})(>)
    VectorComplexType = (V|v)(ector)(<)({ComplexType})(>)
    ...

    Rules.

    {COMMA} : skip_token.
    {WHITESPACE} : skip_token.
    {TERMINATOR} : skip_token.
    {MtpName}#{MtpId} : {token, {mtp_name, TokenLine, split_msg_type(TokenChars)}}.
    (flags:#) : {token, {flags_sharp_token, TokenLine, TokenChars}}.
    ...
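The body of split_msg_type/1 is not shown here. Purely as an illustration, here is a hypothetical Elixir helper, split_name_id/1, that does the kind of work such a helper might do: split the TokenChars of a name#id token into the combinator name and its constructor ID as an integer. In lexer.xrl the equivalent would be written in Erlang in the Erlang code section.

```elixir
# Hypothetical helper, written in Elixir for illustration (it is NOT the
# article's split_msg_type/1). It turns the TokenChars of a "name#hexid"
# token into {combinator_name, constructor_id}.
defmodule MtpName do
  @spec split_name_id(charlist()) :: {String.t(), non_neg_integer()}
  def split_name_id(token_chars) when is_list(token_chars) do
    [name, hex_id] =
      token_chars
      |> List.to_string()
      |> String.split("#", parts: 2)

    # The ID after '#' is hexadecimal, e.g. "9664f57f" -> 0x9664F57F.
    {name, String.to_integer(hex_id, 16)}
  end
end
```

For example, MtpName.split_name_id(~c"inputMediaEmpty#9664f57f") returns {"inputMediaEmpty", 2523198847}, that is, 0x9664F57F.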
Create the grammar rule file
parser.yrl
    Nonterminals
    grammer field_items field_item
    .

    Terminals
    mtp_name mtp_id mtp_sharp flags_sharp_token field_name
    ...
    eq_token
    .

    Rootsymbol grammer.

    grammer -> mtp_name field_items eq_token return_type_vector_primitive : ['$1', '$2', '$3', '$4'].
    grammer -> mtp_name field_items eq_token return_type_vector_complex : ['$1', '$2', '$3', '$4'].
    grammer -> mtp_name field_items eq_token return_type_complex : ['$1', '$2', '$3', '$4'].
    ...
    field_items -> field_item : ['$1'].
    field_items -> field_item field_items : ['$1' | '$2'].
    field_item -> flags_sharp_token : ['$1'].
    field_item -> field_name field_primitive_type : [unwrap('$1'), unwrap('$2')].
    ...

    Erlang code.

    unwrap({Type, _, V}) -> {Type, V}.
    strip_tail(S) -> lists:sublist(S, 1, length(S)-1).
Passing the tokens produced by the lexer to the generated parser's parse/1 function yields the AST. The code generator then traverses the AST and emits code for each type of symbol.
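As an illustration of the kind of code such a generator might emit, here is a hand-written Elixir encoder for the inputMediaContact constructor from the schema excerpt. It assumes MTProto's TL binary serialization rules: the constructor ID is written as a little-endian 32-bit integer, and each string is length-prefixed (one byte for lengths up to 253, otherwise the byte 254 followed by a 3-byte little-endian length) and zero-padded to a multiple of 4 bytes. The module and function names are hypothetical.

```elixir
# Hand-written sketch of code a TL generator *might* emit for
#   inputMediaContact#a6e45987 phone_number:string first_name:string last_name:string = InputMedia;
# Assumes MTProto's TL serialization: little-endian 32-bit constructor IDs and
# length-prefixed strings zero-padded to a multiple of 4 bytes.
# Module and function names are hypothetical.
defmodule TLSketch do
  @input_media_contact_id 0xA6E45987

  def encode_input_media_contact(phone_number, first_name, last_name) do
    <<@input_media_contact_id::little-unsigned-32>> <>
      encode_string(phone_number) <>
      encode_string(first_name) <>
      encode_string(last_name)
  end

  # Short form: 1-byte length (0..253). Long form: byte 254, then a
  # 3-byte little-endian length. Either way, pad with zeros to 4-byte alignment.
  def encode_string(str) when is_binary(str) do
    len = byte_size(str)
    prefix = if len <= 253, do: <<len>>, else: <<254, len::little-unsigned-24>>
    body = prefix <> str
    body <> zero_padding(byte_size(body))
  end

  # Zero bytes needed to reach the next multiple of 4.
  defp zero_padding(size), do: :binary.copy(<<0>>, rem(4 - rem(size, 4), 4))
end
```

Under these rules the empty string encodes to <<0, 0, 0, 0>> (one length byte plus three padding bytes), and the encoded inputMediaContact begins with the bytes 0x87, 0x59, 0xE4, 0xA6.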
Summary
Leex generates the lexical analyzer. The analyzer reads the input, applies the lexical rules, recognizes the tokens, and emits them as tuples such as {mtp_name, TokenLine, ...}, from which the yecc-generated parser builds the AST.
The final generated code is shown in the image at the beginning of the article.
Reference material