Create a programming language yourself

Without further ado, let's get to it.

Start with Bison

First, let's define the data types the language will use:

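#include <string>
using std::string;
struct CompileStruct;	//defined elsewhere in the project
class Statement;	//base class of all syntax nodes (see Parser.h)

typedef struct YYSTYPE
{
	string vSTRING;	//string values: identifiers, string literals
	int vINTEGER;	//integer values: NUMBER, true/false
	double vDOUBLE;	//floating-point literals
	struct CompileStruct* vCompileStruct;	//compilation structure
	Statement* vStatement;	//syntax-tree node
} YYSTYPE;
#define YYSTYPE_IS_DECLARED 1	//tell Bison not to declare its own YYSTYPE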

That's it; these few types are enough. Besides representing the values of the language itself, they also serve as Bison's semantic value types: a function name, for example, is carried in vSTRING. The most important member is vStatement, the base type of every expression and syntax construct. Whatever the expression, menthol's parser represents it as a Statement.

First, here are the tokens used throughout menthol:

%token <vSTRING>VARIDENTIFIER 
%token <vSTRING> IDENTIFIER 
%token <vSTRING> GLOBALVARIDENTIFIER
%token <vINTEGER>NUMBER
%token <vSTRING>STRING
%token <vDOUBLE>DOUBLE
%token <vINTEGER>TRUE_KEYWORD
%token <vINTEGER>FALSE_KEYWORD
%token IF ELSE FOR BREAK  TRY EXCEPT THROW IMPORT MODULE USE
%token CONTINUE RETURN  WHILE   NULL_KEYWORD
%token POWER_OP NEQ_OP OR_OP AND_OP GE_OP LE_OP EQ_OP
%token ADD_ASSIGN SUB_ASSIGN DIV_ASSIGN MUL_ASSIGN ASSIGN_ASSIGN
%token MOD_ASSIGN AND_ASSIGN OR_ASSIGN XOR_ASSIGN 
%token SHIFT_LEFT_OP SHIFT_RIGHT_OP WMAIN  DEF  VAR  IN ARRAYSECTION DICT_OP TYPEOF CONST MMRT

From IF onward, every token is a fixed keyword or operator, so declaring the token name is enough. The first 8 tokens carry a semantic value, so their declarations must also name the YYSTYPE member (in angle brackets) that holds it.
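To make the split between typed and untyped tokens concrete, here is a minimal hand-written sketch of how a lexer might classify a word and fill in the semantic value before returning a token. The token codes are hypothetical (in a real build Bison generates them from the %token lines), only a few tokens are covered, and keeping the $ or @ sigil in the stored name is an assumption; menthol's real lexer is written in lex and is not shown in this post.

```cpp
#include <cctype>
#include <string>

// Hypothetical token codes; Bison would normally generate these from %token.
enum Token { END_OF_INPUT = 0, VARIDENTIFIER = 258, IDENTIFIER,
             GLOBALVARIDENTIFIER, NUMBER, IF, WHILE, FOR, BREAK };

// Reduced semantic value: only the members this sketch needs.
struct SemanticValue {
    std::string vSTRING;
    int vINTEGER = 0;
};

// Classify one whitespace-free word. Typed tokens store their value in
// `val`; keyword tokens return only the token code.
Token ClassifyWord(const std::string& word, SemanticValue& val) {
    if (word.empty()) return END_OF_INPUT;
    if (word == "if") return IF;
    if (word == "while") return WHILE;
    if (word == "for") return FOR;
    if (word == "break") return BREAK;
    // Assumption: the sigil is kept as part of the stored name.
    if (word[0] == '$') { val.vSTRING = word; return VARIDENTIFIER; }
    if (word[0] == '@') { val.vSTRING = word; return GLOBALVARIDENTIFIER; }
    if (std::isdigit(static_cast<unsigned char>(word[0]))) {
        val.vINTEGER = std::stoi(word);  // integer literal goes into vINTEGER
        return NUMBER;
    }
    val.vSTRING = word;  // anything else: function, package or attribute name
    return IDENTIFIER;
}
```

For example, "$a" comes back as VARIDENTIFIER with vSTRING set, while "while" comes back as WHILE and leaves the value untouched.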

About Statement

Statement is the most important part of the parser: it is the base type of all expressions. Every construct, whether if, while or for, is a subclass of Statement, defined as follows:

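//Parser.h
#include <string>
using std::string;
//NodeType is the enum shown below; it must be declared before this class.

class Statement
{
public:
	Statement():ParentNode(0),wfileaddressline(0),bytenumber(0),startipi(0),endipi(0),ilength(0){}
	virtual ~Statement(){}	//allows deleting subclasses through a Statement*
	virtual void CreateCode()=0;	//emit code for this node
	virtual void AddChilder(Statement* s){}	//default: leaf nodes have no children
	virtual void Release()=0;	//free resources owned by the node
	NodeType NType;	//which kind of node this is

	Statement* ParentNode;	//enclosing node, 0 at top level
	int wfileaddressline;	//source line number
	int bytenumber;
	string name;
	int startipi;
	int endipi;
	int ilength;
};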

As the code shows, CreateCode and Release are pure virtual functions. CreateCode generates the code for each node according to its own logic. AddChilder is an ordinary virtual function with an empty default body, because only some expressions have sub-expressions. The contents of a module, for example, are sub-expressions of the module expression, while a declaration such as var $a; has no sub-expressions at all.
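The split between leaf and composite nodes can be sketched with two toy subclasses. The names VarDecl and ModuleNode, the text output passed into CreateCode, and setting ParentNode inside AddChilder are all assumptions for illustration; they are not menthol's real classes, and the base class is trimmed to what the example needs.

```cpp
#include <string>
#include <utility>
#include <vector>

// Trimmed-down Statement for illustration only.
class Statement {
public:
    virtual ~Statement() {}
    virtual void CreateCode(std::string& out) = 0;  // sketch: emit text, not bytecode
    virtual void AddChilder(Statement*) {}          // default: leaf node, nothing to add
    Statement* ParentNode = nullptr;
};

// Leaf node, e.g. `var $a;`: it has no sub-expressions.
class VarDecl : public Statement {
public:
    explicit VarDecl(std::string n) : name(std::move(n)) {}
    void CreateCode(std::string& out) override { out += "DECLARE " + name + "\n"; }
private:
    std::string name;
};

// Composite node, e.g. a module: its body statements are children.
class ModuleNode : public Statement {
public:
    ~ModuleNode() override { for (Statement* s : members) delete s; }
    void AddChilder(Statement* s) override {
        s->ParentNode = this;  // assumption: the child is linked back to its parent
        members.push_back(s);
    }
    void CreateCode(std::string& out) override {
        out += "MODULE BEGIN\n";
        for (Statement* s : members) s->CreateCode(out);  // each child emits its own code
        out += "MODULE END\n";
    }
private:
    std::vector<Statement*> members;
};
```

Calling CreateCode on the module is enough to generate the whole subtree, because each node only needs to know how to emit itself and delegate to its children.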

NType records which kind of node the current expression is. The node kinds (if, while, for and so on) are enumerated as follows:

enum NodeType{
	MNT_FunctionParameter,//Function parameters
	MNT_FunctionParameterWithDefault,//Function parameters with default values
	MNT_TryParameter,//try
	MNT_VarIdentIfier,//$local variable
	MNT_FunctionParameterStatement,//Function parameter set
	MNT_ExpressionStatement,
	MNT_ExpressionList,
	MNT_AssignmentDefinition,
	MNT_InitializationDefinition,
	MNT_AssignmentList,
	MNT_InitializationExpression,
	MNT_InitializationList,
	MNT_BuiltinTypeDeclare,
	MNT_ArithmeticExpressionDefinition,
	MNT_IfStatement,
	MNT_WhileStatement,
	MNT_ForStatement,
	MNT_ArrayDeclare,
	MNT_DictDeclare,
	MNT_ArrayElement,
	MNT_DictElement,
	MNT_ContinueExpression,
	MNT_BreakExpression,
	MNT_FunctionDefinition,
	MNT_ReturnExpression,//return
	MNT_TryStatement,
	MNT_ThrowExpression,
	MNT_CodeBlockStatement,
	MNT_ImportPackageExpression,
	MNT_ModuleExpresson,
	MNT_ModuleFunCall,//Execution function
	MNT_ModuleFunctionDefinition,//Function definition in module
	MNT_TernaryExpression,//Ternary expression
	MNT_DictExpression,//
	MNT_FunctionCall,
	MNT_FunctionArguments,
	MNT_LogiceEpressionDefintion,
	MNT_MinusExpression,
	MNT_PlusExpression,
	MNT_Release,
	MNT_InverterExpression,
	MNT_TypeOfExpression,
	MNT_ModuleStatementList,
	MNT_ModuleDefine,//Module definition
	MNT_MainFunction,//main function
	MNT_InstanceExpression
};

NType is used to check whether certain expressions are legal. For example, break may only appear inside a while or for loop.
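One way such a check could work is to walk up the ParentNode chain from the break node until a loop is found. This is a sketch, assuming ParentNode is filled in when children are attached; the post does not show how menthol actually performs the check, and stopping at a function boundary is also an assumption. The enum below is only a subset of NodeType.

```cpp
// Subset of the NodeType values listed above.
enum NodeType { MNT_BreakExpression, MNT_WhileStatement, MNT_ForStatement,
                MNT_IfStatement, MNT_FunctionDefinition, MNT_CodeBlockStatement };

struct Node {
    NodeType NType;
    Node* ParentNode;  // nullptr at the top level
};

// Returns true if a break (or continue) at `n` has an enclosing while or for.
// The search stops at a function boundary: a loop outside the function does
// not make a break inside the function legal.
bool IsInsideLoop(const Node* n) {
    for (const Node* p = n->ParentNode; p != nullptr; p = p->ParentNode) {
        if (p->NType == MNT_WhileStatement || p->NType == MNT_ForStatement) return true;
        if (p->NType == MNT_FunctionDefinition) return false;
    }
    return false;
}
```

A break nested inside an if inside a while passes the check, while a break placed directly in a function body is rejected.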

Let's illustrate how Statement is used with an example, starting with the Bison rule for while:

while_statement: WHILE  '(' expression_definition ')' funciton_codeblock_statement
{
	$$ = new WhileStatement();
	$$->AddChilder($3);
	$$->AddChilder($5);
}
;

The class that represents a parsed while statement is defined as follows:

class WhileStatement:public Statement{
public:
	WhileStatement();//constructor
	~WhileStatement();
	void AddChilder(Statement* s);//adds a sub-expression to Member
	void CreateCode();//generates the code for the loop
	int GetJmpPostion();
	int GetTemplateid();
	void SetBreakPostion(int b);
	void Release();
private:
	vector <Statement*> *Member;//child members: expression_definition and funciton_codeblock_statement
	int postion1;
	int postion2;
	int templateid;
	vector<int> *breakpostionvector;
};
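The postion1, postion2 and breakpostionvector members suggest the usual way of compiling a loop: remember where the condition starts, emit a conditional exit jump whose target is not yet known, compile the body, jump back, and then backpatch the exit jump and every break jump once the end of the loop is known. The sketch below shows that backpatching technique with a made-up instruction set (COND, JZ, JMP); the class LoopEmitter is hypothetical and this is not menthol's actual CreateCode.

```cpp
#include <string>
#include <vector>

// Hypothetical instruction: opcode plus an integer operand (a jump target).
struct Instr {
    std::string op;
    int arg;
};

class LoopEmitter {
public:
    std::vector<Instr> code;

    int Here() const { return static_cast<int>(code.size()); }
    int Emit(const std::string& op, int arg = 0) {
        code.push_back({op, arg});
        return Here() - 1;  // index of the instruction just emitted
    }
    // Called for each `break` in a loop body: the jump target is unknown yet.
    void EmitBreak() { breakPositions.push_back(Emit("JMP", -1)); }

    // Compiles: while (cond) body
    template <typename CondFn, typename BodyFn>
    void EmitWhile(CondFn emitCond, BodyFn emitBody) {
        std::vector<int> outerBreaks;
        outerBreaks.swap(breakPositions);  // each loop collects only its own breaks
        int loopStart = Here();            // like postion1: start of the condition
        emitCond(*this);
        int exitJump = Emit("JZ", -1);     // like postion2: patched below
        emitBody(*this);
        Emit("JMP", loopStart);            // jump back and re-test the condition
        int loopEnd = Here();
        code[exitJump].arg = loopEnd;      // backpatch the exit jump
        for (int pos : breakPositions) code[pos].arg = loopEnd;  // and every break
        breakPositions.swap(outerBreaks);  // restore the enclosing loop's breaks
    }

private:
    std::vector<int> breakPositions;       // like breakpostionvector
};
```

Saving and restoring the pending break list is what keeps nested loops correct: a break in an inner loop is patched to the inner loop's end, not the outer one's.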

Other constructs such as if, for and module are defined in much the same way.

During parsing, every Statement-derived node is added, following the tree hierarchy, to the CompileStructTable member of the StatementList class:

vector <Statement*> *CompileStructTable;

Once the whole program has been parsed, StatementList::CreateCode is called. It invokes the CreateCode method of every node to generate the compiled output (an assembly-like instruction syntax defined by menthol itself), which is then written to a file in a specific format. The file format will be covered later.
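The driver side of this can be sketched as a StatementList that owns the top-level nodes, asks each one to generate its code in source order, and writes the result to a stream. The Instruction node, the Add method, and the one-instruction-per-line text output are hypothetical stand-ins for menthol's real node types and file format; also, the original stores CompileStructTable as a pointer to a vector, while this sketch uses a plain member.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Trimmed Statement: each node appends its instructions to `out`.
class Statement {
public:
    virtual ~Statement() {}
    virtual void CreateCode(std::vector<std::string>& out) = 0;
};

// Hypothetical leaf node that emits a single instruction.
class Instruction : public Statement {
public:
    explicit Instruction(std::string t) : text(std::move(t)) {}
    void CreateCode(std::vector<std::string>& out) override { out.push_back(text); }
private:
    std::string text;
};

class StatementList {
public:
    ~StatementList() { for (Statement* s : CompileStructTable) delete s; }
    void Add(Statement* s) { CompileStructTable.push_back(s); }
    // Generate code for every top-level node in source order, then write it out.
    void CreateCode(std::ostream& file) {
        std::vector<std::string> out;
        for (Statement* s : CompileStructTable) s->CreateCode(out);
        for (const std::string& line : out) file << line << '\n';  // hypothetical text format
    }
private:
    std::vector<Statement*> CompileStructTable;  // top-level nodes, owned here
};
```

Any std::ostream works as the target, so the same code can write to an std::ofstream for a real file or to an std::ostringstream when testing.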

Now let's go through the Bison rules.

1. VARIDENTIFIER: variables, used for parameter names, variable names and so on
VARIDENTIFIER{
	$$ =new VarIdentIfier($1);
}

When the lexer encounters a variable identifier, that is, a name starting with $, the variable name is passed as an argument to the VarIdentIfier constructor:

//Parser.cpp
VarIdentIfier::VarIdentIfier(string s){
	wfileaddressline = lineno;
	name = s;
	NType = MNT_VarIdentIfier;
}

2. IDENTIFIER is used in many places, such as function names, package names and attribute names. Its rule records the name as a string constant and wraps it in a BuiltinTypeDeclare node:

IDENTIFIER {
	StatementList *ls = (StatementList*)parm;
	ls->AddStringConstant(string($1));
	BuiltinTypeDeclare* btd =new BuiltinTypeDeclare();
	btd->SetFunctionPointerOrModule($1,1);
	$$= btd;
}

For example, when importing a package:

import aaa

the package name aaa is matched as an IDENTIFIER.

3. GLOBALVARIDENTIFIER is used for a module's global variables, whose names start with @
//bison.y
GLOBALVARIDENTIFIER{
	StatementList *ls = (StatementList*)parm;
	ls->AddStringConstant(string($1));	//keep the name in the data section
	$$ = new VarIdentIfier($1);	//the constructor already takes a string
}
;

The parser handles this exactly like a local variable, with one extra step: the global variable's name is stored as a string in the program's data section as constant data. This lets the virtual machine determine at run time whether a module has a given global variable.
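A deduplicating string table is one simple way AddStringConstant could be implemented. In this sketch the class name StringConstantPool, the returned index, and the HasConstant lookup (standing in for the virtual machine's run-time check) are all assumptions; the post only says that the names end up in the data section.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

class StringConstantPool {
public:
    // Store `s` if it is new; return its index in the data section either way.
    int AddStringConstant(const std::string& s) {
        auto it = index.find(s);
        if (it != index.end()) return it->second;  // already stored: reuse the slot
        int id = static_cast<int>(data.size());
        data.push_back(s);
        index.emplace(s, id);
        return id;
    }
    // Run-time style query: is this name present in the data section?
    bool HasConstant(const std::string& s) const { return index.count(s) != 0; }
    const std::vector<std::string>& Data() const { return data; }

private:
    std::vector<std::string> data;               // ordered contents of the data section
    std::unordered_map<std::string, int> index;  // name -> position in `data`
};
```

Deduplication matters here because the same identifier, such as a function name used in many calls, would otherwise be stored once per occurrence.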

With this foundation in place, we can start parsing the grammar step by step.

That's all for now. The next section starts with expression parsing.

Posted on Mon, 06 Sep 2021 18:15:26 -0400 by play_