catalog
Lexical analysis: Lexical analysis -- TEST compiler (1)
Parsing: Syntax analysis -- TEST compiler (2)
Semantic analysis: Semantic analysis -- TEST compiler (3)
1 virtual machine
1.1 function
Read the intermediate code and execute it. The virtual machine here simulates how the compiled TEST code would run on a real machine.
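As a rough illustration of the "read" half of that job, the sketch below parses an intermediate-code listing held in a string. It assumes every instruction is written as an opcode followed by an integer operand, separated by whitespace. The names `Instr` and `parseCode` are made up for this example and do not appear in the program further down.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One intermediate-code instruction: an opcode name and an integer operand.
struct Instr {
    std::string opt;
    int operand;
};

// Hypothetical helper: parse "OPCODE operand" pairs from a text listing.
// Parsing stops at the first pair that cannot be read completely.
std::vector<Instr> parseCode(const std::string& text) {
    std::istringstream in(text);
    std::vector<Instr> code;
    Instr ins;
    while (in >> ins.opt >> ins.operand) {
        code.push_back(ins);
    }
    return code;
}
```

Checking the result of each read, instead of testing for end of file first, avoids counting one instruction too many when the listing ends with a newline.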
1.2 characteristics
- Instructions and their operands are read into the struct array code
- Execution operates on an operand stack
- The top register holds the index of the next free unit above the stack top, and the base register holds the starting address of the current function's data area in the stack
- Errors are generally not expected here, because they are reported in the earlier phases; any program that reaches the virtual machine has already passed them
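The roles of the two registers can be sketched as a small struct. This is only an illustration under the description above: the field names mirror `stack`, `top`, and `base`, while the struct `OperandStack` and its member functions are hypothetical.

```cpp
#include <cassert>

// Sketch of the operand stack and its two registers.
struct OperandStack {
    int stack[1000] = {0};
    int top = 0;   // index of the next free unit above the stack top
    int base = 0;  // start of the current function's data area

    // Push a value: write to the free unit, then move top up.
    void push(int v) { assert(top < 1000); stack[top++] = v; }
    // Pop a value: move top down, then read the unit it now points at.
    int pop() { assert(top > 0); return stack[--top]; }
    // Variables are addressed relative to base.
    void loadVar(int offset) { push(stack[base + offset]); }
    void storeVar(int offset) { stack[base + offset] = pop(); }
};
```

Because top always points one past the last value, pushing never overwrites live data, and the two operands of a binary instruction sit at `top-1` and `top-2`.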
1.3 design ideas
- The virtual machine executes by reading intermediate-code instructions one after another, so the key is to implement what each instruction does and to determine which instruction comes next. You only need to read the intermediate code one instruction at a time and treat each instruction as a branch that performs the corresponding function according to its name.
- How to determine which instruction is next: if you have studied computer organization, you know that a running program has a program counter, ip, which determines the next instruction. The virtual machine uses the same idea. Normally it reads the intermediate code in order and adds one to ip for each instruction read. When it encounters a jump instruction such as BR or BRF, ip is assigned the instruction's operand directly (for BRF, only when the stack top is false), and execution continues in order from there.
- Principle of the operand stack: the operand stack records how the value of each variable changes while the program runs, and it holds the values used to evaluate conditions. Since it is a stack, all operations happen at the top. For example, STO assigns the value of the stack top element to the variable identified by its operand. The operand stack also handles allocating space for functions. Every time a function is entered, it needs space for the return location, the caller's base address, and the function's local variables. With these, the virtual machine knows where to continue after the call finishes. In the code, the saved base address sits at the first location of the function's area and the return location at the second. How much space to allocate is determined during semantic analysis.
- Because there are many kinds of instructions, you could dispatch on them with a long chain of if statements or with a switch, but which is better? I read a blog post comparing if and switch, which reported that switch finished about 2.33 times faster than if on average. The explanation given is that a switch takes the variable out for comparison only once, whereas an if chain takes the variable and the condition out for comparison at every test, so a long if chain is slower. The more branches there are, the more noticeable the advantage of switch becomes. Therefore, the virtual machine uses switch to select among instructions. To make the code easier to read and understand, an enumeration type can represent each case.
- Use switch for branch selection, but how should each case be represented? Using the numbers 1, 2, 3 and so on for the instructions would make the code hard to read and understand, so that is not suitable. Using the string form of each instruction is easy to read but a bit troublesome, since a switch cannot branch on strings. Therefore, we combine the two and use an enumeration type to represent each instruction, which makes both equality checks and code reading convenient. In my understanding, an enumeration is a bit like the #define directive: giving something a name makes the meaning clear when reading the code. After thinking about it, I believe the advantage of an enumeration is that when there are many names to define, writing a #define for each one would make the code long and troublesome. In addition, enumeration values do not need strcmp the way strings do; you can compare them directly with ==, which is very convenient.
- When I started writing the virtual machine, I kept struggling with whether the symbol table generated by semantic analysis had to be passed to it. I thought that without the symbol table I could not locate a specific identifier. After going over the principle of memory allocation several times, I realized there is no need to pass it in. The symbol table only has to be consulted when the intermediate code is generated and an operand is needed. For example, semantic analysis already resolves the operand of STO to the variable's relative address. All the virtual machine needs is to distinguish variables by relative address, so it never has to know the variable's name. The virtual machine therefore only takes the intermediate code as input, not the symbol table.
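The design ideas above can be pulled together in a reduced sketch: an ip that advances by one unless a jump assigns it, a switch over an enumeration, variables addressed as base plus relative address, and a two-slot frame header written by CAL. It covers only a subset of the instructions. The function `run`, the `Instr` struct, and the choice to collect OUT values in a vector instead of printing them are assumptions made for this illustration. It also assumes that the operand of ENTER counts the two header slots.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum Op { LOAD, LOADI, STO, ADD, LES, BR, BRF, OUT, CAL, ENTER, RETURN };

struct Instr {
    std::string opt;
    int operand;
};

// Hypothetical helper: run a program and return the values passed to OUT.
// An unknown opcode makes ops.at() throw std::out_of_range.
std::vector<int> run(const std::vector<Instr>& code) {
    static const std::map<std::string, Op> ops = {
        {"LOAD", LOAD}, {"LOADI", LOADI}, {"STO", STO}, {"ADD", ADD},
        {"LES", LES}, {"BR", BR}, {"BRF", BRF}, {"OUT", OUT},
        {"CAL", CAL}, {"ENTER", ENTER}, {"RETURN", RETURN}};
    std::vector<int> out;
    int stack[1000] = {0};
    int top = 0, base = 0, ip = 0;
    while (ip < static_cast<int>(code.size())) {
        const Instr& ins = code[ip++];  // default: continue with the next instruction
        switch (ops.at(ins.opt)) {
            case LOAD:  stack[top++] = stack[base + ins.operand]; break;
            case LOADI: stack[top++] = ins.operand; break;
            case STO:   stack[base + ins.operand] = stack[--top]; break;
            case ADD:   stack[top - 2] += stack[top - 1]; --top; break;
            case LES:   stack[top - 2] = stack[top - 2] < stack[top - 1]; --top; break;
            case BR:    ip = ins.operand; break;  // jump: overwrite ip
            case BRF:   if (stack[--top] == 0) ip = ins.operand; break;
            case OUT:   out.push_back(stack[--top]); break;
            case CAL:   // frame header: caller's base, then return position
                stack[top] = base;
                stack[top + 1] = ip;
                base = top;
                ip = ins.operand;
                break;
            case ENTER: top += ins.operand; break;  // reserve header and locals
            case RETURN:  // drop the frame, restore ip and the caller's base
                top = base;
                ip = stack[top + 1];
                base = stack[top];
                break;
        }
    }
    return out;
}
```

For example, given the nine instructions BR 5, ENTER 2, LOADI 42, OUT 0, RETURN 0, ENTER 2, CAL 1, LOADI 7, OUT 0, `run` first jumps over the function body, calls it from the main part, and collects 42 followed by 7. No symbol table is involved: every variable access is just `base` plus the operand.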
```cpp
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <string>
#include <map>
using namespace std;

struct Code {  // One intermediate-code instruction
    char opt[10];
    int operand;
};
Code code[1000];            // Stores the intermediate code
map<string, int> choseOpt;  // Maps each opcode string to the enum value used in the switch

// Named opcodes instead of 1, 2, 3, 4 make the code easier to read
enum opts {LOAD, LOADI, STO, ADD, SUB, MULT, DIV, BR, BRF, EQ, NOTEQ, GT, LES, GE, LE,
           AND, OR, NOT, IN, OUT, CAL, ENTER, RETURN};

// Initialize the map that selects the case in the switch
void mapInit() {
    choseOpt["LOAD"] = LOAD;
    choseOpt["LOADI"] = LOADI;
    choseOpt["STO"] = STO;
    choseOpt["ADD"] = ADD;
    choseOpt["SUB"] = SUB;
    choseOpt["MULT"] = MULT;
    choseOpt["DIV"] = DIV;
    choseOpt["BR"] = BR;
    choseOpt["BRF"] = BRF;
    choseOpt["EQ"] = EQ;
    choseOpt["NOTEQ"] = NOTEQ;
    choseOpt["GT"] = GT;
    choseOpt["LES"] = LES;
    choseOpt["GE"] = GE;
    choseOpt["LE"] = LE;
    choseOpt["AND"] = AND;
    choseOpt["OR"] = OR;
    choseOpt["NOT"] = NOT;
    choseOpt["IN"] = IN;
    choseOpt["OUT"] = OUT;
    choseOpt["CAL"] = CAL;
    choseOpt["ENTER"] = ENTER;
    choseOpt["RETURN"] = RETURN;
}

// Virtual machine
void TESTmachine() {
    FILE *in;
    char codein[100];       // Input file name
    int codenum = 0;        // Number of instructions
    int top = 0, base = 0;  // Stack top and base address of the current function
    int ip = 0;             // Position of the current instruction
    int stack[1000];        // Operand stack

    printf("Please enter the destination file name (including path):");
    scanf("%99s", codein);
    if ((in = fopen(codein, "r")) == NULL) {  // Open the input file
        printf("\nOpen %s error!\n", codein);
        exit(-1);  // Stop if the file cannot be opened
    }
    // Read the intermediate code; stop when no further opcode can be read
    while (codenum < 1000 && fscanf(in, "%9s", code[codenum].opt) == 1) {
        fscanf(in, "%d", &code[codenum].operand);  // Operand stays 0 if none follows
        codenum++;
    }
    fclose(in);
    // for (int i = 0; i < codenum; i++)
    //     printf("%s %d\n", code[i].opt, code[i].operand);

    mapInit();                        // Initialize the map
    memset(stack, 0, sizeof(stack));  // Clear the operand stack

    // Execute until the last instruction, which marks the end of the main function
    while (ip < codenum) {
        Code temp = code[ip];  // Work on a copy of the current instruction
        ip++;                  // By default, continue with the next instruction
        switch (choseOpt[temp.opt]) {  // Select the action by opcode
            case LOAD: {  // LOAD D: push the contents of D onto the operand stack
                stack[top] = stack[temp.operand + base];  // Locate the variable in the stack
                top++;
                break;
            }
            case LOADI: {  // LOADI a: push the constant a onto the operand stack
                stack[top] = temp.operand;
                top++;
                break;
            }
            case STO: {  // STO D: pop the stack top and store it in D
                top--;
                stack[temp.operand + base] = stack[top];
                break;
            }
            case ADD: {  // Pop the top two units, add them, push the sum
                stack[top-2] += stack[top-1];
                top--;
                break;
            }
            case SUB: {  // Second-from-top minus top; push the difference
                stack[top-2] = stack[top-2] - stack[top-1];
                top--;
                break;
            }
            case MULT: {  // Pop the top two units, multiply them, push the product
                stack[top-2] = stack[top-1] * stack[top-2];
                top--;
                break;
            }
            case DIV: {  // Second-from-top divided by top; push the quotient
                stack[top-2] = stack[top-2] / stack[top-1];
                top--;
                break;
            }
            case BR: {  // BR lab: jump unconditionally to lab
                ip = temp.operand;  // The operand is the jump target
                break;
            }
            case BRF: {  // BRF lab: jump to lab if the stack top is false (0)
                if (stack[top-1] == 0)
                    ip = temp.operand;  // The operand is the jump target
                top--;
                break;
            }
            case EQ: {  // Compare the top two units for equality; push 1 or 0
                stack[top-2] = (stack[top-2] == stack[top-1]);
                top--;
                break;
            }
            case NOTEQ: {  // Compare the top two units for inequality; push 1 or 0
                stack[top-2] = (stack[top-2] != stack[top-1]);
                top--;
                break;
            }
            case GT: {  // Push 1 if second-from-top > top, otherwise 0
                stack[top-2] = (stack[top-2] > stack[top-1]);
                top--;
                break;
            }
            case LES: {  // Push 1 if second-from-top < top, otherwise 0
                stack[top-2] = (stack[top-2] < stack[top-1]);
                top--;
                break;
            }
            case GE: {  // Push 1 if second-from-top >= top, otherwise 0
                stack[top-2] = (stack[top-2] >= stack[top-1]);
                top--;
                break;
            }
            case LE: {  // Push 1 if second-from-top <= top, otherwise 0
                stack[top-2] = (stack[top-2] <= stack[top-1]);
                top--;
                break;
            }
            case AND: {  // Logical AND of the top two units; push 1 or 0
                stack[top-2] = (stack[top-2] && stack[top-1]);
                top--;
                break;
            }
            case OR: {  // Logical OR of the top two units; push 1 or 0
                stack[top-2] = (stack[top-2] || stack[top-1]);
                top--;
                break;
            }
            case NOT: {  // Negate the logical value at the stack top
                stack[top-1] = !stack[top-1];
                break;
            }
            case IN: {  // Read an integer from standard input and push it
                printf("input data:\n");
                scanf("%d", &stack[top]);
                top++;
                break;
            }
            case OUT: {  // Pop the stack top and write it to standard output
                printf("Output:%d\n", stack[top-1]);
                top--;
                break;
            }
            case CAL: {  // Call a function
                stack[top] = base;    // Save the caller's base address
                stack[top+1] = ip;    // Save the position to return to after the call
                ip = temp.operand;    // Jump to the start of the function
                base = top;           // The new frame starts here
                break;
            }
            case ENTER: {  // Enter the function body
                top += temp.operand;  // Reserve space for the function
                break;
            }
            case RETURN: {  // Return from a function
                top = base;           // Release the space reserved by the function
                ip = stack[top+1];    // The second slot holds the return position
                base = stack[top];    // The first slot holds the caller's base address
                break;
            }
        }
    }
}

int main() {
    TESTmachine();
    return 0;
}
```
3 Summary
- This is the whole content of the TEST compiler experiment. To be honest, in the theory class I could not understand what was being taught at all. Only after typing out this code did it make sense. It is also much easier to study compiler principles after learning computer organization.
- In a while I will take part in a compiler competition, and I hope to write much better code by then. To be honest, when I wrote this code I was so put off by the teacher that I was not in the mood to do it well and just got it done. It still needs a lot of optimization, and there are many unreasonable places in the code even though it passes the inspection.