Getting started with ANTLR4: JSON

In this chapter, we will learn how to write a JSON grammar file, that is, how to construct a complete grammar by reading the reference manual, sample code, and existing non-ANTLR grammars. Then we'll use a listener or a visitor to translate JSON into XML.

Note: JSON is a data format for storing key-value pairs. Because a value can itself be a container of key-value pairs, JSON can contain nested structures.

1, Top-down design: writing the JSON grammar

In this section, our goal is to read the JSON reference manual, study its syntax diagrams and existing grammars, and construct an ANTLR grammar that can parse JSON. We will extract the key phrases from the JSON reference manual and turn them into ANTLR rules step by step.

A JSON file can be an object or an array of values.

Grammatically, this is just a choice pattern, so we can express it with the following rule:

// A JSON file can be an object or an array of values
json : object
     | array
     ;

The next step is to break down the subrules referenced by the json rule. For objects, the JSON reference defines the syntax as follows:

An object is an unordered collection of key-value pairs. An object begins with a left curly bracket { and ends with a right curly bracket }. Each key is followed by a colon :, and key-value pairs are separated by commas.

The syntax diagram on the official JSON website also makes clear that an object's keys must be strings. To turn the natural-language description above into grammar, we break it apart and look for key phrases that indicate which pattern to use. "An object is" in the first sentence clearly tells us to create a rule named object. Next, "a collection of key-value pairs" is really a sequence of pairs. The word "unordered" describes the semantics of the keys, namely that their order carries no meaning, so it does not affect the grammar. The second sentence introduces a token dependency: an object starts with a left curly bracket and ends with a matching right one. The last sentence pins down the details of the pair sequence: it is separated by commas. At this point, we can write the following grammar in ANTLR notation:

// An object is an unordered collection of key-value pairs. An object begins with a left curly bracket { and ends with a right curly bracket }.
// Each key is followed by a colon, and key-value pairs are separated by commas
object : '{' pair (',' pair)* '}' 
       | '{' '}' 
       ;
pair : STRING ':' value;

Next, let's look at the other high-level structure in JSON: the array. The reference describes arrays as follows:

An array is an ordered collection of values. An array begins with a left square bracket and ends with a right square bracket. Values are separated by commas.

As with the object rule, array contains a comma-separated sequence pattern and a token dependency between the left and right square brackets.

// An array is an ordered collection of values. An array begins with an open square bracket and ends with a right square bracket.
// Values are separated by commas
array : '[' value (',' value)* ']'
      | '[' ']'
      ;

Building on the rules above, we now need to write the value rule. The JSON reference manual describes a value as follows:

A value can be a double-quoted string, a number, true or false, null, an object, or an array.

This is again a simple choice pattern.

// A value can be a double-quoted string, a number, true or false, null, an object, or an array.
value : STRING
      | NUMBER
      | 'true'
      | 'false'
      | 'null'
      | object
      | array
      ;

Because the value rule references object and array, which in turn reference value, it is (indirectly) recursive. Those are all the parser rules needed for JSON. Now let's turn to the lexer rules.

According to the JSON reference, a string is defined as follows:

A string is a sequence of zero or more Unicode characters wrapped in double quotes, using backslash escapes. A single character is represented as a string of length 1.

JSON strings are very similar to C/Java strings. In the previous article we already wrote an ANTLR lexer rule for strings; the JSON definition only adds Unicode escapes to it. The JSON reference manual lists the characters that may follow a backslash: ", \, /, b, f, n, r, t, and u followed by four hexadecimal digits.

Therefore, our string rule is defined as follows:

// A string is a sequence of zero or more Unicode characters surrounded by double quotation marks, where characters are escaped with a backslash.
// A single character is represented by a string of length 1
STRING : '"' (ESC | ~["\\])* '"';
fragment ESC : '\\' (["\\/bfnrt] | UNICODE);
fragment UNICODE : 'u' HEX HEX HEX HEX;
fragment HEX : [0-9a-fA-F];

Here the ESC fragment rule matches either a predefined escape character or a Unicode sequence. In the UNICODE fragment rule, we use a HEX fragment rule so that the set of hexadecimal digits does not have to be written out four times.
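
To get a feel for what the STRING rule accepts, it can help to try the same pattern outside ANTLR. The following is only a sketch: the class name JsonStringCheck and the regular expression are my own rough equivalent of the lexer rule, not something ANTLR generates, and a regex matched against a whole string does not reproduce the lexer's longest-match tokenization.

```java
import java.util.regex.Pattern;

class JsonStringCheck {
    // Hypothetical regex mirroring STRING : '"' (ESC | ~["\\])* '"' ;
    // First branch: a backslash followed by a predefined escape character
    // or by 'u' and four hex digits. Second branch: any character except
    // a double quote or a backslash.
    private static final Pattern STRING = Pattern.compile(
        "\"(?:\\\\(?:[\"\\\\/bfnrt]|u[0-9a-fA-F]{4})|[^\"\\\\])*\"");

    // True if the whole text is a single JSON string literal, quotes included.
    static boolean isJsonString(String text) {
        return STRING.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isJsonString("\"Li\""));      // plain string
        System.out.println(isJsonString("\"a\\nb\""));   // predefined escape
        System.out.println(isJsonString("\"\\x\""));     // unknown escape
        System.out.println(isJsonString("\"open"));      // missing closing quote
    }
}
```

The first two calls are accepted and the last two are rejected, which matches what the ESC alternatives allow.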

The last lexical symbol to write is NUMBER.

// A number is very similar to a number in C/Java except that octal and hexadecimal are not allowed
NUMBER
    :   '-'? INT '.' [0-9]+ EXP? // 1.35, 1.35E-9, 0.3, -4.5
    |   '-'? INT EXP             // 1e10 -3e4
    |   '-'? INT                 // -3, 45
    ;
fragment INT :   '0' | [1-9] [0-9]* ; // no leading zeros, except for zero itself
fragment EXP :   [Ee] [+\-]? INT ; // \- escapes the -, which otherwise denotes a range inside [...]
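
The three NUMBER alternatives can be folded into one pattern: an INT, an optional fraction, and an optional exponent. The sketch below checks that reading with a plain Java regular expression; the class name JsonNumberCheck and the regex are assumptions made for illustration and are not part of the grammar or of ANTLR's output.

```java
import java.util.regex.Pattern;

class JsonNumberCheck {
    // Hypothetical regex equivalent of the NUMBER rule:
    //   '-'? INT ('.' [0-9]+)? EXP?
    // with INT = '0' | [1-9][0-9]*  and  EXP = [Ee] [+-]? INT
    private static final Pattern NUMBER = Pattern.compile(
        "-?(?:0|[1-9][0-9]*)(?:\\.[0-9]+)?(?:[Ee][+-]?(?:0|[1-9][0-9]*))?");

    // True if the whole text would be matched as one NUMBER.
    static boolean isJsonNumber(String text) {
        return NUMBER.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Examples taken from the comments in the grammar, plus two rejects.
        String[] samples = {"1.35", "1.35E-9", "0.3", "-4.5", "1e10", "-3e4", "-3", "45", "007", "0x1F"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isJsonNumber(sample));
        }
    }
}
```

All the examples from the grammar comments are accepted, while 007 (leading zeros) and 0x1F (hexadecimal) are rejected as single numbers.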

Unlike the CSV grammar in the previous chapter, the JSON grammar needs to skip the whitespace that may appear between tokens.

WS  :   [ \t\n\r]+ -> skip ;

At this point, the JSON grammar is complete. The following is the full grammar file, with each alternative labeled:

grammar JSON;

// A JSON file can be an object or an array of values
json : object
     | array
     ;

// An object is an unordered collection of key-value pairs. An object begins with a left curly bracket { and ends with a right curly bracket }.
// Each key is followed by a colon, and key-value pairs are separated by commas
object : '{' pair (',' pair)* '}'   #AnObject
       | '{' '}'                    #EmptyObject // empty object
       ;
pair : STRING ':' value;

// An array is an ordered collection of values. An array begins with an open square bracket and ends with a right square bracket.
// Values are separated by commas
array : '[' value (',' value)* ']'  #ArrayOfValues
      | '[' ']'                     #EmptyArray // empty array
      ;

// A value can be a double-quoted string, a number, true or false, null, an object, or an array.
value : STRING  #String
      | NUMBER  #Atom
      | 'true'  #Atom
      | 'false' #Atom
      | 'null'  #Atom
      | object  #ObjectValue
      | array   #ArrayValue
      ;

// A string is a sequence of zero or more Unicode characters surrounded by double quotation marks, where characters are escaped with a backslash.
// A single character is represented by a string of length 1
STRING : '"' (ESC | ~["\\])* '"';
fragment ESC : '\\' (["\\/bfnrt] | UNICODE);
fragment UNICODE : 'u' HEX HEX HEX HEX;
fragment HEX : [0-9a-fA-F];

// A number is very similar to a number in C/Java except that octal and hexadecimal are not allowed
NUMBER
    :   '-'? INT '.' [0-9]+ EXP? // 1.35, 1.35E-9, 0.3, -4.5
    |   '-'? INT EXP             // 1e10 -3e4
    |   '-'? INT                 // -3, 45
    ;
fragment INT :   '0' | [1-9] [0-9]* ; // no leading zeros
fragment EXP :   [Ee] [+\-]? INT ; // \- since - means "range" inside [...]

WS  :   [ \t\n\r]+ -> skip ;

Let's use the ANTLR tool to test it.

2, Convert JSON to XML

In this section we will build a JSON-to-XML translator. For the following JSON input, the expected output is:

Here, <element> is a tag that we need to generate ourselves during translation.

Because listener methods cannot return values (their return type is void), we use a ParseTreeProperty to store intermediate results.
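
The idea of annotating tree nodes can be sketched without ANTLR. As far as I know, ParseTreeProperty keys its values by node identity rather than by equals(), which a plain IdentityHashMap reproduces. In the sketch below, Node is a hypothetical stand-in for a parse-tree node and NodeAnnotations is an illustrative name, not an ANTLR class.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class NodeAnnotations {
    // Hypothetical stand-in for a parse-tree node; two nodes with the same
    // text are equal() but are still distinct objects.
    static final class Node {
        final String text;
        Node(String text) { this.text = text; }
        @Override public boolean equals(Object o) {
            return o instanceof Node && ((Node) o).text.equals(text);
        }
        @Override public int hashCode() { return text.hashCode(); }
    }

    // Identity-based map: each node object gets its own slot.
    private final Map<Node, String> xml = new IdentityHashMap<>();

    void setXml(Node node, String value) { xml.put(node, value); }

    String getXml(Node node) { return xml.get(node); }

    public static void main(String[] args) {
        NodeAnnotations notes = new NodeAnnotations();
        Node first = new Node("1");
        Node second = new Node("1");   // same text, different node
        notes.setXml(first, "from first");
        notes.setXml(second, "from second");
        System.out.println(notes.getXml(first));
        System.out.println(notes.getXml(second));
    }
}
```

Identity matters here because a JSON document can contain the same text many times (for example two array elements that are both 1), and each occurrence needs its own stored translation.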

We start with the simplest rules. The Atom alternative of the value rule matches a single token, and its translation is just the token's text. For it, we only need to store that text in the ParseTreeProperty.

    @Override
    public void exitAtom(JSONParser.AtomContext ctx) {
        setXml(ctx, ctx.getText());
    }

For strings, we need to do one extra thing: strip the surrounding double quotes.

    @Override
    public void exitString(JSONParser.StringContext ctx) {
        setXml(ctx, stripQuotes(ctx.getText()));
    }

As for the ObjectValue and ArrayValue alternatives of the value rule, we only need to pass along the results already computed for the object and array rules.

    @Override
    public void exitObjectValue(JSONParser.ObjectValueContext ctx) {
        // Analogous to: String value() { return object(); }
        setXml(ctx,getXml(ctx.object()));
    }

    @Override
    public void exitArrayValue(JSONParser.ArrayValueContext ctx) {
        setXml(ctx,getXml(ctx.array()));
    }

Once all alternatives of the value rule are translated, we need to process the key-value pairs and convert them into tags and text. In STRING ':' value, the STRING becomes the XML tag name and the value becomes the tag content. The translation is therefore:

    @Override
    public void exitPair(JSONParser.PairContext ctx) {
        String tag = stripQuotes(ctx.STRING().getText());
        String value = String.format("<%s>%s</%s>\n",tag,getXml(ctx.value()),tag);
        setXml(ctx,value);
    }

As for the object rule, we know that it consists of a series of key-value pairs. We therefore loop over the pairs and append the XML stored for each of them to the result for the object node.

    @Override
    public void exitAnObject(JSONParser.AnObjectContext ctx) {
        StringBuilder stringBuilder = new StringBuilder();
        stringBuilder.append("\n");
        for (JSONParser.PairContext pairContext : ctx.pair()){
            stringBuilder.append(getXml(pairContext));
        }
        setXml(ctx,stringBuilder.toString());
    }

    @Override
    public void exitEmptyObject(JSONParser.EmptyObjectContext ctx) {
        setXml(ctx,"");
    }

The array rule is handled the same way; the only difference is that each child value is wrapped in an <element> tag.

    @Override
    public void exitArrayOfValues(JSONParser.ArrayOfValuesContext ctx) {
        StringBuilder stringBuilder = new StringBuilder();
        stringBuilder.append("\n");
        for (JSONParser.ValueContext valueContext : ctx.value()){
            stringBuilder.append("<element>");
            stringBuilder.append(getXml(valueContext));
            stringBuilder.append("</element>");
            stringBuilder.append("\n");
        }
        setXml(ctx,stringBuilder.toString());
    }

    @Override
    public void exitEmptyArray(JSONParser.EmptyArrayContext ctx) {
        setXml(ctx,"");
    }

Finally, we store the final result in the root node.

    @Override
    public void exitJson(JSONParser.JsonContext ctx) {
        setXml(ctx,getXml(ctx.getChild(0)));
    }
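
The bottom-up merging done by these listener methods can be mimicked without ANTLR, which makes the shape of the output easier to see. The sketch below is an assumption-laden stand-in, not the listener itself: JsonToXmlSketch and toXml are illustrative names, objects are modelled as Map, arrays as List, and atoms or already-unquoted strings as String. It emits proper closing tags (</key> and </element>).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JsonToXmlSketch {
    // Translates children first, then merges their XML into the parent,
    // mirroring exitPair, exitAnObject and exitArrayOfValues.
    static String toXml(Object value) {
        if (value instanceof Map) {
            Map<?, ?> object = (Map<?, ?>) value;
            if (object.isEmpty()) return "";            // like exitEmptyObject
            StringBuilder sb = new StringBuilder("\n");
            for (Map.Entry<?, ?> pair : object.entrySet()) {
                // like exitPair: <key>value</key>
                sb.append(String.format("<%s>%s</%s>\n",
                        pair.getKey(), toXml(pair.getValue()), pair.getKey()));
            }
            return sb.toString();
        }
        if (value instanceof List) {
            List<?> array = (List<?>) value;
            if (array.isEmpty()) return "";             // like exitEmptyArray
            StringBuilder sb = new StringBuilder("\n");
            for (Object item : array) {
                sb.append("<element>").append(toXml(item)).append("</element>").append("\n");
            }
            return sb.toString();
        }
        return String.valueOf(value);                   // like exitAtom / exitString
    }

    public static void main(String[] args) {
        Map<String, Object> object = new LinkedHashMap<>();
        object.put("id", "1");
        object.put("tags", Arrays.asList("a", "b"));
        System.out.print(toXml(object));
    }
}
```

A recursive function can simply return each partial result, whereas the listener has to park it in the ParseTreeProperty until the parent's exit method picks it up.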

The complete translator code is as follows:

package json;

import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeProperty;

public class JSONToXMLListener extends JSONBaseListener {
    // Store the translated string of each subtree in the root node of the subtree
    private ParseTreeProperty<String> xml = new ParseTreeProperty<String>();

    public void setXml(ParseTree node, String value){
        xml.put(node, value);
    }

    public String getXml(ParseTree node){
        return xml.get(node);
    }

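    /**
     * Removes the double quotes at the beginning and end of the string.
     * @param s the raw token text, including its quotes
     * @return the text without the surrounding quotes
     */
    public String stripQuotes(String s) {
        // Leave null, too-short, or unquoted input unchanged
        if ( s==null || s.length()<2 || s.charAt(0)!='"' ) return s;
        return s.substring(1, s.length() - 1);
    }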


    @Override
    public void exitJson(JSONParser.JsonContext ctx) {
        setXml(ctx,getXml(ctx.getChild(0)));
    }

    @Override
    public void exitAnObject(JSONParser.AnObjectContext ctx) {
        StringBuilder stringBuilder = new StringBuilder();
        stringBuilder.append("\n");
        for (JSONParser.PairContext pairContext : ctx.pair()){
            stringBuilder.append(getXml(pairContext));
        }
        setXml(ctx,stringBuilder.toString());
    }

    @Override
    public void exitEmptyObject(JSONParser.EmptyObjectContext ctx) {
        setXml(ctx,"");
    }

    @Override
    public void exitPair(JSONParser.PairContext ctx) {
        String tag = stripQuotes(ctx.STRING().getText());
        String value = String.format("<%s>%s</%s>\n",tag,getXml(ctx.value()),tag);
        setXml(ctx,value);
    }

    @Override
    public void exitArrayOfValues(JSONParser.ArrayOfValuesContext ctx) {
        StringBuilder stringBuilder = new StringBuilder();
        stringBuilder.append("\n");
        for (JSONParser.ValueContext valueContext : ctx.value()){
            stringBuilder.append("<element>");
            stringBuilder.append(getXml(valueContext));
            stringBuilder.append("</element>");
            stringBuilder.append("\n");
        }
        setXml(ctx,stringBuilder.toString());
    }

    @Override
    public void exitEmptyArray(JSONParser.EmptyArrayContext ctx) {
        setXml(ctx,"");
    }

    @Override
    public void exitString(JSONParser.StringContext ctx) {
        setXml(ctx, stripQuotes(ctx.getText()));
    }

    @Override
    public void exitAtom(JSONParser.AtomContext ctx) {
        setXml(ctx, ctx.getText());
    }

    @Override
    public void exitObjectValue(JSONParser.ObjectValueContext ctx) {
        // Analogous to: String value() { return object(); }
        setXml(ctx,getXml(ctx.object()));
    }

    @Override
    public void exitArrayValue(JSONParser.ArrayValueContext ctx) {
        setXml(ctx,getXml(ctx.array()));
    }
}

Write a main method to test the translator:

import json.JSONLexer;
import json.JSONParser;
import json.JSONToXMLListener;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

import java.io.BufferedReader;
import java.io.FileReader;

public class JSONMain {
    public static void main(String[] args) throws Exception{
        BufferedReader reader = new BufferedReader(new FileReader("xxx\\json.txt"));
        ANTLRInputStream inputStream = new ANTLRInputStream(reader);
        JSONLexer lexer = new JSONLexer(inputStream);
        CommonTokenStream tokenStream = new CommonTokenStream(lexer);
        JSONParser parser = new JSONParser(tokenStream);
        ParseTree parseTree = parser.json();
        System.out.println(parseTree.toStringTree());

        ParseTreeWalker walker = new ParseTreeWalker();
        JSONToXMLListener listener = new JSONToXMLListener();
        walker.walk(listener, parseTree);

        String xml = listener.getXml(parseTree);
        System.out.println(xml);
    }
}

The contents of json.txt are as follows:

{
    "id" : 1,
    "name" : "Li",
    "scores" : {
        "Chinese" : "95",
        "English" : "85"
    },
    "array" : [1.2, 2.0e1, -3] 
}

The output is as follows:

Postscript

In this chapter, we learned how to write a JSON grammar file by reading the reference manual and designing top-down. We also learned to use a listener to implement a JSON-to-XML translator. The translation is not done in a single step; it follows a divide-and-conquer approach, starting from the simplest translations and then merging the partial results.

 


Posted on Fri, 05 Jun 2020 04:39:38 -0400 by abdbuet