XML and how it is parsed

Preface

Brief introduction

Programming languages such as Java can be understood as languages for people to communicate with computers, while XML can be understood as a language for software to communicate with other software. For example, when a C program needs to exchange data with a Java program, XML (and JSON, described later) are the preferred formats.

XML stands for eXtensible Markup Language.

Characteristics:

  1. XML is platform-independent. An XML file means the same thing whether it is opened on Windows or Linux, or parsed by Java or Python.
  2. XML is a standalone markup language: it can describe data meaningfully on its own, without relying on anything else.
  3. XML is self-describing: the tag names themselves say what the data means.

Why learn XML

  1. Network data transfer (JSON is more common today)
    • Transfers structured data in a well-defined format
    • Server and client may be written in different languages, and XML works across languages
  2. Data storage (rarely used)
  3. Configuration files (widely used)
    • XML configuration files can be shared by programs written in different languages

XML file

A .xml file is one way to store XML data.

XML data can also exist in other forms (for example, built in memory or held in a string).

Don't narrowly equate the XML language with XML files.
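To show that XML data does not have to live in a .xml file, here is a minimal sketch that parses XML held in a Java String. It uses the JDK's built-in DOM parser (javax.xml.parsers) rather than dom4j so it runs without extra jars; the class name XmlFromString is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlFromString {
    // Parses XML text that lives in memory; no .xml file is involved
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<person><name>Zhang San</name><age>18</age></person>";
        Document doc = parse(xml);
        // The tree is the same as it would be if the text had been read from a file
        System.out.println(doc.getDocumentElement().getTagName()); // person
        System.out.println(doc.getElementsByTagName("name").item(0).getTextContent()); // Zhang San
    }
}
```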

XML syntax

1. XML document declaration

The document declaration must be the very first thing in the document (nothing, not even a comment, may come before it):

<?xml version="1.0" encoding="UTF-8"?>

2. Tags (also called elements or nodes)

An XML document is made up of tags.

Tag syntax:

  • Start tag (opening tag): <tagname>
  • Tag content: everything between the start tag and the end tag
  • End tag (closing tag): </tagname>

Example: Describe a name with a tag:

  • <name>Wang Er Ma Zi</name>

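Tag naming rules:

  • Can contain letters, digits, and other characters such as -, _ and .
  • Cannot start with a digit or a punctuation character (an underscore is allowed)
  • Cannot start with "xml" in any combination of upper and lower case (such names are reserved)
  • Cannot contain spaces, and should not contain colons (colons are reserved for namespace prefixes)
  • Names are case-sensitive: <Name> and <name> are different tags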

3. Tags can be nested, but must not overlap (cross)

Correct:

<person>
    <name>Zhang San</name>
    <age>18</age>
</person>

Incorrect:

<person>
    <name>Zhang San<age></name>18</age>
</person>

4. An XML document must have exactly one root tag

The outermost tag is the root tag: a single pair of tags that contains everything else.

Correct:

<names>
    <name>Zhang San</name>
    <name>Li Si</name>
</names>

Incorrect (two root tags):

<name>Zhang San</name>
<name>Li Si</name>
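A parser rejects documents that break the naming, nesting, and single-root rules above, so these rules can be checked mechanically. Below is a minimal sketch using the JDK's built-in DOM parser instead of dom4j; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if the parser accepts the text as well-formed XML
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler throws on fatal errors and prints nothing to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Properly nested, single root: accepted
        System.out.println(isWellFormed("<person><name>Zhang San</name><age>18</age></person>"));
        // Crossed tags: rejected
        System.out.println(isWellFormed("<person><name>Zhang San<age></name>18</age></person>"));
        // Two root tags: rejected
        System.out.println(isWellFormed("<name>Zhang San</name><name>Li Si</name>"));
        // Tag name starting with a digit: rejected
        System.out.println(isWellFormed("<1name>Zhang San</1name>"));
        // Names are case-sensitive, so </name> does not close <Name>: rejected
        System.out.println(isWellFormed("<Name>Zhang San</name>"));
    }
}
```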

5. Tag hierarchy terms

Tags are related to each other as child, parent, sibling (brother), descendant, and ancestor tags.

For example:

<persons>
    <person>
        <name>Zhang San</name>
        <age>18</age>
    </person>
    <person>
        <name>Li Si</name>
        <age>20</age>
    </person>
</persons>

name is a child tag of person.

name is a descendant tag of persons.

name is a sibling (brother) tag of age.

person is the parent tag of name.

persons is an ancestor tag of name.
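These relationships map directly onto tree navigation. A minimal sketch with the JDK's built-in DOM API (not dom4j), walking from the first name tag to its parent, ancestor, and sibling; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TagRelations {
    static final String XML = "<persons>"
            + "<person><name>Zhang San</name><age>18</age></person>"
            + "<person><name>Li Si</name><age>20</age></person>"
            + "</persons>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Walks forward past text/comment nodes to the next element (the sibling tag)
    static Element nextElementSibling(Node node) {
        Node n = node.getNextSibling();
        while (n != null && n.getNodeType() != Node.ELEMENT_NODE) {
            n = n.getNextSibling();
        }
        return (Element) n;
    }

    // Describes the first <name> tag's relatives as "parent,ancestor,sibling,descendantCount"
    static String relationsOfFirstName() {
        Document doc = parse(XML);
        Element name = (Element) doc.getElementsByTagName("name").item(0);
        Element person = (Element) name.getParentNode();    // parent tag
        Element persons = (Element) person.getParentNode(); // ancestor tag (also the root)
        Element age = nextElementSibling(name);             // sibling ("brother") tag
        // getElementsByTagName searches all descendants, not only direct children
        int descendantNames = persons.getElementsByTagName("name").getLength();
        return person.getTagName() + "," + persons.getTagName() + ","
                + age.getTagName() + "," + descendantNames;
    }

    public static void main(String[] args) {
        System.out.println(relationsOfFirstName()); // person,persons,age,2
    }
}
```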

6. Tag names may repeat (for example, several <person> tags inside <persons>)

7. Besides start and end tags, tags can have attributes

Attributes are written in the start tag as pairs of attribute name and attribute value (key-value pairs).

Rules:

  • A tag can have 0 to N attributes
  • Attribute names within one tag must not repeat
  • Multiple attributes are separated by spaces
  • Attribute values must be quoted (single or double quotes)

For example:

<persons>
    <person id="1001" groupid="01">
        <name>Zhang San</name>
        <age>18</age>
    </person>
    <person>
        <name>Li Si</name>
        <age>20</age>
    </person>
</persons>
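A minimal sketch of reading these attributes with the JDK's built-in DOM API (not dom4j), and of the parser rejecting a repeated attribute name or an unquoted value. The class name, method names, and the "none"/"error" markers are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AttributeDemo {
    static final String XML = "<persons>"
            + "<person id=\"1001\" groupid=\"01\"><name>Zhang San</name><age>18</age></person>"
            + "<person><name>Li Si</name><age>20</age></person>"
            + "</persons>";

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler()); // throw on fatal errors, print nothing
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns "id/attributeCount" for each <person>, or "error" if the XML is rejected
    static String summarize(String xml) {
        try {
            NodeList people = parse(xml).getElementsByTagName("person");
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < people.getLength(); i++) {
                Element p = (Element) people.item(i);
                // getAttribute returns "" when the attribute is absent, so check first
                String id = p.hasAttribute("id") ? p.getAttribute("id") : "none";
                if (sb.length() > 0) {
                    sb.append(';');
                }
                sb.append(id).append('/').append(p.getAttributes().getLength());
            }
            return sb.toString();
        } catch (Exception e) {
            return "error";
        }
    }

    public static void main(String[] args) {
        System.out.println(summarize(XML));                           // 1001/2;none/0
        System.out.println(summarize("<person id=\"1\" id=\"2\"/>")); // error: repeated name
        System.out.println(summarize("<person id=1001/>"));           // error: unquoted value
    }
}
```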

8. Comments

Comments cannot be written before the document declaration.

Comments cannot be nested (the string "--" may not appear inside a comment).

Syntax:

  • Comment start: <!--
  • Comment end: -->
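Both comment rules are enforced by the parser. A minimal sketch using the JDK's built-in DOM parser (not dom4j); the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CommentRules {
    // Returns true if the parser accepts the text
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // throw on fatal errors, print nothing
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Comment after the declaration: accepted
        System.out.println(parses("<?xml version=\"1.0\"?><!-- a comment --><persons/>"));
        // Comment before the declaration: rejected
        System.out.println(parses("<!-- a comment --><?xml version=\"1.0\"?><persons/>"));
        // "Nested" comment: rejected, because "--" may not appear inside a comment
        System.out.println(parses("<persons><!-- outer <!-- inner --> --></persons>"));
    }
}
```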

Advanced syntax: CDATA (for reference)

CDATA is text data that the XML parser should not parse as markup.

Characters such as '<' and '&' are illegal in the text content of XML elements.

'<' causes an error because the parser interprets it as the start of a new element.

'&' causes an error because the parser interprets it as the start of an entity reference.

Some text, such as JavaScript code, contains many '<' or '&' characters. To avoid errors, you can put such code in a CDATA section.

The parser does not interpret markup inside a CDATA section; its content is passed through as plain text.

CDATA syntax:

  • Start with <![CDATA[
  • End with ]]>
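A minimal sketch of the difference CDATA makes, using the JDK's built-in DOM parser (not dom4j): the same script text is returned unchanged inside CDATA and rejected without it. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CdataDemo {
    // Returns the root element's text, or null if the document is not well-formed
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // throw on fatal errors, print nothing
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String code = "if (a < b && b > 0) { run(); }";
        // Inside CDATA, '<' and '&' are ordinary characters and come back unchanged
        System.out.println(rootText("<script><![CDATA[" + code + "]]></script>"));
        // The same text without CDATA is rejected, so this prints null
        System.out.println(rootText("<script>" + code + "</script>"));
    }
}
```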

Parsing XML with DOM4J

Parsing steps:

  1. Add the dom4j jar (dom4j.jar) to the project's classpath
  2. Create an input stream that points to the XML file
    FileInputStream fis = new FileInputStream("address of xml file");
  3. Create an XML reader object
    SAXReader sr = new SAXReader();
  4. Use the reader to read the XML input stream and get the Document object
    Document doc = sr.read(fis);
  5. Get the root element of the XML document from the Document object
    Element root = doc.getRootElement();

The Document object

Refers to the entire XML document loaded into memory.

Common methods:
1. Get the root element of the XML document
Element root = doc.getRootElement();
2. Add the root node (to a new, empty document)
Element root = doc.addElement("rootNodeName");

The Element object

Refers to a single node (element) in an XML document.

Common methods:
1. Get the node name
String getName();
2. Get the node's text content
String getText();
3. Set the node's text content
void setText(String text);
4. Get the first child element with the given name
Element element(String childName);
5. Get all child elements
List<Element> elements();
6. Get the value of an attribute of the node
String attributeValue(String attributeName);
7. Get the text content of the first child element with the given name
String elementText(String childName);
8. Add a child element
Element addElement(String childName);
9. Add an attribute (returns the element itself, so calls can be chained)
Element addAttribute(String attributeName, String attributeValue);

Example: parsing a local file (books.xml)

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class ParseBooks {
    public static void main(String[] args) throws Exception {
        //1. Get the input stream of the file (try-with-resources closes it)
        try (InputStream fis = new FileInputStream("C:\\...\\books.xml")) {
            //2. Create the XML reader object
            SAXReader sr = new SAXReader();
            //3. Read the XML input stream with the reader and get the document object
            Document doc = sr.read(fis);
            //4. Get the root node of the document
            Element root = doc.getRootElement();
            //5. Get all child nodes of the root node
            List<Element> es = root.elements();
            //6. Loop over the book elements
            for (Element e : es) {
                //Get the id attribute value
                String id = e.attributeValue("id");
                //Get the text of the name child node
                String name = e.element("name").getText();
                //Get the text of the info child node
                String info = e.element("info").getText();
                System.out.println("id=" + id + ",name=" + name + ",info=" + info);
            }
        }
    }
}

Example: parsing XML from the network

Look up the region a mobile phone number belongs to:

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class PhoneLookup {
    public static void main(String[] args) throws Exception {
        String phone = "15500000000";
        //1. Get the input stream of the XML resource
        URL url = new URL("http://apis.juhe.cn/mobile/get?phone="+phone+"&dtype=xml&key=9f3923e8f87f1ea50ed4ec8c39cc9253");
        URLConnection conn = url.openConnection();
        //Get the input stream behind this URL; try-with-resources closes it
        try (InputStream is = conn.getInputStream()) {
            //2. Create the XML reader object
            SAXReader sr = new SAXReader();
            //3. Read the XML data and get the document object
            Document doc = sr.read(is);
            //4. Get the root node
            Element root = doc.getRootElement();
            //5. Parse the content
            String code = root.elementText("resultcode");
            if ("200".equals(code)) {
                Element result = root.element("result");
                String province = result.elementText("province");
                String city = result.elementText("city");
                if (province.equals(city)) {
                    System.out.println("The mobile phone number belongs to: " + city);
                } else {
                    System.out.println("The mobile phone number belongs to: " + province + " " + city);
                }
            } else {
                System.out.println("Please enter a correct mobile number");
            }
        }
    }
}

Parsing XML with XPath

Path expressions

XPath finds content in an XML document directly through a compact path syntax, which makes it simpler than walking the tree node by node as in the DOM4J examples above.

Quickly find an element or a group of elements by path:
1. / : start at the root node (an absolute path such as /books/book)
2. // : select matching nodes at any depth; //name searches the whole document, while .//name searches only below the current node
3. . : the current node
4. .. : the parent node
5. @ : select an attribute. Attribute predicates:

  • [@attributeName='value']
  • [@attributeName>'value']
  • [@attributeName<'value']
  • [@attributeName!='value']
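A minimal sketch that counts what each kind of expression selects in a small books document, using the JDK's built-in javax.xml.xpath API rather than dom4j (the path syntax itself is the same XPath 1.0). The sample data mirrors the book/name/info layout used elsewhere in this article; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathExpressions {
    static final String XML = "<books>"
            + "<book id=\"1\"><name>Apple0</name><info>Ha ha ha0</info></book>"
            + "<book id=\"2\"><name>Apple1</name><info>Ha ha ha1</info></book>"
            + "<book id=\"3\"><name>Apple2</name><info>Ha ha ha2</info></book>"
            + "</books>";

    // Returns how many nodes the expression selects in XML
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/books/book"));           // 3: absolute path from the root
        System.out.println(count("//name"));                // 3: name tags at any depth
        System.out.println(count("//book[@id='2']"));       // 1: attribute equals a value
        System.out.println(count("//book[@id>'1']"));       // 2: '>' compares as numbers in XPath 1.0
        System.out.println(count("//book[@id!='2']"));      // 2: attribute not equal
        System.out.println(count("//name[.='Apple1']/..")); // 1: parent of the matching name
    }
}
```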

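Usage

Lookups use two methods of dom4j's Node interface (Node is the parent interface of Document and Element). dom4j's XPath support also needs the Jaxen library on the classpath.

Method 1:

Find the single node that matches the path expression (the first match, or null if nothing matches): Node node = doc.selectSingleNode("path expression");

Method 2:

Find all nodes that match the path expression: List<Node> nodes = doc.selectNodes("path expression");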
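The JDK's own javax.xml.xpath API has close counterparts of these two dom4j methods: XPathConstants.NODE returns the first match (or null), and XPathConstants.NODESET returns every match. A minimal sketch under that assumption; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SingleVsMany {
    static final String XML = "<persons>"
            + "<person><name>Zhang San</name></person>"
            + "<person><name>Li Si</name></person>"
            + "</persons>";
    static final XPath XPATH = XPathFactory.newInstance().newXPath();

    static Document doc() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Like selectSingleNode: text of the first match in document order, or null
    static String firstText(String expression) {
        try {
            Node node = (Node) XPATH.evaluate(expression, doc(), XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException(expression, e);
        }
    }

    // Like selectNodes: texts of all matches in document order
    static List<String> allTexts(String expression) {
        try {
            NodeList nodes = (NodeList) XPATH.evaluate(expression, doc(), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalArgumentException(expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstText("//name"));  // Zhang San
        System.out.println(allTexts("//name"));   // [Zhang San, Li Si]
        System.out.println(firstText("//phone")); // null
    }
}
```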

Example 1: find all name nodes in books.xml

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;
public class XPathNames {
    public static void main(String[] args) throws Exception {
        //1. Get the input stream of the file
        try (InputStream fis = new FileInputStream("C:\\...\\books.xml")) {
            //2. Create the XML reader object
            SAXReader sr = new SAXReader();
            //3. Read the XML input stream and get the document object
            Document doc = sr.read(fis);
            //4. Find all name nodes with Document + XPath
            List<Node> names = doc.selectNodes("//name");
            for (Node n : names) {
                System.out.println(n.getName());
                System.out.println(n.getText());
            }
        }
    }
}

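Example 2: look up the carrier of a mobile phone number

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import org.dom4j.Document;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;
public class XPathCarrier {
    public static void main(String[] args) throws Exception {
        String phone = "15500000000";
        //1. Get the input stream of the XML resource
        URL url = new URL("http://apis.juhe.cn/mobile/get?phone="+phone+"&dtype=xml&key=9f3923e8f87f1ea50ed4ec8c39cc9253");
        URLConnection conn = url.openConnection();
        try (InputStream is = conn.getInputStream()) {
            //2. Create the XML reader object
            SAXReader sr = new SAXReader();
            //3. Read the XML data and get the document object
            Document doc = sr.read(is);
            //4. Find the first company node anywhere in the document
            Node node = doc.selectSingleNode("//company");
            System.out.println("Operator: " + node.getText());
        }
    }
}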

Generating XML from Java

Steps

  1. Create an empty document object using DocumentHelper
    Document doc = DocumentHelper.createDocument();

  2. Add a root node to a document object
    Element root = doc.addElement("root node name");

  3. Add child nodes through the root node object root
    Element e = root.addElement("element name");

  4. Create a file output stream for storing XML files
    FileOutputStream fos = new FileOutputStream("location to store");

  5. Wrap the file output stream in an XML writer
    XMLWriter xw = new XMLWriter(fos);

  6. Write out the document
    xw.write(doc);

  7. Release Resources
    xw.close();
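For comparison, here is a minimal sketch of the same steps with the JDK's built-in DOM and javax.xml.transform APIs instead of dom4j. It writes to a String rather than a file so it runs anywhere; the class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class GenerateXml {
    // Builds <books> with `count` <book> children and returns it as an XML string
    static String buildBooks(int count) {
        try {
            // 1. Create an empty document (the role DocumentHelper.createDocument() plays in dom4j)
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            // 2. Add the root node
            Element books = doc.createElement("books");
            doc.appendChild(books);
            // 3. Add child nodes under the root
            for (int i = 0; i < count; i++) {
                Element book = doc.createElement("book");
                book.setAttribute("id", String.valueOf(i + 1));
                Element name = doc.createElement("name");
                name.setTextContent("Apple" + i);
                Element info = doc.createElement("info");
                info.setTextContent("Ha ha ha" + i);
                book.appendChild(name);
                book.appendChild(info);
                books.appendChild(book);
            }
            // 4-6. Serialize the tree; a StreamResult wrapping a FileOutputStream would write a file
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildBooks(2));
    }
}
```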

Example:

import java.io.FileOutputStream;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.XMLWriter;

public class WriteBooks {
    public static void main(String[] args) throws Exception {
        //1. Create an empty document object through DocumentHelper
        Document doc = DocumentHelper.createDocument();
        //2. Add the root node object to the document object
        Element books = doc.addElement("books");
        //3. Add child nodes under the root node
        for (int i = 0; i < 1000; i++) {
            //Add 1000 book nodes to the root node
            Element book = books.addElement("book");
            //Add the id attribute to the book node
            book.addAttribute("id", String.valueOf(i + 1));
            //Add name and info nodes to the book node
            Element name = book.addElement("name");
            Element info = book.addElement("info");
            name.setText("Apple" + i);
            info.setText("Ha ha ha" + i);
        }
        //4. Create the file output stream
        FileOutputStream fos = new FileOutputStream("c:\\books.xml");
        //5. Wrap the file output stream in an XML writer
        XMLWriter xw = new XMLWriter(fos);
        //6. Write the XML document
        xw.write(doc);
        //7. Release resources (closing the writer also closes fos)
        xw.close();
        System.out.println("Code execution completed");
    }
}

Using XStream (for reference)

Purpose: quickly convert Java objects into XML strings. XStream is a third-party library, so its jar must be on the classpath.

Steps:

  1. Create an XStream object
    XStream x = new XStream();

  2. Optionally change the generated root node name (by default it is the fully qualified class name, package.ClassName)
    x.alias("nodeName", ClassName.class); (maps ClassName.class to the node name nodeName)

  3. Pass in the object to generate the XML string
    String xml = x.toXML(object);

Example:

Person p = new Person(1001, "Zhang San", "Unknown");
XStream x = new XStream();
x.alias("person", Person.class);
String xml = x.toXML(p);
System.out.println(xml);

How Java parses XML (must know)

Q: How many ways are there to parse XML in Java? How do they differ, and what are their advantages and disadvantages?

A: There are four common approaches:

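  1. SAX parsing

    • Parsing is event-driven.
    • The SAX parser reads the XML document sequentially as a stream and fires an event each time it reaches the start of a tag (together with its attributes), the end of a tag, or text content.
    • We write handler code that responds to these events as they occur.
    • Advantage:
      The document is never held in memory as a whole, so memory use stays low even for very large files.
    • Disadvantage:
      Access is read-only and forward-only; you cannot go back or modify the document.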
  2. DOM parsing

    • An official W3C standard for representing XML documents in a platform- and language-independent way. DOM parsing normally loads the entire document and builds a document tree in memory; programmers then manipulate the tree to read, modify, and delete data.
    • Advantages:
      The whole document is in memory, so both data and structure can be modified.
      Navigation is bidirectional: you can move up and down the tree at any time.
    • Disadvantage:
      The whole document is loaded into memory, which consumes a lot of resources for large files.
  3. JDOM parsing

    • JDOM aims to be a Java-specific document model that simplifies working with XML and is faster than using DOM. As the first Java-specific model, JDOM was widely adopted.

    • The JDOM documentation states its goal as "solving 80% (or more) of Java/XML problems with 20% (or less) of the effort", where the 20% refers to the learning curve.

    • Advantages:
      Simplifies the DOM API by using concrete classes instead of interfaces.
      Uses Java collection classes extensively, which is convenient for Java developers.

    • Disadvantages:

      Limited flexibility.

      Performance is not especially good.

  4. DOM4J parsing
    1. DOM4J started as an intelligent fork of JDOM. It adds many features beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streamed documents. It also offers options for how the document representation is built.
    2. DOM4J is a very good Java XML API with excellent performance, powerful features, and great ease of use, and it is open-source software. More and more Java software uses DOM4J to read and write XML.
    3. DOM4J is widely used in many open-source projects, such as Hibernate.
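To make the SAX/DOM difference concrete, here is a minimal sketch with the JDK's built-in javax.xml.parsers APIs (JDOM and DOM4J need extra jars and are not shown). The SAX part collects names as events stream past without keeping a tree; the DOM part builds the whole tree, then edits it and navigates upward to delete a node. Class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxVsDom {
    static final String XML = "<persons>"
            + "<person id=\"1001\"><name>Zhang San</name><age>18</age></person>"
            + "<person id=\"1002\"><name>Li Si</name><age>20</age></person>"
            + "</persons>";

    // SAX: the parser pushes events; only the collected names stay in memory
    static List<String> namesWithSax(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inName;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("name")) {
                    inName = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node, so accumulate
                if (inName) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("name")) {
                    names.add(text.toString());
                    inName = false;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
        return names;
    }

    // DOM: the whole tree is built first, then it can be edited and navigated freely
    static String editWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Modify: change the first name
            doc.getElementsByTagName("name").item(0).setTextContent("Wang Wu");
            // Delete: remove the second <person> by going up to its parent
            Node second = doc.getElementsByTagName("person").item(1);
            second.getParentNode().removeChild(second);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(namesWithSax(XML)); // [Zhang San, Li Si]
        System.out.println(editWithDom(XML));  // Wang Wu18
    }
}
```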


Posted on Thu, 25 Nov 2021 12:57:14 -0500 by bh