Understanding XML, DOM parsing, SAX parsing, DOM4J parsing

XML usage, namespaces, constraints, parsing

1.1 what is XML?

XML (eXtensible Markup Language)
  • "Markup" refers to a markup language, also known as a tag language, which describes data with a series of tags. (Users can define their own tags.)

1.2 uses

  1. XML can be used as a standard for data transmission
    • Readability
    • Extensibility
    • Maintainability
    • Independence from any particular programming language
  2. XML can be used as a configuration file
    • Many applications and frameworks are configured through XML files, so their behavior can be changed quickly and conveniently
  3. XML can persist data
    • You can store data in an XML file and treat the file as a "temporary database".
  4. XML simplifies platform changes
    • XML data is stored as plain text, which makes it easier to extend and upgrade

2. Comparison between XML and HTML

  1. XML is mainly used to describe data
  2. HTML is mainly used to display data

3. XML syntax

3.1 document declaration

The XML declaration is optional. If present, it must be on the first line of the document.

It states the XML version and the encoding used by the document.
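As a small illustration of what the declaration carries, the following sketch reads the version and encoding back from a parsed document using the standard JAXP DOM API. The class name DeclarationDemo and the inline sample document are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclarationDemo {
    // Parses an XML string (encoded as UTF-8 bytes) into a DOM Document.
    static Document parse(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>");
        // Both values come from the XML declaration on the first line.
        System.out.println("version  = " + doc.getXmlVersion());
        System.out.println("encoding = " + doc.getXmlEncoding());
    }
}
```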

3.2 elements

Element: a tag in XML. An element is also called a tag or a node.

An XML element name can contain letters, digits, and some other visible characters

XML naming and structure rules:

  1. A name cannot start with a digit or with most punctuation marks
  2. A name cannot contain spaces or certain special symbols
  3. Tags must come in pairs; the end tag cannot be omitted
  4. There is exactly one root tag
  5. Names are case sensitive
  6. Multi-level nesting is allowed, but overlapping (cross) nesting is not

XML must contain a root element

<?xml version="1.0" encoding="utf-8"?>
<root>
<child>
<subchild>.....</subchild>
</child>
</root>

Elements can contain text content

<?xml version="1.0" encoding="utf-8"?>
<root> 
	<a>I am toto</a>
</root>
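The rules in this section can be checked mechanically, because a parser rejects any document that breaks them. The sketch below uses the JAXP DOM parser for that purpose; the class name WellFormedCheck and the sample strings are assumptions for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // Returns true if the parser accepts the text, false if it reports a syntax error.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<root><a>I am toto</a></root>")); // paired, nested tags
        System.out.println(isWellFormed("<a><b></a></b>"));                // cross nesting
        System.out.println(isWellFormed("<a/><b/>"));                      // two root elements
        System.out.println(isWellFormed("<Root></root>"));                 // case mismatch
        System.out.println(isWellFormed("<1a/>"));                         // name starts with a digit
    }
}
```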

3.3 properties

XML attributes provide additional information about an element

An XML attribute is written in the start tag, and the attribute value must be quoted.

An element can have multiple attributes. Format: <element-name attribute-name="attribute value" attribute-name="attribute value">

Multiple tags with the same name can be distinguished by an id attribute:

<?xml version="1.0" encoding="utf-8"?>
<messages>
	<note id="1"></note>
 	<note id="2"></note>
</messages>
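To show how the id attribute can be used to tell the note elements apart, here is a minimal sketch that reads each id with the DOM API. The class name AttributeDemo and the helper noteIds are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AttributeDemo {
    // Returns the value of the id attribute of every <note> element, in document order.
    static List<String> noteIds(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList notes = doc.getElementsByTagName("note");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < notes.getLength(); i++) {
            Element note = (Element) notes.item(i);
            ids.add(note.getAttribute("id"));
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<messages><note id=\"1\"></note><note id=\"2\"></note></messages>";
        System.out.println(noteIds(xml));
    }
}
```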

3.4 entities

Entity    Character
&lt;      <
&gt;      >
&amp;     &
&apos;    '
&quot;    "

Defining a custom entity:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE root [
	<!ENTITY company "toto">
]>
<root> <name>&company;</name></root>
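The effect of entities is easiest to see after parsing, when the references have been replaced by their values. The following sketch assumes the JAXP DOM parser with its default settings; EntityDemo and rootText are names invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityDemo {
    // Parses the document and returns the text content of its root element.
    static String rootText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Custom entity declared in an internal DTD subset.
        String custom = "<!DOCTYPE root [<!ENTITY company \"toto\">]>"
                + "<root><name>&company;</name></root>";
        System.out.println(rootText(custom));

        // Predefined entities need no declaration.
        String predefined = "<root>a &lt; b &amp;&amp; c</root>";
        System.out.println(rootText(predefined));
    }
}
```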

3.5 comments

<!-- -->
<!-- a comment -->
  1. The sequence -- must not appear inside a comment
  2. Do not put a comment inside a tag
  3. Comments cannot be nested

3.6 CDATA

Content that the parser parses is called PCDATA (Parsed Character Data); content that the parser does not parse is called CDATA (Character Data).

Sometimes you do not want content to be parsed as markup. In that case, write it inside a CDATA section:

<?xml version="1.0" encoding="utf-8"?>
<root>
	<tag>
		<name>&amp;</name>
		<entity><![CDATA[&amp;]]></entity>
	</tag>
</root>
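The difference between the two elements in the example shows up once the document is parsed: the entity reference in name is resolved, while the same characters inside the CDATA section are kept literally. A minimal sketch, with CdataDemo and textOf as hypothetical names:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataDemo {
    // Returns the text content of the first element with the given tag name.
    static String textOf(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><tag>"
                + "<name>&amp;</name>"
                + "<entity><![CDATA[&amp;]]></entity>"
                + "</tag></root>";
        System.out.println(textOf(xml, "name"));   // parsed character data
        System.out.println(textOf(xml, "entity")); // CDATA section, left unparsed
    }
}
```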

4. XML constraints

4.1 well-formedness and validity

A well-formed document is not necessarily valid, but a valid document must be well-formed.

4.2 DTD

Syntax format:

<!ELEMENT element-name (content-model)>
  1. EMPTY: the element cannot contain child elements or text (an empty element)

  2. (#PCDATA): the element can contain any character data, but no child elements

  3. ANY: the element content is arbitrary; mainly used when the content is not known in advance

  4. Modifiers:
    • ()  groups elements
    • |   selects one of the listed elements
    • +   the element appears at least once (1 to n times)
    • *   the element appears any number of times (0 to n times)
    • ?   the element is optional and appears at most once (0 or 1 times)
    • ,   the elements must appear in the specified order

  5. Internal DTD: the DTD and the XML document are in the same file (good to know)

  6. External DTD: the DTD and the XML document are in separate files (common)

<?xml version="1.0" encoding="UTF-8"?>
<!-- external DTD -->
<!DOCTYPE students SYSTEM "dtd/students.dtd">
<students>
    <stu>
        <id>1</id>
        <name>tom</name>
        <age>20</age>
    </stu>
</students>
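To make the idea of validity concrete, the sketch below validates student documents against a DTD with the JAXP DOM parser. The contents of dtd/students.dtd are not shown in this article, so the DTD here is a hypothetical one written as an internal subset to keep the example self-contained; DtdValidationDemo and isValid are likewise assumed names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdValidationDemo {
    // Hypothetical DTD for the students example (the real students.dtd is not shown).
    static final String DTD =
          "<!DOCTYPE students [\n"
        + "  <!ELEMENT students (stu+)>\n"
        + "  <!ELEMENT stu (id, name, age)>\n"
        + "  <!ELEMENT id (#PCDATA)>\n"
        + "  <!ELEMENT name (#PCDATA)>\n"
        + "  <!ELEMENT age (#PCDATA)>\n"
        + "]>\n";

    // Returns true if the body is well-formed and valid against DTD.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                // Validity violations are reported as recoverable errors; rethrow them.
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<students><stu><id>1</id><name>tom</name><age>20</age></stu></students>"));
        // Well-formed, but the children are in the wrong order for (id, name, age).
        System.out.println(isValid(
            "<students><stu><name>tom</name><id>1</id><age>20</age></stu></students>"));
    }
}
```

The second document parses without syntax errors, which is the sense in which a well-formed document is not necessarily valid.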

use:

  1. Local DTD

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE students SYSTEM "dtd/students.dtd">
    
  2. Public DTD

    Unable to use
    

5. XML parsing

5.1 parsing methods

There are generally two ways to parse XML:

DOM parsing

SAX parsing

DOM (Document Object Model) is a way of processing XML recommended by the W3C.

Parsing in DOM mode requires the parser to load the entire XML document into a Document object.

The Document object contains the document element, that is, the root element, which in turn contains any number of child elements.

According to the definition of DOM, every part of an XML document is a node:

An XML document has only one root node

Every element in XML is an element node

Every text in XML is a text node

Every attribute in XML is an attribute node

Every comment in XML is a comment node

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">haha</title>
<author>toto</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">heihei</title>
<author>toto</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title lang="en">wawa</title>
<author>toto</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="web" cover="paperback">
<title lang="en">Learning XML</title>
<author>toto</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>

Explanation:

After the entire document is parsed, it is represented by a Document object

The root node is <bookstore>

There are four <book> nodes under the root node

Each <book> node has four child nodes

<title>, <author>, <year>, <price>

Each of the four child nodes has a text node

The <book> and <title> nodes also have attribute nodes

DOM parses XML documents into a tree structure. This tree structure is called node tree. Through this tree, you can access any node, modify or delete their contents, or create new elements.

The nodes in the node tree are related hierarchically: the terms parent node, child node and sibling node describe the relationships between them
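As a rough illustration of walking and editing the node tree, the following sketch uses a shortened version of the bookstore document. NodeTreeDemo and its helper methods are assumptions for this example; the XML is written without whitespace between tags so that sibling navigation does not land on whitespace-only text nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeTreeDemo {
    // Shortened bookstore document with two books.
    static final String XML =
          "<bookstore>"
        + "<book category=\"cooking\"><title lang=\"en\">haha</title><price>30.00</price></book>"
        + "<book category=\"web\"><title lang=\"en\">wawa</title><price>49.99</price></book>"
        + "</bookstore>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Collects the text of every <title> element in document order.
    static List<String> titles(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("title");
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Element title = (Element) doc.getElementsByTagName("title").item(0);
        Node parent = title.getParentNode();    // the enclosing <book>
        Node sibling = title.getNextSibling();  // the <price> next to it
        System.out.println(parent.getNodeName() + " / " + sibling.getNodeName());

        title.setTextContent("hehe");               // modify existing content
        Element year = doc.createElement("year");   // create a new element
        year.setTextContent("2005");
        parent.appendChild(year);
        parent.removeChild(sibling);                // delete a node
        System.out.println(titles(doc));
    }
}
```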

SAX (Simple API for XML) is not a W3C standard, but it is the de facto standard of the XML community because it is so widely used. Almost all XML parsers support it.

With SAX parsing, each time the parser reads a start tag, an end tag or text content, it calls a corresponding method that we override, and in that method we write the processing to be done at that point. This continues until the whole XML document has been read. During the whole process, SAX does not keep node information or relationships in memory.

As a result, SAX parsing does not need much memory to hold the document data and relationships, and it is efficient. The trade-off is that node information and relationships are not retained during parsing, and the document can only be read from front to back.

5.2 parser

Most mainstream parsers today support both the DOM and SAX parsing methods.

Different companies, organizations and teams can release their own parser implementations, each supporting these two parsing methods

Crimson parser: released by Sun in the early days and used before JDK 1.4; its performance and efficiency are poor

Xerces parser: a parser supporting DOM and SAX, released by IBM and now maintained by the Apache foundation. It is one of the currently popular parsers and has been included in the JDK since JDK 1.5

Aelfred2 parser: the parser released by the DOM4J team, and also the default parser in the DOM4J tools

5.3 JAXP

5.3.1 DOM parsing

Steps for DOM parsing in JAXP:

  1. Call the DocumentBuilderFactory.newInstance() method to get the factory that creates the DOM parser

  2. Call the newDocumentBuilder method of the factory object to get the DOM parser object

  3. Call the parse() method of the DOM parser object to parse the XML document.

    The parse() method will return the parsed Document object

    Then, according to the structural relationship and characteristics of the document, call the corresponding API to parse it

<?xml version="1.0" encoding="UTF-8"?>
<students>
    <stu id="2022001">
        <name>Wang Wu</name>
        <age>23</age>
    </stu>
    <stu id="2022002">
        <name>Li Si</name>
        <age>20</age>
    </stu>
</students>

package com.briup.demo;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class Test {
	public static void main(String[] args) {
		Test test = new Test();
		String filePath = "src/com/briup/demo/class.xml";
		test.domPath(filePath);
	}

	public void domPath(String filePath) {
		// Create an object for DocumentBuilderFactory
		DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
		try {
			// Create a DocumentBuilder object from a factory object
			DocumentBuilder documentBuilder = builderFactory.newDocumentBuilder();
			// Parse the XML file with the DocumentBuilder to obtain the Document
			// The Document object is the DOM document object, which represents the parsed XML file
			Document document = documentBuilder.parse(filePath);
			// Get root element
			Element root = document.getDocumentElement();
			// Get child nodes
			NodeList childNodes = root.getChildNodes();
			for (int i = 0; i < childNodes.getLength(); i++) {
				// Gets the child node currently being processed
				Node node = childNodes.item(i);
				// Determine the type of node, whether it is a text node or an element node
				switch (node.getNodeType()) {
				case Node.TEXT_NODE:
					System.out.println("Content of text node: " + node.getTextContent());
					break;
				case Node.ELEMENT_NODE:
					System.out.println("Name of the element node: " + node.getNodeName());
					// Gets all attributes in the current element node
					NamedNodeMap attributes = node.getAttributes();
					// Loop variables get each attribute
					for (int j = 0; j < attributes.getLength(); j++) {
						// Cast to an object of type Attr, representing a property
						Attr attr = (Attr) attributes.item(j);
						// You can get the property name and property value through the method
						System.out.println(attr.getName() + "=" + attr.getValue());
					}
					// Gets the child nodes under the current node
					NodeList nodeList = node.getChildNodes();
					for (int k = 0; k < nodeList.getLength(); k++) {
						Node item = nodeList.item(k);
						// If this node is an element node
						if (item.getNodeType() == Node.ELEMENT_NODE) {
							// Print the child element's name and its text value
							System.out.println(item.getNodeName() + "=" + item.getTextContent());
						}
					}
					break;
				default:
					throw new RuntimeException("Resolved to an unexpected node type:" + node);
				}
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

5.3.2 SAX parsing

When parsing an XML document in SAX mode, each time the parser encounters content in the document (a start tag, an end tag, text, and so on) it calls a method that we have overridden, so how the content is handled at that point is up to us. In SAX-based programs, the five most commonly used SAX events are:

  1. startDocument(): called automatically when the parser reaches the start of the document

  2. endDocument(): called automatically when the parser reaches the end of the document

  3. startElement(): called automatically when the parser finds a start tag

  4. characters(): called automatically when the parser finds text inside a tag

  5. endElement(): called automatically when the parser finds an end tag

Whenever an event is triggered during parsing, the corresponding method we overrode is called automatically.

SAX parsing steps in JAXP:

  1. Get a SAXParserFactory object: SAXParserFactory.newInstance()
  2. Use the factory object to create a SAX parser: saxParserFactory.newSAXParser()
  3. Call the parser's parse() method to parse the XML file, passing a handler that overrides the methods of DefaultHandler

saxParser.parse(filePath, new DefaultHandler(){...});
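Before the full example, the order in which the five events fire can be seen with a handler that only records them. This is a minimal sketch under the assumption of a tiny inline document; SaxEventTrace and trace are names made up for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventTrace {
    // Parses the XML text and returns the SAX callbacks in the order they were invoked.
    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("startElement:" + qName);
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("characters:" + new String(ch, start, length));
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + qName);
            }
            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        });
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<a><b>hi</b></a>")) {
            System.out.println(event);
        }
    }
}
```

Note that a parser is allowed to deliver one run of text in several characters() calls, so real handlers usually accumulate text rather than assume a single call.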

package com.briup.demo;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import com.briup.Jaxp.Student;



public class SaxTest {
	public static void main(String[] args) {
		try {
			// 1. Obtain the SAX parser factory
			SAXParserFactory factory= SAXParserFactory.newInstance();
			// 2. Use the factory to create the SAX parser
			SAXParser parser=factory.newSAXParser();
			// 3. Parse the file with our handler
			File f = new File("src/com/briup/demo/class.xml");
			StudentHandler dh=new StudentHandler();
			parser.parse(f, dh);
			// Verify the result
			List<Student> list = dh.getList();
			for (Student student : list) {
				System.out.println(student);
			}
		} catch (ParserConfigurationException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		} catch (SAXException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}
}
 class StudentHandler extends DefaultHandler{
	private Student stu;
	private List<Student> list;
	private String tagName;
	// Called when document parsing starts
	@Override
	public void startDocument() throws SAXException {
		list = new ArrayList<Student>();
	}
	// Called for each start tag
	/*
	 * uri: namespace URI of the element (empty if there is none)
	 * localName: element name without prefix (may be empty when namespace processing is off)
	 * qName: qualified name of the start tag
	 * attributes: attributes of the start tag
	 */
	@Override
	public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
	if("stu".equals(qName)) {
		stu=new Student();
		stu.setId(Integer.parseInt(attributes.getValue("id")));
		// The class.xml shown earlier has no money attribute, so guard against null
		String money=attributes.getValue("money");
		if(money!=null) {
			stu.setMoney(Integer.parseInt(money));
		}
	}else {
		tagName=qName;
	}
	}
	// Called for text content
	@Override
	public void characters(char[] ch, int start, int length) throws SAXException {
		String str=new String (ch,start,length);
		str=str.trim();// Remove leading and trailing whitespace
		if(str.isEmpty()) {
			return;// Ignore whitespace-only text
		}
		if("name".equals(tagName)) {
			stu.setName(str);
		}else if("age".equals(tagName)) {
			stu.setAge(Integer.parseInt(str));
		}
//		else if("gender".equals(tagName)) {
//			stu.setGender(str);
//		}else if("like".equals(tagName)) {
//			stu.setLike(str);
//		}
	}
	// Called for each end tag
	/*
	 * url: namespace URI of the element
	 * localName: element name without prefix
	 * qName: qualified name of the end tag
	 */
	@Override
	public void endElement(String url, String localName, String qName) throws SAXException {
		if("stu".equals(qName)) {
			list.add(stu);
			}
		tagName=null;
	}
	//End of document
	@Override
	public void endDocument() throws SAXException {
	
	}
	public List<Student> getList(){
		return list;
	}

}

5.3.3 DOM4J parsing

Steps
1. Add the dom4j jar to the project
2. Create the parser object
	SAXReader saxReader = new SAXReader();
3. Create a File object for the path of the XML file to parse
4. Call the read(file) method of saxReader with that file; the return value is a Document
5. Get the root element
	Element rootElement = document.getRootElement();
6. Get the collection of teacher Element objects
	List<Element> elements = rootElement.elements(); // elements() on the root element returns its child elements as a list
7. Create a List to store the Teacher objects
8. Loop over the elements collection with an enhanced for loop
	8.1 Create a Teacher object
	8.2 Get the attributes of the element node as a collection
	8.3 Assign the attribute value to the corresponding object property (teacher only has id) // t.setId(Integer.parseInt(a.getValue()));
	8.4 Get the child elements of the element node and assign values according to the tag name, using if (nodename.equals("...")) or, preferably, a switch
	8.5 Read the text content of each child element with getText() and assign it to the object
9. Add the object to the collection
10. Print the collection to check the result

package com.briup.jaxp;

import java.io.File;

import java.util.ArrayList;
import java.util.List;

import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;


public class Dom4jTest {
	public static void main(String[] args) {
		try {
			//Create parser object
			SAXReader saxReader = new SAXReader();
			File file=new File("src/com/briup/jaxp/teacher.xml");
			Document document = saxReader.read(file);
			//Get root node element
			Element rootElement = document.getRootElement();
//			System.out.println(rootElement.getName());
			// Get the child elements of the root: the collection of teacher elements
			List<Element> elements = rootElement.elements();
			
			List<Teacher> list=new ArrayList<Teacher>();  // Stores the Teacher objects
			for (Element tea : elements) {
				Teacher t=new Teacher();
				List<Attribute> attributes = tea.attributes();
				// Loop over the attributes and pick out id
				for (Attribute a : attributes) {
					if ("id".equals(a.getName())) {
						t.setId(Integer.parseInt(a.getValue()));
					}
				}
				//Get the child nodes of each tea
				List<Element> nodeElements = tea.elements();
				for (Element el : nodeElements) {
					String nodename = el.getName();
					if (nodename.equals("name")) {
						//Get text content
						t.setName(el.getText());
					}else if (nodename.equals("age")) {
						
						t.setAge(Integer.parseInt(el.getText()));
					}else if (nodename.equals("salary")) {
						t.setSalary(Double.parseDouble(el.getText()));
					}
				}
				list.add(t);
			}
			for (Teacher teacher : list) {
				System.out.println(teacher);
			}
			
		} catch (DocumentException e) {
			e.printStackTrace();
		}
	}
}

Tags: Java xml

Posted on Wed, 29 Sep 2021 14:44:59 -0400 by g-force2k2