XML concepts
Extensible Markup Language
Extensible: tags are user-defined rather than predefined
Purpose:
Store data
1. As a configuration file
2. As a format for transmitting data over a network
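As a small illustration of XML used as a configuration file, the sketch below round-trips a few settings through the JDK's own XML properties format (java.util.Properties.storeToXML and loadFromXML). The class name XmlConfigSketch and the setting names are hypothetical and chosen only for this example; a real application would normally define its own XML layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper: stores settings as XML and reads them back.
final class XmlConfigSketch {

    /** Serializes the settings to an XML document held in memory. */
    static byte[] save(Properties settings) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            settings.storeToXML(out, "demo settings", "UTF-8");
            return out.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("could not write XML", e);
        }
    }

    /** Parses an XML properties document back into a Properties object. */
    static Properties load(byte[] xml) {
        Properties settings = new Properties();
        try (ByteArrayInputStream in = new ByteArrayInputStream(xml)) {
            settings.loadFromXML(in);
        } catch (IOException e) {
            throw new IllegalStateException("could not read XML", e);
        }
        return settings;
    }

    public static void main(String[] args) {
        Properties settings = new Properties();
        settings.setProperty("username", "zhangsan"); // made-up setting
        settings.setProperty("timeout", "30");        // made-up setting

        byte[] xml = save(settings);
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        System.out.println(load(xml).getProperty("timeout"));
    }
}
```

In a real program the byte array would be replaced by a file stream, and the same bytes could equally be sent over a network connection.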
The differences between XML and HTML:
1. XML tags are user-defined; HTML tags are predefined
2. XML syntax is strict; HTML syntax is loose
3. XML is used to store data; HTML is used to display data
Quick start
XML documents use the file extension .xml
The document declaration is optional in XML 1.0, but when present it must be the very first thing in the document
There is exactly one root tag in an XML document
Attribute values must be enclosed in quotation marks (either single or double)
Tags must be closed correctly
XML tag names are case sensitive
<?xml version='1.0'?>
<users>
    <user id='1'>
        <name>zhangsan</name>
        <age>23</age>
        <gender>male</gender>
    </user>
    <user id='2'>
        <name>lisi</name>
        <age>24</age>
        <gender>female</gender>
    </user>
</users>
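The rules above can be checked mechanically. The following is a minimal sketch that uses the JDK's built-in DOM parser (JAXP) to test whether a string is well-formed XML; the class name WellFormedCheck and its method are hypothetical names used only for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string obeys the basic XML syntax rules.
final class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser could not be used", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<users><user id='1'><name>zhangsan</name></user></users>")); // one root, quoted attribute
        System.out.println(isWellFormed("<user/><user/>"));     // two root tags
        System.out.println(isWellFormed("<user id=1/>"));       // attribute value not quoted
        System.out.println(isWellFormed("<name>lisi</Name>"));  // case mismatch, so the tag is not closed
    }
}
```

Only the first call is expected to report a well-formed document; each of the others breaks one of the rules listed above.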
Components
1. Document declaration
Format: <?xml attribute-list ?>
Attribute list:
version: the version number; this attribute is required
encoding: the character encoding. It tells the parser which character set the current document uses. The default is UTF-8
standalone: whether the document is independent. yes: does not depend on other files; no: depends on other files
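To see how a parser reports these declaration attributes, here is a small sketch using the JDK DOM API, whose Document interface exposes getXmlVersion(), getXmlEncoding() and getXmlStandalone(). The class name DeclarationInfo is a hypothetical name for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: parses a document so its declaration attributes can be inspected.
final class DeclarationInfo {

    static Document parse(String xml) {
        try {
            // Parse from bytes so the parser itself interprets the declared encoding.
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<?xml version='1.0' encoding='UTF-8' standalone='yes'?><users/>");
        System.out.println("version    = " + doc.getXmlVersion());
        System.out.println("encoding   = " + doc.getXmlEncoding());
        System.out.println("standalone = " + doc.getXmlStandalone());
    }
}
```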
2. Processing instructions (for awareness only): for example, linking a CSS stylesheet
<?xml version="1.0" encoding="utf-8" standalone='no'?>
<?xml-stylesheet type="text/css" href="a.css"?>
<users>
    <user id='1'>
        <name>zhangsan</name>
        <age>23</age>
        <gender>male</gender>
    </user>
    <user id='2'>
        <name>lisi</name>
        <age>24</age>
        <gender>female</gender>
    </user>
</users>
3. Tags: tag names are user-defined
Rules: names cannot start with a digit or a punctuation mark (an underscore is allowed)
Names should not start with the letters xml (or XML, Xml, etc.), which are reserved
Names cannot contain spaces
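The naming rules can be tried out with the JDK DOM API, whose createElement method rejects names that are not legal XML names. This is only a sketch: the class name TagNameCheck is hypothetical, and the check for the reserved xml prefix is added by hand here because parsers generally do not enforce it.

```java
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical helper: applies the tag naming rules to a candidate name.
final class TagNameCheck {

    static boolean isUsableTagName(String name) {
        // Names beginning with "xml" in any case are reserved, so reject them explicitly.
        if (name.toLowerCase(Locale.ROOT).startsWith("xml")) {
            return false;
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.createElement(name); // throws DOMException for an illegal XML name
            return true;
        } catch (DOMException e) {
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("parser could not be created", e);
        }
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"user", "_user", "1user", "user name", "XmlUser"}) {
            System.out.println(candidate + " -> " + isUsableTagName(candidate));
        }
    }
}
```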
4. Attributes: an id attribute value should be unique within the document
5. Text: CDATA section: data inside it is treated as literal text and is not parsed as markup
<code>
    <![CDATA[
        if(a < b && a > c){}
    ]]>
</code>
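The effect of a CDATA section can be observed by parsing it. In the sketch below (the class name CdataSketch is hypothetical), the same text is written once inside a CDATA section and once with escaped entities, and the JDK DOM parser is assumed to return identical text for both.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: returns the text content of a document's root tag.
final class CdataSketch {

    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Inside CDATA, '<' and '&' need no escaping.
        String withCdata = "<code><![CDATA[if(a < b && a > c){}]]></code>";
        // Without CDATA, the same characters must be written as entities.
        String escaped = "<code>if(a &lt; b &amp;&amp; a &gt; c){}</code>";

        System.out.println(rootText(withCdata));
        System.out.println(rootText(escaped));
    }
}
```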
Constraints
Constraints specify the rules for writing an XML document
1. Be able to reference a constraint document from an XML document
2. Be able to read simple constraint documents
Types
1. DTD: a simple constraint technology
2. Schema: a more complex constraint technology
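DTD (.dtd files)
Referencing a DTD from an XML document
Internal DTD: the constraint rules are defined inside the XML document itself
External DTD: the constraint rules are defined in an external .dtd file
Local: <!DOCTYPE root-tag-name SYSTEM "location of the dtd file">
Network: <!DOCTYPE root-tag-name PUBLIC "name of the dtd file" "URL of the dtd file">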
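A sketch of how an internal DTD is applied in practice, using the JDK's validating DOM parser. The DTD text, the class name DtdValidationSketch and its method names are assumptions written for this example, modelled on the users document from the quick start; they are not taken from a real project.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates a users document against an internal DTD.
final class DtdValidationSketch {

    // Internal DTD (an assumption for this example): each user needs name, age, gender and an id.
    static final String DTD =
            "<!DOCTYPE users [\n"
          + "  <!ELEMENT users (user*)>\n"
          + "  <!ELEMENT user (name, age, gender)>\n"
          + "  <!ELEMENT name (#PCDATA)>\n"
          + "  <!ELEMENT age (#PCDATA)>\n"
          + "  <!ELEMENT gender (#PCDATA)>\n"
          + "  <!ATTLIST user id CDATA #REQUIRED>\n"
          + "]>\n";

    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validation problems arrive as recoverable errors, so rethrow them.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored */ }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser could not be used", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
                "<users><user id='1'><name>zhangsan</name><age>23</age><gender>male</gender></user></users>"));
        System.out.println(isValid(
                "<users><user id='1'><name>zhangsan</name><age>23</age></user></users>")); // gender missing
    }
}
```

The DTD only describes structure: which tags may appear, in what order, and which attributes are required.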
Schema (.xsd files)
Limitation of DTD: it cannot constrain the validity of specific content values; for example, it cannot reject age=1000
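To illustrate what a Schema can express that a DTD cannot, the sketch below validates an age tag against a value range using the JDK's javax.xml.validation API. The schema text, the 0 to 150 range and the class name SchemaValidationSketch are assumptions made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: validates an <age> tag against a value range.
final class SchemaValidationSketch {

    // Schema (an assumption for this example): age must be an integer from 0 to 150.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='age'>"
          + "<xs:simpleType><xs:restriction base='xs:integer'>"
          + "<xs:minInclusive value='0'/><xs:maxInclusive value='150'/>"
          + "</xs:restriction></xs:simpleType>"
          + "</xs:element>"
          + "</xs:schema>";

    static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema could not be compiled", e);
        }
    }

    static boolean isValid(String xml) {
        Validator validator = loadSchema().newValidator();
        try {
            // With no custom error handler, validate() throws on the first violation.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException("document could not be read", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<age>23</age>"));
        System.out.println(isValid("<age>1000</age>")); // outside the allowed range
    }
}
```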
Parsing
Parsing means operating on an XML document by reading its data into memory
Operations on XML documents
Parse (read): read the data in the document into memory
Write: save the data in memory to an XML document for persistent storage
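The write direction can be sketched with the JDK alone: build a small DOM tree in memory, then serialize it with a Transformer. The class name WriteXmlSketch is hypothetical, and the output goes to a string here only to keep the example self-contained.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: builds a DOM tree in memory and writes it out as XML text.
final class WriteXmlSketch {

    static String buildUsersXml() {
        try {
            // Build the tree: <users><user id="1"><name>zhangsan</name></user></users>
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element users = doc.createElement("users");
            doc.appendChild(users);
            Element user = doc.createElement("user");
            user.setAttribute("id", "1");
            users.appendChild(user);
            Element name = doc.createElement("name");
            name.setTextContent("zhangsan");
            user.appendChild(name);

            // Serialize the tree; a StreamResult wrapping a File would persist it to disk instead.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not write XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUsersXml());
    }
}
```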
Ways to parse XML:
DOM: loads the whole markup document into memory at once and builds a DOM tree in memory
Advantages: easy to work with; supports all CRUD operations on the document
Disadvantages: high memory usage
SAX: reads the document sequentially and is event driven
Advantages: very low memory usage
Disadvantages: read-only; the document cannot be added to, deleted from, or modified
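The two approaches can be compared side by side with the JDK's JAXP classes. In this sketch (the class name DomVsSaxSketch and its method names are hypothetical), both methods collect the text of every name tag: the DOM version builds the whole tree and then queries it, while the SAX version only reacts to events as the parser streams through the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reads all <name> values with DOM and with SAX.
final class DomVsSaxSketch {

    static final String XML =
            "<users><user id='1'><name>zhangsan</name></user>"
          + "<user id='2'><name>lisi</name></user></users>";

    /** DOM: the whole tree is in memory, so it can be queried (and modified) freely. */
    static List<String> namesWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("name");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    /** SAX: no tree is built; the handler sees start, text and end events in order. */
    static List<String> namesWithSax(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside a <name> tag

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("name".equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    names.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(namesWithDom(XML));
        System.out.println(namesWithSax(XML));
    }
}
```

The SAX handler has to track its own state (here, whether it is inside a name tag), which is the price paid for not holding the document in memory.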
Common XML parsers:
JAXP: the parser API provided by Sun; supports both DOM and SAX
DOM4J: an excellent parser
Jsoup: a Java HTML parser that can parse a URL or HTML text directly. It provides a convenient API for extracting and manipulating data through DOM methods, CSS selectors, and jQuery-like operations
PULL: the built-in parser of the Android operating system; SAX-style
Jsoup
1. Import the jar package
2. Get the Document object
3. Get the corresponding tag as an Element object
4. Get the data
student.xml
<?xml version="1.0" encoding="UTF-8" ?>
<students>
    <student number="heima_001">
        <name>Bob</name>
        <age>18</age>
        <sex>male</sex>
    </student>
    <student number="heima_002">
        <name>Alice</name>
        <age>18</age>
        <sex>male</sex>
    </student>
</students>
package zg.jsoup;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;

/*
 * Jsoup quick start
 * 1. Import the jar package
 */
public class jsoupDemo1 {
    public static void main(String[] args) throws IOException {
        // 2. Get the Document object from the XML document
        // 2.1 Get the path of student.xml:
        //     class object -> class loader -> resource URL -> path as a string
        String path = jsoupDemo1.class.getClassLoader().getResource("zg/jsoup/student.xml").getPath();
        // 2.2 Parse the XML document: load it into memory and obtain the DOM tree (Document)
        Document document = Jsoup.parse(new File(path), "UTF-8");
        // 3. Get the Element objects for the name tags
        Elements elements = document.getElementsByTag("name");
        System.out.println(elements.size());
        // 3.1 Get the first name Element
        Element element = elements.get(0);
        // 3.2 Get its data
        String name = element.text();
        System.out.println(name);
    }
}
Jsoup objects
1. Jsoup: a utility class that parses HTML or XML documents and returns a Document
parse: parses an HTML or XML document and returns a Document
parse(File in, String charsetName): parses an XML or HTML file
parse(String html): parses an XML or HTML string
parse(URL url, int timeoutMillis): fetches and parses the HTML or XML document at the given network path
package zg.jsoup;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.File;
import java.io.IOException;
import java.net.URL;

/*
 * The parse methods of the Jsoup class
 */
public class jsoupDemo2 {
    public static void main(String[] args) throws IOException {
        // parse(File in, String charsetName): parses an XML or HTML file
        // Get the path of student.xml:
        //     class object -> class loader -> resource URL -> path as a string
        String path = jsoupDemo2.class.getClassLoader().getResource("zg/jsoup/student.xml").getPath();
        // Parse the XML document: load it into memory and obtain the DOM tree (Document)
        Document document = Jsoup.parse(new File(path), "UTF-8");
        System.out.println(document); // printing a Document outputs its markup as a string

        // parse(String html): parses an XML or HTML string
        String str = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
                "\n" +
                "<students>\n" +
                "    <student number=\"heima_001\">\n" +
                "        <name>Bob</name>\n" +
                "        <age>18</age>\n" +
                "        <sex>male</sex>\n" +
                "    </student>\n" +
                "\n" +
                "    <student number=\"heima_002\">\n" +
                "        <name>Alice</name>\n" +
                "        <age>18</age>\n" +
                "        <sex>male</sex>\n" +
                "    </student>\n" +
                "</students>\n";
        Document document1 = Jsoup.parse(str); // also returns a Document object
        System.out.println(document1);         // the string is parsed in the same way

        // parse(URL url, int timeoutMillis): fetches the HTML or XML document at a network path
        URL url = new URL("https://editor.csdn.net/md?articleId=120723799"); // a resource path on the network
        Document document2 = Jsoup.parse(url, 1000);
        System.out.println(document2); // the parsed result is an HTML document
    }
}
2. Document: the document object. Represents the DOM tree in memory
Get Element objects
getElementById(String id): gets the unique element object with the given id attribute value
getElementsByTag(String tagName): gets the collection of element objects with the given tag name
getElementsByAttribute(String key): gets the collection of element objects that have the given attribute name
getElementsByAttributeValue(String key, String value): gets the collection of element objects with the given attribute name and attribute value
package zg.jsoup;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;

/*
 * Document/Element objects
 */
public class jsoupDemo3 {
    public static void main(String[] args) throws IOException {
        // Get the path of student.xml and parse it into a Document
        String path = jsoupDemo3.class.getClassLoader().getResource("zg/jsoup/student.xml").getPath();
        Document document = Jsoup.parse(new File(path), "UTF-8");

        // 3. Get element objects
        // 3.1 Get all student elements
        Elements elements = document.getElementsByTag("student");
        System.out.println(elements);
        System.out.println("-----------");

        // 3.2 Get the elements that have an attribute named id.
        //     The student.xml shown above defines no id attribute, so this lookup and the next
        //     find nothing unless one is added (for example id="zgdaren" on a name tag).
        Elements id = document.getElementsByAttribute("id");
        System.out.println(id);
        System.out.println("-----------");

        // getElementById(String id): gets the unique element with the given id attribute value
        Element zgdaren = document.getElementById("zgdaren");
        System.out.println(zgdaren);
        System.out.println("-----------");

        // 3.3 Get the elements whose number attribute value is heima_001
        Elements elements1 = document.getElementsByAttributeValue("number", "heima_001");
        System.out.println(elements1);
    }
}
3. Elements: a collection of Element objects; it can be used like an ArrayList<Element>
4. Element: the element object
Get child element objects
getElementById(String id): gets the unique element object with the given id attribute value
getElementsByTag(String tagName): gets the collection of element objects with the given tag name
getElementsByAttribute(String key): gets the collection of element objects that have the given attribute name
getElementsByAttributeValue(String key, String value): gets the collection of element objects with the given attribute name and attribute value
Get attribute values
String attr(String key): gets the attribute value for the given attribute name
Get text content
String text(): gets the plain text content of the element, including the text of all child tags
String html(): gets the entire content of the tag body (the markup of child tags as well as their text)
package zg.jsoup;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;

/*
 * Element object
 */
public class jsoupDemo4 {
    public static void main(String[] args) throws IOException {
        // Get the path of student.xml and parse it into a Document
        String path = jsoupDemo4.class.getClassLoader().getResource("zg/jsoup/student.xml").getPath();
        Document document = Jsoup.parse(new File(path), "UTF-8");

        // Get child element objects
        // Through the Document object: gets every name tag in the document
        Elements name = document.getElementsByTag("name");
        System.out.println(name.size()); // 2
        System.out.println("--------");

        // Through an Element object: gets only the name tags under that element
        Element element_student = document.getElementsByTag("student").get(0);
        Elements name1 = element_student.getElementsByTag("name");
        System.out.println(name1.size()); // 1
        System.out.println("----------");

        // String attr(String key): gets the attribute value for the given attribute name
        // Get the number attribute of the first student element
        String number = element_student.attr("number");
        System.out.println(number); // heima_001
        System.out.println("----------");

        // String text(): gets the text content
        String text = name1.text();
        System.out.println(text);
        System.out.println("----------");

        // text() versus html(): if the name tag contained child tags instead of plain text,
        // html() would also return the markup of those child tags,
        // while text() would return only the plain text of all child tags
        // String html(): gets the entire content of the tag body
        String html = name1.html();
        System.out.println(html);
    }
}
5. Node: the node object; the parent class of Document and Element
Jsoup shortcut queries
selector: CSS selectors
Method: Elements select(String cssQuery)
package zg.jsoup;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;

/*
 * Selector queries
 */
public class jsoupDemo5 {
    public static void main(String[] args) throws IOException {
        // Get the path of student.xml and parse it into a Document
        String path = jsoupDemo5.class.getClassLoader().getResource("zg/jsoup/student.xml").getPath();
        Document document = Jsoup.parse(new File(path), "UTF-8");

        // 3. Query the name tags
        Elements name = document.select("name");
        System.out.println(name);
        System.out.println("-----------");

        // 4. Query the element whose id value is zgdaren
        Elements select = document.select("#zgdaren");
        System.out.println(select);
        System.out.println("------------");

        // 5. Get the age child tag of the student tag whose number attribute value is heima_001
        // 5.1 Get the student tag whose number attribute value is heima_001
        Elements select1 = document.select("student[number='heima_001']");
        System.out.println(select1);
        // 5.2 Get the age child tag of that student tag
        Elements select2 = document.select("student[number='heima_001'] > age");
        System.out.println(select2);
    }
}
XPath
XPath is the XML Path Language, a language for locating parts of an XML document (XML being a subset of SGML, the Standard Generalized Markup Language)
Using XPath with Jsoup requires importing an additional jar package
Look up the XPath syntax in the W3School reference to write the queries
package zg.jsoup;

import cn.wanghaomiao.xpath.exception.XpathSyntaxErrorException;
import cn.wanghaomiao.xpath.model.JXDocument;
import cn.wanghaomiao.xpath.model.JXNode;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.File;
import java.io.IOException;
import java.util.List;

/*
 * XPath queries
 */
public class jsoupDemo6 {
    public static void main(String[] args) throws IOException, XpathSyntaxErrorException {
        // Get the path of student.xml and parse it into a Document
        String path = jsoupDemo6.class.getClassLoader().getResource("zg/jsoup/student.xml").getPath();
        Document document = Jsoup.parse(new File(path), "UTF-8");

        // The Document object belongs to Jsoup, which does not support XPath syntax itself
        // 3. Create a JXDocument object from the Document object
        JXDocument jxDocument = new JXDocument(document);

        // 4. Query with XPath syntax
        // 4.1 Query all student tags
        List<JXNode> jxNodes = jxDocument.selN("//student");
        for (JXNode jxNode : jxNodes) {
            System.out.println(jxNode);
        }
        System.out.println("---------");

        // 4.2 Query the name tags under all student tags
        List<JXNode> jxNodes1 = jxDocument.selN("//student/name");
        for (JXNode jxNode : jxNodes1) {
            System.out.println(jxNode);
        }
        System.out.println("---------");

        // 4.3 Query the name tags under student tags that have an id attribute
        List<JXNode> jxNodes2 = jxDocument.selN("//student/name[@id]");
        for (JXNode jxNode : jxNodes2) {
            System.out.println(jxNode);
        }
        System.out.println("---------");

        // 4.4 Query the name tag under a student tag whose id attribute value is zgdaren
        List<JXNode> jxNodes3 = jxDocument.selN("//student/name[@id='zgdaren']");
        for (JXNode jxNode : jxNodes3) {
            System.out.println(jxNode);
        }
    }
}
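For comparison, the same kind of XPath query can be tried without any extra jar by using the JDK's own javax.xml.xpath package on a JAXP DOM tree. This is a different technique from the JXDocument approach above, not a replacement for it; the class name JdkXPathSketch and its select method are hypothetical names for this sketch, and the XML string mirrors student.xml.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: runs an XPath expression and returns the text of each matched node.
final class JdkXPathSketch {

    static final String XML =
            "<students>"
          + "<student number='heima_001'><name>Bob</name><age>18</age><sex>male</sex></student>"
          + "<student number='heima_002'><name>Alice</name><age>18</age><sex>male</sex></student>"
          + "</students>";

    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalStateException("XPath query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(XML, "//student/name"));                       // every name tag
        System.out.println(select(XML, "//student[@number='heima_002']/name"));  // filter on an attribute value
        System.out.println(select(XML, "//student/name[@id]"));                  // name tags that have an id attribute
    }
}
```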