Implementing an HTML-to-Word converter with XWPFDocument and jsoup


Requirement

In the system, users design a notification header for each problem type in a rich text editor (as shown in the figure below). The notification is then downloaded as a Word document, which must include that header.

Implemented functions

  • Word heading generation
  • Font style settings: color, size, line height, bold, italic, underline, strikethrough, background color, hyperlink, etc.
  • Nested tags and nested styles
  • Table generation
  • Picture insertion

Implementation process

  1. Add the Maven dependencies

<!-- poi -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>4.1.2</version>
</dependency>
<!-- full schema set; note that its version differs from the 4.1.2 artifacts above -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-full</artifactId>
    <version>5.0.0</version>
</dependency>
<!-- HTML parsing -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>

2. Without further ado, here is the code. I have about two years of experience, so if you spot something wrong or something that could be optimized, please leave a comment. I'm happy to learn.

import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.commons.lang3.StringUtils; // assumed: commons-lang3, not in the dependency list above
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.UnderlinePatterns;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFStyle;
import org.apache.poi.xwpf.usermodel.XWPFStyles;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
// XWPFParagraphWrapper is not an Apache POI class. It appears to come from poi-tl
// (see the reference at the end); the package below is an assumption.
import com.deepoove.poi.xwpf.XWPFParagraphWrapper;

/**
 * Utility class that converts HTML to Word.
 *
 * @author fatCountry
 */
public class HtmlConvertWordUtil {

    /** Block-level tags that each become one paragraph (or table) in the document. */
    private static final List<String> BLOCK_TAGS = Arrays.asList("p", "h1", "h2", "h3", "table");

    /**
     * Main entry point: converts HTML to a Word document.
     *
     * @param html HTML to be converted
     */
    public static XWPFDocument htmlConvertWord(String html) {
        XWPFDocument doc = new XWPFDocument();
        // Parse the HTML with jsoup
        Document parsed = Jsoup.parse(html);
        // Get every element in the body
        Elements all = parsed.body().getAllElements();
        // Keep only the parent tags: p, h1, h2, h3 and table
        List<Element> blocks = all.stream()
                .filter(x -> BLOCK_TAGS.contains(x.tagName()))
                .collect(Collectors.toList());
        // Each parent tag is one paragraph; its own content and the content of
        // its child tags all go into that paragraph
        for (Element e : blocks) {
            createXWPFParagraph(doc, e);
        }
        return doc;
    }

    /**
     * Builds one paragraph.
     *
     * @param docxDocument target document
     * @param e            parent element
     */
    public static void createXWPFParagraph(XWPFDocument docxDocument, Element e) {
        XWPFParagraph paragraph = docxDocument.createParagraph();
        // Styles collected while walking the element tree
        List<String> allStyles = new ArrayList<>();
        createXWPFRun(docxDocument, paragraph, e, allStyles);
    }

    /**
     * Creates the paragraph content.
     *
     * @param docxDocument target document
     * @param paragraph    paragraph to fill
     * @param e            HTML element
     * @param allStyles    styles in effect
     */
    public static void createXWPFRun(XWPFDocument docxDocument, XWPFParagraph paragraph,
                                     Element e, List<String> allStyles) {
        // Styles from the parent tag's style attribute (jsoup returns "" when it is absent)
        List<String> parentStyle = new ArrayList<>(Arrays.asList(e.attr("style").split(";")));
        allStyles.addAll(parentStyle);
        // Tag-level styles apply only to h1, h2, h3 and p
        if (e.tagName().matches("h[1-3]") || "p".equals(e.tagName())) {
            allStyles.add(e.tagName() + ":");
        }
        List<Node> nodes = e.childNodes();
        if (!nodes.isEmpty()) {
            // Tables are handled separately
            if ("table".equals(e.tagName())) {
                XWPFTable table = docxDocument.createTable();
                // Set the table width
                CTTblWidth width = table.getCTTbl().addNewTblPr().addNewTblW();
                width.setType(STTblWidth.DXA);
                width.setW(BigInteger.valueOf(9072));
                // The first child element is expected to be tbody
                Element tableBody = e.children().first();
                // createTable() creates one row by default; remove it
                table.removeRow(0);
                if (tableBody == null) {
                    return;
                }
                for (Element tr : tableBody.children()) {
                    XWPFTableRow row = table.createRow();
                    int col = 0;
                    for (Element td : tr.children()) {
                        // Cells only need to be created for the first row; rows
                        // created afterwards already have the same number of cells
                        XWPFTableCell cell = row.getCell(col);
                        if (cell == null) {
                            cell = row.createCell();
                        }
                        col++;
                        // Center the cell content vertically
                        CTTcPr tcpr = cell.getCTTc().addNewTcPr();
                        CTVerticalJc va = tcpr.addNewVAlign();
                        va.setVal(STVerticalJc.CENTER);
                        // Styles of the tags inside this cell
                        List<String> tempStyles = cellStyle(td, new ArrayList<>());
                        if (!tempStyles.isEmpty()) {
                            allStyles.addAll(tempStyles);
                            XWPFRun run = cell.addParagraph().createRun();
                            run.setText(td.text());
                            setFontStyle(allStyles, run, paragraph, docxDocument, td);
                            allStyles.clear();
                        } else {
                            cell.setText(td.text());
                        }
                    }
                }
                return;
            }
            for (Node node : nodes) {
                XWPFRun run = paragraph.createRun();
                if (node instanceof TextNode) {
                    // Plain text that is not wrapped in a tag. Note that all
                    // spaces are stripped, as in the rest of this class.
                    run.setText(((TextNode) node).text().replaceAll(" ", ""));
                    // Styled with the parent tag's styles only
                    setFontStyle(allStyles, run, paragraph, docxDocument, null);
                } else if (node instanceof Element) {
                    // A nested tag: its text is styled with its own styles as well
                    Element child = (Element) node;
                    List<String> childStyle =
                            new ArrayList<>(Arrays.asList(child.attr("style").split(";")));
                    // Add the style implied by the tag itself
                    childStyle.add(addTagStyle(child.tagName(), child));
                    // Merge parent and child styles
                    allStyles.addAll(childStyle);
                    // Recurse if the child has nested tags, carrying the current styles along
                    if (!child.children().isEmpty()) {
                        createXWPFRun(docxDocument, paragraph, child, allStyles);
                    }
                    // Only emit text the child owns directly. ownText() is used instead of
                    // text(), which would also return the nested tags' text and duplicate it.
                    if (!child.textNodes().isEmpty()) {
                        String text = child.ownText();
                        // Hyperlinks set their own text in setFontStyle()
                        boolean isLink = allStyles.stream().anyMatch(x -> x.contains("a:"));
                        if (!isLink) {
                            run.setText(text.replaceAll(" ", ""));
                        }
                        setFontStyle(allStyles, run, paragraph, docxDocument, child);
                    }
                    // The child's styles apply to the child only, not to the next sibling
                    allStyles.removeAll(childStyle);
                }
            }
        } else {
            // Horizontal rule. Not reached unless "hr" is added to the filter in htmlConvertWord().
            if ("hr".equals(e.tagName())) {
                XWPFRun run = paragraph.createRun();
                run.setText("--------------------");
            }
        }
    }

    /**
     * Adds a picture to a run.
     *
     * @param run        target run
     * @param pictureUrl picture URL
     * @param fileName   picture file name
     * @return the run
     */
    public static XWPFRun addPicture(XWPFRun run, String pictureUrl, String fileName) {
        if (pictureUrl == null) {
            return run;
        }
        try {
            URL url = new URL(URLDecoder.decode(pictureUrl, "UTF-8"));
            // Derive the picture type from the file extension in the URL path
            String path = url.getPath();
            String extension = path.substring(path.lastIndexOf('.') + 1);
            try (InputStream inputStream = url.openStream()) {
                run.addPicture(inputStream, getPictureType(extension), fileName,
                        Units.toEMU(400), Units.toEMU(256));
            }
        } catch (MalformedURLException e) {
            throw new RuntimeException("Failed to parse picture URL, url=" + pictureUrl, e);
        } catch (IOException e) {
            throw new RuntimeException("Failed to fetch picture, url=" + pictureUrl, e);
        } catch (InvalidFormatException e) {
            throw new RuntimeException("Failed to add picture, url=" + pictureUrl, e);
        }
        return run;
    }

    /**
     * Returns the POI picture type code for a file extension.
     *
     * @param picType file extension, e.g. "png"
     * @return picture type code
     */
    private static int getPictureType(String picType) {
        int res = XWPFDocument.PICTURE_TYPE_PICT;
        if (picType != null) {
            if (picType.equalsIgnoreCase("png")) {
                res = XWPFDocument.PICTURE_TYPE_PNG;
            } else if (picType.equalsIgnoreCase("dib")) {
                res = XWPFDocument.PICTURE_TYPE_DIB;
            } else if (picType.equalsIgnoreCase("emf")) {
                res = XWPFDocument.PICTURE_TYPE_EMF;
            } else if (picType.equalsIgnoreCase("jpg") || picType.equalsIgnoreCase("jpeg")) {
                res = XWPFDocument.PICTURE_TYPE_JPEG;
            } else if (picType.equalsIgnoreCase("wmf")) {
                res = XWPFDocument.PICTURE_TYPE_WMF;
            }
        }
        return res;
    }

    /**
     * Recursively collects the font styles of the tags inside a table cell.
     *
     * @param c          cell element
     * @param tempStyles styles collected so far
     * @return the collected styles
     */
    private static List<String> cellStyle(Element c, List<String> tempStyles) {
        for (Element element : c.children()) {
            tempStyles.add(addTagStyle(element.tagName(), element));
            if (!element.children().isEmpty()) {
                cellStyle(element, tempStyles);
            }
        }
        return tempStyles;
    }

    /**
     * Returns the style entry implied by a tag.
     *
     * @param tagName  tag name
     * @param children element
     * @return a "key:value" style entry, or "" for unsupported tags
     */
    public static String addTagStyle(String tagName, Element children) {
        String style = "";
        switch (tagName) {
            // Font. Note: if several attributes are present, only the last one checked wins.
            case "font":
                if (StringUtils.isNotBlank(children.attr("face"))) {
                    style = "face:" + children.attr("face");
                }
                if (StringUtils.isNotBlank(children.attr("size"))) {
                    style = "size:" + children.attr("size");
                }
                if (StringUtils.isNotBlank(children.attr("color"))) {
                    style = "color:" + children.attr("color");
                }
                break;
            // Strikethrough
            case "strike":
                style = "strike:";
                break;
            // Line break
            case "br":
                style = "br:";
                break;
            // Underline
            case "u":
                style = "u:";
                break;
            // Italic
            case "i":
                style = "i:";
                break;
            // Bold
            case "b":
                style = "b:";
                break;
            // Hyperlink
            case "a":
                if (StringUtils.isNotBlank(children.attr("href"))) {
                    style = "a:" + children.attr("href");
                }
                break;
        }
        return style;
    }

    /** True for the tag-level markers "p:", "h1:", "h2:" and "h3:". */
    private static boolean isBlockMarker(String style) {
        return style.matches("(p|h[1-3]):");
    }

    /** Returns the run's existing rPr element, creating it only if it is missing. */
    private static CTRPr runProperties(XWPFRun run) {
        CTR ctr = run.getCTR();
        return ctr.isSetRPr() ? ctr.getRPr() : ctr.addNewRPr();
    }

    /** Applies the custom heading style plus the default heading font settings. */
    private static void applyHeading(XWPFDocument docxDocument, XWPFParagraph paragraph,
                                     XWPFRun run, String styleId, int level, int fontSize) {
        addCustomHeadingStyle(docxDocument, styleId, level);
        paragraph.setStyle(styleId);
        run.setBold(true);
        run.setColor("000000");
        run.setFontFamily("SimSun"); // 宋体
        run.setFontSize(fontSize);
    }

    /**
     * Applies styles to a run.
     *
     * @param styles       styles to apply
     * @param run          XWPFRun to be styled
     * @param paragraph    paragraph that contains the run
     * @param docxDocument target document
     * @param children     element the run was created from (may be null)
     */
    public static void setFontStyle(List<String> styles, XWPFRun run, XWPFParagraph paragraph,
                                    XWPFDocument docxDocument, Element children) {
        // Stable sort: parent tag markers first, so that child styles applied later
        // can override them. A key-based comparator keeps the ordering consistent.
        styles.sort(Comparator.comparingInt((String s) -> isBlockMarker(s) ? 0 : 1));
        for (String styleValue : styles) {
            if (StringUtils.isBlank(styleValue) || !styleValue.contains(":")) {
                continue;
            }
            int colon = styleValue.indexOf(':');
            // Style name
            String style = styleValue.substring(0, colon).replaceAll(" ", "");
            // Style value
            String value = styleValue.substring(colon + 1).replaceAll(" ", "");
            switch (style) {
                /* ---------- styles implied by the tag: p, h1, h2, h3 ---------- */
                // Body text
                case "p":
                    // Alignment
                    // paragraph.setAlignment(ParagraphAlignment.BOTH);
                    // First-line indent: 567 == 1cm
                    // paragraph.setIndentationFirstLine(567);
                    break;
                case "h1":
                    applyHeading(docxDocument, paragraph, run, "Title 1", 1, 20);
                    break;
                case "h2":
                    applyHeading(docxDocument, paragraph, run, "Title 2", 2, 18);
                    break;
                case "h3":
                    applyHeading(docxDocument, paragraph, run, "Title 3", 3, 16);
                    break;
                // Line height / line spacing (only whole numbers are handled)
                case "line-height":
                    if (value.matches("\\d+")) {
                        run.setTextPosition(Integer.parseInt(value));
                    }
                    break;
                /* Left padding: the unit is em, so it is not handled for now
                case "padding-left":
                    CTSectPr sectPr = docxDocument.getDocument().getBody().addNewSectPr();
                    CTPageMar pageMar = sectPr.addNewPgMar();
                    pageMar.setLeft(BigInteger.valueOf(720L));
                    break; */
                // Font family
                case "face": {
                    CTFonts ctFonts = runProperties(run).addNewRFonts();
                    // Chinese font
                    ctFonts.setEastAsia(value);
                    // Font for Latin letters and digits
                    ctFonts.setAscii(value);
                    break;
                }
                // Font size
                case "size":
                    run.setFontSize(fontSizeConvert(value));
                    break;
                // Text color (setColor expects hex without '#')
                case "color":
                    run.setColor(value.replaceAll("#", ""));
                    break;
                // Strikethrough
                case "strike":
                    run.setStrikeThrough(true);
                    break;
                // Line break
                case "br":
                    run.addCarriageReturn();
                    break;
                // Underline
                case "u":
                    run.setUnderline(UnderlinePatterns.SINGLE);
                    break;
                // Italic
                case "i":
                    run.setItalic(true);
                    break;
                // Bold
                case "b":
                    run.setBold(true);
                    break;
                // Background color
                case "background-color":
                    runProperties(run).addNewHighlight().setVal(getBackground(value));
                    break;
                // Hyperlink
                case "a": {
                    if (children == null) {
                        break;
                    }
                    XWPFParagraphWrapper wrapper = new XWPFParagraphWrapper(paragraph);
                    XWPFRun hyperRun = wrapper.insertNewHyperLinkRun(0, value);
                    hyperRun.setText(children.text().replaceAll(" ", ""));
                    hyperRun.setColor("0563C1");
                    hyperRun.setUnderline(UnderlinePatterns.SINGLE);
                    break;
                }
                // Text alignment
                case "text-align":
                    if ("center".equals(value)) {
                        paragraph.setAlignment(ParagraphAlignment.CENTER);
                    } else if ("left".equals(value)) {
                        paragraph.setAlignment(ParagraphAlignment.LEFT);
                    } else if ("right".equals(value)) {
                        paragraph.setAlignment(ParagraphAlignment.RIGHT);
                    }
                    break;
            }
        }
    }

    /**
     * Adds a custom heading style. Adapted from code found on Stack Overflow.
     *
     * @param docxDocument target document
     * @param strStyleId   style name
     * @param headingLevel heading level (1 = top level)
     */
    private static void addCustomHeadingStyle(XWPFDocument docxDocument, String strStyleId,
                                              int headingLevel) {
        // createStyles() is a no-op if the styles part already exists
        XWPFStyles styles = docxDocument.createStyles();
        if (styles.styleExist(strStyleId)) {
            return; // avoid adding the same style once per heading paragraph
        }
        CTStyle ctStyle = CTStyle.Factory.newInstance();
        ctStyle.setStyleId(strStyleId);
        CTString styleName = CTString.Factory.newInstance();
        styleName.setVal(strStyleId);
        ctStyle.setName(styleName);
        // Lower number = style is more prominent in the formats bar
        CTDecimalNumber priority = CTDecimalNumber.Factory.newInstance();
        priority.setVal(BigInteger.valueOf(headingLevel));
        ctStyle.setUiPriority(priority);
        CTOnOff onoffnull = CTOnOff.Factory.newInstance();
        ctStyle.setUnhideWhenUsed(onoffnull);
        // Style shows up in the formats bar
        ctStyle.setQFormat(onoffnull);
        // Style defines a heading of the given level; outlineLvl is zero-based
        // (the "heading 1" style shown in the pitfalls section has outlineLvl 0)
        CTDecimalNumber outlineLevel = CTDecimalNumber.Factory.newInstance();
        outlineLevel.setVal(BigInteger.valueOf(headingLevel - 1));
        CTPPr ppr = CTPPr.Factory.newInstance();
        ppr.setOutlineLvl(outlineLevel);
        ctStyle.setPPr(ppr);
        XWPFStyle style = new XWPFStyle(ctStyle);
        style.setType(STStyleType.PARAGRAPH);
        styles.addStyle(style);
    }

    /**
     * Converts an HTML font size to a Word font size.
     *
     * @param level HTML font size, "1" to "10"
     * @return Word font size, or null if level is blank
     */
    public static Integer fontSizeConvert(String level) {
        Integer fontSize = null;
        if (StringUtils.isBlank(level)) {
            return fontSize;
        }
        switch (level) {
            // 10px
            case "1":
                fontSize = 7;
                break;
            // 13px
            case "2":
                fontSize = 8;
                break;
            // 16px
            case "3":
                fontSize = 9;
                break;
            // 18px
            case "4":
                fontSize = 10;
                break;
            // 24px
            case "5":
                fontSize = 14;
                break;
            // 32px
            case "6":
                fontSize = 18;
                break;
            // 48px
            case "7":
                fontSize = 28;
                break;
            case "8":
                fontSize = 36;
                break;
            case "9":
                fontSize = 48;
                break;
            case "10":
                fontSize = 72;
                break;
            default:
                fontSize = 5;
        }
        return fontSize;
    }

    /** Matches a color by name, by rgb() value or by hex value (hex is case-insensitive). */
    private static boolean isColor(String color, String name, String rgb, String hex) {
        return name.equals(color) || rgb.equals(color) || hex.equalsIgnoreCase(color);
    }

    /**
     * Maps a CSS color to a Word highlight color. Of the 17 standard color names
     * (aqua, black, blue, fuchsia, gray, green, lime, maroon, navy, olive, orange,
     * purple, red, silver, teal, white, yellow), 15 have a matching highlight color.
     *
     * @param color CSS color
     * @return highlight color, or NONE if there is no match
     * @date April 7, 2020, 7:16:39 PM
     */
    public static STHighlightColor.Enum getBackground(String color) {
        color = color.replaceAll(" ", "");
        if (isColor(color, "yellow", "rgb(255,255,0)", "#FFFF00")) {
            return STHighlightColor.YELLOW;        // 1 yellow
        } else if (isColor(color, "lime", "rgb(0,255,0)", "#00FF00")) {
            return STHighlightColor.GREEN;         // 2 green
        } else if (isColor(color, "aqua", "rgb(0,255,255)", "#00FFFF")) {
            return STHighlightColor.CYAN;          // 3 cyan
        } else if (isColor(color, "fuchsia", "rgb(255,0,255)", "#FF00FF")) {
            return STHighlightColor.MAGENTA;       // 4 pink
        } else if (isColor(color, "blue", "rgb(0,0,255)", "#0000FF")) {
            return STHighlightColor.BLUE;          // 5 blue
        } else if (isColor(color, "red", "rgb(255,0,0)", "#FF0000")) {
            return STHighlightColor.RED;           // 6 red
        } else if (isColor(color, "navy", "rgb(0,0,128)", "#000080")) {
            return STHighlightColor.DARK_BLUE;     // 7 dark blue
        } else if (isColor(color, "teal", "rgb(0,128,128)", "#008080")) {
            return STHighlightColor.DARK_CYAN;     // 8 dark cyan
        } else if (isColor(color, "green", "rgb(0,128,0)", "#008000")) {
            return STHighlightColor.DARK_GREEN;    // 9 dark green
        } else if (isColor(color, "purple", "rgb(128,0,128)", "#800080")) {
            return STHighlightColor.DARK_MAGENTA;  // 10 dark pink, purple
        } else if (isColor(color, "maroon", "rgb(128,0,0)", "#800000")) {
            return STHighlightColor.DARK_RED;      // 11 crimson
        } else if (isColor(color, "olive", "rgb(128,128,0)", "#808000")) {
            return STHighlightColor.DARK_YELLOW;   // 12 dark yellow
        } else if (isColor(color, "gray", "rgb(128,128,128)", "#808080")) {
            return STHighlightColor.DARK_GRAY;     // 13 dark grey
        } else if (isColor(color, "silver", "rgb(192,192,192)", "#C0C0C0")) {
            return STHighlightColor.LIGHT_GRAY;    // 14 light grey
        } else if (isColor(color, "black", "rgb(0,0,0)", "#000000")) {
            return STHighlightColor.BLACK;         // 15 black
        } else {
            return STHighlightColor.NONE;          // no highlight
        }
    }
}

Pitfalls I ran into (my skills are limited and some of these remain unsolved, but they are harmless)

1. Generating Word headings
Following advice found online, neither paragraph.setStyle("1") nor paragraph.setStyle("heading 1") had any effect. If you create a Word document by hand, apply a heading and save it as XML, you can see that the heading's styleId is indeed 1, but that document also contains the matching style definition:

<w:style w:type="paragraph" w:styleId="1">
    <w:name w:val="heading 1" />
    <w:uiPriority w:val="9" />
    <w:qFormat/>
    <w:pPr>
        <w:outlineLvl w:val="0" />
    </w:pPr>
</w:style>

A document created with XWPFDocument doc = new XWPFDocument() contains no style with styleId=1, so setting that style on a paragraph does nothing.
Solution:
Create the style first, then set its styleId on the paragraph. The addCustomHeadingStyle() method in the code above creates the custom style, and paragraph.setStyle() then applies it.
2. Background color
At present the background color only supports the standard named colors, given by name, by rgb() value or by hex value; every other color is ignored. The getBackground() method in the code above converts such a color to the STHighlightColor.Enum enumeration type.
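One possible way to support arbitrary colors is to snap each color to the closest highlight color by RGB distance. The sketch below is my own assumption and not part of the utility above: `HighlightMatcher` and `nearest` are hypothetical names, and the method returns the color name as a string (the caller would still translate it to `STHighlightColor.Enum`) so that it runs with the JDK alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HighlightMatcher {
    // The 15 named colors handled by getBackground(), as 0xRRGGBB values.
    private static final Map<String, Integer> PALETTE = new LinkedHashMap<>();
    static {
        PALETTE.put("yellow", 0xFFFF00);
        PALETTE.put("lime", 0x00FF00);
        PALETTE.put("aqua", 0x00FFFF);
        PALETTE.put("fuchsia", 0xFF00FF);
        PALETTE.put("blue", 0x0000FF);
        PALETTE.put("red", 0xFF0000);
        PALETTE.put("navy", 0x000080);
        PALETTE.put("teal", 0x008080);
        PALETTE.put("green", 0x008000);
        PALETTE.put("purple", 0x800080);
        PALETTE.put("maroon", 0x800000);
        PALETTE.put("olive", 0x808000);
        PALETTE.put("gray", 0x808080);
        PALETTE.put("silver", 0xC0C0C0);
        PALETTE.put("black", 0x000000);
    }

    /** Returns the palette name closest to the given 0xRRGGBB value (squared Euclidean distance). */
    static String nearest(int rgb) {
        String best = null;
        long bestDist = Long.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : PALETTE.entrySet()) {
            int c = entry.getValue();
            long dr = ((rgb >> 16) & 0xFF) - ((c >> 16) & 0xFF);
            long dg = ((rgb >> 8) & 0xFF) - ((c >> 8) & 0xFF);
            long db = (rgb & 0xFF) - (c & 0xFF);
            long dist = dr * dr + dg * dg + db * db;
            if (dist < bestDist) {
                bestDist = dist;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(nearest(0xFFFF00)); // exact match
        System.out.println(nearest(0xFFEE11)); // slightly off yellow
    }
}
```

A real implementation would probably also want a distance cutoff, so that a color far from every palette entry (white, for example) maps to no highlight at all instead of to its nearest neighbor.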
3. Inserting tables
docxDocument.createTable() creates one row by default, so the code removes it before adding its own rows. Cells only have to be created for the first row. For example, for a table with three rows and two columns, once two cells have been created in the first row, the two rows created afterwards already have two cells each, so the cells do not need to be created again for every row.
4. Converting font pixel sizes to Word font sizes (unsolved)
I did not find an exact conversion ratio. I compared text sized in px with text sized in Word and, for each px value, picked the Word size that looked about the same. The result is the fontSizeConvert() method in the code above.
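For reference, CSS defines 1in = 96px = 72pt, so a pixel size converts to points as pt = px × 0.75. The sketch below applies that ratio to the pixel sizes listed in the comments of fontSizeConvert(). `FontSizes`, `pxToPt` and `htmlSizeToPt` are hypothetical names, and rounding to a whole point is my assumption, made because XWPFRun.setFontSize() takes an int in POI 4.1.2.

```java
final class FontSizes {
    // Pixel sizes for <font size="1"> to <font size="7">, taken from the comments in fontSizeConvert().
    private static final int[] HTML_SIZE_PX = {10, 13, 16, 18, 24, 32, 48};

    /** CSS absolute units: 96px = 72pt = 1in, so 1px = 0.75pt. */
    static double pxToPt(double px) {
        return px * 72.0 / 96.0;
    }

    /** Maps an HTML font size (1 to 7) to whole points; out-of-range values are clamped. */
    static int htmlSizeToPt(int htmlSize) {
        int index = Math.max(1, Math.min(7, htmlSize)) - 1;
        return (int) Math.round(pxToPt(HTML_SIZE_PX[index]));
    }

    public static void main(String[] args) {
        for (int size = 1; size <= 7; size++) {
            System.out.println("size " + size + " -> " + htmlSizeToPt(size) + "pt");
        }
    }
}
```

With this ratio, size 3 (16px) comes out as 12pt and size 7 (48px) as 36pt, which is somewhat larger than the values chosen by eye in fontSizeConvert().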
5. Indentation (unsolved). The indent is given in em, which is a relative unit of measurement. I have not worked out how to convert it yet.
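One possible approach, sketched under the assumption that the font size the em refers to is known: 1em equals the current font size, and Word measures indents in twips (1pt = 20 twips), so twips = em × font size in pt × 20. `Indentation`, `emToTwips` and `parseEmToTwips` are hypothetical names; the result would be passed to something like paragraph.setIndentationLeft().

```java
import java.util.OptionalInt;

final class Indentation {
    private static final int TWIPS_PER_POINT = 20;

    /** Converts an em length to twips, given the font size (in points) it is relative to. */
    static int emToTwips(double em, double fontSizePt) {
        return (int) Math.round(em * fontSizePt * TWIPS_PER_POINT);
    }

    /** Parses a CSS value such as "2em"; returns empty if the value is not a plain em length. */
    static OptionalInt parseEmToTwips(String cssValue, double fontSizePt) {
        String v = cssValue.trim().toLowerCase();
        if (!v.endsWith("em")) {
            return OptionalInt.empty();
        }
        try {
            double em = Double.parseDouble(v.substring(0, v.length() - 2).trim());
            return OptionalInt.of(emToTwips(em, fontSizePt));
        } catch (NumberFormatException ex) {
            // e.g. "2rem" ends with "em" but "2r" is not a number
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        // padding-left: 2em on 12pt text
        System.out.println(parseEmToTwips("2em", 12));
    }
}
```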
6. Nested tags
This one was annoying, but it is solved. See createXWPFRun() in the code above.
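The idea behind nested styles can be sketched independently of jsoup and POI: walk the tree depth-first, give each element a copy of the inherited styles with its own styles layered on top, and emit one run per text node. This is a simplified illustration rather than the code above. `NestedStyles`, `Node`, `Run` and `collect` are hypothetical stand-ins, and copying a map per element replaces the shared list with addAll()/removeAll(), which can also drop a parent's entry when a child repeats the same style string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NestedStyles {
    /** Stand-in for an HTML node: text (tag == null) or an element with children. */
    static final class Node {
        final String tag;
        final String text;
        final Map<String, String> styles = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        private Node(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        static Node text(String text) {
            return new Node(null, text);
        }

        static Node element(String tag, Node... children) {
            Node n = new Node(tag, null);
            n.children.addAll(Arrays.asList(children));
            return n;
        }

        Node style(String key, String value) {
            styles.put(key, value);
            return this;
        }
    }

    /** One output run: a piece of text plus the styles in effect for it. */
    static final class Run {
        final String text;
        final Map<String, String> styles;

        Run(String text, Map<String, String> styles) {
            this.text = text;
            this.styles = styles;
        }
    }

    /** Depth-first walk; each element works on its own copy, so nothing leaks to siblings. */
    static void collect(Node node, Map<String, String> inherited, List<Run> out) {
        if (node.tag == null) {
            out.add(new Run(node.text, new LinkedHashMap<>(inherited)));
            return;
        }
        Map<String, String> effective = new LinkedHashMap<>(inherited);
        effective.putAll(node.styles); // child values override the parent's
        for (Node child : node.children) {
            collect(child, effective, out);
        }
    }

    /** <p style="color:000000">plain <b style="color:FF0000">bold <i>both</i></b> tail</p> */
    static List<Run> demo() {
        Node p = Node.element("p",
                Node.text("plain "),
                Node.element("b",
                        Node.text("bold "),
                        Node.element("i", Node.text("both")).style("i", "true"))
                    .style("b", "true").style("color", "FF0000"),
                Node.text(" tail"))
            .style("color", "000000");
        List<Run> runs = new ArrayList<>();
        collect(p, new LinkedHashMap<>(), runs);
        return runs;
    }

    public static void main(String[] args) {
        for (Run r : demo()) {
            System.out.println("'" + r.text + "' " + r.styles);
        }
    }
}
```

In the demo, the text inside the b tag picks up the red color and the bold flag, while " tail" falls back to the paragraph's black because the b element's styles were never written into the parent's map.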
7. Text color
In HTML the color is written as #4d80bf, but the argument of setColor() must not include the #. It has to be just 4d80bf.
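A small normalizer can handle the other forms a rich text editor may produce as well. The sketch below is an assumption of mine, not part of the utility above: `CssColors` and `toWordHex` are hypothetical names, and it covers only six-digit hex, three-digit hex shorthand and rgb() values.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CssColors {
    private static final Pattern HEX =
            Pattern.compile("#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})");
    private static final Pattern RGB =
            Pattern.compile("rgb\\(\\s*(\\d{1,3})\\s*,\\s*(\\d{1,3})\\s*,\\s*(\\d{1,3})\\s*\\)");

    /** Returns RRGGBB without '#' for "#4d80bf", "#abc" or "rgb(77, 128, 191)"; null if not recognized. */
    static String toWordHex(String css) {
        if (css == null) {
            return null;
        }
        String v = css.trim();
        Matcher hex = HEX.matcher(v);
        if (hex.matches()) {
            String h = hex.group(1);
            if (h.length() == 3) {
                // Expand shorthand: "abc" becomes "aabbcc"
                StringBuilder sb = new StringBuilder();
                for (char c : h.toCharArray()) {
                    sb.append(c).append(c);
                }
                h = sb.toString();
            }
            return h.toUpperCase(Locale.ROOT);
        }
        Matcher rgb = RGB.matcher(v.toLowerCase(Locale.ROOT));
        if (rgb.matches()) {
            int r = Integer.parseInt(rgb.group(1));
            int g = Integer.parseInt(rgb.group(2));
            int b = Integer.parseInt(rgb.group(3));
            if (r > 255 || g > 255 || b > 255) {
                return null;
            }
            return String.format("%02X%02X%02X", r, g, b);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(toWordHex("#4d80bf"));
        System.out.println(toWordHex("rgb(77, 128, 191)"));
    }
}
```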

API used by XWPFDocument

XWPFDocument doc = new XWPFDocument();           // Create a Word document
XWPFParagraph paragraph = doc.createParagraph(); // Create a paragraph
XWPFTable table = doc.createTable();             // Create a table

API used by XWPFParagraph

XWPFParagraph paragraph = doc.createParagraph();   // Create a paragraph in the document
XWPFParagraph cellParagraph = cell.addParagraph(); // Create a paragraph inside a table cell (XWPFTableCell)
XWPFRun run = paragraph.createRun();               // Create a run (content) in the paragraph
paragraph.setStyle("Title 1");                     // Set the paragraph style, e.g. a heading

/* ========================== Hyperlink start ========================== */
XWPFParagraphWrapper wrapper = new XWPFParagraphWrapper(paragraph);
XWPFRun hyperRun = wrapper.insertNewHyperLinkRun(0, "Link address"); // Set the hyperlink address
hyperRun.setText("Hyperlink text");
hyperRun.setColor("text color");
hyperRun.setUnderline(UnderlinePatterns.SINGLE);   // Underline
/* ========================== Hyperlink end ========================== */

paragraph.setAlignment(ParagraphAlignment.CENTER); // Alignment: left, right, center

API used by XWPFRun

run.setText("content");   // Set the text
run.addPicture(inputStream, XWPFDocument.PICTURE_TYPE_PNG, fileName,
        Units.toEMU(400), Units.toEMU(256)); // Add a picture (stream, picture type, name, width, height)
run.setBold(true);        // Bold
run.setColor("000000");   // Color: hex value without the leading #

/* ========================== Set font start ========================== */
// Option 1
run.setFontFamily("SimSun"); // 宋体; I don't know why this did not take effect for me
// Option 2
CTFonts ctFonts = run.getCTR().addNewRPr().addNewRFonts();
ctFonts.setEastAsia("typeface"); // Chinese font
ctFonts.setAscii("typeface");    // Font for Latin letters and digits
/* ========================== Set font end ========================== */

run.setFontSize(20);                               // Font size
run.setTextPosition(Integer.parseInt(lineHeight)); // Line height / spacing; lineHeight is the numeric string from the style
run.setStrike(true);                               // Single strikethrough (deprecated)
run.setStrikeThrough(true);                        // Strikethrough
run.setDoubleStrikethrough(false);                 // Double strikethrough
run.addCarriageReturn();                           // Line break, equivalent to a br tag
run.setUnderline(UnderlinePatterns.SINGLE);        // Underline; the line type comes from the enum
run.setItalic(true);                               // Italic
run.getCTR().addNewRPr().addNewHighlight().setVal(STHighlightColor.YELLOW); // Background color; the argument is an enum value

In the end I understood the structure of a Word document in XWPFDocument: the document contains paragraphs, tables, pictures and so on; a table contains rows and cells; each cell contains paragraphs; a paragraph contains XWPFRun content; and an XWPFRun can contain pictures. It is quite complicated, especially the styling, and I feel like I am only getting started. I am writing this down now because I would probably forget it by the time I need it again. If you have questions, leave a comment!

30 November 2021, 21:44
