Set EscapeMode=xhtml when OutputSettings.Syntax = XML
The xhtml EscapeMode was already configured when using Parser.xmlParser(); this change helps users who parse as HTML and then convert the output to XML.
jhy committed Oct 24, 2023
1 parent 1657e8f commit c61ce94
Showing 3 changed files with 21 additions and 0 deletions.
2 changes: 2 additions & 0 deletions CHANGES
@@ -6,6 +6,8 @@ Release 1.17.1 [PENDING]
`#replaceAll(operator)`. These methods update the original DOM, as well as the Elements list.
<https://github.com/jhy/jsoup/pull/2017>

* Improvement: when changing the OutputSettings syntax to XML, the xhtml EscapeMode is automatically set by default.

* Bugfix: when outputting with XML syntax, HTML elements that were parsed as data nodes (<script> and <style>) should
be emitted as CDATA nodes, so that they can be parsed correctly by an XML parser.
<https://github.com/jhy/jsoup/pull/1720>
4 changes: 4 additions & 0 deletions src/main/java/org/jsoup/nodes/Document.java
@@ -488,11 +488,15 @@ public Syntax syntax() {
/**
* Set the document's output syntax. Either {@code html}, with empty tags and boolean attributes (etc), or
* {@code xml}, with self-closing tags.
* <p>When set to {@link Document.OutputSettings.Syntax#xml xml}, the {@link #escapeMode() escapeMode} is
* automatically set to {@link Entities.EscapeMode#xhtml}, but may be subsequently changed if desired.</p>
* @param syntax serialization syntax
* @return the document's output settings, for chaining
*/
public OutputSettings syntax(Syntax syntax) {
this.syntax = syntax;
if (syntax == Syntax.xml)
this.escapeMode(Entities.EscapeMode.xhtml);
return this;
}

15 changes: 15 additions & 0 deletions src/test/java/org/jsoup/nodes/ElementTest.java
@@ -2838,4 +2838,19 @@ void prettySerializationRoundTrips(Document.OutputSettings settings) {

assertEquals("<p>One</p>", header.html());
}

@Test void xmlSyntaxSetsEscapeMode() {
String html = "Foo&nbsp;&Succeeds;";
Document doc = Jsoup.parse(html);
doc.outputSettings().charset("ascii"); // ASCII output forces non-ASCII characters to be escaped, so the entities are visible
assertEquals("Foo&nbsp;&#x227b;", doc.body().html());

doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
String out = doc.body().html();
assertEquals("Foo&#xa0;&#x227b;", out);

// can set back if desired
doc.outputSettings().escapeMode(Entities.EscapeMode.extended);
assertEquals("Foo&nbsp;&succ;", doc.body().html()); // succ is alias for Succeeds, and first hit in entities
}
}
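
For readers without jsoup on the classpath, the setter behaviour introduced in Document.java can be approximated with a small stand-alone model. This is only a sketch under stated assumptions: `OutputSettingsSketch`, its nested `Settings` class, and the `renderNbsp` helper are hypothetical names invented for illustration and are not part of jsoup. The model covers only a non-breaking space, whereas the real serializer handles the full entity tables.

```java
// Hypothetical stand-alone model of the behaviour in this commit; not jsoup code.
public class OutputSettingsSketch {
    enum Syntax { html, xml }
    enum EscapeMode { xhtml, base, extended }

    static final class Settings {
        private Syntax syntax = Syntax.html;
        private EscapeMode escapeMode = EscapeMode.base;

        Syntax syntax() { return syntax; }
        EscapeMode escapeMode() { return escapeMode; }

        // Mirrors the diff: choosing XML syntax also selects the xhtml escape mode.
        Settings syntax(Syntax syntax) {
            this.syntax = syntax;
            if (syntax == Syntax.xml)
                escapeMode(EscapeMode.xhtml);
            return this;
        }

        // The escape mode can still be set explicitly after the syntax is chosen.
        Settings escapeMode(EscapeMode escapeMode) {
            this.escapeMode = escapeMode;
            return this;
        }
    }

    // Simplified stand-in for serialization: how a non-breaking space would be written.
    // In xhtml mode the named entity is unavailable, so a numeric reference is used.
    static String renderNbsp(Settings settings) {
        return settings.escapeMode() == EscapeMode.xhtml ? "&#xa0;" : "&nbsp;";
    }

    public static void main(String[] args) {
        Settings settings = new Settings();
        System.out.println(renderNbsp(settings)); // &nbsp;

        settings.syntax(Syntax.xml);
        System.out.println(renderNbsp(settings)); // &#xa0;

        settings.escapeMode(EscapeMode.extended); // explicit override after the fact
        System.out.println(renderNbsp(settings)); // &nbsp;
    }
}
```

As in the diff, the side effect runs only when the syntax is set to xml. Setting the syntax back to html leaves whatever escape mode is currently in place, and an explicit `escapeMode(...)` call made after `syntax(...)` takes precedence.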
