Error with xml_serialize()/xml_unserialize() roundtrip: Opening and ending tag mismatch [PATCH] #407
Comments
Same example using the example HTML file that comes with the package:
library(xml2)
file <- system.file("extdata", "r-project.html", package = "xml2")
doc <- read_html(file)
class(doc)
#> [1] "xml_document" "xml_node"
raw <- xml2::xml_serialize(doc, connection = NULL)
doc2 <- xml2::xml_unserialize(raw)
#> Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, :
#> Opening and ending tag mismatch: link line 12 and head [76]
Now, the problem seems to be with the xml_document object; the same roundtrip works for its children:
children <- xml_children(doc)
class(children)
#> [1] "xml_nodeset"
raw <- xml2::xml_serialize(children, connection = NULL)
children2 <- xml2::xml_unserialize(raw)
#> {xml_nodeset (2)}
#> [1] <head>\n <meta http-equiv="Content-Type" content="text/html; charset=UTF ...
#> [2] <body>\n <div class="container page">\n <div class="row">\n <div ...
all.equal(children2, children)
#> [1] TRUE
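I think it's because xml_unserialize() re-parses a serialized document with read_xml(), which is not forgiving of HTML (R/xml_serialize.R, lines 66 to 67 in ef2310b).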
We get the same error message if we try:
file <- system.file("extdata", "r-project.html", package = "xml2")
doc <- xml2::read_xml(file)
#> Error in read_xml.character(file) :
#> Opening and ending tag mismatch: link line 14 and head [76]
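To make the strict-versus-lenient difference concrete without the packaged file, here is a minimal sketch on a made-up HTML string (the string and the variable names are assumptions for illustration only). The unclosed link element is legal HTML but not well-formed XML, which is the same kind of mismatch reported above.

```r
library(xml2)

# Hypothetical input: valid HTML, but <link> is never closed,
# so it is not well-formed XML.
html <- "<html><head><link rel='stylesheet' href='a.css'></head><body><p>Hi</p></body></html>"

# Strict XML parsing: capture the error message instead of stopping.
strict_result <- tryCatch(
  read_xml(html),
  error = function(e) conditionMessage(e)
)

# Lenient HTML parsing of the same string.
lenient_doc <- read_html(html)

print(strict_result)
print(class(lenient_doc))
```

If strict_result comes back as an error message while lenient_doc parses, that points at the parser choice rather than at the serialized bytes.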
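One solution for this is to have xml_serialize() record the document type, so that xml_unserialize() can re-parse HTML documents with as_html = TRUE:
$ git diff -u R/xml_serialize.R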
diff --git a/R/xml_serialize.R b/R/xml_serialize.R
index 3f7357f..74e1608 100644
--- a/R/xml_serialize.R
+++ b/R/xml_serialize.R
@@ -22,7 +22,7 @@ xml_serialize.xml_document <- function(object, connection, ...) {
     connection <- file(connection, "w", raw = TRUE)
     on.exit(close(connection))
   }
-  serialize(structure(as.character(object, ...), class = "xml_serialized_document"), connection)
+  serialize(structure(as.character(object, ...), doc_type = doc_type(object), class = "xml_serialized_document"), connection)
 }

 #' @export
@@ -64,7 +64,13 @@ xml_unserialize <- function(connection, ...) {
     # Select only the root
     res <- xml_find_first(x, "/node()")
   } else if (inherits(object, "xml_serialized_document")) {
-    res <- read_xml(unclass(object), ...)
+    read_xml_int <- function(object, as_html = FALSE, ...) {
+      if (missing(as_html)) {
+        as_html <- identical(attr(object, "doc_type", exact = TRUE), "html")
+      }
+      read_xml(unclass(object), as_html = as_html, ...)
+    }
+    res <- read_xml_int(unclass(object), ...)
   } else {
     stop("Not a serialized xml2 object", call. = FALSE)
   }
I've submitted this patch in PR #408.
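Until a fix is released, a possible stopgap is to pass as_html = TRUE by hand. This is only a sketch and rests on an assumption read off the diff: that xml_unserialize() forwards its extra arguments to read_xml(). I have not checked this against every xml2 version.

```r
library(xml2)

file <- system.file("extdata", "r-project.html", package = "xml2")
doc <- read_html(file)

raw <- xml_serialize(doc, connection = NULL)

# Assumption: as_html is passed through `...` to read_xml(),
# so the HTML parser is used when the document is re-read.
doc2 <- xml_unserialize(raw, as_html = TRUE)

print(class(doc2))
print(xml_name(xml_root(doc2)))
```

The drawback is that the caller has to remember which serialized objects were HTML, which is what storing doc_type alongside the serialized text avoids.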
Issue
The xml_serialize()/xml_unserialize() roundtrip fails with: "Opening and ending tag mismatch: link line 12 and head [76]". I'd expect a roundtrip to always work.
Reproducible Example
Traceback:
Session Info