Nokogiri's parser removes the child elements of an anchor tag and adds them as separate elements.
To Reproduce
Here's an example:
parse_data = Nokogiri::HTML.parse "<a> <table> <tr> </tr></table> </a>"
parse_data.to_html
"<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body>\n<a> </a><table> <tr> </tr>\n</table> </body></html>\n"
So it removes the table from the anchor tag and adds it as a separate element: <a> </a><table> <tr> </tr>\n</table>
It seems the problem is with the parser (Nokogiri::HTML.parse), which does not correctly keep the child elements of the anchor tag.
Nokogiri version: "1.10.1"
Hi! Thanks for asking this question. The short answer is that Nokogiri is inheriting this behavior from the underlying parser, libxml2, and so there's unfortunately very little that nokogiri can easily do to modify this behavior. But read on for a suggestion (hint: nokogumbo).
The slightly longer version: the HTML4 spec for an A anchor element defines only "inline" elements as valid subelements. If you recurse through the inline definition, I think you'll find that only text-level elements are valid within an A element; table is a block-level element and is not among them.
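To make the content-model point concrete, here is a rough sketch of the HTML4 rule as a plain lookup. The element list is my reading of the `%inline;` entity in the HTML 4.01 Strict DTD (the Transitional DTD adds a few more inline elements, none of them table), and the names `HTML4_INLINE` and `allowed_in_anchor?` are hypothetical helpers for illustration, not part of Nokogiri or libxml2.

```ruby
require 'set'

# Inline elements per the %inline; entity of the HTML 4.01 Strict DTD
# (%fontstyle; | %phrase; | %special; | %formctrl;).
# Assumption: this is an illustrative list, not something libxml2 exposes.
HTML4_INLINE = Set.new(%w[
  tt i b big small
  em strong dfn code samp kbd var cite abbr acronym
  a img object br script map q sub sup span bdo
  input select textarea label button
]).freeze

# Hypothetical helper: true if HTML4 allows `name` as a child of <a>.
# The DTD also excludes nested anchors, so 'a' itself is rejected.
def allowed_in_anchor?(name)
  name = name.to_s.downcase
  name != 'a' && HTML4_INLINE.include?(name)
end
```

With this sketch, `allowed_in_anchor?('span')` is true while `allowed_in_anchor?('table')` is false, which is why an HTML4 parser closes the anchor before it opens the table.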
Now you may be saying to yourself, "But MDN says that table is a valid subelement!" and this is a very good point. I'll further note that Nokogiri, when run on JRuby (using the nekoHTML parsing library), does allow that table within the a element.
This can be traced to the fact that this was introduced in the HTML5 spec, which nekoHTML appears to at least partially support. However libxml2 does NOT support HTML5, and so Nokogiri-on-libxml2 inherits this limitation.
You may want to take a look at using Nokogumbo, which aims to bring HTML5 support to Nokogiri.
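As a rough illustration of that suggestion, the sketch below parses the markup from the report with an HTML5 parser and checks whether the table stays inside the anchor. It assumes `Nokogiri::HTML5` is available, either through the nokogumbo gem (loaded with `require 'nokogumbo'`) or through a later Nokogiri release on CRuby that bundles HTML5 support. The helper name `anchor_keeps_table?` is hypothetical.

```ruby
# Hypothetical helper: parse with the HTML5 parser and report whether
# a <table> remains a direct child of an <a>.
def anchor_keeps_table?(html)
  # Loaded lazily so that defining the helper does not require the gem.
  # Assumption: a Nokogiri build that provides Nokogiri::HTML5; with the
  # separate nokogumbo gem, use `require 'nokogumbo'` instead.
  require 'nokogiri'

  doc = Nokogiri::HTML5.parse(html)
  !doc.at_css('a > table').nil?
end
```

Calling `anchor_keeps_table?("<a> <table> <tr> </tr></table> </a>")` is a quick way to see whether the HTML5 parser preserves the nesting that `Nokogiri::HTML.parse` drops.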