[FIX] Parsing XML from a URL #41

quetzyg · 2020-09-20T21:56:36Z

Problem

The logic introduced to fix issue #39 does a very basic string check. Because only the exact text/xml and application/xml strings are considered valid, when the Content-Type header contains values such as text/xml; charset=UTF-8 or application/xml; charset=UTF-8, we get an error.

Another problem is that there are other XML MIME types besides those two.

If we do a quick search here, we can find 400+ MIME types related to XML.

Solution

The mime type check logic has been improved by using a regular expression. The only thing that might take longer to understand in the changes is the regex itself, so here's a brief explanation:

(?i)  --> case insensitive matches (no need to convert the string to lowercase)
(
  (application|image|message|model) --> most of the types that make up an XML MIME type
  / --> type/subtype separator
  ((\w|\.|-)+\+?)?  --> this group may exist with characters (a-z, 0-9, _, . and -) (+ may not be there) at the start of a subtype
  |
  text --> remaining type that can be part of an XML MIME type
  / --> type/subtype separator
)
(wb)? --> Some XML MIME types might end in wbxml
xml --> the final part of the subtype

Caveats

The current regular expression isn't strict about what types should go with sub types, which means that non-existent/unofficial MIME types, like image/dash+xml or model/svg+xml are considered valid.

For the sake of simplicity, I think that's a good trade-off. It also means that when new XML MIME types are added in the future, this logic should require low maintenance, if any.

Task items:

Improve MIME type check logic in LoadURL();
Update existing tests for success and add tests to fail;

…nction

…dd tests for failure

… logic

coveralls · 2020-09-20T21:57:52Z

Coverage increased (+0.2%) to 91.773% when pulling 0f199ae on quetzyg:master into 1871a20 on antchfx:master.

zhengchun · 2020-09-21T02:09:01Z

merged. thanks!

https://github.com/antchfx/xmlquery/releases/tag/v1.3.2

quetzyg added 3 commits September 20, 2020 21:18

fix(parse): improve the content-type value checks in the LoadURL() fu…

ebfbb46

…nction

tests(parse): update existing LoadURL() function test for success + a…

928d449

…dd tests for failure

fix(LoadURL): remove unnecessary regex group from the MIME type check…

0f199ae

… logic

zhengchun merged commit 2fd993d into antchfx:master Sep 21, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FIX] Parsing XML from a URL #41

[FIX] Parsing XML from a URL #41

quetzyg commented Sep 20, 2020 •

edited

Loading

coveralls commented Sep 20, 2020

zhengchun commented Sep 21, 2020

[FIX] Parsing XML from a URL #41

[FIX] Parsing XML from a URL #41

Conversation

quetzyg commented Sep 20, 2020 • edited Loading

Problem

Solution

Caveats

Task items:

coveralls commented Sep 20, 2020

zhengchun commented Sep 21, 2020

quetzyg commented Sep 20, 2020 •

edited

Loading