importing xmlsec breaks second lxml.etree.parse call #96

Closed
adelton opened this issue Jan 11, 2019 · 6 comments

@adelton

adelton commented Jan 11, 2019

When

import xmlsec

is added to trivial Python code that uses lxml's etree.parse, the second etree.parse call fails.

Take a minimal XML file, sf.xml, containing

<data/>

and a trivial script, sf.py:

from lxml import etree
# import xmlsec
xml1 = etree.parse("sf.xml")
print(xml1)
xml2 = etree.parse("sf.xml")
print(xml2)

Running it with python3 sf.py produces two objects:

<lxml.etree._ElementTree object at 0x7fe067e3ae88>
<lxml.etree._ElementTree object at 0x7fe067e3adc8>

However, when that import xmlsec is uncommented, the behaviour changes to

<lxml.etree._ElementTree object at 0x7fe8e2419548>
Traceback (most recent call last):
  File "sf.py", line 5, in <module>
    xml2 = etree.parse("sf.xml")
  File "src/lxml/etree.pyx", line 3426, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1840, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1866, in lxml.etree._parseDocumentFromURL
  File "src/lxml/parser.pxi", line 1770, in lxml.etree._parseDocFromFile
  File "src/lxml/parser.pxi", line 1163, in lxml.etree._BaseParser._parseDocFromFile
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 651, in lxml.etree._raiseParseError

This is with xmlsec that was installed via pip3 on Fedora 28:

# pip3 install xmlsec
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting xmlsec
  Using cached https://files.pythonhosted.org/packages/35/42/d7cd323c91d4706f3cc32ffe7d5f851ab8ef9898ccb350f6ba593dd8b89a/xmlsec-1.3.3.tar.gz
Requirement already satisfied: pkgconfig in /usr/local/lib/python3.6/site-packages (from xmlsec)
Requirement already satisfied: lxml>=3.0 in /usr/lib64/python3.6/site-packages (from xmlsec)
Installing collected packages: xmlsec
  Running setup.py install for xmlsec ... done
Successfully installed xmlsec-1.3.3

This is the upstream version of Fedora bug https://bugzilla.redhat.com/show_bug.cgi?id=1665459. With xmlsec installed via pip3 I don't get that segfault, but note that the traceback ends with an empty line where some error message ("cannot parse file") would be expected.

@baynes

baynes commented Mar 24, 2020

I have seen the same on CentOS 7. It forced us to use pyxmlsec instead of python-xmlsec, which in turn forces us to stick with Python 2!

It is worth noting that with pyxmlsec, lxml etree can also cause problems in some cases, though that seems to be linked to an error handler being invoked, which causes Python to segfault. It might be a separate problem in pyxmlsec, or it might be an indication of an underlying problem in lxml, libxmlsec or libxml2.

@hoefling
Member

hoefling commented May 21, 2020

I can't reproduce this issue with the latest 1.3.8 anymore:

$ podman run --rm -it registry.centos.org/centos:7
# yum update
# yum group install "Development Tools"
# yum install python3-devel python3-pip libxml2-devel xmlsec1-devel xmlsec1-openssl-devel libtool-ltdl-devel
# pip3 install xmlsec==1.3.8
# echo '<data/>' > sf.xml
# echo "from lxml import etree
import xmlsec
xml1 = etree.parse('sf.xml')
print(xml1)
xml2 = etree.parse('sf.xml')
print(xml2)
" > sf.py
# python3 sf.py
<lxml.etree._ElementTree object at 0x7f97438e9700>
<lxml.etree._ElementTree object at 0x7f9742dacfc0>

Ping me if this issue should be reopened though!

hoefling reopened this on May 22, 2020
@hoefling
Member

Nope, not resolved: I had tested with the lxml from PyPI, not the CentOS package. Looks like it's the same issue as #84 for Arch.

@hoefling
Member

It looks like an lxml issue to me - check the reasoning here. The workaround is to reset the default XML parser object after each etree.parse() invocation:

etree.parse('doc.xml')
etree.set_default_parser(parser=etree.XMLParser())
etree.parse('doc.xml')

or pass a new parser instance explicitly:

etree.parse('doc.xml', parser=etree.XMLParser())
etree.parse('doc.xml', parser=etree.XMLParser())
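
The second variant can be wrapped in a small helper so that call sites do not have to repeat the parser argument. This is only a minimal sketch, assuming lxml is installed; parse_fresh is a hypothetical name, not part of lxml or xmlsec, and the temporary file just mirrors sf.xml from the original report:

```python
import os
import tempfile

from lxml import etree


def parse_fresh(source):
    """Parse `source` with a brand-new XMLParser (hypothetical helper)."""
    # Passing a new parser on every call means lxml's shared default parser
    # is never reused between parses, which is the workaround described above.
    return etree.parse(source, parser=etree.XMLParser())


if __name__ == "__main__":
    # Throwaway file with the same content as sf.xml in the report.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sf.xml")
        with open(path, "w") as f:
            f.write("<data/>")
        # Parse the same file twice, as in the reproducer.
        for _ in range(2):
            print(parse_fresh(path).getroot().tag)
```

Creating a parser per call costs a little more than reusing the default one, which seems an acceptable trade-off while the underlying lxml bug is open.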

@hoefling
Member

For reference: https://bugs.launchpad.net/lxml/+bug/1880251

adelton changed the title from "importing xmlsec breaks secondl xml.etree.parse call" to "importing xmlsec breaks second lxml.etree.parse call" on May 25, 2020
@stanislavlevin
Contributor

The self-tests of python-xmlsec fail with PYXMLSEC_TEST_ITERATIONS > 0 due to this issue:

======================================= ERRORS =======================================
_______________ ERROR at teardown of TestSignContext.test_register_id ________________

self = <tests.test_ds.TestSignContext testMethod=test_register_id>

    def test_register_id(self):
        ctx = xmlsec.SignatureContext()
>       root = self.load_xml("sign_template.xml")

tests/test_ds.py:49: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/base.py:102: in load_xml
    root = etree.parse(self.path(name)).getroot()
src/lxml/etree.pyx:3536: in lxml.etree.parse
    ???
src/lxml/parser.pxi:1876: in lxml.etree._parseDocument
    ???
src/lxml/parser.pxi:1902: in lxml.etree._parseDocumentFromURL
    ???
src/lxml/parser.pxi:1805: in lxml.etree._parseDocFromFile
    ???
src/lxml/parser.pxi:1177: in lxml.etree._BaseParser._parseDocFromFile
    ???
src/lxml/parser.pxi:615: in lxml.etree._ParserContext._handleParseResultDoc
    ???
src/lxml/parser.pxi:725: in lxml.etree._handleParseResult
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E     File "b'/usr/src/RPM/BUILD/python-xmlsec-1.3.12/tests/data/sign_template.xml'", line 0
E   lxml.etree.XMLSyntaxError: <no detail available>

src/lxml/parser.pxi:665: XMLSyntaxError
-------------------------------- Captured stderr call --------------------------------
func=xmlSecNoXxeExternalEntityLoader:file=xmlsec.c:line=58:obj=unknown:subj=xmlSecNoXxeExternalEntityLoader:error=5:libxml2 library function failed:illegal external entity='/usr/src/RPM/BUILD/python-xmlsec-1.3.12/tests/data/sign_template.xml'; xml error: 0: NULL

...

================ 45 failed, 121 passed, 1 skipped, 44 errors in 6.29s ================

used versions:

libxml2-2.9.12
python-lxml-4.8.0
libxmlsec1-1.2.33
python 3.10

though the workaround you proposed works:

--- a/tests/base.py
+++ b/tests/base.py
@@ -99,10 +99,11 @@ class TestMemoryLeaks(unittest.TestCase):
 
     def load_xml(self, name, xpath=None):
         """returns xml.etree"""
-        root = etree.parse(self.path(name)).getroot()
-        if xpath is None:
-            return root
-        return root.find(xpath)
+        with open(self.path(name)) as f:
+            root = etree.parse(f).getroot()
+            if xpath is None:
+                return root
+            return root.find(xpath)
 
     def dump(self, root):
         print(etree.tostring(root))
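
Outside the test suite, the same idea as the patch above (hand lxml an open file object rather than a path) might look like the sketch below. The captured stderr suggests that the path is rejected by the external entity loader installed by xmlsec, so reading the file in Python presumably bypasses that loader; this is an assumption, not something confirmed in this thread. load_root is a hypothetical helper name, and opening the file in binary mode is a choice made here so that lxml can detect the encoding itself:

```python
import os
import tempfile

from lxml import etree


def load_root(path, xpath=None):
    """Return the root element of `path`, or the first match of `xpath` under it."""
    # Open the file in Python and pass the file object to lxml instead of
    # passing the path, mirroring the patch to tests/base.py above.
    with open(path, "rb") as f:
        root = etree.parse(f).getroot()
    if xpath is None:
        return root
    return root.find(xpath)


if __name__ == "__main__":
    # Throwaway document used only for this demo.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "doc.xml")
        with open(path, "wb") as f:
            f.write(b"<data><item/></data>")
        print(load_root(path).tag)
        print(load_root(path, "item").tag)
```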
