Java packages starting with io can conflict with Python's io module #29

Closed
felixnext opened this issue Sep 7, 2015 · 31 comments

@felixnext

Hey,
I am using Jep 3.4.1 (installed with python setup.py install) on a server running Python 2.7. I am calling Jep from Scala code. I am able to instantiate Python, but I get the following error when I try to load a script:
jep.JepException: <type 'exceptions.ImportError'>: /usr/lib/python2.7/lib-dynload/_multiprocessing.so: undefined symbol: _Py_ZeroStruct

The script itself includes the following code:

from gensim.models import doc2vec
import nltk
import string
import os
doc2vec_model = None

The script works if I load it in Python from the console. I ran some experiments, and the script runs fine if I delete the import statements. Any ideas why that might be?

Greetings

@felixnext
Author

Also I just tested the jep console, where the import statements work without any problems...

@felixnext
Author

I also just tried setting the LD_PRELOAD variable using export LD_PRELOAD="/usr/lib/libpython2.7.so.1.0", which brought me (hopefully) one step closer to the solution, as the error is now:

jep.JepException: <type 'exceptions.ImportError'>: io.BufferedIOBase

(Again I am able to import and find io and io.BufferedIOBase through the jep console)
However I get a similar error in the jep console when I try import io.BufferedIOBase

jep.JepException: jep.JepException: <type 'exceptions.ImportError'>: No module named BufferedIOBase

Yet this would be wrong syntax in python. If I use the correct syntax: from io import BufferedIOBase it works in the console without a problem...

EDIT:
I narrowed the problem down to the import of nltk, which seems to be the cause of the io.BufferedIOBase error. However, the import from the jep console works fine...

@ndjensen
Member

ndjensen commented Sep 7, 2015

Hi, you've made good progress on this. Have you checked the LD_LIBRARY_PATH environment variable? Besides LD_PRELOAD, the jep console script sets LD_LIBRARY_PATH. It looks like maybe you have more than one version of python on your machine?

@felixnext
Author

Which path goes into the LD_LIBRARY_PATH?

@ndjensen
Member

ndjensen commented Sep 7, 2015

Look at the jep script that the build produced, it should be in there. Typically it needs to point at the python/lib dir, wherever libpython2.7.so is.

@felixnext
Author

Alright, thanks for the quick reply. I tried it by setting:

export LD_LIBRARY_PATH="/usr/lib:/usr/local/lib/python2.7/dist-packages/"

However, I still get the same error. (I also tried -Djava.library.path, with no result.)
The error is quite strange, though, as I can import most packages except nltk.
(More specifically: I am unable to call from io import BufferedIOBase. However, import io works...)

@ndjensen
Member

ndjensen commented Sep 7, 2015

Just to make sure I'm following this correctly.

  1. nltk imports correctly from the python console?
  2. nltk imports correctly from the jep console?
  3. nltk doesn't import when loading from a java application?

The jep console should just about be identical to a java application. Maybe try explicitly setting PATH, it still seems like it might be picking up a different python.

export PATH="/usr/bin:/usr/local/bin:$PATH"

With LD_PRELOAD, LD_LIBRARY_PATH, and PATH explicitly set there's not much chance it can get the wrong python. If that doesn't work, do you know if there's another python installed on that system? And are you using virtualenv or not? This is quite perplexing.

@felixnext
Author

That's correct. However I checked. Here are the environment vars:

VIRTUAL_ENV=
LD_PRELOAD="/usr/lib/libpython2.7.so"
LD_LIBRARY_PATH="/usr/lib:/usr/local/lib/python2.7/dist-packages/"
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/java-8-oracle/bin"

However I still get the same error. It works in the python console as well as the jep console, however fails in code. The actual code is (where io.BufferedIOBase is the object from the import error thrown by nltk):

jep.eval("from io import BufferedIOBase")

However the following code evaluates correctly:

jep.eval("import io")

@ndjensen ndjensen changed the title from "Crash on Loading Script" to "Error when loading nltk" Sep 7, 2015
@ndjensen
Member

ndjensen commented Sep 7, 2015

If you look at the jep script, the jep console is a simple java program that is emulating the python interpreter. So it should behave very similarly to running any java application. I'm down to two theories:

  1. Something in nltk is messing up and the error message is misleading.
  2. Something is different about your environment when running your java application vs the jep console.

For theory 1, have you tried importing nltk through the jep console? I'd take it a step further and use the jep console and import just like your script,

from gensim.models import doc2vec
import nltk
import string
import os

I suspect the import of nltk will fail and the BufferedIOBase is a misleading message. Glancing at nltk's __init__.py, it does a lot of stuff. Can you confirm with the jep console if those imports in that order work or not?

For theory 2, I would recommend printing out various things on the sys or os module: sys.executable, sys.version, os.environ, etc. And then trying to determine what is different between the jep console and your java application. (That said I think theory 1 is more promising and should be investigated first).
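
A minimal sketch of such a dump, assuming it is run once from the jep console and once from the Java application so the two outputs can be diffed. The helper names describe_interpreter and format_report, and the list of environment variables, are illustrative choices and not part of Jep:

```python
import os
import sys


def describe_interpreter(env_keys=("PATH", "LD_LIBRARY_PATH", "LD_PRELOAD", "VIRTUAL_ENV")):
    """Collect interpreter details worth comparing between two hosts."""
    info = {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "prefix": sys.prefix,
        # sys.path decides which module wins when names collide
        "sys_path": list(sys.path),
    }
    for key in env_keys:
        # Missing variables are recorded as None rather than skipped
        info["env." + key] = os.environ.get(key)
    return info


def format_report(info):
    """Render the collected details as sorted 'key = value' lines."""
    return "\n".join("%s = %r" % (k, info[k]) for k in sorted(info))


print(format_report(describe_interpreter()))
```

Sorting the keys keeps the two reports line-aligned, so any difference between the console and the application shows up directly in a plain text diff.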

@felixnext
Author

Here is the output from the jep console:

{...,  'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/java-8-oracle/bin', 'LD_LIBRARY_PATH': '/usr/lib:/usr/local/lib/python2.7/dist-packages/', 'LANG': 'de_DE.UTF-8', 'TERM': 'cygwin', 'SHELL': '/bin/bash', 'SHLVL': '1', 'JAVA_HOME': '/usr/lib/jvm/java-8-oracle', 'COMP_WORDBREAKS': ' \t\n"\'><;|&(:', 'VIRTUAL_ENV': '', 'NLSPATH': '/usr/dt/lib/nls/msg/%L/%N.cat', '_': '/usr/local/bin/jep', 'LD_PRELOAD': '/usr/lib/libpython2.7.so', ...}

Here is the output from the java application:

{..., 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/java-8-oracle/bin', 'LD_LIBRARY_PATH': '/usr/lib:/usr/local/lib/python2.7/dist-packages/', 'LANG': 'de_DE.UTF-8', 'TERM': 'cygwin', 'SHELL': '/bin/bash', 'SHLVL': '1', 'JAVA_HOME': '/usr/lib/jvm/java-8-oracle', 'COMP_WORDBREAKS': ' \t\n"\'><;|&(:', 'VIRTUAL_ENV': '', 'NLSPATH': '/usr/dt/lib/nls/msg/%L/%N.cat', '_': '/usr/bin/java', 'LD_PRELOAD': '/usr/lib/libpython2.7.so', ...}

The only difference I see is the '_' variable, but I don't know what influence it has...

Also: nltk works just fine in the jep console, as does from io import BufferedIOBase...

@ndjensen
Member

ndjensen commented Sep 7, 2015

I don't know what the _ env variable is either. What environment/operating system are you running this in? I'm not very familiar with scala, how compliant is the output bytecode? Is it possible it could be producing bytecode different than javac?

@felixnext
Author

Alright, after reading some documentation it seems like the _ Variable is the path to the execution context (therefore the jep file for the jep console and java for my application).
I am not sure about the bytecode, but I cannot imagine how that might be the issue. It seems more like a dependency problem. I do have Python 3 installed on the system, but it is not referenced anywhere. Also, I tested the import statement in the python3 console and it works there as well.

@ndjensen
Member

ndjensen commented Sep 8, 2015

It looks like the io module is significantly different between 2.6 and 2.7. Do you have a python 2.6 on the system that the Java application could be erroneously picking up?

@felixnext
Author

I printed out sys.version in the Java application, which says 2.7.3. Also, in both cases sys.executable points to /usr/bin/python.
Do you have a setup running with Python 2.7? Could you check whether import nltk works there?

@bsteffensmeier
Member

Since you are able to import io but not BufferedIOBase, you may be getting the wrong io module. It would be interesting to see what is in the io module that you are importing, using the output of something like this (from the Java application):

import io
print(dir(io))

I'm guessing that BufferedIOBase will not be included in the output, and perhaps whatever else is or is not in the module will give us a clue as to where it is coming from.

@felixnext
Author

Alright. The dir(io) from the jep console is:

['BlockingIOError', 'BufferedIOBase', 'BufferedRWPair', 'BufferedRandom', 'BufferedReader', 'BufferedWriter', 'BytesIO', 'DEFAULT_BUFFER_SIZE', 'FileIO', 'IOBase', 'IncrementalNewlineDecoder', 'OpenWrapper', 'RawIOBase', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'StringIO', 'TextIOBase', 'TextIOWrapper', 'UnsupportedOperation', '__all__', '__author__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_io', 'abc', 'open']

While the one from the java application is:

[__doc__, __file__, __loader__, __name__, __path__]

@bsteffensmeier
Member

That looks promising, can you

print(io.__file__)

in both.

@felixnext
Author

Okay. That seems to be pretty close to the issue.
the jep console says: /usr/lib/python2.7/io.pyc
and the application just: <java>

More interestingly, the Java application gets the correct path for the string package: /usr/lib/python2.7/string.pyc

@ndjensen
Member

ndjensen commented Sep 8, 2015

hook.py could potentially create an io module. For example,

from java.io import File

That will create a module java and a module io with the module io attached as an attribute to module java. But in sys.modules it will be java.io, not just io....
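
To illustrate how an import hook of this general kind can make import succeed while a later from-import fails, here is a rough sketch. It is not Jep's hook.py: the classes FakeJavaPackage and FakeJavaImporter and the package name demo_io are hypothetical, it uses the Python 3 importlib API rather than the Python 2.7 mechanism discussed in this thread, and it deliberately claims a made-up name instead of io so that running it does not disturb the real standard library:

```python
import importlib.abc
import importlib.machinery
import sys
import types


class FakeJavaPackage(types.ModuleType):
    """Stand-in for a module object that represents a Java package."""

    def __getattr__(self, name):
        if name.startswith("__"):
            # Leave dunder lookups alone so the import machinery keeps working
            raise AttributeError(name)
        # Any other attribute is treated as a Java class lookup. There is no
        # JVM here, so every lookup fails the way a missing class would.
        raise ImportError("%s.%s" % (self.__name__, name))


class FakeJavaImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Claims every top-level name listed in java_packages."""

    def __init__(self, java_packages):
        self.java_packages = set(java_packages)

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.java_packages:
            return importlib.machinery.ModuleSpec(
                fullname, self, origin="<java>", is_package=True)
        return None  # let the normal import machinery handle it

    def create_module(self, spec):
        return FakeJavaPackage(spec.name)

    def exec_module(self, module):
        # Nothing to execute; only mark where the "package" came from
        module.__file__ = "<java>"


# Hooks at the front of sys.meta_path are consulted before the standard finders
sys.meta_path.insert(0, FakeJavaImporter(["demo_io"]))

import demo_io  # succeeds: the hook hands back an almost empty package object

print(demo_io.__file__)

try:
    from demo_io import BufferedIOBase
except ImportError as exc:
    print("ImportError:", exc)
```

The point of the sketch is only the ordering: whichever finder claims a name first decides what the module is, so a hook that believes a name is a Java package can hide a Python module of the same name.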

Are you passing a ClassEnquirer to the Jep constructor? If so, what implementation? If something is going wrong in there it could maybe mess up the import.

@felixnext
Author

Hmm. My creation code for JEP looks like this:

Try(new Jep()) match {
  case Success(jep) => Right(new PythonInterop(jep))
  case Failure(e)   => Left(e)
}

Where PythonInterop is just a class for error handling.

@bsteffensmeier
Member

You are getting into the import mechanism, which is an area I am less familiar with, so I'm making some assumptions from reading through the code here. It looks like ClassList inspects your classpath and allows imports of any class in any jar on it. If any jar contains a package whose name starts with 'io', then jep will let you import it. (I think just 'io'; it doesn't even need to be 'io.'.) Can you look at your classpath and see if anything starts with io? From Java, the list of jars that jep is using should be in

System.getProperty("java.class.path")
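
Once the classpath string is known, the jars can be inspected without unpacking them by hand. The following is a rough sketch, assuming the classpath entries are plain .jar files (which are zip archives); the function names top_level_packages and jars_shadowing are made up for this example, and it only checks for an exact top-level package named io rather than any name that merely begins with those letters:

```python
import os
import zipfile


def top_level_packages(jar_path):
    """Return the top-level package names that contain .class files in a jar."""
    packages = set()
    with zipfile.ZipFile(jar_path) as jar:
        for entry in jar.namelist():
            # Classes in the default package have no "/" and are ignored
            if entry.endswith(".class") and "/" in entry:
                packages.add(entry.split("/", 1)[0])
    return packages


def jars_shadowing(classpath, names=("io",), sep=os.pathsep):
    """Map each jar on the classpath to the listed names it defines."""
    hits = {}
    for item in classpath.split(sep):
        if item.endswith(".jar") and os.path.isfile(item):
            clash = top_level_packages(item) & set(names)
            if clash:
                hits[item] = sorted(clash)
    return hits


# Paste the value of java.class.path here, or export it as CLASSPATH first
print(jars_shadowing(os.environ.get("CLASSPATH", "")))
```

Passing other module names in names (for example "re") extends the same check to further potential collisions.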

@felixnext
Author

I explicitly set the classpath at the start of the application, which results in: base_importer.jar:base_importer-deps.jar
However, I checked the dependency jar, which does indeed have an io package inside. Is there a way to tell jep not to use Java deps?

@bsteffensmeier
Member

For the current version of Jep I recommend creating a custom ClassEnquirer. For this particular case you could do something as simple as extending ClassList and overriding the contains method to return false when the package is 'io'.

I think we should leave this issue open, and in a future version of jep we should look into making it easier to avoid naming conflicts between Python and Java (Scala?). Combining the two namespaces is a difficult problem, and I'm not sure we will be able to get all the kinks worked out with the default enquirer, especially when you throw Scala names into the mix.

Can you explain where your 'io' package comes from, so we can understand the scope of the problem? I don't know anything about Scala, but a quick Google search reveals that Scala has an 'io' package. At the Java level it looks like the package is 'scala.io', which shouldn't cause problems, but I'm not sure whether different Scala implementations might handle this differently.

@felixnext
Author

First of all thanks for the great support!
The io package seems to come from netty:
http://netty.io/5.0/api/index.html
I do not include this directly, but I manage my dependencies through sbt (i.e. Maven) and it seems to be needed by some other library. (So I do not think the error is related to Scala.)
Also, can you explain how to pass the ClassEnquirer to jep? I haven't found anything in the Jep class itself or the wiki.

@bsteffensmeier
Member

You will need to call a more detailed constructor. The default arguments are (false, null, null, null), so you can use that and just change the last argument to a custom enquirer.

@ndjensen
Member

ndjensen commented Sep 8, 2015

I tracked down why the error message was so unhelpful. console.py catches the exceptions and calls traceback.print_exc(), which only prints the Python portion. If the exception were not caught in Python and instead made it back to Java, where it was caught and printed with e.printStackTrace(), it would include a full "Caused by" chain with the ClassNotFoundException, such as:

Caused by: jep.JepException: <type 'exceptions.ImportError'>: io.BufferedIOBase
    at /usr/lib/python2.7/site-packages/jep/hook.__getattr__(hook.py:22)
Caused by: java.lang.ClassNotFoundException: io.BufferedIOBase
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at jep.Jep.eval(Native Method)
    at jep.Jep.eval(Jep.java:456)

Is there a way to call e.printStackTrace() from Scala? And if so, can you verify that it has all that information?

@felixnext
Author

Scala can use the same function as Java (it just has some neat additions, like real functional programming ;) ), so here you go:

jep.JepException: <type 'exceptions.ImportError'>: io.BufferedIOBase
        at /usr/local/lib/python2.7/dist-packages/jep/hook.__getattr__(hook.py:22)
        at /usr/lib/python2.7/gzip.<module>(gzip.py:36)
        at /usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.load(npyio.py:365)
        at /home/felix/.local/lib/python2.7/site-packages/gensim-0.12.1-py2.7-linux-x86_64.egg/gensim/utils._load_specials(utils.py:284)
        at /home/felix/.local/lib/python2.7/site-packages/gensim-0.12.1-py2.7-linux-x86_64.egg/gensim/utils._load_specials(utils.py:272)
        at /home/felix/.local/lib/python2.7/site-packages/gensim-0.12.1-py2.7-linux-x86_64.egg/gensim/utils.load(utils.py:253)
        at /home/felix/.local/lib/python2.7/site-packages/gensim-0.12.1-py2.7-linux-x86_64.egg/gensim/models/word2vec.load(word2vec.py:1371)
        at /etc/elementary/scripts/doc2vec_exec.load(doc2vec_exec.py:14)
Caused by: java.lang.ClassNotFoundException: io.BufferedIOBase
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at jep.Jep.invoke(Native Method)
        at jep.Jep.invoke(Jep.java:380)
        at elementary.glue.PythonInterop$$anonfun$4.apply(PythonInterop.scala:59)
        at scala.util.Try$.apply(Try.scala:191)
        at elementary.glue.PythonInterop.callFunction(PythonInterop.scala:59)
        at elementary.util.machinelearning.Doc2VecPy$.create(Doc2VecPy.scala:112)
        at elementary.tools.baseimporter.BaseImport$.d2vTest(BaseImport.scala:124)
        at elementary.tools.baseimporter.BaseImport$.main(BaseImport.scala:112)
        at elementary.tools.baseimporter.BaseImport.main(BaseImport.scala)

So it looks like the same ClassNotFoundException.
Also, I could not inherit from ClassList (due to its private constructor), so I will just implement the ClassEnquirer interface.

@ndjensen ndjensen changed the title from "Error when loading nltk" to "Java packages starting with io can conflict with Python's io module" Sep 8, 2015
@ndjensen ndjensen added the defect label Sep 8, 2015
@ndjensen
Member

ndjensen commented Sep 8, 2015

I'm going to look into improving the quality of the error message a bit for jep 3.4. For 3.5 I was planning on reworking ClassList and ClassEnquirer somewhat, so I'll try and make the defaults smart enough to handle this scenario for that release.

ndjensen added a commit that referenced this issue Jan 20, 2016
provided ClassEnquirer implementations will not identify io and re as
Java packages

Change-Id: Ic5f95d16bfa5def27e0fd0095a26b3928406829f
@ndjensen
Member

Fixed on dev_3.5 branch. Java packages starting with "io" or "re" will be removed by the two provided ClassEnquirer implementations, ClassList and NamingConventionClassEnquirer. If a developer needs those imported from Java instead of Python, they will need to implement a custom ClassEnquirer.

@ndjensen ndjensen reopened this Aug 23, 2016
@ndjensen
Member

I did not test the fix well enough. Working on a more complete fix and unit test.

ndjensen added a commit that referenced this issue Aug 23, 2016
Change-Id: I60e83e95933e2f4f8623229d1f0108c8ca4f9e7a
@ndjensen
Member

Fixed on dev_3.6.
