Error running test_locally.py #24

Open
joeljosephjin opened this issue May 16, 2020 · 6 comments

Comments

@joeljosephjin

I was testing the PPO implementation from Stable Baselines that is provided with the challenge. I made a submission, but it failed. I created the weights by training for only 600 steps.

$ python test_locally.py -s simulator/goseek-v0.1.4.x86_64 -i submission -g
Running agent...
/miniconda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/miniconda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/miniconda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/miniconda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/miniconda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/miniconda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Traceback (most recent call last):
  File "eval.py", line 85, in <module>
    results = main(episode_cfg, agent_args)
  File "eval.py", line 65, in main
    agent = get_agent_cls(agent_args["name"], Agent)(agent_args)
  File "/goseek-challenge/baselines/agents.py", line 39, in __init__
    self.model = PPO2.load(config["weights"])
  File "/miniconda/lib/python3.7/site-packages/stable_baselines/common/base_class.py", line 936, in load
    data, params = cls._load_from_file(load_path, custom_objects=custom_objects)
  File "/miniconda/lib/python3.7/site-packages/stable_baselines/common/base_class.py", line 666, in _load_from_file
    data = json_to_data(json_data, custom_objects=custom_objects)
  File "/miniconda/lib/python3.7/site-packages/stable_baselines/common/save_util.py", line 120, in json_to_data
    base64.b64decode(serialization.encode())
ModuleNotFoundError: No module named 'tensorflow.python.util.module_wrapper'
stopped simulator...
stopped submission...

***** Summary Metrics *****
{'Actions': 0,
 'Collisions': 0,
 'Precision': 0,
 'Recall': 0,
 'Weighted Total': -1}

import tensorflow.python.util.module_wrapper works perfectly when I try it in the Python shell. I also noticed that even if I uninstall tensorflow, test_locally.py still shows the same error. I tried it on a Genesis Cloud compute instance as well as a Google Cloud compute instance. Both showed the same error.
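
One way to narrow this down is to run the same import check with the interpreter that actually executes the agent. The /miniconda paths in the traceback suggest this may be the Python inside the submission image rather than the host shell, which would also be consistent with the error surviving an uninstall on the host. A minimal sketch using only the standard library (the helper name module_available is made up for illustration):

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located by the current interpreter.

    find_spec imports the parent packages of a dotted name, so a missing
    or broken parent package surfaces here as ImportError and is reported
    as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


# Example: run this in each environment and compare the answers.
# print(module_available("tensorflow.python.util.module_wrapper"))
```

If the host reports True and the container reports False, the two environments do not have the same TensorFlow installation.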

@joeljosephjin
Author

joeljosephjin commented May 16, 2020

The submission on EvalAI shows that it failed. The result file was empty: [].

The random agent submission works fine.

@joeljosephjin
Author

Steps I followed to submit the PPO agent:

  1. Copied the ppo-weights.pkl file into the goseek-challenge directory
  2. Renamed Dockerfile-ppo to Dockerfile
  3. Ran sudo docker build -t submission .
  4. Ran evalai push submission:latest --phase goseek-challenge-competition-groundtruth-607

@griffith826
Member

Hello. Do you see a link on evalai for the stdout file that was returned? This is the stdout from your submission. On a failure in the submission, the results returned will be empty.

It looks like something might have been missing. ModuleNotFoundError: No module named 'tensorflow.python.util.module_wrapper' is the last line of stdout for your submission.

I hope that helps you diagnose what happened. A couple of ideas:

  • Look back through the output for docker build. Maybe something didn't install correctly?
  • I'd recommend using test_locally.py to confirm your submission doesn't produce an error before submitting. It will be easier to debug locally than trying to debug from the logs on evalai.
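
The second suggestion can be sketched as a small guard that runs the local test and only treats it as passing when the output contains no Python traceback. This is an illustration, not part of the repository: the helper names are hypothetical, and the assumption that a failing run prints a traceback comes from the log earlier in this thread, where the summary metrics were still printed after the error.

```python
import subprocess

# Marker that Python prints at the start of an uncaught exception report.
TRACEBACK_MARKER = "Traceback (most recent call last)"


def output_has_traceback(output):
    """Return True if the captured output contains a Python traceback."""
    return TRACEBACK_MARKER in output


def local_test_passes(cmd):
    """Run `cmd`, capture stdout and stderr together, and check the result.

    Assumption: a non-zero exit code or a traceback in the output both
    count as a failed local test.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode == 0 and not output_has_traceback(result.stdout)


# Example, using the command from the top of this issue:
# ok = local_test_passes(
#     ["python", "test_locally.py",
#      "-s", "simulator/goseek-v0.1.4.x86_64", "-i", "submission", "-g"]
# )
# Push to EvalAI only if ok is True.
```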

We have tested submissions with Dockerfile-ppo before, and did not see this error. We can try a fresh build and see if something changed from under us that might have caused this, though.

@ZacRavichandran
Member

I just built Dockerfile-ppo from scratch and could not reproduce the error. Could we confirm the package versions are as expected? The tensorflow and stable-baselines versions used for training and evaluation are below:

Docker Image

docker run --rm -it goseek-ppo /bin/bash
>>> python -c "import tensorflow; print(tensorflow.__version__)"
1.13.1
>>> python -c "import stable_baselines; print(stable_baselines.__version__)"
2.10.0

Host

>>> python
>>> import tensorflow; print(tensorflow.__version__)
1.13.1
>>> import stable_baselines; print(stable_baselines.__version__)
2.9.0
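
The same comparison can be collected in one call per environment. A minimal sketch (report_versions is a hypothetical helper; it relies on each package exposing __version__, as both do in the output above):

```python
import importlib


def report_versions(names=("tensorflow", "stable_baselines")):
    """Return a dict mapping each package name to its version string.

    A package that cannot be imported is reported with the import error
    instead of aborting the whole check.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError as exc:
            versions[name] = f"import failed: {exc}"
    return versions


# Example: run once on the host and once inside the container,
# then compare the two dictionaries.
# print(report_versions())
```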

@joeljosephjin
Author

joeljosephjin commented May 16, 2020

(goseek2) root@ubuntu:~/goseek-challenge# docker run --rm -it submission /bin/bash

root@372be81b5739:/goseek-challenge# python -c "import tensorflow; tensorflow.__version__"

gave this error:

Traceback (most recent call last):
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/miniconda/lib/python3.7/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/miniconda/lib/python3.7/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/miniconda/lib/python3.7/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/miniconda/lib/python3.7/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/miniconda/lib/python3.7/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

I reinstalled the specified versions of tensorflow and stable_baselines, but the error persists.
I suspect this has something to do with the CUDA version.
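
The ImportError above is about loading libcuda.so.1, which is normally supplied by the NVIDIA driver rather than by a Python package, so reinstalling tensorflow would not be expected to change it. A quick way to check whether the library is visible to the current process, independent of TensorFlow, is sketched below (the helper name is hypothetical):

```python
import ctypes


def can_load_shared_library(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False


# Example: run inside the container started with `docker run`.
# print(can_load_shared_library("libcuda.so.1"))
```

If this returns False inside the container but True on the host, the driver library is not being exposed to the container, which is a separate question from which CUDA version is installed.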

@ZacRavichandran
Member

That often causes issues. What driver and CUDA version are you on?
