Error running test_locally.py #24

joeljosephjin · 2020-05-16T11:08:12Z

I was testing the provided PPO implementation based on Stable Baselines. I made a submission, but it failed. I created the weights by training for only 600 steps.

$ python test_locally.py -s simulator/goseek-v0.1.4.x86_64 -i submission -g
Running agent...
/miniconda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/miniconda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/miniconda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/miniconda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/miniconda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/miniconda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Traceback (most recent call last):
  File "eval.py", line 85, in <module>
    results = main(episode_cfg, agent_args)
  File "eval.py", line 65, in main
    agent = get_agent_cls(agent_args["name"], Agent)(agent_args)
  File "/goseek-challenge/baselines/agents.py", line 39, in __init__
    self.model = PPO2.load(config["weights"])
  File "/miniconda/lib/python3.7/site-packages/stable_baselines/common/base_class.py", line 936, in load
    data, params = cls._load_from_file(load_path, custom_objects=custom_objects)
  File "/miniconda/lib/python3.7/site-packages/stable_baselines/common/base_class.py", line 666, in _load_from_file
    data = json_to_data(json_data, custom_objects=custom_objects)
  File "/miniconda/lib/python3.7/site-packages/stable_baselines/common/save_util.py", line 120, in json_to_data
    base64.b64decode(serialization.encode())
ModuleNotFoundError: No module named 'tensorflow.python.util.module_wrapper'
stopped simulator...
stopped submission...

***** Summary Metrics *****
{'Actions': 0,
 'Collisions': 0,
 'Precision': 0,
 'Recall': 0,
 'Weighted Total': -1}

import tensorflow.python.util.module_wrapper works perfectly when I try it in the Python shell. I also noticed that even if I uninstall tensorflow, test_locally.py still shows the same error. I tried it on a Genesis Cloud compute instance as well as a Google Cloud compute instance. Both showed the same error.

joeljosephjin · 2020-05-16T11:10:07Z

The submission on EvalAI shows that it failed. The result file was empty: [].

The random agent submission works fine.

joeljosephjin · 2020-05-16T11:21:04Z

Steps I followed to submit the PPO agent:

Copied the ppo-weights.pkl file into the goseek-challenge directory
Renamed Dockerfile-ppo to Dockerfile
sudo docker build -t submission .
evalai push submission:latest --phase goseek-challenge-competition-groundtruth-607

griffith826 · 2020-05-16T12:33:11Z

Hello. Do you see a link on EvalAI for the stdout file that was returned? This is the stdout from your submission. When a submission fails, the returned results are empty.

It looks like something might have been missing. ModuleNotFoundError: No module named 'tensorflow.python.util.module_wrapper' is the last line of stdout for your submission.

I hope that helps you diagnose what happened. A couple of ideas:

Look back through the output for docker build. Maybe something didn't install correctly?
I'd recommend using test_locally.py to confirm your submission doesn't produce an error before submitting. It will be easier to debug locally than trying to debug from the logs on evalai.

We have tested submissions with Dockerfile-ppo before, and did not see this error. We can try a fresh build and see if something changed from under us that might have caused this, though.
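
For context on how a module can be "missing" only at load time: the traceback ends inside json_to_data, which (as far as I can tell from the stable-baselines source) unpickles base64-encoded data stored with the model. A pickle records the import path of each class it references, not the class itself, so loading fails when the loading environment lacks a module that the saving environment had. The sketch below reproduces that general failure mode with a throwaway module. The module name fake_wrapper_for_demo is hypothetical, and this is only one possible explanation, not a confirmed diagnosis of this issue.

```python
import pickle
import sys
import types


def pickle_then_load_without_module(module_name: str) -> str:
    """Pickle an object from a temporary module, drop the module, then unpickle.

    Returns a short description of what happened during loading.
    """
    # Build a throwaway module containing one class and register it as importable.
    module = types.ModuleType(module_name)

    class Wrapper:
        pass

    # Make the class look as if it were defined at the top level of that module.
    Wrapper.__module__ = module_name
    Wrapper.__qualname__ = "Wrapper"
    module.Wrapper = Wrapper
    sys.modules[module_name] = module

    # The pickle stores only the reference "<module_name>.Wrapper", not the class body.
    payload = pickle.dumps(Wrapper())

    # Simulate an environment where that module is not installed.
    del sys.modules[module_name]

    try:
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return f"{type(exc).__name__}: {exc}"
    return "loaded without error"


if __name__ == "__main__":
    print(pickle_then_load_without_module("fake_wrapper_for_demo"))
```

If something like this is happening here, the environment that saved ppo-weights.pkl and the environment inside the submission image would differ in which TensorFlow modules they provide, which is why comparing package versions on both sides is a useful next step.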

ZacRavichandran · 2020-05-16T14:23:03Z

I just built Dockerfile-ppo from scratch and could not reproduce the error. Could we confirm the package versions are as expected? The tensorflow and stable-baselines versions used for training and evaluation are below:

Docker Image

docker run --rm -it goseek-ppo /bin/bash
$ python -c "import tensorflow; print(tensorflow.__version__)"
1.13.1
$ python -c "import stable_baselines; print(stable_baselines.__version__)"
2.10.0

Host

$ python
>>> import tensorflow; print(tensorflow.__version__)
1.13.1
>>> import stable_baselines; print(stable_baselines.__version__)
2.9.0
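
To run the same check identically on the host and inside the container, something like the following could be used. It is a minimal sketch, not part of the challenge repository: it reads installed-package metadata instead of importing the packages, so it still reports a version when the import itself fails. The helper names installed_version and report_versions are made up for illustration, and the distribution names tensorflow, tensorflow-gpu and stable-baselines are assumptions about what pip installed.

```python
from typing import Dict, Iterable, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        # Part of the standard library on Python 3.8 and newer.
        from importlib import metadata
    except ImportError:
        # Fallback for Python 3.7, where setuptools' pkg_resources is usually present.
        import pkg_resources

        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report_versions(dist_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    # Assumed distribution names; adjust to whatever `pip list` shows.
    names = ["tensorflow", "tensorflow-gpu", "stable-baselines"]
    for name, version in report_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Running it once on the machine that produced the weights and once inside the submission image makes any mismatch between the two environments easy to spot.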

joeljosephjin · 2020-05-16T19:18:18Z

(goseek2) root@ubuntu:~/goseek-challenge# docker run --rm -it submission /bin/bash

root@372be81b5739:/goseek-challenge# python -c "import tensorflow; tensorflow.__version__"

gave this error:

Traceback (most recent call last):
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/miniconda/lib/python3.7/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/miniconda/lib/python3.7/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/miniconda/lib/python3.7/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/miniconda/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/miniconda/lib/python3.7/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/miniconda/lib/python3.7/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

I reinstalled the specified versions of tensorflow and stable_baselines, but the error persists.
I suspect this has something to do with the CUDA version.

ZacRavichandran · 2020-05-16T21:04:00Z

That often causes issues. What driver and CUDA version are you on?
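
One quick way to narrow this down is to check, from the same environment that fails, whether the dynamic loader can find libcuda.so.1 at all, since that is the library named in the ImportError above. The snippet is a rough sketch using only the standard library. The helper name can_load_shared_library is hypothetical, and a True result only means the library was found, not that the driver and CUDA versions are compatible with this TensorFlow build.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        # CDLL raises OSError when the library cannot be found or opened.
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    for lib in ["libcuda.so.1"]:
        status = "found" if can_load_shared_library(lib) else "NOT found"
        print(f"{lib}: {status}")
```

libcuda.so.1 is shipped with the NVIDIA driver rather than with TensorFlow, so if it is not found inside the container, the container was probably started without GPU access, which would be worth ruling out before looking at version mismatches.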
