[tune] Redis crashes in middle of experiment #2861

Closed
richardliaw opened this issue Sep 12, 2018 · 2 comments

richardliaw commented Sep 12, 2018

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 16.04
  • Ray installed from (source or binary): wheels
  • Ray version: master
  • Python version: 3.6
  • Exact command to reproduce: Run atari-a2c in tuned_examples.

Describe the problem

In the middle of a Tune experiment, Redis seems to drop the connection (this is a single-node setup).
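
Errno 111 (connection refused) in the logs means nothing was accepting connections on the Redis port at that moment. A minimal sketch for probing the port from the same machine, assuming the address reported in the error messages; `redis_port_is_open` is a hypothetical helper name and not part of Ray:

```python
import socket


def redis_port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper for illustration; it only checks that something
    is listening, not that the listener is a healthy Redis server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (errno 111) as well as timeouts.
        return False
```

For example, `redis_port_is_open("172.17.0.2", 25143)` returning `False` while the experiment is still running would suggest that the Redis process has exited rather than just stalled.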

Source code / logs

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 12/12 CPUs, 4/4 GPUs
Result logdir: /root/ray_results/gym/pusher-2d/image-reach/20180818-image-reach-spatial-softmax-lsp-4-lsp
PENDING trials:
 - mujoco-runner_4_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=5:	PENDING
RUNNING trials:
 - mujoco-runner_0_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=1:	RUNNING [pid=7045], 27849 s, 607 ts, -101 acc
 - mujoco-runner_1_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=2:	RUNNING [pid=7047], 27832 s, 629 ts, -68.9 acc
 - mujoco-runner_2_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=3:	RUNNING [pid=7051], 27827 s, 614 ts, -57.2 acc
 - mujoco-runner_3_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=4:	RUNNING [pid=7049], 27817 s, 630 ts, -118 acc

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 911, in _process_task
    self._store_outputs_in_objstore(return_object_ids, outputs)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 839, in _store_outputs_in_objstore
    self.put_object(object_ids[i], outputs[i])
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 368, in put_object
    self.store_and_register(object_id, value)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 303, in store_and_register
    serialization_context=self.serialization_context)
  File "pyarrow/_plasma.pyx", line 396, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/_plasma.pyx", line 300, in pyarrow._plasma.PlasmaClient.create
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Encountered unexpected EOF

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/workers/default_worker.py", line 69, in <module>
    ray.worker.global_worker.main_loop()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 1044, in main_loop
    self._wait_for_and_process_task(task)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 1003, in _wait_for_and_process_task
    self._process_task(task)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 915, in _process_task
    ray.utils.format_error_message(traceback.format_exc()))
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 925, in _handle_process_task_failure
    self._store_outputs_in_objstore(return_object_ids, failure_objects)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 839, in _store_outputs_in_objstore
    self.put_object(object_ids[i], outputs[i])
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 368, in put_object
    self.store_and_register(object_id, value)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 303, in store_and_register
    serialization_context=self.serialization_context)
  File "pyarrow/_plasma.pyx", line 396, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/_plasma.pyx", line 300, in pyarrow._plasma.PlasmaClient.create
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Broken pipe

  This error is unexpected and should not have happened. Somehow a worker
  crashed in an unanticipated way causing the main_loop to throw an exception,
  which is being caught in "python/ray/workers/default_worker.py".

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/envs/softlearning/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/softqlearning-private/examples/mujoco_all_ray.py", line 223, in <module>
    main()
  File "/root/softqlearning-private/examples/mujoco_all_ray.py", line 218, in main
    for policy, variant_spec in zip(args.policy, variant_specs)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/tune/tune.py", line 91, in run_experiments
    runner.step()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 103, in step
    self._process_events()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 252, in _process_events
    [result_id], _ = ray.wait(list(self._running))
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 2879, in wait
    object_id_strs, timeout, num_returns)
  File "pyarrow/_plasma.pyx", line 590, in pyarrow._plasma.PlasmaClient.wait
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Encountered unexpected EOF
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/root/ray_results/gym/pusher-2d/image-reach/20180818-image-reach-spatial-softmax-lsp-4-lsp/mujoco-runner_3_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=4_2018-08-18_23-09-00h15_vgl_/progress.csv' mode='w' encoding='UTF-8'>
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/root/ray_results/gym/pusher-2d/image-reach/20180818-image-reach-spatial-softmax-lsp-4-lsp/mujoco-runner_3_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=4_2018-08-18_23-09-00h15_vgl_/result.json' mode='w' encoding='UTF-8'>
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/root/ray_results/gym/pusher-2d/image-reach/20180818-image-reach-spatial-softmax-lsp-4-lsp/mujoco-runner_2_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=3_2018-08-18_23-09-00fxuq4tqx/progress.csv' mode='w' encoding='UTF-8'>
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/root/ray_results/gym/pusher-2d/image-reach/20180818-image-reach-spatial-softmax-lsp-4-lsp/mujoco-runner_2_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=3_2018-08-18_23-09-00fxuq4tqx/result.json' mode='w' encoding='UTF-8'>
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/root/ray_results/gym/pusher-2d/image-reach/20180818-image-reach-spatial-softmax-lsp-4-lsp/mujoco-runner_1_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=2_2018-08-18_23-09-00h2mqxkgy/progress.csv' mode='w' encoding='UTF-8'>
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/root/ray_results/gym/pusher-2d/image-reach/20180818-image-reach-spatial-softmax-lsp-4-lsp/mujoco-runner_1_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=2_2018-08-18_23-09-00h2mqxkgy/result.json' mode='w' encoding='UTF-8'>
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/root/ray_results/gym/pusher-2d/image-reach/20180818-image-reach-spatial-softmax-lsp-4-lsp/mujoco-runner_0_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=1_2018-08-18_23-09-000paimgmv/progress.csv' mode='w' encoding='UTF-8'>
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/root/ray_results/gym/pusher-2d/image-reach/20180818-image-reach-spatial-softmax-lsp-4-lsp/mujoco-runner_0_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=1_2018-08-18_23-09-000paimgmv/result.json' mode='w' encoding='UTF-8'>
Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 484, in connect
    sock = self._connect()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 541, in _connect
    raise err
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 529, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 667, in execute_command
    connection.send_command(*args)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 585, in send_packed_command
    self.connect()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 489, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 172.17.0.2:25143. Connection refused.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 484, in connect
    sock = self._connect()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 541, in _connect
    raise err
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 529, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 2183, in connect
    ray.services.check_version_info(worker.redis_client)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/services.py", line 382, in check_version_info
    redis_reply = redis_client.get("VERSION_INFO")
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 976, in get
    return self.execute_command('GET', name)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 673, in execute_command
    connection.send_command(*args)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 585, in send_packed_command
    self.connect()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 489, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 172.17.0.2:25143. Connection refused.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 484, in connect
    sock = self._connect()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 541, in _connect
    raise err
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 529, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 667, in execute_command
    connection.send_command(*args)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 585, in send_packed_command
    self.connect()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 489, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 172.17.0.2:25143. Connection refused.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 484, in connect
    sock = self._connect()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 541, in _connect
    raise err
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 529, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/workers/default_worker.py", line 55, in <module>
    info, mode=ray.WORKER_MODE, use_raylet=(args.raylet_name is not None))
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 2194, in connect
    driver_id=None)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/utils.py", line 117, in push_error_to_driver_through_redis
    "data": data
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 2011, in hmset
    return self.execute_command('HMSET', name, *items)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 673, in execute_command
    connection.send_command(*args)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 585, in send_packed_command
    self.connect()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 489, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 172.17.0.2:25143. Connection refused.
Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 911, in _process_task
    self._store_outputs_in_objstore(return_object_ids, outputs)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 839, in _store_outputs_in_objstore
    self.put_object(object_ids[i], outputs[i])
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 368, in put_object
    self.store_and_register(object_id, value)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 303, in store_and_register
    serialization_context=self.serialization_context)
  File "pyarrow/_plasma.pyx", line 396, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/_plasma.pyx", line 300, in pyarrow._plasma.PlasmaClient.create
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/workers/default_worker.py", line 69, in <module>
    ray.worker.global_worker.main_loop()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 1044, in main_loop
    self._wait_for_and_process_task(task)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 1003, in _wait_for_and_process_task
    self._process_task(task)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 915, in _process_task
    ray.utils.format_error_message(traceback.format_exc()))
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 925, in _handle_process_task_failure
    self._store_outputs_in_objstore(return_object_ids, failure_objects)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 839, in _store_outputs_in_objstore
    self.put_object(object_ids[i], outputs[i])
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 368, in put_object
    self.store_and_register(object_id, value)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 303, in store_and_register
    serialization_context=self.serialization_context)
  File "pyarrow/_plasma.pyx", line 396, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/_plasma.pyx", line 300, in pyarrow._plasma.PlasmaClient.create
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 177, in _read_from_socket
    raise socket.error(SERVER_CLOSED_CONNECTION_ERROR)
OSError: Connection closed by server.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 668, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 680, in parse_response
    response = connection.read_response()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 624, in read_response
    response = self._parser.read_response()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 284, in read_response
    response = self._buffer.readline()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 216, in readline
    self._read_from_socket()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 191, in _read_from_socket
    (e.args,))
redis.exceptions.ConnectionError: Error while reading from socket: ('Connection closed by server.',)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 484, in connect
    sock = self._connect()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 541, in _connect
    raise err
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 529, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/workers/default_worker.py", line 76, in <module>
    driver_id=None)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/utils.py", line 76, in push_error_to_driver
    "data": data
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 2011, in hmset
    return self.execute_command('HMSET', name, *items)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 673, in execute_command
    connection.send_command(*args)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 585, in send_packed_command
    self.connect()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 489, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 172.17.0.2:25143. Connection refused.
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/root/ray_results/gym/pusher-2d/image-reach/20180818-image-reach-spatial-softmax-lsp-4-lsp/mujoco-runner_2_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=3_2018-08-18_23-09-00fxuq4tqx/rllab-logger/debug.log' mode='a' encoding='UTF-8'>
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/root/ray_results/gym/pusher-2d/image-reach/20180818-image-reach-spatial-softmax-lsp-4-lsp/mujoco-runner_2_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=3_2018-08-18_23-09-00fxuq4tqx/rllab-logger/progress.csv' mode='w' encoding='UTF-8'>
Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 911, in _process_task
    self._store_outputs_in_objstore(return_object_ids, outputs)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 839, in _store_outputs_in_objstore
    self.put_object(object_ids[i], outputs[i])
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 368, in put_object
    self.store_and_register(object_id, value)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 303, in store_and_register
    serialization_context=self.serialization_context)
  File "pyarrow/_plasma.pyx", line 396, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/_plasma.pyx", line 300, in pyarrow._plasma.PlasmaClient.create
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/workers/default_worker.py", line 69, in <module>
    ray.worker.global_worker.main_loop()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 1044, in main_loop
    self._wait_for_and_process_task(task)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 1003, in _wait_for_and_process_task
    self._process_task(task)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 915, in _process_task
    ray.utils.format_error_message(traceback.format_exc()))
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 925, in _handle_process_task_failure
    self._store_outputs_in_objstore(return_object_ids, failure_objects)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 839, in _store_outputs_in_objstore
    self.put_object(object_ids[i], outputs[i])
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 368, in put_object
    self.store_and_register(object_id, value)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 303, in store_and_register
    serialization_context=self.serialization_context)
  File "pyarrow/_plasma.pyx", line 396, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/_plasma.pyx", line 300, in pyarrow._plasma.PlasmaClient.create
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 177, in _read_from_socket
    raise socket.error(SERVER_CLOSED_CONNECTION_ERROR)
OSError: Connection closed by server.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 668, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 680, in parse_response
    response = connection.read_response()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 624, in read_response
    response = self._parser.read_response()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 284, in read_response
    response = self._buffer.readline()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 216, in readline
    self._read_from_socket()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 191, in _read_from_socket
    (e.args,))
redis.exceptions.ConnectionError: Error while reading from socket: ('Connection closed by server.',)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 484, in connect
    sock = self._connect()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 541, in _connect
    raise err
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 529, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/workers/default_worker.py", line 76, in <module>
    driver_id=None)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/utils.py", line 76, in push_error_to_driver
    "data": data
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 2011, in hmset
    return self.execute_command('HMSET', name, *items)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 673, in execute_command
    connection.send_command(*args)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 585, in send_packed_command
    self.connect()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 489, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 172.17.0.2:25143. Connection refused.
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/root/ray_results/gym/pusher-2d/image-reach/20180818-image-reach-spatial-softmax-lsp-4-lsp/mujoco-runner_1_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=2_2018-08-18_23-09-00h2mqxkgy/rllab-logger/debug.log' mode='a' encoding='UTF-8'>
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/root/ray_results/gym/pusher-2d/image-reach/20180818-image-reach-spatial-softmax-lsp-4-lsp/mujoco-runner_1_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=2_2018-08-18_23-09-00h2mqxkgy/rllab-logger/progress.csv' mode='w' encoding='UTF-8'>
Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 911, in _process_task
    self._store_outputs_in_objstore(return_object_ids, outputs)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 839, in _store_outputs_in_objstore
    self.put_object(object_ids[i], outputs[i])
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 368, in put_object
    self.store_and_register(object_id, value)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 303, in store_and_register
    serialization_context=self.serialization_context)
  File "pyarrow/_plasma.pyx", line 396, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/_plasma.pyx", line 300, in pyarrow._plasma.PlasmaClient.create
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/workers/default_worker.py", line 69, in <module>
    ray.worker.global_worker.main_loop()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 1044, in main_loop
    self._wait_for_and_process_task(task)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 1003, in _wait_for_and_process_task
    self._process_task(task)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 915, in _process_task
    ray.utils.format_error_message(traceback.format_exc()))
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 925, in _handle_process_task_failure
    self._store_outputs_in_objstore(return_object_ids, failure_objects)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 839, in _store_outputs_in_objstore
    self.put_object(object_ids[i], outputs[i])
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 368, in put_object
    self.store_and_register(object_id, value)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 303, in store_and_register
    serialization_context=self.serialization_context)
  File "pyarrow/_plasma.pyx", line 396, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/_plasma.pyx", line 300, in pyarrow._plasma.PlasmaClient.create
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 177, in _read_from_socket
    raise socket.error(SERVER_CLOSED_CONNECTION_ERROR)
OSError: Connection closed by server.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 668, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 680, in parse_response
    response = connection.read_response()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 624, in read_response
    response = self._parser.read_response()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 284, in read_response
    response = self._buffer.readline()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 216, in readline
    self._read_from_socket()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 191, in _read_from_socket
    (e.args,))
redis.exceptions.ConnectionError: Error while reading from socket: ('Connection closed by server.',)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 484, in connect
    sock = self._connect()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 541, in _connect
    raise err
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 529, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/workers/default_worker.py", line 76, in <module>
    driver_id=None)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/ray/utils.py", line 76, in push_error_to_driver
    "data": data
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 2011, in hmset
    return self.execute_command('HMSET', name, *items)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/client.py", line 673, in execute_command
    connection.send_command(*args)
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 585, in send_packed_command
    self.connect()
  File "/opt/conda/envs/softlearning/lib/python3.6/site-packages/redis/connection.py", line 489, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 172.17.0.2:25143. Connection refused.
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/root/ray_results/gym/pusher-2d/image-reach/20180818-image-reach-spatial-softmax-lsp-4-lsp/mujoco-runner_0_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=1_2018-08-18_23-09-000paimgmv/rllab-logger/debug.log' mode='a' encoding='UTF-8'>
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/root/ray_results/gym/pusher-2d/image-reach/20180818-image-reach-spatial-softmax-lsp-4-lsp/mujoco-runner_0_discount=0.99,arm_goal_distance_cost_coeff=1.0,image_size=32x32x3,seed=1_2018-08-18_23-09-000paimgmv/rllab-logger/progress.csv' mode='w' encoding='UTF-8'>
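
Since the output above interleaves tracebacks from several processes, it can help to list only the final exception line of each traceback in the order it was printed. A rough sketch, assuming standard Python traceback formatting; `final_exceptions` is a hypothetical helper, and exceptions printed without a message (for example a bare `KeyboardInterrupt`) are not picked up:

```python
import re

# An exception line is unindented and looks like "pkg.mod.SomeError: message".
_EXC_LINE = re.compile(r"^([A-Za-z_][\w.]*): ?(.*)$")


def final_exceptions(log_text):
    """Return (exception_name, message) pairs, one per traceback, in log order."""
    results = []
    in_traceback = False
    for line in log_text.splitlines():
        if line.startswith("Traceback (most recent call last):"):
            in_traceback = True
            continue
        # Frame and source lines are indented; the first unindented line
        # that matches the pattern closes the current traceback.
        if in_traceback and line and not line[0].isspace():
            match = _EXC_LINE.match(line)
            if match:
                results.append((match.group(1), match.group(2)))
                in_traceback = False
    return results
```

Run on this log, the first entry would be the `pyarrow.lib.ArrowIOError: Encountered unexpected EOF` from the Plasma client, with the Redis connection errors appearing later. Because the processes write concurrently, the printed order is only a hint about which component failed first.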
@richardliaw
Contributor Author

Is this related to #2028? @robertnishihara

@richardliaw
Contributor Author

This error doesn't seem to appear anymore; closing for now.
