Docker integration timeout #963

Closed
LotharSee opened this issue May 29, 2014 · 9 comments · Fixed by #988

@LotharSee
Contributor

Hi,

I am getting timeouts with the Docker check, which cause the check itself to fail.

It happened in two cases:

  • With a default Ubuntu 14.04 setup inside Vagrant, using the packaged version of Docker with 2 running containers. It timed out on the events endpoint.
  • Upgrading to Docker 0.11 made it work, but the bug reappeared once I was running 8+ containers. It then timed out on both the events and the cgroup metrics.
 Traceback (most recent call last):
  File "docker.py", line 292, in <module>
    check.check(instance)
  File "docker.py", line 142, in check
    containers = self._get_containers(instance)
  File "docker.py", line 216, in _get_containers
    return self._get_json("%(url)s/containers/json" % instance, params={"size": 1})
  File "docker.py", line 241, in _get_json
    request = urllib2.urlopen(req)
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "docker.py", line 105, in unix_open
    return self.do_open(UnixHTTPConnection(unix_socket), new_req)
  File "/usr/lib/python2.7/urllib2.py", line 1187, in do_open
    r = h.getresponse(buffering=True)
  File "/usr/lib/python2.7/httplib.py", line 1045, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 409, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
socket.timeout: timed out
@remh

remh commented Jun 6, 2014

@LotharSee This was fixed in 4.3.1, right?

@LotharSee
Contributor Author

Partially.
The timeout can occur both while getting events and while getting the container list.

I now catch the timeout on events (so the check can still go on to fetch containers and metrics), but a timeout on the container list is still raised (without that list, there is nothing else to do).

So this does not solve the real issue: the timeout itself.
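
For readers following along, the behavior described above can be sketched roughly as follows. This is only an illustration, not the Agent's actual code: `run_check`, `get_events` and `get_containers` are hypothetical names, and the two callables stand in for the real HTTP requests to the Docker socket.

```python
import socket


def run_check(get_events, get_containers):
    """Collect events and containers; only an events timeout is tolerated.

    Both arguments are hypothetical zero-argument callables that may raise
    socket.timeout when the Docker daemon is too slow to answer.
    """
    try:
        events = get_events()
    except socket.timeout:
        # Events are optional: continue so that metrics can still be collected.
        events = []
    # No try/except here: without the container list there is nothing to
    # report, so a timeout propagates and the check fails.
    containers = get_containers()
    return events, containers


def slow_events():
    # Stand-in for an events request that times out.
    raise socket.timeout("timed out")


print(run_check(slow_events, lambda: [{"Id": "abc123"}]))
```

Under these assumptions a slow events endpoint no longer breaks the check, but a slow container list still does, which is why the underlying timeout remains the real problem.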

@remh remh added this to the 5.1.0 milestone Jun 12, 2014
@alq666
Member

alq666 commented Jun 18, 2014

As discussed, 2 things to check with a lot of containers:

  1. Run the collector in the foreground with a patched socket.py (drop into the debugger when you get a timeout).
  2. See why the opener has to be reinstalled each time it runs.

@alq666 alq666 added the checks label Jun 18, 2014
@LotharSee
Contributor Author

The cause of the timeout is described here: moby/moby#6546

I will disable the size flag (by default or through a configuration parameter). As a result, we will stop reporting docker.disk.size.
That is better than no metrics at all.

@remh

remh commented Jun 19, 2014

Can we disable the flag only when we get a timeout?

@LotharSee
Contributor Author

Done: we now automatically fall back to the query without the size if the first request times out.
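
A minimal sketch of such a fallback, for illustration only. It assumes a hypothetical `get_json(url, params)` helper that raises `socket.timeout`, loosely modeled on the `_get_json` call in the traceback above; the function names and the returned flag are assumptions, not the code that was merged.

```python
import socket


def get_containers(get_json, base_url):
    """Return (containers, with_size); retry without sizes after a timeout.

    `get_json` is a hypothetical callable (url, params) -> parsed JSON that
    raises socket.timeout when the Docker daemon takes too long to answer.
    """
    url = "%s/containers/json" % base_url
    try:
        # First attempt: same query as in the traceback, sizes included.
        return get_json(url, {"size": 1}), True
    except socket.timeout:
        # Fallback: same query without sizes, so docker.disk.size
        # cannot be reported for this run.
        return get_json(url, {}), False


def fake_get_json(url, params):
    # Stand-in for the real HTTP call: times out only when sizes are requested.
    if params.get("size"):
        raise socket.timeout("timed out")
    return [{"Id": "abc123"}]


containers, with_size = get_containers(fake_get_json, "http://localhost")
print(containers, with_size)
```

Returning the `with_size` flag lets the caller skip the disk size metric for that run instead of reporting a misleading value.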

@remh

remh commented Jun 19, 2014

👍

@MarkTroutfetter

Any update on when this fix will be released? I'm very excited to use this feature, but it fails with the Amazon Linux AMI + Docker integration.

@LotharSee LotharSee modified the milestones: 5.1.0, 4.x Aug 21, 2014
@LotharSee
Contributor Author

@MarkTroutfetter This was released with Agent 4.4. The GitHub milestone was wrong.
So it is already fixed!

But I also suggest going directly to Agent 5.0 (already released as a Docker container, and coming in a few days for everyone else), as it greatly improves the Docker integration.
