Add tracing in etcd server #11166

Closed

jingyih opened this issue Sep 18, 2019 · 5 comments

jingyih (Contributor) commented Sep 18, 2019

Today, the etcd server emits a generic warning if a request takes too long to be applied. If a range request takes over 100ms to finish, the server generates a log message like this:
W | etcdserver: read-only range request "key:\"foo\" " with result "range_response_count:0 size:5" took too long (1.310910984s) to execute

When this happens, it is usually not easy to track down the exact cause. There are multiple open-source issues about this in both the etcd and Kubernetes repos. I think the most interesting request type is range, because: A) it is usually served frequently, and B) a client can ask for a lot of keys in a single request, which often takes a very long time to finish.

When a range request takes too long, it would be good to know the time spent on each step of the request lifecycle (a rough sketch of the idea follows the list below). Example steps could be:

  • Get read index from raft leader
  • Wait until the server’s local backend store finishes applying the raft entry corresponding to the leader’s read index.
  • Check user’s permission on the requested keys
  • Get range result from mvcc store
    • Range keys in in-memory index tree
    • Range keys in underlying bolt db
  • Filter and sort key value pairs
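A minimal sketch of how such step-level tracing could be recorded is shown below. Everything here (the trace/step types, the step labels, the simulated delays) is illustrative only, not etcd's actual implementation:

```go
package main

import (
	"fmt"
	"time"
)

// step is one recorded phase of a request.
type step struct {
	name string
	took time.Duration
}

// trace is a hypothetical per-request tracer: it records how long each
// phase takes relative to the previous one. Illustration only.
type trace struct {
	op    string
	start time.Time
	last  time.Time
	steps []step
}

func newTrace(op string) *trace {
	now := time.Now()
	return &trace{op: op, start: now, last: now}
}

// Step records the time elapsed since the previous recorded step.
func (t *trace) Step(name string) {
	now := time.Now()
	t.steps = append(t.steps, step{name: name, took: now.Sub(t.last)})
	t.last = now
}

func main() {
	t := newTrace("range")

	// Simulated phases of a range request, mirroring the list above.
	time.Sleep(10 * time.Millisecond) // get read index from the raft leader
	t.Step("agreement among raft nodes before linearized reading")

	time.Sleep(5 * time.Millisecond) // permission check on the requested keys
	t.Step("authenticate and check permission")

	time.Sleep(20 * time.Millisecond) // in-memory index lookup + bolt db range
	t.Step("range keys from the mvcc store")

	time.Sleep(2 * time.Millisecond) // filter and sort key-value pairs
	t.Step("filter and sort key-value pairs")

	for _, s := range t.steps {
		fmt.Printf("trace[%s] step %q took %v\n", t.op, s.name, s.took)
	}
	fmt.Printf("trace[%s] total %v\n", t.op, time.Since(t.start))
}
```

Each Step call records the time elapsed since the previous one, so the log for a slow request would show exactly which phase dominated the total latency.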

At the same time, we need to make sure that (see the sketch after this list):

  • Tracing does not generate too many entries in the log.
  • Tracing does not noticeably degrade server performance.
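One way to satisfy both constraints is to collect the step timings cheaply in memory and only emit them when the whole request exceeded a threshold, e.g. the same 100ms used by the existing warning. A rough sketch, again with hypothetical names:

```go
package main

import (
	"fmt"
	"time"
)

// stepTiming is a (name, duration) pair collected during a request.
type stepTiming struct {
	name string
	took time.Duration
}

// logIfLong prints the collected step timings only when the whole request
// exceeded the threshold, so fast requests add nothing to the log and the
// steady-state cost is just recording a few timestamps in memory.
func logIfLong(op string, total, threshold time.Duration, steps []stepTiming) {
	if total < threshold {
		return // fast requests stay silent
	}
	fmt.Printf("trace[%s] took too long (%v); step breakdown:\n", op, total)
	for _, s := range steps {
		fmt.Printf("  %-55s %v\n", s.name, s.took)
	}
}

func main() {
	// Example timings for a slow range request (made-up numbers).
	steps := []stepTiming{
		{"agreement among raft nodes before linearized reading", 12 * time.Millisecond},
		{"range keys from the mvcc store", 950 * time.Millisecond},
		{"filter and sort key-value pairs", 3 * time.Millisecond},
	}
	logIfLong("range", 965*time.Millisecond, 100*time.Millisecond, steps)
}
```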
jingyih (Contributor, Author) commented Sep 18, 2019

I think @YoyinZyc has already started working on this feature.

jfbai commented Dec 4, 2019

@jingyih @YoyinZyc Hi, is this trace log enabled by default? I did not observe a trace log along with the "took too long" message.

My etcd server version is 3.4.2, upgraded from 3.3.10.

YoyinZyc (Contributor) commented Dec 4, 2019

> @jingyih @YoyinZyc Hi, is this trace log enabled by default? I did not observe a trace log along with the "took too long" message.
>
> My etcd server version is 3.4.2, upgraded from 3.3.10.

Could you provide more details about the request you made? Basically, tracing is enabled for range, put and compact requests. Btw, it works only when you enable --logger=zap, which is not the default currently.
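For what it's worth, when etcd is run embedded in a Go program rather than as a binary, the same logger selection should be available on the embed config; the sketch below assumes Config.Logger is the programmatic counterpart of the --logger flag (an assumption, not verified against a specific release):

```go
package main

import (
	"log"

	"go.etcd.io/etcd/embed" // import path used by the etcd 3.4 series
)

func main() {
	cfg := embed.NewConfig()
	cfg.Dir = "default.etcd"

	// Select the structured zap logger; request tracing for range/put/compact
	// is only emitted with this logger, mirroring the --logger=zap flag.
	// (Assumption: Config.Logger is the programmatic counterpart of --logger.)
	cfg.Logger = "zap"

	e, err := embed.StartEtcd(cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer e.Close()

	<-e.Server.ReadyNotify() // block until the server is serving
	log.Println("etcd is ready; slow range/put/compact requests will emit trace logs")
	select {} // keep the embedded server running
}
```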

jfbai commented Dec 5, 2019

> @jingyih @YoyinZyc Hi, is this trace log enabled by default? I did not observe a trace log along with the "took too long" message.
>
> My etcd server version is 3.4.2, upgraded from 3.3.10.

> Could you provide more details about the request you made? Basically, tracing is enabled for range, put and compact requests. Btw, it works only when you enable --logger=zap, which is not the default currently.

@YoyinZyc Got it, thanks a lot. :)

In my case, etcd is used as the Kubernetes (k8s) backend storage, so range and put operations are the ones we care about most.

stale bot commented Apr 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
