
Mem efficiency, make full use of client struct memory for reply buffers #8968

Merged
merged 6 commits into unstable from client-buf Jun 8, 2021

Conversation


@ShooterIT ShooterIT commented May 20, 2021

When we allocate a client struct with a 16k reply buffer, the allocator may actually give us 20k.
This commit makes use of that extra space.
Additionally, it tries to store whatever it can from the reply into the static 'buf' before
allocating a new node for the reply list.

Suggested by @oranagra in #8966 (review)

Similar changes (can all be mentioned in the release notes together): #8975, #8966
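In rough terms, the first part of the change amounts to the sketch below. This is hedged: zmalloc_usable_size() is the real zmalloc helper that reports how many bytes the allocator actually returned, but the field name buf_usable_size and its exact placement follow the discussion in this thread rather than a verified copy of the merged code.

```c
/* Hedged sketch, assuming Redis internals (client struct, zmalloc,
 * PROTO_REPLY_CHUNK_BYTES). buf_usable_size is the new per-client field
 * the discussion below refers to. */
client *createClient(connection *conn) {
    client *c = zmalloc(sizeof(client)); /* allocator may round 16k of 'buf' up to 20k */

    /* Treat everything from 'buf' to the end of the real allocation as
     * reply-buffer space, instead of the fixed PROTO_REPLY_CHUNK_BYTES. */
    c->buf_usable_size = zmalloc_usable_size(c) - offsetof(client, buf);

    /* ... the rest of the usual field initialization ... */
    return c;
}
```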

@oranagra

so there are 3 changes here:

  1. make use of the internal fragmentation space of the client struct.
    i'm not certain the approach you took is the cleanest one. i imagined using an empty array [] in the struct and a specially crafted call to malloc (like we do for clientReplyBlock). i think your approach may actually be cleaner, since other places that allocate a client struct without adding anything to the malloc will still have something. but on the other hand, they don't currently init the new size variable you added, which may cause issues (i guess it's dead code currently).
    maybe it's time to finally unify all the places that allocate a client struct and init all its fields into a common function.
  2. because of that change, there's a chance not all client struct buffers will be the same size, and copyClientOutputBuffer becomes more complicated; however, the new code there is likely to always be dead code, so it's untested.
    maybe we can change it in a way that more of it will be tested? like write what we can into the initial static buf, and / or somehow re-use existing code in order to add a new reply list node (see the sketch after this list)?
  3. this is actually unrelated to the change in this PR (right?): when attempting to write to the static client buffer, we write what we can if there's room for something, instead of just giving up and putting the whole thing in the reply list.
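In code terms, the "re-use existing code" idea from point 2 could look roughly like this. It is a sketch only, not the merged diff: addReplyProto, listEmpty, listRewind and clientReplyBlock are real Redis internals, but replaying the list through the append path this way is an illustration of the idea.

```c
/* Instead of a raw memcpy between static buffers that may now differ in
 * size, replay src's pending output through dst's normal append path,
 * which already spills into the reply list when dst->buf runs out of
 * room, so the copy exercises tested code. */
void copyClientOutputBuffer(client *dst, client *src) {
    listEmpty(dst->reply);   /* drop whatever dst had pending */
    dst->bufpos = 0;
    dst->sentlen = 0;
    dst->reply_bytes = 0;

    /* src's stream order is: static buf first, then the reply list. */
    addReplyProto(dst, src->buf, src->bufpos);

    listIter li;
    listNode *ln;
    listRewind(src->reply, &li);
    while ((ln = listNext(&li)) != NULL) {
        clientReplyBlock *blk = listNodeValue(ln);
        addReplyProto(dst, blk->buf, blk->used);
    }
}
```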


@yoav-steinberg yoav-steinberg left a comment

LGTM, two minor comments.

yoav-steinberg previously approved these changes May 20, 2021

@yoav-steinberg yoav-steinberg left a comment

Added some more minor thoughts.

@ShooterIT ShooterIT commented May 21, 2021

Hi @oranagra

  1. make use of the internal fragmentation space of the client struct.

I am not sure I understand you. Firstly, AFAIK, the static buf is necessary because we don't want to frequently allocate/free memory to reply to clients.

Regarding your point 2, I refactored it to make it easier to test.

3. this is actually unrelated to the change in this PR (right?)

I think it also belongs to this PR, because previously we did not use buf when it had room but couldn't store one entire bulk. In the Redis protocol we have many *x\r\n and $x\r\n headers, so always putting the reply into buf when possible gives us a chance to save one clientReplyBlock. For example, suppose there is 9k of space left in buf and we need to reply with multiple bulks, one of which is a 20k bulk reply (one jemalloc size class :)). Before this commit we might create two clientReplyBlocks: first we put *x\r\n into buf, then we create a big clientReplyBlock to store the 20k bulk, and then we still need to create a 16k block to store \r\n or the other small bulks. But now we put 9k of the 20k bulk reply into buf, and then create only one clientReplyBlock to store the rest.
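The behavior described here boils down to something like the following hedged sketch. The real function is _addReplyToBuffer() in networking.c; the bytes-consumed return convention reflects a reading of the change discussed in this thread rather than a verified copy of the merged code, and buf_usable_size is the new field from the PR.

```c
/* Copy as much of the reply as fits into the static buffer and report
 * how much was consumed; the caller sends the rest to the reply list. */
size_t _addReplyToBuffer(client *c, const char *s, size_t len) {
    size_t available = c->buf_usable_size - c->bufpos;

    /* Replies are dropped for clients marked to be closed anyway. */
    if (c->flags & CLIENT_CLOSE_AFTER_REPLY) return len;

    /* Once something is in the reply list, keep appending there so the
     * output stream stays in order. */
    if (listLength(c->reply) > 0) return 0;

    /* New behavior: copy the prefix that fits instead of giving up when
     * the whole reply doesn't fit. */
    size_t reply_len = len > available ? available : len;
    memcpy(c->buf + c->bufpos, s, reply_len);
    c->bufpos += reply_len;
    return reply_len; /* caller handles the remaining len - reply_len bytes */
}
```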

@yoav-steinberg

I think it also belongs to this PR, because previously we did not use buf when it had room but couldn't store one entire bulk. [...]

@ShooterIT now I see what you mean. buf is almost always used for the protocol headers ($, *, etc.), so two system calls are the norm... I guess, in this case, the optimization is worth it.

@oranagra

regarding the 1,2,3 in my previous comment:

  1. i didn't mean we want to eliminate the static buffer... it is very much needed in order to avoid mallocs and frees for small replies, and also to remain more cache friendly. i'll discuss that topic below.
  2. change copyClientOutputBuffer so that it uses existing code and doesn't have a big pile of dead code. i see you found an elegant solution.
  3. i thought this may be a topic for a different PR, but it is somewhat related, so we can keep it here.

what i meant was that instead of declaring the last member of the client struct as char buf[PROTO_REPLY_CHUNK_BYTES];, we could declare it as char buf[];, and then when we malloc the client struct, we do zmalloc(sizeof(client)+PROTO_REPLY_CHUNK_BYTES).
the result is similar: accessing c->buf[offset] computes an offset from the client struct (there's no pointer dereference).

the implication of the change i suggested is that you can't allocate client structs on the stack, and that anywhere that allocates one has to be changed.
but in some way the approach you took also needs a change in every place that allocates a client struct, since you only initialized your new field in one allocation.
bottom line, i think we can keep your approach, but we do need to init the new field in the other places, or use this opportunity to refactor them all to use the same createClient.
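Concretely, the char buf[] variant described above would look something like this. It is a sketch of the alternative that was not merged; createClientAlt is a hypothetical name, and buf_usable_size mirrors the field this PR adds.

```c
/* Sketch of the flexible-array-member alternative (not what was merged):
 * size 'buf' at allocation time, the same trick clientReplyBlock uses. */
typedef struct client {
    /* ... every other field stays as-is ... */
    size_t buf_usable_size; /* bytes actually available after 'buf' */
    char buf[];             /* flexible array member, sized by zmalloc below */
} client;

client *createClientAlt(void) { /* hypothetical allocation site */
    client *c = zmalloc(sizeof(client) + PROTO_REPLY_CHUNK_BYTES);
    c->buf_usable_size = zmalloc_usable_size(c) - sizeof(client);
    return c;
}
/* Trade-off, as noted above: such a client can no longer live on the
 * stack, and every allocation site must use this pattern. */
```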

@oranagra oranagra added the state:to-be-merged and release-notes labels May 27, 2021
@yoav-steinberg yoav-steinberg left a comment

LGTM.

@oranagra oranagra changed the title Make full use of 'buf' of client struct Mem efficiency, make full use of client struct memory for reply buffers Jun 8, 2021
@oranagra oranagra merged commit c396fd9 into redis:unstable Jun 8, 2021
@ShooterIT ShooterIT deleted the client-buf branch June 8, 2021 13:24
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 8, 2021
Mem efficiency, make full use of client struct memory for reply buffers (redis#8968)
