Add draft of throttled copy #258
base: master
Conversation
Having this feature is a good idea; it could be quite useful. But I think this could be implemented a bit more generically. How about:
It'd be nice if we could make that more generic than just the copy function and expand it to all, though it might be hard to specify this on the command line since there could be several stores involved in a single command. Perhaps we could just expect this to be defined in the store config file (https://github.com/folbricht/desync?tab=readme-ov-file#configuration). Thoughts?
No problem. I'll obviously defer to your expertise in the matter. Edit: Are you good with me limiting this to the
You don't need to change the
Hope that helps, let me know if not and I can whip up a template on a branch which you can then complete.
ratelimitstore.go (Outdated)
ctx, cancel := context.WithTimeout(context.Background(), s.options.timeout)
defer cancel()

if s.options.immediateOrFail {
Do you actually need immediateOrFail? Seems unexpected to cause a failure because the rate-limit was exceeded.
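For reference, the difference comes down to something like the following sketch (assuming s.limiter is a golang.org/x/time/rate Limiter and RateLimitedExceeded is the error value from this draft; not code from the branch):

// immediateOrFail: give up right away if no token is currently available.
if !s.limiter.Allow() {
    return RateLimitedExceeded
}

// Alternative: block until a token is available (or ctx is cancelled).
if err := s.limiter.Wait(ctx); err != nil {
    return err
}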
I could go either way. I guess I'd lean towards removal.
Related:
I'm not sure what the overall failure handling should be. I guess a client normally wants to wait, but I don't like naked waits with no ability to time out. Perhaps the timeout should default to a large value.
We could have two paths, one where we use a context with a timeout and one without. But that's another code path.
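Purely as a sketch, those two paths would look roughly like this (not code from the branch):

// Path A: bounded wait using a context with a timeout.
ctx, cancel := context.WithTimeout(context.Background(), s.options.timeout)
defer cancel()
if err := s.limiter.Wait(ctx); err != nil {
    return err
}

// Path B: unbounded wait, only interrupted if the limiter itself fails.
if err := s.limiter.Wait(context.Background()); err != nil {
    return err
}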
It'd certainly be nice to have a context on all these functions, but that's a major refactoring of the library that should be done in a separate PR.
The wait really depends on the rate-limit here, and I don't think it should hang forever. Adding a timeout seems overly complex and I'd remove that. It might be possible to calculate some upper limit based on the rate-limit used and the max chunk size.
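One rough way to derive such an upper bound, purely as a sketch (bytesPerSecond and maxChunkSize are illustrative values, not existing options):

// In the worst case a single max-size chunk needs maxChunkSize/bytesPerSecond
// seconds worth of tokens; pad that with a safety factor.
secs := float64(maxChunkSize) / bytesPerSecond
maxWait := time.Duration(secs * float64(time.Second))
ctx, cancel := context.WithTimeout(context.Background(), 2*maxWait)
defer cancel()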
To confirm my understanding:
- Remove the timeout from the context.
- Allow the ops to wait indefinitely, with no timeout.
I'm ultimately only concerned with throttling cache operations against my local store, so I think that's fine for me. However, I haven't looked at the serving code much; is there a chance that this affects HTTP handling?
Correct, no custom timeout. This can be implemented some day when contexts are added to all store operations.
This shouldn't affect HTTP stores at all, since we're not really limiting the data rate, but the speed at which whole chunks can be read/stored. So if the rate-limiter allows one chunk to be accessed, that chunk will be transferred at max speed, which means the HTTP timeouts apply as normal. Chunk downloading is limited to the rate-limit on average, not for individual chunks.
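In other words, the limiter only gates admission of a chunk; a sketch of a configuration that behaves this way (names are illustrative, not from the PR):

// Average rate in bytes per second, with a burst large enough that a whole
// max-size chunk can be admitted at once and then transfer at full speed.
limiter := rate.NewLimiter(rate.Limit(bytesPerSecond), maxChunkSize)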
I added a test with the HTTPHandler. There is a way to reproduce an error:
goroutine 125 [select]:
golang.org/x/time/rate.(*Limiter).wait(0xc0001df360, {0xe27338, 0x13ceb80}, 0x1, {0x0?, 0x0?, 0x139e0c0?}, 0xd56f30)
/home/bstrausser/go/pkg/mod/golang.org/x/time@v0.5.0/rate/rate.go:285 +0x3e5
golang.org/x/time/rate.(*Limiter).WaitN(0x79e845e27a40b76?, {0xe27338, 0x13ceb80}, 0x269063b6f56504d5?)
/home/bstrausser/go/pkg/mod/golang.org/x/time@v0.5.0/rate/rate.go:248 +0x50
github.com/folbricht/desync.RateLimitedStore.HasChunk({{0xe27dc0, 0xc0003f2b00}, 0xc0001df360, {0x3fc999999999999a, 0x1}}, {0x76, 0xb, 0xa4, 0x27, 0x5e, ...})
/home/bstrausser/Git/desync/ratelimitstore.go:59 +0x7e
github.com/folbricht/desync.HTTPHandler.head({{{0xceccfb, 0x5}, 0x1, {0x0, 0x0}}, {0xe273e0, 0xc00040cd50}, 0x0, {0xc0003f0990, 0x1, ...}, ...}, ...)
/home/bstrausser/Git/desync/httphandler.go:77 +0x4e
github.com/folbricht/desync.HTTPHandler.ServeHTTP({{{0xceccfb, 0x5}, 0x1, {0x0, 0x0}}, {0xe273e0, 0xc00040cd50}, 0x0, {0xc0003f0990, 0x1, ...}, ...}, ...)
/home/bstrausser/Git/desync/httphandler.go:47 +0x266
net/http.serverHandler.ServeHTTP({0xc001bda720?}, {0xe25808?, 0xc00061c2a0?}, 0x6?)
/usr/local/go/src/net/http/server.go:2938 +0x8e
net/http.(*conn).serve(0xc0006a0c60, {0xe275d8, 0xc00040d1a0})
/usr/local/go/src/net/http/server.go:2009 +0x5f4
created by net/http.(*Server).Serve in goroutine 9
/usr/local/go/src/net/http/server.go:3086 +0x5cb
That's just part of the stacktrace. Is there a repro on your branch? What test can I run to see it?
When I went back to look, I couldn't repro.
ratelimitstore.go (Outdated)

func (s RateLimitedLocalStore) GetChunk(id ChunkID) (*Chunk, error) {

    return s.wrappedStore.GetChunk(id)
If you wanted to limit reads as well (and I think we should), it would make sense to first call GetChunk() on the underlying store, then wait after we have the data. That isn't super-precise in terms of limiting since there could be bursts of maxChunkSize * n at startup, but over time it should even out.
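A sketch of that ordering, assuming the fields used elsewhere in this PR (wrappedStore, limiter) and desync's Chunk.Data(); details may differ from the branch:

func (s RateLimitedStore) GetChunk(id ChunkID) (*Chunk, error) {
    // Fetch first, then charge the limiter for the actual size of the data.
    chunk, err := s.wrappedStore.GetChunk(id)
    if err != nil {
        return nil, err
    }
    b, err := chunk.Data()
    if err != nil {
        return nil, err
    }
    // No custom timeout, per the discussion above.
    if err := s.limiter.WaitN(context.Background(), len(b)); err != nil {
        return nil, err
    }
    return chunk, nil
}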
Agreed. I'll invert the order and add in support for reads.
ratelimitstore.go (Outdated)

    immediateOrFail bool
}

type RateLimitedLocalStore struct {
Why the Local in the name? This could limit any kind of store.
Correct. I'll fix. I think I was originally wrapping a local store only.
Add throttle on reads.

func (s RateLimitedStore) StoreChunk(chunk *Chunk) error {

    // This isn't ideal because what I'm really interested in is the size over the wire.
I think you can actually achieve this by using a limit of "bytes per second" (must be equal to or larger than the max chunk size). Then here in StoreChunk() you can:
b, err := chunk.Data()
if err != nil {
    return err
}
err = s.limiter.WaitN(ctx, len(b))
if err != nil {
    return RateLimitedExceeded
}
return s.wrappedStore.StoreChunk(chunk)
It's a little more interesting in GetChunk() since you don't know how large the chunk is. One way would be to WaitN(ctx, <avg-chunk-size>) before calling the wrapped store, or WaitN(ctx, <size of data>) after actually pulling the chunk from the wrapped store. The latter can cause spikes early on when concurrency is used, but should average out to the rate you set over time.
As for HasChunk(), I'm not sure you really need to limit it, it's very small. You can just use 1 like you have now if you want to limit it too; you could do that before calling the wrappedStore.
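Putting that together, a rough sketch of those two methods (avgChunkSize is an assumed constant, and HasChunk returning (bool, error) is assumed from desync's Store interface; not code from the branch):

// Charge a nominal cost of 1 for existence checks.
func (s RateLimitedStore) HasChunk(id ChunkID) (bool, error) {
    if err := s.limiter.WaitN(context.Background(), 1); err != nil {
        return false, err
    }
    return s.wrappedStore.HasChunk(id)
}

// Read variant that charges an average chunk's worth of tokens up front.
func (s RateLimitedStore) GetChunk(id ChunkID) (*Chunk, error) {
    if err := s.limiter.WaitN(context.Background(), avgChunkSize); err != nil {
        return nil, err
    }
    return s.wrappedStore.GetChunk(id)
}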
Adds the ability to do throttled copy operations related to filling the cache.
Problem
Basically I'm trying to replicate what curl offers - https://everything.curl.dev/usingcurl/transfers/rate-limiting
We had discussed before doing this on the store/server side. I'm not sure that works for our case.
- We have many devices, 500-1500 per facility (warehouse).
- Each device is generally doing work, so we need to dribble in updates in the background.
- Devices are wifi-aware, so if they are in a better part of the facility they can stream faster.
- The store wouldn't have knowledge of the IP-to-device mapping. The store wouldn't know if the device is busy or in a good wifi area.
Approach
I think I took a pretty standard approach for rate-limiting workers. It would have been better if I had been able to get the size of the chunks. Right now, waiting a fixed period essentially just limits the number of chunks per second, which gets you roughly (average chunk size) bytes per second. At least this way there is no coordination needed between workers to sum up bytes.
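For what it's worth, that per-chunk throttle amounts to something like the following sketch (chunksPerSecond, chunkIDs, and copyChunk are illustrative; this is not the code in the PR):

// All copy workers draw from the same token bucket, so no byte counting or
// coordination between workers is needed.
limiter := rate.NewLimiter(rate.Limit(chunksPerSecond), 1)

for _, id := range chunkIDs {
    // One token per chunk: throughput averages out to roughly
    // chunksPerSecond * averageChunkSize bytes per second.
    if err := limiter.Wait(ctx); err != nil {
        return err
    }
    if err := copyChunk(id); err != nil {
        return err
    }
}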