
fix: fix the cache max bytes buffering check #7181

Merged

Conversation

rodesai
Contributor

@rodesai rodesai commented Mar 9, 2021

This patch fixes our validation of cache.max.bytes.buffering. We validate based on the set of live queries in the engine context. However, we weren't populating this set for sandboxes. This change fixes that by introducing a sandbox type for transient query metadata and filling in the live queries set with the sandboxed query metadata.
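
For illustration only, here is a minimal sketch of the kind of check this validation performs: summing cache.max.bytes.buffering across all live queries against a configured ceiling. The class and method names below (`CacheLimitValidator`, `totalCacheBytesLimit`, etc.) are hypothetical, not the actual ksqlDB identifiers; the point is just that the check can only be accurate if sandboxed transient queries are also registered in the live-query set, which is what this change ensures.

```java
import java.util.Collection;

// Hypothetical sketch: validate that the combined Streams cache size of all
// live queries (including sandboxed/transient ones) stays under a global limit.
public final class CacheLimitValidator {

  /** Minimal stand-in for a query's metadata; not the real ksqlDB type. */
  public interface QueryMetadata {
    long cacheMaxBytesBuffering();
  }

  private final long totalCacheBytesLimit;

  public CacheLimitValidator(final long totalCacheBytesLimit) {
    this.totalCacheBytesLimit = totalCacheBytesLimit;
  }

  /**
   * Throws if adding a new query with the requested cache size would exceed
   * the limit. The answer is only correct if {@code liveQueries} really
   * contains every running query -- which is exactly what the fix ensures
   * for sandboxed transient queries.
   */
  public void validateNewQuery(
      final Collection<QueryMetadata> liveQueries,
      final long requestedCacheBytes
  ) {
    final long inUse = liveQueries.stream()
        .mapToLong(QueryMetadata::cacheMaxBytesBuffering)
        .sum();
    if (inUse + requestedCacheBytes > totalCacheBytesLimit) {
      throw new IllegalStateException(
          "cache.max.bytes.buffering would exceed the configured limit: "
              + (inUse + requestedCacheBytes) + " > " + totalCacheBytesLimit);
    }
  }
}
```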

@rodesai rodesai requested a review from a team as a code owner March 9, 2021 08:22
Contributor

@guozhangwang guozhangwang left a comment

LGTM!

Contributor

@ableegoldman ableegoldman left a comment

This is totally orthogonal, but it made me think: since Walker's work on adding/removing threads, we now have the ability to resize the cache in Streams. It's not possible through a public API at the moment, but it could be with a pretty trivial KIP. I wonder if this would be useful to ksqlDB? That way we could give the initial queries as much memory as we have available, then give some of that memory up as new ones come in. Resizing the cache should be non-disruptive -- it doesn't require a rebalance or anything; the only effect would be to suddenly flush some portion of the LRU records.

I'm just thinking out loud here, I actually don't know what the current model is. How do we handle distributing cache memory in ksqlDB at the moment?

@guozhangwang
Contributor

That way we could give the initial queries as much memory as we have available, then give some of that memory up as new ones come in.

Not sure I understand: are you saying we give some threads within a single instance more memory than others?

@ableegoldman
Contributor

No, I mean we use some new KafkaStreams#resizeCacheMaxBuffering to make sure all instances have equal memory. To take CCloud as an example, we have one thread per query and up to 20 queries. And let's say we want to make 20GB of memory available for the Streams cache (probably a huge overestimate, just looking for round numbers). Initially the user only runs one persistent query, and we give it all 20GB. A week later they add 4 more persistent queries: now that we have 5 queries in total, each should get 4GB for the cache. We would first call KafkaStreams.resizeCacheMaxBuffering(4GB) to resize the original query, then launch the 4 new queries with 4GB. Similarly, if some queries are terminated, we resize the cache to give more memory to the remaining queries.

That way we're always utilizing the full available memory for caching without needing to change cache.max.bytes.buffering and restart
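
To make the arithmetic above concrete, here is a minimal sketch of evenly redistributing a fixed cache budget as queries come and go. KafkaStreams#resizeCacheMaxBuffering is not a public API today (that's the point of the proposed KIP), so the resize call goes through a hypothetical ResizableQuery interface rather than the real client; all names are illustrative.

```java
import java.util.List;

// Hypothetical sketch of evenly redistributing a fixed cache budget across
// queries whenever one is added or removed. The resize hook stands in for the
// not-yet-public Kafka Streams cache-resizing API discussed above.
public final class CacheBudget {

  /** Stand-in for a running query whose Streams cache can be resized. */
  public interface ResizableQuery {
    void resizeCacheMaxBuffering(long bytes);
  }

  private final long totalCacheBytes;

  public CacheBudget(final long totalCacheBytes) {
    this.totalCacheBytes = totalCacheBytes;
  }

  /** Cache bytes each query gets once {@code queryCount} queries are running. */
  public long perQueryBytes(final int queryCount) {
    return totalCacheBytes / Math.max(1, queryCount);
  }

  /**
   * Shrink (or grow) the caches of the already-running queries so that the
   * new total count shares the budget evenly, then return the per-query size
   * to use when launching the new queries.
   */
  public long rebalance(final List<ResizableQuery> running, final int newTotalCount) {
    final long share = perQueryBytes(newTotalCount);
    running.forEach(q -> q.resizeCacheMaxBuffering(share));
    return share;
  }
}
```

With a 20GB budget, `rebalance(running, 5)` resizes the existing queries to 4GB each and returns 4GB as the cache.max.bytes.buffering for the new ones, matching the example above.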

@rodesai
Contributor Author

rodesai commented Mar 10, 2021

Thanks for the idea @ableegoldman. Today we don't do anything special, and I imagine in most cases people just use the Streams default, or 0 if they don't want buffering. The intent with this change is to let you set a limit, so that if you have users who do set a high buffer size we don't blow up. I think your suggestion makes sense, though I'm not sure what the ROI is (e.g. how much adding buffer helps throughput in the typical case, and whether it's worth adding complexity to maximize the size).

That said, could we do something similar with the suppress buffer size? That's definitely worth exploring, because we need a way to bound the suppress buffer size, but it's generally better to have a bigger buffer: if you run out, either the query fails or you lose suppress semantics. If we go that route, we should also have a way to fail if reducing the suppress buffer size would force eviction.

Not to go off on another tangent, but as an alternative to tuning per-query sizes for these things, have we looked at letting you use a shared cache and suppress buffer across multiple streams apps?
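
For reference, a minimal Kafka Streams sketch of the two existing ways a suppress buffer can be bounded, which is the trade-off described above: shutDownWhenFull() gives the "query fails" behaviour, while emitEarlyWhenFull() keeps running but gives up strict suppression semantics. This is illustrative only and not part of this change; the 64MB figure and the class name are arbitrary.

```java
import java.time.Duration;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.Suppressed.BufferConfig;
import org.apache.kafka.streams.kstream.Windowed;

// Illustrative only: bounding a suppress buffer with the existing
// Kafka Streams API. windowedCounts is assumed to come from a windowed
// aggregation elsewhere in the topology.
public final class SuppressBufferExamples {

  // Strict suppression: if the 64MB buffer fills up, the application shuts
  // down rather than emitting early -- the "query fails" case.
  public static KTable<Windowed<String>, Long> strictlyBounded(
      final KTable<Windowed<String>, Long> windowedCounts) {
    return windowedCounts.suppress(Suppressed.untilWindowCloses(
        BufferConfig.maxBytes(64L * 1024 * 1024).shutDownWhenFull()));
  }

  // Eager suppression: when the buffer fills up, records are emitted early,
  // so the query keeps running but strict suppression semantics are lost.
  public static KTable<Windowed<String>, Long> eagerlyBounded(
      final KTable<Windowed<String>, Long> windowedCounts) {
    return windowedCounts.suppress(Suppressed.untilTimeLimit(
        Duration.ofMinutes(5),
        BufferConfig.maxBytes(64L * 1024 * 1024).emitEarlyWhenFull()));
  }
}
```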

@rodesai rodesai merged commit f383800 into confluentinc:master Mar 10, 2021
@guozhangwang
Contributor

That way we're always utilizing the full available memory for caching without needing to change cache.max.bytes.buffering and restart

Not to go off on another tangent, but as an alternative to tuning per-query sizes for these things have we looked at letting you use a shared cache and suppress buffer across multiple streams apps?

I think sharing memory space across instances within a single JVM is a plausible direction. Maybe we can file a ticket for it? :)

@ableegoldman
Contributor

FYI I filed https://issues.apache.org/jira/browse/KAFKA-12649 for the dynamic cache resizing. I think the LOE on the cache resizing is practically zero since we already implemented it as part of the add/removeStreamThread stuff. We would just need to do a trivial KIP and then a small PR to leverage it in ksqlDB.

We haven't already implemented this for suppress, since that buffer works a bit differently (i.e. it's not a single config that then gets distributed across threads), but we could obviously do something similar there. That said, we probably want/need spill-to-disk for suppression anyway, so this in-memory buffer resizing would just be a nice optimization, maybe as future work.

@ableegoldman
Contributor

Lol, in the 6 minutes since I filed that AK ticket, someone from the community has already picked it up. Depending on how quickly they move, we could get this into the next cloud release with maybe a day of work on our end.
