Switch digest to cuckoo filters, to enable O(1) removal #413
Conversation
Hey Yoav, Thanks; will take a look. Two immediate things:
Hopefully fixed. Is there a way to test it locally?
Apologies for the noobness. Removed myself.
See SUBMITTING.md for build info.
@yoavweiss Thank you for working on the proposal.
Oh, I now understand the intent of removing the flag. The motive of the proposal is to build a digest without referring to every response object stored in the cache. This means it would not be easy for the client to determine the freshness of the entries that are going to be included in the digest. I am sympathetic to the idea, but I am not sure the approach works well with the current mechanism of HTTP/2 caching. My understanding is that browsers that exist today only consume a pushed response when they fail to find a freshly cached response in their cache. Otherwise, the pushed response never lands in the browser cache. Unless we change the behavior of the browsers to respect the pushed response even if a freshly cached object already exists in the cache, there is a chance that servers would continually push responses that get ignored by the client (due to the existence of a freshly cached response in the browser cache with the same URL). @yoavweiss Assuming that I correctly understand the motive of removing the distinction between a fresh digest and a stale digest, I would appreciate it if you could clarify your ideas on the problem.
Thanks for reviewing, @kazuho! :) My intent was to include all stored resources in the digest, regardless of whether they are stale or fresh. Entries are added to the digest when a resource is added to the cache, and removed from the digest when a resource is removed. The reason is that I think the distinction doesn't make much sense, and maintaining it adds a lot of complexity, basically forcing browsers to recreate the digest for every connection at O(N) cost. Under this premise, what servers should do is:
Does that make sense? I'm not sure I understand your reference to the push cache vs. the HTTP cache in your comment. In light of my explanation, is there still an issue there in your view?
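A minimal sketch of the incremental maintenance described above, assuming a hypothetical `CuckooFilter` type with O(1) insert/remove; the names are illustrative, not taken from the draft or any implementation:

```ts
// Hypothetical per-origin digest kept up to date as the cache changes,
// so no O(N) rebuild is needed when a connection is opened.
interface CuckooFilter {
  insert(key: string): boolean; // returns false if the table is full
  remove(key: string): boolean;
}

const digests = new Map<string, CuckooFilter>(); // origin -> per-host digest

function onCacheStore(origin: string, key: string, makeFilter: () => CuckooFilter): void {
  let filter = digests.get(origin);
  if (!filter) {
    filter = makeFilter();
    digests.set(origin, filter);
  }
  filter.insert(key);
}

function onCacheEvict(origin: string, key: string): void {
  digests.get(origin)?.remove(key);
}
```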
https://github.com/efficient/cuckoofilter is the reference implementation.
Happy to change it. Do you have any specific changes in mind?
Thank you for the explanation. I now understand the intent better. I think we need to consider two issues regarding the approach. The first is the fact that a browser cache may contain more stale responses than fresh ones. Below are the numbers of cached objects found in my Firefox cache (to be honest, the data is from 2016; I haven't been using Firefox in recent weeks and therefore cannot provide up-to-date numbers).
As you can see, large-scale websites tend to have more stale objects than fresh objects. In other words, including information about stale-cached objects roughly triples the size of the digest in this case. Since performance-sensitive resources (the ones we need to push) are likely to be stored fresh (they are the most likely ones to be marked as immutable, or near-immutable), transmitting only the digest of freshly-cached responses makes sense. The second is a configuration issue on the server side. One strategy that can be employed by an H2 server (under the current draft) is to receive a digest of freshly cached resources only, compare the digest against the list of resources the browser should preload using only the URL, and push the missing resources to the client. An H2 server can perform the comparison without actually fetching the resource (from origin or from cache), since only the URL is required for calculating the digest. The proposal prevents such a strategy from being deployed, since it requires the ETag values to always be taken into consideration (should they be associated with the HTTP responses). In other words, servers would be required to load the response headers of the resources to determine whether they need to be pushed, which could be a huge performance degradation on some deployments. Fortunately, servers could avoid the issue by not including ETags for resources they may push. I think such a change in server-side configuration would be possible, but we need to be sure of that if we are to take this path (of removing the fresh vs. stale distinction).
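A rough sketch of the server-side strategy described above (all names hypothetical): the server only needs the URL of each candidate resource to test it against the client's digest, so no response headers have to be loaded.

```ts
// `digestContains` stands in for whatever membership test the server runs
// against the client-supplied digest; it is keyed by URL only.
function resourcesToPush(
  preloadList: string[],
  digestContains: (url: string) => boolean
): string[] {
  return preloadList.filter((url) => !digestContains(url));
}
```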
Let me explain using an example. Consider the following case:
When receiving a new request from the client, the server cannot determine if the client has style.css in its cache. Therefore, it pushes style.css. The client, observing that it already has a freshly cached copy, ignores the pushed response. This would be repeated every time, until the cached object either becomes stale or gets removed from the cache. My understanding is that the browser behavior (explained in *) is true for Firefox and also for Chrome. Am I wrong, or missing something?
Thank you for the link. I will try to use it. OTOH, do you have some working code that can actually calculate the cache-digest value, taking a list of URLs as input (something like https://github.com/h2o/cache-digest.js)? I ask because it would give us a better sense of what the actual size of the digest would be.
One way to proceed would be to split the discussion of
@yoavweiss Have you considered the approach of using the Cuckoo filter to generate a GCS? I understand that you do not want to iterate through the browser cache when sending a cache digest, and a per-host Cuckoo filter seems like a good solution to that issue. OTOH, as I described in my previous comment, it seems that sending the values of the filter directly has several issues. That is why I am wondering if it would be viable to generate a GCS from the per-host Cuckoo filter that would be maintained within the browser. I can see three benefits in the approach, compared to sending the values of the Cuckoo filter directly:
In the case of ..., the biggest cost of calculating a GCS from the Cuckoo filter would be the sort operation. But I think that cost would be negligible compared to the ECDH operation we would be doing for every connection, considering that the number of entries we would need to sort would be small (e.g., up to 1,000 entries of uint32_t), and that sort algorithms faster than O(n log n) can be deployed (e.g., radix sort). WDYT?
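A sketch of the suggestion, assuming the per-host Cuckoo filter can enumerate its stored fingerprints: sort them and Golomb/Rice-encode the deltas. This follows the general GCS construction with a power-of-two parameter; it is not taken from the draft text.

```ts
// Sort the fingerprints and encode the gaps with a Rice code of parameter 2**p.
// The result is returned as a bit string purely for readability.
function encodeGCS(fingerprints: number[], p: number): string {
  const sorted = [...fingerprints].sort((a, b) => a - b);
  let bits = "";
  let prev = 0;
  for (const value of sorted) {
    const delta = value - prev;
    prev = value;
    const q = Math.floor(delta / 2 ** p); // quotient, unary-coded
    const r = delta % 2 ** p;             // p-bit remainder
    bits += "1".repeat(q) + "0" + r.toString(2).padStart(p, "0");
  }
  return bits;
}
```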
So have a cuckoo filter digest and then put its fingerprints in a GCS? I have not considered that. Need to give it some thought... At the same time, it's not clear to me how that would enable "stale" vs. "fresh" digests, or the handling of improperly cached resources (fresh resources that were replaced on the server).
I would appreciate it if you could consider it. To me it seems worth a thought.
Under the approach proposed in this PR, the structure that stores the per-host digest would look like the following.
What I am suggesting is that you could change the structure to the following.
In addition to the hash value, each entry will contain the moment when the entry becomes stale. That moment can be calculated when the entry is added. For example, if the entry represents an HTTP response with a ... When building a GCS digest, you would do the following:
You can skip the operations related to stale objects (i.e., steps 2, 3-3, and 5) if the server is unwilling to receive stale digests. Whether the approach can be implemented depends on whether a client can determine the moment a response becomes stale. I anticipate that it is possible to determine that when you register the entry in the Cuckoo filter (which is when you receive the response from the server).
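A sketch of the suggested structure, with hypothetical field names: each filter entry carries the moment it becomes stale, recorded when the response is inserted, so a fresh-only (or fresh-plus-stale) set of values can be pulled out when a digest is built.

```ts
interface DigestEntry {
  hash: number;    // value stored in the per-host Cuckoo filter
  staleAt: number; // epoch milliseconds at which the cached response becomes stale
}

// Pick the values to feed into the GCS encoder, optionally skipping stale
// entries when the server does not want a stale digest.
function valuesForDigest(entries: DigestEntry[], includeStale: boolean): number[] {
  const now = Date.now();
  return entries
    .filter((e) => includeStale || e.staleAt > now)
    .map((e) => e.hash);
}
```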
@yoavweiss @kazuho I'm planning to attend the IETF 100 hackathon this weekend in Singapore. (First timer here. 🤗🔰) I'm happy to collaborate on a (Node.js?) implementation of this spec if either of you are around and interested. I'm fairly familiar with the current spec, having implemented it as a service worker and on the server.
Wonderful! I'll be attending the hackathon on both days (i.e., Saturday and Sunday). I don't think I'll have time to work on Cache Digests myself, but I would love to discuss your work on Cache Digests with you (or help, if you need it).
Feedback based on WIP implementation of cuckoo filters for cache digest: https://gitlab.com/http2/cache-digest-koel
draft-ietf-httpbis-cache-digest.md (Outdated)
7. Let `h2` be the return value of {{hash}} with `fingerprint` and `N` as inputs, XORed with `h1`.
8. Let `h` be `h1`.
9. Let `position_start` be 40 + `h` * `f`.
10. Let `position_end` be `position_start` + `f` * `b`.
`b` is not defined.
draft-ietf-httpbis-cache-digest.md (Outdated)
4. While `fingerprint-value` is 0 and `h` > `f`:
4.1. Let `fingerprint-value` be the `f` least significant bits of `hash-value`.
4.2. Let `hash-value` be the the `h`-`f` most significant bits of `hash-value`.
4.3. `h` -= `f`
This code feels inconsistent with the writing style used throughout. Would suggest: Subtract `f` from `h`.
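For reference, a sketch of what the quoted steps 4–4.3 compute (assuming a hash value that fits in a regular JS number and an `f` well below 31 bits): repeatedly take the low `f` bits as the fingerprint, discarding them from the hash value, until a non-zero fingerprint is found or the hash is exhausted.

```ts
// Sketch only; the exact bit widths are whatever the draft prescribes.
function extractFingerprint(hashValue: number, h: number, f: number): number {
  let fingerprint = 0;
  while (fingerprint === 0 && h > f) {
    fingerprint = hashValue % 2 ** f;           // the f least significant bits
    hashValue = Math.floor(hashValue / 2 ** f); // keep the h - f most significant bits
    h -= f;
  }
  return fingerprint;
}
```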
draft-ietf-httpbis-cache-digest.md (Outdated)
`hash-value` can be computed using the following algorithm:

1. Let `hash-value` be the SHA-256 message digest {{RFC6234}} of `key`, expressed as an integer.
2. Return `hash-value` modulo N.
This is difficult to do in JavaScript where uint operations are typically still limited to 32 bits. The truncation in the previous proposal (step 4) is more compatible and, if I understand correctly, achieves the same objective. Can this be changed to something that does not require 256 bit integer modulo?
I wonder if the need for an integer modulo is due to an error in the specification.
The text in the PR states that `N` is a prime number smaller than 2\*\*32. Could it be the case that `N` is something to be defined as 2\*\*n (i.e., a power of two)? If that is the case, the modulo operation can be implemented by using a bitwise AND.
OK, I'll truncate the hash before the modulo operation.
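A sketch of the resolution discussed here, assuming truncation to the first 32 bits of the digest (the exact truncation width is whatever the revised draft ends up specifying):

```ts
import { createHash } from "crypto";

// Truncate the SHA-256 digest before the modulo so plain JS number arithmetic
// suffices (N is smaller than 2**32).
function hashKey(key: string, N: number): number {
  const digest = createHash("sha256").update(key, "utf8").digest();
  const truncated = digest.readUInt32BE(0); // first 32 bits of the digest
  return truncated % N;
}
```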
draft-ietf-httpbis-cache-digest.md (Outdated)
7. Let `h2` be the return value of {{hash}} with `fingerprint` and `N` as inputs, XORed with `h1`.
8. Let `h` be `h1`.
9. Let `position_start` be 40 + `h` * `f`.
10. Let `position_end` be `position_start` + `f` * `b`.
`b` is not defined.
draft-ietf-httpbis-cache-digest.md (Outdated)
* `ETag`, an array of characters
* `validators`, a boolean
* `URL` a string corresponding to the Effective Request URI ({{RFC7230}}, Section 5.5) of a cached response {{RFC7234}}.
* `ETag` a string corresponding to the entity-tag {{RFC7232}} if a cached response {{RFC7234}} (if the ETag is available; otherwise, null).
From an implementor's perspective, it would help me to understand this if examples were provided.
Examples for URL and ETag? Or something else?
`ETag`. Btw, just noticed a typo on line 321: "of a cached response".
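Purely illustrative values for the two fields quoted above (not taken from the draft):

```ts
// Hypothetical example inputs for the digest computation.
const url = "https://example.com/assets/style.css";               // Effective Request URI
const etag: string | null = '"33a64df551425fcc55e4d42a148795d9"'; // entity-tag, or null if absent
```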
draft-ietf-httpbis-cache-digest.md (Outdated)
@@ -99,10 +102,9 @@ allows a stream to be cancelled by a client using a RST_STREAM frame in this sit
is still at least one round trip of potentially wasted capacity even then.

This specification defines a HTTP/2 frame type to allow clients to inform the server of their
cache's contents using a Golomb-Rice Coded Set {{Rice}}. Servers can then use this to inform their
cache's contents using a Cuckoo-fliter {{Cuckoo}} based digest. Servers can then use this to inform their
typo: filter
draft-ietf-httpbis-cache-digest.md (Outdated)
1. Let `f` be the number of bits per fingerprint, calculated as `P + 3`
2. Let `b` be the bucket size, defined as 4
3. Let `bytes` be `f`*`N`*`b`/8 rounded up to the nearest integer
Markdown escaping issue makes `N` italic and hides the `*` characters.
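A worked instance of the quoted size formula, with illustrative parameter values (the concrete numbers are not from the draft):

```ts
// bytes = ceil(f * N * b / 8), with f = P + 3 and b = 4 per the quoted text.
const P = 8;                              // false-positive parameter (see later in the thread)
const N = 256;                            // illustrative table size
const f = P + 3;                          // 11 bits per fingerprint
const b = 4;                              // bucket size
const bytes = Math.ceil((f * N * b) / 8); // 1408 bytes for these values
```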
draft-ietf-httpbis-cache-digest.md (Outdated)
9. Let `position_start` be 40 + `h` * `f`.
10. Let `position_end` be `position_start` + `f` \* `b`.
11. While `position_start` < `position_end`:
7. Let `fingerprint-string` be the value of `fingerprint` in base 10, expressed as a string.
@yoavweiss Curious... May I ask why this change? I don't see any issues with it. Just don't understand what it means.
It defines a way to convert the fingerprint into a string, so that we can apply {{hash}} to it.
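A sketch of how that step fits into locating the alternate bucket (partial-key cuckoo hashing), reusing the `hashKey` sketch above; whether an extra modulo-N reduction is applied after the XOR depends on the draft's exact wording, so it is included here as an assumption.

```ts
// h2 = hash(fingerprint as a base-10 string) XOR h1, reduced modulo N (assumption).
function alternateBucket(
  h1: number,
  fingerprint: number,
  N: number,
  hash: (key: string, N: number) => number
): number {
  const fingerprintString = fingerprint.toString(10); // base-10 string form of the fingerprint
  return ((hash(fingerprintString, N) ^ h1) >>> 0) % N; // >>> 0 keeps the XOR unsigned
}
```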
I've got an incomplete initial reference implementation at https://github.com/yoavweiss/cache-digests-cuckoo. It doesn't yet include removal and querying (that's what I'll be adding next), but I did run it on a list of ~3250 URLs (which I got out of my main profile's chrome://cache/) and it seems to be creating reasonably sized digests. One more advantage: the digests seem to be highly compressible when sparse. Results so far: In practice, I think ~1000 entries is most probably enough, but it's good to know we can increase the digest size (to avoid having to recreate it) without a significant over-the-wire penalty.
OK, I now have a complete reference implementation and it seems to be working fine. It also exposed an issue with the initial algorithm, forcing table allocation to accommodate a power-of-2 number of entries. Latest results for 3250 URLs taken from my cache:
One note: the 1021-entry table had 35 collisions, so it seems insufficient for that number of URLs, unless we're willing to absorb extra pushes for ~1% of the resources.
@yoavweiss Interesting! It's good to know that we have numbers now. What is the value of P (the false positive ratio) that you used?
P=8 (so 1/256 false positive)
Note that the numbers here may be possible to optimize further. One example is semi-sorting of the buckets, which the Cuckoo-Filters paper mentions and which I have not yet implemented. It adds some runtime complexity, but can reduce the fingerprint size per resource by a full bit, so it could have resulted in ~9% smaller digests in this case.
@yoavweiss Awesome! 🤩 Not being familiar with these data structures (despite reading the Wikipedia article 😅), why does the 1/256 probability (~4/1000) result in 35 collisions?
The 35 collisions are on top of the false positive rate, and represent resources that we failed to put into the table to begin with (due to both their buckets being full). That rate of collisions seems high compared to the results in the paper, so I need to dig further to see who's wrong...
The collisions are now fixed. It was an algorithm issue, where the entry to be pushed out was always the same one at the end of the bucket. I've changed that to pick a random fingerprint from the bucket, which significantly improved things. The ref implementation is now collision-free almost up to the point where the digest is full.
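A sketch of the described fix, using a standard cuckoo-insert loop (the structure is hypothetical: `buckets` is an array of variable-length buckets capped at `bucketSize`): when both candidate buckets are full, a randomly chosen fingerprint is evicted instead of always the last slot.

```ts
function insertWithRandomEviction(
  buckets: number[][],
  bucketSize: number,
  maxKicks: number,
  fingerprint: number,
  i1: number,
  i2: number,
  altIndex: (fp: number, i: number) => number // alternate-bucket function
): boolean {
  // Try both candidate buckets first.
  for (const i of [i1, i2]) {
    if (buckets[i].length < bucketSize) {
      buckets[i].push(fingerprint);
      return true;
    }
  }
  // Both full: evict a random victim and relocate it, up to maxKicks times.
  let i = Math.random() < 0.5 ? i1 : i2;
  for (let kick = 0; kick < maxKicks; kick++) {
    const slot = Math.floor(Math.random() * buckets[i].length); // random victim, not the last slot
    const victim = buckets[i][slot];
    buckets[i][slot] = fingerprint;
    fingerprint = victim;
    i = altIndex(fingerprint, i);
    if (buckets[i].length < bucketSize) {
      buckets[i].push(fingerprint);
      return true;
    }
  }
  return false; // table is effectively full
}
```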
Resolves #268