Updating stored headers #165
Comments
@andypaicu and @MattMenke have more informed opinions than I do at this point. :) |
I think you want @MattMenke2. |
Thanks, Anne! MattMenke is also me, just my personal account (which I prefer not to use for work. Lawyers scare me). So please do use @MattMenke2. |
Chrome's lists can be found at https://cs.chromium.org/chromium/src/net/http/http_response_headers.cc?gsn=PERSIST_RAW&g=0&l=47. Chrome has two lists: those it doesn't update when using a cached response after receiving a 304 (kNonUpdateHeaders + kNonUpdatedHeaderPrefixes), and those it doesn't put in the cache at all (kHopByHopResponseHeaders + kChallengeResponseHeaders + kCookieResponseHeaders + kSecurityStateHeaders + "Content-Range" + headers explicitly indicated by a no-cache header). Chrome's no-update list was recently modified to match Firefox's, with a few extra rules. In particular, Chrome's list includes "www-authenticate", "x-frame-options", "x-xss-protection", "x-content-", and "x-webkit-". "www-authenticate" definitely makes sense in the list (Well, I assume no browser caches proxy challenge responses, anyways, but it makes sense if the proxy challenge header does). It seems to me like "x-frame-options" shouldn't be in that list, as it could reasonably be generated based on the requesting origin, rather than giving a list of all origins (both to keep down response sizes, and to avoid exposing other URLs). "x-xss-protection" doesn't have that problem, but it seems strange to single it out as not updated, unless there's a good reason to treat it differently. I'm not sufficiently familiar with the "x-content-" and "x-webkit-" prefixes to comment usefully on them. |
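To make the two-list distinction above concrete, here is a minimal sketch in Python. It is not Chrome's code: the function names are hypothetical, and the sets hold only header names mentioned in this comment plus a couple of hop-by-hop and cookie headers chosen for illustration, so they should not be read as the real contents of either list.

```python
# Illustrative subsets only; these are NOT Chrome's actual lists.
NON_UPDATED_HEADERS = {"www-authenticate", "x-frame-options", "x-xss-protection"}
NON_UPDATED_PREFIXES = ("x-content-", "x-webkit-")
NOT_STORED_HEADERS = {"content-range", "connection", "keep-alive", "set-cookie"}


def headers_to_store(response_headers):
    """Drop headers that are never written to the cache; lower-case the rest."""
    return {
        name.lower(): value
        for name, value in response_headers.items()
        if name.lower() not in NOT_STORED_HEADERS
    }


def is_updatable(name):
    """True if a 304 is allowed to overwrite the stored value of this header."""
    name = name.lower()
    return name not in NON_UPDATED_HEADERS and not name.startswith(NON_UPDATED_PREFIXES)


def apply_304(stored_headers, headers_304):
    """Return the stored headers after merging in the headers of a 304."""
    updated = dict(stored_headers)
    for name, value in headers_to_store(headers_304).items():
        if is_updatable(name):
            updated[name] = value
    return updated
```

With these assumed lists, a 304 carrying a new Cache-Control would update the stored entry, while a new X-Frame-Options on the same 304 would be ignored and the stored value kept.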
Also, it seems weird to have "Proxy-Authorization" on the list, as it's a request header. Though if we want to keep it for legacy reasons, we should probably include "Authorization", too. |
Re: Proxy-Authorization and Authorization: my inclination would be to not specify request headers in the list, as they don't have any function in responses (specified or practical), and so it would be strange to specify them, and especially strange to specify some but not others. If implementations felt like they wanted to still omit them because they don't remember why they were omitted and were wary of changing, that's fine, but I'd point out that the intermediary caches that do update stored headers don't do this. |
I'll update my PR in WPT as well as the CDN tests to include more headers, so we can capture exactly what's current practice here. |
Just to clarify, it looks to me like Firefox does currently omit Proxy-Authorization, which is where Chrome apparently got its behavior from. See line 901 of https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpResponseHead.cpp#879. Or did you mean that Firefox will definitely remove that line? |
@MattMenke2 I edited :) It'd be great to understand why they do so. @mcmanus any idea? |
don't know.. it's older than the 2007 version control cutover - so you really have to badly want the answer to go looking. I'm going to guess that they were enumerating hop-by-hop headers and felt Proxy* should always fall into that bucket. |
Is there general agreement in this issue to add two lists to draft-ietf-httpbis-cache:
I do see that the contents of those lists don't yet necessarily have consensus. |
I think it's one unified list. Where do you think it'd be different? |
#165 (comment) indicates that the set of headers that Chrome avoids caching is a different set from the one it avoids updating. (e.g. Chrome caches |
Updating Content-Encoding or Content-Location also seems pretty weird, though both should be cached. |
I can see a case for Are there any other headers that you think should be considered (whether or not we have separate lists)? The common case for these ones that appear weird is a server that is trying to correct a previous mistake; e.g., it sent the wrong header, and wants to update the cached entry. |
It would be helpful if someone could put down a tentative proposal to compare with implementations and headers with interesting properties. (E.g., it's not clear to me from the above discussion if not being able to update |
I think we can do it in groups -
Any other headers of interest? I'm reluctant to start opting headers out of update just because "it seems weird", since that's an open-ended set, and unlike with browsers, we can't expect intermediary caches to be updated every time somebody comes up with a new header.
|
Note that having to respect |
Content-Encoding seems problematic - want to get past a malware scanner of some type? Serve a resource with the wrong Content-Encoding, possibly get it in the browser cache (which is generally at a lower layer than decompression), though the resource fails to load. Then re-request the resource with a 304 that changes the Content-Encoding. Admittedly, service workers and the like are perfectly capable of using their own methods to bypass filters. |
Also I don't think "Some browsers may want to ignore updates to this header" is really a great option, going forward - I think we want to be consistent here, unless there's a pretty compelling reason not to. |
I understand, but that's not a HTTP cache -- you're inserting non-standard functions on top of a HTTP cache for implementation convenience. Restricting the behaviour of all HTTP caches (including intermediaries and server-side caches) because some browser caches might be doing extra things like this doesn't make sense. Even for those browsers, there are other strategies that they might take -- e.g., invalidating a cache entry that's an optimised structure if the content-type changes. Likewise, a browser that stores decoded responses needs to be aware of the semantics of content-encoding anyway, and take appropriate steps to behave correctly. |
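As a rough illustration of the invalidation strategy mentioned above, the sketch below keeps a derived (for example decoded or otherwise optimised) form next to the stored response and discards it when a 304 changes a representation header. The `CacheEntry` type, its `derived` field, and the choice of headers to watch are all assumptions made for this example, not taken from any browser.

```python
from dataclasses import dataclass
from typing import Optional

# Headers whose change is assumed to make a derived representation stale.
REPRESENTATION_HEADERS = ("content-type", "content-encoding")


@dataclass
class CacheEntry:
    headers: dict                      # stored response headers, lower-cased names
    body: bytes                        # bytes as received from the network
    derived: Optional[object] = None   # hypothetical optimised/decoded form


def update_from_304(entry, headers_304):
    """Merge 304 headers into the entry; drop the derived form if it is stale.

    Returns True when the derived form was invalidated.
    """
    incoming = {name.lower(): value for name, value in headers_304.items()}
    new_headers = {**entry.headers, **incoming}
    changed = any(
        entry.headers.get(h) != new_headers.get(h) for h in REPRESENTATION_HEADERS
    )
    entry.headers = new_headers
    if changed:
        entry.derived = None  # must be rebuilt from body under the new headers
    return changed
```

The point of the design is that the HTTP cache itself still stores and updates headers as specified; only the extra, non-specified structure layered on top is thrown away.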
I'm not following...What "extra things" are you talking about? |
I.e., storing and updating these headers works just fine until you start doing non-specified things with the HTTP cache. If you do such things, it's your responsibility to make sure that it doesn't interfere with the proper operation of the cache. Re-defining caching to suit current browser behaviour might work if browser caches were the only HTTP caches in existence, but they're not. |
Those two points seem to contradict each other - only one can be correct. If Content-Encoding is a property of the content, and not merely the transfer of the content, then caches should store that content, as opposed to a mutated version of it, no? C-E being a property of content is a fiction - that may be what the spec says, but that's not the reality. You could not make a functional browser that behaved otherwise. I thought one of the purposes of the fetch spec was to reconcile RFCs with the way things actually work, or can reasonably be modified to work. |
HTTP caches do store that content -- at least in intermediaries. If browser caches store the decompressed content, they need to account for the impact of doing so, as it's not specified behaviour. In the case you described, if a browser stores something with an unrecognised content-coding thereby evading a filter, and then a 304 updates the content-coding, there are a number of valid strategies that could be taken to assure correct behaviour by the filter:
Why is changing the behaviour of all caches (including all that are currently deployed) to accommodate this non-standard behaviour the most reasonable path, especially when there are a number of ways they can correctly behave within the constraints of the current spec? I could see an argument that HTTP needs to accommodate caches that store responses without content-encodings, or provide advice for those that do. Would that be helpful? |
I'm working on a patch to add these lists, with notes that the contents aren't final. I'll include the "cached but not updated" list, probably saying that caches MAY ignore updates to headers in that list. As Mark convinces the browsers to update various headers on that list, or the browsers convince the CDNs not to, we can tighten the specification, but MAY seems to capture the status quo. |
That sounds like a pretty drastic change from the status quo that I'm not sure is really worth the amount of effort involved, especially as it leaves many of the design challenges up to the implementers and will eventually result in implementers having to reverse engineer their respective setups to get consistent behavior across sites. |
The alternative is to replace the third paragraph with something like:
However, I'm really only comfortable with this approach if the list is limited and relatively static. There's a lot of cargo culting apparent in the current implementation behaviours, and I see no reason to accommodate them. Specifically:
|
Sorry for the late contribution to this issue. I ran into a related ambiguity recently and thought it was relevant:
IIUC, the combination would mean that if the original response had a Apologies if I misread something; I should really run some tests. (edit to add: Either way, the decision on request time should probably be specified explicitly.) |
Discussed in Singapore; I will create a conservative PR as a basis for further discussion. |
@agrover pointed to the gecko code that lists: I couldn't see anything that strips |
OK so the previous list was headers Gecko doesn't cache. This is a list of headers Gecko doesn't update on a 304 (or 206 partial content). So there are those |
See draft PR #337. This covers all of the connection headers (both listed in It includes It also includes It does not include It does not yet include Thoughts? |
The problem with Cookie headers isn't replacing the cached ones, but with using the cached Cookie headers in the first place - we don't want to reuse Cookie headers from old responses, setting the old Cookies once again. |
If that's the goal, it seems like it'd be better to do it in the cookie spec -- i.e., specify that cookies should only be used when the response is what we used to call "first hand" -- obtained directly from the origin server, without an |
This is currently taken care of by Fetch, which hooks into the cookie spec at 11.4 of https://fetch.spec.whatwg.org/#http-network-fetch (which happens after consulting the cache and missing). The integration could be much cleaner, but it does what you're asking for. What would you like added to the cookie spec to make that more transparent? |
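A heavily simplified sketch of that ordering, in Python: Set-Cookie is processed only on the path that actually goes to the network, so a response served from the cache never re-sets old cookies. This is not the Fetch algorithm; the `fetch` function, the dict-based cache, and the list used as a cookie store are all hypothetical stand-ins.

```python
def fetch(url, cache, network, cookie_store):
    """Return a response for url, applying Set-Cookie only on a cache miss.

    cache: dict mapping url -> response dict (hypothetical HTTP cache)
    network: callable taking a url and returning a response dict
    cookie_store: list collecting cookie strings that were set
    """
    cached = cache.get(url)
    if cached is not None:
        # Cache hit: any Set-Cookie stored with the response is not re-applied.
        return cached

    response = network(url)
    for cookie in response.get("set-cookie", []):
        cookie_store.append(cookie)  # cookies are set from network responses only
    cache[url] = response
    return response
```

Fetching the same URL twice with this sketch contacts the network once and sets the cookie once, which is the behaviour asked for above.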
So sending cookies hooks in above the cache layer, and setting them hooks in above it, in the spec? That seems unfortunate. |
Setting the |
Yes, I'd argue that setting the cookies in the cookie store should happen at the same layer, however. |
We could move the Either way, it seems like we're handling the problem discussed above in Fetch, not RFC6265bis. Are there still changes we should make to the latter regarding this discussion (short of rewriting the whole thing in terms of Fetch, which I don't think anyone's signing up for)? |
I think we could move it and it probably makes sense if that's when they're processed. (That algorithm also takes a credentials flag which I'd hope folks set correctly if they invoke it directly, but I'm not aware of anyone invoking it directly.) We should maybe also update https://fetch.spec.whatwg.org/#http-header-layer-division to account for this somewhat subtle difference. |
OK, that seems to take care of Cookies. I'm going to mark the PR as ready for review. |
The editors have signed off on #337. Any further comments before we merge? |
7234 requires stored headers to be updated upon a 304 or HEAD response.
However, there are a number of issues:
x-content-*, content-* and x-webkit-* - see Chrome and Webkit
It seems like a lot of the bugs referenced above originated from the confusing language around omitting entity headers in RFC2616; that was removed in RFC7234 (see current language).
I think we need to:
Mozilla's list of headers they don't update might be a good starting point for the first two (see also related bug).
There are also browser tests and an outstanding PR.
Note that this is a security issue, because some browsers filter out headers starting with Content-, which includes Content-Security-Policy -- i.e., a browser that has an old copy of the response won't see the new CSP header on a 304 response.
(this might also apply to 206 responses, since there's header combination there too)
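The effect of a prefix-based filter can be sketched as follows. This Python example is an illustration of the problem described above, not any browser's implementation; the function names and the contents of `SKIPPED_EXACT` are assumptions chosen for the demo.

```python
def apply_304_prefix_filter(stored, headers_304, skipped_prefix="content-"):
    """Update stored headers, skipping every header that starts with the prefix."""
    updated = dict(stored)
    for name, value in headers_304.items():
        if not name.lower().startswith(skipped_prefix):
            updated[name.lower()] = value
    return updated


# Hypothetical exact-name list; matching whole names avoids over-matching.
SKIPPED_EXACT = {"content-length", "content-encoding", "content-range"}


def apply_304_exact_list(stored, headers_304):
    """Update stored headers, skipping only headers named in SKIPPED_EXACT."""
    updated = dict(stored)
    for name, value in headers_304.items():
        if name.lower() not in SKIPPED_EXACT:
            updated[name.lower()] = value
    return updated


stored = {"content-security-policy": "default-src *", "content-length": "10"}
fresh = {"Content-Security-Policy": "default-src 'self'", "Content-Length": "0"}

# The prefix filter silently keeps the old, weaker policy.
stale = apply_304_prefix_filter(stored, fresh)
# The exact-name list lets the new policy through and still protects Content-Length.
current = apply_304_exact_list(stored, fresh)
```

Here `stale` still carries the old policy while `current` carries the new one, which is why a `Content-` prefix rule turns into a security problem once a security header happens to share the prefix.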