Quoted cache-control directives #128
In bis, we decided to make this SHOULD NOT generate, MUST accept. I think we should consider moving to MUST NOT generate, MAY accept. |
I believe we should continue to recommend writing sane parsers. For that, a "MAY" is not sufficient. |
So, I think we need to have a discussion in the WG about whether architectural purity / reuse of parsers trumps interoperability, given that it's been ~five years since that recommendation and there still hasn't been any movement. |
My theory is that the parsers are broken in many more aspects - I don't believe that writing to spec so that it is consistent with components that just do substring searches is the right thing to do (yes, this needs investigation). For instance: https://bugzilla.mozilla.org/show_bug.cgi?id=716346 |
AFAICT they're not doing substring searches, they're correctly implementing other aspects of the spec; it's this specific thing. |
-1 All I see is one example of a known bug in a broken parser. We do not make all deployed senders of valid HTTP syntax suddenly non-compliant just because one vendor is too lazy to fix an obviously broken parser. |
in bkk: mnot to gather more data with bias towards closing with no action if the data isn't helpful |
Do you have evidence of parsers which fail for
and work for
but also do the right thing for
? EDIT - actually:
|
Yes; on this page, see
and
respectively. Currently, it looks like Chrome, Firefox, Edge, nginx, Squid, and Varnish all have that set of behaviours. |
Ok. great work. At this point I'd like to understand why a UA would do that. If the parser is capable enough to understand the syntax, why it then does not handle the values as specified? That is, is this intentional or just an oversight? Yes, that likely requires studying the code. |
/cc @bslassey @ddragana @ericlaw1979 @yadij @bsdphk Any insights as to why parsing Thanks, |
AFAIK for Squid it is just that logic still uses the RFC2616 definitions for parsing the C-C header. RFC2616 documented exact lower case field-names for all the mandatory controls. It also uses the delta-seconds ABNF without mention of quoting for the fields with numeric parameters and explicitly mentions quotes in others. So Squid interprets the mixed-case field name and quoting as extensions and thus ignore. With neither complaints nor normative requirements in RFC723x to support these odd syntax we had no reason to look at these until now. We can change easily enough. |
Sorry, Varnish was a false positive there (it was caching things by default); I've updated the tests. |
@MattMenke2 can probably help here. (I suspect implementations in browsers are mostly old parsers that haven't really been touched or checked if they actually conform to anything. And figuring out whether they can be changed at this point is a somewhat costly endeavor.) |
ietf104: this isn't interoperable - should core note this? |
ietf104: need to continue digging into test scenarios |
Chrome has one global shared parser for handling quotes in header lines regardless of context, and breaking up lines with commas into multiple lines (with a blacklist of headers this doesn't work for). That code breaks up Cache-Control: extension="max-age=100", max-age=3600 into: We then handle them as separate headers, and our (very simple) max-age parser has no trouble handling the second header. The quote handling logic doesn't consider the context of the quotes (after an equal sign, after a semicolon, first character of a string, after a random alphanumeric character, etc). It treats them all as the start of a quoted string. |
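To make the splitting behaviour described above concrete, here is a minimal sketch of a quote-aware comma splitter. It is not Chrome's code; the function name `split_on_commas` is hypothetical, and the sketch assumes, as described, that every double quote toggles quoted-string state regardless of where it appears.

```python
def split_on_commas(value: str) -> list:
    """Split a field value on commas that fall outside double quotes.

    Assumption for illustration: any '"' starts or ends a quoted string,
    with no regard for context (after '=', after ';', mid-token, ...).
    """
    parts = []
    current = []
    in_quotes = False
    escaped = False
    for ch in value:
        if escaped:
            # Character following a backslash inside quotes is kept verbatim.
            current.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            current.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            # Unquoted comma: close the current element.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    # Drop empty elements such as those produced by a trailing comma.
    return [p for p in parts if p]
```

With this sketch, the quoted `max-age=100` stays inside the `extension` element, so a simple per-element max-age parser only ever sees the second element.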
I found the bug that causes Apache httpd's cache to fail on quoted max-age values. It correctly parses the field to extract either max-age=100 or max-age="100", but when it converts the value to a large integer it assumes the value is unquoted. This is in modules/cache/cache_util.c: ap_cache_control()
|
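As a rough illustration of the class of bug described above (the value is extracted correctly but converted as if it were always unquoted), here is a sketch of a conversion step that accepts both forms. It is not the httpd code; `parse_delta_seconds` is a hypothetical helper name.

```python
from typing import Optional


def parse_delta_seconds(raw: str) -> Optional[int]:
    """Convert a directive argument such as 100 or "100" to an integer.

    Returns None when the argument is not a run of ASCII digits.
    """
    value = raw.strip()
    # Strip one pair of surrounding double quotes, if present.
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    # isascii() guards against non-ASCII characters that isdigit() accepts.
    if not (value.isascii() and value.isdigit()):
        return None
    return int(value)
```

The point of the sketch is only that unquoting happens before the numeric conversion, so both spellings reach the same integer.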
Awesome. Let's fix it :-) |
@MattMenke2 it'd be much safer to do that with an allow list, rather than a deny list. Overall -- the reality here seems to be that for whatever reason, the majority of cache implementations (by a large margin) don't honour
I understand that having these two forms being considered equivalent is architecturally more consistent and aesthetically more pleasing, but that is not the world we are living in, and I see no evidence that the world will change to match these expectations on any reasonable timeline. If cache implementers were standing up and saying "yes, we should fix that bug" I would feel differently, but they're not; as far as I can see, they're both resource-poor and very reluctant to make arbitrary changes to products for little perceptible benefit. Therefore, my proposal is:
|
But the question remains why Chrome doesn't process
the same way as:
|
I do apologize, but it's been 4 months, and I've since lost all context. While still working on network-adjacent stuff in Chrome I'm no longer on Chrome's network stack team (And even if I were, I'd be reluctant to dig into this code and relevant specs once every four months to pick up a stalled discussion). Unfortunately, that also means I'm not a good person to commit to sweeping changes to Chrome's header parsing logic. |
Strongly disagreed. A correct parser already MUST handle (ignore) extension directives, and to do that, it MUST handle both token and quoted-string values. Saying that this does not apply to certain existing directives only complicates the parser, and makes it more likely to fail on extensions. |
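A sketch of what "one code path for all directives" could look like: every directive argument, known or extension, is parsed as either a token or a quoted-string before any directive-specific logic runs. The names (`parse_cache_control`, `TOKEN`, `QUOTED`) are hypothetical, and two choices here are assumptions rather than spec text: the first occurrence of a directive wins, and empty list elements are rejected.

```python
import re

# token and quoted-string, roughly following the HTTP grammar.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QUOTED = r'"((?:[^"\\]|\\.)*)"'
DIRECTIVE = re.compile(
    r"\s*(" + TOKEN + r")(?:=(?:(" + TOKEN + r")|" + QUOTED + r"))?\s*(?:,|$)"
)


def parse_cache_control(value):
    """Parse a Cache-Control field value into {name: argument or None}.

    Returns None if the value does not match the grammar sketched above.
    """
    directives = {}
    pos = 0
    while pos < len(value):
        m = DIRECTIVE.match(value, pos)
        if m is None:
            return None  # malformed (this sketch also rejects empty elements)
        name, token_val, quoted_val = m.group(1), m.group(2), m.group(3)
        if quoted_val is not None:
            # Undo quoted-pair escapes: \" becomes ", \\ becomes \.
            arg = re.sub(r"\\(.)", r"\1", quoted_val)
        else:
            arg = token_val  # None for valueless directives such as no-store
        # Assumption: first occurrence wins; names compare case-insensitively.
        directives.setdefault(name.lower(), arg)
        pos = m.end()
    return directives
```

Under these assumptions, `max-age=3600` and `max-age="3600"` produce the same parsed argument, and a quoted extension value containing `max-age=100` never leaks into the real `max-age`.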
What real-world cache-control directive uses quotes today? |
Why is that relevant? Do you want to remove quoted-string as option in general? |
We currently say "MUST accept" for extension directives. Do you want to change that? For existing directives, we say "ought to". Changing that to "SHOULD" is a normative change (that I would actually support). But then the inevitable question would be: under which circumstances is it ok to ignore the requirement? |
The discussion and the data gathered make me think the most realistic thing to do is to deprecate all quoted forms of cache-control directives. If I wanted to introduce a new cache-control directive that used quoting today -- especially if it had the possibility of a comma or other syntactically interesting component as payload -- I'd use a different header field. I do not believe we're going to be able to get the majority of cache implementations to change their behaviour here; if they step up and say they will, I'll be happy to be wrong. Absent that, we should make the specification match reality, so it's relevant. |
Well, we need to be clear what's being discussed. We currently have: "MUST accept extensions that might use quoted strings as params". Do you have evidence that existing current implementations get this wrong? |
Yes, I was looking at them. But the results seem to be "behaviour check results" or "test dependency failed", so I have no idea how to evaluate them... |
The filled-in circle is "yes"; the empty circle is "no". The first test indicates that almost no cache honours max-age with quotes; only ATS and Safari do. |
So "Does HTTP cache ignore the phrase max-age in a quoted string (before the "real" max-age)?" - which is about extension parameters with quoted strings is almost (ahem) universally passing - isn't this exactly the case we're discussing right now? |
The reason why we treat Cache-Control: extension="max-age=100", max-age="3600" and Cache-Control: extension="max-age=100", max-age=3600 differently is that the Cache-Control logic has its own parser. We do have a shared parser for stuff in the format: foo=bar; foo2="bar2";..., but a lot of stuff doesn't use it. And I'm not sure we use it for anything that doesn't need to support semi-colon-separated entries. Sorry for dropping this discussion - I was getting this discussion confused with a more general one on comma handling, which is something I don't have time to work on and don't feel like I can commit to on behalf of Chrome. Things limited to parsing of a single header I certainly can work on, however. Edit: Fixed paragraph breaks for readability. |
Discussed in Singapore: want to get more info from cache implementations by January to understand this better. |
@MattMenke2 already described this up in #128 (comment), but since folks in Singapore were asking what the clients were doing and there seems to be some confusion, I'll (re)clarify Chrome's behavior: Chrome has generic logic splitting headers, including Cache-Control by commas, in a way which pays attention to quoting. So Various bits of (This is based on source inspection just now. @MattMenke2 can correct me if I got this wrong.) |
That's my reading as well, though I've never touched the cache code either. Also, if we want double-quote handling in Cache-Control lines, it's not clear to me if we want semi-colon handling as well, which is part of every instance in which Chrome's general HTTP header quote handler is used, so is a non-optional part of Chrome's internal API for doing so. Should "Cache-Control: max-age=10; min-dingos=2" be parsed as max-age=10, with an unsupported secondary parameter? If so, we'd be deviating even more from spec. If not, we'd be introducing a novel use of quotes in HTTP headers - supporting them in header values where they're never needed, and where semi-colons are not supported. |
@MattMenke2 - no, Cache-Control does not have semicolon-separated characters. |
@davidben - thanks for the explanation (and apologies for not processing the earlier feedback properly). Two comments/questions on the generic split-by-comma code:
As a general comment: so yes, I do now understand that the design of the parser predates RFC 7234, and simply tries to implement RFC 2616. In particular, extracting the actual values uses specific logic per directive. The intent of the change in RFC 7234 was to allow implementers to get rid of special cases: to be able to use the same code path for all parameters. Was it ever considered to implement this? If so, what led to the decision not to? |
Not supporting semi-colons itself seems like a special case, no? So rather than a list of headers that do/do not support general parsing, we'd need a list of headers that don't adhere to the pattern, a list of ones that don't support semi-colons, and a list of those that do, no? The general behavior in the case of unsupported parameters is just to ignore them, rather than to ignore the entire header. |
Well, at the end of the day there is common syntax in HTTP header fields. Sometimes it looks like that, but there are a lot of subtle differences. Yes, that's bad (and that's why we are trying to improve things for new header fields). And yes, Cache-Control (comma-separated list of name/value pairs) is very different from things like Content-Type (comma-separated list of identifiers + optional parameters). I can see that it would be nice to use a generic parser here, and just either drop parameters or fail if some are present. But even in that case, you'd probably accept more than what should be accepted (for instance: "Cache-Control: foo;,bar"). What should be possible is to build parsers from common components, or to have a single parser that is sufficiently parametrized. |
Oh, and as for whether anyone on the Chrome team considered updating behavior to match RFC 7234 - neither David nor I work on the cache, but I don't think anyone from the network stack team, at least, was aware of this change from RFC 2616. |
Looking at the code, I believe so. I don't see a test for it at the HTTP layer function, but the lower-level thing it uses does have logic and tests for escapes. |
...and in fact it doesn't show up in https://greenbytes.de/tech/webdav/rfc7234.html#changes.from.rfc.2616, which might be one reason why it has been ignored (yes, one reason :-). This makes me think that advertising this change more would be better than reverting it. |
An observation: Assuming I'm reading the grammars correctly, If so, that suggests reverting it, better aligning with both running code and structured headers. |
seconded. |
I don't see how this is relevant for Cache-Control. We can't make normative protocol changes based on the design of a new header field syntax. FWIW, I disagree with:
...whether that is okay or not would depend on the definition of the header field. It could accept both. |
Discussed in Basel.
|
In my testing, many caches do not recognise
Cache-Control: max-age="3600"
as a valid CC directive. Since this isn't interoperable, we should consider changing the spec so that the interoperable case is documented.