Quoted cache-control directives #128

mnot · 2018-08-02T00:57:51Z

In my testing, many caches do not recognise Cache-Control: max-age="3600" as a valid CC directive.

Since this isn't interoperable, we should consider changing the spec so that the interoperable case is documented.


mnot · 2018-08-02T01:23:54Z

In bis, we decided to make this SHOULD NOT generate, MUST accept. I think we should consider moving to MUST NOT generate, MAY accept.

reschke · 2018-08-02T04:16:34Z

I believe we should continue to recommend writing sane parsers. For that, a "MAY" is not sufficient.

mnot · 2018-08-02T06:56:37Z

So, I think we need to have a discussion in the WG about whether architectural purity / reuse of parsers trumps interoperability, given that it's been ~five years since that recommendation and there still hasn't been any movement.

reschke · 2018-08-02T07:07:17Z

My theory is that the parsers are broken in many more aspects - I don't believe that writing to spec so that it is consistent with components that just do substring searches is the right thing to do (yes, this needs investigation).

For instance: https://bugzilla.mozilla.org/show_bug.cgi?id=716346

mnot · 2018-08-02T07:19:02Z

AFAICT they're not doing substring searches, they're correctly implementing other aspects of the spec; it's this specific thing.

royfielding · 2018-11-05T21:21:33Z

-1

All I see is one example of a known bug in a broken parser. We do not make all deployed senders of valid HTTP syntax suddenly non-compliant just because one vendor is too lazy to fix an obviously broken parser.

mcmanus · 2018-11-06T02:23:57Z

in bkk: mnot to gather more data, with a bias towards closing with no action if the data isn't helpful

reschke · 2018-11-06T11:05:20Z

Do you have evidence of parsers which fail for

Cache-Control: max-age="3600"

and work for

Cache-Control: max-age=3600

but also do the right thing for

Cache-Control: extension="max-age=100", max-age="3600"

?

EDIT - actually:

Cache-Control: extension="max-age=100", max-age=3600

mnot · 2018-12-18T05:24:11Z

Yes; on this page, see

HTTP cache should reuse a response with quoted Cache-Control: max-age

and

HTTP cache must ignore the phrase "max-age" in a quoted string, even when max-age has a quoted value too

respectively. Currently, it looks like Chrome, Firefox, Edge, nginx, Squid, and Varnish all have that set of behaviours.

reschke · 2018-12-18T05:51:44Z

Ok. great work.

At this point I'd like to understand why a UA would do that. If the parser is capable enough to understand the syntax, why does it then not handle the values as specified? That is, is this intentional or just an oversight? Yes, that likely requires studying the code.

mnot · 2018-12-19T04:09:30Z

/cc @bslassey @ddragana @ericlaw1979 @yadij @bsdphk

Any insights as to why parsing Cache-Control behaves the way it does in your implementations, as per above?

Thanks,

yadij · 2018-12-19T05:36:06Z

AFAIK for Squid it is just that logic still uses the RFC2616 definitions for parsing the C-C header.

RFC2616 documented exact lower case field-names for all the mandatory controls. It also uses the delta-seconds ABNF without mention of quoting for the fields with numeric parameters, and explicitly mentions quotes in others. So Squid interprets the mixed-case field name and quoting as extensions and thus ignores them.

With neither complaints nor normative requirements in RFC723x to support this odd syntax, we had no reason to look at it until now. We can change easily enough.

mnot · 2018-12-20T07:30:54Z

Sorry, Varnish was a false positive there (it was caching things by default); I've updated the tests.

annevk · 2019-01-02T08:39:46Z

@MattMenke2 can probably help here. (I suspect implementations in browsers are mostly old parsers that haven't really been touched or checked if they actually conform to anything. And figuring out whether they can be changed at this point is a somewhat costly endeavor.)

mcmanus · 2019-03-25T15:57:36Z

ietf104: this isn't interoperable - should core note this?

mcmanus · 2019-03-25T15:59:40Z

ietf104: need to continue digging into test scenarios

MattMenke2 · 2019-03-25T16:11:34Z

Chrome has one global shared parser for handling quotes in header lines regardless of context, and breaking up lines with commas into multiple lines (with a blacklist of headers this doesn't work for).

That code breaks up Cache-Control: extension="max-age=100", max-age=3600

into:
Cache-Control: extension="max-age=100"
Cache-Control: max-age=3600

We then handle them as separate headers, and our (very simple) max-age parser has no trouble handling the second header.

The quote handling logic doesn't consider the context of the quotes (after an equal sign, after a semicolon, first character of a string, after a random alphanumeric character, etc). It treats them all as a start of a quoted string.

royfielding · 2019-03-26T10:00:21Z

I found the bug that causes Apache httpd's cache to fail on quoted max-age values. It correctly parses the field to extract either max-age=100 or max-age="100", but when it converts the value to a large integer it assumes the value is unquoted.

This is in modules/cache/cache_util.c: ap_cache_control()

                if (!ap_cstr_casecmpn(token, "max-age", 7)) {
                    if (token[7] == '='
                            && !apr_strtoff(&offt, token + 8, &endp, 10)
                            && endp > token + 8 && !*endp) {
                        cc->max_age = 1;
                        cc->max_age_value = offt;
                    }
                }
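
For illustration only, here is a minimal sketch (in Python rather than httpd's C) of the kind of change this points to: remove one pair of surrounding double quotes before the integer conversion. The function name `parse_delta_seconds` is hypothetical, and this is not the actual httpd patch.

```python
from typing import Optional


def parse_delta_seconds(raw: str) -> Optional[int]:
    """Return the non-negative integer in `raw`, accepting 100 or "100"."""
    value = raw.strip()
    # Strip one pair of surrounding double quotes, if present.
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    # Only plain ASCII digits form a valid delta-seconds value.
    if not value.isascii() or not value.isdigit():
        return None
    return int(value)
```

With this, both `max-age=100` and `max-age="100"` would yield the same integer once the argument has been extracted, while a half-quoted value is still rejected.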

reschke · 2019-03-26T10:20:11Z

Awesome. Let's fix it :-)

royfielding · 2019-03-26T11:17:43Z

https://bz.apache.org/bugzilla/show_bug.cgi?id=63288

mnot · 2019-07-05T04:43:22Z

@MattMenke2 it'd be much safer to do that with an allow list, rather than a deny list.

Overall -- the reality here seems to be that for whatever reason, the majority of cache implementations (by a large margin) don't honour Cache-Control: max-age if the value is quoted. I strongly suspect that they treat things like stale-while-revalidate (where implemented) in the same way, despite this text:

Cache directives are identified by a token, to be compared case-insensitively, and have an optional argument, that can use both token and quoted-string syntax. For the directives defined below that define arguments, recipients ought to accept both forms, even if one is documented to be preferred. For any directive not defined by this specification, a recipient MUST accept both forms.

I understand that having these two forms being considered equivalent is architecturally more consistent and aesthetically more pleasing, but that is not the world we are living in, and I see no evidence that the world will change to match these expectations on any reasonable timeline.

If cache implementers were standing up and saying "yes, we should fix that bug" I would feel differently, but they're not; as far as I can see, they're both resource-poor and very reluctant to make arbitrary changes to products for little perceptible benefit.

Therefore, my proposal is:

Change the text above to:

Cache directives are identified by a token, to be compared case-insensitively, and have an optional argument, that can use either the token or quoted-string syntax. Directives will define which form is expected; senders MUST send in that form, and recipients MUST accept that form. Recipients MAY accept both forms (converting from one to the other where required and possible).

Update the existing directives defined in this specification to follow the approach outlined above.

reschke · 2019-07-05T06:26:44Z

Chrome has one global shared parser for handling quotes in header lines regardless of context, and breaking up lines with commas into multiple lines (with a blacklist of headers this doesn't work for).

That code breaks up Cache-Control: extension="max-age=100", max-age=3600

into:
Cache-Control: extension="max-age=100"
Cache-Control: max-age=3600

We then handle them as separate headers, and our (very simple) max-age parser has no trouble handling the second header.

The quote handling logic doesn't consider the context of the quotes (after an equal sign, after a semicolon, first character of a string, after a random alphanumeric character, etc). It treats them all as a start of a quoted string.

But the question remains why Chrome doesn't process

Cache-Control: extension="max-age=100", max-age="3600"

the same way as:

Cache-Control: extension="max-age=100", max-age=3600

MattMenke2 · 2019-07-06T00:44:18Z

I do apologize, but it's been 4 months, and I've since lost all context. While still working on network-adjacent stuff in Chrome I'm no longer on Chrome's network stack team (And even if I were, I'd be reluctant to dig into this code and relevant specs once every four months to pick up a stalled discussion). Unfortunately, that also means I'm not a good person to commit to sweeping changes to Chrome's header parsing logic.

reschke · 2019-09-30T10:51:58Z

Strongly disagreed.

A correct parser already MUST handle (ignore) extension directives, and to do that, it MUST handle both token and quoted-string values. Saying that this does not apply to certain existing directives only complicates the parser, and makes it more likely to fail on extensions.
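
As a rough sketch of what a single code path for all directives could look like (an illustration under stated assumptions, not any implementation discussed here): each directive is matched as a token with an optional token or quoted-string argument, and quoted arguments are unquoted before being stored. The names `parse_cache_control`, `_TOKEN`, `_QUOTED` and `_DIRECTIVE` are hypothetical, the regular expressions only approximate the HTTP grammar, and stopping at the first malformed element is an arbitrary choice.

```python
import re
from typing import Dict, Optional

# token and quoted-string, approximating the HTTP grammar.
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
_QUOTED = r'"(?:[^"\\]|\\.)*"'
_DIRECTIVE = re.compile(
    rf"\s*({_TOKEN})(?:=({_TOKEN}|{_QUOTED}))?\s*(?:,|$)"
)


def parse_cache_control(field_value: str) -> Dict[str, Optional[str]]:
    """Map lower-cased directive names to unquoted arguments (or None)."""
    directives: Dict[str, Optional[str]] = {}
    pos = 0
    while pos < len(field_value):
        match = _DIRECTIVE.match(field_value, pos)
        if match is None:
            break  # malformed remainder: stop rather than guess
        name, arg = match.group(1).lower(), match.group(2)
        if arg is not None and arg.startswith('"'):
            # Drop the surrounding quotes and undo backslash escapes.
            arg = re.sub(r"\\(.)", r"\1", arg[1:-1])
        directives.setdefault(name, arg)  # first occurrence wins
        pos = match.end()
    return directives
```

Because extension directives and known directives go through the same function, a caller that then runs `int()` on the `max-age` entry sees the same string whether or not the sender quoted it.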

mnot · 2019-09-30T10:56:09Z

What real-world cache-control directive uses quotes today?

reschke · 2019-09-30T11:11:49Z

Why is that relevant? Do you want to remove quoted-string as option in general?

reschke · 2019-09-30T11:15:06Z

I don't know that we're going to be able to get more data here. Can we resolve this with MUST NOT generate, SHOULD accept?

We currently say "MUST accept" for extension directives. Do you want to change that?

For existing directives, we say "ought to". Changing that to "SHOULD" is a normative change (that I would actually support). But then the inevitable question would be: under which circumstances is it ok to ignore the requirement?

mnot · 2019-10-01T05:04:43Z

We currently say "MUST accept" for extension directives. Do you want to change that?

For existing directives, we say "ought to". Changing that to "SHOULD" is a normative change (that I would actually support). But then the inevitable question would be: under which circumstances is it ok to ignore the requirement?

The discussion and the data gathered make me think the most realistic thing to do is to deprecate all quoted forms of cache-control directives. If I wanted to introduce a new cache-control directive that used quoting today -- especially if it had the possibility of a comma or other syntactically interesting component as payload -- I'd use a different header field.

I do not believe we're going to be able to get the majority of cache implementations to change their behaviour here; if they step up and say they will, I'll be happy to be wrong. Absent that, we should make the specification match reality, so it's relevant.

reschke · 2019-10-01T06:49:07Z

Well, we need to be clear what's being discussed. We currently have: "MUST accept extensions that might use quoted strings as params". Do you have evidence that existing current implementations get this wrong?

mnot · 2019-10-01T06:53:18Z

https://cache-tests.fyi/#cc-parse

reschke · 2019-10-01T07:01:41Z

Yes, I was looking at them. But the results seem to be "behaviour check results" or "test dependency failed", so I have no idea how to evaluate them...

mnot · 2019-10-01T07:20:51Z

The filled-in circle is "yes"; the empty circle is "no". The first test indicates that almost no cache honours max-age with quotes; only ATS and Safari do.

reschke · 2019-10-01T07:34:43Z

So "Does HTTP cache ignore the phrase max-age in a quoted string (before the "real" max-age)?" - which is about extension parameters with quoted strings is almost (ahem) universally passing - isn't this exactly the case we're discussing right now?

MattMenke2 · 2019-10-01T15:53:11Z

The reason why we treat

Cache-Control: extension="max-age=100", max-age="3600"

and

Cache-Control: extension="max-age=100", max-age=3600

differently is because the Cache-Control logic has its own parser.

We do have a shared parser for stuff in the format: foo=bar; foo2="bar2";..., but a lot of stuff doesn't use it. And I'm not sure we use it for anything that doesn't need to support semi-colon-separated entries.

Sorry for dropping this discussion - I was getting this discussion confused with a more general one on comma handling, which is something I don't have time to work on and don't feel like I can commit to on behalf of Chrome. Things limited to the parsing of a single header I certainly can work on, however.

Edit: Fixed paragraph breaks for readability.

tfpauly · 2019-11-18T09:23:14Z

Discussed in Singapore: want to get more info from cache implementations by January to understand this better.

davidben · 2019-11-18T09:46:02Z

@MattMenke2 already described this up in #128 (comment), but since folks in Singapore were asking what the clients were doing and there seems to be some confusion, I'll (re)clarify Chrome's behavior:

Chrome has generic logic splitting headers, including Cache-Control by commas, in a way which pays attention to quoting. So Cache-Control: extension="foo,max-age=10,bar", max-age=10 will be split correctly. This logic does not remove the quotes, it simply does a quote-aware split by comma and removes surrounding whitespace.

Various bits of Cache-Control logic parse the split values. The parser for timestamps like max-age and stale-while-revalidate simply looks for values with a case-insensitive ${DIRECTIVE}= prefix and tries to decode the remainder as an integer. Quotes have not been removed at this point, so quoted integers are rejected. This is consistent with the original definitions of those directives.
https://tools.ietf.org/html/rfc2616#section-14.9
https://tools.ietf.org/html/rfc5861#section-3
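
The behaviour described above can be sketched roughly as follows. This is an illustrative Python model written from the description in this thread, not Chrome's code; the function names `split_quote_aware` and `max_age_prefix_match` are hypothetical, and continuing to later list elements after a failed decode is an assumption.

```python
from typing import List, Optional


def split_quote_aware(field_value: str) -> List[str]:
    """Split on commas outside quoted strings; keep quotes, trim whitespace."""
    parts, current, in_quotes, escaped = [], [], False, False
    for ch in field_value:
        if escaped:
            escaped = False          # character after a backslash is literal
        elif in_quotes and ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            parts.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def max_age_prefix_match(field_value: str) -> Optional[int]:
    """Find max-age via a case-insensitive prefix match and integer decode."""
    prefix = "max-age="
    for part in split_quote_aware(field_value):
        if part.lower().startswith(prefix):
            rest = part[len(prefix):]
            # Quotes were not removed by the split, so "3600" fails here.
            if rest.isascii() and rest.isdigit():
                return int(rest)
    return None
```

Under this model, `extension="max-age=100", max-age=3600` yields 3600 because the quoted extension value is never mistaken for the directive, while `max-age="3600"` yields nothing because the remainder after the prefix still carries its quotes. That matches the pair of test results reported earlier in the thread.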

(This is based on source inspection just now. @MattMenke2 can correct me if I got this wrong.)

MattMenke2 · 2019-11-18T11:49:11Z

That's my reading as well, though I've never touched the cache code either.

Also, if we want double-quote handling in Cache-Control lines, it's not clear to me if we want semi-colon handling as well, which is part of every instance in which Chrome's general HTTP header quote handler is used, so is a non-optional part of Chrome's internal API for doing so.

Should "Cache-Control: max-age=10; min-dingos=2" be parsed as max-age=10, with an unsupported secondary parameter? If so, we'd be deviating even more from spec. If not, we'd be introducing a novel use of quotes in HTTP headers - supporting them in header values where they're never needed, and where semi-colons are not supported.

reschke · 2019-11-18T12:38:25Z

@MattMenke2 - no, Cache-Control does not have semicolon-separated characters.

reschke · 2019-11-18T12:44:37Z

@davidben - thanks for the explanation (and apologies for not processing the earlier feedback properly).

Two comments/questions on the generic split-by-comma code:

1. I assume it will process an escaped double quote inside quoted-string properly?
2. As @mnot said earlier: this really ought to be based on a white list; nobody stops people from defining fields where this code will do the wrong thing (and maintaining a black list would really not scale)

As a general comment: so yes, I do now understand that the design of the parser predates RFC 7234, and just tries to implement RFC 2616. In particular, extracting the actual values uses specific logic per directive.

The intent of the change in RFC 7234 was to allow implementers to get rid of special cases: to be able to use the same code path for all parameters. Was it ever considered to implement this? If so, what led to the decision not to?

MattMenke2 · 2019-11-18T12:56:04Z

Not supporting semi-colons itself seems like a special case, no? So rather than a list of headers that do/do not support general parsing, we'd need a list of headers that don't adhere to the pattern, a list of ones that don't support semi-colons, and a list of those that do, no? The general behavior in the case of unsupported parameters is just to ignore them, rather than to ignore the entire header.

reschke · 2019-11-18T13:26:45Z

Well, at the end of the day there is no common syntax in HTTP header fields. Sometimes it looks like that, but there are a lot of subtle differences. Yes, that's bad (and that's why we are trying to improve things for new header fields).

And yes, Cache-Control (comma-separated list name/value pairs) is very different from things like Content-Type (comma-separated list of identifiers + optional parameters).

I can see that it would be nice to use a generic parser here, and just either drop parameters or fail if some are present. But even in that case, you'd probably accept more than what should be accepted (for instance: "Cache-Control: foo;,bar").

What should be possible is to build parsers from common components, or to have a single parser that is sufficiently parametrized.

MattMenke2 · 2019-11-18T13:30:57Z

Oh, and as for whether anyone on the Chrome team considered updating behavior to match RFC 7234 - neither David nor I work on the cache, but I don't think anyone from the network stack team, at least, was aware of this change from RFC 2616.

davidben · 2019-11-18T13:47:39Z

I assume it will process an escaped double quote inside quoted-string properly?

Looking at the code, I believe so. I don't see a test for it at the HTTP layer function, but the lower-level thing it uses does have logic and tests for escapes.

reschke · 2019-11-18T13:49:53Z

...and in fact it doesn't show up in https://greenbytes.de/tech/webdav/rfc7234.html#changes.from.rfc.2616, which might be one reason why it has been ignored (yes, one reason :-).

This makes me think that advertising this change more would be better than reverting it.

davidben · 2019-12-11T03:43:02Z

An observation: Assuming I'm reading the grammars correctly, Cache-Control aligns with a structured header sh-dict. If we were to make it an sh-dict, integer-valued keys like max-age would use a value of sh-integer. This would mean max-age=1234 is okay and max-age="1234" is not.

If so, that suggests reverting it, better aligning with both running code and structured headers.

bsdphk · 2020-01-03T10:10:29Z

seconded.

reschke · 2020-01-03T11:13:47Z

I don't see how this is relevant for Cache-Control. We can't make normative protocol changes based on the design of a new header field syntax.

FWIW, I disagree with:

This would mean max-age=1234 is okay and max-age="1234" is not.

...whether that is okay or not would depend on the definition of the header field. It could accept both.

mnot · 2020-02-02T10:21:28Z

Discussed in Basel.

For directives we define, upgrade SHOULD NOT generate quoted form to MUST NOT.
Remove the sentence "For any directive not defined by this specification, a recipient must accept both forms."

Fixes #128

mnot added the caching label Aug 2, 2018

mnot added the discuss label Oct 9, 2018

mnot removed the discuss label Nov 12, 2018

mnot self-assigned this Nov 12, 2018

mnot added the discuss label Feb 26, 2019

mnot removed the discuss label Apr 12, 2019

mnot added the discuss label Jul 5, 2019

mnot added a commit that referenced this issue Feb 3, 2020

Change requirements for handling different forms of cache directives

d5d9483

Fixes #128

mnot mentioned this issue Feb 3, 2020

Change requirements for handling different forms of cache directives #288

Merged

mnot added has-proposal and removed discuss labels Feb 3, 2020

reschke closed this as completed in #288 Feb 4, 2020
