-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net/url: URL.String() hashbang "#!" encodes as "#%21" #19917
Comments
@dragonsinth is there some RFC that we could quote/refer to that defines |
Sure, in RFC 2396 (https://www.ietf.org/rfc/rfc2396.txt):
RFC3986 (https://www.ietf.org/rfc/rfc3986.txt) then later redefines several of those characters (including If you read down to appendix A of RFC3986, "Collected ABNF for URI", you'll find the following rules:
As you can see, sub-delims is distinct from pct-encoded and they are allow in the fragment without encoding. For further reference, Google Search build a whole spec on it a while ago (though it's now deprecated, but lots of app still use this): https://developers.google.com/webmasters/ajax-crawling/docs/getting-started |
The sub-delims isn't handled properly which affects more than just fragments. |
Change https://golang.org/cl/61650 mentions this issue: |
Can you provide a specific example? |
Unless my reading of RFC 3986 is incorrect, sub-delims is part of the reserved set as well. |
@fraenkel I should have been more clear: I meant, are there any examples where programs are broken because Go escapes sub-delims in the Path? You've observed that This issue is about sub-delims in fragments. @dragonsinth has convinced me that we need to support unescaped
The first approach avoids adding public APIs. Both proposals might possibly break user code, but the second approach breaks less code. The first approach changes the semantics of URL.Fragment, while the second approach keeps the current semantics of URL.Fragment -- a caller must switch to RawFragment and EscapedFragment to get the new escaping behavior. The only code potentially broken by the second approach is code like the following, where the caller is expecting that u.String() will never contain an unescaped sub-delim: u, err := url.Parse(...)
foo(u.String()) I lean towards the second approach, despite my general dislike of the API, because it's the same pattern as RawPath and that change seemed to work out OK. Also, I'm paranoid of breaking user code. Thoughts? /cc @rsc for the API question |
i strongly prefer the first approach; a user should have no reasonable expectation that Here's how we're working around this today:
|
In case I wasn't clear, in both proposals, Also, here is some code that would be broken by the first proposal: var u url.URL
u.Fragment = `'x`
fmt.Sprintf("<a href='%s'>", u.String()) // prints <a href=''x'> because ' is a sub-delim You could argue that no one should write code like this. I mention this because we actually found code similar to the above in net/http/fs.go. It's possible (likely, even) that code like the above exists in the wild today. This code happens to work today because u.String() will escape the single quote. My first proposal will break the above code. My second proposal will not. |
Ahh, that makes sense. Thanks! In that case, Fragment + RawFragment seems good to me; it mirrors Path and Query. It seems like a tiny break in behavior to make it net more sensible. But I'm not super strongly opinionated here as long as u.String() is great. |
I can make something up.
userinfo should not escape ! since it too allows sub-delims. I haven't seen anyone complain. |
Sub-delims in Path work just fine. That was fixed in go1.5, via #5684, which added RawPath and EncodedPath, similarly to a solution proposed above for Fragment. I am not concerned about userinfo given that userinfo is essentially deprecated. |
What would be the replacement for userinfo? |
I would wait for an actual report (rather than a hypothetical) before doing anything about userinfo, especially given that userinfo is essentially deprecated. |
Vote to second approach. Looks reasonable. |
I didn't get to thinking about this for Go 1.10, sorry. To go down the RawFragment/EscapedFragment path I think we would need to have evidence that something cares about the distinction between #!wut and #%21wut. Does anything care? It really really really should not. (This is in contrast to paths, where the RFCs explicitly allow caring about the distinction between %2F and /, even though it's impossible to tell them apart if you only record a single string.) |
I super care, because browsers / javascript see the two as different values. https://www.google.com/#!wat (Ultimately that means round tripping a url from a browser -> go URL -> browser changes the value seen by javascript code.) |
There is an issue for this. |
Any progress on this? |
Sweet! |
In Go 1.11, the issue golang/go#19917 has been resolved, and "#!" is now encoded as "#!" rather than "#%21". Follows golang/go@8a33045.
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?1.7.4
What did you do?
https://play.golang.org/p/R-4KVNXa9Z
What did you expect to see?
http://example.org/#!wut
What did you see instead?
http://example.org/#%21wut
Extra info
#6784
#5684
Looks like the first issue was marked a duplicate of the second, but the underlying problem was never fixed.
The text was updated successfully, but these errors were encountered: