Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert host to lowercase on URL building #386

Closed
asvetlov opened this issue Nov 27, 2019 · 8 comments
Closed

Convert host to lowercase on URL building #386

asvetlov opened this issue Nov 27, 2019 · 8 comments

Comments

@asvetlov
Copy link
Member

MacOS can return CamelCased FQDN for the local host, e.g. Mac-1137.local.
At least I see it on Azure CI for aiohttp project.

IIRC the name should be lower-cased.
Well, technically the upper case is supported still, the comparison rule is case-insensitive.
But lower-casing is highly recommended, yarl should apply this conversion.

@webknjaz
Copy link
Member

I recall hitting something similar in CherryPy/Cheroot. Let me check how I dealt with it.

@webknjaz
Copy link
Member

So according to my old research, the user can put a lot of random stuff, including Unicode range chars, as a hostname on the OS level: cherrypy/cheroot#27 (comment).
And it'll be returned as-is via socket.gethostname() w/o any validation.

I think this issue may need to be extended with that in mind.

@asvetlov
Copy link
Member Author

hostname is IDNA-encoded in yarl, so weird chars should go.
IDNA doesn't convert to lower case, this is a problem I've faced.

@webknjaz
Copy link
Member

I think the comparison should be case-insensitive but the internal representation could be stored as provided on input.

@asvetlov
Copy link
Member Author

asvetlov commented Nov 27, 2019

There is no case-insensitive comparison operator is Python.
People always use a == b and a != b for such things.

@webknjaz
Copy link
Member

I thought you were talking about implementing a custom __eq__() and friends.

@asvetlov
Copy link
Member Author

__eq__ requires a class for case insensitive string.
The ci-str class should define __hash__ to be used as a dictionary key.
The hash is a number, different strings produce different hashes.
Python assumes that objects are not equal if their hashes are not equal, this knowledge is widely utilized by dict implementation for example.
So, hash(ci_str('aBc')) has a high chance to be different from any form of hash('abc') etc. It leads to broken in check.

That's why case-insensitive strings are error-prone when implemented in Python.
Choosing a canonical form and conversion to it is much easier and safer.

@asvetlov
Copy link
Member Author

That's interesting, we have tests for lowercased host for years:

def test_lowercase():
    url = URL("http://github.com")
    assert url.raw_host == "github.com"
    assert url.host == url.raw_host

def test_lowercase_nonascii():
    url = URL("http://Айдеко.Рф")
    assert url.raw_host == "xn--80aidohy.xn--p1ai"
    assert url.host == "айдеко.рф"

I think, all that we need is lowercasing the host always.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants