url validator, exotic _wildcard_ issue #315
$ python -c "from validators import __version__; print(__version__)"
0.22.0
$ python -c "from validators import url; print(url('hxxps://wildcard.jetsett.in/wwwmicrfofst/pohwernetsharaven8726skkksusukjs9916192629900177771017000172610720027111113'))"
ValidationError(func=url, args={'value': 'hxxps://wildcard.jetsett.in/wwwmicrfofst/pohwernetsharaven8726skkksusukjs9916192629900177771017000172610720027111113'})
$ python -c "from validators import url; print(url('https://wildcard.jetsett.in/wwwmicrfofst/pohwernetsharaven8726skkksusukjs9916192629900177771017000172610720027111113'))"
True
I do not understand, it works as expected.
I get that hxxp/hxxps is just a convention. I only wanted to raise an issue about the http version. The Markdown syntax in this issue removed the underscores from the url with
When I run this I get this output:
Hey @jeroengui the domain Because the following is valid according to
$ python -c "from validators import url; print(url('https://_wildcard._jetsett.in/wwwmicrfofst/pohwernetsharaven8726skkksusukjs9916192629900177771017000172610720027111113', rfc_2782=True))"
True
Are you sure it's not
I'm not sure about the standard, and the more I read, the more confused I get. But the thing is: most browsers accept URLs with subdomains like

At the moment this poses a real threat, because a lot of (antivirus) companies use validators for URLs just like this one. I don't know whether this is something that should be fixed in the browsers, in the standard, or in the validators, and the issue is definitely bigger than just this implementation.

So to be clear: We are figuring out if We agree that So an underscore can be used anywhere in the subdomain except right before the dot?

Stack Overflow has also had some discussion on the topic: https://stackoverflow.com/questions/2180465/can-domain-name-subdomains-have-an-underscore-in-it

EDIT: I added an example DNS record on my own domain that seems to work fine in most browsers: here (
Some more information: It seems like SSL certificates cannot be generated for
https://cabforum.org/2018/11/12/ballot-sc12-sunset-of-underscores-in-dnsnames/
This patch
- + r"(?:[a-zA-Z0-9-_]{0,61}[A-Za-z0-9])?\.)"
+ + rf"(?:[a-zA-Z0-9-_]{{0,61}}[A-Za-z0-9{'_'if rfc_2782 else ''}])?\.)"
in validators/src/validators/domain.py, line 45 (commit 854375f), will work. You'll have to add
PR is welcome.
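For anyone following along, here is a rough sketch of what the patched line changes, using only the standard `re` module. The helper names `label_regex` and `label_ok` are made up for illustration, and the first-character class is an assumption on my part; only the second regex fragment is taken from the patch quoted above. It is not the library's actual code path.

```python
import re


def label_regex(rfc_2782: bool = False) -> re.Pattern:
    """Hypothetical helper: match one DNS label followed by a dot."""
    underscore = "_" if rfc_2782 else ""
    return re.compile(
        # First character of the label (assumed, not part of the quoted patch).
        rf"[a-zA-Z0-9{underscore}]"
        # Middle characters and last character, following the patched line:
        # the last character may be an underscore only when rfc_2782 is set.
        rf"(?:[a-zA-Z0-9-_]{{0,61}}[A-Za-z0-9{underscore}])?\."
    )


def label_ok(label: str, rfc_2782: bool = False) -> bool:
    """Return True if the whole label (plus trailing dot) matches."""
    return label_regex(rfc_2782).fullmatch(label + ".") is not None


# Print each label with its result in default mode and in rfc_2782 mode.
for label in ("wildcard", "wild_card", "wildcard_", "_wildcard_"):
    print(label, label_ok(label), label_ok(label, rfc_2782=True))
```

Under these assumptions, an underscore in the middle of a label passes in both modes, while a leading or trailing underscore (as in `_wildcard_`) passes only with `rfc_2782=True`.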
I use the validators library in a URL scanning project. But this URL, which works in the browser, isn't recognized as a URL (I replaced the tt with xx because it's a malicious sample):
hxxps://wildcard.jetsett.in/wwwmicrfofst/pohwernetsharaven8726skkksusukjs9916192629900177771017000172610720027111113/
I suppose the _wildcard_ part of the domain has something to do with it...