url validator, exotic _wildcard_ issue #315

Closed
jeroengui opened this issue Nov 1, 2023 · 7 comments · Fixed by #339
Labels
enhancement Issue/PR: A new feature expected Issue: Works as designed outdated Issue/PR: Open for more than 3 months

Comments

@jeroengui

I use the validators library in a URL scanning project, but this URL, which works in the browser, is not recognized as a URL (I replaced the tt with xx because it is a malicious sample):

hxxps://wildcard.jetsett.in/wwwmicrfofst/pohwernetsharaven8726skkksusukjs9916192629900177771017000172610720027111113/

I suppose the _wildcard_ part of the domain has something to do with it...

@yozachar
Collaborator

yozachar commented Nov 2, 2023

$ python -c "from validators import __version__; print(__version__)"
0.22.0

$ python -c "from validators import url; print(url('hxxps://wildcard.jetsett.in/wwwmicrfofst/pohwernetsharaven8726skkksusukjs9916192629900177771017000172610720027111113'))" 
ValidationError(func=url, args={'value': 'hxxps://wildcard.jetsett.in/wwwmicrfofst/pohwernetsharaven8726skkksusukjs9916192629900177771017000172610720027111113'})

$ python -c "from validators import url; print(url('https://wildcard.jetsett.in/wwwmicrfofst/pohwernetsharaven8726skkksusukjs9916192629900177771017000172610720027111113'))"
True

I do not understand; it works as expected. hxxp / hxxps is a defanging convention, not a valid URL scheme.
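
If defanged samples need to be validated, the scheme can be restored first. A minimal sketch, assuming only the scheme was defanged (tt replaced by xx); `refang` is a hypothetical helper and not part of validators:

```python
import re

# Matches a defanged scheme at the start of the string: hxxp:// or hxxps://
_DEFANGED_SCHEME = re.compile(r"^hxxp(s?)://", re.IGNORECASE)


def refang(url: str) -> str:
    """Turn hxxp:// or hxxps:// back into http:// or https://.

    Hypothetical helper: it only handles the scheme, not other
    defanging styles such as [.] in the host name.
    """
    return _DEFANGED_SCHEME.sub(r"http\1://", url, count=1)


print(refang("hxxps://wildcard.jetsett.in/wwwmicrfofst/"))
```

The result can then be passed to the URL validator as usual; URLs that already use http or https are returned unchanged.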

@yozachar yozachar added waiting Issue/PR: Wating for reply expected Issue: Works as designed labels Nov 2, 2023
@jeroengui
Author

I get that hxxp/hxxps is just a convention. I only wanted to raise an issue about the http version.

The Markdown syntax in this issue removed the underscores from the url with _wildcard_, sorry for that.

import validators
print(validators.url('https://_wildcard_.jetsett.in/wwwmicrfofst/pohwernetsharaven8726skkksusukjs9916192629900177771017000172610720027111113/'))

When I run this I get this output:

ValidationError(func=url, args={'value': 'https://_wildcard_.jetsett.in/wwwmicrfofst/pohwernetsharaven8726skkksusukjs9916192629900177771017000172610720027111113/'})

@yozachar yozachar added bug Issue: Works not as designed and removed waiting Issue/PR: Wating for reply expected Issue: Works as designed labels Nov 3, 2023
@yozachar

This comment was marked as outdated.

@yozachar yozachar added this to the 0.23.1 milestone Mar 19, 2024
@yozachar yozachar added the outdated Issue/PR: Open for more than 3 months label Mar 19, 2024
@yozachar
Collaborator

yozachar commented Mar 19, 2024

Hey @jeroengui, the domain _wildcard_.jetsett.in seems to be invalid.

By contrast, the following is valid according to rfc_2782 (TL;DR: underscores are prepended, not appended):

$ python -c "from validators import url; print(url('https://_wildcard._jetsett.in/wwwmicrfofst/pohwernetsharaven8726skkksusukjs9916192629900177771017000172610720027111113', rfc_2782=True))"
True

Are you sure it's not _wildcard._jetsett.in instead?

@yozachar yozachar added waiting Issue/PR: Wating for reply and removed bug Issue: Works not as designed labels Mar 19, 2024
@yozachar yozachar removed this from the 0.23.1 milestone Mar 19, 2024
@jeroengui
Author

jeroengui commented Mar 19, 2024

I'm not sure about the standard, and the more I read, the more confused I get, but the thing is:

Most browsers accept URLs with subdomains like _wildcard_.jetsett.in, and malicious actors use them in phishing campaigns, etc.

At the moment this poses a real threat, because a lot of (antivirus) companies use URL validators just like this one.
The URLs never make it onto their blocklists, exposing their users to more risk, because these malicious URLs work fine in their browsers.

I don't know if this is something that should be fixed in the browsers, in the standard or in the validators, and the issue is definitely bigger than just this implementation.

So to be clear:

We are figuring out if http://_wildcard_.domain.com is a valid url.

We agree that http://_wildcard.jetsett.in and https://my_sarisari_store.typepad.com/ are valid urls.

So an underscore can be used in any place of the subdomain except the one right before the dot?

Stack Overflow has also had some discussion on the topic: https://stackoverflow.com/questions/2180465/can-domain-name-subdomains-have-an-underscore-in-it

EDIT: I added an example DNS record on my own domain that seems to work fine in most browsers: https://_test_.jeroengui.be/

@jeroengui
Author

jeroengui commented Mar 19, 2024

Some more information: it seems that SSL certificates cannot be issued for _wildcard_.domain.com, but a wildcard certificate like *.domain.com works just fine for _wildcard_.domain.com.

https://cabforum.org/2018/11/12/ballot-sc12-sunset-of-underscores-in-dnsnames/
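
Why a wildcard certificate still covers such a host can be illustrated with a simplified matcher. This is only a sketch of the common "wildcard replaces exactly the leftmost label" rule, not what any browser or TLS library actually runs; `wildcard_matches` is a hypothetical name:

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified certificate-name matching.

    Assumption: '*' may only stand for the entire leftmost label.
    The match is purely textual, so underscores in the host name
    play no role in the comparison.
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] != "*":
        return p_labels == h_labels
    # The wildcard must cover a non-empty leftmost label; the rest must be equal.
    return bool(h_labels[0]) and p_labels[1:] == h_labels[1:]


print(wildcard_matches("*.domain.com", "_wildcard_.domain.com"))
```

Under this rule the certificate name itself contains no underscore, so a restriction on underscores in certificate names would not apply to it, while the host with underscores still matches.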

@yozachar yozachar added enhancement Issue/PR: A new feature expected Issue: Works as designed and removed waiting Issue/PR: Wating for reply labels Mar 20, 2024
@yozachar
Collaborator

  • rfc_1034 does not mention underscores at all.
  • rfc_2782 talks about prepending underscores.

So appending an underscore to a sub/domain name is not encouraged, but it is technically valid.

This patch:

- + r"(?:[a-zA-Z0-9-_]{0,61}[A-Za-z0-9])?\.)"
+ + rf"(?:[a-zA-Z0-9-_]{{0,61}}[A-Za-z0-9{'_' if rfc_2782 else ''}])?\.)"

applied to this line:

+ r"(?:[a-zA-Z0-9-_]{0,61}[A-Za-z0-9])?\.)"

will work.
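
To see what the patch changes, here is a standalone sketch using only the standard `re` module. The first-character class and the `label_pattern` helper are assumptions made for illustration; only the tail of the label pattern is taken from the patch above, and this is not the library's full domain regex:

```python
import re


def label_pattern(allow_trailing_underscore: bool) -> re.Pattern:
    """Build a pattern for one dot-terminated label (hypothetical helper)."""
    # Assumed first-character class; the real prefix is not shown in this thread.
    first = r"[a-zA-Z0-9_]"
    # Last character of the label: the patch adds '_' here when rfc_2782 is set.
    last = rf"[A-Za-z0-9{'_' if allow_trailing_underscore else ''}]"
    return re.compile(rf"{first}(?:[a-zA-Z0-9-_]{{0,61}}{last})?\.")


strict = label_pattern(False)   # behaviour before the patch
relaxed = label_pattern(True)   # behaviour after the patch with rfc_2782=True

for label in ("_wildcard.", "my_sarisari_store.", "_wildcard_."):
    print(label, bool(strict.fullmatch(label)), bool(relaxed.fullmatch(label)))
```

Only the label ending in an underscore is treated differently: the strict pattern rejects it and the relaxed one accepts it, while labels with leading or inner underscores pass in both cases.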

You'll have to add ("example_.com", False, True) to the pytest parameter list of test_returns_true_on_valid_domain() in test_domain.py.

PR is welcome.
