
Quoted local parts fail validation; reserved R-LDH labels (double dashes) fail validation #92

Closed
spaceone opened this issue Nov 14, 2022 · 6 comments

Comments

@spaceone

Simply running the valid examples from Wikipedia produces errors:

https://en.wikipedia.org/wiki/Email_address#Examples

```
>>> validate_email('" "@example.org')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/email_validator/__init__.py", line 313, in validate_email
    allow_empty_local=allow_empty_local)
  File "/usr/lib/python3/dist-packages/email_validator/__init__.py", line 424, in validate_email_local_part
    raise EmailSyntaxError("The email address contains invalid characters before the @-sign: %s." % bad_chars)
email_validator.EmailSyntaxError: The email address contains invalid characters before the @-sign: QUOTATION MARK, SPACE.
>>> validate_email('"john..doe"@example.org')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/email_validator/__init__.py", line 313, in validate_email
    allow_empty_local=allow_empty_local)
  File "/usr/lib/python3/dist-packages/email_validator/__init__.py", line 424, in validate_email_local_part
    raise EmailSyntaxError("The email address contains invalid characters before the @-sign: %s." % bad_chars)
email_validator.EmailSyntaxError: The email address contains invalid characters before the @-sign: FULL STOP, QUOTATION MARK.
>>> validate_email('"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/email_validator/__init__.py", line 304, in validate_email
    raise EmailSyntaxError("The email address is not valid. It must have exactly one @-sign.")
email_validator.EmailSyntaxError: The email address is not valid. It must have exactly one @-sign.
>>> validate_email('postmaster@[123.123.123.123]')
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/idna/core.py", line 327, in uts46_remap
    raise IndexError()
IndexError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/email_validator/__init__.py", line 489, in validate_email_domain_part
    domain = idna.uts46_remap(domain, std3_rules=False, transitional=False)
  File "/usr/lib/python3/dist-packages/idna/core.py", line 332, in uts46_remap
    _unot(code_point), pos + 1, repr(domain)))
idna.core.InvalidCodepoint: Codepoint U+005B not allowed at position 1 in '[123.123.123.123]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/email_validator/__init__.py", line 319, in validate_email
    domain_part_info = validate_email_domain_part(parts[1], test_environment=test_environment, globally_deliverable=globally_deliverable)
  File "/usr/lib/python3/dist-packages/email_validator/__init__.py", line 491, in validate_email_domain_part
    raise EmailSyntaxError("The domain name %s contains invalid characters (%s)." % (domain, str(e)))
email_validator.EmailSyntaxError: The domain name [123.123.123.123] contains invalid characters (Codepoint U+005B not allowed at position 1 in '[123.123.123.123]').
>>> validate_email('postmaster@[IPv6:2001:0db8:85a3:0000:0000:8a2e:0370:7334]')
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/idna/core.py", line 327, in uts46_remap
    raise IndexError()
IndexError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/email_validator/__init__.py", line 489, in validate_email_domain_part
    domain = idna.uts46_remap(domain, std3_rules=False, transitional=False)
  File "/usr/lib/python3/dist-packages/idna/core.py", line 332, in uts46_remap
    _unot(code_point), pos + 1, repr(domain)))
idna.core.InvalidCodepoint: Codepoint U+005B not allowed at position 1 in '[IPv6:2001:0db8:85a3:0000:0000:8a2e:0370:7334]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/email_validator/__init__.py", line 319, in validate_email
    domain_part_info = validate_email_domain_part(parts[1], test_environment=test_environment, globally_deliverable=globally_deliverable)
  File "/usr/lib/python3/dist-packages/email_validator/__init__.py", line 491, in validate_email_domain_part
    raise EmailSyntaxError("The domain name %s contains invalid characters (%s)." % (domain, str(e)))
email_validator.EmailSyntaxError: The domain name [IPv6:2001:0db8:85a3:0000:0000:8a2e:0370:7334] contains invalid characters (Codepoint U+005B not allowed at position 1 in '[IPv6:2001:0db8:85a3:0000:0000:8a2e:0370:7334]').
```
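The last two examples are address literals (an IP address in square brackets, per RFC 5321), and the tracebacks show them being handed straight to `idna.uts46_remap`, which is why the error mentions U+005B (`[`). As an illustration only, here is a stdlib sketch of how a bracketed literal could be recognized and parsed with `ipaddress` before any IDNA processing. `parse_domain_literal` is a hypothetical helper, not email_validator's API, and it ignores RFC 5321's general tagged address-literal form.

```python
import ipaddress


def parse_domain_literal(domain: str):
    """Hypothetical helper: parse an RFC 5321 address literal such as
    '[192.0.2.1]' or '[IPv6:2001:db8::1]'.

    Returns None if `domain` is not bracketed (an ordinary domain name).
    Raises ValueError if it is bracketed but holds no valid IP address.
    """
    if not (domain.startswith("[") and domain.endswith("]")):
        return None  # not a literal; ordinary domain-name checks apply
    inner = domain[1:-1]
    if inner[:5].lower() == "ipv6:":
        # IPv6 literals carry an "IPv6:" tag in RFC 5321
        return ipaddress.IPv6Address(inner[5:])
    # Untagged literals are IPv4 dotted-quad addresses
    return ipaddress.IPv4Address(inner)


print(parse_domain_literal("[IPv6:2001:0db8:85a3:0000:0000:8a2e:0370:7334]"))
```

`ipaddress` raises `AddressValueError` (a `ValueError` subclass) for malformed addresses, so a caller could turn that into a friendly syntax error instead of an IDNA code point error.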
@spaceone
Author

See also #77

@JoshData
Owner

This is covered in the README. Obsolete forms of email addresses are rejected because they are likely to cause trouble in most situations that the library is intended for.

@jenstroeger

jenstroeger commented Jan 4, 2023

Following up on issue pydantic/pydantic#4910, an email address produced by Hypothesis’ `emails()` strategy also fails to validate: `0@a0--0.ac`

@JoshData
Owner

JoshData commented Jan 4, 2023

Hypothesis is (probably?) wrong. RFC 5890 says:

> Labels within the class of R-LDH labels that are not prefixed with "xn--" are also not valid IDNA labels. To allow for future use of mechanisms similar to IDNA, those labels MUST NOT be processed as ordinary LDH labels by IDNA-conforming programs and SHOULD NOT be mixed with IDNA labels in the same zone.

("LDH labels" are defined as parts of a hostname between dots (in this case a0--0). "R-LDH labels" are defined as those that have dashes in the 3rd and 4th characters.)

I understand this paragraph to mean that the pattern "??--" other than "xn--" (which is for IDNA) is reserved for future use and should be rejected today as invalid by applications that handle internationalized domains according to this RFC, which this library does.

This is the opinion of the developer of the Python IDNA library at kjd/idna#27 who knows more about this than I do, so I'm really just going with what he says there.
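The R-LDH rule quoted above boils down to a check on the 3rd and 4th characters of each label. A minimal sketch, assuming the domain has already been split off the address; `classify_label` and `reserved_labels` are hypothetical names used only for illustration.

```python
def classify_label(label: str) -> str:
    """Hypothetical helper: classify one hostname label using the RFC 5890
    hyphen rule (hyphens in the 3rd and 4th character positions)."""
    if label[2:4] != "--":
        return "LDH"       # ordinary label, no reserved hyphen pattern
    if label[:4].lower() == "xn--":
        return "XN"        # the one R-LDH prefix currently used, by IDNA
    return "reserved"      # any other '??--' label: reserved for future use


def reserved_labels(domain: str) -> list:
    """Return the labels of `domain` that use a reserved '??--' pattern."""
    return [label for label in domain.split(".") if classify_label(label) == "reserved"]


# The domain part of the address Hypothesis generated:
print(reserved_labels("0@a0--0.ac".rsplit("@", 1)[1]))  # ['a0--0']
```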

@jenstroeger

Hmm… tagging @Zac-HD and @DRMacIver from Hypothesis to chime in.

@Zac-HD

Zac-HD commented Jan 6, 2023

Thanks for the ping! Reading RFC 5890 indicates that `a0--0` is a "reserved LDH label" (indeed the minimal such label, according to Hypothesis), and that the only valid R-LDH labels are those beginning with `xn--` and ending with a Punycode suffix (which Hypothesis does not attempt to generate).

I'll ship a bugfix to exclude such invalid domain labels shortly, and really appreciate the report 😍
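The point that valid R-LDH labels are `xn--` labels followed by a Punycode suffix can be roughly checked with the standard library alone. A minimal sketch, assuming Python's built-in `punycode` codec (RFC 3492); `is_plausible_xn_label` is a hypothetical name, and this is much weaker than real IDNA 2008 validation as done by the `idna` package.

```python
def is_plausible_xn_label(label: str) -> bool:
    """Hypothetical, stdlib-only helper: does an 'xn--' label carry a
    Punycode suffix that decodes to non-ASCII text and re-encodes to the
    same label? Not full IDNA 2008 validation: no code point, bidi or
    contextual rules are checked."""
    if label[:4].lower() != "xn--":
        return False
    try:
        unicode_label = label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        return False  # non-ASCII input, or not a decodable Punycode string
    if unicode_label.isascii():
        return False  # an A-label must encode at least one non-ASCII character
    # Reject non-canonical encodings by requiring an exact round trip
    reencoded = "xn--" + unicode_label.encode("punycode").decode("ascii")
    return reencoded.lower() == label.lower()


print(is_plausible_xn_label("xn--bcher-kva"))  # True ("bücher")
print(is_plausible_xn_label("a0--0"))          # False (not an xn-- label)
```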

JoshData closed this as not planned Jan 16, 2023
JoshData added a commit that referenced this issue Feb 4, 2023
…n checks

Including invalid RFC 5890 R-LDH labels (e.g. '??--' other than 'xn--'), see #92.

The IDNA library will check this, but its error messages are not friendly, and for future-proofing it's better not to assume it does any general syntax checks.
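Not relying on the IDNA library for general syntax checks can be sketched as a few explicit checks, each with a friendly message, run before any IDNA call. This is a hypothetical helper (`domain_syntax_errors`), not email_validator's code; it covers ASCII labels only, and the R-LDH rule would be one more check in the same loop.

```python
import re

# LDH label: letters, digits and hyphens, 1 to 63 characters,
# not starting or ending with a hyphen.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def domain_syntax_errors(domain: str) -> list:
    """Hypothetical helper: return friendly error messages for an ASCII
    domain name without depending on the idna package for syntax checks."""
    errors = []
    if len(domain) > 253:
        # 253 characters is the usual text-form limit (255 octets on the wire)
        errors.append("The domain name is too long.")
    for label in domain.split("."):
        if not label:
            errors.append("The domain name has an empty label (for example '..').")
        elif label.startswith("-") or label.endswith("-"):
            errors.append(f"The label {label!r} cannot start or end with a hyphen.")
        elif len(label) > 63:
            errors.append(f"The label {label!r} is longer than 63 characters.")
        elif not _LDH_LABEL.fullmatch(label):
            errors.append(f"The label {label!r} has characters other than letters, digits and hyphens.")
    return errors


print(domain_syntax_errors("exa_mple..com"))
```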
JoshData changed the title from "valid email adresses are markes as invalid" to "Quoted local parts fail validation; reserved R-LDH labels (double dashes) fail validation" Feb 28, 2023
JoshData added a commit that referenced this issue Apr 11, 2023
…ith better exception messages

People have opened issues several times about quoted local parts being incorrectly rejected. We can give a better error when this happens, and head off questions about it, by parsing them so that we know when they occur.

* When splitting the address into a local part and a domain part, detect when a quoted-string local part might be present (i.e. the local part contains quoted @-signs) rather than giving an error message about multiple @-signs.
* Remove the surrounding quotes and un-escape the string before checking the syntax of the local part. Return the un-quoted and un-escaped string as the normalized local_part in the returned ValidatedEmail object if it's valid as an unquoted local part.
* Check for invalid characters in the quoted-string (per the spec and our additional Unicode character checks) and raise exceptions.
* Add a new option to accept quoted-string local parts which is off by default. When accepting them, apply Unicode normalization as per dot-atom internationalized addresses and apply minimal backslash escaping.
* Update tests.

See #54, #92.
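The first three bullets (splitting while respecting quoted @-signs, then un-quoting and un-escaping) can be sketched in plain Python. The function names are hypothetical, the dot-atom pattern is a simplified ASCII-only reading of RFC 5322 `atext`, and the Unicode checks and normalization the commit mentions are omitted.

```python
import re

# Simplified ASCII dot-atom: runs of RFC 5322 atext separated by single dots
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
_DOT_ATOM = re.compile(_ATEXT + r"(?:\." + _ATEXT + r")*")


def split_address(address: str) -> tuple:
    """Hypothetical helper: split at the one @-sign outside a quoted string."""
    in_quotes = escaped = False
    at_index = None
    for i, ch in enumerate(address):
        if escaped:
            escaped = False            # character after a backslash is literal
        elif in_quotes and ch == "\\":
            escaped = True             # quoted-pair: escape the next character
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "@" and not in_quotes:
            if at_index is not None:
                raise ValueError("It must have exactly one @-sign outside quotes.")
            at_index = i
    if in_quotes or at_index is None:
        raise ValueError("Unterminated quoted string or missing @-sign.")
    return address[:at_index], address[at_index + 1:]


def unquote_local_part(local: str) -> str:
    """Hypothetical helper: remove surrounding quotes, undo backslash escapes."""
    if len(local) < 2 or not (local.startswith('"') and local.endswith('"')):
        return local
    body, out, i = local[1:-1], [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            i += 1                     # drop the backslash, keep what follows
        out.append(body[i])
        i += 1
    return "".join(out)


def normalize_local_part(local: str) -> str:
    """Return the un-quoted form only if it is also valid without quotes."""
    unquoted = unquote_local_part(local)
    return unquoted if _DOT_ATOM.fullmatch(unquoted) else local


print(split_address('"a@b"@example.org'))  # ('"a@b"', 'example.org')
```

Returning the quoted form unchanged when the un-quoted text is not a valid dot-atom (e.g. `"john..doe"`) lets the caller decide, behind an opt-in option, whether to accept it.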
JoshData added a commit that referenced this issue Apr 15, 2023