Handle RFC5322 #54

Closed
jeking3 opened this issue Aug 17, 2020 · 2 comments
jeking3 commented Aug 17, 2020

Refer to section 3.4.1 of RFC 5322 and to the Wikipedia page on email addresses.

I added the following test:

jking@pulsar:~/python-email-validator$ git diff
diff --git a/tests/test_main.py b/tests/test_main.py
index d0c627a..7347f6a 100644
--- a/tests/test_main.py
+++ b/tests/test_main.py
@@ -7,6 +7,18 @@ from email_validator import EmailSyntaxError, EmailUndeliverableError, \
 @pytest.mark.parametrize(
     'email_input,output',
     [
+        (
+            '" "@example.com',
+            ValidatedEmail(
+                local_part=' ',
+                ascii_local_part=' ',
+                smtputf8=False,
+                ascii_domain='example.com',
+                domain='example.com',
+                email='" "@example.com',
+                ascii_email='" "@example.com',
+            ),
+        ),
         (
             'Abc@example.com',
             ValidatedEmail(

The test failed:

        else:
            # The local part failed the ASCII check. Now try the extended internationalized requirements.
            m = re.match(DOT_ATOM_TEXT_UTF8 + "\\Z", local)
            if not m:
                # It's not a valid internationalized address either. Report which characters were not valid.
                bad_chars = ', '.join(sorted(set(
                    c for c in local if not re.match(u"[" + (ATEXT if not allow_smtputf8 else ATEXT_UTF8) + u"]", c)
                )))
>               raise EmailSyntaxError("The email address contains invalid characters before the @-sign: %s." % bad_chars)
E               email_validator.EmailSyntaxError: The email address contains invalid characters before the @-sign:  , ".

This is a valid email address, even if hard to use.
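For reference, here is a rough sketch of why `" "` passes the quoted-string grammar in RFC 5322 section 3.2.4. It is a simplification, not the library's code: spaces and tabs stand in for folding whitespace, comments and obsolete forms are ignored, and the names `QTEXT`, `QUOTED_PAIR`, `QUOTED_STRING`, and `is_quoted_local_part` are made up for illustration.

```python
import re

# qtext: printable ASCII except the double quote (0x22) and backslash (0x5C).
QTEXT = r"[\x21\x23-\x5B\x5D-\x7E]"
# quoted-pair: a backslash followed by a visible ASCII character, space, or tab.
QUOTED_PAIR = r"\\[\x21-\x7E \t]"
# Simplified quoted-string: a double quote, any mix of whitespace, qtext,
# and quoted-pairs, then a closing double quote.
QUOTED_STRING = re.compile(r'"(?:[ \t]|' + QTEXT + "|" + QUOTED_PAIR + r')*"\Z')


def is_quoted_local_part(local: str) -> bool:
    """Return True if `local` matches the simplified quoted-string form."""
    return QUOTED_STRING.match(local) is not None


print(is_quoted_local_part('" "'))    # the local part from the test above
print(is_quoted_local_part('Abc'))    # a plain dot-atom, not a quoted-string
```

Under these assumptions the single-space local part is accepted because whitespace is allowed between the quotes, while an unescaped double quote inside the string is not.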

@JoshData
Owner

I like your attention to detail, but... it's valid but kind of a ridiculous address. Is this an email address that you use, or one that you know is actually used in the wild?

As per the README, the goal of this project is not to parse all valid email addresses, and quoted forms are specifically listed in the Assumptions section of the README as not supported.

@jeking3
Author

jeking3 commented Aug 17, 2020

I did not see that in the README. Thanks.

@jeking3 jeking3 closed this as completed Aug 17, 2020
JoshData added a commit that referenced this issue Apr 11, 2023
…ith better exception messages

People have opened issues several times about quoted local parts being incorrectly rejected. We can head off questions about it by parsing quoted local parts, so that we know when they occur and can give a better error message.

* When splitting the address into a local part and a domain part, detect when a quoted-string local part might be present (that is, when the address has quoted @-signs in the local part) rather than giving an error message about multiple @-signs.
* Remove the surrounding quotes and un-escape the string before checking the syntax of the local part. Return the un-quoted and un-escaped string as the normalized local_part in the returned ValidatedEmail object if it's valid as an unquoted local part.
* Check for invalid characters in the quoted-string (per the spec and our additional Unicode character checks) and raise exceptions.
* Add a new option to accept quoted-string local parts, which is off by default. When accepting them, apply Unicode normalization as per dot-atom internationalized addresses and apply minimal backslash escaping.
* Update tests.

See #54, #92.
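
The first two bullets of the commit message can be sketched roughly as follows. This is only an illustration of the idea under simple assumptions, not the code from the commit: `split_address` and `unquote_local_part` are hypothetical names, and no character validation or Unicode normalization is attempted.

```python
from typing import Tuple


def split_address(address: str) -> Tuple[str, str]:
    """Split at the first @-sign that sits outside double quotes."""
    in_quotes = False
    escaped = False
    for i, ch in enumerate(address):
        if escaped:
            # The previous character was a backslash inside quotes,
            # so this character is taken literally.
            escaped = False
        elif ch == "\\" and in_quotes:
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "@" and not in_quotes:
            return address[:i], address[i + 1:]
    raise ValueError("no unquoted @-sign found")


def unquote_local_part(local: str) -> str:
    """Strip surrounding double quotes and undo backslash escapes."""
    if len(local) < 2 or not (local.startswith('"') and local.endswith('"')):
        return local  # not a quoted-string; leave it untouched
    chars = iter(local[1:-1])
    out = []
    for ch in chars:
        if ch == "\\":
            # Keep the escaped character, drop the backslash itself.
            ch = next(chars, "")
        out.append(ch)
    return "".join(out)


local, domain = split_address('"a@b"@example.com')
print(local, domain, unquote_local_part(local))
```

Tracking the quote state while scanning is what lets the quoted @-sign stay in the local part instead of triggering a "multiple @-signs" style error.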
JoshData added a commit that referenced this issue Apr 15, 2023
…ith better exception messages
