Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ticket 35581: Updated django.core.mail to Python's modern email API [WIP, DO-NOT-MERGE] #2

Closed
wants to merge 5 commits into from

Conversation

medmunds
Copy link
Owner

Trac ticket number

ticket-35581

Branch description

Switch django.core.mail from Python's "legacy email API" to its newer "modern email API."

DO NOT MERGE until these security issues in Python's modern email API have been fixed: python/cpython#80222 and python/cpython#121284.

Checklist

  • This PR targets the main branch.
    [This PR currently targets Refs #35581 -- Improved django.core.mail tests. django/django#18502]
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.
  • n/a I have attached screenshots in both light and dark modes for any UI changes.

Notes

This is the second of two PRs involved in updating django.core.mail from Python's legacy to modern email APIs. See django#18502 for a separate PR that expands existing email tests prior to these changes.

Some info that may be helpful for reviewers:

  1. Forced trailing newlines in text parts: Python's modern email API forces trailing newlines on text/* content in message bodies and attachments (see email: set_content() always assumes trailing EOL python/cpython#121515). In practice, this probably doesn't matter much: most text/* content ends with a newline, and it generally doesn't hurt to add one.

    Unfortunately, a lot of our test cases used variations on body="Content" and assertEqual(generated_body, "Content") without trailing newlines. Modern email's behavior broke all of these tests. I've updated them to include trailing newlines (in both fixture and assertion).

    I'm open to suggestions on this. I did investigate some workarounds for handling the body encoding ourselves, but these either had other downsides (e.g., sending all text/* content with base64 content transfer encoding) or involved copying a big chunk of Python's raw_content_manager implementation into Django's code, which kind of goes against the spirit of this ticket.

  2. MIMEBase attachments: I had originally proposed continuing support for MIMEBase attachments without deprecation. But it turns out modern email's email.message.MIMEPart is a good replacement, so this PR puts MIMEBase on the usual deprecation path. I've also added a docs example of using MIMEPart for inline images.

  3. Deprecations and changes to undocumented behavior:

    • Implemented the deprecation plan from the original django-developers discussion.
    • Removed support for the undocumented EmailMessage.mixed_subtype and EmailMultiAlternatives.alternative_subtype attributes. (But it will raise an error on attempts to use them.)
    • Removed undocumented support for legacy email.charset.Charset objects in the undocumented EmailMessage.encoding property. (There was a single test that covered this; it's been removed.)
    • Otherwise preserved support for the undocumented EmailMessage.encoding property. I suggest renaming this to charset if we ever want to document it.
  4. Changes to RFC 2047 encoded-words in non-ASCII headers:

    • Modern email uses a different RFC 2047 encoder than legacy. The results are semantically equivalent, but modern email tries to minimize the portion of the header using RFC 2047 encoding. This affected several test cases that were looking for a specific encoded-word in the generated message.
    • Modern email supports only the utf-8 charset for generating RFC 2047 encoded-words. With legacy email, Django used the DEFAULT_CHARSET setting (or the undocumented EmailMessage.encoding override) to set the encoded-word charset. The results are semantically equivalent, but this affected one test case. (In practice, it seems extremely unlikely there would be email software still in use that supports RFC 2047 encoding but doesn't understand utf-8.)
    • I dropped tests for applying RFC 2047 encoding to the localpart ("username") of a non-ASCII email address. RFC 2047 specifically prohibits this, and no known MTA or email client supports it. See ticket-25986#12 et seq. (Python will actually still incorrectly use an encoded-word here, but it's a Python bug that might get fixed someday, so I'm no longer testing for it.)

@medmunds medmunds force-pushed the ticket_35581 branch 2 times, most recently from 75cde0c to de19186 Compare August 21, 2024 03:06
@medmunds medmunds force-pushed the refs_35581 branch 4 times, most recently from 9f597ae to 4670862 Compare August 24, 2024 22:41
@medmunds medmunds force-pushed the ticket_35581 branch 2 times, most recently from 54ff7ab to a7927b3 Compare August 28, 2024 00:05
@adamchainz
Copy link

Is this meant to be against the django/django repo, rather than your fork? Or are you deliberately avoiding that until the security issue are fixed?

@medmunds
Copy link
Owner Author

medmunds commented Sep 2, 2024

@adamchainz this is meant to be against django#18502, which updates and adds missing tests to tests/mail prior to changing the mail implementation. (GitHub doesn't allow a PR based on another PR, unless you create a branch for it.)

@medmunds
Copy link
Owner Author

medmunds commented Sep 3, 2024

Incidentally, I could pretty easily be convinced to just drop the undocumented functions and classes (get rid of core/mail/_deprecated.py) rather than taking them through a deprecation period.

This might actually be a "more compatible" approach for SafeMIMEText and SafeMIMEMultipart, depending on how you read the documentation. Those are currently covered only incidentally as the return type of EmailMessage.message():

class EmailMessage

  • message() constructs a django.core.mail.SafeMIMEText object (a subclass of Python’s MIMEText class) or a django.core.mail.SafeMIMEMultipart object holding the message to be sent. If you ever need to extend the EmailMessage class, you’ll probably want to override this method to put the content you want into the MIME object.

(There is no other mention of the SafeMIME classes in Django's documentation.)

With the changes in this PR, EmailMessage.message() now returns a Python email.message.EmailMessage, so it's not possible to be strictly compatible with both of these past claims:

  1. "message() constructs a SafeMIMEText … or a SafeMIMEMultipart object" (and returns it).
  2. "SafeMIMEText [is] a subclass of Python's MIMEText class." (Incidentally, SafeMIMEMultipart is not, and nothing is documented about its base class.)

The implementation currently in the PR preserves claim 2 for the SafeMIMEText class. (And keeps both SafeMIME class implementations available, deprecated, for code that used them without documentation.)

If we would prefer to preserve claim 1, we could instead define SafeMIMEText = SafeMIMEMultipart = email.message.EmailMessage and deprecate the SafeMIME names on import. (And get rid of the unnecessary SafeMIME implementations.)

@medmunds medmunds force-pushed the refs_35581 branch 4 times, most recently from 9061621 to 003bec2 Compare October 29, 2024 19:01
@medmunds medmunds marked this pull request as draft October 29, 2024 19:34
- Converted HeadersCheckMixin to MailTestsMixin for all shared helpers:
  - Hoisted assertStartsWith() from BaseEmailBackendTests.
  - Added matching assertEndsWith().
  - Hoisted get_decoded_attachments() from MailTests.
  - Improved failure reporting in assertMessageHasHeaders().
- Used unittest subTest() to improve handling of compound test cases.
- Replaced `assertTrue(test on string)` with custom assertions,
  so that failure reporting is more informative than `True != False`.
- Used modern email API (policy.default) for tests that reparse
  generated messages, and switched to modern accessors where helpful.
- Split get_raw_attachments() helper out of get_decoded_attachments(),
  and used modern iter_attachments() to avoid finding nested attachments
  in attached message/* emails.
- Stopped using legacy parseaddr.
Added tests for:
- Some behaviors described in email documentation but not (fully)
  covered by existing tests
- Other behavior critical to the proper operation of django.core.mail
  (and prone to inadvertent change in future updates)
- Updated tests that depended on specific legacy email APIs or legacy
  behavior to be more implementation-agnostic.
- Added comments to identify other things that are legacy specific
  and can be expected to change if using Python's modern email API.
- Changed EmailMessage.message() to construct a "modern email API"
  email.message.EmailMessage. Added ``policy`` keyword arg.
- Added support for modern MIMEPart objects in EmailMessage.attach()
  (and EmailMessage constructor, EmailMessage.attachments list).
- Moved now-unnecessary SafeMIME*, forbid_multi_line_headers(), and
  related code to django.core.mail._deprecated (list below).
- Updated SMTP EmailBackend to use modern email.policy.SMTP.

Deprecated:
- Attaching MIMEBase objects (replace with MIMEPart)
- BadHeaderError (modern email uses ValueError)
- SafeMIMEText, SafeMIMEMultipart (unnecessary for modern email)
- django.core.mail.forbid_multi_line_headers()
  (undocumented, but exposed via ``__all__`` and in wide use)
- django.core.mail.message.sanitize_address()
  (undocumented, but in wide use)
- (All but sanitize_address() issue deprecation warning on import,
  to catch ``except`` clause and type-checking-only usage.)

Removed without deprecation:
- EmailMessage.mixed_subtype (undocumented)
- EmailMultiAlternatives.alternative_subtype (undocumented)

Other:
- Changed tests to expect trailing newlines in all text/* message
  bodies and attachment content. Python's modern email API forces
  trailing newlines on text/* content.
- Dropped tests for incorrect RFC 2047 encoding of non-ASCII email
  address localparts. This is specifically prohibited by RFC 2047, and
  not supported by any known MTA or email client. (Python still
  mis-applies encoded-word to non-ASCII localparts, but it is a bug that
  may be fixed in the future.)
- Added tests that try to discourage using Python's legacy email APIs
  in future updates to django.core.mail.
@medmunds
Copy link
Owner Author

medmunds commented Dec 16, 2024

Moved to django#18966

@medmunds medmunds closed this Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants