Add Description-Content-Type field #258
Changes from 7 commits
fcd83ac
d454938
774d091
33dc8c5
e4b39db
a3c07c5
741cad9
375fe19
1f03d37
4b17b0b
a8c3b77
c669ea7
03be683
7fc0752
087abb1
b371449
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,7 +21,7 @@ The current core metadata file format, version 1.2, is specified in :pep:`345`. | |
|
||
However, the version specifiers and environment markers sections of that PEP | ||
have been superseded as described below. In addition, metadata files are | ||
permitted to contain the following additional field: | ||
permitted to contain the following additional fields: | ||
|
||
Provides-Extra (multiple use) | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
@@ -52,6 +52,90 @@ respectively. | |
It is legal to specify ``Provides-Extra:`` without referencing it in any | ||
``Requires-Dist:``. | ||
|
||
Description-Content-Type | ||
~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
A string containing the format of the distribution's description, so that | ||
tools can intelligently render the description. | ||
|
||
Historically, PyPI supported descriptions in plain text and `reStructuredText | ||
(reST) <http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html>`_, | ||
and could render reST into HTML. However, it is common for distribution | ||
authors to write the description in `Markdown | ||
<https://daringfireball.net/projects/markdown/>`_ (`RFC 7763 | ||
<https://tools.ietf.org/html/rfc7763>`_) as many code hosting sites render | ||
Markdown READMEs, and authors would reuse the file for the description. PyPI | ||
didn't recognize the format and so could not render the description correctly. | ||
This resulted in many packages on PyPI with poorly rendered descriptions, where | ||
Markdown was left as plain text or, worse, was rendered as if it were reST. | ||
This field allows the distribution author to specify the format of their | ||
description, opening up the possibility for PyPI and other tools to be able to | ||
render Markdown and other formats. | ||
|
||
The format of this field is the same as the ``Content-Type`` header in HTTP | ||
(e.g.: | ||
Comment: This is an "i.e." ("for explanation") rather than an "e.g." ("for example"), since RFC 1341 is the relevant normative reference here.
Comment: Fixed in a3c07c5. Thanks! |
||
`RFC 1341 <https://www.w3.org/Protocols/rfc1341/4_Content-Type.html>`_). | ||
Briefly, this means that it has a ``type/subtype`` part and then it can | ||
optionally have a number of parameters: | ||
|
||
Format:: | ||
|
||
Description-Content-Type: <type>/<subtype>; charset=<charset>[; <param_name>=<param value> ...] | ||
|
||
The ``type/subtype`` part has only a few legal values: | ||
|
||
- ``text/plain`` | ||
- ``text/x-rst`` | ||
- ``text/markdown`` | ||
|
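A minimal sketch of how a tool might read such a field value, assuming the format proposed above. It uses the standard library's ``email.message.Message`` to split the value into its ``type/subtype`` part and parameters. The names ``parse_description_content_type``, ``is_legal_type`` and ``LEGAL_TYPES`` are hypothetical and not part of the proposal.

```python
from email.message import Message

# The type/subtype values listed as legal in the proposed text.
LEGAL_TYPES = {"text/plain", "text/x-rst", "text/markdown"}


def parse_description_content_type(value):
    """Split a field value into (content_type, charset, variant).

    Parameters that are absent come back as None; no defaults are
    applied here.
    """
    msg = Message()
    msg["Content-Type"] = value
    content_type = msg.get_content_type()  # lowercased "type/subtype"
    charset = msg.get_param("charset")     # None when absent
    variant = msg.get_param("variant")     # None when absent
    return content_type, charset, variant


def is_legal_type(content_type):
    """Check a type/subtype string against the listed legal values."""
    return content_type in LEGAL_TYPES
```

Note that ``Message.get_content_type()`` itself falls back to ``text/plain`` when the value is not of the form ``type/subtype``, so a stricter tool may want to inspect the raw value as well.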
||
The ``charset`` parameter can be used to specify whether the character set in | ||
Comment: charset is a character encoding. See for example https://www.w3.org/International/articles/http-charset/index
Comment: @merwok: The doc has changed a bit since you reviewed in February, mostly as a result of me merging msabramo#2. Does the explanation look good in the new version:
or do you still think it needs to be tweaked? |
||
use is UTF-8, ASCII, etc. If ``charset`` is not provided, then it is | ||
recommended that the implementation (e.g.: PyPI) treat the content as | ||
UTF-8. | ||
Comment: Maybe move the sentence "If charset is not provided..." down below? (See next comment)
Comment: Does e4b39db help?
Comment: I agree with @pfmoore that this can be dropped in favour of the more concise specification below. |
||
|
||
Other parameters might be specific to the chosen subtype. For example, for the | ||
``markdown`` subtype, there is a ``variant`` parameter that allows specifying | ||
Comment: Is it expected that PyPI will support rendering different variants of markdown (which will presumably require using a number of different libraries to achieve this)? I would much rather just have a single idea of what "Markdown" is and just pick a single standard and stick with it, ideally something that is compatible with what Github thinks Markdown is so that the same file will work with both.
Comment: The intention here (possibly needs to be clarified?) is that CommonMark is what PyPI would initially support, as it seems like the best standard right now, but I was trying to leave room for other variants so we can adapt to change. I do wonder if I should remove the references to GFM and original Markdown, so that folks don't think they can use them and get their exact semantics (because it will likely be unrecognized by PyPI and thus treated as CommonMark). Folks are likely to see GFM and get excited and specify and then expect it to work exactly like GitHub and then complain when it doesn't. Should I remove GFM and original Markdown and say CommonMark is the only legal variant right now?
Comment: As per the GitHub Flavored Markdown spec, it's a superset of CommonMark with the following additional features ("extensions"):
Comment: The rendering mechanism used by both PyPI and Warehouse does not rely on the renderer to emit safe HTML and we do our own strict HTML sanitization after it has been rendered. Theoretically we could have an option for raw HTML as a long_description and it shouldn't provide any avenue for attack. The list of whitelisting tags/attributes can be found at readme_renderer/clean.py#L24-48.
Comment: Looks like neither
Comment: We can of course add things to the list where needed as well! The current list is basically compiled by adding things whenever we found something that didn't render correctly on PyPI that we felt was a reasonable thing to want to have rendered. I didn't exhaustively look through every HTML tag and make a determination on it. |
||
the variant of Markdown in use, such as: | ||
|
||
- ``CommonMark`` for `CommonMark` | ||
Comment: The last backtick is causing incorrect rendering
Comment: Thanks! Fixed in 4b17b0b. |
||
<https://tools.ietf.org/html/rfc7764#section-3.5>`_ | ||
|
||
- ``GFM`` for `GitHub Flavored Markdown (GFM) | ||
<https://tools.ietf.org/html/rfc7764#section-3.2>`_ | ||
|
||
- ``Original`` for `Gruber's original Markdown syntax | ||
<https://tools.ietf.org/html/rfc7763#section-6.1.4>`_ | ||
|
||
Example:: | ||
|
||
Description-Content-Type: text/plain; charset=UTF-8 | ||
|
||
Example:: | ||
|
||
Description-Content-Type: text/x-rst; charset=UTF-8 | ||
|
||
Example:: | ||
|
||
Description-Content-Type: text/markdown; charset=UTF-8; variant=CommonMark | ||
|
||
Example:: | ||
|
||
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM | ||
|
||
Example:: | ||
|
||
Description-Content-Type: text/markdown; charset=UTF-8; variant=Original | ||
|
||
If a ``Description-Content-Type`` is not specified or it's set to an | ||
unrecognized value, then the assumed content type is ``text/x-rst; | ||
Comment: This is probably wrong... slightly. My assumption is that PyPI will, if an explicit content-type has been picked, will error if it doesn't correctly render on upload (preventing the current problem of "oops, I accidentally broke my long description"). This is different than just assuming My suggestion would be to say:
Comment: Thanks! I used your wording in a8c3b77. Let me know if you have other comments. |
||
charset=UTF-8``. | ||
|
||
If the ``charset`` is not specified or it's set to an unrecognized value, then | ||
the assumed ``charset`` is ``UTF-8``. | ||
Comment: Lines 92–94 above merely recommend that implementations assume the charset is UTF-8, but this line seems to require that they do so. You should resolve the contradiction one way or the other. I'd prefer if UTF-8 is required to be the default for simplicity.
Comment: I'd be all for UTF-8 only. What do others think?
Comment: I'm +1 on saying UTF-8 if otherwise unspecified, but I think it's potentially useful to allow an alternative user-specified encoding. I doubt it'll be used a lot, but it's consistent with similar MIME-type standards.
Comment: Fixed in a3c07c5. Thanks! |
||
|
||
If the subtype is ``markdown`` and ``variant`` is not specified or it's set to | ||
Comment: This line duplicates what's said above. |
||
an unrecognized value, then the assumed ``variant`` is ``CommonMark``. | ||
|
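The fallback rules in this revision of the diff could be sketched roughly as follows. This is an illustration under stated assumptions, not the behaviour of PyPI or any existing tool: ``resolve_description_format`` and ``KNOWN_VARIANTS`` are hypothetical names, a charset counts as "recognized" here if Python's ``codecs.lookup`` knows it, and variant names are matched exactly. The ``text/x-rst`` fallback follows the diff text as shown; the review thread indicates that wording was revised in a later commit.

```python
import codecs
from email.message import Message

LEGAL_TYPES = {"text/plain", "text/x-rst", "text/markdown"}
KNOWN_VARIANTS = {"CommonMark", "GFM", "Original"}


def resolve_description_format(value=None):
    """Return (content_type, charset, variant) with fallbacks applied."""
    msg = Message()
    if value is not None:
        msg["Content-Type"] = value
    content_type = msg.get_content_type()
    if value is None or content_type not in LEGAL_TYPES:
        # Missing or unrecognized: fallback as worded in this revision.
        return "text/x-rst", "UTF-8", None

    charset = msg.get_param("charset")
    if charset is None:
        charset = "UTF-8"
    else:
        try:
            codecs.lookup(charset)  # assumption: "recognized" = known codec
        except LookupError:
            charset = "UTF-8"

    variant = None
    if content_type == "text/markdown":
        variant = msg.get_param("variant")
        if variant not in KNOWN_VARIANTS:
            variant = "CommonMark"
    return content_type, charset, variant
```

A tool that prefers to reject unrecognized explicit values, as one reviewer suggests for uploads, would raise an error in the first branch instead of returning the fallback.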
||
Comment: Maybe include the case of a missing charset here? Something like
I'm not sure I see much benefit in distinguishing between "charset isn't given but the implementation should treat it as UTF-8" and "when charset isn't given, it defaults to UTF-8".
Comment: Yep, looks fine to me, thanks. |
||
|
||
Version Specifiers | ||
================== | ||
|
Comment: "stating the markup syntax (if any) used in" might be clearer than "containing the format of"
Comment: I like it. Changed to that in b371449.