title: Changelog toc_depth: 2
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning. See the Contributing Guide for details.
- DRY fix in
abbr
extension by introducing methodcreate_element
(#1483).
- Backslash Unescape IDs set via
attr_list
ontoc
(#1493).
A new AbbrTreeprocessor
has been introduced, which replaces the now deprecated
AbbrInlineProcessor
. Abbreviation processing now happens after Attribute Lists,
avoiding a conflict between the two extensions (#1460).
The AbbrPreprocessor
class has been renamed to AbbrBlockprocessor
, which
better reflects what it is. AbbrPreprocessor
has been deprecated.
A call to Markdown.reset()
now clears all previously defined abbreviations.
Abbreviations are now sorted by length before executing AbbrTreeprocessor
to ensure that multi-word abbreviations are implemented even if an abbreviation
exists for one of those component words. (#1465)
Abbreviations without a definition are now ignored. This avoids applying abbr tags to text without a title value.
Added an optional glossary
configuration option to the abbreviations extension.
This provides a simple and efficient way to apply a dictionary of abbreviations
to every page.
Abbreviations can now be disabled by setting their definition to ""
or ''
.
This can be useful when using the glossary
option.
- Fixed links to source code on GitHub from the documentation (#1453).
- All postprocessors are now run on heading content.
- Footnote references are now stripped from heading content. Fixes #660.
- A more robust
striptags
is provided to convert headings to plain text. Unlike, themarkupsafe
implementation, HTML entities are not unescaped. - The plain text
name
, richhtml
, and unescaped rawdata-toc-label
are saved totoc_tokens
, allowing users to access the full rich text content of the headings directly fromtoc_tokens
. - The value of
data-toc-label
is sanitized separate from heading content before being written toname
. This fixes a bug which allowed markup through in certain circumstances. To access the raw unsanitized data, retrieve the value fromtoken['data-toc-label']
directly. - An
html.unescape
call is made just prior to callingslugify
so thatslugify
only operates on Unicode characters. Note thathtml.unescape
is not run onname
,html
, ordata-toc-label
. - The functions
get_name
andstashedHTML2text
defined in thetoc
extension are both deprecated. Instead, third party extensions should use some combination of the new functionsrun_postprocessors
,render_inner_html
andstriptags
.
- Include
scripts/*.py
in the generated source tarballs (#1430). - Ensure lines after heading in loose list are properly detabbed (#1443).
- Give smarty tree processor higher priority than toc (#1440).
- Permit carets (
^
) and square brackets (]
) but explicitly exclude backslashes (\
) from abbreviations (#1444). - In attribute lists (
attr_list
,fenced_code
), quoted attribute values are now allowed to contain curly braces (}
) (#1414).
- Fix type annotations for
convertFile
- it accepts only bytes-based buffers. Also remove legacy checks from Python 2 (#1400) - Remove legacy import needed only in Python 2 (#1403)
- Fix typo that left the attribute
AdmonitionProcessor.content_indent
unset (#1404) - Fix edge-case crash in
InlineProcessor
withAtomicString
(#1406). - Fix edge-case crash in
codehilite
with an emptycode
tag (#1405). - Improve and expand type annotations in the code base (#1401).
- Fix handling of bogus comments (#1425).
- Fix a performance problem with HTML extraction where large HTML input could trigger quadratic line counting behavior (#1392).
- Improve and expand type annotations in the code base (#1394).
A new boolean option permalink_leading
controls the position of the permanent
link anchors generated with permalink
. Setting permalink_leading
to True
will cause the links to be inserted at the start of the header, before any other
header content. The default behavior for permalink
is to append permanent
links to the header, placing them after all other header content.
- Add support for cPython version 3.12 (and PyPy 3.10) and drop support for Python version 3.7 (#1357).
- Refactor changelog to use the format defined at https://keepachangelog.com/.
- Update the list of empty HTML tags (#1353).
- Add customizable TOC title class to TOC extension (#1293).
- Add API documentation of the code base which is generated by mkdocstrings (#1220).
- Fix a corner case in admonitions where if an indented code block was provided as the first block, the output would be malformed (#1329).
- Add a special case for initial
's
to smarty extension (#1305). - Unescape any backslash escaped inline raw HTML (#1358).
- Unescape backslash escaped TOC token names (#1360).
- Restore console script (#1327).
- Officially support Python 3.11.
- Improve standalone * and _ parsing (#1300).
- Consider
<html>
HTML tag a block-level element (#1309). - Switch from
setup.py
topyproject.toml
.
- Fix an import issue with
importlib.util
(#1274).
The HTML4 spec
specifically deprecates the use of the align
attribute and it does not appear
at all in the HTML5
spec.
Therefore, by default, the tables extension will now use
the style
attribute (setting just the text-align
property) in td
and th
blocks.
The former behavior is available by setting the use_align_attribute
configuration option to True
when enabling the extension.
For example, to configure the old align
behavior:
from markdown.extensions.tables import TableExtension
markdown.markdown(src, extensions=[TableExtension(use_align_attribute=True)])
Unescaping backslash escapes has been moved to a Treeprocessor, which enables
proper HTML escaping during serialization. However, it is recognized that
various third-party extensions may be calling the old class at
postprocessors.UnescapePostprocessor
. Therefore, the old class remains in the
code base, but has been deprecated and will be removed in a future release. The
new class treeprocessors.UnescapeTreeprocessor
should be used instead.
Various objects were deprecated in version 3.0 and began raising deprecation warnings (see the version 3.0 release notes for details). Any of those objects which remained in version 3.3 have been removed from the code base in version 3.4 and will now raise errors. The relevant objects are listed below.
Deprecated Object | Replacement Object |
---|---|
markdown.version |
markdown.__version__ |
markdown.version_info |
markdown.__version_info__ |
markdown.util.etree |
xml.etree.ElementTree |
markdown.util.string_type |
str |
markdown.util.text_type |
str |
markdown.util.int2str |
chr |
markdown.util.iterrange |
range |
markdown.util.isBlockLevel |
markdown.Markdown().is_block_level |
markdown.util.Processor().markdown |
markdown.util.Processor().md |
markdown.util.Registry().__setitem__ |
markdown.util.Registry().register |
markdown.util.Registry().__delitem__ |
markdown.util.Registry().deregister |
markdown.util.Registry().add |
markdown.util.Registry().register |
In addition, the md_globals
parameter of
Markdown.extensions.Extension.extendMarkdown()
is no longer recognized as a
valid parameter and will raise an error if provided.
-
Some new configuration options have been added to the footnotes extension (#1218):
-
Small refactor of the
BACKLINK_TITLE
option; The use offormat()
instead of "old"%d
formatter allows one to specify text without the need to have the number of the footnote in it (like footnotes on Wikipedia for example). The modification is backward compatible so no configuration change is required. -
Addition of a new option
SUPERSCRIPT_TEXT
that allows one to specify a custom placeholder for the footnote itself in the text. Ex:[{}]
will give<sup>[1]</sup>
,({})
will give<sup>(1)</sup>
, or by default, the current behavior:<sup>1</sup>
.
-
-
The Table of Contents extension now accepts a
toc_class
parameter which can be used to set the CSS class(es) on the<div>
that contains the Table of Contents (#1224). -
The CodeHilite extension now supports a
pygments_formatter
option that can be set to a custom formatter class (#1187).- If
pygments_formatter
is set to a string (ex:'html'
), Pygments' default formatter by that name is used. - If
pygments_formatter
is set to a formatter class (or any callable which returns a formatter instance), then an instance of that class is used.
The formatter class is now passed an additional option,
lang_str
, to denote the language of the code block (#1258). While Pygments' built-in formatters will ignore the option, a custom formatter assigned to thepygments_formatter
option can make use of thelang_str
to include the code block's language in the output. - If
- Extension entry-points are only loaded if needed (#1216).
- Added additional checks to the
<pre><code>
handling ofPrettifyTreeprocessor
(#1261, #1263). - Fix XML deprecation warnings.
- Disallow square brackets in reference link ids (#1209).
- Retain configured
pygments_style
after first code block (#1240). - Ensure fenced code attributes are properly escaped (#1247).
- Fix a dependency issue (#1195, #1196).
- Make the
slugify_unicode
function not remove diacritical marks (#1118). - Fix
[toc]
detection when used withnl2br
extension (#1160). - Re-use compiled regex for block level checks (#1169).
- Don't process shebangs in fenced code blocks when using CodeHilite (#1156).
- Improve email address validation for Automatic Links (#1165).
- Ensure
<summary>
tags are parsed correctly (#1079). - Support Python 3.10 (#1124).
- Properly parse unclosed tags in code spans (#1066).
- Properly parse processing instructions in md_in_html (#1070).
- Properly parse code spans in md_in_html (#1069).
- Preserve text immediately before an admonition (#1092).
- Simplified regex for HTML placeholders (#928) addressing (#932).
- Ensure
permalinks
andanchorlinks
are not restricted bytoc_depth
(#1107). - Fix corner cases with lists under admonitions (#1102).
- Unify all block-level tags (#1047).
- Fix issue where some empty elements would have text rendered as
None
when usingmd_in_html
(#1049). - Avoid catastrophic backtracking in
hr
regex (#1055). - Fix
hr
HTML handling (#1053).
- Properly parse inline HTML in md_in_html (#1040 & #1045).
- Avoid crashing when md_in_html fails (#1040).
- Correctly parse raw
script
andstyle
tags (#1036). - Ensure consistent class handling by
fenced_code
andcodehilite
(#1032).
The HTML5
spec
recommends that the class defining the language of a code block be prefixed with
language-
. Therefore, by default, both the
fenced_code and
codehilite extensions now prepend the prefix when
code highlighting is disabled.
If you have previously been including the prefix manually in your fenced code blocks, then you will not want a second
instance of the prefix. Similarly, if you are using a third party syntax highlighting tool which does not recognize
the prefix, or requires a different prefix, then you will want to redefine the prefix globally using the lang_prefix
configuration option of either the fenced_code
or codehilite
extensions.
For example, to configure fenced_code
to not apply any prefix (the previous behavior), set the option to an empty string:
from markdown.extensions.fenced_code import FencedCodeExtension
markdown.markdown(src, extensions=[FencedCodeExtension(lang_prefix='')])
!!! note When code highlighting is enabled, the output from Pygments is used unaltered. Currently, Pygments does not provide an option to include the language class in the output, let alone prefix it. Therefore, any language prefix is only applied when syntax highlighting is disabled.
Empty curly braces are now completely ignored by the Attribute List extension. Previously, the extension would recognize them as attribute lists and remove them from the document. Therefore, it is no longer necessary to backslash escape a set of curly braces which are empty or only contain whitespace.
Despite not being documented, previously an attribute list could be defined anywhere within a table cell and get
applied to the cell (<td>
element). Now the attribute list must be defined at the end of the cell content and must
be separated from the rest of the content by at least one space. This makes it easy to differentiate between attribute
lists defined on inline elements within a cell and the attribute list for the cell itself. It is also more consistent
with how attribute lists are defined on other types of elements.
The extension has also added support for defining attribute lists on table header cells (<th>
elements) in the same
manner as data cells (<td>
elements).
In addition, the documentation for the extensions received an overhaul. The features (#987) and limitations (#965) of the extension are now fully documented.
-
All Pygments' options are now available for syntax highlighting (#816).
- The Codehilite extension now accepts any options which Pygments supports as global configuration settings on the extension.
- Fenced Code Blocks will accept any of the same options on individual code blocks.
- Any of the previously supported aliases to Pygments' options continue to be supported at this time. However, it is recommended that the Pygments option names be used directly to ensure continued compatibility in the future.
-
Fenced Code Blocks now work with Attribute Lists when syntax highlighting is disabled. Any random HTML attribute can be defined and set on the
<code>
tag of fenced code blocks when theattr_list
extension is enabled (#816). -
The HTML parser has been completely replaced. The new HTML parser is built on Python's
html.parser.HTMLParser
, which alleviates various bugs and simplify maintenance of the code (#803, #830). -
The Markdown in HTML extension has been rebuilt on the new HTML Parser, which drastically simplifies it. Note that raw HTML elements with a
markdown
attribute defined are now converted to ElementTree Elements and are rendered by the serializer. Various bugs have been fixed (#803, #595, #780, and #1012). -
Link reference parsing, abbreviation reference parsing and footnote reference parsing has all been moved from
preprocessors
toblockprocessors
, which allows them to be nested within other block level elements. Specifically, this change was necessary to maintain the current behavior in the rebuilt Markdown in HTML extension. A few random edge-case bugs (see the included tests) were resolved in the process (#803). -
An alternate function
markdown.extensions.headerid.slugify_unicode
has been included with the Table of Contents extension which supports Unicode characters in table of contents slugs. The oldmarkdown.extensions.headerid.slugify
method which removes non-ASCII characters remains the default. Import and passmarkdown.extensions.headerid.slugify_unicode
to theslugify
configuration option to use the new behavior. -
Support was added for Python 3.9 and dropped for Python 3.5.
- Document how to pass configuration options to Extra (#1019).
- Fix HR which follows strong em (#897).
- Support short reference image links (#894).
- Avoid a
RecursionError
from deeply nested blockquotes (#799). - Fix issues with complex emphasis (#979).
- Fix unescaping of HTML characters
<>
in CodeHilite (#990). - Fix complex scenarios involving lists and admonitions (#1004).
- Fix complex scenarios with nested ordered and unordered lists in a definition list (#918).
- Add
checklinks
tox environment to ensure all links in documentation are good. - Refactor extension API documentation (#729).
- Load entry_points (for extensions) only once using
importlib.metadata
. - Do not double escape entities in TOC.
- Correctly report if an extension raises a
TypeError
(#939). - Raise a
KeyError
when attempting to delete a nonexistent key from the extension registry (#939). - Remove import of
packaging
(orpkg_resources
fallback) entirely. - Remove
setuptools
as a run-time dependency (install_required
).
- The
name
property intoc_tokens
from the TOC extension now escapes HTML special characters (<
,>
, and&
).
Python 2.7 reaches end-of-life on 2020-01-01 and Python-Markdown 3.2 has dropped support for it. Please upgrade to Python 3, or use Python-Markdown 3.1.
In order to fix issue #792, em
/strong
inline processors were refactored. This
translated into removing many of the existing inline processors that handled this
logic:
em_strong
strong
emphasis
strong2
emphasis
These processors were replaced with two new ones:
em_strong
em_strong2
The legacy_em
extension was also modified with new,
refactored logic and simply overrides the em_strong2
inline processor.
Before, the HTML generated by CodeHilite looked like:
<pre><code>foo = 'bar'</code></pre>
if you were not using Pygments.<pre>foo = 'bar'</pre>
if you were using Pygments.
To make the cases more consistent (and adhere to many Markdown specifications and
HTML code block markup suggestions), CodeHilite will now always additionally wrap
code with <code>
tags. See #862 for more details.
This change does not alter the Python-Markdown API, but users relying on the old markup will find their output now changed.
Internally, this change relies on the Pygments 2.4, so you must be using at least that version to see this effect. Users with earlier Pygments versions will continue to see the old behavior.
Previously, Python-Markdown was using either the xml.etree.cElementTree
module
or the xml.etree.ElementTree
module, based on their availability. In modern
Python versions, the former is a deprecated alias for the latter. Thus, the
compatibility layer is deprecated and extensions are advised to use
xml.etree.ElementTree
directly. Importing markdown.util.etree
will raise
a DeprecationWarning
beginning in version 3.2 and may be removed in a future
release.
Therefore, extension developers are encouraged to replace
from markdown.util import etree
with
import xml.etree.ElementTree as etree
in their code.
-
Some new configuration options have been added to the toc extension:
-
The
anchorlink_class
andpermalink_class
options allow class(es) to be assigned to theanchorlink
andpermalink
respectively. This allows using icon fonts from CSS for the links. Therefore, an empty string passed topermalink
now generates an emptypermalink
. Previously nopermalink
would have been generated. (#776) -
The
permalink_title
option allows the title attribute of apermalink
to be set to something other than the default English stringPermanent link
. (#877)
-
-
Document thread safety (#812).
-
Markdown parsing in HTML has been exposed via a separate extension called
md_in_html
. -
Add support for Python 3.8.
- HTML tag placeholders are no longer included in
.toc_tokens
(#899). - Unescape backslash-escaped characters in TOC ids (#864).
- Refactor bold and italic logic in order to solve complex nesting issues (#792).
- Always wrap CodeHilite code in
code
tags (#862).
- Fixed import failure in
setup.py
when the source directory is not onsys.path
(#823). - Prefer public
packaging
module to pkg_resources' private copy of it (#825).
Historically, version numbers were acquired via the attributes
markdown.version
and markdown.version_info
. As of 3.0, a more standardized
approach is being followed and versions are acquired via the
markdown.__version__
and markdown.__version_info__
attributes. As of 3.1
the legacy attributes will raise a DeprecationWarning
if they are accessed. In
a future release the legacy attributes will be removed.
-
A Contributing Guide has been added (#732).
-
A new configuration option to set the footnote separator has been added. Also, the
rel
andrev
attributes have been removed from footnotes as they are not valid in HTML5. Therefs
andbackrefs
classes already exist and serve the same purpose (#723). -
A new option for
toc_depth
to set not only the bottom section level, but also the top section level. A string consisting of two digits separated by a hyphen in between ("2-5"
), defines the top (t
) and the bottom (b
) (<ht>..<hb>
). A single integer still defines the bottom section level (<h1>..<hb>
) only. (#787).
- Update CLI to support PyYAML 5.1.
- Overlapping raw HTML matches no longer leave placeholders behind (#458).
- Emphasis patterns now recognize newline characters as whitespace (#783).
- Version format had been updated to be PEP 440 compliant (#736).
- Block level elements are defined per instance, not as class attributes (#731).
- Double escaping of block code has been eliminated (#725).
- Problems with newlines in references has been fixed (#742).
- Escaped
#
are now handled in header syntax (#762).
- Brought back the
version
andversion_info
variables (#709). - Added support for hexadecimal HTML entities (#712).
The enable_attributes
keyword is deprecated in version 3.0 and will be
ignored. Previously the keyword was True
by default and enabled an
undocumented way to define attributes on document elements. The feature has been
removed from version 3.0. As most users did not use the undocumented feature, it
should not affect most users. For the few who did use the feature, it can be
enabled by using the Legacy Attributes
extension.
The smart_emphasis
keyword is deprecated in version 3.0 and will be ignored.
Previously the keyword was True
by default and caused the parser to ignore
middle-word emphasis. Additionally, the optional smart_strong
extension
provided the same behavior for strong emphasis. Both of those features are now
part of the default behavior, and the Legacy
Emphasis extension is available to disable that
behavior.
The output_formats
keyword now only accepts two options: html
and xhtml
Note that if (x)html1
, (x)html4
or (x)html5
are passed in, the number is
stripped and ignored.
Both safe_mode
and the associated html_replacement_text
keywords are
deprecated in version 3.0 and will be ignored. The so-called "safe mode" was
never actually "safe" which has resulted in many people having a false sense of
security when using it. As an alternative, the developers of Python-Markdown
recommend that any untrusted content be passed through an HTML sanitizer (like
Bleach) after being converted to HTML by
markdown. In fact, Bleach
Whitelist provides a curated list
of tags, attributes, and styles suitable for filtering user-provided HTML using
bleach.
If your code previously looked like this:
html = markdown.markdown(text, safe_mode=True)
Then it is recommended that you change your code to read something like this:
import bleach
from bleach_whitelist import markdown_tags, markdown_attrs
html = bleach.clean(markdown.markdown(text), markdown_tags, markdown_attrs)
If you are not interested in sanitizing untrusted text, but simply desire to escape raw HTML, then that can be accomplished through an extension which removes HTML parsing:
from markdown.extensions import Extension
class EscapeHtml(Extension):
def extendMarkdown(self, md):
md.preprocessors.deregister('html_block')
md.inlinePatterns.deregister('html')
html = markdown.markdown(text, extensions=[EscapeHtml()])
As the HTML would not be parsed with the above Extension, then the serializer
will escape the raw HTML, which is exactly what happened in previous versions
with safe_mode="escape"
.
Positional arguments on the markdown.Markdown()
class are deprecated as are
all except the text
argument on the markdown.markdown()
wrapper function.
Using positional arguments will raise an error. Only keyword arguments should be
used. For example, if your code previously looked like this:
html = markdown.markdown(text, [SomeExtension()])
Then it is recommended that you change it to read something like this:
html = markdown.markdown(text, extensions=[SomeExtension()])
!!! Note
This change is being made as a result of deprecating "safe_mode"
as the
safe_mode
argument was one of the positional arguments. When that argument
is removed, the two arguments following it will no longer be at the correct
position. It is recommended that you always use keywords when they are
supported for this reason.
In previous versions of Python-Markdown, the built-in extensions received
special status and did not require the full path to be provided. Additionally,
third party extensions whose name started with "mdx_"
received the same
special treatment. This is no longer the case.
Support has been added for extensions to define an entry
point. An entry point is a string name which
can be used to point to an Extension
class. The built-in extensions now have
entry points which match the old short names. And any third-party extensions
which define entry points can now get the same behavior. See the documentation
for each specific extension to find the assigned name.
If an extension does not define an entry point, then the full path to the extension must be used. See the documentation for a full explanation of the current behavior.
The previously documented method of appending the extension configuration
options as a string to the extension name is deprecated and will raise an error.
The extension_configs
keyword should be
used instead. See the documentation for a
full explanation of the current behavior.
The HeaderId Extension is deprecated and will raise an error if specified. Use the Table of Contents Extension instead, which offers most of the features of the HeaderId Extension and more (support for meta data is missing).
Extension authors who have been using the slugify
and unique
functions
defined in the HeaderId Extension should note that those functions are now
defined in the Table of Contents extension and should adjust their import
statements accordingly (from markdown.extensions.toc import slugify, unique
).
All processors and patterns now get "registered" to a
Registry. A backwards compatible shim is
included so that existing simple extensions should continue to work.
A DeprecationWarning
will be raised for any code which calls the old API.
Previously, instances of the Markdown
class were represented as any one of
md
, md_instance
, or markdown
. This inconsistency made it difficult when
developing extensions, or just maintaining the existing code. Now, all instances
are consistently represented as md
.
The old attributes on class instances still exist, but raise a
DeprecationWarning
when accessed. Also on classes where the instance was
optional, the attribute always exists now and is simply None
if no instance
was provided (previously the attribute would not exist).
The markdown.util.isBlockLevel
function is deprecated and will raise a
DeprecationWarning
. Instead, extensions should use the isBlockLevel
method
of the Markdown
class instance. Additionally, a list of block level elements
is defined in the block_level_elements
attribute of the Markdown
class which
extensions can access to alter the list of elements which are treated as block
level elements.
Previously, the extendMarkdown
method of a markdown.extensions.Extension
subclasses accepted an md_globals
keyword, which contained the value returned
by Python's globals()
built-in function. As all of the configuration is now
held within the Markdown
class instance, access to the globals is no longer
necessary and any extensions which expect the keyword will raise a
DeprecationWarning
. A future release will raise an error.
Historically, version numbers were acquired via the attributes
markdown.version
and markdown.version_info
. Moving forward, a more
standardized approach is being followed and versions are acquired via the
markdown.__version__
and markdown.__version_info__
attributes. The legacy
attributes are still available to allow distinguishing versions between the
legacy Markdown 2.0 series and the Markdown 3.0 series, but in the future the
legacy attributes will be removed.
A new InlineProcessor
class handles inline processing much better and allows
for more flexibility. The new InlineProcessor
classes no longer utilize
unnecessary pretext and post-text captures. New class can accept the buffer that
is being worked on and manually process the text without regular expressions and
return new replacement bounds. This helps us to handle links in a better way and
handle nested brackets and logic that is too much for regular expression.
-
A new testing framework is included as a part of the Markdown library, which can also be used by third party extensions.
-
A new
toc_depth
parameter has been added to the Table of Contents Extension. -
A new
toc_tokens
attribute has been added to the Markdown class by the Table of Contents Extension, which contains the raw tokens used to build the Table of Contents. Users can use this to build their own custom Table of Contents rather than needing to parse the HTML available on thetoc
attribute of the Markdown class. -
When the Table of Contents Extension is used in conjunction with the Attribute Lists Extension and a
data-toc-label
attribute is defined on a header, the content of thedata-toc-label
attribute is now used as the content of the Table of Contents item for that header. -
Additional CSS class names can be appended to Admonitions.
For information on prior releases, see their changelogs: