gh-118761: Improve the import time of gettext (#128898)
``gettext`` is often imported by programs that may never end up translating
anything. The ``struct`` module is already imported lazily, only when
``GNUTranslations`` parses a ``.mo`` file, to speed up the case where no
``.mo`` files are loaded. The ``re`` module is needed in the same situation
only: it sits behind a chain of functions that only ``GNUTranslations`` calls.

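Compile the regex the first time it is used and cache it in a module-level
global. The ``re.finditer()`` function call becomes a ``finditer()`` method
call on the compiled pattern, which is slightly more efficient. It is also
necessary for the delayed ``re`` import, because the name ``re`` is bound
only on the call that performs the import.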
eli-schwartz authored Jan 20, 2025
1 parent bbeb219 commit c9c9fcb
Showing 2 changed files with 21 additions and 15 deletions.
33 changes: 18 additions & 15 deletions Lib/gettext.py
@@ -48,7 +48,6 @@

 import operator
 import os
-import re
 import sys


@@ -70,22 +69,26 @@
 # https://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
 # http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/plural.y

-_token_pattern = re.compile(r"""
-        (?P<WHITESPACES>[ \t]+)                    | # spaces and horizontal tabs
-        (?P<NUMBER>[0-9]+\b)                       | # decimal integer
-        (?P<NAME>n\b)                              | # only n is allowed
-        (?P<PARENTHESIS>[()])                      |
-        (?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # !, *, /, %, +, -, <, >,
-                                                     # <=, >=, ==, !=, &&, ||,
-                                                     # ? :
-                                                     # unary and bitwise ops
-                                                     # not allowed
-        (?P<INVALID>\w+|.)                           # invalid token
-    """, re.VERBOSE|re.DOTALL)
-
+_token_pattern = None

 def _tokenize(plural):
-    for mo in re.finditer(_token_pattern, plural):
+    global _token_pattern
+    if _token_pattern is None:
+        import re
+        _token_pattern = re.compile(r"""
+                (?P<WHITESPACES>[ \t]+)                    | # spaces and horizontal tabs
+                (?P<NUMBER>[0-9]+\b)                       | # decimal integer
+                (?P<NAME>n\b)                              | # only n is allowed
+                (?P<PARENTHESIS>[()])                      |
+                (?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # !, *, /, %, +, -, <, >,
+                                                             # <=, >=, ==, !=, &&, ||,
+                                                             # ? :
+                                                             # unary and bitwise ops
+                                                             # not allowed
+                (?P<INVALID>\w+|.)                           # invalid token
+            """, re.VERBOSE|re.DOTALL)
+
+    for mo in _token_pattern.finditer(plural):
         kind = mo.lastgroup
         if kind == 'WHITESPACES':
             continue
@@ -0,0 +1,3 @@
+Reduce import time of :mod:`gettext` by up to ten times, by importing
+:mod:`re` on demand. In particular, ``re`` is no longer implicitly
+exposed as ``gettext.re``. Patch by Eli Schwartz.
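One way to observe the effect of this change is CPython's ``-X importtime`` option, which prints a per-module import timing report to stderr. The sketch below is not part of the commit, and the helper names ``import_time_report`` and ``imports_module`` are hypothetical. It runs a fresh interpreter and checks whether a given module shows up in the report. Whether ``re`` appears depends on the Python version and on what the interpreter already imports at startup, so the result should be read as an indication rather than a proof.

```python
import subprocess
import sys


def import_time_report(module):
    """Run a fresh interpreter with -X importtime and return its stderr report."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stderr


def imports_module(report, name):
    """Return True if `name` is listed as an imported package in the report."""
    for line in report.splitlines():
        # Report lines have three "|"-separated fields; the last one is the
        # (indented) name of the imported package.
        parts = line.split("|")
        if len(parts) == 3 and parts[2].strip() == name:
            return True
    return False


if __name__ == "__main__":
    report = import_time_report("gettext")
    print("gettext imported:", imports_module(report, "gettext"))
    print("re imported:", imports_module(report, "re"))
```

Code that relied on the accidental ``gettext.re`` attribute would need to use ``import re`` directly instead.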
