Refactor codecs error handlers to use _PyUnicodeError_GetParams and extract complex logic into separate functions #129173

Open
picnixz opened this issue Jan 22, 2025 · 1 comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@picnixz
Member

picnixz commented Jan 22, 2025

Feature or enhancement

Proposal:

I want to refactor the various codec error handlers in Python/codecs.c to use _PyUnicodeError_GetParams. Some handlers will be refactored as part of #126004, but others are not affected by the issues tracked there: the ignore, namereplace, surrogateescape, and surrogatepass handlers do not crash (or at least I wasn't able to make them crash easily).

In addition, I plan to split each handler into separate functions instead of keeping two or three large blocks of code, each handling a specific exception type. To that end, I will introduce the following helper macros:

#define _PyIsUnicodeEncodeError(EXC)    \
    PyObject_TypeCheck(EXC, (PyTypeObject *)PyExc_UnicodeEncodeError)
#define _PyIsUnicodeDecodeError(EXC)    \
    PyObject_TypeCheck(EXC, (PyTypeObject *)PyExc_UnicodeDecodeError)
#define _PyIsUnicodeTranslateError(EXC) \
    PyObject_TypeCheck(EXC, (PyTypeObject *)PyExc_UnicodeTranslateError)
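The split into per-exception functions can be sketched as follows. To keep it compilable without Python.h, a hypothetical `error_kind` tag stands in for the `_PyIsUnicode*Error()` checks and a fixed-size UTF-8 buffer stands in for the returned str object; all names are illustrative. The per-type behaviour mirrors CPython's `replace` handler as we understand it ('?' per character when encoding, a single U+FFFD when decoding, one U+FFFD per character when translating).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tag standing in for the _PyIsUnicode*Error() type checks. */
typedef enum { KIND_ENCODE, KIND_DECODE, KIND_TRANSLATE, KIND_OTHER } error_kind;

typedef struct {
    error_kind kind;
    long start, end;   /* already-validated offending range [start, end) */
} codec_error;

typedef struct {
    char text[64];     /* replacement text, UTF-8 encoded */
    long resume;       /* position where the codec continues */
} replacement;

#define REPLACEMENT_CHAR_UTF8 "\xEF\xBF\xBD"   /* U+FFFD */

/* Write `count` copies of `unit` into out->text; -1 if it would not fit. */
static int fill(replacement *out, const char *unit, long count)
{
    size_t ulen = strlen(unit);
    out->text[0] = '\0';
    for (long i = 0; i < count; i++) {
        if (strlen(out->text) + ulen >= sizeof(out->text)) {
            return -1;
        }
        strcat(out->text, unit);
    }
    return 0;
}

/* One small function per exception type instead of one big if/else chain. */
static int replace_encode(const codec_error *e, replacement *out)
{
    out->resume = e->end;
    return fill(out, "?", e->end - e->start);      /* one '?' per character */
}

static int replace_decode(const codec_error *e, replacement *out)
{
    out->resume = e->end;
    return fill(out, REPLACEMENT_CHAR_UTF8, 1);    /* one U+FFFD per error */
}

static int replace_translate(const codec_error *e, replacement *out)
{
    out->resume = e->end;
    return fill(out, REPLACEMENT_CHAR_UTF8, e->end - e->start);
}

/* The handler itself only dispatches; -1 for unsupported exception types
 * (CPython raises a TypeError in that case). */
static int replace_errors(const codec_error *e, replacement *out)
{
    assert(e != NULL && out != NULL);
    switch (e->kind) {
    case KIND_ENCODE:    return replace_encode(e, out);
    case KIND_DECODE:    return replace_decode(e, out);
    case KIND_TRANSLATE: return replace_translate(e, out);
    default:             return -1;
    }
}
```

Each per-type function can then be read, fixed, and reviewed on its own, and the top-level handler reduces to a short dispatch.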

For handlers that need to be fixed, I will first fix them in place (without refactoring). Afterwards, I will refactor them and extract the relevant code into functions. That way, the diffs will be easier to follow. (I've observed that a diff doing both is much harder to read, so I will revert that part in the existing PRs. EDIT: actually, no PR does both the fix and the split.)

I'm creating this issue to track the progress of the refactoring, assuming no problems come up along the way.

cc @vstinner @encukou

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@picnixz
Member Author

picnixz commented Jan 22, 2025

The merge plan is as follows:

picnixz added a commit that referenced this issue Jan 24, 2025
…129174)

We also clean up `PyCodec_StrictErrors` and the error message rendered
when an object of incorrect type is passed to codec error handlers.
@picnixz picnixz changed the title Refactor codecs error handlers to use _PyUnicodeError_GetParams Refactor codecs error handlers to use _PyUnicodeError_GetParams and extract logic into separate functions Feb 9, 2025
@picnixz picnixz changed the title Refactor codecs error handlers to use _PyUnicodeError_GetParams and extract logic into separate functions Refactor codecs error handlers to use _PyUnicodeError_GetParams and extract complex logic into separate functions Feb 9, 2025