Fix a bug with reading non-UTF-8 encoded files (#192)
jsh9 authored Dec 20, 2024
1 parent 3e5b304 commit dfd9494
Showing 7 changed files with 24 additions and 3 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -158,3 +158,6 @@ cython_debug/
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 .idea/
+
+# macOS stuff
+.DS_Store
7 changes: 6 additions & 1 deletion CHANGELOG.md
@@ -1,10 +1,15 @@
 # Change Log

-## [unpublished]
+## [0.5.13] - 2024-12-20

 - Fixed

   - Fixed a bug where assigning a dict value (such as `abc['something'] = 123`)
     would result in EdgeCaseError
+  - Fixed a bug where non-UTF-8 encoded files would crash _pydoclint_

+- Full diff
+  - https://github.com/jsh9/pydoclint/compare/0.5.12...0.5.13

 ## [0.5.12] - 2024-12-15

5 changes: 4 additions & 1 deletion pydoclint/main.py
@@ -638,7 +638,10 @@ def _checkFile(
     if not filename.is_file():  # sometimes folder names can end with `.py`
         return []

-    with open(filename, encoding='utf8') as fp:
+    with open(filename, encoding='utf-8', errors='replace') as fp:
+        # Note: errors='replace' substitutes the Unicode replacement character
+        # (U+FFFD, often rendered as a question mark) for undecodable bytes.
+        # This may not be a perfect solution, but for now it may be good enough.
         src: str = ''.join(fp.readlines())

     tree: ast.Module = ast.parse(src)
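
The effect of this change can be sketched in isolation. The snippet below is a minimal illustration, not code from pydoclint: `read_source_lenient` is a hypothetical helper that mirrors the new `open(...)` call, and the sample bytes are an assumed stand-in for an ISO-8859-5 encoded source file. It shows that a strict UTF-8 read raises `UnicodeDecodeError`, while the lenient read yields text that `ast.parse` accepts.

```python
import ast
import tempfile
from pathlib import Path


def read_source_lenient(filename: Path) -> str:
    """Hypothetical helper: read as UTF-8, replacing undecodable bytes with U+FFFD."""
    with open(filename, encoding='utf-8', errors='replace') as fp:
        return fp.read()


# Assumed sample: a tiny source file whose string literal holds ISO-8859-5 bytes,
# which are not valid UTF-8.
sample = b"# coding: iso-8859-5\nu = '\xae\xe2\xf0\xc4'\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'nonascii.py'
    path.write_bytes(sample)

    # A strict UTF-8 read fails on the first undecodable byte.
    try:
        path.read_text(encoding='utf-8')
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # The lenient read succeeds; undecodable bytes become U+FFFD.
    src = read_source_lenient(path)

# The lossy text is still syntactically valid Python, so parsing works.
tree = ast.parse(src)
```

The trade-off is that the original characters inside comments and string literals are lost, which is presumably acceptable for a linter that only inspects structure and docstrings written in decodable text.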
2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,6 +1,6 @@
 [metadata]
 name = pydoclint
-version = 0.5.12
+version = 0.5.13
 description = A Python docstring linter that checks arguments, returns, yields, and raises sections
 long_description = file: README.md
 long_description_content_type = text/markdown
4 changes: 4 additions & 0 deletions tests/data/edge_cases/19_file_encoding/nonascii.py
@@ -0,0 +1,4 @@
+# coding: iso-8859-5
+# (Unlikely to be the default encoding for most testers.)
+# ±¶ÿàáâãäåæçèéêëìíîï <- Cyrillic characters
+u = '®âðÄ'
4 changes: 4 additions & 0 deletions tests/data/edge_cases/19_file_encoding/nonascii2.py
@@ -0,0 +1,4 @@
+# coding: iso-8859-5
+# (Unlikely to be the default encoding for most testers.)
+# БЖџрстуфхцчшщъыьэюя <- Cyrillic characters
+'Ўт№Ф'
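
Both test files carry a `# coding: iso-8859-5` declaration. As a point of comparison only (this is not what the commit does), the standard library's `tokenize.detect_encoding` can read such a declaration so the bytes are decoded losslessly. The sample bytes below are an assumption that reproduces the string literal from the first test file as raw ISO-8859-5.

```python
import io
import tokenize

# Assumed sample: the declaration line plus a string literal stored as
# ISO-8859-5 bytes.
raw = b"# coding: iso-8859-5\nu = '\xae\xe2\xf0\xc4'\n"

# detect_encoding() takes a readline callable over bytes and returns the
# declared encoding together with the lines it consumed while looking for it.
encoding, _consumed = tokenize.detect_encoding(io.BytesIO(raw).readline)

# Decoding with the declared encoding recovers the Cyrillic text.
text = raw.decode(encoding)
```

Compared with `errors='replace'`, this approach preserves the original characters, at the cost of an extra pass over the first lines of each file.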
2 changes: 2 additions & 0 deletions tests/test_main.py
@@ -1531,6 +1531,8 @@ def testNonAscii() -> None:
             ],
         ),
         ('18_assign_to_subscript/case.py', {}, []),
+        ('19_file_encoding/nonascii.py', {}, []),  # from: https://github.com/ipython/ipython/blob/0334d9f71e7a97394a73c15c663ca50d65df62e1/IPython/core/tests/nonascii.py
+        ('19_file_encoding/nonascii2.py', {}, []),  # from: https://github.com/ipython/ipython/blob/0334d9f71e7a97394a73c15c663ca50d65df62e1/IPython/core/tests/nonascii2.py
     ],
 )
 def testEdgeCases(