Unstaged files might include characters which are auto escaped by git leading to wrong path names. #439

Open
buermarc opened this issue Feb 16, 2025 · 0 comments


Problem:

Untracked files may have names containing characters that git escapes automatically. The GitDiffTool.untracked method does not handle these escaped names correctly, so diff_cover tries to open a file that does not exist.

$ man git-status

The fields (including the ->) are separated from each other by a single space. If a filename contains whitespace or other nonprintable characters, that field will be quoted in the manner of a C string literal: surrounded by ASCII double quote (34) characters, and with interior special characters backslash-escaped.
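For comparison, git also offers a `-z` option on `git status` and `git ls-files`, which terminates entries with NUL and performs no quoting at all. The following is only a sketch, assuming the caller is free to choose the git invocation; the helper names `split_nul_terminated` and `list_untracked` are hypothetical and not part of diff_cover.

```python
import os
import subprocess


def split_nul_terminated(output: bytes) -> list[str]:
    """Split NUL-terminated git output into path strings (no unquoting needed)."""
    return [os.fsdecode(chunk) for chunk in output.split(b"\x00") if chunk]


def list_untracked(repo_dir: str = ".") -> list[str]:
    """Hypothetical helper: list untracked files without C-style quoting."""
    result = subprocess.run(
        # -z: NUL line termination on output, file names are not quoted
        ["git", "ls-files", "--others", "--exclude-standard", "-z"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
    )
    return split_nul_terminated(result.stdout)
```

With this output format, a file named with a single backslash arrives as the raw byte `\` rather than as the quoted field `"\\"`.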

General problem:

(venv) λ ~/repos/diff_cover/ main* git status
On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        "\\"
        replicate.py

nothing added to commit but untracked files present (use "git add" to track)
(venv) λ ~/repos/diff_cover/ main* l
.rw-r--r--@   26 mulc 16 Feb 10:28  .coveragerc
.rw-r--r--@  336 mulc 16 Feb 10:28  .flake8
drwxr-xr-x@    - mulc 16 Feb 11:04  .git
.rw-r--r--@   71 mulc 16 Feb 10:28  .git-blame-ignore-revs
drwxr-xr-x@    - mulc 16 Feb 10:28  .github
.rw-r--r--@  428 mulc 16 Feb 10:28  .gitignore
.rw-r--r--@    5 mulc 16 Feb 10:38  .python-version
.rw-r--r--@    0 mulc 16 Feb 10:39  \
.rw-r--r--@  23k mulc 16 Feb 10:28  CHANGELOG
drwxr-xr-x@    - mulc 16 Feb 10:42  diff_cover
.rw-r--r--@  10k mulc 16 Feb 10:28  LICENSE
.rw-r--r--@  230 mulc 16 Feb 10:28  MANIFEST.in
.rw-r--r--@   68 mulc 16 Feb 10:28  NOTICE
.rw-r--r--@  46k mulc 16 Feb 10:28  poetry.lock
.rw-r--r--@ 3.5k mulc 16 Feb 10:28  pyproject.toml
.rw-r--r--@  17k mulc 16 Feb 10:28  README.rst
.rw-r--r--@  242 mulc 16 Feb 10:44  replicate.py
drwxr-xr-x@    - mulc 16 Feb 10:28  tests
drwxr-xr-x@    - mulc 16 Feb 10:42  venv
.rwxr-xr-x@  423 mulc 16 Feb 10:28  verify.sh
(venv) λ ~/repos/diff_cover/ main* bash verify.sh
All done! ✨ 🍰 ✨
34 files would be left unchanged.
Skipped 1 files
===================================================================================== test session starts ======================================================================================
platform linux -- Python 3.13.2, pytest-8.2.0, pluggy-1.5.0
rootdir: /home/mulc/repos/diff_cover
configfile: pyproject.toml
plugins: mock-3.14.0, cov-6.0.0, datadir-1.5.0
collected 306 items

...
---------- coverage: platform linux, python 3.13.2-final-0 -----------
Coverage XML written to file coverage.xml

============================================================================== 306 passed, 28 warnings in 14.93s ===============================================================================
diff-cover 9.2.2
diff-quality 9.2.2
Traceback (most recent call last):
  File "/home/mulc/repos/diff_cover/venv/bin/diff-cover", line 6, in <module>
    sys.exit(main())
             ~~~~^^
  File "/home/mulc/repos/diff_cover/diff_cover/diff_cover_tool.py", line 307, in main
    percent_covered = generate_coverage_report(
        arg_dict["coverage_file"],
    ...<14 lines>...
        expand_coverage_report=arg_dict["expand_coverage_report"],
    )
  File "/home/mulc/repos/diff_cover/diff_cover/diff_cover_tool.py", line 277, in generate_coverage_report
    reporter.generate_report(output_file)
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/home/mulc/repos/diff_cover/diff_cover/report_generator.py", line 302, in generate_report
    report = template.render(self._context())
                             ~~~~~~~~~~~~~^^
  File "/home/mulc/repos/diff_cover/diff_cover/report_generator.py", line 351, in _context
    context = super().report_dict()
  File "/home/mulc/repos/diff_cover/diff_cover/report_generator.py", line 220, in report_dict
    src_stats = {src: self._src_path_stats(src) for src in self.src_paths()}
                                                           ~~~~~~~~~~~~~~^^
  File "/home/mulc/repos/diff_cover/diff_cover/report_generator.py", line 86, in src_paths
    for src, summary in self._diff_violations().items()
                        ~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/mulc/repos/diff_cover/diff_cover/report_generator.py", line 196, in _diff_violations
    src_paths_changed = self._diff.src_paths_changed()
  File "/home/mulc/repos/diff_cover/diff_cover/diff_reporter.py", line 174, in src_paths_changed
    num_lines = self._get_file_lines(path)
  File "/home/mulc/repos/diff_cover/diff_cover/diff_reporter.py", line 188, in _get_file_lines
    with open(path, encoding="utf-8") as file_handle:
         ~~~~^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '"\\\\"'

Another way to replicate the problem:

(venv) λ ~/repos/diff_cover/ main* ll
Permissions Size User Date Modified Name
.rw-r--r--@    0 mulc 16 Feb 10:39   \
.rw-r--r--@  23k mulc 16 Feb 10:28   CHANGELOG
drwxr-xr-x@    - mulc 16 Feb 10:42   diff_cover
.rw-r--r--@  10k mulc 16 Feb 10:28   LICENSE
.rw-r--r--@  230 mulc 16 Feb 10:28   MANIFEST.in
.rw-r--r--@   68 mulc 16 Feb 10:28   NOTICE
.rw-r--r--@  46k mulc 16 Feb 10:28   poetry.lock
.rw-r--r--@ 3.5k mulc 16 Feb 10:28   pyproject.toml
.rw-r--r--@  17k mulc 16 Feb 10:28   README.rst
.rw-r--r--@  242 mulc 16 Feb 10:44   replicate.py
drwxr-xr-x@    - mulc 16 Feb 10:28   tests
drwxr-xr-x@    - mulc 16 Feb 10:42   venv
.rwxr-xr-x@  423 mulc 16 Feb 10:28   verify.sh
(venv) λ ~/repos/diff_cover/ main* git status  
On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        "\\"
        replicate.py

nothing added to commit but untracked files present (use "git add" to track)
(venv) λ ~/repos/diff_cover/ main* cat replicate.py 
from diff_cover.git_diff import GitDiffTool
from pathlib import Path

tool = GitDiffTool(range_notation="unused", ignore_whitespace="unused")
untracked_files = tool.untracked()

for file in untracked_files:
    assert Path(file).exists(), file

(venv) λ ~/repos/diff_cover/ main* python replicate.py 
Traceback (most recent call last):
  File "/home/mulc/repos/diff_cover/replicate.py", line 8, in <module>
    assert Path(file).exists(), file
           ~~~~~~~~~~~~~~~~~^^
AssertionError: "\\"
(venv) λ ~/repos/diff_cover/ main*

A solution would be to check each file in the untracked method and see whether it starts and ends with the ASCII 34 character (a double quote). If so, the string has to be unescaped. I found a Stack Overflow thread with some suggestions on how to do that [1]. I am unsure which one is best, especially given the multiple encoding options, so I went with ast.literal_eval. There are likely other, better ways to achieve this.
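To make the idea concrete, here is a minimal sketch of such an unescaping step. It is not the ast.literal_eval approach mentioned above: it decodes the escapes at the byte level instead, on the assumption that git (with its default core.quotePath setting) writes non-ASCII bytes as octal escapes such as `\303\244`, which should be reassembled into UTF-8 rather than read as one character per escape. The names `unquote_git_path` and `_decode_escape` are hypothetical.

```python
import re

# Single-letter escapes of a C string literal, mapped to the byte they denote.
_SIMPLE_ESCAPES = {
    b"a": b"\a", b"b": b"\b", b"f": b"\f", b"n": b"\n",
    b"r": b"\r", b"t": b"\t", b"v": b"\v", b'"': b'"', b"\\": b"\\",
}
# A backslash followed by up to three octal digits or one simple escape letter.
_ESCAPE_RE = re.compile(rb'\\([0-7]{1,3}|[abfnrtv"\\])')


def _decode_escape(match):
    """Turn one matched escape sequence into the single byte it stands for."""
    token = match.group(1)
    if token in _SIMPLE_ESCAPES:
        return _SIMPLE_ESCAPES[token]
    return bytes([int(token, 8)])  # octal escape, e.g. \303


def unquote_git_path(field: str) -> str:
    """Undo git's C-style quoting; unquoted fields are returned unchanged."""
    if len(field) < 2 or not (field.startswith('"') and field.endswith('"')):
        return field
    raw = field[1:-1].encode("utf-8")
    # Assumption: the file name bytes are valid UTF-8.
    return _ESCAPE_RE.sub(_decode_escape, raw).decode("utf-8")


print(unquote_git_path('"\\\\"'))        # the quoted field "\\" becomes a single backslash
print(unquote_git_path("replicate.py"))  # unquoted names pass through as-is
```

Working on bytes keeps the octal escapes and any literal non-ASCII characters in the same representation before the final decode, which is the part where the encoding options matter.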

This was tested on Linux, so I am unsure how e.g. Windows behaves. I'll have to set up a dev environment on Windows first, but I would not be surprised if there is some other behavior that totally makes sense and that I haven't considered :)

[1] https://stackoverflow.com/questions/1885181/how-to-un-escape-a-backslash-escaped-string
