Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple dictionaries (add default one) for dictionary config setting #2727

Closed
yarikoptic opened this issue Feb 6, 2023 · 8 comments · Fixed by #2767
Closed

Multiple dictionaries (add default one) for dictionary config setting #2727

yarikoptic opened this issue Feb 6, 2023 · 8 comments · Fixed by #2767

Comments

@yarikoptic
Copy link
Contributor

we have

❯ grep dictionary .codespellrc
dictionary = .codespell_dict

in https://github.com/bids-standard/bids-specification/blob/HEAD/.codespellrc but that disables default dictionaries, and I need to specify -D- to get them used:

❯ codespell
❯ codespell -D-
./DECISION-MAKING.md:17: Curent ==> Current
./src/common-principles.md:188: entites ==> entities

Unfortunately, -D is the different style than e.g. --skip which expects them comma separated and -D doesn't work that way:

❯ git diff
diff --git a/.codespellrc b/.codespellrc
index 28aeafee..a513a1a7 100644
--- a/.codespellrc
+++ b/.codespellrc
@@ -2,5 +2,5 @@
 skip = *.js,*.svg,*.eps,.git
 ignore-words-list = fo,te,als
 builtin = clear,rare
-dictionary = .codespell_dict
+dictionary = -,.codespell_dict
 exclude-file = src/CHANGES.md
❯ codespell
...
codespell: error: argument -D/--dictionary: expected one argument

so the question is how could I keep using default dictionaries + custom one while relying on specification in the config file?

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Feb 7, 2023

Let's start with command line options, we'll look into configuration files later. The output of --help reads:

  -D DICTIONARY, --dictionary DICTIONARY
                        custom dictionary file that contains spelling corrections. If this flag is not specified or equals "-"
                        then the default dictionary is used. This option can be specified multiple times.

I would expect -D - -D .codespell_dict to work. Can you check it does?

If it does work, chances are this works in the configuration file:

dictionary = -
dictionary = .codespell_dict

@yarikoptic
Copy link
Contributor Author

Thank you @DimitriPapadopoulos for the rapid follow up -- sorry I missed it .

-D- -D .codespell_dict -- works!
❯ codespell -D- -D .codespell_dict
./README.md:63: folders ==> directories
./README.md:64: sub-directory ==> subdirectory
./README.md:65: zumba ==> zebra
./DECISION-MAKING.md:17: Curent ==> Current
./src/common-principles.md:188: entites ==> entities
❯ codespell -D-
./DECISION-MAKING.md:17: Curent ==> Current
./src/common-principles.md:188: entites ==> entities
❯ cat .codespell_dict
sub-directory->subdirectory
sub-directories->subdirectories
file name->filename
file names->filenames
folder->directory
folders->directories
zumba->zebra
Two lines in config file -- does not!
❯ grep dictionary .codespellrc
dictionary = -
dictionary = .codespell_dict
changes on filesystem:                                                                                                                                       
 .codespellrc | 1 +
❯ codespell --version
2.2.2
changes on filesystem:                                                                                                                                       
 .codespellrc | 1 +
❯ codespell
Traceback (most recent call last):
  File "/usr/bin/codespell", line 33, in <module>
    sys.exit(load_entry_point('codespell==0.0.0', 'console_scripts', 'codespell')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/codespell_lib/_codespell.py", line 767, in _script_main
    return main(*sys.argv[1:])
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/codespell_lib/_codespell.py", line 772, in main
    options, parser = parse_options(args)
                      ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/codespell_lib/_codespell.py", line 427, in parse_options
    config.read(cfg_files)
  File "/usr/lib/python3.11/configparser.py", line 712, in read
    self._read(fp, filename)
  File "/usr/lib/python3.11/configparser.py", line 1111, in _read
    raise DuplicateOptionError(sectname, optname,
configparser.DuplicateOptionError: While reading from '.codespellrc' [line  6]: option 'dictionary' in section 'codespell' already exists

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Feb 23, 2023

While configparser raises the DuplicationError, it also provides a strict option to customize parser behaviour:

  • strict, default value: True
    When set to True, the parser will not allow for any section or option duplicates while reading from a single source (using read_file(), read_string() or read_dict()). It is recommended to use strict parsers in new applications.

Since this is not a new application, I shall propose to set strict to False, for consistency between commend line option -D and config file option dictionary.

We can think about adding support for multiple dictionaries separated by , in a later merge request.

@yarikoptic
Copy link
Contributor Author

no good. With

the patch adding strict=False
diff --git a/codespell_lib/_codespell.py b/codespell_lib/_codespell.py
index a082e6d1..8f975179 100644
--- a/codespell_lib/_codespell.py
+++ b/codespell_lib/_codespell.py
@@ -541,7 +541,7 @@ def parse_options(
     cfg_files = ["setup.cfg", ".codespellrc"]
     if options.config:
         cfg_files.append(options.config)
-    config = configparser.ConfigParser(interpolation=None)
+    config = configparser.ConfigParser(interpolation=None, strict=False)
 
     # Read toml before other config files.
     toml_files = []
@@ -571,7 +571,7 @@ def parse_options(
     # Collect which config files are going to be used
     used_cfg_files = []
     for cfg_file in cfg_files:
-        _cfg = configparser.ConfigParser()
+        _cfg = configparser.ConfigParser(strict=False)
         _cfg.read(cfg_file)
         if _cfg.has_section("codespell"):
             used_cfg_files.append(cfg_file)

it no longer crashes but it just takes the 2nd value as the value, not a list of values (ie dictionaries).

@DimitriPapadopoulos
Copy link
Collaborator

Right, configparser.ConfigParser.read_file does not raise a DuplicateOptionError any more, but appears to be silently overwriting old values instead of creating a list of values. I cannot find a relevant option in Customizing Parser Behaviour, actually the documentation does not specify the behaviour at all.

I need to dig in configparser, but if the behaviour is not specified in the documentation, I suspect whatever solution I try will be fragile. See for example:

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Feb 25, 2023

@larsoner I see multiple issues in codespell options:

  1. Some command line options “can be specified multiple times” (-D/--dictionary), others expect “comma-separated list of files” (-S/--skip), others don't even bother to document how to specify multiple files (-I/--ignore-words and -x/--exclude-file), if possible at all.
  2. In the case of -D/--dictionary, what works for command line options will not work in INI files: you cannot have multiple occurrences of the same key in INI files by default – or at least configparser will at best take the last value into account and discard previous values.

Solutions:

  1. We could modify the behaviour of configparser to accept multiple occurrences of keys, using a custom dict_type for configparser.ConfigParser. This would result in non-standard INI files, but command line options that “can be specified multiple times” could be used the same way in a config file.
  2. We could modify all the above options that accept a file to accept a “comma-separated list of files”. Then command line options and config file options could be equivalent.

We could do both, of course.

Thoughts?

@larsoner
Copy link
Member

I prefer solution (2) to (1)

@DimitriPapadopoulos
Copy link
Collaborator

Solution 2 is being implemented in #2767.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants