Check for illegal characters in codelists #418
Thanks for the PR @dc-almeida, couple of comments below.
Some more general questions from my side:
- Should we extend the illegal characters to all attributes or just the name of a code? The reason I'm asking is that I think for the definition of a variable, it might be useful to use punctuation such as `:`, `;` or even quotes.
- Should the illegal characters be project specific, i.e. be part of the nomenclature config? In my opinion, the point of nomenclature is to establish standards for variable names, etc., so having project-specific illegal characters defeats that point.
- I see the point of limiting the illegal character check for only the current repo. On the other hand, it would still affect the codelists so I'd vote for being more strict and search everything that's used in a project, including external repositories.
tests/data/codelist/illegal_chars/char_in_external_repo/nomenclature.yaml
FYI, the idea for this PR came from @orichters via IAMconsortium/common-definitions#138. And the current implementation with making the illegal-characters project/repo-specific was to avoid having to go through a dozen legacy projects and manually cleaning up stuff... |
@danielhuppmann, ah thanks for the clarification. This would mean though that we'd have to copy-paste these from now on standard illegal characters into every |
We could flip it around, fix the illegal characters and introduce a |
I like that idea but I’d again flip it around so that we can also set a specific list of characters (maybe we want to make $ illegal later). So an argument “check-illegal-characters” that is True by default, can be set to False, or take a list of characters. |
I'd split that into two attributes |
Fair enough. Sounds ok, @dc-almeida? |
In addition, I would set the default |
Two smaller changes, using `ErrorCollector` and using `model_dump` to check the `Code` attributes, then good to merge from my side.
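For readers following along, the two requested changes could look roughly like the sketch below. It is not the code from this PR: the `Code` dataclass, the `ILLEGAL_CHARACTERS` list and the function name are made up for illustration, `dataclasses.asdict` stands in for Pydantic's `model_dump`, and a plain list of messages stands in for `ErrorCollector`.

```python
from dataclasses import asdict, dataclass

# Assumed for illustration; the actual set of characters is not given in this thread.
ILLEGAL_CHARACTERS = [";", '"']


@dataclass
class Code:
    """Hypothetical stand-in for a codelist entry."""

    name: str
    description: str = ""
    unit: str = ""


def find_illegal_characters(codes, illegal=ILLEGAL_CHARACTERS):
    """Return one message per offending attribute instead of raising at the first hit."""
    errors = []
    for code in codes:
        # asdict() plays the role that model_dump() would play on a Pydantic model:
        # it exposes every attribute, so the check is not limited to the name.
        for attr, value in asdict(code).items():
            if not isinstance(value, str):
                continue
            found = [char for char in illegal if char in value]
            if found:
                errors.append(
                    f"Illegal character(s) {found} in '{attr}' of code '{code.name}'"
                )
    return errors
```

Collecting all messages first means a single validation run reports every offending code, rather than stopping at the first one.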
Thanks for the quick changes @dc-almeida. Good to be merged from my side now.
This looks good to me, thanks @dc-almeida! Just please add empty lines at the end of the new yaml files for the tests.
Before merging, can you please go through all workflow-repos in production and add the line `illegal_characters: [ ]` if necessary.
Co-authored-by: Daniel Huppmann <dh@dergelbesalon.at>
Closes #401, updates #411
The way it's implemented, it does not check external repos for illegal characters, thus being specific to the repo where the config is defined.
This was tested previously with common-definitions, as some variables contain possibly problematic characters (see here).
By checking if each code has a populated `repository` attribute, we identify and skip the external repo codes from the code lists.
I think the Pydantic validation of the `illegal_characters` field is straightforward, but can add a test for it if deemed necessary.
Caveat is that the stray tag check was previously a Pydantic validation, and now is (and must be) called explicitly. If a better suggestion comes up, happy to implement it!
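As a rough illustration of the skipping logic described above (not the actual implementation), the sketch below uses a made-up `Code` dataclass and helper names. The only detail taken from this description is that a populated `repository` attribute marks a code as coming from an external repo.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Code:
    """Hypothetical stand-in for a codelist entry."""

    name: str
    repository: Optional[str] = None  # populated only for codes from an external repo


def local_codes(codes):
    """Keep only the codes defined in the current repo."""
    return [code for code in codes if not code.repository]


def names_with_illegal_characters(codes, illegal_characters):
    """Return names of local codes containing any illegal character."""
    return [
        code.name
        for code in local_codes(codes)
        if any(char in code.name for char in illegal_characters)
    ]


codes = [
    Code("Local;Variable"),
    Code("External;Variable", repository="common-definitions"),
    Code("Clean Variable"),
]
print(names_with_illegal_characters(codes, [";"]))
```

Only the local code is reported here; the external one is skipped even though its name contains the same character.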