-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: Enable Spellcheck in dockbuild #21402
Conversation
Hello @FabioRosado! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on June 24, 2018 at 14:32 Hours UTC |
scripts/announce.py
Outdated
@@ -67,6 +67,17 @@ def get_authors(revision_range): | |||
cur.discard('Homu') | |||
pre.discard('Homu') | |||
|
|||
# Update doc/source/names_wordlist.txt with the names of every author | |||
names = [] | |||
[names.extend(re.sub('\W+', ' ', x).split()) for x in cur.union(pre)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use a generator expression instead of a list comprehension since you only iterate over names
once
scripts/announce.py
Outdated
names = [] | ||
[names.extend(re.sub('\W+', ' ', x).split()) for x in cur.union(pre)] | ||
|
||
path = os.path.sep.join(os.path.abspath(__file__).split(os.path.sep)[:-2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can use os.path.dirname
to go up a couple directories or alternately os.path.join
using ../..
(latter is more succinct though not completely sure of portability)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FabioRosado do you plan to add more to this PR to actually enable it on CI?
In that case, I would quickly submit another PR with just the try except change in conf.py if you want (so at least the doc build works again)
doc/source/conf.py
Outdated
extensions.append('sphinxcontrib.spelling') | ||
spelling_word_list_filename = ['spelling_wordlist.txt', 'names_wordlist.txt'] | ||
spelling_ignore_pypi_package_names = True | ||
except ModuleNotFoundError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also catch ImportError?
Because if you have installed sphinxcontrib.spelling
, but not pyenchant, you get this (which will be a common case I think, since this is what you get if you simply pip/conda install it)
doc/source/conf.py
Outdated
spelling_word_list_filename = ['spelling_wordlist.txt', 'names_wordlist.txt'] | ||
spelling_ignore_pypi_package_names = True | ||
except ModuleNotFoundError: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can print a message about no spell-check being done because it is not installed?
Heya yeah that is my plan but I have some busy days at work I plan to work on this tomorrow or Wednesday. I appreciate your help @jorisvandenbossche I need to check datapythonista’s PR and see if he changed anything and then I’ll check if I need to add something or not. I was planning to add some sort of message saying why it failed but I wasn’t sure if I should add a print or some sort of logging (I just need to check what is used in pandas) |
@FabioRosado, I can work on the simple PR to make the documentation build work again later this afternoon. Then you can check for a better solution when you have time. |
@datapythonista that sounds awesome I didn't want to step in on your PR that's why I added that bit as a thing that could be done in case you did it differently haha my schedule just changed tomorrow so I will have to work on this on Wednesday :/ |
No worries. Just updated the PR so the documentation can be built when the dependencies for the spellcheck are not present, as this is kind of urgent. Then, with no rush and whenever you have time, you can work on a good approach to manage the dependencies for the spellcheck. |
Codecov Report
@@ Coverage Diff @@
## master #21402 +/- ##
=======================================
Coverage 91.9% 91.9%
=======================================
Files 153 153
Lines 49547 49547
=======================================
Hits 45537 45537
Misses 4010 4010
Continue to review full report at Codecov.
|
scripts/announce.py
Outdated
for name in all_names: | ||
wordlist.write('{}\n'.format(name)) | ||
|
||
update_name_wordlist() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to remove this from my code will fix on the next commit
On this update, I have decided to create another function in The plan to use sets is to try and avoid duplicated names. I have also updated the regex to avoid some characters but not all non-words (names such as O'Leary as marked as a single word). Now when the command I also updated the bit that datapythonista wrote to just add I'm not sure if the code in Finally, I have been reading about CI and how to add it to the docbuild but I am unsure how to do this, I have also read the scripts located in the ci folder. I'd appreciate some guidance in this matter as I have no idea how to proceed from here (my apologies). Finally, is it worth it to make a test in order to try and bump coverage up since it went down a bit? Thank you! |
@@ -18,7 +18,7 @@ Fixed Regressions | |||
**Comparing Series with datetime.date** | |||
|
|||
We've reverted a 0.23.0 change to comparing a :class:`Series` holding datetimes and a ``datetime.date`` object (:issue:`21152`). | |||
In pandas 0.22 and earlier, comparing a Series holding datetimes and ``datetime.date`` objects would coerce the ``datetime.date`` to a datetime before comapring. | |||
In pandas 0.22 and earlier, comparing a Series holding datetimes and ``datetime.date`` objects would coerce the ``datetime.date`` to a datetime before comparing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think you want this whatsnew or the one for 24
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was testing the spellcheck to see if all the names were marked as correct, on the process of the spellcheck it marked this and the other whatsnew with some grammatical errors, so I checked and it seems that they should be fixed that's why I included it here - sorry for the confusion since this change wasn't really related to the PR I could revert the changes and submit a new one with just those changes if you would like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK - it's simple enough that I think that's fine. It just wasn't readily apparent to me that's why those notes were included but that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand I should have explained in my first comment. Could you help me with the CI integration, would it suffice to write a test and run the spellcheck and assert that no exception is raised?
ci/build_docs.sh
Outdated
@@ -69,6 +69,10 @@ if [ "$DOC" ]; then | |||
pandas/core/reshape/reshape.py \ | |||
pandas/core/reshape/tile.py | |||
|
|||
echo "Running spellcheck on documentation" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to follow the example on line 35 where make.py
is being called. If I'm not mistaken we need to cd into the doc folder before running the make.py spellcheck since the script cd back to $TRAVIS_BUILD_DIR
.
This is the first time I tried to write something like this so I'd like your opinion about it and if enabling the spellcheck within CI should be done in a different way.
Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you just move this closer to line 35 so you don't need to keep changing directories?
I added the code down there because I thought I needed to let the whole script run and then do the spell check at the end. I have moved that bit closer to |
@@ -1,1652 +1,1987 @@ | |||
Critchley | |||
Villanova |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this list at all? (as you are now getting it from the authors)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so at build time, why don't we just generate the list (to a file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the list is build when spellcheck is called. I have updated the spellcheck function located in make.py to call the function get_name_wordlist() which is located in scripts/announce.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so let's delete the list then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we still need the list, the only reason why we do need it is due to the fact that there are names(around 5-7) in the file release.rst
that are not author names (or at least they are different so they don't get added when we run the function get_names_wordlist
.
If these names are not added somewhere then the build will always fail. If you are worried about having a file with all the names of every contributor I could just delete most of them and add only those in release.rst
since the file will be updated with all the names.
Perhaps this file could be good to have just in case people need to add their own names (like they use a different one or something and it gets flagged as a spelling error like those in release.rst
), so with this file people would know that they have to add those names in there and not in the spelling_wordlist.txt
, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could just delete most of them
Yes, I would only keep those that are not generated by the script (so the ones that are needed to get the script running)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would only keep those that are not generated by the script (so the ones that are needed to get the script running)
Actually, another option is to have them all in there, and have the script update it on demand (so if the release manager adds the names to the notes, the scripts needs to be run). That would save the extra step of creating it on each doc build.
Yes we do need the list, like I said on my previous PR, the Sphinx extension reads the words to be whitelisted from a txt file and currently there is no way to use stdout to whitelist words 😄 |
The doc build on Travis is failing with "ModuleNotFoundError: No module named 'git'" (check the last build to see the log output of the doc build) |
@jorisvandenbossche The build that failed is that #21511 or the build from my previous commit? If it was the latter I was getting a problem from the lint script, but removing the But now that you mention that, i'm importing the function from announce.py which uses -- EDIT -- Ideally, I would like the list to be updated when users run the command |
No, it is from this PR. Each PR (and each commit that you add to a PR) is tested on travis. You can follow the link to that by clicking on the "All checks have passed" -> "Show all checks". Note that it is not actually failing (travis will show it as green), but you can see in the doc build log that it is actually failing. Eg for the last commit added here: https://travis-ci.org/pandas-dev/pandas/jobs/393561877 The reason this is failing is simply because the correct dependencies are not installed on travis (so the installation script needs to be updated, need to check where you can add this)
Yes, that's a good idea. |
BTW, I think a very valuable contribution would be to add a feature to sphinxcontrib-spelling to skip a file ... (that would solve many problems here!) |
And it might be as simple as skipping based on the |
@jorisvandenbossche Yeah you are right that would make things so much easier, I'll try to submit a PR to the spelling extension to add the possibility to avoid certain files. I guess this PR will have to either be placed on hold or closed since if this gets merged then the build will eventually fail when new names are added - unless we add another dependency to make this work on the build. I will wait until I hear from you as to what is the best course of action with these issues 👍 |
That would be really cool!
I think we can go ahead with this for now, as long as the script is there to update the list of names to not have the build fail. We can later remove that again if the feature to ignore certain files is added. |
I had to remove ModuleNotFoundError since linting kept marking this as non-existent as this except is only available on Python 3.6, I have updated the name list back to all the names as well. I did leave the reference to the function that updates the wordlist on the spellcheck function, but if you run the announce.py script the list will also be updated (since its not called from Will look forward to fixing some issue here once I figure out how to avoid files/folders in the spelling extension haha |
can you rebase / update |
I thought this PR wasn't accepted as there was some different opinions about it |
rereading i think this is ok if we can just auto generate the list of authors from the commit history rather than hard coding everything |
@jreback @jorisvandenbossche are you happy merging this if @FabioRosado rebases? My opinion is that the whole typo validation adds too much complexity to the project. And that being based in enchant, which is a deprecated project, it even makes less sense to have it. We can surely have a separate repo in the pandas org that checks in the pandas repo whether there is any typo with all this system we have, that would be very useful. But having it in pandas, when afaik none of the maintainers was able to set it up, doesn't seem to be worth. In any case, can you let @FabioRosado if he should rebase so we can merge it, or just close the PR. Thanks! |
ok let's close this then. |
This is the initial PR to try and fix the issue #21354 the sphinx spelling extension needs every name to be divided and added to each line in order to get the name marked as spelt correctly. The file
names_wordlist.txt
is updated on every run of the functionget_authors()
located insidescripts/announce.py
.Since the issue #21396 is related to my previous PR I attempted to resolve this issue here as well, I followed the suggestion of @jorisvandenbossche and added a try/except to the
conf.py
if the dependency doesn't exist then nothing will happen.@datapythonista since you submitted a PR(#21397) to try and fix the issue, perhaps you have an opinion about this bit of code. Also, if you would like to do the changes on your own PR I will just delete this bit and use your solution 😄 👍
I would like some feedback on this first attempt, the generator
[names.extend(re.sub('\W+', ' ', x).split()) for x in cur.union(pre)]
seems a bit smelly but does the trick, should this be refactored to use a long form and make it more readable? I extended the list to names because split() was creating a list inside the main list and this makes every name to be split and added to the same list (this is probably very memory inefficient though)I tried to run the script from my machine but I got an issue with it, probably I'm doing something wrong but when I run
./scripts/announce.py $GITHUB v0.15.1..v0.23.0
I get the error message:Not sure what I'm doing wrong, to be honest. If I run just the
get_authors()
function everything seems to be fine (including the v0.15.1..v0.23.0 format) and the names_wordlist is updated successfully.I will look forward to read your opinions about this PR 👍
git diff upstream/master -u -- "*.py" | flake8 --diff