CI: Enable Spellcheck in dockbuild #21402

FabioRosado · 2018-06-09T16:31:45Z

This is the initial PR to try and fix issue #21354. The sphinx spelling extension needs every name to be split into words, with each word on its own line, in order for the name to be marked as spelt correctly. The file names_wordlist.txt is updated on every run of the function get_authors() located inside scripts/announce.py.

Since issue #21396 is related to my previous PR, I attempted to resolve it here as well. I followed the suggestion of @jorisvandenbossche and added a try/except to conf.py, so if the dependency doesn't exist nothing will happen.

@datapythonista since you submitted a PR(#21397) to try and fix the issue, perhaps you have an opinion about this bit of code. Also, if you would like to do the changes on your own PR I will just delete this bit and use your solution 😄 👍

I would like some feedback on this first attempt. The list comprehension [names.extend(re.sub('\W+', ' ', x).split()) for x in cur.union(pre)] seems a bit smelly but does the trick; should it be refactored into a longer form to make it more readable? I used extend on names because split() returns a list, and appending it would nest a list inside the main list. Extending splits every name and adds each word to the same flat list (this is probably very memory inefficient though).

I tried to run the script from my machine but I got an issue with it, probably I'm doing something wrong but when I run ./scripts/announce.py $GITHUB v0.15.1..v0.23.0 I get the error message:

usage: announce.py [-h] [--repo REPO] revision_range 
announce.py: error: unrecognized arguments: v0.15.1..v0.23.0

Not sure what I'm doing wrong, to be honest. If I run just the get_authors() function everything seems to be fine (including the v0.15.1..v0.23.0 format) and the names_wordlist is updated successfully.
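One possible reading of the error, judging only from the usage string: the parser accepts a single positional argument, revision_range, so passing both a token and a revision range leaves one argument over. The sketch below is a hypothetical parser built to match that usage string; it is not the actual code in scripts/announce.py, and the default for --repo is an assumption.

```python
import argparse

# Hypothetical parser that mirrors the usage string in the error message:
#   announce.py [-h] [--repo REPO] revision_range
parser = argparse.ArgumentParser(prog="announce.py")
parser.add_argument("--repo", default="pandas")  # default value is assumed
parser.add_argument("revision_range")


def try_parse(argv):
    """Return the parsed arguments, or None if argparse rejects them."""
    try:
        return parser.parse_args(argv)
    except SystemExit:
        # argparse prints the usage/error text and exits on bad input.
        return None


# Two positionals: the first fills revision_range, the second is left over.
rejected = try_parse(["some-token", "v0.15.1..v0.23.0"])

# One positional: accepted.
accepted = try_parse(["v0.15.1..v0.23.0"])
```

If that reading is right, running the script with only the revision range would get past argument parsing.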

I look forward to reading your opinions about this PR 👍

closes #21396 (Missing dependency for sphinxcontrib-spelling) and #21354 (CI: Enable spellcheck in docbuild)
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2018-06-09T16:31:48Z

Hello @FabioRosado! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 24, 2018 at 14:32 Hours UTC

WillAyd · 2018-06-09T20:13:54Z

scripts/announce.py

@@ -67,6 +67,17 @@ def get_authors(revision_range):
    cur.discard('Homu')
    pre.discard('Homu')

+    # Update doc/source/names_wordlist.txt with the names of every author
+    names = []
+    [names.extend(re.sub('\W+', ' ', x).split()) for x in cur.union(pre)]


Could use a generator expression instead of a list comprehension since you only iterate over names once
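A minimal sketch of the two alternatives discussed here, assuming the goal is only a flat list of name words. The function names split_names and iter_names are hypothetical and do not appear in the PR.

```python
import re
from itertools import chain


def split_names(authors):
    """Long form: split each author name into words, return one flat list."""
    names = []
    for author in authors:
        # Replace runs of non-word characters with a space, then split.
        names.extend(re.sub(r"\W+", " ", author).split())
    return names


def iter_names(authors):
    """Lazy form: yield the same words without building a throwaway list."""
    return chain.from_iterable(
        re.sub(r"\W+", " ", author).split() for author in authors
    )
```

The long form avoids using a comprehension purely for its side effects; the lazy form suits the case where the words are iterated over only once.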

WillAyd · 2018-06-09T20:15:50Z

scripts/announce.py

+    names = []
+    [names.extend(re.sub('\W+', ' ', x).split()) for x in cur.union(pre)]
+
+    path = os.path.sep.join(os.path.abspath(__file__).split(os.path.sep)[:-2])


Can use os.path.dirname to go up a couple directories or alternately os.path.join using ../.. (latter is more succinct though not completely sure of portability)
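As a sketch of the os.path.dirname suggestion: applying it twice to the absolute script path drops the file name and then the scripts directory, which is what the split/join line does. The helper name repo_root is hypothetical.

```python
import os


def repo_root(script_path):
    """Return the directory two levels above script_path."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.dirname(script_dir)


# Example with an illustrative path; in the script this would be __file__.
example = os.path.join(os.sep, "repo", "scripts", "announce.py")
root = repo_root(example)
wordlist = os.path.join(root, "doc", "source", "names_wordlist.txt")
```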

jorisvandenbossche

@FabioRosado do you plan to add more to this PR to actually enable it on CI?
In that case, I would quickly submit another PR with just the try except change in conf.py if you want (so at least the doc build works again)

jorisvandenbossche · 2018-06-11T07:00:36Z

doc/source/conf.py

+    extensions.append('sphinxcontrib.spelling')
+    spelling_word_list_filename = ['spelling_wordlist.txt', 'names_wordlist.txt']
+    spelling_ignore_pypi_package_names = True
+except ModuleNotFoundError:


Can you also catch ImportError?
Because if you have installed sphinxcontrib.spelling, but not pyenchant, you get this (which will be a common case I think, since this is what you get if you simply pip/conda install it)

jorisvandenbossche · 2018-06-11T07:01:11Z

doc/source/conf.py

+    spelling_word_list_filename = ['spelling_wordlist.txt', 'names_wordlist.txt']
+    spelling_ignore_pypi_package_names = True
+except ModuleNotFoundError:
+    pass


Maybe we can print a message about no spell-check being done because it is not installed?
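A sketch of how both review comments could be combined, assuming a small helper in conf.py. The helper name enable_optional_extension is hypothetical. ModuleNotFoundError is a subclass of ImportError, so catching ImportError covers the missing-package case as well as a failing import inside an installed package.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def enable_optional_extension(name, extensions):
    """Append name to extensions only if it imports; warn otherwise."""
    try:
        importlib.import_module(name)
    except ImportError as err:
        # Covers ModuleNotFoundError too, since it subclasses ImportError.
        logger.warning("Spell-check disabled: cannot import %s (%s)", name, err)
        return False
    extensions.append(name)
    return True


extensions = []
spellcheck_enabled = enable_optional_extension("sphinxcontrib.spelling", extensions)
```

The spelling-related settings would then be applied only when spellcheck_enabled is true.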

FabioRosado · 2018-06-11T07:54:39Z

Heya, yeah, that is my plan, but I have some busy days at work. I plan to work on this tomorrow or Wednesday.

I appreciate your help @jorisvandenbossche. I need to check datapythonista’s PR and see if he changed anything, and then I’ll check if I need to add something or not.
Also, good point on the ImportError, I didn’t think about that.

I was planning to add some sort of message saying why it failed but I wasn’t sure if I should add a print or some sort of logging (I just need to check what is used in pandas)

datapythonista · 2018-06-11T08:43:05Z

@FabioRosado, I can work on the simple PR to make the documentation build work again later this afternoon. Then you can check for a better solution when you have time.

FabioRosado · 2018-06-11T18:18:16Z

@datapythonista that sounds awesome. I didn't want to step in on your PR, that's why I added that bit as a thing that could be done in case you did it differently haha. My schedule just changed for tomorrow, so I will have to work on this on Wednesday :/

datapythonista · 2018-06-11T20:53:32Z

No worries. Just updated the PR so the documentation can be built when the dependencies for the spellcheck are not present, as this is kind of urgent. Then, with no rush and whenever you have time, you can work on a good approach to manage the dependencies for the spellcheck.

codecov · 2018-06-14T21:54:24Z

Codecov Report

Merging #21402 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #21402   +/-   ##
=======================================
  Coverage    91.9%    91.9%           
=======================================
  Files         153      153           
  Lines       49547    49547           
=======================================
  Hits        45537    45537           
  Misses       4010     4010

Flag	Coverage Δ
#multiple	`90.3% <ø> (ø)`	⬆️
#single	`41.78% <ø> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f1aa08c...1290adf. Read the comment docs.

FabioRosado · 2018-06-14T22:08:38Z

scripts/announce.py

+        for name in all_names:
+            wordlist.write('{}\n'.format(name))
+
+update_name_wordlist()


I forgot to remove this from my code; I will fix it in the next commit.

FabioRosado · 2018-06-14T22:27:01Z

On this update, I have decided to create another function in scripts/announce.py that uses the get_authors in order to get all the names since version 0.3.0. I wanted to use the names already on the wordlist (because it contains some names located in release.rst that aren't in get_authors) so I turned those names into a set, then updated the set with the names that I got from get_authors.

The plan to use sets is to try and avoid duplicated names. I have also updated the regex to avoid some characters but not all non-words (names such as O'Leary are marked as a single word).
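A minimal sketch of the approach described, under stated assumptions: the pattern below is an illustrative guess that keeps apostrophes inside a word, not the regex used in the PR, and merge_wordlist is a hypothetical name.

```python
import re

# Assumed pattern: runs of letters, optionally joined by apostrophes,
# so "O'Leary" stays one word while "Jane A. Doe" splits into three.
NAME_WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def merge_wordlist(existing_words, author_names):
    """Merge words already on the list with words from author names."""
    words = set(existing_words)  # a set drops duplicates automatically
    for name in author_names:
        words.update(NAME_WORD.findall(name))
    return sorted(words)


merged = merge_wordlist(["Doe", "Villanova"], ["Jane Doe", "Pat O'Leary"])
```

Sorting the result keeps the written file stable between runs, which makes diffs of the wordlist easier to review.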

Now when the command python make.py spellcheck is run the function update_name_wordlist() will be called before the spelling extension - so all names will be updated before spellcheck is run.

I also updated the bit that datapythonista wrote to just add ModuleNotFoundError and replaced logging.warn by logging.warning - because the .warn method is deprecated.

I'm not sure if the code in update_name_wordlist is the most efficient one so I'm more than happy to change something if you think there are better ways to do this.

Finally, I have been reading about CI and how to add it to the docbuild but I am unsure how to do this, I have also read the scripts located in the ci folder. I'd appreciate some guidance in this matter as I have no idea how to proceed from here (my apologies).

Also, is it worth writing a test to try and bump coverage up, since it went down a bit?

Thank you!

WillAyd · 2018-06-14T22:35:25Z

doc/source/whatsnew/v0.23.1.txt

@@ -18,7 +18,7 @@ Fixed Regressions
 **Comparing Series with datetime.date**

 We've reverted a 0.23.0 change to comparing a :class:`Series` holding datetimes and a ``datetime.date`` object (:issue:`21152`).
-In pandas 0.22 and earlier, comparing a Series holding datetimes and ``datetime.date`` objects would coerce the ``datetime.date`` to a datetime before comapring.
+In pandas 0.22 and earlier, comparing a Series holding datetimes and ``datetime.date`` objects would coerce the ``datetime.date`` to a datetime before comparing.


Don't think you want this whatsnew or the one for 24

I was testing the spellcheck to see if all the names were marked as correct. During the run it also flagged some errors in this whatsnew and the other one, and they looked like they should be fixed, so I included the fixes here. Sorry for the confusion. Since this change isn't really related to the PR, I could revert it and submit a new PR with just those changes if you would like.

Ah OK - it's simple enough that I think that's fine. It just wasn't readily apparent to me that's why those notes were included but that makes sense.

I understand, I should have explained in my first comment. Could you help me with the CI integration? Would it suffice to write a test that runs the spellcheck and asserts that no exception is raised?

FabioRosado · 2018-06-16T15:54:21Z

ci/build_docs.sh

@@ -69,6 +69,10 @@ if [ "$DOC" ]; then
           pandas/core/reshape/reshape.py \
           pandas/core/reshape/tile.py

+    echo "Running spellcheck on documentation"


I tried to follow the example on line 35 where make.py is being called. If I'm not mistaken, we need to cd into the doc folder before running make.py spellcheck, since the script cds back to $TRAVIS_BUILD_DIR.

This is the first time I tried to write something like this so I'd like your opinion about it and if enabling the spellcheck within CI should be done in a different way.

Thank you

Why don't you just move this closer to line 35 so you don't need to keep changing directories?

FabioRosado · 2018-06-17T11:10:36Z

I added the code down there because I thought I needed to let the whole script run and then do the spell check at the end. I have moved that bit closer to ./make.py, so this should run once the docs are built.

jreback · 2018-06-18T22:52:45Z

doc/source/names_wordlist.txt

@@ -1,1652 +1,1987 @@
-Critchley
-Villanova


do we need this list at all? (as you are now getting it from the authors)?

so at build time, why don't we just generate the list (to a file)

Yeah, the list is built when spellcheck is called. I have updated the spellcheck function located in make.py to call the function get_name_wordlist(), which is located in scripts/announce.py.

ok, so let's delete the list then.

Unfortunately, we still need the list. The only reason is that there are a few names (around 5-7) in the file release.rst that are not author names (or at least they are spelled differently), so they don't get added when we run the function get_names_wordlist.

If these names are not added somewhere then the build will always fail. If you are worried about having a file with all the names of every contributor I could just delete most of them and add only those in release.rst since the file will be updated with all the names.

Perhaps this file could be good to have just in case people need to add their own names (like they use a different one or something and it gets flagged as a spelling error like those in release.rst), so with this file people would know that they have to add those names in there and not in the spelling_wordlist.txt, what do you think?

> I could just delete most of them

Yes, I would only keep those that are not generated by the script (so the ones that are needed to get the script running)

Actually, another option is to have them all in there, and have the script update it on demand (so if the release manager adds the names to the notes, the scripts needs to be run). That would save the extra step of creating it on each doc build.

FabioRosado · 2018-06-18T23:44:24Z

Yes we do need the list, like I said on my previous PR, the Sphinx extension reads the words to be whitelisted from a txt file and currently there is no way to use stdout to whitelist words 😄

jorisvandenbossche · 2018-06-19T08:17:05Z

The doc build on Travis is failing with "ModuleNotFoundError: No module named 'git'" (check the last build to see the log output of the doc build)

FabioRosado · 2018-06-19T09:32:31Z

@jorisvandenbossche The build that failed is that #21511 or the build from my previous commit? If it was the latter, I was getting a problem from the lint script, but removing the ModuleNotFoundError from the try/except (when trying to import the spelling library) seems to fix the issue.

But now that you mention that, I'm importing the function from announce.py, which uses gitpython and pygithub. I guess this could potentially be a problem. I'll try to import these dependencies along with sphinx spelling and just log if they weren't found.

-- EDIT --
I have tested my code on python 3.6 and I am currently having issues with importing the function update_names_wordlist() from the announce.py file. On 3.5 there is no issue so I will need to try and figure out how to update the list when the spellcheck runs.

Ideally, I would like the list to be updated when users run the command python make.py spellcheck and not just from travis since the build can break locally but not on travis.

jorisvandenbossche · 2018-06-19T12:03:43Z

> The build that failed is that #21511 or the build from my previous commit?

No, it is from this PR. Each PR (and each commit that you add to a PR) is tested on travis. You can follow the link to that by clicking on the "All checks have passed" -> "Show all checks".

Note that it is not actually failing (travis will show it as green), but you can see in the doc build log that it is actually failing. Eg for the last commit added here: https://travis-ci.org/pandas-dev/pandas/jobs/393561877

The reason this is failing is simply because the correct dependencies are not installed on travis (so the installation script needs to be updated, need to check where you can add this)

> I'll try to import these dependencies along with sphinx spelling and just log if they weren't found.

Yes, that's a good idea.
Although that makes yet another optional dependency for a dev environment ...

jorisvandenbossche · 2018-06-19T12:04:22Z

BTW, I think a very valuable contribution would be to add a feature to sphinxcontrib-spelling to skip a file ... (that would solve many problems here!)

jorisvandenbossche · 2018-06-19T12:16:04Z

And it might be as simple as skipping based on the docname in https://github.com/sphinx-contrib/spelling/blob/fa8c4b6140d7be0dcaa2aefbf690b84090bb0884/sphinxcontrib/spelling/builder.py#L144

FabioRosado · 2018-06-20T05:57:03Z

@jorisvandenbossche Yeah you are right that would make things so much easier, I'll try to submit a PR to the spelling extension to add the possibility to avoid certain files.

I guess this PR will have to either be placed on hold or closed since if this gets merged then the build will eventually fail when new names are added - unless we add another dependency to make this work on the build.

I will wait until I hear from you as to what is the best course of action with these issues 👍

jorisvandenbossche · 2018-06-20T07:39:09Z

> I'll try to submit a PR to the spelling extension to add the possibility to avoid certain files.

That would be really cool!

> I guess this PR will have to either be placed on hold or closed

I think we can go ahead with this for now, as long as the script is there to update the list of names to not have the build fail. We can later remove that again if the feature to ignore certain files is added.

FabioRosado · 2018-06-20T08:20:53Z

I had to remove ModuleNotFoundError since linting kept flagging it as non-existent; this exception is only available on Python 3.6 and later. I have also updated the name list back to all the names.

I did leave the reference to the function that updates the wordlist on the spellcheck function, but if you run the announce.py script the list will also be updated (since it's not called from main). Hope this is okay.

Will look forward to fixing some issue here once I figure out how to avoid files/folders in the spelling extension haha

jreback · 2018-09-25T16:30:20Z

can you rebase / update

FabioRosado · 2018-09-25T18:02:41Z

I thought this PR wasn't accepted, as there were some differing opinions about it.

jreback · 2018-09-25T18:09:08Z

rereading i think this is ok if we can just auto generate the list of authors from the commit history rather than hard coding everything

datapythonista · 2018-11-03T06:59:57Z

@jreback @jorisvandenbossche are you happy merging this if @FabioRosado rebases?

My opinion is that the whole typo validation adds too much complexity to the project. And being based on enchant, which is a deprecated project, it makes even less sense to have it. We could surely have a separate repo in the pandas org that checks the pandas repo for typos with all this system we have; that would be very useful. But having it in pandas, when afaik none of the maintainers was able to set it up, doesn't seem to be worth it.

In any case, can you let @FabioRosado know if he should rebase so we can merge it, or just close the PR? Thanks!

jreback · 2018-11-03T13:22:57Z

ok let's close this then.

WillAyd requested changes Jun 9, 2018

gfyoung added the Testing and Docs labels Jun 9, 2018

jorisvandenbossche reviewed Jun 11, 2018

FabioRosado force-pushed the spellcheck branch from e9ee9cd to 02fc76c Compare June 14, 2018 21:54

FabioRosado commented Jun 14, 2018

WillAyd reviewed Jun 14, 2018

FabioRosado commented Jun 16, 2018

jreback requested changes Jun 18, 2018

FabioRosado added 7 commits June 20, 2018 06:03

Build a list of names from every author

478b058

Attempt to import sphixcontrib-spelling, if fails pass

1e3c8a0

Add ModuleNotFoundError, use logging.warning (warn is depricated)

96ae258

Add wordlists and ignore pypi package names

c78668b

Refactor get_author to update names_wordlist.txt

c0c174e

Update announce, call function from make.py

6b68f03

Update wordlist, fix typos

5d418fb

FabioRosado added 4 commits June 20, 2018 06:05

Run spellcheck from build_docs.sh

4602b68

Move spellcheck higher in the code

b7bae3a

Remove ModuleNotFound from try/except

e17ec14

Add import git to try/except

17a2b17

FabioRosado force-pushed the spellcheck branch from f6f3dd2 to 18c7fa0 Compare June 20, 2018 05:54

Update names wordlist, add work around git import issue

18c7fa0

Fix linting, update name wordlist

470f716

Merge branch 'master' into spellcheck

1290adf

jreback closed this Nov 3, 2018

datapythonista mentioned this pull request Dec 15, 2018

DOC: Remove doc spellcheck #24287

Merged
