-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docstrings checking #2011
Comments
Currently there are a lot of docstrings issue inside ibis. Accordingly pydocstyle, currently there are 3064 docstrings error inside the project. Ibis CI now is checking just files at ibis/ and ibis/omniscidb but for the other backends, a "precise" validation should be done locally using pydocstyle. and it takes times. My point of view is that a docstring validations is similar to code validation. In ibis we have already code format validation, and Ibis also has pre-commit hooks for black, flake8 and isort. That helps a lot to keep review simple and easier. I think that would be good to work on that into 2 phases.
the 1st step is important because after this fixing every new contributions will automatically be checked the docstrings on CI. for the phase 2, we need to discuss best practices and create a document specifying the ibis documentation guideline. |
Following the discussion on #2018 : There are a lot of functions and method inside Ibis with no docstrings or with docstrings that doesn't fit in the numpy specifications. it is an evidence that it is hard for reviewers to check that manually also maybe it also not clear for reviewers that ibis use numpy docstring standard for docstrings. |
pls show a sample of the warnings the linters oftentimes show IMHO a lot of useless things that simply should be ignored we already do quite a lot of linking so need to be quite mindful of adding baggage |
in the PR #2018 also I am proposing to keep as private functions that don't need to be visible to end-users. But if that functions need to be visible to these functions should be very well documented and the documentation should be updated. https://docs.ibis-project.org/api.html seems to be out of date. |
here is the current log from pydocstyle: https://pastebin.com/uUsRHkhP |
the problem is do it in a subjective approach .. also my opinion could be different than yours. using something objective makes the review workflow easier and it guarantees at least a minimum standard level for the code. |
you are missing the point the linters check for too much stuff here you are going to cause massive changes for IMHO not much gain and a lot of pain from contributors |
the checks just contemplate the numpy docstring specification. I think all these changes will have a good impact, for the documentation, for new contributors, for reviewers, for users ... as I am proposing that .. I am working on fixing these issues .. I am trying to work on one backend each week. |
what exactly is the problem with the current lint checks? |
currently on CI pydocstyle just run for related to #2018 these are the current docstring errors: https://pastebin.com/pB7Qgc4y as you can see numpy docstring specification specify that all public functions methods or classes need a docstring. |
@toryhaavik @scottcode @cpcloud any thought about this? |
@xmnlab I appreciate your enthusiasm for docstrings as well as automated style checkers. I especially like our use of The problem with pydocstyle is that it complains about a whole bunch of stuff that I don't see much value in. I don't care if the first line ends in a period or not. It doesn't bother me much if some docstrings fit in one line including the quotes while others don't. If the style checker doesn't automatically make uncontroversial fixes to these kind of issues, then I'd prefer not having my commits hung up on such tiny style details. Some things it finds can be important, like missing docstrings, but even in that case, sometimes method names are self-explanatory enough. Especially in a batch of many very similar methods, ensuring docstrings on every single one might actually more distracting than helpful. The few ways in which standardization would be helpful don't seem to be checked by pydocstyle anyway. Like consistently using either
(Which are we supposed to use, by the way?) In short, I'd be on board with taking a little more care with our docstrings in ways that add value, but let's not overburden ourselves with minutiae. |
Also, I think using pydocstyle in the way you have with fixing docstrings for the Pyspark backend can lead to unintended consequences, like making a bunch of methods private just so the style checker will ignore them. I don't think all those methods that you changed to private in #2018 really ought to be private. |
hey @scottcode thanks for the feedback :) personally I don't worry also if the summary ends with period or not ... but it is good to be consistent on the entire docstring/documentation .. some places that ends with period .. and other places that doesn't end with period seems to be inconsistent (I am using the period here just as an example). Currently we use numpydoc style as our default style https://docs.ibis-project.org/contributing.html#style-and-formatting if we decide to add exceptions we should update the documentation. using pydocstyle we can specify to ignore some kind of errors .. and IMHO this info should be described into the documentation. as you mentioned pydocstyle doesn't check some important stuff. but we can contribute pydocstyle to improve it, I think that would be a great help (also we can motivate other people to contribute). Currently there is an issue that makes hard to add pydocstyle into git pre-commit hook. I am planing to help into that* related to #2018 I change that methods to private because It didn't look as it need to be public or that is used externally .. ibis will register it and it is not used directly ... but I think that also private functions/methods should be documented. but I don't worry to keep all these functions as public ... but my first guess is that if it is public it should be public into the API documentation ... and so .. the docstrings should be improved. detailed documentation helps contributors, users and also reviewers. |
@datapythonista was telling me about a lot of work they are doing in Pandas to automate and standardize their docstrings, to make it easier for review changes to them and verify they are accurate. @datapythonista do you have a link that summarizes that work that might be useful for ibis to learn from? |
numpydoc master will have a So far can just be called from Python, see this PR: numpy/numpydoc#238 Let me know if you have any question, I'll probably put together a blog post when everything is in place. |
thanks @datapythonista ! It looks interesting! I will play with that! |
@datapythonista is pandas using already this new feature of numpydoc? |
@datapythonista also could you give an overview about the decisions you made on pandas group related to docstrings verification? it would be good to learn more about the experience you had and implement good practices also here. |
pandas has been adding new validations for around two years while building that script. Now we're moving it to numpydoc so the rest of the community can also use it. There is still some discussion, because numpydoc allows several options here and there, and we are more strict and want "one and only one way to do things". But the points under discussion are minimal, and you can ignore the errors you don't care about. In conclusion, the script added a lot of value for pandas in keeping docstrings right and consistent. For other projects it's still "in beta" and if you start using right now, you'll be part of the discussions and testing. |
thanks @datapythonista for the details ! @jreback @scottcode @cpcloud any thoughts about using the same/similar approach used by pandas? |
@xmnlab +1 in having some doc string validations there are currently pretty good validations in place. sure i think some of this can be improved but let’s go slow with a POC and not wholesale just change things for example having copy pasted doc strings is actually a big -1; this causes a lot of maintenance costs so happy for you or others to propose things but as i said let’s go step by step |
@jreback not sure I am following you ... could you give more details/examples? |
and how people are dealing with these cases in pandas? by the other hand, a good documentation could avoid contribution costs (eg: #1989) it is not related but, for example, pyopensci review requires that I understand your point .. but I think our current state about the documentation is not good .. and it is harder to maintain that than spending time to improve docstrings and documentation. |
how so? what exactly is problematic about the current docs / documentation? |
@jreback good point. I am analyzing the documentation (at least an overview) and I will share some points here today or tomorrow. |
Probably worth sharing our experience with pandas. We've got around 1,600 docstrings, and many of them could benefit from.having better content (too short descriptions, lack of examples...). There is no way senior people can fix all those, there are just too many. So we encourage first time contributors to help us. But that takes a lot of our of time reviewing code, and most of that commenting about formatting and punctuation errors. We are adding validation to the CI (which requires fixing the errors first) to be able to have the contributions we need in a way we don't need to waste lots of hours in reviews we can automate. We spent lots of hours in this for more than two years, and there is still a lot to do. I don't know much about ibis documentation, but my recommendation is to avoid a quest to fix docstrings unless you really have a good reason to or you're very bored. ;) |
I created an issue to organize some points related to documentation #2025. @datapythonista thank you for your feedback. You are doing a very nice job related to the docstrings. the validation option looks very nice. In the case of ibis ... I think that probably we could fix and organize the docstrings texts and validations soon. There are a lot of internal functions that don't need docstrings or their docstrings could be something simple. Maybe the hard work here would be define some strategies to do it. just for sharing ... I was reading today sympy's documentation about documentation style and it seems very interesting: |
Feel free to make things conform to pydocstyle, but a wholesale change to enforce this is out of scope. |
Accordingly pydocstyle, currently there are 3445 docstings issues inside the project.
I have started to fix that #1996
I will open one PR for each backend dosctring fixing
The text was updated successfully, but these errors were encountered: