Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Fixed PySpark docstrings and changed some public function to private. #2018

Closed
wants to merge 2 commits into from

Conversation

xmnlab
Copy link
Contributor

@xmnlab xmnlab commented Oct 31, 2019

This PR is related to issue #2011

  • public functions that don't need to be public inside the PySpark backend API changed to private
  • Added some docstrings for public functions/methods.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very inconsistent with the rest of ibis

can you explain the purpose

@xmnlab
Copy link
Contributor Author

xmnlab commented Oct 31, 2019

hey @jreback thanks for review that.

I am trying to improving the CI and review experience fixing docstrings and adding docstring checking for more backends.

My idea is working into 2 phases: 1. fix in a simple way the docstring issues and 2nd. improve and standardize (as much as possible) the docstrings.

the 1st step is important because after this fixing every new contributions will automatically be checked the docstrings on CI.

for the phase 2, we need to discuss best practices and create a document specifying the ibis documentation guideline.

more specific related to this PR:

  • as far as I could understood .. the public functions I changed to private is not actually a public function on ibis API so it doesn't need to be public.
  • if we should keep that as public ... so every function should be very well documented.

@jreback
Copy link
Contributor

jreback commented Oct 31, 2019

@xmnlab pls open an issue for discussion first

i don’t agree with 2) at all

you are substantially changing the structure compared to existing code

pls compare versus all the other backends

@xmnlab
Copy link
Contributor Author

xmnlab commented Oct 31, 2019

sure I will create an issue for that .. thanks!

@xmnlab
Copy link
Contributor Author

xmnlab commented Oct 31, 2019

actually I will start a discussion inside the same issue I created.

@xmnlab
Copy link
Contributor Author

xmnlab commented Nov 6, 2019

@jreback not sure if it resolves the issues related to avoid replication of some parts of the docstrings: https://gist.github.com/xmnlab/6889541d616ab4f6388749fc13ff9dfe

it is a template for docstring.

let me know your thoughts :)

@xmnlab
Copy link
Contributor Author

xmnlab commented Nov 7, 2019

hey @jreback I was talking to @scopatz about the docstring stuff ... and I think I understood now your point.

As I can understand now .. it is not related to copy paste the docstring (parameters and return) among the functions into the same module .. but actually it is related to duplicating the docstring between the general functions like the API and the backends, is that correct?

@xmnlab
Copy link
Contributor Author

xmnlab commented Nov 7, 2019

if this assumption is correct .. maybe we can use the same approach used on matplotlib (eg. https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/pyplot.py#L173)

@jreback
Copy link
Contributor

jreback commented Nov 8, 2019

@xmnlab it’s just not worth time / effort to do this in private functions that have essentially the same args

we have a style currently - would prefer not to change this w/o really good reasons and discussion

there are plenty of open issues that would be IMHO priority to changing our pretty good style and doc strings.

if you are looking for something to work on the docs themselves would be a pretty good place to start. for example they are heavily impala focused which is a pretty obsolete backend

@xmnlab
Copy link
Contributor Author

xmnlab commented Nov 8, 2019

hey @jreback I understand your point of view.

@xmnlab it’s just not worth time / effort to do this in private functions that have essentially the same args

yes .. that is why I changed the functions to private. it was not just to skip docstring checking .. it was also to say explicitly that this functions are not used directly as public on API. I didn't have any intention to create inconsistency or create any confusion. sorry about that.

just to be clear .. I don't worry to close this PR :)

my only concern (and reason for bring up this topic) is some docstrings are incomplete and we need to spend more time during the coding or reviewing to understand the code.

also for new users ... the API documentation is not clear in some points .. such as the type of the parameters or the data returned ... or the exceptions that it could be raised or how to use that.

as we read more code than write code .. maybe improving the experience reading code would be good.

just to use pyspark as an example .. it is not clear the parameter type and the return type for ibis.pyspark.connect:

>>> ibis.pyspark.connect?
Signature: ibis.pyspark.connect(session)
Docstring:
Create a `SparkClient` for use with Ibis. Pipes **kwargs into SparkClient,
which pipes them into SparkContext. See documentation for SparkContext:
https://spark.apache.org/docs/latest/api/python/_modules/pyspark/context.html#SparkContext

so my point is how would be a better way to ensure API public functions to have a better documentation?

could a PR template with a check list help? could an improvement of the guideline for contributions help? current relate to the doctest we just have We use numpydoc as our standard format for docstrings. (PS: in the code there are some few docstring that use another standard)

I created an issue related to the documentation #2025

I would be more than happy to help improving the experience for contributors, reviewers and user :)

@jreback
Copy link
Contributor

jreback commented Mar 12, 2020

closing this. these doc-strings need more discussion if we want to do this at all.

@jreback jreback closed this Mar 12, 2020
@xmnlab xmnlab deleted the fix-pyspark-docstring branch December 21, 2020 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants