
Docstring Errors Examples

Galuh Sahid edited this page Feb 15, 2020 · 19 revisions

More info: https://pandas.io/docs/development/contributing_docstring.html#writing-a-docstring

GL01: Docstring text (summary) should start in the line immediately after the opening quotes (not in the same line, or leaving a blank line in between)

Bad

def assert_categorical_equal(
    left, right, check_dtype=True, check_category_order=True, obj="Categorical"
):
    """Test that Categoricals are equivalent.

    Parameters
    ----------
    left : Categorical
    ...
    """
    pass

Good

def assert_categorical_equal(
    left, right, check_dtype=True, check_category_order=True, obj="Categorical"
):
    """
    Test that Categoricals are equivalent.

    Parameters
    ----------
    left : Categorical
    ...
    """
    pass

Read more about summary in Contributing Docstring - Section 2: Extended Summary

GL02: Closing quotes should be placed in the line after the last text in the docstring (do not close the quotes in the same line as the text, or leave a blank line between the last text and the quotes)

Bad

def unstack():
    """
    Pivot a row index to columns.

    When using a MultiIndex, a level can be pivoted so each value in
    the index becomes a column. This is especially useful when a subindex
    is repeated for the main index, and data is easier to visualize as a
    pivot table.

    The index level will be automatically removed from the index when added
    as columns."""
    pass

Good

def unstack():
    """
    Pivot a row index to columns.

    When using a MultiIndex, a level can be pivoted so each value in
    the index becomes a column. This is especially useful when a subindex
    is repeated for the main index, and data is easier to visualize as a
    pivot table.

    The index level will be automatically removed from the index when added
    as columns.
    """
    pass
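The two quote-placement rules (GL01 and GL02) can be checked mechanically. The following is a minimal sketch and not the implementation used by pandas' validation script; the helper names `starts_after_opening_quotes` and `closes_on_own_line` are hypothetical, and both inspect the raw `__doc__` string.

```python
def starts_after_opening_quotes(doc: str) -> bool:
    """
    Return True if the summary starts on the line right after the quotes.
    """
    lines = doc.split("\n")
    # The text on the same line as the opening quotes must be empty,
    # and the very next line must contain the summary.
    return len(lines) > 1 and lines[0].strip() == "" and lines[1].strip() != ""


def closes_on_own_line(doc: str) -> bool:
    """
    Return True if the closing quotes sit on the line after the last text.
    """
    lines = doc.split("\n")
    # The last line holds only whitespace before the closing quotes,
    # and the line before it must contain text (no blank line in between).
    return len(lines) > 1 and lines[-1].strip() == "" and lines[-2].strip() != ""


def bad():
    """Pivot a row index to columns.

    The index level is removed from the index."""


def good():
    """
    Pivot a row index to columns.

    The index level is removed from the index.
    """


print(starts_after_opening_quotes(bad.__doc__), closes_on_own_line(bad.__doc__))
print(starts_after_opening_quotes(good.__doc__), closes_on_own_line(good.__doc__))
```

The `bad` function fails both checks and the `good` function passes both.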

GL08: The object does not have a docstring

Bad

@property
def right(self):
    return self._data._right

Good

@property
def right(self):
    """
    Return the right endpoints of each Interval in the IntervalIndex as
    an Index.
    """
    return self._data._right

Read how to write a docstring here.
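As a rough illustration of what GL08 looks for, the sketch below lists the public attributes of a class that have no docstring. `IntervalLike` and `missing_docstrings` are hypothetical names made up for this example; the real validation script works differently.

```python
class IntervalLike:
    """
    Toy class used only for this illustration.
    """

    @property
    def left(self):
        return 0

    @property
    def right(self):
        """
        Return the right endpoint.
        """
        return 1


def missing_docstrings(cls):
    """
    Return the names of public attributes defined on ``cls`` without a docstring.
    """
    missing = []
    for name, obj in vars(cls).items():
        if name.startswith("_"):
            continue
        # ``__doc__`` is None (or empty) when no docstring was written.
        if not (getattr(obj, "__doc__", None) or "").strip():
            missing.append(name)
    return missing


print(missing_docstrings(IntervalLike))
```

Only `left` is reported, because a property takes its docstring from the getter function.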

SS06: Summary should fit in a single line

Bad

def duplicated(self, subset=None, keep="first"):
    """
    Return boolean Series denoting duplicate rows, optionally only
    considering certain columns.
    """
    ...

Good

def duplicated(self, subset=None, keep="first"):
    """
    Return boolean Series denoting duplicate rows.

    Considering certain columns is optional.
    """
    ...

Read more about short summary here
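One way to think about SS06: the summary is the first paragraph of the docstring, and it must be exactly one line long. The sketch below assumes that reading; `summary_lines` is a hypothetical helper, not part of pandas.

```python
import inspect


def summary_lines(doc: str) -> list:
    """
    Return the lines of the first paragraph of a docstring.
    """
    # cleandoc removes the common indentation and surrounding blank lines.
    cleaned = inspect.cleandoc(doc)
    first_paragraph = cleaned.split("\n\n")[0]
    return first_paragraph.split("\n")


bad = """
    Return boolean Series denoting duplicate rows, optionally only
    considering certain columns.
    """

good = """
    Return boolean Series denoting duplicate rows.

    Considering certain columns is optional.
    """

print(len(summary_lines(bad)))
print(len(summary_lines(good)))
```

The bad summary spans two lines, while the good one fits in a single line and moves the extra detail into the extended summary.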

ES01: No extended summary found

Bad

def unstack():
    """
    Pivot a row index to columns.
    """
    pass

Good

def unstack():
    """
    Pivot a row index to columns.

    When using a MultiIndex, a level can be pivoted so each value in
    the index becomes a column. This is especially useful when a subindex
    is repeated for the main index, and data is easier to visualize as a
    pivot table.

    The index level will be automatically removed from the index when added
    as columns.
    """
    pass

Read more about extended summary here.

PR01: Parameters {missing_params} not documented

Bad

class Series:
    def plot(self, kind, **kwargs):
        """
        Generate a plot.

        Render the data in the Series as a matplotlib plot of the
        specified kind.

        Note the blank line between the parameters title and the first
        parameter. Also, note that after the name of the parameter `kind`
        and before the colon, a space is missing.

        Also, note that the parameter descriptions do not start with a
        capital letter, and do not finish with a dot.

        Finally, the `**kwargs` parameter is missing.

        Parameters
        ----------

        kind: str
            kind of matplotlib plot
        """
        pass

Good

We need to add **kwargs to the docstring:

class Series:
    def plot(self, kind, color='blue', **kwargs):
        """
        Generate a plot.

        Render the data in the Series as a matplotlib plot of the
        specified kind.

        Parameters
        ----------
        kind : str
            Kind of matplotlib plot.
        color : str, default 'blue'
            Color name or rgb code.
        **kwargs
            These parameters will be passed to the matplotlib plotting
            function.
        """
        pass

PR02: Unknown parameters {unknown_params}

Bad

def astype(self, dtype, copy=True, errors="raise", **kwargs):
    """
    Cast a pandas object to a specified dtype ``dtype``.

    Parameters
    ----------
    dtype : data type, or dict of column name -> data type
        Use a numpy.dtype or Python type to cast entire pandas object to
        the same type. Alternatively, use {col: dtype, ...}, where col is a
        column label and dtype is a numpy.dtype or Python type to cast one
        or more of the DataFrame's columns to column-specific types.
    copy : bool, default True
        Return a copy when ``copy=True`` (be very careful setting
        ``copy=False`` as changes to values then may propagate to other
        pandas objects).
    errors : {'raise', 'ignore'}, default 'raise'
        Control raising of exceptions on invalid data for provided dtype.

        - ``raise`` : allow exceptions to be raised
        - ``ignore`` : suppress exceptions. On error return original object.

        .. versionadded:: 0.20.0

    kwargs : keyword arguments to pass on to the constructor

    Returns
    -------
    casted : same type as caller
    """
    ...

kwargs is not recognized as a parameter. It should be **kwargs.

Good

Change kwargs to **kwargs:

def astype(self, dtype, copy=True, errors="raise", **kwargs):
    """
    Cast a pandas object to a specified dtype ``dtype``.

    Parameters
    ----------
    dtype : data type, or dict of column name -> data type
        Use a numpy.dtype or Python type to cast entire pandas object to
        the same type. Alternatively, use {col: dtype, ...}, where col is a
        column label and dtype is a numpy.dtype or Python type to cast one
        or more of the DataFrame's columns to column-specific types.
    copy : bool, default True
        Return a copy when ``copy=True`` (be very careful setting
        ``copy=False`` as changes to values then may propagate to other
        pandas objects).
    errors : {'raise', 'ignore'}, default 'raise'
        Control raising of exceptions on invalid data for provided dtype.

        - ``raise`` : allow exceptions to be raised
        - ``ignore`` : suppress exceptions. On error return original object.

        .. versionadded:: 0.20.0

    **kwargs : keyword arguments to pass on to the constructor

    Returns
    -------
    casted : same type as caller
    """
    ...
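PR01 and PR02 both come down to comparing the names in the signature with the names in the Parameters section. The sketch below illustrates that comparison with a deliberately simple section parser; `signature_params`, `documented_params` and `compare` are hypothetical helpers, and the shortened `astype` is only a stand-in for the example above.

```python
import inspect


def signature_params(func):
    """
    Return parameter names from the signature, with ``*``/``**`` prefixes.
    """
    names = []
    for param in inspect.signature(func).parameters.values():
        if param.name in ("self", "cls"):
            continue
        if param.kind is inspect.Parameter.VAR_POSITIONAL:
            names.append("*" + param.name)
        elif param.kind is inspect.Parameter.VAR_KEYWORD:
            names.append("**" + param.name)
        else:
            names.append(param.name)
    return names


def documented_params(func):
    """
    Return the names listed in the Parameters section of the docstring.
    """
    lines = inspect.cleandoc(func.__doc__ or "").split("\n")
    names = []
    in_params = False
    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        # A section title is a non-empty line underlined with dashes.
        if line.strip() and set(next_line.strip()) == {"-"}:
            in_params = line.strip() == "Parameters"
            continue
        is_underline = set(line.strip()) == {"-"}
        # Parameter names are the unindented lines inside the section.
        if in_params and line and not line[0].isspace() and not is_underline:
            names.append(line.split(" : ")[0].strip())
    return names


def compare(func):
    """
    Return (missing, unknown) parameter names, in the spirit of PR01 and PR02.
    """
    in_signature = signature_params(func)
    in_docstring = documented_params(func)
    missing = [p for p in in_signature if p not in in_docstring]
    unknown = [p for p in in_docstring if p not in in_signature]
    return missing, unknown


def astype(self, dtype, copy=True, **kwargs):
    """
    Cast to a specified dtype.

    Parameters
    ----------
    dtype : data type
        Target type.
    kwargs : keyword arguments
        Passed on to the constructor.
    """


print(compare(astype))
```

Here `copy` and `**kwargs` are reported as missing, and `kwargs` (without the asterisks) is reported as unknown, which is why renaming it to `**kwargs` resolves the PR02 error.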

PR06: Parameter "{param_name}" type should use "{right_type}" instead of "{wrong_type}"

Bad

The code below would output the error "Parameter 'path' type should use 'str' instead of 'string'".

def read_spss(
    path: Union[str, Path],
    usecols: Optional[Sequence[str]] = None,
    convert_categoricals: bool = True,
) -> DataFrame:
    """
    Load an SPSS file from the file path, returning a DataFrame.

    .. versionadded:: 0.25.0

    Parameters
    ----------
    path : string or Path
        File path.
    usecols : list-like, optional
        Return a subset of the columns. If None, return all columns.
    convert_categoricals : bool, default is True
        Convert categorical columns into pd.Categorical.

    Returns
    -------
    DataFrame
    """

Good

def read_spss(
    path: Union[str, Path],
    usecols: Optional[Sequence[str]] = None,
    convert_categoricals: bool = True,
) -> DataFrame:
    """
    Load an SPSS file from the file path, returning a DataFrame.

    .. versionadded:: 0.25.0

    Parameters
    ----------
    path : str or Path
        File path.
    usecols : list-like, optional
        Return a subset of the columns. If None, return all columns.
    convert_categoricals : bool, default is True
        Convert categorical columns into pd.Categorical.

    Returns
    -------
    DataFrame
    """

PR07: Parameter "{param_name}" has no description

In the example below, the parameter axis is missing a description:

Bad

def _get_counts_nanvar(
    values_shape: Tuple[int],
    mask: Optional[np.ndarray],
    axis: Optional[int],
    ddof: int,
    dtype=float,
) -> Tuple[Union[int, np.ndarray], Union[int, np.ndarray]]:
    """ Get the count of non-null values along an axis, accounting
    for degrees of freedom.

    Parameters
    ----------
    values_shape : Tuple[int]
        shape tuple from values ndarray, used if mask is None
    mask : Optional[ndarray[bool]]
        locations in values that should be considered missing
    axis : Optional[int]
    ddof : int
        degrees of freedom
    dtype : type, optional
        type to use for count

    Returns
    -------
    count : scalar or array
    d : scalar or array
    """

Good

def _get_counts_nanvar(
    values_shape: Tuple[int],
    mask: Optional[np.ndarray],
    axis: Optional[int],
    ddof: int,
    dtype=float,
) -> Tuple[Union[int, np.ndarray], Union[int, np.ndarray]]:
    """ Get the count of non-null values along an axis, accounting
    for degrees of freedom.

    Parameters
    ----------
    values_shape : Tuple[int]
        shape tuple from values ndarray, used if mask is None
    mask : Optional[ndarray[bool]]
        locations in values that should be considered missing
    axis : Optional[int]
        axis to count along
    ddof : int
        degrees of freedom
    dtype : type, optional
        type to use for count

    Returns
    -------
    count : scalar or array
    d : scalar or array
    """

PR08: Parameter "{param_name}" description should start with a "capital letter"

Bad

The description of the parameter axis does not start with a capital letter:

def take_nd(
    arr, indexer, axis: int = 0, out=None, fill_value=np.nan, allow_fill: bool = True
):
    """
    Specialized Cython take which sets NaN values in one pass

    This dispatches to ``take`` defined on ExtensionArrays. It does not
    currently dispatch to ``SparseArray.take`` for sparse ``arr``.

    Parameters
    ----------
    arr : array-like
        Input array.
    indexer : ndarray
        1-D array of indices to take, subarrays corresponding to -1 value
        indices are filled with fill_value
    axis : int, default 0
        axis to take from
    out : ndarray or None, default None
        Optional output array, must be appropriate type to hold input and
        fill_value together, if indexer has any -1 value entries; call
        maybe_promote to determine this type for any fill_value
    fill_value : any, default np.nan
        Fill value to replace -1 values with
    allow_fill : boolean, default True
        If False, indexer is assumed to contain no -1 values so no filling
        will be done.  This short-circuits computation of a mask.  Result is
        undefined if allow_fill == False and -1 is present in indexer.

    Returns
    -------
    subarray : array-like
        May be the same type as the input, or cast to an ndarray.
    """

Good

def take_nd(
    arr, indexer, axis: int = 0, out=None, fill_value=np.nan, allow_fill: bool = True
):
    """
    Specialized Cython take which sets NaN values in one pass

    This dispatches to ``take`` defined on ExtensionArrays. It does not
    currently dispatch to ``SparseArray.take`` for sparse ``arr``.

    Parameters
    ----------
    arr : array-like
        Input array.
    indexer : ndarray
        1-D array of indices to take, subarrays corresponding to -1 value
        indices are filled with fill_value
    axis : int, default 0
        Axis to take from
    out : ndarray or None, default None
        Optional output array, must be appropriate type to hold input and
        fill_value together, if indexer has any -1 value entries; call
        maybe_promote to determine this type for any fill_value
    fill_value : any, default np.nan
        Fill value to replace -1 values with
    allow_fill : boolean, default True
        If False, indexer is assumed to contain no -1 values so no filling
        will be done.  This short-circuits computation of a mask.  Result is
        undefined if allow_fill == False and -1 is present in indexer.

    Returns
    -------
    subarray : array-like
        May be the same type as the input, or cast to an ndarray.
    """

PR09: Parameter description should finish with "."

Bad

The description of the parameter axis does not finish with ".":

def cumsum(self, axis=0, *args, **kwargs):
        """
        Cumulative sum of non-NA/null values.

        When performing the cumulative summation, any NA/null values will
        be skipped. The resulting SparseArray will preserve the locations of
        NaN values, but the fill value will be `np.nan` regardless.

        Parameters
        ----------
        axis : int or None
            Axis over which to perform the cumulative summation. If None,
            perform cumulative summation over flattened array

        Returns
        -------
        cumsum : SparseArray
        """

Good

def cumsum(self, axis=0, *args, **kwargs):
        """
        Cumulative sum of non-NA/null values.

        When performing the cumulative summation, any NA/null values will
        be skipped. The resulting SparseArray will preserve the locations of
        NaN values, but the fill value will be `np.nan` regardless.

        Parameters
        ----------
        axis : int or None
            Axis over which to perform the cumulative summation. If None,
            perform cumulative summation over flattened array.

        Returns
        -------
        cumsum : SparseArray
        """

Possible false positives

Parameter descriptions that end with a deprecated directive:

    truediv : bool, optional
        Whether to use true division, like in Python >= 3.
        deprecated:: 1.0.0
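The description rules PR07, PR08 and PR09 can be pictured as three small string checks. This is a simplified sketch under that assumption; `check_description` is a hypothetical helper, and it does not handle directives such as the deprecated example above.

```python
def check_description(description: str) -> list:
    """
    Return the style problems found in a parameter description.
    """
    problems = []
    text = description.strip()
    if not text:
        # PR07-style problem: nothing was written at all.
        problems.append("no description")
        return problems
    # PR08-style problem: only letters can be upper or lower case.
    if text[0].isalpha() and not text[0].isupper():
        problems.append("does not start with a capital letter")
    # PR09-style problem: the last character must be a period.
    if not text.endswith("."):
        problems.append("does not finish with a period")
    return problems


print(check_description("axis to take from"))
print(check_description("Axis to take from."))
```

The first call reports both the capitalization and the missing period; the second returns an empty list.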

RT02: The first line of the Returns section should contain only the type, unless multiple values are being returned

Bad

Here the type and its description share the first line of the Returns section:

def is_overlapping(self):
        """
        Return True if the IntervalIndex has overlapping intervals, else False.

        Two intervals overlap if they share a common point, including closed
        endpoints. Intervals that only have an open endpoint in common do not
        overlap.

        .. versionadded:: 0.24.0

        Returns
        -------
        bool : Boolean indicating if the IntervalIndex has overlapping intervals.
        """

Good

def is_overlapping(self):
        """
        Return True if the IntervalIndex has overlapping intervals, else False.

        Two intervals overlap if they share a common point, including closed
        endpoints. Intervals that only have an open endpoint in common do not
        overlap.

        .. versionadded:: 0.24.0

        Returns
        -------
        bool
            Boolean indicating if the IntervalIndex has overlapping intervals.
        """

RT03: Return value has no description

Bad

def is_overlapping(self):
        """
        Return True if the IntervalIndex has overlapping intervals, else False.

        Two intervals overlap if they share a common point, including closed
        endpoints. Intervals that only have an open endpoint in common do not
        overlap.

        .. versionadded:: 0.24.0

        Returns
        -------
        bool
        """

Good

def is_overlapping(self):
        """
        Return True if the IntervalIndex has overlapping intervals, else False.

        Two intervals overlap if they share a common point, including closed
        endpoints. Intervals that only have an open endpoint in common do not
        overlap.

        .. versionadded:: 0.24.0

        Returns
        -------
        bool
            Boolean indicating if the IntervalIndex has overlapping intervals.
        """

YD01: No Yields section found

Bad

def __iter__(self):
        """
        Return an iterator over the boxed values
        """
        ...
        for i in range(chunks):
            start_i = i * chunksize
            end_i = min((i + 1) * chunksize, length)
            converted = tslib.ints_to_pydatetime(
                data[start_i:end_i], tz=self.tz, freq=self.freq, box="timestamp"
            )
            for v in converted:
                yield v

Good

def __iter__(self):
        """
        Return an iterator over the boxed values

        Yields
        ------
        tstamp : Timestamp
        """
        ...
        for i in range(chunks):
            start_i = i * chunksize
            end_i = min((i + 1) * chunksize, length)
            converted = tslib.ints_to_pydatetime(
                data[start_i:end_i], tz=self.tz, freq=self.freq, box="timestamp"
            )
            for v in converted:
                yield v
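To see when YD01 applies, it helps to remember that any function containing `yield` is a generator function. The sketch below uses `inspect.isgeneratorfunction` for that test and then looks for a Yields section; `needs_yields_section` and the two `chunks_*` functions are hypothetical examples, not pandas code.

```python
import inspect


def needs_yields_section(func) -> bool:
    """
    Return True if ``func`` is a generator whose docstring lacks a Yields section.
    """
    if not inspect.isgeneratorfunction(func):
        return False
    lines = [line.strip() for line in (func.__doc__ or "").split("\n")]
    # A numpydoc section is a title line underlined with dashes.
    for title, underline in zip(lines, lines[1:]):
        if title == "Yields" and underline and set(underline) == {"-"}:
            return False
    return True


def chunks_bad(values, size):
    """
    Iterate over chunks of values.
    """
    for start in range(0, len(values), size):
        yield values[start:start + size]


def chunks_good(values, size):
    """
    Iterate over chunks of values.

    Yields
    ------
    list
        The next ``size`` items of ``values``.
    """
    for start in range(0, len(values), size):
        yield values[start:start + size]


print(needs_yields_section(chunks_bad))
print(needs_yields_section(chunks_good))
```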

SA04: Missing description for See Also "{reference_name}" reference

Bad

def mean(self, skipna=True):
        """
        Return the mean value of the Array.

        .. versionadded:: 0.25.0

        Parameters
        ----------
        skipna : bool, default True
            Whether to ignore any NaT elements.

        Returns
        -------
        scalar
            Timestamp or Timedelta.

        See Also
        --------
        numpy.ndarray.mean
        Series.mean

        Notes
        -----
        mean is only defined for Datetime and Timedelta dtypes, not for Period.
        """

Good

def mean(self, skipna=True):
        """
        Return the mean value of the Array.

        .. versionadded:: 0.25.0

        Parameters
        ----------
        skipna : bool, default True
            Whether to ignore any NaT elements.

        Returns
        -------
        scalar
            Timestamp or Timedelta.

        See Also
        --------
        numpy.ndarray.mean : Returns the average of array elements along a given axis.
        Series.mean : Return the mean value in a Series.

        Notes
        -----
        mean is only defined for Datetime and Timedelta dtypes, not for Period.
        """

EX02: Examples do not pass tests:\n{doctest_log}

To see exactly which test fails, you can run:

python scripts/validate_docstrings.py pandas.Series.str.split

For example you'll see that one of the failed tests is:

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 50, in pandas.Series.str.split
Failed example:
    s = pd.Series(["this is a regular sentence",
                   "https://docs.python.org/3/tutorial/index.html",
                   np.nan])
Expected:
    0                       this is a regular sentence
    1    https://docs.python.org/3/tutorial/index.html
    2                                              NaN
    dtype: object
Got nothing

When you check the docs, it looks like this:

    Examples
    --------
    >>> s = pd.Series(["this is a regular sentence",
    ...                "https://docs.python.org/3/tutorial/index.html",
    ...                np.nan])
    0                       this is a regular sentence
    1    https://docs.python.org/3/tutorial/index.html
    2                                              NaN
    dtype: object

This docstring does not pass the test because, as written, the doctest expects the output

    0                       this is a regular sentence
    1    https://docs.python.org/3/tutorial/index.html
    2                                              NaN
    dtype: object

to be printed when we run

    s = pd.Series(["this is a regular sentence",
                    "https://docs.python.org/3/tutorial/index.html",
                    np.nan])

That is not the case: an assignment prints nothing.

For the code to produce the expected output, we need to evaluate s on a separate line:

>>> s

So the new doc should look like this:

    Examples
    --------
    >>> s = pd.Series(["this is a regular sentence",
    ...                "https://docs.python.org/3/tutorial/index.html",
    ...                np.nan])

    >>> s
    0                       this is a regular sentence
    1    https://docs.python.org/3/tutorial/index.html
    2                                              NaN
    dtype: object
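The same behavior can be reproduced without pandas by using the standard library `doctest` module directly. In this sketch a plain list stands in for the Series, and `count_failures` is a hypothetical helper written for this page.

```python
import doctest

# The assignment prints nothing, so the expected output never appears.
BAD = """
>>> s = [1, 2, 3]
[1, 2, 3]
"""

# Evaluating ``s`` on its own line is what produces the output.
GOOD = """
>>> s = [1, 2, 3]
>>> s
[1, 2, 3]
"""


def count_failures(text: str) -> int:
    """
    Run the doctest examples in ``text`` and return how many failed.
    """
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs={}, name="example", filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts matter here.
    results = runner.run(test, out=lambda message: None)
    return results.failed


print(count_failures(BAD))
print(count_failures(GOOD))
```

The first text has one failing example ("Expected ... Got nothing"), and the second has none.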

Possible false positives

Line 36, in pandas.Series.plot.line
Failed example:
    s.plot.line()
Expected nothing
Got:
    <matplotlib.axes._subplots.AxesSubplot object at 0x10ff55ad0>

EX03: flake8 error: {error_code} {error_message}{times_happening}

Say that you run:

python scripts/validate_docstrings.py pandas.Series.plot.line

Output:

################################################################################
################################## Validation ##################################
################################################################################

3 Errors found:
        Examples do not pass tests
        flake8 error: E121 continuation line under-indented for hanging indent
        flake8 error: E123 closing bracket does not match indentation of opening bracket's line

There are two flake8 errors here:

  1. flake8 error: E121 continuation line under-indented for hanging indent. In pandas.Series.plot.line, this refers to:
...
            The following example shows the populations for some animals
            over the years.

            >>> df = pd.DataFrame({
            ...    'pig': [20, 18, 489, 675, 1776], # CHANGE HERE - should be 4 spaces instead of 3 spaces
            ...    'horse': [4, 25, 281, 600, 1900] # CHANGE HERE - should be 4 spaces instead of 3 spaces
            ...    }, index=[1990, 1997, 2003, 2009, 2014])
            >>> lines = df.plot.line()
...

It should be:

...
            The following example shows the populations for some animals
            over the years.

            >>> df = pd.DataFrame({
            ...     'pig': [20, 18, 489, 675, 1776], # CHANGED - now indented with 4 spaces instead of 3
            ...     'horse': [4, 25, 281, 600, 1900] # CHANGED - now indented with 4 spaces instead of 3
            ...    }, index=[1990, 1997, 2003, 2009, 2014])
            >>> lines = df.plot.line()
...
  2. flake8 error: E123 closing bracket does not match indentation of opening bracket's line. In pandas.Series.plot.line, this refers to:
...
            The following example shows the populations for some animals
            over the years.

            >>> df = pd.DataFrame({
            ...     'pig': [20, 18, 489, 675, 1776],
            ...     'horse': [4, 25, 281, 600, 1900]
            ...    }, index=[1990, 1997, 2003, 2009, 2014]) # CHANGE HERE
            >>> lines = df.plot.line()
...

It should be:

...
            The following example shows the populations for some animals
            over the years.

            >>> df = pd.DataFrame({
            ...     'pig': [20, 18, 489, 675, 1776],
            ...     'horse': [4, 25, 281, 600, 1900]
            ... }, index=[1990, 1997, 2003, 2009, 2014]) # CHANGE HERE - bracket should match the same indentation level of the line that their opening bracket started on.
            >>> lines = df.plot.line()
...

Tip: use https://www.flake8rules.com/ to look up flake8 error codes.

Check that the errors are fixed by re-running:

python scripts/validate_docstrings.py pandas.Series.plot.line

Output should be:

################################################################################
################################## Validation ##################################
################################################################################

1 Errors found:
        Examples do not pass tests

Only 1 error left, and this is not related to flake8.

SA01: See Also section not found

Bad

def argsort(
        self, ascending: bool = True, kind: str = "quicksort", *args, **kwargs
    ) -> np.ndarray:
        """
        Return the indices that would sort this array.

        Parameters
        ----------
        ascending : bool, default True
            Whether the indices should result in an ascending
            or descending sort.
        kind : {'quicksort', 'mergesort', 'heapsort'}, optional
            Sorting algorithm.
        *args, **kwargs:
            passed through to :func:`numpy.argsort`.

        Returns
        -------
        ndarray
            Array of indices that sort ``self``. If NaN values are contained,
            NaN values are placed at the end.
        """

Good

def argsort(
        self, ascending: bool = True, kind: str = "quicksort", *args, **kwargs
    ) -> np.ndarray:
        """
        Return the indices that would sort this array.

        Parameters
        ----------
        ascending : bool, default True
            Whether the indices should result in an ascending
            or descending sort.
        kind : {'quicksort', 'mergesort', 'heapsort'}, optional
            Sorting algorithm.
        *args, **kwargs:
            passed through to :func:`numpy.argsort`.

        Returns
        -------
        ndarray
            Array of indices that sort ``self``. If NaN values are contained,
            NaN values are placed at the end.

        See Also
        --------
        numpy.argsort : Sorting implementation used internally.
        """

SA04: Missing description for See Also "{reference_name}" reference

Bad

def isna(self):
        """
        Detect missing values

        Missing values (-1 in .codes) are detected.

        Returns
        -------
        a boolean array of whether my values are null

        See Also
        --------
        isna
        isnull
        Categorical.notna

        """

Good

def isna(self):
        """
        Detect missing values

        Missing values (-1 in .codes) are detected.

        Returns
        -------
        a boolean array of whether my values are null

        See Also
        --------
        isna : Top-level isna.
        isnull : Alias of isna.
        Categorical.notna : Boolean inverse of Categorical.isna.

        """
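A See Also entry is considered described when text follows the ` : ` separator, or when an indented description follows on the next line. The sketch below applies that reading with a small hand-written parser; `see_also_without_description` is a hypothetical helper and only approximates what the validation script reports.

```python
import inspect


def see_also_without_description(doc: str) -> list:
    """
    Return the See Also references that have no description.
    """
    lines = inspect.cleandoc(doc).split("\n")
    missing = []
    in_see_also = False
    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        # A section title is a non-empty line underlined with dashes.
        if line.strip() and set(next_line.strip()) == {"-"}:
            in_see_also = line.strip() == "See Also"
            continue
        if not in_see_also or not line.strip() or set(line.strip()) == {"-"}:
            continue
        if line[0].isspace():
            # Indented lines continue the previous entry's description.
            continue
        name, _, description = line.partition(" : ")
        continued = next_line[:1].isspace() and bool(next_line.strip())
        if not description.strip() and not continued:
            missing.append(name.strip())
    return missing


DOC = """
    Detect missing values.

    See Also
    --------
    isna : Top-level isna.
    isnull
    Categorical.notna
    """

print(see_also_without_description(DOC))
```

Only `isnull` and `Categorical.notna` are reported, because `isna` already carries a description.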

EX01: No examples section found

Bad

    @staticmethod
    def _run_os(*args):
        """
        Execute a command as an OS terminal.

        Parameters
        ----------
        *args : list of str
            Command and parameters to be executed
        """

Good

    @staticmethod
    def _run_os(*args):
        """
        Execute a command as an OS terminal.

        Parameters
        ----------
        *args : list of str
            Command and parameters to be executed

        Examples
        --------
        >>> DocBuilder()._run_os('python', '--version')
        """