
Docstring Errors Examples

Galuh Sahid edited this page Feb 10, 2020 · 19 revisions

GL01: Docstring text (summary) should start in the line immediately after the opening quotes (not in the same line, or leaving a blank line in between)

Bad

def assert_categorical_equal(
    left, right, check_dtype=True, check_category_order=True, obj="Categorical"
):
    """Test that Categoricals are equivalent.

    Parameters
    ----------
    left : Categorical
    ...
    """
    pass

Good

def assert_categorical_equal(
    left, right, check_dtype=True, check_category_order=True, obj="Categorical"
):
    """
    Test that Categoricals are equivalent.

    Parameters
    ----------
    left : Categorical
    ...
    """
    pass

Read more about the summary in Contributing Docstring - Section 2: Extended Summary.
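A simplified sketch of the GL01 rule, assuming the check looks at the raw `__doc__` string: the text on the same line as the opening quotes must be empty, and the very next line must hold the summary. The helper name `summary_starts_after_quotes` is hypothetical, and this is an illustration rather than the validator's actual code.

```python
def summary_starts_after_quotes(doc: str) -> bool:
    """Return True if the summary begins on the line after the opening quotes."""
    lines = doc.split("\n")
    # lines[0] is the text on the same line as the opening quotes (must be empty);
    # lines[1] is the next line, which must hold the summary (not a blank line).
    return len(lines) > 1 and lines[0].strip() == "" and lines[1].strip() != ""


def bad():
    """Test that Categoricals are equivalent."""


def good():
    """
    Test that Categoricals are equivalent.
    """


print(summary_starts_after_quotes(bad.__doc__))   # False
print(summary_starts_after_quotes(good.__doc__))  # True
```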

GL02: Closing quotes should be placed in the line after the last text in the docstring (do not close the quotes in the same line as the text, or leave a blank line between the last text and the quotes)

Bad

def unstack():
    """
    Pivot a row index to columns.

    When using a MultiIndex, a level can be pivoted so each value in
    the index becomes a column. This is especially useful when a subindex
    is repeated for the main index, and data is easier to visualize as a
    pivot table.

    The index level will be automatically removed from the index when added
    as columns."""
    pass

Good

def unstack():
    """
    Pivot a row index to columns.

    When using a MultiIndex, a level can be pivoted so each value in
    the index becomes a column. This is especially useful when a subindex
    is repeated for the main index, and data is easier to visualize as a
    pivot table.

    The index level will be automatically removed from the index when added
    as columns.
    """
    pass
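A minimal sketch of the GL02 rule on the raw `__doc__` string, under the same assumption as above: the last line must contain only the indentation before the closing quotes, and the line before it must be text rather than a blank line. `closing_quotes_on_own_line` is a hypothetical helper, not the validator's implementation.

```python
def closing_quotes_on_own_line(doc: str) -> bool:
    """Return True if the closing quotes sit on the line after the last text."""
    lines = doc.split("\n")
    # lines[-1] holds only the indentation before the closing quotes;
    # lines[-2] must be the last line of text, not a blank line.
    return len(lines) > 2 and lines[-1].strip() == "" and lines[-2].strip() != ""


def bad():
    """
    Pivot a row index to columns.

    The index level will be automatically removed from the index when added
    as columns."""


def good():
    """
    Pivot a row index to columns.

    The index level will be automatically removed from the index when added
    as columns.
    """


print(closing_quotes_on_own_line(bad.__doc__))   # False
print(closing_quotes_on_own_line(good.__doc__))  # True
```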

GL08: The object does not have a docstring

Bad

@property
def right(self):
    return self._data._right

Good

@property
def right(self):
    """
    Return the right endpoints of each Interval in the IntervalIndex as
    an Index.
    """
    return self._data._right

Read how to write a docstring in the pandas Contributing Docstring guide.
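A rough sketch of what GL08 detects: an object whose `__doc__` is missing or empty. Accessing a property on its class returns the property object, whose `__doc__` is taken from the getter, so an undocumented getter leaves it as `None`. The `Bad`/`Good` classes and `has_docstring` are hypothetical, trimmed from the example above.

```python
class Bad:
    @property
    def right(self):
        return self._right


class Good:
    @property
    def right(self):
        """
        Return the right endpoints of each Interval in the IntervalIndex as
        an Index.
        """
        return self._right


def has_docstring(obj) -> bool:
    """Return True if ``obj`` carries a non-empty docstring."""
    doc = getattr(obj, "__doc__", None)
    return bool(doc and doc.strip())


# Bad.right / Good.right are the property objects themselves (nothing is called).
print(has_docstring(Bad.right))   # False
print(has_docstring(Good.right))  # True
```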

SS06: Summary should fit in a single line

Bad

def duplicated(self, subset=None, keep="first"):
    """
    Return boolean Series denoting duplicate rows, optionally only
    considering certain columns.
    """
    ...

Good

def duplicated(self, subset=None, keep="first"):
    """
    Return boolean Series denoting duplicate rows.

    Considering certain columns is optional.
    """
    ...

Rewriting the docstring might be required.
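To see why the first version fails, treat the summary as the first paragraph of the dedented docstring (everything up to the first blank line). SS06 then amounts to that paragraph having more than one line. This is a simplified sketch using `inspect.cleandoc`; `summary_lines` is a hypothetical helper.

```python
import inspect


def summary_lines(doc: str) -> list:
    """Return the lines of the first paragraph (the summary) of a docstring."""
    lines = inspect.cleandoc(doc).splitlines()
    summary = []
    for line in lines:
        if not line.strip():  # the first blank line ends the summary
            break
        summary.append(line)
    return summary


bad = """
    Return boolean Series denoting duplicate rows, optionally only
    considering certain columns.
    """

good = """
    Return boolean Series denoting duplicate rows.

    Considering certain columns is optional.
    """

print(len(summary_lines(bad)))   # 2, so SS06 would be reported
print(len(summary_lines(good)))  # 1
```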

ES01: No extended summary found

Bad

def unstack():
    """
    Pivot a row index to columns.
    """
    pass

Good

def unstack():
    """
    Pivot a row index to columns.

    When using a MultiIndex, a level can be pivoted so each value in
    the index becomes a column. This is especially useful when a subindex
    is repeated for the main index, and data is easier to visualize as a
    pivot table.

    The index level will be automatically removed from the index when added
    as columns.
    """
    pass

Read more about the extended summary in Contributing Docstring - Section 2: Extended Summary.
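A simplified sketch of the ES01 idea: after skipping the summary paragraph, the first non-blank line must be prose rather than a section title (a title such as "Parameters" is recognized by the line of dashes under it). `has_extended_summary` is a hypothetical helper, not the validator's code.

```python
import inspect


def has_extended_summary(doc: str) -> bool:
    """Return True if prose follows the summary before the first section."""
    lines = inspect.cleandoc(doc).splitlines()
    i = 0
    while i < len(lines) and lines[i].strip():  # skip the summary paragraph
        i += 1
    for j in range(i, len(lines)):
        if not lines[j].strip():
            continue
        # A section title is underlined with a line made only of dashes.
        underlined = j + 1 < len(lines) and set(lines[j + 1].strip()) == {"-"}
        return not underlined
    return False


bad = """
    Pivot a row index to columns.
    """

good = """
    Pivot a row index to columns.

    When using a MultiIndex, a level can be pivoted so each value in
    the index becomes a column.
    """

print(has_extended_summary(bad))   # False
print(has_extended_summary(good))  # True
```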

PR01: Parameters {missing_params} not documented

Bad

class Series:
    def plot(self, kind, **kwargs):
        """
        Generate a plot.

        Render the data in the Series as a matplotlib plot of the
        specified kind.

        Note the blank line between the parameters title and the first
        parameter. Also, note that after the name of the parameter `kind`
        and before the colon, a space is missing.

        Also, note that the parameter descriptions do not start with a
        capital letter, and do not finish with a dot.

        Finally, the `**kwargs` parameter is missing.

        Parameters
        ----------

        kind: str
            kind of matplotlib plot
        """
        pass

Good

Document **kwargs in the Parameters section (this version also fixes the formatting problems listed in the bad docstring):

class Series:
    def plot(self, kind, color='blue', **kwargs):
        """
        Generate a plot.

        Render the data in the Series as a matplotlib plot of the
        specified kind.

        Parameters
        ----------
        kind : str
            Kind of matplotlib plot.
        color : str, default 'blue'
            Color name or rgb code.
        **kwargs
            These parameters will be passed to the matplotlib plotting
            function.
        """
        pass
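A minimal sketch of the PR01 comparison: collect parameter names from the signature with `inspect.signature` (keeping the `*`/`**` prefixes, skipping `self`) and report those not listed in the Parameters section. The parser only reads unindented lines of that section and splits on " : ", which is a simplifying assumption; the `plot` function is a trimmed, hypothetical version of the example above.

```python
import inspect


def signature_names(func) -> list:
    """Parameter names from the signature, keeping * and ** prefixes."""
    names = []
    for p in inspect.signature(func).parameters.values():
        if p.name == "self":
            continue
        if p.kind is inspect.Parameter.VAR_POSITIONAL:
            names.append("*" + p.name)
        elif p.kind is inspect.Parameter.VAR_KEYWORD:
            names.append("**" + p.name)
        else:
            names.append(p.name)
    return names


def documented_names(doc: str) -> list:
    """Names listed in the Parameters section (unindented lines only)."""
    lines = inspect.cleandoc(doc).splitlines()
    if "Parameters" not in lines:
        return []
    names = []
    start = lines.index("Parameters") + 2  # skip the dashed underline
    for i in range(start, len(lines)):
        line = lines[i]
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.strip() and nxt.startswith("---"):
            break  # reached the next section title
        if line and not line[0].isspace():
            names.append(line.partition(" : ")[0].strip())
    return names


def plot(self, kind, **kwargs):
    """
    Generate a plot.

    Parameters
    ----------
    kind : str
        Kind of matplotlib plot.
    """


documented = documented_names(plot.__doc__)
missing = [n for n in signature_names(plot) if n not in documented]
print(missing)  # ['**kwargs']
```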

PR02: Unknown parameters {unknown_params}

Bad

def astype(self, dtype, copy=True, errors="raise", **kwargs):
    """
    Cast a pandas object to a specified dtype ``dtype``.

    Parameters
    ----------
    dtype : data type, or dict of column name -> data type
        Use a numpy.dtype or Python type to cast entire pandas object to
        the same type. Alternatively, use {col: dtype, ...}, where col is a
        column label and dtype is a numpy.dtype or Python type to cast one
        or more of the DataFrame's columns to column-specific types.
    copy : bool, default True
        Return a copy when ``copy=True`` (be very careful setting
        ``copy=False`` as changes to values then may propagate to other
        pandas objects).
    errors : {'raise', 'ignore'}, default 'raise'
        Control raising of exceptions on invalid data for provided dtype.

        - ``raise`` : allow exceptions to be raised
        - ``ignore`` : suppress exceptions. On error return original object.

        .. versionadded:: 0.20.0
    kwargs : keyword arguments to pass on to the constructor

    Returns
    -------
    casted : same type as caller
    """
    ...

kwargs is not recognized as a parameter because the signature declares **kwargs; the documented name must match the signature, stars included.

Good

Change kwargs to **kwargs:

def astype(self, dtype, copy=True, errors="raise", **kwargs):
    """
    Cast a pandas object to a specified dtype ``dtype``.

    Parameters
    ----------
    dtype : data type, or dict of column name -> data type
        Use a numpy.dtype or Python type to cast entire pandas object to
        the same type. Alternatively, use {col: dtype, ...}, where col is a
        column label and dtype is a numpy.dtype or Python type to cast one
        or more of the DataFrame's columns to column-specific types.
    copy : bool, default True
        Return a copy when ``copy=True`` (be very careful setting
        ``copy=False`` as changes to values then may propagate to other
        pandas objects).
    errors : {'raise', 'ignore'}, default 'raise'
        Control raising of exceptions on invalid data for provided dtype.

        - ``raise`` : allow exceptions to be raised
        - ``ignore`` : suppress exceptions. On error return original object.

        .. versionadded:: 0.20.0
    **kwargs
        Keyword arguments to pass on to the constructor.

    Returns
    -------
    casted : same type as caller
    """
    ...
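A minimal sketch of the PR02 comparison, assuming names are compared with their leading stars (as the advice above implies): any documented name that does not appear in the signature is unknown. The helpers and the shortened `astype` docstring are hypothetical illustrations, not pandas' validator.

```python
import inspect


def signature_names(func) -> list:
    """Parameter names from the signature, keeping * and ** prefixes."""
    names = []
    for p in inspect.signature(func).parameters.values():
        if p.name == "self":
            continue
        if p.kind is inspect.Parameter.VAR_POSITIONAL:
            names.append("*" + p.name)
        elif p.kind is inspect.Parameter.VAR_KEYWORD:
            names.append("**" + p.name)
        else:
            names.append(p.name)
    return names


def documented_names(doc: str) -> list:
    """Names listed in the Parameters section (unindented lines only)."""
    lines = inspect.cleandoc(doc).splitlines()
    if "Parameters" not in lines:
        return []
    names = []
    start = lines.index("Parameters") + 2  # skip the dashed underline
    for i in range(start, len(lines)):
        line = lines[i]
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.strip() and nxt.startswith("---"):
            break  # reached the next section title
        if line and not line[0].isspace():
            names.append(line.partition(" : ")[0].strip())
    return names


def astype(self, dtype, copy=True, errors="raise", **kwargs):
    """
    Cast a pandas object to a specified dtype ``dtype``.

    Parameters
    ----------
    dtype : data type
        Type to cast to.
    copy : bool, default True
        Return a copy.
    errors : {'raise', 'ignore'}, default 'raise'
        Control raising of exceptions.
    kwargs : keyword arguments to pass on to the constructor
    """


signature = signature_names(astype)
unknown = [n for n in documented_names(astype.__doc__) if n not in signature]
print(unknown)  # ['kwargs']
```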

PR06: Parameter "{param_name}" type should use "{right_type}" instead of "{wrong_type}"

Bad

The code below triggers the error: Parameter "path" type should use "str" instead of "string".

def read_spss(
    path: Union[str, Path],
    usecols: Optional[Sequence[str]] = None,
    convert_categoricals: bool = True,
) -> DataFrame:
    """
    Load an SPSS file from the file path, returning a DataFrame.

    .. versionadded:: 0.25.0

    Parameters
    ----------
    path : string or Path
        File path.
    usecols : list-like, optional
        Return a subset of the columns. If None, return all columns.
    convert_categoricals : bool, default True
        Convert categorical columns into pd.Categorical.

    Returns
    -------
    DataFrame
    """

Good

def read_spss(
    path: Union[str, Path],
    usecols: Optional[Sequence[str]] = None,
    convert_categoricals: bool = True,
) -> DataFrame:
    """
    Load an SPSS file from the file path, returning a DataFrame.

    .. versionadded:: 0.25.0

    Parameters
    ----------
    path : str or Path
        File path.
    usecols : list-like, optional
        Return a subset of the columns. If None, return all columns.
    convert_categoricals : bool, default True
        Convert categorical columns into pd.Categorical.

    Returns
    -------
    DataFrame
    """

PR07: Parameter "{param_name}" has no description

In the example below, the parameter axis is missing a description:

Bad

def _get_counts_nanvar(
    values_shape: Tuple[int],
    mask: Optional[np.ndarray],
    axis: Optional[int],
    ddof: int,
    dtype=float,
) -> Tuple[Union[int, np.ndarray], Union[int, np.ndarray]]:
    """ Get the count of non-null values along an axis, accounting
    for degrees of freedom.

    Parameters
    ----------
    values_shape : Tuple[int]
        shape tuple from values ndarray, used if mask is None
    mask : Optional[ndarray[bool]]
        locations in values that should be considered missing
    axis : Optional[int]
    ddof : int
        degrees of freedom
    dtype : type, optional
        type to use for count

    Returns
    -------
    count : scalar or array
    d : scalar or array
    """

Good

def _get_counts_nanvar(
    values_shape: Tuple[int],
    mask: Optional[np.ndarray],
    axis: Optional[int],
    ddof: int,
    dtype=float,
) -> Tuple[Union[int, np.ndarray], Union[int, np.ndarray]]:
    """ Get the count of non-null values along an axis, accounting
    for degrees of freedom.

    Parameters
    ----------
    values_shape : Tuple[int]
        shape tuple from values ndarray, used if mask is None
    mask : Optional[ndarray[bool]]
        locations in values that should be considered missing
    axis : Optional[int]
        axis to count along
    ddof : int
        degrees of freedom
    dtype : type, optional
        type to use for count

    Returns
    -------
    count : scalar or array
    d : scalar or array
    """

PR08: Parameter "{param_name}" description should start with a "capital letter"

Bad

The description of the parameter axis does not start with a capital letter:

def take_nd(
    arr, indexer, axis: int = 0, out=None, fill_value=np.nan, allow_fill: bool = True
):
    """
    Specialized Cython take which sets NaN values in one pass

    This dispatches to ``take`` defined on ExtensionArrays. It does not
    currently dispatch to ``SparseArray.take`` for sparse ``arr``.

    Parameters
    ----------
    arr : array-like
        Input array.
    indexer : ndarray
        1-D array of indices to take, subarrays corresponding to -1 value
        indices are filled with fill_value
    axis : int, default 0
        axis to take from
    out : ndarray or None, default None
        Optional output array, must be appropriate type to hold input and
        fill_value together, if indexer has any -1 value entries; call
        maybe_promote to determine this type for any fill_value
    fill_value : any, default np.nan
        Fill value to replace -1 values with
    allow_fill : bool, default True
        If False, indexer is assumed to contain no -1 values so no filling
        will be done.  This short-circuits computation of a mask.  Result is
        undefined if allow_fill == False and -1 is present in indexer.

    Returns
    -------
    subarray : array-like
        May be the same type as the input, or cast to an ndarray.
    """

Good

def take_nd(
    arr, indexer, axis: int = 0, out=None, fill_value=np.nan, allow_fill: bool = True
):
    """
    Specialized Cython take which sets NaN values in one pass

    This dispatches to ``take`` defined on ExtensionArrays. It does not
    currently dispatch to ``SparseArray.take`` for sparse ``arr``.

    Parameters
    ----------
    arr : array-like
        Input array.
    indexer : ndarray
        1-D array of indices to take, subarrays corresponding to -1 value
        indices are filled with fill_value
    axis : int, default 0
        Axis to take from
    out : ndarray or None, default None
        Optional output array, must be appropriate type to hold input and
        fill_value together, if indexer has any -1 value entries; call
        maybe_promote to determine this type for any fill_value
    fill_value : any, default np.nan
        Fill value to replace -1 values with
    allow_fill : bool, default True
        If False, indexer is assumed to contain no -1 values so no filling
        will be done.  This short-circuits computation of a mask.  Result is
        undefined if allow_fill == False and -1 is present in indexer.

    Returns
    -------
    subarray : array-like
        May be the same type as the input, or cast to an ndarray.
    """

PR09: Parameter description should finish with "."

Bad

The description of the parameter axis does not finish with ".":

def cumsum(self, axis=0, *args, **kwargs):
    """
    Cumulative sum of non-NA/null values.

    When performing the cumulative summation, any NA/null values will
    be skipped. The resulting SparseArray will preserve the locations of
    NaN values, but the fill value will be `np.nan` regardless.

    Parameters
    ----------
    axis : int or None
        Axis over which to perform the cumulative summation. If None,
        perform cumulative summation over flattened array

    Returns
    -------
    cumsum : SparseArray
    """

Good

def cumsum(self, axis=0, *args, **kwargs):
    """
    Cumulative sum of non-NA/null values.

    When performing the cumulative summation, any NA/null values will
    be skipped. The resulting SparseArray will preserve the locations of
    NaN values, but the fill value will be `np.nan` regardless.

    Parameters
    ----------
    axis : int or None
        Axis over which to perform the cumulative summation. If None,
        perform cumulative summation over flattened array.

    Returns
    -------
    cumsum : SparseArray
    """

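Possible false positives

A parameter description that ends with a deprecation directive (rather than a sentence ending in ".") may be reported by PR09 even though the docstring is correct:

    truediv : bool, optional
        Whether to use true division, like in Python >= 3.

        .. deprecated:: 1.0.0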

RT02: The first line of the Returns section should contain only the type, unless multiple values are being returned

Bad

In the example below, the type and its description share the first line of the Returns section:

def is_overlapping(self):
    """
    Return True if the IntervalIndex has overlapping intervals, else False.

    Two intervals overlap if they share a common point, including closed
    endpoints. Intervals that only have an open endpoint in common do not
    overlap.

    .. versionadded:: 0.24.0

    Returns
    -------
    bool : Boolean indicating if the IntervalIndex has overlapping intervals.
    """

Good

def is_overlapping(self):
    """
    Return True if the IntervalIndex has overlapping intervals, else False.

    Two intervals overlap if they share a common point, including closed
    endpoints. Intervals that only have an open endpoint in common do not
    overlap.

    .. versionadded:: 0.24.0

    Returns
    -------
    bool
        Boolean indicating if the IntervalIndex has overlapping intervals.
    """

RT03: Return value has no description

Bad

def is_overlapping(self):
    """
    Return True if the IntervalIndex has overlapping intervals, else False.

    Two intervals overlap if they share a common point, including closed
    endpoints. Intervals that only have an open endpoint in common do not
    overlap.

    .. versionadded:: 0.24.0

    Returns
    -------
    bool
    """

Good

def is_overlapping(self):
    """
    Return True if the IntervalIndex has overlapping intervals, else False.

    Two intervals overlap if they share a common point, including closed
    endpoints. Intervals that only have an open endpoint in common do not
    overlap.

    .. versionadded:: 0.24.0

    Returns
    -------
    bool
        Boolean indicating if the IntervalIndex has overlapping intervals.
    """
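Both Returns checks can be sketched from the non-blank lines of the Returns section: RT02 when the only unindented entry contains " : ", and RT03 when an unindented entry is not followed by an indented description line. These rules are simplifying assumptions and the helpers are hypothetical. Because the text after the colon in the RT02 example is not an indented description, this sketch reports RT03 for it as well.

```python
import inspect


def returns_lines(doc: str) -> list:
    """Return the non-blank lines of the Returns section, indentation kept."""
    lines = inspect.cleandoc(doc).splitlines()
    if "Returns" not in lines:
        return []
    body = []
    for i in range(lines.index("Returns") + 2, len(lines)):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if lines[i].strip() and nxt.startswith("---"):
            break  # reached the next section title
        if lines[i].strip():
            body.append(lines[i])
    return body


def returns_errors(doc: str) -> list:
    body = returns_lines(doc)
    headers = [line for line in body if not line[0].isspace()]
    errors = []
    # RT02: a single return value should give only its type on the first line.
    if len(headers) == 1 and " : " in headers[0]:
        errors.append("RT02")
    # RT03: every unindented entry must be followed by an indented description.
    for idx, line in enumerate(body):
        nxt = body[idx + 1] if idx + 1 < len(body) else ""
        if not line[0].isspace() and not nxt[:1].isspace():
            errors.append("RT03")
            break
    return errors


rt02_bad = """
    Return True if the IntervalIndex has overlapping intervals, else False.

    Returns
    -------
    bool : Boolean indicating if the IntervalIndex has overlapping intervals.
    """

rt03_bad = """
    Return True if the IntervalIndex has overlapping intervals, else False.

    Returns
    -------
    bool
    """

good = """
    Return True if the IntervalIndex has overlapping intervals, else False.

    Returns
    -------
    bool
        Boolean indicating if the IntervalIndex has overlapping intervals.
    """

print(returns_errors(rt02_bad))  # ['RT02', 'RT03']
print(returns_errors(rt03_bad))  # ['RT03']
print(returns_errors(good))      # []
```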

YD01: No Yields section found

Bad

def __iter__(self):
    """
    Return an iterator over the boxed values.
    """
    ...
    for i in range(chunks):
        start_i = i * chunksize
        end_i = min((i + 1) * chunksize, length)
        converted = tslib.ints_to_pydatetime(
            data[start_i:end_i], tz=self.tz, freq=self.freq, box="timestamp"
        )
        for v in converted:
            yield v

Good

def __iter__(self):
    """
    Return an iterator over the boxed values.

    Yields
    ------
    tstamp : Timestamp
    """
    ...
    for i in range(chunks):
        start_i = i * chunksize
        end_i = min((i + 1) * chunksize, length)
        converted = tslib.ints_to_pydatetime(
            data[start_i:end_i], tz=self.tz, freq=self.freq, box="timestamp"
        )
        for v in converted:
            yield v
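A minimal sketch of the YD01 idea: `inspect.isgeneratorfunction` tells whether a function contains `yield`, and the cleaned docstring from `inspect.getdoc` is then searched for a "Yields" section title. The `bad`/`good` generators and `missing_yields_section` are hypothetical.

```python
import inspect


def missing_yields_section(func) -> bool:
    """True if ``func`` is a generator function without a Yields section."""
    doc = inspect.getdoc(func) or ""
    return inspect.isgeneratorfunction(func) and "Yields" not in doc.splitlines()


def bad():
    """
    Return an iterator over the boxed values.
    """
    yield from range(3)


def good():
    """
    Return an iterator over the boxed values.

    Yields
    ------
    int
        The next value.
    """
    yield from range(3)


print(missing_yields_section(bad))   # True
print(missing_yields_section(good))  # False
```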

SA04: Missing description for See Also reference

Bad

def mean(self, skipna=True):
    """
    Return the mean value of the Array.

    .. versionadded:: 0.25.0

    Parameters
    ----------
    skipna : bool, default True
        Whether to ignore any NaT elements.

    Returns
    -------
    scalar
        Timestamp or Timedelta.

    See Also
    --------
    numpy.ndarray.mean
    Series.mean

    Notes
    -----
    mean is only defined for Datetime and Timedelta dtypes, not for Period.
    """

Good

def mean(self, skipna=True):
    """
    Return the mean value of the Array.

    .. versionadded:: 0.25.0

    Parameters
    ----------
    skipna : bool, default True
        Whether to ignore any NaT elements.

    Returns
    -------
    scalar
        Timestamp or Timedelta.

    See Also
    --------
    numpy.ndarray.mean : Returns the average of array elements along a given axis.
    Series.mean : Return the mean value in a Series.

    Notes
    -----
    mean is only defined for Datetime and Timedelta dtypes, not for Period.
    """

EX02: Examples do not pass tests:\n{doctest_log}
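As a rough illustration of what EX02 reports, the standard-library `doctest` module can run the examples in a single docstring and count failures. The `add` function and its deliberately wrong second example are hypothetical, and the pandas validator's own doctest setup (imports, options) is not reproduced here.

```python
import doctest


def add(a, b):
    """
    Add two numbers.

    Examples
    --------
    >>> add(1, 2)
    3
    >>> add(1, 1)
    3
    """
    return a + b


def count_failed_examples(func) -> int:
    """Run the doctest examples in ``func``'s docstring and count failures."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    for test in finder.find(func, globs={func.__name__: func}):
        # Discard the failure report; only the count is kept here.
        failed += runner.run(test, out=lambda text: None).failed
    return failed


print(count_failed_examples(add))  # 1: add(1, 1) returns 2, not 3
```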

EX03: flake8 error: {error_code} {error_message}{times_happening}
