v1.9.1-rc0 (#352)
Co-authored-by: rtosholdings-bot <rtosholdings-bot@sig.com>
OrestZborowski-SIG and rtosholdings-bot authored Jun 22, 2023
1 parent 15a13a5 commit 54badde
Showing 9 changed files with 843 additions and 96 deletions.
45 changes: 27 additions & 18 deletions docs/source/tutorial/tutorial_missing_data.rst
@@ -117,42 +117,49 @@ negative ``inf``), so it can be good to check for them::
inf

For Datasets, ``mask_and_isnan()`` and ``mask_or_isnan()`` each return a
FastArray of Booleans with a value for each row: - ``mask_and_isnan()``
returns True for each row in which every value is ``NaN``::
FastArray of Booleans with a value for each row.

``mask_and_isnan()`` returns True for each row in which every value is ``NaN``::

>>> ds.mask_and_isnan()
FastArray([ True, False, False])

- ``mask_or_isnan()`` returns True for each row in which at least one
value is ``NaN``::
``mask_or_isnan()`` returns True for each row in which at least one value is ``NaN``::

>>> ds.mask_or_isnan()
FastArray([ True, False, False])
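
The row-wise logic of the two masks can be illustrated without riptable. The following is a minimal pure-Python sketch, not riptable's implementation: the helper ``row_nan_masks`` and the sample columns are hypothetical, and plain lists stand in for FastArrays.

```python
import math

def row_nan_masks(columns):
    """Return (all_nan, any_nan) Boolean lists with one entry per row.

    `columns` maps a column name to a list of floats; every list is
    assumed to have the same length. Hypothetical helper, not part of
    riptable.
    """
    rows = list(zip(*columns.values()))
    # "and" mask: True only if every value in the row is NaN
    all_nan = [all(math.isnan(v) for v in row) for row in rows]
    # "or" mask: True if at least one value in the row is NaN
    any_nan = [any(math.isnan(v) for v in row) for row in rows]
    return all_nan, any_nan

nan = float("nan")
columns = {"a": [nan, 1.0, 2.0], "b": [nan, nan, 3.0]}  # made-up data
all_nan, any_nan = row_nan_masks(columns)
print(all_nan)  # [True, False, False]
print(any_nan)  # [True, True, False]
```

In this made-up data the second row has one ``NaN`` and one valid value, so it is flagged by the "or" mask but not by the "and" mask.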

Merging with Missing Values
---------------------------

As with Python and NumPy, ``rt.nan != rt.nan``. That means that merge
functions do not treat ``NaN`` keys as equal values.
Missing values are not equivalent::

>>> rt.nan == rt.nan
False

This is true for integer invalid values, string invalid values, filtered values of a
Categorical, etc. That means that merge functions do not treat invalid keys as equal
values.

The following Datasets each have a ``NaN`` in their key column::
For example, these two Datasets each have an invalid floating-point value in the Key
column::

>>> ds1 = rt.Dataset({'Key': [1.0, rt.nan, 2.0,],
... 'Value1': [1.0, 2.0, 3.0]})
>>> ds1 = rt.Dataset({'Key': [1.0, rt.nan, 2.0],
... 'Value1': ['a', 'b', 'c']})
>>> ds2 = rt.Dataset({'Key': [1.0, 2.0, rt.nan],
... 'Value2': [1.0, 2.0, 3.0]})
... 'Value2': [1, 2, 3]})

Now we do a ``merge_lookup()`` on the Key columns::

>>> ds1.merge_lookup(ds2, on='Key', columns_right='Value2')
>>> ds1.merge_lookup(ds2, on='Key')
# Key Value1 Value2
- ---- ------ ------
0 1.00 1.00 1.00
1 nan 2.00 nan
2 2.00 3.00 2.00
0 1.00 a 1
1 nan b Inv
2 2.00 c 2

The ``NaN`` key and its associated value in ``ds2`` were ignored by the
merge function.
The ``NaN`` key and its associated value in ``ds2`` were ignored, and the invalid
integer value was filled in.
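
To make the behavior concrete, here is a minimal pure-Python sketch of a lookup-style merge that refuses to match ``NaN`` keys. It is an illustration under stated assumptions, not how ``merge_lookup()`` is implemented: the function ``lookup_merge`` is hypothetical, ``None`` stands in for the invalid value, and keeping the first match for a repeated key is an arbitrary choice for the sketch.

```python
import math

def is_nan(key):
    """True if key is a floating-point NaN."""
    return isinstance(key, float) and math.isnan(key)

def lookup_merge(left_keys, right_keys, right_values, invalid=None):
    """For each left key, look up the matching right value.

    NaN keys never match, on either side. Hypothetical helper for
    illustration only.
    """
    index = {}
    for key, value in zip(right_keys, right_values):
        if is_nan(key):
            continue  # a NaN key on the right can never be matched
        index.setdefault(key, value)  # keep the first match (sketch choice)
    result = []
    for key in left_keys:
        if is_nan(key):
            result.append(invalid)  # a NaN key on the left matches nothing
        else:
            result.append(index.get(key, invalid))
    return result

nan = float("nan")
value2 = lookup_merge([1.0, nan, 2.0], [1.0, 2.0, nan], [1, 2, 3])
print(value2)  # [1, None, 2]
```

The explicit ``is_nan`` checks matter in plain Python: a ``dict`` compares keys by identity before equality, so the same ``nan`` object used on both sides would otherwise be treated as a match.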

Replacing Missing Values
------------------------
@@ -203,8 +210,6 @@ values within categories::
Propagate forward the last encountered non-``NaN`` value for the
category::

Note that until a reported bug is fixed, explicit column name declarations might not be displayed for grouping operations.

>>> ds.Cat.fill_forward(ds.x)
*gb_key_0 x
--------- -----
@@ -215,6 +220,10 @@ Note that until a reported bug is fixed, explicit column name declarations might
A 9.00
B 16.00


Note that until a reported bug is fixed, explicit column name declarations might not be
displayed for grouping operations.
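
The per-category forward fill described above can be sketched in pure Python. This is a rough illustration, not riptable's implementation: the helper ``fill_forward_by_group`` and the sample data are hypothetical and are not the ``ds`` used in this tutorial.

```python
import math

def fill_forward_by_group(groups, values):
    """Replace each NaN with the last non-NaN value seen in the same group.

    A NaN with no earlier valid value in its group stays NaN.
    Hypothetical helper for illustration only.
    """
    last_seen = {}  # group label -> last non-NaN value encountered
    filled = []
    for group, value in zip(groups, values):
        if math.isnan(value):
            # fall back to the NaN itself when the group has no history yet
            filled.append(last_seen.get(group, value))
        else:
            last_seen[group] = value
            filled.append(value)
    return filled

nan = float("nan")
groups = ["A", "B", "A", "B", "A", "B"]    # made-up category labels
values = [0.0, nan, 4.0, 9.0, nan, nan]    # made-up data
filled = fill_forward_by_group(groups, values)
print(filled)  # [0.0, nan, 4.0, 9.0, 4.0, 9.0]
```

The first ``B`` value stays ``NaN`` because nothing precedes it in its group; a backward fill would apply the same logic while walking the rows in reverse.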

Propagate backward the next encountered non-``NaN`` value for the category::

>>> ds.Cat.fill_backward(ds.x)