Dependency clean up #137

Merged
merged 9 commits into from
Oct 7, 2022

@madsbk (Member) commented Oct 4, 2022

No description provided.

@madsbk added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Oct 4, 2022
@madsbk requested a review from @jakirkham October 4, 2022 08:13
@madsbk changed the title from "Dependency Clean up" to "Dependency clean up" Oct 4, 2022
@madsbk marked this pull request as ready for review October 4, 2022 10:46
@jakirkham (Member) left a comment

Thanks Mads! 🙏

Had a couple questions below

python/tests/test_nvcomp.py (resolved)
python/tests/test_examples.py (outdated, resolved)
python/setup.cfg (resolved)
conda/recipes/kvikio/meta.yaml (resolved)
@jakirkham (Member)

@gpucibot merge

@rapids-bot (bot) merged commit 0957368 into rapidsai:branch-22.12 Oct 7, 2022
@jakirkham (Member)

Thanks Mads! 🙏

@madsbk deleted the dependency_cleanup branch January 13, 2023 08:28
vuule pushed a commit to vuule/kvikio that referenced this pull request Nov 8, 2023
…de (rapidsai#137)

This PR resolves an issue in the `from_pandas` API where `nan` values present in a pandas Series were converted to `<NA>` values in `cudf`, resulting in an incorrect column representation:
```python
In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: import cudf

In [4]: s = pd.Series([np.nan, "a", "b"])

In [5]: s
Out[5]: 
0    NaN
1      a
2      b
dtype: object

In [6]: cudf.set_option("mode.pandas_compatible", True)

In [7]: gs = cudf.from_pandas(s)

In [8]: gs
Out[8]: 
0    <NA>
1       a
2       b
dtype: object

In [9]: gs = cudf.from_pandas(s, nan_as_null=False)
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
Cell In[9], line 1
----> 1 gs = cudf.from_pandas(s, nan_as_null=False)

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/nvtx/nvtx.py:115, in annotate.__call__.<locals>.inner(*args, **kwargs)
    112 @wraps(func)
    113 def inner(*args, **kwargs):
    114     libnvtx_push_range(self.attributes, self.domain.handle)
--> 115     result = func(*args, **kwargs)
    116     libnvtx_pop_range(self.domain.handle)
    117     return result

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/dataframe.py:7874, in from_pandas(obj, nan_as_null)
   7872     return DataFrame.from_pandas(obj, nan_as_null=nan_as_null)
   7873 elif isinstance(obj, pd.Series):
-> 7874     return Series.from_pandas(obj, nan_as_null=nan_as_null)
   7875 elif isinstance(obj, pd.MultiIndex):
   7876     return MultiIndex.from_pandas(obj, nan_as_null=nan_as_null)

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/nvtx/nvtx.py:115, in annotate.__call__.<locals>.inner(*args, **kwargs)
    112 @wraps(func)
    113 def inner(*args, **kwargs):
    114     libnvtx_push_range(self.attributes, self.domain.handle)
--> 115     result = func(*args, **kwargs)
    116     libnvtx_pop_range(self.domain.handle)
    117     return result

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/series.py:731, in Series.from_pandas(cls, s, nan_as_null)
    729 with warnings.catch_warnings():
    730     warnings.simplefilter("ignore")
--> 731     result = cls(s, nan_as_null=nan_as_null)
    732 return result

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/nvtx/nvtx.py:115, in annotate.__call__.<locals>.inner(*args, **kwargs)
    112 @wraps(func)
    113 def inner(*args, **kwargs):
    114     libnvtx_push_range(self.attributes, self.domain.handle)
--> 115     result = func(*args, **kwargs)
    116     libnvtx_pop_range(self.domain.handle)
    117     return result

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/series.py:651, in Series.__init__(self, data, index, dtype, name, copy, nan_as_null)
    633 if not isinstance(data, ColumnBase):
    634     # Using `getattr_static` to check if
    635     # `data` is on device memory and perform
   (...)
    641     # be expensive or mark a buffer as
    642     # unspillable.
    643     has_cai = (
    644         type(
    645             inspect.getattr_static(
   (...)
    649         is property
    650     )
--> 651     data = column.as_column(
    652         data,
    653         nan_as_null=nan_as_null,
    654         dtype=dtype,
    655         length=len(index) if index is not None else None,
    656     )
    657     if copy and has_cai:
    658         data = data.copy(deep=True)

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/column/column.py:2073, in as_column(arbitrary, nan_as_null, dtype, length)
   2069 if cudf.get_option(
   2070     "mode.pandas_compatible"
   2071 ) and _is_pandas_nullable_extension_dtype(arbitrary.dtype):
   2072     raise NotImplementedError("not supported")
-> 2073 pyarrow_array = pa.array(arbitrary, from_pandas=nan_as_null)
   2074 if arbitrary.dtype == cudf.dtype("object") and cudf.dtype(
   2075     pyarrow_array.type.to_pandas_dtype()
   2076 ) != cudf.dtype(arbitrary.dtype):
   2077     raise MixedTypeError("Cannot create column with mixed types")

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pyarrow/array.pxi:323, in pyarrow.lib.array()

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pyarrow/array.pxi:83, in pyarrow.lib._ndarray_to_array()

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pyarrow/error.pxi:100, in pyarrow.lib.check_status()

ArrowInvalid: Could not convert 'a' with type str: tried to convert to double

In [10]: gs = cudf.from_pandas(s, nan_as_null=True)

In [11]: gs.to_pandas()
Out[11]: 
0    None     # In `cudf.pandas` this is problematic: we started with `np.nan` but ended up with `None`.
1       a
2       b
dtype: object
```

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - Matthew Roeschke (https://github.com/mroeschke)

URL: rapidsai/cudf-private#137
Labels: improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change)

2 participants