[FEA] cudf.Series equivalent of pd.Series.to_dict or pd.Series.iteritems or pd.Series.items #5450

Closed
paul-tqh-nguyen opened this issue Jun 11, 2020 · 2 comments · Fixed by #5340
Labels: feature request, Python

Comments

@paul-tqh-nguyen

Is your feature request related to a problem? Please describe.

It's useful to convert a series into a dictionary when interoperating with other Python libraries.

We can do this with a pandas series in a few ways.

pnguyen@machine:/tmp$ python3 -c "

import pandas as pd

df = pd.DataFrame()
df['keys'] = [111, 222, 333]
df['vals'] = ['a', 'b', 'c']
df = df.set_index('keys')

dict1 = df.vals.to_dict()
dict2 = {k:v for k,v in df.vals.iteritems()}
dict3 = {k:v for k,v in df.vals.items()}

print()
print(f'dict1 {repr(dict1)}')
print(f'dict2 {repr(dict2)}')
print(f'dict3 {repr(dict3)}')
print()

"

dict1 {111: 'a', 222: 'b', 333: 'c'}
dict2 {111: 'a', 222: 'b', 333: 'c'}
dict3 {111: 'a', 222: 'b', 333: 'c'}

pnguyen@machine:/tmp$ 

The above approaches don't seem to be supported in cudf.


pnguyen@machine:/tmp$ python3 -c "

import cudf

df = cudf.DataFrame()
df['keys'] = [111, 222, 333]
df['vals'] = ['a', 'b', 'c']
df = df.set_index('keys')

dict = df.vals.to_dict()

"
Traceback (most recent call last):
  File "<string>", line 10, in <module>
AttributeError: 'Series' object has no attribute 'to_dict'
pnguyen@machine:/tmp$ python3 -c "

import cudf

df = cudf.DataFrame()
df['keys'] = [111, 222, 333]
df['vals'] = ['a', 'b', 'c']
df = df.set_index('keys')

dict = {k:v for k,v in df.vals.iteritems()}

"
Traceback (most recent call last):
  File "<string>", line 10, in <module>
AttributeError: 'Series' object has no attribute 'iteritems'
pnguyen@machine:/tmp$ python3 -c "

import cudf

df = cudf.DataFrame()
df['keys'] = [111, 222, 333]
df['vals'] = ['a', 'b', 'c']
df = df.set_index('keys')

dict = {k:v for k,v in df.vals.items()}

"
Traceback (most recent call last):
  File "<string>", line 10, in <module>
AttributeError: 'Series' object has no attribute 'items'
pnguyen@machine:/tmp$ 

This is my current workaround:


pnguyen@machine:/tmp$ python3 -c "

import cudf

df = cudf.DataFrame()
df['keys'] = [111, 222, 333]
df['vals'] = ['a', 'b', 'c']
df = df.set_index('keys')

dict = {i.item():df.vals.loc[i.item()] for i in df.vals.index.values}

print()
print(f'dict {repr(dict)}')
print()

"

dict {111: 'a', 222: 'b', 333: 'c'}

pnguyen@machine:/tmp$ 

Describe the solution you'd like

Ideally, there would be support for cudf.Series.to_dict or cudf.Series.iteritems or cudf.Series.items so that a simple pd -> cudf replacement would work.

Describe alternatives you've considered

I'm currently using the workaround shown above.

Additional context

@paul-tqh-nguyen added the Needs Triage and feature request labels Jun 11, 2020
@kkraus14 added the Python label and removed the Needs Triage label Jun 11, 2020
@kkraus14
Collaborator

@paul-tqh-nguyen we intentionally haven't implemented these methods because there's a HUGE performance penalty to leaving the GPU that we don't want users to fall into implicitly. That being said, we should throw more explicit exceptions here saying that they're not implemented for performance reasons, and that if you need them you should go via PyArrow.
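
For readers wondering what "go via PyArrow" could look like in practice, here is a minimal sketch rather than an official cuDF recipe. It assumes the series and its index both expose `to_arrow()` and that the resulting Arrow arrays expose `to_pylist()`; the helper name `series_to_dict` is hypothetical and not part of cuDF. The point is that the device-to-host copy happens once per column and is spelled out in the code, instead of once per element as in the `.loc` workaround above.

```python
def series_to_dict(series):
    """Build a Python dict from a cuDF-like Series via Arrow.

    Hypothetical helper, not a cuDF API. It assumes `series.to_arrow()`
    and `series.index.to_arrow()` return Arrow arrays that provide
    `to_pylist()`.
    """
    # One explicit device-to-host transfer for the index ...
    keys = series.index.to_arrow().to_pylist()
    # ... and one for the values.
    values = series.to_arrow().to_pylist()
    # From here on everything is plain host-side Python.
    return dict(zip(keys, values))
```

With the `df` from the issue, this would be called as `series_to_dict(df.vals)` and should yield the same mapping as the pandas examples. Another explicit route, if pandas is acceptable as an intermediate, is `df.vals.to_pandas().to_dict()`. Either way the cost of leaving the GPU stays visible at the call site, which is the concern raised in this comment.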

@paul-tqh-nguyen
Author

Yes, that makes complete sense. Thanks for the insight! It'll help me figure out a smarter way to deal with my problem.
