Jupyter kernel abort when plotting a column with pandas type "category" #463

Open
dhuntley1023 opened this issue Dec 30, 2020 · 5 comments
Labels
type: bug Something isn't working

@dhuntley1023

I ran into the issue documented in Issue #230 and thought that if I marked the column as dtype=category in pandas, the dataprep plot would use that as a hint that the column was nominal and not continuous. However, when I do this, it reliably causes a Jupyter kernel abort, with no useful messages printed to the shell where jupyter notebook was started.

Repro:

  • Create a new environment with `conda create -n test dataprep notebook=6.1.5` (note: pinning the notebook version is required to get around the issue I mentioned in "Feature Proposal: reduce dependencies" #396)
  • Start the notebook and run the code below
  • After several seconds, a message displays 'The kernel appears to have died. It will restart automatically.'
import pandas as pd
import dataprep.eda as dp

df=pd.DataFrame({'Continuous': [1,2,3,4,5], 'Categorical': [0,1,1,0,0]})
df['Categorical'] = df['Categorical'].astype('category')
df.info()
dp.plot(df, 'Categorical')
@dovahcrow
Member

@dhuntley1023 Thanks for reporting! This sounds like an issue in C code, in either pandas or dask. @brandonlockhart I think you are more familiar with the categorical support than I am. Do you have any thoughts?
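
Since the kernel dies without printing anything, one way to get more information about a crash in C code might be Python's standard `faulthandler` module, which dumps the Python-level traceback when the process receives a fatal signal. The sketch below is only a suggestion and has not been tried against this bug; the log file name `dataprep_crash_traceback.log` is an arbitrary choice.

```python
import faulthandler
import os
import tempfile

# Write the traceback to a real file rather than to the notebook's stderr,
# so the output survives the kernel restart. The file name is arbitrary.
log_path = os.path.join(tempfile.gettempdir(), "dataprep_crash_traceback.log")

# The file object must stay open for as long as faulthandler is enabled.
crash_log = open(log_path, "w")

# On a fatal signal (for example SIGSEGV or SIGABRT), dump the Python
# traceback of every thread into the log file before the process exits.
faulthandler.enable(file=crash_log, all_threads=True)

print("faulthandler enabled:", faulthandler.is_enabled())
print("traceback will be written to:", log_path)
```

Running this in the first notebook cell, then the repro, and checking the log file after the kernel restarts should show which Python call was active when the process aborted.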

@brandonlockhart
Contributor

I cannot reproduce the error, can you @dovahcrow? However, I use Jupyter installed with pip, not conda. @dhuntley1023 could you please try .astype('object') rather than .astype('category') to convert a column to nominal values and see if that works?

@dhuntley1023
Author

dhuntley1023 commented Dec 30, 2020

When I use .astype('object'), it works normally (i.e. no crashes). Would it be helpful to get a dump of the package versions in my environment?

@brandonlockhart
Contributor

Thanks for following up @dhuntley1023!

> Would it be helpful to get a dump of the package versions in my environment?

Yes, that would be helpful, thanks.
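
For reference, `conda list -n test` prints every package in the environment. A shorter report can also be produced from inside the notebook with the standard library, as in the sketch below. The list of package names is a guess at what is relevant here, and the helper name `package_versions` is made up for illustration.

```python
import platform
from importlib import metadata


def package_versions(names):
    """Return a dict mapping each distribution name to its installed version."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep the entry so the report shows what is missing.
            versions[name] = "not installed"
    return versions


# Packages assumed to be relevant to this issue; adjust as needed.
report = package_versions(["dataprep", "pandas", "dask", "numpy", "notebook"])

print("python", platform.python_version())
for name, version in report.items():
    print(name, version)
```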

I'm unsure why category is causing a problem. Under the hood, dataprep could convert category to object, as we do for the string type (#377), but I think some more investigation would be useful to see if we can figure out the problem.
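
Until then, the conversion can be done on the user side before plotting. The following is a minimal sketch using only pandas; the helper name `categories_to_object` is hypothetical and is not part of dataprep, and this is not necessarily how dataprep would implement the conversion internally.

```python
import pandas as pd


def categories_to_object(df):
    """Return a copy of df with every category column cast to object dtype."""
    out = df.copy()
    # select_dtypes(include="category") picks only the categorical columns.
    for col in out.select_dtypes(include="category").columns:
        out[col] = out[col].astype(object)
    return out


# Same data as in the repro above.
df = pd.DataFrame({"Continuous": [1, 2, 3, 4, 5], "Categorical": [0, 1, 1, 0, 0]})
df["Categorical"] = df["Categorical"].astype("category")

plain = categories_to_object(df)
print(plain.dtypes)

# The converted frame can then be passed to dataprep, for example:
# dp.plot(plain, "Categorical")
```

The original DataFrame is left untouched, so the category dtype is still available for other code that relies on it.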

@dovahcrow
Member

Let's see whether this problem is fixed in a newer release.
