I ran into the issue documented in Issue #230 and thought that if I marked the column as `dtype=category` in pandas, the dataprep plot would use that as a hint that the column was nominal rather than continuous. However, when I do this, it reliably causes a Jupyter kernel abort, with no useful messages printed to the shell where `jupyter notebook` was started.
Repro:
1. Create a new environment with `conda create -n test dataprep notebook=6.1.5` (note: pinning the notebook version is required to get around the issue I mentioned in Feature Proposal: reduce dependencies #396).
2. Start the notebook and run the code below.
3. After several seconds, a message displays: 'The kernel appears to have died. It will restart automatically.'
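```python
import pandas as pd
import dataprep.eda as dp

# One numeric column and one integer column meant to be treated as nominal
df = pd.DataFrame({'Continuous': [1, 2, 3, 4, 5], 'Categorical': [0, 1, 1, 0, 0]})
# Mark the column as categorical to hint that it is not continuous
df['Categorical'] = df['Categorical'].astype('category')
df.info()
# The kernel dies during this call
dp.plot(df, 'Categorical')
```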
@dhuntley1023 Thanks for reporting this! It sounds like a crash in the C code of either pandas or dask. @brandonlockhart I think you are more familiar with the categorical support than I am. Do you have any thoughts?
I cannot reproduce the error; can you, @dovahcrow? However, I use Jupyter installed with pip, not conda. @dhuntley1023 could you please try `.astype('object')` rather than `.astype('category')` to convert the column to nominal values and see if that works?
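A minimal sketch of the suggested workaround, using the same toy frame as the repro. It only shows what the cast does on the pandas side (the dtype becomes `object` and the values are unchanged); whether `dp.plot` then treats the column as nominal without crashing is exactly what the comment above asks to check.

```python
import pandas as pd

# Same toy frame as in the repro
df = pd.DataFrame({'Continuous': [1, 2, 3, 4, 5], 'Categorical': [0, 1, 1, 0, 0]})

# Workaround under test: cast to object instead of category
df['Categorical'] = df['Categorical'].astype('object')

# Only the dtype changes; the stored values are the same integers
print(df.dtypes)
print(df['Categorical'].tolist())
```

After this cast, the plotting call is the same as before: `dp.plot(df, 'Categorical')`.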
Would it be helpful to get a dump of the package versions in my environment?
Yes, that would be helpful, thanks.
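One way to produce that dump from inside the notebook is sketched below. It uses the standard library's `importlib.metadata` (Python 3.8+); the package list and the `collect_versions` helper name are assumptions chosen for this crash, not anything dataprep provides. For the full environment, `conda list` from the shell, or pandas' own `pd.show_versions()`, gives a broader picture.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages most likely involved in this crash (assumed list; extend as needed)
PACKAGES = ["dataprep", "pandas", "dask", "numpy", "notebook"]


def collect_versions(packages):
    """Hypothetical helper: map each distribution name to its installed
    version, or to 'not installed' if it cannot be found."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    for name, ver in collect_versions(PACKAGES).items():
        print(f"{name}: {ver}")
```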
I'm unsure why category is causing a problem. Under the hood, dataprep could convert category columns to object, as we do for the string type (#377), but I think some more investigation would be useful to see whether we can find the root cause.
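A minimal sketch of the kind of preprocessing described above, assuming it would run on the input frame before any plotting work. The function name `categories_to_object` is hypothetical and this is not dataprep's actual code; it simply detects categorical columns via `pd.CategoricalDtype` and casts them to `object`, leaving other columns untouched. It has not been verified to avoid the crash.

```python
import pandas as pd


def categories_to_object(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical preprocessing step: return a copy of df in which every
    categorical column is cast to object dtype; other columns are unchanged."""
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.CategoricalDtype):
            out[col] = out[col].astype("object")
    return out


# Usage on the frame from the repro
df = pd.DataFrame({'Continuous': [1, 2, 3, 4, 5], 'Categorical': [0, 1, 1, 0, 0]})
df['Categorical'] = df['Categorical'].astype('category')
converted = categories_to_object(df)
print(converted.dtypes)
```

Working on a copy keeps the caller's frame (and its `category` dtype) intact, so the dtype can still serve as the "nominal" hint the issue asks for.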