-
Notifications
You must be signed in to change notification settings - Fork 655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX-#3736: Don't use a 2-d array to set a categorical column. #3777
FIX-#3736: Don't use a 2-d array to set a categorical column. #3777
Conversation
…lumn. Signed-off-by: mvashishtha <mahesh@ponder.io>
Codecov Report
@@ Coverage Diff @@
## master #3777 +/- ##
==========================================
+ Coverage 80.52% 86.49% +5.96%
==========================================
Files 197 197
Lines 16414 16660 +246
==========================================
+ Hits 13218 14410 +1192
+ Misses 3196 2250 -946
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, especially thankful for the comments!
Is this only an issue with categoricals?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No doubt this is working... I just wonder if we can properly fix it without waiting for pandas, or are we upstream-blocked?
I think so. In contrast to the first snippet here, this seems to work on both Pandas and Modin 0.12: import modin.pandas as pd
from numpy import array
df = pd.DataFrame({"status": ["a", "b", "c"]})
df.loc[:1, "status"] = "a" |
Co-authored-by: Vasily Litvinov <vasilij.n.litvinov@intel.com>
There is a fix PR out for pandas-dev/pandas#44703, but we decided at the meeting on December 3 to not wait until that PR is released. |
Signed-off-by: mvashishtha <mahesh@ponder.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but please also create an issue within Modin to remove the workaround when we update to pandas with the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, agree with @vnlitvinov comment.
def test_feature_names(feature_names): | ||
data = np.random.randn(100, 5) | ||
label = np.array([0, 1] * 50) | ||
def test_feature_names(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mvashishtha, I wonder why this PR contains changes in this file, though it shouldn't, because the changes were preliminary merged into master as part of another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I don't understand. Git blame correctly points to #3770 for these lines.
Signed-off-by: mvashishtha mahesh@ponder.io
What do these changes do?
Work around pandas-dev/pandas#44703 by not converting a categorical scalar to a 2-d array to set a categorical column.
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/developer/architecture.rst
is up-to-date