Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: conversion of Series to Categorical #16557

Merged
merged 3 commits into from
Jun 9, 2017

Conversation

preddy5
Copy link
Contributor

@preddy5 preddy5 commented May 31, 2017

Patch to bug fix #16524

@preddy5
Copy link
Contributor Author

preddy5 commented May 31, 2017

@jreback this fixes the issue, could you review the PR and let me know if am doing it wrong.

@codecov
Copy link

codecov bot commented May 31, 2017

Codecov Report

Merging #16557 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #16557   +/-   ##
=======================================
  Coverage   90.96%   90.96%           
=======================================
  Files         161      161           
  Lines       49263    49263           
=======================================
  Hits        44810    44810           
  Misses       4453     4453
Flag Coverage Δ
#multiple 88.71% <ø> (ø) ⬆️
#single 40.22% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/internals.py 93.43% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 10c17d4...795575a. Read the comment docs.

@jreback jreback changed the title bug fix #16524 BUG: conversion of Series to Categorical May 31, 2017
@jreback
Copy link
Contributor

jreback commented May 31, 2017

pls add tests. in fact you want to write the tests first to make sure that they fail, then your fix makes it pass. also add a whatsnew entry (0.20.2) is fine.

@jreback jreback added Categorical Categorical Data Type Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 31, 2017
@preddy5 preddy5 force-pushed the bug16524 branch 2 times, most recently from ca3404b to 3e1d73f Compare June 1, 2017 15:45
series = Series(['a', 'b', 'c'])

categorical_series = Series(series, dtype='category')
expected = Categorical(['a', 'b', 'c'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can mak this expected = Series(['a', 'b', 'c'], dtype='category')

and then you can use tm.assert_series_equal(categorical_series, expected).

@preddy5 preddy5 force-pushed the bug16524 branch 2 times, most recently from 27a955c to 83dd40a Compare June 2, 2017 03:30
@preddy5
Copy link
Contributor Author

preddy5 commented Jun 2, 2017

@TomAugspurger I have made the changes according to your comments, the build is failing for some reason, can you retrigger the travis build please?

@preddy5
Copy link
Contributor Author

preddy5 commented Jun 3, 2017

could you retrigger circleci build as well please. Thank you.

@preddy5
Copy link
Contributor Author

preddy5 commented Jun 5, 2017

@jreback @TomAugspurger Could you review the PR please, Thank you.

@@ -59,6 +59,7 @@ Conversion

- Bug in :func:`to_numeric` in which empty data inputs were causing a segfault of the interpreter (:issue:`16302`)
- Silence numpy warnings when broadcasting ``DataFrame`` to ``Series`` with comparison ops (:issue:`16378`, :issue:`16306`)
- Bug in ``Series`` with dtype='category' (:issue:`16524`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug in Series construction with dtype='category'

@@ -248,3 +248,11 @@ def test_intercept_astype_object(self):

result = df.values.squeeze()
assert (result[:, 0] == expected.values).all()

def test_series_to_categorical(self):
series = Series(['a', 'b', 'c'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comment for the issue number

def test_series_to_categorical(self):
series = Series(['a', 'b', 'c'])

categorical_series = Series(series, dtype='category')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result =

categorical_series = Series(series, dtype='category')
expected = Series(['a', 'b', 'c'], dtype='category')

tm.assert_series_equal(categorical_series, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

categorical_series -> result

@@ -248,7 +248,8 @@ class Categorical(PandasObject):
__array_priority__ = 1000
_typ = 'categorical'

def __init__(self, values, categories=None, ordered=False, fastpath=False):
def __init__(self, values, categories=None, ordered=False, fastpath=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fix should not be here.

instead in core/internals.py/Block/_astype.py

raise_one_error can be a named argument in the signature and not passed to the constructor

@preddy5 preddy5 force-pushed the bug16524 branch 2 times, most recently from 80ed6da to b37fba3 Compare June 5, 2017 10:48
@preddy5
Copy link
Contributor Author

preddy5 commented Jun 5, 2017

@TomAugspurger @jreback I have made the changes according to your comments.

@preddy5
Copy link
Contributor Author

preddy5 commented Jun 6, 2017

@jreback hey I have made changes according to your previous comments, could you review the changes once. Thank You.

@TomAugspurger TomAugspurger added this to the 0.20.3 milestone Jun 7, 2017
@TomAugspurger
Copy link
Contributor

@preddy5 I rebased and moved the release note to 0.20.3 (releasing in a couple weeks).

I'll let Jeff comment on the changes, but I think we're good.

@@ -44,6 +44,7 @@ Conversion
^^^^^^^^^^

- Bug in pickle compat prior to the v0.20.x series, when ``UTC`` is a timezone in a Series/DataFrame/Index (:issue:`16608`)
- Bug in ``Series`` with ``dtype='category'`` (:issue:`16524`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be a bit more descriptive.

Bug in Series construction when passing type=category

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny change. ping on green.

@jorisvandenbossche jorisvandenbossche merged commit 9fdea65 into pandas-dev:master Jun 9, 2017
@jorisvandenbossche
Copy link
Member

@preddy5 Thanks!

Kiv pushed a commit to Kiv/pandas that referenced this pull request Jun 11, 2017
stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Jul 6, 2017
TomAugspurger pushed a commit that referenced this pull request Jul 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Converting Series to categorical
4 participants