Maintain Dict Ordering with Concat #21512
Hello @SaturninoMateus! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on June 21, 2018 at 02:03 UTC.
Codecov Report
@@            Coverage Diff             @@
##           master   #21512   +/-   ##
==========================================
+ Coverage   91.92%   91.92%   +<.01%
==========================================
  Files         153      153
  Lines       49563    49566       +3
==========================================
+ Hits        45559    45562       +3
  Misses       4004     4004
Continue to review full report at Codecov.
Is this in reference to a particular issue?
Sorry?
What issue is this solving? Typically all pull requests (except for simple doc updates) refer to an issue on the issue tracker.
Issue #21510
OK, thanks. For future PRs please add that reference in your original comment (I've updated it for you this time). As per the checklist, you are also going to need tests and a whatsnew entry for 0.23.2.
@@ -250,7 +250,10 @@ def __init__(self, objs, axis=0, join='outer', join_axes=None,

         if isinstance(objs, dict):
             if keys is None:
-                keys = sorted(objs)
+                if not isinstance(objs, OrderedDict):
+                    keys = sorted(objs)
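The branch in this hunk can be approximated outside pandas to see which keys it produces. This is a minimal sketch only: select_concat_keys is a hypothetical name, and the hunk grows from 7 to 10 lines without showing what happens to keys for an OrderedDict, so falling through to insertion order (list(objs)) is an assumption.

```python
from collections import OrderedDict


def select_concat_keys(objs, keys=None):
    """Hypothetical stand-in for the key-selection step in the patched __init__."""
    if isinstance(objs, dict) and keys is None:
        if not isinstance(objs, OrderedDict):
            # Plain dict: keep the historical behaviour of sorting the keys.
            keys = sorted(objs)
        else:
            # OrderedDict: assumed to fall through to insertion order.
            keys = list(objs)
    return keys


ordered = OrderedDict([("First", 1), ("Another", 2)])
print(select_concat_keys(ordered))                     # ['First', 'Another']
print(select_concat_keys({"First": 1, "Another": 2}))  # ['Another', 'First']
```

Explicitly passed keys are returned untouched, so only the keys=None path changes behaviour.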
Normal dicts are ordered too in Python 3.6+, so that must be checked as well. There is a PY36 constant somewhere that you can use for this check.
Not sure if we have already taken a stance on this as a project, but the ordering of dicts in 3.6 is considered an implementation detail of CPython, while it becomes a language feature starting with 3.7. So in principle, users on 3.6 should not rely on it, and we should not assume they do.
@topper-123's comment still holds, I think, but for 3.7.
Yeah, given that Python 3.6 is at 3.6.5 now and it's being formalised in 3.7, I think this is quite safe. They're not going to change the implementation now...
This is very similar in structure to pandas.core.common._dict_keys_to_ordered_list. See if we can consolidate all of these try/sorting routines.
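For comparison, the shared helper referred to here follows roughly the pattern below. This is a simplified reconstruction for illustration, not a verbatim copy of pandas.core.common: try_sort and dict_keys_to_ordered_list are hypothetical stand-ins for the private helpers, and PY36 is computed locally from sys.version_info rather than imported from pandas.compat.

```python
import sys
from collections import OrderedDict

# Local stand-in for the PY36 compatibility flag.
PY36 = sys.version_info >= (3, 6)


def try_sort(iterable):
    """Sort if the items are mutually comparable, otherwise keep their order."""
    listed = list(iterable)
    try:
        return sorted(listed)
    except TypeError:
        # e.g. mixed str/int keys cannot be compared on Python 3
        return listed


def dict_keys_to_ordered_list(mapping):
    # Insertion order is trusted for an OrderedDict and, under this flag,
    # for every dict on Python 3.6+; otherwise the keys are sorted if possible.
    if PY36 or isinstance(mapping, OrderedDict):
        return list(mapping.keys())
    return try_sort(mapping)
```

Because the PY36 branch trusts insertion order for every dict, reusing this helper unchanged inside concat would stop sorting plain dicts on 3.6+.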
@jreback, do you mean using keys = com._dict_keys_to_ordered_list(objs)? If so, I'm afraid we'll need to refactor _dict_keys_to_ordered_list, because it assumes that dicts on PY36 are always insertion-ordered, which I don't think is guaranteed, and it will break the test_concat_dict test case.
Yes, that's the point I am raising; I want a refactor here.
So, in the _dict_keys_to_ordered_list function, I suggest changing the condition
if PY36 or isinstance(mapping, OrderedDict):
to
if PY37 or isinstance(mapping, OrderedDict):
and then removing test_constructor_dict_order_insertion and test_constructor_dict_order. What do you think?
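A minimal sketch of the proposed condition, assuming a PY37 flag analogous to PY36 (computed here from sys.version_info, since dict insertion order is only a language guarantee from 3.7). The function name ordered_keys and its dicts_are_ordered keyword are hypothetical additions so that both branches can be exercised on any interpreter.

```python
import sys
from collections import OrderedDict

# Dict insertion order is guaranteed by the language only from Python 3.7.
PY37 = sys.version_info >= (3, 7)


def ordered_keys(mapping, dicts_are_ordered=PY37):
    """Keys in insertion order when that order is guaranteed, else sorted if possible."""
    if dicts_are_ordered or isinstance(mapping, OrderedDict):
        return list(mapping.keys())
    try:
        return sorted(mapping)
    except TypeError:
        # Keys that cannot be compared keep their iteration order.
        return list(mapping)


plain = {"First": 0, "Another": 1}
print(ordered_keys(plain, dicts_are_ordered=False))  # ['Another', 'First']
print(ordered_keys(OrderedDict([("First", 0), ("Another", 1)]),
                   dicts_are_ordered=False))         # ['First', 'Another']
```

With the flag off, only an OrderedDict keeps its order; with it on, every mapping does.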
pandas/tests/reshape/test_concat.py
Outdated
exp_list = [('First', 0), ('First', 1), ('First', 2),
            ('Another', 0), ('Another', 1),
            ('Another', 2), ('Another', 3)]
assert ps_list == exp_list
Typically, with test cases we follow a pattern of using result and expected as the variable names. In this case, you should create an expected variable that is exactly the Series you are looking for, and then use tm.assert_series_equal(result, expected) to ensure the proper outcome.
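Applied to this PR, the result/expected pattern might look like the sketch below. It builds expected directly from a MultiIndex and compares with pandas.testing.assert_series_equal, the public counterpart of tm.assert_series_equal; the test name test_concat_ordered_dict is hypothetical, and on a pandas version where concat still sorts the keys the assertion is expected to fail.

```python
from collections import OrderedDict

import pandas as pd
from pandas.testing import assert_series_equal


def test_concat_ordered_dict():
    # GH 21510: the OrderedDict insertion order should be kept in the result
    result = pd.concat(OrderedDict([("First", pd.Series(range(3))),
                                    ("Another", pd.Series(range(4)))]))
    index = pd.MultiIndex.from_tuples([("First", 0), ("First", 1), ("First", 2),
                                       ("Another", 0), ("Another", 1),
                                       ("Another", 2), ("Another", 3)])
    expected = pd.Series([0, 1, 2, 0, 1, 2, 3], index=index)
    assert_series_equal(result, expected)
```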
Is it OK if I just change the variable names (expected and result), or should I also create the Series object?
Create the Series
I'm trying to figure out how to create the expected Series, with groups of different sizes, without using concat itself.
I tried to create it like this:
a = pd.Series(range(3), range(3))
b = pd.Series(range(4), range(4))
expected = pd.Series([a, b], index=['First', 'Another'])
Done.
doc/source/whatsnew/v0.23.2.txt
Outdated
@@ -83,4 +83,5 @@ Bug Fixes
 **Other**

 - Bug in :meth:`Series.nlargest` for signed and unsigned integer dtypes when the minimum value is present (:issue:`21426`)
 -
+- Bug in :class:`_Concatenator` should not sort Ordered Dictionaries (:issue:`21510`)
Can you come up with something less technical? This is too focused on the technical aspect of the change; you should be writing something that a casual user could read and understand in the whatsnew.
Agreed. I would just say that the bug was causing pd.concat to not respect the order present in an ordered dictionary object (something along those lines).
Please check my last commit.
Getting closer, but you should remove any reference to _Concatenator, as it is a private class. I would be fine with almost a copy/paste of what @gfyoung suggested above as well.
pandas/tests/reshape/test_concat.py
Outdated
b = pd.Series(range(4), range(4))
a.index = pd.MultiIndex.from_tuples([('First', v) for v in a.index])
b.index = pd.MultiIndex.from_tuples([('Another', v) for v in b.index])
expected = a.append(b)
It would be easier if you just constructed the Series directly. You could do something like index = pd.MultiIndex.from_records(...) and then expected = pd.Series(..., index=index) to make this as succinct and clear as possible.
AttributeError: type object 'MultiIndex' has no attribute 'from_records'
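As the error says, MultiIndex has no from_records constructor. The expected index can instead be built with from_tuples (one tuple per row) or from_arrays (one array per level); the short sketch below checks that both give the same index for this test. The variable names are illustrative only.

```python
import pandas as pd

# One tuple per row: (outer key, position within that key's Series)
tuples = [("First", i) for i in range(3)] + [("Another", i) for i in range(4)]
from_tuples = pd.MultiIndex.from_tuples(tuples)

# One array per level: the outer keys, then the inner positions
from_arrays = pd.MultiIndex.from_arrays([["First"] * 3 + ["Another"] * 4,
                                         list(range(3)) + list(range(4))])

# The expected Series can then be built in a single constructor call
expected = pd.Series(list(range(3)) + list(range(4)), index=from_tuples)
```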
doc/source/whatsnew/v0.23.2.txt
Outdated
@@ -83,4 +83,5 @@ Bug Fixes
 **Other**

 - Bug in :meth:`Series.nlargest` for signed and unsigned integer dtypes when the minimum value is present (:issue:`21426`)
 -
+- Bug in :class:`_Concatenator` should maintain dict ordering when :meth:`concat` is called (:issue:`2151`)
This should be a user-facing comment, meaning it should only reference the public API.
Got it
pandas/tests/reshape/test_concat.py
Outdated
# GH 21510
result = pd.concat(OrderedDict([('First', pd.Series(range(3))),
                                ('Another', pd.Series(range(4)))]))
a = pd.Series(range(3), range(3))
If you use the two-argument form of the constructor, then specify index=. I don't think it actually helps here, though.
Why should I specify that?
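For reference, the second positional parameter of the Series constructor is index, so pd.Series(range(3), range(3)) and pd.Series(range(3), index=range(3)) build the same object; naming the keyword only makes the intent explicit to a reader. Since the default index is already 0..n-1, passing it is redundant here, which is presumably why the reviewer doubts it helps. The variable names below are illustrative.

```python
import pandas as pd

positional = pd.Series(range(3), range(3))     # index passed positionally
keyword = pd.Series(range(3), index=range(3))  # same call, intent spelled out
default = pd.Series(range(3))                  # default RangeIndex 0..2
```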
doc/source/whatsnew/v0.23.2.txt
Outdated
@@ -83,4 +83,5 @@ Bug Fixes
**Other**

- Bug in :meth:`Series.nlargest` for signed and unsigned integer dtypes when the minimum value is present (:issue:`21426`)
-
- Bug in :class:`_Concatenator` should maintain dict ordering when :meth:`concat` is called (:issue:`2151`)
-
This needs to go into 0.24.0: even though it's a bug fix, it's a jarring one. It should be in the Reshaping section.
doc/source/whatsnew/v0.23.2.txt
Outdated
@@ -93,4 +93,6 @@ Bug Fixes

**Other**

-
- Bug in :meth:`Series.nlargest` for signed and unsigned integer dtypes when the minimum value is present (:issue:`21426`)
You picked up some extra text here. Also, move this to 0.24.
# Conflicts: # doc/source/whatsnew/v0.23.2.txt
@@ -110,3 +110,4 @@ doc/source/styled.xlsx
 doc/source/templates/
 env/
 doc/source/savefig/
+*my-dev-test.py
What is this from?
Closing as stale. If you want to continue working on this, please ping and we can re-open. You will need to merge master.
git diff upstream/master -u -- "*.py" | flake8 --diff
Although the passed object is an instance of dict, its keys should not be sorted if it is also an instance of OrderedDict.
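The OrderedDict check has to sit inside the dict branch because OrderedDict is a subclass of dict, so isinstance(objs, dict) alone cannot tell the two apart. A quick check with plain Python (the variable names are illustrative):

```python
from collections import OrderedDict

od = OrderedDict([("First", 0), ("Another", 1)])

is_dict = isinstance(od, dict)                  # True: OrderedDict subclasses dict
is_ordered = isinstance(od, OrderedDict)        # True
plain_is_ordered = isinstance({}, OrderedDict)  # False: a plain dict is not one
```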