Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maintain Dict Ordering with Concat #21512

Closed
wants to merge 9 commits into from

Conversation

SaturninoMateus
Copy link

@SaturninoMateus SaturninoMateus commented Jun 16, 2018

although the set is instance of dict, it should not sort if it is also instance of OrderedDict

@pep8speaks
Copy link

pep8speaks commented Jun 16, 2018

Hello @SaturninoMateus! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 21, 2018 at 02:03 Hours UTC

@codecov
Copy link

codecov bot commented Jun 16, 2018

Codecov Report

Merging #21512 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21512      +/-   ##
==========================================
+ Coverage   91.92%   91.92%   +<.01%     
==========================================
  Files         153      153              
  Lines       49563    49566       +3     
==========================================
+ Hits        45559    45562       +3     
  Misses       4004     4004
Flag Coverage Δ
#multiple 90.32% <100%> (ø) ⬆️
#single 41.8% <25%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/reshape/concat.py 97.61% <100%> (+0.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 506935c...a7aa052. Read the comment docs.

@WillAyd
Copy link
Member

WillAyd commented Jun 16, 2018

Is this in reference to a particular issue?

@SaturninoMateus
Copy link
Author

Sorry?

@WillAyd
Copy link
Member

WillAyd commented Jun 16, 2018

What issue is this solving? Typically all pull requests (except for simple doc updates) refer to an issue on the issue tracker

@SaturninoMateus
Copy link
Author

SaturninoMateus commented Jun 16, 2018

Issue #21510

@WillAyd
Copy link
Member

WillAyd commented Jun 16, 2018

OK thanks. For future PRs please add that reference in your original comment (I've updated it for you this time).

As per the checklist, you are also going to need tests and a whatsnew entry for 0.23.2

@WillAyd WillAyd changed the title Fixes #21510 Remove sort if its already an ordered dict Remove sort if its already an ordered dict Jun 16, 2018
@WillAyd WillAyd changed the title Remove sort if its already an ordered dict Maintain Dict Ordering with Concat Jun 16, 2018
@@ -250,7 +250,10 @@ def __init__(self, objs, axis=0, join='outer', join_axes=None,

if isinstance(objs, dict):
if keys is None:
keys = sorted(objs)
if not isinstance(objs, OrderedDict):
keys = sorted(objs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normal dicts are ordered too in python3.6+, that must be checked too. There is a PY36 constant somewhere, that you can use for this check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we already took a stance on this as a project, but the orderedness of dicts in 3.6 is considered an implementation detail of cpython, while it will be a language feature starting from 3.7 . So in principle in 3.6 users should not rely on it, and we should not assume they do.

@topper-123 's comment still holds I think, but for 3.7

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we've already relied on that implementation detail in other changes (see #19830, #19859, #19884) so I'd be +1 on relying on that here as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, given that Python 3.6 is at 3.6.5 now, and its's being formalised in 3.7, I think this is quite safe. They're not going to change implementation now...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very simliar to in structure to pandas.core.common._dict_keys_to_ordered_list. structure like this. see if we can consolidate all of these try/sorting routings.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback you mean use: keys = com._dict_keys_to_ordered_list(objs) ? If so, I'm afraid we'll need to refactor _dict_keys_to_ordered_list because it assumes that PY36 has sorted dict which I think its not always true, and will break test_concat_dict test case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that's the point i am raising, I want a refactor here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, from _dict_keys_to_ordered_list method, i suggest change the conditionif PY36 or isinstance(mapping, OrderedDict): to if PY37 or isinstance(mapping, OrderedDict): then, remove test_constructor_dict_order_insertion and test_constructor_dict_order. What do you think about it?

exp_list = [('First', 0), ('First', 1), ('First', 2),
('Another', 0), ('Another', 1),
('Another', 2), ('Another', 3)]
assert ps_list == exp_list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically with test cases we have a pattern of using result and expected as the variable names. In this case, you should create an expected variable that is exactly the Series you are looking for and then us tm.assert_series_equal(result, expected) to ensure the proper outcome

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is ok if I just change the variables names (expected and result), or I should also create the Series object?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create the Series

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to figure it out how I can create the Series with different sizes without using the concat by itself.

Copy link
Author

@SaturninoMateus SaturninoMateus Jun 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to create like this:

a = pd.Series(range(3),range(3))
b = pd.Series(range(4),range(4))
expected = pd.Series([a,b],index=['First','Another'])

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -83,4 +83,5 @@ Bug Fixes
**Other**

- Bug in :meth:`Series.nlargest` for signed and unsigned integer dtypes when the minimum value is present (:issue:`21426`)
-
- Bug in :class:`_Concatenator` should not sort Ordered Dictionaries (:issue:`21510`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you come up with some less technical? This is too focused on the technical aspect of the change, but you really should be writing something that a casual user could read and understand in the whatsnew

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I would just say that the bug was causing pd.concat to not respect order present in an ordered dictionary object (something along those lines).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, check my last commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting closer but you should remove any reference to _Concatenator as it is a private class. I would be fine with almost a copy/paste of what @gfyoung suggested above as well

@gfyoung gfyoung added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jun 17, 2018
b = pd.Series(range(4), range(4))
a.index = pd.MultiIndex.from_tuples([('First', v) for v in a.index])
b.index = pd.MultiIndex.from_tuples([('Another', v) for v in b.index])
expected = a.append(b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be easier if you just constructed the Series directly. You could doing something like index = pd.MultiIndex.from_records(...) and then expected = pd.Series(..., index=index) to make this as succinct and clear as possible

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AttributeError: type object 'MultiIndex' has no attribute 'from_records'

@@ -83,4 +83,5 @@ Bug Fixes
**Other**

- Bug in :meth:`Series.nlargest` for signed and unsigned integer dtypes when the minimum value is present (:issue:`21426`)
-
- Bug in :class:`_Concatenator` should maintain dict ordering when :meth:`concat` is called (:issue:`2151`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a user facing comment, meaning it should only use the public API.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it

@@ -250,7 +250,10 @@ def __init__(self, objs, axis=0, join='outer', join_axes=None,

if isinstance(objs, dict):
if keys is None:
keys = sorted(objs)
if not isinstance(objs, OrderedDict):
keys = sorted(objs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very simliar to in structure to pandas.core.common._dict_keys_to_ordered_list. structure like this. see if we can consolidate all of these try/sorting routings.

# GH 21510
result = pd.concat(OrderedDict([('First', pd.Series(range(3))),
('Another', pd.Series(range(4)))]))
a = pd.Series(range(3), range(3))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you use the 2 form constructor, then specify index=. I don't think it actually helps here though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should I specify that?

@@ -83,4 +83,5 @@ Bug Fixes
**Other**

- Bug in :meth:`Series.nlargest` for signed and unsigned integer dtypes when the minimum value is present (:issue:`21426`)
-
- Bug in :class:`_Concatenator` should maintain dict ordering when :meth:`concat` is called (:issue:`2151`)
-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to go into 0.24.0 as even though its a bug fix, its a jaring one, should be in reshaping.

@@ -93,4 +93,6 @@ Bug Fixes

**Other**

-
- Bug in :meth:`Series.nlargest` for signed and unsigned integer dtypes when the minimum value is present (:issue:`21426`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you picked up some extra text here. also move to 0.24

# Conflicts:
#	doc/source/whatsnew/v0.23.2.txt
@@ -110,3 +110,4 @@ doc/source/styled.xlsx
doc/source/templates/
env/
doc/source/savefig/
*my-dev-test.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this from?

@@ -250,7 +250,10 @@ def __init__(self, objs, axis=0, join='outer', join_axes=None,

if isinstance(objs, dict):
if keys is None:
keys = sorted(objs)
if not isinstance(objs, OrderedDict):
keys = sorted(objs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that's the point i am raising, I want a refactor here.

@jreback
Copy link
Contributor

jreback commented Oct 11, 2018

closing as stale, if you want to continue working, pls ping and we can re-open. you will need to merge master.

@jreback jreback closed this Oct 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

concat does not respect order of an OrderedDict
7 participants