-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix copy s.t. it always copies index/columns. #4830
Conversation
@hayd copying will work now :) |
if deep: | ||
new_axes = [ax.copy(deep=deep) for ax in self.axes] | ||
else: | ||
new_axes = list(self.axes) | ||
return self.apply('copy', axes=new_axes, deep=deep, do_integrity_check=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just pass ref_items=new_axes[0] if deep else None
(and it will get passed to the block), the you dont' need the mod in apply
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duh - good catch. 😄
this works for everything but SparseSeries and SparsePanel at this point...still trying to figure them out. |
@@ -1988,7 +1988,8 @@ def transform(self, func, *args, **kwargs): | |||
|
|||
# broadcasting | |||
if isinstance(res, Series): | |||
if res.index is obj.index: | |||
# TODO: Possibly could just check for length here...not clear | |||
if res.index.identical(obj.index): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback, @cpcloud or others - can't do this check any more here, because of the change in how copying works (and because more ops end up returning views rather than the same object), I think identical is okay, because it seems relatively fast. I didn't see much of a change on perf tests, but they are relatively unreliable on my computer. What do you think? I wasn't sure whether what mattered was whether it was the same object or if it was just the length...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm is
is much faster than identical
, even for tiny indices:
In [9]: i = Index([1, 2, 3])
In [10]: timeit i.identical(i)
1000000 loops, best of 3: 1.43 µs per loop
In [11]: timeit i is i
10000000 loops, best of 3: 60.2 ns per loop
Is there any way to try the is
check and if that fails then use identical
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cpcloud with these changes, in the entire test suite, the is
check never succeeds. I was thinking of checking is
on the base first, since it's a view. Would that work? Here's what I get timing wise for this:
In [3]: %timeit i is i
%timeit 10000000 loops, best of 3: 95.6 ns per loop
In [4]: %timeit i.base is i.base
1000000 loops, best of 3: 369 ns per loop
In [5]: %timeit i.identical(i)
100000 loops, best of 3: 2.1 µs per loop
Works with MultiIndex too:
In [6]: mi = MultiIndex.from_tuples(zip(range(10), range(10)))
In [7]: mi.base
Out[7]: array([], dtype=object)
In [8]: mi2 = MultiIndex.from_tuples(zip(range(10), range(10))
...: )
In [9]: mi2
Out[9]:
MultiIndex
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9)]
In [10]: mi2.base
Out[10]: array([], dtype=object)
In [11]: mi.base is mi2.base
Out[11]: False
In [12]: %timeit mi is mi
%timeit mi.base i10000000 loops, best of 3: 95.8 ns per loop
In [13]: %timeit mi.base is mi.base
1000000 loops, best of 3: 351 ns per loop
In [14]: %timeit mi.identical(mi)
100000 loops, best of 3: 12.6 µs per loop
In [15]: %timeit mi.base is mi2.base
1000000 loops, best of 3: 355 ns per loop
Length check is not MI friendly:
In [18]: %timeit len(mi) == len(mi2)
100000 loops, best of 3: 4.96 µs per loop
btw - maybe we should, at some point, optimize the length check on multiindex...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope. base doesn't work in every case either :-(
re identical u might try something like if a is b or a.base is b.base: ( maybe do a share memory check too) return a.identizal(b) so the actual identity case is a lot faster |
Problem is that bases gets changed with view. I.e.:
so slight tweak on base. |
@jreback @cpcloud here's what I'm thinking for index, adding an def is_(self, other):
"""returns True if and only if the indices actually are the same underlying array.
Generally survives through views. Doesn't care about metadata."""
if self is other:
return True
s_base, o_base = self.base, other.base
return self is o_base or other is s_base or s_base is o_base (base calls aren't cached, which is why it needs to be setup this way) |
d = self._construct_axes_dict() | ||
if deep: | ||
new_data = dict((k, v.copy()) for k, v in compat.iteritems(self._frames)) | ||
d = dict((k, v.copy()) for k, v in compat.iteritems(d)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these be v.copy(deep=True)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes they should
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change.
blah, this doesn't quite work either :-/ multiple levels of view screw up |
It's going to take some thinking to decide on how best to efficiently handle flexible metadata for |
i'll vbench it and report back :) |
if it's not good enough, we can shave off about 600-800ns per iteration by skipping the function call and using |
Is this the right behavior?
|
@jtratner yep |
@@ -116,7 +116,7 @@ def __init__(self, data, index=None, sparse_index=None, kind='block', | |||
|
|||
if is_sparse_array: | |||
if isinstance(data, SparseSeries) and index is None: | |||
index = data.index | |||
index = data.index.view() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so then, should all of these places actually make views on the index? Less sure about sparse...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case what's the different? whether I assign an index or its view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just whether you want to be creating a view every time. Minimal difference (if at all).
Only copies index/columns with `deep=True` on `BlockManager`. Plus some tests...yay! Change copy to make views of indices. Requires changing groupby to check whether indices are identical, rather than comparing with `is`. Plus add tests for name checks through everything. Also, fixes tests to use `is_()` rather than `is` CLN: Change groupby to use 'is_' instead
ok |
yep |
BUG: Fix copy s.t. it always copies index/columns.
Fixes #4202 (and maybe some others too).
Only copies index/columns with
deep=True
onBlockManager
. Plus sometests...yay!