0.13.1: DataFrame.xs() ValueError: Cannot retrieve view (copy=False) #6894

Closed
bluefir opened this issue Apr 16, 2014 · 16 comments · Fixed by #6919
Labels
API Design Indexing Related to indexing on series/frames, not to indexes themselves

@bluefir

bluefir commented Apr 16, 2014

I have python 2.7.6 and pandas 0.13.1 on Windows 7.

>>>portfolio_data_all.index.names
FrozenList([u'date', u'stock_id'])
>>>portfolio_data_all.xs(cusip_etf, level=field_stock_id, copy=False)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-69-5300763cc7b7> in <module>()
----> 1 portfolio_data_all.xs(cusip_etf, level=field_stock_id, copy=False)

C:\Python27\lib\site-packages\pandas\core\generic.pyc in xs(self, key, axis, level, copy, drop_level)
   1261 
   1262             if not copy and not isinstance(loc, slice):
-> 1263                 raise ValueError('Cannot retrieve view (copy=False)')
   1264 
   1265             # level = 0

ValueError: Cannot retrieve view (copy=False)
@cpcloud
Member

cpcloud commented Apr 16, 2014

NumPy always returns a copy when you index with non-contiguous indices. It's more than likely that you'll have to copy when calling xs, because you're often taking a non-contiguous chunk out of the array. Why can't you pass copy=True?
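
To illustrate the view-versus-copy point above, here is a minimal NumPy sketch on a toy array (the array and variable names are made up for illustration and are not from the original report). A basic slice gives a view that shares memory with the source, while indexing with a list of scattered positions gives an independent copy. Note that a regularly strided slice such as `a[::2]` is still a view; it is the arbitrary-position case that forces a copy.

```python
import numpy as np

a = np.arange(10)

contiguous = a[2:5]        # basic slice: a view onto a's memory
scattered = a[[1, 4, 7]]   # integer-array ("fancy") indexing: a new array

view_shares = np.shares_memory(a, contiguous)
copy_shares = np.shares_memory(a, scattered)

contiguous[0] = -1   # writes through to a[2]
scattered[0] = -2    # only changes the copy; a[1] is untouched

print(view_shares, copy_shares)
print(a)
```

Selecting one value of an inner MultiIndex level usually picks rows that are scattered through the underlying array, which is why a view is generally not available there.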

@bluefir
Author

bluefir commented Apr 16, 2014

I am bending over backwards trying to assign something to a column for rows where the second level in the MultiIndex has a particular value. For the first level, I can do it easily with .loc. How can I do it for the second level?

@jreback
Copy link
Contributor

jreback commented Apr 16, 2014

in reality this option shouldn't exist at all
xs by definition is a copy except in very rare cases and is just confusing
don't pass the copy flag

I think maybe I'll take it out
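
For reading only, a minimal sketch of calling `xs` without the `copy` flag, assuming a small toy frame with the same index level names as in the report (`date`, `stock_id`); the column name `weight` and the data are hypothetical. The result is best treated as a separate object rather than a window onto the original frame.

```python
import pandas as pd

dates = pd.to_datetime(["2014-01-01", "2014-01-02"])
index = pd.MultiIndex.from_product(
    [dates, ["AAA", "ETF"]], names=["date", "stock_id"]
)
df = pd.DataFrame({"weight": [0.1, 0.2, 0.3, 0.4]}, index=index)

# Cross-section on the second level; the selected level is dropped by default.
etf = df.xs("ETF", level="stock_id")

# Keep the selected level in the result's index instead.
etf_keep = df.xs("ETF", level="stock_id", drop_level=False)

print(etf)
print(etf_keep)
```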

@bluefir
Author

bluefir commented Apr 16, 2014

@jreback no "maybe". Take it out, please.

@jreback
Contributor

jreback commented Apr 16, 2014

in master do this: http://pandas.pydata.org/pandas-docs/dev/

in other versions you can swap levels, assign, then swap back
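
As a sketch of assigning to rows selected by the second index level, here are two options that exist in later pandas releases: `pd.IndexSlice` with `.loc`, and a boolean mask built from `get_level_values`. The thread does not say which section of the linked docs was meant, so treat this as an assumption; the toy frame and the `weight` column are hypothetical.

```python
import pandas as pd

dates = pd.to_datetime(["2014-01-01", "2014-01-02"])
index = pd.MultiIndex.from_product(
    [dates, ["AAA", "ETF"]], names=["date", "stock_id"]
)
df = pd.DataFrame({"weight": [0.1, 0.2, 0.3, 0.4]}, index=index)
df = df.sort_index()  # slicing on a MultiIndex expects a sorted index

# Option 1: slice every date, pick one stock_id on the second level.
idx = pd.IndexSlice
df.loc[idx[:, "ETF"], "weight"] = 0.0

# Option 2: boolean mask from the level values; no sorting required.
mask = df.index.get_level_values("stock_id") == "AAA"
df.loc[mask, "weight"] = 1.0

print(df)
```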

@jreback jreback added this to the 0.14.0 milestone Apr 16, 2014
@bluefir
Author

bluefir commented Apr 16, 2014

Got it. How expensive is swaplevel? Does it create a copy of everything?

@jreback
Contributor

jreback commented Apr 16, 2014

swap level is cheap
if the underlying data is already a copy (i.e. not a view) then this overall should be pretty cheap

@bluefir
Author

bluefir commented Apr 16, 2014

Thanks!

@bluefir bluefir closed this as completed Apr 16, 2014
@jreback
Contributor

jreback commented Apr 16, 2014

ok going to leave open to remove the copy option from xs

@jreback jreback reopened this Apr 16, 2014
@bluefir
Author

bluefir commented Apr 16, 2014

Why doesn't this work?

>>>portfolio_data_all.loc[cusip_etf, [field_benchmark_weight, '_temp']]
            weight_benchmark     _temp
date                                  
2014-01-01               NaN  0.074489
2014-01-02               NaN  0.074486
2014-01-03               NaN  0.075367
2014-01-06               NaN  0.075428
2014-01-07               NaN  0.075089

[5 rows x 2 columns]
>>>portfolio_data_all.loc[cusip_etf, field_benchmark_weight] = portfolio_data_all.loc[cusip_etf, '_temp']
>>>portfolio_data_all.loc[cusip_etf, [field_benchmark_weight, '_temp']]
            weight_benchmark     _temp
date                                  
2014-01-01               NaN  0.074489
2014-01-02               NaN  0.074486
2014-01-03               NaN  0.075367
2014-01-06               NaN  0.075428
2014-01-07               NaN  0.075089

[5 rows x 2 columns]

@bluefir
Author

bluefir commented Apr 16, 2014

Never mind, somehow this worked:

>>>portfolio_data_all.loc[cusip_etf, field_benchmark_weight] = portfolio_data_all['_temp']
            weight_benchmark     _temp
date                                  
2014-01-01          0.074489  0.074489
2014-01-02          0.074486  0.074486
2014-01-03          0.075367  0.075367
2014-01-06          0.075428  0.075428
2014-01-07          0.075089  0.075089

[5 rows x 2 columns]

The funny thing is that if I use a row indexer like

rows_au = portfolio_data_all[field_country] == 'AU'

I would have to do

portfolio_data_all.loc[rows_au, field_benchmark_weight] = portfolio_data_all.loc[rows_au, '_temp']

Why the disconnect?
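
For the boolean-mask form, a minimal sketch on a flat-indexed toy frame (all names and data here are hypothetical). When the right-hand side of a `.loc` assignment is a Series, pandas aligns it to the selected rows by index label, so both the restricted and the full column give the same result. This does not reproduce the MultiIndex case from the report, for which no runnable example was posted.

```python
import pandas as pd

def make_frame():
    # Small toy frame; "weight" starts out empty.
    return pd.DataFrame(
        {
            "country": ["AU", "US", "AU"],
            "weight": [float("nan")] * 3,
            "_temp": [0.1, 0.2, 0.3],
        },
        index=["a", "b", "c"],
    )

df1 = make_frame()
rows_au = df1["country"] == "AU"
# Right-hand side restricted to the same rows.
df1.loc[rows_au, "weight"] = df1.loc[rows_au, "_temp"]

df2 = make_frame()
# Full column on the right; only labels selected by the mask are written.
df2.loc[rows_au, "weight"] = df2["_temp"]

df3 = make_frame()
# A plain array skips alignment and is assigned by position.
df3.loc[rows_au, "weight"] = df3.loc[rows_au, "_temp"].to_numpy()

print(df1)
```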

@jreback
Contributor

jreback commented Apr 16, 2014

can u make an example that can be easily copy-pasted

thanks

@bluefir
Author

bluefir commented Apr 17, 2014

Sorry, I don't have an easy example. That being said, I figured out how to do things like that: if I reset_index to move that index level into the columns, it becomes MUCH easier to do what I want, because I always issue data_frame.loc[rows, column], with rows being a boolean series with the same index as data_frame. What makes the interface somewhat non-intuitive is that .loc[index_value, column] works somewhat differently from .loc[rows, column], but that might be by design.

@jreback
Contributor

jreback commented Apr 17, 2014

@bluefir not sure what you mean

when you reset_index you are changing the index

what do you mean by rows and index_value?

@bluefir
Author

bluefir commented Apr 17, 2014

This code snippet seems to work fine:

rows_au = portfolio_data[field_country] == iso_au
bench_weight_au = portfolio_data.loc[rows_au, field_benchmark_weight].groupby(level=field_date).sum()
portfolio_data.reset_index(level=field_stock_id, inplace=True)
rows_etf = portfolio_data[field_stock_id] == CUSIP_ETF_AU
portfolio_data.loc[rows_etf, field_country] = iso_au
portfolio_data.loc[rows_etf, field_benchmark_weight] = bench_weight_au
portfolio_data.set_index(field_stock_id, append=True, inplace=True)
if not portfolio_data.index.is_monotonic:
    portfolio_data.sort_index(inplace=True)

This would not work:

portfolio_data.swaplevel(0, 1)
portfolio_data.loc[CUSIP_ETF_AU, field_benchmark_weight] = bench_weight_au

I tried some other things, but it was really hard to get them to work with the MultiIndex. First and foremost, for most things I need to sort it first. Anyway, it's not a big deal. Just me learning pandas :-)

@jreback
Contributor

jreback commented Apr 17, 2014

swap level returns a new frame (as do almost all pandas operations)
and yes u almost always have to sort
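
A minimal sketch of the swap, assign, swap-back route with the result of `swaplevel` actually captured, on a hypothetical toy frame (the `weight` column and the scalar value are made up). Calling `swaplevel` and discarding its return value leaves the original frame untouched, which is why the two-line attempt above had no effect.

```python
import pandas as pd

dates = pd.to_datetime(["2014-01-01", "2014-01-02"])
index = pd.MultiIndex.from_product(
    [dates, ["AAA", "ETF"]], names=["date", "stock_id"]
)
df = pd.DataFrame({"weight": [0.1, 0.2, 0.3, 0.4]}, index=index)

# Return value discarded: df keeps its original level order.
df.swaplevel(0, 1)
names_after_discard = list(df.index.names)

# Capture the new frame and sort it so stock_id is the leading, sorted level.
swapped = df.swaplevel(0, 1).sort_index()
swapped.loc["ETF", "weight"] = 0.0

# Swap back and restore the (date, stock_id) ordering.
result = swapped.swaplevel(0, 1).sort_index()

print(names_after_discard)
print(result)
```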
