ENH: New short indexer for operating on values #14976

Closed
skycaptain opened this issue Dec 23, 2016 · 6 comments

Comments

@skycaptain
Contributor

skycaptain commented Dec 23, 2016

First of all, if I missed a point, please feel free to comment.

Using arithmetic operations on pd.DataFrames is sometimes a mouthful. Take the following example, where columns a and b should be multiplied by the column c:

import numpy as np
import pandas as pd

np.random.seed(0)

df = pd.DataFrame(np.random.randn(3, 3), columns=list('abc'))

df[['a', 'b']] * df['c']

Apparently this doesn't work as expected. Instead, one has to use either pd.DataFrame.mul(), which hurts legibility, or pd.DataFrame.values, which yields long lines and therefore also hurts legibility:

# using pd.DataFrame.mul()
df[['a', 'b']].mul(df['c'], axis='index')

# This is quite short, but does not work...
df[['a', 'b']] * df[['c']].values

# .. you have to use numpy arrays instead
df[['a', 'b']].values * df[['c']].values

Admittedly, the last call in this example returns a NumPy array, but in my case that's the only thing I'm interested in, since I'm rewrapping my data at a later stage.
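
For reference, a minimal sketch comparing the two workarounds above on the same seeded frame. The variable names `labeled` and `raw` are illustrative only; the point is that both give the same numbers, and only the `.mul()` form keeps the row and column labels.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('abc'))

# Label-aware: multiply each row of a/b by the matching row of c.
labeled = df[['a', 'b']].mul(df['c'], axis='index')

# Label-free: plain NumPy broadcasting of shape (3, 2) against (3, 1).
raw = df[['a', 'b']].values * df[['c']].values

print(type(labeled).__name__, type(raw).__name__)
print(np.allclose(labeled.values, raw))
```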

I'm proposing a new short indexer for operating on values, something like:

df.v[['a', 'b']] * df.v[['c']]

# which returns the same as
df[['a', 'b']].values * df[['c']].values

Or even more sophisticated:

df[['a', 'b']] * df.v[['c']]

# which returns the same as
df[['a', 'b']].mul(df['c'], axis='index')

Btw the same goes for all other arithmetic operators.
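
The first form of the proposed indexer can be approximated in user code. The sketch below is an assumption-laden illustration, not a pandas feature: `v` and `ValuesIndexer` are hypothetical names, registered through pandas' public accessor extension hook, and the accessor simply selects by label and then drops the labels.

```python
import numpy as np
import pandas as pd


# Hypothetical accessor: `v` is NOT a pandas attribute. It is registered here
# only to illustrate what the proposal would look like from the user's side.
@pd.api.extensions.register_dataframe_accessor("v")
class ValuesIndexer:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def __getitem__(self, key):
        # Select by label as usual, then return the underlying NumPy array.
        return self._obj[key].to_numpy()


np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('abc'))

short = df.v[['a', 'b']] * df.v[['c']]
long = df[['a', 'b']].values * df[['c']].values
print(np.array_equal(short, long))
```

Because the accessor returns plain arrays, the result carries no index or column labels, which is the trade-off the proposal accepts.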

@jreback
Contributor

jreback commented Dec 24, 2016

  • this would expose an internal implementation detail (users would have to understand NumPy)
  • make code more obscure / unreadable
  • make the API more complex (we would have another indexer; what is the reason?)

Apparently this doesn't work as expected.

Why do you think this should work this way? The point is to align operations on the index by default.
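
A minimal sketch of the default alignment, using the frame from the original example (the names `by_columns` and `by_rows` are illustrative). With the `*` operator, the Series index is matched against the frame's column labels; here none of them match, so the result is entirely NaN.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('abc'))

# Default: the Series index (0, 1, 2) is aligned with the frame's columns
# ('a', 'b'). No labels match, so every cell is NaN.
by_columns = df[['a', 'b']] * df['c']

# Explicit: align the Series index with the frame's row index instead.
by_rows = df[['a', 'b']].mul(df['c'], axis='index')

print(by_columns.isna().all().all())
print(by_rows.isna().any().any())
```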

@TomAugspurger
Contributor

TomAugspurger commented Dec 27, 2016

This is basically the same as @shoyer's point in #10000 (comment) right?

IIRC the current behavior of DataFrame * Series matches NumPy, which broadcasts along the last axis (the columns)?

I think expecting

df[['a', 'b']] * df['c']

to return

In [20]: df[['a', 'b']].mul(df['c'], axis=0)
Out[20]:
          a         b
0  1.726545  0.391649
1 -2.189975 -1.825123
2 -0.098067  0.015623

is perfectly reasonable. That said, this would be a big API change, with no clear way of deprecation.
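
The NumPy rule referred to above can be sketched with plain arrays. The array names are illustrative: a 1-D operand is matched against the last axis, so a per-row vector needs an explicit trailing axis before it broadcasts.

```python
import numpy as np

frame = np.arange(6.0).reshape(3, 2)   # 3 rows, 2 columns
per_column = np.array([10.0, 100.0])   # length matches the last axis
per_row = np.array([1.0, 2.0, 3.0])    # length matches the first axis

# Trailing axes line up, so this broadcasts across the rows.
scaled_columns = frame * per_column

# Shapes (3, 2) and (3,) do not line up on the last axis.
try:
    frame * per_row
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Adding a trailing axis gives shape (3, 1), which broadcasts across columns.
scaled_rows = frame * per_row[:, np.newaxis]
```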

@shoyer
Member

shoyer commented Dec 27, 2016

In my experience, the best way to write such arithmetic currently is something like (df[['a', 'b']].T * df['c']).T (which is hardly ideal).
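
A short sketch of that double-transpose idiom on the frame from the original example, checked against `.mul(..., axis=0)`. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('abc'))

# Transposing puts the row labels (0, 1, 2) on the columns, where the default
# alignment with the Series index happens; the second .T restores the layout.
via_transpose = (df[['a', 'b']].T * df['c']).T

via_mul = df[['a', 'b']].mul(df['c'], axis=0)

print(np.allclose(via_transpose.values, via_mul.values))
```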

I think this would be reasonable behavior to change for pandas 2.0 but probably not before.

I'm not excited about the proposal here, which feels like a work-around for fundamentally broken broadcasting behavior rather than a fix of the root cause.

@jreback
Contributor

jreback commented Dec 28, 2016

@shoyer if you want to create an issue for pandas 2, that would be great.

Closing this one as no-action in pandas 1.0.

@jreback jreback closed this as completed Dec 28, 2016
@jreback jreback modified the milestones: No action, won't fix Dec 28, 2016
@shoyer
Member

shoyer commented Dec 28, 2016

See wesm/pandas2#30

@skycaptain
Contributor Author

Thanks for the discussion here.

I'm not excited about the proposal here, which feels like a work-around for fundamentally broken broadcasting behavior rather than a fix of the root cause.

My proposal was, afaik, a minor fix for a common problem that people like me have now. But I've learned that even this addition would mean a lot of trouble and confusion for others. So I agree with @shoyer and @jreback that this issue is reasonable, but also too profound.

@TomAugspurger TomAugspurger modified the milestones: won't fix, No action Jul 6, 2018