API: chained reshaping ops #11485

Closed
2 tasks done
jreback opened this issue Oct 30, 2015 · 8 comments
Labels
API Design Enhancement Indexing Related to indexing on series/frames, not to indexes themselves Master Tracker High level tracker for similar issues Needs Discussion Requires discussion from core team before further action Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone

Comments

@jreback
Contributor

jreback commented Oct 30, 2015

accept callables

In [4]: df = DataFrame({'A' : [1,2,3,4], 'B' : ['a',np.nan,'b','a']})

In [5]: df
Out[5]: 
   A    B
0  1    a
1  2  NaN
2  3    b
3  4    a

an operation that changes the shape of the DataFrame

In [9]: res = df.dropna()

In [10]: res[res.B=='a']
Out[10]: 
   A  B
0  1  a
3  4  a

can be done like this

In [8]: df.dropna().pipe(lambda x: x[x.B=='a'])
Out[8]: 
   A  B
0  1  a
3  4  a
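
For readers following along, the `.pipe` step above can also take a named function plus keyword arguments, which some find easier to read in a long chain. This is only a sketch: `rows_where_equal` is a hypothetical helper invented for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd


def rows_where_equal(frame, column, value):
    """Hypothetical helper: keep rows whose `column` equals `value`."""
    return frame[frame[column] == value]


df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', np.nan, 'b', 'a']})

# .pipe passes the frame as the first argument and forwards the keywords
res = df.dropna().pipe(rows_where_equal, column='B', value='a')
print(res)
```

The result should match the lambda-based `.pipe` call shown above.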

SQL calls this select, which pandas has, but both select/filter are used for filtering LABELS (and not data).

I suppose making this work:

df.dropna().loc[lambda x: x[x.B=='a']] is maybe a slight enhancement of this

any thoughts?

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Reshaping Concat, Merge/Join, Stack/Unstack, Explode API Design Needs Discussion Requires discussion from core team before further action labels Oct 30, 2015
@jreback
Contributor Author

jreback commented Oct 30, 2015

cc @jorisvandenbossche @TomAugspurger @sinhrks @shoyer

FYI @TomAugspurger I really do like .pipe & chaining!

comes from this example

(tidy
     .dropna()
     .pipe(lambda df: df[df.team == 'Los Angeles Lakers'])
     .pipe(sns.FacetGrid, col='team', hue='team')
     .map(sns.barplot, "variable", "rest")
 )

@jreback
Contributor Author

jreback commented Oct 30, 2015

forgot about .query, which is actually a nice solution for this

In [63]: df.dropna().query('B=="a"')            
Out[63]: 
   A  B
0  1  a
3  4  a
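
As a small illustration of the `.query` route, the comparison value does not have to be hard-coded in the string: `.query` can reference a local Python variable with the `@` prefix. The sketch below assumes a variable named `target` (chosen here for illustration) and checks that the result matches the `.pipe` version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', np.nan, 'b', 'a']})

target = 'a'  # illustrative local variable, referenced as @target below

via_query = df.dropna().query('B == @target')
via_pipe = df.dropna().pipe(lambda x: x[x.B == target])

# Both approaches should select the same rows
print(via_query.equals(via_pipe))
```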

@jreback jreback closed this as completed Oct 30, 2015
@shoyer
Member

shoyer commented Oct 30, 2015

Yes, .query works but I hate coding in strings. I would be supportive of accepting a lambda in query, e.g., df.query(lambda df: df.B == 'a')

@jreback
Contributor Author

jreback commented Oct 30, 2015

yeh, then of course

df[lambda x: x.B == 'a']
df.loc[lambda x: x.B == 'a']

for consistency

ok, then it allows easy chaining based on values, which is nice to do (and then is consistent with how .assign works, e.g. accepts an expression or a lambda)
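
A minimal sketch of what this looks like in practice, assuming a pandas version where `[]` and `.loc` accept callables (to my knowledge 0.18.1 and later). The callable receives the object being indexed and returns a boolean mask, in the same spirit as passing a lambda to `.assign`. The column name `C` is made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', np.nan, 'b', 'a']})

# The callable is evaluated against the frame produced by dropna()
by_getitem = df.dropna()[lambda x: x.B == 'a']
by_loc = df.dropna().loc[lambda x: x.B == 'a']

# .loc also allows a column selection alongside the callable row mask
col_a = df.dropna().loc[lambda x: x.B == 'a', 'A']

# .assign accepts a callable in the same way
with_c = df.dropna().assign(C=lambda x: x.A * 2)

print(by_loc)
```

Because the lambda is applied to the intermediate result, no temporary variable such as `res` is needed in the chain.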

@jreback jreback reopened this Oct 30, 2015
@jreback jreback added this to the Next Major Release milestone Oct 30, 2015
@shoyer
Member

shoyer commented Oct 30, 2015

Yep, putting support for functions in indexing makes sense to me.

@jreback jreback modified the milestones: 0.18.0, Next Major Release Jan 31, 2016
@max-sixty
Contributor

Would be nice if this also worked on Series (unlike query which currently just works on DFs).

And +1 for @shoyer's suggestion re extending query for this: df.query(lambda df: df.B == 'a'), despite the inability to set those values.

@kawochen
Contributor

kawochen commented Feb 3, 2016

@MaximilianR I'm using your example from #12226:
What's wrong with the following?

In [15]: pd.Series(range(10)).mul(5).pipe(lambda x: x**2).pipe(lambda x: x-500).pipe(lambda x: x[x>200])
Out[15]:
6     400
7     725
8    1100
9    1525
dtype: int64
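
For comparison, the same Series chain can be written with a callable passed to `.loc` instead of the final `.pipe`, again assuming a pandas version that supports callable indexers. This is a sketch to show the two spellings side by side, not a claim about which one the maintainers prefer.

```python
import pandas as pd

base = pd.Series(range(10)).mul(5).pipe(lambda x: x**2).pipe(lambda x: x - 500)

via_pipe = base.pipe(lambda x: x[x > 200])
via_loc = base.loc[lambda x: x > 200]

# Both spellings should keep the same four elements
print(via_loc)
```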

@max-sixty
Contributor

@kawochen Ah - that's very nice. Not sure how I missed that. Thanks!


4 participants