ENH: Explode multiple columns of DataFrame #28465

Closed
wants to merge 13 commits into from


kylestahl

Now .explode() can take a list of column names and will explode multiple columns at the same time, provided that within each row the list-like elements of all the exploded columns have the same length.
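To make the intended semantics concrete, here is a minimal, pandas-free sketch of the row-wise behaviour described above. It is not the code from this PR: `explode_records` is a hypothetical helper that works on plain dicts, and how it handles empty lists (the record is dropped) is an assumption of the sketch, not a statement about pandas.

```python
from typing import Any, Dict, Iterable, List


def explode_records(
    records: Iterable[Dict[str, Any]], columns: List[str]
) -> List[Dict[str, Any]]:
    """Explode several list-valued fields of each record in lockstep.

    Hypothetical helper for illustration only; not part of pandas.
    """
    out = []
    for record in records:
        # Every exploded column must hold a list of the same length in this record.
        lengths = {len(record[col]) for col in columns}
        if len(lengths) != 1:
            raise ValueError(
                f"columns {columns} have mismatched lengths in record {record!r}"
            )
        # Pair up the i-th element of each column; other fields are repeated.
        # Note: a record whose lists are all empty produces no output rows here.
        for values in zip(*(record[col] for col in columns)):
            new_row = dict(record)
            new_row.update(zip(columns, values))
            out.append(new_row)
    return out


rows = [
    {"id": 1, "A": [1, 2], "B": ["x", "y"]},
    {"id": 2, "A": [3], "B": ["z"]},
]
exploded = explode_records(rows, ["A", "B"])
```

Each input record yields one output row per list position, so the first record becomes two rows and the second becomes one. A record whose lists differ in length raises `ValueError` rather than being silently truncated.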

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Kyle Stahl added 11 commits September 14, 2019 13:28
explode multiple columns at same time
Now if you pass a list of column names to .explode(), all the columns will be exploded, so long as the lengths of the lists are consistent across all the columns for each record.
ENH: DataFrame.explode() allow for multiple columns
Now explode() can also take in a list of columns and explode them all, given that for every record in the dataframe the elements of the exploding columns all have the same length
ENH: DataFrame.explode() multiple columns
@pep8speaks

pep8speaks commented Sep 16, 2019

Hello @stahl085! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-09-16 16:48:53 UTC

Contributor

@jreback jreback left a comment


always add tests first

@WillAyd
Member

WillAyd commented Sep 16, 2019

Just a heads up - this was part of the original implementation in #27267 so you can check there for inspiration on tests and implementation. The big blocker there though was how to handle duplicate values, i.e. whether we should generate a cartesian product or not. Do you know how other similar tools would handle that?

@kylestahl
Author

Thanks for the info, I like the #27267 implementation much better!
By duplicate, do you mean the user passes in the same column name twice? e.g. ['A', 'A']

I haven't seen this implemented elsewhere, but to me it seems unnatural for this to return a cartesian product. Would it make sense to expose that as an optional argument, cartesian=False?
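For readers following the thread, the difference between the two behaviours under discussion can be sketched on a single record. This is only an illustration: `explode_pair` is a hypothetical helper, and its `cartesian` flag mirrors the keyword proposed in this comment, not an existing pandas parameter.

```python
from itertools import product


def explode_pair(record, columns, cartesian=False):
    """Explode list-valued fields of one record, element-wise or as a product.

    Hypothetical helper for illustration; `cartesian` is the proposed flag.
    """
    value_lists = [record[col] for col in columns]
    # Element-wise pairing keeps position i of every column together;
    # the cartesian product emits every combination across columns.
    # Note: zip() stops at the shortest list, so real code would need to
    # validate lengths first.
    combos = product(*value_lists) if cartesian else zip(*value_lists)
    return [{**record, **dict(zip(columns, combo))} for combo in combos]


record = {"id": 1, "A": [1, 2], "B": ["x", "y"]}
paired = explode_pair(record, ["A", "B"])
crossed = explode_pair(record, ["A", "B"], cartesian=True)
```

With two elements per column, the element-wise version yields two rows while the cartesian version yields four, which is why the default matters for row counts on wide or long lists.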

Contributor

@jreback jreback left a comment


this would need a number of tests; the implementation as written will also perform very poorly, so it needs updating.

@kylestahl kylestahl closed this Sep 18, 2019