ENH: (explode) Splitting a column content over multiple rows while duplicating other columns content to these rows #16538
Comments
Can you post a copy-pastable example? Having these inline in the top description (as well as a link) is useful.
Done ;)
Is a good starting point, but I guess that reindexing the other columns is not that easy. And anyway, it would be better to use a dedicated algorithm than to chain general-purpose calls to various functions.
you are asking for sure I think we could have this as a string method. It is generally useful.
Thanks a lot! Yes, this is it. Actually, the most satisfactory method I found is from the SO user InoDB (https://stackoverflow.com/users/1894184/inodb).
(I had come up with the stack before, but I was not good with reindexing and joining.) But anyway, a general-purpose function with adjustable settings would be nice. Thanks!
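For readers following along, the stack-and-join idea mentioned above might look roughly like the sketch below. The frame, its column names (`id`, `name`, `tags`) and the comma separator are made up for illustration; this is not the exact code from the Stack Overflow answer.

```python
import pandas as pd

# Hypothetical data: "tags" holds comma-separated strings.
df = pd.DataFrame({"id": [1, 2], "name": ["foo", "bar"], "tags": ["a,b", "c"]})

# Split into a wide frame (one column per piece), then stack it into a
# long Series indexed by (original row, piece position).
long_tags = (
    df["tags"]
    .str.split(",", expand=True)
    .stack()
    .dropna()                         # drop padding left by shorter rows
    .reset_index(level=1, drop=True)  # keep only the original row label
    .rename("tags")
)

# Joining on the original index repeats the other columns once per piece.
result = df.drop(columns="tags").join(long_tags).reset_index(drop=True)
print(result)
```

The join step is what duplicates the remaining columns, which is the part described above as not that easy to get right by hand.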
…ev#16538) Sometimes a values column is presented with list-like values on one row. Instead we may want to split each individual value onto its own row, keeping the same mapping to the other key columns. While it's possible to chain together existing pandas operations (in fact that's exactly what this implementation is) to do this, the sequence of operations is not obvious. By contrast this is available as a built-in operation in say Spark and is a fairly common use case.
Caught this on pandas-dev-request, and there is a similar feature that I need with respect to indexing.
In [2]: idx = pd.MultiIndex.from_product([['a','b'], [1,2,3]], names=['ab','ott'])
...: s = pd.Series(range(6), index=idx)
...: s
Out[2]:
ab ott
a 1 0
2 1
3 2
b 1 3
2 4
3 5
dtype: int64
In [3]: xy = ['x', 'y']
...: (s
...: .to_frame('s')
...: .assign(one=1)
...: .reset_index()
...: .merge(pd.Series(1, index=pd.Index(xy,name='xy'))
...: .to_frame('one').reset_index(), on='one')
...: .set_index(s.index.names+['xy'])
...: .drop(columns=['one'])
...: )
Out[3]:
s
ab ott xy
a 1 x 0
y 0
2 x 1
y 1
3 x 2
y 2
b 1 x 3
y 3
2 x 4
y 4
3 x 5
y 5
I'd like to "explode" the
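As an aside, the same index expansion can probably be written more compactly with `pd.concat` over a dict, which adds the dict keys as a new index level. This is only a sketch of an alternative to the merge-on-a-constant trick above, not something proposed in the thread, and it returns a Series rather than a one-column frame.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["a", "b"], [1, 2, 3]], names=["ab", "ott"])
s = pd.Series(range(6), index=idx)
xy = ["x", "y"]

expanded = (
    pd.concat({key: s for key in xy}, names=["xy"])  # new outermost level "xy"
    .reorder_levels(["ab", "ott", "xy"])             # move "xy" to the innermost position
    .sort_index()                                    # group each (ab, ott) pair together
)
print(expanded)
```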
[ENH] Add DataFrame method to explode a list-like column (GH #16538)

Sometimes a values column is presented with list-like values on one row. Instead we may want to split each individual value onto its own row, keeping the same mapping to the other key columns. While it's possible to chain together existing pandas operations (in fact that's exactly what this implementation is) to do this, the sequence of operations is not obvious. By contrast this is available as a built-in operation in say Spark and is a fairly common use case.

* move to Series
* handle generic list-like
* lint on asv
* move is_list_like to cython and share impl
* moar docs
* test larger sides to avoid a segfault
* fix ref
* typos
* benchmarks wrong
* add inversion
* add usecase
* cimport is_list_like
* use cimports
* doc-string
* docs & lint
* isort
* clean object check & update doc-strings
* lint
* test for nested
* better test
* try adding frame
* test for nested EA
* lint
* remove multi subset support
* update docs
* doc-string
* add test for MI
* lint and docs
* ordering
* moar lint
* multi-index column support
* 32-bit compat
* moar 32-bit compat
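A minimal sketch of how the resulting method can be used, assuming pandas 0.25 or later (where `DataFrame.explode` and `Series.explode` are available). The frames and column names here are invented for illustration.

```python
import pandas as pd

# Hypothetical frame with a list-like column.
df = pd.DataFrame({"key": ["k1", "k2"], "vals": [[1, 2], [3]]})

# Each list element gets its own row; "key" and the index label are repeated.
exploded = df.explode("vals")
print(exploded)

# For delimited strings, split into lists first, then explode.
df2 = pd.DataFrame({"key": ["k1", "k2"], "tags": ["a,b", "c"]})
exploded2 = df2.assign(tags=df2["tags"].str.split(",")).explode("tags")
print(exploded2)
```

Note that the original index labels are repeated rather than renumbered; calling `reset_index(drop=True)` afterwards gives a fresh range index if that is preferred.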
Hello
I know this is not a bug per se, but I think pandas should have a built-in solution for this problem, as no simple function exists for it.
It is discussed at length here, with various ideas (https://stackoverflow.com/questions/12680754/split-pandas-dataframe-string-entry-to-separate-rows), but they are actually quite tricky: for the most part they do not use built-in features and instead hack around normal behavior.
I believe this could use some strategy drawn from the reindexing/filling functions, but as I am not a pandas specialist, I am not sure which function would perform this task most efficiently.
Let's say we have
Best regards