ENH: (explode) Splitting a column content over multiple rows while duplicating other columns content to these rows #16538

BLMeltdown · 2017-05-30T13:21:16Z

Hello
I know it is not a problem per se but I think there should be some pandas built in solution for this problem, as no simple function exists for this.
It is largely discussed here with various ideas (https://stackoverflow.com/questions/12680754/split-pandas-dataframe-string-entry-to-separate-rows) but they are quite tricky actually, not using built in things but hacking normal behavior for the most part.

I believe that could use some strategy drawn from the reindexing/filling functions, but as I am not a Pandas specialist, I am not sure this would use the most efficient function to perform this task.

Let's say we have

df

var1  var2

0 a,b,c x
1 d,e,f y

df.desired_function("var1")

var1 var2
0 a x
1 b x
2 c x
3 d y
4 e y
5 f y

Best regards

The text was updated successfully, but these errors were encountered:

jreback · 2017-05-30T13:25:14Z

can you post a copy-pastable example. having these in-line in the top description (as well as a link) is useful.

BLMeltdown · 2017-05-30T13:31:05Z

Done;)

BLMeltdown · 2017-05-30T13:47:57Z

d=pd.DataFrame(list(zip(*s2.str.split(","))))
Out[93]: 
     0    1    2
0   a2   b2   c2
1   a3   b3   c3

d.stack()
>

0  0     a2
   1     b2
   2     c2
1  0     a3
   1     b3
   2     c3

Is a good starting point, but i guess that reindexing the other columns is not that easy. And anyway, it would be better to use a dedicated algorithm than to make a general call to various functions.

jreback · 2017-05-30T13:56:47Z

you are asking for explode: see some other references here: #8517 (comment)

sure I think we could have this as a string method. It is generally useful.

BLMeltdown · 2017-05-30T14:01:46Z

Thanks a lot! yes, this is it. actually the most satisfactory method I found is from SO InoDB (https://stackoverflow.com/users/1894184/inodb)

a=pd.DataFrame({"var1":"a,b,c d,e,f".split(),"var2":[1,2]})
s = a.var1.str.split(",").apply(pd.Series, 1).stack()
s.index = s.index.droplevel(-1)
del a['var1']
a.join(s)
var2 var1
0 1 a
0 1 b
0 1 c
1 2 d
1 2 e
1 2 f

(I came up with the stack and before, I was not good with reindexing and joining)

But anyway a general purpose function with adjustable setting will be nice. Thanks!

…ev#16538) Sometimes a values column is presented with list-like values on one row. Instead we may want to split each individual value onto its own row, keeping the same mapping to the other key columns. While it's possible to chain together existing pandas operations (in fact that's exactly what this implementation is) to do this, the sequence of operations is not obvious. By contrast this is available as a built-in operation in say Spark and is a fairly common use case.

Dr-Irv · 2019-05-15T16:21:53Z

Caught this on pandas-dev-request, and there is a similar feature that I need with respect to indexing.
Here's an example. In this example, I have a Series that has a MultiIndex with names 'ab' and 'ott'. Now I want to repeat that Series for every element in the list xy. So the code I use to do this is as follows:

In [2]: idx = pd.MultiIndex.from_product([['a','b'], [1,2,3]], names=['ab
   ...: ','ott'])
   ...: s = pd.Series(range(6), index=idx)
   ...: s
Out[2]:
ab  ott
a   1      0
    2      1
    3      2
b   1      3
    2      4
    3      5
dtype: int64

In [3]: xy = ['x', 'y']
   ...: (s
   ...:  .to_frame('s')
   ...:  .assign(one=1)
   ...:  .reset_index()
   ...:  .merge(pd.Series(1, index=pd.Index(xy,name='xy'))
   ...:         .to_frame('one').reset_index(), on='one')
   ...:  .set_index(s.index.names+['xy'])
   ...:  .drop(columns=['one'])
   ...: )
Out[3]:
           s
ab ott xy
a  1   x   0
       y   0
   2   x   1
       y   1
   3   x   2
       y   2
b  1   x   3
       y   3
   2   x   4
       y   4
   3   x   5
       y   5

I'd like to "explode" the Series (or a DataFrame ) from the existing MultiIndex by adding a level to the MultiIndex using a list (or any sequence) that I provide. Alternatively, just adding 'xy' as column would work as well.

…ev#16538) Sometimes a values column is presented with list-like values on one row. Instead we may want to split each individual value onto its own row, keeping the same mapping to the other key columns. While it's possible to chain together existing pandas operations (in fact that's exactly what this implementation is) to do this, the sequence of operations is not obvious. By contrast this is available as a built-in operation in say Spark and is a fairly common use case.

* [ENH] Add DataFrame method to explode a list-like column (GH #16538) Sometimes a values column is presented with list-like values on one row. Instead we may want to split each individual value onto its own row, keeping the same mapping to the other key columns. While it's possible to chain together existing pandas operations (in fact that's exactly what this implementation is) to do this, the sequence of operations is not obvious. By contrast this is available as a built-in operation in say Spark and is a fairly common use case. * move to Series * handle generic list-like * lint on asv * move is_list_like to cython and share impl * moar docs * test larger sides to avoid a segfault * fix ref * typos * benchmarks wrong * add inversion * add usecase * cimport is_list_like * use cimports * doc-string * docs & lint * isort * clean object check & update doc-strings * lint * test for nested * better test * try adding frame * test for nested EA * lint * remove multi subset support * update docs * doc-string * add test for MI * lint and docs * ordering * moar lint * multi-index column support * 32-bit compat * moar 32-bit compat

jreback added Difficulty Intermediate Enhancement Strings String extension data type and string data labels May 30, 2017

jreback added this to the Next Major Release milestone May 30, 2017

jreback changed the title ~~Splitting a column content over multiple rows while duplicating other columns content to these rows~~ ENH: (explode) Splitting a column content over multiple rows while duplicating other columns content to these rows May 30, 2017

jreback mentioned this issue Dec 30, 2017

DOC: section on caveats of storing lists inside DataFrame/Series #17027

Open

jreback mentioned this issue Nov 3, 2018

Feature suggestion: unnest function for pandas #23470

Closed

changhiskhan mentioned this issue Dec 20, 2018

[ENH] Add DataFrame method to explode a list-like column (GH #16538) #24366

Closed

4 tasks

jreback mentioned this issue Mar 14, 2019

Feature request: Series.flatmap, DataFrame.flatmap #8517

Closed

jreback mentioned this issue Jul 6, 2019

ENH: Add Series method to explode a list-like column #27267

Merged

jreback closed this as completed in #27267 Jul 18, 2019

erfannariman mentioned this issue Jun 22, 2020

ENH: add ignore_index argument to DataFrame.explode / Series.explode #34932

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: (explode) Splitting a column content over multiple rows while duplicating other columns content to these rows #16538

ENH: (explode) Splitting a column content over multiple rows while duplicating other columns content to these rows #16538

BLMeltdown commented May 30, 2017 •

edited by jreback

Loading

jreback commented May 30, 2017

BLMeltdown commented May 30, 2017 •

edited

Loading

BLMeltdown commented May 30, 2017 •

edited by jreback

Loading

jreback commented May 30, 2017

BLMeltdown commented May 30, 2017

Dr-Irv commented May 15, 2019

ENH: (explode) Splitting a column content over multiple rows while duplicating other columns content to these rows #16538

ENH: (explode) Splitting a column content over multiple rows while duplicating other columns content to these rows #16538

Comments

BLMeltdown commented May 30, 2017 • edited by jreback Loading

jreback commented May 30, 2017

BLMeltdown commented May 30, 2017 • edited Loading

BLMeltdown commented May 30, 2017 • edited by jreback Loading

jreback commented May 30, 2017

BLMeltdown commented May 30, 2017

Dr-Irv commented May 15, 2019

BLMeltdown commented May 30, 2017 •

edited by jreback

Loading

BLMeltdown commented May 30, 2017 •

edited

Loading

BLMeltdown commented May 30, 2017 •

edited by jreback

Loading