API: add top-level melt function as method to DataFrame
xref pandas-dev#12640
xref pandas-dev#14876

Author: Aleksey Bilogur <aleksey.bilogur@gmail.com>

Closes pandas-dev#15521 from ResidentMario/12640 and squashes the following commits:

1657246 [Aleksey Bilogur] two doc changes
28a38f2 [Aleksey Bilogur] tweak whatsnew entry.
5f306a9 [Aleksey Bilogur] +whatsnew
ff895fe [Aleksey Bilogur] Add tests, update docs.
11f3fe4 [Aleksey Bilogur] rm stray debug.
3cbbed5 [Aleksey Bilogur] Melt docstring.
d54dc2f [Aleksey Bilogur] +pd.DataFrame.melt.
ResidentMario authored and jreback committed Apr 4, 2017
1 parent faf6401 commit e50d397
Showing 6 changed files with 182 additions and 133 deletions.
1 change: 1 addition & 0 deletions doc/source/api.rst
@@ -933,6 +933,7 @@ Reshaping, sorting, transposing
DataFrame.swaplevel
DataFrame.stack
DataFrame.unstack
DataFrame.melt
DataFrame.T
DataFrame.to_panel
DataFrame.to_xarray
11 changes: 6 additions & 5 deletions doc/source/reshaping.rst
@@ -265,8 +265,8 @@ the right thing:
Reshaping by Melt
-----------------

The :func:`~pandas.melt` function is useful to massage a
DataFrame into a format where one or more columns are identifier variables,
The top-level :func:`melt` and :func:`~DataFrame.melt` functions are useful to
massage a DataFrame into a format where one or more columns are identifier variables,
while all other columns, considered measured variables, are "unpivoted" to the
row axis, leaving just two non-identifier columns, "variable" and "value". The
names of those columns can be customized by supplying the ``var_name`` and
@@ -281,10 +281,11 @@ For instance,
'height' : [5.5, 6.0],
'weight' : [130, 150]})
cheese
pd.melt(cheese, id_vars=['first', 'last'])
pd.melt(cheese, id_vars=['first', 'last'], var_name='quantity')
cheese.melt(id_vars=['first', 'last'])
cheese.melt(id_vars=['first', 'last'], var_name='quantity')
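The effect of ``var_name`` and ``value_name`` on the output can be sketched as follows. This is a minimal illustration rather than part of the commit: the frame contents and the names ``wide``, ``tidy``, ``quantity``, and ``measurement`` are hypothetical, and it assumes pandas 0.20.0 or later, where ``DataFrame.melt`` is available.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
wide = pd.DataFrame({
    "first": ["Ada", "Alan"],
    "last": ["Lovelace", "Turing"],
    "height": [5.5, 6.0],
    "weight": [130, 150],
})

# Keep 'first' and 'last' as identifiers; unpivot the remaining columns.
tidy = wide.melt(
    id_vars=["first", "last"],
    var_name="quantity",        # replaces the default 'variable' column name
    value_name="measurement",   # replaces the default 'value' column name
)

# Every input row yields one output row per measured column.
n_measured = len(wide.columns) - 2
print(len(tidy) == len(wide) * n_measured)
print(list(tidy.columns))
```

The identifier columns are repeated once per measured column, which is why the row count grows by that factor.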
Another way to transform is to use the ``wide_to_long`` panel data convenience function.
Another way to transform is to use the ``wide_to_long`` panel data convenience
function.

.. ipython:: python
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -324,6 +324,7 @@ Other Enhancements
- ``Series.sort_index`` accepts parameters ``kind`` and ``na_position`` (:issue:`13589`, :issue:`14444`)

- ``DataFrame`` has gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`).
- ``DataFrame`` has gained a ``melt()`` method, equivalent to ``pd.melt()``, for unpivoting from a wide to long format (:issue:`12640`).
- ``DataFrame.groupby()`` has gained a ``.nunique()`` method to count the distinct values for all columns within each group (:issue:`14336`, :issue:`15197`).

- ``pd.read_excel()`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
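The equivalence between ``DataFrame.melt()`` and ``pd.melt()`` stated in the ``melt()`` entry above can be checked with a short sketch. It reuses the frame from the docstring examples in this commit; the variable names ``via_function``, ``via_method``, and ``same`` are illustrative, and pandas 0.20.0 or later is assumed.

```python
import pandas as pd

# Same data as the docstring example added in this commit.
df = pd.DataFrame({"A": ["a", "b", "c"],
                   "B": [1, 3, 5],
                   "C": [2, 4, 6]})

# Top-level function: the frame is passed as the first argument.
via_function = pd.melt(df, id_vars=["A"], value_vars=["B", "C"])

# Method form: the frame is the receiver, all other arguments are unchanged.
via_method = df.melt(id_vars=["A"], value_vars=["B", "C"])

# Both calls should produce identical frames.
same = via_method.equals(via_function)
print(same)
```

The method form is mainly a convenience for method chaining; it forwards its arguments to the top-level function.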
104 changes: 104 additions & 0 deletions pandas/core/frame.py
@@ -4051,6 +4051,110 @@ def unstack(self, level=-1, fill_value=None):
from pandas.core.reshape import unstack
return unstack(self, level, fill_value)

_shared_docs['melt'] = ("""
"Unpivots" a DataFrame from wide format to long format, optionally
leaving identifier variables set.
This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (`id_vars`), while all other
columns, considered measured variables (`value_vars`), are "unpivoted" to
the row axis, leaving just two non-identifier columns, 'variable' and
'value'.
%(versionadded)s
Parameters
----------
frame : DataFrame
id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that
are not set as `id_vars`.
var_name : scalar
Name to use for the 'variable' column. If None it uses
``frame.columns.name`` or 'variable'.
value_name : scalar, default 'value'
Name to use for the 'value' column.
col_level : int or string, optional
If columns are a MultiIndex then use this level to melt.
See also
--------
%(other)s
pivot_table
DataFrame.pivot
Examples
--------
>>> import pandas as pd
>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
... 'B': {0: 1, 1: 3, 2: 5},
... 'C': {0: 2, 1: 4, 2: 6}})
>>> df
A B C
0 a 1 2
1 b 3 4
2 c 5 6
>>> %(caller)sid_vars=['A'], value_vars=['B'])
A variable value
0 a B 1
1 b B 3
2 c B 5
>>> %(caller)sid_vars=['A'], value_vars=['B', 'C'])
A variable value
0 a B 1
1 b B 3
2 c B 5
3 a C 2
4 b C 4
5 c C 6
The names of 'variable' and 'value' columns can be customized:
>>> %(caller)sid_vars=['A'], value_vars=['B'],
... var_name='myVarname', value_name='myValname')
A myVarname myValname
0 a B 1
1 b B 3
2 c B 5
If you have multi-index columns:
>>> df.columns = [list('ABC'), list('DEF')]
>>> df
A B C
D E F
0 a 1 2
1 b 3 4
2 c 5 6
>>> %(caller)scol_level=0, id_vars=['A'], value_vars=['B'])
A variable value
0 a B 1
1 b B 3
2 c B 5
>>> %(caller)sid_vars=[('A', 'D')], value_vars=[('B', 'E')])
(A, D) variable_0 variable_1 value
0 a B E 1
1 b B E 3
2 c B E 5
""")

@Appender(_shared_docs['melt'] %
dict(caller='df.melt(',
versionadded='.. versionadded:: 0.20.0\n',
other='melt'))
def melt(self, id_vars=None, value_vars=None, var_name=None,
value_name='value', col_level=None):
from pandas.core.reshape import melt
return melt(self, id_vars=id_vars, value_vars=value_vars,
var_name=var_name, value_name=value_name,
col_level=col_level)

# ----------------------------------------------------------------------
# Time series-related
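The shared-docstring pattern used above (one template stored in ``_shared_docs['melt']``, filled in separately for the method and the function) can be sketched without pandas. The ``append_doc`` decorator below is a hypothetical stand-in for pandas' ``Appender``, not its actual implementation, and the template is heavily abbreviated; ``melt_method`` and ``melt_function`` are placeholder names.

```python
# Hypothetical stand-in for pandas' Appender decorator (illustration only).
def append_doc(addendum):
    def decorate(func):
        # Append the rendered template to whatever docstring already exists.
        func.__doc__ = (func.__doc__ or "") + addendum
        return func
    return decorate


shared_docs = {}
shared_docs["melt"] = """
Unpivot a DataFrame from wide format to long format.
%(versionadded)s
See also
--------
%(other)s

Examples
--------
>>> %(caller)sid_vars=['A'], value_vars=['B'])
"""

# Values for the method: the example calls start with 'df.melt('.
method_doc = shared_docs["melt"] % dict(
    caller="df.melt(",
    versionadded=".. versionadded:: 0.20.0\n",
    other="melt",
)

# Values for the top-level function: the calls start with 'pd.melt(df, '.
function_doc = shared_docs["melt"] % dict(
    caller="pd.melt(df, ",
    versionadded="",
    other="DataFrame.melt",
)


@append_doc(method_doc)
def melt_method(self, id_vars=None, value_vars=None):
    ...


@append_doc(function_doc)
def melt_function(frame, id_vars=None, value_vars=None):
    ...


print(melt_method.__doc__)
print(melt_function.__doc__)
```

Keeping a single template means the examples and parameter descriptions cannot drift apart between the two entry points; only the call prefix, the version note, and the cross-reference differ.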

96 changes: 6 additions & 90 deletions pandas/core/reshape.py
@@ -28,6 +28,8 @@
import pandas.core.algorithms as algos
from pandas._libs import algos as _algos, reshape as _reshape

from pandas.core.frame import _shared_docs
from pandas.util.decorators import Appender
from pandas.core.index import MultiIndex, _get_na_value


@@ -701,98 +703,12 @@ def _convert_level_number(level_num, columns):
return result


@Appender(_shared_docs['melt'] %
dict(caller='pd.melt(df, ',
versionadded="",
other='DataFrame.melt'))
def melt(frame, id_vars=None, value_vars=None, var_name=None,
value_name='value', col_level=None):
"""
"Unpivots" a DataFrame from wide format to long format, optionally leaving
identifier variables set.
This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (`id_vars`), while all other
columns, considered measured variables (`value_vars`), are "unpivoted" to
the row axis, leaving just two non-identifier columns, 'variable' and
'value'.
Parameters
----------
frame : DataFrame
id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that
are not set as `id_vars`.
var_name : scalar
Name to use for the 'variable' column. If None it uses
``frame.columns.name`` or 'variable'.
value_name : scalar, default 'value'
Name to use for the 'value' column.
col_level : int or string, optional
If columns are a MultiIndex then use this level to melt.
See also
--------
pivot_table
DataFrame.pivot
Examples
--------
>>> import pandas as pd
>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
... 'B': {0: 1, 1: 3, 2: 5},
... 'C': {0: 2, 1: 4, 2: 6}})
>>> df
A B C
0 a 1 2
1 b 3 4
2 c 5 6
>>> pd.melt(df, id_vars=['A'], value_vars=['B'])
A variable value
0 a B 1
1 b B 3
2 c B 5
>>> pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
A variable value
0 a B 1
1 b B 3
2 c B 5
3 a C 2
4 b C 4
5 c C 6
The names of 'variable' and 'value' columns can be customized:
>>> pd.melt(df, id_vars=['A'], value_vars=['B'],
... var_name='myVarname', value_name='myValname')
A myVarname myValname
0 a B 1
1 b B 3
2 c B 5
If you have multi-index columns:
>>> df.columns = [list('ABC'), list('DEF')]
>>> df
A B C
D E F
0 a 1 2
1 b 3 4
2 c 5 6
>>> pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B'])
A variable value
0 a B 1
1 b B 3
2 c B 5
>>> pd.melt(df, id_vars=[('A', 'D')], value_vars=[('B', 'E')])
(A, D) variable_0 variable_1 value
0 a B E 1
1 b B E 3
2 c B E 5
"""
# TODO: what about the existing index?
if id_vars is not None:
if not is_list_like(id_vars):