Commit

Squashed commit of the following:
commit 28c61d770f6dfca6857fd0fa6979d4119a31129e
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 12:18:19 2018 -0600

    uncomment

commit bae2e322523efc73a1344464f51611e2dc555ccb
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 12:17:09 2018 -0600

    maybe fixes

commit 6cb4db05c9d6ceba3794096f0172cae5ed5f6019
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 09:57:37 2018 -0600

    we back

commit d97ab57fb32cb23371169d9ed659ccfac34cfe45
Merge: a117de4 b78aa8d
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 09:51:51 2018 -0600

    Merge remote-tracking branch 'upstream/master' into disown-tz-only-rebased2

commit b78aa8d
Author: gfyoung <gfyoung17+GitHub@gmail.com>
Date:   Thu Dec 6 07:18:44 2018 -0500

    REF/TST: Add pytest idiom to reshape/test_tile (pandas-dev#24107)

commit 2993b8e
Author: gfyoung <gfyoung17+GitHub@gmail.com>
Date:   Thu Dec 6 07:17:55 2018 -0500

    REF/TST: Add more pytest idiom to scalar/test_nat (pandas-dev#24120)

commit b841374
Author: evangelineliu <hsiyinliu@gmail.com>
Date:   Wed Dec 5 18:21:46 2018 -0500

    BUG: Fix concat series loss of timezone (pandas-dev#24027)

commit 4ae63aa
Author: jbrockmendel <jbrockmendel@gmail.com>
Date:   Wed Dec 5 14:44:50 2018 -0800

    Implement DatetimeArray._from_sequence (pandas-dev#24074)

commit 2643721
Author: jbrockmendel <jbrockmendel@gmail.com>
Date:   Wed Dec 5 14:43:45 2018 -0800

    CLN: Follow-up to pandas-dev#24100 (pandas-dev#24116)

commit 8ea7744
Author: chris-b1 <cbartak@gmail.com>
Date:   Wed Dec 5 14:21:23 2018 -0600

    PERF: ascii c string functions (pandas-dev#23981)

commit cb862e4
Author: jbrockmendel <jbrockmendel@gmail.com>
Date:   Wed Dec 5 12:19:46 2018 -0800

    BUG: fix mutation of DTI backing Series/DataFrame (pandas-dev#24096)

commit aead29b
Author: topper-123 <contribute@tensortable.com>
Date:   Wed Dec 5 19:06:00 2018 +0000

    API: rename MultiIndex.labels to MultiIndex.codes (pandas-dev#23752)
TomAugspurger committed Dec 6, 2018
1 parent a117de4 commit 1f463a1
Showing 111 changed files with 2,682 additions and 2,325 deletions.
132 changes: 132 additions & 0 deletions LICENSES/MUSL_LICENSE
@@ -0,0 +1,132 @@
musl as a whole is licensed under the following standard MIT license:

----------------------------------------------------------------------
Copyright © 2005-2014 Rich Felker, et al.

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
----------------------------------------------------------------------

Authors/contributors include:

Anthony G. Basile
Arvid Picciani
Bobby Bingham
Boris Brezillon
Brent Cook
Chris Spiegel
Clément Vasseur
Emil Renner Berthing
Hiltjo Posthuma
Isaac Dunham
Jens Gustedt
Jeremy Huntwork
John Spencer
Justin Cormack
Luca Barbato
Luka Perkov
M Farkas-Dyck (Strake)
Michael Forney
Nicholas J. Kain
orc
Pascal Cuoq
Pierre Carrier
Rich Felker
Richard Pennington
sin
Solar Designer
Stefan Kristiansson
Szabolcs Nagy
Timo Teräs
Valentin Ochs
William Haddon

Portions of this software are derived from third-party works licensed
under terms compatible with the above MIT license:

The TRE regular expression implementation (src/regex/reg* and
src/regex/tre*) is Copyright © 2001-2008 Ville Laurikari and licensed
under a 2-clause BSD license (license text in the source files). The
included version has been heavily modified by Rich Felker in 2012, in
the interests of size, simplicity, and namespace cleanliness.

Much of the math library code (src/math/* and src/complex/*) is
Copyright © 1993,2004 Sun Microsystems or
Copyright © 2003-2011 David Schultz or
Copyright © 2003-2009 Steven G. Kargl or
Copyright © 2003-2009 Bruce D. Evans or
Copyright © 2008 Stephen L. Moshier
and labelled as such in comments in the individual source files. All
have been licensed under extremely permissive terms.

The ARM memcpy code (src/string/armel/memcpy.s) is Copyright © 2008
The Android Open Source Project and is licensed under a two-clause BSD
license. It was taken from Bionic libc, used on Android.

The implementation of DES for crypt (src/misc/crypt_des.c) is
Copyright © 1994 David Burren. It is licensed under a BSD license.

The implementation of blowfish crypt (src/misc/crypt_blowfish.c) was
originally written by Solar Designer and placed into the public
domain. The code also comes with a fallback permissive license for use
in jurisdictions that may not recognize the public domain.

The smoothsort implementation (src/stdlib/qsort.c) is Copyright © 2011
Valentin Ochs and is licensed under an MIT-style license.

The BSD PRNG implementation (src/prng/random.c) and XSI search API
(src/search/*.c) functions are Copyright © 2011 Szabolcs Nagy and
licensed under following terms: "Permission to use, copy, modify,
and/or distribute this code for any purpose with or without fee is
hereby granted. There is no warranty."

The x86_64 port was written by Nicholas J. Kain. Several files (crt)
were released into the public domain; others are licensed under the
standard MIT license terms at the top of this file. See individual
files for their copyright status.

The mips and microblaze ports were originally written by Richard
Pennington for use in the ellcc project. The original code was adapted
by Rich Felker for build system and code conventions during upstream
integration. It is licensed under the standard MIT terms.

The powerpc port was also originally written by Richard Pennington,
and later supplemented and integrated by John Spencer. It is licensed
under the standard MIT terms.

All other files which have no copyright comments are original works
produced specifically for use as part of this library, written either
by Rich Felker, the main author of the library, or by one or more
contibutors listed above. Details on authorship of individual files
can be found in the git version control history of the project. The
omission of copyright and license comments in each file is in the
interest of source tree size.

All public header files (include/* and arch/*/bits/*) should be
treated as Public Domain as they intentionally contain no content
which can be covered by copyright. Some source modules may fall in
this category as well. If you believe that a file is so trivial that
it should be in the Public Domain, please contact the authors and
request an explicit statement releasing it from copyright.

The following files are trivial, believed not to be copyrightable in
the first place, and hereby explicitly released to the Public Domain:

All public headers: include/*, arch/*/bits/*
Startup files: crt/*
4 changes: 2 additions & 2 deletions asv_bench/benchmarks/groupby.py
@@ -473,8 +473,8 @@ def setup(self):
n1 = 400
n2 = 250
index = MultiIndex(levels=[np.arange(n1), tm.makeStringIndex(n2)],
labels=[np.repeat(range(n1), n2).tolist(),
list(range(n2)) * n1],
codes=[np.repeat(range(n1), n2).tolist(),
list(range(n2)) * n1],
names=['lev1', 'lev2'])
arr = np.random.randn(n1 * n2, 3)
arr[::10000, 0] = np.nan
10 changes: 5 additions & 5 deletions asv_bench/benchmarks/join_merge.py
@@ -115,16 +115,16 @@ class Join(object):
def setup(self, sort):
level1 = tm.makeStringIndex(10).values
level2 = tm.makeStringIndex(1000).values
label1 = np.arange(10).repeat(1000)
label2 = np.tile(np.arange(1000), 10)
codes1 = np.arange(10).repeat(1000)
codes2 = np.tile(np.arange(1000), 10)
index2 = MultiIndex(levels=[level1, level2],
labels=[label1, label2])
codes=[codes1, codes2])
self.df_multi = DataFrame(np.random.randn(len(index2), 4),
index=index2,
columns=['A', 'B', 'C', 'D'])

self.key1 = np.tile(level1.take(label1), 10)
self.key2 = np.tile(level2.take(label2), 10)
self.key1 = np.tile(level1.take(codes1), 10)
self.key2 = np.tile(level2.take(codes2), 10)
self.df = DataFrame({'data1': np.random.randn(100000),
'data2': np.random.randn(100000),
'key1': self.key1,
4 changes: 2 additions & 2 deletions asv_bench/benchmarks/multiindex_object.py
@@ -79,8 +79,8 @@ def setup(self):
levels = [np.arange(n),
tm.makeStringIndex(n).values,
1000 + np.arange(n)]
labels = [np.random.choice(n, (k * n)) for lev in levels]
self.mi = MultiIndex(levels=levels, labels=labels)
codes = [np.random.choice(n, (k * n)) for lev in levels]
self.mi = MultiIndex(levels=levels, codes=codes)

def time_duplicated(self):
self.mi.duplicated()
6 changes: 3 additions & 3 deletions asv_bench/benchmarks/reindex.py
@@ -71,9 +71,9 @@ class LevelAlign(object):
def setup(self):
self.index = MultiIndex(
levels=[np.arange(10), np.arange(100), np.arange(100)],
labels=[np.arange(10).repeat(10000),
np.tile(np.arange(100).repeat(100), 10),
np.tile(np.tile(np.arange(100), 100), 10)])
codes=[np.arange(10).repeat(10000),
np.tile(np.arange(100).repeat(100), 10),
np.tile(np.tile(np.arange(100), 100), 10)])
self.df = DataFrame(np.random.randn(len(self.index), 4),
index=self.index)
self.df_level = DataFrame(np.random.randn(100, 4),
16 changes: 8 additions & 8 deletions asv_bench/benchmarks/stat_ops.py
@@ -31,10 +31,10 @@ class FrameMultiIndexOps(object):

def setup(self, level, op):
levels = [np.arange(10), np.arange(100), np.arange(100)]
labels = [np.arange(10).repeat(10000),
np.tile(np.arange(100).repeat(100), 10),
np.tile(np.tile(np.arange(100), 100), 10)]
index = pd.MultiIndex(levels=levels, labels=labels)
codes = [np.arange(10).repeat(10000),
np.tile(np.arange(100).repeat(100), 10),
np.tile(np.tile(np.arange(100), 100), 10)]
index = pd.MultiIndex(levels=levels, codes=codes)
df = pd.DataFrame(np.random.randn(len(index), 4), index=index)
self.df_func = getattr(df, op)

@@ -67,10 +67,10 @@ class SeriesMultiIndexOps(object):

def setup(self, level, op):
levels = [np.arange(10), np.arange(100), np.arange(100)]
labels = [np.arange(10).repeat(10000),
np.tile(np.arange(100).repeat(100), 10),
np.tile(np.tile(np.arange(100), 100), 10)]
index = pd.MultiIndex(levels=levels, labels=labels)
codes = [np.arange(10).repeat(10000),
np.tile(np.arange(100).repeat(100), 10),
np.tile(np.tile(np.arange(100), 100), 10)]
index = pd.MultiIndex(levels=levels, codes=codes)
s = pd.Series(np.random.randn(len(index)), index=index)
self.s_func = getattr(s, op)

7 changes: 6 additions & 1 deletion doc/source/advanced.rst
@@ -49,6 +49,11 @@ analysis.

See the :ref:`cookbook<cookbook.multi_index>` for some advanced strategies.

.. versionchanged:: 0.24.0

:attr:`MultiIndex.labels` has been renamed to :attr:`MultiIndex.codes`
and :attr:`MultiIndex.set_labels` to :attr:`MultiIndex.set_codes`.

Creating a MultiIndex (hierarchical index) object
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -469,7 +474,7 @@ values across a level. For instance:
.. ipython:: python
midx = pd.MultiIndex(levels=[['zero', 'one'], ['x', 'y']],
labels=[[1, 1, 0, 0], [1, 0, 1, 0]])
codes=[[1, 1, 0, 0], [1, 0, 1, 0]])
df = pd.DataFrame(np.random.randn(4, 2), index=midx)
df
df2 = df.mean(level=0)
4 changes: 2 additions & 2 deletions doc/source/api.rst
@@ -1712,7 +1712,7 @@ MultiIndex Attributes

MultiIndex.names
MultiIndex.levels
MultiIndex.labels
MultiIndex.codes
MultiIndex.nlevels
MultiIndex.levshape

@@ -1723,7 +1723,7 @@ MultiIndex Components
:toctree: generated/

MultiIndex.set_levels
MultiIndex.set_labels
MultiIndex.set_codes
MultiIndex.to_hierarchical
MultiIndex.to_flat_index
MultiIndex.to_frame
2 changes: 1 addition & 1 deletion doc/source/dsintro.rst
@@ -961,7 +961,7 @@ From DataFrame using ``to_panel`` method
.. ipython:: python
:okwarning:
midx = pd.MultiIndex(levels=[['one', 'two'], ['x','y']], labels=[[1,1,0,0],[1,0,1,0]])
midx = pd.MultiIndex(levels=[['one', 'two'], ['x','y']], codes=[[1,1,0,0],[1,0,1,0]])
df = pd.DataFrame({'A' : [1, 2, 3, 4], 'B': [5, 6, 7, 8]}, index=midx)
df.to_panel()
6 changes: 3 additions & 3 deletions doc/source/indexing.rst
@@ -1571,9 +1571,9 @@ Setting metadata

Indexes are "mostly immutable", but it is possible to set and change their
metadata, like the index ``name`` (or, for ``MultiIndex``, ``levels`` and
``labels``).
``codes``).

You can use the ``rename``, ``set_names``, ``set_levels``, and ``set_labels``
You can use the ``rename``, ``set_names``, ``set_levels``, and ``set_codes``
to set these attributes directly. They default to returning a copy; however,
you can specify ``inplace=True`` to have the data change in place.
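The copy-returning default described above can be sketched as follows. This is only an illustration: the index contents and the variable names (``ind``, ``renamed``, ``relevelled``) are made up for the example, and it relies solely on the default behaviour, not on ``inplace=True``.

```python
import pandas as pd

# Hypothetical two-level index used only for illustration.
ind = pd.MultiIndex.from_product([[0, 1], ["a", "b"]], names=["num", "let"])

# set_names returns a new index; the original keeps its names.
renamed = ind.set_names(["n", "l"])

# set_levels replaces the values of one level; the codes are untouched,
# so every position that pointed at "a" now points at "x", and so on.
relevelled = ind.set_levels(["x", "y"], level="let")

print(list(ind.names))      # names of the original index
print(list(renamed.names))  # names of the copy
print(relevelled.tolist())  # tuples after swapping the level values
```

Because each call returns a new object, the calls can be chained without affecting ``ind``.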

@@ -1588,7 +1588,7 @@ See :ref:`Advanced Indexing <advanced>` for usage of MultiIndexes.
ind.name = "bob"
ind
``set_names``, ``set_levels``, and ``set_labels`` also take an optional
``set_names``, ``set_levels``, and ``set_codes`` also take an optional
`level`` argument

.. ipython:: python
10 changes: 5 additions & 5 deletions doc/source/internals.rst
@@ -74,23 +74,23 @@ MultiIndex
~~~~~~~~~~

Internally, the ``MultiIndex`` consists of a few things: the **levels**, the
integer **labels**, and the level **names**:
integer **codes** (until version 0.24 named *labels*), and the level **names**:

.. ipython:: python
index = pd.MultiIndex.from_product([range(3), ['one', 'two']],
names=['first', 'second'])
index
index.levels
index.labels
index.codes
index.names
You can probably guess that the labels determine which unique element is
You can probably guess that the codes determine which unique element is
identified with that location at each layer of the index. It's important to
note that sortedness is determined **solely** from the integer labels and does
note that sortedness is determined **solely** from the integer codes and does
not check (or care) whether the levels themselves are sorted. Fortunately, the
constructors ``from_tuples`` and ``from_arrays`` ensure that this is true, but
if you compute the levels and labels yourself, please be careful.
if you compute the levels and codes yourself, please be careful.
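As a rough sketch of how the codes relate to the levels, the snippet below rebuilds the per-level values by using each array of integer codes as positions into the matching level. The variable name ``rebuilt`` is an assumption made for this illustration; it is not part of the pandas API.

```python
import pandas as pd

index = pd.MultiIndex.from_product([range(3), ["one", "two"]],
                                   names=["first", "second"])

# Each entry of index.codes holds integer positions into the
# corresponding entry of index.levels, one position per row.
rebuilt = [level.take(codes)
           for level, codes in zip(index.levels, index.codes)]

# The rebuilt values should agree with what get_level_values reports.
for i, values in enumerate(rebuilt):
    print(list(values), list(index.get_level_values(i)))
```

The comparison against ``get_level_values`` is only a sanity check that the positional lookup reproduces the values pandas itself reports for each level.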

Values
~~~~~~
4 changes: 2 additions & 2 deletions doc/source/io.rst
@@ -3728,8 +3728,8 @@ storing/selecting from homogeneous index ``DataFrames``.
index = pd.MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
['one', 'two', 'three']],
labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
[0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
[0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
names=['foo', 'bar'])
df_mi = pd.DataFrame(np.random.randn(10, 3), index=index,
columns=['A', 'B', 'C'])
9 changes: 9 additions & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -1101,6 +1101,13 @@ Other API Changes
Deprecations
~~~~~~~~~~~~

- :attr:`MultiIndex.labels` has been deprecated and replaced by :attr:`MultiIndex.codes`.
The functionality is unchanged. The new name better reflects the nature of
these codes and makes the ``MultiIndex`` API more similar to the API for :class:`CategoricalIndex` (:issue:`13443`).
As a consequence, other uses of the name ``labels`` in ``MultiIndex`` have also been deprecated and replaced with ``codes``:
- You should initialize a ``MultiIndex`` instance using a parameter named ``codes`` rather than ``labels``.
- ``MultiIndex.set_labels`` has been deprecated in favor of :meth:`MultiIndex.set_codes`.
- For method :meth:`MultiIndex.copy`, the ``labels`` parameter has been deprecated and replaced by a ``codes`` parameter.
- :meth:`DataFrame.to_stata`, :meth:`read_stata`, :class:`StataReader` and :class:`StataWriter` have deprecated the ``encoding`` argument. The encoding of a Stata dta file is determined by the file type and cannot be changed (:issue:`21244`)
- :meth:`MultiIndex.to_hierarchical` is deprecated and will be removed in a future version (:issue:`21613`)
- :meth:`Series.ptp` is deprecated. Use ``numpy.ptp`` instead (:issue:`21614`)
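A minimal sketch of the renamed spelling described in the ``MultiIndex.labels`` entry above, assuming pandas 0.24 or later. The index contents and the names ``midx`` and ``swapped`` are illustrative only.

```python
import pandas as pd

# Construct with the ``codes`` keyword instead of the deprecated ``labels``.
midx = pd.MultiIndex(levels=[["zero", "one"], ["x", "y"]],
                     codes=[[1, 1, 0, 0], [1, 0, 1, 0]])

# set_codes takes the place of the deprecated set_labels. It returns a
# new index by default and leaves ``midx`` unchanged.
swapped = midx.set_codes([0, 0, 1, 1], level=0)

print(midx.tolist())     # tuples of the original index
print(swapped.tolist())  # first level re-pointed through the new codes
```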
@@ -1236,6 +1243,7 @@ Performance Improvements
- Improved performance of :func:`pd.concat` for `Series` objects (:issue:`23404`)
- Improved performance of :meth:`DatetimeIndex.normalize` and :meth:`Timestamp.normalize` for timezone naive or UTC datetimes (:issue:`23634`)
- Improved performance of :meth:`DatetimeIndex.tz_localize` and various ``DatetimeIndex`` attributes with dateutil UTC timezone (:issue:`23772`)
- Fixed a performance regression on Windows with Python 3.7 of :func:`pd.read_csv` (:issue:`23516`)
- Improved performance of :class:`Categorical` constructor for `Series` objects (:issue:`23814`)
- Improved performance of :meth:`~DataFrame.where` for Categorical data (:issue:`24077`)

@@ -1549,6 +1557,7 @@ Reshaping
- Bug in :meth:`DataFrame.append` with a :class:`Series` with a dateutil timezone would raise a ``TypeError`` (:issue:`23682`)
- Bug in ``Series`` construction when passing no data and ``dtype=str`` (:issue:`22477`)
- Bug in :func:`cut` with ``bins`` as an overlapping ``IntervalIndex`` where multiple bins were returned per item instead of raising a ``ValueError`` (:issue:`23980`)
- Bug in :func:`pandas.concat` when joining ``Series`` datetimetz with ``Series`` category would lose timezone (:issue:`23816`)
- Bug in :meth:`DataFrame.join` when joining on partial MultiIndex would drop names (:issue:`20452`).

.. _whatsnew_0240.bug_fixes.sparse:
6 changes: 6 additions & 0 deletions pandas/_libs/src/headers/portable.h
@@ -5,4 +5,10 @@
#define strcasecmp( s1, s2 ) _stricmp( s1, s2 )
#endif

// GH-23516 - works around locale perf issues
// from MUSL libc, MIT Licensed - see LICENSES
#define isdigit_ascii(c) ((unsigned)c - '0' < 10)
#define isspace_ascii(c) (c == ' ' || (unsigned)c-'\t' < 5)
#define toupper_ascii(c) (((unsigned)c-'a' < 26) ? (c & 0x5f) : c)

#endif