Squashed commit of the following:

commit 28c61d770f6dfca6857fd0fa6979d4119a31129e Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 12:18:19 2018 -0600 uncomment commit bae2e322523efc73a1344464f51611e2dc555ccb Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 12:17:09 2018 -0600 maybe fixes commit 6cb4db05c9d6ceba3794096f0172cae5ed5f6019 Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 09:57:37 2018 -0600 we back commit d97ab57fb32cb23371169d9ed659ccfac34cfe45 Merge: a117de4 b78aa8d Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 09:51:51 2018 -0600 Merge remote-tracking branch 'upstream/master' into disown-tz-only-rebased2 commit b78aa8d Author: gfyoung <gfyoung17+GitHub@gmail.com> Date: Thu Dec 6 07:18:44 2018 -0500 REF/TST: Add pytest idiom to reshape/test_tile (pandas-dev#24107) commit 2993b8e Author: gfyoung <gfyoung17+GitHub@gmail.com> Date: Thu Dec 6 07:17:55 2018 -0500 REF/TST: Add more pytest idiom to scalar/test_nat (pandas-dev#24120) commit b841374 Author: evangelineliu <hsiyinliu@gmail.com> Date: Wed Dec 5 18:21:46 2018 -0500 BUG: Fix concat series loss of timezone (pandas-dev#24027) commit 4ae63aa Author: jbrockmendel <jbrockmendel@gmail.com> Date: Wed Dec 5 14:44:50 2018 -0800 Implement DatetimeArray._from_sequence (pandas-dev#24074) commit 2643721 Author: jbrockmendel <jbrockmendel@gmail.com> Date: Wed Dec 5 14:43:45 2018 -0800 CLN: Follow-up to pandas-dev#24100 (pandas-dev#24116) commit 8ea7744 Author: chris-b1 <cbartak@gmail.com> Date: Wed Dec 5 14:21:23 2018 -0600 PERF: ascii c string functions (pandas-dev#23981) commit cb862e4 Author: jbrockmendel <jbrockmendel@gmail.com> Date: Wed Dec 5 12:19:46 2018 -0800 BUG: fix mutation of DTI backing Series/DataFrame (pandas-dev#24096) commit aead29b Author: topper-123 <contribute@tensortable.com> Date: Wed Dec 5 19:06:00 2018 +0000 API: rename MultiIndex.labels to MultiIndex.codes (pandas-dev#23752)
TomAugspurger · Dec 6, 2018 · 1f463a1 · 1f463a1
1 parent a117de4
commit 1f463a1
Show file tree

Hide file tree

Showing 111 changed files with 2,682 additions and 2,325 deletions.
diff --git a/LICENSES/MUSL_LICENSE b/LICENSES/MUSL_LICENSE
@@ -0,0 +1,132 @@
+musl as a whole is licensed under the following standard MIT license:
+
+----------------------------------------------------------------------
+Copyright © 2005-2014 Rich Felker, et al.
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+----------------------------------------------------------------------
+
+Authors/contributors include:
+
+Anthony G. Basile
+Arvid Picciani
+Bobby Bingham
+Boris Brezillon
+Brent Cook
+Chris Spiegel
+Clément Vasseur
+Emil Renner Berthing
+Hiltjo Posthuma
+Isaac Dunham
+Jens Gustedt
+Jeremy Huntwork
+John Spencer
+Justin Cormack
+Luca Barbato
+Luka Perkov
+M Farkas-Dyck (Strake)
+Michael Forney
+Nicholas J. Kain
+orc
+Pascal Cuoq
+Pierre Carrier
+Rich Felker
+Richard Pennington
+sin
+Solar Designer
+Stefan Kristiansson
+Szabolcs Nagy
+Timo Teräs
+Valentin Ochs
+William Haddon
+
+Portions of this software are derived from third-party works licensed
+under terms compatible with the above MIT license:
+
+The TRE regular expression implementation (src/regex/reg* and
+src/regex/tre*) is Copyright © 2001-2008 Ville Laurikari and licensed
+under a 2-clause BSD license (license text in the source files). The
+included version has been heavily modified by Rich Felker in 2012, in
+the interests of size, simplicity, and namespace cleanliness.
+
+Much of the math library code (src/math/* and src/complex/*) is
+Copyright © 1993,2004 Sun Microsystems or
+Copyright © 2003-2011 David Schultz or
+Copyright © 2003-2009 Steven G. Kargl or
+Copyright © 2003-2009 Bruce D. Evans or
+Copyright © 2008 Stephen L. Moshier
+and labelled as such in comments in the individual source files. All
+have been licensed under extremely permissive terms.
+
+The ARM memcpy code (src/string/armel/memcpy.s) is Copyright © 2008
+The Android Open Source Project and is licensed under a two-clause BSD
+license. It was taken from Bionic libc, used on Android.
+
+The implementation of DES for crypt (src/misc/crypt_des.c) is
+Copyright © 1994 David Burren. It is licensed under a BSD license.
+
+The implementation of blowfish crypt (src/misc/crypt_blowfish.c) was
+originally written by Solar Designer and placed into the public
+domain. The code also comes with a fallback permissive license for use
+in jurisdictions that may not recognize the public domain.
+
+The smoothsort implementation (src/stdlib/qsort.c) is Copyright © 2011
+Valentin Ochs and is licensed under an MIT-style license.
+
+The BSD PRNG implementation (src/prng/random.c) and XSI search API
+(src/search/*.c) functions are Copyright © 2011 Szabolcs Nagy and
+licensed under following terms: "Permission to use, copy, modify,
+and/or distribute this code for any purpose with or without fee is
+hereby granted. There is no warranty."
+
+The x86_64 port was written by Nicholas J. Kain. Several files (crt)
+were released into the public domain; others are licensed under the
+standard MIT license terms at the top of this file. See individual
+files for their copyright status.
+
+The mips and microblaze ports were originally written by Richard
+Pennington for use in the ellcc project. The original code was adapted
+by Rich Felker for build system and code conventions during upstream
+integration. It is licensed under the standard MIT terms.
+
+The powerpc port was also originally written by Richard Pennington,
+and later supplemented and integrated by John Spencer. It is licensed
+under the standard MIT terms.
+
+All other files which have no copyright comments are original works
+produced specifically for use as part of this library, written either
+by Rich Felker, the main author of the library, or by one or more
+contibutors listed above. Details on authorship of individual files
+can be found in the git version control history of the project. The
+omission of copyright and license comments in each file is in the
+interest of source tree size.
+
+All public header files (include/* and arch/*/bits/*) should be
+treated as Public Domain as they intentionally contain no content
+which can be covered by copyright. Some source modules may fall in
+this category as well. If you believe that a file is so trivial that
+it should be in the Public Domain, please contact the authors and
+request an explicit statement releasing it from copyright.
+
+The following files are trivial, believed not to be copyrightable in
+the first place, and hereby explicitly released to the Public Domain:
+
+All public headers: include/*, arch/*/bits/*
+Startup files: crt/*
diff --git a/asv_bench/benchmarks/groupby.py b/asv_bench/benchmarks/groupby.py
@@ -473,8 +473,8 @@ def setup(self):
         n1 = 400
         n2 = 250
         index = MultiIndex(levels=[np.arange(n1), tm.makeStringIndex(n2)],
-                           labels=[np.repeat(range(n1), n2).tolist(),
-                                   list(range(n2)) * n1],
+                           codes=[np.repeat(range(n1), n2).tolist(),
+                                  list(range(n2)) * n1],
                            names=['lev1', 'lev2'])
         arr = np.random.randn(n1 * n2, 3)
         arr[::10000, 0] = np.nan

diff --git a/asv_bench/benchmarks/join_merge.py b/asv_bench/benchmarks/join_merge.py
@@ -115,16 +115,16 @@ class Join(object):
     def setup(self, sort):
         level1 = tm.makeStringIndex(10).values
         level2 = tm.makeStringIndex(1000).values
-        label1 = np.arange(10).repeat(1000)
-        label2 = np.tile(np.arange(1000), 10)
+        codes1 = np.arange(10).repeat(1000)
+        codes2 = np.tile(np.arange(1000), 10)
         index2 = MultiIndex(levels=[level1, level2],
-                            labels=[label1, label2])
+                            codes=[codes1, codes2])
         self.df_multi = DataFrame(np.random.randn(len(index2), 4),
                                   index=index2,
                                   columns=['A', 'B', 'C', 'D'])
 
-        self.key1 = np.tile(level1.take(label1), 10)
-        self.key2 = np.tile(level2.take(label2), 10)
+        self.key1 = np.tile(level1.take(codes1), 10)
+        self.key2 = np.tile(level2.take(codes2), 10)
         self.df = DataFrame({'data1': np.random.randn(100000),
                              'data2': np.random.randn(100000),
                              'key1': self.key1,

diff --git a/asv_bench/benchmarks/multiindex_object.py b/asv_bench/benchmarks/multiindex_object.py
@@ -79,8 +79,8 @@ def setup(self):
         levels = [np.arange(n),
                   tm.makeStringIndex(n).values,
                   1000 + np.arange(n)]
-        labels = [np.random.choice(n, (k * n)) for lev in levels]
-        self.mi = MultiIndex(levels=levels, labels=labels)
+        codes = [np.random.choice(n, (k * n)) for lev in levels]
+        self.mi = MultiIndex(levels=levels, codes=codes)
 
     def time_duplicated(self):
         self.mi.duplicated()

diff --git a/asv_bench/benchmarks/reindex.py b/asv_bench/benchmarks/reindex.py
@@ -71,9 +71,9 @@ class LevelAlign(object):
     def setup(self):
         self.index = MultiIndex(
             levels=[np.arange(10), np.arange(100), np.arange(100)],
-            labels=[np.arange(10).repeat(10000),
-                    np.tile(np.arange(100).repeat(100), 10),
-                    np.tile(np.tile(np.arange(100), 100), 10)])
+            codes=[np.arange(10).repeat(10000),
+                   np.tile(np.arange(100).repeat(100), 10),
+                   np.tile(np.tile(np.arange(100), 100), 10)])
         self.df = DataFrame(np.random.randn(len(self.index), 4),
                             index=self.index)
         self.df_level = DataFrame(np.random.randn(100, 4),

diff --git a/asv_bench/benchmarks/stat_ops.py b/asv_bench/benchmarks/stat_ops.py
@@ -31,10 +31,10 @@ class FrameMultiIndexOps(object):
 
     def setup(self, level, op):
         levels = [np.arange(10), np.arange(100), np.arange(100)]
-        labels = [np.arange(10).repeat(10000),
-                  np.tile(np.arange(100).repeat(100), 10),
-                  np.tile(np.tile(np.arange(100), 100), 10)]
-        index = pd.MultiIndex(levels=levels, labels=labels)
+        codes = [np.arange(10).repeat(10000),
+                 np.tile(np.arange(100).repeat(100), 10),
+                 np.tile(np.tile(np.arange(100), 100), 10)]
+        index = pd.MultiIndex(levels=levels, codes=codes)
         df = pd.DataFrame(np.random.randn(len(index), 4), index=index)
         self.df_func = getattr(df, op)
 
@@ -67,10 +67,10 @@ class SeriesMultiIndexOps(object):
 
     def setup(self, level, op):
         levels = [np.arange(10), np.arange(100), np.arange(100)]
-        labels = [np.arange(10).repeat(10000),
-                  np.tile(np.arange(100).repeat(100), 10),
-                  np.tile(np.tile(np.arange(100), 100), 10)]
-        index = pd.MultiIndex(levels=levels, labels=labels)
+        codes = [np.arange(10).repeat(10000),
+                 np.tile(np.arange(100).repeat(100), 10),
+                 np.tile(np.tile(np.arange(100), 100), 10)]
+        index = pd.MultiIndex(levels=levels, codes=codes)
         s = pd.Series(np.random.randn(len(index)), index=index)
         self.s_func = getattr(s, op)
 

diff --git a/doc/source/advanced.rst b/doc/source/advanced.rst
@@ -49,6 +49,11 @@ analysis.
 
 See the :ref:`cookbook<cookbook.multi_index>` for some advanced strategies.
 
+.. versionchanged:: 0.24.0
+
+   :attr:`MultiIndex.labels` has been renamed to :attr:`MultiIndex.codes`
+   and :attr:`MultiIndex.set_labels` to :attr:`MultiIndex.set_codes`.
+
 Creating a MultiIndex (hierarchical index) object
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -469,7 +474,7 @@ values across a level. For instance:
 .. ipython:: python
 
    midx = pd.MultiIndex(levels=[['zero', 'one'], ['x', 'y']],
-                        labels=[[1, 1, 0, 0], [1, 0, 1, 0]])
+                        codes=[[1, 1, 0, 0], [1, 0, 1, 0]])
    df = pd.DataFrame(np.random.randn(4, 2), index=midx)
    df
    df2 = df.mean(level=0)

diff --git a/doc/source/api.rst b/doc/source/api.rst
@@ -1712,7 +1712,7 @@ MultiIndex Attributes
 
    MultiIndex.names
    MultiIndex.levels
-   MultiIndex.labels
+   MultiIndex.codes
    MultiIndex.nlevels
    MultiIndex.levshape
 
@@ -1723,7 +1723,7 @@ MultiIndex Components
    :toctree: generated/
 
    MultiIndex.set_levels
-   MultiIndex.set_labels
+   MultiIndex.set_codes
    MultiIndex.to_hierarchical
    MultiIndex.to_flat_index
    MultiIndex.to_frame

diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst
@@ -961,7 +961,7 @@ From DataFrame using ``to_panel`` method
 .. ipython:: python
    :okwarning:
 
-   midx = pd.MultiIndex(levels=[['one', 'two'], ['x','y']], labels=[[1,1,0,0],[1,0,1,0]])
+   midx = pd.MultiIndex(levels=[['one', 'two'], ['x','y']], codes=[[1,1,0,0],[1,0,1,0]])
    df = pd.DataFrame({'A' : [1, 2, 3, 4], 'B': [5, 6, 7, 8]}, index=midx)
    df.to_panel()
 

diff --git a/doc/source/indexing.rst b/doc/source/indexing.rst
@@ -1571,9 +1571,9 @@ Setting metadata
 
 Indexes are "mostly immutable", but it is possible to set and change their
 metadata, like the index ``name`` (or, for ``MultiIndex``, ``levels`` and
-``labels``).
+``codes``).
 
-You can use the ``rename``, ``set_names``, ``set_levels``, and ``set_labels``
+You can use the ``rename``, ``set_names``, ``set_levels``, and ``set_codes``
 to set these attributes directly. They default to returning a copy; however,
 you can specify ``inplace=True`` to have the data change in place.
 
@@ -1588,7 +1588,7 @@ See :ref:`Advanced Indexing <advanced>` for usage of MultiIndexes.
   ind.name = "bob"
   ind
 
-``set_names``, ``set_levels``, and ``set_labels`` also take an optional
+``set_names``, ``set_levels``, and ``set_codes`` also take an optional
 `level`` argument
 
 .. ipython:: python

diff --git a/doc/source/internals.rst b/doc/source/internals.rst
@@ -74,23 +74,23 @@ MultiIndex
 ~~~~~~~~~~
 
 Internally, the ``MultiIndex`` consists of a few things: the **levels**, the
-integer **labels**, and the level **names**:
+integer **codes** (until version 0.24 named *labels*), and the level **names**:
 
 .. ipython:: python
 
    index = pd.MultiIndex.from_product([range(3), ['one', 'two']],
                                       names=['first', 'second'])
    index
    index.levels
-   index.labels
+   index.codes
    index.names
 
-You can probably guess that the labels determine which unique element is
+You can probably guess that the codes determine which unique element is
 identified with that location at each layer of the index. It's important to
-note that sortedness is determined **solely** from the integer labels and does
+note that sortedness is determined **solely** from the integer codes and does
 not check (or care) whether the levels themselves are sorted. Fortunately, the
 constructors ``from_tuples`` and ``from_arrays`` ensure that this is true, but
-if you compute the levels and labels yourself, please be careful.
+if you compute the levels and codes yourself, please be careful.
 
 Values
 ~~~~~~

diff --git a/doc/source/io.rst b/doc/source/io.rst
@@ -3728,8 +3728,8 @@ storing/selecting from homogeneous index ``DataFrames``.
 
         index = pd.MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                       ['one', 'two', 'three']],
-                              labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
-                                      [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
+                              codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
+                                     [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                               names=['foo', 'bar'])
         df_mi = pd.DataFrame(np.random.randn(10, 3), index=index,
                              columns=['A', 'B', 'C'])

diff --git a/doc/source/whatsnew/v0.24.0.rst b/doc/source/whatsnew/v0.24.0.rst
@@ -1101,6 +1101,13 @@ Other API Changes
 Deprecations
 ~~~~~~~~~~~~
 
+- :attr:`MultiIndex.labels` has been deprecated and replaced by :attr:`MultiIndex.codes`.
+  The functionality is unchanged. The new name better reflects the natures of
+  these codes and makes the ``MultiIndex`` API more similar to the API for :class:`CategoricalIndex`(:issue:`13443`).
+  As a consequence, other uses of the name ``labels`` in ``MultiIndex`` have also been deprecated and replaced with ``codes``:
+  - You should initialize a ``MultiIndex`` instance using a parameter named ``codes`` rather than ``labels``.
+  - ``MultiIndex.set_labels`` has been deprecated in favor of :meth:`MultiIndex.set_codes`.
+  - For method :meth:`MultiIndex.copy`, the ``labels`` parameter has been deprecated and replaced by a ``codes`` parameter.
 - :meth:`DataFrame.to_stata`, :meth:`read_stata`, :class:`StataReader` and :class:`StataWriter` have deprecated the ``encoding`` argument. The encoding of a Stata dta file is determined by the file type and cannot be changed (:issue:`21244`)
 - :meth:`MultiIndex.to_hierarchical` is deprecated and will be removed in a future version (:issue:`21613`)
 - :meth:`Series.ptp` is deprecated. Use ``numpy.ptp`` instead (:issue:`21614`)
@@ -1236,6 +1243,7 @@ Performance Improvements
 - Improved performance of :func:`pd.concat` for `Series` objects (:issue:`23404`)
 - Improved performance of :meth:`DatetimeIndex.normalize` and :meth:`Timestamp.normalize` for timezone naive or UTC datetimes (:issue:`23634`)
 - Improved performance of :meth:`DatetimeIndex.tz_localize` and various ``DatetimeIndex`` attributes with dateutil UTC timezone (:issue:`23772`)
+- Fixed a performance regression on Windows with Python 3.7 of :func:`pd.read_csv` (:issue:`23516`)
 - Improved performance of :class:`Categorical` constructor for `Series` objects (:issue:`23814`)
 - Improved performance of :meth:`~DataFrame.where` for Categorical data (:issue:`24077`)
 
@@ -1549,6 +1557,7 @@ Reshaping
 - Bug in :meth:`DataFrame.append` with a :class:`Series` with a dateutil timezone would raise a ``TypeError`` (:issue:`23682`)
 - Bug in ``Series`` construction when passing no data and ``dtype=str`` (:issue:`22477`)
 - Bug in :func:`cut` with ``bins`` as an overlapping ``IntervalIndex`` where multiple bins were returned per item instead of raising a ``ValueError`` (:issue:`23980`)
+- Bug in :func:`pandas.concat` when joining ``Series`` datetimetz with ``Series`` category would lose timezone (:issue:`23816`)
 - Bug in :meth:`DataFrame.join` when joining on partial MultiIndex would drop names (:issue:`20452`).
 
 .. _whatsnew_0240.bug_fixes.sparse:

diff --git a/pandas/_libs/src/headers/portable.h b/pandas/_libs/src/headers/portable.h
@@ -5,4 +5,10 @@
 #define strcasecmp( s1, s2 ) _stricmp( s1, s2 )
 #endif
 
+// GH-23516 - works around locale perf issues
+// from MUSL libc, MIT Licensed - see LICENSES
+#define isdigit_ascii(c) ((unsigned)c - '0' < 10)
+#define isspace_ascii(c) (c == ' ' || (unsigned)c-'\t' < 5)
+#define toupper_ascii(c) (((unsigned)c-'a' < 26) ? (c & 0x5f) : c)
+
 #endif