Make goals section leaner / more concise per comments

wesm · Aug 23, 2016 · 801259d
1 parent 94d0281

Showing 2 changed files with 34 additions and 45 deletions.
diff --git a/source/goals.rst b/source/goals.rst
@@ -51,37 +51,24 @@ familiar with some of these internal details, particular around performance and
 memory use, and so the degree to which users are impacted will vary quite a
 lot.
 
-Key areas of work
-=================
-
-Possible changes or improvements to pandas's internals fall into a number of
-different buckets to be explored in great detail:
-
-* **Decoupling from NumPy while preserving interoperability**: by eliminating
-  the presumption that pandas objects internally must contain data stored in
-  NumPy ``ndarray`` objects, we will be able to bring more consistency to
-  pandas's semantics and enable the core developers to extend pandas more
-  cleanly with new data types, data structures, and computational semantics.
-* **Exposing a pandas Cython and/or C/C++ API to other Python library
-  developers**: the internals of Series and DataFrame are only weakly
-  accessible in other developers' native code. At minimum, we wish to better
-  enable developers to construct the precise data structures / memory
-  representation that fill the insides of Series and DataFrame.
-* **Improving user control and visibility of memory use**: pandas's memory use,
-  as a result of its internal implementation, can frequently be opaque to the
-  user or outright unpredictable.
-* **Improving performance and system utilization**: We aim to improve both the
-  micro (operations that take < 1 ms) and macro (all other operations)
-  performance of pandas across the board. As part of this, we aim to make it
-  easier for pandas's core developers to leverage multicore systems to
-  accelerate computations (without running into any of Python's well-known
-  concurrency limitations)
-* **Removal of deprecated / underutilized functionality**: As the Python data
-  ecosystem has grown, a number of areas of pandas (e.g. plotting and datasets
-  with more than 2 dimensions) may be better served by other open source
-  projects. Also, functionality that has been explicitly deprecated or
-  discouraged from use (like the ``.ix`` indexing operator) would ideally be
-  removed.
+Goals
+=====
+
+Some high-level goals of the pandas 2.0 plan include the following:
+
+* Fixing long-standing limitations or inconsistencies in missing data: null
+  values in integer and boolean data, and a more consistent notion of null /
+  NA.
+* Improved performance and utilization of multicore systems
+* Better user control / visibility of memory usage (which can be opaque and
+  difficult to control)
+* Clearer semantics around non-NumPy data types, and permitting new pandas-only
+  data types to be added
+* Exposing a "libpandas" C/C++ API to other Python library developers: the
+  internals of Series and DataFrame are only weakly accessible in other
+  developers' native code. This has been a limitation for scikit-learn and
+  other projects requiring C or Cython-level access to pandas object data.
+* Removal of deprecated functionality
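As a rough illustration of the missing-data limitation named in the first goal, the sketch below shows the status quo with default NumPy-backed dtypes (no explicit ``dtype=`` argument). It assumes a recent pandas and NumPy install and illustrates current behavior only, not anything proposed by the pandas 2.0 plan; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Integer data with no missing values keeps an integer dtype.
ints = pd.Series([1, 2, 3])

# A single missing value forces NumPy-backed integer data to float64, and
# the missing entry is stored as NaN.
ints_with_null = pd.Series([1, None, 3])

# Boolean data with a missing value falls back to the generic object dtype.
bools_with_null = pd.Series([True, None, False])

print(ints.dtype, ints_with_null.dtype, bools_with_null.dtype)
```

Because NumPy integer and boolean arrays have no slot for a null value, the missing entry can only be represented by changing the dtype, which is the inconsistency the goal refers to.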
 
 Non-goals / FAQ
 ===============

diff --git a/source/internal-architecture.rst b/source/internal-architecture.rst
@@ -288,32 +288,34 @@ Preserving NumPy interoperability
 Some types of intended interoperability between NumPy and pandas are as
 follows:
 
-* Users can obtain the a ``numpy.ndarray`` (possibly a view depending on the
-  internal block structure, more on this soon) in constant time and without
-  copying the actual data. This has a couple other implications
+* **Access to internal data**: Users can obtain a ``numpy.ndarray``
+  (possibly a view depending on the internal block structure, more on this
+  soon) in constant time and without copying the actual data. This has a couple
+  of other implications:
 
   * Changes made to this array will be reflected in the source pandas object.
   * If you write C extension code (possibly in Cython) and respect pandas's
     missing data details, you can invoke certain kinds of fast custom code on
     pandas data (but it's somewhat inflexible -- see the latest discussion on
     adding a native code API to pandas).
 
-* NumPy ufuncs (like ``np.sqrt`` or ``np.log``) can be invoked on
+* **Ufuncs**: NumPy ufuncs (like ``np.sqrt`` or ``np.log``) can be invoked on
   pandas objects like Series and DataFrame.
 
-* ``numpy.asarray`` will always yield some array, even if it discards metadata
-  or has to create a new array. For example ``asarray`` invoked on
-  ``pandas.Categorical`` yields a reconstructed array (rather than either the
-  categories or codes internal arrays)
+* **Array protocol**: ``numpy.asarray`` will always yield some array, even if
+  it discards metadata or has to create a new array. For example, ``asarray``
+  invoked on ``pandas.Categorical`` yields a reconstructed array (rather than
+  either the internal categories or codes arrays).
 
-* Many NumPy methods designed to work on subclasses (or duck-typed classes) of
-  ``ndarray`` may be used. For example ``numpy.sum`` may be used on a Series
-  even though it does not invoke NumPy's internal C sum algorithm. This means
-  that a Series may be used as an interchangeable argument in a large set of
-  functions that only know about NumPy arrays.
+* **Interchangeability**: Many NumPy functions designed to work on subclasses
+  (or duck-typed classes) of ``ndarray`` may be used. For example,
+  ``numpy.sum`` may be used on a Series even though it does not invoke NumPy's
+  internal C sum algorithm. This means that a Series may be used as an
+  interchangeable argument in a large set of functions that only know about
+  NumPy arrays.
 
 By and large, I think much of this can be preserved, but there will be some API
-breakage.
+breakage. In particular, interchangeability is not something we can or should
+guarantee.
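The four kinds of interoperability listed above can be observed in a current pandas install. The sketch below is an illustration under that assumption (recent pandas and NumPy, default float64 and Categorical data), not a description of the proposed internals; the variable names are hypothetical. The last part shows why interchangeability cannot be guaranteed: ``numpy.sum`` on a Series dispatches to ``Series.sum``, which skips NaN by default, so it can disagree with ``numpy.sum`` on the equivalent ndarray.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0])

# Access to internal data: for a single float64 column, np.asarray(s) and
# s.to_numpy() do not copy; both are views onto the same buffer.
# (With Copy-on-Write in newer pandas these views may be read-only, so this
# sketch does not write through them.)
arr = np.asarray(s)
shares = np.shares_memory(arr, s.to_numpy())

# Ufuncs: np.sqrt applied to a Series returns a Series with the same index.
roots = np.sqrt(s)

# Array protocol: asarray on a Categorical reconstructs the values rather than
# returning the internal codes or categories arrays.
cat = pd.Categorical(["a", "b", "a"])
dense = np.asarray(cat)

# Interchangeability is not guaranteed: np.sum on a Series dispatches to
# Series.sum (NaN skipped by default), while np.sum on a plain ndarray
# propagates NaN.
with_nan = pd.Series([1.0, np.nan])
series_total = np.sum(with_nan)
ndarray_total = np.sum(with_nan.to_numpy())

print(shares, type(roots).__name__, dense.tolist(), cat.codes.tolist())
print(series_total, ndarray_total)
```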
 
 If we add more composite data structures (Categorical can be thought of as
 one existing composite data structure) to pandas or alternate non-NumPy data