API: read_stata with index_col=None should return RangeIndex #49745
Merged
mroeschke
reviewed
Nov 17, 2022
mroeschke
reviewed
Nov 17, 2022
7696335
to
c028180
Compare
mroeschke
reviewed
Nov 17, 2022
doc/source/whatsnew/v2.0.0.rst
Outdated
@@ -340,6 +340,7 @@ Other API changes
- Passing strings that cannot be parsed as datetimes to :class:`Series` or :class:`DataFrame` with ``dtype="datetime64[ns]"`` will raise instead of silently ignoring the keyword and returning ``object`` dtype (:issue:`24435`)
- Passing a sequence containing a type that cannot be converted to :class:`Timedelta` to :func:`to_timedelta` or to the :class:`Series` or :class:`DataFrame` constructor with ``dtype="timedelta64[ns]"`` or to :class:`TimedeltaIndex` now raises ``TypeError`` instead of ``ValueError`` (:issue:`49525`)
- Changed behavior of :class:`Index` constructor with sequence containing at least one ``NaT`` and everything else either ``None`` or ``NaN`` to infer ``datetime64[ns]`` dtype instead of ``object``, matching :class:`Series` behavior (:issue:`49340`)
- If no parameter ``index_col`` is given to :func:`read_stata`, the index will be a :class:`RangeIndex` Previously the index would have been a :class:`Int64Index` (:issue:`49745`)
I think the Performance improvement note can be used instead of this one
mroeschke
reviewed
Nov 17, 2022
doc/source/whatsnew/v2.0.0.rst
Outdated
@@ -594,6 +595,7 @@ Performance improvements
- Memory improvement in :meth:`RangeIndex.sort_values` (:issue:`48801`)
- Performance improvement in :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` when ``by`` is a categorical type and ``sort=False`` (:issue:`48976`)
- Performance improvement in :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` when ``by`` is a categorical type and ``observed=False`` (:issue:`49596`)
- Performance improvement in :func:`read_stata` with parameter ``index_col`` set to ``None``(the default). Now the index will be a :class:`RangeIndex` instead of :class:`Int64Index` (:issue:`49745`)
I think the docbuild is complaining about this line
Updated.
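For context on why the note fits under performance improvements, here is a rough sketch (not taken from the PR itself) comparing an index built from a materialized `np.arange` array with one built from a Python `range`. The size `n` and the variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

n = 1_000_000  # illustrative size, not from the PR

# Index built from a materialized NumPy array: all n integers are stored.
array_index = pd.Index(np.arange(n))

# Index built from a Python range: pandas returns a RangeIndex, which
# only needs to keep start, stop and step.
range_index = pd.Index(range(n))

print(type(range_index).__name__)
print("array-backed bytes:", array_index.memory_usage())
print("range-backed bytes:", range_index.memory_usage())
```

The exact byte counts depend on the platform and pandas version, so they are printed rather than stated here.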
mroeschke
approved these changes
Nov 18, 2022
Thanks @topper-123
mliu08
pushed a commit
to mliu08/pandas
that referenced
this pull request
Nov 27, 2022
…dev#49745) * API: read_stata with index_col=None return RangeIndex * fix comments * fix comments II Co-authored-by: Terji Petersen <terjipetersen@Terjis-MacBook-Air.local> Co-authored-by: Terji Petersen <terjipetersen@Terjis-Air.fritz.box>
If `read_stata` was used with parameter `index_col=None`, an index based on `np.arange` was supplied to the constructed DataFrame, i.e. (pre pandas 2.0) an `Int64Index`.

`np.arange` has dtype `np.int_`, i.e. like `np.intp`, except that it is always 32-bit on Windows. That makes it annoying to use in tests when indexes can take all numpy numeric dtypes (like after #49560), so I'm looking into how `arange` is used in #49560. One case I found is in `read_stata`, and in that case it's better to use `range`, so we get a `RangeIndex` instead of an `Index[int_]` when using `read_stata(index_col=None)`.

This is a slight change in API, so I've separated it out into its own PR here, so that #49560, which is a large PR, can be as focused as possible.
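A minimal sketch of the resulting behavior, assuming pandas 2.0 or later (where this change is included). The frame contents, the temporary directory and the file name `example.dta` are made up for illustration; only `DataFrame.to_stata` and `read_stata` come from pandas.

```python
import os
import tempfile

import pandas as pd

# Small illustrative frame; column names and values are arbitrary.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.dta")
    # write_index=False so the file holds only the two data columns.
    df.to_stata(path, write_index=False)
    # index_col is left at its default of None.
    result = pd.read_stata(path)

# With this change the default index should be a RangeIndex.
print(type(result.index).__name__)
```

On pandas versions before 2.0 the same call would be expected to give an `Int64Index` instead, as described above.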