Specific Timestamps breaks time series indexing (.loc returns wrong results) #18029
[deleted - see top post] |
@linar-jether Thanks for the report! First note: in the just released 0.21.0 release, using
So the bug in |
@linar-jether : BTW, if you could by any chance present a smaller example to demonstrate the bug, that would be easier for us to read in the future. |
@gfyoung the second post (not the top one) already includes a smaller example (it is the one I used in my response). But would be good to update the top post with that for clarity |
So given this is deprecated behaviour, I am not sure we should put time into trying to fix this. But, I am trying to think of similar cases where this bug could come up that are not deprecated? |
Thanks for the response @jorisvandenbossche, this bug can easily be overlooked and lead to false results, so maybe even a small patch to raise an exception when using loc with both duplicates and non-existing labels?
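Until such a check exists in pandas itself, a user-side guard along these lines could approximate it. This is only a sketch: `checked_loc` is a hypothetical helper name, not a pandas API, and the UTC example data is an assumption rather than the data from the report.

```python
import pandas as pd


def checked_loc(df, labels):
    """Label-based selection that fails loudly on labels absent from df.index.

    Hypothetical helper for illustration; not part of pandas.
    """
    labels = pd.Index(labels)
    missing = labels[~labels.isin(df.index)]
    if len(missing) > 0:
        raise KeyError(
            "labels not found in index: {}".format(list(missing.unique()))
        )
    # Duplicate labels are allowed here; each one yields its own row.
    return df.loc[labels]


# Assumed example data: a small tz-aware (UTC) daily index.
idx = pd.date_range("2017-10-30", periods=3, freq="D", tz="UTC")
df = pd.DataFrame({"x": [1, 2, 3]}, index=idx)

# Duplicated labels that all exist in the index are selected normally.
result = checked_loc(df, [idx[0], idx[0], idx[2]])
```

Passing a label that is not in `df.index` makes the helper raise `KeyError` instead of returning a silently altered result.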
|
And also note that using reindex is not the same as using loc, as reindex will not work when there are duplicates in the source dataframe. |
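The difference can be sketched as follows, assuming a tz-aware UTC index (the data in the report may use another timezone). `reindex_or_fail` is a hypothetical wrapper that checks for a duplicated source index up front rather than relying on the error pandas raises from `reindex` itself.

```python
import pandas as pd


def reindex_or_fail(df, labels):
    """Reindex only when the source index is unique.

    Hypothetical wrapper for illustration; not part of pandas.
    """
    if not df.index.is_unique:
        raise ValueError("index has duplicate labels; reindex cannot be used")
    # Labels absent from df.index come back as rows filled with NaN.
    return df.reindex(labels)


idx = pd.date_range("2017-10-30", periods=2, freq="D", tz="UTC")
absent = pd.Timestamp("2018-01-01", tz="UTC")

# Unique source index: the absent label produces a NaN row.
unique_df = pd.DataFrame({"x": [1.0, 2.0]}, index=idx)
filled = reindex_or_fail(unique_df, [idx[0], absent])

# Duplicated source index: the wrapper refuses to reindex.
dup_df = pd.DataFrame({"x": [1.0, 2.0]}, index=[idx[0], idx[0]])
```

Calling `reindex_or_fail(dup_df, [idx[0]])` raises `ValueError`, which is why `reindex` is not a drop-in replacement for `.loc` when the source DataFrame has duplicates.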
As @jorisvandenbossche points out, this already shows a deprecation warning; it will be changed to an exception in 1.0 (after 0.22.0). The 'this can be overlooked' concern is the reason.
Of course if you have duplicates in an index, then you are on your own, that behavior is also not directly supported. |
This behavior looks fixed on master if anyone wants to put up a test.
|
In light of those two facts, I'm going to close this. |
When trying to access labels (.loc) using a specific list of Timestamp objects or a DatetimeIndex object (see attached csv file), the resulting index is returned in UTC offset but the original timezone is not removed. This seems to happen only in very specific cases, when the index passed to .loc contains labels that do not exist in the DataFrame and also contains duplicates.
@yuval-jether
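To check whether a given label list meets the two conditions described above (duplicates plus labels absent from the DataFrame index), a small diagnostic along these lines may help. `label_report` is a hypothetical name, and the UTC index is assumed example data, not the attached csv.

```python
import pandas as pd


def label_report(index, labels):
    """Summarize the two conditions from the report for a list of labels.

    Hypothetical diagnostic for illustration; not part of pandas.
    """
    labels = pd.Index(labels)
    return {
        # True when any label appears more than once in the request.
        "has_duplicates": bool(labels.has_duplicates),
        # Requested labels that do not exist in the target index.
        "missing": list(labels[~labels.isin(index)].unique()),
    }


idx = pd.date_range("2017-10-30", periods=3, freq="D", tz="UTC")
absent = pd.Timestamp("2018-01-01", tz="UTC")

# One duplicated label and one label that is not in the index.
report = label_report(idx, [idx[0], idx[0], absent])
```

Here `report["has_duplicates"]` is true and `report["missing"]` holds the single absent timestamp, which is the combination the report identifies as the trigger.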
Output of pd.show_versions()
2017-10-30 03:15:08 [pip.utils] [DEBUG] lzma module is not available
2017-10-30 03:15:08 [pip.vcs] [DEBUG] Registered VCS backend: git
2017-10-30 03:15:08 [pip.vcs] [DEBUG] Registered VCS backend: hg
2017-10-30 03:15:08 [pip.pep425tags] [DEBUG] Config variable 'Py_DEBUG' is unset, Python ABI tag may be incorrect
2017-10-30 03:15:08 [pip.pep425tags] [DEBUG] Config variable 'WITH_PYMALLOC' is unset, Python ABI tag may be incorrect
2017-10-30 03:15:08 [pip.pep425tags] [DEBUG] Config variable 'Py_UNICODE_SIZE' is unset, Python ABI tag may be incorrect
2017-10-30 03:15:08 [pip.pep425tags] [DEBUG] Config variable 'Py_DEBUG' is unset, Python ABI tag may be incorrect
2017-10-30 03:15:08 [pip.pep425tags] [DEBUG] Config variable 'WITH_PYMALLOC' is unset, Python ABI tag may be incorrect
2017-10-30 03:15:08 [pip.pep425tags] [DEBUG] Config variable 'Py_UNICODE_SIZE' is unset, Python ABI tag may be incorrect
2017-10-30 03:15:08 [pip.vcs] [DEBUG] Registered VCS backend: svn
2017-10-30 03:15:09 [pip.vcs] [DEBUG] Registered VCS backend: bzr
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 8.1
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.3
pytest: 3.0.7
pip: 9.0.1
setuptools: 36.5.0
Cython: None
numpy: 1.13.1
scipy: 0.18.1
xarray: 0.9.6
IPython: 5.5.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: None
lxml: 3.8.0
bs4: 4.5.1
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.4.0