
BUG: Endianness problem with weird data types and shape #57457

Closed
2 of 3 tasks
ilan-gold opened this issue Feb 16, 2024 · 6 comments · Fixed by #57519
Labels
Bug good first issue Needs Tests Unit test(s) needed to prevent regressions

Comments

@ilan-gold

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

columns = ['']
data = [np.array([], dtype='>f8')]  # big-endian float64 (non-native on little-endian machines)
index = pd.Index([], dtype='uint64', name='rows')
df = pd.DataFrame(dict(zip(columns, data)), index=index)
df[df.columns]  # or df[['']]; raises ValueError on pandas 2.2.0

Issue Description

I'm actually not 100% sure this is a "bug", but it came up in a hypothesis test and I whittled it down to this. The error is: ValueError: Big-endian buffer not supported on little-endian compiler
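For context, the error is about byte order: `'>f8'` is a big-endian float64, while most current machines (including the arm64 Mac in the version report below, `byteorder : little`) are little-endian, so the array is stored in non-native byte order. The message appears to come from a compiled (Cython) kernel that only accepts native-order buffers. The following is an illustrative NumPy sketch of how to check this, not part of the original report:

```python
import sys

import numpy as np

# '>f8' is big-endian float64, '<f8' is little-endian, '=f8' means native order.
big = np.dtype(">f8")
little = np.dtype("<f8")

print(sys.byteorder)     # "little" on x86-64 and Apple silicon (arm64)
print(big.isnative)      # False on a little-endian machine
print(big.byteorder)     # ">" when the order is non-native
print(little.byteorder)  # "=" on a little-endian machine (native order)
```

NumPy reports native order as `"="`, so any dtype whose `isnative` is `False` is a candidate for this kind of error in compiled code paths.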

Expected Behavior

Presumably the column selection should succeed rather than raise an error?

Installed Versions

INSTALLED VERSIONS

commit : fd3f571
python : 3.11.6.final.0
python-bits : 64
OS : Darwin
OS-release : 22.6.0
Version : Darwin Kernel Version 22.6.0: Wed Oct 4 21:26:23 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 2.2.0
numpy : 1.26.3
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : 3.0.8
pytest : 7.4.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.20.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2023.12.2
gcsfs : None
matplotlib : 3.8.2
numba : 0.58.1
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.4
qtpy : None
pyqt5 : None

ilan-gold added the Bug and Needs Triage labels on Feb 16, 2024
@ilan-gold
Author

This doesn't seem to exist on the main branch.

@ilan-gold
Author

Not sure whether this should stay open or be closed, since the bug does not reproduce on main (so the "exists on main" check does not apply).

@rhshadrach
Member

Confirmed it does not reproduce on main, but a test should still be added. A git bisect shows it was fixed by enabling Copy-on-Write (CoW) by default:

commit 3b57972ceb49c11198eb0bec6be0006260d48085
Author: Patrick Hoefler
Date:   Thu Feb 1 09:04:05 2024 +0000

    CoW: Enable CoW by default and remove warning build (#56633)

I wondered whether #57459 might reintroduce it, but it does not.

cc @phofl @jorisvandenbossche

rhshadrach added the good first issue and Needs Tests labels and removed the Needs Triage label on Feb 16, 2024
@ilan-gold
Author

ilan-gold commented Feb 19, 2024

@rhshadrach Could you perhaps suggest a route forward here? Will this be fixed in a release, or do I need to special-case my code to avoid this situation?

@phofl
Member

phofl commented Feb 19, 2024

You can try setting

pd.options.mode.copy_on_write = True

on 2.2 releases, or try the nightly builds, where this is enabled by default.
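A minimal sketch of that workaround applied to the original reproducer, assuming pandas 2.2 (where CoW is opt-in) or a later version where it is always on. The version check is an illustrative addition, not something suggested in the thread:

```python
import numpy as np
import pandas as pd

# Copy-on-Write is opt-in on pandas 2.x; only set the option there
# (assumption: pandas 3.0+ has CoW permanently enabled).
if int(pd.__version__.split(".")[0]) < 3:
    pd.options.mode.copy_on_write = True

columns = [""]
data = [np.array([], dtype=">f8")]  # big-endian float64
index = pd.Index([], dtype="uint64", name="rows")
df = pd.DataFrame(dict(zip(columns, data)), index=index)

# With CoW enabled, selecting all columns should no longer go through the
# take kernel that rejects the big-endian buffer.
result = df[df.columns]
print(result.shape)
```

Note that setting the option this way is global for the session; `pd.option_context("mode.copy_on_write", True)` can scope it to a `with` block instead on 2.x.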

@jorisvandenbossche
Member

jorisvandenbossche commented Feb 19, 2024

I think the only reason this specific snippet no longer raises on main (or when CoW is enabled in pandas 2.2) is that it was failing when selecting columns with df[...]. Before CoW, that did an actual take on the columns (the traceback points to Block.take_nd / pandas._libs.algos.take_2d_axis1_float64_float64); with CoW it no longer does a take.

If you do something else that still ends up doing an actual take (e.g. selecting rows with df.reindex(index=[0, 1]) on the above snippet), that still gives the same error on main.

So I suppose we have simply never supported non-native byte order? (In that case, the solution is to teach hypothesis not to generate arrays with non-native endianness for this test.)
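Along those lines, one caller-side workaround is to normalize arrays to native byte order before building the DataFrame; in a hypothesis test, a similar effect can likely be had by restricting generated dtypes to native order (for example via the `endianness` argument of the dtype strategies in `hypothesis.extra.numpy`). The sketch below uses a hypothetical helper, `to_native`, that is not part of pandas or NumPy, and exercises both the column selection and the row reindex mentioned above:

```python
import numpy as np
import pandas as pd


def to_native(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return `arr` with a native-byte-order dtype."""
    if arr.dtype.isnative:
        return arr
    # Casting to the native-order variant of the same dtype swaps the bytes
    # while keeping the logical values unchanged.
    return arr.astype(arr.dtype.newbyteorder("="))


columns = [""]
data = [to_native(np.array([], dtype=">f8"))]
index = pd.Index([], dtype="uint64", name="rows")
df = pd.DataFrame(dict(zip(columns, data)), index=index)

# Column selection and a row reindex (which performs a real take)
# both operate on a native float64 buffer now.
selected = df[df.columns]
reindexed = df.reindex(index=[0, 1])
print(selected.shape, reindexed.shape)
```

Converting once at the boundary keeps every downstream pandas operation on the native fast paths, at the cost of one copy per non-native array.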
