StataWriter for version 117 fails on None in a string column long enough to be a Stata StrL. #23633
Comments
@jtkiley See the difference in this code:

```python
import pandas as pd

df1 = pd.DataFrame({'str1': ['string' * 500, '']})
df2 = pd.DataFrame({'str1': ['string' * 500, None]})

df1.to_stata('df1_117.dta', version=117)
# Works

df2.to_stata('df2_117.dta', version=117)
# AttributeError (can't encode None to a string)
```
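The error message points at a string method being called on `None`. Below is a minimal, pandas-free illustration. It assumes the StrL path encodes each cell directly. `encode_cell_naive` and `encode_cell_guarded` are hypothetical helpers, not pandas internals.

```python
def encode_cell_naive(value, encoding="utf-8"):
    # Encodes the cell as-is; raises AttributeError when the cell is None.
    return value.encode(encoding)


def encode_cell_guarded(value, encoding="utf-8"):
    # Treats None as an empty string before encoding (hypothetical guard).
    if value is None:
        value = ""
    return value.encode(encoding)


column = ["string" * 500, None]

try:
    [encode_cell_naive(v) for v in column]
except AttributeError as exc:
    print("naive:", exc)

encoded = [encode_cell_guarded(v) for v in column]
print("guarded lengths:", [len(b) for b in encoded])  # [3000, 0]
```

With the guard, the `None` cell becomes a zero-length value instead of an exception. This matches what the working `''` example above already writes.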
@kylebarron I may be missing it (a cue for more coffee either way), but I think this is similar yet not quite the same. In #23572, you seem to suggest that it fails because it's all Here, I have one string and one Thanks for the pointer to #23572. I'll think about it a bit more and comment there, too.
Yes, on second thought you're right that this isn't functionally the same. I did want to point you to that issue anyways to get your thoughts on when
Just a note for anyone happening to end up here: there's some extended conversation on when/where to coerce over on #23572.
@bashtage I've been looking through stata.py, and I think I see what the issue is, but I'm not sure I understand all of the moving parts (and the consequences of changes) well enough to make a PR.
Lines 2359 to 2383 in a08bf3d
In the 114 writer, line 2360 doesn't do anything, so we keep going and strings get filled by line 2371. In the 117 writer, So, it seems like the fix is something like one of these two:
Thoughts? If option 2 is viable, I can make a PR for that, as I don't see as much concern for breaking things that I don't understand.
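The `stata.py` lines referenced above are not reproduced in this thread. The following is a rough, hypothetical model of the difference being described. Regular string columns get missing values filled with `''` before they are written, while columns sent down the StrL path keep their `None`. The names `prepare_string_column`, `encode_all`, and the `is_strl` flag are illustrative and are not pandas code.

```python
def prepare_string_column(values, is_strl):
    """Toy model of the two writer paths discussed in the thread.

    Regular (str#) columns: None is filled with "" before encoding.
    StrL columns: values pass through unchanged, so a None survives
    until a later step tries to encode it.
    """
    if not is_strl:
        values = ["" if v is None else v for v in values]
    return list(values)


def encode_all(values, encoding="utf-8"):
    # Stand-in for the later step that turns each cell into bytes.
    return [v.encode(encoding) for v in values]


short_col = ["abc", None]
long_col = ["string" * 500, None]

print(encode_all(prepare_string_column(short_col, is_strl=False)))  # [b'abc', b'']

try:
    encode_all(prepare_string_column(long_col, is_strl=True))
except AttributeError as exc:
    print("StrL path:", exc)
```

In this model, either filling before the StrL step or guarding inside it removes the error. It does not say which of those the real writer should do.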
Enable export of large columns to Stata strls when the column contains None as a null value closes pandas-dev#23633
Simple fix in #23692
@bashtage Great, thanks. I figured there would be a better place, but I haven't read enough of the 117 docs to quite grasp what you were doing in that part of
The version 114 writer seems to handle columns of strings containing `None` just fine, but the 117 writer produces the `AttributeError` below.
Code to reproduce
Error:
Problem description
This seems like a fairly straightforward regression in the Stata StrL part of the 117 writer, which skips the extra checking the 114 writer does on regular strings before working with them.
Note: I'm not sure if other datatypes are handled differently and may have a similar issue, but this is the one I encountered and could reproduce.
If you're wondering why my actual data is so ugly that I encountered this, I'd like to blame something else, but I wrote the code that parses the data that ends up there.
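Whether a column takes the StrL path depends on string length. In the dta 117 format, fixed-width `str#` variables hold at most 2045 bytes. The 117 writer stores longer object columns as StrLs, and it also accepts an explicit `convert_strl` list. The sketch below classifies a column by its longest encoded value. `needs_strl` is a hypothetical helper. The UTF-8 default is an assumption, and pandas' own length check may count differently.

```python
STR_MAX_117 = 2045  # largest fixed-width str# in the dta 117 format


def needs_strl(values, encoding="utf-8", limit=STR_MAX_117):
    """Return True if the longest non-missing value exceeds the str# limit.

    None cells are skipped when measuring; the problem in this issue is
    that they are not skipped later, when the StrL data is encoded.
    """
    longest = max(
        (len(v.encode(encoding)) for v in values if v is not None),
        default=0,
    )
    return longest > limit


print(needs_strl(["string" * 500, ""]))    # True (3000 bytes)
print(needs_strl(["string" * 500, None]))  # True (same column, with None)
print(needs_strl(["short", None]))         # False
```

This is why `'string' * 500` in the reproduction triggers the StrL code path while shorter strings with `None` do not.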
Expected Output
I'd expect the 117 writer to clean up these StrL columns like the 114 writer cleans up standard strings (and so does the 117 writer, because it appears to use the same codepath for standard strings and only handles StrLs differently).
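Roughly, a StrL column stores a reference for each cell, and the strings themselves live in a separate table. Below is a toy sketch of the expected behavior under that assumption. It normalizes `None` to `''` while building the table, which mirrors the fill applied to standard strings. The `(v, o)` key scheme (variable and observation number) is illustrative only and is not claimed to match pandas or Stata exactly. `build_strl_table` and `encode_table` are hypothetical names.

```python
def build_strl_table(columns):
    """Toy StrL table builder.

    columns: dict mapping column name -> list of str or None.
    Returns (table, keyed): table maps each unique string to a (v, o) key,
    keyed holds those keys in place of the original cell values.
    """
    table = {}
    keyed = {}
    for v, (name, values) in enumerate(columns.items(), start=1):
        keys = []
        for o, value in enumerate(values, start=1):
            value = "" if value is None else value  # None -> "" cleanup
            if value not in table:
                table[value] = (v, o)  # first occurrence defines the key
            keys.append(table[value])
        keyed[name] = keys
    return table, keyed


def encode_table(table, encoding="utf-8"):
    # Every entry is a str after the cleanup, so encoding cannot hit None.
    return {key: s.encode(encoding) for s, key in table.items()}


table, keyed = build_strl_table({"str1": ["string" * 500, None, "string" * 500]})
print(keyed["str1"])  # [(1, 1), (1, 2), (1, 1)]
print({k: len(b) for k, b in encode_table(table).items()})  # {(1, 1): 3000, (1, 2): 0}
```

Repeated strings share one table entry, and the former `None` cell is stored as an empty string rather than raising.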
Output of `pd.show_versions()`
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 3.10.0
pip: 18.1
setuptools: 40.5.0
Cython: 0.29
numpy: 1.15.4
scipy: 1.1.0
pyarrow: 0.9.0
xarray: None
IPython: 7.1.1
sphinx: 1.8.1
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.1
openpyxl: 2.5.9
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.1.2
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.13
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: 0.7.0