-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: allow JSON (de)serialization of ExtensionDtypes #44722
Conversation
@jmg-duarte Can you do the following:
|
Do you mean |
Yes. Also, We'll need to state that for user-implemented extension arrays, we're possibly generating JSON Table schema that is non-conforming. (e.g., if you do the test for |
One other note - I see that tests are already failing. You might want to run the existing tests for the table schema locally before pushing your next commit:
(The |
A little checklist to track the current progress:
|
@attack68 You might be the best person to review this PR once the checklist at #44722 (comment) is complete. If not, do you have other suggestions? |
I just tested In [1]: import pandas as pd
In [2]: from pandas.tests.extension.decimal import DecimalArray
In [3]: from decimal import Decimal
In [4]: pd.read_json(pd.DataFrame({'dec': DecimalArray([Decimal(10)])}).to_json(orient='table'), orient='table')
Out[4]:
dec
0 10.0
In [5]: pd.read_json(pd.DataFrame({'dec': DecimalArray([Decimal(10)])}).to_json(orient='table'), orient='table').dtypes
Out[5]:
dec object
dtype: object
In [6]: pd.DataFrame({'dec': DecimalArray([Decimal(10)])}).to_json(orient='table')
Out[6]: '{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"dec","type":"string"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[{"index":0,"dec":10.0}]}' Out of the box, the solution proposed in this PR will throw the following error: TypeError: All values must be of type <class 'decimal.Decimal'> Due to the following code: pandas/pandas/tests/extension/decimal/array.py Lines 89 to 91 in e992969
With a bit of adapting I can make this work, however I'm open to feedback on how to handle this. |
I've updated the On another note, why don't these lines use the pandas/pandas/io/json/_table_schema.py Lines 293 to 294 in e992969
|
It's why I suggested using |
Seems like this was never noticed before. Can you move Note in the docs in |
I think in the overview pages I referred to earlier is where you want to discuss how |
Due to my lack of experience with pandas (and a bit of documentation too) I'm doing this in an ad-hoc approach - implement a bit, see what breaks and repeat.
Sure thing, I will address it.
I hadn't noticed the links were for "extra" documentation, I will also address this. |
@jmg-duarte One other note, since this is your first PR. I don't think you should keep pushing each commit you do, as it fires off our continuous integration testing scripts, and you know that you are still working on things. Make sure the testing scripts work locally on your own machine, and when that's true, then you can push, and the full suite of tests will be run, possibly catching errors in different environments that don't happen on your own machine. When you make the doc changes, build them locally to make sure they look right, before pushing. |
@Dr-Irv all tests are passing besides the docs, could you (or someone else) help me figure out the problem? The changes I made built locally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good. Some minor changes. I think the CI failure is intermittent when the docs are built. There are other PR with no doc changes that show the same error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls don't change the version like you are doing. a number of comments.
@jmg-duarte in the future, can you not force push your branch, but just push a commit that reflects your latest changes? Thanks |
I'm unsure what might have caused |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some comments that relate to the feedback from @jreback and should resolve his issues, and there is still an additional test to add at https://github.com/pandas-dev/pandas/pull/44722/files#r763388551
Once you address those, I will review again, and I can approve, but final approval will rest with @jreback
@jmg-duarte can you also look at https://github.com/pandas-dev/pandas/pull/44722/files#r763388551 . Idea is to explicitly create those extension types in addition to the Decimal and Date ones |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this. Up to @jreback now
self._day == dt.date.min.day, | ||
) | ||
|
||
@classmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so if you are going to all the work of making an extension type like this then you should simply add it to the regular tests and import from there (you can disable any tests that don't work rn)
e.g. put it here: https://github.com/pandas-dev/pandas/tree/master/pandas/tests/extension/decimal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I understand. You're suggesting that I move the extension to the decimal
folder or create a date
folder without a test_date.py
?
Should the whole test_json_table_schema_ext_dtype.py
be moved too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I think you could put all the extension type code in pandas/tests/extension/date
but leave the stuff that tests JSON reading where it is. You don't need to add a test_date.py
- that can be a separate PR.
@jmg-duarte can you revert the 1.3.5 notes and fix conflicts on other, ping on green. |
thanks @jmg-duarte very nice. If you can followup with a test that deserializes a 0.20.0 format table schema would be great (it almost certainly works but would be great to have a specific test for this). |
I didn't mess with any previous schema tests so they work. Am I missing something? |
@jmg-duarte thats the point we need an explicit example that provides backward compatibility make one in an older version of pandas and then save it in a file |
Oh, you meant on reads.
A terça, 21 de dez de 2021, 13:06, Jeff Reback ***@***.***>
escreveu:
… @jmg-duarte <https://github.com/jmg-duarte> thats the point we need an
explicit example that provides backward compatibility
make one in an older version of pandas and then save it in a file
—
Reply to this email directly, view it on GitHub
<#44722 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADVCBSYJCX7FOOI3AC6WDQLUSB3UVANCNFSM5JHLVG2A>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
This PR allows ExtensionDtypes to be (de)serialized using
.to_json(orient='table')
.Example: