Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Format decimal.Decimal as full precision strings in .to_json(...) #60698

Merged
merged 4 commits into from
Jan 14, 2025

Conversation

Tolker-KU
Copy link
Contributor

@Tolker-KU Tolker-KU commented Jan 12, 2025

Encodes decimal.Decimal as full precision strings in .to_json(...). The old behaviour is to convert decimal.Decimal to floats potentially loosing precision.

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@Tolker-KU Tolker-KU force-pushed the json-decimal-as-str branch from d0f2310 to 21f2689 Compare January 12, 2025 14:28
@Tolker-KU Tolker-KU force-pushed the json-decimal-as-str branch from 21f2689 to 1c6781d Compare January 12, 2025 15:05
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this

pandas/_libs/src/vendored/ujson/python/objToJSON.c Outdated Show resolved Hide resolved
pandas/_libs/src/vendored/ujson/python/objToJSON.c Outdated Show resolved Hide resolved
pandas/_libs/src/vendored/ujson/python/objToJSON.c Outdated Show resolved Hide resolved
pandas/_libs/src/vendored/ujson/python/objToJSON.c Outdated Show resolved Hide resolved
@Tolker-KU
Copy link
Contributor Author

Thanks for working on this

Thanks for taking your time to review this. I'm planing to open an issue discussing if this should be hidden behind a flag, such that the default behaviour remains unchanged. This draft is just a proof of concept.

The decimal module has no C API, unfortunately. I found a recent issue in the cpython repo discussing an implementation but it hasn't happened yet. As far as I recall the maintainers of cpython thought this would bloat the C API without any real gain.

I do agree on your comments and will update the patch. It should be noted that most of the implementation is copy/pasted from here. Maybe this has some bugs as-well?

static char *PyTimeToJSON(JSOBJ _obj, JSONTypeContext *tc, size_t *outLen) {
PyObject *obj = (PyObject *)_obj;
PyObject *str = PyObject_CallMethod(obj, "isoformat", NULL);
if (str == NULL) {
*outLen = 0;
if (!PyErr_Occurred()) {
PyErr_SetString(PyExc_ValueError, "Failed to convert time");
}
((JSONObjectEncoder *)tc->encoder)->errorMsg = "";
return NULL;
}
if (PyUnicode_Check(str)) {
PyObject *tmp = str;
str = PyUnicode_AsUTF8String(str);
Py_DECREF(tmp);
}
GET_TC(tc)->newObj = str;
*outLen = PyBytes_GET_SIZE(str);
char *outValue = PyBytes_AS_STRING(str);
return outValue;
}

@WillAyd
Copy link
Member

WillAyd commented Jan 13, 2025

It should be noted that most of the implementation is copy/pasted from here. Maybe this has some bugs as-well?

That's certainly possible. Ironically I am probably also the one responsible for that. Without looking at the blame...I'll just say the less I know, the better.

Fixing that is out of scope for this PR, but I would prefer to not copy/paste those issues over. Thanks!

@WillAyd
Copy link
Member

WillAyd commented Jan 13, 2025

@Tolker-KU generally this looks pretty good. Can you add a whatsnew note to our 3.0 release for this?

@Tolker-KU Tolker-KU force-pushed the json-decimal-as-str branch from f3152b9 to d60f71f Compare January 13, 2025 19:44
@@ -1467,8 +1488,18 @@ static void Object_beginTypeContext(JSOBJ _obj, JSONTypeContext *tc) {
tc->type = JT_UTF8;
return;
} else if (object_is_decimal_type(obj)) {
pc->doubleValue = PyFloat_AsDouble(obj);
tc->type = JT_DOUBLE;
PyObject *is_nan_py = PyObject_RichCompare(obj, obj, Py_NE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the test cases cover this code? If not, can you add a NaN case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GET_TC(tc)->newObj = str;

Py_ssize_t s_len;
char *outValue = (char *)PyUnicode_AsUTF8AndSize(str, &s_len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be ideal if our functions retuned const char* so we didn't have to cast like this. A lot of this code is old and has changed hands many times, so not surprised there are some code smells...but if you are looking to continuing contributions on this, that cleanup would be great

@Tolker-KU Tolker-KU changed the title Format decimal.Decimal as full precision strings in .to_json(...) ENH: Format decimal.Decimal as full precision strings in .to_json(...) Jan 14, 2025
@Tolker-KU Tolker-KU marked this pull request as ready for review January 14, 2025 06:01
@WillAyd WillAyd added the IO JSON read_json, to_json, json_normalize label Jan 14, 2025
@WillAyd WillAyd added this to the 3.0 milestone Jan 14, 2025
@WillAyd
Copy link
Member

WillAyd commented Jan 14, 2025

@mroeschke mind taking a look?

@mroeschke mroeschke merged commit 817b706 into pandas-dev:main Jan 14, 2025
55 checks passed
@mroeschke
Copy link
Member

Thanks @Tolker-KU

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants