ENH: Optional dependencies for accelerating JSON serialization #2944
Comments
Some complement on the performance of orjson: https://python-rapidjson.readthedocs.io/en/latest/benchmarks.html#tables

I have also been digging into the JSON serialization performance in plotly, and noticed that, on a large plot built from a large DataFrame, the figure generation (so not related to JSON) is more than 13x faster (from 1.8s to 0.4s) when done like this:

```python
import plotly.express as px

# df is a wide pandas DataFrame with one column per line to plot.
# Build the figure skeleton from a single row, then fill in the full data.
fig = px.line(df.iloc[:1])
data = fig["data"]
traces = {trace["name"]: trace for trace in data}
x = df.index
for col, y in df.items():
    trace = traces[str(col)]
    trace["x"] = x
    trace["y"] = y
```

and in this case, we can also manage the NaN values more efficiently by removing them from the trace:

```python
fig = px.line(df.iloc[:1])
data = fig["data"]
traces = {trace["name"]: trace for trace in data}
x = df.index
for col, y in df.items():
    trace = traces[str(col)]
    # Drop NaN points instead of serializing them.
    notnan = ~y.isna()
    trace["x"] = x[notnan]
    trace["y"] = y[notnan]
```

I hope this information can help improve plotly's performance. I haven't tested with the change from #2880.
Thanks for sharing your observations here @sdementen.
On top of #2943, I investigated a couple of interesting libraries we could potentially use as optional dependencies to further accelerate JSON serialization.

**pybase64**

I played with `pybase64` a little, and it looks like an easy way to get a decent speedup over the built-in Python `base64` module for performing the numpy base64 encoding step being introduced in #2943. This wouldn't require any refactoring, and can drop the base64 encoding time (which is a substantial portion of the total JSON encoding time for figures that contain large numpy arrays) by something like 20% to 40%.
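As a rough illustration (the array size and repeat count below are arbitrary; the relevant point is that `pybase64.b64encode` mirrors the standard library's `base64.b64encode`), a quick sketch comparing the two on a large numpy buffer:

```python
import base64
import timeit

import numpy as np
import pybase64  # optional: pip install pybase64

# A large float array, similar to what a big scatter trace might carry.
buf = np.random.randn(1_000_000).tobytes()

# pybase64.b64encode is a drop-in replacement for base64.b64encode,
# so adopting it would not require restructuring the encoding step.
assert pybase64.b64encode(buf) == base64.b64encode(buf)

print("stdlib base64:", timeit.timeit(lambda: base64.b64encode(buf), number=20))
print("pybase64:     ", timeit.timeit(lambda: pybase64.b64encode(buf), number=20))
```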
**orjson**

`orjson` is a really impressive alternative JSON encoder that, in playing with it a little bit, I've seen be 2x to 5x faster than the built-in Python `json` encoder. `orjson` doesn't support custom JSON encoder classes (like `PlotlyJSONEncoder`), so supporting it as an optional dependency would require a total refactor of the current JSON encoding process. Basically, we would need to switch to an architecture where we preprocess the figure dictionary recursively to perform any conversions we need, and then feed that dictionary through the JSON encoder.

Another nice thing about `orjson` is that it automatically converts `nan` and `infinity` values to JSON `null` values, so the JSON re-encoding work we were doing in #2880 wouldn't be needed (cc @emmanuelle).