JSON encoding refactor and orjson encoding #2955

Merged
merged 49 commits on May 27, 2021
Commits
40b9af1
WIP accelerated encoding with orjson
jonmmease Dec 5, 2020
f79e318
support fig to dict in io without cloning
jonmmease Dec 5, 2020
55720de
Merge branch 'master' into orjson_encoding
jonmmease Dec 5, 2020
7b3593a
fix clone default
jonmmease Dec 5, 2020
da915d6
Add pio.json.config object to configure default encoder
jonmmease Dec 5, 2020
7b235ef
default_encoder to default_engine
jonmmease Dec 5, 2020
7895b6a
blacken
jonmmease Dec 5, 2020
ce05a68
Handle Dash objects in to_json
jonmmease Dec 6, 2020
4ef6510
add JSON encoding tests
jonmmease Dec 31, 2020
101ba85
add testing of from_plotly_json
jonmmease Dec 31, 2020
67d3670
Better error message when orjson not installed and orjson engine requ…
jonmmease Dec 31, 2020
02c00da
Add orjson as optional testing dependency
jonmmease Dec 31, 2020
99ea6a1
Replace Python 3.5 CI tests with 3.8
jonmmease Dec 31, 2020
d44ec26
Try only install orjson with Python 3.6+
jonmmease Dec 31, 2020
b7d8422
Don't test orjson engine when orjson not installed
jonmmease Dec 31, 2020
ddcd6f5
Try new 3.8.7 docker image since prior guess doesn't exist
jonmmease Dec 31, 2020
33359f3
greater than!
jonmmease Dec 31, 2020
c7c1819
Bump scikit image version for Python 3.8 compatibility
jonmmease Dec 31, 2020
a8d52ab
Try to help Python 2 from getting confused about which json module to…
jonmmease Dec 31, 2020
619838f
Update pandas for Python 3
jonmmease Dec 31, 2020
7c7a272
Revert 3.8 CI updates. Too much for this PR
jonmmease Dec 31, 2020
1708703
Doh
jonmmease Dec 31, 2020
66cab10
Don't skip copying during serialization
jonmmease Dec 31, 2020
56a8945
Rename new JSON functions:
jonmmease Jan 2, 2021
0a51020
Ensure cleaned numpy arrays are contiguous
jonmmease Jan 2, 2021
4e9d64e
Use to_json_plotly in html and orca logic
jonmmease Jan 8, 2021
d4068de
Add orjson documentation dependency
jonmmease Jan 8, 2021
58b7192
Handle pandas Timestamp scalars in orjson engine
jonmmease Jan 8, 2021
974fcba
Rework date and string encoding, add and fix tests
jonmmease Jan 8, 2021
a651a63
default JSON engine to "auto"
jonmmease Jan 8, 2021
af1d88d
Fix expected JSON in html export (no spaces)
jonmmease Jan 8, 2021
1d6acc3
Merge remote-tracking branch 'origin/master' into orjson_encoding
jonmmease Jan 8, 2021
d51fd94
blacken
jonmmease Jan 8, 2021
042c54c
Fix expected JSON in matplotlylib test
jonmmease Jan 8, 2021
ddc1b8f
Fix expected JSON in html repr test
jonmmease Jan 8, 2021
d7928b0
Merge remote-tracking branch 'origin/master' into orjson_encoding
jonmmease Jan 13, 2021
76cc625
Don't drop timezones during serialization, just let Plotly.js ignore …
jonmmease Jan 13, 2021
453461d
Merge branch 'numpy_date_serialization' into orjson_encoding
jonmmease Jan 13, 2021
84ba4b5
no need to skip legacy tests now
jonmmease Jan 13, 2021
340aed3
Only try `datetime_as_string` on datetime kinded numpy arrays
jonmmease Jan 13, 2021
6cea61d
Don't store object or unicode numpy arrays in figure. Coerce to lists
jonmmease Jan 21, 2021
93815c1
Try orjson encoding without cleaning first
jonmmease Jan 21, 2021
242d1fa
Merge remote-tracking branch 'origin/master' into orjson_encoding
jonmmease Jan 21, 2021
8a3a4b3
blacken
jonmmease Jan 21, 2021
1de750a
remove scratch file
jonmmease Jan 21, 2021
81f73d5
Remove unused clone
jonmmease Jan 21, 2021
80be8bd
Remove the new "json" encoder
jonmmease Jan 22, 2021
cb54f88
Reorder dict cleaning for performance
jonmmease Jan 22, 2021
1fbfa0d
Merge remote-tracking branch 'origin/master' into orjson_encoding
jonmmease Apr 29, 2021
1 change: 1 addition & 0 deletions doc/requirements.txt
@@ -31,3 +31,4 @@ umap-learn==0.5.1
pooch
wget
nbconvert==5.6.1
orjson
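
orjson is added here as a documentation-build dependency; per the commit history above it remains optional at runtime, and the default "auto" engine falls back to the built-in json module when it is not installed. A minimal sketch of that kind of availability probe (the variable name is illustrative, not code from this PR):

# Hypothetical availability check: orjson is an optional extra, so callers can
# probe for it and fall back to the stdlib-based encoder when it is missing.
try:
    import orjson  # noqa: F401
    preferred_engine = "orjson"
except ImportError:
    preferred_engine = "json"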
79 changes: 35 additions & 44 deletions packages/python/plotly/_plotly_utils/basevalidators.py
@@ -53,7 +53,7 @@ def to_scalar_or_list(v):
return v


def copy_to_readonly_numpy_array(v, kind=None, force_numeric=False):
def copy_to_readonly_numpy_array_or_list(v, kind=None, force_numeric=False):
"""
Convert an array-like value into a read-only numpy array

@@ -89,7 +89,13 @@ def copy_to_readonly_numpy_array(v, kind=None, force_numeric=False):

# u: unsigned int, i: signed int, f: float
numeric_kinds = {"u", "i", "f"}
kind_default_dtypes = {"u": "uint32", "i": "int32", "f": "float64", "O": "object"}
kind_default_dtypes = {
"u": "uint32",
"i": "int32",
"f": "float64",
"O": "object",
"U": "U",
}

# Handle pandas Series and Index objects
if pd and isinstance(v, (pd.Series, pd.Index)):
@@ -113,18 +119,12 @@ def copy_to_readonly_numpy_array(v, kind=None, force_numeric=False):
if not isinstance(v, np.ndarray):
# v has its own logic on how to convert itself into a numpy array
if is_numpy_convertable(v):
return copy_to_readonly_numpy_array(
return copy_to_readonly_numpy_array_or_list(
np.array(v), kind=kind, force_numeric=force_numeric
)
else:
# v is not homogenous array
v_list = [to_scalar_or_list(e) for e in v]

# Lookup dtype for requested kind, if any
dtype = kind_default_dtypes.get(first_kind, None)

# construct new array from list
new_v = np.array(v_list, order="C", dtype=dtype)
return [to_scalar_or_list(e) for e in v]
elif v.dtype.kind in numeric_kinds:
# v is a homogenous numeric array
if kind and v.dtype.kind not in kind:
@@ -135,6 +135,12 @@ def copy_to_readonly_numpy_array(v, kind=None, force_numeric=False):
else:
# Either no kind was requested or requested kind is satisfied
new_v = np.ascontiguousarray(v.copy())
elif v.dtype.kind == "O":
if kind:
dtype = kind_default_dtypes.get(first_kind, None)
return np.array(v, dtype=dtype)
else:
return v.tolist()
else:
# v is a non-numeric homogenous array
new_v = v.copy()
@@ -149,12 +155,12 @@ def copy_to_readonly_numpy_array(v, kind=None, force_numeric=False):
if "U" not in kind:
# Force non-numeric arrays to have object type
# --------------------------------------------
# Here we make sure that non-numeric arrays have the object
# datatype. This works around cases like np.array([1, 2, '3']) where
# Here we make sure that non-numeric arrays become lists
# This works around cases like np.array([1, 2, '3']) where
# numpy converts the integers to strings and returns array of dtype
# '<U21'
if new_v.dtype.kind not in ["u", "i", "f", "O", "M"]:
new_v = np.array(v, dtype="object")
return v.tolist()

# Set new array to be read-only
# -----------------------------
@@ -191,7 +197,7 @@ def is_homogeneous_array(v):
if v_numpy.shape == ():
return False
else:
return True
return True # v_numpy.dtype.kind in ["u", "i", "f", "M", "U"]
return False


@@ -393,7 +399,7 @@ def validate_coerce(self, v):
# Pass None through
pass
elif is_homogeneous_array(v):
v = copy_to_readonly_numpy_array(v)
v = copy_to_readonly_numpy_array_or_list(v)
elif is_simple_array(v):
v = to_scalar_or_list(v)
else:
@@ -598,7 +604,7 @@ def validate_coerce(self, v):
self.raise_invalid_elements(invalid_els[:10])

if is_homogeneous_array(v):
v = copy_to_readonly_numpy_array(v)
v = copy_to_readonly_numpy_array_or_list(v)
else:
v = to_scalar_or_list(v)
else:
@@ -754,7 +760,7 @@ def validate_coerce(self, v):
elif self.array_ok and is_homogeneous_array(v):
np = get_module("numpy")
try:
v_array = copy_to_readonly_numpy_array(v, force_numeric=True)
v_array = copy_to_readonly_numpy_array_or_list(v, force_numeric=True)
except (ValueError, TypeError, OverflowError):
self.raise_invalid_val(v)

@@ -881,7 +887,7 @@ def validate_coerce(self, v):
pass
elif self.array_ok and is_homogeneous_array(v):
np = get_module("numpy")
v_array = copy_to_readonly_numpy_array(
v_array = copy_to_readonly_numpy_array_or_list(
v, kind=("i", "u"), force_numeric=True
)

@@ -1042,26 +1048,7 @@ def validate_coerce(self, v):
if invalid_els:
self.raise_invalid_elements(invalid_els)

if is_homogeneous_array(v):
np = get_module("numpy")

# If not strict, let numpy cast elements to strings
v = copy_to_readonly_numpy_array(v, kind="U")

# Check no_blank
if self.no_blank:
invalid_els = v[v == ""][:10].tolist()
if invalid_els:
self.raise_invalid_elements(invalid_els)

# Check values
if self.values:
invalid_inds = np.logical_not(np.isin(v, self.values))
invalid_els = v[invalid_inds][:10].tolist()
if invalid_els:
self.raise_invalid_elements(invalid_els)

elif is_simple_array(v):
if is_simple_array(v) or is_homogeneous_array(v):
if not self.strict:
v = [StringValidator.to_str_or_unicode_or_none(e) for e in v]

@@ -1338,8 +1325,12 @@ def validate_coerce(self, v, should_raise=True):
# Pass None through
pass
elif self.array_ok and is_homogeneous_array(v):
v = copy_to_readonly_numpy_array(v)
if self.numbers_allowed() and v.dtype.kind in ["u", "i", "f"]:
v = copy_to_readonly_numpy_array_or_list(v)
if (
not isinstance(v, list)
and self.numbers_allowed()
and v.dtype.kind in ["u", "i", "f"]
):
# Numbers are allowed and we have an array of numbers.
# All good
pass

# ### Check that elements have valid colors types ###
elif self.numbers_allowed() or invalid_els:
v = copy_to_readonly_numpy_array(validated_v, kind="O")
v = copy_to_readonly_numpy_array_or_list(validated_v, kind="O")
else:
v = copy_to_readonly_numpy_array(validated_v, kind="U")
v = copy_to_readonly_numpy_array_or_list(validated_v, kind="U")
elif self.array_ok and is_simple_array(v):
validated_v = [self.validate_coerce(e, should_raise=False) for e in v]

@@ -1870,7 +1861,7 @@ def validate_coerce(self, v):
self.raise_invalid_elements(invalid_els)

if is_homogeneous_array(v):
v = copy_to_readonly_numpy_array(validated_v, kind="U")
v = copy_to_readonly_numpy_array_or_list(validated_v, kind="U")
else:
v = to_scalar_or_list(v)
else:
@@ -1918,7 +1909,7 @@ def validate_coerce(self, v):
# Pass None through
pass
elif self.array_ok and is_homogeneous_array(v):
v = copy_to_readonly_numpy_array(v, kind="O")
v = copy_to_readonly_numpy_array_or_list(v, kind="O")
elif self.array_ok and is_simple_array(v):
v = to_scalar_or_list(v)
return v
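
The net effect of the copy_to_readonly_numpy_array_or_list rename is visible in the tests that follow: numeric input still becomes a read-only numpy array, while object- and string-dtype input is now coerced to a plain Python list. A small sketch of that behavior, mirroring the test fixtures (constructed here only for illustration):

import numpy as np
from _plotly_utils.basevalidators import DataArrayValidator

# Fixture-style validator, as used in the tests below
validator = DataArrayValidator("prop", "parent")

print(type(validator.validate_coerce(np.array([1, 2, 3]))))  # numpy.ndarray (read-only)
print(type(validator.validate_coerce(np.array(["A", "B", "C"], dtype="object"))))  # list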
@@ -32,15 +32,29 @@ def test_validator_acceptance_simple(val, validator):


@pytest.mark.parametrize(
"val",
[np.array([2, 3, 4]), pd.Series(["a", "b", "c"]), np.array([[1, 2, 3], [4, 5, 6]])],
"val", [np.array([2, 3, 4]), np.array([[1, 2, 3], [4, 5, 6]])],
)
def test_validator_acceptance_homogeneous(val, validator):
coerce_val = validator.validate_coerce(val)
assert isinstance(coerce_val, np.ndarray)
assert np.array_equal(validator.present(coerce_val), val)


# Accept object array as list
@pytest.mark.parametrize(
"val",
[
["A", "B", "C"],
np.array(["A", "B", "C"], dtype="object"),
pd.Series(["a", "b", "c"]),
],
)
def test_validator_accept_object_array_as_list(val, validator):
coerce_val = validator.validate_coerce(val)
assert isinstance(coerce_val, list)
assert coerce_val == list(val)


# ### Rejection ###
@pytest.mark.parametrize("val", ["Hello", 23, set(), {}])
def test_rejection(val, validator):
@@ -126,7 +126,7 @@ def test_rejection_by_element_aok(val, validator_aok):
[],
["bar12"],
("foo", "bar012", "baz"),
np.array([]),
np.array([], dtype="object"),
np.array(["bar12"]),
np.array(["foo", "bar012", "baz"]),
],
Expand All @@ -135,7 +135,7 @@ def test_acceptance_aok(val, validator_aok_re):
# Values should be accepted and returned unchanged
coerce_val = validator_aok_re.validate_coerce(val)
if isinstance(val, (np.ndarray, pd.Series)):
assert np.array_equal(coerce_val, np.array(val, dtype=coerce_val.dtype))
assert coerce_val == list(np.array(val))
elif isinstance(val, (list, tuple)):
assert validator_aok_re.present(coerce_val) == tuple(val)
else:
@@ -149,13 +149,10 @@ def test_color_validator_object(color_validator, color_object_pandas):
res = color_validator.validate_coerce(color_object_pandas)

# Check type
assert isinstance(res, np.ndarray)

# Check dtype
assert res.dtype == "object"
assert isinstance(res, list)

# Check values
np.testing.assert_array_equal(res, color_object_pandas)
assert res == color_object_pandas.tolist()


def test_color_validator_categorical(color_validator, color_categorical_pandas):

# Check type
assert color_categorical_pandas.dtype == "category"
assert isinstance(res, np.ndarray)

# Check dtype
assert res.dtype == "object"
assert isinstance(res, list)

# Check values
np.testing.assert_array_equal(res, np.array(color_categorical_pandas))
assert res == color_categorical_pandas.tolist()


def test_data_array_validator_dates_series(
res = data_array_validator.validate_coerce(datetime_pandas)

# Check type
assert isinstance(res, np.ndarray)

# Check dtype
assert res.dtype == "object"
assert isinstance(res, list)

# Check values
np.testing.assert_array_equal(res, dates_array)
assert res == dates_array.tolist()


def test_data_array_validator_dates_dataframe(
res = data_array_validator.validate_coerce(df)

# Check type
assert isinstance(res, np.ndarray)

# Check dtype
assert res.dtype == "object"
assert isinstance(res, list)

# Check values
np.testing.assert_array_equal(res, dates_array.reshape(len(dates_array), 1))
assert res == dates_array.reshape(len(dates_array), 1).tolist()
@@ -138,8 +138,7 @@ def test_acceptance_aok_scalars(val, validator_aok):
def test_acceptance_aok_list(val, validator_aok):
coerce_val = validator_aok.validate_coerce(val)
if isinstance(val, np.ndarray):
assert isinstance(coerce_val, np.ndarray)
assert np.array_equal(coerce_val, np.array(val, dtype=coerce_val.dtype))
assert coerce_val == val.tolist()
elif isinstance(val, list):
assert validator_aok.present(val) == tuple(val)
else:
@@ -178,9 +177,7 @@ def test_rejection_aok_values(val, validator_aok_values):
)
def test_acceptance_no_blanks_aok(val, validator_no_blanks_aok):
coerce_val = validator_no_blanks_aok.validate_coerce(val)
if isinstance(val, np.ndarray):
assert np.array_equal(coerce_val, np.array(val, dtype=coerce_val.dtype))
elif isinstance(val, list):
if isinstance(val, (list, np.ndarray)):
assert validator_no_blanks_aok.present(coerce_val) == tuple(val)
else:
assert coerce_val == val
@@ -126,10 +126,7 @@ def test_color_validator_object(color_validator, color_object_xarray):
res = color_validator.validate_coerce(color_object_xarray)

# Check type
assert isinstance(res, np.ndarray)

# Check dtype
assert res.dtype == "object"
assert isinstance(res, list)

# Check values
np.testing.assert_array_equal(res, color_object_xarray)
assert res == list(color_object_xarray)
2 changes: 2 additions & 0 deletions packages/python/plotly/_plotly_utils/utils.py
@@ -61,8 +61,10 @@ def encode(self, o):
# We catch false positive cases (e.g. strings such as titles, labels etc.)
# but this is ok since the intention is to skip the decoding / reencoding
# step when it's completely safe

if not ("NaN" in encoded_o or "Infinity" in encoded_o):
return encoded_o

# now:
# 1. `loads` to switch Infinity, -Infinity, NaN to None
# 2. `dumps` again so you get 'null' instead of extended JSON
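
The added blank lines belong to an early-return fast path in PlotlyJSONEncoder.encode: the object is encoded once, and the loads/dumps round trip that rewrites NaN/Infinity to null only runs when those tokens might be present. A simplified sketch of the idea (the helper name is illustrative, not the PR's code):

import json

def encode_skipping_reencode(obj):
    encoded = json.JSONEncoder().encode(obj)
    # Fast path: no non-standard tokens that Plotly.js would choke on
    if not ("NaN" in encoded or "Infinity" in encoded):
        return encoded
    # Slow path: map NaN/Infinity/-Infinity to None, then dump as 'null'
    cleaned = json.loads(encoded, parse_constant=lambda name: None)
    return json.dumps(cleaned)

print(encode_skipping_reencode({"x": [1.0, float("nan")]}))  # {"x": [1.0, null]}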
15 changes: 15 additions & 0 deletions packages/python/plotly/plotly/basedatatypes.py
@@ -3297,6 +3297,7 @@ def to_dict(self):
# Frame key is only added if there are any frames
res = {"data": data, "layout": layout}
frames = deepcopy([frame._props for frame in self._frame_objs])

if frames:
res["frames"] = frames

@@ -3413,6 +3414,13 @@ def to_json(self, *args, **kwargs):
remove_uids: bool (default True)
True if trace UIDs should be omitted from the JSON representation

engine: str (default None)
The JSON encoding engine to use. One of:
- "json" for an encoder based on the built-in Python json module
- "orjson" for a fast encoder that requires the orjson package
If not specified, the default engine is set to the current value of
plotly.io.json.config.default_engine.

Returns
-------
str
@@ -3469,6 +3477,13 @@ def write_json(self, *args, **kwargs):
remove_uids: bool (default True)
True if trace UIDs should be omitted from the JSON representation

engine: str (default None)
The JSON encoding engine to use. One of:
- "json" for an encoder based on the built-in Python json module
- "orjson" for a fast encoder that requires the orjson package
If not specified, the default engine is set to the current value of
plotly.io.json.config.default_engine.

Returns
-------
None
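
For reference, a usage sketch of the engine option documented above (assumes a plotly build containing this PR; the "orjson" engine additionally requires the optional orjson package):

import plotly.graph_objects as go
import plotly.io as pio

fig = go.Figure(go.Scatter(x=[1, 2, 3], y=[4, 5, 6]))

fig.to_json(engine="json")    # encoder based on the built-in json module
fig.to_json(engine="orjson")  # fast orjson-based encoder (if installed)

# Or change the default once; "auto" picks orjson when it is available.
pio.json.config.default_engine = "orjson"
fig.write_json("figure.json")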