alt.Chart.from_dict slow due to large number of calls to utils.schemaapi._use_referencing_library #3382

Closed
RobinL opened this issue Mar 24, 2024 · 0 comments · Fixed by #3383

I'm the author of a FOSS library that makes extensive use of alt.Chart.from_dict.

I noticed this is quite slow, and I think I have identified the root cause. Specifically, alt.Chart.from_dict makes a very large number of calls to utils.schemaapi._use_referencing_library. Each of these calls in turn calls into importlib, so more than half of its runtime is spent in importlib.metadata.
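For anyone who wants to look for this kind of hotspot themselves, here is a minimal profiling sketch using the standard library's cProfile. The helper name `profile_call` is hypothetical and is not part of altair; it simply runs a callable under the profiler and returns the result together with a report sorted by cumulative time.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report is a text table of the `top`
    entries sorted by cumulative time. Hypothetical helper for illustration.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

With the `data` dict loaded as in the reprex, something like `chart, report = profile_call(alt.Chart.from_dict, data)` followed by `print(report)` should show how much cumulative time lands in importlib.metadata.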

Since the result of utils.schemaapi._use_referencing_library is a bool and will not change within a Python session (it only changes if the user installs a different version of the jsonschema package), it can be computed a single time, resulting in significantly improved performance.
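To illustrate the idea of computing the check once, here is a rough sketch using functools.lru_cache. This is not necessarily how the PR implements it: the function name `version_at_least`, the simplified version parsing, and the example threshold in the usage note are all assumptions made for illustration.

```python
import functools
import importlib.metadata
import re


@functools.lru_cache(maxsize=None)
def version_at_least(package: str, minimum: tuple) -> bool:
    """Return True if `package` is installed at a version >= `minimum`.

    Hypothetical helper. The result is cached per (package, minimum) pair,
    so the importlib.metadata lookup runs only once per Python session.
    """
    try:
        installed = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return False

    # Simplified parsing (assumption): keep only the leading digits of each
    # dot-separated component, so pre-release suffixes are ignored.
    parts = []
    for piece in installed.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts) >= minimum
```

A call such as `version_at_least("jsonschema", (4, 18))` (the threshold here is only an example) would then hit importlib.metadata on the first call and return the cached bool on every later call.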

Reprex

Reprex including timings
import json
import time
import urllib.request

import altair as alt

spec_json_url = "https://gist.githubusercontent.com/RobinL/de91f5d95aa29ee0b5464bca4bfd3dcf/raw/c391db54295a8f43e3258aa6c42a8a39f239b4bc/match_weights.json"

with urllib.request.urlopen(spec_json_url) as url:
    data = json.loads(url.read().decode())

start_time = time.time()
chart = alt.Chart.from_dict(data)
end_time = time.time()
execution_time = end_time - start_time
print(f"Execution time: {execution_time} seconds")

spec_json_url = "https://gist.githubusercontent.com/RobinL/de91f5d95aa29ee0b5464bca4bfd3dcf/raw/c391db54295a8f43e3258aa6c42a8a39f239b4bc/waterfall.json"

with urllib.request.urlopen(spec_json_url) as url:
    data = json.loads(url.read().decode())

start_time = time.time()
chart = alt.Chart.from_dict(data)
end_time = time.time()
execution_time = end_time - start_time
print(f"Execution time: {execution_time} seconds")

The timings for the above on main are:

Execution time: 0.9597208499908447 seconds
Execution time: 1.2827000617980957 seconds

on a 2019 MacBook Pro.

After removing the repeated importlib calls, the timings are:

Execution time: 0.2081460952758789 seconds
Execution time: 0.3407268524169922 seconds

Solution

I've opened a PR that I think should address this.
