Change plot resolution to per release in plot over time scripts #254
base: main
Conversation
Change plot res to per release in plot over time scripts
For the absolute error plot, does each dot still represent a new version of that software package? If so, would we want it to have the label as well?
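If labels are wanted, a minimal sketch of annotating each dot with its version string (per_release_points and ax_abs are hypothetical names for the per-release dataframe and the absolute-error axes, not code from this PR):

# Label each release dot with its compiler_version, offset slightly above the point.
for _, row in per_release_points.iterrows():
    ax_abs.annotate(
        row["compiler_version"],
        (row["date"], row["value"]),
        textcoords="offset points", xytext=(0, 5), fontsize=6,
    )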
My main open questions are the comments below on the logic for selecting and filtering which benchmark results end up included in the average performance at each earliest version release.
But stepping back, I think this may still be fragile, or at least challenging, given that benchmark results are currently stored by the date of the run. What if some days have more benchmark runs than others? Is it a fair comparison if some days average over many runs and others over only a single one?
As I (poorly) started to sketch out in #251, I think we should consider associating benchmark results with the git hash (and corresponding commit time) to ensure a one-to-one mapping from the set of compiler versions (as reflected in the lock file at that commit) to benchmark results. Then we can more cleanly pick out results when versions change and plot accordingly.
That is clearly a larger scope than this change, but I suspect getting the plot we want will be hard with the current way we run and store benchmarks. Working around it might be possible, but the code will get more complicated and would arguably be worth testing to be sure all the cases are handled.
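For illustration only, a minimal sketch of that git-hash association (the git_hash column and the commit_time helper are hypothetical, not part of this PR or of #251):

import subprocess
import pandas as pd

def commit_time(git_hash: str) -> pd.Timestamp:
    # Committer date (strict ISO 8601) of the commit the benchmarks ran against.
    out = subprocess.run(
        ["git", "show", "-s", "--format=%cI", git_hash],
        capture_output=True, text=True, check=True,
    )
    return pd.Timestamp(out.stdout.strip())

# Assuming each benchmark row recorded a git_hash at run time, results could then
# be keyed by commit rather than by calendar date:
# df_all["commit_time"] = df_all["git_hash"].map(commit_time)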
fig, ax = plt.subplots(2, 1, figsize=(8, 8), sharex=False, dpi=150)
# Rotate x labels on axes 0
plt.setp(ax[0].xaxis.get_majorticklabels(), rotation=45)
filtered_avg_compiled_ratio = avg_compiled_ratio[
How are we thinking about which dates are included in the earlier average and this filter?
avg_compiled_ratio is calculated earlier as:
avg_compiled_ratio = (
    df_dates.groupby(["compiler", "date", "compiler_version"])[
        "compiled_ratio"
    ]
    .mean()
    .reset_index()
    .sort_values("date")
)
If we happened to have run multiple benchmarks for that compiler on that date, where say some were for version 1.2.0 and then some for version 1.3.0, would this filter be showing results that average over both versions?
Would we also need to filter to only include/average results from that date where the latest compiler version was used?
Because this groups by the date, compiler name, and compiler version, it would not average two different versions together. On the plot I think it would just show them as different datapoints in the same color on the same date and label them correspondingly.
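A quick illustration of that behavior with toy data (values invented for the example, not real benchmark results):

import pandas as pd

# Two runs for the same compiler on the same date, but different versions.
df_dates = pd.DataFrame({
    "compiler": ["qiskit", "qiskit"],
    "date": pd.to_datetime(["2025-01-01", "2025-01-01"]),
    "compiler_version": ["1.2.0", "1.3.0"],
    "compiled_ratio": [0.80, 0.90],
})

avg_compiled_ratio = (
    df_dates.groupby(["compiler", "date", "compiler_version"])["compiled_ratio"]
    .mean()
    .reset_index()
    .sort_values("date")
)
print(avg_compiled_ratio)  # two rows, one per version, even on the same date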
That's correct, @jordandsullivan.
Got it. Thank you both.
If we had multiple datapoints, would we want to only plot the "latest version" dot for that date? Just wondering if it might get cluttered otherwise.
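One way that could look, as a sketch only: keep the newest version per compiler per date, assuming semver-style strings and the third-party packaging library so that, say, 1.40.0 sorts after 1.5.0:

from packaging.version import Version

# Sort by parsed version, then keep the last (newest) row per (compiler, date).
latest_only = (
    filtered_avg_compiled_ratio
    .sort_values("compiler_version", key=lambda s: s.map(Version))
    .groupby(["compiler", "date"])
    .tail(1)
)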
unique_versions = sorted(df_dates["compiler_version"].unique())
dates = []
for version in unique_versions:
    df_versions = df_dates[df_dates["compiler_version"] == version]
Is this compiler_version only the semver string? If so, is it possible that this might mix the same version of different packages, for example cirq 1.2.0 and qiskit 1.2.0? If it can, I think this might end up taking the earliest date across the union of those two.
@@ -33,9 +39,18 @@
# Convert the 'date' column to datetime
df_all["date"] = pd.to_datetime(df_all["date"])

unique_versions = sorted(df_all["compiler_version"].unique())
Just to note similar questions as above on whether this is selecting the intended dates and versions.
Good catch, I added a print statement to check, and these are just the numbers, not including the compiler names:
Here are unique_versions ['0.1.0', '0.1.1', '0.2.0', '0.4.0', '0.4.2', '1.2.0', '1.3.0', '1.3.1', '1.3.2', '1.3.3', '1.35.0', '1.36.0', '1.37.0', '1.39.0', '1.4.0', '1.4.1', '1.40.0', '1.5.0']
I'm aware these are only the numbers, but the actual filtering is done on the date each new version appears in the dataset:
dates = []
for version in unique_versions:
    df_versions = df_dates[df_dates["compiler_version"] == version]
    earliest_date = sorted(df_versions["date"].unique())[0]
    dates.append(earliest_date)
filtered_avg_compiled_ratio = avg_compiled_ratio[avg_compiled_ratio["date"].isin(dates)]
Here's a toy example to consider (ignoring the actual data columns):
date, compiler, compiler_version
2025-01-01,cirq,1.2.0
2025-01-02,qiskit,1.2.0
2025-01-10,cirq,1.3.0
I think this might select only 2025-01-01 and 2025-01-10 as the earliest dates, since cirq's 1.2.0 is earlier than qiskit's 1.2.0. Then the qiskit results for 1.2.0 wouldn't show up until much later? (In this parallel universe where their versions are close and overlapping.)
Co-authored-by: Brad Chase <bchase@ripple.com>
Fixes #200.
Change plot resolution to per compiler release in plot over time scripts, so the plots will look like the following:


Note: I pasted (instead of pushed) these plots because they'd have to be removed before merge anyway.