fix: session recording performance metrics #20230
Merged
Problem
Our performance metrics for Session Replay are misreported. I found three issues at play here:

1. The cache is never cleared until the final `recording loaded` event is fired. This means that if you navigate away from a recording that is still loading, the start timer retains its value. Because of this check, we reuse that same start value if the logic is reloaded much later. This is most likely where the large 20+ hour load times come from.
2. We only called `reportUsageIfFullyLoaded` when snapshots and/or events loaded successfully. This meant that if the metadata was the last to load, we would never call the `reportUsageIfFullyLoaded` action again (and only that call would have tracked the metric, because `fullyLoaded` was now true). We could be missing data for recordings where the metadata loads last.
3. The report generation function was always based off `Math.round(performance.now() - cache[someValue])`. This meant that the difference was always the current time minus the metric's load start time. Therefore all values would be based off the slowest metric to load (i.e. the last metric needed for the `fullyLoaded` condition in `reportUsageIfFullyLoaded` to be `true`). We should instead save the actual completion time of each metric to the cache and only compute the value on the fly if it is not present in the cache (e.g. it has not completed yet).

Changes
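As a rough illustration of the three issues described in the problem list, the TypeScript sketch below stores each metric's duration at completion time, falls back to elapsed-so-far only for metrics still in flight, reports "fully loaded" from any success handler, and can be cleared when a recording is abandoned. The `LoadTimings` class, its method names and the injectable clock are assumptions made for this example; they are not the code in this PR.

```typescript
// Illustrative sketch only: LoadTimings and its methods are hypothetical names.
type LoadMetric = 'metadata' | 'snapshots' | 'events'
const ALL_METRICS: LoadMetric[] = ['metadata', 'snapshots', 'events']

class LoadTimings {
    private now: () => number
    private starts: Partial<Record<LoadMetric, number>> = {}
    private durations: Partial<Record<LoadMetric, number>> = {}

    // The clock is injectable so the behaviour can be checked without real timers.
    constructor(now: () => number = () => performance.now()) {
        this.now = now
    }

    start(metric: LoadMetric): void {
        this.starts[metric] = this.now()
        delete this.durations[metric]
    }

    // Call from every success handler (metadata, snapshots, events).
    // Stores the duration at completion time and returns true once all
    // metrics have finished, whatever order they finished in.
    finish(metric: LoadMetric): boolean {
        const startedAt = this.starts[metric]
        if (startedAt !== undefined) {
            this.durations[metric] = Math.round(this.now() - startedAt)
        }
        return ALL_METRICS.every((m) => this.durations[m] !== undefined)
    }

    // Completed metrics return their stored duration; metrics still loading
    // fall back to the time elapsed so far.
    durationFor(metric: LoadMetric): number | undefined {
        const stored = this.durations[metric]
        if (stored !== undefined) {
            return stored
        }
        const startedAt = this.starts[metric]
        return startedAt === undefined ? undefined : Math.round(this.now() - startedAt)
    }

    // Call when the user navigates away mid-load so a stale start time
    // cannot be reused by a later load.
    clear(): void {
        this.starts = {}
        this.durations = {}
    }
}
```

In this sketch a caller would report the metric whenever `finish(...)` returns `true`, so it no longer matters whether metadata, snapshots or events completes last.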
- `size` measures for generated durations report
- `reportUsageIfFullyLoaded` from the success handlers for meta, snapshots & events (addresses problem #2)

It is worth noting that the `recording viewed` metric will include `load_time` values for each of events, snapshots and metadata. Given that `recording viewed` fires when the first snapshot source loads, the timings for events and metadata may still be "in progress". If you want accurate timings (e.g. total completion times) for those parameters, you should look at the `recording loaded` event instead.

How did you test this code?
- `metadata` does not change between first paint and fully loaded => it had completed before first paint
- `events` increased but did not surpass `snapshots` => `events` were still loading at the time of the first paint but finished before all snapshots were loaded. Snapshots took the longest
- `first_paint` does not change
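
To make the reasoning behind these observations concrete, the sketch below replays one possible load order and prints the durations that would be reported at first paint and at full load. Every timestamp and name here (`loadWindows`, `reportedDuration`, `durationsAt`) is an assumption chosen for illustration, not a measurement or function from this PR.

```typescript
// Hypothetical timestamps in milliseconds, made up to reproduce the pattern above.
type LoadWindow = { start: number; end: number }

const loadWindows = {
    metadata: { start: 0, end: 120 },
    events: { start: 0, end: 600 },
    snapshots: { start: 0, end: 900 },
}

// A source that finished before `now` reports its real duration;
// one that is still loading reports the time elapsed so far.
function reportedDuration(w: LoadWindow, now: number): number {
    return Math.round(Math.min(w.end, now) - w.start)
}

function durationsAt(now: number): { metadata: number; events: number; snapshots: number } {
    return {
        metadata: reportedDuration(loadWindows.metadata, now),
        events: reportedDuration(loadWindows.events, now),
        snapshots: reportedDuration(loadWindows.snapshots, now),
    }
}

const atFirstPaint = durationsAt(300) // assumed moment `recording viewed` fires
const atFullyLoaded = durationsAt(900) // assumed moment `recording loaded` fires

console.log(atFirstPaint) // { metadata: 120, events: 300, snapshots: 300 }
console.log(atFullyLoaded) // { metadata: 120, events: 600, snapshots: 900 }
```

With these assumed numbers, `metadata` is identical in both reports, while `events` grows between them but stays below `snapshots`, which is the same shape as the results listed above.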