Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: session recording performance metrics #20230

Merged
merged 7 commits into from
Feb 9, 2024

Conversation

daibhin
Copy link
Contributor

@daibhin daibhin commented Feb 9, 2024

Problem

Our performance metrics for Session Replay are misreported. I found three issues at play here:

  • The cache is never cleared until the final recording loaded event is fired. This means that if you navigate away from a recording that is loading the start timer will retain its value. Because of this check we will reuse that same start value if the logic is reloaded much later. Most likely where we're seeing the large 20+ hour load times.

  • We only called reportUsageIfFullyLoaded when snapshots and/or events loaded successfully. This meant that if the metadata was the last to load we would never call the reportUsageIfFullyLoaded action again (which would only then track the metric because fullyLoaded was now true). We could be missing data for recordings where the metadata loads last.

  • The report generation function was always based off Math.round(performance.now() - cache[someValue]). This meant that the difference was always the current time minus the metrics load start time. Therefore all values would be based off the slowest metric to load (i.e. the last metric needed for the fullyLoaded condition in reportUsageIfFullyLoaded to be true). We should instead be saving the actual completion time of any metric to the cache and only computing the value if it is not present in the cache (e.g. it has not completed yet)

Changes

  • Remove unused size measures for generated durations report
  • Reset the cache before unmounting the session recording logic (addresses problem # 1)
  • Add success & failure handlers for all requested data (meta, snapshots, events) so that we can report accurate completion timings (addresses problem # 3)
  • Call reportUsageIfFullyLoaded from the success handlers for meta, snapshots & events (addresses problem # 2)

It is worth noting that the recording viewed metric will include load_time values for each of events, snapshots and metadata. Given the recording viewed fires when the first snapshot source loads the timings for events and metadata may still be "in progress". If you want accurate timings (e.g. total completion times) for those parameters you should look at the recording loaded event instead.

How did you test this code?

  • Added tests
  • Verified locally
Screenshot 2024-02-09 at 16 15 59
  • metadata does not change between first paint and fully loaded => it had completed before first paint
  • events increased but did not surpass snapshots => events were still loading at the time of the first paint but finished before all snapshots were loaded. Snapshots took the longest
  • first_paint does not change

Copy link
Contributor

github-actions bot commented Feb 9, 2024

Size Change: -50 B (0%)

Total Size: 2.22 MB

ℹ️ View Unchanged
Filename Size Change
frontend/dist/toolbar.js 2.22 MB -50 B (0%)

compressed-size-action

Copy link
Member

@pauldambra pauldambra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great!

@daibhin daibhin merged commit 29fec8d into master Feb 9, 2024
79 checks passed
@daibhin daibhin deleted the dn-chore/fix-recordings-performance-metrics branch February 9, 2024 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants