Fix performance of splitCssText #1615

eoghanmurray · 2024-12-18T11:24:29Z

See #1603 for an excellent bug report.

I didn't include the benchmark from that report as it didn't demonstrate the pathological cases that were being experienced in the wild, but rather just the degree of slowdown of the 'split' vs. 'no split' code paths.

See #1437 for the context on why splitCssText exists.
To recap, a <style> element can have multiple text nodes. We currently serialize by processing styleEl.sheet.cssRules into a single string, but if one of the text nodes is programmatically modified (via a text mutation), we want to map that mutation back so that only the relevant part is modified. Without the split, a single text mutation would blow away the entire css text.
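
To make the recap concrete, here is a minimal sketch of why keeping one chunk per text node helps. It is not rrweb's actual data structure: `CssChunks` and `applyTextMutation` are hypothetical names, and it assumes the split has already produced one chunk of css per text node.

```typescript
// Hypothetical shape: the serialized css kept as one chunk per text node of the <style>.
type CssChunks = string[];

// Apply a text mutation to a single text node, leaving the other chunks untouched.
function applyTextMutation(
  chunks: CssChunks,
  nodeIndex: number,
  newText: string,
): CssChunks {
  if (nodeIndex < 0 || nodeIndex >= chunks.length) {
    throw new RangeError(`no text node at index ${nodeIndex}`);
  }
  const next = chunks.slice(); // copy so the caller's array is not mutated
  next[nodeIndex] = newText;
  return next;
}

// Two text nodes inside one <style>; only the second one is modified.
const exampleChunks: CssChunks = ['.a { color: red; }', '.b { color: blue; }'];
const exampleUpdated = applyTextMutation(exampleChunks, 1, '.b { color: green; }');
console.log(exampleUpdated.join(''));
```

With a single unsplit string there is no `nodeIndex` to target, so the only safe option would be to replace the whole css text.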

This PR massively improves the performance of the splitting in the case where we need to search through large strings to find similar parts. We need to compare after normalization, so there's a lot of back and forth. This PR changes that process to something more like a binary search, rather than the crawling search which was producing the pathological performance.
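
The difference between crawling and jumping can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `normalize` is a simplified stand-in for `normalizeCssText` that only strips whitespace, and `crawlSplit`, `jumpSplit` and `SplitResult` are hypothetical names. The jumping version uses a scaling factor for its first guess and then corrects by the remaining difference in normalized length, which cannot overshoot because each character adds at most one normalized character.

```typescript
// Simplified stand-in for normalizeCssText (assumption): strip whitespace only.
function normalize(css: string): string {
  return css.replace(/\s+/g, '');
}

interface SplitResult {
  k: number; // index in cssText where the first chunk ends, or -1 if not found
  normalizeCalls: number; // normalizations of cssText or of its prefixes
}

// Crawling search: advance one character at a time, re-normalizing the prefix.
function crawlSplit(cssText: string, firstChunk: string): SplitResult {
  const target = normalize(firstChunk);
  let calls = 0;
  for (let k = 0; k <= cssText.length; k++) {
    calls++;
    if (normalize(cssText.slice(0, k)) === target) {
      return { k, normalizeCalls: calls };
    }
  }
  return { k: -1, normalizeCalls: calls };
}

// Jumping search: scaled first guess, then difference-sized corrections.
function jumpSplit(cssText: string, firstChunk: string, iterLimit = 50): SplitResult {
  const target = normalize(firstChunk);
  const totalNormLen = normalize(cssText).length;
  let calls = 1;
  if (totalNormLen === 0) return { k: -1, normalizeCalls: calls };
  // How many raw characters correspond to one normalized character, on average.
  const scale = cssText.length / totalNormLen;
  let k = Math.min(cssText.length, Math.round(target.length * scale));
  for (let i = 0; i < iterLimit; i++) {
    const prefix = normalize(cssText.slice(0, k));
    calls++;
    const diff = target.length - prefix.length;
    if (diff === 0) {
      // Lengths agree; accept only if the content agrees too.
      return { k: prefix === target ? k : -1, normalizeCalls: calls };
    }
    k += diff; // move by the number of normalized characters still missing or in excess
    if (k < 0 || k > cssText.length) break;
  }
  return { k: -1, normalizeCalls: calls };
}

const sampleCss = 'a { color: red; }\n'.repeat(200) + 'b { margin: 0; }';
const sampleFirstChunk = 'a{color:red;}'.repeat(200);
console.log(crawlSplit(sampleCss, sampleFirstChunk).normalizeCalls);
console.log(jumpSplit(sampleCss, sampleFirstChunk).normalizeCalls);
```

The crawl re-normalizes an ever longer prefix once per character, which is where the pathological cost comes from; the jump needs only a handful of normalizations for the same input.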

changeset-bot · 2024-12-18T11:24:33Z

🦋 Changeset detected

Latest commit: 0fbf355

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 19 packages

Name	Type
rrweb-snapshot	Patch
rrweb	Patch
rrdom	Patch
rrdom-nodejs	Patch
rrweb-player	Patch
@rrweb/all	Patch
@rrweb/replay	Patch
@rrweb/record	Patch
@rrweb/types	Patch
@rrweb/packer	Patch
@rrweb/utils	Patch
@rrweb/web-extension	Patch
rrvideo	Patch
@rrweb/rrweb-plugin-console-record	Patch
@rrweb/rrweb-plugin-console-replay	Patch
@rrweb/rrweb-plugin-sequential-id-record	Patch
@rrweb/rrweb-plugin-sequential-id-replay	Patch
@rrweb/rrweb-plugin-canvas-webrtc-record	Patch
@rrweb/rrweb-plugin-canvas-webrtc-replay	Patch

eoghanmurray · 2025-01-03T11:36:45Z

There are a lot of commits in here that can be ignored, as I was iterating to try to fix an issue in GitHub Actions.

eoghanmurray · 2025-01-03T11:38:52Z

.

Juice10 · 2025-01-10T07:08:37Z

Let's add a PR for the GitHub Actions change and merge this one after that has happened.

- Fix bug where the right split point was not being picked for the 3rd section onwards
- Fix that it wasn't able to find a split when both halves were identical
- Add test to put splitCssText through its paces with a large file
- Introduce a limit on the iteration which causes the 'efficiently' test to fail
- Fix poor 'crawling' performance in the 'matching' algorithm for large css texts - e.g. for a (doubled) benchmark.css, we were running `normalizeCssText` 9480 times before `k` got to the right place
- Further algorithm efficiency: need to take larger jumps; use the scaling factor to make a better guess at how big a jump to make

Fixes a browser 'lock up' at record time due to the presence of large amounts of css in <style> elements, which are split over multiple text nodes, which triggers the new code added in #1437 (see that PR for full explanation of why this all exists). #1437 was not written with performance in mind as it was believed to be an edge case, but things like Grammarly browser extension (#1603) among other scenarios were triggering pathological behavior, some of which was solved in #1615. See also #1640 (comment) for further discussion.

* Fix the case when there are multiple matches and we end up not finding a unique one - just go with the best guess when there are many splits by looking at the previous chunk's size
* Also add '0px' -> '0' stylesheet normalization, which also fixes the sample problem in a different way
* Add new test and modify it so that it can trigger a failure in the absence of the '0px' normalization; there may be other unknown ways of triggering a similar bug, so ensure that the primary 'best guess' method doesn't suffer a regression
* Leverage the 'best guess' method so that we can quit after 100 iterations trying to find a unique substring; hopefully this bit along with the `iterLimit` already added will prevent any future pathological cases.

Failing example extracted from large files identified by Paul D'Ambra (Posthog) ... see comment from MartinWorkfully: PostHog/posthog-js#1668
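
Two of the ideas in that follow-up can be sketched in a few lines. Everything here is an assumption made for illustration rather than the code from #1640: `normalizeForCompare`, `allIndexesOf` and `bestGuess` are hypothetical names, the normalizer handles only whitespace and '0px', and the "expected" position is simply passed in (the commit message says it is derived from the previous chunk's size, without giving details).

```typescript
// Hypothetical normalizer: rewrite a standalone '0px' as '0', then strip whitespace.
// The leading group keeps values such as '10px' or '1.0px' untouched.
function normalizeForCompare(css: string): string {
  return css
    .replace(/(^|[^\d.])0px\b/g, (_match: string, pre: string) => pre + '0')
    .replace(/\s+/g, '');
}

// Collect every index at which `needle` occurs in `haystack`.
function allIndexesOf(haystack: string, needle: string): number[] {
  const found: number[] = [];
  if (needle.length === 0) return found;
  let from = haystack.indexOf(needle);
  while (from !== -1) {
    found.push(from);
    from = haystack.indexOf(needle, from + 1);
  }
  return found;
}

// When the probe is not unique, pick the occurrence closest to the expected position.
function bestGuess(candidates: number[], expected: number): number {
  if (candidates.length === 0) return -1;
  return candidates.reduce((best, c) =>
    Math.abs(c - expected) < Math.abs(best - expected) ? c : best,
  );
}

// The browser may serialize 'margin:0' as 'margin: 0px'; both normalize identically.
console.log(normalizeForCompare('a { margin: 0px 10px; }'));
console.log(bestGuess(allIndexesOf('abXcdXefX', 'X'), 6));
```

Falling back to a best guess means the search for a unique substring can stop after a fixed number of iterations instead of running until it finds one.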

* Fix up the 'should replace the existing DOM nodes on iframe navigation with `isAttachIframe`' test (rrweb-io#1636)
  - it was working for me when the test was run in isolation (`-t` option), but when the entire cross-origin-iframes test was run, the change of iframe contents didn't seem to happen in time
* [chore]: Update actions/upload-artifact to v4 (rrweb-io#1643)
  * update actions/upload-artifact to v4
  ---------
  Co-authored-by: Eoghan Murray <eoghan@getthere.ie>
* Fix a code path where masking could be skipped on textareas (rrweb-io#1599)
  * Fixes rrweb-io#1596
* [chore] Cache yarn packages for CI (rrweb-io#1646)
  * [chore] Cache yarn packages for CI
  * Cache yarn in release.yml
* [chore] Update deprecated download artifact on CI (rrweb-io#1647)
  * I'm merging even though ESLint is still failing in GitHub Actions as I believe it's running actions _without_ this PR applied yet
* Fix env puppeteer error in cross-origin-iframes.test.ts (rrweb-io#1629)
* chore(ci): track bundle size (rrweb-io#1630)
  * chore(ci): track bundle size
  ---------
  Co-authored-by: pauldambra <pauldambra@users.noreply.github.com>
* Fix adapt css with split (rrweb-io#1600)
  Fix for rrweb-io#1575 where postcss was raising an exception
  * adapt the entire CSS as a whole in one pass with postcss, rather than adapting each split part separately
  * break up the postcss output again and assign to individual text nodes (kind of inverse of splitCssText at record side)
  * impose an upper bound of 30 iterations on the substring searches to preempt possible pathological behavior
  * add tests to demonstrate the scenario and prevent regression
  More technical details:
  * Fix algorithm; checks against `ix_end` within loop were incorrect when `ix_start` was bigger than zero.
  * Fix that length check against wrong array was causing 'should record style mutations with multiple child nodes and replay them correctly' test to fail.
  Note on last point: I haven't looked into things more deeply than that the test was complaining about missing .length after `replayer.pause(1000);`
* Warn instead of fail on exceptions thrown from postcss (rrweb-io#1580)
  * postcss was introduced in rrweb-io#1458 for use within adaptCssForReplay
  * rrweb-io#1600 fixes the main case where invalid css could be introduced when valid css from the output of `sheet.cssRules` was split according to how it was split across text nodes of the <style>
  * the guard introduced here is still useful as in future we will likely switch to capturing the raw stylesheet contents (both <style> and <link>), at which point we will be much less confident of getting valid css
* Fix splitCssText again (rrweb-io#1640)
  Fixes a browser 'lock up' at record time due to the presence of large amounts of css in <style> elements, which are split over multiple text nodes, which triggers the new code added in rrweb-io#1437 (see that PR for full explanation of why this all exists). rrweb-io#1437 was not written with performance in mind as it was believed to be an edge case, but things like Grammarly browser extension (rrweb-io#1603) among other scenarios were triggering pathological behavior, some of which was solved in rrweb-io#1615. See also rrweb-io#1640 (comment) for further discussion.
  * Fix the case when there are multiple matches and we end up not finding a unique one - just go with the best guess when there are many splits by looking at the previous chunk's size
  * Also add '0px' -> '0' stylesheet normalization, which also fixes the sample problem in a different way
  * Add new test and modify it so that it can trigger a failure in the absence of the '0px' normalization; there may be other unknown ways of triggering a similar bug, so ensure that the primary 'best guess' method doesn't suffer a regression
  * Leverage the 'best guess' method so that we can quit after 100 iterations trying to find a unique substring; hopefully this bit along with the `iterLimit` already added will prevent any future pathological cases.
  Failing example extracted from large files identified by Paul D'Ambra (Posthog) ... see comment from MartinWorkfully: PostHog/posthog-js#1668
* fix: move patch function into utils to improve bundling (rrweb-io#1631)
  * fix: move patch function into utils to improve bundling
  ---------
  Co-authored-by: pauldambra <pauldambra@users.noreply.github.com>
  Co-authored-by: Justin Halsall <Juice10@users.noreply.github.com>
---------
Co-authored-by: Eoghan Murray <eoghan@getthere.ie>
Co-authored-by: Kevin Townsend <11738094+kevinatown@users.noreply.github.com>
Co-authored-by: Justin Halsall <Juice10@users.noreply.github.com>
Co-authored-by: Paul D'Ambra <paul@posthog.com>
Co-authored-by: pauldambra <pauldambra@users.noreply.github.com>
Co-authored-by: John Henry Gunther <jguntherenator@gmail.com>

eoghanmurray changed the title ~~Fix performance of aplitCssText~~ Fix performance of splitCssText Dec 18, 2024

eoghanmurray mentioned this pull request Dec 18, 2024

[Bug]: splitCssText causes degraded performance when recording #1603

Closed

1 task

eoghanmurray force-pushed the fix-1603 branch from 03f7351 to 6d3047a Compare December 20, 2024 15:23

Juice10 approved these changes Jan 10, 2025

eoghanmurray added 22 commits January 10, 2025 18:23

Fix bug where the right split point was not being picked for the 3rd …

a065eaf

…section onwards

Add test to put splitCssText through its paces with a large file

e2fe660

Introduce a limit which causes the 'efficiently' test to fail

2f663d4

Fix that it wasn't able to find a split when both halves were identical

8ba3b54

Fix poor 'crawling' performance in this part of the algorithm for lar…

61e8b5f

…ge css texts - e.g. for a (doubled) benchmark.css, we were running normalizeCssText 9480 times before k got to the right place

Need to take larger jumps to be efficient; use the scaling factor to …

3474cb2

…make better guess at how big a jump to make - can reduce iter_limit from 300 to 50 to prove that this approach is better

Add changeset

906dd3b

Presuming the match on a character is more efficient than the indexOf …

e0fb33d

…here

Fix eslint

f5fcfa5

Update puppeteer to try to solve the following issue in github actions:

a797e96

Failed to launch the browser process! [...FATAL:zygote_host_impl_linux.cc(117)] No usable sandbox! Update your kernel ...

Bump Chrome to see if it solves following issue in github actions:

defd363

Failed to launch the browser process! [...FATAL:zygote_host_impl_linux.cc(117)] No usable sandbox! Update your kernel ...

Both Firefox and the 131 version of Chrome I've upgraded to remove de…

ad42cba

…fault css values from shorthand properties when retrieved via `sheet.rules[0].cssText`

Drop chrome down again as getting overwhelmed with issues so will see…

dec4ad5

… if this LTS version also solves: Failed to launch the browser process! [...FATAL:zygote_host_impl_linux.cc(117)] No usable sandbox! Update your kernel ...

Bump all puppeteer to match the LTS version of chrome we've just spec…

7d1e100

…ified - going by https://github.com/puppeteer/puppeteer/blob/5d72c7d3c3e9c8cd53bcb4b597a0924dced1b5fc/docs/supported-browsers.md

Fix json error

0c809e1

The previous version combo didn't install, trying next one up

2bae508

Still getting 'zygote_host_impl_linux.cc(128)] No usable sandbox!' so…

c43d695

… iterating through versions

Keep searching through https://github.com/puppeteer/puppeteer/blob/5d…

e16e2ac

…72c7d3c3e9c8cd53bcb4b597a0924dced1b5fc/docs/supported-browsers.md

Following https://chromium.googlesource.com/chromium/src/+/main/docs/…

c0bf8f8

…security/apparmor-userns-restrictions.md

Continuing with https://chromium.googlesource.com/chromium/src/+/main…

417a831

…/docs/security/apparmor-userns-restrictions.md

Another bump to find minimum version to remove the 'No Usable Sandbox…

a72a28e

…' error

Final bump to top of https://github.com/puppeteer/puppeteer/blob/5d72…

a677fb6

…c7d3c3e9c8cd53bcb4b597a0924dced1b5fc/docs/supported-browsers.md

eoghanmurray added 3 commits January 10, 2025 18:23

Further bump of Chrome as issue still isn't solved - this previously …

9086b94

…seemed to solve things

Revert all puppeteer and chrome related changes as the problem was fi…

724a27b

…xed with `runs-on: ubuntu-22.04`

I keep accidentally forgetting camelCase

0fbf355

eoghanmurray force-pushed the fix-1603 branch from e5f22e3 to 0fbf355 Compare January 10, 2025 18:23

eoghanmurray merged commit dc20cd4 into rrweb-io:master Jan 10, 2025
4 checks passed

This was referenced Jan 10, 2025

Version Packages (alpha) #1605

Open

Version Packages (alpha) stemcloudmedia/rrweb#1

Open

pauldambra mentioned this pull request Jan 19, 2025

feat: update player to rrweb 18 PostHog/posthog#27674

Closed

github-actions bot mentioned this pull request Jan 21, 2025

Version Packages (alpha) pendo-io/rrweb#10

Open

kevinansfield mentioned this pull request Jan 22, 2025

Some CSS hits a pathological case in the rrweb splitCssText pathway causing slowdown in processing PostHog/posthog-js#1668

Open

pauldambra mentioned this pull request Jan 22, 2025

fix: patch for css parsing performance PostHog/posthog-js#1670

Merged

eoghanmurray mentioned this pull request Jan 29, 2025

Fix splitCssText again #1640

Merged

This was referenced Feb 4, 2025

Version Packages (alpha) Midpath-Software/rrweb#3

Merged

Version Packages (alpha) kevinatown/rrweb#2

Open
