Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepare benchmarks + ~1.7x prepare perf improvement #121

Merged
merged 6 commits into from
Nov 28, 2024

Conversation

taj-p
Copy link
Contributor

@taj-p taj-p commented Nov 26, 2024

Intent

I am considering using this library for rendering on low tier devices. In evaluating the library, I discovered that there is a significant amount of time spent in CPU maintaining the 2 LRUs and Hash Sets during the prepare (roughly 40-50% of the time in spent in CPU for an example application I used):

image

I'm taking a look to find any low hanging fruit that could improve prepare time. I think the first step is to add in some benchmarks. I've added 4:

  1. A single text area for Moby Dick Chapter 1 (the same used by Cosmic Text here)
  2. A single text area for some Arabic text
  3. Many text areas for Moby Dick Chapter 1
  4. Many text areas for some Arabic text

The benchmarks can be run with cargo bench:

❯ cargo bench
...

Prepare/Moby Dick Chapter 1 - Single Text Area
                        time:   [214.20 µs 215.12 µs 216.24 µs]
                        change: [-0.4030% +0.2005% +1.1070%] (p = 0.60 > 0.05)
                        No change in performance detected.
...
Prepare/Arabic - Single Text Area
                        time:   [42.637 µs 43.086 µs 43.507 µs]
                        change: [-3.3350% -1.5361% +0.2985%] (p = 0.10 > 0.05)
                        No change in performance detected.
...
Prepare/Moby Dick Chapter 1 - Many Text Areas
                        time:   [73.678 ms 73.795 ms 73.936 ms]
                        change: [-1.7922% -1.0727% -0.5384%] (p = 0.00 < 0.05)
                        Change within noise threshold.
...
Prepare/Arabic - Many Text Areas
                        time:   [26.608 ms 26.826 ms 27.157 ms]
                        change: [-0.4681% +0.4464% +1.7030%] (p = 0.50 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
...

Low Hanging Fruit

~1.7x faster => Skipping unnecessary peeks - 717474f

I noticed that when preparing a glyph, we check both LRUs if it contains the glyph, if it does, we promote them. Then, outside of the block, we peek into both LRUs trying to find the glyph (with an associated .unwrap).

We can both promote and get the glyph using [lru].get(...). This enables us to:

  • Perform 1 lookup on each LRU at most once instead of twice
  • Prevent unnecessary lookup into an LRU using atlas.glyph when we know what LRU to lookup
Test Improvement
Moby Dick Ch 1 -42%
Arabic -24%
Moby Dick - Many -39%
Arabic - Many -43%
image

@taj-p taj-p changed the title Prepare Benchmarks Prepare Benchmarks + Low Hanging Perf Improvements Nov 26, 2024
@taj-p taj-p changed the title Prepare Benchmarks + Low Hanging Perf Improvements Prepare Benchmarks + ~1.7x low hanging perf improvement Nov 26, 2024
@taj-p taj-p changed the title Prepare Benchmarks + ~1.7x low hanging perf improvement Prepare benchmarks + ~1.7x low hanging perf improvement Nov 26, 2024
src/text_atlas.rs Outdated Show resolved Hide resolved
src/text_render.rs Outdated Show resolved Hide resolved
@taj-p taj-p marked this pull request as ready for review November 26, 2024 23:50
@taj-p taj-p changed the title Prepare benchmarks + ~1.7x low hanging perf improvement Prepare benchmarks + ~1.7x perf improvement Nov 27, 2024
@taj-p taj-p changed the title Prepare benchmarks + ~1.7x perf improvement Prepare benchmarks + ~1.7x prepare perf improvement Nov 27, 2024
@grovesNL
Copy link
Owner

Thanks! This looks great to me.

Could you include a reference for the samples and ensure they're licensed in a way that we can include them here? Otherwise we could create our own sample.

@taj-p taj-p force-pushed the tajp/bench branch 2 times, most recently from bf11fb8 to 7c966ea Compare November 28, 2024 09:24
@taj-p
Copy link
Contributor Author

taj-p commented Nov 28, 2024

Thanks! This looks great to me.

Thanks @grovesNL for the speedy attention 🙏 .

Could you include a reference for the samples and ensure they're licensed in a way that we can include them here? Otherwise we could create our own sample.

Yes, definitely - thank you for the good suggestion and catch. I've updated the sources in 093eed7 and attributed them in samples/README.md.

Copy link
Owner

@grovesNL grovesNL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thank you!

@grovesNL grovesNL merged commit 1055c2e into grovesNL:main Nov 28, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants