Prepare benchmarks + ~1.7x prepare perf improvement #121

taj-p · 2024-11-26T22:31:11Z

Intent

I am considering using this library for rendering on low tier devices. In evaluating the library, I discovered that there is a significant amount of time spent in CPU maintaining the 2 LRUs and Hash Sets during the prepare (roughly 40-50% of the time in spent in CPU for an example application I used):

I'm taking a look to find any low hanging fruit that could improve prepare time. I think the first step is to add in some benchmarks. I've added 4:

A single text area for Moby Dick Chapter 1 (the same used by Cosmic Text here)
A single text area for some Arabic text
Many text areas for Moby Dick Chapter 1
Many text areas for some Arabic text

The benchmarks can be run with cargo bench:

❯ cargo bench
...

Prepare/Moby Dick Chapter 1 - Single Text Area
                        time:   [214.20 µs 215.12 µs 216.24 µs]
                        change: [-0.4030% +0.2005% +1.1070%] (p = 0.60 > 0.05)
                        No change in performance detected.
...
Prepare/Arabic - Single Text Area
                        time:   [42.637 µs 43.086 µs 43.507 µs]
                        change: [-3.3350% -1.5361% +0.2985%] (p = 0.10 > 0.05)
                        No change in performance detected.
...
Prepare/Moby Dick Chapter 1 - Many Text Areas
                        time:   [73.678 ms 73.795 ms 73.936 ms]
                        change: [-1.7922% -1.0727% -0.5384%] (p = 0.00 < 0.05)
                        Change within noise threshold.
...
Prepare/Arabic - Many Text Areas
                        time:   [26.608 ms 26.826 ms 27.157 ms]
                        change: [-0.4681% +0.4464% +1.7030%] (p = 0.50 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
...

Low Hanging Fruit

~1.7x faster => Skipping unnecessary peeks - `717474f`

I noticed that when preparing a glyph, we check both LRUs if it contains the glyph, if it does, we promote them. Then, outside of the block, we peek into both LRUs trying to find the glyph (with an associated .unwrap).

We can both promote and get the glyph using [lru].get(...). This enables us to:

Perform 1 lookup on each LRU at most once instead of twice
Prevent unnecessary lookup into an LRU using atlas.glyph when we know what LRU to lookup

Test	Improvement
Moby Dick Ch 1	-42%
Arabic	-24%
Moby Dick - Many	-39%
Arabic - Many	-43%

src/text_atlas.rs

src/text_render.rs

grovesNL · 2024-11-27T20:04:31Z

Thanks! This looks great to me.

Could you include a reference for the samples and ensure they're licensed in a way that we can include them here? Otherwise we could create our own sample.

taj-p · 2024-11-28T09:28:57Z

Thanks! This looks great to me.

Thanks @grovesNL for the speedy attention 🙏 .

Could you include a reference for the samples and ensure they're licensed in a way that we can include them here? Otherwise we could create our own sample.

Yes, definitely - thank you for the good suggestion and catch. I've updated the sources in 093eed7 and attributed them in samples/README.md.

grovesNL

Great, thank you!

taj-p added 3 commits November 27, 2024 07:02

Add prepare benchmarks

73572c0

.

265d0e0

.

eaeeb42

taj-p force-pushed the tajp/bench branch from d02c4e9 to 8372d0a Compare November 26, 2024 22:34

Skip unnescessary peaks

717474f

taj-p force-pushed the tajp/bench branch from 8372d0a to 717474f Compare November 26, 2024 22:39

taj-p changed the title ~~Prepare Benchmarks~~ Prepare Benchmarks + Low Hanging Perf Improvements Nov 26, 2024

taj-p changed the title ~~Prepare Benchmarks + Low Hanging Perf Improvements~~ Prepare Benchmarks + ~1.7x low hanging perf improvement Nov 26, 2024

taj-p changed the title ~~Prepare Benchmarks + ~1.7x low hanging perf improvement~~ Prepare benchmarks + ~1.7x low hanging perf improvement Nov 26, 2024

taj-p commented Nov 26, 2024

View reviewed changes

src/text_atlas.rs Outdated Show resolved Hide resolved

taj-p commented Nov 26, 2024

View reviewed changes

src/text_render.rs Outdated Show resolved Hide resolved

taj-p marked this pull request as ready for review November 26, 2024 23:50

Avoid .unwrap

52c099c

taj-p changed the title ~~Prepare benchmarks + ~1.7x low hanging perf improvement~~ Prepare benchmarks + ~1.7x perf improvement Nov 27, 2024

taj-p changed the title ~~Prepare benchmarks + ~1.7x perf improvement~~ Prepare benchmarks + ~1.7x prepare perf improvement Nov 27, 2024

taj-p force-pushed the tajp/bench branch 2 times, most recently from bf11fb8 to 7c966ea Compare November 28, 2024 09:24

Cite sample sources

093eed7

taj-p force-pushed the tajp/bench branch from 7c966ea to 093eed7 Compare November 28, 2024 09:28

grovesNL approved these changes Nov 28, 2024

View reviewed changes

grovesNL merged commit 1055c2e into grovesNL:main Nov 28, 2024
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prepare benchmarks + ~1.7x prepare perf improvement #121

Prepare benchmarks + ~1.7x prepare perf improvement #121

taj-p commented Nov 26, 2024 •

edited

Loading

grovesNL commented Nov 27, 2024

taj-p commented Nov 28, 2024

grovesNL left a comment

Prepare benchmarks + ~1.7x prepare perf improvement #121

Prepare benchmarks + ~1.7x prepare perf improvement #121

Conversation

taj-p commented Nov 26, 2024 • edited Loading

Intent

Low Hanging Fruit

~1.7x faster => Skipping unnecessary peeks - 717474f

grovesNL commented Nov 27, 2024

taj-p commented Nov 28, 2024

grovesNL left a comment

Choose a reason for hiding this comment

taj-p commented Nov 26, 2024 •

edited

Loading

~1.7x faster => Skipping unnecessary peeks - `717474f`