docs: add v8.0 Performance FAQ #12597

Merged · 13 commits · Jun 2, 2021

**changelog.md** (16 additions & 15 deletions)

We expect this release to ship in the DevTools of [Chrome 93](https://chromiumdash.appspot.com/schedule), and to PageSpeed Insights within a day!

## Notable changes
* The **Performance Category** had a number of scoring changes to align with other performance tools and to better reflect the state of the web.
- The Performance Score has been reweighted ([#12577](https://github.com/GoogleChrome/lighthouse/pull/12577))
- The TBT and FCP score curves have been updated ([#12576](https://github.com/GoogleChrome/lighthouse/pull/12576), [#12556](https://github.com/GoogleChrome/lighthouse/pull/12556))
- CLS has been updated to its new, windowed
definition ([#12554](https://github.com/GoogleChrome/lighthouse/pull/12554))

See the [v8.0 Performance FAQ](https://github.com/GoogleChrome/lighthouse/blob/master/docs/v8-perf-faq.md) for more detail.

<img width="550" alt="the new metric weightings in the Lighthouse score calculator" src="https://user-images.githubusercontent.com/39191/120410971-de337100-c308-11eb-9fb6-368a33c0855e.png">


* The report includes a new metric filter. Pick a metric to focus on the opportunities and diagnostics most relevant to improving just that metric:

<img width="583" alt="the new metric filter in the lighthouse report" src="https://user-images.githubusercontent.com/316891/120384128-61de6500-c2eb-11eb-9d15-5b92981d897e.png">
<img width="350" alt="the new metric filter in the lighthouse report" src="https://user-images.githubusercontent.com/316891/120384128-61de6500-c2eb-11eb-9d15-5b92981d897e.png">
* The [Lighthouse Treemap](#treemap-release) is now available across all the major Lighthouse clients. If your site exposes source maps to Lighthouse, look for the "View Treemap" button to see a breakdown of your shipped JavaScript, filterable by size and coverage on load.

## 🆕 New audits
Thanks to our new contributor 👽🐷🐰🐯🐻!
<a name="treemap-release"></a>
We are releasing the Lighthouse Treemap!

<a href="https://user-images.githubusercontent.com/4071474/118602146-2d08d480-b767-11eb-9273-9a8de7000e67.png"><img src="https://user-images.githubusercontent.com/4071474/118602146-2d08d480-b767-11eb-9273-9a8de7000e67.png" width="48%"></a> <a href="https://user-images.githubusercontent.com/4071474/118602240-4742b280-b767-11eb-9f6a-433788029a30.png"><img src="https://user-images.githubusercontent.com/4071474/118602240-4742b280-b767-11eb-9f6a-433788029a30.png" width="48%"></a>

You may already be familiar with treemaps thanks to [webtreemap](https://github.com/evmar/webtreemap) (which we use!) or [source-map-explorer](https://github.com/danvk/source-map-explorer). With Lighthouse Treemap, you'll be able to view all the JavaScript bundles on your page easily from a Lighthouse report, in addition to some insights that may help reduce the amount of JavaScript on a page. The only requirement is that source maps are accessible (either publicly, or securely from the same computer that is running the Lighthouse audit).
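
For example, here's a minimal sketch of exposing source maps from a webpack build (webpack is an assumption; any bundler that emits standard source maps works the same way):

```js
// webpack.config.js (sketch only): 'source-map' emits a .map file per
// bundle and appends a sourceMappingURL comment, which is how tooling
// like Lighthouse Treemap discovers and fetches the map.
module.exports = {
  mode: 'production',
  devtool: 'source-map',
};
```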

We even collect **code coverage** data from Chrome, and extrapolate the coverage of individual modules in a bundle. Note: this only accounts for a cold load; code used only after user interaction will be marked as unused. Stay tuned for a future release, which will enable you to configure user flows and capture even more accurate performance insights.

If we detect a large module included by multiple bundles, we'll alert you of that too.

You can access Lighthouse Treemap from the report:

<img src="https://user-images.githubusercontent.com/39191/120411450-aed13400-c309-11eb-9746-2f07d5efd276.png" width="350">

Currently, only reports generated with the Lighthouse Node CLI will connect to the Lighthouse Treemap App. This functionality will be in DevTools and PageSpeed Insights as of Lighthouse v8.0.

## Core


**docs/v8-perf-faq.md** (new file, 244 additions)

# v8.0 Performance FAQ

### Give me a summary of the perf score changes in v8.0. What's new/different?

First, it may be useful to refresh on [the math behind Lighthouse's metric
scores and performance score](https://web.dev/performance-scoring/).

In [Lighthouse v8.0](https://github.com/GoogleChrome/lighthouse/releases/tag/v8.0.0), we updated the score curves for FCP and TBT measurements,
making both a bit more strict. CLS has been updated to its new, [windowed
definition](https://web.dev/evolving-cls/). Additionally, the Performance
Score's weighted average was
[rebalanced](https://googlechrome.github.io/lighthouse/scorecalc/#FCP=3000&SI=5800&FMP=4000&TTI=7300&FCI=6500&LCP=4000&TBT=600&CLS=0.25&device=mobile&version=8&version=6&version=5),
giving more weight to CLS and TBT than before, and slightly decreasing the
weights of FCP, SI, and TTI.

From an analysis of HTTP Archive's latest [crawl of the
web](https://httparchive.org/faq#how-does-the-http-archive-decide-which-urls-to-test),
we project that the performance score for the majority of sites will stay the
same or improve in Lighthouse 8.0.
- ~20% of sites may see a drop of up to 5 points, though likely less
- ~20% of sites will see little detectable change
- ~30% of sites should see a moderate improvement of a few points
- ~30% could see a significant improvement of 5 points or more

The biggest drops in scores are due to TBT scoring becoming stricter and the
metric's slightly higher weight. The biggest improvements in scores are also due
to TBT changes in the long tail and the windowing of CLS, and both metrics'
higher weights.

### What are the exact score weighting changes?

#### Changes by metric

| metric | v6 weight | v8 weight | Δ |
|--------------------------------|-----------|-----------|-----|
| First Contentful Paint (FCP) | 15 | **10** | -5 |
| Speed Index (SI) | 15 | **10** | -5 |
| Largest Contentful Paint (LCP) | 25 | **25** | 0 |
| Time To Interactive (TTI) | 15 | **10** | -5 |
| Total Blocking Time (TBT) | 25 | **30** | +5 |
| Cumulative Layout Shift (CLS) | 5 | **15** | +10 |

#### Changes by phase

| phase | metrics | v6 phase weight | v8 phase weight | Δ |
|----------------|------------------------------------------------------|-----------------|-----------------|-----|
| early | First Contentful Paint (FCP) | 15 | 10 | -5 |
| mid | Speed Index (SI), Largest Contentful Paint (LCP) | 40 | 35 | -5 |
| interactivity | Time To Interactive (TTI), Total Blocking Time (TBT) | 40 | 40 | 0 |
| predictability | Cumulative Layout Shift (CLS) | 5 | 15 | +10 |

### Why did the weight of CLS go up?

When CLS was introduced in Lighthouse v6, it was still early days for the metric.
There have been [many improvements and
bugfixes](https://chromium.googlesource.com/chromium/src/+/refs/heads/main/docs/speed/metrics_changelog/cls.md)
to CLS since then. Now, given its maturity and its established place in Core
Web Vitals, its weight increases from 5% to 15%.

### Why are the Core Web Vitals metrics weighted differently in the performance score?

The Core Web Vitals metrics are [independent signals in the Page Experience
ranking
update](https://support.google.com/webmasters/thread/104436075/core-web-vitals-page-experience-faqs-updated-march-2021).
Lighthouse weighs each lab-equivalent metric based on what we believe creates
the best incentives to improve overall page experience for users.

LCP, CLS, and TBT are [very good
metrics](https://chromium.googlesource.com/chromium/src/+/lkgr/docs/speed/good_toplevel_metrics.md)
and that's why they are the three highest-weighted metrics in the performance
score.

### How should I think about the Lighthouse performance score in relation to Core Web Vitals?

[Core Web Vitals](https://web.dev/vitals/) refer to a specific set of key user
experience metrics, their passing thresholds, and the percentile at which they're measured.
In general, CWV's primary focus is field data.

The Lighthouse score is a means to understand the degree of opportunity
available to improve critical elements of user experience. The lower the score,
the more likely the user will struggle with load performance, responsiveness, or
content stability.

Lighthouse's lab-based data overlaps with Core Web Vitals in a few key ways.
Lighthouse features two of the three core vitals (LCP and CLS) with the exact
same passing thresholds. There's no user input in a Lighthouse run, so it cannot
compute FID. Instead, we have TBT, which you can consider a proxy metric for
FID: though they measure different things, both are signals about a page's
interactivity.

_So CWV and Lighthouse have commonalities, but are different. How can you
rationalize paying attention to both?_

Ultimately, a combination approach is most effective. Use field data for the
long-term overview of your users' experience, and use lab data to iterate your
way to the best experience possible for your users. CrUX data summarizes [the
most recent 28
days](https://developers.google.com/web/tools/chrome-user-experience-report/api/reference#data-pipeline),
so it'll take some time to confidently determine that any change has definite
impact.

Lighthouse's analysis allows you to debug and optimize in an environment that is
repeatable with an immediate feedback loop. In addition, lab-based tooling can
provide significantly more detail than field instrumentation, as it isn't
limited to web-exposed APIs or constrained by cross-origin restrictions.

The exact numbers of your lab and field metrics aren't expected to match, but
any substantial improvement to your lab metrics should be observable in the
field once it's been deployed. The higher the Lighthouse score, the less you're
leaving up to chance in the field.

### What blind spots from the field does lab tooling illuminate?

Field data analyzes all successful page loads. Lab tooling analyzes the
experience of a fixed configuration for a hypothetical user. If every potential
user in the world successfully loaded an equal number of pages on your site, we
might not need to focus on the experience of a hypothetical one, but in reality
we know this isn't the case. Users who have better experiences use your site
more; that's why we care about performance in the first place! Lab tooling shows
you the quality of the experience for these hypothetical users that field data
might be [missing
entirely](https://blog.chriszacharias.com/page-weight-matters).

Lighthouse mobile reports emulate a slow 4G connection on a mid-tier Android
device. While field data might not indicate these conditions are especially
common for your site, analyzing how your site performs in these tougher
conditions helps expand your site's audience. Lighthouse identifies the worst
experiences, experiences you can't see in the field because they were so bad the
user never came back (or never waited around in the first place).
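
As a concrete example, here's a sketch of a programmatic run using the documented Lighthouse Node API (the URL is a placeholder); a default mobile run applies the simulated slow-4G network and CPU throttling described above:

```js
// Sketch of the documented Node API usage for a mobile performance run.
const lighthouse = require('lighthouse');
const chromeLauncher = require('chrome-launcher');

(async () => {
  const chrome = await chromeLauncher.launch({chromeFlags: ['--headless']});
  const options = {onlyCategories: ['performance'], port: chrome.port};
  const result = await lighthouse('https://example.com', options);
  // lhr.categories.performance.score is a 0-1 value.
  console.log('Perf score:', result.lhr.categories.performance.score * 100);
  await chrome.kill();
})();
```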

### How should I work to optimize CLS differently given that it has been updated?

The [windowing adjustment](https://web.dev/evolving-cls/) will likely not have
much effect on the lab measurement, but will have a large effect on field CLS
for long-lived pages.

Lighthouse 8 introduces another adjustment to our CLS definition: including
layout shift contributions from subframes. This brings our implementation in
line with how CrUX computes field CLS. This comes with the implication that
iframes (including ones you may not control) may be adding layout shifts which
ultimately affect your CLS score. Keep in mind that the subframe contributions
are [weighted by the in-viewport
portion](https://github.com/WICG/layout-instability#cumulative-scores) of the
iframe.
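
To make the windowed definition concrete, here's a minimal sketch (not Lighthouse's actual implementation) of session-window CLS over `layout-shift` performance entries, using the 1-second-gap / 5-second-cap parameters from the linked article:

```js
// Windowed CLS sketch: shifts are grouped into session windows; a window
// closes after a 1s gap between shifts or 5s of total duration, and the
// reported CLS is the largest window's sum of shift values.
function windowedCls(entries) {
  let maxWindow = 0;
  let curWindow = 0;
  let windowStart = 0;
  let prevShiftTime = Number.NEGATIVE_INFINITY;
  for (const entry of entries) {
    if (entry.hadRecentInput) continue; // shifts right after input don't count
    if (entry.startTime - prevShiftTime > 1000 ||
        entry.startTime - windowStart > 5000) {
      curWindow = 0; // start a new session window
      windowStart = entry.startTime;
    }
    curWindow += entry.value;
    prevShiftTime = entry.startTime;
    maxWindow = Math.max(maxWindow, curWindow);
  }
  return maxWindow;
}
```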

### Why don't the numbers for TBT and FID match, if TBT is a proxy metric for FID?

The commonality between TBT (collected in a lab environment) and FID (collected
in a field context) is that they measure the impact on input responsiveness from
long tasks on the main thread. Beyond that, they're quite different. FID
captures the delay in handling the first input event of the page, whenever that
input happened. TBT roughly captures how dangerous the length of all the main
thread's tasks is.
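
A rough sketch of how TBT falls out of the main thread's tasks (the real implementation also clips tasks that straddle the FCP and TTI boundaries):

```js
// TBT sketch: every main-thread task between FCP and TTI contributes
// its time beyond the 50 ms "long task" threshold.
function totalBlockingTime(tasks, fcpMs, ttiMs) {
  return tasks
    .filter((t) => t.startTime >= fcpMs && t.startTime + t.duration <= ttiMs)
    .reduce((sum, t) => sum + Math.max(0, t.duration - 50), 0);
}

// One 30 ms task (not long) and one 250 ms task: only 200 ms blocks.
console.log(totalBlockingTime(
  [{startTime: 1000, duration: 30}, {startTime: 2000, duration: 250}],
  900, 5000)); // => 200
```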

It's very possible to have a page that does well on FID, but poorly on TBT. And
it's slightly harder, but possible, to do well on TBT but poorly on FID\*. So,
you shouldn't expect your TBT and FID measurements to correlate strongly. A
large-scale analysis found their [Spearman's
ρ](https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient) at
about 0.40, which indicates a connection, but not one as strong as many would
prefer.

From the Lighthouse project's perspective, the current passing threshold for FID
is quite lenient, but more importantly, the percentile-of-record for FID (the 75th
percentile) is not sufficient for detecting issues. The 95th percentile is a
much stronger indicator of problematic interactions for this metric. We
encourage user-centric teams to focus on the 95th percentile of all input delays
(not just the first) in their field data in order to identify and address
problems that surface just 5% of the time.
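
A tiny sketch (with hypothetical input-delay samples) of why the percentile matters:

```js
// p75 can look healthy while p95 reveals the problem; nearest-rank method.
const percentile = (values, p) => {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil(p * sorted.length) - 1];
};

const delaysMs = [5, 6, 8, 10, 12, 15, 18, 25, 40, 350]; // one bad outlier
console.log(percentile(delaysMs, 0.75)); // 25  -> comfortably "good" FID
console.log(percentile(delaysMs, 0.95)); // 350 -> what the unlucky 5% feel
```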

\*Aside: the [Chrome 91 FID change for
double-tap-to-zoom](https://chromium.googlesource.com/chromium/src.git/+/refs/heads/main/docs/speed/metrics_changelog/2021_05_fid.md)
fixes a lot of high FID / low TBT cases and may be observable in your field
metrics, with higher percentiles improving slightly. Most remaining high FID /
low TBT cases are likely due to incorrect meta viewport tags, which [Lighthouse
will flag](https://web.dev/viewport/). Delivering a mobile-friendly viewport,
reducing main-thread-blocking JS, and keeping your TBT low are the best defense
against bad FID in the field.

### Overall, what motivated the changes to the performance score?

As with all Lighthouse score updates, changes are made to reflect
the latest in how to measure user-experience quality holistically and accurately,
and to focus attention on key priorities.

Heavy JS and long tasks are a problem for the web that's
[worsening](https://httparchive.org/reports/state-of-javascript#bytesJs). Field
FID is currently too lenient and not sufficiently incentivizing action to
address the problem. Lighthouse has historically weighted its interactivity
metrics at 40-55% of the performance score and, as interactivity is key to user
experience, we maintain a 40% weighting (TBT and TTI together) in Lighthouse
8.0.

[FCP's score curve was
adjusted](https://github.com/GoogleChrome/lighthouse/pull/12556) to align with
the current de facto ["good" threshold](https://web.dev/fcp/#what-is-a-good-fcp-score),
and as a result will score a bit more strictly.

The curve for TBT was made stricter to [more closely
approach](https://github.com/GoogleChrome/lighthouse/pull/12576) the ideal score
curve. TBT has had (and still has) a more lenient curve than our methodology
dictates, but the new curve is more linear, which means there's a larger range
where improvements in the metric are rewarded with improvements in the score. If
your page currently scores poorly with TBT, the new curve will be more
responsive to changes as page performance incrementally improves.
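
For the curious, here's a sketch of the log-normal curve behind each metric score, where a value at the `median` control point scores 0.5 and a value at `p10` scores 0.9 (the FCP control points below are assumed v8 mobile values):

```js
// Log-normal score curve sketch, following the shape described in the
// performance-scoring doc: score(median) = 0.5, score(p10) = 0.9.
function logNormalScore(value, p10, median) {
  // erf approximation (Abramowitz & Stegun 7.1.26), |error| < 1.5e-7.
  const erf = (x) => {
    const sign = Math.sign(x);
    x = Math.abs(x);
    const t = 1 / (1 + 0.3275911 * x);
    const poly = t * (0.254829592 + t * (-0.284496736 + t *
      (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    return sign * (1 - poly * Math.exp(-x * x));
  };
  const INVERSE_ERFC_ONE_FIFTH = 0.9061938024368232; // erfc^-1(1/5)
  const standardizedX = Math.log(value / median) *
    INVERSE_ERFC_ONE_FIFTH / -Math.log(p10 / median);
  return (1 - erf(standardizedX)) / 2;
}

console.log(logNormalScore(3000, 1800, 3000)); // 0.5 at the median
console.log(logNormalScore(1800, 1800, 3000)); // ~0.9 at the p10 point
```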

FCP's weight drops slightly from 15% to 10% because it's fairly gameable and is also partly
captured by Speed Index.

### What's the story with TTI?

TTI serves a useful role as it's the largest metric value reported (often >10
seconds) and helps anchor perceptions.

We see TBT as a stronger metric for evaluating the health of your main thread
and its impact on interactivity, plus it [has lower
variability](https://docs.google.com/document/d/1xCERB_X7PiP5RAZDwyIkODnIXoBk-Oo7Mi9266aEdGg/edit).
TTI serves as a nice complement that captures the cost of long tasks, often
from heavy JavaScript. That said, we expect to continue to reduce the weight
of TTI and will likely remove it in a future major Lighthouse release.

### How does the Lighthouse Perf score get calculated? What is it based on?

The Lighthouse perf score is calculated from a weighted, blended set of
performance metrics. You can see the current and previous Lighthouse score
compositions (which metrics we are blending together, and at what weights) in
the [score
calculator](https://googlechrome.github.io/lighthouse/scorecalc/#FCP=3000&SI=5800&FMP=4000&TTI=7300&FCI=6500&LCP=4000&TBT=600&CLS=0.25&device=mobile&version=8&version=6&version=5),
and learn more about the [calculation specifics
here](https://web.dev/performance-scoring/).
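
A minimal sketch of that blend (metric keys follow Lighthouse's audit IDs; the input scores are hypothetical 0-1 metric scores):

```js
// v8 metric weights, blended into the 0-100 performance score.
const V8_WEIGHTS = {
  'first-contentful-paint': 0.10,
  'speed-index': 0.10,
  'largest-contentful-paint': 0.25,
  'interactive': 0.10,
  'total-blocking-time': 0.30,
  'cumulative-layout-shift': 0.15,
};

function perfScore(metricScores) {
  let sum = 0;
  for (const [id, weight] of Object.entries(V8_WEIGHTS)) {
    sum += weight * metricScores[id];
  }
  return Math.round(sum * 100);
}

// Hypothetical page: good paint metrics, a busy main thread.
console.log(perfScore({
  'first-contentful-paint': 0.95,
  'speed-index': 0.90,
  'largest-contentful-paint': 0.85,
  'interactive': 0.70,
  'total-blocking-time': 0.45,
  'cumulative-layout-shift': 0.90,
})); // => 74
```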

### What is the most exciting update in LH v8?

We're really excited about the [interactive
treemap](https://github.com/GoogleChrome/lighthouse/blob/v8changelog/changelog.md#treemap-release),
[filtering audits by
metric](https://github.com/GoogleChrome/lighthouse/blob/v8changelog/changelog.md#:~:text=new%20metric%20filter),
and the new [Content Security Policy
audit](https://web.dev/strict-csp/#adopting-a-strict-csp), which was a
collaboration with the Google Web Security team.