DOC 392 #493

Open · wants to merge 7 commits into main
content/collections/advanced-techniques/en/advanced-metric-use-cases.md
@@ -40,6 +40,7 @@

Read this [Help Center article on funnel analysis' FAQs](https://help.amplitude.com/hc/en-us/articles/360054203872) to learn more.

{{#
## Case 2: Analyze your experiment's results based on a subset of users

Imagine your experiment targets all users, but you want to take a deeper look at the experiment's effect on a subset of users, such as exposed users in the United States only. It may be tempting to simply add a filter on the country property; however, this doesn't generate the results you expect.
@@ -60,7 +61,8 @@
Be cautious of analyzing your experiment's results based on just one subset. You may encounter a false positive when looking for true statistically significant results.

Remember that when you run a [multiple hypothesis test](/docs/feature-experiment/advanced-techniques/multiple-hypothesis-testing) in this situation, you're actually running a separate hypothesis test for each segment. You may see a positive lift with one subset and a negative with another subset. Your decision whether to roll out or roll back in these situations isn't clear-cut. One option is to roll out only to the group that shows positive lift.
#}}

## Case 3: Threshold Metrics
## Case 2: Threshold Metrics

Sometimes you want to define success as a user performing an event multiple times. For example, a user needs to buy something three times to count as a conversion. You can achieve this by creating a funnel, counted by uniques, with three purchase events.
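
As a rough sketch of the instrumentation side, assuming the product uses Amplitude's Browser SDK and a hypothetical `Purchase` event (the key, event name, and properties below are illustrative, not prescribed by this article):

```typescript
import * as amplitude from '@amplitude/analytics-browser';

// Hypothetical API key; replace with your project's key.
amplitude.init('AMPLITUDE_API_KEY');

// Every completed purchase fires the same event. A funnel built from three
// consecutive "Purchase" steps, counted by uniques, then counts a user as
// converted only once they have purchased three times.
function onPurchaseCompleted(orderId: string, revenue: number): void {
  amplitude.track('Purchase', { order_id: orderId, revenue });
}
```
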
2 changes: 1 addition & 1 deletion content/collections/experiment/en/analysis-view.md
@@ -23,7 +23,7 @@ Hover over a metric's name to see its definition.
Click the metric dropdown below the table to update analysis charts for that metric: confidence interval over time, mean over time, and the bar chart or funnel chart.

{{partial:admonition type='note'}}
If you want to look at segments of users, from the *Analysis* card click *Open in Chart,* then add a *where* clause by clicking on *Select property... .* This allows you to review results by a specific group of users. 
If you want to look at segments of users, use the filter card at the top of the *Analysis* tab.
{{/partial:admonition}}

## Significance
22 changes: 22 additions & 0 deletions content/collections/experiment/en/dimensional-analysis.md
@@ -0,0 +1,22 @@
---
id: f544b5a1-0689-4137-ab14-690903ae7902
blueprint: experiment
title: 'Dimensional Analysis'
landing: false
exclude_from_sitemap: false
updated_by: 0c3a318b-936a-4cbd-8fdf-771a90c297f0
updated_at: 1737480058
---

Sometimes, you might want to remove QA users or other internal traffic from your analyses because they're not representative of your customer base, and may skew results.

Amplitude's Dimensional Analysis capabilities enable you to exclude groups of users that you define from analysis on a per-experiment basis.

## Define your testers
Contributor: This section header is about "testers" but there is information that is not specific to "testers".

Member Author: We should probably retire this article and move the information about testers to somewhere more logical.

Contributor: This is a new article, right?


In your Feature Experiment, use Targeting settings to define your test users.

Oftentimes, you may want to remove QA users or your internal traffic from analysis because those users aren't representative of your customer base and may skew results. To do this, click the *All Users* dropdown and select *All Users without testing users*. Doing so removes the users in the *Testing* section of the *Settings* tab from the analysis you're viewing. If you selected multiple targeting segments, you may want to analyze each segment individually, because you may see a lift on iOS, for example, but not on Android. You can do this with a single click by selecting the segment name in the *All Users* dropdown. The users in the *Testing* section of the *Settings* tab are also filtered out of the analysis and diagnostics charts.

It can be helpful to investigate the impact of experiments on specific user segments. Experiments that aren't statistically significant overall can often contain a small group of users for which the result is statistically significant. Likewise, for statistically significant results, the overall performance can be driven by a small segment of users.

You can further investigate the impact of the experiment on specific user segments by clicking the *All Users* button to look at Amplitude's out-of-the-box segments, saved segments, or cohorts. To add other user property filters, click the *Add Filter* button.
@@ -37,7 +37,7 @@ When debugging a user timeline, keep these things in mind:
Normal variant jumping may occur due to:

* [Targeting changes](#targeting-changes): Someone has made changes to targeting rules while your experiment is running.
* [Anonymous identity merging](#anonymous-identity-merging): Anonymous users, bucketed by Amplitude ID, may receive different variants until they're eventually resolved via a matching user ID.
* [Anonymous identity merging](#anonymous-identity-merging): Anonymous users, bucketed by Amplitude ID, may receive different variants until they're eventually resolved through a matching user ID.
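
As a rough sketch of how this can surface in an app that uses the Experiment JavaScript client SDK (the deployment key, flag key, and IDs below are hypothetical):

```typescript
import { Experiment } from '@amplitude/experiment-js-client';

const experiment = Experiment.initialize('DEPLOYMENT_KEY');

async function example(): Promise<void> {
  // Before login the user is anonymous, so evaluation falls back to the device ID.
  await experiment.fetch({ device_id: 'device-123' });
  const anonymousVariant = experiment.variant('checkout-redesign');

  // After login the same person is identified by a user ID. Until Amplitude
  // resolves the anonymous history against this user ID, bucketing on the new
  // identity can land in a different bucket, which appears as variant jumping.
  await experiment.fetch({ user_id: 'user-456', device_id: 'device-123' });
  const identifiedVariant = experiment.variant('checkout-redesign');

  console.log(anonymousVariant.value, identifiedVariant.value);
}
```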

### Targeting changes

@@ -113,4 +113,4 @@ Another common case is simple overlooked implementation error. For example, the

## Remove users who variant jumped from experiment analysis

As you analyze results, be careful when you remove data, as you may introduce bias in your results. It's better to understand the cause of variant jumping and fix any implementation bugs, so this doesn't happen again in future experiments. If you feel that removing users who jumped variants is the best course of action, click *All Exposed Users* and enable *Exclude users who variant jumped*.
As you analyze results, be careful when you remove data, as you may introduce bias in your results. It's better to understand the cause of variant jumping and fix any implementation bugs, so this doesn't happen again in future experiments. If you feel that removing users who jumped variants is the best course of action, use the Filter card on the Experiment Analysis tab. The `All exposed users` segment is enabled by default; click it and select *Experiment Segments > Exclude users who variant jumped*.
140 changes: 99 additions & 41 deletions content/collections/workflow/en/experiment-learnings.md
@@ -13,59 +13,117 @@

In the *Analysis* card, you’ll be able to tell at a glance whether your experiment has yielded **statistically-significant** results, as well as what those results actually are. Amplitude Experiment takes the information you gave it during the design and rollout phases and plugs them in for you automatically, so there’s no repetition of effort. It breaks the results out by variant, and provides you with a convenient, detailed tabular breakdown.

{{partial:admonition type='note'}}
This article continues directly from the [article in our Help Center on rolling out your experiment](/docs/feature-experiment/workflow/experiment-test). If you haven’t read that and followed the process it describes, do so before continuing here.
Amplitude doesn't generate p-values or confidence intervals for experiments using binary metrics (for example, unique conversions) until each variant has 100 users **and** 25 conversions. Experiments using non-binary metrics need only to reach 100 users per variant.

## Filter card

On the Filter card, set criteria that update the analysis on the page. Filter your experiment results with the following:

* Date
* Segment
* Property

### Date filter

The date filter defaults to your experiment's start and end date. Adjust the range to scope experiment results to those specific dates.

### Segment filter

The segment filter enables you to select predefined segments, or create one ad hoc. Predefined segments include:

* Experiment
  * All exposed users. Users who were exposed to a variant.
  * Testers. Users added as "testers" during experiment configuration.
  * Exclude testers. Excludes users added as "testers" during experiment configuration.
  * Exclude users who variant jumped. Excludes users who were exposed to more than one variant.
* Amplitude
  * New user. Users who triggered at least one new user event during the selected date range.
  * Mobile web. Users who triggered events on the web from a mobile device.
  * Desktop web. Users who triggered events on the web from a desktop device.

{{partial:admonition type="note" heading="Support for segments"}}
The Testers and Exclude Testers segments are available on feature experiments that use [Remote evaluation](/docs/feature-experiment/remote-evaluation).

The Exclude users who variant jumped segment is available on experiment types other than [multi-armed bandit](/docs/feature-experiment/workflow/multi-armed-bandit-experiments).
{{/partial:admonition}}

Amplitude will not generate p-values or confidence intervals for experiments using binary metrics (i.e., unique conversions) until each variant has 100 users **and** 25 conversions. Experiments using non-binary metrics need only to reach 100 users per variant.
These segments update in real time.

## View results
Click *+Create Segment* to open the Segment builder, where you can define a new segment on the fly. Segments you create in one experiment are available across all other experiments, and appear in the *All Saved Segments* category.
Contributor: If we have a doc that explains saved segments, we can link to that doc.

To generate and view experimental results, follow these steps:
### Property filter

1. In your experiment, the *Activity* page includes two sections to view your results. The *Summary* section and the *Analysis* card. The *Summary* section will describe your experiment's hypothesis and note whether it has or has not reached statistical significance.
Filter your experiment results based on user properties. For example, create a filter that excludes users from a specific country or geographic region, or users that have a specific account type on your platform.

An experiment is said to be **statistically significant** when we can confidently say that the results are highly unlikely to have occurred due to random chance. (More technically, it’s when we reject the null hypothesis.) That might sound pretty subjective, but it’s grounded solidly in statistics. Stat sig relies on a variant’s **p-value**, which is the probability of observing the data we see, assuming there is no difference between the variant and the control. If this probability drops below a certain threshold (statisticians refer to this threshold as the **alpha**), then we consider our experiment to have achieved statistical significance.
## Data Quality card


The *Summary* section will display a badge labeled *Significant* if stat sig was met, and a badge labeled *Not Significant* if stat sig was not met.
{{partial:admonition type="note" heading="Availability"}}
Data Quality is available to organizations with access to Experiment who have recommendations enabled.
{{/partial:admonition}}

Contributor: Do we have a doc to link out to for how to enable the recommendations? Recommendations are enabled by default, but people can disable them.

Member Author: Is this experiment-specific recommendations? Or the recommendations in Audiences?

Contributor (@akhil-prakash, Jan 31, 2025): [Screenshot 2025-01-30 at 5 05 20 PM] Experiment-specific recommendations.

The *Summary* section may include multiple badges simultaneously:
Data Quality checks the setup, instrumentation, and statistical integrity of your experiment as it runs, and alerts you to issues it finds.

* *Inconclusive*: the test was inconclusive for the primary metric.
* *Above Goal* or *Below Goal:* the primary metric's mean was either **above** or **below** its goal depending on the direction of the test (increase = above, decrease = below).
* *Above Control* or *Below Control:* the primary metric's mean was either **above** or **below** the control's mean, depending on the direction of the test (increase = above, decrease = below). These badges are only relevant to stat sig results.
When you expand a category or click *Guide*, the Data Quality Guide opens in a side panel where you can address or dismiss issues.

![summary.png](/docs/output/img/workflow/summary-png.png)
## Summary card

2. At the top of the *Analysis* section is an overview of how your experiment performed, broken down by metric and variant. Below that is the experiment's **exposure definition:** how many variants were shown, what the primary metric was, and what the **exposure event** was. This is the event users will have to fire before being included in an experiment.

{{partial:admonition type='note'}}
The exposure event is **not the same thing** as the assignment event. If, for example, you’re running an experiment on your pricing page, a user might be evaluated on the home page for the experiment—but if they don’t visit the pricing page, they'll never actually be exposed to it. For that reason, this user should not be considered to be part of the experiment.
{{/partial:admonition}}

To learn more about exposure events, see [this article in the Amplitude Developer Center](/docs/feature-experiment/under-the-hood/event-tracking).
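
For illustration, here's a minimal sketch of deferring exposure to the page the experiment actually affects, assuming the Experiment JavaScript client SDK, its `automaticExposureTracking` option, and a hypothetical flag key (check the linked article for the exact pattern your SDK supports):

```typescript
import { Experiment } from '@amplitude/experiment-js-client';

// Hypothetical deployment key; exposure tracking is handled manually below.
const experiment = Experiment.initializeWithAmplitudeAnalytics('DEPLOYMENT_KEY', {
  automaticExposureTracking: false,
});

async function renderPricingPage(): Promise<string> {
  await experiment.start();

  // Reading the variant decides what to render, but does not, by itself,
  // mark the user as part of the experiment here.
  const variant = experiment.variant('pricing-page-redesign');

  // Track the exposure only when the user actually reaches the pricing page,
  // so users who never see it stay out of the analysis.
  experiment.exposure('pricing-page-redesign');

  return variant.value === 'treatment' ? 'new-pricing-ui' : 'current-pricing-ui';
}
```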

Click _Chart Controls_ to see the chart definition.

You can also create a chart in Amplitude Analytics from this experiment by clicking *Open in Chart*.

{{partial:admonition type='note'}}If you are running an A/B/n test, Amplitude Experiment displays the confidence interval / p-value for the control against each treatment individually. To instead see the comparison between two non-control treatments, either change the control variant, or open the test in Analytics and create a chart using the two treatments you're interested in.
{{/partial:admonition}}

3. If desired, adjust the experiment’s **confidence level**. The default is 95%. You can also [choose between a sequential test and a T-test](/docs/feature-experiment/workflow/finalize-statistical-preferences). 
{{partial:admonition type="note" heading="Availability"}}
Summary is available to organizations with access to Experiment who have recommendations enabled.
{{/partial:admonition}}

The Summary card describes your experiment's hypothesis and lets you know if it's reached statistical significance.

{{partial:admonition type="note" heading="Statisical significance and Amplitude"}}
Amplitude considers an experiment to be **statistically significant** when Amplitude can confidently say that the results are unlikely to have occurred due to random chance. More technically, it’s when Amplitude rejects the null hypothesis. That may sound subjective, but it’s grounded solidly in statistics. Statistical significance relies on a variant’s **p-value**, which is a value that represents the likelihood that your results occurred by chance. A lower p-value means your results are probably not random, and there's evidence to support your hypothesis. If this value drops below a threshold, Amplitude considers the experiment to be statistically significant.
{{/partial:admonition}}
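
As a concrete illustration with made-up p-values, using the default 95% confidence level described below:

$$
\alpha = 1 - \text{confidence level} = 1 - 0.95 = 0.05
$$

A variant whose p-value comes out at 0.03 satisfies $p < \alpha$ and is marked statistically significant; one with a p-value of 0.12 isn't.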

The Summary card displays a badge labeled *Significant* if the experiment reached statistical significance, and a badge labeled *Not Significant* if it didn't. This card can display several badges at once:

* *Inconclusive*: the test was inconclusive for the primary metric.
* *Above Goal* or *Below Goal:* the primary metric's mean was either **above** or **below** its goal depending on the direction of the test (increase = above, decrease = below).
* *Above Control* or *Below Control:* the primary metric's mean was either **above** or **below** the control's mean, depending on the direction of the test (increase = above, decrease = below). These badges are only relevant to stat sig results.

![summary.png](/docs/output/img/workflow/summary-png.png)


## Analysis card

At the top of the Analysis card is an overview that explains how your experiment performed, broken down by metric and variant. Below that is a collection of experiment results charts, which you can analyze by metric. These charts display information about:

* Confidence intervals
* Cumulative exposure
* Event totals
* Mean value over time

Contributor: This chart is not always for event totals. For example, the metric can be prop sum or uniques or anything.

For more information, see [Dig deeper into experimentation data with Experiment Results](/docs/analytics/charts/experiment-results/experiment-results-dig-deeper#interpret-your-results).

{{partial:admonition type="tip" heading="Chart filtering"}}
The Experiment Results chart on the Activity tab responds to the selections you make in the [Filter card](#filter-card).
{{/partial:admonition}}

Click *Open in Chart* to open a copy of the Experiment Results in a new chart.

{{partial:admonition type='note'}}
If you are running an A/B/n test, Amplitude Experiment displays the confidence interval / p-value for the control against each treatment individually. To instead see the comparison between two non-control treatments, either change the control variant, or open the test in Analytics and create a chart using the two treatments you're interested in.
{{/partial:admonition}}

If desired, adjust the experiment’s **confidence level**. The default is 95%. You can also [choose between a sequential test and a T-test](/docs/feature-experiment/workflow/finalize-statistical-preferences). 

{{partial:admonition type='note'}}
Lowering your experiment’s confidence level will make it more likely that your experiment achieves statistical significance, but the trade-off is that doing so increases the likelihood of a false positive.
{{/partial:admonition}}
{{partial:admonition type='note'}}
Lowering your experiment’s confidence level makes it more likely that your experiment achieves statistical significance, but the trade-off is that doing so increases the likelihood of a false positive.
{{/partial:admonition}}
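
As illustrative arithmetic: at 95% confidence the significance threshold is $\alpha = 1 - 0.95 = 0.05$, while at 90% confidence it rises to $\alpha = 0.10$, so the test tolerates twice the false-positive rate.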

4. Set the **time frame** for your experiment analysis, either from the selection of pre-set durations, or by opening the date picker and choosing a custom date range.
## Diagnostics card

The tables, graphs, and charts shown in the Analysis section are explained in depth in the articles on [understanding the Experiment Analysis view](/docs/feature-experiment/analysis-view) and [interpreting the cumulative exposures graph in Amplitude Experiment](/docs/feature-experiment/advanced-techniques/cumulative-exposure-change-slope).
The Diagnostics card provides information about how your experiment is delivering. It shows charts about:

{{partial:admonition type='note'}}
Amplitude Experiment needs something to compare your control to in order to generate results. If you neglect to include **both** the control and **at least one** variant, your chart will not display anything.
{{/partial:admonition}}
* Assignment events (cumulative and non-cumulative)
* Exposure events (cumulative and non-cumulative)
* Assignment to exposure conversion
* [Variant jumping](/docs/feature-experiment/troubleshooting/variant-jumping)
* Anonymous exposures (cumulative and non-cumulative)
* [Exposures without Assignments](/docs/feature-experiment/troubleshooting/exposures-without-assignments) (cumulative and non-cumulative)

For more control, open any of these charts in the chart builder.

## Interpret notifications

Expand All @@ -75,10 +133,10 @@

Click the check box next to the desired notification:

* **Experiment end reached:** You will receive this notification when your experiment is complete.
* **SRM detected:** You will receive this notification if a [sample ratio mismatch](/docs/feature-experiment/troubleshooting/sample-ratio-mismatch) issue is identified.
* **Long-running experiments:** You will receive this notification when your long-running experiment is complete.
* **Statsig for the recommendation metric is reached:** You will receive this notification when your experiment's recommendation metric has reached stat sig.
* **Experiment end reached:** Amplitude sends this notification when your experiment is complete.
* **SRM detected:** Amplitude sends this notification if it identifies a [sample ratio mismatch](/docs/feature-experiment/troubleshooting/sample-ratio-mismatch) issue.
* **Long-running experiments:** Amplitude sends this notification when your long-running experiment is complete.
* **Statsig for the recommendation metric is reached:** Amplitude sends this notification when your experiment's recommendation metric has reached stat sig.

Amplitude Experiment sends a notification to the editors of the experiment.
