chore(docs): Section on noir-profiler execution-opcodes #7480

Merged 5 commits on Feb 21, 2025
56 changes: 41 additions & 15 deletions docs/docs/tooling/profiler.md
@@ -1,17 +1,17 @@
---
title: Noir Profiler
description: Learn about the Noir Profiler, how to generate execution flamegraphs, identify bottlenecks, and visualize optimizations.
keywords: [profiling, profiler, flamegraph]
sidebar_position: 0
---

## Noir Profiler

`noir-profiler` is a sampling profiler designed to analyze and visualize Noir programs. It helps developers identify bottlenecks by mapping execution data back to the original source code.

### Installation

`noir-profiler` comes out of the box with [noirup](../getting_started/noir_installation.md). Test that you have the profiler installed by running `noir-profiler --version`.
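
For example, you can confirm the installation with:
```sh
# Print the installed profiler version
noir-profiler --version
```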

### Usage

@@ -33,11 +33,11 @@
```toml
array = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

Running `nargo info` gives us some information about the opcodes produced by this program, but not a lot of detail on its own. Compile and execute this program normally using `nargo compile` and `nargo execute`.
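
If you are following along, that looks like the following (assuming your package is named `program`, so the compiled artifact lands at `./target/program.json`):
```sh
# Compile the package to ./target/program.json
nargo compile
# Execute it with the inputs from Prover.toml to produce a witness
nargo execute
```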

### Generating an ACIR opcode flamegraph

The program on its own is quite high-level. Let's get a more granular look at what is happening by using `noir-profiler`.

After compiling the program, run the following:
```sh
noir-profiler opcodes --artifact-path ./target/program.json --output ./target
```

@@ -87,18 +87,18 @@
<picture>
<img src="../../static/img/tooling/profiler/acir-flamegraph-optimized.png">
</picture>

In the above image we searched for the ACIR opcodes due to `i > ptr` in the source code. Trigger a search by clicking on "Search" in the top right corner of the flamegraph. In the bottom right corner of the image above, you will note that the flamegraph displays the percentage of all opcodes associated with that search. Searching for `memory::op` in the optimized flamegraph will result in no matches. This is because we are no longer using a dynamic array in our circuit. By dynamic array, we mean indexing an array with a dynamic index (a value reliant upon witness inputs). Most of the memory operations have now been replaced with arithmetic operations, as we are reading the two arrays at known constant indices.

### Generate a backend gates flamegraph

Unfortunately, ACIR opcodes do not give us a full picture of where the cost of this program lies.
The `gates` command also accepts a backend binary. In the [quick start guide](../getting_started/quick_start.md#proving-backend) you can see how to get started with the [Barretenberg proving backend](https://github.com/AztecProtocol/aztec-packages/tree/master/barretenberg).

Run the following command:
```sh
noir-profiler gates --artifact-path ./target/program.json --backend-path bb --output ./target
```
`--backend-path` accepts a path to the backend binary. The command above assumes that the backend binary is available on your PATH; if it is not, you will have to pass the binary's absolute path.
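
For instance, if `bb` is not on your PATH, the same command might look like the following. The location below is only an illustration; substitute wherever your installation actually placed the binary:
```sh
# Hypothetical absolute path to the Barretenberg binary
noir-profiler gates --artifact-path ./target/program.json --backend-path /path/to/bb --output ./target
```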

This produces the following flamegraph with 3,737 total backend gates (using `bb` version 0.76.4):
<picture>
@@ -107,21 +107,47 @@

Searching for the ACIR `memory::op` opcodes shows that they account for about 18.2% of the backend gates.

You will notice that the majority of the backend gates come from the ACIR range opcodes. This is due to the way UltraHonk, the backend used in this example, handles range constraints: it uses lookup tables internally for its range gates. These can take up the majority of the gates in a small circuit, but their relative impact shrinks as circuits grow larger. If our array was much larger, range gates would become a much smaller percentage of our total circuit.
Here is an example backend gates flamegraph for the same program in this guide but with an array of size 2048:
<picture>
<img src="../../static/img/tooling/profiler/gates-flamegraph-unoptimized-2048.png">
</picture>
Every backend implements ACIR opcodes differently, so it is important to profile both the ACIR and the backend gates to get a full picture.

Now let's generate a graph for our optimized circuit with an array of size 32. We get the following flamegraph, showing 3,062 total backend gates:
<picture>
<img src="../../static/img/tooling/profiler/gates-flamegraph-optimized.png">
</picture>

In the optimized flamegraph, we searched for the backend gates due to `i > ptr` in the source code. The backend gates associated with this call stack were only 3.8% of the total backend gates, whereas in the ACIR flamegraph that same code was the cause of 43.3% of the ACIR opcodes. This discrepancy reiterates the earlier point about profiling both the ACIR opcodes and the backend gates.

For posterity, here is the flamegraph for the same program with a size 2048 array:
<picture>
<img src="../../static/img/tooling/profiler/gates-flamegraph-optimized-2048.png">
</picture>

### Generate an unconstrained execution trace flamegraph

The profiler also enables developers to generate a flamegraph of the unconstrained execution trace. Noir compiles unconstrained functions down to Brillig bytecode, so here we will be looking at a flamegraph of Brillig opcodes rather than ACIR opcodes.

Let's take our initial program and simply add an `unconstrained` modifier before main (e.g. `unconstrained fn main`). Then run the following command:
```sh
noir-profiler execution-opcodes --artifact-path ./target/program.json --prover-toml-path Prover.toml --output ./target
```
This matches the `opcodes` command, except that we now also pass a `Prover.toml` file so that execution can be profiled with a specific set of inputs.

We will get the following flamegraph with 1,582 opcodes executed:
<picture>
<img src="../../static/img/tooling/profiler/brillig-trace-initial-32.png">
</picture>

Circuit programming (ACIR) is an entirely different execution paradigm compared to regular programming. To demonstrate this point further, let's generate an execution trace for our optimized ACIR program once we have modified `main` to be `unconstrained`.

We then get the following flamegraph with 2,125 opcodes executed:
<picture>
<img src="../../static/img/tooling/profiler/brillig-trace-opt-32.png">
</picture>

In the above graph we are searching for `new_array`, which turns up zero matches in the initial program's trace. In an unconstrained environment, the updated program essentially just adds extra unnecessary checks, which is why we see a longer execution trace.

`execution-opcodes` is useful when you are searching for bottlenecks in unconstrained code. This can be especially meaningful for optimizing witness generation: even though unconstrained execution lets us skip proving steps, we still need to compute the relevant inputs and outputs outside of the circuit before proving.