[tutorials] Reflect recent changes in documentation
This mainly updates the documentation of the "Introduction" tutorial,
reflecting recent changes in file layout and file paths and fixing a
couple typos along the way. Frontend scripts for the tutorial are also
updated to reflect filepath changes. A .gitignore is also added to the
tutorial's subfolder so that git ignores output directories generated
during the tutorial.
lucas-rami committed Feb 18, 2024
1 parent 35693c1 commit 02b5c17
Showing 8 changed files with 52 additions and 40 deletions.
32 changes: 16 additions & 16 deletions docs/Tutorials/Introduction/ModifyingDynamatic.md
@@ -3,7 +3,7 @@
This tutorial logically follows the [Using Dynamatic](UsingDynamatic.md) tutorial, and as such requires that you are already familiar with the concepts touched on in the latter. In this tutorial, we will write a small compiler optimization pass in C++ that will transform dataflow muxes into merges in an attempt to optimize our circuits' area and throughput. While we will write a little bit of C++ in this tutorial, it does not require much knowledge of the language.

Below are some technical details about this tutorial.
- All resources are located in the repository's [`tutorials/Introduction/`](../../../tutorials/Introduction/) folder. Data exclusive to this chapter is located in the [`ModifyingDynamatic`](../../../tutorials/Introduction/ModifyingDynamatic/) subfolder, but we will also reuse data from the previous [`UsingDynamatic`](../../../tutorials/Introduction/UsingDynamatic/) part.
- All resources are located in the repository's [`tutorials/Introduction/`](../../../tutorials/Introduction/) folder. Data exclusive to this chapter is located in the [`Ch2`](../../../tutorials/Introduction/Ch2/) subfolder, but we will also reuse data from the [previous chapter](../../../tutorials/Introduction/Ch1/).
- All relative paths mentioned throughout the tutorial are assumed to start at Dynamatic's top-level folder.
- We assume that you have already built Dynamatic from source using the instructions in the top-level [README](../../../README.md) or that you have access to a Docker container that has a pre-built version of Dynamatic.

@@ -16,7 +16,7 @@ This tutorial is divided into the following sections.

## Spotting an optimization opportunity

Let's start by re-considering the [same `loop_accumulate` kernel](../../../tutorials/Introduction/UsingDynamatic/loop_accumulate.c) from the previous tutorial. See its definition below.
Let's start by re-considering the [same `loop_accumulate` kernel](../../../tutorials/Introduction/Ch1/loop_accumulate.c) from the previous tutorial. See its definition below.

```c
// The kernel under consideration
@@ -32,10 +32,10 @@ unsigned loop_accumulate(in_int_t a[N]) {
This simple kernel multiplies a number by itself at each iteration of a simple loop from 0 to any number `N` where the corresponding element of an array equals 0. The function returns the accumulated value after the loop exits.
If you have deleted the data generated by the synthesis flow on this kernel, you can regenerate it fully using the [`loop_accumulate.sh`](../../../tutorials/Introduction/ModifyingDynamatic/loop_accumulate.sh) that has already been written for you. Just run the following command from Dynamatic's top-level folder.
If you have deleted the data generated by the synthesis flow on this kernel, you can regenerate it fully using the [`loop-accumulate.dyn`](../../../tutorials/Introduction/Ch2/loop-accumulate.dyn) that has already been written for you. Just run the following command from Dynamatic's top-level folder.
```sh
./bin/dynamatic --run tutorials/Introduction/ModifyingDynamatic/loop_accumulate.sh
./bin/dynamatic --run tutorials/Introduction/Ch2/loop-accumulate.dyn
```

This will compile the C kernel, functionally verify the generated VHDL, and generate data for the dataflow visualizer. Note the `[INFO] Simulation succeeded` message in the output (after the `simulate` command), indicating that outputs of the VHDL design matched those of the original C kernel. All output files are generated in [`tutorials/Introduction/usingDynamatic/out`](../../../tutorials/Introduction/usingDynamatic/out).
@@ -305,14 +305,14 @@ void loop_store(inout_int_t a[N]) {
}
```
You can find the source code of this function in [`tutorials/Introduction/ModifyingDynamatic/loop_store.c`](../../../tutorials/Introduction/ModifyingDynamatic/loop_store.c).
You can find the source code of this function in [`tutorials/Introduction/Ch2/loop_store.c`](../../../tutorials/Introduction/Ch2/loop_store.c).
This has the same rough structure as our previous example, except that now the kernel stores the squared iteration index in the array at each iteration where the corresponding array element is 0; otherwise it stores the index itself.
Now run the [`tutorials/Introduction/ModifyingDynamatic/loop_store.sh`](../../../tutorials/Introduction/ModifyingDynamatic/loop_store.sh) frontend script. It is almost identical to the previous frontend script we used; its only difference is that it synthesizes `loop_store.c` instead of `loop_accumulate.c`.
Now run the [`tutorials/Introduction/Ch2/loop-store.dyn`](../../../tutorials/Introduction/Ch2/loop-store.dyn) frontend script. It is almost identical to the previous frontend script we used; its only difference is that it synthesizes `loop_store.c` instead of `loop_accumulate.c`.
```sh
./bin/dynamatic --run tutorials/Introduction/ModifyingDynamatic/loop_store.sh
./bin/dynamatic --run tutorials/Introduction/Ch2/loop-store.dyn
```

Observe the frontend's output when running `simulate`. You should see the following.
@@ -329,9 +329,9 @@ dynamatic> simulate
That's bad! It means that the content of the kernel's input array `a` was different after execution of the C code and after simulation of the generated VHDL design for it. Our optimization broke something in the dataflow circuit, yielding an incorrect result.

> [!TIP]
> If you would like, you can make sure that it is indeed our new pass that broke the circuit by removing the `--handshake-mux-to-merge` flag from the [`compile.sh` script](../../../tools/dynamatic/scripts/compile.sh) and re-running the [`loop_store.sh` script](../../../tutorials/Introduction/ModifyingDynamatic/loop_store.sh). You will see that the frontend prints `[INFO] Simulation succeeded` instead of the failure message we just saw.
> If you would like, you can make sure that it is indeed our new pass that broke the circuit by removing the `--handshake-mux-to-merge` flag from the [`compile.sh` script](../../../tools/dynamatic/scripts/compile.sh) and re-running the [`loop-store.dyn` frontend script](../../../tutorials/Introduction/Ch2/loop-store.dyn). You will see that the frontend prints `[INFO] Simulation succeeded` instead of the failure message we just saw.
Let's go check the `simulate` command's output folder to see the content of the array `a` before and after the kernel. First, open the file `tutorials/Introduction/ModifyingDynamatic/out/sim/INPUT_VECTORS/input_a.dat`. This contains the initial content of array `a` before the kernel executes. Each line between the `[[transaction]]` tags represents one element of the array, in order. As you can see, elements at even indices have value 0 whereas elements at odd indices have value 1.
Let's go check the `simulate` command's output folder to see the content of the array `a` before and after the kernel. First, open the file `tutorials/Introduction/Ch2/out/sim/INPUT_VECTORS/input_a.dat`. This contains the initial content of array `a` before the kernel executes. Each line between the `[[transaction]]` tags represents one element of the array, in order. As you can see, elements at even indices have value 0 whereas elements at odd indices have value 1.

```
[[[runtime]]]
@@ -350,7 +350,7 @@ Let's go check the `simulate` command's output folder to see the content of the
[[[/runtime]]]
```

Looking back at our C kernel, we then should expect that every element at an even index becomes the square of its index, whereas elements at an odd index become their index. This is indeed what we see in `tutorials/Introduction/ModifyingDynamatic/out/sim/C_OUT/output_a.dat`, which stores the array's content after kernel execution.
Looking back at our C kernel, we then should expect that every element at an even index becomes the square of its index, whereas elements at an odd index become their index. This is indeed what we see in `tutorials/Introduction/Ch2/out/sim/C_OUT/output_a.dat`, which stores the array's content after kernel execution.

```
[[[runtime]]]
@@ -370,7 +370,7 @@ Looking back at our C kernel, we then should expect that every element at an eve
```
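
The expected content can also be reproduced by hand. Below is a minimal sketch of a reference model for the behavior described above: where the element is 0 the squared index is stored, otherwise the index itself. The function names `loop_store_model` and `fill_alternating`, their signatures, and the explicit length parameter are assumptions made for illustration; the actual kernel is the one in `loop_store.c`.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not part of the tutorial): fills the array with the
// input pattern described in the text, 0 at even indices and 1 at odd indices.
void fill_alternating(unsigned a[], size_t n) {
  assert(a != NULL);
  for (size_t i = 0; i < n; ++i)
    a[i] = (unsigned)(i % 2);
}

// Hypothetical reference model of the described behavior: where the element
// is 0, store the squared index; otherwise store the index itself.
void loop_store_model(unsigned a[], size_t n) {
  assert(a != NULL);
  for (size_t i = 0; i < n; ++i) {
    unsigned idx = (unsigned)i;
    a[i] = (a[i] == 0) ? idx * idx : idx;
  }
}
```

Calling `fill_alternating` and then `loop_store_model` on an array of eight elements leaves it holding 0, 1, 4, 3, 16, 5, 36, 7, which is the even-squared, odd-unchanged pattern described above.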

> [!TIP]
> Let's now see what the array `a` looks like after simulation of our dataflow circuit. Open `tutorials/Introduction/ModifyingDynamatic/out/sim/VHDL_OUT/output_a.dat` and compare it with the C output.
> Let's now see what the array `a` looks like after simulation of our dataflow circuit. Open `tutorials/Introduction/Ch2/out/sim/VHDL_OUT/output_a.dat` and compare it with the C output.
```
[[[runtime]]]
@@ -389,7 +389,7 @@ Looking back at our C kernel, we then should expect that every element at an eve
[[[/runtime]]]
```

This is significantly different! It looks like elements are shuffled compared to the expected output, as if they were being reordered by the circuit. Let's open the dataflow visualizer on this dataflow circuit and try to find out what happened. The DOT and CSV files for this dataflow circuit are located in `tutorials/Introduction/ModifyingDynamatic/out/visual`.
This is significantly different! It looks like elements are shuffled compared to the expected output, as if they were being reordered by the circuit. Let's open the dataflow visualizer on this dataflow circuit and try to find out what happened. The DOT and CSV files for this dataflow circuit are located in `tutorials/Introduction/Ch2/out/visual`.

> [!TIP]
> As the simulation's output indicates, the array's content is wrong even at the first iteration. We expect 0 to be stored in the array but instead we get a 1. To debug this problem, iterate through the simulation's cycles and locate the first time that the store port (`mc_store0`) transfers a token to the memory controller (`mem_controller0`). Then, from the circuit's structure, infer which input to the `mc_store0` node is the store address, and which is the store data.
@@ -401,13 +401,13 @@ We are especially interested in the store's data input, since it is the one feed
By replacing the mux previously in the place of `merge10`, we caused data tokens to arrive reordered at the store port, hence creating incorrect writes to memory! This is due to the fact that the loop's throughput is much higher when the if branch is not taken, since the multiplier has a latency of 4 cycles while most of our other components have 0 sequential latency.
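
The reordering can be illustrated with a toy timing model. The sketch below is not Dynamatic code and makes strong simplifying assumptions: one loop iteration starts per cycle, iterations whose array element is 0 go through a multiplier with a latency of 4 cycles, and every other path has zero latency. All names (`MUL_LATENCY`, `arrival_cycle`, `mux_order`, `merge_order`) are hypothetical. Under these assumptions, a mux forwards tokens in program order, whereas a merge forwards whichever token arrives first.

```c
#include <assert.h>
#include <stddef.h>

// Assumed latency of the multiplier on the "element equals 0" path.
enum { MUL_LATENCY = 4 };

// Cycle at which iteration i's data token reaches the mux/merge, assuming
// iteration i starts at cycle i.
static unsigned arrival_cycle(const unsigned a[], size_t i) {
  return (unsigned)i + (a[i] == 0 ? MUL_LATENCY : 0);
}

// A mux is steered by its select input, so in this model it emits tokens in
// program order regardless of when they arrive.
void mux_order(size_t n, size_t order[]) {
  assert(order != NULL);
  for (size_t i = 0; i < n; ++i)
    order[i] = i;
}

// A merge emits whichever token shows up first. This is modeled as a stable
// insertion sort of the iteration indices by arrival cycle.
void merge_order(const unsigned a[], size_t n, size_t order[]) {
  assert(a != NULL && order != NULL);
  for (size_t i = 0; i < n; ++i) {
    size_t j = i;
    while (j > 0 && arrival_cycle(a, order[j - 1]) > arrival_cycle(a, i)) {
      order[j] = order[j - 1];
      --j;
    }
    order[j] = i;
  }
}
```

With the alternating input `{0, 1, 0, 1, 0, 1, 0, 1}`, `merge_order` yields the iteration order 1, 3, 0, 5, 2, 7, 4, 6 in this model. The first data token to leave the merge then comes from iteration 1, while store addresses still arrive in program order, which is consistent with a 1 being written where a 0 was expected.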

Let's go verify that we are correct by manually modifying the IR that ultimately gets transformed into the dataflow circuit and re-simulating. Open the `tutorials/Introduction/ModifyingDynamatic/out/comp/handshake_export.mlir` MLIR file. It contains the last version of MLIR-formatted IR that gets transformed into a Graphviz-formatted file and then into a VHDL design. While the syntax may be a bit daunting at first, do not worry as we will only modify two lines to "revert" the transformation of the mux into `merge10`. The tutorial's goal is not to teach you MLIR syntax, so we will not go into detail about how the IR is formatted in text. To give you an idea, the syntax of an operation is usually as follows.
Let's go verify that we are correct by manually modifying the IR that ultimately gets transformed into the dataflow circuit and re-simulating. Open the `tutorials/Introduction/Ch2/out/comp/handshake_export.mlir` MLIR file. It contains the last version of MLIR-formatted IR that gets transformed into a Graphviz-formatted file and then into a VHDL design. While the syntax may be a bit daunting at first, do not worry as we will only modify two lines to "revert" the transformation of the mux into `merge10`. The tutorial's goal is not to teach you MLIR syntax, so we will not go into detail about how the IR is formatted in text. To give you an idea, the syntax of an operation is usually as follows.

```mlir
<SSA results> = <operation name> <SSA operands> {<operation attributes>} : <return types>
```

Back to ou faulty IR; on line 31, you should see the following.
Back to our faulty IR; on line 31, you should see the following.

```mlir
%23 = merge %22, %16 {bb = 3 : ui32, name = #handshake.name<"merge10">} : i10
@@ -431,10 +431,10 @@ Replace it with
%32, %muxIndex = control_merge %trueResult_2, %falseResult_3 {bb = 3 : ui32, name = #handshake.name<"my_control_merge">} : none, i1
```

And you are done! For convenience we provide a little shell script that will only run the part of the synthesis flow that comes after this file is generated. It will regenerate the VHDL design from the MLIR file, simulate it, and prepare data for the visualizer. From Dynamatic's top-level folder, run
And you are done! For convenience we provide a little shell script that will only run the part of the synthesis flow that comes after this file is generated. It will regenerate the VHDL design from the MLIR file, simulate it, and prepare data for the visualizer. From Dynamatic's top-level folder, run the provided shell script

```sh
./tutorials/Introduction/ModifyingDynamatic/partial-flow.sh
./tutorials/Introduction/Ch2/partial-flow.sh
```

You should now see that simulation succeeds!