Simulating write-before-read memory #11

Closed
askvortsov1 opened this issue Jul 23, 2021 · 6 comments

@askvortsov1

askvortsov1 commented Jul 23, 2021

We're running into a bit of trouble simulating our register file. We'd like to implement it as "write before read", so that we can write to a register in writeback and read from the same register in decode without needing to explicitly forward writeback => decode. By default, it looks like synchronous reads occur before writes. ram_wbr doesn't work for us because it only supports one read port. We've extended it to 2 read ports, and that works. We've confirmed this by generating RTL and simulating it in Vivado.

However, in our Hardcaml simulator tests, the written data is only read on the next cycle. My guess is that this occurs because Cyclesim only reads at the start of every clock cycle, and that it might be mitigated by reading at every clock edge. We haven't found a configuration option for evaluating the signal graph twice per cycle, though. Is there support for simulating write-before-read memory in Hardcaml?

@andrewray

There are 2 types of memory in Hardcaml - asynchronously read, and synchronously read.

The former include the memory and (newer) multiport_memory primitives.

The synchronous ones are ram_wbr and functions in the Ram module. The notion of read-before-write and write-before-read only exists in the synchronous case.

By synchronous we mean it takes 1 clock cycle to read the data.
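To make the distinction concrete, here is a small behavioural model in plain OCaml. It is not Hardcaml code, and the names `sync_ram`, `clock_edge` and `collision_mode` are invented for illustration. It assumes a single-port synchronous RAM whose read output is registered, so the data only becomes visible after the clock edge, and the collision mode merely decides which value that register captures when the same address is written and read on one edge.

```ocaml
(* Behavioural sketch of a synchronously read RAM; illustrative only, not the
   Hardcaml implementation. All names here are hypothetical. *)
type collision_mode =
  | Read_before_write
  | Write_before_read

type sync_ram =
  { mem : int array
  ; mutable q : int (* registered read output, visible after the clock edge *)
  }

let create size = { mem = Array.make size 0; q = 0 }

(* One rising clock edge: the inputs were set up before the edge, and the
   returned value is what the read port shows after it. *)
let clock_edge mode ram ~write_enable ~write_address ~write_data ~read_address =
  let old_data = ram.mem.(read_address) in
  if write_enable then ram.mem.(write_address) <- write_data;
  let new_data = ram.mem.(read_address) in
  ram.q
  <- (match mode with
      | Read_before_write -> old_data
      | Write_before_read -> new_data);
  ram.q

let () =
  let rbw = create 32 and wbr = create 32 in
  let edge mode ram =
    clock_edge mode ram ~write_enable:true ~write_address:3 ~write_data:42 ~read_address:3
  in
  Printf.printf "read-before-write: %d\n" (edge Read_before_write rbw);
  Printf.printf "write-before-read: %d\n" (edge Write_before_read wbr)
```

In both modes the result is only available after the edge. The modes differ only in whether a colliding read returns the old contents or the freshly written value.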

I don't think this issue is caused by the simulator, and I suspect the function you want exists as

Ram.create
      ~collision_mode:Write_before_read ~size:32 
       ~write_ports:[| one_write_port |] ~read_ports:[| two_read_ports |]

That said you say:

However, in our Hardcaml simulator tests, the written data is only read on the next cycle.

That has to be the case for a synchronous read RAM (there is a register in the read path; without it you are not describing a RAM that can be implemented on the FPGA). If what you want is an asynchronous read RAM, but with write before read, you will need to construct that by detecting the collision outside the memory, i.e.

let q = async_memory_instantiation .... data_in in
let q = mux2 (memory_is_writing &: (write_address ==: read_address)) data_in q in

@github-iron github-iron added the forwarded-to-js-devs This report has been forwarded to Jane Street's internal review system. label Jul 23, 2021
@askvortsov1
Author

Looking back over our implementation again, I forgot to mention that we made a tweak to ram_wbr:

let delay_address_to_falling clock address = 
  let spec = Reg_spec.create ~clock:clock () in
  let spec_on_falling = Reg_spec.override ~clock_edge:Edge.Falling spec in
  reg ~enable:vdd spec_on_falling address

let regfile rs rt clock write_enable write_address write_data =
  let write_port =
    { write_clock = clock; write_address; write_enable; write_data }
  in
  let number_of_regs = 32 in
  let delay_address = delay_address_to_falling clock in
  let regs =
    multiport_memory number_of_regs ~write_ports:[|write_port|] ~read_addresses:[| delay_address rs; delay_address rt |]
  in
  (Array.get regs 0, Array.get regs 1)

In particular, instead of using reg_empty, we configured our read delay register spec to trigger on the falling clock edge. As a result, the write should occur during the first half of a clock cycle, and the read during the second. The RTL generator represented this correctly:

...
    always @(negedge _12) begin
        _31 <= _27;
    end
    always @(posedge _12) begin
        if (_6)
            _36[_10] <= _8;
    end
    assign _12 = clock;
    assign _32 = _23[4:0];
    always @(negedge _12) begin
        _35 <= _32;
    end
...

And Vivado simulation confirmed that it can read/write as expected in the same clock cycle.
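The intended ordering can be sketched as a half-cycle model in plain OCaml. This is an illustration of the scheme described above, not Hardcaml code and not a description of how Cyclesim works. The names `regfile`, `rising_edge`, `falling_edge` and `read` are hypothetical.

```ocaml
(* Half-cycle sketch of the dual-edge register file; illustrative only.
   All names are hypothetical. *)
type regfile =
  { regs : int array
  ; mutable read_address_q : int (* read address latched on the falling edge *)
  }

let create () = { regs = Array.make 32 0; read_address_q = 0 }

(* Rising edge: commit the write, as in the posedge always block. *)
let rising_edge rf ~write_enable ~write_address ~write_data =
  if write_enable then rf.regs.(write_address) <- write_data

(* Falling edge: latch the read address, as in the negedge always blocks. *)
let falling_edge rf ~read_address = rf.read_address_q <- read_address

(* Asynchronous read through the latched address. *)
let read rf = rf.regs.(rf.read_address_q)

let () =
  let rf = create () in
  rising_edge rf ~write_enable:true ~write_address:5 ~write_data:99;
  falling_edge rf ~read_address:5;
  Printf.printf "read in second half of the cycle: %d\n" (read rf)
```

Because the write commits on the rising edge and the address is latched half a cycle later, the read in the second half of the cycle sees the value written in the first half.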

@andrewray

Right. This is a limitation of the Hardcaml simulator. We only support the usual synchronous logic design pattern which implies only using one edge of the clock. It's actually probably not that hard for us to support both edges except

  • Simulation performance would effectively halve, without some real effort
  • I generally don't like playing games with clocks except when absolutely necessary - simple single clock, single edge semantics make the implementation process very much easier

I still expect we can build something reasonable for a register file though.

open! Import
open Signal
open Hardcaml_waveterm

module I = struct
  type 'a t =
    { clock : 'a
    ; write_enable : 'a
    ; write_data : 'a [@bits 8]
    ; write_address : 'a [@bits 5]
    ; read_address_0 : 'a [@bits 5]
    ; read_address_1 : 'a [@bits 5]
    }
  [@@deriving sexp_of, hardcaml]
end

module O = struct
  type 'a t =
    { read_data_0 : 'a [@bits 8]
    ; read_data_1 : 'a [@bits 8]
    }
  [@@deriving sexp_of, hardcaml]
end

let reg_file
      { I.clock; write_enable; write_data; write_address; read_address_0; read_address_1 }
  =
  let q =
    multiport_memory
      32
      ~write_ports:[| { write_clock = clock; write_address; write_data; write_enable } |]
      ~read_addresses:[| read_address_0; read_address_1 |]
  in
  { O.read_data_0 =
      mux2 (write_enable &: (write_address ==: read_address_0)) write_data q.(0)
  ; read_data_1 =
      mux2 (write_enable &: (write_address ==: read_address_1)) write_data q.(1)
  }
;;

module Sim = Cyclesim.With_interface (I) (O)

let%expect_test "" =
  let sim = Sim.create reg_file in
  let waves, sim = Waveform.create sim in
  let inputs = Cyclesim.inputs sim in
  let step write_address write_data read_address_0 read_address_1 =
    inputs.write_address := Bits.of_int ~width:5 write_address;
    inputs.write_data := Bits.of_int ~width:8 write_data;
    inputs.write_enable := Bits.vdd;
    inputs.read_address_0 := Bits.of_int ~width:5 read_address_0;
    inputs.read_address_1 := Bits.of_int ~width:5 read_address_1;
    Cyclesim.cycle sim;
    inputs.write_enable := Bits.vdd
  in
  step 10 100 0 0;
  step 11 101 10 11;
  step 12 102 12 11;
  Cyclesim.cycle sim;
  Waveform.print waves ~display_height:25;
  [%expect
    {|
    ┌Signals────────┐┌Waves──────────────────────────────────────────────┐
    │clock          ││┌───┐   ┌───┐   ┌───┐   ┌───┐   ┌───┐   ┌───┐   ┌──│
    │               ││    └───┘   └───┘   └───┘   └───┘   └───┘   └───┘  │
    │               ││────────┬───────┬───────────────                   │
    │read_address_0 ││ 00     │0A     │0C                                │
    │               ││────────┴───────┴───────────────                   │
    │               ││────────┬───────────────────────                   │
    │read_address_1 ││ 00     │0B                                        │
    │               ││────────┴───────────────────────                   │
    │               ││────────┬───────┬───────────────                   │
    │write_address  ││ 0A     │0B     │0C                                │
    │               ││────────┴───────┴───────────────                   │
    │               ││────────┬───────┬───────────────                   │
    │write_data     ││ 64     │65     │66                                │
    │               ││────────┴───────┴───────────────                   │
    │write_enable   ││────────────────────────────────                   │
    │               ││                                                   │
    │               ││────────┬───────┬───────────────                   │
    │read_data_0    ││ 00     │64     │66                                │
    │               ││────────┴───────┴───────────────                   │
    │               ││────────┬───────────────────────                   │
    │read_data_1    ││ 00     │65                                        │
    │               ││────────┴───────────────────────                   │
    │               ││                                                   │
    └───────────────┘└───────────────────────────────────────────────────┘ |}]
;;
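As a sanity check on the waveform, the same bypass logic can be modelled in plain OCaml without Hardcaml. This is only a sketch: `regs` and `step` are stand-ins invented here, the write enable is assumed to be held high as in the test above, and the memory is assumed to start zeroed.

```ocaml
(* Plain-OCaml cross-check of the bypassed register file; illustrative only. *)
let regs = Array.make 32 0

(* One simulated cycle: compute both read outputs combinationally from the
   current inputs, then commit the write at the clock edge. *)
let step write_address write_data read_address_0 read_address_1 =
  let write_enable = true in
  let read addr =
    (* Bypass: a read of the address being written returns the new data. *)
    if write_enable && write_address = addr then write_data else regs.(addr)
  in
  let q0 = read read_address_0 and q1 = read read_address_1 in
  if write_enable then regs.(write_address) <- write_data;
  q0, q1

let () =
  List.iter
    (fun (wa, wd, ra0, ra1) ->
       let q0, q1 = step wa wd ra0 ra1 in
       Printf.printf "read_data_0=%02X read_data_1=%02X\n" q0 q1)
    [ 10, 100, 0, 0; 11, 101, 10, 11; 12, 102, 12, 11 ]
```

The three steps give 00/00, 64/65 and 66/65 for read_data_0/read_data_1, matching the values shown in the waveform.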

@askvortsov1
Author

Right. This is a limitation of the Hardcaml simulator. We only support the usual synchronous logic design pattern which implies only using one edge of the clock. It's actually probably not that hard for us to support both edges except

  • Simulation performance would effectively halve, without some real effort
  • I generally don't like playing games with clocks except when absolutely necessary - simple single clock, single edge semantics make the implementation process very much easier

Makes sense, thanks for the clarification! Would it make sense to add a comment discouraging falling edge use to edge.mli?

  { O.read_data_0 =
      mux2 (write_enable &: (write_address ==: read_address_0)) write_data q.(0)
  ; read_data_1 =
      mux2 (write_enable &: (write_address ==: read_address_1)) write_data q.(1)
  }

We initially planned to explicitly forward writeback => decode as a workaround. This is much neater, thanks!

@andrewray

I will update the simulator documentation - this is not said explicitly enough there.

@andrewray

Closing this out. There is some explicit new documentation for the simulator features and restrictions which will appear soon.
