You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
It can also be said to be caused by insufficient consideration, as long as vstart is not 0 but a legal but larger value, the calculation of the address of the mask operand queue will be wrong.
To illustrate, when the system is configured with VLEN=2048, set EW to the minimum value of 8, and LMUL to 8, the maximum vlmax can reach is 8 (LMUL) * 2048 (VLEN) / 8 (EW) = 2048.
According to the SPEC, vstart is specified to be valid as long as it is less than vlmax-1. Therefore, vstart can go up to 2047, assuming 2044 for this example.
In the main sequencer, the prepared vstart value to be allocated to each lane, when NrLanes is set to the default value of 4, would be 511 (2044 divided by 4).
However, in the process of sending this vstart from the main sequencer to operand_requester_i[MaskM] in the lane sequencer, no adjustments are made.
Consequently, when it is sent to requester_d[6] (corresponding to the mask operand queue) within operand_requester, the calculation for the addr is as follows:
It is worth noting that the lowest 3 bits are used for calculating the bank address.
Consequently, the value in [5:3] affects which entry in the bank is accessed.
In the example provided, the original intention was to access the mask register v0, but due to the vstart adjustment not being made, it now accesses v0 + 7 = v7 in bank 7.
This outcome leads to an error in lines 738-740 within the Mask Unit (the function is to copy the mask operand).
This error further impacts the final mask_i (the signal name in the mask unit is mask_o) sent to various lanes in the VFU.
This results in incorrect masking operations for instructions with the enable VM bit, affecting the final write-back to the VRF, causing incorrect byte enable values (be).
My suggested modifications are as follows:
You can right-shift the vstart of operand_requester_i[MaskM] in the lane sequencer from the main sequencer by (EW64 - int'(pe_req.vtype.vsew))
(this modification may be simpler).
Alternatively, you can keep the original operation from the main sequencer to the lane sequencer unchanged.
However, when for loop i equals 6 and you were originally changing the shift value for operand_requester (specifically, when addressing requester_d[6]), instead of altering it to access different registers, you can modify addr[5:3] to address the bank.
The idea behind this is that in the same context (VLEN = 2048, NrLanes = 4), when LMUL is 1 and vstart is 0, the Mask unit needs data from Bank 0 in each lane. This means it requires the lower 256 bits of the entire VLEN = 2048 v0 register.
However, when LMUL is 8, vlmax is 2048, and vstart is 1792 (2048-256), at this point, it needs only the highest 256 bits from the v0 register as the mask value.
Therefore, you can repurpose the original shift register values ([5:3]) to determine the bank shift value for masking operands.
In conclusion, the current issue, as per the RVV SPEC, is in Chapter 4.5, where the mask register is not independent of LMUL.
The text was updated successfully, but these errors were encountered:
It can also be said to be caused by insufficient consideration, as long as
vstart
is not 0 but a legal but larger value, the calculation of the address of the mask operand queue will be wrong.To illustrate, when the system is configured with
VLEN=2048
, setEW
to the minimum value of 8, andLMUL
to 8, the maximumvlmax
can reach is 8 (LMUL) * 2048 (VLEN) / 8 (EW) = 2048.According to the SPEC,
vstart
is specified to be valid as long as it is less thanvlmax-1
. Therefore,vstart
can go up to 2047, assuming 2044 for this example.In the main sequencer, the prepared
vstart
value to be allocated to each lane, when NrLanes is set to the default value of 4, would be 511 (2044 divided by 4).However, in the process of sending this
vstart
from the main sequencer tooperand_requester_i[MaskM]
in the lane sequencer, no adjustments are made.Consequently, when it is sent to
requester_d[6]
(corresponding to the mask operand queue) withinoperand_requester
, the calculation for theaddr
is as follows:It is worth noting that the lowest 3 bits are used for calculating the bank address.
Consequently, the value in [5:3] affects which entry in the bank is accessed.
In the example provided, the original intention was to access the mask register
v0
, but due to thevstart
adjustment not being made, it now accessesv0 + 7 = v7
in bank 7.This outcome leads to an error in lines 738-740 within the Mask Unit (the function is to copy the mask operand).
This error further impacts the final
mask_i
(the signal name in the mask unit ismask_o
) sent to various lanes in the VFU.This results in incorrect masking operations for instructions with the enable VM bit, affecting the final write-back to the VRF, causing incorrect byte enable values (
be
).My suggested modifications are as follows:
You can right-shift the vstart of operand_requester_i[MaskM] in the lane sequencer from the main sequencer by
(EW64 - int'(pe_req.vtype.vsew))
(this modification may be simpler).
Alternatively, you can keep the original operation from the main sequencer to the lane sequencer unchanged.
However, when for loop i equals 6 and you were originally changing the shift value for operand_requester (specifically, when addressing requester_d[6]), instead of altering it to access different registers, you can modify addr[5:3] to address the bank.
The idea behind this is that in the same context (
VLEN = 2048
,NrLanes = 4
), whenLMUL
is 1 andvstart
is 0, the Mask unit needs data from Bank 0 in each lane. This means it requires the lower 256 bits of the entireVLEN
= 2048 v0 register.However, when
LMUL
is 8,vlmax
is 2048, andvstart
is 1792 (2048-256), at this point, it needs only the highest 256 bits from the v0 register as the mask value.Therefore, you can repurpose the original shift register values ([5:3]) to determine the bank shift value for masking operands.
In conclusion, the current issue, as per the RVV SPEC, is in Chapter 4.5, where the mask register is not independent of LMUL.
The text was updated successfully, but these errors were encountered: