Using Tracedoctor

Tracedoctor was developed by the legendary Björn Gottschall to profile the execution of benchmarks on BOOM, and thereby make a perfect reference model for performance profiling. It can also be used to trace information out of FireSim configurations in a more generic manner. This guide shows what to modify and where, and how to process the resulting data in a useful way.

What is Tracedoctor?

Tracedoctor is an interface for tracing data out of FireSim runs, and it comes with a suite of useful predefined configurations. Every cycle, Tracedoctor writes out whatever data you tell it to, and then hands this data to a configured worker that can either process events or write the data directly to a binary file. This can be used to count events, or to debug when full waveforms are impractical because the problem is isolated to a small part of the execution.

What git versions?

At the time of writing, I am on the following GitHub branches and commits:

chipyard: eecs/shadow-binding - c8dc31b7b5a8a7aac943fa76c39efc1a1d4d7f66

firesim: eecs/shadow-binding - b2a065af98f5bbe77814ebae8c293103e5e81f29

boom: eecs/shadow-binding - 2873486cea09c18d745eec8473161316d44b3abb

Configuring Tracedoctor

Tracedoctor will write out whatever data you tell it to. You drive this data directly onto the .bits field of io.traceDoctor. In any cycle where the .valid bit of the same interface is high, Tracedoctor writes out the contents of .bits; when .valid is low, nothing is written. It is recommended to define your own TraceBundle somewhere in core.scala that holds all the fields you want, and then drive a register of that type with the information. For example:

class TraceBundle(implicit p: Parameters) extends BoomBundle {

  val traceTimestamp = UInt(64.W)
 
  val ldq_head = UInt(ldqAddrSz.W)
  val ldq_tail = UInt(ldqAddrSz.W)
}

val traceData = Reg(new TraceBundle)

traceData.traceTimestamp        := traceTimestamp(63, 0).pad(64)
traceData.ldq_head              := io.lsu.ldq_head
traceData.ldq_tail              := io.lsu.ldq_tail

io.traceDoctor.valid := true.B
io.traceDoctor.bits := traceData.asUInt.asBools

This will write out the timestamp, ldq_head and ldq_tail every cycle, as the valid bit is always high. You can extend the TraceBundle with any number of signals, up to a total bit width of 512. For Tracedoctor to work, the configured trace width must match the width of your bundle. This means checking the condition if (io.traceDoctor.traceWidth >= $1) and making sure that $1 is equal to the total width of your TraceBundle. Likewise, in class WithTraceDoctorIO(traceWidth: Int = $1) under /generators/chipyard/src/main/scala/config/fragments, $1 must also be equal to the total width of your TraceBundle.
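The value that $1 has to match can be double-checked by summing the field widths by hand. The following is a minimal sketch in Python (not part of Tracedoctor itself); the names FIELD_WIDTHS, MAX_TRACE_WIDTH and trace_width are purely illustrative, and it assumes ldqAddrSz is 4, matching the widths used in process_trace.py further down.

```python
# Hypothetical helper: sum the widths of the TraceBundle fields.
# Assumption: ldqAddrSz == 4; adjust the widths to your own configuration.
MAX_TRACE_WIDTH = 512  # upper limit on the total bit width stated above

FIELD_WIDTHS = {
    "traceTimestamp": 64,
    "ldq_head": 4,
    "ldq_tail": 4,
}


def trace_width(field_widths):
    """Return the total bundle width in bits, refusing bundles that are too wide."""
    total = sum(field_widths.values())
    if total > MAX_TRACE_WIDTH:
        raise ValueError(f"bundle is {total} bits wide, limit is {MAX_TRACE_WIDTH}")
    return total


print(trace_width(FIELD_WIDTHS))
```

Under these assumptions the result is the number to use for $1 in both places mentioned above.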

Running Tracedoctor

After building your FireSim bitstream and flashing the board, you need to tell the FireSim run to actually use Tracedoctor. For my work, I used the following command:

./FireSim-alveo +permissive +mm_relaxFunctionalModel_0=0 +mm_writeMaxReqs_0=10 +mm_readMaxReqs_0=10 +mm_writeLatency_0=30 +mm_readLatency_0=30 +slotid=af +blkdev0=/cluster/home/amundbk/Benchmarks/Spec2017/spec2017-test.img +tracedoctor-trigger=tracerv +tracedoctor-buffer=256,8 +tracedoctor-worker=filer,file:trace.zst +trace-start=0 +trace-end=-1 +trace-select=1 +tracefile=test.txt +permissive-off /cluster/home/amundbk/Benchmarks/Spec2017/spec2017-test-bin

Tracedoctor piggybacks on the TracerV configuration to decide when to run (tracedoctor-trigger=tracerv). This command sets up a filer (write to file) as the worker, names the output file (trace.zst, where the .zst extension ensures it is saved in compressed format), says that tracing should start from the beginning (trace-start=0), and that tracing should never terminate (trace-end=-1). The trace-select=1 argument additionally enables TracerV to run with instruction tracing, which is useful for getting the PC of the last committed instructions. I tend to enable TracerV whenever I enable Tracedoctor. Leaving out the trace-select option should disable this.
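When launching many runs it can help to generate the tracing arguments rather than editing the long command by hand. The sketch below only reassembles the plusargs shown in the command above; the function tracedoctor_plusargs and its parameter names are hypothetical, and the buffer setting is copied verbatim from the command since its meaning is not covered in this guide.

```python
# Hypothetical helper that rebuilds the tracing-related plusargs used above.
# It does not launch FireSim; it only produces the argument strings.
def tracedoctor_plusargs(trace_file="trace.zst", start=0, end=-1,
                         trace_select=1, tracerv_file="test.txt"):
    return [
        "+tracedoctor-trigger=tracerv",                   # reuse the TracerV trigger
        "+tracedoctor-buffer=256,8",                      # copied verbatim from the command above
        f"+tracedoctor-worker=filer,file:{trace_file}",   # filer worker and its output file
        f"+trace-start={start}",
        f"+trace-end={end}",
        f"+trace-select={trace_select}",
        f"+tracefile={tracerv_file}",
    ]


print(" ".join(tracedoctor_plusargs()))
```

The printed string can be pasted between +permissive and +permissive-off alongside the memory model and block device arguments.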

Getting data

You should now have a compressed binary trace file called trace.zst. To get something useful out of it, you need to process the binary data. Start by decompressing it (for example with zstd -d trace.zst, which produces the file trace that the script below expects), and then use process_trace.py to convert it to readable text:

#process_trace.py
import math

TRACE_NAME = "trace"
OUTPUT_NAME = "trace.txt"

NUM_PRINTS = 15000

class TraceObject:
    def __init__(self, name, num_bits, format=str):
        self.name = name
        self.num_bits = num_bits
        self.format = format

    def get_name(self):
        return self.name

    def get_num_bits(self):
        return int(self.num_bits)
    
    def get_format(self):
        return self.format

def hexcast(num):
    return hex(int(num, 2))

def intcast(num):
    return int(num, 2)

TIMESTAMP           = TraceObject("timestamp", 64, hexcast)

LDQ_HEAD            = TraceObject("ldq_head", 4, intcast)
LDQ_TAIL            = TraceObject("ldq_tail", 4, intcast)

TRACE_OBJECTS = [
TIMESTAMP,

LDQ_HEAD,
LDQ_TAIL
]

TRACE_OBJECTS.reverse()

size = 0
for obj in TRACE_OBJECTS:
    size += obj.get_num_bits()

size_in_bytes = math.ceil(size / 8)
num_chunks = math.ceil(size_in_bytes / 3)

debug_counter = 0


with open(TRACE_NAME, mode="rb") as trace, open(OUTPUT_NAME, "w") as output:
    trace.seek(0, 2)
    file_size = trace.tell()
    trace.seek(0, 0)

    index = 0
    curr_bytes = b''
    bits = ""
    units = 0
    partial_bits = 0
    OBJS = []
    while (index + (num_chunks * 3) < file_size):
        while (units < size_in_bytes):
            endian_bytes = []
            for i in range(8):
                endian_bytes.append(trace.read(1))
                index += 1
            #endian_bytes.reverse()
            for byt in endian_bytes:
                curr_bytes += byt
            units += 8
        
        while (len(curr_bytes) > 0):
            #print(curr_bytes)
            curr_byte = curr_bytes[0]
            for i in range(8):
                bits += str((curr_byte >> i) & 1)
            curr_bytes = curr_bytes[1:]
        
        trace_obj = ""
        for obj in TRACE_OBJECTS:
            #print(bits)
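            # There must be at least enough bits left for this field
            # (exactly enough is fine, e.g. for the last field of the last record).
            assert len(bits) >= obj.get_num_bits()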
            obj_string = bits[0:obj.get_num_bits()]
            obj_string = obj_string[::-1]
            trace_obj += f"{obj.get_name()}:{obj.get_format()(obj_string)}\n"
            bits = bits[obj.get_num_bits():]
        
        #print(bits)
        
        trace_obj += "\n"
        
        units -= size_in_bytes
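        # Drop the padding bits that round each record up to a whole byte.
        # The outer "% 8" gives 0 (not 8) when the record is already byte-aligned.
        partial_bits = (8 - (size % 8)) % 8
        bits = bits[partial_bits:]
        partial_bits = 0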
        #if partial_bits >= 8:
        #    units += partial_bits // 8
        #    partial_bits = partial_bits % 8

        debug_counter = debug_counter + 1
        #if (debug_counter >= 10000):
        #    exit(1)

        OBJS.append(trace_obj)
        if (len(OBJS) > NUM_PRINTS):
            del OBJS[0]
    
    for trace_obj in OBJS:
        output.write(trace_obj)

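Note that the entries in TRACE_OBJECTS have to be listed in the same order as the fields in the TraceBundle, and that their bit widths must match the hardware (the 4-bit ldq_head and ldq_tail above assume ldqAddrSz is 4). The script then reverses the list before parsing, because Chisel places the first declared field in the most significant bits and asBools emits the least significant bit first, so the last field of the bundle is the first one to appear in the trace. As a result, the fields of each record are printed in reverse order.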
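The bit ordering can be illustrated with a small stand-alone model. This is only a sketch of the packing convention, not code taken from Tracedoctor: pack, to_lsb_first_bits and unpack are illustrative names, and the field widths again assume ldqAddrSz is 4.

```python
# Fields in TraceBundle declaration order (widths assume ldqAddrSz == 4).
FIELDS = [("timestamp", 64), ("ldq_head", 4), ("ldq_tail", 4)]


def pack(values):
    """Model asUInt: the first declared field ends up in the most significant bits."""
    word = 0
    for name, width in FIELDS:
        word = (word << width) | (values[name] & ((1 << width) - 1))
    return word


def to_lsb_first_bits(word, total_bits):
    """Model asBools: element 0 is the least significant bit."""
    return [(word >> i) & 1 for i in range(total_bits)]


def unpack(bits):
    """Walk the bit stream; the last declared field comes out first."""
    out = {}
    pos = 0
    for name, width in reversed(FIELDS):
        chunk = bits[pos:pos + width]  # this field's bits, LSB first
        out[name] = sum(bit << i for i, bit in enumerate(chunk))
        pos += width
    return out


values = {"timestamp": 0x1234, "ldq_head": 3, "ldq_tail": 9}
total_bits = sum(width for _, width in FIELDS)
decoded = unpack(to_lsb_first_bits(pack(values), total_bits))
print(decoded)
```

Decoding with the reversed field list recovers the original values, which is why process_trace.py calls TRACE_OBJECTS.reverse() before parsing.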
