---
marp: true
paginate: true
theme: gem5
title: gem5's Standard Library
---
Using gem5's Python interface
Computer systems are complex. A declarative interface (e.g., ini or json files) would be difficult to use.
Assuming we want to write code to describe the system we are going to simulate, we don't want to have to write down every single detail every time.
Python is well-suited for these kinds of "domain-specific languages."
You can think of gem5 as similar to TensorFlow or PyTorch: it's a framework or language for describing and simulating computer systems.
- With the stdlib, you don't need to write thousands to tens of thousands of lines to describe the system.
- Standard "components" reduce:
  - Duplicated code.
  - Error-prone configurations.
  - A lack of portability between different simulation setups.
Previously, there were `se.py` and `fs.py`
- These tried to be everything to everyone
- "Spaghetti code"
- The default "interface" to gem5 was massive bash command lines and hacks
Think of gem5 more like TensorFlow than a command line tool. gem5 is a framework or language.
The purpose of the gem5 Standard Library is to provide a set of predefined components that can be used to build a simulation that does the majority of the work for you.
For the remainder that is not supported by the standard library, APIs are provided that make it easy to extend the library for your own use.
Due to its modular, object-oriented design, gem5 can be thought of as a set of components that can be plugged together to form a simulation.
The types of components are boards, processors, memory systems, and cache hierarchies:
- Board: The "backbone" of the system. You plug components into the board. The board also contains system-level things like devices, the workload, etc. It's the board's job to negotiate the connections between the other components.
- Processor: Processors connect to boards and have one or more cores.
- Cache hierarchy: A cache hierarchy is a set of caches that can be connected to a processor and memory system.
- Memory system: A memory system is a set of memory controllers and memory devices that can be connected to the cache hierarchy.
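The plug-together structure described above can be sketched in plain Python. The classes below (`Processor`, `CacheHierarchy`, `Memory`, `Board`) and the `connect` method are hypothetical stand-ins chosen for illustration. They are not gem5's real stdlib classes, and nothing here simulates anything. The only point is that the board holds the other components and decides how they are wired together.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the four component types; NOT gem5's classes.
@dataclass
class Processor:
    num_cores: int
    isa: str


@dataclass
class CacheHierarchy:
    l1_size: str
    l2_size: str


@dataclass
class Memory:
    size: str


@dataclass
class Board:
    """The "backbone": holds the other components and wires them together."""

    clk_freq: str
    processor: Processor
    cache_hierarchy: CacheHierarchy
    memory: Memory

    def connect(self) -> list[str]:
        # The board negotiates connections: each core -> caches -> memory.
        links = [
            f"core{i} -> cache_hierarchy" for i in range(self.processor.num_cores)
        ]
        links.append("cache_hierarchy -> memory")
        return links


board = Board(
    clk_freq="3GHz",
    processor=Processor(num_cores=2, isa="arm"),
    cache_hierarchy=CacheHierarchy(l1_size="32KiB", l2_size="256KiB"),
    memory=Memory(size="8GiB"),
)
for link in board.connect():
    print(link)
```

The board depends only on each component's interface. That means one cache hierarchy can be swapped for another without touching the processor or memory, which is the same idea the stdlib is built around.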
- A single core Arm CPU using a "simple" model.
- Two-level cache hierarchy with 32 KiB 8-way L1 and 256 KiB 16-way L2.
- Single channel DDR4 2400 memory.
Run BFS from the GAP benchmark suite.
- What is the average IPC?
- What is the total simulated time?
- What is the output of the simulated program?
Here are the imports you need to get started. It's a lot, but we'll see each one of them as we go through the bootcamp over the next few days.
from gem5.components.boards.simple_board import SimpleBoard
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.components.cachehierarchies.ruby.mesi_two_level_cache_hierarchy import (
    MESITwoLevelCacheHierarchy,
)
from gem5.components.memory.single_channel import SingleChannelDDR4_2400
from gem5.components.processors.cpu_types import CPUTypes
from gem5.isas import ISA
from gem5.resources.resource import obtain_resource
from gem5.simulate.simulator import Simulator
Use the `SimpleProcessor` class to create a simple processor that runs in timing mode, uses the Arm ISA, and has one core.
See `SimpleProcessor` for hints.
processor = SimpleProcessor(cpu_type=CPUTypes.TIMING, isa=ISA.ARM, num_cores=1)
`SimpleProcessor` is a component that allows you to customize the model for the underlying cores.
The `cpu_type` parameter specifies the type of CPU model to use (we'll see more on this later).
The `isa` parameter specifies the ISA that the CPU will execute. gem5 supports many different ISAs.
The `num_cores` parameter specifies the number of cores in the processor.
The `SimpleProcessor` assumes homogeneous cores.
Use the `MESITwoLevelCacheHierarchy` for the cache hierarchy.
Use 32 KiB 8-way L1 caches, 256 KiB 16-way L2 caches, and a single L2 bank.
See `MESITwoLevelCacheHierarchy` for hints.
cache_hierarchy = MESITwoLevelCacheHierarchy(
    l1d_size="32KiB",
    l1d_assoc=8,
    l1i_size="32KiB",
    l1i_assoc=8,
    l2_size="256KiB",
    l2_assoc=16,
    num_l2_banks=1,
)
`MESITwoLevelCacheHierarchy` is a component that represents a two-level MESI cache hierarchy.
It uses the Ruby memory model.
The component is parameterized with the sizes and associativities of the L1 and L2 caches.
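As a quick sanity check on these cache parameters, the number of sets in a set-associative cache is size / (line size × associativity). The plain-Python sketch below assumes 64-byte cache lines, which is gem5's usual default, but check your own configuration. It also uses a hypothetical helper, `parse_size`, to turn strings like "32KiB" into bytes. It is not part of gem5.

```python
import math

# Binary size units, as used in strings like "32KiB".
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size string such as "32KiB" into bytes (hypothetical helper)."""
    # Check longer unit names first so "KiB" is not mistaken for "B".
    for unit in sorted(_UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return int(text[: -len(unit)]) * _UNITS[unit]
    raise ValueError(f"unrecognized size: {text!r}")


def cache_geometry(size: str, assoc: int, line_bytes: int = 64) -> dict:
    """Sets and address-bit split for a set-associative cache (64 B lines assumed)."""
    sets = parse_size(size) // (line_bytes * assoc)
    return {
        "sets": sets,
        "offset_bits": int(math.log2(line_bytes)),  # byte within a line
        "index_bits": int(math.log2(sets)),  # which set
    }


print("L1:", cache_geometry("32KiB", 8))  # 64 sets
print("L2:", cache_geometry("256KiB", 16))  # 256 sets
```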
Use the `SingleChannelDDR4_2400` for the memory system.
See the `SingleChannel` classes for hints.
memory = SingleChannelDDR4_2400()
This component represents a single-channel DDR4 memory system.
There is a `size` parameter that can be used to specify the memory capacity of the simulated system. You can reduce the size to save simulation time, or use the default for the memory type (e.g., one channel of DDR4 defaults to 8 GiB).
There are also multi-channel memories available. We'll cover this more in Memory Systems.
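For intuition about what "DDR4 2400" means: the part performs 2400 mega-transfers per second. Assuming the standard 64-bit (8-byte) DDR4 channel data bus, the theoretical peak per channel is 2400 × 10^6 × 8 B = 19.2 GB/s. The plain-Python calculation below (not gem5 code) shows how that figure scales with channel count. Bandwidth actually achieved in simulation will be lower than this peak.

```python
def peak_bandwidth_gbps(mega_transfers: int, bus_bytes: int = 8, channels: int = 1) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes/s).

    Assumes an 8-byte (64-bit) data bus per channel, as in standard DDR4.
    """
    bytes_per_second = mega_transfers * 10**6 * bus_bytes * channels
    return bytes_per_second / 1e9


print(f"{peak_bandwidth_gbps(2400):.1f} GB/s")  # 19.2 GB/s, single channel
print(f"{peak_bandwidth_gbps(2400, channels=2):.1f} GB/s")  # 38.4 GB/s
```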
Use the `SimpleBoard` class to create a board that will run the simulation.
Use a 3 GHz clock frequency.
See `SimpleBoard` for hints.
A `SimpleBoard` is a board that can run any ISA in Syscall Emulation (SE) mode.
It is "Simple" due to the relative simplicity of SE mode.
Most boards are tied to a specific ISA and require more complex designs to run Full System (FS) simulation.
board = SimpleBoard(
    clk_freq="3GHz",
    processor=processor,
    memory=memory,
    cache_hierarchy=cache_hierarchy,
)
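gem5 measures simulated time in ticks, and by default one tick is one picosecond (10^12 ticks per simulated second). Under that assumption, the 3 GHz clock used here has a period of about 333 ticks. The plain-Python sketch below computes this. The helpers `parse_freq` and `freq_to_period_ticks` are hypothetical, and rounding to the nearest whole tick is an assumption about how the period is quantized.

```python
TICKS_PER_SECOND = 10**12  # assumes gem5's default 1 ps tick


def parse_freq(text: str) -> float:
    """Convert a frequency string such as "3GHz" into Hz (hypothetical helper)."""
    units = {"GHz": 1e9, "MHz": 1e6, "kHz": 1e3, "Hz": 1.0}
    for unit, scale in units.items():  # "GHz" etc. are checked before bare "Hz"
        if text.endswith(unit):
            return float(text[: -len(unit)]) * scale
    raise ValueError(f"unrecognized frequency: {text!r}")


def freq_to_period_ticks(text: str) -> int:
    """Clock period in ticks, rounded to the nearest whole tick (assumption)."""
    return round(TICKS_PER_SECOND / parse_freq(text))


print(freq_to_period_ticks("3GHz"))  # 333
print(freq_to_period_ticks("1GHz"))  # 1000
```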
Use the `set_workload` method of the board to set the workload.
See the `set_workload` function for hints.
Use the `obtain_resource` function to download the files needed to run the specified workload.
Use the "arm-gapbs-bfs-run" workload.
board.set_workload(obtain_resource("arm-gapbs-bfs-run"))
The `obtain_resource` function downloads the files needed to run the specified workload. In this case, "arm-gapbs-bfs-run" is a BFS workload from the GAP Benchmark Suite.
You can search the available resources at https://resources.gem5.org/
Here is the arm-gapbs-bfs-run resource: https://resources.gem5.org/resources/arm-gapbs-bfs-run?version=1.0.0
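The resource page link follows a simple pattern: the resource ID in the path and the version as a query parameter. The sketch below assumes that pattern, inferred from this one example URL, and generalizes it into a hypothetical helper, `resource_page_url`, for building links to other resources' pages. It only formats a string with Python's standard `urllib.parse`. It does not call `obtain_resource` or download anything.

```python
from urllib.parse import quote, urlencode

# Base path inferred from the example link for arm-gapbs-bfs-run.
BASE = "https://resources.gem5.org/resources/"


def resource_page_url(resource_id: str, version: str) -> str:
    """Build a resources.gem5.org page URL (hypothetical helper, string only)."""
    return f"{BASE}{quote(resource_id)}?{urlencode({'version': version})}"


print(resource_page_url("arm-gapbs-bfs-run", "1.0.0"))
```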
Use the `Simulator` class to set up the simulation and run it.
See the `Simulator` class for hints.
Use the `run` method to run the simulation.
Set up the simulation:
simulator = Simulator(board=board)
simulator.run()
(More on this later, but `Simulator` is the object that controls the simulation loop.)
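gem5 is a discrete-event simulator. Its simulation loop repeatedly takes the earliest scheduled event off a queue, advances the current tick to that event's time, and runs it. The sketch below is a generic, heavily simplified illustration of that idea in plain Python. It is not how `Simulator` is implemented, and the names in it (`EventQueue`, `schedule`, `run`, `max_tick`) are hypothetical.

```python
import heapq
import itertools
from typing import Callable


class EventQueue:
    """A toy discrete-event queue ordered by tick (hypothetical, not gem5's)."""

    def __init__(self) -> None:
        self.cur_tick = 0
        self._heap: list[tuple[int, int, Callable[[], None]]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO order within a tick

    def schedule(self, when: int, callback: Callable[[], None]) -> None:
        heapq.heappush(self._heap, (when, next(self._seq), callback))

    def run(self, max_tick: int) -> None:
        # Pop events in time order until the queue is empty or max_tick is passed.
        while self._heap and self._heap[0][0] <= max_tick:
            when, _, callback = heapq.heappop(self._heap)
            self.cur_tick = when
            callback()


log: list[tuple[int, str]] = []
eq = EventQueue()
eq.schedule(666, lambda: log.append((eq.cur_tick, "memory response")))
eq.schedule(333, lambda: log.append((eq.cur_tick, "fetch")))
eq.schedule(5_000, lambda: log.append((eq.cur_tick, "late")))  # beyond max_tick
eq.run(max_tick=1_000)
print(log)  # [(333, 'fetch'), (666, 'memory response')]
```

Events run in tick order, not in the order they were scheduled, and anything scheduled past `max_tick` stays in the queue.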
gem5-mesi 01-components.py
Generate Time: 0.00462
Build Time: 0.00142
Graph has 1024 nodes and 10496 undirected edges for degree: 10
Trial Time: 0.00010
Trial Time: 0.00008
Trial Time: 0.00008
Trial Time: 0.00008
Trial Time: 0.00008
Trial Time: 0.00009
Trial Time: 0.00008
Trial Time: 0.00008
Trial Time: 0.00008
Trial Time: 0.00011
Average Time: 0.00009
simSeconds 0.009093
simTicks 9093461436
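`simTicks` and `simSeconds` are related by the tick rate. Assuming gem5's default of 10^12 ticks per second, 9093461436 ticks is 0.009093461436 s, which matches `simSeconds` to the printed precision. The plain-Python sketch below parses `name value` lines like the two above. It converts ticks into elapsed core cycles, assuming a 333-tick period (3 GHz with 1 ps ticks); 9093461436 happens to be an exact multiple of 333, which is consistent with that assumption. It also shows how IPC would be computed. The instruction count is not shown in this output, so `ipc` takes it as an argument, to be read from the committed-instruction statistic in your own `stats.txt`. The function names are hypothetical.

```python
TICKS_PER_SECOND = 10**12  # assumes gem5's default 1 ps tick
CLOCK_PERIOD_TICKS = 333  # assumes 3 GHz: 1e12 / 3e9, rounded to whole ticks

stats_text = """\
simSeconds 0.009093
simTicks 9093461436
"""


def parse_stats(text: str) -> dict[str, float]:
    """Parse "name value [# comment]" lines; non-numeric lines are skipped."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            try:
                stats[parts[0]] = float(parts[1])
            except ValueError:
                continue  # e.g., banner lines such as "---------- Begin ..."
    return stats


def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle."""
    return instructions / cycles


stats = parse_stats(stats_text)
ticks = int(stats["simTicks"])
cycles = ticks // CLOCK_PERIOD_TICKS
print(f"seconds: {ticks / TICKS_PER_SECOND:.6f}")  # seconds: 0.009093
print(f"cycles: {cycles}")  # cycles: 27307692
```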
- How to change the processor?
  - The number of cores
  - The type of CPU model
  - Details of the CPU model (e.g., pipeline depth)
  - The ISA
- How to change the cache hierarchy?
  - The sizes and associativities of the L1 and L2 caches
  - The number of L2 banks
  - How to change the hierarchy (e.g., 3-level, 2-level, write-through)
- How to change the memory system?
  - The size of the memory
  - The type of memory (e.g., DDR3, DDR4, HBM)
  - The number of channels
We'll cover all of this in the coming sections.
A brief overview of the different kinds of components in the stdlib
gem5/src/python/gem5/components
----/boards
----/cachehierarchies
----/memory
----/processors
gem5/src/python/gem5/prebuilt
----/demo/x86_demo_board
----/riscvmatched
- gem5 stdlib in `src/python/gem5`
- Two types
  - Prebuilt: full systems with set parameters
  - Components: Components to build systems
- Prebuilt
  - Demo: Just examples to build off of
  - riscvmatched: Model of SiFive Unmatched
gem5/src/python/gem5/components
----/boards
----/simple
----/arm_board
----/riscv_board
----/x86_board
----/cachehierarchies
----/memory
----/processors
- Boards: Things to plug into
- Have "set_workload" and "connect_things"
- Simple: SE-only, configurable
- Arm, RISC-V, and X86 versions for full system simulation
gem5/src/python/gem5/components
----/boards
----/cachehierarchies
----/chi
----/classic
----/ruby
----/memory
----/processors
- Have a fixed interface to processors and memory
- Ruby: detailed cache coherence and interconnect
- CHI: Arm CHI-based protocol implemented in Ruby
- Classic caches: Hierarchy of crossbars with inflexible coherence
- Quick caveat: You need different gem5 binaries for different protocols
  - Any binary can use classic caches
  - Only one Ruby protocol per gem5 binary
- `gem5`: CHI (Fully configurable; based on Arm CHI)
- `gem5-mesi`: MESI_Two_Level (Private L1s, Shared L2)
- `gem5-vega`: GPU_VIPER (CPU: Private L1/L2 core pairs, shared L3; GPU: Private L1, shared L2)
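Because each gem5 binary is built with exactly one Ruby protocol, it can help to record the binary-to-protocol mapping when scripting experiments. The sketch below is a hypothetical helper, not part of gem5. It encodes only the three binaries listed here and the rule that classic caches work with any binary.

```python
from typing import Optional

# Protocol -> binary, as listed for this environment.
RUBY_PROTOCOL_BINARIES = {
    "CHI": "gem5",
    "MESI_Two_Level": "gem5-mesi",
    "GPU_VIPER": "gem5-vega",
}


def binary_for(hierarchy_kind: str, protocol: Optional[str] = None) -> list[str]:
    """Binaries that can run a cache hierarchy (hypothetical helper)."""
    if hierarchy_kind == "classic":
        # Classic caches do not depend on the compiled-in Ruby protocol.
        return sorted(RUBY_PROTOCOL_BINARIES.values())
    if hierarchy_kind == "ruby":
        if protocol not in RUBY_PROTOCOL_BINARIES:
            raise ValueError(f"no binary listed for protocol {protocol!r}")
        return [RUBY_PROTOCOL_BINARIES[protocol]]
    raise ValueError(f"unknown hierarchy kind {hierarchy_kind!r}")


print(binary_for("ruby", "MESI_Two_Level"))  # ['gem5-mesi']
print(binary_for("classic"))  # ['gem5', 'gem5-mesi', 'gem5-vega']
```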
gem5/src/python/gem5/components
----/boards
----/cachehierarchies
----/memory
----/single_channel
----/multi_channel
----/dramsim
----/dramsys
----/hbm
----/processors
- Pre-configured (LP)DDR3/4/5 DIMMs
- Single and multi channel
- Integration with DRAMSim and DRAMSys
  - Not needed for accuracy, but useful for comparisons
- HBM: An HBM stack
gem5/src/python/gem5/components
----/boards
----/cachehierarchies
----/memory
----/processors
----/generators
----/simple
----/switchable
- Mostly "configurable" processors to build off of.
- Generators
  - Synthetic traffic, but act like processors.
  - Have linear, random, and more interesting patterns
- Simple
  - Only default parameters, one ISA.
- Switchable
  - We'll see this later, but you can switch from one to another during simulation.
- Processors are made up of cores.
- Cores have a "BaseCPU" as a member. This is the actual CPU model.
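Generators replace real cores with synthetic memory traffic. As a rough illustration of the difference between a linear and a random pattern, here is a plain-Python sketch that produces request addresses. The function names and parameters (`start`, `end`, `block_size`, `seed`) are hypothetical, and this is not gem5's generator API.

```python
import random


def linear_addresses(start: int, end: int, block_size: int = 64) -> list[int]:
    """Addresses that sweep [start, end) one block at a time."""
    return list(range(start, end, block_size))


def random_addresses(
    start: int, end: int, count: int, block_size: int = 64, seed: int = 0
) -> list[int]:
    """Block-aligned addresses drawn uniformly from [start, end)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    num_blocks = (end - start) // block_size
    return [start + rng.randrange(num_blocks) * block_size for _ in range(count)]


print(linear_addresses(0, 256))  # [0, 64, 128, 192]
print(random_addresses(0, 4096, count=4))
```

Linear traffic tends to be friendly to caches and DRAM row buffers, while random traffic stresses them. That contrast is what makes generators useful for memory-system studies.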
`Processor` is what interfaces with `CacheHierarchy` and `Board`.
- Processors are organized, structured sets of cores. They define how cores connect with each other, with outside components, and with the board through a standard interface.
More details coming in the CPU Models section.
`CPUTypes.TIMING`: A simple in-order CPU model
- This is a "single-cycle" CPU. Each instruction takes the time needed to fetch it and then executes immediately.
- Memory operations take the latency of the memory system.
- OK for doing memory-centric studies, but not good for most research.
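As a back-of-the-envelope illustration of the timing model described above, the toy sketch below estimates run time by charging each instruction its fetch latency and each memory operation the memory system's latency, with execution itself taking no extra time. The trace and latencies are made-up placeholder values, and the names are hypothetical. Real gem5 timing also depends on cache hits, contention, and more.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    is_mem: bool  # True for loads/stores


def toy_timing_cycles(trace: list[Instr], fetch_latency: int, mem_latency: int) -> int:
    """Toy "single-cycle" CPU estimate: every instruction pays its fetch
    latency, memory operations also pay the memory latency, and execution
    itself is free."""
    cycles = 0
    for instr in trace:
        cycles += fetch_latency
        if instr.is_mem:
            cycles += mem_latency
    return cycles


# Hypothetical trace: three ALU ops and one load, with placeholder latencies.
trace = [Instr(False), Instr(False), Instr(True), Instr(False)]
print(toy_timing_cycles(trace, fetch_latency=2, mem_latency=20))  # 28
```

Under this model the single load accounts for most of the time, which is why a timing CPU is mainly useful for memory-centric studies.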
Other options for CPU types
`CPUTypes.O3`: An out-of-order CPU model
- Highly detailed model based on the Alpha 21264.
- Has ROB, physical registers, LSQ, etc.
- Don't use `SimpleProcessor` if you want to configure this.
`CPUTypes.MINOR`: An in-order core model
- A high-performance in-order core model.
- Configurable four-stage pipeline
- Don't use `SimpleProcessor` if you want to configure this.
`CPUTypes.ATOMIC`: Used in "atomic" mode (more later)
`CPUTypes.KVM`: More later
- gem5's standard library is a set of components that can be used to build a simulation that does the majority of the work for you.
- The standard library is designed around extension and encapsulation.
- The main types of components are boards, processors, memory systems, and cache hierarchies.
- The standard library is designed to be modular and object-oriented.
- The `Simulator` object controls the simulation.