Documentation editor: Po-wei Huang
I would like to thank the following people for their time, feedback, and contributions:
Wei Song
- Goal of this document
- Which branch is being targeted?
- Overview of Spike
- Top Level Structure
- Memory System Overview
- TLB & MMU
- Cache simulation
- Processor Overview
- Hart modeling
- Trap modeling
- Interrupt modeling
- Exception modeling
- Bus and Miscellaneous devices
- Appendix
- Let people understand the implementation of Spike.
- Use Spike, which is a golden reference, to help people understand RISC-V better.
- Explain how to use Spike, especially features that are in the code but not well known, e.g. cache simulation and multi-core simulation.
Because Spike is a functional simulator, its structure does not necessarily match the hardware structure. To make simulation faster, the simulator sometimes applies optimizations that make its structure completely different from the hardware. We will point out these differences when we meet them.
This tutorial targets the master branch of the RISC-V ISA SIM repo at commit daaf28f.
- Spike is an ISS (instruction set simulator), which is not cycle accurate.
- Spike is a functional simulator that omits all internal delays such as cache misses, memory transactions, and IO accesses.
- Spike does not have a full cache model. Instead, the cache is a tracer or monitor (it does not allocate any space to cache data).
Spike uses a multi-core framework. Each core includes an MMU for virtual memory, and all of the cores share a common I$ and D$. Both the I$ and the D$ connect to a single L2$, which is followed by the main memory.
The cores and the memory hierarchy live inside the class sim, which interacts with the outside world through interactive commands. The sim also includes a bus, a debug module, a boot ROM, and a real time clock (RTC). The processors, boot ROM, debug module and RTC are hooked onto the bus, but the memory is not. Together, these components enable Spike to run the simple proxy kernel pk.
The code below comes from riscv-isa-sim/spike_main/spike.cc. It shows that the I$ and D$ connect to the L2$ through a miss handler. Moreover, each core has an mmu, and every mmu is connected to the single ic and dc.
After all the components are connected, the method run is called to start the simulation.
if (ic && l2) ic->set_miss_handler(&*l2);
if (dc && l2) dc->set_miss_handler(&*l2);
for (size_t i = 0; i < nprocs; i++)
{
  if (ic) s.get_core(i)->get_mmu()->register_memtracer(&*ic);
  if (dc) s.get_core(i)->get_mmu()->register_memtracer(&*dc);
  if (extension) s.get_core(i)->register_extension(extension());
}
s.set_debug(debug);
s.set_log(log);
s.set_histogram(histogram);
return s.run();
On the other hand, inside riscv-isa-sim/riscv/sim.cc, there are many calls to bus.add_device(), as the following figure shows. Spike uses this function to attach devices to the bus. Once these attachments are done, Spike can start to run.
The picture above is an overview of the memory system. The MMU contains a TLB, which can return data without invoking the cache. On a TLB miss, the MMU walks the page table and accesses the cache. The cache is modeled as a write-back cache, configured by sets, ways and line size. This scheme makes cache simulation inaccurate, but it is done to speed up the simulator.
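To make the "tracer, not storage" idea concrete, here is a minimal sketch of a tag-only write-back cache configured by sets, ways and line size. It is not Spike's actual cache class: the names tag_cache_t, entry_t, misses() and writebacks() are hypothetical, and round-robin replacement is an assumption chosen to keep the example deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tag-only cache model (not Spike's own class): it records
// which lines are resident and counts events, but never stores any data.
class tag_cache_t {
  struct entry_t {
    bool valid = false;
    bool dirty = false;
    uint64_t line = 0;  // line number = address / line size
  };

 public:
  tag_cache_t(size_t sets, size_t ways, size_t linesz)
      : sets_(sets), ways_(ways), linesz_(linesz),
        lines_(sets * ways), victim_(sets, 0) {
    assert(sets > 0 && ways > 0 && linesz > 0);
  }

  // Misses and dirty evictions are forwarded to the next level, if any.
  void set_miss_handler(tag_cache_t* next) { next_ = next; }

  // Record one access and return true on a hit.
  bool access(uint64_t addr, bool store) {
    const uint64_t line = addr / linesz_;
    const size_t set = static_cast<size_t>(line % sets_);
    entry_t* way = &lines_[set * ways_];
    for (size_t w = 0; w < ways_; ++w) {
      if (way[w].valid && way[w].line == line) {
        way[w].dirty = way[w].dirty || store;  // write-back: only mark dirty
        return true;
      }
    }
    ++misses_;
    entry_t& e = way[victim_[set]];
    victim_[set] = (victim_[set] + 1) % ways_;  // ASSUMPTION: round-robin
    if (e.valid && e.dirty) {                   // evicting a dirty line
      ++writebacks_;
      if (next_) next_->access(e.line * linesz_, true);
    }
    if (next_) next_->access(addr, false);      // refill from the next level
    e.valid = true;
    e.dirty = store;
    e.line = line;
    return false;
  }

  size_t misses() const { return misses_; }
  size_t writebacks() const { return writebacks_; }

 private:
  size_t sets_, ways_, linesz_;
  std::vector<entry_t> lines_;   // sets_ * ways_ tag entries, no data array
  std::vector<size_t> victim_;   // next way to replace, per set
  tag_cache_t* next_ = nullptr;
  size_t misses_ = 0, writebacks_ = 0;
};
```

Chaining two instances with set_miss_handler mirrors how the I$ and D$ are hooked to the L2$: the first level only reports misses and write-backs to the next one, and no level ever holds data.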
When an instruction executes a load, it calls the load function of the MMU and uses WRITE_RD to write the data back to the register. How, then, is the MMU load implemented? Below is an excerpt of riscv-isa-sim/riscv/mmu.h. The functions are defined by macros. A load goes through the TLB first and then takes the slow path if a TLB miss happens.
When the TLB misses, the MMU calls the slow path, which asks the tracer to call trace. The trace then accesses the cache. Finally, in riscv-isa-sim/riscv/cachesim.h, we can see that the tracer calls the access function of the cache.
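The fast-path/slow-path split described above can be sketched as follows. This is a simplified illustration, not the code in mmu.h: toy_mmu_t, counting_tracer_t, poke32 and tlb_hits are hypothetical names, the page-table walk is replaced by an identity mapping, and accesses are assumed to be aligned so that they never cross a page boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint64_t PGSHIFT = 12;
constexpr uint64_t PGSIZE = uint64_t(1) << PGSHIFT;
constexpr size_t TLB_ENTRIES = 256;
constexpr uint64_t NO_TAG = ~uint64_t(0);

// Interface the MMU uses to notify cache models about memory accesses.
struct memtracer_t {
  virtual ~memtracer_t() {}
  virtual void trace(uint64_t paddr, size_t bytes, bool store) = 0;
};

// Hypothetical tracer that only counts how often it is called.
struct counting_tracer_t : memtracer_t {
  size_t calls = 0;
  void trace(uint64_t, size_t, bool) override { ++calls; }
};

class toy_mmu_t {
 public:
  explicit toy_mmu_t(size_t pages)
      : mem_(pages * PGSIZE, 0),
        tlb_tag_(TLB_ENTRIES, NO_TAG),
        tlb_host_(TLB_ENTRIES, nullptr) {}

  void register_memtracer(memtracer_t* t) { tracers_.push_back(t); }

  // Hypothetical backdoor used only to set up memory contents.
  void poke32(uint64_t paddr, uint32_t value) {
    assert(paddr + sizeof(value) <= mem_.size());
    std::memcpy(&mem_[paddr], &value, sizeof(value));
  }

  uint32_t load_uint32(uint64_t vaddr) {
    const uint64_t vpn = vaddr >> PGSHIFT;
    const size_t idx = static_cast<size_t>(vpn % TLB_ENTRIES);
    if (tlb_tag_[idx] == vpn) {  // fast path: TLB hit, tracers not called
      ++tlb_hits_;
      return read32(tlb_host_[idx] + (vaddr & (PGSIZE - 1)));
    }
    return load_slow_path(vaddr, vpn, idx);
  }

  size_t tlb_hits() const { return tlb_hits_; }

 private:
  static uint32_t read32(const uint8_t* p) {
    uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
  }

  uint32_t load_slow_path(uint64_t vaddr, uint64_t vpn, size_t idx) {
    // ASSUMPTION: identity mapping stands in for the page-table walk.
    const uint64_t paddr = vaddr;
    assert(paddr + sizeof(uint32_t) <= mem_.size());
    for (memtracer_t* t : tracers_) t->trace(paddr, sizeof(uint32_t), false);
    tlb_tag_[idx] = vpn;  // refill so the next access to this page is fast
    tlb_host_[idx] = &mem_[paddr & ~(PGSIZE - 1)];
    return read32(&mem_[paddr]);
  }

  std::vector<uint8_t> mem_;
  std::vector<uint64_t> tlb_tag_;   // virtual page number per TLB entry
  std::vector<uint8_t*> tlb_host_;  // host pointer to the start of the page
  std::vector<memtracer_t*> tracers_;
  size_t tlb_hits_ = 0;
};
```

In this sketch the first load of a page reaches the tracer and the second does not, which is the behavior that makes the simulator fast and the cache statistics approximate.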
* Model a RISC-V hart
* Processor stepping, including fetch and execution.
* Trap Handling including exception and interrupt handling.
* Optional: MMU for VA->PA
* Architecture state of a hart, including CSR, pc, registers and floating point registers.
Below is an excerpt from spike/riscv/processor.c. The state_t contains pc, register_file, and CSR. Notice that Spike implements only some of the CSRs inside the hart; it implements the other CSRs in the processor.
To model a trap, the following are needed:
* Cause of the trap. The information is in mcause (machine cause register).
* For memory-related traps, the faulting address needs to be saved in mbadaddr (machine bad address register).
* For traps caused by an exception, the virtual address of the instruction that encountered the exception. It is in mepc (machine exception pc register).
* For trap caused by interrupt?
Inside encoding.h, the causes are defined.
Inside trap.h, two base classes are defined. The members which and badaddr hold the cause and the faulting address respectively. Macros are then used to construct a class for each kind of trap, and the cause is saved into the class at the same time.
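The pattern of two base classes plus generating macros can be sketched as below. This is an illustration under assumptions rather than the contents of trap.h: the accessor names, toy_state_t and take_trap are hypothetical, and only two trap kinds are declared. The cause numbers are the standard ones from the privileged spec.

```cpp
#include <cassert>
#include <cstdint>

typedef uint64_t reg_t;

// Exception cause numbers as assigned by the RISC-V privileged spec.
#define CAUSE_ILLEGAL_INSTRUCTION 0x2
#define CAUSE_MISALIGNED_LOAD 0x4

// Base class: every trap carries its cause in `which`.
class trap_t {
 public:
  explicit trap_t(reg_t which) : which(which) {}
  virtual ~trap_t() {}
  virtual bool has_badaddr() const { return false; }
  virtual reg_t get_badaddr() const { return 0; }
  reg_t cause() const { return which; }

 private:
  reg_t which;
};

// Base class for memory-related traps, which also carry a faulting address.
class mem_trap_t : public trap_t {
 public:
  mem_trap_t(reg_t which, reg_t badaddr) : trap_t(which), badaddr(badaddr) {}
  bool has_badaddr() const override { return true; }
  reg_t get_badaddr() const override { return badaddr; }

 private:
  reg_t badaddr;
};

// One class per trap kind; the cause is baked in by the macro.
#define DECLARE_TRAP(n, x)        \
  class trap_##x : public trap_t { \
   public:                         \
    trap_##x() : trap_t(n) {}      \
  };

#define DECLARE_MEM_TRAP(n, x)                                    \
  class trap_##x : public mem_trap_t {                             \
   public:                                                         \
    explicit trap_##x(reg_t badaddr) : mem_trap_t(n, badaddr) {}   \
  };

DECLARE_TRAP(CAUSE_ILLEGAL_INSTRUCTION, illegal_instruction)
DECLARE_MEM_TRAP(CAUSE_MISALIGNED_LOAD, load_address_misaligned)

// Hypothetical, heavily reduced machine state for illustration only.
struct toy_state_t {
  reg_t pc = 0, mtvec = 0, mcause = 0, mepc = 0, mbadaddr = 0;
};

// Record the trap in the CSRs listed above and jump to the handler.
void take_trap(toy_state_t& s, const trap_t& t, reg_t epc) {
  s.mcause = t.cause();
  s.mepc = epc;
  if (t.has_badaddr()) s.mbadaddr = t.get_badaddr();
  s.pc = s.mtvec;
}
```

With classes like these, an instruction implementation can raise a trap with a C++ throw, for example `throw trap_load_address_misaligned(addr);`, and a single `catch (trap_t& t)` around the execution loop can hand it to take_trap.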
* riscv/device.h
* riscv/device.cc
In this section, we describe how to simulate or add a device. Devices inherit from the base class abstract_device_t, which has the virtual functions load and store. Each device implements load/store and provides its own special functions.
In Spike, five devices are simulated: bus, ROM, real time clock (RTC), processor and debug module.
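As a rough sketch of how a device and a bus built on abstract_device_t could fit together: the classes toy_rom_t and toy_bus_t below are hypothetical, and the lookup strategy (an ordered map keyed by base address) is an assumption for illustration, not a copy of riscv/devices.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

typedef uint64_t reg_t;

// Base class: every device answers loads and stores at device-relative
// addresses and returns false when it cannot serve the access.
class abstract_device_t {
 public:
  virtual bool load(reg_t addr, size_t len, uint8_t* bytes) = 0;
  virtual bool store(reg_t addr, size_t len, const uint8_t* bytes) = 0;
  virtual ~abstract_device_t() {}
};

// Hypothetical read-only memory device.
class toy_rom_t : public abstract_device_t {
 public:
  explicit toy_rom_t(const std::vector<uint8_t>& data) : data_(data) {}
  bool load(reg_t addr, size_t len, uint8_t* bytes) override {
    if (addr > data_.size() || len > data_.size() - addr) return false;
    std::memcpy(bytes, data_.data() + addr, len);
    return true;
  }
  bool store(reg_t, size_t, const uint8_t*) override { return false; }

 private:
  std::vector<uint8_t> data_;
};

// Hypothetical bus: itself a device that forwards each access to the
// attached device with the highest base address not above the target.
class toy_bus_t : public abstract_device_t {
 public:
  void add_device(reg_t base, abstract_device_t* dev) { devices_[base] = dev; }

  bool load(reg_t addr, size_t len, uint8_t* bytes) override {
    reg_t base = 0;
    abstract_device_t* dev = find(addr, &base);
    return dev && dev->load(addr - base, len, bytes);
  }
  bool store(reg_t addr, size_t len, const uint8_t* bytes) override {
    reg_t base = 0;
    abstract_device_t* dev = find(addr, &base);
    return dev && dev->store(addr - base, len, bytes);
  }

 private:
  abstract_device_t* find(reg_t addr, reg_t* base) {
    std::map<reg_t, abstract_device_t*>::iterator it =
        devices_.upper_bound(addr);
    if (it == devices_.begin()) return nullptr;  // nothing mapped at or below
    --it;
    *base = it->first;
    return it->second;
  }
  std::map<reg_t, abstract_device_t*> devices_;
};
```

Adding a new device then amounts to writing one more subclass with its own load/store behavior and attaching it with add_device at a chosen base address.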
* riscv/decode.h
Spike uses a class instruction_t to represent instructions. To extract each field, it defines functions like rs1() or rm(), as the following code shows.
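A minimal sketch of this field-extraction scheme is given here, assuming a helper x(lo, len) that returns len bits starting at bit lo. The class name toy_insn_t is hypothetical and only a few accessors are shown. The bit positions follow the base instruction formats of the RISC-V spec (rd in bits 11:7, rm/funct3 in 14:12, rs1 in 19:15, rs2 in 24:20, I-type immediate in 31:20).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the instruction class (32-bit encodings only).
class toy_insn_t {
 public:
  explicit toy_insn_t(uint32_t bits) : b(bits) {}

  uint64_t opcode() const { return x(0, 7); }
  uint64_t rd() const { return x(7, 5); }
  uint64_t rm() const { return x(12, 3); }  // rounding mode shares funct3 bits
  uint64_t rs1() const { return x(15, 5); }
  uint64_t rs2() const { return x(20, 5); }

  // I-type immediate: bits 31:20, sign-extended from 12 bits.
  int64_t i_imm() const {
    const uint64_t v = x(20, 12);
    return static_cast<int64_t>(v ^ 0x800) - 0x800;
  }

 private:
  // Extract `len` bits starting at bit `lo`.
  uint64_t x(int lo, int len) const {
    assert(lo >= 0 && len > 0 && lo + len <= 32);
    return (b >> lo) & ((uint64_t(1) << len) - 1);
  }

  uint64_t b;  // raw instruction bits
};
```

For example, `addi x1, x2, 5` encodes as 0x00510093, so rd() yields 1, rs1() yields 2 and i_imm() yields 5.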
The numbers passed to x come from the following encoding table in the spec.