Close

Some statistics from emulator

A project log for Super-V

Research possibilities for superscalar 32-bit RISC-V under GPL v3

shaosSHAOS 12/23/2018 at 04:153 Comments

I added some statistics calculations into RV32I[MA] emulator ( originally created by Fabrice Bellard and modified and shared on Hackaday by @Frank Buss ) and collected stats from some RISC-V benchmark tests (see https://github.com/riscv/riscv-tests/tree/master/benchmarks). With DEBUG_EXTRA option it collects this info from Dhrystone benchmark for example:

Instructions Stat:
LUI   = 892
AUIPC   = 7716
JAL   = 11212
JALR   = 12850
BEQ   = 33399
BNE   = 11298
BLT   = 1721
BGE   = 3480
BLTU   = 7017
BGEU   = 2248
LW   = 31050
LBU   = 27712
LHU   = 502
SB   = 4968
SH   = 502
SW   = 33037
ADDI   = 87830
SLTIU   = 1500
XORI   = 1
ORI   = 1
ANDI   = 6151
SLLI   = 10647
SRLI   = 9534
SRAI   = 95
ADD   = 11486
SUB   = 2813
SLL   = 402
SLTU   = 1844
SRL   = 353
OR   = 2459
CSRRW   = 1
CSRRS   = 8
LI*   = 20602

Five Most Frequent:
1) ADDI   = 87830 (27.05%)
2) BEQ   = 33399 (10.29%)
3) SW   = 33037 (10.17%)
4) LW   = 31050 (9.56%)
5) LBU   = 27712 (8.53%)

Memory Reading Area 80000000...80007ae2
Memory Writing Area 80001000...80007b3f

>>> Execution time: 1425296449 ns
>>> Instruction count: 324730 (IPS=227833)
>>> Jumps: 50209 (15.46%) - 18074 forwards, 32135 backwards
>>> Branching T=26147 (44.19%) F=33016 (55.81%)

Without DEBUG_EXTRA option (no instructions stat and no memory usage stats) and with -O3 option (fastest optimization) emulator is capable of doing almost 13 millions instructions per second on my relatively modern AMD64 computer with Debain Linux onboard: 

>>> Execution time: 25084843 ns
>>> Instruction count: 324730 (IPS=12945267)
>>> Jumps: 50209 (15.46%) - 18074 forwards, 32135 backwards
>>> Branching T=26147 (44.19%) F=33016 (55.81%)

Here you can see that 15% of executed instructions are jumps (when PC is changed to something different from usual PC+4) and most jumps were backwards. Also branches were 44% true (with jump) and 56% false (no jump). Below you can see similar stats for some other benchmarks:

median: 

>>> Execution time: 1391119 ns
>>> Instruction count: 16244 (IPS=11676930)
>>> Jumps: 3552 (21.87%) - 1254 forwards, 2298 backwards
>>> Branching T=2613 (53.36%) F=2284 (46.64%)

multiply:

>>> Execution time: 4743276 ns
>>> Instruction count: 49670 (IPS=10471665)
>>> Jumps: 13808 (27.80%) - 6310 forwards, 7498 backwards
>>> Branching T=12915 (86.46%) F=2022 (13.54%)

qsort: 

>>> Execution time: 19821720 ns
>>> Instruction count: 236219 (IPS=11917179)
>>> Jumps: 45487 (19.26%) - 8141 forwards, 37346 backwards
>>> Branching T=37792 (59.71%) F=25503 (40.29%)

rsort: 

>>> Execution time: 31545464 ns
>>> Instruction count: 374291 (IPS=11865129)
>>> Jumps: 15239 (4.07%) - 797 forwards, 14442 backwards
>>> Branching T=14653 (73.66%) F=5239 (26.34%)

towers: 

>>> Execution time: 1474786 ns
>>> Instruction count: 18656 (IPS=12649970)
>>> Jumps: 2027 (10.87%) - 762 forwards, 1265 backwards
>>> Branching T=1037 (57.20%) F=776 (42.80%)

vvadd: 

>>> Execution time: 1004666 ns
>>> Instruction count: 11974 (IPS=11918388)
>>> Jumps: 1830 (15.28%) - 492 forwards, 1338 backwards
>>> Branching T=1417 (62.18%) F=862 (37.82%)

As you can see it is very important to pipeline jumps properly - not just wasting cycles by wrong branching as it's usually done in simple RISC hardware designs (branch penalty) - it has to be branch prediction or even speculative execution of both branches with ignoring wrong path after condition becomes known.

Discussions

Yann Guidon / YGDES wrote 12/23/2018 at 04:37 point

Branches are an old issue that will not disappear soon...

  Are you sure? yes | no

SHAOS wrote 12/23/2018 at 13:53 point

Yes, and it's significant (4-28% of the executed code as showed above). So it has to be properly handled...

  Are you sure? yes | no

Yann Guidon / YGDES wrote 12/23/2018 at 15:34 point

Conditional moves help in some places, precomputed loop targets help too (very widely used in DSP) but the most common cases (think "spaghetti") is a different story... I'm reconsidering many approaches for FC1...

  Are you sure? yes | no