
Exit MIG, Enter LiteDRAM.

A project log for BoxLambda

A retro-style FPGA-based microcomputer. The microcomputer serves as a platform for software and RTL experimentation.

Epsilon • 12/29/2022 at 09:50 • 0 Comments

LiteDRAM in the BoxLambda Architecture.

Initially, the plan was to use Xilinx’s MIG (Memory Interface Generator) to generate a DDR memory controller for BoxLambda. At the time, that was (and maybe still is) the online consensus when looking for memory controller options for the Arty A7. Meanwhile, Reddit user yanangao suggested I take a look at the LiteX project for a memory controller. I took the advice and started playing around a bit with LiteX. One thing led to another and, long story short, BoxLambda now has a DDR memory controller based on LiteDRAM, a core from the LiteX project. If you’re interested in the longer story, read on.

Recap

This is a summary of the current state of BoxLambda. We have:

LiteX and LiteDRAM

LiteX is an Open Source SoC Builder framework for FPGAs. You specify which CPU, memory, interconnect, and peripherals you want. The framework then generates the SoC and the software to go along with it. Here’s an example (with semi-randomly picked settings):

python3 digilent_arty.py --bus-standard wishbone --bus-data-width 32 --bus-interconnect crossbar --cpu-type rocket --integrated-sram-size 32768 --with-ethernet --with-sdcard --sys-clk-freq 50000000 --build --load

INFO:S7PLL:Creating S7PLL, speedgrade -1.
INFO:S7PLL:Registering Single Ended ClkIn of 100.00MHz.
INFO:S7PLL:Creating ClkOut0 sys of 50.00MHz (+-10000.00ppm).
INFO:S7PLL:Creating ClkOut1 eth of 25.00MHz (+-10000.00ppm).
INFO:S7PLL:Creating ClkOut2 sys4x of 200.00MHz (+-10000.00ppm).
INFO:S7PLL:Creating ClkOut3 sys4x_dqs of 200.00MHz (+-10000.00ppm).
INFO:S7PLL:Creating ClkOut4 idelay of 200.00MHz (+-10000.00ppm).
INFO:SoC:        __   _ __      _  __
INFO:SoC:       / /  (_) /____ | |/_/
INFO:SoC:      / /__/ / __/ -_)>  <
INFO:SoC:     /____/_/\__/\__/_/|_|
INFO:SoC:  Build your hardware, easily!
INFO:SoC:--------------------------------------------------------------------------------
INFO:SoC:Creating SoC... (2022-12-19 16:28:38)
INFO:SoC:--------------------------------------------------------------------------------
...

… and off it goes. That single command generates, synthesizes and loads the SoC onto your Arty A7.

LiteX is written in Migen, a Python-based tool that, to quote its website, “automates further the VLSI design process”. At the heart of Migen sits FHDL, the Fragmented Hardware Description Language. FHDL is essentially a Python-based data structure consisting of basic constructs to describe signals, registers, FSMs, combinatorial logic, sequential logic, etc. Here’s an example:

        aborted = Signal()
        offset  = base_address >> log2_int(port.data_width//8)

        self.submodules.fsm = fsm = FSM(reset_state="CMD")
        self.comb += [
            port.cmd.addr.eq(wishbone.adr - offset),
            port.cmd.we.eq(wishbone.we),
            port.cmd.last.eq(~wishbone.we), # Always wait for reads.
            port.flush.eq(~wishbone.cyc)    # Flush writes when transaction ends.
        ]
        fsm.act("CMD",
            port.cmd.valid.eq(wishbone.cyc & wishbone.stb),
            If(port.cmd.valid & port.cmd.ready &  wishbone.we, NextState("WRITE")),
            If(port.cmd.valid & port.cmd.ready & ~wishbone.we, NextState("READ")),
            NextValue(aborted, 0),
        )
        self.comb += [
            port.wdata.valid.eq(wishbone.stb & wishbone.we),
            If(ratio <= 1, If(~fsm.ongoing("WRITE"), port.wdata.valid.eq(0))),
            port.wdata.data.eq(wishbone.dat_w),
            port.wdata.we.eq(wishbone.sel),
        ]

You can more or less see the Verilog equivalent. However, the fact that this is a Python data structure means that you have Python at your disposal as a meta-language to combine and organize these bits of HDL. This is a huge increase in abstraction and expressiveness, and it explains how LiteX can do what it does. The flexibility LiteX provides in mixing and matching cores, core features, and interconnects just can’t be achieved with vanilla SystemVerilog.

LiteX is not an all-or-nothing proposition. LiteX cores, such as the LiteDRAM memory controller, can be integrated into traditional design flows. That’s what I’ll be doing.

Why choose LiteDRAM over Xilinx MIG?

Generating a LiteDRAM core

LiteDRAM is a highly configurable core (because of Migen). For an overview of the core’s features, take a look at the LiteDRAM repository’s README file:

https://github.com/enjoy-digital/litedram/blob/master/README.md

You specify the configuration details in a .yml file. A Python script parses that .yml file and generates the core’s Verilog as well as a CSR register access layer for software.
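For reference, generation then boils down to pointing that script at the configuration file. The invocation below is only a sketch: the exact script name and options depend on how LiteDRAM is installed, and the configuration file name is a placeholder:

    python3 litedram_gen.py my_litedram_config.yml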

Details are a bit sparse, but luckily example configurations are provided:

https://github.com/enjoy-digital/litedram/tree/master/examples

Starting from the arty.yml example, I created the following LiteDRAM configuration file for BoxLambda:

#This is a LiteDRAM configuration file for the Arty A7.
{
    # General ------------------------------------------------------------------
    "speedgrade": -1,          # FPGA speedgrade
    "cpu":        "None",      # CPU type (ex vexriscv, serv, None) - We only want to generate the LiteDRAM memory controller, no CPU.
    "memtype":    "DDR3",      # DRAM type
    "uart":       "rs232",     # Type of UART interface (rs232, fifo) - not relevant in this configuration.

    # PHY ----------------------------------------------------------------------
    "cmd_latency":     0,             # Command additional latency
    "sdram_module":    "MT41K128M16", # SDRAM modules of the board or SO-DIMM
    "sdram_module_nb": 2,             # Number of byte groups
    "sdram_rank_nb":   1,             # Number of ranks
    "sdram_phy":       "A7DDRPHY",    # Type of FPGA PHY

    # Electrical ---------------------------------------------------------------
    "rtt_nom": "60ohm",  # Nominal termination
    "rtt_wr":  "60ohm",  # Write termination
    "ron":     "34ohm",  # Output driver impedance

    # Frequency ----------------------------------------------------------------
    # The generated LiteDRAM module contains clock generation primitives, for its own purposes, but also for the rest
    # of the system. The system clock is output by the LiteDRAM module and is supposed to be used as the main input clock
    # for the rest of the system. I set the system clock to 50MHz because I couldn't get timing closure at 100MHz.
    "input_clk_freq":   100e6, # Input clock frequency
    "sys_clk_freq":     50e6,  # System clock frequency (DDR_clk = 4 x sys_clk)
    "iodelay_clk_freq": 200e6, # IODELAYs reference clock frequency

    # Core ---------------------------------------------------------------------
    "cmd_buffer_depth": 16,    # Depth of the command buffer

    # User Ports ---------------------------------------------------------------
    # We generate two Wishbone ports, because BoxLambda has two system buses.
    # Note that these are _classic_ Wishbone ports, while BoxLambda uses a _pipelined_ Wishbone bus.
    # A pipelined-to-classic Wishbone adapter is needed to interface correctly to the bus.
    # At some point it would be nice to have an actual pipelined Wishbone frontend, with actual pipelining capability.
    "user_ports": {
        "wishbone_0" : {
            "type":  "wishbone",
            "data_width": 32, # Set data width to 32. If not specified, it defaults to 128 bits.
            "block_until_ready": True,
        },
        "wishbone_1" : {
            "type":  "wishbone",
            "data_width": 32, # Set data width to 32. If not specified, it defaults to 128 bits.
            "block_until_ready": True,
        },
    },
}

Some points about the above:

I generate two LiteDRAM core variants from this configuration:

The generated core has the following interface:

module litedram (
`ifndef SYNTHESIS
	input  wire sim_trace, /*Simulation only.*/
`endif
	input  wire clk,
`ifdef SYNTHESIS
	input  wire rst,       /*FPGA only...*/
	output wire pll_locked,
	output wire [13:0] ddram_a,
	output wire [2:0] ddram_ba,
	output wire ddram_ras_n,
	output wire ddram_cas_n,
	output wire ddram_we_n,
	output wire ddram_cs_n,
	output wire [1:0] ddram_dm,
	inout  wire [15:0] ddram_dq,
	inout  wire [1:0] ddram_dqs_p,
	inout  wire [1:0] ddram_dqs_n,
	output wire ddram_clk_p,
	output wire ddram_clk_n,
	output wire ddram_cke,
	output wire ddram_odt,
	output wire ddram_reset_n,
`endif
	output wire init_done,  /*FPGA/Simulation common ports...*/
	output wire init_error,
	input  wire [29:0] wb_ctrl_adr,
	input  wire [31:0] wb_ctrl_dat_w,
	output wire [31:0] wb_ctrl_dat_r,
	input  wire [3:0] wb_ctrl_sel,
	input  wire wb_ctrl_cyc,
	input  wire wb_ctrl_stb,
	output wire wb_ctrl_ack,
	input  wire wb_ctrl_we,
	input  wire [2:0] wb_ctrl_cti,
	input  wire [1:0] wb_ctrl_bte,
	output wire wb_ctrl_err,
	output wire user_clk,
	output wire user_rst,
	input  wire [25:0] user_port_wishbone_0_adr,
	input  wire [31:0] user_port_wishbone_0_dat_w,
	output wire [31:0] user_port_wishbone_0_dat_r,
	input  wire [3:0] user_port_wishbone_0_sel,
	input  wire user_port_wishbone_0_cyc,
	input  wire user_port_wishbone_0_stb,
	output wire user_port_wishbone_0_ack,
	input  wire user_port_wishbone_0_we,
	output wire user_port_wishbone_0_err,
	input  wire [25:0] user_port_wishbone_1_adr,
	input  wire [31:0] user_port_wishbone_1_dat_w,
	output wire [31:0] user_port_wishbone_1_dat_r,
	input  wire [3:0] user_port_wishbone_1_sel,
	input  wire user_port_wishbone_1_cyc,
	input  wire user_port_wishbone_1_stb,
	output wire user_port_wishbone_1_ack,
	input  wire user_port_wishbone_1_we,
	output wire user_port_wishbone_1_err
);

Some points worth noting about this interface:

Integrating the LiteDRAM core

Litedram_wrapper

I created a litedram_wrapper module around litedram.v:

https://github.com/epsilon537/boxlambda/blob/master/components/litedram/common/rtl/litedram_wrapper.sv

This wrapper contains:

  /*Straight out of the Wishbone B4 spec. This is how you interface a classic slave to a pipelined master.
   *The stall signal ensures that the STB signal remains asserted until an ACK is received from the slave.*/
  assign user_port_wishbone_p_0_stall = !user_port_wishbone_p_0_cyc ? 1'b0 : !user_port_wishbone_c_0_ack;

How long to STB?

One rookie mistake I made early on was to simply tie the Wishbone stall signal to 0. I figured that, as long as I didn’t generate multiple outstanding transactions, that would work just fine. It doesn’t, however: Wishbone transactions to the LiteDRAM core would just block. The reason is that in classic Wishbone, STB has to remain asserted until an ACK or ERR is signaled by the slave. Pipelined Wishbone doesn’t work that way: as long as the slave is not stalling, a single-access STB only remains asserted for one clock cycle.

Classic Wishbone transaction (Illustration taken from Wishbone B4 spec).

Pipelined Wishbone transaction - single access (Illustration taken from Wishbone B4 spec).

Hence the pipelined-to-classic Wishbone adapter in litedram_wrapper.

More Wishbone Issues: Core2WB and WB_Interconnect_SharedBus

With the litedram_wrapper in place, Wishbone transactions still weren’t working properly. Waveform analysis showed that, from the point of view of litedram_wrapper, the Wishbone bus master wasn’t well-behaved. That problem could come from the Ibex memory-interface-to-Wishbone adapter, core2wb.sv, from the Wishbone shared bus implementation used by the test build, wb_interconnect_shared_bus.sv, or from both.

This is the Ibex Memory Interface specification:

https://ibex-core.readthedocs.io/en/latest/03_reference/load_store_unit.html#load-store-unit

There are two such interfaces: one for data, one for instructions.

The job of core2wb is to adapt that interface to a pipelined Wishbone bus master interface. That Wishbone bus master in turn requests access to the shared bus. It’s up to wb_interconnect_shared_bus to grant the bus to one of the requesting bus masters and direct the transaction to the selected slave. If either one of those modules has a bug, that will result in an incorrectly behaving bus master, from the point of view of the bus slave.

From Ibex to LiteDRAM.

core2wb.sv and wb_interconnect_shared_bus.sv are part of the ibex_wb repository. The ibex_wb repository no longer appears to be actively maintained. I looked long and hard at the implementation of the two modules and ultimately decided that I couldn’t figure out the author’s reasoning. I decided to re-implement both modules:

Core2WB State Diagram.

WB_Interconnect_Shared_Bus State Diagram.

With those changes in place, Ibex instruction and data transactions to LiteDRAM are working fine.

ddr_test_soc

/projects/ddr_test/rtl/ddr_test_soc.sv contains the test build’s top level. It’s based on the previous test build’s top level, extended with a LiteDRAM wrapper instance.

  litedram_wrapper litedram_wrapper_inst (
	.clk(ext_clk100), /*100MHz External clock is input for LiteDRAM module.*/
	.rst(~ext_rst_n), /*External reset goes into a reset synchronizer inside the litedram module. The output of that synchronizer is sys_rst.*/
	.sys_clk(sys_clk), /*LiteDRAM outputs 50MHz system clock...*/
	.sys_rst(sys_rst), /*...and system reset.*/
	.pll_locked(pll_locked),
`ifdef SYNTHESIS
	.ddram_a(ddram_a),
	.ddram_ba(ddram_ba),
	.ddram_ras_n(ddram_ras_n),
	.ddram_cas_n(ddram_cas_n),
	.ddram_we_n(ddram_we_n),
	.ddram_cs_n(ddram_cs_n),
	.ddram_dm(ddram_dm),
	.ddram_dq(ddram_dq),
	.ddram_dqs_p(ddram_dqs_p),
	.ddram_dqs_n(ddram_dqs_n),
	.ddram_clk_p(ddram_clk_p),
	.ddram_clk_n(ddram_clk_n),
	.ddram_cke(ddram_cke),
	.ddram_odt(ddram_odt),
	.ddram_reset_n(ddram_reset_n),
`endif
	.init_done(init_done_led),
	.init_error(init_err_led),
	.wb_ctrl_adr(wbs[DDR_CTRL_S].adr),
	.wb_ctrl_dat_w(wbs[DDR_CTRL_S].dat_m),
	.wb_ctrl_dat_r(wbs[DDR_CTRL_S].dat_s),
	.wb_ctrl_sel(wbs[DDR_CTRL_S].sel),
	.wb_ctrl_stall(wbs[DDR_CTRL_S].stall),
	.wb_ctrl_cyc(wbs[DDR_CTRL_S].cyc),
	.wb_ctrl_stb(wbs[DDR_CTRL_S].stb),
	.wb_ctrl_ack(wbs[DDR_CTRL_S].ack),
	.wb_ctrl_we(wbs[DDR_CTRL_S].we),
	.wb_ctrl_err(wbs[DDR_CTRL_S].err),
	/*Eventually we're going to have two system buses, but for the time being, to allow testing,
	 *we hook up both user ports to our one shared bus.
	 *Both ports address the same 256MB of DDR memory, one at base address 'h40000000, the other at 'h50000000.*/
	.user_port_wishbone_p_0_adr(wbs[DDR_USR0_S].adr),
	.user_port_wishbone_p_0_dat_w(wbs[DDR_USR0_S].dat_m),
	.user_port_wishbone_p_0_dat_r(wbs[DDR_USR0_S].dat_s),
	.user_port_wishbone_p_0_sel(wbs[DDR_USR0_S].sel),
	.user_port_wishbone_p_0_stall(wbs[DDR_USR0_S].stall),
	.user_port_wishbone_p_0_cyc(wbs[DDR_USR0_S].cyc),
	.user_port_wishbone_p_0_stb(wbs[DDR_USR0_S].stb),
	.user_port_wishbone_p_0_ack(wbs[DDR_USR0_S].ack),
	.user_port_wishbone_p_0_we(wbs[DDR_USR0_S].we),
	.user_port_wishbone_p_0_err(wbs[DDR_USR0_S].err),

	.user_port_wishbone_p_1_adr(wbs[DDR_USR1_S].adr),
	.user_port_wishbone_p_1_dat_w(wbs[DDR_USR1_S].dat_m),
	.user_port_wishbone_p_1_dat_r(wbs[DDR_USR1_S].dat_s),
	.user_port_wishbone_p_1_sel(wbs[DDR_USR1_S].sel),
	.user_port_wishbone_p_1_stall(wbs[DDR_USR1_S].stall),
	.user_port_wishbone_p_1_cyc(wbs[DDR_USR1_S].cyc),
	.user_port_wishbone_p_1_stb(wbs[DDR_USR1_S].stb),
	.user_port_wishbone_p_1_ack(wbs[DDR_USR1_S].ack),
	.user_port_wishbone_p_1_we(wbs[DDR_USR1_S].we),
	.user_port_wishbone_p_1_err(wbs[DDR_USR1_S].err)
  );

Clock and Reset generation is now done by LiteDRAM. LiteDRAM accepts the external clock and reset signal and provides the system clock and synchronized system reset. Previous BoxLambda test builds had a separate Clock-and-Reset-Generator instance for that.

In this test build, both user ports are hooked up to the same shared bus. That doesn’t make a lot of sense of course. I’m just doing this to verify connectivity over both buses. Eventually, BoxLambda is going to have two buses and LiteDRAM will be hooked up to both.

LiteDRAM Initialization

When the litedram_gen.py script generates the LiteDRAM Verilog core (based on the given .yml configuration file), it also generates the core’s CSR register accessors for software.

The most relevant files are csr.h and sdram_phy.h. They contain the register definitions and constants used by the memory initialization code. Unfortunately, these accessors are not the same for the FPGA and the simulated LiteDRAM cores. We’re going to have to use separate software builds for FPGA and simulation.
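One way to deal with that, sketched below with hypothetical include paths and a hypothetical SIMULATION define (not necessarily how the BoxLambda build is organized), is to select the matching accessor set at compile time:

    /* Hypothetical layout: pick the CSR accessors matching the LiteDRAM core
     * variant this software build targets (FPGA bitstream vs. Verilator simulation). */
    #ifdef SIMULATION
    #include "litedram_sim/csr.h"
    #include "litedram_sim/sdram_phy.h"
    #else
    #include "litedram_fpga/csr.h"
    #include "litedram_fpga/sdram_phy.h"
    #endif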

Sdram_init()

Sdram_phy.h also contains a function called init_sequence(). This function gets invoked as part of a more elaborate initialization function called sdram_init(). Sdram_init() is not part of the generated code, however. It’s part of sdram.c, which belongs to liblitedram in the base LiteX repository, not the LiteDRAM repository:

https://github.com/epsilon537/litex/tree/master/litex/soc/software/liblitedram

sdram_init() vs. init_sequence().

It’s not clear to me why liblitedram is not part of the LiteDRAM repository, but it’s not a big deal. I integrated the sdram_init() function from liblitedram into the BoxLambda code base and it’s working fine.

To get things to build, I added LiteX as a git submodule to get access to liblitedram. I also tweaked some CPPFLAGS and include paths. The resulting Makefiles are checked in here:

It’s worth noting that liblitedram expects a standard C environment, which I added in the previous BoxLambda update.
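As a rough sketch of how this fits together on the software side, the boot code just has to run the initialization once before touching DDR. This assumes, as in the LiteX BIOS, that sdram_init() returns a nonzero value on success; the boot_ddr() function name is made up for illustration:

    #include <stdio.h>
    #include <liblitedram/sdram.h> /* sdram_init() from LiteX's liblitedram. */

    void boot_ddr(void) {
      /* Runs the initialization/calibration sequence generated for this
       * LiteDRAM configuration, via the CSR control port. */
      if (!sdram_init()) {
        printf("DDR initialization failed!\n");
        return;
      }
      printf("DDR init done, memory is usable from here on.\n");
    }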

DDR Test

The DDR test program is located here:

https://github.com/epsilon537/boxlambda/blob/master/sw/projects/ddr_test/ddr_test.c

The program boots from internal memory. It invokes sdram_init(), then performs a memory test over user port 0, followed by user port 1. Finally, the program verifies CPU instruction execution from DDR by relocating a test function from internal memory to DDR and branching to it.

The memory test function used is a slightly modified version of the memtest() function provided by LiteX in liblitedram.
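In outline, the test program does something like the sketch below. This is a simplified, hypothetical reconstruction rather than the actual ddr_test.c: the base addresses are the two user-port windows mentioned earlier, the pattern loop stands in for the LiteX memtest(), and the relocation size is a guess:

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    #define DDR_USR0_BASE 0x40000000UL /* DDR as seen through user port 0. */
    #define DDR_USR1_BASE 0x50000000UL /* The same DDR through user port 1. */

    /* Minimal stand-in for LiteX's memtest(): write a pattern, read it back. */
    static int simple_memtest(volatile uint32_t *base, size_t words) {
      for (size_t i = 0; i < words; i++)
        base[i] = (uint32_t)(i * 0x9E3779B1u);
      for (size_t i = 0; i < words; i++)
        if (base[i] != (uint32_t)(i * 0x9E3779B1u))
          return 0;
      return 1;
    }

    /* Function to relocate into DDR and execute from there. */
    static int test_func(int x) { return x + 42; }

    void ddr_test(void) {
      if (!simple_memtest((volatile uint32_t *)DDR_USR0_BASE, 1024))
        printf("Memtest over user port 0 failed!\n");
      if (!simple_memtest((volatile uint32_t *)DDR_USR1_BASE, 1024))
        printf("Memtest over user port 1 failed!\n");

      /* Copy test_func's code from internal memory to DDR and branch to it.
       * 256 bytes is an arbitrary illustrative size; the real code must use
       * the function's actual extent, make sure the code is position-independent,
       * and synchronize the instruction stream (e.g. fence.i on RISC-V) if
       * instruction caching or prefetching is in play. */
      memcpy((void *)DDR_USR0_BASE, (const void *)(uintptr_t)test_func, 256);
      int (*ddr_func)(int) = (int (*)(int))DDR_USR0_BASE;
      printf("test_func executed from DDR returned %d\n", ddr_func(0));
    }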

Relevant Files

Try It Out

Repository setup

  1. Install the Prerequisites.
  2. Get the BoxLambda repository:
    git clone https://github.com/epsilon537/boxlambda/
    cd boxlambda
    
  3. Switch to the enter_litedram tag:
    git checkout enter_litedram
    
  4. Set up the repository. This initializes the git submodules used and builds picolibc for BoxLambda:
    make setup
    

Build and Run the DDR Test Image on Verilator

  1. Build the test project:
    cd projects/ddr_test
    make sim
    
  2. Execute the generated verilator model in interactive mode:
    cd generated
    ./Vmodel -i
    
  3. You should see something like this:

DDR Test on Verilator.

Build and Run the DDR Test Image on Arty A7

  1. If you’re running on WSL, check the On WSL section of BoxLambda’s documentation.
  2. Build the test project:
    cd projects/ddr_test
    make impl
    
  3. Connect a terminal program such as Putty or Teraterm to Arty’s USB serial port. Settings: 115200 8N1.
  4. Run the project:
    make run
    
  5. Verify the test program’s output in the terminal. You should see something like this:

DDR Test on Arty A7-35T.

Other Changes

#This is the JTAG TCK clock generated by the BSCANE2 primitive.
#Note that the JTAG top-level ports (incl. TCK) are not used in a synthesized design. They are driven by BSCANE2 instead.
create_clock -period 1000.000 -name dmi_jtag_inst/i_dmi_jtag_tap/tck_o -waveform {0.000 500.000} [get_pins dmi_jtag_inst/i_dmi_jtag_tap/i_tap_dtmcs/TCK]

Interesting Links

https://github.com/antonblanchard/microwatt: An Open-Source FPGA SoC by Anton Blanchard using LiteDRAM. I found it helpful to look at that code base to figure out how to integrate LiteDRAM into BoxLambda.
