
Backbone Bus

Backbone is my proposal for an off-chip, Wishbone-inspired backplane interconnect that supports multiple bus masters.

Backbone is an interconnect derived from the Wishbone B4 bus specification, with certain Wishbone requirements adjusted as needed to support board-to-board data transfers on a multi-drop, shared bus. Its purpose is to minimize the "impedance mismatch" found in projects that use multiple FPGAs and rely heavily on Wishbone-compatible cores.

This project page documents version 0.0.1-alpha. Versioning follows the Semantic Versioning rules.

You can read the Wishbone B4 specifications at http://cdn.opencores.org/downloads/wbspec_b4.pdf .

I plan on using this bus as the backplane for supporting R&D for my Kestrel Computer Project. Signals are as follows:


SYSCON Signals.

50MHZ. A 50MHz reference clock generated by the backplane. NOTE: This doesn't mean that the bus has to run at 50MHz; you're free to insert as many wait-states as needed to slow things down to a more manageable speed. In fact, considering the physical size of even the smallest backplanes, it'll be very difficult to pull off true 50MT/s performance levels. I think the best you'll be able to do is 25MT/s. Even so, all signals on the bus are synchronized to the rising edge of the 50MHZ signal.
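As a rough back-of-the-envelope sketch, the relationship between wait-states and peak throughput can be computed as below. The cycle accounting is an assumption, not part of the Backbone description: each full-width transfer is taken to cost one clock plus one clock per wait-state, with back-to-back transfers and no arbitration overhead. The function name peak_rate is hypothetical.

```python
CLOCK_HZ = 50_000_000  # the backplane's 50MHZ reference clock
BUS_BYTES = 2          # D0-D15 carries two bytes per full-width transfer


def peak_rate(wait_states: int) -> tuple[float, float]:
    """Return (transfers per second, bytes per second) for full-width transfers.

    Assumes one clock for the transfer itself plus one clock per wait-state,
    with back-to-back transfers and no arbitration overhead.
    """
    if wait_states < 0:
        raise ValueError("wait_states must be non-negative")
    transfers = CLOCK_HZ / (1 + wait_states)
    return transfers, transfers * BUS_BYTES


for w in range(4):
    t, b = peak_rate(w)
    print(f"{w} wait-state(s): {t / 1e6:.1f} MT/s, {b / 1e6:.1f} MB/s")
```

Under these assumptions, a single wait-state per transfer corresponds to the 25MT/s estimate, or 50MB/s across the 16-bit data path.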

RESET. This backplane-generated output is high while the whole system is still being configured. It will only go low once all cards report having completed their configuration.

CDONE. This signal is generated by the card, and is high if, and only if, all configurable components have completed their configuration cycles. For example, Lattice FPGAs (and I think Xilinx too) have a pin called CDONE which goes high when the FPGA has finished bootstrapping itself from configuration flash. Note that a card may, at any time, bring this signal low (e.g., as when a user pushes a reset button).

Common MASTER/SLAVE Signals.

D0-D15. A 16-bit, bidirectional datapath. During a write, only the current bus master may drive the data bus; during a read, only the addressed peripheral may drive it. All other cards may only sense its state.

A1-A31. Address lines A1 through A31 of a 32-bit address to read from or write to. A0 is not exposed; its state, combined with the size of the transfer, is encoded in the SEL1-SEL0 pins. Only the current bus master is allowed to drive these pins.

SEL1-SEL0. These two pins select which half of the data bus will contain valid data. SEL1 corresponds to D8-D15, while SEL0 corresponds to D0-D7. Only the current bus master is allowed to drive these pins.
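A minimal sketch of how a master might fold A0 and the transfer size into SEL1-SEL0. Two things here are assumptions, since the signal list above does not specify them: byte lanes are taken to be little-endian (an even byte address uses D0-D7 via SEL0, an odd one uses D8-D15 via SEL1), and 16-bit transfers are assumed to require A0 = 0. The function name encode_sel is hypothetical.

```python
def encode_sel(address: int, size: int) -> tuple[int, bool, bool]:
    """Map a byte address and transfer size (1 or 2 bytes) onto A1-A31 and SEL1/SEL0.

    Returns (a31_to_a1, sel1, sel0): the address with A0 dropped, plus True for
    each asserted byte-lane select.
    Assumes little-endian lanes: even byte -> D0-D7 (SEL0), odd byte -> D8-D15 (SEL1).
    """
    if not 0 <= address < 2**32:
        raise ValueError("address must fit in 32 bits")
    a0 = address & 1
    if size == 1:
        sel0, sel1 = a0 == 0, a0 == 1
    elif size == 2:
        if a0:
            raise ValueError("16-bit transfers are assumed to be aligned (A0 = 0)")
        sel0 = sel1 = True
    else:
        raise ValueError("size must be 1 or 2 bytes")
    return address >> 1, sel1, sel0


print(encode_sel(0x1001, 1))  # odd byte: upper lane only
print(encode_sel(0x1000, 2))  # aligned 16-bit transfer: both lanes
```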

WE. This pin distinguishes a read transaction from a write transaction (as viewed from the perspective of the current bus master). Only the current bus master is allowed to drive this pin.

ACK. When the bus master addresses a peripheral, the peripheral is responsible for acknowledging the transaction. Each clock cycle between the assertion of STB and the assertion of ACK is a wait-state. Only the addressed peripheral is allowed to drive this pin.

STB. When the current bus master commences a bus transaction, it asserts this pin. Otherwise, it keeps this pin negated. This pin can only be driven by the current bus master.

CYC#. When the current bus master wants to take control of the bus, it brings this pin low. The master basically owns the bus for as long as this pin is held low. This pin is not bussed; the master must drive this pin high if it's not the currently selected bus master. Slaves should tie this pin to +3.3V.

CYCA. (Cycle Announce.) If any slot's CYC# pin is low, CYCA goes high. This tells every card that a bus cycle is in progress, and that all other master-driven signals are valid.

BCL#. Bus Clear. A bus master may hold onto the bus for as long as it needs, or even wants, to, provided it respects this pin. Any other card that wants to become bus master should pull this pin low. This pin is open-drain, allowing multiple cards to drive it. A requesting card keeps BCL# low for as long as it wants to conduct priority traffic; if its traffic isn't urgent, it's more polite to wait its turn instead.
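The two shared status lines above behave as simple wired logic, which the following sketch models. Levels are booleans (True = electrically high). The pull-up on BCL# is an assumption (an open-drain line needs one somewhere, presumably on the backplane), and the function names cyca and bcl_n are hypothetical.

```python
def cyca(cyc_n_levels: list[bool]) -> bool:
    """CYCA: high (True) whenever any slot drives its CYC# pin low (False)."""
    return not all(cyc_n_levels)


def bcl_n(pulling_low: list[bool]) -> bool:
    """Open-drain BCL#: an (assumed) pull-up holds the line high unless at least
    one card pulls it low. Returns the resulting line level (True = high)."""
    return not any(pulling_low)


# Slot 2's master owns the bus; the card in slot 4 wants priority traffic.
slots_cyc_n = [True, True, False, True, True]
print(cyca(slots_cyc_n), bcl_n([False, False, False, False, True]))
```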

BGO and BGI. Bus Grant output and input, respectively. When these two signals differ, the card is the currently selected bus master; when they're equal, it is not. If a card does not want to be the master this time around, it should mirror the state of BGI onto BGO. Otherwise, it should assert CYC# and drive the bus as appropriate.

+5V and GND. These pins provide power to the card. Although +5V is the supply voltage, the logic signaling over the bus is 3.3V. Each card is expected to have its own voltage regulator.

DIN 41612 Pin Out.

        A       B       C
    1   D0      +5V     WE
    2   D1      GND     A1
    3   D2      +5V     A2
    4   D3      GND     A3
    5   D4      +5V     A4
    6   D5      GND     A5
    7   D6      +5V     A6
...

  • Why I Am and Am Not a Fan of STEbus.

    Samuel A. Falvo II, 11/05/2021 at 19:22

    First, the cons, which are deal-breakers for me.

    1. 20 Address Bits.  This might seem like an awful lot of address space; after all, STEbus gives you 1MiB of memory space.  However, with so few address bits, you cannot practically add any card that exposes a memory space of about that size.  Consider a video card with a 24-bit, 800x600 display: its frame buffer alone requires about 1.4MB, forcing the processor to switch between memory banks when updating it.  For good performance and minimal visual tearing, this just plain sucks!  Also, consider the case of a SCSI or ATA controller wanting to DMA data from a hard drive into memory.  The processor will be able to address a lot more memory than the STEbus can, so what is the value of having a bus master on the STEbus if it can't address the processor's memory?  The only thing that makes sense is a separate CPU card.

    2. 12 Address Bits.  When addressing I/O, you're limited to 4KiB of space for all 20 cards possible on the backplane.  For processors like the 8086 and its subsequent progeny, this is a serious limitation, as a number of peripherals first designed for the ISA bus readily use, or even hard-wire, 16-bit I/O addresses.
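    The arithmetic behind the first two points is easy to check. This sketch assumes a packed 24-bit (3 bytes per pixel) frame buffer, and the even split of I/O space across slots is purely illustrative; STEbus does not actually divide it that way.

```python
MEM_SPACE = 2**20  # 20 memory address bits -> 1MiB
IO_SPACE = 2**12   # 12 I/O address bits -> 4KiB
SLOTS = 20         # maximum number of cards on the backplane

framebuffer = 800 * 600 * 3  # 800x600 pixels at 3 bytes (24 bits) each

print(f"frame buffer: {framebuffer} bytes, {framebuffer / MEM_SPACE:.2f}x the memory space")
print(f"I/O space per slot if split evenly: {IO_SPACE // SLOTS} bytes")
```

    The frame buffer comes to roughly 1.37 times the entire memory space, and an even split of I/O space leaves each slot only about 200 bytes.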

    3. No plug-and-play capabilities whatsoever.  You're obliged to select a board's I/O and memory base addresses via jumpers.  You're obliged to select interrupts and DMA requests via jumpers.  There are jumpers literally everywhere in this system.  Compared to the Apple IIe bus, the STEbus offers a sharp regression in user experience and functionality.

    4. DIN 41612 connectors.  If all you care about is 10 insertions before they break, then this connector is fine.  You can pick them up relatively cheaply on Digikey.  However, while building my own video/FPGA development card for the RC2014, I must have removed and inserted cards well in excess of that figure.  So, if you want a DIN connector that can handle more insertions, you need to start shelling out upwards of $7 per connector.  That's $7 for each connector on your backplane, plus $7 for each mating connector on your expansion cards.  This quickly gets expensive.  My RC2014 has six cards in it.  If DIN connectors were used, that'd total $14 * 6 = $84, just in connectors.  That's almost a third of the cost of the computer!  There are significantly cheaper options.

    5. Only three bus masters throughout the whole system.  You get the "default" bus master, plus no more than two other bus masters across the whole backplane, even if your backplane has the full allotment of 20 slots!

    To fix these deficiencies, I would do the following:

    • Use at least 24 to 32 address lines on the connector.
    • Unify the I/O and memory address spaces.
    • Use geographical addressing.  Drive REGSEL# for a particular slot when addressing that slot's register space.
    • Make register space at least 64KB in size per slot.
    • Get rid of ATNRQ# lines for interrupts, and replace them with IRQ#, IRQ_IN#, and IRQ_OUT#, the latter two providing a priority chain of arbitrary length.
    • Get rid of ATNRQ# lines for DMA, and just replace them with a generic bus mastering protocol, DMA_REQ#, DMA_ACK_IN#, DMA_ACK_OUT# analogous to IRQs.
    • Swap in sets of 40-pin SIP connectors in place of DIN connectors.  A pair gives you 80 usable pins which are far cheaper, more reliable, mechanically stable, and far easier to solder and maintain.

    STEbus does have some pros which I really like, though:

    1. Fully asynchronous.  The master drives ADRSTB#.  Then it drives DATSTB#.  Then it waits for the card to assert DATACK#.  Then, the master negates DATSTB#, which triggers the card to release DATACK#.  (Concurrently and unrelated, ADRSTB# may also be negated; or it may not, in the case of a read-modify-write cycle.)  Only then can the next cycle begin.  Using chips like the 74HCT688, xCT138, and xCT74...

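    The fully asynchronous handshake described in the first pro can be sketched as a sequential model. This shows only the ordering of signal changes, not timing: in real hardware the master would wait at each step for the other side to respond. The class and function names are hypothetical, and active-low lines are modeled as True when asserted (driven low).

```python
from dataclasses import dataclass


@dataclass
class SteBus:
    # True means the active-low line is asserted (driven low).
    adrstb: bool = False
    datstb: bool = False
    datack: bool = False


def slave_react(bus: SteBus) -> None:
    """The slave follows DATSTB#: acknowledge it, then release DATACK# once it drops."""
    if bus.datstb and not bus.datack:
        bus.datack = True
    elif not bus.datstb and bus.datack:
        bus.datack = False


def transfer(bus: SteBus, read_modify_write: bool = False) -> list[str]:
    """Run one data transfer and return the order in which the signals changed."""
    trace = []
    bus.adrstb = True
    trace.append("ADRSTB# asserted")
    bus.datstb = True
    trace.append("DATSTB# asserted")
    slave_react(bus)  # in hardware, the master would wait here for DATACK#
    if not bus.datack:
        raise RuntimeError("no DATACK# from slave")
    trace.append("DATACK# asserted")
    bus.datstb = False
    trace.append("DATSTB# negated")
    slave_react(bus)  # the slave releases DATACK# in response
    trace.append("DATACK# released")
    if not read_modify_write:  # a read-modify-write cycle keeps ADRSTB# asserted
        bus.adrstb = False
        trace.append("ADRSTB# negated")
    return trace


for step in transfer(SteBus()):
    print(step)
```

    Only after DATACK# is released may the next cycle begin, which is why no shared clock is needed.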

  • Parts received, but no boards yet. :(

    Samuel A. Falvo II, 08/02/2016 at 17:41

    Well, I got the parts I ordered from Digikey, but so far, no boards yet.

  • More EDA woes. You'd think this was simple stuff.

    Samuel A. Falvo II, 06/07/2016 at 17:44

    I'm having a great deal of difficulty resolving one final (known) bug in the PCB layout. And I cannot seem to fix it through any recommended method I know of.

    The problem is that the ADJUST pin on the voltage regulator couples into one end of a potentiometer. This should either be pin '1' or pin '3' of the pot. The opposing pin and the wiper pin should be grounded. So, either pins 1 and 2 are grounded, OR, pins 2 and 3 are grounded. Pins 1 and 3 should most definitely not be shorted.

    And yet, while this is very clearly expressed in gschem, and the footprint for the pot was redrawn just to make absolutely sure everything is correct, PCB literally insists on shorting pins 1 and 3 of the pot, leaving pin 2 to do whatever it wants.

    This is most infuriating, as you can imagine. After spending literally tens of hours trying to debug this, I was left literally screaming at the computer. I can manually route the traces, of course, but the netlist would be completely borked if I do, which renders the "find signal" (F or CTRL-F) function in PCB utterly useless.

    I'm at wits' end. I don't know what to do.

  • A Bit of Hindsight: Part 2: Byte Lanes vs. Width Hierarchies

    Samuel A. Falvo II, 05/31/2016 at 18:35

    Backbone, as it's currently defined, is basically Wishbone exposed to the world. It's an almost purpose-built bus interface just for the Kestrel-3's hardware development as I work towards a single-board version of the computer. Its mission, and thus its criteria for success are:

    1. It lets me explore different pieces of the Kestrel-3 in isolation of other components. With an SBC, this is not possible; I'd have to refab the entire board if I changed even just one circuit.
    2. It lets me explore bus architecture design. This is already a resounding success; I don't even have a board fabbed yet, and have already identified two things I would do differently next time I need a parallel bus. I've already documented one of these things in the previous log; this log is devoted to the second.

    One characteristic of the Wishbone bus is that, per the specification, wide interfaces need to be qualified with one or more select signals; these select signals function the same as BEx in Intel CPUs, DSx in 68K CPUs, etc. SEL0, when asserted, means that valid data appears on DAT0-DAT7. SEL1 means data appears on DAT8-DAT15, and so on. (All assuming an 8-bit granular interface, of course.) This also implies that the address bus is split into two parts: ADR0..ADRx is hidden from the outside world, since those bits, combined with the desired transfer size, are used to calculate the proper SEL line settings; only ADRx+1..ADRy is exposed (where y is your highest address bit, typically 15, 31, or 63 for 16-, 32-, or 64-bit address spaces). More concretely, a 64-bit wide, 8-bit granular bus will not expose A0, A1, or A2, since these bits determine which of SEL0 through SEL7 is asserted for byte transfers, which pair is asserted for half-word transfers, etc.

    This is a great optimization if you're addressing memory. Memory is inherently amenable to such row/column decomposition of an address space like this, so it makes perfect sense. The problem is that literally everything else you'd ever want to talk to on the bus is not so amenable.

    Consider the KIA, which I first introduced for the Kestrel-2, which also used a Wishbone bus. Its registers are only 8 bits wide, and the core has only a single address input. You'd expect its registers to appear at KIA+0 and KIA+1; however, this is a mistake. Because A0 is not exposed to the world, it does not participate in address decoding. Instead, the KIA's address input is attached to A1 (the Kestrel-2 is a 16-bit CPU and bus system), which means its registers are actually located at KIA+0 and KIA+2. So what appears at KIA+1 and KIA+3? Nothing. If the KIA had writable control registers and you attempted to write to those locations, you would run the real risk of loading garbage into those control registers, since the byte lane they listen to would not carry defined data during such a write.
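    A minimal sketch of the aliasing problem, under stated assumptions: the KIA's single register-select input is wired to A1, it listens only to D0-D7 and ignores SEL0/SEL1, and byte lanes are little-endian (even addresses use D0-D7). KIA_BASE and kia_byte_write are hypothetical names.

```python
KIA_BASE = 0x8000  # hypothetical base address, aligned to 4 bytes


def kia_byte_write(byte_addr: int) -> tuple[int, bool]:
    """Return (register selected, data valid) for a byte write near the KIA.

    Assumes the KIA's register-select input is wired to A1, that it only
    listens to D0-D7 and ignores SEL0/SEL1, and little-endian lanes
    (even addresses -> D0-D7 via SEL0, odd addresses -> D8-D15 via SEL1).
    """
    offset = byte_addr - KIA_BASE
    if not 0 <= offset < 4:
        raise ValueError("address is outside the KIA's decoded window")
    register = (offset >> 1) & 1     # A1 picks the register
    data_valid = (offset & 1) == 0   # A0 = 1 puts the byte on D8-D15, not D0-D7
    return register, data_valid


for off in range(4):
    reg, ok = kia_byte_write(KIA_BASE + off)
    print(f"KIA+{off}: register {reg}, {'valid data' if ok else 'undefined data'}")
```

    Writes to KIA+1 and KIA+3 still select a register, but the lane the KIA samples is not driven with meaningful data.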

    A much better approach is to use high enables. Rather than a linear decomposition of the bus into lanes (where a 64-bit bus has 8 lanes of 8 bits each), a logarithmic decomposition is used (a 64-bit bus has one 32-bit high word, one 16-bit high half-word, one 8-bit high byte, and one low byte). Such a bus allows 8-bit devices to focus just on D0-D7 without concern for which byte lane they should attach to, 16-bit devices on D0-D15, and so forth.
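    To make the logarithmic decomposition concrete, here is a sketch for a 64-bit bus. The enable names BHE, HHE, and WHE are hypothetical labels chosen for illustration, and the sketch assumes every transfer is justified to D0, so a narrower device only ever looks at the low lines.

```python
# Hypothetical high-enable names for a 64-bit bus (not from any spec):
#   BHE -> D8-D15, HHE -> D16-D31, WHE -> D32-D63. D0-D7 are always used.
HIGH_ENABLES = [("BHE", 16), ("HHE", 32), ("WHE", 64)]  # (name, top of span)


def enables_for(width_bits: int) -> dict[str, bool]:
    """High enables a master asserts for an 8-, 16-, 32-, or 64-bit transfer."""
    if width_bits not in (8, 16, 32, 64):
        raise ValueError("width must be 8, 16, 32, or 64 bits")
    return {name: width_bits >= top for name, top in HIGH_ENABLES}


def valid_data_lines(enables: dict[str, bool]) -> range:
    """Data lines carrying valid data; every transfer is justified to D0."""
    top = 8
    for name, span_top in HIGH_ENABLES:
        if enables[name]:
            top = span_top
    return range(top)


for width in (8, 16, 32, 64):
    lines = valid_data_lines(enables_for(width))
    print(width, enables_for(width), f"D0-D{lines[-1]}")
```

    Compared with byte lanes, the number of enables grows with the logarithm of the bus width rather than linearly, and an 8-bit device can ignore all of them.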

    It also naturally supports upward compatibility. To illustrate, let's start with a simple nybble-wide bus.

    A0-A3
    D0-D3
    WE
    STB
    ACK

    Pretty simple; it allows us to read or write any nybble in a 16 nybble address space. We can expand the address space easily by just tacking on more address bits: this doesn't affect old hardware since they just ignore the upper address bits.

    A0-A7
    D0-D3
    WE
    STB
    ACK

    But, if we now want to address bytes, we need to tack on another set of data bits. The CPU would tell the addressed peripheral that it wants to transfer a full byte by using a "Nybble High Enable" (NHE) control signal.

    A0-A7
    D0-D7
    WE
    STB
    ACK
    NHE

    We need to know if D0-D3 or if D0-D7 are...


  • A Bit of Hindsight: Part 1: Signal Routing

    Samuel A. Falvo II, 05/31/2016 at 17:11

    For my needs, it doesn't really matter how I lay out the address or data bus pins. When I synthesize a design to an FPGA, the signals can be routed to arbitrary pins through the UCF or PCF files. I was relying on this when I came up with the pin layout for the DIN connectors.

    However, in retrospect, it was probably a mistake to put all data pins on row A, and all address pins on row C. Based on my experience routing the bus on the backplane, it would have been better to keep all the related signals together on the FPGA (minimizes internal routing resources), and interleave the data and address pins across rows A and C. So, instead of:

        Row A    Row C
    1    D0        A0    |    pins assigned along the row.
    2    D1        A1    |
    3    D2        A2    |
    4    D3        A3    V

    I should have done this instead:

        Row A    Row C
    1    D0        D1    --->  Pins assigned across rows.
    2    D2        D3
    3    A0        A1
    4    A2        A3

    Electrically, they're identical; it's just that it makes routing buses to relevant pins on FPGAs easier, particularly if the FPGA is in a TQFP or similar package.

    For BGA devices, I don't think it matters as much; breaking signals out of a 16x16 BGA (such as with an iCE40HX8K-CT256 device) is going to require no less than a 4-layer board and quite possibly more, just to route signals a few centimeters in any coherent direction and in any reasonable order. And, it's going to involve a lot of vias. A lot of vias.

    The one nice thing about the layout of Backbone's pinout now is that it makes interfacing to microcontrollers-as-slaves that much easier. For example, perhaps I'll replace the KIA circuit in the FPGA with a KIA-like interface in a microcontroller, which acts as a USB-keyboard-in, standard-bytecode-out KIA-like replacement. Such a device is much easier to implement using a microcontroller than using FPGA resources. (Sounds like a job for the S16X4A again!)

  • DIN41612 routing back on course.

    Samuel A. Falvo II, 05/31/2016 at 16:58

    I discovered a number of settings in PCB that allow me to route all 96 pins of a DIN 41612 connector on a single side of a two-layer circuit board. I had to set my trace width to 6 mil and reduce my annular ring size to somewhere in the vicinity of 10 mil. These are figures OSHPark seems to support, so I don't think other PCB fabs will have issues either.

    I have many of the paths routed already. I just need to find an optimal layout for the rest of the circuitry. I really wish I didn't need a 74LVT20 or 74LVT04. Capturing and responding to signals on a card-by-card basis really ruins the elegance of the overall design, and appreciably complects the routing of signals. Thankfully I have two layers to play with.

  • DIN 41612 too difficult to route.

    Samuel A. Falvo II, 05/30/2016 at 16:33

    When trying to break traces out from a DIN 41612 footprint on a 2-layer PCB design, I found that it was possible only with great difficulty; it required a lot of surface area that otherwise held no other components. This represents a lack of efficiency, and drives the cost of the board up significantly. It also lengthens the individual traces to well beyond four inches, so additional termination circuitry would definitely be needed. Since this backplane is not intended for industrial use, I am not able to justify the cost of a 4-layer board to myself right now.

    But, if I only have two rows of pins instead of three, I can route the bus very efficiently indeed. In fact, it can be done entirely on a single side of the PCB, leaving the other side free to be a ground pour.

    So instead of a single DIN 41612 connector, I'm thinking I should use two or three co-linear 2x20 box headers. You know the kind: they were used to connect parallel ATA devices like hard drives to PCs for years. Because of their ubiquity, they're dirt cheap (two box headers still come to only about 66% of the cost of a single DIN 41612 connector), and, if my math is right, they increase the minimum length of a plug-in card from 3-ish inches to 4-ish inches. In other words, the cost increase of a larger PCB is mostly offset by the lower cost of the connectors, so it should be a wash, price-wise.

    The only disadvantage that I can see is that I'm losing 16 pins, which means I will have no room whatsoever for upward expansion. Moreover, I'm losing a large number of +5V pins as well.

    My plan is to break the bus up into two connectors, giving me a total of 80 pins to work with. Each row is divided into groups of four pins: three signal pins and one ground pin. The grounds are staggered so that no signal is more than two pins away from a ground. This leaves a total of 60 signal pins.
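    The pin budget and the "no more than two pins from a ground" claim can be checked with a short sketch. The exact stagger is an assumption, since the log doesn't give it: row 0 of each 2x20 header is taken to have grounds at columns 3, 7, 11, ..., and row 1 at columns 1, 5, 9, .... Distance is measured in pin pitches as row offset plus column offset, which is conservative for diagonal neighbors.

```python
ROW_LEN = 20  # a 2x20 box header has two rows of 20 pins


def is_ground(row: int, col: int) -> bool:
    # Assumed stagger: row 0 grounds at cols 3, 7, 11, ...; row 1 at cols 1, 5, 9, ...
    return col % 4 == (3 if row == 0 else 1)


def max_signal_to_ground() -> int:
    """Worst-case distance (row offset + column offset) from any signal to a ground."""
    pins = [(r, c) for r in range(2) for c in range(ROW_LEN)]
    grounds = [p for p in pins if is_ground(*p)]
    signals = [p for p in pins if not is_ground(*p)]
    return max(min(abs(r - gr) + abs(c - gc) for gr, gc in grounds)
               for r, c in signals)


signals_per_connector = sum(not is_ground(r, c) for r in range(2) for c in range(ROW_LEN))
print("signal pins across two connectors:", 2 * signals_per_connector)
print("worst-case distance to a ground:", max_signal_to_ground())
```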

    In connector J1, you'll find an 8-bit subset of the Backbone bus. D0-D7, A1-A7 for register select purposes, and A56-A63 for I/O device decoding. As well, you'll find WE, SEL0, STB, ACK, CLK, RESET, and CDONE pins. These should be sufficient to, for instance, wire up a number of 65C22 or i8255 chips, or some other similarly simple 8-bit interface. Note that there's no need to monitor CYCA here, since if SEL0 is asserted, it will be because a cycle is in progress. What you won't be able to tell, though, is if the bus transaction is part of a read-modify-write transaction. But, honestly, that information is rarely useful except in multiprocessor configurations anyway. This results in the cheapest possible board configuration; a PCB can be even smaller than the original design, at just about 2" long on a side.

    In connector J2, you'll find D8-D15, A8-A23, SEL1, and the remaining bus mastership pins. This lets you take full advantage of the 16-bit data path, the complete address space, and/or the ability to master the bus.
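    As a sanity check, the two signal lists should each fit the 30 signal pins available per connector. The J1 list is taken directly from the text; for J2, the "remaining bus mastership pins" are assumed to be CYC#, CYCA, BCL#, BGI, and BGO from the original signal list, and CLK is assumed to be the 50MHZ clock under another name.

```python
def bus(prefix: str, lo: int, hi: int) -> list[str]:
    """Expand a bus name range such as D0-D7 into individual signal names."""
    return [f"{prefix}{i}" for i in range(lo, hi + 1)]


J1 = (bus("D", 0, 7) + bus("A", 1, 7) + bus("A", 56, 63)
      + ["WE", "SEL0", "STB", "ACK", "CLK", "RESET", "CDONE"])
# Assumption: the remaining bus mastership pins are those from the signal list.
J2 = (bus("D", 8, 15) + bus("A", 8, 23) + ["SEL1"]
      + ["CYC#", "CYCA", "BCL#", "BGI", "BGO"])

print(len(J1), len(J2))
```

    Under that assumption, both connectors use exactly 30 signal pins, with no signal appearing on both.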



Discussions

Keith wrote 01/04/2018 at 22:08

I'm sorry to rain on your parade but a bus is not simply a matter of joining the dots. If your bus is going to run at any decent speed, the bus wires will act like transmission lines. You have no specified bus impedance or terminators. You have no rules about what the bus drive or loads must conform to. How many bus masters and slaves can share a bus?


Samuel A. Falvo II wrote 10/11/2020 at 02:42

The success of the RC2014 backplane seems to be a real-world counter-example.  I largely stopped working on this project because I learned of its existence and saw how viable the homebrewer community around it was.  Obeying the Z80 bus protocol, its peak throughput is 3.6MBps (this requires custom bus master logic; under more nominal conditions, it's able to move 2.4MBps), which is actually 0.1MBps faster than the Amiga's Zorro-II bus.  Maybe this doesn't count as "decent speed" to you, but it's plenty good enough for most homebrew electronics applications.

For that matter, Amiga Zorro II and IBM PC/AT's ISA bus are also reasonable counter-examples too.  They all acquired more detailed specifications *after* they'd already made it to market.  But, I digress.

If your primary concern was the 50MHz clock, well, that's open for revision.  The version detailed in this project is not set in stone; I state explicitly it's 0.0.1, which is as experimental as it gets.  Fun fact: I thought of reducing it to 25MHz, but didn't, because my FPGA boards had a 50MHz oscillator on-board which I could just tap directly.  Hence, the 50MHz clock.  As far as raw data rates, though, I had zero intention of supporting 50MT/s.  In fact, I said in the very first paragraph that it'd be **very** technically difficult to achieve that rate with this design.

I guess my question is what were you hoping to achieve with your reply?  What were your assumptions that motivated you to reply what you did?  I'm genuinely confused, because most, if not all, of the concerns you raise are (perhaps implicitly) taken into account elsewhere in the project's brain-dumps.

**EDIT Oct 11**

I think I know what you're going for; you wanted to advocate for the STEbus or related technology.  I just noticed that you had STEbus as a project under your account along with a bunch of CPU cards for it.  If the time comes to design a custom backplane for the Kestrel-3, I'll definitely give it a look over.  I love that it's a fully asynchronous bus (like Zorro-III and unlike VME).  I'd probably want to eventually upgrade it to a 16-bit data path though.

