AREDN Under Test | Details

Mesh networks are showing up everywhere. Meshes are used for home WiFi appliances, military tactical deployments, low-power sensor networks, commercial data monitoring, and more. Early WiFi mesh hacking started with Linksys WRT54 home WiFi routers in the crowded 2.4 GHz Part 15 band. There are now Bluetooth meshes and other ISM band radio meshes such as LoRa. OpenWRT is now available for a wide variety of commercial 802.11ac hardware products. Meshes are great for improving the range of a radio network. If there are a sufficient number of links, a single failure causes minimal network harm. The ease of set-up with plug and play operation is attractive for mobile/tactical operations. Nodes can move between groups and easily stay connected in a mobile mesh. The downsides are reduced bandwidth due to network overhead, increased latency and scalability problems. Meshes were intended for small deployments and do not scale to a large metropolitan area network.

I have watched the deployment of the San Francisco Wireless Emergency Mesh (SFWEM) over the past three years. SFWEM is deploying AREDN (Amateur Radio Emergency Data Network), which is based on FOSS OpenWRT software running on several different commercial Part 15 devices. The AREDN software developers have a driver to QSY the radios to the Part 97 band for amateur radio use. AREDN on the Part 97 frequencies was a big motivation for hams. The AREDN gear was initially deployed as an appliance but undergoes constant software upgrades. It has never been tested as an RF network, has few users and little content.

Many mesh networks like AREDN are composed of half duplex nodes (they can either send or receive, but not both at the same time). Every packet transiting a node is first received and then resent, which cuts bandwidth in half at each hop. Six hops reduce bandwidth to about 1.6% of the original value. This can be fixed with full duplex nodes (simultaneous sending and receiving) by adding new frequencies and links. Adjacent channel performance needs to be understood to create a frequency plan, and device shielding may be required.
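The halving arithmetic can be checked with a few lines of Python. This is only a sketch of the idealized model described above, assuming every hop is a single-radio, same-channel, half duplex relay and ignoring protocol overhead and retries. The function name is mine, not part of any AREDN tool.

```python
def relative_throughput(hops: int) -> float:
    """Fraction of single-link bandwidth left after `hops` half duplex hops.

    Assumption: each store-and-forward hop on the same channel halves
    the usable bandwidth. Real links will usually do worse.
    """
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return 0.5 ** hops


for hops in range(1, 7):
    # Print the remaining share of the original bandwidth as a percentage.
    print(f"{hops} hop(s): {relative_throughput(hops):.2%} of original bandwidth")
```

At six hops the model gives 0.5 to the sixth power, or 1/64 of the original bandwidth.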

When OpenWRT was deployed more than a decade ago on commercial gear, Part 15 networks of all varieties popped up. While there have been many large Part 15 mesh attempts, none were successful. They were aimed at a variety of purposes and capabilities using different routing schemes (Babel, B.A.T.M.A.N., OLSR, etc.). They all shared the same Media Access Control (MAC) layer - CSMA. Early adopters of OpenWRT found that replacing the proprietary TDMA systems with 802.11n ad-hoc mode hurt link performance, but they didn't appreciate the radio network side effects. AREDN's 802.11n ad-hoc mode CSMA uses RTS/CTS to implement virtual carrier sensing in carrier sense multiple access with collision avoidance (CSMA/CA). This helps with small loads but it has problems handling overload. Nothing is built into 802.11n ad-hoc mode to deal with overload, so performance is unspecified when overloaded. The same is true with AREDN.

CSMA works best in settings where there's a high capacity central hub and much lower speed clients with short bursts of traffic. Cell systems do that and work well unless overloaded with too many users. In a mesh, nodes connect as peers, so additional connections require bandwidth sharing, not a CSMA strong suit. It is understood that CSMA fails when overloaded, so I decided to find out how AREDN performs.
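To give a feel for why contention hurts as load rises, here is a toy model in Python. It is not 802.11 CSMA/CA and not a model of AREDN. It is a simplified slotted random-access model that I am substituting for illustration, where each of N nodes independently transmits in a slot with probability p and the slot is useful only if exactly one node transmits. All names and parameter values are assumptions.

```python
def slot_success_probability(nodes: int, p_transmit: float) -> float:
    """Probability that exactly one of `nodes` stations transmits in a slot.

    Toy slotted random-access model (an assumption, not 802.11 DCF):
    each station transmits independently with probability `p_transmit`;
    two or more simultaneous transmissions collide and the slot is lost.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    if not 0.0 <= p_transmit <= 1.0:
        raise ValueError("p_transmit must be between 0 and 1")
    return nodes * p_transmit * (1.0 - p_transmit) ** (nodes - 1)


# Four nodes sharing one collision domain, with rising offered load.
for p in (0.1, 0.25, 0.5, 0.9):
    print(f"p={p:.2f}: useful slots = {slot_success_probability(4, p):.3f}")
```

In this toy model the share of useful slots peaks at moderate load and falls toward zero as every node tries to transmit all the time. Real CSMA/CA adds carrier sensing and backoff, so the numbers differ, but it shows why peers competing for one channel is harder than a hub serving bursty clients.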

Experimenting with AREDN, I found that once link bandwidth capacity was approached, strange behavior including deadlocks and network oscillations occurred without external stimulus. I also discovered that once capacity is reached, bandwidth allocation is based on link SNR. With enough SNR difference, some links drop out completely - just like FM capture. I gave up on further testing as it seemed too broken to test. I wonder whether all the bugs are inherent to CSMA, are bugs in OpenWRT that have persisted all these years, or come from a defect in the chip's 802.11n ad-hoc implementation.

Most standard WiFi radios use proprietary TDMA systems that are much more efficient and avoid CSMA collision issues. Many meshes use CSMA because it is a common operating mode of the radio chip. TDMA radios can be complex, implementing Quality of Service and priority, but they handle overload much better.

I developed two tests to investigate AREDN's CSMA performance. Both use just four AREDN nodes. I found MikroTik Bandwidth Test especially useful. It runs on MikroTik devices or Windows. I'd recommend a fast PC for testing. MikroTik Bandwidth Test offers TX, RX or both directions, UDP or TCP, random packet size, and variable bandwidth demand.

The two tests both run with all four nodes in one collision domain. Each node can support data at > 5 Mbps (the exact figure is not important). All nodes are set to the same channel, bandwidth, SSID, etc. The CSMA tests demonstrate that a shared channel loses capacity as it approaches its limit. Once radio channel capacity is reached, abnormal behavior such as lockups occurs. The tests demonstrate that without any other form of arbitration, SNR determines bandwidth allocation. They also demonstrate that when a node is shared, the bandwidth doesn't get divided in 1/N fashion; the total is even less than the sum of the N links' bandwidth.

AREDN CSMA is a problem where there are a number of same-channel radios in the same collision domain. This is the arrangement prescribed by the cell system topology in the AREDN manuals.

Meshes are often depicted as a "complete graph" where every node is linked to every other node. That's great for survivability but not for bandwidth. If each radio is on the same frequency, the network bandwidth is at best divided by the number of nodes. Clearly a multi-frequency, multi-radio system is better but more expensive.
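A quick Python sketch makes the trade-off concrete. It counts the links in a complete graph and computes the best-case per-node share when all of those links sit on one shared channel. The 20 Mbps channel figure is a made-up example value, not a measurement, and the function names are mine.

```python
def complete_graph_links(nodes: int) -> int:
    """Number of links when every node is linked to every other node."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return nodes * (nodes - 1) // 2


def best_case_share_mbps(channel_mbps: float, nodes: int) -> float:
    """Best-case per-node bandwidth on one shared channel.

    Assumption: perfect 1/N sharing with no contention loss, which is
    an upper bound that a real CSMA network will not reach.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    return channel_mbps / nodes


CHANNEL_MBPS = 20.0  # hypothetical channel capacity for illustration
for n in (2, 4, 8, 16):
    print(f"{n} nodes: {complete_graph_links(n)} links, "
          f"at best {best_case_share_mbps(CHANNEL_MBPS, n):.2f} Mbps per node")
```

The link count grows roughly with the square of the node count while the shared channel stays the same size, which is why survivability and bandwidth pull in opposite directions on a single frequency.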

OpenWRT apparently supports "mesh TDMA", but I'm not aware of any implementations. There's quite a lot that goes into a manufacturer's proprietary TDMA system, and it might be too much work for a FOSS project. There have been successful TDMA meshes, but without a solid foundation, large CSMA based meshes are doomed to poor performance.

Existing CSMA networks can be improved by changing single radio, same frequency nodes to multiple frequency, multiple radio nodes. By reducing a large number of same channel Point to Multipoint CSMA connections, CSMA contention problems are reduced.

While moving to TDMA would be the best solution, a radio token ring would be easy to implement to increase network bandwidth and make timing deterministic.
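As a rough illustration of why token passing makes timing deterministic, here is a minimal Python sketch of a token ring schedule. It is not an existing AREDN or OpenWRT feature. The node names, the per-token frame limit, and the hold and pass times are all hypothetical, and lost-token recovery is left out.

```python
from collections import deque


def token_ring_schedule(queues: dict, max_frames_per_token: int = 1) -> list:
    """Return the order in which queued frames are sent.

    Only the token holder may transmit, so there are no collisions.
    The token visits nodes in the dict's insertion order.
    """
    if max_frames_per_token < 1:
        raise ValueError("token holder must be allowed at least one frame")
    ring = deque(queues.keys())
    pending = {node: deque(frames) for node, frames in queues.items()}
    sent = []
    while any(pending.values()):
        holder = ring[0]
        for _ in range(max_frames_per_token):
            if not pending[holder]:
                break  # nothing left to send, release the token early
            sent.append((holder, pending[holder].popleft()))
        ring.rotate(-1)  # pass the token to the next node
    return sent


def worst_case_token_wait_ms(nodes: int, hold_ms: float, pass_ms: float) -> float:
    """Upper bound on the wait for the token after releasing it.

    Every other node holds the token for at most `hold_ms`, and the
    token is passed `nodes` times before it comes back around.
    """
    return (nodes - 1) * hold_ms + nodes * pass_ms


print(token_ring_schedule({"A": ["a1", "a2"], "B": ["b1"], "C": []}))
print(worst_case_token_wait_ms(nodes=4, hold_ms=10.0, pass_ms=1.0))
```

Because each node's wait is bounded by the hold and pass times, the worst case under overload is known in advance, unlike CSMA where a weak-SNR node can be starved.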

Note that mesh network routing is not addressed in this article, as there are upgrades available, though none have been proven to scale in a large regional network.

Ref: On avoiding RTS collisions for IEEE 802.11-based wireless ad hoc networks

Token Ring, The Betamax of Networking

STANAG 5066 The Standard for Data Applications over HF Radio
