
Audio Architecture

A project log for weeBell - personal central office for POTS phones

weeBell brings the goodness of old telephones into the modern age in a portable package that speaks GUI, Bluetooth and Wifi

Dan Julio, 07/31/2023 at 18:13

The most interesting part of the Bluetooth Handsfree firmware for weeBell, for me, has been the audio subsystem. I've never really done real-time audio before (aside from generating tones or using a library with an I2S DAC), so it was a good learning experience. Aside from the hard real-time nature of audio, there were several technical challenges to overcome.

Perhaps the most difficult technical challenge was Line Echo Cancellation. All POTS telephone interfaces multiplex received and transmitted audio onto the same two wires going to the telephone. Traditionally this was done by a circuit called a Hybrid, which consisted of a set of coils or transformers. For weeBell, it's done by the AG1171 with active circuitry. One characteristic of the hybrid is that it echoes received audio back into the transmit path because of impedance mismatches. In a system with hundreds of milliseconds of delay, such as a long distance line or a cellular connection, the remote talker hears their own voice echoed back to them, which is unpleasant. The introduction of long distance calling required the telephone companies to find ways to cancel that echo through Line Echo Cancellation (or Echo Suppression in the early days), where an estimate of the echo of the received signal is subtracted from the signal in the transmit path before it is sent back into the network. Here's a paper that discusses the echo canceller Bell Labs developed when sending telephone calls through the Telstar satellite.

(picture from David Rowe's blog)

I was initially concerned about how I would perform this function, but fortunately we live in the glorious time of searches and code repositories. This problem - and others - has been solved many times, and I found David Rowe's amazing OSLEC (Open Source Line Echo Cancellation) code along with a bunch of telephony functions in Steve Underwood's wonderful spandsp library, which was archived from SVN to GitHub. Originally written for Asterisk, this C code is incredibly well written and easy to use.

My code includes the ability to store the audio samples surrounding the OSLEC routine to gCore's Micro-SD card and I used that a lot while developing the code.  The following image shows OSLEC in action with the samples displayed in Audacity.  The TX audio path is from the remote speaker intended to be heard in the handset.  The RX audio path shows the echoed signal from the hybrid before echo cancellation.  And the EC audio path shows the signal from the hybrid after echo cancellation as it is returned to the remote talker.  You can see OSLEC has a very short "training" period and then is very effective in removing the signal.  There's lots more to it and you can read about OSLEC's development on David's blog starting with this entry.
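To make the "training" behaviour a little more concrete, here is a minimal sketch of an adaptive echo canceller using a plain normalized LMS (NLMS) filter. This is not OSLEC, which is far more sophisticated (double-talk detection, fixed-point arithmetic and so on), and it is not the code weeBell runs. The names `nlms_ec_t`, `nlms_ec_init`, `nlms_ec_update`, `EC_TAPS` and `EC_MU` are made up for illustration. The arguments follow the TX/RX/EC naming used above: `tx` is the far-end audio headed to the handset, `rx` is what comes back from the hybrid, and the return value is the echo-cancelled sample.

```c
#include <stddef.h>

#define EC_TAPS 32      /* hypothetical filter length (4 ms at 8 kHz) */
#define EC_MU   0.5f    /* hypothetical adaptation step size, 0 < mu < 2 */

typedef struct {
    float w[EC_TAPS];   /* current estimate of the echo path */
    float x[EC_TAPS];   /* recent tx samples, x[0] is the newest */
} nlms_ec_t;

void nlms_ec_init(nlms_ec_t *ec)
{
    for (size_t k = 0; k < EC_TAPS; k++) {
        ec->w[k] = 0.0f;
        ec->x[k] = 0.0f;
    }
}

/* Process one sample pair and return the echo-cancelled sample. */
float nlms_ec_update(nlms_ec_t *ec, float tx, float rx)
{
    /* Shift the tx history and store the newest sample. */
    for (size_t k = EC_TAPS - 1; k > 0; k--) {
        ec->x[k] = ec->x[k - 1];
    }
    ec->x[0] = tx;

    /* Predict the echo and measure the energy of the tx history. */
    float echo_estimate = 0.0f;
    float energy = 1.0e-6f;          /* small bias avoids divide-by-zero */
    for (size_t k = 0; k < EC_TAPS; k++) {
        echo_estimate += ec->w[k] * ec->x[k];
        energy += ec->x[k] * ec->x[k];
    }

    /* What is left after subtracting the predicted echo. */
    float residual = rx - echo_estimate;

    /* Nudge the taps toward the real echo path (the "training"). */
    float gain = EC_MU * residual / energy;
    for (size_t k = 0; k < EC_TAPS; k++) {
        ec->w[k] += gain * ec->x[k];
    }
    return residual;
}
```

Called once per 8 kHz sample, the residual starts out equal to the raw echo and shrinks as the taps converge, which is the same general shape as the short training period visible in the Audacity capture.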

The spandsp library also provides DDS tone synthesis (including DTMF tone generation) and a Goertzel-based DTMF tone decoder that I press into service as well. In the future I want to use the modem functionality in the library to generate caller ID information.

The overall audio architecture is shown below.

The real-time audio subsystem is contained in the audio_task, which has CPU 1 all to itself. It runs with an 8 kHz sampling rate since that's what spandsp was designed around. In addition to the functionality provided by spandsp, the audio subsystem also has to convert to and from the 16 kHz sampling rate that the Bluetooth Handsfree audio connection can use if both the cellphone and remote device support the mSBC codec.

Both downsampling from 16 kHz to 8 kHz and upsampling from 8 kHz to 16 kHz are slightly more complicated than just halving or doubling the data because of frequency aliasing and non-linear distortion.

Downsampling is very simple and compute efficient. Each pair of input samples is averaged, which acts as a simple low-pass filter, to produce one output sample at half the input rate.
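A minimal sketch of that pair-averaging step, assuming 16-bit signed samples, might look like this. The function name `downsample_16k_to_8k` and its signature are invented for illustration and are not taken from the weeBell firmware.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Average each pair of 16 kHz samples into one 8 kHz sample.
 * Reads 2 * n_out samples from in[] and writes n_out samples to out[].
 */
void downsample_16k_to_8k(const int16_t *in, size_t n_out, int16_t *out)
{
    for (size_t i = 0; i < n_out; i++) {
        /* Sum in 32 bits so two large samples cannot overflow. */
        int32_t sum = (int32_t)in[2 * i] + (int32_t)in[2 * i + 1];
        out[i] = (int16_t)(sum / 2);
    }
}
```

A two-sample average is a very gentle low-pass filter, so some energy above 4 kHz will still alias into the output. That is a reasonable trade for telephone-band audio when cycles matter.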

Upsampling is slightly more complex. It uses a buffer where the incoming stream is stored in every other entry and the entries in between start off as zero. Then a band-pass filter is run over the buffer, and the result is a clean 16 kHz sampled buffer.
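Here is a rough sketch of the zero-stuff-then-filter idea. The filter coefficients used in the firmware aren't given here, so this example substitutes the shortest filter that works: a 3-tap low-pass kernel (0.5, 1, 0.5), which amounts to linear interpolation with a one-sample delay. The names `upsampler_t` and `upsample_8k_to_16k` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Filter history carried between calls so block edges stay continuous. */
typedef struct {
    int16_t z1;   /* zero-stuffed sample at n-1 */
    int16_t z2;   /* zero-stuffed sample at n-2 */
} upsampler_t;

/*
 * Convert n_in samples at 8 kHz into 2 * n_in samples at 16 kHz.
 * Stand-in filter: y[n] = 0.5*x[n] + x[n-1] + 0.5*x[n-2].
 */
void upsample_8k_to_16k(upsampler_t *st, const int16_t *in, size_t n_in,
                        int16_t *out)
{
    /* Step 1: store the input in every other entry, zeros in between. */
    for (size_t i = 0; i < n_in; i++) {
        out[2 * i] = in[i];
        out[2 * i + 1] = 0;
    }

    /* Step 2: run the filter over the buffer in place. */
    for (size_t n = 0; n < 2 * n_in; n++) {
        int16_t x = out[n];
        int32_t acc = (int32_t)x + 2 * (int32_t)st->z1 + (int32_t)st->z2;
        st->z2 = st->z1;
        st->z1 = x;
        out[n] = (int16_t)(acc / 2);
    }
}
```

The filter's DC gain of 2 makes up for the amplitude lost by inserting zeros. A longer kernel would suppress the spectral images above 4 kHz more thoroughly at the cost of more multiply-accumulates per sample.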

One final note. When Steve and David were writing spandsp and OSLEC in the early 2000s, they targeted either a [then] high-end Intel x86 CPU with MMX or SSE2 extensions or a specialized Blackfin DSP with optimized code. They also have straight C code, which presumably was used during development, but the specialized code was what actual production systems used. In 2023 the inexpensive ESP32 is capable of running the straight C code with plenty of spare cycles. What fun! I definitely encourage you to check out their work.
