Audio delay and VOX using ESP32

This simple, cheap project implements an audio delay on an inexpensive ESP32S NodeMCU board. An optional VOX keys a transmitter and feeds it the delayed audio, so that nothing is lost to VOX response time, key-up delays or CTCSS processing delays at endpoint receivers. It was built for a simple one-way cross-band rebroadcaster that uses off-the-shelf portable two-way radios, but it could be used in any circuit that needs an audio delay where high fidelity isn't a requirement.

I've also included code for the delay only (no VOX). That code supports a practical audio delay of up to about 12 seconds at a nominal sampling rate of 10k samples/s, depending on how much heap memory is typically available, and proportionately less delay at higher sampling rates. The adjustment is automatic, although I've capped the maximum sample rate at 40k samples/s.
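The trade-off between delay and sample rate is simple arithmetic. The sketch below is not taken from the project code; it assumes one byte per sample (in which case 12 seconds at 10k samples/s needs about 120,000 bytes), and the names clampRate and maxDelayMs are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: one byte per sample (8-bit audio), so bytes == samples.
constexpr uint32_t kMaxSampleRate = 40000;  // the cap mentioned in the text

// Clamp a requested sample rate to the cap.
constexpr uint32_t clampRate(uint32_t rateHz) {
    return rateHz > kMaxSampleRate ? kMaxSampleRate : rateHz;
}

// Longest delay, in milliseconds, that fits in bufferBytes at rateHz.
// Returns 0 for a zero sample rate instead of dividing by zero.
constexpr uint32_t maxDelayMs(size_t bufferBytes, uint32_t rateHz) {
    return rateHz == 0
        ? 0
        : static_cast<uint32_t>(
              (static_cast<uint64_t>(bufferBytes) * 1000u) / clampRate(rateHz));
}
```

With a hypothetical 120,000-byte buffer, maxDelayMs(120000, 10000) gives 12000 ms and maxDelayMs(120000, 20000) gives 6000 ms, which is the "proportionately less" behaviour described above.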

Why bother?

When I first decided to do this, there didn’t seem to be much on the internet that dealt with the sort of audio delay I needed. Most of what I found was for music echo/reverb effects, with fairly ordinary performance, built around the ubiquitous PT2399 IC, hence this roll-your-own (RYO) approach. The simple ability to add VOX in the same program was a bonus. It is by no means a complex or cutting-edge project, just something useful.

Update: I have since found a kit from Electric Druid that might suit folks who are just after a delay with no VOX. See: https://electricdruid.net/diy-digital-delay/

Why use the ESP32?

Because a friend suggested it; because it's cheap, powerful and has an "adequate" ADC and DAC on board along with a decent amount of memory; and because I could program it using the Arduino IDE instead of having to learn a new language or development process. Yes, it doesn't use the Wi-Fi, Bluetooth or other facilities, so it's the proverbial sledgehammer cracking a nut.

Status

Status: working in basic form

To do: finish boxing up the working prototype for field trials.

General discussion

Performance of the delay is not bad for voice at a sample rate of 20k samples/s. The original test lashup gets by without any anti-aliasing filter on the input, probably because the audio response of the incoming receiver is too poor to carry much in the way of non-voice frequencies, and the input of the transceiver is likewise not too fussy.

The program allows for extension of the delay up to about 10 seconds at progressively lower sample rates.

Time shift VOX mode

In VOX mode, the input audio stream is continuously stored in a ring buffer. The processor keeps a running average of the input level over a configurable window (0.3 seconds in the prototype). If the average exceeds a threshold, the VOX triggers. It raises the transmit signal, so the radio that receives the delayed audio starts transmitting immediately, and it enables the output DAC, which begins sending recorded audio from a defined delay time before the trigger (the time shift).
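The ring buffer at the heart of this can be sketched in a few lines. This is a minimal illustration, not the project code: the class name DelayLine is made up, samples are assumed to be unsigned 8-bit values, and the buffer is assumed to start out filled with mid-scale (128) as "silence".

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-length delay line built on a ring buffer.
class DelayLine {
public:
    // delaySamples = delay time in seconds * sample rate.
    explicit DelayLine(size_t delaySamples)
        : buf_(delaySamples > 0 ? delaySamples : 1, 128), pos_(0) {}

    // Store the newest sample and return the one stored delaySamples calls ago.
    uint8_t process(uint8_t in) {
        uint8_t out = buf_[pos_];          // oldest sample in the buffer
        buf_[pos_] = in;                   // overwrite it with the newest
        pos_ = (pos_ + 1) % buf_.size();   // advance and wrap around
        return out;
    }

private:
    std::vector<uint8_t> buf_;  // assumption: pre-filled with mid-scale silence
    size_t pos_;                // index of the oldest sample
};
```

In a real sketch, process() would be called once per sample period with the ADC reading, and its return value written to the DAC only while the VOX is triggered.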

Once triggered, the VOX/transmit signal stays active for as long as input audio is present. After the input audio stops, it stays active for a further full delay period plus a little extra (the Tail), and only then turns off the transmit signal to the radio.
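One way to picture the hold behaviour is a countdown that is reloaded whenever audio is detected. The sketch below is an assumption about how this could be done, not the project's actual code; VoxHold and its members are hypothetical names, and time is counted in samples.

```cpp
#include <cstdint>

// Keeps the transmitter keyed for (delay + tail) samples after audio stops.
struct VoxHold {
    uint32_t holdSamples;    // delay length plus tail, in samples
    uint32_t remaining = 0;  // samples of hold time still to run

    VoxHold(uint32_t delaySamples, uint32_t tailSamples)
        : holdSamples(delaySamples + tailSamples) {}

    // Call once per sample; aboveThreshold is the level detector's verdict.
    // Returns true while the transmitter should stay keyed.
    bool update(bool aboveThreshold) {
        if (aboveThreshold) {
            remaining = holdSamples;  // audio present: reload the countdown
            return true;
        }
        if (remaining > 0) {
            --remaining;              // audio gone: run down the hold time
            return true;
        }
        return false;                 // hold time expired: unkey
    }
};
```

Holding for the full delay period lets the audio still sitting in the buffer play out before the transmitter drops.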

The idea is to compensate for missing audio due to delays in VOX triggering. By transmitting a bit of silence before the payload audio, you can also compensate for slow key-up of the transmitter and slow triggering of the eventual receiver.

The unit can be used with transceivers that don't have a separate transmit line but do have their own internal VOX.

I have also added the ability to insert a beep at the start of the delayed output audio.

The VOX as implemented incorporates a 30 second Timeout Timer (TOT) to avoid hogging the frequency and to free up the channel if there's a stuck PTT on somebody's radio. There's a 1 minute Rekey delay after the TOT expires.
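The timeout and rekey logic might look something like the sketch below. It is an illustration under stated assumptions rather than the project's code: the class name TimeoutTimer is made up, the rekey delay is assumed to block all transmission until it has elapsed, and nowMs is assumed to come from a monotonic millisecond clock such as Arduino's millis().

```cpp
#include <cstdint>

// Limits a single transmission to totMs, then blocks keying for rekeyMs.
class TimeoutTimer {
public:
    TimeoutTimer(uint32_t totMs, uint32_t rekeyMs)
        : totMs_(totMs), rekeyMs_(rekeyMs) {}

    // voxActive: the VOX wants to transmit. nowMs: current time in ms.
    // Returns true if transmitting is allowed at this moment.
    bool allowTx(bool voxActive, uint32_t nowMs) {
        if (lockedOut_) {
            if (nowMs - lockStartMs_ < rekeyMs_) return false;  // still in rekey delay
            lockedOut_ = false;
        }
        if (!voxActive) {
            keyed_ = false;
            return false;
        }
        if (!keyed_) {             // start of a new transmission
            keyed_ = true;
            keyStartMs_ = nowMs;
        }
        if (nowMs - keyStartMs_ >= totMs_) {  // transmission ran too long
            keyed_ = false;
            lockedOut_ = true;
            lockStartMs_ = nowMs;
            return false;
        }
        return true;
    }

private:
    uint32_t totMs_, rekeyMs_;
    uint32_t keyStartMs_ = 0, lockStartMs_ = 0;
    bool keyed_ = false, lockedOut_ = false;
};
```

With TimeoutTimer t(30000, 60000), a transmission that starts at time 0 is cut off at 30 seconds and cannot restart until 90 seconds.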

The VOX implementation is pretty simpleminded, but it seems to work satisfactorily in a relatively low-noise environment.

Needless to say (but I will say it), this should never be used in situations where a delayed transmission might lead to a safety issue!

Prerequisites

You will need the following software:

Arduino IDE (from Arduino.cc)
ESP32 core board plugin (from Espressif)

And the following hardware:

ESP32S NodeMCU development board (from eBay!)
Various ragtag resistors and capacitors from the junk box

More discussion

Audio quality isn’t bad considering there are only 256 levels of DAC output.

It’s easy to chop out the VOX code and just use it as a delay, and I've included a version of the code with that implemented.

Good results were obtained by simply hooking the input directly to the external speaker output of the receiver and adjusting the volume so that there was limited input clipping.

Ideally, the input signal should be approximately "line level". The maximum input voltage the ADC will read without clipping is ±1.5 V, so the closer you can get to that, the higher the resolution of the encoding. Output levels will be the same, as I've not done any scaling.
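To see why signal level matters for resolution, here is a rough sketch that maps an input voltage onto 8-bit codes. It is purely illustrative: it assumes the input is biased so that the full ±1.5 V swing maps linearly onto 256 codes, which idealises the real ADC, and the function names are made up.

```cpp
#include <cmath>
#include <cstdint>

// Assumption: -1.5 V..+1.5 V maps linearly onto codes 0..255.
constexpr double kFullScaleVolts = 1.5;

// Convert a voltage to an 8-bit code, clipping outside the full-scale range.
uint8_t voltsToCode(double volts) {
    double scaled = (volts + kFullScaleVolts) / (2.0 * kFullScaleVolts) * 255.0;
    if (scaled < 0.0) scaled = 0.0;      // clip negative overload
    if (scaled > 255.0) scaled = 255.0;  // clip positive overload
    return static_cast<uint8_t>(std::lround(scaled));
}

// Number of distinct codes spanned by a signal of the given peak voltage.
int codesUsed(double peakVolts) {
    return voltsToCode(peakVolts) - voltsToCode(-peakVolts) + 1;
}
```

Under these assumptions a full ±1.5 V signal uses all 256 codes, while a ±0.15 V signal spans only 26 of them, so a weak input is encoded much more coarsely.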

Of course, an appropriate attenuator or matching circuit will be required to match the output to the device that processes the delayed audio. A simple audio amp suffices for testing, though.

The VOX code simply does a software-based full-wave rectification and running average of the input, and triggers when the average over 1000 consecutive samples exceeds a threshold. You may have to play around with the resistive divider on the input and the VOX threshold value in the program to get the desired result. (Substituting a high-value linear pot for the divider may be a good strategy.) I have included a calibration routine that displays the input levels in real time on the serial port and lights an LED to show VOX activation, so you can determine the best level to use. To use it, tie pin 5 to ground before resetting the unit (don't forget to disconnect it later!).
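The rectify-and-average step can be sketched as follows. This is a minimal illustration, not the project's code: LevelDetector is a hypothetical name, samples are assumed to be unsigned 8-bit values centred on 128, and the running average is implemented here as a plain moving average over the last `window` samples.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Full-wave rectifier plus moving average, compared against a threshold.
class LevelDetector {
public:
    LevelDetector(size_t window, uint32_t threshold)
        : hist_(window > 0 ? window : 1, 0), threshold_(threshold) {}

    // Feed one sample; returns true when the average level exceeds the threshold.
    bool update(uint8_t sample) {
        // Full-wave rectify: distance from the assumed mid-scale bias of 128.
        uint8_t rect = static_cast<uint8_t>(std::abs(static_cast<int>(sample) - 128));
        sum_ += rect;           // add the newest rectified value
        sum_ -= hist_[pos_];    // drop the oldest one from the sum
        hist_[pos_] = rect;
        pos_ = (pos_ + 1) % hist_.size();
        return average() > threshold_;
    }

    // Current average rectified level (0..128).
    uint32_t average() const {
        return static_cast<uint32_t>(sum_ / hist_.size());
    }

private:
    std::vector<uint8_t> hist_;  // last `window` rectified samples
    uint32_t threshold_;
    uint32_t sum_ = 0;           // running sum of hist_
    size_t pos_ = 0;             // index of the oldest entry
};
```

For the figures in the text, this would be constructed with a window of 1000 and whatever threshold the calibration routine suggests. Printing average() is one way to get the kind of real-time level display described above.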

Sampling rate and delay can be adjusted, but the size of the audio buffer is the limiting factor on how much delay you can implement.

I have included a crudely recorded sample of the resulting audio after passing through an input transmitter, a receiver, the delay, a transmitter and another receiver. Altogether, intelligibility is not too bad, considering.

Project Details