Encoding WSPRs

A project log for Careless WSPR

A desultorily executed Weak Signal Propagation Reporter beacon.

ziggurat29 08/19/2019 at 17:46

Summary

I produced a utility module that encodes the data into the WSPR format for transmission.

Deets

The WSPR system is for reporting weak signals (hence the name), which implies a low signal-to-noise ratio.  That also implies a lot of distortion that can (and will) render a signal hopelessly lost in that noise.  The scheme Joe Taylor (K1JT) and Steve Franke (K9AN) used here in WSPR is 'Forward Error Correction' (FEC).  And if there's such a thing as 'forward' error correction, there ought to also be a 'backwards' error correction -- and there is.  Backwards (or 'reverse') error correction is very easy to understand:  you detect errors and ask the sender to send things again.  Forward error correction is much more involved, but it avoids that back communications channel.  There are many applications where you simply can't have a reverse channel, and that's one place where FEC shines.  It was particularly popular in deep-space probes, but now it's used all over the place.  The gist is that you add redundancy in a careful way such that you can not only detect errors, but also deduce, with a maximized degree of certainty, which bits are erroneous and change them back to what they should have been.  That's the 'forward' part:  the extra stuff is sent forward along with the message, and the receiver can figure it out for itself without having to ask for a retransmission.

There are many different schemes of FEC, and although 'convolutional coding' is used in WSPR, many of the other modes in Joe Taylor's family of protocols use other schemes.  Joe Taylor himself has mentioned on several occasions that the evolution of these protocols was partially motivated by his fascination with communications technology and the desire to familiarize himself with the various states of the art.  It's not clear to me whether the choice to use convolutional coding here was motivated by its technical merits relative to other choices, or whether this was more the way the wind was blowing at the time of inception.  (He had previously used Reed-Solomon in JT65, and later used LDPC in FT4.)

While researching, I came across a document that described the mechanics of the WSPR encoding process in plainer English than the original source:

G4JNT, “Non-normative specification of WSPR protocol”,
http://www.g4jnt.com/Coding/WSPR_Coding_Process.pdf

I won't repeat it except as a high-level summary with my own commentary.  The steps are:

  1. condition the data
      The conditioning step does sanity checking and some cleanup of the data prior to encoding.  The callsign has to be placed correctly in its buffer (the third character must be a digit, padding with leading spaces as needed to ensure this), the Maidenhead locator has some restrictions (e.g. the first two characters must be upper case 'A' - 'R', and the other two must be digits), and the power level (in dBm) has to end in the digit 0, 3, or 7.  These digits correspond to 1x, 2x, and 5x multiples of a power of ten, since 0 dB, 3 dB, and 7 dB are approximately ratios of 1, 2, and 5.  (A sketch of the callsign padding appears after this list.)
  2. pack the data
      Some simple arithmetic encoding is done to use as few bits as possible for each datum.  This can be viewed as a manual form of data compression.  The conditioned callsign gets encoded into 28 bits, the locator gets encoded into 15 bits, and the power is encoded into 7.  (Power is a bit special in that the high bit is always set, and I don't know why this is -- otherwise it could have been encoded into 6 bits.)  This results in 50 bits of message data.  Then 31 zero bits are appended to pad out to 81 bits.  (I don't really know why the zero-padding is needed; my guess is that it is to flush the tail end of the system's convolution.  There are implicitly 31 zeros in the front as well, but no padding is required to realize their effect.)  A sketch of the packing arithmetic appears after this list.
  3. transform the data using a convolutional code
      The data stream is fed into a convolutional encoder.  This uses well-known polynomials discussed in:
      James W. Layland and Warren A. Lushbaugh, "A Flexible High-Speed Sequential Decoder for Deep Space Channels", IEEE Transactions on Communications, vol. 19, 1971, doi:10.1109/TCOM.1971.1090732.
      I could not get a copy of this document for review.  The implementation here is called a 'rate 1/2, constraint length 32, non-recursive, non-systematic code'.  This is jargon meaning:  the output data will be twice as long as the input data; the polynomials have 32 taps (which is considered long); the output does not feed back into the input; and the output consists only of the generated bits -- none of the original message data is directly included in the output.
      This convolution operation is much like that of an FIR filter:  for each input data point, you multiply the data points sequentially against the taps and add the results together.  However, here the operation is effectively much simpler because the data points are binary digits (just 0 or 1), and moreover we are doing all operations modulo 2.  This turns the multiplication into a bitwise AND of the polynomial and the data stream (and because it's 'constraint length 32', this just means ANDing a 32-bit quantity, so the multiplication step is effectively a vector operation), and the addition of those multiplied bits is simply the XOR of them all.  (I further simplified this accumulate step by using a parity lookup table, but doing it manually in a loop will work, too.)  If all this seems superficially similar to computing a CRC, or generating pseudo-random numbers, that is because the underlying mathematics is related.
      Because it is a 'rate 1/2' code, we do that convolution of the input data twice in parallel with two different 'polynomials' (effectively just 32-bit constants, in this case 0xF2D05351 and 0xE4613C47).  The one-bit outputs of those two convolutions form the two bits that are emitted for each input bit.  This expands the 81 bits to 162 bits.  (A sketch of the encoder appears after this list.)
      Although it is not relevant for this project:  because the constraint length is long, the decoder uses the sequential Fano algorithm instead of the more common Viterbi algorithm.  Fano has slightly lower performance, but scales well to longer constraint lengths, whereas Viterbi is apparently only practical up to constraint lengths of around 7.
  4. scramble the data
      Interference often comes in bursts, which would disrupt a contiguous sequence of transmitted bits.  That doesn't play to the strengths of the convolutional coding, so to help with this the order of the data is scrambled.  This has the effect of converting the disruption of physically contiguous bits into a disruption of non-contiguous bits in the coded stream, which improves the ability of the decoder to cope.  In this case, the scrambling is done via 'bit-reversed addressing', i.e. the conventional addresses 0-161 are numerically bit-reversed to determine the new location of each datum in the modified sequence.  (This is sketched after the list as well.)
      I don't really know why bit-reversed addressing was used in particular, but I happened to have a bit-reverse lookup table on hand (I had used it previously for 180° font rotation), so I was able to avoid looping to compute the bit-reversed values.  Ultimately, there was plenty of computational time for all this since the WSPR data stream is sent very slowly, so I doubt my table approach was required.
  5. merge with synchronization bits
      A fixed array of 162 bits is used as a synchronization vector.  Notably, it is interleaved with the data stream, rather than, say, prepended to it.  It's not clear to me why this was done (maybe it was also to spread interference a little).  The synchronization vector is a 'random' sequence 'with good auto-correlation properties'.  What this means is that the correlation will be high when the tested data matches the known sequence at offset 0, and will be as low as practical everywhere else.  The correlation is computed in a way quite similar to the convolution -- the arithmetic is the same; it's the origin of the data that differs (in the case of convolution, the data is the time-reversed impulse response of the system, and in the case of correlation it is the sequence being searched for).  (The merge itself is sketched after this list.)
      How to use this data is something the decoder has to work out, but I suspect that the correlation is computed first, to detect the likely presence of a WSPR message and its temporal location.
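
To make the above concrete, what follows are some minimal sketches in C of each step.  These are illustrative rather than lifted from the project source; all function and variable names are my own.  First, the callsign padding from step 1, which slides the callsign right so that its digit lands in the third position, padding with spaces to six characters (fuller validation omitted):

    /* pad the callsign with spaces so its embedded digit lands in
       position 3 (index 2) and the buffer is exactly 6 characters */
    void conditionCallsign ( const char* pszIn, char achOut[6] )
    {
        int nOff = ( pszIn[2] >= '0' && pszIn[2] <= '9' ) ? 0 : 1;
        for ( int nIdx = 0; nIdx < 6; ++nIdx )
            achOut[nIdx] = ' ';
        for ( int nIdx = 0; pszIn[nIdx] && nIdx + nOff < 6; ++nIdx )
            achOut[nIdx + nOff] = pszIn[nIdx];
    }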
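
Next, the bit-packing arithmetic of step 2, per the formulas in the G4JNT document.  Characters are first mapped to codes ('0'-'9' to 0-9, 'A'-'Z' to 10-35, space to 36); the callsign then packs into 28 bits, and the locator plus power into 22 (15 + 7):

    #include <stdint.h>

    /* map '0'-'9' -> 0-9, 'A'-'Z' -> 10-35, ' ' -> 36 */
    static uint32_t charCode ( char c )
    {
        if ( c >= '0' && c <= '9' ) return c - '0';
        if ( c >= 'A' && c <= 'Z' ) return c - 'A' + 10;
        return 36;  /* space */
    }

    /* pack conditioned callsign, locator, and power into two words */
    void packMessage ( const char achCall[6], const char achLoc[4],
            int nDbm, uint32_t* pnN, uint32_t* pnM )
    {
        uint32_t n = charCode ( achCall[0] );
        n = n * 36 + charCode ( achCall[1] );
        n = n * 10 + charCode ( achCall[2] );       /* always a digit */
        n = n * 27 + charCode ( achCall[3] ) - 10;  /* letter or space */
        n = n * 27 + charCode ( achCall[4] ) - 10;
        n = n * 27 + charCode ( achCall[5] ) - 10;

        uint32_t m = ( 179 - 10 * ( achLoc[0] - 'A' ) - ( achLoc[2] - '0' ) ) * 180
                + 10 * ( achLoc[1] - 'A' ) + ( achLoc[3] - '0' );
        m = m * 128 + nDbm + 64;  /* the curious always-set high bit */

        *pnN = n;  /* 28 significant bits */
        *pnM = m;  /* 22 significant bits */
    }

The 50 significant bits of n and m are then strung together MSB-first and zero-padded out to the 81 bits fed to the encoder.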
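
The convolutional encoder of step 3 then amounts to a shift register, two ANDs, and two parity computations per input bit.  Here parity is computed by bit-folding; a 256-entry lookup table (as I actually used) is a drop-in replacement:

    #include <stdint.h>

    #define POLY1 0xF2D05351u
    #define POLY2 0xE4613C47u

    /* XOR together all the bits of x, i.e. compute its parity */
    static uint8_t parity32 ( uint32_t x )
    {
        x ^= x >> 16;  x ^= x >> 8;  x ^= x >> 4;  x ^= x >> 2;  x ^= x >> 1;
        return (uint8_t) ( x & 1 );
    }

    /* rate 1/2:  each of the 81 input bits yields two output bits */
    void convolve ( const uint8_t abyIn[81], uint8_t abyOut[162] )
    {
        uint32_t nReg = 0;  /* 32-bit shift register, initially zero */
        for ( int nIdx = 0; nIdx < 81; ++nIdx )
        {
            nReg = ( nReg << 1 ) | ( abyIn[nIdx] & 1 );
            /* 'multiply' = AND with the polynomial; 'add' = XOR = parity */
            abyOut[2*nIdx]   = parity32 ( nReg & POLY1 );
            abyOut[2*nIdx+1] = parity32 ( nReg & POLY2 );
        }
    }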
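
The step 4 interleaver scatters the 162 coded bits to bit-reversed addresses, skipping any reversed address that falls beyond 161:

    #include <stdint.h>

    /* reverse the 8 bits of a byte (I used a 256-entry table instead) */
    static uint8_t bitrev8 ( uint8_t by )
    {
        by = (uint8_t) ( ( by & 0xF0 ) >> 4 | ( by & 0x0F ) << 4 );
        by = (uint8_t) ( ( by & 0xCC ) >> 2 | ( by & 0x33 ) << 2 );
        by = (uint8_t) ( ( by & 0xAA ) >> 1 | ( by & 0x55 ) << 1 );
        return by;
    }

    /* scatter the coded bits to their bit-reversed addresses */
    void interleave ( const uint8_t abySrc[162], uint8_t abyDst[162] )
    {
        int nSrc = 0;
        for ( int nAddr = 0; nAddr < 256 && nSrc < 162; ++nAddr )
        {
            uint8_t byRev = bitrev8 ( (uint8_t) nAddr );
            if ( byRev < 162 )  /* skip addresses outside the message */
                abyDst[byRev] = abySrc[nSrc++];
        }
    }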
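
Finally, the step 5 merge places the sync bit in the low position and the data bit above it, yielding 162 4-FSK tone numbers (0-3).  The 162-element sync vector itself is a fixed constant given in the spec, which I won't reproduce here:

    #include <stdint.h>

    /* the fixed 162-bit pseudo-random sync vector from the WSPR spec */
    extern const uint8_t abySync[162];

    /* combine data and sync bits into 4-FSK tone numbers 0..3 */
    void mergeSync ( const uint8_t abyData[162], uint8_t abyTones[162] )
    {
        for ( int nIdx = 0; nIdx < 162; ++nIdx )
            abyTones[nIdx] = (uint8_t) ( abySync[nIdx] + 2 * abyData[nIdx] );
    }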

This was implemented.  The WSPR software distribution includes a program, WSPR.exe, which can be used to compute the sequence.  I used it to produce some test vectors to validate this implementation.

Next

Wiring in the encoder
