
Software - Round 394

A project log for HydraMETER

Open-source multimeter with some unique features

John Duffy - 01/22/2024 at 07:52 - 0 Comments

Another month, another kajillion hours in front of the computer working on the software.

Fixes/changes this time:

-Multicore!  The RP2040 has two cores; this now uses both.  One is dedicated to sampling and per-sample calculations (calibration, filtering), the other handles low-speed IO, ranging, mode switching, and printing to the screen.  Huge pain to keep variables synced between cores, but it works now.

-Stable sampling frequency.  Previously this would one-shot ADC samples as needed; now the ADC samples at regular intervals and the Pico reads them as they're available.  This is really the big one - it opens up a ton of new options.  It did require going multi-core so the serial & USB printing doesn't make us wait too long and miss a sample.  With this I can now do...

-AC measurement! Finally got around to that.  Needs a high, stable sample rate to get accurate readings.  Supports true RMS, over a pretty significant number of readings.  There's a little hack for this I was kind of proud of, described below as well.

-Also, filtering! IIR filters for 60 Hz and harmonics are added in and work.  I do want some more refinement on switching them in and out and maybe even adjusting strength, Q, etc., but for now I can notch 60 Hz, which is the main thing.  (A sketch of the kind of notch I mean is just below this list.)
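
For the curious, the 60 Hz notch is a standard second-order IIR (biquad) notch.  This is a from-memory sketch using the usual audio-EQ-cookbook coefficients, not the exact filter or Q in the firmware; one of these per harmonic (120 Hz, 180 Hz, ...) can be chained in series:

#include <math.h>

typedef struct {
    float b0, b1, b2, a1, a2;     // normalized coefficients (a0 == 1)
    float x1, x2, y1, y2;         // previous inputs/outputs
} BiquadNotch;

void notchInit(BiquadNotch *f, float sampleRate, float notchFreq, float Q) {
    float w0 = 2.0f * 3.14159265f * notchFreq / sampleRate;
    float alpha = sinf(w0) / (2.0f * Q);
    float a0 = 1.0f + alpha;
    f->b0 = 1.0f / a0;
    f->b1 = -2.0f * cosf(w0) / a0;
    f->b2 = 1.0f / a0;
    f->a1 = -2.0f * cosf(w0) / a0;
    f->a2 = (1.0f - alpha) / a0;
    f->x1 = f->x2 = f->y1 = f->y2 = 0.0f;
}

float notchProcess(BiquadNotch *f, float x) {   // run once per ADC sample
    float y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
            - f->a1 * f->y1 - f->a2 * f->y2;
    f->x2 = f->x1;  f->x1 = x;
    f->y2 = f->y1;  f->y1 = y;
    return y;
}

// e.g. notchInit(&notch60, 1000.0f, 60.0f, 2.0f); then feed every sample through notchProcess().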

Might not sound like a ton, but this required redoing pretty much everything that hadn't already been redone last month.  It's more of a lasagna-with-one-layer-of-spaghetti of code now.  Still need to clean up the file structure a bit though. 


Few specifics:

1) This ADC I'm using does not have a very stable oscillator - the spec is only "somewhere between 3 and 6.6 MHz".  This sets the sample frequency, so I'm now using one of the Pico's GPIOs (driven, of course, by a PIO) as the reference clock.  First time I've written anything for PIO! Very exciting, very cool peripherals.  I did plan ahead for this, though more for synchronizing it with something else if that became necessary/interesting - I hadn't noticed that spec.  This is really useful though, because now I can set the clock to anything, on the fly.  That lets us change up sampling for high speed, short sampling instant, accuracy, low noise, etc.
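
To give a flavor of the PIO bit (a from-memory sketch against the pico-sdk, not the program in the firmware - the pin, frequency, and function names are placeholders): a two-instruction program that just toggles a pin forever gives a square wave at half the state machine clock, and the state machine's clock divider is what lets the output frequency be changed on the fly.

#include "hardware/pio.h"
#include "hardware/clocks.h"

// Start a square-wave reference clock for the ADC on 'pin' at roughly freq_hz.
void startAdcClock(PIO pio, uint sm, uint pin, float freq_hz) {
    // Two instructions: drive the pin high, then low, and wrap back to the start.
    uint16_t instrs[2];
    instrs[0] = (uint16_t) pio_encode_set(pio_pins, 1);
    instrs[1] = (uint16_t) pio_encode_set(pio_pins, 0);
    struct pio_program prog = { instrs, 2, -1 };   // instructions, length, origin (-1 = load anywhere)
    uint offset = pio_add_program(pio, &prog);

    pio_gpio_init(pio, pin);
    pio_sm_set_consecutive_pindirs(pio, sm, pin, 1, true);

    pio_sm_config c = pio_get_default_sm_config();
    sm_config_set_set_pins(&c, pin, 1);
    sm_config_set_wrap(&c, offset, offset + 1);
    // One full output period = 2 instructions, so run the state machine at 2 * freq_hz.
    sm_config_set_clkdiv(&c, (float) clock_get_hz(clk_sys) / (2.0f * freq_hz));

    pio_sm_init(pio, sm, offset, &c);
    pio_sm_set_enabled(pio, sm, true);
}

// e.g. startAdcClock(pio0, 0, 2, 4000000.0f);  // ~4 MHz, inside the ADC's 3-6.6 MHz spec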

2) The way I'm doing RMS conversion is nontrivial.  There didn't really seem to be a good way to do an RMS sliding average that works down to tens of Hz without being dreadfully sluggish, so I went with a full sliding window.  Problem is, I've got to sample at ~1 kHz or more, and I want to integrate for ~1 second so low-frequency waveforms are reasonably accurate even if I'm not at an exact multiple of the fundamental.  Problem now is, storing and re-adding up 1000+ floating point numbers would take a decent chunk of space and time on the Pico - it could probably be done, but there's a better way.


First, we pick a tradeoff of output rate vs storage space vs integration time.  The two control parameters are the "chunk" size and the list length (I'll explain those in a second).

Memory taken up ~= list length * 2 * reading size (so for a list length of 64, and 32 bit floats, that's 64 * 2 * 4 = 512 bytes).  List length must be a power of 2.

Output rate = ADC sample rate / chunk size - this is how often we get a new Vrms reading.  If we didn't care about the output rate, we could do huge chunks and save memory; I used a chunk size of 16.

Integration time = chunk size * list length / ADC sample rate - This is the real kicker - since we need the integration time to be on the order of a second, and the ADC sample rate is >1 kHz, chunk size * list length has to be over 1000 as well, so at least one of them has to be sizeable, which means either slow updates OR a lot of memory taken up.  (Worked out with my actual numbers just below.)
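
To make that concrete (a sketch with my own names; the exact ADC rate depends on the mode, ~1 kHz is just for the arithmetic here):

const uint32_t adcSampleRate = 1000;  // Hz - "~1 kHz or more", assumed for illustration
const uint32_t chunkSize     = 16;    // squared samples averaged into one chunk
const uint32_t listLength    = 64;    // chunk averages kept in the window; must be a power of 2

// memory            ~= listLength * 2 arrays * 4 bytes         = 64 * 2 * 4  = 512 bytes
// Vrms output rate   = adcSampleRate / chunkSize               = 1000 / 16   = 62.5 readings/s
// integration time   = chunkSize * listLength / adcSampleRate  = 1024 / 1000 ≈ 1.0 s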

On startup we create two arrays of floats, the list, and the sum tree, both empty to start.  Also an index.

list will just be a circular buffer; sum tree will be the clever-ish thing I came up with.
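
In memory that's nothing fancier than this (sizes from the constants above; as file-scope statics they start out zeroed, which covers the "empty to start" part - the index lives as a static inside the update function shown further down):

static float list[listLength];   // circular buffer of chunk averages
static float tree[listLength];   // the sum tree - only listLength - 1 entries are actually used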

The "chunk" I keep referring to is basically just the average of <chunk size> readings in a row (really, the square of the reading minus the DC bias, since it's RMS).  So start a variable at 0 and just keep adding to a running total of readings (keep in mind, we're dealing with these one-at-a-time, not processing a list after the fact).  When we get to <chunk size> readings on the pile, divide by <chunk size> to get the average.  We then write this average to list - so  list [ index ] = chunk average.

If the list is short, maybe up to 16 or so, then we can just sum up list and be done (rather: sum, divide by list length, and take the square root - that would give the RMS reading).  As it stands in the meter, list length is only 64, so this is *probably* fine to do.
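
In other words, the short-list version is just (again my naming, not the firmware's):

float windowSumSimple() {
    float sum = 0;
    for (uint32_t k = 0; k < listLength; k++)
        sum += list[k];            // each entry is already a mean square
    return sum;                    // caller does sqrtf(sum / listLength), as above
}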


For longer lists, or to be extra fancy, we can use a different method involving that sum tree mentioned earlier.  Since we only replaced a single element in the list, really not much has changed (say we replaced element 50 in a list of size 64 - the sum of the first 50 elements hasn't changed at all!).  Now there are a LOT of little tricks we could play here, but a common one is to remember the previous sum, subtract off whatever element we're replacing in the list (before doing so), then add the new one.  This is great and only takes a single subtract and a single add!  HOWEVER, this is **floating point**, not theoretical math.  Since this never recomputes the total sum at any point, tiny errors will accumulate over time - eventually it becomes sort of unstable and diverges from the true reading, especially since we're mixing large and small numbers.

A better way is to use a tree!  The first layer of the tree has the sum of every pair in the main list, so tree[0] = list[0] + list[1], tree[1] = list[2] + list[3], and so on.  The second layer is sums of pairs of elements of the first layer, so tree[0] + tree[1], and so on.  Each layer is half the size of the last, until you get to a single element, which is the total sum!

(Note this is useful even by itself, and in other contexts!  Summing a huge list of floating point numbers can cause problems - adding a small number in the list to a large running total can introduce significant error.  For example, in a 32-bit float, 100,000,000 + 1 = 100,000,000!  If you had a list of 200 million 1's and took their sum by indexing and accumulating, you'd get less than 100,000,000 - the running total actually gets stuck at 16,777,216, which is 2^24, the point where adding 1.0 no longer changes a 32-bit float at all.  Doing it the tree way, sort of level-by-level, you're adding numbers of roughly the same size every time and will get an accurate result.)
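
If you want to see that for yourself, here's a little standalone test (desktop code, nothing to do with the meter - the pairwise sum takes a few seconds since it makes one call per node):

#include <stdio.h>

float pairwiseOnes(long n) {                // sum of n copies of 1.0f, added level-by-level like the tree
    if (n == 1) return 1.0f;
    long half = n / 2;
    return pairwiseOnes(half) + pairwiseOnes(n - half);
}

int main() {
    printf("%.1f\n", 100000000.0f + 1.0f);  // prints 100000000.0 - the 1 simply vanishes

    float naive = 0.0f;
    for (long i = 0; i < 200000000; i++)
        naive += 1.0f;                      // big accumulator + tiny addend
    printf("naive:    %.1f\n", naive);      // 16777216.0 - nowhere near 200 million

    printf("pairwise: %.1f\n", pairwiseOnes(200000000));  // 200000000.0
    return 0;
}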

Tangent aside, we keep this tree in memory, and every time we add something new we only recompute the chain of sums above that element - so if we replace list[3], we recompute tree[1], then the next layer, and so on until we get to the total.  This only takes log2(list length) operations!  So for a list length of 1024, we can add up all 1024 items (again, assuming we only replaced one) in just 10 operations.

I'm using a list length of 64, so it's only 6 - plenty fast for me. 

// Push a new chunk average into the circular buffer and update the sum tree.
// Returns the sum of the whole window (the root of the tree).
// (Wrapped into a function here; list[], tree[], and listLength are the globals declared above.)
float updateWindowSum(float newReading) {
	static uint32_t index = 0;
	index = (index + 1) % listLength;
	list[ index ] = newReading;

	uint32_t i = (index & ~1u) / 2; //starting branch of the tree
	tree[ i ] = list[ index & ~1u ] + list[ index | 1 ]; //re-sum the (even, odd) leaf pair containing index
	i = i & ~1u; //easier to not mess with even/odd - i is always even from here out

	uint32_t next = i;
	const uint32_t upperBit = listLength / 2; // 0b1000... where the # of 0's is log2(listLength)-1
	while( i != ( listLength - 2 ) ) { //stop once we've recomputed the top of the tree
		next = (i >> 1) | upperBit; //index of the sum that's one layer up from i
		tree[ next ] = tree[ i ] + tree[ i+1 ];
		i = next & ~1u;
	}
	return tree[ listLength - 2 ]; //the root: sum of the whole window
}

Rather than a complex data structure in code, tree is just another array of the same length as list.  If you mess with it a little, it's apparent that the total number of nodes in a binary tree is 1 smaller than twice the number of leaves.  Since the leaves here are pairs of list entries, we only need half as many leaves as list has elements, so the total size is the same (minus one).  The structure of elements within the "tree" list is basically just that it wraps around to the next level, so if there are 8 elements in list, tree[0] = list[0] + list[1] ... tree[3] = list[6] + list[7], then tree[4] = tree[0] + tree[1], tree[5] = tree[2] + tree[3], and tree[6] = tree[4] + tree[5] (the final/total sum).  Doing it this way means that some really simple bitmath ( i >> 1 | upperBit ) gives us the next node to solve for "up" from i, which makes solving this really fast.
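
As a quick sanity check with that 8-element example (so upperBit = 4): replacing list[3] touches exactly three nodes on the way up -

// replace list[3]:
//   tree[1] = list[2] + list[3]   // the leaf pair containing index 3
//   tree[4] = tree[0] + tree[1]   // parent: (0 >> 1) | 4 = 4
//   tree[6] = tree[4] + tree[5]   // parent: (4 >> 1) | 4 = 6 - the total sum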

Dang that was long.  Anyway that's it for now.
