Close

Hash enhancements

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

yann-guidon-ygdesYann Guidon / YGDES 09/25/2023 at 02:430 Comments

The modulus m=741454 is suited for a scrambler or a checksum with 39 bits of entropy. This is great because its structure (a square root of an odd power of two) greatly increases the avalanche, unlike w16 or even w26. This is a better hash with the internal scrambling, but there is still a risk when input data (such as short character strings or packet headers ?) have low entropy. This effect killed Adler and Fletcher's algos in the IETF standards. 

741454d = 1011.0101.0000.0100.1110b has 8 bits set among the 20, which is not perfect but it's still much better than w16. In the application that I focus on, I want en even better avalanche, which only occurs when the accumulators exceed m. So not only do we need to mingle with the intermediary results more, but also more often by making accumulators overflow a lot.

The context of interest is hashing pairs of bytes (16 bits) at once, like w16, but the 4 extra bits of the modulus create a sort of "buffer" that greatly delay the event of an avalanche : at best 11 cycles when the input data is FFFFh. So we must force the rollover at least once every two cycles.

There are three simple ways to achieve this by adjusting the 16 bits of input :

741454d = B504Eh  

Adding a constant to the input word great benefits :

So the offset must be chosen close to m, but not too close. The hard limit is 741454 - 65536 = 675918 to prevent an overflow.

Just as in "pulse density modulation", this value will determine the quantity of rollover, hence the avalanche, it shouldn't be too high though or else it would become meaningless. 675918/2=337959 would be a first approximation but its bit pattern is too close to the modulus (rember : A+A=2A and PEAC does a LOT of additions) so we must reconstruct a custom bit pattern. It must be quite arbitrary and dense to favor avalanche and reduce interferences. It should also help spread bits from ASCII bytes.

The leader nibble would be 0111 = 7 to promote overflow, but not too much (the bias is at about 62%).

The next nibbles need to take into account the spread of ASCII values : the 5 or 6 LSB have quite a high entropy but the 2 MSB have low entropy. Low entropy means longer strings of 1s in the offset's pattern, while low entropy calls for short sequences of 01. So for each character, there is 11010101.

The final pattern is : 0111.11010101.11010101 = 7D5D5h wich can avalanche from the lower character to the higher one. Nice. Even small values will have a chance to affect higher bits faster.

OK this adds another operation with a magic constant to the mix but this also helps avalanche arbitrary data much faster and reduce the number of post-scan tumbling, which would add too much overhead for small strings. Also, let's not forget that the last operation that constructs the 39-bit hash is a multiplication, which adds its own overhead.

With the duo m=741454 o=513493 we might have a pretty serious and fast mixer...

Discussions