Close

4-LRU

A project log for LRU

Scratching the itch of Least-Recently Used algorithms and topologies. I found something funny.

yann-guidon-ygdesYann Guidon / YGDES 04/28/2024 at 03:160 Comments

The permutation deamon is biting again, while I investigate a related caching subject. I am now considering the full LRU for 4 ways, using 5 bits arranged in 2-2-1 fields.

Field A : 2 bits, it is a fully encoded (4 values from 00 to 11)  that directly points to the least recently used way. It is handy because the value is directly available.

Field B : The middle 2-bit field has only 3 values, and 00 is forbidden (marks a cleared set). so the number of the 2nd-to-least recently used is a XOR with field A. The XOR is not required, after all, but not too heavy on the circuit anyway.

Field C: 1 bit, selects which one of the 2 remaining sets comes first. The logic gets complex here, but now I wonder if that part matters at all. Maybe this field can be simply skipped and removed, because it does not affect the LRU much.

In fact what matters is the least recently used, not the most recently used. So the first 2 can be skipped.

All one has to do is compare a new set with the fields A and B, and if there is no match, no update.

If there is a match then the matched field is replaced with the one on the left : this acts like a sort of shift register. And given the fields A and B, some boolean logic is enough to restore one of the missing set numbers.

This greatly reduces the size of the LRU logic, probably as small as the IBM 3tree system but slightly better LRU behaviour. And there is no large lookup table as suggested in the project intro.

***************************************

What's crazy is : while writing it, it all made sense and it was totally unexpected.

Here is some pseudocode:

A : 2 bits \ from LRU
B : 2 bits /
B' : 2 bits, is actual B.
S : 2 bits (new set to update)
A2, B2 : updated LRU fields.

B' = A xor B;

If S != A and S != B' : Do nothing.
Else
   if S == A then
     A2 <= B
     B2 <= newB(A, B', S)
   else
     if S == B' then
       -- A is not modified
       B2 <= newB(A, B', S)

 So that's actually a few XOR and MUX.

The secret sauce is in the function newB that creates the new value of B, which is derived from A, B' and S. This is used in both branches of the IF so we implicitely know that either S==A or S==B', as well as A!=B. A first approximation uses B, because it's already the difference which is never 0 ! (which could add a bias...)

Here is the updated pseudo-VHDL code:

B' := A xor B;
newB := B;

A2 <= B'          when  S == A               else A;
B2 <= newB xor A2 when (S == A) or (S == B') else B;

update if (S == A) or (S == B')

The correct formula of B2 is not yet defined but you get the idea : it's quite simple. No weird LUT to precompute.

newB can have two sets of values:

There is the risk that choosing only one field all the time (always A or always B) creates a bias that will disturb and affect the cache behaviour. This can be solved by using a 5th bit (resurrect field C) but that might be overkill so a random source is used.

Some boolean simplifications could propagate through the circuit. At this point we do not really care that field B has only values 01, 10 or 11 so we spare some gates at the output. However B' =A^B is still required by newB.

.

Fun fact : this method can choose between LRU and MRU on the fly (for example if a stream is detected) by forcing a "hit" on field A (field B can have a match but will do the same as with A).

Discussions