Close

Delta coding and some stats

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

yann-guidon-ygdesYann Guidon / YGDES 05/16/2023 at 03:420 Comments

With the new harvest, I estimate that the perfect and maximal orbits under m=1M is around 100K.

This starts to make quite a large .h file to include in the sieve generator. Not that it's such a concern but scalability, you know.... And embedding such a huge array in the constants area of a program is not such a great idea, particularly if it is meant to evolve fast. However, playing with .h files is good because they are easy to process with UNIX scripts in bash, and this saves the pain of writing C code to import the text file, with error management etc.

But the list of orbits might get even larger in the future and I believe it's a missed opportunity to make things a bit better so, after staring at the numbers, I came to the conclusion that differential coding, or "delta coding", is an easy and interesting thing to do. So I now only encode the difference between consecutive orbits.

Using partial/incomplete data, I came up with this quick and dirty script to produce the following histogram :

S=0
sed   's/ -//'   gPEAC.000002-$MAX.txt | while read -r line ; do
  echo "$(( line - S ))"
  S="$line"
done > gPEAC.000002-$MAX.diff.txt

sort -g gPEAC.000002-907575.diff.txt |uniq -c |less
   6202 1
   5127 2
   5055 3
   6549 4
   5979 5
   4706 6
   3787 7
   3292 8
   3660 9
   3889 10
   3380 11
   2289 12
   1967 13
   2017 14
   2699 15
   1706 16
   1314 17
   1508 18
   1341 19
   1505 20
   1005 21
   1002 22
    747 23
    845 24
    942 25
    702 26
    441 27
    567 28
    613 29
    541 30
    390 31
    351 32
    370 33
    296 34
    335 35
    283 36
    232 37
    196 38
    173 39
    265 40
    157 41
    110 42
    112 43
    157 44
    136 45
     91 46
     64 47
    103 48
     83 49
     74 50
     75 51
     41 52
     33 53
     55 54
     60 55
     34 56
     40 57
     28 58
     31 59
     31 60
     21 61
     19 62
     20 63
     15 64
     17 65
     17 66
     15 67
     16 68
      9 69
      9 70
      5 71
     13 72
      3 73
      7 74
     11 75
      7 76
      5 77
      4 78
      5 79
      6 80
      3 81
      2 82
      1 83
      3 84
      1 86
      3 87
      1 88
      1 89
      2 90
      1 91
      1 92
      2 95
      1 96
      1 99
      1 100
      1 101
      2 102
      1 103
      2 104
      1 105
      1 114
      1 119

So far the largest gap is 119, which fits in a uint8_t instead of the previous uint32_t. The .h source file shrinks from 628041 to 275813 bytes, not a 4× gain as the binary format but still good and compresses well : 198495--->70512 !

There is still much room for improvement, for the numbers below 16, but I won't get into this... yet... if ever ?

When the scan reaches a point where the gap goes beyond 255, I'll have to reconsider my approach but so far, it's a pretty little hack and it gives a bit more insight on the spread of the perfect and maximal orbits. I observe that their density decreases quite slowly... Though no formula yet.

Discussions