Aiming for w27

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

Yann Guidon / YGDES, 01/13/2022 at 21:42

Now that w26 is solved, I enter the vast void of the gap until w32. w27 is not exciting in itself but it's a good way to test the tools and methods, right ? So this time I'm trying to distribute the workload. I have the 12-thread i7 and the older 8-thread i7 that could offload some work...

The .p7k files are trivially concatenated, and the fusion program might even gracefully drop any duplicate semi-arcs. I'll also see if I can run the fusion program on a partial output (I don't expect much but it would be a small speedup in the end).

At this moment, I'm benchmarking the i7-10750H with a fraction of the total work: 2^22, or about 4 million semi-arcs. w27 is 128 million (2^27), so the total is 32-1 = 31 times more work. I want to know in advance how much of the dataset I will dedicate to each computer. Alone, the 10750H would take about two weeks, so maybe we could shorten this by one third, or 5 days ?

The i7-10750H is running a stock Fedora 35 and the governor is "powersave", limiting the clock to 3.8GHz. So I'm trying to see how to configure this relatively recent feature of Linux, on a pretty recent platform... Can I get to 4.8GHz ? It is advertised as being able to reach 5GHz on a single thread though I don't know how the TDP is managed on this particular laptop.

So I installed the kernel-tools package and can now run cpupower.

cpupower frequency-set --governor performance

does nothing: the clock is still between 3.7 and 3.8GHz...

AAAaaand this is the moment where I realise that the little Thinkpad has only a dual-core, 4-thread processor... So despite the same clock speed, it can only provide 1/3 of the big boy's speed. Sigh. Anyway, this makes 16 threads running in parallel overall, with the puny Thinkpad getting 1/4 of the work...
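The split by thread count can be sketched as below. This is only a back-of-the-envelope helper, not the project's actual scheduler: the name `split_by_weight` and the machine labels are made up for illustration, and it assumes that every thread delivers the same throughput, which is only an approximation.

```python
def split_by_weight(total, weights):
    """Split `total` work items proportionally to `weights`.

    Any rounding leftover goes to the first machine, so the shares
    always add up to `total`.
    """
    names = list(weights)
    weight_sum = sum(weights.values())
    shares = {name: total * weights[name] // weight_sum for name in names}
    shares[names[0]] += total - sum(shares.values())
    return shares


W27_SEMI_ARCS = 2 ** 27  # size of the w27 scan

# Assumption: work is shared in proportion to hardware thread counts.
threads = {"i7-10750H": 12, "i7-7600U": 4}

shares = split_by_weight(W27_SEMI_ARCS, threads)
for name, count in shares.items():
    print(f"{name}: {count} semi-arcs ({count / W27_SEMI_ARCS:.0%})")
```

Replacing the thread counts with measured semi-arcs per hour would give a split that lets both machines finish at about the same time.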

Maybe by the time this run is over, I'll have developed a SIMD version, which promises 4 to 8× speedup.

_____________________________________________________

The 10750H finished 4Mi semi-arcs (4194304) in 10h30m, the 7600U is 1/4 though... I'll now schedule partial runs of one or two days on each computer, starting a new one as each finishes.

____________________________________________________

The i7-7600U took 34h45m to complete its 4Mi semi-arcs (4194304), which is 3.3× slower than the 10750H... while the 10750H took 62h52m to compute 24Mi (25165824) semi-arcs.
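As a sanity check on these figures, here is a small sketch that turns the logged run times into throughput, then derives the slowdown factor and what a full w27 scan would cost on the 10750H alone. The helper `hours` and the dictionary keys are illustrative names; only the counts and durations come from the log.

```python
def hours(h, m=0):
    """Convert an 'XhYm' duration to fractional hours."""
    return h + m / 60


MI = 2 ** 20

# (semi-arcs computed, wall-clock hours), as reported in the log.
runs = {
    "i7-10750H (4Mi)": (4 * MI, hours(10, 30)),
    "i7-7600U (4Mi)": (4 * MI, hours(34, 45)),
    "i7-10750H (24Mi)": (24 * MI, hours(62, 52)),
}

# Throughput in semi-arcs per hour for each run.
rates = {name: count / duration for name, (count, duration) in runs.items()}

# How much slower the 7600U is on the same 4Mi batch.
slowdown = rates["i7-10750H (4Mi)"] / rates["i7-7600U (4Mi)"]

# Extrapolated duration of the whole w27 scan on the 10750H alone.
full_w27_days = 2 ** 27 / rates["i7-10750H (4Mi)"] / 24

for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} semi-arcs/hour")
print(f"7600U slowdown: {slowdown:.2f}x")
print(f"Full w27 on the 10750H alone: {full_w27_days:.1f} days")
```

The two 10750H runs come out at nearly the same rate, which suggests the throughput stays steady over longer batches.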

Since I just got another hexacore, there is no point in using the 7600U again. I'll run the 10750H again on 32Mi more semi-arcs while I configure the Ryzen 5600H to run the last half of the computations.

____________________________________________________

The Ryzen is operational. On w27 it runs about 7140 semi-arcs per minute... I'll see how fast it can complete the 33554432 semi-arcs.
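A rough estimate follows directly from that rate. The sketch below assumes the throughput stays constant and that nothing else competes for the CPU; `eta_hours` is just an illustrative helper name.

```python
def eta_hours(items, items_per_minute):
    """Estimated wall-clock hours to process `items` at a constant rate."""
    return items / items_per_minute / 60


estimate = eta_hours(33554432, 7140)
print(f"Estimated run time: {estimate:.1f} h ({estimate / 24:.1f} days)")
```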

____________________________________________________

w27 is completely scanned ! I will not give run times because they are totally messed up (I started some programs in parallel to attempt to make them all finish at the same time) and that's why I should try to develop #CHOMP. I have 6 partial files that are easy to concatenate.

The Ryzen had a first run of 33554432 semi-arcs that took 113h at 882%CPU because I started another run in parallel : 16777216 semi-arcs in 71h @700%CPU.

Meanwhile the 10750H completed 33554432 semi-arcs in 85h at 1172%CPU, then 16777216 semi-arcs in 44h @1148%CPU (not overlapping, this time).
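Even with overlapping runs, the wall time and %CPU figures can be combined into a rough "thread-hours" total per machine. The sketch below assumes that %CPU is the average over the whole run (100% = one fully busy hardware thread). It is only a crude comparison, since SMT threads are not full cores and overlapping runs compete for the same threads. The helper `cpu_hours` and the `summary` structure are illustrative.

```python
def cpu_hours(wall_hours, cpu_percent):
    """CPU time in thread-hours: wall time scaled by average %CPU."""
    return wall_hours * cpu_percent / 100


# (semi-arcs, wall-clock hours, average %CPU), as reported in the log.
runs = {
    "Ryzen 5600H": [(33554432, 113, 882), (16777216, 71, 700)],
    "i7-10750H": [(33554432, 85, 1172), (16777216, 44, 1148)],
}

summary = {}
for machine, parts in runs.items():
    total_items = sum(items for items, _, _ in parts)
    total_cpu = sum(cpu_hours(wall, pct) for _, wall, pct in parts)
    summary[machine] = (total_items, total_cpu)
    print(f"{machine}: {total_items} semi-arcs in {total_cpu:.0f} thread-hours "
          f"({total_items / total_cpu:,.0f} semi-arcs per thread-hour)")
```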

And I wasn't even there when they both finished...
