-
faster sqrt but less accurate
09/17/2019 at 15:25
I found this somewhere on the internet, and it is indeed faster. I had to correct the 1L to 127L, as explained in the article below, where some users originally had issues with it. However, it has an error rate of up to 8%, or under 6% for whole numbers.
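#include <stdint.h>
#include <string.h>

float fastsqrt(float val) {
    int32_t tmp; /* int32_t rather than long, so it also works where long is 64 bits */
    memcpy(&tmp, &val, sizeof tmp); /* reinterpret the float's bits as a 32-bit integer */
    tmp -= 127L << 23; /* remove the IEEE 754 exponent bias (127 << 23) */
    /* tmp is now an approximation to log2(val), scaled by 2^23 */
    tmp = tmp >> 1;    /* divide by 2 */
    tmp += 127L << 23; /* restore the IEEE 754 exponent bias */
    memcpy(&val, &tmp, sizeof val);
    return val;
}
This version brings the error rate for whole numbers below 3%: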
#include <stdint.h>
#include <string.h>

float fastsqrt(float val) {
    const float invertDivide = 0.66666666666f; // ~1/1.5, rounded down to single float precision
    int32_t tmp, tmp2;
    memcpy(&tmp, &val, sizeof tmp);   // bits of the input
    val *= 0.22474487139f;            // scaled copy of the input, used for the offset term
    memcpy(&tmp2, &val, sizeof tmp2); // bits of the scaled input
    tmp  -= 127L << 23; // remove IEEE bias from exponent (127 << 23 = 1065353216)
    tmp2 -= 127L << 23;
    // any time tmp is negative the error rate is higher;
    // when tmp is 0, -1 or -2 the error rate is also high
    tmp  = tmp  >> 1;   // divide by 2
    tmp2 = tmp2 >> 1;
    tmp  += 1065353216; // restore the IEEE bias to the exponent
    tmp2 += 1065353216;
    float offset;
    memcpy(&offset, &tmp2, sizeof offset);
    memcpy(&val, &tmp, sizeof val);
    return (val + offset) * invertDivide;
}
First version: variation below 4.3% for numbers in the range 0 to 100:
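input 2: exact 1.414214, approximation 1.5, variation 4.289323%
Version with offset: variation below 2.7% for numbers in the range 0 to 100:
input 2: exact 1.414214, approximation 1.466326, variation 2.605647%
It gets trickier for numbers below 1. For the input below, the original has a variation of more than 24%, while the new version is at about 11.5%:
input 0.01: exact 0.100000, approximation 0.101153, variation 11.530593%
Note that these variation figures work out to (approximation - exact) divided by the input value. Measured against the exact root instead, the errors are about 6.07% and 3.68% for an input of 2, and about 1.15% for the new version at 0.01.
So some work still needs to be done to make it adapt to the precision of the float.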
I'll play around with it more. It seems to be useful for some people, but I would like the precision to be closer to 0.5% to 1%, or even better.
I'll update if I get something faster or better.
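One well-known way to get closer to the 0.5% to 1% target is to follow the bit-shift estimate with a single Newton-Raphson step. This is not part of the code above, just a minimal sketch of that swapped-in technique. The names sqrt_estimate, fastsqrt_refined and max_rel_error are hypothetical, and the input is assumed to be a positive, normal float. The error helper feeds in exact squares so it does not need the math library.

```c
#include <stdint.h>
#include <string.h>

/* Bit-shift estimate, the same idea as the first fastsqrt above. */
static float sqrt_estimate(float val) {
    int32_t bits;
    memcpy(&bits, &val, sizeof bits);
    bits -= (int32_t)127 << 23;   /* remove the exponent bias */
    bits >>= 1;                   /* halve the approximate log2 */
    bits += (int32_t)127 << 23;   /* restore the exponent bias */
    memcpy(&val, &bits, sizeof val);
    return val;
}

/* Hypothetical name: estimate plus one Newton-Raphson step.
   Assumes val > 0, so the estimate is never zero. */
static float fastsqrt_refined(float val) {
    float y = sqrt_estimate(val);
    return 0.5f * (y + val / y);
}

/* Largest relative error of fn over evenly spaced true roots in
   [root_lo, root_hi]. Each input is root * root, so the exact answer
   is known without calling sqrtf. */
static float max_rel_error(float (*fn)(float), float root_lo, float root_hi, int steps) {
    float worst = 0.0f;
    for (int i = 0; i <= steps; i++) {
        float root = root_lo + (root_hi - root_lo) * (float)i / (float)steps;
        float err = (fn(root * root) - root) / root;
        if (err < 0.0f) err = -err;
        if (err > worst) worst = err;
    }
    return worst;
}
```

Calling max_rel_error(fastsqrt_refined, 0.1f, 10.0f, 10000) covers inputs from 0.01 to 100. The refinement costs one extra floating-point division per call, so whether it is still a net win over the library sqrt on a given board would have to be measured.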
-
speed up i2c, faster than it can be normally. caching data you use.
09/10/2019 at 13:32
When reading I2C data, there is a performance penalty for reading only a few bytes at a time, and another for reading too much at a time. So the best method is probably to read only the range you need and nothing more. From my data, reading 32 to 64 words at a time seems to be the ideal cache size.
Some CPU designs have a cache for I2C, and some have DMA access. In many cases, loading more I2C data than needed costs extra CPU cycles, so you want the optimum amount of cache.
https://i2c.info/i2c-bus-specification
All I have is a data comparison for reading the MLX90640 sensor. In my case the sensor requires reading 768 words of data, so 768 x 2 = 1536 bytes. On an Uno or Nano processor with only 2K of RAM this is not practical. There is also a lot of overhead: the processor cannot do anything else while fetching that much data at once.
I wanted to determine the optimum I2C cache size, where the benefits of reading data in blocks are maximized.
Keep in mind that the start and stop conditions take about a byte cycle as well (sometimes only 2 clock bits).
For example, I2C in my case can go up to 1 MHz. The byte accounting I use for an I2C read is the following:
device address: 1 byte for a 7-bit address (2 bytes when 10-bit addressing is used); this byte also carries the bit that says write or read
start address of the memory to read: 2 bytes
a second addressing phase, counted here as 2 bytes (I2C itself has no end address: on the bus this is a repeated start with the device address sent again in read mode, and the master ends the read with a NACK and a stop condition)
There are also start and stop conditions and the clocking of the serial data. We treat those as bytes as well.
Each byte of data sent back costs 1 byte, but we request at least 2 at a time for word data.
So for our purposes we can assume the following for reading 2 bytes of data (the minimum memory read for the MLX90640 sensor).
READING 1 uint16_t (word) == 9 bytes on the bus (clocked serially)
read 2 bytes = start bit (1 byte) + device ID (1 byte) + start address (2 bytes) + end address (2 bytes) + DATA (2 bytes) + end bit (1 byte)
So reading only 1 word of data from an I2C device takes 9 bytes in our case.
READING 2 uint16_t (words) == 11 bytes on the bus (clocked serially)
read 4 bytes = start bit (1 byte) + device ID (1 byte) + start address (2 bytes) + end address (2 bytes) + DATA (2 bytes) + DATA (2 bytes) + end bit (1 byte)
So reading 2 words of data takes 11 bytes.
The overhead of 7 bytes is in every transfer, even if we get several words at a time.
Here is the theoretical performance increase based on the number of words read per transfer, where 1 would be 100% efficient:
1 word is 22% efficient
16 words is 82% efficient, or 3.7 times faster than 1 word at a time
32 words is 90% efficient, or 4.1 times faster than 1 word at a time
64 words is 95% efficient, or 4.3 times faster than 1 word at a time
128 words is 97% efficient, or 4.4 times faster than 1 word at a time
The table that shows the efficiency is at the end of this log.
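The efficiency figures follow from one small formula: data bytes divided by data bytes plus the fixed per-transfer overhead. Here is a minimal sketch of that calculation, assuming the 7-byte overhead model used in this log. The function names are hypothetical.

```c
/* Fraction of the bytes on the bus that carry payload, for `words` 16-bit
   words per transfer and a fixed per-transfer overhead in bytes. */
static double i2c_efficiency(unsigned words, unsigned overhead_bytes) {
    double data_bytes = 2.0 * words;
    if (words == 0) return 0.0;   /* nothing transferred, nothing gained */
    return data_bytes / (data_bytes + overhead_bytes);
}

/* Speed-up of reading `words` per transfer relative to one word per transfer. */
static double i2c_speedup(unsigned words, unsigned overhead_bytes) {
    return i2c_efficiency(words, overhead_bytes) / i2c_efficiency(1, overhead_bytes);
}
```

For example, i2c_efficiency(32, 7) gives 64 / 71, about 0.90, and i2c_speedup(64, 7) gives about 4.27. Changing the overhead argument shows how sensitive the result is to the accounting of start, stop and address bytes.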
Here is the I2C code I use. I just read 32 or 64 words at a time, and reload only when a cache miss occurs. I make sure my reads of the data are as linear as possible, which avoids unnecessary cache reloads.
The cache can be enabled and disabled, and it works without any other coding consideration beyond trying to read data in order within each 32-word block. Even if you don't, it will still work.
You can see why I use a 64-word buffer, and why I am considering going down to a 32-word buffer so I can use the RAM saved to cache other information (serial data cached to burst to the screen).
In my case 16 or 32 words of data would be appropriate. I have other issues causing more overhead than the speed of the I2C data transfer, but the table below shows what the efficiency would be for each buffer size.
This example reads I2C data from 1 word up to as many words as you want at a time.
// worddata, linecache and SmallMemCache_i2c_efficency are globals declared elsewhere in the sketch
uint16_t RamGetStoredInLocal(uint16_t value){
#if customSmallCacheForMemReads != true
    // no cache: we read only 2 bytes at a time, with a lot of address overhead
    MLX90640_I2CRead(MLX90640_address, 1024 + value, 1, worddata);
    return worddata[0];
#else
    // predictive loading: if the line (y) differs from the cached one, we reload 64 values
    uint8_t valueofy = (value >> 6); // index of the 64-word line (value / 64)
    if (linecache != valueofy) {     // this means we have a cache miss
        // masks: 7 bits wide 65408, 6 bits wide 65472, 5 bits wide 65504
        MLX90640_I2CRead(MLX90640_address, 1024 + (value & 65472), 64, SmallMemCache_i2c_efficency);
        linecache = valueofy;        // remember which line is now buffered
    }
    // this part always runs on reads; if the line is already cached, the data is available instantly
    uint8_t valueofx = (value & 63); // offset within the line; the mask is 31 for 32 words, 63 for 64, 127 for 128
    return SmallMemCache_i2c_efficency[valueofx]; // we return the cached value most of the time
#endif
}
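To see the effect of the line cache without a sensor attached, the same idea can be simulated on a PC. This is only a sketch: fake_sensor, mock_block_read, cached_read and the transfer counter are hypothetical stand-ins, not part of the MLX90640 library, and the block size of 64 words is the one discussed above.

```c
#include <stdint.h>

#define CACHE_WORDS 64u    /* block size; must be a power of two */
#define SENSOR_WORDS 768u  /* number of words in the sensor frame */

static uint16_t fake_sensor[SENSOR_WORDS];  /* stand-in for the sensor RAM */
static uint16_t cache[CACHE_WORDS];
static uint16_t cached_block = 0xFFFFu;     /* 0xFFFF means no block loaded yet */
static unsigned bus_transfers = 0;          /* counts simulated I2C block reads */

/* Hypothetical stand-in for one block read over I2C. */
static void mock_block_read(uint16_t start, uint16_t count, uint16_t *dest) {
    for (uint16_t i = 0; i < count; i++) dest[i] = fake_sensor[start + i];
    bus_transfers++;
}

/* Returns one word, reloading the 64-word block only on a cache miss. */
static uint16_t cached_read(uint16_t index) {
    uint16_t block = (uint16_t)(index / CACHE_WORDS);
    if (block != cached_block) {
        mock_block_read((uint16_t)(block * CACHE_WORDS), CACHE_WORDS, cache);
        cached_block = block;
    }
    return cache[index % CACHE_WORDS];
}

/* Reads every word once, in order, and returns the number of transfers used.
   Returns 0 if any cached value does not match the simulated sensor. */
static unsigned count_transfers_sequential(void) {
    bus_transfers = 0;
    cached_block = 0xFFFFu;
    for (uint16_t i = 0; i < SENSOR_WORDS; i++) fake_sensor[i] = (uint16_t)(i * 3u);
    for (uint16_t i = 0; i < SENSOR_WORDS; i++) {
        if (cached_read(i) != fake_sensor[i]) return 0;
    }
    return bus_transfers;
}
```

Reading all 768 words in order takes 768 / 64 = 12 block transfers instead of 768 single-word transfers. Jumping back and forth between two blocks would cause a reload on every jump, which is why linear access matters.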
Beyond 128 words the improvements become very small.
Efficiency table. Columns: overhead per transfer, data received per transfer, efficiency level (data bytes / (data bytes + overhead bytes)).
7(bytes) overhead 0word(0bytes):::::::::0
7(bytes) overhead 1word(2bytes):::::::::0.2222222222222222
7(bytes) overhead 2word(4bytes):::::::::0.36363636363636365
7(bytes) overhead 3word(6bytes):::::::::0.46153846153846156
7(bytes) overhead 4word(8bytes):::::::::0.5333333333333333
7(bytes) overhead 5word(10bytes):::::::::0.5882352941176471
7(bytes) overhead 6word(12bytes):::::::::0.631578947368421
7(bytes) overhead 7word(14bytes):::::::::0.6666666666666666
7(bytes) overhead 8word(16bytes):::::::::0.6956521739130435
7(bytes) overhead 9word(18bytes):::::::::0.72
7(bytes) overhead 10word(20bytes):::::::::0.7407407407407407
7(bytes) overhead 11word(22bytes):::::::::0.7586206896551724
7(bytes) overhead 12word(24bytes):::::::::0.7741935483870968
7(bytes) overhead 13word(26bytes):::::::::0.7878787878787878
7(bytes) overhead 14word(28bytes):::::::::0.8
7(bytes) overhead 15word(30bytes):::::::::0.8108108108108109
7(bytes) overhead 16word(32bytes):::::::::0.8205128205128205
7(bytes) overhead 17word(34bytes):::::::::0.8292682926829268
7(bytes) overhead 18word(36bytes):::::::::0.8372093023255814
7(bytes) overhead 19word(38bytes):::::::::0.8444444444444444
7(bytes) overhead 20word(40bytes):::::::::0.851063829787234
7(bytes) overhead 21word(42bytes):::::::::0.8571428571428571
7(bytes) overhead 22word(44bytes):::::::::0.8627450980392157
7(bytes) overhead 23word(46bytes):::::::::0.8679245283018868
7(bytes) overhead 24word(48bytes):::::::::0.8727272727272727
7(bytes) overhead 25word(50bytes):::::::::0.8771929824561403
7(bytes) overhead 26word(52bytes):::::::::0.8813559322033898
7(bytes) overhead 27word(54bytes):::::::::0.8852459016393442
7(bytes) overhead 28word(56bytes):::::::::0.8888888888888888
7(bytes) overhead 29word(58bytes):::::::::0.8923076923076924
7(bytes) overhead 30word(60bytes):::::::::0.8955223880597015
7(bytes) overhead 31word(62bytes):::::::::0.8985507246376812
7(bytes) overhead 32word(64bytes):::::::::0.9014084507042254
7(bytes) overhead 33word(66bytes):::::::::0.9041095890410958
7(bytes) overhead 34word(68bytes):::::::::0.9066666666666666
7(bytes) overhead 35word(70bytes):::::::::0.9090909090909091
7(bytes) overhead 36word(72bytes):::::::::0.9113924050632911
7(bytes) overhead 37word(74bytes):::::::::0.9135802469135802
7(bytes) overhead 38word(76bytes):::::::::0.9156626506024096
7(bytes) overhead 39word(78bytes):::::::::0.9176470588235294
7(bytes) overhead 40word(80bytes):::::::::0.9195402298850575
7(bytes) overhead 41word(82bytes):::::::::0.9213483146067416
7(bytes) overhead 42word(84bytes):::::::::0.9230769230769231
7(bytes) overhead 43word(86bytes):::::::::0.9247311827956989
7(bytes) overhead 44word(88bytes):::::::::0.9263157894736842
7(bytes) overhead 45word(90bytes):::::::::0.9278350515463918
7(bytes) overhead 46word(92bytes):::::::::0.9292929292929293
7(bytes) overhead 47word(94bytes):::::::::0.9306930693069307
7(bytes) overhead 48word(96bytes):::::::::0.9320388349514563
7(bytes) overhead 49word(98bytes):::::::::0.9333333333333333
7(bytes) overhead 50word(100bytes):::::::::0.9345794392523364
7(bytes) overhead 51word(102bytes):::::::::0.9357798165137615
7(bytes) overhead 52word(104bytes):::::::::0.9369369369369369
7(bytes) overhead 53word(106bytes):::::::::0.9380530973451328
7(bytes) overhead 54word(108bytes):::::::::0.9391304347826087
7(bytes) overhead 55word(110bytes):::::::::0.9401709401709402
7(bytes) overhead 56word(112bytes):::::::::0.9411764705882353
7(bytes) overhead 57word(114bytes):::::::::0.9421487603305785
7(bytes) overhead 58word(116bytes):::::::::0.943089430894309
7(bytes) overhead 59word(118bytes):::::::::0.944
7(bytes) overhead 60word(120bytes):::::::::0.9448818897637795
7(bytes) overhead 61word(122bytes):::::::::0.9457364341085271
7(bytes) overhead 62word(124bytes):::::::::0.9465648854961832
7(bytes) overhead 63word(126bytes):::::::::0.9473684210526315
7(bytes) overhead 64word(128bytes):::::::::0.9481481481481482
7(bytes) overhead 65word(130bytes):::::::::0.948905109489051
7(bytes) overhead 66word(132bytes):::::::::0.9496402877697842
7(bytes) overhead 67word(134bytes):::::::::0.950354609929078
7(bytes) overhead 68word(136bytes):::::::::0.951048951048951
7(bytes) overhead 69word(138bytes):::::::::0.9517241379310345
7(bytes) overhead 70word(140bytes):::::::::0.9523809523809523
7(bytes) overhead 71word(142bytes):::::::::0.9530201342281879
7(bytes) overhead 72word(144bytes):::::::::0.9536423841059603
7(bytes) overhead 73word(146bytes):::::::::0.954248366013072
7(bytes) overhead 74word(148bytes):::::::::0.9548387096774194
7(bytes) overhead 75word(150bytes):::::::::0.9554140127388535
7(bytes) overhead 76word(152bytes):::::::::0.9559748427672956
7(bytes) overhead 77word(154bytes):::::::::0.9565217391304348
7(bytes) overhead 78word(156bytes):::::::::0.9570552147239264
7(bytes) overhead 79word(158bytes):::::::::0.9575757575757575
7(bytes) overhead 80word(160bytes):::::::::0.9580838323353293
7(bytes) overhead 81word(162bytes):::::::::0.9585798816568047
7(bytes) overhead 82word(164bytes):::::::::0.9590643274853801
7(bytes) overhead 83word(166bytes):::::::::0.9595375722543352
7(bytes) overhead 84word(168bytes):::::::::0.96
7(bytes) overhead 85word(170bytes):::::::::0.96045197740113
7(bytes) overhead 86word(172bytes):::::::::0.9608938547486033
7(bytes) overhead 87word(174bytes):::::::::0.9613259668508287
7(bytes) overhead 88word(176bytes):::::::::0.9617486338797814
7(bytes) overhead 89word(178bytes):::::::::0.9621621621621622
7(bytes) overhead 90word(180bytes):::::::::0.9625668449197861
7(bytes) overhead 91word(182bytes):::::::::0.9629629629629629
7(bytes) overhead 92word(184bytes):::::::::0.9633507853403142
7(bytes) overhead 93word(186bytes):::::::::0.9637305699481865
7(bytes) overhead 94word(188bytes):::::::::0.9641025641025641
7(bytes) overhead 95word(190bytes):::::::::0.9644670050761421
7(bytes) overhead 96word(192bytes):::::::::0.964824120603015
7(bytes) overhead 97word(194bytes):::::::::0.9651741293532339
7(bytes) overhead 98word(196bytes):::::::::0.9655172413793104
7(bytes) overhead 99word(198bytes):::::::::0.9658536585365853
7(bytes) overhead 100word(200bytes):::::::::0.966183574879227
7(bytes) overhead 101word(202bytes):::::::::0.9665071770334929
7(bytes) overhead 102word(204bytes):::::::::0.966824644549763
7(bytes) overhead 103word(206bytes):::::::::0.9671361502347418
7(bytes) overhead 104word(208bytes):::::::::0.9674418604651163
7(bytes) overhead 105word(210bytes):::::::::0.967741935483871
7(bytes) overhead 106word(212bytes):::::::::0.9680365296803652
7(bytes) overhead 107word(214bytes):::::::::0.9683257918552036
7(bytes) overhead 108word(216bytes):::::::::0.968609865470852
7(bytes) overhead 109word(218bytes):::::::::0.9688888888888889
7(bytes) overhead 110word(220bytes):::::::::0.9691629955947136
7(bytes) overhead 111word(222bytes):::::::::0.9694323144104804
7(bytes) overhead 112word(224bytes):::::::::0.9696969696969697
7(bytes) overhead 113word(226bytes):::::::::0.9699570815450643
7(bytes) overhead 114word(228bytes):::::::::0.9702127659574468
7(bytes) overhead 115word(230bytes):::::::::0.9704641350210971
7(bytes) overhead 116word(232bytes):::::::::0.9707112970711297
7(bytes) overhead 117word(234bytes):::::::::0.970954356846473
7(bytes) overhead 118word(236bytes):::::::::0.9711934156378601
7(bytes) overhead 119word(238bytes):::::::::0.9714285714285714
7(bytes) overhead 120word(240bytes):::::::::0.97165991902834
7(bytes) overhead 121word(242bytes):::::::::0.9718875502008032
7(bytes) overhead 122word(244bytes):::::::::0.9721115537848606
7(bytes) overhead 123word(246bytes):::::::::0.9723320158102767
7(bytes) overhead 124word(248bytes):::::::::0.9725490196078431
7(bytes) overhead 125word(250bytes):::::::::0.9727626459143969
7(bytes) overhead 126word(252bytes):::::::::0.972972972972973
7(bytes) overhead 127word(254bytes):::::::::0.9731800766283525
7(bytes) overhead 128word(256bytes):::::::::0.973384030418251
-
create a fast lookup table for powers of 2 if you know the range
09/08/2019 at 15:20
I recently had some math optimizations that required improving the performance of pow operations.
The values needed were only powers of 2, and the exponent only went up to 48.
The table is currently about 100x the speed of the normal method. It would be 1000x or more, but the table is not in RAM.
So I created a table, and decided the fastest way to store the values without using much memory was to put the table in PROGMEM, which on Arduino means it is stored in flash and read back later.
This may be modified further with a predictive cache or a compressed table, as getting values from flash is slower than from RAM, and RAM is valuable on Arduino. For processors with RAM to spare, this table should be located in RAM.
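As a point of comparison, a power of 2 can also be produced with no table at all, because an IEEE 754 single with a zero mantissa is exactly 2^(exponent - 127). This is a different technique from the PROGMEM table in this log, shown only as a sketch. The name pow2_from_bits is hypothetical, and the platform is assumed to use 32-bit IEEE 754 floats.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 2^x for x in 0..127 by writing the biased
   exponent (x + 127) into bits 23..30 of a float and leaving the mantissa 0.
   Larger x is not valid here (x = 128 would produce infinity). */
static float pow2_from_bits(uint8_t x) {
    uint32_t bits = ((uint32_t)x + 127u) << 23;
    float result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

For example, pow2_from_bits(10) yields 1024. Whether this beats a flash lookup on a given board would need to be timed; it trades the table's 256 bytes of flash for an add and a shift.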
Here is the function that pulls values to be used in equations. The table covers 2^x with x from 0 to 63.
float SimplePowFast2s(uint8_t x){ // returns 2^x; x must be in the range 0 to 63
    return pgm_read_float_near(power_of2table + x);
}
Here is the table (in the sketch it must be declared before the function above):
const float power_of2table[] PROGMEM = {
1,
2,
4,
8,
16,
32,
64,
128,
256,
512,
1024,
2048,
4096,
8192,
16384,
32768,
65536,
131072,
262144,
524288,
1048576,
2097152,
4194304,
8388608,
16777216,
33554432,
67108864,
134217728,
268435456,
536870912,
1073741824,
2147483648,
4294967296,
8589934592,
17179869184,
34359738368,
68719476736,
137438953472,
274877906944,
549755813888,
1099511627776,
2199023255552,
4398046511104,
8796093022208,
17592186044416,
35184372088832,
70368744177664,
140737488355328,
281474976710656,
562949953421312,
1125899906842624,
2251799813685248,
4503599627370496,
9007199254740992,
18014398509481984,
36028797018963968,
72057594037927936,
144115188075855872,
288230376151711744,
576460752303423488,
1152921504606846976,
2305843009213693952,
4611686018427387904,
9223372036854775808.0, // 2^63 does not fit in a signed 64-bit integer literal, so it is written as a floating constant
};
-
if you can cache a problem
09/08/2019 at 13:00
This method of boosting performance is universal, so it is probably one of the most effective ways to increase performance across designs, as it simply reduces the amount of redundant work. It does not rely on a specific syntax, or on a special mode or feature of a CPU design. All it uses is some variables outside of the loop.
There are times when a problem is solved more than once, and in some cases it is solved several times with the same result. There are also tables of data in which some of the data does not change. Here is an example of a single function that can be called several times. The 2 cached values can be replaced with arrays indexed by the cell being measured. Here is a basic view of how caching math works.
Here is an example of a function that caches results. This is a simple version: it only reuses the result if the input is the same as last time.
In order to cache data we need two values stored outside of the loop.
We need the ResultCache value. This is just a number we output without doing any further work.
And we need the reference Compare Cached value. This is the value we use as a reference to verify that the problem has not changed.
Here is an example function, DataconvertDegCTodegF, that takes a temperature in deg C and converts it to deg F.
We get several requests for this per second, but we never know how often the value actually changes. Many values are like this, especially with sensors and data sets: they stay the same most of the time.
This is a simple example of math caching.
// the two cached values live outside the function; the initial pair is consistent (0 deg C = 32 deg F)
float reference_Compare_Cached_value = 0.0;
float ResultCache_value = 32.0;
float DataconvertDegCTodegF(float temperature_Deg_C){ // this value is read several times, at an unknown interval of change
    // we output a conversion from deg C to deg F
    // (0°C × 9/5) + 32 = 32°F
    if (reference_Compare_Cached_value != temperature_Deg_C){
        // the reference value has changed, so we do the work
        ResultCache_value = (temperature_Deg_C * 1.8) + 32; // we get deg in F
        reference_Compare_Cached_value = temperature_Deg_C; // we update the reference
    }
    // the work is done if the input changed; if not, we send back what we have cached
    return ResultCache_value;
}
This function is not complex, but if it is run several times a second it can have an impact on performance. Some of the code I wrote for the AMG8833 sensor used this kind of caching.
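The array variant mentioned earlier, where each cell keeps its own cached pair, could look roughly like this. It is a sketch under assumptions: the cell count of 64 (for example one entry per pixel of an 8x8 sensor), the names, and the conversions_done counter are all hypothetical and only there for illustration.

```c
#include <stdint.h>

#define CELL_COUNT 64u  /* assumption: one cache entry per sensor cell */

static float cached_input[CELL_COUNT];   /* last input seen for each cell */
static float cached_result[CELL_COUNT];  /* result computed for that input */
static uint8_t cell_valid[CELL_COUNT];   /* 0 until a cell has been computed once */
static unsigned conversions_done = 0;    /* counts real computations */

/* Hypothetical name: converts deg C to deg F, recomputing only when the
   input for this cell has changed. The caller must pass cell < CELL_COUNT. */
static float cell_deg_c_to_deg_f(uint8_t cell, float deg_c) {
    if (!cell_valid[cell] || cached_input[cell] != deg_c) {
        cached_result[cell] = deg_c * 1.8f + 32.0f;  /* do the work only on change */
        cached_input[cell] = deg_c;
        cell_valid[cell] = 1;
        conversions_done++;
    }
    return cached_result[cell];
}
```

The cost is RAM: with 64 cells this uses 64 x (4 + 4 + 1) = 576 bytes, which is a lot on a board with 2K of RAM. For a conversion this cheap the cache may not pay for itself, so the pattern fits better where the cached computation is expensive.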