Close

Another possible extension

A project log for Aligned Strings format

Developing a more flexible pointer and structure format that keeps some POSIX compatibility without the hassles

yann-guidon-ygdesYann Guidon / YGDES 09/25/2023 at 01:120 Comments

The recent article Faster String Processing With Bloom Filters got my attention. It refers to https://codeconfessions.substack.com/p/cpython-bloom-filter-usage. And I thought: What if...

It is possible to create a larger structure that contains an even larger field, to hold either a Bloom sieve or a hash/signature, for 2 use cases : sub-string search or string matching. For example, if a string is determined to be tested for equality at some later point, a programming language / Compiler would "tag" the string so the hash is created once the value is assigned. Then the actual test becomes a simple scalar value comparison, followed by a real comparison in case of a hit, which might save some CPU cycles when misses are likely.

Similarly, before performing a sub-string search, the strings (possibly many, again) can be pre-processed so only the 64-bit Bloom filter is used at first, to sieve candidates faster than the naïve method.

However, with already 3 LSB used, such an extension is impractical. The memory losses from alignment might become ridiculous. The hash or sieve values would be handled by the programmer, case-by-case at first, in a separate structure.

And speaking of hashes, I might have another trick up my sleeve, look up #PEAC Pisano with End-Around Carry algorithm !

Discussions