Close

More food for thoughts

A project log for Aligned Strings format

Developing a more flexible pointer and structure format that keeps some POSIX compatibility without the hassles

yann-guidon-ygdesYann Guidon / YGDES 01/17/2023 at 02:591 Comment

I have to credit @Gravis for challenging me in the comments. Even though it does not go in the direction he seems to want, feedback is progress and this has brought some indirect useful bits. Even though the broad lines are defined, there are still details to perfect.

A quirky little detail I didn't properly consider before is how to manage or handle the NULL value. It belongs to the type 0 so the 2 LSB are cleared. There is no incentive to swap it with type 3 again, as type 0 is not dereferenced (so no risk of a NULL pointer crashing the program) and type 3 has an easy mask-based routine to fetch the total length up to 24 bits.

However type 0 gives code points and NULL translates to a character of value 0. This potentially causes confusion in cases where a pointer is returned as an error code, like malloc() failing. It shouldn't insert a 0 in the stream (eventually a Unicode glyph that says "error").

Furthermore a "placeholder" value is defined to indicate that the entry in the list must be skipped. Currently this is defined as value 4 (100b).

Something has to be shuffled around to make the whole thing safer. It is easier to re-attribute the less-used placeholder value, giving the following table :

AttributionMSBother bits
b2b1-b0
NULL


000
Unicode point (including value 0)
0
Unicode value
100
Placeholder / skip marker
1

100


The value of the placeholder/skip can now be -4 because Unicode values can't affect the MSB.

I don't know yet how to display a NULL value. Maybe ⚠ - U+26A0 - ⚠ ? The Replacement character U+FFFD � is already used for a different purpose and should be avoided to prevent confusion. U+1F6AB 🚫 or U+1F5F2 🗲 could also represent an error.

The summary of this change is below:

Type UP means "Unicode Point or Placeholder". Now let's transform the specification into working code.

Discussions

Starwort wrote 02/13/2023 at 07:49 point

Consider U+2400, ␀ for replacing nulls

  Are you sure? yes | no