• s12
    6 months ago

    Umm… someone explain this code please?

    • magic_lobster_party@kbin.run
      6 months ago

      Bit shift magic.

      My guess is that all the individual characters of Hello World are packed inside the number that starts with 0xC894. Each group of 4 bits in x says where in this number to find the next character.

      You can read x right to left. (Skip the rightmost 0, as it’s immediately bit shifted away in the first iteration.)

      3 becomes H, 2 becomes e, 1 becomes l, 5 becomes o,

      etc.

      I guess when we’ve exhausted all bits of x only 0 will be remaining for one final iteration, which translates to !
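
A de-obfuscated sketch of that guess, assuming the ten distinct characters sit in a plain string table instead of being packed into the big number. `TABLE` and `decode_with_table` are made-up names for illustration, not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed lookup table: position 0..9 holds the character for that hex digit. */
static const char TABLE[] = "!leHWord, ";

/* Hypothetical helper: walks x one hex digit at a time, like the original loop. */
size_t decode_with_table(unsigned long long x, char *out, size_t cap) {
    assert(out != NULL && cap > 0);
    size_t n = 0;
    while (x && n + 1 < cap) {
        x >>= 4;                          /* drop the lowest digit first */
        unsigned idx = (unsigned)(x & 15);
        /* digits above 9 have no table entry in this sketch */
        out[n++] = idx < sizeof TABLE - 1 ? TABLE[idx] : '?';
    }
    out[n] = '\0';
    return n;
}
```

Called with 0x7165498511230 and a large enough buffer, this writes "Hello, World!" (13 characters); the last iteration runs with x already shifted down to 0, which picks the `!` at position 0.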

      • CanadaPlus@lemmy.sdf.org
        6 months ago

        Too readable. You’ve gotta encode the characters as the solutions of a polynomial over a finite field, implemented with linear feedback on the bit shifts. /s

      • s12
        6 months ago

        I understand that the characters are probably encoded into that number, but I’m struggling to understand that C/C++ code.

        • EmptySlime@lemmy.blahaj.zone
          6 months ago
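
          #include <stdio.h>

          int main(void) {
              /* Each hex digit of x, read right to left, is an index into the packed table below. */
              long long x = 0x7165498511230;

              /* Shift x by one digit, use the new low digit to pick a 7-bit field, add 32 to get ASCII. */
              while (x) putchar(32 + ((0xC894A7B75116601 >> ((x >>= 4) & 15) * 7) & 0x7F));

              return 0;
          }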
          

          Might be wrong on a few things here as I haven’t done C++ in a while, but my understanding is this. I’m sure you can guess that this is just a very cheekily written while loop to print the characters of “Hello, World!”, but how does it work? First off, all ASCII characters have an integer value. That 32 there is the value for the space character, so depending on what ((0xC894A7B75116601 >> ((x >>= 4) & 15) * 7) & 0x7F) evaluates to, you’ll get different characters. The value for “H”, for example, is 72, so on the first iteration we know that term somehow evaluated to 40, as 72 - 32 = 40.

          So how do we get there? That big number, 0xC894A7B75116601, is getting shifted right by some number of bits. Let’s start evaluating the parentheses. (x >>= 4) means set x to itself shifted right by 4 bits; whatever that number is, we then bitwise AND it with 15, or 1111 in binary. This essentially means that each iteration we discard the rightmost hex digit of 0x7165498511230, then pull out the new rightmost digit. So on the first iteration the ((x >>= 4) & 15) term evaluates to 3, then 2, then 1, then 1, etc., until we run out of digits and the loop ends, since the while condition is effectively just checking whether x is 0.
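
To make the digit-by-digit walk concrete, here is a small sketch that records every value the ((x >>= 4) & 15) term takes. `next_index` and `collect_indices` are hypothetical helper names, and the sketch uses an unsigned type so the right shift is well defined for any input.

```c
#include <stddef.h>

/* Hypothetical helper mirroring ((x >>= 4) & 15): shift first, then read the low digit. */
unsigned next_index(unsigned long long *x) {
    *x >>= 4;
    return (unsigned)(*x & 15u);
}

/* Collects the indices in the order the loop would see them; returns how many were stored. */
size_t collect_indices(unsigned long long x, unsigned *out, size_t cap) {
    size_t n = 0;
    while (x && n < cap)
        out[n++] = next_index(&x);
    return n;
}
```

For 0x7165498511230 this yields 3, 2, 1, 1, 5, 8, 9, 4, 5, 6, 1, 7, 0: thirteen indices, with the trailing 0 coming from the final pass where the shift empties x.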

          Next we take that number and multiply it by 7 (the multiplication binds tighter than the shift, so the 7 scales the shift amount). Simple enough: for that first iteration we have 21. So we shift 0xC894A7B75116601 right by 21 bits, then bitwise AND that against 0x7F, or 0111 1111 in binary. Just like last time, this means we’re pulling out the lowest 7 bits of whatever that ends up being. So the value of that expression is some number between 0 and 127, which is finally added to 32 to give the character to print.
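
The shift-and-mask step on its own can be sketched like this. `PACKED` and `char_at` are names invented for the example, and the bounds check is an addition the one-liner does not have.

```c
#include <stdint.h>

/* Ten 7-bit fields, each storing (character - 32). Bits 28..34 (index 4)
   hold 'W' - 32 = 55, which is where the B digit comes from. */
static const uint64_t PACKED = 0xC894A7B75116601ULL;

/* Hypothetical helper: the character stored at table position index (0..9). */
char char_at(unsigned index) {
    if (index > 9)
        return ' ';                       /* keeps the shift below 64 bits */
    unsigned shift = index * 7;           /* every entry is 7 bits wide */
    uint64_t field = (PACKED >> shift) & 0x7F;
    return (char)(32 + field);
}
```

With index 3 the shift is 21 bits, the masked field is 40, and the result is 72, i.e. `H`. Index 9 shifts by 63 bits, leaving 0, so it comes out as a space.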

          There are only 10 unique characters in “Hello, World!”, so they just assigned each one a digit 0-9, making 0x7165498511230 essentially “0xdlroW ,olleH!”. The first shift happens before the first read, and the loop has a final iteration with x = 0 before it terminates, which is how the “!” gets from one end to the other. So they took the decimal values for all those ASCII characters, subtracted 32, then smushed them all together in 7-bit chunks to make 0xC894A7B75116601. The space is kinda hidden in the encoding: it was assigned 9, putting it right at the top end, and since the expression is 32 + stuff its chunk is just 0, and there’s an infinite assumed parade of 0s to the left of the C.
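
Going the other way, a rough sketch of how the two constants could be rebuilt from a table string and the message. This is a guess at the construction, not the original author’s tooling; `pack_table` and `pack_message` are hypothetical, and the sketch assumes printable ASCII, at most ten table entries, and a message whose last character is table entry 0.

```c
#include <stddef.h>
#include <string.h>

/* Packs (c - 32) for each table character into 7-bit fields, entry 0 lowest.
   Only bit 0 of a tenth entry fits in 64 bits, so it should be the space. */
unsigned long long pack_table(const char *table) {
    unsigned long long packed = 0;
    for (size_t i = 0; table[i] != '\0' && i < 10; i++)
        packed |= (unsigned long long)(table[i] - 32) << (i * 7);
    return packed;
}

/* Stores the table index of each character as a hex digit. Digit 0 is the
   throwaway that gets shifted out first; the final character is not stored,
   because the last pass with x == 0 always reads table entry 0.
   Returns 0 if the message does not fit or uses a character not in the table. */
unsigned long long pack_message(const char *table, const char *msg) {
    size_t len = strlen(msg);
    if (len == 0 || len > 16)
        return 0;
    unsigned long long x = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        const char *p = strchr(table, msg[i]);
        if (p == NULL)
            return 0;
        x |= (unsigned long long)(p - table) << (4 * (i + 1));
    }
    return x;
}
```

With the table "!leHWord, " and the message "Hello, World!", these return 0xC894A7B75116601 and 0x7165498511230 respectively.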

      • barsoap@lemm.ee
        6 months ago

        32 is ASCII space, the highest number you need is 114 for r (or 122 for z if you want to be generic), that’s a range of 82 or 90 values.

        The target string has 13 characters, a long long has 8 bytes or 16 nibbles – 13 fits into 16 so nibbles (the (x >>= 4) & 15) it is. Also the initial x happens to have 13 nibbles in it so that makes sense. But a nibble only has 16 values, not 82, so you need some kind of compression and that’s the rest of the math, no idea how it was derived.

        If I were to write that thing I’d throw PAQ at it; it can probably spit out an arithmetic coding that works and looks even more arcane, as you wouldn’t have the obvious nibble steps. Or, wait, throw NEAT at it: train it to, given a specific initial seed, produce a second seed and a character, and score by edit distance. The problem space is small enough for the approach to be feasible even though it’s actually a terrible use of the technique, but using evolution will produce something that’s utterly, utterly inscrutable.