I see a lot about source code being leaked, and I’m wondering how it is that you could make something like an exact replica of Super Mario Bros without the source code, or why you can’t take the finished product and run it back through the compilation software?

  • KoboldCoterie@pawb.social
    10 months ago

    The main issue is that to make code human-readable, we include a lot of conventions that computers don’t need. We use specific formatting, naming conventions, code structure, comments, etc. to help someone look at the code and understand its function.

    Let’s say I write code, and I have a function named ‘findUserName’ that takes a variable ‘text’ and checks it against a global variable ‘userName’, to see if the user name is contained in the text, and returns ‘true’ if so. If I compile and decompile that, the result will be (for example) a function named ‘function_002’ that takes a variable ‘var_local_000’ and checks it against ‘var_global_115’. Also, my comments will be gone, and finding where the function was called from will be difficult. Yes, you could look at that code and figure out that it’s comparing the contents of two variables, but you wouldn’t know that var_global_115 is a username, so you’d have to go find where that variable was set and try to puzzle out where it was coming from, and follow that rabbit hole backwards until you eventually find a request for user input which you’d have to use context clues to determine the purpose of. You also wouldn’t have the context around what ‘var_local_000’ represented unless you found where the function was called, and followed a similar line backwards to find the origin of that variable.
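
To make that concrete, here is a rough sketch of the before and after. It is written in Python only for brevity (decompilers usually deal with compiled languages such as C), and the value stored in the global is made up for illustration. The point is that both versions behave identically, but only one of them tells you what it is for.

```python
# --- What the author wrote ---
# Global holding the current user's name (the value is an illustrative assumption).
userName = "mario"

def findUserName(text):
    """Return True if the user's name appears somewhere in the text."""
    return userName in text

# --- Roughly what a decompiler might give back ---
# Same logic, but the names are auto-generated and the comments are gone.
var_global_115 = "mario"

def function_002(var_local_000):
    return var_global_115 in var_local_000

# Both functions agree on every input; only the readability differs.
for sample in ["hello mario", "hello luigi", ""]:
    print(findUserName(sample) == function_002(sample))
```

Nothing in `function_002` hints that `var_global_115` is a user name, which is why you end up tracing where it was set.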

    It’s not that the code you get back from a decompiler is incorrect or inefficient, it’s that it’s very much not human-readable without a lot of extra investigatory work.

    • Hotzilla
      10 months ago

      This might change relatively quickly now that large language models can process code: you could give a decompiled function to an LLM and ask it to suggest a better name, then iterate over the code and rename all the functions and variables.

      Of course, this won’t reproduce the exact original code, but it makes one of the heaviest parts of reconstructing human-readable code much lighter.
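
The renaming pass itself could look something like the sketch below. The `SUGGESTED_NAMES` table is a hypothetical stand-in for whatever names an LLM would propose after reading the decompiled code; no real LLM API is called here, and `rename_identifiers` is just an illustrative helper.

```python
import re

# Hypothetical suggestions; in practice these would come from an LLM
# that has been shown the decompiled function.
SUGGESTED_NAMES = {
    "function_002": "findUserName",
    "var_local_000": "text",
    "var_global_115": "userName",
}

def rename_identifiers(code, names):
    """Replace every whole-word identifier in `code` that has a suggested name."""
    if not names:
        return code
    # \b keeps us from touching longer identifiers that merely contain a match.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: names[m.group(1)], code)

decompiled = (
    "def function_002(var_local_000):\n"
    "    return var_global_115 in var_local_000\n"
)

renamed = rename_identifiers(decompiled, SUGGESTED_NAMES)
print(renamed)
```

A plain text substitution like this would also rewrite matches inside strings and comments, so a real tool would more likely rename through the decompiler's own symbol table.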