I’m curious how software can be created and evolve over time. I’m afraid that at some point, we’ll realize there are issues with the software we’re using that can only be remedied by massive changes or a complete rewrite.
Are there any instances of this happening? Where something is designed with a flaw that doesn’t get realized until much later, necessitating scrapping the whole thing and starting from scratch?
I would say the whole set of C based assumptions underlying most modern software, specifically errors being just an integer constant that is translated into a text so it has no details about the operation tried (who tried to do what to which object and why did that fail).
You have stderr to throw errors into. And the constants are just error codes, like HTTP error codes. Without it how computer would know if the program executed correctly.
stderr is useless if the syscall already returns a single integer only because of stupid C conventions.
You throw an exception like a gentleman. But C doesn’t support them. So you need to abuse the return type to also indicate “success” as well as a potential value the caller wanted.
Exceptionss are bad coding, and what’s abusive of using the full range of an integer? 0 success, everything else, error - check the API for details or call
strerror
.errno is bad programming.
Returning error codes in-band is the reason for a significant percentage of C bugs and security holes when the return value is used without checking. Something like Rust’s Result type that forces you to distinguish the two cases is much better design here. And no, you are not working with a whole language ecosystem of “sufficiently disciplined programmers” so that nobody ever forgets to check a return value.
Not to mention that errno is just a very broken design in the times of modern thread and event systems, signals, interrupts and all kinds of other ways to produce race conditions and overwrite the errno value before it is checked.
errno is not shared between threads. Also:
There does not add more race conditions because signal handlers execute in one of regular threads. In single-threaded program signals are functions that can be called by OS at any point of execution, but they do not execute at same time with threads.
You don’t need to.
Returnung structs, returning by pointer, signals, error flags, setjmp/longjmp, using cxa for exceptions(lol, now THIS is real abuse).
You mean 0 indicating success and any other value indicating some arbitrary meaning? I don’t see any problem with that.
Passing around extra error handling info for the worst case isn’t free, and the worst case doesn’t happen 99.999% of the time. No reason to spend extra cycles and memory hurting performance just to make debugging easier. That’s what debug/instrumented builds are for.
The case “I want to know why this error happened” is basically 100% of the time when an error actually happens.
And the case of “Permission denied” or similar useless nonsense without any details costing me hours of my life in debugging time that wouldn’t be necessary if it just told me permission for who to do what to which object happens quite regularly.
“0.001% of the time, I wanna know every time 👉😎👉”
Yeah, I get that. But are we talking about during development (which is why we’re choosing between C and something else)? In that case, you should be running instrumented builds, or with debug functionality enabled. I agree that most programs just fail and don’t tell you how to go about enabling debug info or anything, and that could be improved.
For the “Permission Denied” example, I also assume we’re making system calls and having them fail? In that case it seems straight forward: the user you’re running as can’t access the resource you were actively trying to access. But if we’re talking about some random log file just saying “Error: permission denied” and leaving you nothing to go on, that’s on the program dumping the error to produce more useful information.
In general, you often don’t want to leak more info than just Worked or Didn’t Work for security reasons. Or a mix of security/performance reasons (possible DOS attacks).
During development is just about the only time when that doesn’t matter because you have direct access to the source code to figure out which function failed exactly. As a sysadmin I don’t have the luxury of reproducing every issue with a debug build with some debugger running and/or print statements added to figure out where exactly that value originally came from. I really need to know why it failed the first time around.
Yeah, so it sounds like your complaint is actually with application not propagating relevant error handling information to where it’s most convenient for you to read it. Linux is not at fault in your example, because as you said, it returns all the information needed to fix the issue to the one who developed the code, and then they just dropped the ball.
Maybe there’s a flag you can set to dump those kinds of errors to a log? But even then, some apps use the fail case as part of normal operation (try to open a file, if we can’t, do this other thing). You wouldn’t actually want to know about every single failure, just the ones that the application considers fatal.
As long as you’re running on a turing complete machine, it’s on the app itself to sufficiently document what qualifies as an error and why it happened.
The whole point of my complaint is that shitty C conventions produce shitty error messages. If I could rely on the programmer to work around those stupid conventions every time by actually checking the error and then enriching it with all relevant information I would have no complaints.
As sysadmin you should know about strace
I know about strace, strace still requires me to reproduce the issue and then to look at backtraces if nobody bothered to include any detail in the error.
Somehow (lack of) backtrace and details in error is “C based assumption”
Ugh, I do not miss C…
Errors and return values are, and should be, different things. Almost every other language figured this out and handles it better than C.
It’s more of an ABI thing though, C just doesn’t have error handling.
And if you do exception handling wrong in most other languages, you hamstring your performance.
The unofficial C motto “Make it fast, who gives a shit about correctness”
That’s why errno and return value are different things.
Assembly doesn’t have concept of objects.
It does very much have the concept of objects as in subject, verb, object of operations implemented in assembly.
As in who (user foo) tried to do what (open/read/write/delete/…) to which object (e.g. which socket, which file, which Linux namespace, which memory mapping,…).
Indeed. Assembly is(can be) used to implement them.
Kernel implements it in software(except memory mappings, it is implemented in MMU). There are no sockets, files and namespaces in ISA.
You were the one who brought up assembly.
And stop acting like you don’t know what I am talking about. Syscalls implement operations that are called by someone who has certain permissions and operate on various kinds of objects. Nobody who wants to debug why that call returned “Permission denied” or “File does not exist” without any detail cares that there is hardware several layers of abstraction deeper down that doesn’t know anything about those concepts. Nothing in the hardware forces people to make APIs with bad error reporting.
And why “Permission denied” is bad reporting?
Because if a program dies and just prints strerror(errno) it just gives me “Permission denied” without any detail on which operation had permissions denied to do what. So basically I have not enough information to fix the issue or in many cases even to reproduce it.
It may just not print anything at all. This is logging issue, not “C based assumption”. I wouldn’t be surprised if you will call “403 Forbidden” a “C based assumtion” too.
But since we are talking about local program, competent sysadmin can
strace
program. It will print arguments and error codes.