jmillikin a year ago

As usual HN comments react to the headline, without reading the content.

A lot of modern userspace code, including Rust code in the standard library, thinks that invariant failures (AKA "programmer errors") should cause some sort of assertion failure or crash (Rust or Go `panic`, C/C++ `assert`, etc). In the kernel, claims Linus, failing loudly is worse than trying to keep going because failing would also kill the failure reporting mechanisms.

He advocates for a sort of soft-failure, where the code tells you you're entering unknown territory and then goes ahead and does whatever. Maybe it crashes later, maybe it returns the wrong answer, who knows, the only thing it won't do is halt the kernel at the point the error was detected.

Think of the following Rust API for an array, which needs to be able to handle the case of a user reading an index outside its bounds:

  struct Array<T> { ... }
  impl<T> Array<T> {
    fn len(&self) -> usize;

    // if idx >= len, panic
    fn get_or_panic(&self, idx: usize) -> T;

    // if idx >= len, return None
    fn get_or_none(&self, idx: usize) -> Option<T>;

    // if idx >= len, print a stack trace and return
    // who knows what
    unsafe fn get_or_undefined(&self, idx: usize) -> T;
  }
The first two are safe by the Rust definition, because they can't cause memory-unsafe behavior. The second two are safe by the Linus/Linux definition, because they won't cause a kernel panic. If you have to choose between #1 and #3, Linus is putting his foot down and saying that the kernel's answer is #3.
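
For comparison, the standard slice API already has roughly these three flavors; a quick sketch (note that the third is genuine undefined behavior, which is even stronger than "print a stack trace and return who knows what"):

  fn demo(v: &[u32], idx: usize) {
    // 1. like get_or_panic: indexing panics on out-of-bounds
    let a = v[idx];

    // 2. like get_or_none: `get` returns None on out-of-bounds
    let b: Option<&u32> = v.get(idx);

    // 3. no check at all: out-of-bounds here is undefined behavior,
    //    not "warn and keep going"
    let c = unsafe { v.get_unchecked(idx) };

    println!("{a} {b:?} {c}");
  }

  fn main() {
    demo(&[10, 20, 30], 1); // in-bounds; try 5 to see the panic from case 1
  }
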
  • EdSchouten a year ago

    The policy of ‘oopsing’ and limping on is, in my opinion, literally one of Linux’s worst features. It has bitten me in various cases:

    - Remember when a bug in the kernel's leap second handling caused Linux to partially crash and eat 100% CPU? That caused a >1MW spike in power usage at Hetzner at the time. It must have been >1GW globally. Many people didn’t notice it immediately, so it must have taken weeks before everyone rebooted.

    - I’ve personally run into issues where not crashing caused Linux to go on and eat my file system.

    On any Linux server I maintain, I always toggle those sysctls that cause the kernel to panic on oops, and reboot on panic.
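
    For reference, those knobs are kernel.panic_on_oops and kernel.panic, e.g. in /etc/sysctl.conf:

      # turn every oops into a full panic, then reboot 10 seconds after a panic
      kernel.panic_on_oops = 1
      kernel.panic = 10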

    • mike_hock a year ago

      So if everyone had panicked on oops, instead of a power spike we'd probably have had a major internet outage across the world, across the entire industry and beyond. The blame really lies with people not monitoring their systems.

      As you said, you have the option to reboot on panic, but Linus is absolutely not wrong that one size does not fit all.

      What about a medical procedure that WILL kill the patient if interrupted? What about life support in space? Hitting an assert in those kinds of systems is a very bad place to be, but an automatic halt is worse than at least giving the people involved a CHANCE to try and get to a state where it's safe to take the system offline and restart it.

      • notacoward a year ago

        > What about a medical procedure that WILL kill the patient if interrupted? What about life support in space?

        The proper answer to those is redundancy, not continuing in an unknown and quite likely harmful state.

        • John23832 a year ago

          Right. It's clear that many people have not heard of, or considered, Therac-25[1].

          [1] https://en.wikipedia.org/wiki/Therac-25

          • eesmith a year ago

            Therac-25 removed redundancy. Quoting the Wikipedia article: "Previous models had hardware interlocks to prevent such faults, but the Therac-25 had removed them, depending instead on software checks for safety."

            • John23832 a year ago

              Right, that is the point I was making.

        • yencabulator a year ago

          The leap second bug would have crashed all nodes of a redundant system, at the same time...

          • notacoward a year ago

            Perhaps. On the other hand, letting a medical device continue moving an actuator or dispensing a medication when it's known to be in a bad "never happen" state could also be fatal. Ditto for the "life support in space" example. Ditto for anything reliant on position, where the system suddenly realizes it has no idea whether its position is correct. Imagine that e.g. on a warship. Limiting responses to external inputs (including time adjustments) can ameliorate such problems. So can software diversity. Many safety-critical systems require one or both, and other measures as well. Picking one black-swan event while ignoring literal everyday scenarios doesn't seem very helpful. That's especially true when the thing you're advocating is what actually happened and led to its own bad outcomes.

            • yencabulator a year ago

              Picking medical devices and warships is also quite the cherry picking. Most Linuxes aren't like that. Critical embedded systems tend to have a hard realtime component, and if Linux is on the system it sits under e.g. seL4, or on a different CPU.

              At the end of the day, what Linux does is what Linus wants out of it. He's stated, often, that halting the CPU at the exact moment something goes wrong is not the goal. If your goal is to do that, you might not be able to use Linux. If your goal is to put Rust in the Linux kernel, you might have to let go of your goal.

              • notacoward a year ago

                > Picking medical devices ... is also quite the cherry picking

                It wasn't my example. It was mike_hock's, and I was responding in the context they had set.

                > Most Linuxes aren't like that.

                Your ally picked the medical-device and space-life-support examples. If you think they're invalid because such systems don't use Linux, why did you forego bringing it up with them and then change course when replying to me? As I said: not helpful.

                The point is not specific to Linux, and more Linux systems than you seem to be aware of do adopt the "crash before doing more damage" approach because they have some redundancy. If you're truly interested, I had another whole thread in this discussion explaining one class of such cases in what I feel was a reasonably informative and respectful way while another bad-faith interlocutor threw out little more than one-liners.

      • ok_dad a year ago

        > What about a medical procedure that WILL kill the patient if interrupted? What about life support in space? Hitting an assert in those kinds of systems is a very bad place to be, but an automatic halt is worse than at least giving the people involved a CHANCE to try and get to a state where it's safe to take the system offline and restart it.

        Kinda a strawman there. That's got to account for, what, 0.0001% of all use of computers, and probably they would never ever use Linux for these applications (I know medical devices DO NOT use Linux).

        • Kim_Bruning a year ago

          Industrial control systems (sadly, imho) don't use Linux as often as they could/should, but such systems do have the ability to injure their operators or cause large amounts of damage of course. [1]

          The first priority is safety, absolutely and without question. And then the immediate second priority is the fact that time is money. For every minute that the system is not operating, x amount of product is not being produced.

          Generally, having the software fully halt on error is both dangerous and time-consuming.

          Instead you want to switch to an ERROR and/or EMERGENCY_STOP state, where things like lasers or plasma torches get turned off, motors are stopped, brakes are applied, doors get locked/unlocked (as appropriate/safe), etc. And then you want to report that to the user, and give them tools to diagnose and correct the source of the error and to restart the machine/line [safely!] as quickly as possible.

          In short, error handling and recovery is its own entire thing, and tends to be something that gets tested for separately during commissioning.

          [1] PLC's do have the ability to <not stop> and execute code in a real time manner, but I haven't encountered a lot of PLC programmers who actually exploit these abilities effectively. Basically for more complex situations you're quickly going to be better off with more general purpose tools [2], at most handing off critical tasks to PLCs, micro-controllers, or motor controllers etc.

          [2] except for that stupid propensity to give-up-and-halt at exactly that moment where it'll cause the most damage.
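
          A minimal sketch of that kind of fault handling in code (the state names and output helpers are invented for illustration, not taken from any particular system):

            enum MachineState {
                Running,
                EmergencyStop, // entered on any failed invariant, E-stop button, etc.
            }

            struct Machine {
                state: MachineState,
            }

            impl Machine {
                // Instead of halting the controller outright: drive every output
                // to a safe value, then hand the problem to the operator.
                fn fault(&mut self, reason: &str) {
                    self.state = MachineState::EmergencyStop;
                    self.set_laser(false); // hypothetical output helpers
                    self.set_brake(true);
                    self.report(reason);
                }

                fn set_laser(&mut self, _on: bool) { /* drive a digital output */ }
                fn set_brake(&mut self, _on: bool) { /* drive a digital output */ }
                fn report(&self, reason: &str) { eprintln!("FAULT: {reason}; waiting for operator"); }
            }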

        • gmueckl a year ago

          Do you know absolutely every medical device in existence and do you know how broad the definition of a medical device is (including e.g. the monitor attached to the PC used for displaying X-ray images)?

          • ok_dad a year ago

            I worked in medical device quality control and so, yes, I know all about the FDA requirements for medical devices and ISO 13485. I can say, with certainty, that base Linux would not be allowed to run in a medical device in the USA. It's software of unknown provenance (SOUP) and would absolutely NOT be used as-is.

            • gmueckl a year ago

              Then you should know that the use of SOUP is not so clear cut. It depends on the class of device and more specifically, on the part of the device that the software is used on. I know medical devices running SOUP operating systems like Linux. They went to some length to show that the parts running Linux and the critical functions of the device were sufficiently independent. This isolation is specifically allowed by the standards you quote.

              It's even worse on things like car dashboards: some warning lights on dashboards need to be ASIL-D conformant, which is quite strict. However, developing the whole dashboard software stack to that standard is too expensive. So the common solution these days is to have a safe, ASIL-D compliant compositor and a small renderer for the warning lights section of the display while the rendering for all the flashy graphics runs in an isolated VM on standard software with lower safety requirements. It's all done on the same CPU and GPU.

              • ok_dad a year ago

                > They went to some length to show that the parts running Linux and the critical functions of the device were sufficiently independent.

                Let's not be too pedantic. You, as an experienced medical device engineer, probably knew what I meant: they would never use Linux in the critical parts of a medical device, as the OP had originally argued. Any device would definitely have to perform all of its critical functionality without the part running Linux.

                The OP was still a major strawman, regardless of my arguments, because the Linux kernel will never be in the critical path of a medical device without a TON of work to harden it against errors and such. Linus' stated stance alone means it's not an appropriate kernel for a medical device, because such devices should always fail with an error and stop under unknown conditions rather than just doing some random crap.

            • voakbasda a year ago

              That’s an odd thing to claim. I have worked on certified medical devices that run a custom Linux distribution.

              Mind you, that experience also severely soured me on the quality of medical software systems, due to poor quality of the software that ran in that distribution. Linux itself was a golden god in comparison to the crap that was layered on top of it.

              • ok_dad a year ago

                I'd like to hear more about that, but I assume it's much like the other poster here that described a Linux system that is a peripheral device attached to the actual medical device that does the medical shit.

                • gmueckl a year ago

                  It is not a peripheral device if it runs the UI with all the main controls, is it?

                  • ok_dad a year ago

                    No, do you have a concrete example of this strawman, though?

                    Edit: I should also add (probably earlier too) that all my examples are specific to the USA FDA process. I'm sure some other place might not have the same rules.

                    • gmueckl a year ago

                      I can't see how you can make out a strawman in what I said. There are medical devices where the UI is running on a processor separate from the controller in charge of the core device functions. The two are talking to each other and there is no secondary way of interacting with the controller. This lessens the requirements that are put on the part running the UI, but does not eliminate them.

                      I'm mostly familiar with EU rules, but as far as I know the FDA regulations follow the same idea of tiered requirements based on potential harm done.

                      • ok_dad a year ago

                        The UI is one of the most important parts of a machine; look at the Therac-25! The FDA regulations require a lot of effort to go into human factors, too, and the UI definitely has to be as reliable and as well engineered as the rest of the device.

                        https://www.fda.gov/medical-devices/human-factors-and-medica...

                        Honestly, the FDA regulations go too far vs the EU regs. The company I worked for was based in the EU and the products there were so advanced compared to our versions. Ours were all based on an original design from Europe that was approved and then basically didn’t change for 30 years. The European device was fucking cool and had so many features; it was also capable of being carried around rather than rolled. The manufacturing was almost all automated, too, but in the USA it was not automated at all: it was humans assembling parts and then recording it in a computer terminal.

                • voakbasda a year ago

                  These were not peripherals. We are talking devices that would be front line in an emergency room. Terrifying.

            • Suzuran a year ago

              I am an American citizen and a former dialysis patient, now kidney transplant recipient. I have watched in-center dialysis machines reboot during treatment, show the old "Energy Star" BIOS logo, and then boot Linux...

              Felt kinda bad until I thought about how well a "Linux literally killed me" headline would do on HN, but then I realized I wouldn't be able to post the article if I actually died. Such is life. Or death? One or the other.

            • sarlalian a year ago

              Ok, that's good for a U.S.-centric view. Do you know that every medical device manufactured in China, for use in China, meets the same requirements? Same for India, Russia, etc. The U.S. isn't the world, and I'd be surprised if Linux weren't in use in critical systems around the world in ways that would shock U.S. experts on those types of systems.

            • cplusplusfellow a year ago

              Surely we can “harden” Linux for this application?

            • smoldesu a year ago

              Makes me wonder what they run their NAS software with. Or their internal web-hosting, or their networking devices, or any of the other devices they have littered about. I'd swear on the Bible that I've seen a dentist or two running KDE 3 before...

              • ok_dad a year ago

                Those aren't medical devices.

          • jmillikin a year ago

              > including e.g. the monitor attached to the PC used for displaying
              > X-ray images
            
            Somewhat off-topic, but I used to work in a dental office. The monitors used for displaying X-rays were just normal monitors, bought from Amazon or Newegg or whatever big-box store had a clearance sale. Even the X-ray sensors themselves were (IIRC) not regulated devices, you could buy one right now on AliExpress if you wanted to.

            • gmueckl a year ago

              That's not the case in the EU. I've worked for an equipment manufacturer for dental clinics. While the monitors were allowed to be off the shelf, the operator (dental clinic) is required to make sure that they work properly and display the image correctly - obey certain brightness and color resolution/calibration standards. Our display software had to refuse to work on an uncalibrated monitor.

              • alias_neo a year ago

                Interesting, how does your software detect an uncalibrated monitor? Did it come with a calibration device which had to be used to scan the display output to check?

                I don't suppose monitors report calibration data back to display adapters do they?

                • GauntletWizard a year ago

                  My guess is they had some heuristic based on EDIDs, which are incredibly easy to spoof.

                  https://smile.amazon.com/EVanlak-Passthrough-Generrtion-Elim...

                  • gmueckl a year ago

                    Yes, but why would you go to these lengths? The purpose of the whole mechanism is to prevent accidental misdiagnosis based on an incorrectly interpreted X-ray image. This isn't DRM, just a safeguard against incorrect use of equipment.

                    • GauntletWizard a year ago

                      People are cheap and corrupt. The speed bump this presents is real, but minor, in the face of a couple medical shops looking to save $100/pop on a dozen monitors.

                      I hope it's rare, but I think a persistent nag window ("Your display isn't calibrated and may not be accurate") is probably a better answer than refusing to work altogether, because it will be clear about the source of the problem and less likely to get nailed down.

                      • gmueckl a year ago

                        You have to draw a line somewhere, I guess. As far as I remember, protections against accidental misuse and foreseeable abuse of a device are required in medical equipment. But malicious circumvention of protections or any kind of active tampering are a whole other category in my opinion.

                      • kaba0 a year ago

                        Medical devices are insanely expensive (a CT scanner may reach a million dollars), you won’t risk $100 on such a small thing as a screen.

                • gmueckl a year ago

                  I didn't work on that specific software team and it has been a long time since I worked there. But the software came with its custom calibration routine and I believe that the calibration result was stored with model and serial number information from the monitor EDID.

                  • alias_neo a year ago

                    Thanks, sounds like I need to do some reading about EDIDs; I knew _of_ them but had no real understanding of what they are and what they do.

        • goodpoint a year ago

          > I know medical devices DO NOT use Linux

          Absolutely false.

      • acje a year ago

        I remember working on some telecom equipment in the '90s. It had an x86/Unix feature-rich distributed management system. In other words, complicated and expected to fail. The solution was a "watchdog" circuit that the main CPU had to poll every 100ms or so. Three misses and the CPU would get hard-rebooted by the watchdog. (A userspace sketch of the Linux equivalent is at the end of this comment.)

        This reminds me of two things. Good system design needs hardware-software codesign. Oxide Computer has identified this, but it was probably much more common before the '90s than after. The second thing is that all things can fail, so a strategy that only hardens the one component is fundamentally limited, even flawed. If the component must not fail, you need redundancy and supervision. Joe Armstrong would be my source for a quote if I needed to find one.

        Both Rust and Linux have some potential for improvement here, but the best answers may lie in their relation to the greater system rather than within themselves. I'm thinking of WASM and hardware codesign respectively.
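
        Linux exposes the same idea to userspace via the watchdog framework: a daemon keeps writing to /dev/watchdog, and if it stops, the hardware (or the softdog module) resets the machine. A minimal sketch of the petting side:

          use std::fs::OpenOptions;
          use std::io::Write;
          use std::thread::sleep;
          use std::time::Duration;

          fn main() -> std::io::Result<()> {
              // Requires a watchdog driver to be present. Once opened, the timer
              // is armed: stop writing and the machine gets reset.
              let mut wd = OpenOptions::new().write(true).open("/dev/watchdog")?;
              loop {
                  wd.write_all(b"\0")?; // "pet" the watchdog
                  sleep(Duration::from_millis(100));
              }
          }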

      • kaba0 a year ago

        That’s why monitoring, fail-safe power-offs and redundant systems are important. E.g. even on complete failure of a CAT scanner’s higher-level control (which, let’s say, runs on an embedded Linux kernel), the system would safely stop the radiation and power off, without any instructions from the OS. Here, an inconsistent state from the OS is actually more dangerous than stopping in the middle (e.g. the OS gets stuck and the same high-energy radiation keeps being released).

    • amluto a year ago

      As a kernel developer, I mostly disagree. Panicking hard is nice unless you are the user whose system rebooted without explanation or the developer trying to handle the bug report saying “my system rebooted and I have nothing more to say”.

      Getting logs out is critical.

      • EdSchouten a year ago

        One does not rule out the other. You could simply write crash info to some NVRAM or something, and then do a reboot. Then you can recover it during the next boot.
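
        (That mechanism exists, by the way: pstore/ramoops. Point it at a RAM region that survives a warm reboot and the oops/panic logs show up under /sys/fs/pstore on the next boot. Roughly, with placeholder address and sizes for whatever your board actually reserves:)

          # kernel command line
          ramoops.mem_address=0x8000000 ramoops.mem_size=0x100000 ramoops.record_size=0x20000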

        But there is no need to let userspace processes continue to run, which is exactly what Linux does.

        • wtallis a year ago

          > You could simply write crash info to some NVRAM or something, and then do a reboot. Then you can recover it during the next boot.

          That works for some systems: those for which "some NVRAM or something" evaluates to a real device usable for that purpose. Not all Linux systems provide such a device.

          > But there is no need to let userspace processes continue to run, which is exactly what Linux does.

          Userspace processes usually contain state that the user would also like to be persisted before rebooting. If my WiFi driver crashes, there's nothing helpful or safer about immediately bringing down the whole system when it's possible to keep running with everything but networking still functioning.

          • EdSchouten a year ago

            > If my WiFi driver crashes, there's nothing helpful or safer about immediately bringing down the whole system when it's possible to keep running with everything but networking still functioning.

            There have been various examples of WiFi driver bugs leading to security issues. Didn’t some Broadcom WiFi driver once have a bug in how it processed non-ASCII SSID names, allowing you to trigger remote code execution?

            • wtallis a year ago

              We're not talking about bugs in general, we're talking about bugs whose manifestation is caught by error checking already in the code. For device drivers, those situations can often be handled safely by simply disabling the device in question while leaving the rest of the OS running. I doubt the Broadcom bug you're thinking of triggered a WARN_ON() in the code path allowing for a remote code execution. (Also, the highest-profile Broadcom WiFi remote code execution bug I'm aware of was a bug in the WiFi chip's closed-source firmware, which doesn't even run on the same processor as the host Linux OS.)

          • kaba0 a year ago

            Isn’t that exactly the reason behind microkernels’ supposed superiority?

            • wtallis a year ago

              It's the primary claimed benefit to microkernels. But since Linux as it exists today already handles this case, it isn't a very strong argument in favor of microkernels.

    • notacoward a year ago

      Yeah, this part has never really been true.

      > In the kernel, "panic and stop" is not an option

      That's simply not true. It's an option I've seen exercised many times, even in default configurations. Furthermore, for some domains - e.g. storage - it's the only sane option. Continuing when the world is clearly crazy risks losing or corrupting data, and that's far worse than a crash. No, it's not weird to think all types of computation are ephemeral or less important than preserving the integrity of data. Especially in a distributed context, where this machine might be one of thousands which can cover for a transient loss of one component but letting it continue to run puts everything at risk, rebooting is clearly the better option. A system that can't survive such a reboot is broken. See also: Erlang OTP, Recovery Oriented Computing @ Berkeley.

      Linus is right overall, but that particular argument is a very bad one. There are systems where "panic and stop" is not an option and there are systems where it's the only option.

      • wtallis a year ago

        > Furthermore, for some domains - e.g. storage - it's the only sane option.

        Can you elaborate on this? Because failing storage is a common occurrence that usually does not warrant immediately crashing the whole OS, unless it's the root filesystem that becomes inaccessible.

        • notacoward a year ago

          Depends on what you mean by "failing storage" but IMX it does warrant an immediate stop (with or without reboot depending on circumstances). Yes, for some kinds of media errors it's reasonable to continue, or at least not panic. Another option in some cases is to go read-only. OTOH, if either media or memory corruption is detected, it would almost certainly be unsafe to continue because that might lead to writing the wrong data or writing it to the wrong place. The general rule in storage is that inaccessible data is preferable to lost, corrupted, or improperly overwritten data.

          Especially in a distributed storage system using erasure codes etc., losing one machine means absolutely nothing even if it's permanent. On the last storage project I worked on, we routinely ran with 1-5% of machines down, whether it was due to failures or various kinds of maintenance actions, and all it meant was a loss of some capacity/performance. It's what the system was designed for. Leaving a faulty machine running, OTOH, could have led to a Byzantine failure mode corrupting all shards for a block and thus losing its contents forever.

          BTW, in that sort of context - where most of the world's bytes are held - the root filesystem is more expendable than any other. It's just part of the access system, much like firmware, and re-imaging or even hardware replacement doesn't affect the real persistence layer. It's user data that must be king, and those media whose contents must be treated with the utmost care.

          • wtallis a year ago

            I understand why a failing drive or apparently corrupt filesystem would be reason to freeze a filesystem. But that's nowhere close to kernel panic territory.

            Even in a distributed, fault-tolerant multi-node system, it seems like it would be useful for the kernel to keep running long enough for userspace to notify other systems of the failure (eg. return errors to clients with pending requests so they don't have to wait for a timeout to try retrieving data from a different node) or at least send logs to where ever you're aggregating them.

            • notacoward a year ago

              In a system already designed to handle the sudden and possibly permanent loss of a single machine to hardware failure, those are nice to have at best. "Panic" doesn't have to mean not executing a single other instruction. Logging e.g. over the network is one of the things a system might do as part of its death throes, and definitely was for the last few such systems I worked on. What's important is that it not touch storage any more, or issue instructions to other machines to do so, or return any more possibly-corrupted data to other systems. For example, what if the faulty machine itself is performing block reconstruction when it realizes the world has turned upside down? Or if it returns a corrupted shard to another machine that's doing such reconstruction? In both of those scenarios the whole block could be corrupted even though that machine's local storage is no longer involved. I've seen both happen.

              Since the mechanisms for ensuring the orderly stoppage of all such activity system-wide are themselves complicated and possibly error-prone, and more importantly not present in a commodity OS such as Linux, the safe option is "opt in" rather than "opt out". In other words, don't try to say you must stop X and Y and Z ad infinitum. Instead say you may only do A and B and nothing else. That can easily be accomplished with a panic, where certain parts such as dmesg are specifically enabled between the panic() call and the final halt instruction. Making that window bigger, e.g. to return errors to clients who don't really need them, only creates further potential for destructive activity to occur, and IMO is best avoided.

              Note that this is a fundamental difference between a user (compute-centric) view of software and a systems/infra view. It's actually the point Linus was trying to get across, even if he picked a horrible example. What's arguably better in one domain might be professional malfeasance in the other. Given the many ways Linux is used, saying that "stopping is not an option" is silly, and "continuing is not an option" would be equally so. My point is not that what's true for my domain must be true for others, but that both really are and must remain options.

              P.S. No, stopping userspace is not stopping everything, and not what I was talking about. Or what you were talking about until the narrowing became convenient. Your reply is a non sequitur. Also, I can see from other comments that you already agree with points I have made from the start - e.g. that both must remain options, that the choice depends on the system as a whole. Why badger so much, then? Why equivocate on the importance (or even meaningful difference) between kernel vs. userspace? Heightening conflict for its own sake isn't what this site is supposed to be about.

              • wtallis a year ago

                > "Panic" doesn't have to mean not executing a single other instruction.

                We're talking specifically about the current meaning of a Linux kernel panic. That means an immediate halt to all of userspace.

    • xyzzy_plugh a year ago

      Right, the fact that you can toggle those sysctls is sort of the point. There are plenty of environments where running at all is better than not, hardware watchdogs can solve for unresponsiveness.

      TFA is about making it possible for the kernel to decide what to do, rather than exploding on the spot, which is terrible.

    • pca006132 a year ago

      Probably not very Linux-specific, but consider the case where a device controls a motor's speed by setting its current, computed from the measured speed with a simple PID controller. If your device crashes (due to a failure in something else, for example some LED display) while the current output is pretty large, you may cause serious damage to whatever that motor is connected to. What the system should be able to do is report such an error and gracefully shut down, for example by sending commands to shut down everything else.

      Although I think this can be better done by some special panic handler that performs a setjmp/longjmp and notifies other systems about the failure, without continuing to run with the wrong output...
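
      On a bare-metal Rust target, that is roughly what the panic handler hook is for; a sketch (the two shutdown helpers are made up, stand-ins for whatever the board actually needs):

        #![no_std]
        // Fragment of a firmware crate; it needs a real embedded target and an
        // entry point to actually build into an image.

        use core::panic::PanicInfo;

        fn disable_motor_current() { /* hypothetical: drive the current output to 0 */ }
        fn signal_supervisor() { /* hypothetical: raise a GPIO watched by a safety MCU */ }

        #[panic_handler]
        fn panic(_info: &PanicInfo) -> ! {
            // First force the outputs into a safe state...
            disable_motor_current();
            // ...then tell the rest of the system we died, so it can react.
            signal_supervisor();
            loop {} // or trigger a controlled reset
        }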

      • Gare a year ago

        Such a device should have a simple protective circuit that doesn't allow this. This is common in any expensive or critical industrial system.

        • yellowapple a year ago

          "Should" unfortunately ain't the same as "does". The Torvaldsian (for lack of a better word) attitude seems to be to assume that someone is indeed dumb enough to design a system wherein all safety measures are software-defined, and in such a situation the software in question probably shouldn't catastrophically fail on every last failed assertion.

    • UncleMeat a year ago

      It is also a fabulous way of introducing vulnerabilities.

  • titzer a year ago

    The way to handle this is split up kernel work into fail-able tasks [1]. When a safety check (like array OOB) occurs, it unwinds the stack up to the start of the task, and the task fails.

    Linus sounds so ignorant in this comment. As if no one else thought of writing safety-critical systems in a language that had dynamic errors, and that dynamic errors are going to bring the whole system down or turn it into a brick. No way!

    Errors don't have to be full-blown exceptions with all that rigamarole, but silently continuing with corruption is utter madness and in 2022 Linus should feel embarrassed for advocating such a backwards view.

    [1] This works for Erlang. Not everything needs to be a shared-nothing actor, to be sure, but failing a whole task is about the right granularity to allow reasoning about the system. E.g. a few dozen to a few hundred types of tasks or processes seems about right.
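
    In userspace Rust you can already get that granularity by making each task a thread: a panicking thread unwinds and dies alone, and the supervisor sees the failure at join(). A toy sketch of the shape I mean:

      use std::thread;

      fn main() {
          // "Task" 1: dies alone when it hits an out-of-bounds index.
          let t1 = thread::spawn(|| {
              let v = vec![1, 2, 3];
              println!("{}", v[7]); // out of bounds -> panic, unwinds this thread only
          });
          // The supervisor decides what a failed task means: log, restart, give up.
          if t1.join().is_err() {
              eprintln!("task 1 panicked; continuing");
          }

          // "Task" 2 is unaffected.
          let t2 = thread::spawn(|| println!("task 2 still running"));
          t2.join().unwrap();
      }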

    • jmillikin a year ago

      I think Linus's response would be that those failable tasks are called "processes", and the low-level supervisor that starts + monitors them is the kernel. If you have code that might fail and restart, it belongs in userspace.

      If you want to run an Erlang-style distributed system in the kernel then that's an interesting research project, but it isn't where Linux is today. You'd be better off starting with SeL4 or Fuchsia.

      • titzer a year ago

        40 years of microkernels, which I know Linus is aware of, beg to differ. Maybe it's Linus's extreme opposition to microkernels, ostensibly because they have historically had somewhat lower performance--I dunno--but my comment should not be read as "yes, you must have a microkernel". There are cheaper fault isolation mechanisms than full-blown separate processes. Just having basic stack unwinding and failing a task would be a start.

        • jstimpfle a year ago

          How do you unwind if most of your kernel is written in C? (Answering my own question: they are doing stack unwinding - only manually.)

          Where do you unwind to if memory is corrupted?

          I don't think we're talking about what would be exception handling in other languages. I believe it's asserts. How do userland processes handle a failed assertion? Usually the process is terminated, but giving a debugger the possibility to examine the state first, or dumping core.

          And that's similar to what they are doing in the kernel. Only, in the kernel it's more dangerous because there is limited process / task isolation. I think that is an argument that taking down "full-blown separate processes" might not even be enough in the kernel.

        • WastingMyTime89 a year ago

          Sorry it’s hard to take you seriously after that.

          Linux isn’t a microkernel. If you want to work on a microkernel, go work on Fuchsia. It’s interesting research but utterly irrelevant to the point at hand.

          Anyway, the microkernel discussion has been happening for three decades now. They haven’t historically had a little lower performance. They had garbage performance, to the point of being unsuitable in the 90s.

          Plenty of kernel code can’t be written so as to be unwindable. That’s the issue at hand. In a fantasy world it might have been written as such, but it’s not the world we live in, which is what matters to Linus.

          • rleigh a year ago

            Others have mentioned QNX. There is also ThreadX, which is a "picokernel". Both are certified for use in safety-critical domains. There are other options as well. Segger do one, for example, and there's also SafeRTOS, and others.

            "Performance" is a red herring. In a safety-critical system, what matters is the behaviour and the consistency. ThreadX provides timing guarantees which Linux can not, and all of the system threads are executed in strict priority order. It works extremely well, and the result is a system for which one can can understand the behaviour exactly, which is important for validating that it is functioning correctly. Simplicity equates to reliability. It doesn't matter if it's "slow" so long as it's consistently slow. If it meets the product requirements, then it's fine. And when you do the board design, you'll pick a part appropriate to the task at hand to meet the timing requirements.

            Anyway, systems like ThreadX provide safety guarantees that Linux will never be able to. But the interface is not POSIX. And for dedicated applications that's OK. It's not a general-purpose OS, and that's OK too. There are good reasons not to use complex general-purpose kernels in safety-critical systems.

            IEC 62304 and ISO 13485 are serious standards for serious applications, where faults can be life-critical. You wouldn't use Linux in this context. No matter how much we might like Linux, you wouldn't entrust your life to it, would you? Anyone who answered "yes" to that rhetorical question should not be trusted with writing safety-critical applications. Linux is too big and complex to fully understand and reason about, and as a result impossible to validate properly in good faith. You might use it in an ancillary system in a non-safety-critical context, but you wouldn't use it anywhere where safety really mattered. IEC 62304 is all about hazards and risks, and risk mitigation. You can't mitigate risks you can't fully reason about, and any given release of Linux has hundreds of silly bugs in it on top of very complex behaviours we can't fully understand either even if they are correct.

            • WastingMyTime89 a year ago

              Sorry, I’m a bit lost regarding your comment. The discussion was about code safety in Linux in the context of potentially introducing Rust. I don’t really see the link with microkernels in the context of safety-oriented RTOSes. I think you are reacting to my comment about microkernels’ performance in the 90s, which I maintain.

              Neither QNX nor ThreadX is intended to be a general-purpose kernel. I haven’t looked into it for a long time, but QNX performance used to not be very good. It’s small. It can boot fast. It gives you guarantees regarding return times. Everything you want from an RTOS in a safety-critical environment. It’s not very fast, however, which is why it never tried to move towards the general market.

          • jeffreygoesto a year ago

            Well, QNX is running on a gazillion devices, even resource-restricted ones, without problems. It can be slower, but it does not always have to be. That is far from being a fantasy world.

          • pjmlp a year ago

            QNX and INTEGRITY customers will beg to differ.

        • lr1970 a year ago

          > 40 years of microkernels, of which I know Linus is aware of, beg to differ.

          For better or worse, Linux is NOT a microkernel. Therefore, the sound microkernel wisdom is not applicable to Linux in its present form. The "impedance match" of any new language added to the Linux kernel is driven by what the current kernel code in C is doing. This is essentially a Linux kernel limitation. If Rust cannot adapt to these requirements, it is a mismatch for Linux kernel development. For other kernels, like Fuchsia, Rust is a good fit. BTW, the core Fuchsia kernel itself is still in C++.

        • pca006132 a year ago

          I think it might be a bit more complicated than that, considering you can have static data, and unwinding the stack will not reset those states. I guess you still need some sort of task-level abstraction and reset all the data for that task when unwinding from it. Btw, do we need stack unwinding or can we just do a setjmp/longjmp?

          • titzer a year ago

            Note I am not going to advocate for try ... catch .. finally, because I think that language mechanism is abused out the wazoo for handling all kinds of things, like IO errors, but this is exactly what try ... finally would be for.

            Regardless, I think just killing the task instantly, even with partial updates to memory, would be totally fine. It'd be cheap, whereas automatically undoing the updates (effectively a transaction rollback) is too expensive. Software transactional memory just comes with too much overhead.

            I vote "kill and unwind" and then dealing with partial updates has to be left to a higher level.

        • geertj a year ago

          > Just having basic stack unwinding and failing a task would be a start.

          As the sibling comment pointed out, if you extend this idea to clean up all state, you end up with processes.

          I do have some doubts about the no-panic rule. But instead of emulating processes in the kernel, I’d see a firmware-like subsystem whose only job is to export core dumps from the local system, after which the kernel is free to panic.

          As a general point and in my view, and I agree this is an appeal to authority, Linus has this uncanny ability to find compromises between practicality and theory that result in successful real world software. He’s not always right but he’s almost never completely wrong.

    • mike_hock a year ago

      It doesn't continue silently, it warns. More accurately, it does what you tell it to, which can also be a hard stop if you want to.

      It's up to you to choose the right failure strategy and monitor your system if you don't want to panic, and take appropriate measures and not just ignore the warning.

      It's not Linus who sounds ignorant here, it's the people applying user-space "best practices" to the kernel. If the kernel panics, the system is dead and you've lost the opportunity to diagnose the problem, which may be non-deterministic and hard to trigger on purpose.

      • jeffreygoesto a year ago

        I agree with your statements, but I wonder: who is warned, typically? An end user, via a log he neither reads nor understands? The chance that this will lead to the right measure is low, isn't it?

        • Gibbon1 a year ago

          The couple of times I had to go digging into the kernel, what the thing looked like to me is a very large bare-metal piece of firmware. As someone who writes firmware, the very last thing you ever want is for it to hang or reset without reporting any diagnostics, because you have no idea where the offending code is. I'll belabor the point for people who think a large program is a few thousand lines: the kernel is millions of lines of code, mostly written by other people.

          Small rant: ARM Cortex processors overwrite the stack pointer on reset. That's very, very dumb, because after the watchdog trips you have no idea what the code was doing, which means you can't report what the code was doing when that happened.

  • hedgehog a year ago

    As I read it the issue is a little bit deeper, start with the context here and read down the thread:

    https://lkml.org/lkml/2022/9/19/640

    Get at least down to here:

    https://lkml.org/lkml/2022/9/20/1342

    What Linus seems to be getting at is that there are many varying contextual restrictions on what code can do in different parts of the kernel, that Filho etc appear to be attempting to hide that complexity using language features, and that in his opinion it is not workable to fit kernel APIs into Rust's common definition of a single kind of "safe" code. All of this makes sense, in user land you don't normally have to care about things like whether different functional units are turned on or off, how interrupts are set up, etc, but in kernel you do. I'm not sure if Rust's macros and type system will allow solving the problem as Linus frames it but it seems like a worthy target & will be interesting to watch.

  • layer8 a year ago

    Please correct me if I’m wrong, but Rust also has no built-in mechanism to statically determine “this code won’t ever panic”, and thus with regards to Linux kernel requirements isn’t safer in that aspect than C. To the contrary, Rust is arguably less safe in that aspect than C, due to the general Rust practice of panicking upon unexpected conditions.

    • jmillikin a year ago

      Rust doesn't have an official `#[never_panic]` annotation, but there's a variety of approaches folks use. Static analysis (Clippy) can get you pretty far. My favorite trick is to link a no_std binary with no panic handler, and see if it has a linker error. No linker error = no calls to panic handler = no panics.

      Note that Rust is easier to work with than C here, because although the C-like API isn't shy about panicking where C would segfault, it also inherits enough of the OCaml/Haskell/ML idiom to have non-panic APIs for pretty much any major operation. Calling `saturating_add()` instead of `+` is verbose, but it's feasible in a way that C just isn't unless you go full MISRA.
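
      A related variant that is easy to wire up (similar in spirit to the `panic-never` crate, if I remember right): give the binary a panic handler that calls a symbol which is never defined anywhere. If the optimizer proves no panic path remains, the handler is dead-stripped and the link succeeds; if any panic survives, you get an undefined-reference error at link time.

        #![no_std]

        use core::panic::PanicInfo;

        extern "Rust" {
            // Deliberately never defined anywhere.
            fn this_program_contains_a_panicking_branch() -> !;
        }

        #[panic_handler]
        fn panic(_: &PanicInfo) -> ! {
            // Only reachable if some panicking branch survived optimization,
            // in which case the linker complains about the symbol above.
            unsafe { this_program_contains_a_panicking_branch() }
        }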

      • ajross a year ago

        > Static analysis (Clippy) can get you pretty far.

        What's funny about this is that (while it's true!) it's exactly the argument that Rustaceans tend to reject out of hand when the subject is hardening C code with analysis tools (or instrumentation gadgets like ASAN/MSAN/fuzzing, which get a lot of the same bile).

        In fact when used well, my feeling is that extra-language tooling has largely eliminated the practical safety/correctness advantages of a statically-checked language like rust, or frankly even managed runtimes like .NET or Go. C code today lives in a very different world than it did even a decade ago.

        • jmillikin a year ago

          Analysis for memory safety is really hard. For >40 years there's been entire sub-industries focused just on analysis of C/C++ memory safety and it's nowhere near a solved problem. That's why Rust has `unsafe`, it's so the programmer has a way to encode logic that they believe is safe but aren't yet able to prove.

          Analysis of panic-safety in Rust is comparatively easy. The set of standard library calls that can panic is finite, so if your tool just walks every call graph you can figure out whether panic is disproven or not.

          • ajross a year ago

            > That's why Rust has `unsafe`, it's so the programmer has a way to encode logic that they believe is safe but aren't yet able to prove.

            That's not the right way to characterize this. Rust has unsafe for code that is correct but that the compiler is unable to verify. Foreign memory access (or hardware MMIO) and cyclic data structures are the big ones, and those are well-specified, provable, verifiable regimes. They just don't fit within the borrow checker's world view.

            Which is something I think a lot of Rust folks tend to gloss over: even at its best, most maximalist interpretation, Rust can only verify "correctness" along axes it understands, and those aren't really that big a part of the problem area in practice.

            • kaba0 a year ago

              I don’t really get your comment - are you agreeing or disagreeing with parent? Because you seemingly say the same thing.

              And continuing on parent’s comment: Rust can only make its memory guarantees by restricting the set of expressible programs, while static analysis for C and the like has to work on the whole set, which is simply an undecidable problem. As soon as unsafe is in the picture, it becomes undecidable in Rust as well, in general.

              • ajross a year ago

                The parent comment seemed to imply that using unsafe was a failing of the developer to prove to the compiler that the code is correct. And that's not right, unless you view things like "doubly linked list" as incorrect code. Unsafe is for correct code that the compiler is unable to verify.

        • tialaramex a year ago

          Rust makes choices which drastically simplify the analysis problem.

          The most obvious is mutable references. In Rust there can be either one mutable reference to an object or there may be any number of immutable references. So if we're thinking about this value here, V, and we've got an immutable reference &V so that we can examine V, well... it's not changing; there are no mutable references to it by definition. The Rust language won't let us have the mutable reference &mut V at the same time as &V exists, and so it needn't ever account for that possibility†.

          In C and C++ they break this rule all the time. It's really convenient, and it's not forbidden in C or C++ so why not. Well, now the analysis you wanted to do is incredibly difficult, so good luck with that.

          † This also has drastic implications for an optimiser. Rust's optimiser can often trivially conclude that a = f(b); can't change b, whereas a C or C++ optimiser is obliged to admit that actually it's not sure, and we need to emit slower code in case b is just an alias for a.
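
          Concretely: with exclusivity the compiler may assume the two arguments below don't alias, so it can keep *b cached across the store, while the equivalent C signature (without restrict) has to allow a and b pointing at the same int and reload it.

            fn f(a: &mut i32, b: &i32) -> i32 {
                *a = 1;
                *b // cannot have been changed by the store above; no reload needed
            }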

          • ajross a year ago

            That point would be stronger if Rust were showing up as faster than C compilers on this axis, and it isn't. In point of fact aliasing rules (which is what C calls the problem you're talking about) are very well travelled ground and the benefits and risks of -fno-strict-aliasing are well understood. Static analysis tools are in fact extremely good at this particular area of the problem. And of course at the level of emerging technologies, LTO makes this largely a non-issue because the compiler is able to see the aliasing opportunities at the level of global analysis. It doesn't need the programmer or the compiler to promise it an access is unaliased, it can just check and see.

            • kaba0 a year ago

              It’s not a problem, because compilers simply declare it as UB and don’t care about it. If you by accident did depend on that, your program will just get a nice, hard to debug bug.

        • P5fRxh5kUvp2th a year ago

          I've long been of the opinion that the rust community is missing the boat in a lot of respects.

          Rust could do the external tooling better than any other language out there, but they're so focused on the _language_ preventing abuse that they've largely missed the boat.

        • lifthrasiir a year ago

          While C/C++ has been made much safer by sanitizers, fuzzing and lints, they are:

          - Only useful when actually being used, which is never the case. (Seriously, can we make at least ASAN the default?)

          - Often costly to always turn them on (e.g. MSAN).

          - Often requires restructuring or redesign to get the most out of them (especially fuzzing).

          Rust's memory safety guarantee does not suffer from the first two points, and the third point is largely amortized into the language learning cost.

        • j-krieger a year ago

          The difference being that C is barely changed for the sake of backwards compatibility while the panicking in kernel space is a recognized problem with rust that is being actively worked on and will have a solution in the future.

        • pjmlp a year ago

          Static analysis for C has existed since 1979, the date of lint's availability; the problem isn't lack of tooling, rather people actually using it.

          • jstimpfle a year ago

            I would assume that the things lint could do you nowadays get by simply using -Wall -Wextra with gcc, for example. I haven't checked what a lint is required to do, but there have been plenty of situations in the past where I had to change my code in order to avoid triggering false positives from checks that run during normal compilation. For instance, there are checks that find accesses to potentially uninitialized variables.

            • pjmlp a year ago

              If we are talking about the 1979 version, most likely.

              If we are talking about products like PC-lint, SonarQube, Coverity, the experience is much more than that.

      • pdimitar a year ago

        > My favorite trick is to link a no_std binary with no panic handler, and see if it has a linker error. No linker error = no calls to panic handler = no panics.

        Oh? How do you do that? Do you have a written guide handy? Very curious about this.

    • pornel a year ago

      Lack of a non-hacky no-panic guarantee is a pain. That would be like a no-segfault guarantee in C.

      But Rust's situation is still safer, because Rust can typically prevent more errors from ever becoming a run-time issue, e.g. you may not even need to use array indexing at all if you use iterators. You have a guarantee that references are never NULL, so you don't risk nullptr crash, etc.

      Rust panics are safer, because they reliably happen instead of an actually unsafe operation. Mitigations in C are usually best-effort and you may be lucky/unlucky to silently corrupt memory.

      Panics are a problem for uptime, but not for safety (in the sense they're not exploitable for more than DoS).

      In the long term crashing loud and clear may be better for reliability. You shake out all the bugs instead of having latent issues that corrupt your data.
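
      E.g. the classic indexed loop carries a panic site in the source, while the iterator version has none by construction:

        fn sum_indexed(v: &[i32]) -> i32 {
            let mut total = 0;
            for i in 0..v.len() {
                total += v[i]; // panics if the index logic is ever wrong
            }
            total
        }

        fn sum_iter(v: &[i32]) -> i32 {
            v.iter().sum() // no index, so no out-of-bounds panic to worry about
        }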

      • hegelstoleit a year ago

        I think you're missing the point Linus made. Panicking is safer from a memory safety perspective, but it's not from a kernel perspective. You'll lose all the file changes that are not saved, you'll risk having the disk written in a bad state, which can be catastrophic, etc.

        • zozbot234 a year ago

          Panic in Rust need not equate to a literal kernel panic. It can call an oops handler, which might manage to keep the system operational depending on where the failure occurred.

          • hegelstoleit a year ago

            As far as I'm aware there's no way to reliably catch all panics. catch_unwind does not catch all panics. Handlers don't stop the program from terminating abruptly.

        • lifthrasiir a year ago

          Panic corresponds to a potential logic bug. If you have a logic bug, you already risk its consequences even if the panic didn't happen. As long as panic can be caught (and in Rust, it's indeed the case) it is safer than the alternative.

          • hegelstoleit a year ago

            1. As far as I'm aware there's no way to reliably catch all panics. catch_unwind does not catch all panics. 2. The whole point is that the consequences of a panic are worse than the consequences of memory corruption. That's how the kernel was designed. There was an explicit design decision not to kernel panic in every situation where a logic error occurs.

            • lifthrasiir a year ago

              There are tons of edge cases with panics, e.g. panic can trigger a destructor that can panic itself, or unwinding may cross a language boundary which may not be well defined, but to my knowledge `catch_unwind` does catch all panics as long as unwinding reliably works. That disclaimer in the `catch_unwind` documentation only describes the `panic = abort` case.

              And I thought it was clear that kernel panic is different from Rust panic, which you don't seem to distinguish. Rust panic doesn't need to cause a kernel panic because it can be caught earlier.
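
              A minimal demonstration of that, with the default panic = unwind:

                use std::panic;

                fn main() {
                    let result = panic::catch_unwind(|| {
                        let v = vec![1, 2, 3];
                        v[7] // out of bounds -> panic, unwinds to catch_unwind
                    });
                    // The panic was caught; the caller decides what happens next.
                    assert!(result.is_err());
                    println!("caught the panic, still running");
                }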

              • hegelstoleit a year ago

                Obviously rust panic is not the same as a kernel panic. What you're taking for granted is that just because rust can catch a panic that it will. A simple overflow can cause a panic. When this happens, the panic might be caught before the kernel panics, but by then the program is probably already in an undefined state. It also might not be caught at all, and cause an actual kernel panic.

                • lifthrasiir a year ago

                  The program is in a defined but undesirable state, both when a panic occurs in Rust and when a "simple" uncontrolled overflow happens in C (provided that the compiler is configured not to treat it as UB, otherwise it'd be worse). And anyone doing Rust-C interop already has to be conscious about the language boundary, which happens to be a perfect place to catch Rust panics.

        • pornel a year ago

          I understand his point. I just disagree, and prefer a different tradeoff.

          Yes, a kernel panic will cause disruption when it happens. But that will also give a precise error location, which makes reporting and fixing of the root cause easier. It could be harder to pinpoint if the code rolled forward in a broken state.

          It will cause loss of unsaved data when it happens, but OTOH it will prevent corrupted data from being persisted.

          • wtallis a year ago

            > But that will also give a precise error location, which makes reporting and fixing of the root cause easier. It could be harder to pinpoint if the code rolled forward in a broken state.

            I think you must have missed out on how Linux currently handles these situations. It does not silently move on past the error; it prints a stack trace and CPU state to the kernel log before moving on. So you have all of the information you'd get from a full kernel panic, plus the benefit of a system that may be able to keep running long enough to save that kernel log to disk.

          • stjohnswarts a year ago

            On one embedded project we had a separate debug chip that could safely shut down potentially dangerous circuits if the controller failed. Its source code was much, much smaller than the controller's and was heavily vetted by multiple people. That small dedicated chip would initiate circuit shutdown on a panic from the Linux kernel on the controller. My point is that it's hard to know what happens after a panic; logging and such is nice, but it may or may not be available. Being able to take an action as simple as sending a "panic" signal to a second dedicated processor, which shuts down critical systems in a controlled manner, is valuable. "Stop the world" can be very dangerous in some situations. There were even more independent backup failsafes on the potentially dangerous circuits as well, but redundancy is extra insurance that something bad won't happen.

      • layer8 a year ago

        One has to be careful about words. When Rust (or Linux) is used in (say) a vehicle or in a nuclear power plant, panicking certainly has immediate safety implications.

        • avgcorrection a year ago

          And a perfect, bug-free ballistic rocket program is unsafe in the sense that it is efficient at causing damage.

          Rust’s “safety” has always meant what the Rust team meant by that term. There’s no gotcha to be found here except if you can find some way that Rust violates its own definition of the S-word.

          This submission is not really about safety. It’s a perfectly legitimate concern that Rust likes to panic and that panicking is inappropriate for Linux. That isn’t about safety per se.

          “Safety” is a very technical term in the PL context and you just end up endlessly bickering if you try to torture the term into certain applications. Is it safer to crash immediately or to continue the program in a corrupted state? That entirely depends on the application and the domain, so it isn’t a useful distinction to make in this context.

          EDIT: The best argument one could make from this continue-can-be-safer perspective is that given two PLs, the one that lets you abstract over this decision (to panic or to continue in a corrupted state, preferably with some out of band error reporting) is safer. And maybe C is safer than Rust in that regard (I wouldn’t know).

          • layer8 a year ago

            That’s exactly my point. Rust’s definition of safety is a very specific one, and one has to be careful about what it actually implies in the context where Rust is employed. “Safety” isn’t a well-defined term for PL in general. “Soundness” is.

            • UncleMeat a year ago

              > “Safety” isn’t a well-defined term for PL in general. “Soundness” is.

              This is false. "Safety" and "Liveness" are terms used by the PL field to describe precise properties of programs and they have been used this way for like 50 years (https://en.wikipedia.org/wiki/Safety_and_liveness_properties). A "safety" property describes a guarantee that a program will never reach some form of unwanted state. A "liveness" property describes a guarantee that a program will eventually reach some form of wanted state. These terms would be described very early in a PL course.

              • layer8 a year ago

                What I mean is that there is no universal definition of which properties are safety properties. In principle, you can define any property you can formally reason about as a safety property. Therefore, whenever you talk about safety, you first have to define which properties you mean by that.

                In the context of Rust, there are a number of safety properties that Rust guarantees (modulo unsafe, FFI UB, etc.), but that set of safety properties is specific to Rust and not universal. For example, Java has a different set of safety properties, e.g. its memory model gives stronger guarantees than Rust’s.

                Therefore, the meaning of “language X is safe” is entirely dependent on the specific language, and can only be understood by explicitly specifying its safety properties.

                • UncleMeat a year ago

                  That's true for "soundness" too. Things aren't just "sound". They are sound with respect to something. So when you use "soundness" as a comparison against "safety", you'll have to understand how somebody could interpret your post in the way that I did.

                  Almost all discussion about Rust is in comparison to C and C++, by far the dominant languages for developing native applications. C and C++ are famously neither type-safe nor memory-safe and it becomes a pretty easy shorthand in discussions of Rust for "safety" to refer to these properties.

                • avgcorrection a year ago

                  > Therefore, whenever you talk about safety, you first have to define which properties you mean by that.

                  Like “memory safety”?

                  • layer8 a year ago

                    For example. Rust has other safety properties beyond memory safety.

            • avgcorrection a year ago

              Memory safety is a well-defined term.

              • layer8 a year ago

                I agree, but that isn’t the term that was used here, and Rust proponents usually mean more than memory safety by “safe” (like e.g. absence of UB).

                • avgcorrection a year ago

                  Going through that thread (a few posts back), it seems that “Rust is safe” (as seen in this submission title) was stated first by Torvalds. It wasn’t mentioned first by a “Rust aficionado”. So you would really have to ask Torvalds what he meant. But his mentioning of it (and this submission) obviously alludes to “safe” claims by the Rust project, which have always been about memory safety.

                  • layer8 a year ago

                    I disagree that “safe” as used by the Rust community is always restricted to memory safety, see my parent comment.

                • veber-alex a year ago

                  Absence of UB is literally memory safety.

                  Rust proponents mean exactly "memory safety" when they say Rust is safe, because that is the only safety Rust guarantees.

          • goto11 a year ago

            Linus' point is that safety means something different in kernel programming than in PL theory, and that Rust has to be safe according to kernel rules before it can be used for kernel programming.

        • pornel a year ago

          But it can still be safer - e.g. a panic can trigger an emergency stop instead of silently overwriting the "go full throttle" variable.
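
          As an illustration only (a toy userspace sketch, not how a real safety-critical controller would be built), a panic hook can run a fail-safe action before the process dies:

              use std::panic;

              // Hypothetical fail-safe; a real controller might latch an output pin
              // or kick a watchdog rather than print to stderr.
              fn emergency_stop() {
                  eprintln!("entering safe state: throttle -> 0");
              }

              fn main() {
                  let default_hook = panic::take_hook();
                  panic::set_hook(Box::new(move |info| {
                      emergency_stop();   // run the fail-safe first
                      default_hook(info); // then the normal panic report
                  }));

                  let sensor_readings: Vec<u32> = vec![];
                  let _ = sensor_readings[3]; // out-of-bounds bug -> panic -> hook runs
              }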

          • mike_hock a year ago

            Yes, or jumping to the "emergency stop" routine can instead trigger "go full throttle" because the jump address has been corrupted.

            Or in an actual vehicle, the "emergency stop" (if that means just stomping on the brakes) can flip the car and kill its passengers.

            • stjohnswarts a year ago

              It's about the odds here. Nothing is 100% safe. Independent systems almost always provide backup safety in case the OS/embedded system fails: things like overcurrent detectors, brown-out detectors, speed governors, etc., in case the code does something bad as a result of running corrupted (or something similarly awful happens).

    • dcsommer a year ago

      > ... the general Rust practice of panicking upon unexpected conditions

      What makes you say this? From the sample I've seen, Rust programs are far more diligent about handling errors (not panicking: either returning an error or handling it explicitly) than C or Go programs, due to the nature of wrapped types like Option<T> and Result<T, E>. You can't escape handling the error, and panicking potential is very easy to see in the code and to lint against with clippy.

      • layer8 a year ago

        I’m referring to the fact that ubiquitous functions like unwrap() panic if the programmer has made an error. Guarding against such panics is outside of the scope of Rust-the-language, and has to be handled through external means. There are linters for C as well.

        • Veliladon a year ago

          I think I prefer Rust's way of doing things. Just last night I used the vec! macro incorrectly, putting in a comma instead of a semicolon, and although the program compiled, it immediately panicked with an OOB error. With C it would have been a lot harder to even notice the bug, let alone track it down.
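
          For readers unfamiliar with the distinction, an illustrative guess at the kind of mix-up described (not the poster's actual code):

              fn main() {
                  // `vec![value; count]` repeats `value` `count` times...
                  let zeroed = vec![0u8; 16];
                  assert_eq!(zeroed.len(), 16);

                  // ...while `vec![a, b]` is a literal two-element vector. Mixing these
                  // up compiles fine, but indexing past element 1 then panics loudly
                  // with an out-of-bounds error instead of silently reading garbage.
                  let oops = vec![0u8, 16];
                  assert_eq!(oops.len(), 2);
                  let _ = oops[10]; // panics: index out of bounds
              }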

          • layer8 a year ago

            Right. My personal opinion is that exceptions provide a better trade-off between catching bugs and still allowing the chance of graceful shutdown or recovery.

            • Veliladon a year ago

              In my case it wouldn't have raised an exception though, it would have just been UB.

              It's not like there are no exceptions in Rust, though. The error handling is thorough to a fault when it's used. Unwrap is just a shortcut to say "I know there might be bad input, I don't want to handle it right now, just let me do it and I'll accept the panic."

              • layer8 a year ago

                By exceptions, I’m referring to languages with exceptions as a dedicated language construct with automatic stack unwinding, and preferably without UB (e.g. Java or C#). Rust doesn’t have exceptions in that sense.

                • dureuill a year ago

                  But panics in Rust are pretty much exceptions, though?

                  The differences are they are actually meant to be used for exceptional situations ("assert violated => there's a bug in this program" or "out of memory, catastrophic runtime situation") and they are not typed (rather, the panic holds a type erased payload).

                  Other than that, it performs unwinding without UB, and is catchable[0]. I'm not seeing the technical difference?

                  [0]: https://doc.rust-lang.org/std/panic/fn.catch_unwind.html

                  • layer8 a year ago

                    You’re probably right now that I’ve read up on it, I wasn’t previously aware of catch_unwind.

                    • dureuill a year ago

                      Glad to be of service. Note that the idiomatic error handling in Rust is still Result-based rather than panic/catch_unwind-based.

                      Nevertheless, a long-lived application like a web server will catch panics coming from its subtasks (e.g., its request handlers) via catch_unwind.
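
                      A minimal sketch of that pattern (hypothetical handler, no real web framework, and it only works when building with panic = unwind):

                          use std::panic::{self, AssertUnwindSafe};

                          // Hypothetical request handler; a bug inside it panics.
                          fn handle_request(path: &str) -> String {
                              assert!(!path.is_empty(), "bug: empty path");
                              format!("200 OK for {path}")
                          }

                          fn main() {
                              for path in ["/index", ""] {
                                  // Isolate each request: a panic becomes a 500 for
                                  // that request only, not a dead server loop.
                                  let work = AssertUnwindSafe(|| handle_request(path));
                                  let response = panic::catch_unwind(work)
                                      .unwrap_or_else(|_| "500 Internal Server Error".to_string());
                                  println!("{response}");
                              }
                          }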

            • stjohnswarts a year ago

              Exceptions would be awful in the kernel. I would be highly surprised if kernels like Fuchsia allow C++ exceptions.

        • tomjakubowski a year ago

          > that ubiquitous functions like unwrap() panic if the programmer has made an error.

          You're not wrong but you chose a hilarious example. Unwrap's entire purpose is to turn unhandled errors into panics!

          Array indexing, arithmetic (with overflow-checks enabled), and slicing are examples where it's not so obvious there be panic dragons. Library code does sometimes panic in cases of truly unrecoverable errors also.
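
          For illustration, a sketch of those less obvious panic sites next to the fallible alternatives a no-panic style would reach for (userspace code, not kernel code):

              fn main() {
                  let data = [10u8, 20, 30];
                  let i = 7usize;

                  // Less obvious panic sites:
                  //   data[i]        -> index out of bounds panic
                  //   &data[1..5]    -> slice range out of bounds panic
                  //   data[0] * 30   -> overflow panic when overflow-checks are on

                  // Explicit, non-panicking equivalents:
                  let elem = data.get(i).copied().unwrap_or(0); // default on bad index
                  let slice = data.get(1..5).unwrap_or(&[]);    // empty on bad range
                  let product = data[0].checked_mul(30);        // None on overflow
                  println!("{elem} {slice:?} {product:?}");
              }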

        • stjohnswarts a year ago

          That's where linters and code reviews come in. You will never 100% prevent stupid coding; that's what reviews (automated and/or by other coders) and coding standards are for.

        • hegelstoleit a year ago

          Linters can catch panics; linters for C won't catch the memory issues that Rust prevents.

          • layer8 a year ago

            Linters like Splint [0] (predating Rust) can do that for C. I’m not saying that Rust’s built-in approach isn’t better, but please be careful about what exactly you claim.

            [0] http://splint.org/

            • dcsommer a year ago

              Interesting that despite tools like Splint, 70% of high severity security vulns, including in well staffed projects like Chrome and Windows, are due to memory unsafety. The false negatives of security analysis tools are significant and are the very reason Rust got developed.

              • layer8 a year ago

                No, the reason Rust was developed (with regard to that aspect) was that the necessary static analysis is enforced by the compiler if it is built into the language, whereas otherwise (if not built in) it empirically doesn’t get a lot of adoption. There’s nothing Rust’s static analysis is doing that couldn’t be done with the same semantics using an external static analyzer and linter annotations.

                The ideas of Rust weren’t new when Rust was developed. The actual integration into a new programming language beyond experimental status was, and the combination with ML-style functional programming.

            • hegelstoleit a year ago

              Splint doesn't make C memory safe. What I meant is that it doesn't prevent the same problems that Rust does. Hence, you can add a linter to Rust to prevent panics, but you cannot add a linter to C to make it memory safe.

    • stjohnswarts a year ago

      I think you're mixing up what a -kernel- Rust programmer would do (but should know not to do) with what Rust "is"; they're not the same. You have to enter a different mindset with the kernel; it will be a new hybrid of the C context in the kernel and user-land programming.

    • jrochkind1 a year ago

      > but Rust also has no built-in mechanism to statically determine “this code won’t ever panic”,

      My intuition says that's the Halting Problem, so not actually possible to implement perfectly? https://en.wikipedia.org/wiki/Halting_problem

      • jmillikin a year ago

        If you were to define a subset of Rust's standard library (core + alloc + std) that does not contain the `panic!` macro, and excluded all functionality that needed to panic, then safe Rust could be proven to never panic (because it can't).

        That's different than solving the halting problem. You're not trying to prove it halts, you're just trying to prove it doesn't halt in a specific way, which is trivial to prove if you first make it impossible.

        • gpm a year ago

          > If you were to define a subset of Rust's standard library (core + alloc + std) that does not contain the `panic!` macro, and excluded all functionality that needed to panic, then safe Rust could be proven to never panic (because it can't).

          Not quite, because stack overflows can cause panics independent of any actual invocation of the panic macro.

          You need to either change how stack overflows are handled as well, or you need to do some static analysis of the stack size as well.

          Both are possible (while keeping Rust Turing-complete), so it's still not like the halting problem.

          • jmillikin a year ago

            In a Rust defined without `panic!`, a stack overflow would not be able to panic. What would probably happen is the process would just die, like C.

      • pca006132 a year ago

        No, panic is not halting; you just need a static check that you never call any function that can panic in your code. Essentially it is just checking whether some code (panic) might be reachable; if it is not, the program will never panic (but it can still do other crazy things).

        Note that we can only check for "maybe reachable", because in general we don't know whether some code in the middle will somehow execute forever and never reach the panic call after it.

        • roywiggins a year ago

          Even if it is halting, you can sometimes statically detect if a Turing machine never halts. Just look through the state machine and see if any states will transition to a halt; if none of them do, the machine will loop forever. This is not a very large fraction of machines that loop forever, but if you're writing a machine and want to be absolutely sure it won't halt, just don't put in any states that halt.

      • skybrian a year ago

        That's different. You can't perfectly detect all infinite loops in a language that allows arbitrary loops. This also means you can't perfectly detect unreachable code.

        But determining that a function (such as panic) is never called because there are no calls to it is pretty easy.

      • im3w1l a year ago

        If you are fine with saying that stuff like this code may panic (and many people are fine with just that), then it's perfectly doable

            if false {
                panic!()
            }
        
        Basically you'd prohibit any call to panic, whether it may actually end up running or not.

        • jrochkind1 a year ago

          Fair. From the responses, clearly I didn't know what I was talking about, fair enough!

          But OK, uninformed me would have guessed checking for that would be pretty straightforward in statically typed Rust. Is that something people want? Why isn't there a built-in mechanism to do it?

    • geraneum a year ago

      We cannot ensure that an arbitrary program halts by statically analyzing it. And it doesn’t have anything to do with the language of choice.

      https://en.m.wikipedia.org/wiki/Halting_problem

      • layer8 a year ago

        Proof assistants, which I expect to eventually merge with programming languages, can be used to restrict the set of programs you write to those where you can statically prove all properties you expect the program to hold. It’s not much different from what diligent programmers have always done in their head (with, of course, much more room for error).

        The fact that arbitrary programs are undecidable is a red herring here.

        • yonixw a year ago

          "Undecidable" Is way too close to your day2day program than you think: https://en.wikipedia.org/wiki/Rice%27s_theorem

          I would like to learn otherwise, but even a React JS+HTML page is undecidable... its scope is limited by chrome V8 js engine (like a vm), but within that scope I don't think you can prove anything more. otherwise we could just make static analysis to check if it will leak passwords...

          • layer8 a year ago

            I’m not sure you understand Rice’s theorem correctly. It means that you can’t write an algorithm that takes an arbitrary program as input and tells you whether it fulfills a given nontrivial semantic property. But you can write an algorithm that can tell you for some subset of programs. So as a developer, if you restrict yourself to releasing programs for which the algorithm has halted and given you the desired answer, you are fine.

            Depending on the semantic property to check for, writing such an algorithm isn’t trivial. But the Rust compiler for example does it for memory safety, for the subset of valid Rust programs that don’t use Unsafe.

            • yonixw a year ago

              But doesn't every program we write today (Rust, C++, Python, JS, etc.) rise to the level of an "arbitrary program"? How do you find that "subset of programs" for which said algorithm halts?

              The only sure way I can think of is to force your program to go through a narrower, non-Turing-complete layer, like sending data over a network after serialization, where we could limit the deserialization process to something non-Turing-complete (JSON, YAML?).

              Same for code that uses a non-Turing-complete API, like memory allocation in a dedicated per-process space, or Rust's "borrow" mechanics that the compiler enforces.

              But my point is, everyday programs are "arbitrary programs" and not a red herring. Surely that's true from the kernel's perspective, which is Linus's point IMO.

              • layer8 a year ago

                For the first question, see the second paragraph I added in my previous comment.

                Regarding the second question, in the general case you have to guess or think hard, and proceed by trial and error. You notice that the analyzer takes more time than you’re willing to wait, so you stop it and try to change your program in order to fix that problem.

                We already have that situation today, because the Rust type system is turing-complete. Meaning, the Rust compiler may in principle need an infinite amount of time to type-check a program. Normally the types used in actual programs don’t trigger that situation (and the compiler also may first run out of memory).

                By the way, even if Rust’s type system wasn’t turing-complete, the kind of type inference it uses takes exponential time, which in practice is almost the same as the possibility of non-halting cases, because you can’t afford to wait a hundred or more years for your program to finish compiling.

                > But my point is, everyday program are "arbitrary program"

                No, most programs we write are from a very limited subset of all possible programs. This is because we already reason in our heads about the validity and suitability of our programs.

                • yonixw a year ago

                  > Regarding the second question, in the general case you have to guess or do trial and error.

                  > You notice that the analyzer takes more time than you’re willing to wait,

                  I see, thanks, didn't know about this feedback loop as I'm not a rust programmer. Still on my todo list to learn.

                  • layer8 a year ago

                    I don’t think it actually happens in Rust in practice, or only very rarely. I was more talking about the hypothetical case for any static analysis of nontrivial program properties as in Rice’s theorem.

        • geraneum a year ago

          > Rust is arguably less safe in that aspect than C, due to the general Rust practice of panicking upon unexpected conditions

          For clarification, I responded to this in particular because "safety" is being conflated with "panicking" (bad for kernel). I reckoned "Unexpected conditions" means "arbitrary programs", hence my response, otherwise you could just remove the call to panic.

      • roywiggins a year ago

        You can prove that a machine can't ever write "1" to the tape if you just look at the state machine and see that none of the rules write a 1 to the tape. Since no rules ever write 1, no possible execution could.

        Working out whether it will write 1 to the tape in general is undecidable, but in certain cases (you've just banned states that write 1) it's trivial.

        If all of the state transitions are valid (a transition to a non-existing state is a halt) then the machine can't get into a state that will transition into a halt, so it can't halt. That's a small fraction of all the machines that won't halt, but it's easy to tell when you have one of this kind by looking at the state machine.
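
        A toy sketch of that check (illustrative field names; obviously not a serious Turing-machine implementation):

            // If no transition in the table can halt, then no execution of the
            // machine, on any input, ever halts. (The converse is undecidable.)
            #[allow(dead_code)]
            struct Transition {
                write: u8,
                move_right: bool,
                next_state: Option<u32>, // None = halt
            }

            fn can_ever_halt(table: &[Transition]) -> bool {
                table.iter().any(|t| t.next_state.is_none())
            }

            fn main() {
                let looping_machine = [
                    Transition { write: 1, move_right: true, next_state: Some(0) },
                    Transition { write: 0, move_right: false, next_state: Some(1) },
                ];
                assert!(!can_ever_halt(&looping_machine)); // provably never halts
            }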

        • geraneum a year ago

          > Rust is arguably less safe in that aspect than C, due to the general Rust practice of panicking upon unexpected conditions

          For context, this is OP's sentence that I responded to in particular. Ensuring safety [1] is way less trivial than looking for a call to "panic" in the state machine. You can remove the calls to "panic" and this alone does not make your program safer than the equivalent C code. It just makes it more kernel friendly.

          [1] not only memory safety

        • yonixw a year ago

          "Print 1" is trivial according to this: https://en.wikipedia.org/wiki/Rice%27s_theorem.

          But day-to-day programs are not trivial... as for your example, just switch it with this code: `print(gcd(user_input--,137))`... now it's much harder to "just ban some final states".

          • roywiggins a year ago

            Turing machines have a set of states and a transition function that governs how it moves between states. The transition function is a bunch of mappings like this:

                (input state, input symbol) --> (output state, output symbol, move left/right)
            
            This is all static, so you can look at the transition table to see all the possible output symbols. If no transition has output symbol 1, then it never outputs 1. It doesn't matter how big the Turing machine is or what input it gets, it won't do it. This is basically trivial, but it's still a type of very simple static analysis that you can do. Similarly, if you don't have any states that halt, the machine will never halt.

            This is like just not linking panic() into the program: it isn't going to be able to call it, no matter what else is in there.

          • UncleMeat a year ago

            That's not what "trivial property" means w.r.t. Rice's Thm.

            The point is that you can produce a perfectly working analysis method that is either sound or complete but not both. "Nowhere in the entire program does the call 'panic()' appear" is a perfectly workable analysis - it just has false positives.

          • joshuamorton a year ago

            Indeed, but panic is easier because in some ways checking if a program can panic is akin to checking if the program links the panic function.

            And that's pretty easy to statically analyze.
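
            That "is the panic machinery linked in at all?" idea is roughly the trick used by crates such as panic-never. A hedged sketch of the mechanism (only the panic-handler fragment of an embedded-style no_std build, with an illustrative symbol name, not a standalone program):

                #![no_std]
                #![no_main]

                use core::panic::PanicInfo;

                extern "C" {
                    // Deliberately left undefined: if the optimizer cannot prove
                    // that every panic path is dead code, this symbol is referenced
                    // and linking fails, turning "this binary might panic" into a
                    // build-time error.
                    fn a_panic_path_survived_optimization() -> !;
                }

                #[panic_handler]
                fn panic(_info: &PanicInfo) -> ! {
                    unsafe { a_panic_path_survived_optimization() }
                }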

  • ChrisSD a year ago

    That's all very well, but the Rust-in-Linux advocates are advocating for #2 and fully agree with Linus on #1. So attacking #1 is attacking a straw man.

  • scoutt a year ago

    I think it goes beyond "panics". Linus is saying that Rust cannot guarantee safety under certain circumstances and that safety still depends on the order of functions called by the kernel module developer, because in some builds some checks will be disabled.

    "If you want to allocate memory, and you don't want to care about what context you are in, or whether you are holding spinlocks etc, then you damn well shouldn't be doing kernel programming. Not in C, and not in Rust.

    It really is that simple. Contexts like this ("I am in a critical region, I must not do memory allocation or use sleeping locks") is fundamental to kernel programming. It has nothing to do with the language, and everything to do with the problem space."

    https://lkml.org/lkml/2022/9/19/840

  • oconnor663 a year ago

    I don't have any experience with this project, but I know a lot of panics in my Rust code look like this (you probably know this already, just setting up a question):

        fn foo<T>() -> Option<T> {
            // Oops, something went wrong and we don't have a T.
            None
        }
    
        fn bar<T>() -> T {
            if let Some(t) = foo() {
                t
            } else {
                // This could've been an `unwrap`; just being explicit here
                panic!("oh no!");
            }
        }
    
    A panic in this case is exactly like an exception in that the function that's failing doesn't need to come up with a return value. Unwinding happens instead of returning anything. But if I was writing `bar` and I was trying to follow a policy like "never unwind, always return something", I'd be in a real pickle, because the way the underlying `foo` function is designed, there aren't any T's sitting around for me to return. Should I conjure one out of thin air / uninitialized memory? What does the kernel do in situations like this? I guess the ideal solution is making `bar` return `Option<T>` instead of `T`, but I don't imagine that's always possible?
    • strictfp a year ago

      The harsh truth is that you need to think about every single case of failure, and decide what to do when things go south.

      If you look at how POSIX does it, pretty much every single function has error codes, signaling everything from lost connections, to running out of memory, entropy or whatnot. Failures are hard to abstract away. Unless you have some real viable fallback to use, you're going to have to tell the user that something went wrong and leave it up to them to decide what the application can best do in this case.

      So in your case, I would return Result<T>, and encode the errors in that. Simply expose the problem to the caller.

    • jmillikin a year ago

      Couple options:

      1. Have a constraint on T that lets you return some sort of placeholder. For example, if you've got an array of u8, maybe every read past the end of the array returns 0.

        fn bar<T: Default>() -> T {
          if let Some(t) = foo() {
            t
          } else {
            eprintln!("programmer error, foo() returned None!");
            Default::default()
          }
        }
      
      2. Return a `Option<T>` from bar, as you describe.

      3. Return a `Result<T, BarError>`, where `BarError` is a struct or enum describing possible error conditions.

        #[non_exhaustive]
        enum BarError {
          FooIsNone,
        }
        
        fn bar<T>() -> Result<T, BarError> {
          if let Some(t) = foo() {
            Ok(t)
          } else {
            eprintln!("programmer error, foo() returned None!");
            Err(BarError::FooIsNone)
          }
        }

  • 3a2d29 a year ago

    I might be missing something here. So I understand panic! will essentially crash the kernel, that makes sense to me as a problem.

    But wouldn't reading outside an array's bounds also possibly do that? It could seg fault, which is essentially the same thing.

    Is it that reading out of bounds on an array isn't guaranteed to crash everything while a panic always will?

    • fdr a year ago

      What segment? In the kernel you don't have unmapped memory to overrun into, so that array would have to be in a very special spot for an out-of-bounds access to cause a bus error. Also, even in user space, overrunning an array is far from guaranteed to touch an unmapped page... in fact, it often doesn't, since mapping memory with gaps around each array is prohibitively expensive (though debugging aids like Electric Fence rely on exactly this mechanism).

  • whatshisface a year ago

    #3 could cause a security problem. I don't think you'd find it in the Linux kernel - it would let attackers read arbitrary kernel memory. That's one of safe Rust's strongest features: it will not compile a direct memory access vulnerability.

  • zozbot234 a year ago

    The problem with #3 is not really about C vs. Rust, it's about modern optimizing compilers (including GCC). A compiler is allowed to assume that UB simply won't happen, so it makes no guarantees whatsoever about what happens if UB is ever hit. It's not "return a random value", it really is "all bets are off". There is no guarantee that you manage to "limp along" in any reasonable sense, let alone report the failure. That's what "panic and recover" mechanisms are for, and yes even the ? operator in Rust can be seen as such a mechanism.
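
    A small illustration of what "the compiler may assume UB won't happen" can mean in Rust terms (a sketch; whether a given compiler actually exploits a particular instance varies):

        use std::hint::unreachable_unchecked;

        // If `idx` is ever >= 3 at runtime, this function has UB: we've promised
        // the compiler the `_` arm can't be reached, and it may optimize on that
        // promise. "All bets are off" rather than "returns some wrong value".
        fn lookup(table: &[u8; 3], idx: usize) -> u8 {
            match idx {
                0 | 1 | 2 => table[idx],
                // Telling the compiler this is impossible; a lie here is UB.
                _ => unsafe { unreachable_unchecked() },
            }
        }

        fn main() {
            let t = [7u8, 8, 9];
            assert_eq!(lookup(&t, 1), 8);
        }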

  • lucasyvas a year ago

    Then Linus is wrong because the unsafe keyword has nothing to do with no-panic guarantees? Unsafe correlates with memory safety / UB, so using it in a different way in the kernel would be flat out wrong.

    The language determines the definition of its constructs, not the software being written with it.

    Edit: It's worth mentioning that while I think he is wrong, I think it's symptomatic of there not being a keyword/designation in Rust to express what Linus is trying to say. I would completely oppose misusing the unsafe keyword, since it has negative downstream effects on all future dependency crates: it would no longer be clear which characteristics "unsafe" refers to, which causes a split. So maybe they need to discuss a different way to label these for now and agree to improve it later.

  • echelon a year ago

    Use #2 everywhere you're not doing C FFI.

    Both #1 and #3 are gross and wrong.

blinkingled a year ago

> Even "safe" rust code in user space will do things like panic when things go wrong (overflows, allocation failures, etc). If you don't realize that that is NOT some kind of true safely[sic], I don't know what to say.

> Not completing the operation at all, is not really any better than getting the wrong answer, it's only more debuggable.

What Linus is saying is 100% right, of course - he is trying to set expectations straight: just because you replaced C code that embodies multi-thousands (or whatever huge number) of man-months of effort, corrections, and refinements with Rust code, it doesn't mean absolute safety is guaranteed. For him as a kernel guy, just as the kernel's C code detects a double free and warns about it, Rust will panic/abort on overflows, allocation failures, etc. To the kernel that is not safety at all - as he points out, it is only more debuggable.

He is allowing Rust in the kernel, so he understands that Rust lets you shoot yourself in the foot a lot less than standard C - he is merely pointing out the reality that in kernel space, or even user space, that does not equate to absolute total safety. And as the chief kernel maintainer he is well within his rights to set expectations straight so that tomorrow's kernel Rust programmers write code with this point in mind.

(IOW, as an example, he doesn't want to see patches in Rust code that ignore kernel realities in favor of Rust's magical safety guarantee - directly or indirectly allocating large chunks of memory can always fail in the kernel and needs to be accounted for, even in Rust code.)
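
For illustration, a userspace sketch of treating allocation as fallible rather than letting it panic or abort (std's try_reserve stands in here for the kernel's own fallible allocation APIs; this is not kernel code):

    use std::collections::TryReserveError;

    // Allocation can always fail and must be handled: try_reserve reports
    // failure as a Result instead of aborting the process.
    fn buffer_for(len: usize) -> Result<Vec<u8>, TryReserveError> {
        let mut buf = Vec::new();
        buf.try_reserve(len)?;  // Err on allocation failure, no panic/abort
        buf.resize(len, 0);     // cannot reallocate: capacity already reserved
        Ok(buf)
    }

    fn main() {
        match buffer_for(4096) {
            Ok(buf) => println!("got {} bytes", buf.len()),
            Err(e) => eprintln!("allocation failed: {e}"),
        }
    }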

  • lake_vincent a year ago

    Great explanation. I am not an expert on this, so your comment helped me understand. It sounds like Linus is just being a good kernel maintainer here, and clarifying a misunderstood technical term - safety.

    It's not a condemnation of rust, but rather a guidepost that, if followed, will actually benefit rust developers.

  • swinglock a year ago

    At least in user space, aborting an operation is much better than incorrect results. But the kernel being incorrect makes user space incorrect as well.

    First of all, making a problem both obvious and easier to solve is better. Nothing "only" about it - it's better. Better both for the programmers and for the users. For the programmer the benefit is obvious, for the user problems will simply be more rare, because the benefit the programmer received will make software better faster.

    Second, about the behavior. When you attempt to save changes to your document, would you rather have the corruption of your document due to a bug fail with fanfare or succeed silently? How about the web page you visited with embedded malicious JavaScript from a compromised third party, would you rather the web page closed or have your bank details for sale on a foreign forum? When correctness is out the window, you must abort.

    • alerighi a year ago

      > Aborting an operation is much better than incorrect results.

      Depends. Is a kernel panic better than something acting wrongly? I prefer my kernel not to panic, at the expense of some error somewhere that may or may not crash my system.

      If you look at the output of `dmesg` on any Linux system you often will see errors even in a perfectly working system. This is because programs of that size are by definition not perfect, there are bugs, the hardware itself has bugs, thus you want the system to keep running even if something is not working 100% right. Most of the time you will not even notice it.

      > First of all, making a problem both obvious and easier to solve is better.

      It's the same with assertions: useful for debugging, but we all disable them in production, when the program is not in the hands of a developer but of the customer, since for a customer a system that crashes completely is worse than a system that has some bugs somewhere.

      • jjnoakes a year ago

        > for a customer a system that crashes completely is worse than a system that has some bugs somewhere

        This entirely depends on the industry and the customer. My team leaves asserts on in production code because our customers want aborts over silent misbehavior.

        It is an order of magnitude cheaper for them if things fail loudly and they get a fix when compared to them tracking down quiet issues hours, days, or even months after the fact.

      • swinglock a year ago

        > Depends. Is a kernel panic better than something acting wrongly? I prefer my kernel not to panic, at the expense of some error somewhere that may or may not crash my system.

        That's a false dichotomy, you don't get to choose between definitely crashing or maybe crashing. That would be nice but it's not on the menu. Crashing is just the best case scenario, so if you can make your system stop instead of being incorrect, that's great.

        > but we all disable them in production (assertions)

        We don't all do that.

        I concede that it depends on the use case. You might not care if you got a single user non-networked gaming console for example. A bug could even become a welcomed part of the experience there. I hope these cases are more rare than not though.

        • alerighi a year ago

          > That's a false dichotomy, you don't get to choose between definitely crashing or maybe crashing. That would be nice but it's not on the menu. Crashing is just the best case scenario, so if you can make your system stop instead of being incorrect, that's great.

          So you prefer a completely unusable system to a system that can still be used, but with some errors? If you prefer the first, you will hardly be able to use anything. If you look at the `dmesg` output of a running Linux system you can find a lot of errors; if even a single one of them were turned into a panic, your computer would not even be able to boot.

          Nothing is perfect, and errors will appear. Ideally errors should be handled at the lowest possible level, but if they go unhandled they should not, in my opinion, result in a complete system crash.

          > We don't all do that.

          I do that. The reason is that in my use case not doing so would not only render the product completely unusable, but would also make it impossible to upgrade with an over-the-air firmware update. So it's better for the system to keep running than to crash (and then reboot).

    • yencabulator a year ago

      > When you attempt to save changes to your document, would you rather have the corruption of your document due to a bug fail with fanfare or succeed silently?

      When your wifi driver crashes yet again, would you choose to discard all unsaved files open in your editor, just on the very unlikely possibility that they're corrupted now?

    • evouga a year ago

      Saving a document is a great example: I would much rather that the kernel corrupt 20% of my unsaved work on a document (with a warning about the corruption), than crash and delete 100% of it.

    • snovv_crash a year ago

      It depends if you care more about correctness of this one single component, relative to uptime of the entire system.

      A panic caused by the formatting in a rarely used log output taking down all of a large company's NTP servers simultaneously, for example, would not be seen as a reasonable tradeoff.

    • skybrian a year ago

      Yes, aborting an operation is usually better assuming you have some mechanism to do it safely. In the Linux kernel, apparently you often don't?

      Although, often in embedded programming, a watchdog that resets the board can be the right thing to do. (As long as you don't get a boot loop.)

  • titzer a year ago

    If that's what Linus is saying, then he needs to work on his communication skills, because that is not what he said. What he actually said is that dynamic errors should not be detected, they should be ignored. That's so antiquated and ignorant that I hope that he meant what you said, but it's definitely not what he wrote.

    As I posted up in this thread, the right way to handle this is to make dynamic errors either throw exceptions or kill the whole task, and split the critical work into tasks that can be as-a-whole failed or completed, almost like transactions. The idea that the kernel should just go on limping in a f'd up state is bonkers.

    • aspaceman a year ago

      > it's definitely not what he wrote.

      I feel like we must have read two different articles. You sound crazy. Didn't read it your way at all.

      > Think of that "debugging tools give a huge warning" as being the equivalent of std::panic in standard rust. Yes, the kernel will continue (unless you have panic-on-warn set), because the kernel MUST continue in order for that "report to upstream" to have a chance of happening.

      "If the kernel shuts down the world, we don't get the bug report", seems like a pretty good argument. There are two options when you hit a panic in rust code:

      * Panic and shut it all down. This prevents any reporting mechanism like a core dump. You cannot attach a normal debugger to the kernel.

      * Ignore the panic and proceed, keeping the information that it failed and reporting this failure later.

      The kernel is a single program, so it's not like you could just fork it before every Rust call and fail if they fail.

      • titzer a year ago

        He wrote:

        > In the kernel, "panic and stop" is not an option (it's actively worse than even the wrong answer, since it's really not debugable), so the kernel version of "panic" is "WARN_ON_ONCE()" and continue with the wrong answer.

        (edit, and):

        > Yes, the kernel will continue (unless you have panic-on-warn set), because the kernel MUST continue in order for that "report to upstream" to have a chance of happening.

        Did I read that right? The kernel must continue? Yes, sure, absolutely...but maybe it doesn't need to continue with the next instruction, but maybe in an error handler? Is his thinking so narrow? I hope not.

        • gmueckl a year ago

          The error handler is the kernel. Whatever code runs to dump the panic somewhere must rely on some sort of device driver, which in turn must depend on other kernel subsystems and possibly other drivers to work.

          There is an enormous variation in output targets for a panic on Linux: graphics hardware attached to PCIe (requires a graphics driver and possibly support from the PCIe bus master, I don't know), a serial interface (USART driver), serial via USB (serial-over-USB driver, USB protocol stack, USB root hub driver, whatever bus that is attached to)... There is a very real chance that the error reporting ends up encountering the same issue (e.g. some inconsistent data on the kernel heap) while reporting it, which would leave the developers with no information to work from if the kernel traps itself in an endless error-handling loop.

        • jstimpfle a year ago

          In the case of the WARN() macros, execution continues with whatever the code says. There is no automatic stack unwinding in the kernel, and how errors should be handled (apart from being logged) must be decided case by case. It could just be handled with an early exit returning an error code, like other "more expected" errors.

          The issue being discussed here is that Rust comes from a perspective of being able to classify errors and automate error handling. In the kernel, it doesn't work like that, as we're working under more constraints than in userland. That includes hardware that doesn't behave as it was expected to.
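
          A hedged sketch of that "warn once, continue with a defined fallback" shape in plain userspace Rust; the kernel's actual WARN_ON_ONCE() and logging machinery are of course different:

              use std::sync::Once;

              // Report the broken invariant once, then continue with a defined
              // fallback value instead of panicking. This is only an analogy for
              // WARN_ON_ONCE() plus "continue with the wrong answer".
              static WARNED: Once = Once::new();

              fn percent_used(used: u64, total: u64) -> u64 {
                  if total == 0 {
                      WARNED.call_once(|| eprintln!("BUG: percent_used with total == 0"));
                      return 0; // defined, possibly "wrong", but the system keeps going
                  }
                  used * 100 / total
              }

              fn main() {
                  assert_eq!(percent_used(50, 200), 25);
                  assert_eq!(percent_used(50, 0), 0); // warns once, then continues
              }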

      • titzer a year ago

        Well, you've edited your reply a couple times, so it's a moving target, but:

        > * Panic and shut it all down. This prevents any reporting mechanism like a core dump. You cannot attach a normal debugger to the kernel.

        No one is really advocating that. Clearly you need to be able to write code that fails at a smaller granularity than the whole kernel. See my comment upthread about what I mean by that: dynamic errors fail smaller granularity tasks and handlers deal with tasks failing due to safety checks going bad.

        • aspaceman a year ago

          Ease the snark space ranger.

          > dynamic errors fail smaller granularity tasks and handlers deal with tasks failing due to safety checks going bad.

          Yes and that's why Rust is bad here (but it doesn't have to be). Rust _forces_ you to stop the whole world when an error occurs. You cannot fail at a smaller granularity. You have to panic. Period. This is why it is being criticized here. It doesn't allow you any other granularity. The top comment has some alternatives that still work in Rust.

          • titzer a year ago

            > You cannot fail at a smaller granularity.

            Rust needs to fix that then. So we agree on that.

            • Jweb_Guru a year ago

              What was said is not actually true of Rust.

          • __jem a year ago

            > Rust _forces_ you to stop the whole world when an error occurs.

            But... this isn't true??

PragmaticPulp a year ago

I’ve been using Rust for a while, and I’m so, so tired of hearing this argument.

Yes, we know. We get it. Rust is not an absolute guarantee of safety and doesn’t protect us from all the bugs. This is obvious and well-known to anyone actually using Rust.

At this point, the argument feels like some sort of ideological debate happening outside the realm of actually getting work done. It feels like any time someone says that Rust defends against certain types of safety errors, someone feels obligated to pop out of the background and remind everyone that it doesn’t protect against every code safety issue.

  • chrsig a year ago

    I mean, it's felt like anytime anyone mentions any code base not written in rust, someone pops in and points out that it's not safe, and should be rewritten in rust.

    I think it's all part of the language maturing process. Give it time, zealots will either move on to something new (and then harass the rust community for not meeting their new standard of excellence) or simmer down and get to work.

    • IshKebab a year ago

      Well they're right. Most code written in C is horribly unsafe. Most code written in Rust is very safe. No code is guaranteed to be 100% safe - not even formally verified code.

      There's a clear safety spectrum, with C near the bottom and Rust near the top. It's tedious for people to keep saying "well it's not right at the top so we should just keep using C".

      I'm sure pro-seatbelt people were called "zealots" back in the day too.

      • greyhair a year ago

        > Most code written in C is horribly unsafe.

        I think that is untrue. I worked at the Network Systems arm of Bell Labs for sixteen years, and we could demonstrate five-nines of uptime on complex systems written entirely in C.

        C is a rough tool, I will grant you that, and Rust is a welcome addition to the toolkit, but saying that most code written in C is horribly unsafe, does not make it true.

        • paoda a year ago

          Perhaps I misunderstand, but I don't imagine most individuals or companies writing production C code to be on the same "level of competency" as a research company as prestigious as Bell Labs.

      • phendrenad2 a year ago

        Rust isn't seatbelts, Rust is a tank. It's hard to steer, you need 4 people to operate it, but it's safe from bullets. It can also sink in quicksand or mud very easily. Rust advocates ignore the usability problems and say "Drive a tank everywhere, it's bullet-safe". Meanwhile the average programmer will get lost in the complexity of Rust and invent shortcuts like using unsafe {} in exactly the wrong place.

        • dureuill a year ago

          This runs contrary to my experience.

          1. Coming from C++, my productivity is x2-x3 in Rust, making Rust a middle point between C++ and Python (about x8 productivity). What's more, if we factor maintenance time in, the lower maintenance costs of Rust code push the multiplier toward x10, which is equal to or better than Python (whose maintenance costs are significant).

          2. I have a colleague coming from Python (so a very different background than my C++ background), and he doesn't "get lost in the complexity of Rust" but after some use of Rust makes pretty much the same conclusions as I do: initial coding slower than Python, but roughly equal when you factor in maintenance time. He now writes the quick tools that could be Python scripts in the past in Rust when we suspect that they won't be one-off scripts (which happens very often). We get ease of distribution (static binaries), portability (to Linux and Windows), and better performance out of it too.

          Although this is a comparison with C++ and Python, not C, the reasons why are simple and apply equally so to C:

          1. Easy access to a good ecosystem. Adding dependencies in C or C++ is a pain; it's very easy in Rust, removing the need to reinvent the wheel (squarely). C suffers even more from this, given its minimal standard library and lack of standard data structures (everything is a linked list :-D)

          2. Memory safety and lack of UB in safe Rust brings a definitive simplicity in coding, code review and debug.

          3. Result-oriented APIs and generally expressive type system are what end-up bridging the gap with Python with time.

          What Rust definitely has is a learning curve. It is not optimized for taking the language without deep diving into it, or learning it in a short time. IMO it is a reasonable trade-off, given that the experience past the learning curve is so good, and that many of the things that make the learning curve so steep are integral to that experience (exclusive borrows, trait system, ...).

          • phendrenad2 a year ago

            You misunderstood my argument. My argument is that if forced to use Rust (given, say, a tectonic shift in the coding world where Rust becomes dominant - something many people are clamoring for), most average developers would have a difficult time writing secure code, because they'll have to keep a higher level of complexity in mind. Productivity is irrelevant. This is the classic trap of not being able to measure thing ABC, so you measure thing XYZ and assume that it's the same.

            TL;DR: Let's see what happens when average C programmers are forced to use Rust. Will their code be more secure? I see no convincing arguments one way or the other. Only measuring XYZs.

      • chrsig a year ago

        The issue isn't correctness. The issue is annoyingness.

        > I'm sure pro-seatbelt people were called "zealots" back in the day too.

        Given that existing vehicles were grandfathered in, pro-seatbelt people were irrelevant to the owners & drivers of said vehicles.

        Just like some rust zealot asserting that some existing project with millions of lines of code should be rewritten in rust is irrelevant to the project maintainers.

  • TillE a year ago

    It's really common to see people say meaningless stuff like "Rust is a safe language" which is either deeply confused or deeply misleading.

    Rust provides certain guarantees of memory safety, which is great, but it's important to understand exactly what that means and not to oversell it.

    • pornel a year ago

      It's unproductive pedantry to expect every mention of the generalisation to be followed by a full disclaimer about exceptions and edge cases.

      People say "it's raining" without having to add "except under roofs".

      • lifthrasiir a year ago

        I think, if the wording were exactly "Rust is safe", it is indeed too vague, as there are many notions of safety - and annoyingly enough people do say this. But "Rust provides memory safety" is clear enough and doesn't need further qualification.

        • pornel a year ago

          Official Rust materials are careful not to overpromise and to be clear on the extent of what is guaranteed and what isn't.

          The safety always comes with an asterisk. Rust provides memory safety, provided that unsafe blocks, FFI, other code running in the same process, the OS itself, and the hardware don't misbehave.

          But if you accept that Python and Java can be called safe languages then Rust can be too. The other ones also have unsafe escape hatches and depend on their underlying implementations to be correct to uphold safety for their safe side.

          • HelloNurse a year ago

            All this safety, as Linus points out, is safety for plain programs, but a source of serious problems for the kernel. "Safe languages" are only safe up to a point and in context; Rust has clearly been designed for writing safe applications, not safe kernels.

            So if some enthusiasts are trying to use Rust at cross purposes for Linux they are likely to appear obnoxious and entitled, and it is perfectly right to challenge them to prove that they can make Rust suitable.

            There's more high quality and polite preaching earlier in the thread, for example:

              > Please just accept this, and really *internalize* it.  Because this isn't actually just about allocators. Allocators may be one very common special case of this kind of issue, and they come up quite often as a result, but this whole "your code needs to *understand* the special restrictions that the kernel is under" is something that is quite fundamental in general.
      • mslm a year ago

        Except everyone understands it's not raining under roofs. When someone says 'Rust is safe', people assume it's infallible. It's been oversold.

      • phendrenad2 a year ago

        This is the patronizing attitude that keeps getting Rust advocates into trouble. "I don't need to be pedantic, I know better than you, so I'll just simplify my argument down to the point that it's actually a lie, but you'll thank me later"

      • II2II a year ago

        Think of it as elaborating, rather than disclaiming. There is a real problem in the realm of Rust advocacy where people make a blanket claim, either wrongfully assuming it is true or assuming that others are aware of the limitations of the claim. This is a problem when the reader is not aware of the limits of what is being said, while creating conflict when the reader calls out the limitations.

        Reading a book on Rust programming is an entirely different matter since authors tend to elaborate upon what they are claiming. The reader has to understand how things work and what the limits are. As such, there is less opportunity for misinformation to spread and less room for conflict.

  • oconnor663 a year ago

    Fwiw, the original article/email is less about "Rust has unsafe" and more about "panicking/crashing to avoid triggering UB isn't a viable strategy in the kernel."

    • pas a year ago

      It might be in a virtualized/development environment, but otherwise that's why all those defensive coding practices are recommended in low-level code: to deal with this.

  • Yoric a year ago

    It's really weird.

    I keep seeing claims that Rust users are insufferable and claim that Rust protects against everything. But, as someone who started using Rust around 0.4, I have never seen these insufferable users.

    I imagine that they lurk in some communities?

    • lifthrasiir a year ago

      I believe it boils down to hasty generalization, where you only recall (or are aware of) a few particularly noisy Rust users and take them as a representative of all Rust users. This kind of stereotype is unfortunately very hard to break.

    • 3a2d29 a year ago

      Okay, just to fact-check this: I am a fan of Rust, but pretending there aren't these aggressive Rust users is a bit like putting your head in the sand.

      Like any language that has very cool features, there are people who take that tool not as a tool but as a religion.

      You can even look in my comment history and see people arguing with me when I said I was a Rust fan, but that memory safety isn't a requirement in some areas of programming. One person made it their mission to convince me that can't possibly be the case and that (in my example of video games) any memory bug crashes the game and will make users quit and leave.

      • maxbond a year ago

        I, too, have not encountered these toxic Rust fanboys. I don't believe my head is in the sand. I do regularly see people degrading Rust and its community, and so am convinced these toxic Rust fanboys are largely a myth based on uncharitable interpretations of otherwise reasonable statements. I think people often read "I advocate for the deprecation of all C/C++ codebases" into the statement "Rust is a 'safe' language, for a certain meaning of that term," but I don't think it's actually common to advocate for such a deprecation outside of security-critical applications.

        I feel like it's a defensive reaction, that people feel like Rust is seeking to obviate arcane skills they've built over the course of their careers. Which I don't think is true, I think there will always be a need for such skills and that Rust has no mission to disrupt other language communities, but I can understand the reaction.

        > You can even look in my comment history

        Is this the thread you're referring to?

        https://news.ycombinator.com/item?id=32878868

        Because I genuinely don't see what you're talking about. No one seems to "make it their mission" and no one seems to be arguing for Rust in particular, as much as this category of languages.

        • moldavi a year ago

          I've seen both sides of this, as a Rust user and as a Go user.

          Rust users are generally friendly to one another, and to people who are interested in Rust. However, some Rust users are toxic when talking to people outside the community or to people who disagree.

          That's why a lot of us (in the Rust community) don't notice it; we spend most of the time inside our own community talking to each other and being friendly to each other.

          This is a trait common to any bubble or insular community whether it be about politics, religion, economics, or whatever. It's fairly easy to recognize once you get in the habit of dis-identifying with your own side.

          There's also a phenomenon in human psychology where we tend to forgive "our side's" misbehavior, presumably because it's in service to a higher ideal and therefore forgivable. It's the difference between "passionately spreading the good word" and "aggressive evangelism", two views of the same action. After learning about this I've even seen it in myself, though hopefully I've learned to counteract it a bit.

          Note that this isn't unique to Rust; other languages have this too, to an extent.

          It's something I really hope we can leave behind, because it's hurting an otherwise beneficial message that Rust can bring a new tradeoff that is favorable for a lot of situations.

          • lifthrasiir a year ago

            > However, some Rust users are toxic when talking to people outside the community or to people who disagree.

            I'm not even sure if this is the case. I have seen enough toxic Rust users, but at least in my experience they rarely overlap with those who are active in the community. This suggests that they are experiencing typical newcomer syndrome, comparable to Haskell newcomers' urge to write a monad tutorial, and it also explains why a disproportionate number of non-Rust users observe toxic Rust users---if you are a Rust user but don't preach about Rust everywhere, how can others tell if you are indeed a Rust user? :-)

          • Yoric a year ago

            Fair enough.

        • 3a2d29 a year ago

          No I am not referring to that thread. I am referring to the thread further down where someone compares using a memory unsafe language to an illegal activity.

          If you need an example of the rust community being toxic, I give you https://github.com/actix/actix-web

          Look up the history and realize they bullied an open source project leader into leaving open source for good.

          • maxbond a year ago

            So this thread? https://news.ycombinator.com/item?id=32879558

            I still don't understand the relevance, this neither appears toxic nor to be a discussion of Rust; this looks like they put forward an out-there idea and you didn't care for it, which just seems like a discussion about consumer protection laws. I also don't see the connection from Actix drama to the idea that people are exaggerating the capabilities of Rust or causing problems for other language communities - I don't know much about it, I'm fully willing to believe toxicity was involved, but a breakdown in communication between a maintainer and their community doesn't seem like the behavior we're discussing and I don't see any evidence this was peculiar to Rust and not a phenomenon in open source at large.

            I don't want to relitigate some thread I wasn't even a part of, I just don't understand.

            • 3a2d29 a year ago

              I think the part where I have several negative votes on those comments, despite making points I think are valid, is what I don’t like.

              My understanding is that negative votes are for things that don’t contribute to discussion, yet all my comments are in the negatives except when I mentioned I actually am using Rust. Then suddenly the commenter stops talking about our discussion altogether and starts to mention learning Rust.

              It’s frustrating because I like rust, but I can’t seem to criticize it in the slightest.

            • 3a2d29 a year ago

              Not to mention there was this whole issue: https://news.ycombinator.com/item?id=29501893

              After saying everyone was empowered to use their tool, they tried to kick someone off the team for working for Palantir.

              Regardless of politics, kinda unfair to make political statements using the rust accounts, then turn around and say other people can’t be part of rust because they work for a company who is political.

  • flohofwoe a year ago

    > Rust is not an absolute guarantee of safety and doesn’t protect us from all the bugs.

    That's not exactly the vibe I'm getting from the typical Rust fanboys popping up whenever there's another CVE caused by the usage of C or C++ though ;)

    Rust does seem to attract the same sort of insufferable personalities that have been so typical for C++ in the past. Why that is, I have no idea.

    • lucasyvas a year ago

      It protects against the leading 70 percent of CVEs, which are due to memory safety issues. This is all Rust has ever claimed to solve and it's all I've ever seen anyone cite when advocating for it.

      If these people are insufferable to you, I can't change your mind on that. That said, you might want to get used to it, since major areas of industry already consider C/C++ deprecated (paraphrasing the Azure CTO recently).

      • Test0129 a year ago

        I didn't know the Azure CTO was the CTO of the C++ community. I'm sure the billions of lines of code written in C++ for the finance industry would love to have a word.

        The insufferable nature of the people isn't the advocating of safety. It's that Rust seems to have evolved a community of "X wouldn't have happened if Y was written in Rust!" and then walking away like they just transferred the one bit of knowledge everyone needed. They occupy less than 1% of the programming community and act like they single-handedly are the only people who understand correctness. It's this smug sense of superiority that is completely undeserved that makes the community insufferable. Not the safety "guarantees" of the language.

        • lucasyvas a year ago

          This is not the first time it's happened. JS is effectively deprecated in favor of TS in the hearts of programmers in that ecosystem. There was a lot of disagreement about this a decade ago, but TS is now at its 10 year anniversary and any serious project in that world should be written with static type definitions. It had the early adopters that were insufferable at the time, but they were right about the path and those that have jumped in are having a way better experience.

          I think history will show that we can do a lot better than C/C++ and Rust is one of the best steps yet to show that. Rust will be replaced by something better some day and the cycle will repeat.

        • znpy a year ago

          > They occupy less than 1% of the programming community and act like they single-handedly are the only people who understand correctness.

          Maybe I’m too young (just past 30) but is it just me or is that some kind of attitude that emerged in the last 10-15 years?

          And I mean not only in programming, but in general.

          A small number of people who are very vocal about something and start pushing everybody else to their thing while simultaneously shaming and/or making fun of those who either disagree or aren’t generally interested.

          I kinda see a pattern here.

          Either way, it’s very annoying.

          Going back to the Rust topic… I recently started working with some software written in a mix of C++ and Java. I don’t own the codebase, I “just” have to get it working and keep it working. So I had to reach out to another person about some performance issues and this guy starts the usual “should be rewritten in Rust”… Jesus Christ dude, I don’t care for your fanboyism right now; either help me or tell me you won’t, so I’ll look somewhere else.

          And of course, if as an outsider this is the experience I have to go through every time I deal with rust people… I’ll try to minimise my exposure to such people (and to the language, if necessary).

          • ok123456 a year ago

            >A small number of people who are very vocal about something and start pushing everybody else to their thing while simultaneously shaming and/or making fun of those who either disagree or aren’t generally interested.

            It's called manufacturing consent and it's all around us.

            • vlovich123 a year ago

              That’s extremely ungenerous. I see the legitimate challenges with Rust as do most people I talk with who are C++ veterans. But we also all agree that C/C++ isn’t tenable in the long term. It might not be Rust that wins eventually but only because a better alternative pops up. Without a better alternative it’s going to be Rust. And let me tell you. The Rust team to date has been very good at building a very attractive ecosystem and bringing people along. The people who are Rust advocates that I’ve come across tend to be extremely thoughtful individuals and not just fanboys latching onto something cool.

            • znpy a year ago

              If you’re citing the book from 1988, that looks interesting, I’ll add that to my to read list.

              If not, would you care to drop some links?

    • Test0129 a year ago

      I wouldn't say the Rust community parallels the C++ community in any way. The Rust community is more like the insufferable Haskell/FP community who, despite producing very little measurable commercial value, continue to look down on everyone else.

      Indeed, there's a lot of damage control going on in this thread walking back Rust's guarantees of safety despite that, up until this point, being Rust's only real selling point. It seems like every C/C++/Go/whatever repository has at least one issue suggesting a complete rewrite in Rust.

      • avgcorrection a year ago

        There’s nothing to walk back since the post does not contest Rust’s safety guarantees at all. The link is (by design or not) effectively click bait “Linus Torvalds says that Rust is not really safe”, when in reality it is just him saying that panicky (panic on programmer error) Rust code is inappropriate for the kernel and that Rust-in-Linux code should by default limp on when it has encountered an error. That is a perfectly reasonable point to make, but has got nothing to do with “safety” in the sense that the Rust project talks about that term.

      • WastingMyTime89 a year ago

        > Haskell/FP community

        As someone who worked on a lot of OCaml projects, I would like to assure you that the issue really is the Haskell community which I too find completely unbearable. The rest of the FP community is far nicer/less smug.

        For a long time, they just thought it was a shame some innovative constructs seemed to be stuck in their favourite languages (first class functions, variant types, inference) and not percolating to the mainstream. This fight has mostly been won, which is great.

        • jstimpfle a year ago

          To be fair, the Haskell hype train has long passed, and I never perceived the Haskell community as insufferable. They're just preoccupied with formulating everything in way too mathsy frameworks, to the point of being extremely unproductive from a "real world" programmer's perspective.

          • ParetoOptimal a year ago

            > To be fair, the Haskell hype train has long passed, and I never perceived the Haskell community as insufferable. They're just preoccupied with formulating everything in way too mathsy frameworks, to the point of being extremely unproductive from a "real world" programmer's perspective.

            See my comment upthread, you seem to be misinformed on the use and prevalence of Haskell in the real world.

      • ParetoOptimal a year ago

        > insufferable Haskell/FP community who, despite producing very little measurable commercial value continue to look down on everyone else.

        I just took a break from creating measurable commercial value in Haskell.

        Grab a Starbucks, shop at Target, or use Facebook recently?

        Congrats, you used production Haskell code delivering measurable commercial value to you and millions of others.

      • timeon a year ago

        This seems to me more like wishful thinking. The post is barely talking about memory safety. You have confused a combination of the title and some comments reacting to the title with the post itself.

      • mwcampbell a year ago

        I wonder if the Rust community now is similar to what the C++ community was like when C++ was as young as Rust is now. Any old-timers want to comment on this?

        Edit to add: My guess is that the Rust community might still be worse because now we have widespread Internet access and social media.

        • pjmlp a year ago

          And back then we had flamewars on comp.lang.c and comp.lang.c++, hence the .moderated versions of them.

          I have always been on the C++ side when arguing C vs C++; since 1993 I already considered C a primitive option, coming from Turbo Pascal 6.0 and finding it a simplistic pseudo-macro assembler.

          So yeah, in a sense the Rust community is similarly hyped as we were adopting Turbo Vision, CSet++, OWL, MFC, PowerPlant, Tools.h++, POET, and thinking C would slowly fade away, and we could just keep on using a language that while compatible with C, offered the necessary type system improvements for safer code.

          But then the FOSS movement doubled down on C as the means to write the GNU ecosystem, going back to the first editions of the GNU manifesto, and here we are.

    • ok123456 a year ago

      When the CVEs appear in Rust, they use that as proof that their technology is better because they found those errors. Or, those were just unsafe and therefore not "real" Rust.

  • jmull a year ago

    I think it keeps getting said because there appear to be a lot of people who don't understand this.

  • pjmlp a year ago

    It is the same argument that C folks used against Modula-derived languages, Object Pascal, and Ada.

    "If there isn't 100% safety then why bother" has been the usual argument for the last 40 years.

a_humean a year ago

I know next to nothing about kernel programming, but I'm not sure what Linus' objection to the comment he is responding to here is.

The comment seemed to be making reference to rust's safety guarantees about undefined behaviour like use after free.

Linus seems to have a completely different definition of "safety" that conflates allocation failures, indexing out of bounds, and division by zero with memory safety. Rust makes no claims about those problems, and the comment clearly refers to undefined behaviour. Obviously, those other problems are real problems, but just not ones that Rust claims to solve.

Edit: Reading the chain further along, it increasingly feels like Linus is arguing against a strawman.

  • 4bpp a year ago

    From a quick skim, it seems to me that at least in Linus's interpretation, his interlocutor is requesting changes to the way the kernel does things in order to accommodate/maintain Rust's "there is no undefined behaviour; in cases where circumstances conspire to make behaviour undefined, terminate immediately" philosophy even in kernel Rust code. He then figures that if he said he is not willing to do that, the other side would respond with something to the effect of "but implementing the Rust philosophy in full means you get safety, and you surely can't have a goal more important than that", and therefore leaps to talking down the importance of the safety that Rust actually guarantees, to argue that it is not actually so great that all other objectives would be secondary to it.

    If his initial interpretation and expectation of the Rustacean response is in fact correct, the line of argumentation does not seem per se wrong, but I do think that it is bad practice in adversarial conversations to do the thing where you silently skip forward several steps in the argument and respond to what you expect (their response to your response)^n to be instead of the immediate argument at hand.

  • arinlen a year ago

    > I know next to nothing about kernel programming, but I'm not sure here what Linus' objection to the comment he is responding to here is.

    You should read the email thread, as Linus explains in clear terms.

    Take for instance Linus's insightful followup post:

    https://lkml.org/lkml/2022/9/19/1250

    • ChrisSD a year ago

      What is better: continuing to "limp along" in some unknown corrupted state (aka undefined behaviour) or in a well defined (albeit invalid) state?

      • throw827474737 a year ago

        I've had the same issue often on MCUs: limp along to hopefully get the error out somehow, because otherwise it won't be noticed unless a JTAG debugger is attached (which is not the default in the field).

        So I can understand where Linus comes from.

        • gmueckl a year ago

          Yes. You could still hard reset after the error is reported if you wanted to. And if system availability matters, a hardware watchdog would handle the case where the error handling doesn't finish.

        • mlindner a year ago

          Limping along is what the salesmen and the business people want, as failures look bad.

          Engineers should want the immediate stop, because that's safer, especially in safety critical situations.

          • wtallis a year ago

            The kernel is not the whole system. The kernel needs to offer the "limping along" option so that the other parts of the system can implement whatever graceful failure method is appropriate for that system. There's no one size fits all solution for the kernel to pick.

          • niscocity35 a year ago

            What are you talking about? Should planes stop flying when they encounter an error?

            Safety critical systems will try to recover to a working state as much as possible. It is designed with redundancy that if one path fails, it can use path 2 or path 3 towards a safe usable state.

          • warinukraine a year ago

            You sound like you code websites or something.

            Real engineers, like say the people who code the machines that fly on Mars, don't want "oops that's unexpected, ruin the entire mission because that's safer". Same for the Linux kernel.

      • Someone1234 a year ago

        This question is answered in Linus' emails fully and better than I'm going to do.

        But to restate briefly, the answer varies wildly between kernel and user programs, because a user program failing hard on corrupt state is still able to report that failure/bug, whereas a kernel panic is a difficult to report problem (and breaks a bunch of automated reporting tooling).

        So in answer: Read the discussion.

        • ChrisSD a year ago

          You seem to have misunderstood me. The distinction I'm making is not between kernel panic or undefined behaviour. The distinction is between undefined behaviour and defined behaviour. That defined behaviour can be anything, even including "limping on" somehow.
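
          To make that distinction concrete, here's a minimal userspace sketch (the function names are mine, purely for illustration):

              // Defined behaviour, even if the result may be "wrong":
              // warn and limp on with a dummy value.
              fn read_sample(buf: &[u8], idx: usize) -> u8 {
                  match buf.get(idx) {
                      Some(&b) => b,
                      None => {
                          eprintln!("WARN: index {idx} out of range ({} elements)", buf.len());
                          0
                      }
                  }
              }

              // Undefined behaviour if idx is out of bounds: whatever happens
              // next (silent corruption, a crash much later) is unpredictable.
              unsafe fn read_sample_unchecked(buf: &[u8], idx: usize) -> u8 {
                  *buf.get_unchecked(idx)
              }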

      • yencabulator a year ago

        What is better for a desktop user:

        1) needing to reload a wifi driver to reinitialize the hardware (with a tiny probability of memory corruption), or choosing to reboot as soon as convenient (with a tiny probability of corrupting the latest saved files)

        2) losing unsaved files for sure and not even knowing what caused the crash

        • notacoward a year ago

          Why focus exclusively on the desktop, or over-generalize from it to other uses? What is appropriate for them is not necessarily so for the many millions of machines in server rooms and data centers. Also, you present a false dichotomy. "Lose unsaved files for sure" is not the case for many systems, and "not even know" is not necessarily the case. Logging during shutdown is a real thing, as is saving a crash dump for retrieval after reboot. Both have been standard at my last several projects and companies.

          As I've said over and over, both approaches - "limp along" and "reboot before causing harm" - need to remain options, for different scenarios. Anyone who treats the one use case they're familiar with as the only one which should drive policy for everyone is doing the community a disservice.

          • yencabulator a year ago

            Yes, both need to remain options. Rust-in-kernel needs to be able to support both. That's like half of Linus's ranting there.

            The other half is that the kernel has a lot of rules about what is safe to do where, and Rust has to be able to follow those rules, or not be used in those contexts. This is the GFP_ATOMIC part.

        • Jweb_Guru a year ago

          The latter, because the "tiny probability of memory corruption" can easily become a CVE.

          • P5fRxh5kUvp2th a year ago

            We have a term for this.

            FUD

            • Jweb_Guru a year ago

              Linux has numerous CVEs, and a large percentage stem from memory corruption. That's not FUD, I'm afraid.

              • scoutt a year ago

                It's FUD. And not only that. The fear of constantly being attacked by an external entity is also paranoid.

                • Jweb_Guru a year ago

                  Unfortunately, whether you personally care about this sort of thing isn't good enough anymore. Owned Linux boxes on IoT devices are now being marshaled into massive botnets used to perform denial of service attacks, while other vulnerabilities are exploited to enable ransomware. You having negligent security on your own unpatched box because you don't personally feel like it's a good tradeoff has many negative external consequences. Fortunately, the decision isn't actually up to you (and having fewer vulnerabilities won't influence you negatively anyway, so I'm not sure why you're so angry about it).

                  • scoutt a year ago

                    > why you're so angry about it

                    Am I?

                    You suppose a lot of things about me from literally a bunch of words.

                    "A 'tiny probability of memory corruption' can easily become a CVE" is still FUD, because it is simply not true in most cases. The words "tiny" and "easily" show the bias here.

                    The rest of the conversation seems like a symptom of hypervigilance: fixation on potential threats (dangerous people, animals, or situations).

                    Fortunately, the decision isn't up to you either.

  • pfortuny a year ago

    I am probably wrong, but I understood that “safety meaning panic” is neither “safe” nor allowed in the Linux kernel, because the kernel must not panic when an error arises.

    • a_humean a year ago

      Which is why Rust has been accommodating the kernel by adding non-panic versions of the functions that Linus has been complaining about (namely that memory allocation is infallible, because that isn't an unreasonable thing to assume in application code). Still doesn't change the fact that "safe" in this context has a technical meaning, and what Linus is describing isn't that.
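
      For a flavour of what the non-panicking variants look like, here is the stable userspace std equivalent (the in-kernel alloc APIs differ in detail, so treat this purely as an illustration):

          use std::collections::TryReserveError;

          // Infallible style: allocation failure aborts the process.
          fn ids_or_die(n: usize) -> Vec<u64> {
              let mut v = Vec::with_capacity(n); // aborts on OOM
              v.extend(0..n as u64);
              v
          }

          // Fallible style: allocation failure is surfaced as a Result.
          fn ids(n: usize) -> Result<Vec<u64>, TryReserveError> {
              let mut v = Vec::new();
              v.try_reserve(n)?; // Err instead of abort/panic
              v.extend(0..n as u64);
              Ok(v)
          }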

      • layer8 a year ago

        The issue that Linus is probably coming from is that many Rust aficionados evangelize for Rust as if the very specific technical meaning of “safe” in Rust was the generic meaning of “safe”. For those who understand the limitations and the trade-offs, that can be quite tiresome.

        • a_humean a year ago

          Except, the person he is responding to doesn't make those claims - though I haven't read further up the chain - only downwards.

      • Vt71fcAqt7 a year ago

        Maybe his point is that the technical meaning should use a more accurate word, in his opinion?

        • a_humean a year ago

          His point seems to be the opposite, that "safety" should have a vaguer meaning in his opinion, and not the well established technical definition that the author clearly meant when he used the word.

          • LtWorf a year ago

            Or, in other words, rust-safety should mean what safety means in every other context, or rust people need to come up with a different word.

            • a_humean a year ago

              You don't get to change the definition of a term used by another when it had a clear meaning in its use, and then make an argument on the basis that the author meant y when they clearly meant x. That is just conflation.

              • LtWorf a year ago

                I think the word "safety" existed before rust…

                • a_humean a year ago

                  This has nothing to do with the common definition of "safety". Terms change their meaning based upon their use and context. The author has a clear use in mind - memory safety.

                  The rules of argument existed long before the Linux kernel. You don't get to change terms introduced within an argument with a clear meaning because it helps you create a strawman. If you want to change the definition of a term mid-argument, you telegraph it. Once again, this is called conflation.

              • Vt71fcAqt7 a year ago

                >when it had a clear meaning in its use

                That's not the issue though. It's that "safe" means something is actually safe. My house isn't safe if it's on fire, even if the house is in a safe neighborhood. Linus' claim is that "rust people" sometimes themselves conflate memory safety with general code safety, simply because "safe" is in the name. So much so that they will at times sacrifice code quality to achieve this goal despite (a) memory safety not being real safety and (b) there being no way to guarantee memory safety in the kernel anyway. What he is saying is that "rust people" (whatever that means) are at times trading off real safety or real code maintenance/performance for "rust safety."

                >a compiler - or language infrastructure - that says "my rules are so ingrained that I cannot do that" is not one that is valid for kernel work.

                And

                >I think you are missing just how many things are "unsafe" in certain contexts and cannot be validated.

                >This is not some kind of "a few special things".

                >This is things like absolutely _anything_ that allocates memory, or takes a lock, or does a number of other things.

                >Those things are simply not "safe" if you hold a spinlock, or if you are in a RCU read-locked region.

                >And there is literally no way to check for it in certain configurations. None.

                You can judge whether he is correct, but he never said Rust's safety implies absolute safety, only that some Rust users are treating it that way by sacrificing the code for it. If that's the case then it makes a lot of sense to start using a more sensible word like "guaranteed" instead of safe. I think part of what contributes to this idea is that "unsafe" code is written with the keyword "unsafe", as if code not written that way is safe, and code written with "unsafe" is bad. That's not to say that "unsafe" actually implies any of that - all it means is that it's not guaranteed to be memory safe - but according to Linus it creates a certain mentality which is incongruent with the nature of kernel development. And the reason for that is that safe and unsafe are general English words with strong connotations such as:

                >protected from or not exposed to danger or risk; not likely to be harmed or lost.

                >uninjured; with no harm done.

                And for unsafe:

                >able or likely to cause harm, damage, or loss

    • rowanG077 a year ago

      Safety doesn't mean panic. I don't feel that was the point the person Linus responded to was making.

Smaug123 a year ago

I think a much better email from the thread to link to would be the earlier https://lkml.org/lkml/2022/9/19/840, where Linus actually talks about some of the challenges of kernel programming and how they differ from user-space programming.

throwawaybutwhy a year ago

To put things in context, Linus is being reasonable and wise and well-mannered once again. Wouldn't mind reading a few juicy expletives, to be honest.

  • mustache_kimono a year ago

    As someone else in the thread notes, they seem to be talking past one another. [0]

    Linus may view his job as "Saying No" but the way he does it still leaves a little to be desired, because his reasoning is sound here, but it's less "Follow my reasoning" than "You don't want to get yelled at again do you?"

    [0]: https://lore.kernel.org/lkml/CAFRnB2VPpLSMqQwFPEjZhde8+-c6LL...

  • wokwokwok a year ago

    >>>>> For GFP_ATOMIC, we could use preempt_count except that it isn't always enabled. Conveniently, it is already separated out into its own config. How do people feel about removing CONFIG_PREEMPT_COUNT and having the count always enabled?

    >>>> No (Linus)

    >>> As you know, we're trying to guarantee the absence of undefined behaviour for code written in Rust. And the context is _really_ important, so important that leaving it up to comments isn't enough.

    >>> Do you have an opinion on the above?

    >> This message. Ie. No. you can’t make everyone play by your rules. (Linus, grumpily)

    > While I disagree with some of what you write, the point is taken.

    > But I won't give up on Rust guarantees just yet, I'll try to find ergonomic ways to enforce them at compile time.

    I mean, it doesn’t sound like he’s being petty or misunderstanding.

    They want special rules (which won’t work) to do runtime checking for rust code. That seems weird, right?

    Rust safety should be compile time. That’s the point…

    I dunno, maybe I don’t understand what’s being said, but I don’t think Linus is particularly wrong here, even if it’s kind of shouty.

    • oconnor663 a year ago

      > Rust safety should be compile time. That’s the point…

      Array bounds checks are one of the most important safety measures Rust takes, and those have to happen at runtime (if the optimizer can't prove they'll never fire). Similarly, locking types like `Mutex` of course do all their locking and unlocking at runtime, though they also use the type system to express the fact that they will do that.
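
      A small userspace example of those runtime checks (nothing kernel-specific, just to make the point concrete):

          use std::sync::Mutex;

          fn main() {
              let data = vec![1, 2, 3];

              // `data[10]` would hit a runtime bounds check and panic;
              // the non-panicking form returns an Option instead.
              assert_eq!(data.get(10), None);

              // The lock itself happens at runtime; the type system only
              // guarantees the data can't be reached without taking it.
              let counter = Mutex::new(0);
              *counter.lock().unwrap() += 1;
              assert_eq!(*counter.lock().unwrap(), 1);
          }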

  • boardwaalk a year ago

    Context isn't your opinion.

  • staticassertion a year ago

    How is this context? Also how is it "wise" to get the definition of "safe" wrong while acting like a pedant?

  • darthrupert a year ago

    I wonder if he'll end up regretting opening this particular Pandora's box or will things stabilize eventually.

    • detaro a year ago

      What makes you expect it might not stabilize?

      • darthrupert a year ago

        Because Rust people tend to be a bit extreme. But perhaps Rust people who are also kernel programmers are less so.

    • thrown_22 a year ago

      If everyone who wrote a blogpost about how rewriting thing x in Rust would be amazing actually rewrote x in Rust it would already be the most popular language ever.

  • miohtama a year ago

    I would rather believe in Rust than Santa Claus.

rat9988 a year ago

I am the only one who would have a hard time collaborating on a project where the "collaborators" start their message with

> You need to realize that

> (a) reality trumps fantasy

?

  • pfortuny a year ago

    Linus is not a “collaborator”.

    Sometimes you need to stress the difference between “my opinion” (as in “Kernel development requires greater safety standards”) and “facts” (“safety does not exist in software because it does not control the hardware aspect”).

  • detaro a year ago

    > I am the only one [...] ?

    Have you missed the years and years of people criticizing Linus for his communication style?

  • mynameisvlad a year ago

    This is also the “toned down” version of Linus.

    If he were any other person, he’d have been axed a long time ago for this behavior.

    I don’t understand how people put up with this kind of toxicity, even from him.

    • blueflow a year ago

      I bear this because in my opinion, software development has insufficient regulation and standards. I've seen enough bad software that I'm glad that someone is enforcing some standards regarding worst-case behavior, because this is something that is neglected too often.

      It's a bit sad that Linus needs to replicate individually what other engineering disciplines are mandated to do by regulations. Look at car, train, or aviation safety; they are decades ahead.

      • V_Terranova_Jr a year ago

        It's really more about rigor and the actual practice of engineering (e.g., tracking requirements and verification of implementation against these requirements, verifying systems & subsystems meet correctness invariants, etc.) more so than regulation. Most software development is craftsmanship, including the Linux kernel. It's certainly possible to practice software development as engineering, but it's not common.

        Source: aerospace engineer with a flight sciences background, and also software reviewer for flight systems.

  • mslm a year ago

    Don't see what's wrong with it. He's leading one of the most high-profile and complex software systems in the world, with billions of users. If he played softball all the time it'd be a mess. You have to be straightforward sometimes, or else people take too long to get the point.

  • curt15 a year ago

    Seems pretty mild to me. Where is the personal attack?

  • zo1 a year ago

    Then it's probably not the place for you, or people with thin skin that don't want to deal with that kind of tone/communication. If they lose out on talent because of it, that's their loss. Not every project has to be perfect and all-inclusive to the entire world of developers, and I'm okay with that.

    • V_Terranova_Jr a year ago

      It's a false dichotomy to say you either have to be "all-inclusive to the entire world of developers" or you should applaud project leaders throwing out useless condescending rants with no real technical content like:

        And the *reality* is that there are no absolute guarantees.  Ever. The "Rust is safe" is not some kind of absolute guarantee of code safety. Never has been. Anybody who believes that should probably re-take their kindergarten year, and stop believing in the Easter bunny and Santa Claus.
      
      What's going on here is not "woke people" trying to protect every little snowflake's feelings, rather it's noting that the ranter is making himself feel good at others' expense with no other value added. His rants are completely superfluous to the substantive technical dialogue.

    • mynameisvlad a year ago

      It’s crazy to think that advocating for reasonable, non-toxic people to work with receives this kind response.

      Inclusivity and non-hostile work environments should not be considered “perfect” and “all-inclusive”. They should be basic. The default. The lowest bar possible.

      • mid-kid a year ago

        Just as many people have problems with this form of communication, many people find it hard to clearly express themselves in an environment where they're expected to put people's feelings above all else. What some might perceive as "hostile", others would simply call "honest". It's simply a matter of preference.

        • mynameisvlad a year ago

          No, it’s not “simply a matter of preference” when it causes someone else to be in a hostile work environment. At that point, you’re affecting the lives of those around you. Sure, some people have medical conditions that might prevent them from seeing this, but even they put in an effort to be better. It stops being a “preference” when it actively hurts those around you.

          If you are an asshole, are known to be an asshole, have no intention of changing that, and are working with others… maybe don’t. You’re free to work alone, but why make people around you miserable by having to deal with you? Go be an asshole to yourself and let everyone else work together.

          It’s shocking that advocating for safe and inclusive work environments is such a controversial topic. If he were any other person, his behavior would be quashed in a second.

          • throw827474737 a year ago

            > It’s shocking that advocating for safe and inclusive work environments is such a controversial topic. If he were any other person, his behavior would be quashed in a second.

            It is not, because that is exactly the problem: what's your view of safe and inclusive is, to some, hostile and exclusive. And until people realize that this extreme creates similarly well-behaved assholes: never mind.

          • Vt71fcAqt7 a year ago

            >You’re free to work alone, but why make people around you miserable by having to deal with you? Go be an asshole to yourself and let everyone else work together.

            You, and the billions of people using Linux, are free to fork the code and exclude Linus completely.

            >It’s shocking that advocating for safe and inclusive work environments

            This isn't a "work environment" in the way you seem to be implying. The vast majority of people contributing to the kernel do not work for Linux or for Linus.

          • LunaSea a year ago

            Aren't other people joining his project in this case?

            • mynameisvlad a year ago

              Would it make it more ok if people were?

              How many people are joining? How many people are joining because or in lieu of Linus? How many people are joining just because it’s Linux/Git/whatever (although granted that is in part due to Linus making them such big things)? How many people would have joined/wouldn’t have left if he wasn’t there?

              • LunaSea a year ago

                That doesn't matter.

                It is his project and people are free to join or start their own.

                • mynameisvlad a year ago

                  Of course it matters. There are many reasons why people might join one of his projects, many of which don’t involve him but instead the project itself. His presence might have stifled or grown involvement in those projects.

                  • LunaSea a year ago

                    And that is completely fine.

                    People don't have a universal right to collaborate to this project, especially on their own terms.

                    In the same way, these projects will, like you said, evolve in positive or negative ways, with no God-given right to exist and thrive.

        • nickm12 a year ago

          The choice between condescension to your collaborators and putting people's feeling "above all else" is a false dichotomy. You can disagree on technical merits without making it personal and calling your collaborators kindergarteners who believe in Santa Claus.

      • throw827474737 a year ago

        Nope, imo this extreme also excludes many average folks who are not so overly politically correct. I have the feeling we have overshot peak inclusion and are excluding the not-so-well-behaved folks (or simply people who have other problems, or who don't want to play these games).

        Personally I find Linus here not toxic at all, at most a borderline strong opinion. But come on: as much as we all need to be more empathetic, we should also be able to take some harsher critique and not make such a toxicity thing out of a more open and direct opinionated response...

      • ectopod a year ago

        Which is worse?

        a) Having poor communication skills.

        b) Describing people as toxic.

        • mynameisvlad a year ago

          If the shoe fits.

          Do you have any actual counter points or were you planning on beating that ad hominem to death?

        • ajkjk a year ago

          Well the latter isn't a bad thing so the former, I guess?

      • zozbot234 a year ago

        So an "inclusive" and "non-toxic work environment" should put fantasy above reality? There's nothing inherently toxic in Linus' message; he's making a technical point about how kernel code should be designed, to deal effectively with the sometimes complex and challenging reality of low-level systems.

        • nickm12 a year ago

          He is implying that his collaborators are kindergarteners who believe in Santa Claus. It's insulting.

          • zozbot234 a year ago

            He's doing no such thing. He's saying "surely you're well past kindergarten and don't believe in Santa Claus, so why would you ever believe this?" It's a valid argument.

      • zo1 a year ago

        I am a nice person and consider myself fair to everyone I work with, everyone gets a fair shot and a clean slate with me. But having to jump through verbal hoops to make my interactions "inclusive" and what these people would call non-hostile is downright hellish for me and way more effort than I think is reasonable. It's not inclusive towards me and is downright hostile towards me.

      • lelanthran a year ago

        You're correct, it should be the default.

        But why are you complaining that a group of people who don't want to work in your default environment went off and created their own?

        I don't understand what you have to complain about: they have their way of working and you want to change that because it offends you?

        Sounds like you're the problem, not them.

        • V_Terranova_Jr a year ago

          Sounds like he's calling them out on an internet forum that they are also free to ignore. Doesn't sound like he's the problem.

          • lelanthran a year ago

            > Sounds like he's calling them out on an internet forum that they are also free to ignore. Doesn't sound like he's the problem.

            Well, I don't go around pointing out how random groups, formed by like-minded people voluntarily, are doing collaboration "wrong".

            If I did, on some random internet forum, complain that the local Street Rod Enthusiasts Club[1] doesn't do proper agendas for their meetings, or that a book-reading club[1] that I know of isn't properly structured, or that the volunteer SPCA group is using the wrong IM/Chat software to communicate .. well, then I'm the problem.

            [1] That I have no intention of joining

      • niscocity35 a year ago

        Are you calling Linus unreasonable and toxic?

      • aaaaaaaaaaab a year ago

        >work environments

        It’s not a “work environment”. You can’t report Linus to HR. If you have a problem with him, you can fork the kernel and convince others to follow you. Then you’ll have a mailing list where you can ban Linus for his style. Good luck!

        • mynameisvlad a year ago

          Yes because if he were at any company, he’d have been fired. Decades ago.

          Just because it’s not an official “work environment” per your definition does not mean it isn’t hostile or intolerable were it actually one.

          But actually countering that point is a lot harder, isn’t it?

          • krater23 a year ago

            Ok, it's intolerable to you how he communicates on the Linux Kernel Mailing List. Are you even subscribed to the LKML? Are you a kernel developer? Are you a developer at all? Who are you to tell a community how they have to communicate? Why should your opinion matter to this community?

          • throw827474737 a year ago

            > Yes because if he were at any company, he’d have been fired. Decades ago.

            That's a pretty intolerable outright hostile and exclusive judgement :(

          • _dain_ a year ago

            > Yes because if he were at any company, he’d have been fired. Decades ago.

            And then there would be no Linux kernel. So much for companies.

          • aaaaaaaaaaab a year ago

            >does not mean it isn’t hostile or intolerable were it actually one

            It feels hostile and intolerable to you.

            There are many people who find the risk-averse non-confrontational corpspeak intolerable.

  • _dain_ a year ago

    I'd prefer him to say this bluntly in five words rather than five hundred words of prevarication and passive aggression and doubletalk that amounts to the same thing, but that nominally adheres to some CoC speech code.

    I will go further: if you think what Linus said here is unreasonable or rude, you really need to get out more.

  • SanjayMehta a year ago

    You need to realise that

    (a) that's the creator of Linux (b) see (a) above

    • mynameisvlad a year ago

      That shouldn’t excuse him from being a reasonably decent person to work with.

      He gets a lot more leeway than being the creator of Linux should afford someone.

      • SanjayMehta a year ago

        Unreasonable people build things. Reasonable people run meetings.

        • mynameisvlad a year ago

          Reasonable people also build things.

          Unreasonable people also build things alone where everyone else doesn’t have to deal with them.

          The world is hardly as black and white as you make it seem.

          • SanjayMehta a year ago

            There are 10 kinds of people. Those who understand binary and those who don't.

          • P5fRxh5kUvp2th a year ago

            Can you imagine seriously making the claim that Linus built Linux alone.

            • mynameisvlad a year ago

              Who made that claim? I certainly didn’t.

              Just as the parent comment generalized about the two kinds of people out there, I added other examples of generalizations about people. But that’s all they are, generalizations. Not specific examples.

        • KerrAvon a year ago

          I have no opinion on Linus’s behavior here, but that is both false and toxic. Steve Jobs only became truly effective when he learned to be reasonable in appropriate contexts. People who remain unreasonable all of the time crater their companies in the long run. Every time.

        • SanjayMehta a year ago

          Looks like I've annoyed the "reasonable" people with this comment. Interesting.

  • pyb a year ago

    It used to be way worse than that, but yes you're right.

  • Ecco a year ago

    You’re of course not the only one. That being said, it’s probably not that big of a deal for most people: there are countless other open source kernels, but Linux is by far the most popular, including in number of contributors.

  • aaaaaaaaaaab a year ago

    You are free not to collaborate on the Linux kernel. That’s the beauty of free software!

Tomte a year ago

With all the recent "Rust in the kernel" work, I actually wondered about culture clashes. I mean, most kernel developers aren't Rust programmers (and vice versa).

Now we got a first glimpse at what happens.

Still, I find it strange that it never seemed to come up in preparation for the first Rust merges. Were there any conflict resolution strategies in place (that I don't know about) or just "we flame it out on LKML"?

  • jmillikin a year ago

    I think this is more "[modern] userspace vs kernel" than "Rust vs kernel".

    If you dig slightly below the surface in any major userspace codebase, it has abort paths everywhere. Every memory allocation might abort, every array index or dict lookup might throw an exception, which if uncaught will abort. Lock (or unlock) a mutex twice, abort.

    The Rust standard library inherited this philosophy in large and small ways. An easy example (already being addressed) is memory allocation, but less obvious is stuff like "integer math is allowed to panic on overflow". It's not easy to write Rust code that is guaranteed not to panic in any branch.
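
    As a contrived userspace-style sketch (names and values made up) with the implicit panic/abort paths called out:

        use std::collections::HashMap;

        fn total(prices: &HashMap<String, u32>, name: &str, qty: u32) -> u32 {
            let unit = prices[name]; // panics if `name` isn't in the map
            unit * qty               // panics on overflow in debug builds
        }

        fn main() {
            // Every Vec/HashMap allocation below may abort the whole
            // process if the allocator fails.
            let mut prices = HashMap::new();
            prices.insert("widget".to_string(), 3);

            let quantities = vec![10u32, 20, 30];
            let qty = quantities[0]; // panics if the Vec were empty

            println!("{}", total(&prices, "widget", qty));
        }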

    Now the userspace-trained Rust folks are working in the kernel, and they want to be able to panic() when something goes horribly wrong, but that's not how the kernel code works. They'd have the same issue if you tried to get a bunch of GNOME contributors to write kernel drivers with GLib, even though GLib is pure C.

    • mustache_kimono a year ago

      > they want to be able to panic() when something goes horribly wrong

      I'm not sure where you are getting this from the thread? Linus is using a userspace panic as an example. That's not something that is actually happening in the kernel?

    • veber-alex a year ago

      > "integer math is allowed to panic on overflow"

      This is configurable; by default, with optimizations on, math overflow doesn't panic in Rust, it wraps around.

      Obviously the kernel won't enable panics here unless in debug mode.
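
      And when a specific behaviour is needed regardless of build profile, the explicit integer methods spell it out:

          fn main() {
              let x: u8 = 255;

              assert_eq!(x.wrapping_add(1), 0);     // always wraps
              assert_eq!(x.checked_add(1), None);   // overflow as a value
              assert_eq!(x.saturating_add(1), 255); // clamps at the max

              // Plain `x + 1` depends on the profile: it panics when
              // overflow-checks are on (the default in debug builds)
              // and wraps otherwise.
          }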

  • ksec a year ago

    I still believe this is only the tip of the iceberg in terms of culture clashes.

kweingar a year ago

Classic Linus.

From the closing paragraph, I feel like he’s under the impression that Rust-advocating contributors are putting Rust’s interests (e.g. “legitimizing it” by getting it in the kernel) above the kernel itself.

  • magicalhippo a year ago

    > I feel like he’s under the impression that Rust-advocating contributors are putting Rust’s interests (e.g. “legitimizing it” by getting it in the kernel) above the kernel itself.

    I mean the post Linus initially responded to did contain[1] a patch removing a kernel define, asking if anyone had any objections over removing that define, just to make the resulting Rust code a little nicer looking.

    [1]: https://lkml.org/lkml/2022/9/19/640

  • sidlls a year ago

    They probably are, in many cases. Rust’s community, in aggregate, have developed a reputation (earned, in my opinion). It’s too bad that the community don’t follow the leaders’ example in this regard. There are some quality, level-headed Rust advocates. They appear to be the minority.

    • mlindner a year ago

      Oh please. Stop smearing Wedson all over the map.

    • mustache_kimono a year ago

      At least they don't go around slandering programming language communities.

      If we're going to be serious about who is being toxic, it's definitely Linus in this thread. Guy makes first mistake (by a very broad interpretation of "mistake". Perhaps "misunderstanding"?). Linus goes nuclear. And while his reasoning is sound, his argumentation cycles between threats, bad-faith arguments, and just plain old yelling.

      What some people don't understand is that the Linux kernel isn't 'led' in any meaningful sense. But I suppose some projects don't need actual leadership? I once was recommended a Metallica documentary, because "It's amusing to see what emotionally stunted 40-50 year olds who have never had anyone tell them 'No' since 18 will do." That's the Linus vibe -- somehow we've limped along to here. Seriously, read the rust/rust-lang issues/RFCs. Those people sound like grownups contrasted to this.

      • shepardrtc a year ago

        > Linus goes nuclear. And while his reasoning is sound, his argumentation cycles between threats, bad-faith arguments, and just plain old yelling.

        In my opinion, in the software world, there is a large number of people who are very convinced of their own correctness. When they do something wrong or are simply mistaken, a gentle correction doesn't work. Linus is probably used to dealing with these people. I'm not saying the person he was replying to was necessarily doing that, but after a while you have an automatic response.

        The beauty and horror of OSS is that anyone can contribute. Having someone scream "WTF are you doing???" every once in a while isn't a bad thing. It's not nice to hear that being directed at you, but sometimes in life it is necessary.

        • mwcampbell a year ago

          In light of this comment, one thing that makes me nervous about leading my own open-source project is that there might not be anyone who is willing to scream "WTF are you doing???" at me when I make a bad design decision.

          • acjohnson55 a year ago

            Do you not have faith in yourself to receive feedback on your design if someone provided it in a less aggressive way?

        • Ar-Curunir a year ago

          > In my opinion, in the software world, there is a large number of people who are very convinced of their own correctness. When they do something wrong or are simply mistaken, a gentle correction doesn't work.

          Too bad there was no one around to do that to Linus; maybe he'd finally realize that being an asshole is generally not a correct response.

          • mustache_kimono a year ago

            I think this is generally correct. The argument is "Linux is extraordinarily successful" but I think the counter is just as powerful "How many great features have not been implemented, because people have avoided working on the kernel or simply burnt out?"

      • chrsig a year ago

        ...this is not at all Linus going nuclear. I don't see any threats or 'yelling'. He could have been more diplomatic, and I think Linus was actually trying to be. Diplomacy isn't his strongest suit. I don't think the first comment in his reply was necessarily appropriate, because it was directed at the person rather than the problem. I can also understand not wanting to mince words and establish a very firm boundary so it doesn't become a perennial conversation.

      • kweingar a year ago

        > At least they don't go around slandering programming language communities.

        Not so sure about this. I see a good amount of acrimony toward C, C++, Go, Zig, etc. from the Rust side.

        • mustache_kimono a year ago

          I think "acrimony" among languages (not language communities) is fine, like "you don't have an Option type?" or "you don't guarantee I won't have a use after free?". I think saying the "The Go/Rust/Zig community is uniquely toxic" crosses a line. And, to be very clear, if Rust people do it, I think it's awful as well.

          • sidlls a year ago

            It’s not unique to Rust’s community at all. I think they have relatively more visibility in places like HN, currently.

        • timeon a year ago

          > Zig

          In this case it seems to be mutual. Even open hostility from some leading members of the Zig community. Which is a shame because these two languages could nicely coexist.

      • znpy a year ago

        > Linus goes nuclear.

        By his own standards he’s been very polite and calm. Remarkably so I’d say.

        He used to be way ruder in the past, then decided to work on that and be kinder.

        You can clearly see that in those emails.

        The fact that he doesn’t agree with somebody and articulates why doesn’t mean he’s rude.

      • throw827474737 a year ago

        Guy makes a mistake and even after that continues the discussion, though he has already heard the agreed-upon argument?

        What I don't understand is this: if the Linuxrusters want to do their own thing and get rid of those rules and discussions, why don't they just fork off a real Linuxrustkernel and go off on their own?

        The "politically correct" toxicity comes from that group which continuously wants to undermine frontiers agreed upon long beforehand... (e.g. again this panic-is-more-safe discussion).

        • mustache_kimono a year ago

          > The "political correct" toxicity comes from that group which continously wants to undermine long beforehand agreeds frontiers

          I have no idea what this actually refers to? "Panic is more safe"? Rust doesn't choose to panic on a failed memory allocation in the kernel, and never intended to. It was always TODO until it was implemented? Linus is using a userspace panic as an example here?

          As to this thread's particular issue, the API for an allocation wasn't settled, and this is the discussion. I think the contributor was completely within his remit to say, "Heck, we could do this in a more memory-safe way..." And Linus was completely right to say "Yeah, that's not how we do allocations here." The only problem is thinking being a dick is a good way to lead a community.

          I think some fantasize about being able to be a dick in a FOSS project just like Linus (which feels like "if only I were a strongman dictator"), and I think that's an absurd desire. The Linux kernel is sui generis. In no other area of the world can anyone act this way and be productive.

      • turtleyacht a year ago

        It might be difficult to back out kernel changes versus userspace changes. App-level concerns with leaky abstractions could follow functional programming, immutable state, fail-fast, and all sorts of gospel--but there's still a kernel doing stuff behind the scenes.

        If the kernel acquiesces to certain philosophies that are opposite to its intent as-a-kernel for many other environments and contexts it must support, a cascade of later patches could derail things completely. It may become too much effort to undo, and the project must limp along--until that mountain of tech debt costs too much to fix.

        Maybe the kernel cannot fail fast for good reasons. And the Linux project cannot fail fast for equally good reasons.

        And possibly, if a technically compelling reason presents itself, Linus may fully back it--even contributing to that work himself.

      • h2odragon a year ago

        Threats? Slander? Do you feel that you're "speaking for a community" here?

        • mustache_kimono a year ago

          Heck no! And I can't imagine anyone thinking I was?

          The threat is pretty clear? "If Rust people don't get this, we will have to part ways." This is an ultimatum? It's crazy girlfriend/boyfriend material? It's ridiculous after one contributor tries something that Linus thinks won't work in the kernel. Ridiculous. Just say no.

          The slander as well? "Rust’s community, in aggregate, have developed a reputation." And you know what? The C/C++/Zig/Nim/Haskell/Clojure communities have developed a reputation too, but, gosh, I don't talk about it because I know labeling groups isn't helpful/is completely non-technical.

          • topspin a year ago

            > "If Rust people don't get this, we will have to part ways."

            What are you quoting? I don't see this anywhere in the thread.

            The nearest I see is:

                If you cannot get over the fact that the kernel may have other
                requirements that trump any language standards, we really can't work
                together.
            
            A reasonable, politely delivered statement directed at an individual, as opposed to Rust. It was in response to this rather cringy bit of lecturing:

                No one is talking about absolute safety guarantees. I am talking about
                specific ones that Rust makes: these are well-documented and formally
                defined.
            
            Rust has no formal language specification yet. It's still "an area of research," to paraphrase what is said when the question is asked. No defined memory model either; from the current Rust reference:

                Rust does not yet have a defined memory model. Various academics
                and industry professionals are working on various proposals, but
                for now, this is an under-defined place in the language.
            
            One could argue (not me; I'm far too pragmatic for such things) that Linus is being exceptionally generous in entertaining Rust in its current state.

            • mustache_kimono a year ago

              FWIW, I actually mostly agree with Linus.

              I was paraphrasing. I didn't want to write a page length comment, and won't here, but there were a few more instances of similar ultimatums (like "Or, you know, if you can't deal with the rules that the kernel requires, then just don't do kernel programming.") And all are similarly ridiculous/dickish. Really no need for such dramatic convulsions, Linus, where Wedson was simply trying to explain the API expectations of the Rust language.

              Re: the rest, I think you are conflating Rust's UB guarantees with a specified memory model.

              • topspin a year ago

                > I was paraphrasing.

                You put it in quotes and didn't mention any paraphrasing. Linus didn't write it.

                > Rust's UB guarantees

                Can you point out the normative document that provides these guarantees? Rust doesn't have one as far as I know.

                • mustache_kimono a year ago

                  > You put it in quotes and didn't mention any paraphrasing. Linus didn't write it.

                  I think it's a fair characterization of what was said. Feel free, as everyone is, to read the entire thread again. I'm not a journalist. You have the primary source at your fingertips!

                  > Can you point out the normative document that provides these guarantees?

                  You're looking at the Rust reference right? https://doc.rust-lang.org/reference/behavior-considered-unde...

                  • topspin a year ago

                    > You're looking at the Rust reference right?

                    Not normative, as stated here[1], linked from the page you cite.

                    [1] https://doc.rust-lang.org/nomicon/index.html

                    • mustache_kimono a year ago

                      Okay? Do you think you have quibbled enough? To be clear, I still think it's fine for Wedson to inform him even if the document is not a normative reference/specification? Even if these are just the expectations of API/Rust users?

                      • topspin a year ago

                        > the expectations of API/Rust users?

                        Pointing out whatever those are is fine. Linus pointing out the expectations of the Linux kernel is fine too, and no amount of invoking fictional formalisms trumps them.

                        • mustache_kimono a year ago

                          I 100% agree. And if you read my comments you'd realize, I agree with Linus on the substance. I think the way he said it was dick-ish. That's it!

                  • topspin a year ago

                    > I think it's a fair characterization of what was said.

                    I think inventing Linus quotes is unfair.

                    • mustache_kimono a year ago

                      Again, not a journalist? You/everyone are supposed to have read the primary source, as it's the linked subject of our discussion. I think whatever expectations of fairness we have for internet comments -- I have far exceeded them. And now we have your comment pointing out... whatever it is you wanted to point out. Reader beware!

          • sidlls a year ago

            The difference is these communities don't have advocates spamming every technical thing they can find, crapping on everyone else and proclaiming their favored language/tech to be indisputably superior in every case.

            • mustache_kimono a year ago

              If you don't think the Rust "community" receives the same sort of spam arguments from anti-Rust folks, then you're kidding yourself. And, yes, I completely understand that such arguments are super annoying. But the wrong response in my view is to answer with another stupid argument ("The Rust community is the problem.")

              Was Wedson acting in an untoward way here that in some way exemplifies something significant about the Rust community? No, not really. So, yeah, I think your comment above is a pointless low blow, cheap shot, an excuse to act nasty about some super annoying Rust comment you probably read months ago. And it just sounds like whining to me.

            • V_Terranova_Jr a year ago

              Extraordinary claims ought to be supported by extraordinary evidence. Those are some serious accusations you make.

  • mlindner a year ago

    You're completely wrong here. There is no push to "legitimize" Rust by getting it into the kernel. A lot of people want to actively write drivers for Linux without having to use C to do it.

    Trying to tweak the kernel to make integration easier in a supposed non-harmful way doesn't harm anything.

    • yencabulator a year ago

      Linus is specifically saying the proposed "tweak" is not desirable.

  • aaaaaaaaaaab a year ago

    Is he wrong?

    • kweingar a year ago

      Regardless of whether he’s right or wrong, I think that this is somewhat natural and is to be expected.

      Like with any emerging technology, early adopters become advocates because they’re convinced of the technology’s superiority. Once they organize into a community and get to know each other personally, then at least some of the motivation shifts: you want to see your friends succeed, you want to be part of a community that is making change, you want your early adoption to be “validated” by mainstream success, etc.

      This can cloud technical judgment (not saying this is happening here, but if it were, it wouldn’t be surprising)

    • bitexploder a year ago

      Does it even matter? Rust does what it does and it was enough of a benefit to include in the kernel. It is a big accomplishment.

  • lbhdc a year ago

    I was under the impression that was the reason for the push.

tcfhgj a year ago

> Not completing the operation at all, is not really any better than getting the wrong answer, it's only more debuggable.

I wouldn't be so sure about that. Getting the wrong answer can be a serious security problem. Not completing the operation... well, it is not good, but that's it.

  • atty a year ago

    The kernel can’t fail to complete its operations, because then the entire system crashes and no logs are created. Instead, you can finish the operation and check the result.

    • chlorion a year ago

      The kernel can't panic and display an error message, but corrupting itself and deleting valuable data or allowing people to execute arbitrary code (possibly remotely) is okay?

      I really have a hard time understanding how anyone could possibly think that's okay.

      It sounds like the kernel's quality is so poor that UB is commonplace and even expected at this point. Pretty scary how many systems are relying on this huge pile of broken C code to hopefully only slightly corrupt itself and your system.

      I'm not even sure how useful Rust in the kernel is going to be considering they want it to just ignore errors. You can't even have bounds checking on arrays because invalid accesses might be detected at runtime and cause an error, which is totally insane.

    • charcircuit a year ago

      panic doesn't instantly crash the program. It prints out debug information first. You could have kernel panics work the same way.
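
      For illustration, a minimal userspace sketch of that idea using std's panic hook (the kernel's mechanism is different):

          use std::panic;

          fn main() {
              // Run a hook before the default panic behavior, so diagnostics
              // are emitted even if the process then unwinds or aborts.
              panic::set_hook(Box::new(|info| {
                  eprintln!("panicked: {info}");
                  // A real system would flush logs or notify a supervisor here.
              }));

              let v: Vec<i32> = Vec::new();
              let _ = v[0]; // out-of-bounds index: hook runs, then the panic proceeds
          }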

      • Someone1234 a year ago

        And when Linux is running on your fridge, in your car, or on a headless VM, then who is there to read this "printed output"? The great thing about "log and continue" is you can automate collection and fix the underlying bug (or know that the hardware is failing).

        Keep in mind that in a kernel panic no hardware is assumed to work, so an assumption like "just write to storage!" isn't one you can make; you're in a panic, and the I/O device could literally have been pulled out.

        • charcircuit a year ago

          >Keep in mind that in a kernel panic no hardware is assumed to work

          So just change that assumption since for these edge cases that is an incorrect assumption.

      • wtallis a year ago

        Printing debug information to the kernel log then immediately triggering a kernel panic is not as useful as it sounds, because that approach will quite often result in that debugging information never reaching a display or any kind of persistent storage.

      • secondcoming a year ago

        Prints it out how? If the kernel has crashed, how do you guarantee anything gets printed, whether to the screen, a tty, or a log file?

  • alerighi a year ago

    > Not completing the operation... well, it is not good, but that's it.

    Depends on what the operation is. If the operation is flying an airplane or controlling a nuclear reaction, you can be sure that not completing the operation and just aborting the program is the worst outcome possible. Besides, the error can crash the plane or melt down the nuclear reactor, but it may also have no effect at all, e.g. a buffer overflow that overwrites a memory area not used for anything important.

    Of course these are extreme examples (for which Linux is of course out of the discussion, since it doesn't offer the level of safety guarantees required), but we can make other examples.

    One example could be your own PC. If you use Linux, take a look at the dmesg output and count the number of errors: there are probably a lot of them, for multiple reasons. You surely want your system to continue running, and not panic on each of them!

  • atoav a year ago

    I mean, if it is a cosmetic thing, sure. If it has substantial meaning I would rather have that 5 ton robotic welding arm not move than have it move through my skull.

    It is sometimes acceptable to get wrong output. But it is nearly always better to know it is wrong.

    • fritolaid a year ago

      Unless it was holding a welding gun and stopped on one spot with the welding flame turned on instead of gracefully turning off the flame and backing away.

      Never used Rust before but is there a way to supply some default code to run in such a situation instead of just not carrying out the bad operation?
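
      For the Rust question: yes, the usual way is to reach for the fallible accessor and supply the fallback yourself. A rough sketch of the shape (nothing welding-specific about it):

          fn main() {
              let readings = [20.0_f32, 21.5, 19.8];
              let idx = 7;

              // Instead of panicking on a bad index, run fallback code:
              // log the problem and use a known-safe default value.
              let value = readings.get(idx).copied().unwrap_or_else(|| {
                  eprintln!("index {idx} out of bounds, using safe fallback");
                  0.0
              });

              println!("using reading: {value}");
          }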

    • analognoise a year ago

      This sounds like the difference between "fault tolerant" and "fail safe".

      Fault tolerant - you get a fault, you keep moving.

      Fail safe - you fail, and thus all operations are stopped.

      • gmueckl a year ago

        Failing may require triggering some actions actively. Going inert is not the right way in many cases. Some systems absolutely require best efforts in the face of failure. A fire alarm in an otherwise secure and locked down facility may have to trigger the opening of door locks, for example.

      • atoav a year ago

        I mean, the Rust appeal is actually that it forces you to handle errors. Whether you then fail or not is your decision. What Rust usually does not do is just fail.

        This is good for when the things you are using can error, e.g. when you use an arbitrary Unicode string as a filename you might get an error, because depending on the OS there may be characters that are valid Unicode but cannot be used in filenames (or the other way around: possible filenames that are not valid Unicode).

        In most programming languages this is something you need to know about in order to catch it. In Rust this is an error that you can choose to handle or not. But you can't forget to deal with it.
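
        A minimal illustration of that shape, using std file I/O rather than filenames (same idea: the error is in the signature, and an ignored Result is at least a compiler warning):

            use std::fs::File;
            use std::io;

            // The io::Result in the signature forces callers to decide what
            // "could not open" means for them; they can't silently forget it.
            fn open_config(path: &str) -> io::Result<File> {
                File::open(path)
            }

            fn main() {
                match open_config("/etc/does-not-exist.conf") {
                    Ok(_f) => println!("opened"),
                    Err(e) => eprintln!("could not open config: {e}"),
                }
            }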

  • remram a year ago

    Not completing the operation is also a security issue, commonly called denial of service (DoS).

  • 2OEH8eoCRo0 a year ago

    True- which is why he says to throw a warning first.

tialaramex a year ago

I don't think I buy Linus' high-level claim. It is not necessarily better to press on with the wrong answer; in some cases failure actually is an option, and might be much better than "oops, we did it wrong".

This morning I was reading the analysis of an incident in which a London tube train drove away with its doors open. Nobody was harmed, or even in immediate danger; the train had relatively few passengers, and in fact they only alerted the driver at the next station, classic British politeness (they made videos and took photographs, but they didn't use the emergency call button until the train got to a station).

Anyway, the underlying cause involves systems which were flooded with critical "I'm failing" messages and would just periodically reboot and then press on. The train had been critically faulty for minutes, maybe even days, before the incident, but rather than fail and go out of service, the systems kept trying to press on. The safety systems wouldn't have allowed this failed train to drive with its doors open - but the safety-critical mistake of disabling the safety systems and driving the train anyway wouldn't have happened if the initial failure had caused the train to immediately go out of passenger service instead of limping on for who knows how long.

  • theptip a year ago

    I feel like the OP really suffers from quoting Linus out of context. This is many messages deep in a thread about automatically detecting atomic contexts in the allocator.

    And I don’t think he’s making a system level claim, that the whole train system should be designed to limp on through failures. He’s claiming that the kernel needs to be able to limp on so that the systems that use it can have the best chance of e.g. sending automated bug reports. (Or you can turn off the limping behavior if you want; maybe trains should do that. But maybe a train’s control system randomly rebooting might be more catastrophic than leaving its doors open? I don’t know.)

    From a couple messages up-thread in the OP:

    > … having behavior changes depending on context is a total disaster. And that's invariably why people want this disgusting thing.

    > They want to do broken things like "I want to allocate memory, and I don't want to care where I am, so I want the memory allocator to just do the whole GFP_ATOMIC for me".

    > And that is FUNDAMENTALLY BROKEN.

    > If you want to allocate memory, and you don't want to care about what context you are in, or whether you are holding spinlocks etc, then you damn well shouldn't be doing kernel programming. Not in C, and not in Rust.

    > It really is that simple. Contexts like this ("I am in a critical region, I must not do memory allocation or use sleeping locks") is fundamental to kernel programming. It has nothing to do with the language, and everything to do with the problem space.

    > So don't go down this "let's have the allocator just know if you're in an atomic context automatically" path. It's wrong. It's complete garbage. It may generate kernel code that superficially "works", but one that is fundamentally broken, and will fail and becaome unreliable under memory pressure
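
    To make the contrast concrete, here is a purely hypothetical Rust-side sketch (the names Flags, GFP_KERNEL, GFP_ATOMIC, and try_alloc are illustrative, not the actual kernel bindings); the point is that the caller states the allocation context explicitly instead of the allocator guessing:

        // Hypothetical API: the caller must say which context it is in.
        pub struct Flags(u32);
        pub const GFP_KERNEL: Flags = Flags(0); // may sleep: only outside atomic context
        pub const GFP_ATOMIC: Flags = Flags(1); // never sleeps: usable under a spinlock

        pub fn try_alloc(len: usize, flags: Flags) -> Option<Vec<u8>> {
            // A real binding would pass `flags` down to the kernel allocator;
            // this sketch only shows the shape of the interface.
            let _ = flags;
            let mut buf = Vec::new();
            buf.try_reserve(len).ok()?;
            buf.resize(len, 0);
            Some(buf)
        }

        // What the quoted message rejects is the same function *without* the
        // `flags` parameter, where the allocator tries to detect "am I in an
        // atomic context?" on its own.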

  • flumpcakes a year ago

    I think it's obvious that Linus is correct here.

    For example, say there's a bug in the Linux kernel that would produce a "panic" at midnight Dec 31st 2022... do we accept a billion devices shutting down? In the best case, rebooting and resuming whatever user space program was running?

    Despite the bad taste, I think the obvious answer is as Linus says: the Kernel should keep going despite errors.

    • maxbond a year ago

      A better analogy would be: Let's say if we have kernel A that contains a bug; we don't know when it will trigger or what it will do. We have another kernel, B, which has the same bug, but while we don't know when it will trigger, we know it will cause the device to halt. Which is the better kernel?

      I'd say B is nearly always the better choice, because halting is a known state it's almost always possible to recover from, while going into an unknown state may cause you to get hacked or to damage your peripherals. But if we were operating, say, a Mars rover, and shutting down meant we would never be able to boot again, then it'd be better to take kernel A and attempt to recover from whatever state we find ourselves in. That's pretty exotic, however.

      In the case of an unanticipated error in a software component, we always need input from an external source to correct ourselves. When you're the kernel, that generally means either a human being or a hypervisor has to correct you; better to do so from a halted state than an entirely unknown one. Trying to muddle through regardless is super dangerous, and makes your software component into lava in the case of a fault.

      • wtallis a year ago

        > But if we were operating, say, a Mars rover, and shutting down meant we would never be able to boot again, then it'd be better take kernel A and attempt to recover from whatever state we find ourselves in. That's pretty exotic, however.

        That you view it as exotic is partly a lack of imagination on your part; with a little more effort it's possible to identify similar use cases that are much closer to home than Mars.

        But that doesn't really matter. What matters is that the Linux kernel needs to support both options, because it's just one component in a larger system and that context outside the kernel is what determines which option is correct for that system.

        • maxbond a year ago

          > [W]ith a little more effort it's possible to identify similar use cases that are much closer to home than Mars.

          If you feel there are some that would add to this conversation, feel free to share them.

          • krater23 a year ago

            Your phone dies when you need to call 911. Your self-driving car dies when you're driving 120 km/h on the highway. Just two that took no effort to find.

  • pwinnski a year ago

    Linus' statements are applicable to the kernel only, and if we're using tube analogies, he was talking more about situations where the train is underway and something fails. The Rust way would be to panic, train stops in between stations and must be rebooted to continue. Linus was saying no, you carry on despite the error until you get to the next station. Much as the passengers in your story did.

    • tialaramex a year ago

      > The Rust way would be to panic, train stops in between stations and must be rebooted to continue.

      Which is safe. It's inconvenient, but it's safe. Failures of this sort do happen; electrical fires are probably the most extreme example. They're annoying, but nobody is at risk if you stop. Since the tube is in civilisation (even at the extreme ends of the London Underground which are outside London, like Chesham, this is hardly wilderness; you can probably see a house from where your train stopped if there aren't trees in the way) we can just walk away.

      https://commons.wikimedia.org/wiki/File:Chesham_Tube_Station...

      > Linus was saying no, you carry on despite the error until you get to the next station

      Depending on the error the consequences of attempting to "carry on" may be fatal and it's appropriate that the decision to attempt this rests with a human, and isn't just the normal function of a machine determined to get there regardless.

      • gmueckl a year ago

        Stopping a train in the tube between stations is not safe. You can't get off the train safely between stations. Most help can't reach a train stuck in a tube.

        • tialaramex a year ago

          Trains can be, and sometimes are, evacuated in a tunnel. The front (and rear, these trains are symmetrical) can be opened, converting into steps for able-bodied passengers to walk down to the tunnel floor.

          There's a video of passengers doing this for real in this 2016 news article:

          https://www.bbc.co.uk/news/uk-england-london-36716256

          • gmueckl a year ago

            Note the electrified third rail in the photos. It's not safe to walk there before that rail is disconnected.

eric4smith a year ago

“Rust is safe” is generally the same thing as saying “I like strongly typed languages”.

None of that is going to save us from bad code.

Some of the biggest systems that run the world are not written with either safe code nor strongly typed languages.

Yes I would say strongly typed languages and memory safe languages help make coding easier and indeed save time and some bugs.

But when you get past making the kinds of errors that cause memory problems or bad types…

You are still left with 95% of the bugs and logic errors anyway.

Still, 5% savings in productivity is not nothing.

  • Jweb_Guru a year ago

    Unfortunately for this theory, about 70% of C and C++ CVEs are memory safety issues, not 5%.

    • jstimpfle a year ago

      > 95% of bugs are logic errors

      > 70% of CVEs are memory errors

      No contradiction here.

coldtea a year ago

>And the reality is that there are no absolute guarantees. Ever. The "Rust is safe" is not some kind of absolute guarantee of code safety. Never has been. Anybody who believes that should probably re-take their kindergarten year, and stop believing in the Easter bunny and Santa Claus.

I thought that he had apologised and regretted being hostile in comments. Apparently not. Not that I have much of an issue with ranty colorful language, but you need to also be right and have a legitimate cause to pull it off...

The point he makes is BS. "the reality is that there are no absolute guarantees. Ever" Yeah, DUH! The compiler could have bugs and soundness issues for example.

The point is that you don't need "absolute guarantees"; "way safer, with dozens more classes of issues discovered automatically" is already enough. The other guy didn't write about "absolute guarantees". He said "WE'RE TRYING to guarantee the absence of undefined behaviour". That's an aim, not a claim that they've achieved it or that they can achieve it 100%.

>Even "safe" rust code in user space will do things like panic when things go wrong (overflows, allocation failures, etc). If you don't realize that that is NOT some kind of true safely, I don't know what to say.

Well, if Linus doesn't realize this is irrelevant to the argument the parent made and the intention he talked about, I don't know what to say...

  • ukweld a year ago

    Why are so many people criticizing Linus? This post strikes me as relatively moderate.

    Other software dictators do exactly the same, but in a more underhanded and bureaucratic manner, which is worse. Yet their disciples call them "benevolent".

    I can deal with Linus, but not with the latter. Linus strikes me as not being really serious or vindictive. It's just a colorful way of expressing himself.

    • peoplefromibiza a year ago

      because people of the modern age have to destroy everything that's good, to feel better about themselves, without having to actually be good.

      • coldtea a year ago

        Sort of how Linus pisses on Rust with a not-actually-good argument?

        • peoplefromibiza a year ago

          Linus is not pissing on Rust though, his argument is about panic in kernel code.

          Why people feel attacked by Linus' words is a mystery to me.

          • V_Terranova_Jr a year ago

            Not sure how mysterious it can be when he opens with a rant like:

              And the *reality* is that there are no absolute guarantees.  Ever. The "Rust is safe" is not some kind of absolute guarantee of code safety. Never has been. Anybody who believes that should probably re-take their kindergarten year, and stop believing in the Easter bunny and Santa Claus.
            
            This is needlessly talking down to competent developers as if they are deluded children. It's also not the only instance of it in the linked message. He would be far better off just going straight into the technical differences between what he is willing to permit in his kernel vs. what the Rust-oriented developers seek.

          • coldtea a year ago

            Yeah, a response saying "go back to kindergarten" etc, and people feel attacked? Such a mystery...

  • jmull a year ago

    > The point he makes is BS. "the reality is that there are no absolute guarantees. Ever" Yeah, DUH!

    You're calling his point BS, but also strongly agreeing with it.

    I guess you find it too obvious. But while it's obvious to many, there seem to be many who do not understand it. Issues involving Rust often get derailed into pointlessness when Rust's safety guarantees are treated as an absolute.

    • coldtea a year ago

      >You calling his point BS, but also strongly agreeing with it.

      I'm only agreeing with the fact that there are no absolute guarantees. Not that his use of the fact in the point he makes has any relevance...

      If somebody had said "The earth is round, therefore we should not care about getting lost or about GPS, because you can always keep going and end up where you started on a sphere anyway", then I would have also "strongly agreed" with the first factoid, but thought the overall point BS.

      • jmull a year ago

        The discussion is about what (if anything) the linux kernel should do to help satisfy the guarantees that rust wants to make.

        Accepting the idea that rust guarantees aren't necessarily always good is needed to accept the idea that those guarantees might need to be relaxed, or at least don't necessarily justify linux kernel changes.

        • coldtea a year ago

          >Accepting the idea that rust guarantees aren't necessarily always good

          The idea that rust guarantees aren't necessarily always good is completely orthogonal to a condescending diatribe about how "there are no absolute guarantees", and totally needlessly connected to "go back to kindergarten" and "stop believing in Santa" BS.

  • jstimpfle a year ago

    > The point he makes is BS.

    > Yeah, DUH!

    > Well, if Linus doesn't realize

    Bordering on the hypocritical... And I got the impression you missed his point as well.

    • coldtea a year ago

      >Bordering the hypocritical...

      Only if one can't separate a trivial factoid being used in an argument from the quality of the argument itself and the point being made...

      You know, you can agree that "there are no absolute guarantees" while still considering it a BS argument to use that fact to claim that the (non-absolute) guarantees Rust does give are somehow less useful...

      You can also hold that "there are no absolute guarantees", while true, has no place in an argument against using safer compilers...

      That's of the same quality as: "There are no absolute guarantees against dying in a crash, and safety belts don't give you any, so let's not use safety belts either."

  • peoplefromibiza a year ago

    > Anybody who believes that should probably re-take their kindergarten year, and stop believing in the Easter bunny and Santa Claus.

    In today's news "random angry guy on the Internet tells Linus Torvalds to go back to kindergarten, because reasons"

    • coldtea a year ago

      Actually Linus wrote the above part. It's a quote from his post.

      • peoplefromibiza a year ago

        Then there's a good chance that he was right.

        Harsh truths are still truths.

        • coldtea a year ago

          Yes, heads you win, tails we lose.

  • flumpcakes a year ago

    > "WE'RE TRYING to guarantee the absence of undefined behaviour". That's an aim, not a claim they've either achieved it, or they can achieve it 100%

    How is a "guarantee" not claiming something is 100% ?

    • coldtea a year ago

      Almost all real life guarantees involve things that are not 100% guaranteed.

      "I guarantee I will be there" - but I could always be hit by a bus, or have a very serious family issue to tend to, or an earthquake might happen, or the airports might be closed due to Covid and so on.

      "Our bank guarantees your money" - yeah, except if the global economy collapses, or the country is hit by an asteroid, or if there's martial law, and so on.

      The trivial such cases are irrelevant to the guarantees they want to offer (and same for Rust), and it's a bad move to point to them and consider them as part of his argument.

      Not to mention they're saying "trying to", not guaranteeing in the first place. Which acknowledges things like possible bugs, or some edge case not handled, etc.

  • galangalalgol a year ago

    I think he does make a good point about the wrong answer sometimes being better than a panic. I assumed rust in the kernel would be compiled with no-panic.
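
    For what it's worth, Rust itself has no blanket "no-panic" compile mode; freestanding (no_std) code instead supplies its own panic handler, and the environment decides what a panic means. A bare sketch of that hook (not the actual kernel handler):

        #![no_std]

        use core::panic::PanicInfo;

        // In a freestanding build the crate decides what "panic" means:
        // log and halt, log and keep going, call BUG()-like machinery, etc.
        // This sketch just spins forever.
        #[panic_handler]
        fn on_panic(_info: &PanicInfo) -> ! {
            loop {}
        }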

stephc_int13 a year ago

This is a naming/marketing issue.

Because "safe" in the context of a programming language is provably wrong and thus will trigger adversary reactions.

Rust is a hardened language, compared to C/C++. In the same way, Ada is a hardened language, with different techniques, but the spirit is similar.

  • mlindner a year ago

    Rust isn't really hardened in the same way as Ada at all. They're almost perpendicular to each other.

staticassertion a year ago

> Even "safe" rust code in user space will do things like panic when things go wrong (overflows, allocation failures, etc). If you don't realize that that is NOT some kind of true safely, I don't know what to say.

When people say "safe" there's a pretty precise meaning and it's not this.

Yes, anyone who believes Rust is 100% "safe" (by any definition) is wrong. That's not something you learn in kindergarten, though; it's actually about understanding that Rice's Theorem is a generalization of the Halting Problem.

> So this is something that I really need the Rust people to understand. That whole reality of "safe" not being some absolute thing

The irony of Linus lecturing anyone on safety, lol. Anyway, "the Rust people" know this already: when they say "safe" they mean "memory safe" - https://en.wikipedia.org/wiki/Memory_safety

Anyway, dumb shit like this is why I've always been quietly dreading Rust in the kernel.

a) The kernel will never be safe software because the mainline developers don't want it to be or even know what safe means

b) It just invites more posts like this and puts Rust closer to one of the most annoying software communities

> Or, you know, if you can't deal with the rules that the kernel requires, then just don't do kernel programming.

Agreed on this point. I was very interested in kernel dev earlier in my career until I actually started to engage with it.

  • tmtvl a year ago

    It does make sense that the mainline developers don't know what "safe" means if you arbitrarily decide that "safe" means "memory safe" specifically and no other kind of "safe". A Haskell or Clojure developer could arbitrarily decide that "safe" means "safe from side effects," but unless that is clearly stated every time they engage in discourse with someone I wouldn't blame their discussion partners for not knowing what the developer means when they talk about some code being "safe".

    I will agree with you that I dread Rust in the kernel, hopefully it can continue to exist there peacefully without people getting too hot under the collar about their personal hang-ups. For all its flaws Rust has an amazing value prop in the borrow checker and I would love for memory bugs to be eliminated for good.

    • staticassertion a year ago

      >if you arbitrarily decide that "safe" means "memory safe" specifically and no other kind of "safe".

      This is how Rust has always defined it. Linus is specifically saying that "Rust people" don't understand what "safe" is but... they do, he doesn't. He could say "Rust defines it as X, the kernel needs Y" but he doesn't say that, he implies that Rust people just don't understand the word "safe" or that they think Rust is safer than it is, which is simply not true. As I said, quite ironic given history.

      > I wouldn't blame their discussion partners for not knowing what the developer means when they talk about some code being "safe".

      I mean, I would definitely blame them if they're also going to go on an insulting rant about their definition being wrong.

      > without people getting too hot under the collar about their personal hang-ups

      Impossible, in my opinion, until a ton of people retire.

  • 2OEH8eoCRo0 a year ago

    One of my Marine NCOs would say, "there is no such thing as safe."

    You aren't safe on the FOB, in your car, in your barracks, or in your house. There are only degrees of safety. Very wise, almost globally applicable words.

    • Ygg2 a year ago

      > There are only degrees of safety.

      Sure, but people use this logic to justify no safety. Find me a marine that goes into war totally naked.

      • 2OEH8eoCRo0 a year ago

        That's great for them. I don't use it to justify no safety.

bobajeff a year ago

As a layman who hasn't done any kernel programming, I think Linus sounds pretty reasonable here. We can't have the kernel crashing because of a panic.

  • Someone1234 a year ago

    A kernel crash IS a panic. They're one and the same.

    The discussion is a little more nuanced than just that. It is "we've entered an invalid/undefined/corrupt state, now what?" And in essence the answer is "We ONLY panic as a matter of last resort; until then we'll just spit out a bunch of loggable errors and soft-fail the kernel call."

robalni a year ago

I feel like there is an underlying problem here that Rust tries to be a "safe" language while "safety" isn't well defined. Rust said that crashing a process is always safe so that when something unexpected happens we can always resort to crashing so that we don't risk doing anything unsafe.

The problem is that this definition of safety is very arbitrary. Sometimes crashing a process can be safe (as in not causing serious problems) but sometimes not. Accessing an array out of bounds can be safe sometimes and sometimes not, and so on.

Rust says "here is a list of things that are always safe, and here is a list of things that are always unsafe", and then people who want safety everywhere take that definition of safety to other contexts where it doesn't make sense, like the kernel.

mslm a year ago

This kind of exchange was inevitable. The Rust crowd has this mentality that their code can be perfect (beyond even 'safe'), when in reality, as long as your foundational system inputs and capacity aren't perfect, no downstream thing can be either. It's harder to see in user space, but in the kernel you can't avoid reality. Hope the Rust crowd in general gets more moderate after this (or maybe not, but then that will only be to the detriment of Rust's long-term success).

flumpcakes a year ago

I find the Rust community to be hostile towards any inquisitive questions about their claims of "guaranteed memory safety". I've argued before that C is probably a safer language in practise for the Linux kernel than Rust, because you would have to contort and write non-idiomatic Rust, use FFI, or deal with C data structures, all of which hamper or remove a lot of Rust's memory-safety benefits. Rust is also harder to read than C - especially if you are trying to keep a mental model of the bitmap layout in your head and are just dealing with low-level code.

Of course I've had many negative comments from "Rustaceans", with their defence of their negativity being "we don't like it when someone comes into our community".

It is a shame because Rust is a pretty cool language, but at this current rate I don't really see it being "the" systems programming language de jure.

I think Zig is probably a much better fit for writing a kernel in a safer language. Again, Rust programmers pile on and tell me that "Zig isn't memory safe". We can't make use of other languages that bring safety benefits without the dog pile of "you should use Rust, it's safe". Apparently nothing is safe other than Rust.

dureuill a year ago

Hmmm, the linked email does not provide a lot of context, so surely I'm missing something, but there's something I definitely don't understand: is there not a third option between stopping the whole kernel on an error and allowing an incorrect result?

Maybe my misunderstanding comes from my ignorance of the kernel's architecture, but surely there's a way to segregate operations into logical fallible tasks, so that a failure inside a task aborts the task but doesn't bring down the entire thing, and in particular not a sensitive part like kernel error reporting? Or are we talking about panics in this sensitive part?

Bubbling up errors in fallible tasks can be implemented using panic by unwrapping up to the fallible task's boundary.

To my understanding this is exactly what any modern OS does with user space processes?

I always have the hardest of times in discussions with people advocating for or against "you should stop computations on an incorrect result". Which computations should you stop? Surely we're not advocating for bursting the entire computer into flames. There has to be a boundary. So my take is to start defining the boundaries, and yes, to stop computations up to these boundaries.
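
In userspace Rust, that boundary pattern looks roughly like this (unwinding panics caught at the task edge; a sketch, not kernel code):

    use std::panic::{self, AssertUnwindSafe};

    // Run one "fallible task": a panic inside it is converted into an error
    // at the task boundary instead of taking down the whole program.
    fn run_task<F: FnOnce() -> i32>(task: F) -> Result<i32, String> {
        panic::catch_unwind(AssertUnwindSafe(task))
            .map_err(|_| "task panicked".to_string())
    }

    fn main() {
        let ok = run_task(|| 42);
        let bad = run_task(|| panic!("invariant violated"));
        println!("{ok:?} {bad:?}"); // Ok(42) Err("task panicked")
    }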

  • garaetjjte a year ago

    >and in particular not a sensitive part like kernel error reporting

    Things like "kernel error reporting" doesn't exist as discrete element. Sure, you might decide to stop everything and only dump log onto earlycon, but running with serial cable to every system that crashed would be rather annoying. For all kernel knows, the only way to get something to the outside world might be through USB Ethernet adapter and connection that is tunneled by userspace TUN device, at which point essentialy whole kernel must continue to run.

    • dureuill a year ago

      > Things like "kernel error reporting" doesn't exist as discrete element.

      I'm not familiar with kernel development in general or Linux in particular. I would have expected there to be an error reporting subsystem, so that if a given subsystem fails, the failure is reported to the error reporting subsystem (which hopefully exposes a more modern interface than a serial cable), but this might be naive on my part.

      > For all kernel knows, the only way to get something to the outside world might be through USB Ethernet adapter and connection that is tunneled by userspace TUN device, at which point essentialy whole kernel must continue to run

      Again, I'm missing context on this discussion. For all I know this could be an error originating in a driver, since Rust support for Linux is for driver development now. It would make sense to me that an error in the GPU driver doesn't prevent the ethernet driver from reporting the bug.

      • wtallis a year ago

        There are plenty of platforms for which the available logging options are to either keep the whole network stack running, or get out the soldering iron to attach a serial port to unpopulated headers. So a "more modern interface" often isn't available, or has enough dependencies on the rest of the kernel that it's impossible to encapsulate into an error reporting subsystem that is at all self-contained.

acjohnson55 a year ago

So much for Linus's time away to work on himself. It's disheartening to see how hard it is to change even when someone has the intention and resources to.

  • oconnor663 a year ago

    It's important to distinguish "better" from "perfect". That's how we get the motivation to make incremental progress every day.

    • acjohnson55 a year ago

      I would say this demonstrates inadequate progress.

      As a manager, if I had a report who exhibited this level of verbal aggression, we would have a talk, and if it happened again, we'd be going through HR. It's not acceptable, regardless of technical merit.

Pulcinella a year ago

Not all that familiar with the specifics of Rust, but I assume its "safety" is somewhat similar to Swift's "safety," so type safety and memory safety, which does not mean no crashes, just that you will e.g. crash on an array OOB error rather than start writing or reading random bits of memory.

  • oconnor663 a year ago

    You've got the right idea. The Rustonomicon gives a list of approximately everything that Rust considers unsound/UB (https://doc.rust-lang.org/nomicon/what-unsafe-does.html). The most common examples are:

    - use after free

    - breaking the aliasing rules

    - causing a "data race" (e.g. writing to the same value from multiple threads without a lock)

    - producing an invalid value (like a bool that's not 0 or 1)

    There's some other technical stuff like "calling a foreign function with the wrong ABI", but those four above capture most of what safe Rust wants to guarantee that you never do. In contrast, the same page provides an interesting list of things that Rust doesn't consider UB and that you can do in safe code, for example (a small sketch of the distinction follows the list):

    - deadlocks and other race conditions that aren't data races

    - leak memory

    - overflow an integer

    - abort the whole process
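
    A tiny sketch of that distinction (everything below compiles as safe Rust; the commented-out line is the kind of thing that requires unsafe and would be UB):

        fn main() {
            // Accepted by safe Rust: not UB by Rust's definition, even though
            // some of it is clearly undesirable.
            let leaked: &'static mut Vec<u8> = Box::leak(Box::new(Vec::new())); // leak memory
            leaked.push(1);

            let x: u8 = 255;
            let y = x.wrapping_add(1); // integer overflow, defined to wrap here
            println!("{y}");

            // By contrast, producing an invalid value requires `unsafe` and is UB:
            // let _bad: bool = unsafe { std::mem::transmute(2u8) };
        }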

  • staticassertion a year ago

    Rust's "safety" is memory safety. It's relatively well defined for a technical term: https://en.wikipedia.org/wiki/Memory_safety

    edit:

    > Yeah I was just trying to provide a clear definition, I didn't think you were implying it was BS.

    (would have replied but I'm rate limited on HN - thanks dang!)

    • Pulcinella a year ago

      Sorry didn’t mean to imply that it was BS or anything with the scare quotes. More that there is a more specific meaning behind it than some laymen’s interpretation of the word safe.

      I know I was a little surprised when I was learning Swift after hearing it was called safe only to experience crashes with array OOB. Took some explanation and thinking to understand what was meant by safe.

phendrenad2 a year ago

Whenever people say things like "Use Rust, it's memory safe" I know that they're clueless. Nobody has shown any evidence that for the average project, written by average developers, writing in Rust won't result in just as many exploitable bugs as writing in C.

Also I had to laugh at this:

> No one is talking about absolute safety guarantees. I am talking about specific ones that Rust makes: these are well-documented and formally defined

As the saying goes "name three".

0xbeefeed a year ago

Well, this had to happen at some point. Rust-for-Kernel isn't just a second language. It's another culture. A culture that is in conflict with the kernel community. This project should be halted right now, instead of wasting thousands of man-hours only to be stopped later. There is no way these two cultures will get along for long.

armchairhacker a year ago

What about a linter for Rust which highlights functions that may panic, so you can avoid them? It seems like a fun project and a useful feature.

Unless I'm mistaken, in "safe" Rust, programs can still crash, but only by calling "panic" or in other trivial cases (explicitly calling "exit" with a nonzero return value, calling into FFI code, etc).

Detecting functions which may "panic" or "exit" is very easy, significantly easier than detecting possible UB. Avoiding these functions (or providing a "no-panic guarantee" comment, like the "safety guarantee" comments for unsafe Rust) doesn't seem very hard, since lots of panicking functions have a non-panicking variant.
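
Some of this already exists as opt-in Clippy lints (the lint names below are from memory and worth double-checking), and much of the avoidance is just reaching for the fallible variant:

    // Opt in to lints that flag common panic paths (assumed Clippy lint names).
    #![deny(clippy::unwrap_used, clippy::panic, clippy::indexing_slicing)]

    fn parse_port(s: &str) -> Option<u16> {
        // `s.parse().unwrap()` or `v[i]` would be flagged by the lints above;
        // the checked variants return Option/Result instead of panicking.
        let n: u32 = s.parse().ok()?;
        u16::try_from(n).ok()
    }

    fn main() {
        println!("{:?}", parse_port("8080"));   // Some(8080)
        println!("{:?}", parse_port("999999")); // None
    }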

say_it_as_it_is a year ago

Is this related to the use of unsafe blocks and the inventor of Linux arguing with someone out of their depth about alleged undefined behavior?

MarkSweep a year ago

Why is panicking in the kernel on an error not an option? Like, kernels can write a core dump and reboot, right?

  • cillian64 a year ago

    From most users’ points of view, a lot of things the kernel does (e.g. a sound card driver) are non-critical so they’d prefer an error in that driver only killed that driver and not the whole kernel. Similarly, I’d be upset if a server rebooted because of a blip in its CD-ROM driver. And if you can just reload the module which errored, all the better.

    It would be cool if kernel Rust could implement a panic handler which just killed the offending module, but I’m assuming from the discussion around panics that this isn’t possible.

    • vips7L a year ago

      Wasn’t that the whole point of microkernels/minix vs monoliths? With drivers being in the kernel can you even restart the modules?

      • cillian64 a year ago

        With Linux you can unload and reload modules (rmmod, insmod) so it’s a little un-monolithic in that sense.

  • zetaposter a year ago

    Yeah ... Just reboot the machine and make me lose all my work, bro.

    • charcircuit a year ago

      This is why programs automatically saving their state is important.

      • jmull a year ago

        That's not a solution to OS instability.

        Reliably saving state in the face of sudden total failure is both very tricky and app-specific. Just saving state changes automatically won't do it -- partial writes of complex state are likely to be inconsistent without luck or careful design and QA controls (tests, testing, on-going controls to ensure nothing new operates or relies on anything outside the safe state-saving mechanism).

        It makes a lot more sense to put the effort into making the OS continue as well as it can, vs requiring every app to harden itself against sudden total failures.
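
        For what it's worth, the usual app-side mitigation is exactly the kind of per-application care described above: write the complete new state to a temporary file, flush it, then atomically rename it over the old one, so a reader only ever sees the old state or the new one. A rough sketch (ignoring directory fsync and platform quirks):

            use std::fs::{self, File};
            use std::io::{self, Write};

            // Crash-consistent save: a reader never observes a partial write,
            // only the old file or the complete new one.
            fn save_state(path: &str, state: &[u8]) -> io::Result<()> {
                let tmp = format!("{path}.tmp");
                let mut f = File::create(&tmp)?;
                f.write_all(state)?;
                f.sync_all()?; // flush file contents to disk
                fs::rename(&tmp, path) // atomic replace on POSIX filesystems
            }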

      • elteto a year ago

        No, this is why kernels prioritizing not crashing is important. Applications saving their work is a nice extra.

  • pca006132 a year ago

    I guess when the kernel panics, there is nothing to write the core dump for you...

    • detaro a year ago

      The kernel crash dump mechanism works by reserving some memory into which, on a kernel panic, a fresh copy of the kernel is booted; that fresh kernel then takes care of reading the old, dead kernel's memory and saving the dump.

      Of course this working requires the fresh kernel to be able to get up and do that without itself crashing, so it can't capture every scenario. And it is bringing down the system completely, and there's lots of pros and cons to be argued about that vs attempting to continue or limp along.

      • yencabulator a year ago

        The mechanism you describe is used/usable only in very specific scenarios.

        For practically all non-virtualized Linux hosts out there, the kernel crash dump mechanism works by adding ASCII text to kmesg, which is then read by journald, processed a little, and appended to a file -- which just means submitted back to the kernel for writing, which means FS needs to work, disk I/O needs to work, and so on.

  • pyb a year ago

    No need to reboot the machine without warning, and lose data, when the rest of the kernel is probably still functional.

  • dijit a year ago

    if you panic and you're a kernel you very likely corrupt your filesystem, at the very least.

    • layer8 a year ago

      While I don’t advocate for kernel panics, journaling filesystems are a thing.

      • dijit a year ago

        Yes, but even then not all filesystems are journaled.

        The EFI system partition is FAT, and FAT is not journaled. You almost certainly have EFI these days.

        • layer8 a year ago

          That’s a good point, but EFI isn’t frequently written I believe, so that I would expect that to be a rare circumstance, and even rarer for user data to be affected as a consequence.

        • KMnO4 a year ago

          EFI is read, but not frequently written.

          • dijit a year ago

            I'm not sure why that's relevant.

            It will be written to on every kernel update and every initramfs update at least, which is what.. once a week on average?

            A reply like yours is not so subtly indicating that "it's fine to panic all the time because ultimately you might be fine if you get a panic", which I fundamentally disagree with, other concerns aside.

            Also, you're suggesting that journaling filesystems are perfect and never lose data, which is also very untrue: in the default case they only protect metadata, and there are still circumstances where they can lose data anyway; they're more resilient, not immune.

            • wtallis a year ago

              > It will be written to on every kernel update and every initramfs update at least, which is what.. once a week on average?

              Which distros actually use the EFI System Partition that way? I've usually only seen the ESP used to hold the bootloader itself, with kernels and initramfs and the bootloader config pointing to them stored either in a separate /boot partition or in a /boot directory of the / filesystem.

      • hulitu a year ago

        A journaling FS can also become corrupted. That's why I don't use XFS (it does just a quick log replay after a kernel crash; have a few crashes and the FS is corrupted beyond repair).

  • pfortuny a year ago

    That is a decision taken by Linus. It might have been different, but life is about choices. This one has been made while Linus is the boss.

mlindner a year ago

I'm not sure what he's trying to say here. Is he saying code that's run out of bounds of memory should continue on into la-la-land?

awinter-py a year ago

Yeah, like if only they had written AI in Rust instead of Python, AI safety wouldn't be an issue.

hegelstoleit a year ago
  • gkbrk a year ago

    Who "has to" put up with Linus? Linux is open-source software.

    People who feel they are putting up with Linus can just `cp linux/ betterlinux/`. If it's actually better and Linus was holding things back by being an insufferable cunt, they can expect a huge amount of people to switch to their fork as well.

    • hegelstoleit a year ago

      Nobody "has to" have a job either. It's interesting how words can have different standards of necessity depending on how you contextualize them. Is a basketball player tall? If you asked his friend in the NBA after a game, he might say no because he's relatively short for an NBA player. Amongst everyone else, you'd say he is definitely tall.

      So no, in the strict sense, you don't "have to" put up with linus, that's obviously not what I meant. Nobody "has to" put up with anyone, ever. If you want to work on the linux kernel though, you do. Otherwise, you have to do what you just said - you have to provide a better alternative which is a huge ask.

  • peoplefromibiza a year ago

    You mean for the hundreds of thousands of people that in the past 31 years willingly decided to contribute to the most successful and well-maintained open source project ever, because the creator is such a brilliant person?

    • hegelstoleit a year ago

      Anyone who has to interact with him pretty much.

      • peoplefromibiza a year ago

        They can very much avoid it.

        They chose to.

        Who's to blame here?

        • hegelstoleit a year ago

          Easy. Linus for being insufferable.

Test0129 a year ago
  • avgcorrection a year ago

    > If it “sounds like” that then you should quote one of the Rust programmers in that thread, not Torvalds.

  • j-krieger a year ago

    Because a gun that disables the trigger to keep you from shooting yourself in the foot, even just 1/10th of the time, is still worth it.

    Rust is also a WIP. The panic problem can still be solved.

    • krater23 a year ago

      That's the part where the gun decides that you can't kill yourself. I don't like this idea ;P

hedora a year ago

I normally enjoy Linus' rants, but he needs to RTFM.

In rust, safe code is code that does not have the unsafe keyword.

If all the unsafe code is sound, then you (provably) get high level guarantees about memory safety, etc.
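
A tiny sketch of what "sound unsafe behind a safe API" means in practice (illustrative only):

    /// Returns the first byte of `s`, or 0 if `s` is empty.
    pub fn first_byte(s: &[u8]) -> u8 {
        if s.is_empty() {
            return 0;
        }
        // SAFETY: we just checked that `s` is non-empty, so index 0 is in bounds.
        // Without that check this unsafe block would be *unsound*, and every
        // "safe" caller would silently inherit the potential UB.
        unsafe { *s.get_unchecked(0) }
    }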

The rust people are complaining that some of the unsafe RCU is unsound. They have a valid point. According to the rust manual, when you make unsound libraries sound, common courtesy dictates you create a CVE for the old implementation.

This is all in the rust book; it's pretty close to "hello world".

Anyway, the rust crowd is definitely right here. It would be better if the rust RCU bindings were sound.

  • flumpcakes a year ago

    I can't take anything you say in good faith. Are you an AI trained on the Rust subreddit?

    > According to the rust manual, when you make unsound libraries sound, common courtesy dictates you create a CVE for the old implementation.

    I cannot find the words to describe how supercilious this is.

warinukraine a year ago

Hahaha I did notice there's a lot of magical thinking amongst the rust people.

Prediction: in time the same will happen to "rust in the kernel" as happened to "c++ in the kernel": Linus will forbid it not because of some intrinsic problem with the language, but because the culture of the community prevented them from understanding the kernel rules.

  • mlindner a year ago

    The kernel is not a magical beast. Other kernels have already been written in Rust and work fine. The ideas behind Rust are not "magical thinking"; they're based on fundamental mathematical principles that are independent of any particular language or tool.

    • warinukraine a year ago

      I didn't say that the ideas of rust are magical thinking. I said that the community falls into magical thinking a lot.