Undefined Behavior deserves a better reputation (2021)

blog.sigplan.org

44 points by rramadass 3 years ago

The main distinction I'd draw is whether you're doing formal or informal reasoning. In the formal reasoning world, undefined behavior is mostly a good thing. On one side of the contract, it's a set of proof obligations for the code, and not especially onerous in the grand scheme of things - correct programs won't do UB. On the other side, it's a clear statement of what the compiler is and is not allowed to optimize. The more UB, the more opportunities to optimize.

When you're doing informal reasoning, the calculus changes. There are all kinds of things that can go wrong that aren't motivated by what the machine is actually doing; in fact, it's something of a nightmare. Doing a memcpy of a struct that has padding in it? What are the exact semantics of restrict? And then there's threading: benign data races used to be a thing, but in an undefined behavior world, it's game over. C makes things worse than they need to be with its wonky integer rules (left-shifting a negative integer is UB, and so is multiplying two unsigned shorts whose product doesn't fit in an int, because both operands are promoted to signed int first), but a lot of that is potentially fixable, and those mistakes won't be repeated in new languages.
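To make the integer-rule complaint concrete, here is a small Rust sketch (Rust rather than C, since actually running the C versions would itself be UB). Rust doesn't promote `u16` operands to a signed type, so the programmer picks the overflow behaviour explicitly, and shifting a negative `i32` left is defined. The helper names are made up for illustration.

```rust
// Illustrative helper names, not from any library.

/// C: `unsigned short a, b; a * b` promotes both operands to `int`,
/// so 65535 * 65535 overflows a 32-bit `int`, which is UB.
/// Rust keeps the operands as `u16`, and the caller chooses the behaviour.
fn mul_u16_wrapping(a: u16, b: u16) -> u16 {
    a.wrapping_mul(b) // defined: result is reduced modulo 2^16
}

fn mul_u16_widening(a: u16, b: u16) -> u32 {
    // Widen first; the product of two u16 values always fits in a u32.
    u32::from(a) * u32::from(b)
}

/// C: left-shifting a negative signed integer is UB.
/// Rust: the shift operates on the two's-complement bits, and
/// `wrapping_shl` additionally masks the shift amount to the bit width.
fn shl_signed(x: i32, n: u32) -> i32 {
    x.wrapping_shl(n)
}
```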

In the context of Rust, more undefined behavior makes sense, and Ralf's work takes us much closer to a solid spec. But when you're doing mostly informal reasoning, I can see why people are so emotionally against it, and decisions such as turning off strict aliasing might be justified.

jcranmer 3 years ago

While I do sympathize with some of the user complaints with UB, and the issues with things like signed integer overflow and strict aliasing seem entirely gratuitous, I think most users complaining about UB fail to comprehend that the issue with UB is that it's often really hard to constrain just what can possibly go wrong--and that's even without compiler optimizations kicking into play.
It should be pretty clear that memory unsafety produces all sorts of crazy havoc--a write to errant memory could overwrite stack return locations and then basically do whatever it wants given the power of ROP gadgets. At first glance, it looks like uninitialized memory is "merely" an issue of reading more or less random data, but there are cases (e.g., MADV_FREE) where it turns out that the value of uninitialized memory can change underneath you. Traps cause lots of program state to become rather indeterminate, simply because of what may or may not live in a register or in memory, but on some architectures (e.g., Alpha), code may keep running for a while after an instruction traps, to the point that you're no longer even in the same function. Sanely describing what happens in data races is beyond the ken of formal semanticists (see the still-unsettled discussions over the semantics of relaxed atomics); what hope do programmers have of reasoning about these memory semantics?
It also doesn't help that the distinction between undefined, unspecified, and implementation-defined behavior is poorly grasped by a large segment of the community.
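One language-level answer to the uninitialized-memory problem is Rust's: "possibly uninitialized" gets its own type, `std::mem::MaybeUninit`, so reading a value requires an explicit `unsafe` assertion that it was written first. A minimal sketch; the `squares` helper is hypothetical.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: fills a buffer that starts out uninitialized.
fn squares() -> [u32; 4] {
    // Nothing here has a value yet; reading a slot now would be UB.
    let mut buf = [MaybeUninit::<u32>::uninit(); 4];
    for (i, slot) in buf.iter_mut().enumerate() {
        let i = i as u32;
        slot.write(i * i); // initialize every slot exactly once
    }
    // SAFETY: the loop above wrote every element.
    buf.map(|slot| unsafe { slot.assume_init() })
}
```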
- petergeoghegan 3 years ago
  
  > While I do sympathize with some of the user complaints with UB, and the issues with things like signed integer overflow and strict aliasing seem entirely gratuitous, I think most users complaining about UB fail to comprehend that the issue with UB is that it's often really hard to constrain just what can possibly go wrong--and that's even without compiler optimizations kicking into play.
  That's probably true, but compiler people do themselves no favors by pretending that these things come from some higher echelon, that they couldn't possibly presume to question. It just doesn't pass the smell test.
  The fact that -fwrapv and -fno-strict-aliasing are not the defaults in GCC is a choice made by GCC. A bad choice, in my opinion. MSVC made different choices, and lots of people still use it, so there is an existence proof that you can just not do these things on a mainstream compiler. (In fact, MSVC doesn't even offer type-based aliasing as an option that can be enabled, last I checked.)
  
  tialaramex 3 years ago
  
  How much performance gets left on the table as you disable ever more optimisations though?
  The justification for monstrously unsafe languages like C was that they're faster. If, after removing the optimisations which are too tricky to write code for, they're no longer faster, then the languages don't pay their way any more and there's no reason to use them.
  I was expecting it would be easy to find benchmarks trying the same C or C++ code with GCC, Clang and MSVC and giving performance numbers, but I didn't find that. Maybe it exists and I can be directed to it ?
  
  petergeoghegan 3 years ago
  
  > The justification for monstrously unsafe languages like C was that they're faster.
  I don't think that that's true. I find the explanation given by "Some Were Meant for C" [1] far more plausible.
  But leaving that aside: what does that have to do with anything that I said? And might I be permitted to make a point about GCC that is wholly unrelated to Rust, without getting a generic lecture about memory safety?
  > I was expecting it would be easy to find benchmarks trying the same C or C++ code with GCC, Clang and MSVC and giving performance numbers, but I didn't find that. Maybe it exists and I can be directed to it ?
  I don't doubt that there are silly compiler microbenchmarks somewhere. And I know for sure that strict aliasing could in principle make a huge difference. For example, an autovectorization optimization could take place once the compiler had leeway to apply an assumption about two pointers not aliasing, but not otherwise.
  However, in practice it doesn't seem to make all that much difference for most kinds of C programs, for all kinds of reasons that are very difficult to pin down. The big exceptions generally involve numerical code, which is why Fortran has always tended to be faster than C for numerical applications. At least it definitely was for most of the history of both languages. (Yes, C was openly understood to be slower than Fortran in cases that were important for Fortran 40+ years ago. I refer you to [1] once more.)
  [1] https://www.cs.kent.ac.uk/people/staff/srk21//research/paper...
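  For comparison, Rust gets the no-alias fact from the type system rather than from type-based aliasing rules. A minimal sketch (the `add_assign` name is illustrative); whether the loop is actually vectorized depends on compiler version, target and optimization level, so this only shows where the guarantee comes from.

```rust
/// `dst` is borrowed mutably and `src` immutably, so the borrow checker
/// guarantees the two slices do not overlap. The optimizer may rely on
/// that (as C code can only via `restrict` or type-based aliasing rules)
/// instead of emitting a runtime overlap check before vectorizing.
fn add_assign(dst: &mut [f32], src: &[f32]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d += *s;
    }
}
```

  Calling `add_assign(&mut a, &a)` is rejected at compile time, which is exactly the overlap case the optimizer no longer has to worry about.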
  
  tialaramex 3 years ago
  
  > I don't think that that's true. I find the explanation given by "Some Were Meant for C" [1] far more plausible.
  Kell's argument rests on the idea that C is doing something you can't do in safe languages. Mostly it says the safe languages are managed, and so they simply can't match C for what Kell describes as "communicativity", which, as we'll see shortly, isn't meaningfully true.
  Now, one of the first examples Kell gives (from Duff's device) is something Dennis Ritchie thought was a huge mistake, the volatile qualifier. In C we can do MMIO the same way as in machine code, we refer to a volatile "object" that's actually not really in memory, the CPU fetches and stores will be issued and the MMIO happens. In a language like Rust this doesn't work, they have intrinsics which actually emit the same machine code, but there is no pretend object.
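  The Rust approach described here looks roughly like this: no volatile-qualified object, just explicit `std::ptr::read_volatile` / `write_volatile` calls on a raw pointer. A real MMIO register would live at a device-specific address; this sketch assumes an ordinary local variable as a stand-in so it can run anywhere, and the function name is hypothetical.

```rust
use std::ptr;

/// Hypothetical example. On real hardware `reg` would be a fixed,
/// device-specific MMIO address; here a local variable stands in for it.
fn poke_and_peek(value: u32) -> u32 {
    let mut fake_register: u32 = 0;
    let reg: *mut u32 = &mut fake_register;
    // SAFETY: `reg` points to a live, aligned, writable u32.
    unsafe {
        ptr::write_volatile(reg, value); // store is not elided or merged
        ptr::read_volatile(reg)          // load is actually performed
    }
}
```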
  Kell, like many C programmers, thinks C is revealing an important truth here, but it's actually keeping up a damaging masquerade. That object is a lie, to the extent we write code which pretends it's real that code is misleading and sooner or later a maintenance programmer will believe the lie and cause problems. MMIO is similar to memory access not because of some "truth" but just because it was technically convenient.
  [[ The other notable thing the volatile qualifier is (ab)used for is once again in MSVC. Microsoft semantics for volatile are atomic Acquire-release so you can use it for IPC and even within a concurrent program. That's not what it was for on Unix, it's not what the standard says it does, but it's how it happened to work on a single x86 CPU so that's what MSVC provides even today, and even on ARM if you accept the resulting performance penalty. ]]
  But even outside of volatile, which again is definitely a bad idea, Kell says C is better than the safe languages because of communicativity. His next examples are just data in memory, but they are "foreign" to the software, and C will be able to access this data as raw bytes.
  If your experience of managed languages is, as Kell's seems to have been, a Lisp, then maybe the ability to access bytes of memory is remarkable. But you don't need Rust to do this from a safe language. C# isn't just safe, it's a managed garbage collected language, yet it can access a byte slice just as well as C. Given a slice of bytes, step through it, one instruction at a time (with the size of an instruction to be established by a separate function) and ask a callback to look at those instructions. No problem in C#.
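  The same task, sketched in safe Rust rather than C# to keep one language for the examples here. The encoding is made up: `insn_len` and `for_each_insn` are hypothetical, and the instruction length is assumed to live in the low two bits of the first byte. Slicing with `get` means a truncated final instruction ends the walk instead of reading out of bounds.

```rust
/// Hypothetical instruction-length function for a made-up encoding:
/// the low two bits of the first byte give the length (1 to 4 bytes).
fn insn_len(first_byte: u8) -> usize {
    (first_byte & 0b11) as usize + 1
}

/// Walk `code` one instruction at a time and hand each one to `visit`.
/// Returns the number of instructions visited.
fn for_each_insn(code: &[u8], mut visit: impl FnMut(&[u8])) -> usize {
    let mut offset = 0;
    let mut count = 0;
    while offset < code.len() {
        let len = insn_len(code[offset]);
        match code.get(offset..offset + len) {
            Some(insn) => visit(insn),
            None => break, // truncated instruction: stop, don't overread
        }
        offset += len;
        count += 1;
    }
    count
}
```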
  The false belief that everything is just C anyway causes significant grief. C programmers tend to see text and think this means C's weird NUL-termination rule applies. So given the string "microsoft.com\0bo-chicken.example.com" and asked whether it's exactly "microsoft.com" the C code says yes, opening a security vulnerability. This really happened, and yet of course C programmers insisted it's somebody else's fault. And that's key here, Kell's examples are "foreign" in the sense that they aren't from this C program. But these examples aren't structures from Java, or Scheme, or even Rust, they're just from a different C program. This "communicativity" is unimaginative.
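  A small sketch of that mismatch, assuming a byte string with an embedded NUL: a comparison that stops at the first NUL (as `strcmp` does) accepts it, while a length-aware comparison rejects it. All helper names are illustrative.

```rust
/// The prefix of `s` up to (not including) the first NUL byte.
fn until_nul(s: &[u8]) -> &[u8] {
    match s.iter().position(|&b| b == 0) {
        Some(i) => &s[..i],
        None => s,
    }
}

/// Emulates `strcmp(a, b) == 0`: anything after a NUL is ignored.
fn c_style_eq(a: &[u8], b: &[u8]) -> bool {
    until_nul(a) == until_nul(b)
}

/// Compares every byte, including any embedded NULs.
fn length_aware_eq(a: &[u8], b: &[u8]) -> bool {
    a == b
}
```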
  It reminds me of colonial Europeans passing judgement on the "savage" inhabitants of a place they've now decided belongs to them. Why don't these useless Indians have properly 0-byte terminated strings? And why are their symbol names so complicated? No no, this is all wrong, the correct way is the way I grew up with, and any alternative is not better, nor even just different, but necessarily wrong.
  The "safety" Kell envisions for a hypothetical safer C implementation relies on saying that behaviours which have been passionately defended here by C programmers including yourself as needing to be defined are Undefined and so can safely be outlawed. Aliasing between types? Kell says you shouldn't expect that to work and so it shan't in his safe C.
  Finally Kell comes back to an assumption which is small in C but grew enormous in C++: what we should do if the programs might be nonsense. Rice's Theorem says we can't necessarily decide if semantic properties hold for arbitrary programs. To defuse this problem we sometimes give up instead. Thus, a program to decide if another program is correct (such as a compiler) will have three possible results, Correct, Wrong and Not Sure. It's obvious what to do with the first two categories but that leaves us with Not Sure. In both C and C++ the answer is those programs go in the Correct bucket anyway and we cross our fingers.
  Rust says no, all the "Not sure" programs go in the "Wrong" bucket and the programmer can just modify the program until it's Correct so we're fine. This is a pragmatic engineering response. Our dynamic analysis says this new bridge might literally explode, I think the analysis method may be wrong about that, but I can't prove it. Should we build the bridge anyway and see if it explodes? No! Design a bridge that passes analysis!
  The Rust and C++ approaches have very different incentives for language designers. In Rust the incentive is to shrink that "Not sure" pile, because it annoys programmers. Non-lexical lifetimes are an example of that shrinking process. A Rust program with NLL was always actually fine, but before the NLL change landed in the compiler it would be rejected because it was in the "Not sure" category. In C++ the incentive is to grow the "Not sure" category because every program we can convince ourselves might be correct is another C++ program in the Correct bucket. Some, perhaps many, perhaps even most are nonsense, but at least some of them are correct and we don't care to distinguish. The same tendency exists, to a lesser extent, in C itself.
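  A small example of the kind of program NLL moved out of the "Not sure" pile (the `push_first` name is hypothetical). Under lexical lifetimes the shared borrow `first` was treated as live until the end of the block, so the later `v.push` was rejected; with NLL the borrow ends at its last use and this compiles.

```rust
/// Copies the first element and appends the copy.
/// Panics if `v` is empty (indexing is bounds-checked).
fn push_first(v: &mut Vec<i32>) -> i32 {
    let first = &v[0];
    let x = *first; // last use of `first`: the shared borrow ends here
    v.push(x);      // accepted under NLL; rejected by the old lexical rules
    x
}
```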
  
  petergeoghegan 3 years ago
  
  > It reminds me of colonial Europeans passing judgement on the "savage" inhabitants of a place they've now decided belongs to them
  Okay!
  
  tialaramex 3 years ago
  
  Now you made me doubt myself, because I'm pretty sure that analogy was the weakest part of what I wrote, it was just how it seemed to me when I was writing.
  On further reflection I think maybe reputation for performance matters rather than performance itself. But I have strayed far off topic.
  
  petergeoghegan 3 years ago
  
  > On further reflection I think maybe reputation for performance matters rather than performance itself
  I think that you're vastly overestimating the importance of C as an abstract specification and as a community of programmers with a coherent set of shared goals. You're also too focussed on performance. There is a practical sense in which C will tend to perform better for certain tasks, but it doesn't necessarily have all that much to do with the language itself. It's the whole ecosystem. And yes, path dependence matters. It isn't intrinsically true that it has to be this way, a little like how it isn't intrinsically true that we have to use QWERTY keyboards instead of Dvorak keyboards.
  It's not that there aren't lots of serious problems with C -- there certainly are. It's that those problems are systemic problems; they're more the result of a huge number of people making a huge number of pragmatic decisions, day after day, year after year -- and the sequence matters. Many of these people are not computer programmers. Many are from hardware vendors that have people that sit on standards bodies for everything from NVMe to RISC-V. These are all people that more or less all look at the world as it actually is today, and build on that incrementally. They build accretions on top of accretions.
  There are many glaring contradictions in C. Depending on who you ask, it's either a portable assembler, or a programming language that targets something called the C abstract machine. And neither party seems to want to even address the glaring inconsistency! This is a cabal that seems to have a real problem with staying on message, don't you think?
  I make only very modest claims here. I'm not saying that this is good or inevitable; only that it is the best explanation I am aware of. I'm definitely not saying that we can't do better. Only that I believe that the current state of affairs works as well as it does (i.e. barely adequately) because in the end it's very difficult to get an enormous number of people separated by time and space to agree on anything at all. C more or less remains the de facto standard when operating at the hardware/software interface not in spite of its glaring contradictions. It's because of them.
  It's all but impossible for me to prove any of this, because I'm describing diffuse, emergent behavior -- what I'm arguing is that things tend to take the path of least resistance, in an environment where companies come and go, and short term business considerations hold sway. I might be willing to put more effort into convincing you of this if I really was the C zealot that you imagine me to be, but I'm not.
  
  tialaramex 3 years ago
  
  I'm not sure there are C zealots. There are definitely C++ zealots, but I don't see that level of burning passion for C.
  As to the "inconsistency" between portable assembler and abstract machine, surely the problem is that the people who claim it's a "portable assembler" aren't talking about a compiler they wrote, or a standard they authored, but about their (wrong) expectations for somebody else's compiler and standard. I don't see this language from WG14 (the committee) or the compiler vendors.
AlotOfReading 3 years ago

I don't think that's a useful distinction here. For context, I often work with high reliability software, including formal methods.
What's needed for actual programs is the ability to say one of two things:
A) There is no UB in program X or
B) UB in program X cannot lead to a violation of constraint Y
The current situation in the C family, whether you're using formal methods or not, is that you cannot generally prove (A) and the time traveling, no-holds-barred results of UB in the spec means that (B) is impossible.
While Rust doesn't entirely solve this, the fact that those statements are true everywhere except unsafe means that the scope of code you have to manually review is limited to something smaller than "everything".
- tialaramex 3 years ago
  
  > the fact that those statements are true everywhere except unsafe
  This is not only a technical feature of Rust's standard library, but perhaps more importantly a cultural feature of Rust's ecosystem. The compiler has no technical problem with your "safe" implementation of Index for your type actually just doing unsafe pointer dereferences internally and trusting users to always pick valid indices, just like C. It's a bad idea, but the compiler is not a cop. However Rust's culture says if you're providing unsafe stuff that must be marked unsafe so that other people don't cut themselves on the sharp edges of your code by mistake.
  You can imagine with a different culture, you'd end up with popular code that's labelled "safe" but has UB all over the place because the interfaces lie everywhere as an "optimisation" and the community just puts up with it.
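  A minimal sketch of what that convention looks like in practice; `Samples` is a made-up type. The unchecked accessor is an `unsafe fn` with its precondition written down, while the `Index` impl stays safe by keeping the bounds check.

```rust
use std::ops::Index;

/// Hypothetical wrapper type, for illustration only.
struct Samples {
    data: Vec<u32>,
}

impl Samples {
    /// Unchecked access, honestly labelled.
    ///
    /// # Safety
    /// The caller must ensure `i < self.data.len()`; otherwise the
    /// behaviour is undefined.
    unsafe fn get_unchecked(&self, i: usize) -> u32 {
        // SAFETY: forwarded to the caller by this function's contract.
        unsafe { *self.data.get_unchecked(i) }
    }
}

impl Index<usize> for Samples {
    type Output = u32;

    /// Safe interface: an out-of-range index panics instead of reading
    /// arbitrary memory.
    fn index(&self, i: usize) -> &u32 {
        &self.data[i]
    }
}
```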

overgard 3 years ago

Here's my issue with UB in C/C++ (and I think it agrees with the article): I would much rather have direct compiler directives over the compiler being clever in a quiet way. The first example in this article is very good -- get_unchecked avoids undefined behavior (well, outside of accessing out of bounds), but still provides the wanted optimization. If something goes bad in your program, one of the first things you're going to suspect is functions with _unchecked at the end of them, so the developer ergonomics are great.
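The article's own code isn't quoted in this thread, so this is a sketch under assumptions: a `mid` function over `&[u32]` that returns the middle element, in a checked and an unchecked flavour. The `// SAFETY:` comment is where the programmer records the proof obligation the compiler no longer checks.

```rust
/// Checked version: `data[i]` carries a bounds check, which the optimizer
/// may or may not be able to prove redundant.
fn mid(data: &[u32]) -> Option<u32> {
    if data.is_empty() {
        return None;
    }
    Some(data[data.len() / 2])
}

/// Unchecked version: the programmer asserts the index is in bounds.
fn mid_unchecked(data: &[u32]) -> Option<u32> {
    if data.is_empty() {
        return None;
    }
    // SAFETY: data.len() >= 1, so data.len() / 2 < data.len().
    Some(unsafe { *data.get_unchecked(data.len() / 2) })
}
```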

My main problem with undefined behavior is that it's rare to know when it's happening. I would much rather give hints to the compiler than having it "prove" something very subtle underneath the hood using UB, because then I understand where those things are happening. And it's not even like it's a rare thing! My C++ is littered with "const" and "[[nodiscard]]" and #pragma pack and all sorts of other things that don't change the behavior of the program but do indicate something important to the compiler. I want more of that, and less subtlety.
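Rust has rough counterparts to two of the annotations mentioned: `#[must_use]` plays the role of `[[nodiscard]]`, and `#[repr(C, packed)]` the role of `#pragma pack`. A minimal sketch with made-up names; note that the `repr` attributes do change layout.

```rust
/// Rough analogue of C++ `[[nodiscard]]`: discarding the result warns.
#[must_use = "the parsed port is the only output of this function"]
fn parse_port(s: &str) -> Option<u16> {
    s.parse().ok()
}

/// Default C layout: on typical targets, 3 bytes of padding follow `tag`
/// so that `value` is 4-byte aligned.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    tag: u8,
    value: u32,
}

/// Rough analogue of `#pragma pack(1)`: no padding; `value` may be unaligned.
#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32,
}
```

On typical targets `size_of::<Padded>()` is 8 and `size_of::<Packed>()` is 5.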

somat 3 years ago

"all it does is perform optimizations that are correct under the extra assumption that there is no Undefined Behavior."

How did the compiler writers reach the point where they were able to read "undefined behavior" as "does not happen" rather than "specification left this undefined so that the compiler can define it"?

I don't think there is anything wrong with undefined behavior, I do however expect the compiler documentation to have an extensive section on stuff undefined by the specification.

For example: when a signed integer would exceed the maximum value, this compiler just lets it go and it does whatever the underlying hardware would do. On amd64 this will overflow to a negative value.

kps 3 years ago

I've said before¹ that I'm convinced that ‘undefined behavior’ was an actual mistake by the C89 committee that would not have been accepted if anyone at the time had realized the future implications. It is precisely “a license for the compiler to undertake aggressive optimizations that are completely legal by the committee's rules, but make hash of apparently safe programs” (as dmr said of `noalias`²).
¹ https://news.ycombinator.com/item?id=30024867
² https://www.lysator.liu.se/c/dmr-on-noalias.html
- rramadass 3 years ago
  
  > was an actual mistake by the C89 committee
  Agreed. Apparently the first compiler written by Dennis Ritchie had no UB (I have not been able to find more info on this).
  See also the article published in IEEE : Dealing With C's Original Sin by Chris Hathhorn and Grigore Rosu.
jcranmer 3 years ago

There's a note in the C99 rationale explaining how compilers can use undefined behavior (specifically, with regards to signed overflow) to perform certain optimizations (specifically, reassociation of integer addition). I'd go back to C89, but the documents before the work effort to make C99 are not available on the WG14 website.
> I do however expect the compiler documentation to have an extensive section on stuff undefined by the specification.
There is an entire section of the C specification that lists every single undefined behavior. What more do you want?
- kps 3 years ago
  
  https://www.lysator.liu.se/c/rat/title.html has the C89 Rationale.
  “The terms unspecified behavior, undefined behavior, and implementation-defined behavior are used to categorize the result of writing programs whose properties the Standard does not, or cannot, completely describe. […] Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose.”
  
  jcranmer 3 years ago
  
  For the commentary on undefined behavior, I find N790 (https://www9.open-std.org/jtc1/sc22/wg14/www/docs/n790.htm) to be a better assessment of what the C committee thinks about undefined behavior:
  First consider the term "unspecified behavior". Most commentators on the Standard are of the opinion that this has the following properties:
  (1) There are a number of possible courses of actions, or the behavior is one that generates a result and then has a number of possible results.
  (2) The implementation can make any of the available choices, and can make different choices at different places or times.
  (3) The implementation need not document its choices.
  (4) No matter what choice the implementation makes, it cannot affect anything outside the range of that choice. If a value has to be chosen, it must be a valid value for that type.
  Property number 4 is the interesting one: it is usually taken to mean that the implementation cannot generate a spurious signal, branch to a random place in the code, or choose a trap representation. All of these, of course, are valid "undefined behavior".
- fsckboy 3 years ago
  
  >> "specification left this undefined so that the compiler can define it"
  >> I do however expect the compiler documentation to have an extensive section on stuff undefined by the specification
  He should have added "which is defined by the compiler", so your response is not to what he meant.
  > There is an entire section of the C specification that lists every single undefined behavior. What more do you want?
  a similar section wrt the compiler he is using, because he's talking about "behavior undefined by C but defined by implementation"
  
  jcranmer 3 years ago
  
  > a similar section wrt the compiler he is using, because he's talking about "behavior undefined by C but defined by implementation"
  That's called implementation-defined behavior. And there is a section in C that lists all the implementation-defined behavior, and all the compilers are supposed to document what they define the implementation-defined behavior to be (glares at Clang).
- somat 3 years ago
  
  My thought process, note that I am not a compiler writer, runs along these lines.
  Undefined behavior is valid code (it is not a syntax error), so it has to do something, and that something should be documented. If the spec does not want to document it, then the implementation should. Don't assume that because it is undefined it will never happen.
  
  jcranmer 3 years ago
  
  It is not practical to constrain what happens in the case of undefined behavior, even if you discard the potential of the compiler to optimize assuming undefined behavior can't happen.
  For example, what memory has and hasn't been written when a trap occurs isn't well-defined. Hell, on some architectures (hi Alpha!), an unknown amount of code will continue executing after the trapping instruction before the trap handler gets around to being invoked. Similar insanity is also in play when data races are involved (if data races are undefined, you can pretend that all code is sequentially consistent and there is a nice, simple, global total order of memory accesses; data races cause memory accesses to not even form a consistent partial order).
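  For the data-race point, the usual way out (in Rust, and in C11 via `<stdatomic.h>`) is to make shared mutable state atomic, which turns a racy increment into a defined one. A minimal sketch with a hypothetical `count_in_parallel` helper; even `Relaxed` read-modify-write operations never lose increments, and joining the threads makes all of them visible to the final load.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: each thread increments a shared counter.
/// With a plain u64 shared through raw pointers this would be a data
/// race (UB); with an atomic, every increment is well-defined.
fn count_in_parallel(threads: usize, per_thread: u64) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap(); // join orders each thread's writes before the load
    }
    counter.load(Ordering::Relaxed)
}
```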
  Or take pointer provenance. Writing to an unknown memory location may cause printf in another thread to instead call system("rm -rf /").
  How the hell is one supposed to document the potential behavior of undefined behavior when undefined behavior is so inherently unconstrainable, and "undefined behavior can do anything" is considered unacceptable documentation?
  
  pklausler 3 years ago
  
  What you're talking about is "implementation defined" behavior, which is a distinct concept from "undefined behavior".
mort96 3 years ago

The specification has a separate concept for "specification left this undefined so that the compiler can define it": implementation-defined behaviour.
kevincox 3 years ago

Rather than look at it as "does not happen", it may be helpful to think of it as "if this does happen, anything is correct". So you can then split the codepaths into parts where the compiler is strictly specified in what it has to do (defined behaviour) and parts where anything is "correct". Of course the optimal solution is to just produce the best code you can for the first category, because whatever code that is also happens to be "correct" for the second category.
Of course this is equivalent to "does not happen" but may make more sense as to why the compiler acts this way.
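Rust exposes this "anything is correct" contract directly through `std::hint::unreachable_unchecked`. In this sketch (the `div_nonzero` name is illustrative), the caller promises `divisor != 0`; because reaching the marked path would be UB, the compiler may treat it as impossible and drop the division-by-zero check that `/` otherwise carries.

```rust
use std::hint::unreachable_unchecked;

/// # Safety
/// `divisor` must be non-zero. If it is zero, the behaviour is undefined.
unsafe fn div_nonzero(dividend: u32, divisor: u32) -> u32 {
    if divisor == 0 {
        // "If this happens, anything is correct": the optimizer may assume
        // this branch is never taken.
        unsafe { unreachable_unchecked() }
    }
    dividend / divisor
}
```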
jessermeyer 3 years ago

Compiler people, language people, and software engineers do not often overlap, and so neither do their constraints. Over time you see opposing drift occurring, where language people end up designing languages that are nice for language development, software engineers design software that is nice for software development, and so on.
Consider what is nice for compiler development but is not nice for software engineering.
tialaramex 3 years ago

> How did the compiler writers reach the point where they were able to read "undefined behavior" as "does not happen" rather than "specification left this undefined so that the compiler can define it"
If they want the compiler to define it, the specification can say so, and on several things it says exactly that. This is what the phrase "implementation defined" is for.
The example of Undefined Behaviour which is most commonly given in this sort of argument - and you chose the same - is integer overflow. Why can't it just do what I meant? And you're right, the language could have chosen to do what you meant here and it did not. There are lots of options and you don't like the option C chose. I'll talk about that in a moment.
But more generally, Undefined Behaviour isn't at all like that. What happens if I cast my local telephone number to an integer pointer and then dereference the pointer ? Today that's Undefined Behaviour in C, but if we think that means "the compiler documentation should say" then what should it say?
"It does whatever the underlying hardware would do" is a circular definition, this is a computer program, what the hardware does is whatever we told it to do. So maybe you say well, it emits this particular machine code. Congratulations now your "compiler documentation" reads like the source code for the compiler and your users still don't know the answer because guess what, the CPU vendor can't do any better with that either.
Now, back to those integer overflows. We can do a whole bunch of things here, and they all have different consequences.
WUFFS says this is forbidden: you get a compiler error. If your code can overflow, that's a bad WUFFS function and it does not compile.
Several languages including Python just don't have overflow. Your integer types just get bigger, this may be annoyingly slow in some cases, but like actual integers from grade school you won't just accidentally make one that's too big by mistake and something weird happens.
We could wrap the integers as you seem to prefer. This is what Rust's Wrapping<> types always do, and several languages provide alternate arithmetic operators to request wrapping.
We could saturate the integers, which means they stop at the edge of the overflow. This is what Rust's Saturating<> types do, and again I believe some languages have saturating operators (at least for addition and multiplication).
We could make arithmetic operations all "checked" so that they can fail if there would be overflow. The check could be in the form of a soft error, or it could cause something more dramatic like Rust's panic.
Or like C we can just say we refuse to define this, don't do it.
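Most of these policies sit side by side in Rust's standard library, which makes the trade-offs easy to compare. A minimal sketch (the `overflow_policies` name is illustrative): `Wrapping`, `saturating_add`, `checked_add` and `overflowing_add` are standard APIs, while plain `+` panics on overflow in debug builds and wraps in release builds unless overflow checks are enabled. Arbitrary-precision integers aren't in std, so the Python-style option isn't shown.

```rust
use std::num::Wrapping;

/// The same addition under several overflow policies.
fn overflow_policies(a: i32, b: i32) -> (i32, i32, Option<i32>, (i32, bool)) {
    let wrapped = (Wrapping(a) + Wrapping(b)).0; // two's-complement wrap
    let saturated = a.saturating_add(b);         // clamp at i32::MAX / i32::MIN
    let checked = a.checked_add(b);              // None on overflow
    let overflowing = a.overflowing_add(b);      // wrapped value plus a flag
    (wrapped, saturated, checked, overflowing)
}
```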

DougBTX 3 years ago

FWIW, the original link in the article shows that the panic_bounds_check check is present in optimised code in Rust 1.55, but if you update to a more recent Rust, say 1.60, then it is optimised away as expected:

    example::mid:
        test    rsi, rsi
        je      .LBB0_1
        and     rsi, -2
        mov     edx, dword ptr [rdi + 2*rsi]
        mov     eax, 1
        ret
    .LBB0_1:
        xor     eax, eax
        ret

https://rust.godbolt.org/z/Wz4Prjed4

tialaramex 3 years ago

Interesting, do you know how this is achieved?
Is there a Rust optimization that notices logically this type of construction is always in-bounds and so the bounds check is never emitted, or is the LLVM bounds check smarter now and realises hey, my parameter is never out of bounds, so no need to emit code here ?

teddyh 3 years ago

> This post is about defending and promoting UB as a concept, not UB in C/C++.

zaphar 3 years ago

This is important. He is arguing specifically that when made explicit and opted into by the coder that UB can be useful. The issues that C and C++ have are that it's too easy for the developer to get opted in to UB by the compiler without knowing it.
- megous 3 years ago
  
  So you just remember the relevant UBs defined in the C standard:
  https://gist.github.com/Earnestly/7c903f481ff9d29a3dd1
  There are not that many UBs in the base language. And even fewer are relevant for day to day coding. Most of it is in the std library.
  
  tialaramex 3 years ago
  
  Well, it's countably many, but there are about 200 of them, so we're not even talking US states or US presidents, but more like UN member states. I remember Togo exists, but I can't point to it on a map, and if you left it off a map I wouldn't notice.
  So we're asking quite a lot to say people should just remember all of them and actually use this knowledge not just recite the list. And these aren't small things, they include very broad ideas, like 'object is referred to outside of its lifetime' and even just categories of deviations from the standard like 'A "shall" or "shall not" requirement that appears outside of a constraint is violated'
  It's impressive that they enumerated them, A+ gold stars. As I understand it WG21 (C++) is still attempting to produce a list of the Undefined Behaviours in their language - but I don't think "memorize all these vague ideas and use that knowledge when programming" is practical.
  
  megous 3 years ago
  
  Huge number of them are obscure or in the standard library. A C programmer with some experience can go through a list and ignore the C library ones, cross out the weird ones that he'd never hit anyway because he'd not even think of writing code that way, and end up with some useful compressed list of maybe 30 for day to day use that really need to be observed because they normally appear in regular code.
  As for the standard library, that is often irrelevant, depending on what kind of code you're writing. Bootloader or Linux kernel, some other microcontroller code? Probably not relevant. GNOME C app? Not very relevant either, because it's all wrapped by glib, etc. And even when it's relevant, docs are very quickly searchable (by function name) and pitfalls are documented there.

kazinator 3 years ago

> I have presented Undefined Behavior as a tool that enables the programmer to write code that the compiler cannot check for correctness, and argued that — used responsibly — it is a useful component in a language designer's toolbox.

That is simply wrong. Code that the compiler cannot or does not check for correctness isn't "undefined behavior". Traditionally that is called "unsafe code": the programmer ensures safety through analyzing all the cases that may occur and making sure they have reliable consequences.

(The article talks about Rust a lot; maybe undefined and unsafe are the same thing in Rust?)

Undefined behavior with ill consequence may co-occur with safety. A case of undefined behavior may be flagged by the compiler with a diagnostic such as "undefined behavior in line 42". Thus, safety was ensured: the situation was diagnosed. Yet an executable program could be produced anyway, and if that is run, it may misbehave due to the undefined behavior in line 42.

This is the case in C when you assign incompatible types and use a compiler like GCC, which by default only warns about it and translates the program anyway.

Some undefined behavior leads to a documented extension, which makes it defined for the given implementation, and safe (if used as documented). If you try to perform arithmetic on a void * pointer, that is not an ISO C feature; it requires a diagnostic. GCC allows void * pointer arithmetic; it behaves like byte addressing. Programs which use this feature are invoking undefined behavior: they are violating an ISO C constraint rule, yet being executed anyway. Furthermore, if the diagnostic isn't issued, then GCC is being a non-conforming implementation; it's a non-conforming extension.

imtringued 3 years ago

The only way to avoid C undefined behaviour is to not use C. Stop using C.

blueflow 3 years ago

I thought Rust doesn't have a specification, just a reference? So all of it is undefined behavior?

yccs27 3 years ago

"Undefined behavior" is a term of art[0], with the specific meaning as mentioned in the article: The compiler is allowed to assume that UB never happens, and change the compiled code based on that assumption. Not "no one has written down a definition" or anything else.
As the sibling comment points out, a contrasting term is "implementation-defined". Confusing the two is a common misconception when learning C++: you might expect "overflow is undefined behavior" to mean that there is no prescribed result and every compiler might do it differently, but each will produce results in its own consistent way. But that would be implementation-defined behavior; instead, by doing unchecked addition, you tell the compiler that you know the addition will never overflow, and don't care at all about the overflowing case.
[0] a.k.a. "improper noun", see https://news.ycombinator.com/item?id=32673100
- yccs27 3 years ago
  
  Follow-up, to be precise: undefined behavior in C++ can, of course, have a definition set by the implementation. With the right switches, GCC will guarantee certain overflow behaviors. But a priori, the compiler is not bound by any guarantees.

planede 3 years ago

IMO differentiating a specification and reference this way is just nitpicking.
However, it looks like at least unsafe Rust is underspecified regarding aliasing rules, so a bunch of unsafe Rust is undefined behavior by definition; that is, no authoritative text (whether reference or specification) defines the behavior of those programs.
Key paragraph from the article:
> Stacked Borrows is not part of the Rust spec, and is not the final word for aliasing-related UB in Rust. So there is still the chance that future revisions of this model can be made to better align with programmer intuition. The above code might get accepted because x2 is not actually being used to access memory. Or maybe &mut expr should only make such promises when used outside an unsafe block — but then, should adding unsafe really change the semantics of the program? As usual, language design is a game of trade-offs.
bruce343434 3 years ago

I'd say all Rust is implementation-defined.