The problem with migrating C code to memory-safe languages is that legacy C projects aim for extremely high performance. Garbage-collected languages would also be safe in any situation, but I want to note that the recent tendency toward Rust derives from its type-system-based approach, which imposes very few runtime checks such as bounds checking.
I myself hope something like F* gets more traction in the industry.
That is the usual myth. Back when I initially learned C, hobby programmers writing Assembly could easily outperform the machine code generated by C compilers; that is why books like Michael Abrash's exist.
C got its performance fame thanks to optimizing compilers that abuse UB semantics.
The Microsoft .NET team, especially in Stephen Toub's great blog posts, has been showing off how much performance can be squeezed out of a managed language compiler toolchain when people actually care.
Also, let's not forget Apple only moved away from Object Pascal because an internal team built MPW, initially as a kind of submarine project, due to their UNIX roots, and even then their focus was C++, not C.
I hear the perf claim a lot and yet most of the time when I write C/C++ code it's not because it's the most performant. It's either because I'm editing a codebase that's already in C/C++, or I'm using libraries whose best (or only) bindings are C/C++, or because I want to make syscalls (and safe languages don't expose those as nicely as C).
For desktop, server, web, mobile, etc., this holds true. Not so much for embedded systems, or anything with unattractive memory capacity or processor performance. Rust is starting to make its way in, but C and even assembly are still king AFAIK.
The actual benefits seem manifest: there aren't nearly as many public reports of memory corruption (much less exploitable corruption) in Rust components, even when those components make extensive use of unsafety directly or transitively.
(This seems like one of those "throw the baby out with the bathwater" cases that people relitigate around Rust -- there's ample empirical evidence that building safe abstractions around unsafe primitives works well.)
> The actual benefits seem manifest: there aren't nearly as many public reports of memory corruption (much less exploitable corruption) in Rust components, even when those components make extensive use of unsafety directly or transitively.
Not sure the data is clean enough to draw meaningful conclusions because of confounding factors.
The biggest confounding factor is that Rust is relatively new, code written in it is even newer, and folks who research vulns may not have applied the same level of anger to Rust as to C.
That said, your point about "throwing the baby out with the bathwater" is well taken. I would expect that Rust has far fewer vulns than C/C++. My point is only that it's an unproven expectation.
> Not sure the data is clean enough to draw meaningful conclusions because of confounding factors.
I'm thinking of things like the Windows user- and kernel-mode font parsers; these have a pretty long and steady public history of exploitation that seems to have mostly stopped with the Rust rewrite 1-2 years ago. I don't think that's because vuln researchers have stopped looking at them!
But yeah, I would like it if Google and Microsoft (among others) would put more hard data out there. I don't think of the Windows kernel teams as typically suffering from hype-driven development, so my abductive conclusion is that they have strong supporting data internally.
Edit: here's a hard data source from Google, showing that Rust has contributed to a marked decline in memory unsafety in Android[1].
That's good data, but: is the reduction in vulns because Rust is safer, or is it because vuln researchers assume it's safer and so choose to look for vulns in C/C++ code because it's what they're familiar with?
Wait, what? Unsafe in Rust _is_ the actual benefit. That is to say, what `unsafe` does is that it allows you to implement a fundamentally tricky thing and express its safety invariants in the type system and lifetime system, resulting in a safe API with no tricky parts.
That's the whole point.
There's tons of trivial unsafe in the Rust ecosystem, and a little bit of nontrivial unsafe, because crates.io is full of libraries doing interesting things (high-performance data structures, synchronization primitives, FFI bindings, etc.) while providing a safe API, so you can do all of that without writing any unsafe yourself.
The point of Rust isn't that you can implement low-level data structures in safe code, but that you can use them without fear.
Saying that unsafe is a benefit is backwards, I think. Unsafe is the compromise Rust made to achieve its other goals, but of course Rust would be better if there was some way of doing things without unsafe.
Consider the split_at_mut function. It takes one mutable slice, and returns two mutable slices by chopping at a caller-specified point. It's in the standard library.
The operation is completely safe, as after the call the original mutable slice is no longer live, but the borrow checker won't let you write such a function yourself unless you tag it as unsafe, so that's what the implementation must do.
The same thing happens in the implementation of Vec: there is low-level code that is unsafe, used to provide safe abstractions.
That’s not a benefit of Rust. In other safe languages you could implement those things without writing a single bit of unsafe code. (This is true in Fil-C for example.)
Is that really fair? Neither Fil-C nor C have anything that particularly resembles & or &mut. If Rust had an &shared_mut style of reference, you could presumably split it without unsafe. For that matter, Rust does have various interior-mutable types, and you could have a shared reference to a slice of AtomicBool or whatever, and you can split that without any particular magic.
I don't see the interest in split_at_mut. I can get the same thing by reslicing in Go. And also the GC will do the job the borrow checker foists off onto the unlucky Rust programmer.
I'm not a Ruster, I've spent my career writing a ton of C++. But I'm interested in Rust as an alternative because it doesn't need GC.
But in this case, it's not just the memory safety I'm interested in, it's the data races. If we have multiple threads but can guarantee that any object either has only read-only references, or one mutable reference and no readers, we don't have data race issues.
Thinking aloud, and this is probably a bad idea for reasons I haven’t thought of.
What if pointers were a combination of values, like a 32 bit “zone” plus a 32 bit “offset” (where 32/32 is probably really 28/36 or something that allows >4GB allocations, but let’s figure that out later). Then each malloc() could increment the zone number, or pick an unused one randomly, so that there’s enormous space between consecutive allocs and an address wouldn’t be reissued quickly. A dangling pointer would then point at an address that isn’t mapped at all until possibly 2^32 malloc()s later. It wouldn’t help with long-lived dangling pointers, but would catch accessing a pointer right after it was freed.
I guess, more generally, why are addresses reused before they absolutely must be?
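To make the zone/offset idea concrete, here is a toy model in C. It is only a sketch under loud assumptions: the 32/32 split, the `zptr` type, the fixed-size `zone_live` table, and every function name are made up for illustration, and a real allocator would use virtual memory mappings rather than a lookup table. The table stands in for "is this zone currently mapped".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: high 32 bits = zone number, low 32 bits = offset. */
typedef uint64_t zptr;

enum { MAX_ZONES = 1024 };           /* toy limit, not part of the idea */

static bool zone_live[MAX_ZONES];    /* stands in for "zone is mapped" */
static uint32_t next_zone = 1;       /* zone 0 is reserved as null */

static uint32_t zptr_zone(zptr p)   { return (uint32_t)(p >> 32); }
static uint32_t zptr_offset(zptr p) { return (uint32_t)p; }

/* Every allocation takes a fresh zone number; numbers are never reissued. */
static zptr zone_alloc(void) {
    if (next_zone >= MAX_ZONES) return 0;   /* out of zones in this toy */
    uint32_t z = next_zone++;
    zone_live[z] = true;
    return (uint64_t)z << 32;               /* offset starts at 0 */
}

/* Freeing "unmaps" the zone; the number stays retired. */
static void zone_free(zptr p) {
    uint32_t z = zptr_zone(p);
    if (z != 0 && z < MAX_ZONES) zone_live[z] = false;
}

/* A pointer is usable only while its zone is still live. */
static bool zptr_valid(zptr p) {
    uint32_t z = zptr_zone(p);
    return z != 0 && z < MAX_ZONES && zone_live[z];
}
```

In this model a dangling `zptr` stays invalid even after later allocations, because the new allocation lands in a different zone. Nothing here checks the offset against the allocation size, so it says nothing about out-of-bounds access.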
It sounds like what you're describing is one-time allocation, and I think it's a good idea. There is some work on making practical allocators that work this way [1]. For long-running programs, the allocator will run out of virtual address space and then you need something to resolve that -- either you do some form of garbage collection or you compromise on safety and just start reusing memory. This also doesn't address spatial safety.
Oh, nifty! I guarantee you anyone else discussing this has put more than my 5 minutes' worth of thought into it.
Yeah, if you allow reuse then it wouldn't be a guarantee. I think it'd be closer to the effects of ASLR, where it's still possible to accidentally break things, just vastly less likely.
For sure. I'm under no illusion that it wouldn't be costly. What I'm trying to suss out is whether libc could hypothetically change to give better safety to existing compiled binaries.
However, it was limited to 8192 simultaneous “allocations” (segments) per process (or per whatever unit the OS associates the local descriptor tables with).
You can do this easily with virtual memory, and IIRC Zig's general purpose allocator does under some circumstances (I don't remember if it's the default or if it needs a flag).
There are and have been many techniques and projects for making C more memory-safe. The crucial question it always comes down to is what performance hit do you take using them?
That's why C has been on top for so long. Seat-of-the-pants hand-crafted C has always been the fastest high-level language.
25 years of experience with D has shown this to be a huge improvement.
D also has references as an alternative to pointers. References cannot have arithmetic done on them. Hence, by replacing pointers with references, and with array bounds checking, the incidence of memory corruption is hugely reduced.
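For readers who want to see the shape of the idea in plain C, here is a rough sketch of a bounds-checked array as a library type. This is not D's mechanism and not any proposed C extension; `int_slice` and its helper functions are hypothetical names invented for this example. The point is only that carrying the length next to the pointer lets every access be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": the length travels with the pointer. */
typedef struct {
    int *ptr;
    size_t len;
} int_slice;

static int_slice int_slice_of(int *ptr, size_t len) {
    int_slice s = { ptr, len };
    return s;
}

/* Checked read: returns false instead of touching memory out of bounds. */
static bool int_slice_get(int_slice s, size_t i, int *out) {
    if (i >= s.len) return false;
    *out = s.ptr[i];
    return true;
}

/* Checked write: same rule as the read. */
static bool int_slice_set(int_slice s, size_t i, int value) {
    if (i >= s.len) return false;
    s.ptr[i] = value;
    return true;
}
```

Because there is no arithmetic on the slice itself, the only way to reach an element is through the checked accessors. Plain C cannot stop a caller from reading `s.ptr` directly, which is where language support would come in.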
> C's memory safety could be drastically improved with the addition of bounds-checked arrays (which is an extension, and does not change existing code):
If you solved that problem then you'd still have a dumpster fire of memory safety issues from bad casts, use after free, etc
I found that C programs rarely evolve beyond their initial design. The trouble is, it's hard to refactor C programs. For example,
    struct S { int a; };
    struct S s; s.a = 3;
    struct S *p; p->a = 3;
I.e. a . is for direct access, -> for indirect access. Let's say you want to change passing S by value to passing S by pointer. Now you have to update every use, instead of just the declaration.
This is how it would work in D:
    struct S { int a; }
    S s; s.a = 3;
    S* p; p.a = 3;
    ref S q; q.a = 3;
And so refactoring becomes much easier, and so happens more often.
> C has always been the fastest high-level language.
C has another big speed problem. Strings are 0 terminated, rather than length terminated. This means constant scanning of strings to find their length. Even worse, the scanning of the string reloads the cache with the string contents, which is pretty bad for performance.
Of course, you could use `struct String { char *p; size_t length; };` but since every library you want to connect to uses 0 terminated strings, you're out on your island all alone, so pragmatically it does not work.
Another speed-destroying problem with C strings is you cannot take a substring that does not require allocating a new string and then copying the data. (Unless the substring is right-justified.) This is not fast in any universe.
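As a small illustration of the pointer-plus-length representation mentioned above, here is a sketch in C. The `String` struct follows the one in the comment (with a `const` pointer added); `string_from_cstr`, `string_slice`, and `string_equals` are hypothetical helpers made up for this example. It shows that a substring becomes a view into the same bytes, with no allocation and no copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *p;
    size_t length;
} String;

/* The only place the terminator is scanned: converting from a C string. */
static String string_from_cstr(const char *s) {
    String r = { s, strlen(s) };
    return r;
}

/* View of bytes [start, end), clamped to the string. No allocation, no copy. */
static String string_slice(String s, size_t start, size_t end) {
    if (end > s.length) end = s.length;
    if (start > end) start = end;
    String r = { s.p + start, end - start };
    return r;
}

/* Length comparison first, so unequal lengths never touch the bytes. */
static int string_equals(String a, String b) {
    return a.length == b.length && memcmp(a.p, b.p, a.length) == 0;
}
```

The catch is the one already described: a view taken from the middle is not 0 terminated, so it cannot be handed to a library that expects a C string without copying it first.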
D uses length-denoted strings as a basic data type, and with string processing code, it is much faster than C. Substrings are quick and easy. You can still interface with C because D string literals implicitly convert to C string literals, as the literals are 0 terminated. So this works in D:
printf("hello world!\n");
(People sometimes rag on me for still using printf, but printf is the most optimized and debugged library function in the world, so I take advantage!)
> There are and have been many techniques and projects for making C more memory-safe.
Sort of. None of them got all the way to safety, or they never got all the way to compatibility with C.
Fil-C is novel in that it achieves both safety and compatibility.
> The crucial question it always comes down to is what performance hit do you take using them?
Is that really the crucial question?
I don't think you would have even gotten to asking that question with most attempts to make C memory safe, because they involved experimental academic compilers that could only compile a subset of the language and only worked for a tiny corpus of benchmarks.
Lots of C/C++ code is not written with a perf mindset. Most of the UNIX utilities are like that, for example.
> That's why C has been on top for so long. Seat-of-the-pants hand-crafted C has always been the fastest high-level language.
I don't think that's the reason. C rose to where it is today even when it was much slower than assembly. C was slower than FORTRAN for a long time (maybe still is?) but people preferred C over FORTRAN anyway.
C's biggest superpower is how easy it makes it to talk to system ABI (syscalls, dynamic linking, etc).
>> There are and have been many techniques and projects for making C more memory-safe.
> Sort of.
Yes. That's why I used the qualifier "more." Our statements are not in conflict.
> Fil-C is novel in that it achieves both safety and compatibility.
How does it affect performance?
>> The crucial question it always comes down to is what performance hit do you take using them?
> Is that really the crucial question?
Yes, because it's the factor that industry leaders use to decide on which language to use. For example, Apple switching from Pascal to C way back in the Stone Age. The fact that it's the crucial question doesn't mean that lots of people don't consider other factors for their own reasons.
> I don't think you would have even gotten to asking that question with most attempts to make C memory safe.
Yes, most. But for example, Microsoft's Checked C comes with a performance penalty of almost 10% for a partial solution. Not academic. Very commercial.
> C rose to where it is today even when it was much slower than assembly
Yes, that's why I said "high-level language." I don't consider assembly high-level, do you?
> people preferred C over FORTRAN anyway
People preferred C in the 1970s/80s because at the time you could allocate memory dynamically in C but not in FORTRAN. FORTRAN fixed that in the 1990s, but by then there were too few FORTRAN programmers to compete. Since then C has serially defeated all newcomers. Maybe Go or Rust are poised to take it on. When a major operating system switches from C, we'll know.
Right now, 1.5x-5x, but considering how many optimizations I know I can do but haven't done yet, I think those numbers are an upper bound.
> Yes, because it's the factor that industry leaders use to decide on which language to use. For example, Apple switching from Pascal to C way back in the Stone Age. The fact that it's the crucial question doesn't mean that lots of people don't consider other factors for their own reasons.
I don't think this is true at all, sorry. Top reason for using C/C++ is inertia. If you've got a pile of C code, then you'll keep writing C.
> Yes, most. But for example, Microsoft's Checked C comes with a performance penalty of almost 10% for a partial solution.
Checked C didn't make C memory safe, so I don't think it's interesting.
> Yes, that's why I said "high-level language." I don't consider assembly high-level, do you?
No, I don't consider assembly to be high-level. The point is: serious engineers don't just blindly reach for the fastest programming language. They'll take slow downs if it makes them more productive. Happens all the time.
> People preferred C in the 1970s/80s because at the time you could allocate memory dynamically in C but not in FORTRAN. FORTRAN fixed that in the 1990s, but by then there were too few FORTRAN programmers to compete. Since then C has serially defeated all newcomers. Maybe Go or Rust are poised to take it on. When a major operating system switches from C, we'll know.
The last time I saw a benchmark of FORTRAN beating C was the early 2000s. FORTRAN is much easier to optimize and compile.
C is great for writing operating systems because C has the right abstractions, such as the abstractions necessary for doing dynamic linking and basically any kind of ABI compatibility. Rust and Go don't have that today. Even C++ is worse than C in this regard. Swift has ABI, but it took heroic efforts to get there.
C didn't eat the world because of its stellar performance. It's the other way around. C has stellar performance because it ate the world, and then the industry had no choice but to make it fast.
> There are clearly stated guarantees, like "a pointer cannot access outside the bounds of its allocation"
But that's not a pointer in anything like the sense of a C pointer.
You'd need to reword that (as I know you've been doing with Fil-C) to be something more like: no reference to a (variable|allocation|object) may ever be used to access memory that is not a part of the object.
Pointers are not that, and the work you've done in Fil-C to make them closer to that makes them also be "not pointers" in a classic sense.
You can call Fil-C’s pointers whatever you like. You can call them capabilities if that works better for you.
The point of my post is to enumerate the set of things you’d need to do to pointers to make them safe. If that then means we’ve created something that you wouldn’t call a pointer then, like, whatever.
If your goal is just to redefine the word then by all means, continue.
But semantics are very important if your goal is to drive adoption of your ideas. You can't misuse a term and then get pissy when people don't understand you.
And my point is that you cannot make C pointers safe. You can make something else that is safe, and you're clearly hard at work on that, which is great.
Fil-C's pointers work more like C pointers than like any other language construct I can think of, and are compatible enough that lots of C/C++ code compiles and runs with no changes.
So I think that Fil-C pointers are just pointers.
You could even get pedantic over what the spec says. If you go there, you find that Fil-C's pointers work exactly like how the spec promises pointers to work (and all of the places that the spec doesn't define either have safe semantics in Fil-C or lead to Fil-C safety errors).
Not a big fan of this line of thinking since if you tried to deploy a C implementation that just implements what's in the C standard, then you'd quickly find that real world C code expects more of pointers than the C standard promises.
Fil-C supports a superset of the C standard but a subset of what contemporary mainstream C compilers support (you can't pass an integer around in memory that is really being used to represent a pointer in Fil-C, but you can in Yolo-C).
You just can’t cast a pointer to uintptr_t, then store that int into memory, then load it back, then cast it back to pointer, and then dereference that pointer.
I just don’t let integer types carry capabilities. If I did, things would get weird (like if you said `x + y` and both happened to carry capabilities then what would you get?)
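For anyone unsure which pattern is meant, here is a sketch of it in ordinary C. The function and variable names are made up for this example. With mainstream compilers this works and returns the pointed-to value; going by the description above, Fil-C would be expected to refuse the final dereference because the integer loaded from memory carries no capability. That last part is taken from the comment, not something this snippet verifies.

```c
#include <assert.h>
#include <stdint.h>

/* Memory that holds only an integer, never a pointer-typed value. */
static uintptr_t integer_slot;

/* Pointer -> integer -> memory -> integer -> pointer -> dereference. */
static int read_via_integer_round_trip(int *p) {
    integer_slot = (uintptr_t)p;        /* cast, then store the int into memory */
    uintptr_t loaded = integer_slot;    /* load it back as a plain integer */
    int *q = (int *)loaded;             /* cast back to a pointer */
    return *q;                          /* the step described as disallowed in Fil-C */
}
```

Keeping the value typed as a pointer the whole way (for example storing an `int *` in memory instead of a `uintptr_t`) avoids the pattern entirely.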
I remember chasing down a memory leak in my first commercial C code. Took me a long while to discover that if you allocate zero bytes you still have to free it! After that I took nothing for granted.
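A small sketch of that rule, for reference. The C standard leaves `malloc(0)` implementation-defined: it may return a null pointer, or a unique non-null pointer that must not be dereferenced. Either result can be passed to `free`, and the non-null case leaks if you don't. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate n bytes and release them again.
   Returns 1 if malloc handed back a non-null block, 0 if it returned NULL. */
static int alloc_and_release(size_t n) {
    void *p = malloc(n);
    int got_block = (p != NULL);
    free(p);    /* required when p is non-null, harmless when p is NULL */
    return got_block;
}
```

Calling `alloc_and_release(0)` may return either 0 or 1 depending on the C library, which is exactly why the zero-size case is easy to overlook.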
Nah. I use C a lot, but none of this is enough to make C safe. You really need the language and the tools to enforce discipline. Oh, and things like the cleanup attribute are not standard C either, so this is not portable code.
It starts by discovering how little most folks know of the ISO C legalese versus what their compiler does, and it goes from there when adding anything else not part of the standard library.
I am in no way at all better than that guy. Not even sort of. I appreciate his talk.
However, if I were to make a presentation based on my superior C practices, it would have to be implementation and example heavy.
All of his rules sound great, except for when you have to break them or you don’t know how to do the things he’s talking about in your code, because you need to get something done today.
It reads a little like “I’ve learned a lot of lessons over my career, you should learn my lessons. You’re welcome.”
The talk was obviously extremely time-limited, as demonstrated when they basically skipped the last handful of slides and then it abruptly ended. I think for the time allocated, it was just right, and they did include a couple of examples where it made sense.
To me, "memory safety" really means:
- There are clearly stated guarantees, like "a pointer cannot access outside the bounds of its allocation" and "if you free an object while there are still pointers to it then those pointers cannot be dereferenced". These guarantees should be something you can reason about formally, and they should be falsifiable. Not sure this presentation really has that. It's not clear what they prevent, and what they don't prevent.
- There is no way to break out of the clearly stated guarantees. Totally unclear that whatever guarantees they have are actually guarded against in all cases. For example, what if a tmp_alloc'd object pointer escapes into another tmp_alloc'd object with different lifetime - I get that they wouldn't write code that does that intentionally, but "memory safety" to me means that if you did write that code, you'd either get a compile error or a runtime error.
It's possible to ascribe clearly stated guarantees to C and to make it impossible to break out of them (Fil-C and CHERI both achieve that).
> There is no way to break out of the clearly stated guarantees.
I disagree on this, and having escape hatches is critically important. Are we really going to call Rust or Haskell memory unsafe because they offer ways to break their safety guarantees?
I think that Rust's clearly stated guarantee holds if you never say "unsafe".
That's still a clear statement, because it's trivial to tell if you used "unsafe" or not.
Maybe I just misinterpret what you mean. When you say "no way" and "all cases", I take your meaning literally. The existence of pointers to bypass the borrow checker, disabling runtime bounds checks and unsafe blocks are exactly that: escape hatches to break Rust's safety, in the same way type-casting is an escape hatch to break C's (anemic) type safety, and unsafePerformIO in Haskell is an escape hatch to break every bone in your body.
That’s fair.
FWIW, Fil-C’s guarantees are literally what you want. There’s no escape.
Trivial is not a word I would use here! Rust's `unsafe` gets fuzzy as you traverse an operation's dependencies! There are many applications where marking a function as `unsafe` is subjective.
True.
I do think the ideal kind of memory safe language either has no "unsafe", or has an "unsafe" feature that only needs to be used in super rare and obscure cases (Java is like that, sort of).
Fil-C has no "unsafe", so in that sense Fil-C is safer than Rust. You don't need an escape hatch if the memory safety guarantees are dialed in just right.
In Virgil, native targets have a couple of unsafe operations available, one of which is to be able to forge a closure from a pair of a code pointer and an object reference. This is used, e.g. to implement Wizard's JIT and fast interpreter, which generate new machine code at runtime. I can't imagine the level of proof necessary to make that safe--and not just the safety of the generated machine code, but it often lacks bounds checks because it relies on running verified Wasm bytecode, which cannot go out of bounds. So the proof would have to include a complete proof of correctness for the code validation algorithm (i.e. part of Wizard itself).
I dream of a JIT API that lets you propose machine code that is checked using an abstract interpreter to ensure that you have adequate checks to stay within the host language's type system.
Someday, man
This will kill baseline JIT performance.
I am trying out a couple of new directions, e.g. generating more of the tiers from a more abstract description, constantly shrinking the amount of hand-written compiler/interpreter code. My hard requirement is the end result has to be pretty darn close to what I'd write by hand.
One thing I am thinking about now is how to make more use of the implementation language's (e.g. Virgil) compiler to be able to paste together machine code templates gotten from writing in the implementation language. Think copy-and-patch compilation, but as language primitive. E.g. "please emit an inlined copy of the machine code for this function (first-class ref to said function) into memory here, under this ABI".
> This will kill baseline JIT performance.
You say that and then you describe exactly what I would have used as a solution: copy and patch.
Just have the checker check the templates that the baseline JIT is stitching together and then have a safe way to ask for the prechecked templates to be stitched together.
Bunch of details in getting that right obviously, but it doesn’t seem impossible.
I don't get this, can you explain?
Sure.
If I told you that I have a snippet of machine code that:
- obeys the ABI of your safe language (ie it has exactly the calling convention that safe language uses)
- corresponds exactly to a function body whose signature is T->U (or whatever, different safe languages have different function type syntax)
- obeys the language’s type system.
Then you could run an abstract interpreter to check that the machine code follows that type system. Simple example: given the above claims, if we further assume that the host language impl puts argument one into register 5, and the first argument’s type is “pointer to an array of bytes”, and we know that arrays have a 64-bit length prefixed to the start, then the abstract interpreter would just need to check that any deref of register 5 is preceded by a bounds check on whatever was loaded at offset -8 from register 5. And so on, for every possible thing you can do in the language.
Then the JIT would just have to make sure it puts checks in all of the places that the absint expects them. If the absint fails, then the machine code is rejected.
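Here is a toy version of that check, to show the flavor. Everything in it is an assumption for illustration: a made-up four-instruction straight-line "machine code", eight registers, the array pointer fixed in register 5, and names like `insn` and `verify`. A real checker would have to handle actual machine instructions, control flow, and the whole type system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NREGS = 8, ARRAY_REG = 5 };   /* argument one lives in register 5 */

typedef enum {
    LOAD_LEN,      /* a = load from (b - 8); only allowed with b == ARRAY_REG */
    BOUNDS_CHECK,  /* trap unless reg a < reg b */
    DEREF,         /* access ARRAY_REG[reg a] */
    CLOBBER        /* a = some unknown value */
} op_kind;

typedef struct { op_kind kind; int a; int b; } insn;

/* Abstract state per register: does it hold the array length, and has it
   been bounds-checked against the length since it was last written? */
static bool verify(const insn *code, size_t n) {
    bool holds_len[NREGS] = { false };
    bool checked[NREGS] = { false };
    for (size_t i = 0; i < n; i++) {
        insn in = code[i];
        if (in.a < 0 || in.a >= NREGS) return false;
        switch (in.kind) {
        case LOAD_LEN:
            if (in.b != ARRAY_REG || in.a == ARRAY_REG) return false;
            holds_len[in.a] = true;
            checked[in.a] = false;
            break;
        case BOUNDS_CHECK:
            if (in.b < 0 || in.b >= NREGS) return false;
            if (holds_len[in.b]) checked[in.a] = true;
            break;
        case DEREF:
            if (!checked[in.a]) return false;   /* no preceding check: reject */
            break;
        case CLOBBER:
            if (in.a == ARRAY_REG) return false;
            holds_len[in.a] = false;
            checked[in.a] = false;
            break;
        }
    }
    return true;
}
```

A sequence like load-length, bounds-check, deref is accepted; dropping the check, checking against a register that never held the length, or overwriting the index after the check all get the code rejected.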
I think there’s a sense in which Rust is safer than Fil-C, though: Rust allows abstractions with little to do with memory safety but that still can’t be broken without 'unsafe'. So a struct called EvenNumber can fairly strongly guarantee that it contains an even number.
But Fil-C objects (at least for now?) only seem to allow one single capability type, and that capability grants unrestricted read/write access to the object’s bytes.
I wonder if one could build a handle system in Fil-C that would allow this to be extended. Or if a different variant of a Fil-C-like system could distinguish between pointers with different access levels to an object and could allow only the correct piece of trusted code to increase the permission of a pointer.
I’ve thought about adding such things to Fil-C but have held off on going there because it feels like a bridge too far.
What I mean by that is: the memory safety issues of C are a total dumpster fire, while whether a number is even or not (and whether you can prove that) is maybe like icing on the dumpster fire. It just doesn’t matter by comparison.
So I want to decisively fix the memory safety issues and not lose focus.
The problem with migrating C code to memory-safe languages is that legacy C projects aim for extremely high performance. Garbage-collected languages would also be safe in any situation, but I want to note that the recent tendency toward Rust derives from its type-system-based approach, which imposes very few runtime checks such as bounds checking. I myself hope something like F* gets more traction in the industry.
That is the usual myth. Back when I initially learned C, hobby programmers writing Assembly could easily outperform the machine code generated by C compilers; that is why books like those from Mike Abrash exist.
C got its performance fame thanks to optimizing compilers that abuse UB semantics.
The Microsoft .NET team, especially in the great Stephen Toub blog posts, has been showing off how much performance can be squeezed out of a managed-language compiler toolchain when people actually care.
Also, let's not forget Apple only moved away from Object Pascal due to an internal team doing MPW, initially as a kind of submarine project, due to their UNIX roots, and still their focus was C++, not C.
I hear the perf claim a lot and yet most of the time when I write C/C++ code it's not because it's the most performant. It's either because I'm editing a codebase that's already in C/C++, or I'm using libraries whose best (or only) bindings are C/C++, or because I want to make syscalls (and safe languages don't expose those as nicely as C).
For desktop, server, web, mobile, etc., this holds true. Not so much for embedded systems, or anything with unattractive memory capacity or processor performance. Rust is starting to make its way in, but C and even assembly are still king AFAIK.
And yet there are embedded systems running JavaScript and Python.
Are there any real-world, production systems based on JavaScript and Python?
Toy systems, yes. Hey, I too, think CircuitPython is really neat. But I'm skeptical someone would base a PLC (or similar) on it.
Above was talking about embedded systems, and I really don’t know if the James Webb Space Telescope counts, but…
The James Webb Space Telescope runs JavaScript, apparently [1].
[1]: https://www.theverge.com/2022/8/18/23206110/james-webb-space...
But the Rust ecosystem is littered with unsafe, so good luck getting the actual benefits of Rust. :(
The actual benefits seem manifest: there aren't nearly as many public reports of memory corruption (much less exploitable corruption) in Rust components, even when those components make extensive use of unsafety directly or transitively.
(This seems like one of those "throw the baby out with the bathwater" cases that people relitigate around Rust -- there's ample empirical evidence that building safe abstractions around unsafe primitives works well.)
> The actual benefits seem manifest: there aren't nearly as many public reports of memory corruption (much less exploitable corruption) in Rust components, even when those components make extensive use of unsafety directly or transitively.
Not sure the data is clean enough to draw meaningful conclusions because of confounding factors.
The biggest confounding factor is that Rust is relatively new, code written in it is even newer, and folks who research vulns may not have applied the same level of anger to Rust as to C.
That said, your point about "throwing the baby out with the bathwater" is well taken. I would expect that Rust has much fewer vulns than C/C++. My point is only that it's an unproven expectation.
> Not sure the data is clean enough to draw meaningful conclusions because of confounding factors.
I'm thinking of things like the Windows user- and kernel-mode font parsers; these have a pretty long and steady public history of exploitation that seems to have mostly stopped with the Rust rewrite 1-2 years ago. I don't think that's because vuln researchers have stopped looking at them!
But yeah, I would like it if Google and Microsoft (among others) would put more hard data out there. I don't think of the Windows kernel teams as typically suffering from hype-driven development, so my abductive conclusion is that they have strong supporting data internally.
Edit: here's a hard data source from Google, showing that Rust has contributed to a marked decline in memory unsafety in Android[1].
[1]: https://security.googleblog.com/2022/12/memory-safe-language...
That's good data, but: is the reduction in vulns because Rust is safer, or is it because vuln researchers assume it's safer and so choose to look for vulns in C/C++ code because it's what they're familiar with?
It's hard to say.
Wait, what? Unsafe in Rust _is_ the actual benefit. That is to say, what `unsafe` does is that it allows you to implement a fundamentally tricky thing and express its safety invariants in the type system and lifetime system, resulting in a safe API with no tricky parts.
That's the whole point.
There's tons of trivial unsafe in the Rust ecosystem, and a little bit of nontrivial unsafe, because crates.io is full of libraries doing interesting things (high-performance data structures, synchronization primitives, FFI bindings, etc.) while providing a safe API, so you can do all of that without writing any unsafe yourself.
The point of Rust isn't that you can implement low-level data structures in safe code, but that you can use them without fear.
Saying that unsafe is a benefit is backwards, I think. Unsafe is the compromise Rust made to achieve its other goals, but of course Rust would be better if there was some way of doing things without unsafe.
Consider the split_at_mut function. It takes one mutable slice, and returns two mutable slices by chopping at a caller-specified point. It's in the standard library.
The operation is completely safe, as after the call the original mutable slice is no longer live, but the borrow checker won't let you write such a function yourself unless you tag it as unsafe, so that's what the implementation must do.
The same thing happens in the implementation of Vec: there is low-level code that is unsafe, used to provide safe abstractions.
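For readers who have not met the operation, here is a rough C analogue of what `split_at_mut` computes. This is only a sketch: the `int_slice` and `int_slice_pair` types and the `split_at` name are made up for illustration, and C has no borrow checker, so nothing here enforces that the original slice is no longer used afterwards. That missing guarantee is exactly what the Rust version encodes in its signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pointer-plus-length view of an int array. */
struct int_slice {
    int *ptr;
    size_t len;
};

struct int_slice_pair {
    struct int_slice left;
    struct int_slice right;
};

/* Splits s into [0, mid) and [mid, len). The two halves never overlap,
   which is why handing out both as mutable views is sound. */
struct int_slice_pair split_at(struct int_slice s, size_t mid)
{
    assert(mid <= s.len); /* the Rust function panics in this case */
    struct int_slice_pair p = {
        { s.ptr, mid },
        { s.ptr + mid, s.len - mid }
    };
    return p;
}
```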
That’s not a benefit of Rust. In other safe languages you could implement those things without writing a single bit of unsafe code. (This is true in Fil-C for example.)
Is that really fair? Neither Fil-C nor C have anything that particularly resembles & or &mut. If Rust had an &shared_mut style of reference, you could presumably split it without unsafe. For that matter, Rust does have various interior-mutable types, and you could have a shared reference to a slice of AtomicBool or whatever, and you can split that without any particular magic.
Fil-C doesn’t have those things because it doesn’t need them to achieve safety.
I don't see the interest in split_at_mut. I can get the same thing by reslicing in Go. And also the GC will do the job the borrow checker foists off onto the unlucky Rust programmer.
Pfft, whatever. Rusters gonna rust, I guess.
I'm not a Ruster, I've spent my career writing a ton of C++. But I'm interested in Rust as an alternative because it doesn't need GC.
But in this case, it's not just the memory safety I'm interested in, it's the data races. If we have multiple threads but can guarantee that any object either has only read-only references, or one mutable reference and no readers, we don't have data race issues.
Great reason to use Fil-C
Thinking aloud, and this is probably a bad idea for reasons I haven’t thought of.
What if pointers were a combination of values, like a 32 bit “zone” plus a 32 bit “offset” (where 32/32 is probably really 28/36 or something that allows >4GB allocations, but let’s figure that out later). Then each malloc() could increment the zone number, or pick an unused one randomly, so that there’s enormous space between consecutive allocs and an address wouldn’t be reissued quickly. A dangling pointer would then point at an address that isn’t mapped at all until possibly 2^32 malloc()s later. It wouldn’t help with long-lived dangling pointers, but would catch accessing a pointer right after it was freed.
I guess, more generally, why are addresses reused before they absolutely must be?
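A toy model of the zone/offset encoding, just to pin down the arithmetic. All names here (`zoned_ptr`, `zoned_alloc`, `MAX_ZONES`, and so on) are invented for the sketch, the "allocator" hands out handles rather than real memory, and a liveness table stands in for the page mappings a real implementation would rely on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ZONES 1024u /* toy limit; the idea above allows up to 2^32 */

typedef uint64_t zoned_ptr; /* high 32 bits: zone, low 32 bits: offset */

static bool zone_live[MAX_ZONES];
static uint32_t next_zone = 1; /* zone 0 is reserved as the null handle */

uint32_t zone_of(zoned_ptr p) { return (uint32_t)(p >> 32); }
uint32_t offset_of(zoned_ptr p) { return (uint32_t)(p & 0xFFFFFFFFu); }

/* Every allocation gets a fresh zone; zones are never reused in this toy. */
zoned_ptr zoned_alloc(void)
{
    if (next_zone >= MAX_ZONES)
        return 0; /* out of zones */
    uint32_t z = next_zone++;
    zone_live[z] = true;
    return (zoned_ptr)z << 32;
}

void zoned_free(zoned_ptr p)
{
    uint32_t z = zone_of(p);
    assert(z < MAX_ZONES);
    zone_live[z] = false;
}

/* Pointer arithmetic only touches the offset, so it cannot leave the zone. */
zoned_ptr zoned_add(zoned_ptr p, uint32_t delta)
{
    uint32_t off = offset_of(p) + delta; /* wraps within 32 bits */
    return (p & 0xFFFFFFFF00000000ull) | off;
}

/* Stand-in for "is this address mapped?" */
bool zoned_is_live(zoned_ptr p)
{
    uint32_t z = zone_of(p);
    return z != 0 && z < MAX_ZONES && zone_live[z];
}
```

A handle stays detectably dangling after `zoned_free` because the next `zoned_alloc` moves on to a new zone instead of recycling the old one. The toy simply fails once it runs out of zones.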
It sounds like what you're describing is one-time allocation, and I think it's a good idea. There is some work on making practical allocators that work this way [1]. For long-running programs, the allocator will run out of virtual address space and then you need something to resolve that -- either you do some form of garbage collection or you compromise on safety and just start reusing memory. This also doesn't address spatial safety.
[1]: https://www.usenix.org/system/files/sec21summer_wickman.pdf
Oh, nifty! I guarantee you anyone else discussing this has put more than my 5 minutes' worth of thought into it.
Yeah, if you allow reuse then it wouldn't be a guarantee. I think it'd be closer to the effects of ASLR, where it's still possible to accidentally break things, just vastly less likely.
That’s a way of achieving safety that has so many costs:
- physical fragmentation (you won’t be able to put two live objects into the same page)
- virtual fragmentation (there’s kernel memory cost to having huge reservations)
- 32 bit size limit
Fil-C achieves safety without any of those compromises.
For sure. I'm under no illusion that it wouldn't be costly. What I'm trying to suss out is whether libc could hypothetically change to give better safety to existing compiled binaries.
Two things:
- The costs of your solution really are prohibitive. Lots of stuff just won't run.
- "Better" isn't good enough because attackers are good at finding the loopholes. You need a guarantee.
This sounds similar to the 386 segmented memory model: https://en.wikipedia.org/wiki/X86_memory_segmentation#80386_...
However, it was limited to 8192 simultaneous “allocations” (segments) per process (or per whatever unit the OS associates the local descriptor tables with).
You can do this easily with virtual memory, and IIRC Zig's general purpose allocator does under some circumstances (don't remember if it's the default or if it needs a flag).
There are and have been many techniques and projects for making C more memory-safe. The crucial question it always comes down to is what performance hit do you take using them?
That's why C has been on top for so long. Seat-of-the-pants hand-crafted C has always been the fastest high-level language.
C's memory safety could be drastically improved with the addition of bounds-checked arrays (which is an extension, and does not change existing code):
https://www.digitalmars.com/articles/C-biggest-mistake.html
25 years of experience with D has shown this to be a huge improvement.
D also has references as an alternative to pointers. References cannot have arithmetic done on them. Hence, by replacing pointers with references, and with array bounds checking, the incidence of memory corruption is hugely reduced.
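In today's C the same idea can only be approximated by hand. A minimal sketch, assuming a hypothetical `char_array` type that carries its length alongside the pointer (the linked article proposes language syntax for this; the struct and the `checked_get`/`checked_set` helpers below are only stand-ins for it):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": the length travels with the data pointer. */
struct char_array {
    char *ptr;
    size_t len;
};

/* Reads a[i] into *out; returns false instead of reading out of bounds. */
bool checked_get(struct char_array a, size_t i, char *out)
{
    assert(out != NULL);
    if (i >= a.len)
        return false;
    *out = a.ptr[i];
    return true;
}

/* Writes v to a[i]; returns false instead of writing out of bounds. */
bool checked_set(struct char_array a, size_t i, char v)
{
    if (i >= a.len)
        return false;
    a.ptr[i] = v;
    return true;
}
```

The difference from a built-in feature is that nothing forces callers to go through these helpers; with language support the compiler would insert the check on every index.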
> C's memory safety could be drastically improved with the addition of bounds-checked arrays (which is an extension, and does not change existing code):
If you solved that problem then you'd still have a dumpster fire of memory safety issues from bad casts, use after free, etc
Array overflows are consistently the number one memory safety bug in shipped code, by a wide margin.
Citation needed.
(I would have guessed similar to what you said, minus the "by a wide margin" bit.)
A few minutes of googling:
https://cwe.mitre.org/top25/archive/2024/2024_cwe_top25.html
https://runsafesecurity.com/blog/memory-safety-vulnerabiliti...
Neither of those support your claim that buffer overflows are the top issue by a wide margin.
If you are saying that “most memory safety issues are bounds related” then I agree. I’m just disagreeing on the wide margin bit.
> Neither of those support your claim that buffer overflows are the top issue by a wide margin
That's true, but I have seen that statistic more than once, and decided that I wasn't going to spend more time searching for it.
If that's not good enough for you, so be it.
I found that C programs rarely evolve beyond their initial design. The trouble is, it's hard to refactor C programs. For example,
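A minimal sketch of the kind of code in question, assuming a hypothetical struct `S` with a single field (the function names are illustrative too):

```c
#include <assert.h>
#include <stddef.h>

struct S {
    int a;
};

/* S passed by value: fields are reached with '.' */
int get_by_value(struct S s)
{
    return s.a;
}

/* S passed by pointer: the same access has to be rewritten with '->' */
int get_by_pointer(const struct S *s)
{
    assert(s != NULL);
    return s->a;
}
```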
I.e. a . is for direct access, -> for indirect access. Let's say you want to change passing S by value to passing S by pointer. Now you have to update every use, instead of just the declaration.
This is how it would work in D:
And so refactoring becomes much easier, and so happens more often.
> C has always been the fastest high-level language.
C has another big speed problem. Strings are 0 terminated, rather than length terminated. This means constant scanning of strings to find their length. Even worse, the scanning of the string reloads the cache with the string contents, which is pretty bad for performance.
Of course, you could use `struct String { char *p; size_t length; };` but since every library you want to connect to uses 0 terminated strings, you're out on your island all alone, so pragmatically it does not work.
Another speed-destroying problem with C strings is you cannot take a substring that does not require allocating a new string and then copying the data. (Unless the substring is right-justified.) This is not fast in any universe.
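To illustrate the substring point, here is a small sketch built on the `struct String` from above (using `const char *` rather than `char *` so it can wrap string literals). The helper names are made up for the example. Taking a substring is just pointer arithmetic plus a new length, with no allocation and no copy, and asking for the length never rescans the bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct String {
    const char *p;
    size_t length;
};

/* The one place a scan is needed: importing a 0-terminated C string. */
struct String string_from_cstr(const char *s)
{
    assert(s != NULL);
    struct String r = { s, strlen(s) };
    return r;
}

/* O(1) substring [start, end): shares storage with the original string. */
struct String string_slice(struct String s, size_t start, size_t end)
{
    assert(start <= end && end <= s.length);
    struct String r = { s.p + start, end - start };
    return r;
}

/* Length comparison first, so unequal lengths never touch the bytes. */
int string_equals(struct String a, struct String b)
{
    return a.length == b.length && memcmp(a.p, b.p, a.length) == 0;
}
```

The catch is the one already mentioned: a slice taken from the middle is not 0 terminated, so it cannot be handed to a library expecting a C string without copying. Printing still works via `printf("%.*s", (int)s.length, s.p)`.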
D uses length-denoted strings as a basic data type, and with string processing code, it is much faster than C. Substrings are quick and easy. You can still interface with C because D string literals implicitly convert to C string literals, as the literals are 0 terminated. So this works in D:
(People sometimes rag on me for still using printf, but printf is the most optimized and debugged library function in the world, so I take advantage!)
Yeah totally.
It's a perfect example of C being optimized for simple mapping onto linear memory, rather than some kind of performance optimum
> There are and have been many techniques and projects for making C more memory-safe.
Sort of. None of them got all the way to safety, or they never got all the way to compatibility with C.
Fil-C is novel in that it achieves both safety and compatibility.
> The crucial question it always comes down to is what performance hit do you take using them?
Is that really the crucial question?
I don't think you would have even gotten to asking that question with most attempts to make C memory safe, because they involved experimental academic compilers that could only compile a subset of the language and only worked for a tiny corpus of benchmarks.
Lots of C/C++ code is not written with a perf mindset. Most of the UNIX utilities are like that, for example.
> That's why C has been on top for so long. Seat-of-the-pants hand-crafted C has always been the fastest high-level language.
I don't think that's the reason. C rose to where it is today even when it was much slower than assembly. C was slower than FORTRAN for a long time (maybe still is?) but people preferred C over FORTRAN anyway.
C's biggest superpower is how easy it makes it to talk to system ABI (syscalls, dynamic linking, etc).
>> There are and have been many techniques and projects for making C more memory-safe.
> Sort of.
Yes. That's why I used the qualifier "more." Our statements are not in conflict.
> Fil-C is novel in that it achieves both safety and compatibility.
How does it affect performance?
>> The crucial question it always comes down to is what performance hit do you take using them?
> Is that really the crucial question?
Yes, because it's the factor that industry leaders use to decide on which language to use. For example, Apple switching from Pascal to C way back in the Stone Age. The fact that it's the crucial question doesn't mean that lots of people don't consider other factors for their own reasons.
> I don't think you would have even gotten to asking that question with most attempts to make C memory safe.
Yes, most. But for example, Microsoft's Checked C comes with a performance penalty of almost 10% for a partial solution. Not academic. Very commercial.
> C rose to where it is today even when it was much slower than assembly
Yes, that's why I said "high-level language." I don't consider assembly high-level, do you?
> people preferred C over FORTRAN anyway
People preferred C in the 1970s/80s because at the time you could allocate memory dynamically in C but not in FORTRAN. FORTRAN fixed that in the 1990s, but by then there were too few FORTRAN programmers to compete. Since then C has serially defeated all newcomers. Maybe Go or Rust are poised to take it on. When a major operating system switches from C, we'll know.
> How does it affect performance?
Right now, 1.5x-5x, but considering how many optimizations I know I can do but haven't done yet, I think those numbers are an upper bound.
> Yes, because it's the factor that industry leaders use to decide on which language to use. For example, Apple switching from Pascal to C way back in the Stone Age. The fact that it's the crucial question doesn't mean that lots of people don't consider other factors for their own reasons.
I don't think this is true at all, sorry. Top reason for using C/C++ is inertia. If you've got a pile of C code, then you'll keep writing C.
> Yes, most. But for example, Microsoft's Checked C comes with a performance penalty of almost 10% for a partial solution.
Checked C didn't make C memory safe, so I don't think it's interesting.
> Yes, that's why I said "high-level language." I don't consider assembly high-level, do you?
No, I don't consider assembly to be high-level. The point is: serious engineers don't just blindly reach for the fastest programming language. They'll take slow downs if it makes them more productive. Happens all the time.
> People preferred C in the 1970s/80s because at the time you could allocate memory dynamically in C but not in FORTRAN. FORTRAN fixed that in the 1990s, but by then there were too few FORTRAN programmers to compete. Since then C has serially defeated all newcomers. Maybe Go or Rust are poised to take it on. When a major operating system switches from C, we'll know.
The last time I saw a benchmark of FORTRAN beating C was the early 2000's. FORTRAN is much easier to optimize and compile.
C is great for writing operating systems because C has the right abstractions, such as the abstractions necessary for doing dynamic linking and basically any kind of ABI compatibility. Rust and Go don't have that today. Even C++ is worse than C in this regard. Swift has ABI, but it took heroic efforts to get there.
C didn't eat the world because of its stellar performance. It's the other way around. C has stellar performance because it ate the world, and then the industry had no choice but to make it fast.
> There are clearly stated guarantees, like "a pointer cannot access outside the bounds of its allocation"
But that's not a pointer in anything like the sense of a C pointer.
You'd need to reword that (as I know you've been doing with Fil-C) to be something more like: no reference to a (variable|allocation|object) may ever be used to access memory that is not a part of the object.
Pointers are not that, and the work you've done in Fil-C to make them closer to that makes them also be "not pointers" in a classic sense.
I'm OK with that, it just needs to be more clear.
Semantics.
You can call Fil-C’s pointers whatever you like. You can call them capabilities if that works better for you.
The point of my post is to enumerate the set of things you’d need to do to pointers to make them safe. If that then means we’ve created something that you wouldn’t call a pointer then like whatever
> Semantics.
If your goal is just to redefine the word then by all means, continue.
But semantics are very important if your goal is to drive adoption of your ideas. You can't misuse a term and then get pissy when people don't understand you.
Seems like bro understood me just fine.
In Fil-C, pointers are called “pointers”.
And my point is that you cannot make C pointers safe. You can make something else that is safe, and you're clearly hard at work on that, which is great.
Fil-C's pointers work more like C pointers than like any other language construct I can think of, and are compatible enough that lots of C/C++ code compiles and runs with no changes.
So I think that Fil-C pointers are just pointers.
You could even get pedantic over what the spec says. If you go there, you find that Fil-C's pointers work exactly like how the spec promises pointers to work (and all of the places that the spec doesn't define either have safe semantics in Fil-C or lead to Fil-C safety errors).
What behavior in the C standard requires unsafety from pointers??
Not a big fan of this line of thinking since if you tried to deploy a C implementation that just implements what's in the C standard, then you'd quickly find that real world C code expects more of pointers than the C standard promises.
Fil-C supports a superset of the C standard but a subset of what contemporary mainstream C compilers support (you can't pass an integer around in memory that is really being used to represent a pointer in Fil-C, but you can in Yolo-C).
Does Fil-C support uintptr_t? Because I’ve written programs that I believe to be quite strictly conforming C99 programs, that make use of uintptr_t.
Yes you can use uintptr_t.
You just can’t cast a pointer to uintptr_t, then store that int into memory, then load it back, then cast it back to pointer, and then dereference that pointer.
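Spelled out as code, the pattern being described looks roughly like this. The function names are invented for the sketch, and `volatile` is only there to force a real store and load. With a mainstream compiler both functions return 42; going by the description above, Fil-C would accept the `void *` version and stop the integer version at the final dereference.

```c
#include <assert.h>
#include <stdint.h>

/* Pointer -> uintptr_t -> memory -> uintptr_t -> pointer -> dereference. */
int roundtrip_through_integer(void)
{
    int value = 42;
    volatile uintptr_t slot;              /* integer-typed memory */

    slot = (uintptr_t)(void *)&value;     /* cast, then store to memory */
    uintptr_t loaded = slot;              /* load the integer back */
    int *p = (int *)(void *)loaded;       /* cast back to a pointer */
    assert(p == &value);
    return *p;                            /* the dereference in question */
}

/* The same round trip with the slot typed as a pointer instead. */
int roundtrip_through_void_ptr(void)
{
    int value = 42;
    void *volatile slot;                  /* pointer-typed memory */

    slot = &value;
    int *p = slot;
    return *p;
}
```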
Okay, but the C standard does allow exactly that for void*.
You can do exactly that with `void*` in Fil-C.
I just don’t let integer types carry capabilities. If I did, things would get weird (like if you said `x + y` and both happened to carry capabilities then what would you get?)
I remember chasing down a memory leak in my first commercial C code. Took me a long while to discover that if you allocate zero bytes you still have to free it! After that I took nothing for granted.
It's not even guaranteed that it doesn't allocate, so a malloc(0) could cause an out of memory.
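A small sketch of how code can cope with both behaviours. The C standard leaves it implementation-defined whether `malloc(0)` returns a null pointer or a unique pointer, so a null result for a zero-size request cannot be told apart from failure, and a non-null result still has to be freed. The `alloc_checked` wrapper is a made-up helper, not a standard function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocates n bytes. *ok is false only when a non-zero request failed;
   a NULL result for n == 0 is not treated as an error.
   The caller must still free() whatever is returned, even for n == 0. */
void *alloc_checked(size_t n, bool *ok)
{
    assert(ok != NULL);
    void *p = malloc(n);
    *ok = (p != NULL) || (n == 0);
    return p;
}
```

Calling `free` on the result is always safe here, since `free(NULL)` is a no-op.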
> malloc(0) could cause an out of memory
tbh, 640K RAM ought to be enough for anybody.
For the last drop to make the cup run over it doesn't matter how big the cup is.
Nah. I use C a lot, but none of this is enough to make C safe. You really need the language and the tools to enforce discipline. Oh, and things like the cleanup attribute are not standard C either, so this is not portable code.
usually portability in C includes the provision that you can drop in whatever #includes you want?
No, it's really not that simple at all.
Probably depends on the macro, but ok.
It starts by discovering how little most folks know from ISO C legalese versus what their compiler does, and it goes from there when adding anything else not part of the standard library.
I don't think anyone ever doubted that a C program could be memory safe. The problem is knowing without exhaustive work whether yours is one of them.
These aren't bad practices, but I don't think they satisfy that desire either.
I am in no way at all better than that guy. Not even sort of. I appreciate his talk.
However, if I were to make a presentation based on my superior C practices, it would have to be implementation and example heavy.
All of his rules sound great, except for when you have to break them or you don’t know how to do the things he’s talking about in your code, because you need to get something done today.
It reads a little like “I’ve learned a lot of lessons over my career, you should learn my lessons. You’re welcome.”
The talk was obviously extremely time-limited, as demonstrated when they basically skipped the last handful of slides and then it abruptly ended. I think for the time allocated, it was just right, and they did include a couple of examples where it made sense.