Memory Safe Languages: Reducing Vulnerabilities in Modern Software Development [pdf]

media.defense.gov

98 points by todsacerdoti 3 days ago

Two big problems in this document:

- it conflates data race protection with memory safety, and it does so inconsistently. Java and C# are mentioned as MSLs and yet they totally let you race. More fundamentally, data races aren’t the thing that attackers exploit except when those data races do lead to actual memory corruption (like use after free, double free, out of bounds, access to allocator metadata etc). So it’s more precise to not mention data race freedom as a requirement for memory safety, both because otherwise languages like Java and C# don’t meet the definition despite being included in the list and because data races in the presence of memory safety are not a big deal from a security standpoint.

- The document fails to mention Fil-C. It would be understandable if it were mentioned with caveats (“new project”, “performance blah blah”) but not mentioning it at all is silly.

AnthonyMouse 2 days ago
> More fundamentally, data races aren’t the thing that attackers exploit except when those data races do lead to actual memory corruption (like use after free, double free, out of bounds, access to allocator metadata etc).
This is absolutely not true. One of the classic data races is when you do a set of operations like this non-atomically:
```
  new_total = account.balance;
  new_total -= purchase_price;
  account.balance = new_total;
```
Which is a huge security vulnerability because it lets people double spend. Alice buys something for $1000 and something for $1 and instead of debiting her account by $1001 it debits it by $1 because the write for the second transaction clobbers the balance reduction from the first one.
Another common one is symbolic links. You check the target of a symbolic link and then access it, but between the check and the access the link changed and now you're leaking secrets or overwriting privileged data.
Data races are serious vulnerabilities completely independent of memory safety.
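A minimal sketch of the lost update described above, in Python. The `Account` class and the helper names are hypothetical. `lost_update_demo` replays the bad interleaving by hand so the outcome is deterministic, and `locked_demo` shows one possible fix (a mutex around the whole read-modify-write), not the only one.

```python
import threading


class Account:
    """Hypothetical account, used only to illustrate the lost-update pattern."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def debit(self, price):
        # The whole read-modify-write runs under one lock, so no other
        # debit can slip in between the read and the write.
        with self._lock:
            new_total = self.balance
            new_total -= price
            self.balance = new_total


def lost_update_demo(start, first_price, second_price):
    """Replay the bad interleaving: both purchases read before either writes."""
    balance = start
    read_a = balance                  # purchase A reads the balance
    read_b = balance                  # purchase B reads the same stale balance
    balance = read_a - first_price    # A writes its result
    balance = read_b - second_price   # B clobbers A's write
    return balance


def locked_demo(start, prices):
    """Run one thread per purchase against the locked Account."""
    account = Account(start)
    threads = [threading.Thread(target=account.debit, args=(p,)) for p in prices]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance
```

With a starting balance of 5000 and purchases of 1000 and 1, `lost_update_demo(5000, 1000, 1)` ends at 4999, while `locked_demo(5000, [1000, 1])` ends at 3999.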
- tialaramex 2 days ago
  
  That filesystem example is a TOCTOU race, not a data race, and so it can happen in Rust or similar languages just the same. Although both TOCTOU races and data races are types of race condition, they are not the same thing. Many race conditions are an ordinary part of our lived experience - if you've ever thought "Oh, I need to buy more milk" and then went to a store but meanwhile your house mate, partner, colleague at work, or whatever were also out buying more milk, well, when you get back with milk there's too much milk, oops, that's a race condition, specifically a TOCTOU race - we checked the milk, then we purchased more milk, meanwhile someone else changed how much milk is there.
  Data races aren't like any real world experience. The way the machine actually works is too alien for us to get our heads around so we're provided with a grossly simplified "sequentially consistent" illusion when writing high level languages like C - in which things happen in some order. Data races are reality "bleeding through" if we don't follow the rules to preserve that illusion.
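  To make the TOCTOU shape concrete, here is a small Python sketch, assuming a POSIX system where `os.O_NOFOLLOW` is available. The function names are made up for illustration. The first function checks and then uses the path in two separate steps. The second asks the kernel to refuse a symlink as part of the open itself, which only covers the final path component.

```python
import os
import tempfile


def read_checked_then_opened(path):
    """Racy pattern: the link can be swapped between islink() and open()."""
    if os.path.islink(path):          # time of check
        raise PermissionError("refusing to follow a symlink")
    with open(path, "rb") as f:       # time of use: the path is resolved again
        return f.read()


def read_no_follow(path):
    """Check and open in one system call; a symlink makes os.open raise OSError."""
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "rb") as f:
        return f.read()


def demo():
    """Return (contents of a regular file, whether the symlink was refused)."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "data.txt")
        link = os.path.join(d, "link")
        with open(target, "wb") as f:
            f.write(b"hello")
        os.symlink(target, link)
        plain = read_no_follow(target)
        try:
            read_no_follow(link)
            refused = False
        except OSError:
            refused = True
        return plain, refused
```

  Nothing in the type system of a memory safe language distinguishes the two functions, which is the sense in which this kind of race is independent of the language.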
  
  stirfish 2 days ago
  
  >TOCTOU
  Time of check to time of use
  https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use
  I didn't know this, thank you
- pizlonator 2 days ago
  
  I’ve fixed so many burning hot potato security bugs in my life and your data race example has never come up, not even once.
  Logic errors not preventable by any language or type system (like making sure you enforce policy in a setuid process) are far more likely than that.
  
  AnthonyMouse a day ago
  
  It depends what domain you're operating in. If you're only ever patching CVEs it shows up less, because if you've implemented that data race where the first and last lines are separate SQL queries, you're not going to get a CVE against the database software for it. Then it only even gets discovered if someone actually starts exploiting it or the race is so severe that it happens under normal use and then at the end of the quarter the accountants start asking why the numbers don't add up.
  
  pizlonator a day ago
  
  No memory safe language will protect you from getting your SQL queries wrong.
  Races in a database are not “data races” in the programming language sense, unless we’re debating what query language to use.
  
  AnthonyMouse a day ago
  
  > No memory safe language will protect you from getting your SQL queries wrong.
  That's kind of the point.
  > Races in a database are not “data races” in the programming language sense
  Only in the sense that in a sufficiently large bureaucracy you get to power up the Somebody Else's Problem Field and blame the DBA for it. But that doesn't get you the missing money back.
  
  pizlonator a day ago
  
  This whole conversation is about memory safety of programming languages, not security vulns that no language can prevent
  
  AnthonyMouse a day ago
  
  Which is, again, the point. People have a new hammer but not everything is a nail.
  And lots of programming languages provide tools to address data races. SQL has transactions, several languages have compare-and-swap primitives, etc.
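  As a hedged illustration of the SQL side, here is a sketch using Python's standard `sqlite3` module with an in-memory database. The `accounts` table and the helper names are assumptions made up for this example. Instead of issuing the read and the write as separate queries, the debit is a single conditional `UPDATE` inside a transaction, so the database performs the read-modify-write itself.

```python
import sqlite3


def open_db(start_balance):
    """Create a throwaway in-memory database with one account (id 1)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, ?)", (start_balance,))
    conn.commit()
    return conn


def debit(conn, account_id, price):
    """Debit in one statement; returns True if a row was actually updated."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (price, account_id, price),
        )
        return cur.rowcount == 1


def get_balance(conn, account_id):
    row = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]
```

  The `balance >= ?` guard also rejects a debit that would overdraw the account, which a separate SELECT followed by an UPDATE cannot guarantee on its own.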
tialaramex 2 days ago

> Java and C# are mentioned as MSLs and yet they totally let you race.
In Java a data race means loss of sequential consistency. Humans generally don't understand programs which lack sequential consistency so a typical Java team probably can't debug the program, but the program still always has well defined behaviour - and chances are you don't want to debug the weird non-sequentially consistent behaviour anyway, you just want them to fix the data race.
In C# data races are not too dangerous for trivial objects which are valid for all bit patterns. If you race an integer k, well, now k is smashed, don't think too hard about the value of k, it does have some value but it won't go well for you to try to reason about the value. For a complex object like a hash table, it's Undefined Behaviour.
Meanwhile in C or C++ all data races are immediate UB, you lose, game over.
- tialaramex 2 days ago
  
  Ugh, I guess my brain mis-fired when I wrote this originally. Sorry about that. The above C# comments (about trivial versus complex objects) actually refer to Go, a quite different garbage collected language. For C# the exact situation is not well documented, and so it's probably best not to rely on anything that you can't get a solid guarantee for, but in principle it's similar to Java.
- pizlonator 2 days ago
  
  In Fil-C, data races have Java-like behavior.
  The reason why YOLO-C has the bad data race behavior is because (1) data races might lead to memory safety violations and (2) the compiler is permitted to play more fast and loose than strictly necessary. Fil-C fixes (1) by making the language memory safe. Fil-C fixes (2) by just having different policies in the compiler.
burakemir 2 days ago

A definition of memory safety without data race freedom may be more precise but arguably less complete.
It is correct that data races in a garbage collected language are difficult to turn into exploits.
The problem is that data races in C and C++ do in fact get combined with other memory safety bugs into exploits.
A definition from first principles is still missing, but imagine it takes the form of "all memory access is free from UB". Then whether the pointer is in-bounds, or whether no thread is concurrently mutating the location seem to be quite similar constraints.
Rust does give ways to control concurrency, eg via expressing exclusive access through &mut reference. So there is also precedent that the same mechanisms can be used to ensure validity of reference (not dangling) as well as absence of concurrent access.
- pizlonator 2 days ago
  
  > The problem is that data races in C and C++ do in fact get combined with other memory safety bugs into exploits.
  Because C and C++ are not memory safe.
  > A definition from first principles is still missing, but imagine it takes the form of "all memory access is free from UB". Then whether the pointer is in-bounds, or whether no thread is concurrently mutating the location seem to be quite similar constraints.
  I think it's useful to work backwards from the languages that security folks say are "memory safe", since what they're really saying is, "I cannot use the attacks I'm familiar with against programs written in these languages".
  Based on that, saying "no UB" isn't enough, and only looking at memory accesses isn't enough.
  WebAssembly has no UB, but pointers are defined to just be integers (i.e. the UB-free structured assembly semantics of a C programmer's dreams). So, attackers can do OOB and UAF data attacks within the wasm memory. The only thing attackers cannot do is control the instruction pointer or escape the wasm memory (unless the wasm embedder has a weak sandbox policy, in which case they can do both). Overall, I think that memory-safety-in-the-sense-of-wasm isn't really memory safety at all. It's too exploitable.
  To be memory safe like the "MSLs" that security folks speak of, you also need to consider stuff like function calls. Depending on the language, you might have to look at other stuff, too.
  I think that what security folks consider "memory safe" is the combination of these things:
  1) Absence of UB. Every outcome is well defined.
  2) Pointers (or whatever pointer-like construct your language has) can only be used to access whatever allocation they originated from (i.e. pointers carry capabilities).
  And it's important that these get "strongly" combined; i.e. there is no operation in the language that could be used to break a pointer's capability enforcement.
  Java and Fil-C both have a strong combination of (1) and (2).
  But, long story short, it's true that a definition of memory safety from first principles is missing in the sense that the field hasn't settled on a consensus for what the definition should be. It's controversial because you could argue that under my definition, Rust isn't memory safe (you can get to UB in Rust). And, you could argue that wasm meets (2) because "the allocation" is just "all of memory". I'm not even sure I like my own definition. At some point you have to say more words about what an allocation is.
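  Property (2) can be sketched as a toy model in Python. This is only an illustration of "pointers carry capabilities" under invented names (`Allocation`, `CapPtr`, `free`, `try_load`); it is not how Fil-C, Java, or CHERI implement anything.

```python
class Allocation:
    """A toy heap object: some bytes plus a 'freed' flag."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.freed = False


class CapPtr:
    """A toy pointer that carries its allocation (its capability) and an offset."""

    def __init__(self, alloc, offset=0):
        self.alloc = alloc
        self.offset = offset

    def add(self, delta):
        # Arithmetic moves the offset but can never change the allocation.
        return CapPtr(self.alloc, self.offset + delta)

    def _check(self):
        # Every access is validated against the allocation the pointer came from.
        if self.alloc.freed:
            raise MemoryError("use after free")
        if not 0 <= self.offset < len(self.alloc.data):
            raise MemoryError("out of bounds")

    def load(self):
        self._check()
        return self.alloc.data[self.offset]

    def store(self, value):
        self._check()
        self.alloc.data[self.offset] = value


def free(ptr):
    """Mark the pointer's allocation as freed; later accesses are rejected."""
    ptr.alloc.freed = True


def try_load(ptr):
    """Return the loaded byte, or the error message if the access is rejected."""
    try:
        return ptr.load()
    except MemoryError as e:
        return str(e)
```

  In this model there is no operation that turns a `CapPtr` for one allocation into access to another, which is the "strong" combination mentioned above. A model where the pointer is a bare integer into one shared `bytearray` would let `add` walk into a neighbouring object.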
pornel 2 days ago

They're not going to mention a single-person experimental project that has 900 stars on GitHub.
This is meant to be a practical strategy that can be implemented nation-wide, without turning into another https://xkcd.com/2347
- pizlonator 2 days ago
  
  > They're not going to mention a single-person experimental project that has 900 stars on GitHub.
  Seems like a bad way to pick technology.
  They do mention things like TRACTOR. Fil-C is far ahead of any project under the TRACTOR umbrella.
  > This is meant to be a practical strategy that can be implemented nation-wide, without turning into another https://xkcd.com/2347
  The solution to that is funding the thing that is essential, rather than complaining that an essential thing is unfunded. DOD could do that
  
  pornel 2 days ago
  
  > Seems like a bad way to pick technology.
  This is a very sensible way to pick a technology for a government.
  Having a cool proof of concept, with a bus factor of 1, and having a solution that countless government agencies can depend on for multi-million-dollar decades-long software projects are very different things.
  They can't just depend out of the blue on you personally maintaining "Fil's Unbelievable Garbage Collector" for the lifetime of the government's projects. Maybe you believe they could, but it takes way more legwork to give such assurance to a government.
  They list TRACTOR under projects they've already funded (and crucially, not among solutions they recommend yet). Apply for funding for Fil-C, and if it gets accepted, it'll probably get listed there too.
  The TRACTOR approach also has higher tolerance to being an experimental project, because it's one-time conversion of C to Rust. It only needs to work once, not continuously for decades. The Rust-lang org is set up to offer serious long-term support, and is way past having a critical dependency on a single developer.
  
  pizlonator 2 days ago
  
  > This is a very sensible way to pick a technology for a government.
  No, it's not, for the simple reason that the government has more than adequate resources to recreate a Fil-C-like with a team, or even just add people power to Fil-C.
  The fact that it only took one dude working in his spare time 1.5 years to make C memory safe suggests that the whole narrative of the OP is wrong. The problem isn't that people aren't using memory safe languages. The problem is that nobody is funding just making C memory safe.
  
  pornel a day ago
  
  You've made a language that is at least 20-50% slower than C and has a garbage collector.
  That's not what people use C for. You're presenting it as a memory-safe C, but you've got a more fine-grained ASAN. That's useful, but it's not blowing away the whole narrative.
  For running unfixable legacy C code there are already lower-overhead solutions. They're not as precise, but that's either not necessary for safety (e.g. where there's a right sandbox boundary), or the performance is so critical that people accept incomplete hardening despite the risks.
  For new development, where a slower GC language is suitable, there are plenty of languages to choose from that are more convenient and less crash-prone.
  There's already CHERI that takes a similar approach to pointer tagging, but they're doing it in hardware, because they know that software emulation makes the solution unappealing.
  
  pizlonator a day ago
  
  > You've made a language that is at least 20-50% slower than C and has a garbage collector. That's not what people use C for.
  Says who?
  Most software written in C is not perf sensitive. My shell could be 4x slower and I wouldn’t care.
  That’s also true for most of the GUI stuff I use, including the browser.
  > you've got a more fine-grained ASAN.
  The difference between Fil-C and asan is that Fil-C is memory safe while asan isn’t.
  This has nothing to do with “fine grained”.
  > it's not blowing away the whole narrative.
  The narrative is that C is not a memory safe language. That narrative is false.
  If the narrative was, “C is only memory safe if you’re willing to pay perf cost” then like whatever. But that’s not what folks are saying
  > For running unfixable legacy C code there are already lower-overhead solutions. They're not as precise, but that's either not necessary for safety (e.g. where there's a right sandbox boundary), or the performance is so critical that people accept incomplete hardening despite the risks.
  No there aren’t. Fil-C is the only memory safe solution for C code.
  Hwasan, mte, etc aren’t memory safe. Asan isn’t memory safe (and probably also isn’t cheaper). Don’t know what else you’re thinking of.
  > There's already CHERI that takes a similar approach to pointer tagging
  Neither Cheri nor Fil-C use pointer tagging. Both use pointer capabilities. Fil-C’s capabilities are safer (they actually protect use after free).
  Fil-C is faster than Cheri because I can run Fil-C on fast commodity hardware. Fil-C in my x86 box is orders of magnitude faster than the fastest Cheri machine ever
  
  safercplusplus 2 days ago
  
  Preach it brother! :)
  Hmm, I take it that the situation is that there are a number of vendors/providers/distros/repos who could be distributing your memory-safe builds, but are currently still distributing unsafe builds?
  I wonder if an organization like the Tor project [1] would be more motivated to "officially" distribute a Fil-C build, being that security is the whole point of their product. (I'm talking just their "onion router" [2], not (necessarily) the whole browser.)
  I could imagine that once some organizations start officially shipping Fil-C builds, adoption might accelerate.
  Also, have you talked to the Ladybird browser people? They seemed to be taking an interest in Fil-C.
  [1] https://www.torproject.org/
  [2] https://gitlab.torproject.org/tpo/core/tor
  
  pornel a day ago
  
  Tor wants to move to Rust, and they aren't happy with their C codebase. They want to expand use of multi-threading, and C has been too fragile for that.
  https://blog.torproject.org/announcing-arti/
  
  safercplusplus a day ago
  
  Makes sense. But maybe the fact that that post is 4 years old serves to bolster the argument for Fil-C's value proposition. However much people may want to move away from their C code bases, the resources it takes to do so in a timely manner are often not so readily available.
SkiFire13 2 days ago

> because data races in the presence of memory safety are not a big deal from a security standpoint.
Note though that data races can make otherwise memory-safe programs not actually memory safe. See for example Go
- pizlonator 2 days ago
  
  What’s the problem with Go?
  
  aw1621107 2 days ago
  
  IIRC slices aren't updated atomically in Go so you can get data races on their components, resulting in potential UB.
  For what it's worth, from what I've read on here (e.g., [0]) this has yet to be exploited in a non-demonstration setting, but who knows if/when the first such exploit will appear.
  [0]: https://news.ycombinator.com/item?id=42043939
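  The general shape of a torn multi-word update can be replayed by hand in Python. This is a toy model with a made-up `SliceHeader` and a flat `bytearray` standing in for memory; it is not Go's runtime or its actual slice layout, only an illustration of why a header whose words are written separately can defeat a bounds check.

```python
class SliceHeader:
    """Toy multi-word slice header: base and length are separate fields."""

    def __init__(self, base, length):
        self.base = base
        self.length = length


def torn_read_demo():
    """Replay one bad interleaving; returns (bounds check passed, byte read)."""
    memory = bytearray(24)
    memory[16:24] = b"SECRET!!"              # unrelated data after the small buffer
    shared = SliceHeader(base=0, length=12)  # big buffer occupies [0, 12)

    # Writer starts replacing the header with a small buffer at [12, 16) ...
    shared.base = 12                         # first word written

    # ... and the reader runs here, before the length is updated.
    index = 6
    in_bounds = index < shared.length        # passes against the stale length 12
    leaked = memory[shared.base + index]     # 12 + 6 = 18, outside [12, 16)

    shared.length = 4                        # writer finishes too late
    return in_bounds, leaked
```

  The reader's bounds check succeeds, yet the byte it reads belongs to the neighbouring data, because it combined the new base with the old length.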
  
  pizlonator 2 days ago
  
  I think that just means that Go isn't completely memory safe.
  Or, it means that "Go is only memory safe provided you have no data races".
  My point is that it's weird to say that there is a notion of memory safety that is separate from the memory safety you get if you also have a story for data races. It leads to exactly the confusion in the OP: it's not clear if they're saying that memory safety subsumes data race freedom in the sense that you're memory-safe even in the presence of races (like Java or Fil-C), or that it means that memory safety subsumes data race freedom in the sense that Rust's type system handles both memory safety and data races using the same basic technique.
  
  aw1621107 a day ago
  
  Right, I get what you're saying and think you have a point. Just wanted to expand on the (perceived?) issue with Go and data races.
jart 2 days ago

[flagged]
- Ygg2 2 days ago
  
  > Memory safety is like the global warming of the software industry.
  So it's an insidious long term issue that challenges our systems which reward short term thinking, and will slowly crush us, if we don't do anything about it?
  I fully agree.
- jdright 2 days ago
  
  jart putting Carmack and Musk at the same level is a bit sad and revealing, no wonder the downvotes.
  
  jart 2 days ago
  
  https://x.com/ID_AA_Carmack/status/1935353905149341968

charcircuit 2 days ago

A big thing missing is swapping out dependencies in unsafe languages for ones written in safe languages.

Usually there are only a couple places that actually deal with user controlled data, so switching to safe dependencies for things like making thumbnails for pdf files can be effective.

Edit: One more thing that was not mentioned is compiling unsafe code to WebAssembly or sandboxing it in some other way.

ethan_smith 2 days ago

Incremental replacement of critical dependencies also offers a practical migration path for large legacy codebases where complete rewrites are economically infeasible.

flohofwoe 2 days ago

Ah, so it looks like the Rust mole in the government survived the DOGE purges, I was wondering what happened to him ;)

tialaramex 2 days ago

It's not that there's some special interest group pushing this particular technology, it's just the best available solution so if you want to solve the problem this is how you do it.
hitekker a day ago

IIRC, Alex Gaynor was championing Memory Safety & Rust in the FTC. But then he left (or was ejected?) before the Trump Administration came in.
I have no clue who is carrying the torch now. Someone probably, given that OP's document references Rust 16 times and Java just 4. But we'll have to see how CISA shakes out after its funding gets cut due to alleged mission creep

Animats 2 days ago

> Out of the 58 in-the-wild zero-days discovered in 2021, 67% were memory safety vulnerabilities.

About where that number was twenty years previous.

The big difference is that twenty years ago, the enemy was script kiddies. Now it's competent teams funded by multiple nation-states.

notepad0x90 2 days ago

It is also worth mentioning that not all memory safety vulns are exploitable or have a theoretical exploitation vector. Many these days are similar to theoretical crypto vulns in that "some day" the capability might be developed. It isn't just exploit mitigations but secure development practices that make it hard enough to where even theoretical exploitation isn't viable.

andreidd 2 days ago

Why is Delphi/Object Pascal an MSL?

cryptonector 16 hours ago

Probably because it has counted byte strings and does bounds checks?
Ygg2 2 days ago

Why shouldn't it be?
- andreidd 2 days ago
  
  What's stopping you from UAF or OOB array access in Delphi?
  
  sirwhinesalot 2 days ago
  
  Delphi arrays are bounds checked. UAF is mitigated by having ref counted strings and such. Not fully safe but much safer than C and C++.
  
  Ygg2 2 days ago
  
  Usually a runtime or compile time check. At least for OOB in Ada.
  https://www.jdoodle.com/ia/1IgW

larodi 2 days ago

So, Perl with its tainted-data tracking mechanism is not considered safe. Weird.

johnisgood 2 days ago

Does it explicitly say so? I could not find "Perl" in the PDF. There are only examples of MSLs. Perl not making the example list does not mean it is not one. It is a non-exhaustive list.

awaymazdacx5 2 days ago

reducing security incidents for modern software developments

timewizard 2 days ago

> MSLs such as Ada, C#, Delphi/Object Pascal, Go, Java, Python, Ruby, Rust, and Swift offer built-in protections against memory safety issues

They offer default protections that can be easily overridden in most of those languages. Some of them require you to use those overrides to implement common data structures.

> MSLs can prevent entire classes of vulnerabilities, such as buffer overflows, dangling pointers, and numerous other Common Weakness Enumeration (CWE) vulnerabilities.

If used a certain way.

> Android team made a strategic decision to prioritize MSLs, specifically Rust and Java, for all new development

Was that /all/ they did?

> Invest initially in training, tools, and refactoring. This investment can usually be offset by long-term savings through reduced downtime, fewer vulnerabilities, and enhanced developer efficiency.

That is an exceedingly dubious claim to make in general.