Ask HN: Why is there not a memory-safe C?

43 points by reacharavindh 8 years ago

I'm going through a lot of "Why Rust?" reading material, and starting to wonder why is there no memory safe subset of C?

While going through Rust tutorials, I'm realizing how much more sophisticated Rust gets than C in terms of the language. So many syntactical things to remember.

Could there be a less powerful subset of C that we can use in situations where we're not building operating systems or distributed systems?

rayiner 8 years ago

A lot of Rust complexity exists to allow memory manipulation while ensuring safety without requiring garbage collection. Take something simple like taking the address of an object on the stack. In C, you get a pointer that becomes invalid when the function returns; subsequent access to that pointer will access garbage, or the local variables of another function activation. In Go, the local value is promoted to the heap; you can continue using the pointer once the function has returned, but you need a GC to clean it up. In Rust, you have memory safety and don't need a GC. But you get a pointer qualified to a region such that it can't escape the scope of the enclosing function. The whole machinery of regions (and corresponding syntactic complexity) arises out of the need to achieve memory safety without GC.
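
To make the C case concrete, here is a minimal sketch; the function names are invented for illustration and not taken from any library. The commented-out function is the dangling-pointer pattern described above. The live code shows one common C workaround, caller-owned storage, which the compiler does not enforce.

```c
#include <assert.h>
#include <stddef.h>

/* Unsafe: `local` dies when the function returns, so the caller would
 * receive a dangling pointer. Left commented out because using the
 * result is undefined behavior.
 *
 * int *make_counter_dangling(void) {
 *     int local = 0;
 *     return &local;
 * }
 */

/* Common C idiom (hypothetical name): the caller owns the storage and
 * passes its address in, so the pointer does not outlive the object
 * as long as the caller keeps it within its own scope. */
void init_counter(int *out, int start) {
    assert(out != NULL);
    *out = start;
}

/* Increments the caller-owned counter and returns the new value. */
int bump_counter(int *counter) {
    assert(counter != NULL);
    *counter += 1;
    return *counter;
}
```

A caller would write `int c; init_counter(&c, 41); bump_counter(&c);` and `c` stays valid for as long as the caller's own scope does. Nothing in C stops that caller from stashing `&c` somewhere that outlives it, which is the gap Rust's lifetimes close.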

gravypod 8 years ago

Why is C not memory safe? It's because you can create things and forget to destroy them, destroy things that are already destroyed, or trust an inaccurate array length.

If you never do anything of those your program is anecdotally memory safe. Turn all things into libraries (strings, basic data structures, etc) and verify those closely and your C is now what many people call "memory safe". Add valgrind on that and you're pretty safe.

Good patterns and good tooling can combat any issues you have with C if you need to use C. Otherwise just use one of the "!C" languages out there
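
As a rough sketch of what "turn all things into libraries" could look like, assume a hypothetical `checked_buf` type (not from any real library). Every write goes through one audited function that checks the capacity, and destroying twice is harmless because the pointer is cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "checked buffer": callers are expected to go through the
 * functions below rather than touch data/len/cap directly. */
typedef struct {
    unsigned char *data;
    size_t len;
    size_t cap;
} checked_buf;

/* Returns false if the allocation fails; the buffer is then left empty. */
bool buf_init(checked_buf *b, size_t cap) {
    assert(b != NULL);
    b->data = malloc(cap ? cap : 1);
    b->len = 0;
    b->cap = b->data ? cap : 0;
    return b->data != NULL;
}

/* Refuses to write past the capacity instead of overflowing. */
bool buf_append(checked_buf *b, const void *src, size_t n) {
    assert(b != NULL && (src != NULL || n == 0));
    if (n > b->cap - b->len) {
        return false;
    }
    if (n > 0) {
        memcpy(b->data + b->len, src, n);
    }
    b->len += n;
    return true;
}

/* Safe to call twice on the same struct: the pointer is cleared
 * after the first free, and free(NULL) is a no-op. */
void buf_destroy(checked_buf *b) {
    assert(b != NULL);
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

This only holds if callers never copy the struct or poke at its fields, which C cannot enforce; that is the "verify those closely" part.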

Other good things:

    - Every warning is an error until you can explain to me why that code can't be rewritten any other way

    - All code is compiled with -pedantic 

    - preprocessor hacks should be replaced with inline static functions 

    - const everywhere
  • gnode 8 years ago

    I don't think this really answers the question. While you can be careful with C, it's still inherently dangerous, as you're relying on nothing more than your own human scrutiny to not fuck up. A memory safe language should make this impossible.

  • dvfjsdhgfv 8 years ago

    > If you never do anything of those your program is anecdotally memory safe

    At the same time your C becomes a very special variant of C. When you use the constructs from the SEI CERT C Coding Standard, a lot of things become much more complicated than usual. You are more careful, but at the same time spend more time writing code. And even if you do all that, of course you can never be sure about third-party libraries.

    At some point I just gave up and switched to D, with no regrets.

    • gravypod 8 years ago

      Exactly. C isn't the right tool for every job. It is the right tool for some, but that doesn't mean you should be afraid to look for alternatives.

  • reacharavindh 8 years ago

    True that there is a safe way of writing C, if I followed x guidelines, and did Y checks. But, I cannot in good conscience trust all the libraries and tools I import have gone through the same diligence.

    Another thing is, even the basic constructs of the language (for simpler needs like the ones I described in the original question) are deemed unsafe, like string processing. Someone told me in passing that I should use Redis's SDS[1] if I want safe strings in C. Do not use char * and manipulate it directly. I admit that I haven't spent the time to understand why that is the case, nor have I validated whether it is mitigated magically by using SDS. The point is the need for doing such things for basic elements of a language: strings.

    What I wonder is why there isn't a language that is C, with safe defaults, easy-to-use syntax, as fast as C, and consequently less powerful. I was hoping Rust was it, until I started to feel scared just looking at the code. At the risk of prejudice, I feel it is more of a glorious C++ than a language I could use for simple general purpose programs (that are still as fast as C).

    [1] - https://github.com/antirez/sds

    EDIT : Grammar and reference

    • bigato 8 years ago

      > What I wonder is why there isn't a language that is C, with safe defaults, easy-to-use syntax, as fast as C

      Because part of the speed of C is due to the fact that it doesn't do much for you. The moment you want the language to guarantee your memory safety, you'll have to pay either via speed penalty or via added complexity. At least this has been the case with the languages I know. Maybe someday something else will emerge which proves me wrong.

    • gnode 8 years ago

      > I was hoping Rust was it, until I started to feel scared

      I recommend you stick with it. Your understanding of it will improve with practise.

      Memory safety isn't free -- there's either a language complexity (borrow checking), a runtime complexity (garbage collection), or a programming complexity (avoiding making mistakes). While there's no one correct option, there is in my opinion a wrong one. Human minds are fallible.

    • gravypod 8 years ago

      Funnily enough I was considering including sds as an example in my initial post.

      The misconception that people have is that C has data structures provided. C is a language of only primitives. There are pointers and there are different width numeric types and structures composed of those numeric/pointer types. Nothing else is provided by default. No strings, no lists, nothing.

      For many situations operating at this level of abstraction is OK. Many programs never need a string type. Unfortunately, in situations where strings were needed, the people at Bell Labs initially decided to use a pointer to a character and came up with a convention for that layout of memory.

      This means that everything must understand those conventions! This is a dangerous and leaky abstraction: the more times you implement something, the more times you can mess it up.

      SDS provides a single implementation for string manipulation by hiding a header just in front of the characters the string pointer refers to and type-aliasing the original char*. It is abstracted from your initial structure.

      If you do this with everything you write, using the type system to your advantage, then most of your issues with C will not be an issue.
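
      A minimal sketch of the header-before-the-pointer idea, assuming made-up names (`hstr`, `hstr_new`, `hstr_len`, `hstr_free`). This is not SDS's actual API or layout, only the general technique.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical header that lives immediately before the characters. */
      typedef struct {
          size_t len;
          char buf[];   /* flexible array member: the string bytes follow */
      } hstr_header;

      typedef char *hstr;  /* still usable wherever a char * is expected */

      /* Copies `init` into a fresh allocation and returns a pointer to the
       * characters, not to the header. Returns NULL on allocation failure. */
      hstr hstr_new(const char *init) {
          assert(init != NULL);
          size_t len = strlen(init);
          hstr_header *h = malloc(sizeof *h + len + 1);
          if (h == NULL) {
              return NULL;
          }
          h->len = len;
          memcpy(h->buf, init, len + 1);  /* includes the trailing '\0' */
          return h->buf;
      }

      /* Steps back from the characters to the header to read the length. */
      size_t hstr_len(const hstr s) {
          assert(s != NULL);
          const hstr_header *h =
              (const hstr_header *)(s - offsetof(hstr_header, buf));
          return h->len;
      }

      /* Frees the whole allocation, starting at the hidden header. */
      void hstr_free(hstr s) {
          if (s != NULL) {
              free(s - offsetof(hstr_header, buf));
          }
      }
      ```

      The length is stored once, at creation, so `hstr_len` does not need to scan for the terminator. The type alias is only a convention, though: passing an ordinary `char *` to `hstr_len` would still compile and read garbage.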

      I know the guy behind redis has many videos on his software. From what I've heard he is a modern day software engineering genius and I'm assuming he will have good examples of how to implement this non-leaky abstraction in his software walkthroughs.

    • ht85 8 years ago

      Rust forces you to formalize memory ownership, and for complicated usages it can become pretty scary, yes.

      Have you tried to figure out how the safe subset of C you're suggesting would look in Rust? If you're doing something simple, it shouldn't be that bad.

      Do you have a specific use-case in mind? I'm curious what your requirements are in terms of performance and simplicity, that languages like Go, Java or even Haskell do not meet. Is it a platform thing?

  • pjc50 8 years ago

    There's also the "embedded" style: no memory allocation. Throw away malloc() from your standard library and make all variables static globals. It looks odd and all sorts of things become unreasonably hard (such as networking and text processing), but it does guarantee you don't get heap-related bugs if you don't have a heap.

    • gravypod 8 years ago

      Back in the olden times game development studios used to do this. They would pre-estimate the maximum number of everything happening (explosions, AI, ammo, etc.) and pre-allocate everything. Then allocation and deallocation are just a FIFO of spaced pointers built on a block of memory pre-laid out in a structure in the binary.

      If you allocate 2x the memory you need then there's no risk of overflows. It's riskier but it makes all of the code dead simple, extremely fast, and is a good middleground between embedded and normal.

      You do get large binaries though...
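
      A rough sketch of this kind of pre-allocated pool, assuming a made-up `explosion` type and budget. It uses a simple last-in, first-out free list rather than a FIFO, and everything lives in static storage so no heap is involved.

      ```c
      #include <assert.h>
      #include <stddef.h>

      #define MAX_EXPLOSIONS 8   /* made-up budget for illustration */

      typedef struct explosion {
          float x, y, radius;
          struct explosion *next_free;  /* links unused slots together */
      } explosion;

      /* Static storage: zero-initialised, sized at build time, no heap. */
      static explosion pool[MAX_EXPLOSIONS];
      static explosion *free_list;
      static int pool_ready;

      /* Threads every slot onto the free list; runs once, lazily. */
      static void pool_setup(void) {
          free_list = NULL;
          for (size_t i = 0; i < MAX_EXPLOSIONS; i++) {
              pool[i].next_free = free_list;
              free_list = &pool[i];
          }
          pool_ready = 1;
      }

      /* Returns NULL when the budget is exhausted instead of overflowing. */
      explosion *explosion_alloc(void) {
          if (!pool_ready) {
              pool_setup();
          }
          explosion *e = free_list;
          if (e != NULL) {
              free_list = e->next_free;
              e->next_free = NULL;
          }
          return e;
      }

      /* The caller must pass a slot obtained from explosion_alloc, exactly
       * once; the assert is only a debug-time sanity check. */
      void explosion_release(explosion *e) {
          assert(e >= pool && e < pool + MAX_EXPLOSIONS);
          e->next_free = free_list;
          free_list = e;
      }
      ```

      Running out of slots becomes an ordinary NULL return the game can handle. Releasing the same slot twice or using it after release is still possible, so this removes heap bugs without making the code memory safe.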

      • pjc50 8 years ago

        Yes, and if it's zero-initialised it can take up zero space in the binary.

  • rco8786 8 years ago

    I’m not sure that “write Safe C code” really answers the question here

giancarlostoro 8 years ago

There's also Cyclone, which tried to be safer than C in some respects. It was used by a DoD Linux distro called AUXillery OS iirc. Mysteriously disappeared from the internet a few years back. They were making sure every bit of the kernel was safe. I think its goal was jet planes or something similarly mission critical. I kept looking back on Wikipedia every few years until both the page and the homepage disappeared.

Edit: Wow now I absolutely cannot find a darn thing about that distro. Used to be one last page detailing the project left. It was quite interesting, wish they hadn't shut down the homepage at least.

Edit 2: Looks like it was called AuroraUX here's a bit more info on it:

https://www.openhub.net/p/AuroraUX https://www.phoronix.com/scan.php?page=news_item&px=MTIyMTI

Even found a blog post from someone who was apparently developing AuroraUX:

http://ultravioletos.blogspot.com/2009/09/opensolaris-distro...

Gotta love mysterious little Operating Systems. Looks like they were being maintained by Blastwave which was some sort of OpenSolaris focused company.

Found a Spanish wikipedia entry still alive which mentions support for Cyclone:

https://es.wikipedia.org/wiki/AuroraUX

(Google Translate if you don't know Spanish :)

  • Yoric 8 years ago

    If my memory serves, Cyclone had a type system pretty close to the core of Rust.

    • giancarlostoro 8 years ago

      You may be right about this:

      https://cyclone.thelanguage.org/

      Looks like Rust took some inspiration from Cyclone.

      • steveklabnik 8 years ago

        It took more in the earlier days; it's not very similar now. Or rather, the goals are similar, but the ways they're achieved are very different.

  • anatoly 8 years ago

    Interesting. I looked around - could the name have been AuroraUX? A page about it survives in Russian wikipedia (deleted in English back in 2012).

    It's said to have been based on DragonFly BSD, with new components written in Ada (!), but also supporting Cyclone.

    • giancarlostoro 8 years ago

      Oooh I think that might actually be it! I think they went from Linux to BSD eventually... (Totally forgot about that change) I just remember 'AUX' being part of the name, maybe they changed it to AuroraUX after some time. Nice catch thanks! :D

gnode 8 years ago

I'd say that this is Rust. Ensuring at compile time that references are handled in a memory safe manner (borrow checking) requires a certain amount of complexity. There are of course plenty of memory safe languages more syntactically and semantically similar to C which just use a garbage collector. D and Go for instance.

dave84 8 years ago

I think unfortunately that the way C handles memory is so ingrained in the language and the standard library that to have a safe version of C would be a different enough language to have to call it a different name.

There is Safe-C[0], but it's not directly compilable as C and vice versa, and there are libraries like Cello[1] which provide higher level mechanisms like garbage collection to C.

[0] http://www.safe-c.org/ [1] http://libcello.org/

  • reacharavindh 8 years ago

    Thanks for the tip. I tried to check safe-c. But, it only had a compiler for Windows. Its documentation has screenshots of setting it up on Windows but no mention of Linux? Is it just Windows only?

    • dave84 8 years ago

      Yes, unfortunately. It’s more of an academic exercise than a production compiler.

bigato 8 years ago

Yeah, there is. It is called Go.

Well, while my answer is not meant to be serious, Go started as "C with some improvements", and the three-person team who first wrote it had among them Ken Thompson, one of the first users of C. Because it is garbage collected, it mostly abstracts away the memory allocation from the programmer. There's a penalty in performance, though. It's a trade-off that allows the programmer to not think much about it. Go favors simplicity over speed while Rust tries to have speed as a main goal, and sometimes the tradeoff is more complexity.

  • isaachier 8 years ago

    Go will never be nearly as ubiquitous as C. Its plug-ins API is terrible. Rust users definitely don't want a GC language to replace C.

    • bigato 8 years ago

      I didn't say it will, nor did OP say that he wanted the alternative to be as ubiquitous as C.

      When you refer to Go plug-ins API, do you mean this[1]? Because I wrote a fair amount of Go and wasn't even aware of this package. And while I don't know whether it is terrible or not, I'm pretty sure this package in particular is not relevant for most Go applications. So maybe you meant something else?

      I didn't say Rust users want a GC language to replace C, I don't know where this is coming from.

      [1] https://golang.org/pkg/plugin/

      • isaachier 8 years ago

        Suggesting another language implies that language is essentially just a variant of C and can be used in any situation where C can. The OP was asking why there is no (mainstream?) variant of C with built-in memory safety. Suggesting Go implies Go is a direct replacement for C in most cases, which it is not.

        Regarding plugins, yes that is the package I am referring to. The idea of dynamic loading is completely ignored in the Go programming language but it is used in many, many other languages (e.g. Java, Python, Lua). Go's support for this feature is lacking and should be fixed for it to interact with other languages.

  • shp0ngle 8 years ago

    Go is not "memory safe C".

    Go has garbage collector and that is not zero cost.

    Rust has zero cost abstractions, so Rust is sort-of "memory safe C".

    However, Rust is notoriously hard to learn, read and write in. So that sort of shows you that designing a low-level, zero-cost language is HARD.

    • bigato 8 years ago

      The OP didn't ask for zero cost abstractions. Furthermore, maybe "zero-cost" is not the best way to describe Rust abstraction; they may be zero cost for the machine at runtime, in exchange for a bigger mental burden on the programmer, and are thus better labeled as a trade-off: more speed in exchange for more complexity. Whether this trade-off is worth it will depend on factors like the use case and the programmers involved. And the OP was actually complaining about the added complexity, hence my answer being Go.

      • steveklabnik 8 years ago

        > Furthermore, maybe "zero-cost" is not the best way to describe Rust abstraction

        "Zero-cost" has always been about runtime cost. Other costs are not part of the slogan. Everything always has a cost!

        • majewsky 8 years ago

          And notably, Rust didn't start the "zero-cost abstraction" meme, C++ did.

      • RaleyField 8 years ago

        > bigger mental burden on the programmer

        Only sometimes. For complex enough problems, I think the mental burden of writing and maintaining software in a less complex language will outweigh the intrinsic burden that comes with a more complex language.

    • vthriller 8 years ago

      > Rust is notoriously hard to … read

      I've seen this argument a number of times before, but always discarded it as something that only people not familiar with post-1.0 syntax could bring (cause all those ~@[Omg]s really were hard to read). Do people really still struggle with its syntax, and if so, why?

      Coincidentally I personally really seem to struggle with reading sources in Go, although I'm yet to figure out why exactly it is the case and why I'm not as slow when reading sources in e.g. Perl.

      • iopq 8 years ago

        Really? Because I find

            let x = ~"blah";
        

        to be more readable than

            let x = String::from("blah");
    • rurban 8 years ago

      Rust zealots: please read your documentation about unsafe memory. And while you are there also about deadlocks.

      The best safe C derivative is D, but there are also some minor offshoots, like safe-C or mscc or the popular cfi option, the Intel bounds-checking library or safeclib. Most C compilers let you track the buffer bounds nowadays, but hardly anyone uses it.

  • dvfjsdhgfv 8 years ago

    > It is called Go.

    I'd argue it's called D. Go and C are two quite different languages. Whereas if you've been coding in C you can easily switch to D and benefit from safer constructs instantly, almost without feeling a difference.

    • bigato 8 years ago

      I don't know enough about D to know whether this is true, but you definitely can switch easily from C to Go.

mannykannot 8 years ago

If you started on the project of making such a language, you would have to make choices about how to adapt dynamic memory allocation, pointer assignment and arithmetic, the various ways in which functions can return pointers to out-of-scope variables, the way C decays arrays to pointers when passing them to functions, and the way C accommodates separate compilation. Most of these issues touch the core paradigms of C, and if you were to build a decision tree of all the options, you would find that none of the leaves would be a subset of C. You would probably also find that each leaf is already well-served by an existing language.
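
To illustrate just the array-decay point, here is a small sketch assuming a hypothetical `int_slice` type that keeps the length next to the pointer. It is not a subset of C in any enforced sense; a caller can still build a slice with the wrong length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Only valid where `a` is still a real array, not a decayed pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical "slice": pointer and length travel together. */
typedef struct {
    const int *ptr;
    size_t len;
} int_slice;

/* Bounds-checked read: reports failure instead of reading out of range. */
bool slice_get(int_slice s, size_t i, int *out) {
    assert(out != NULL);
    if (i >= s.len) {
        return false;
    }
    *out = s.ptr[i];
    return true;
}

/* A plain `const int *` parameter would have lost the length;
 * the slice keeps it, so the loop bound comes from the data itself. */
long slice_sum(int_slice s) {
    long total = 0;
    for (size_t i = 0; i < s.len; i++) {
        total += s.ptr[i];
    }
    return total;
}
```

With `int values[] = {1, 2, 3};`, a caller would build `int_slice s = { values, ARRAY_LEN(values) };` at the point where `values` is still an array. Once it has been passed as a bare pointer, the length is gone and the macro silently computes the wrong thing.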

vowelless 8 years ago

Rust should be compared to c++, not C.

Your question still stands. However, it's not too hard to be "memory safe" in modern c++.

  • mannykannot 8 years ago

    ...by avoiding the parts that are the legacy of C.

  • RaleyField 8 years ago

    > it's not too hard to be "memory safe" in modern c++.

    I'm not entirely convinced that is the case, but surely you will agree that it's much easier to audit Rust for memory-safety than C++.

  • sudeepj 8 years ago

    I agree that the safety aspect of C++ has improved in its modern versions. But these modern features are best practices, as documented in the C++ Core Guidelines. The issue with best practices is that they cannot be enforced. That is, let's say everybody on my team follows the latest guidelines. We even have the latest static analysis tools. However, the same cannot be said about the 3rd party libraries that we will use. In this regard, Rust is much better since every line of code is subjected to the same strict rules.

    Of course, one can argue that 3rd party Rust libraries can use a lot of "unsafe" code which we as consumers may not know about. But the situation is still better than C++.

  • Jweb_Guru 8 years ago

    Until people routinely use a RefCell equivalent in C++, abandon APIs like string_view, stop mutating directly through shared references, and stop using multiple threads (or C++ somehow finds a way to encode thread safety into its type system), most C++ programs that are memory safe are only so by coincidence. The whole point of Rust is to make memory safety a locally provable property, and even modern C++ has far too many idioms that abandon that local provability to make global memory safety feasible in large programs.

tacostakohashi 8 years ago

Yes, there is a less powerful subset of C that is memory safe.

You can use local variables, but no pointer types - no variables with pointer types, no functions that return a pointer type, and don't use the * or & operations which dereference or create a pointer.

Initialize everything - there are compiler warnings to help with that.

Admittedly, this subset of C isn't particularly useful. No dynamic memory, no strings or file i/o since those functions from the standard library take pointers. You could definitely do some arithmetic and return the value from main() in a completely memory-safe manner using this subset, though.

isaachier 8 years ago

There have been a few attempts to make C safer. However, one of the most popular aspects of C is the fact that it works everywhere. That means vendors are expected to have a working compiler for their platform. The easiest way to do that is to keep C features minimal. C99 was controversial because it asked these vendors to comply with changes in the language. C11 actually reduced many of the requirements in C99, making them optional. Sometimes this can be frustrating for programmers. Luckily clang-tidy and various sanitizers or valgrind can catch many issues you will encounter writing C.

pjc50 8 years ago

It would have to be so radically different that it wouldn't resemble C. All sorts of standard C idioms are inherently unsafe, like zero-terminated strings and pointer arithmetic.

ptero 8 years ago

You need to be more specific about what you are looking for: user-choice safety (i.e., a convenient way or a library for users to do most of the things that C does in a memory safe fashion, which is not enforced by the language / compiler), enforced safety or something else. Depending on this choice the arguments and benefits for it might be different.

That said, IME new C projects (not maintaining old monsters, for which a new language or library will not help) are mainly small components or libraries where you need to be close to hardware. In those cases, it might be better to reap benefits of being simple, clean and close to problem domain (hardware) and mitigate dangers by keeping the C code small, modular and clean.

Not criticizing, just asking an honest counter-question: what for (use cases)?

pornel 8 years ago

The main value of C is being the most compatible, well-known lowest common denominator. As soon as you add anything to it, it's no longer the C language, and you lose the benefit of universal compatibility and acceptance. This creates a network effect in the C ecosystem that is so strong that C99 is still "new", and C11 is an experimental curiosity.

Anyone trying to improve C has to face the dilemma that being "C but better" doesn't seem to offer enough for C users to switch (you get all the headache of a "new" language for one or two features), and languages adding enough improvements to justify the cost of switching have to evolve to the point of no longer looking like C.

andrewflnr 8 years ago

There are quite a few as research projects. They've just never hit the mainstream. Or gotten out of the lab at all.

If I can nitpick your last sentence for a second, that's the wrong way to think about it. The parts of C that are dangerous aren't the "powerful" parts; in particular, whether you can write a distributed system is beside the point. The dangerous parts of C include basics like arrays that you need for every program. Fixing them won't affect your ability to do almost anything.

trevex 8 years ago

There is also a language called zig[1], which popped up on hckrnews a few times in the past. I didn't have the chance to try it out, but am following the blog.

From what I can gather, it supports compile time code execution and reflection, so no preprocessor necessary. Furthermore it has a lot of safety features and optional checks, a nullable type instead of null value and manual memory management but also provides a `defer` statement.

[1] https://ziglang.org

  • steveklabnik 8 years ago

    Memory safety is an explicit non-goal of Zig, or at least it was previously. It tries to help, but makes no guarantees.

RaleyField 8 years ago

> Could there be a less powerful subset of C

You probably meant a minimal superset of C (as C doesn't have a nontrivial or similarly performant memory-safe subset). And probably, although its niche would be smaller. Either you have complexity in the language or in the code. Systems software is more often larger and more complex, so it less often makes sense to choose a language that's easier to learn but then harder to write in.

eb0la 8 years ago

That reminds me of Ada's selling points about 20 years ago: type safety, concurrency, and easy-to-read syntax.

Of course Ada didn't make it outside military contractors.

Does anyone know why?

  • pjmlp 8 years ago

    Compiler prices mostly.

    You could get a compiler for C, Pascal, Basic, Forth, ... for a few hundred, while Ada compilers were priced in thousands.

    Then when vendors started selling SDKs for their OSes, it was a hard sell to get management to pay for the SDK and then an additional compiler, instead of using the languages provided on the OS SDK.

YorkshireSeason 8 years ago

The question and all answers so far assume that memory safety is a clear and well-understood concept!

Au contraire!

Can I bring [1] to your attention? The paper argues that our intuition about memory safety means allowing for local reasoning about state.

[1] https://arxiv.org/abs/1705.07354

cpr 8 years ago

C was always intended as a higher-level assembly, so by definition, it can't be memory-safe.

bleke 8 years ago

C would need to be revised, which is not as cool(tm) a project as creating a new language, and rethinking how to avoid any pointer arithmetic (from char *str to arrays like char str[999]) is not a very easy task.

jmg-prog 8 years ago

https://en.wikipedia.org/wiki/MISRA_C is what you are looking for

  • airbreather 8 years ago

    This is the best answer in my opinion, as the vehicle safety standards for programmable safety systems are derived from IEC 61508 which, among other things, gives guidelines for methodologies intended to create and validate software with known target failure rates. See this for a brief overview: https://www.google.com.au/url?sa=t&source=web&rct=j&url=http...

    Functional safety is a specialised branch of engineering all of its own.

mabynogy 8 years ago

Worse is better. Process isolation, valgrind and similar tools are good enough.

D is better than rust IMHO.

zzzcpan 8 years ago

There are at least a few memory safe C compilers (PoCs) [1] [2] [3] and probably more than a few memory safe C subsets [4] and dialects.

However, they are not part of the C ecosystem, they don't compile into C and don't provide anything like musl gcc wrapper to be able to incrementally add memory safety in certain situations.

[1] https://www.cs.rutgers.edu/~santosh.nagarakatte/softbound/

[2] http://sva.cs.illinois.edu/index.html

[3] http://chrisseaton.com/plas15/safec.pdf

[4] bounds checking feature in tcc https://bellard.org/tcc/ http://download.savannah.nongnu.org/releases/tinycc/

gaius 8 years ago

What is the use case for a “safe C”? You use C because you do want control over how your data is actually stored in memory, for reasons of efficiency, or performance, or dealing with hardware, or whatever reason. The ability to just grab a bunch of bytes and cast them to a struct is absolutely key in many real systems.

But if you really do want it, C offers the opportunity to replace malloc() with whatever you like... or to not malloc at all...

sureaboutthis 8 years ago

For the same reason there is no memory safe assembly language.

blackflame7000 8 years ago

You could use smart pointers in c++11

  • nurettin 8 years ago

    You could easily get a deadlock if a shared pointer gets destroyed within a thread it manages. Shared pointers alone are not the answer to memory safety.

    • blackflame7000 8 years ago

      I said SMART, not SHARED. That includes weak_ptrs and unique_ptrs.

      • nurettin 8 years ago

        Not sure how that makes the situation any different.

        • blackflame7000 8 years ago

          Shared_pointers are thread-safe from a creation and deletion aspect. Furthermore, you can use weak_ptrs to reference a shared_ptr and use that where you suspect a pointer to a still-required resource might be deleted. Furthermore, the trusty old keyword const can make the pointer immutable, thereby mitigating the problem. (And don't say "what about const_cast", because that's undefined behavior masquerading as acceptable because it rarely causes side effects.)

          • nurettin 8 years ago

            My point is, "making sure x happens" or "follow this sure to work design philosophy which will break once the system is more complex than a game of billiards" is not a guarantee of anything, therefore none of the smart pointers solve any problems about deadlocks caused by resource sharing.

            • blackflame7000 8 years ago

              I can tell you're not familiar with the latest C++ developments. RAII solves deadlocks, and so do things like std::lock_guard. C++11 is as safe as you're ever going to get while still maintaining fine granularity over the control of the system.

              C++ applications are some of the most vastly complicated programs in existence so I don't follow the argument that they break on projects larger than a breadbox.

              • nurettin 8 years ago

                I am familiar with all developments in C++ since 2001, that is why I am pretty sure that moving mutex primitives from platform level to language and memory model level has not changed anything in terms of deadlocks and you are just googling new stuff as we go.

                • blackflame7000 8 years ago

                  You should really check out the new stuff. It's barely even the same language anymore. They've made many improvements.

    • isaachier 8 years ago

      That isn't exactly a memory issue to be fair. It is deadlock.

  • Jweb_Guru 8 years ago

    Smart pointers guaranteeing memory safety is a meme that needs to die. If you exclusively use smart pointers and never use references, maybe it's true, but this precludes you from even using std::vector.