jph a month ago

Read the original and it explains the purpose of each item:

https://spinroot.com/gerard/pdf/P10.pdf

The original clearly describes that the coding rules primarily target C, and that they are designed to maximize the ability to thoroughly check the reliability of critical applications written in C. The original author clearly understands what they're doing, and explains lots of other ways to verify C code.

For what it's worth, the rationales in the original all make perfect sense to me. Perhaps this is because I learned C on tiny systems? I learned C for hardware for implanted medical devices, and our lab did similar kinds of guidelines.

  • dgfitz a month ago

    That pdf reads like the first draft of a guide on "how to write safety-critical code" and probably inspired things like DO-178, LOR1, DAL-A, 882, all sorts of standards.

    Writing C code for a safety-critical system running on an RTOS really humbled me. Felt like I should make more than I did relative to peers slinging code using 748 MB of RAM in a browser tab. ;D

    • astrobe_ 25 days ago

      For me too. TFA is, I think, the first coding standard I came across that made sense to me. Except for the "2+ assertions per function" rule. 2 is arbitrary for one thing, and secondly your top priority, in the context of mildly mission critical embedded code, is to do what it takes to keep your system alive and sane (as fault tolerance people put it). Then log the issue if you can. Downtime caused by a program abort followed by a watchdog reboot should be a last resort for things that you really couldn't possibly expect and therefore cannot handle properly - there's no such thing as an "unexpected error".

      Also, before that, design things so that errors cannot happen; for instance, use C's type system, as weak and clunky as it is, to make it nearly impossible to pass out-of-range parameters. There's a trend these days to bash C at will for its "unsafe" nature, but a significant part of the issues are "between the keyboard and the chair", that is, not doing the right thing because it is tedious - that's probably the main defect of C.
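
A minimal sketch of that idea, using a hypothetical `duty_cycle_t` wrapper type (the name and range are invented for illustration): the only way to obtain a value is through a validating constructor, so out-of-range parameters are rejected at the boundary instead of deep inside the control loop:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical wrapper type: a duty-cycle percentage restricted to 0..100.
 * Consumers accept duty_cycle_t, not a bare int, so the compiler prevents
 * accidentally passing an unchecked value. */
typedef struct { unsigned char value; } duty_cycle_t;

/* The only sanctioned way to construct a duty_cycle_t: validates the raw
 * input and reports failure instead of silently truncating. */
bool duty_cycle_make(unsigned int raw, duty_cycle_t *out)
{
    if (raw > 100)
        return false;               /* reject out-of-range input at the edge */
    out->value = (unsigned char)raw;
    return true;
}

unsigned int duty_cycle_get(duty_cycle_t d) { return d.value; }
```

The tedium the comment mentions is real - every call site must check the constructor's result - but the type then guarantees the invariant everywhere downstream.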

      • pdpi 25 days ago

        “2+ assertions” could be read as “a precondition, a postcondition, and maybe a loop invariant”. The guideline should perhaps be more explicit about asserting your pre- and postcondition, but that’s a perfectly natural minimum standard.

        As for C’s unsafety — it is a poor engineer who doesn’t account for human factors. A language that doesn’t mitigate PEBKACs, and which lacks affordances to make safe code less tedious, is intrinsically unsafe.
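
That reading of the rule might look like this hypothetical helper, with one precondition, one loop invariant, and one postcondition:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the "precondition, postcondition, maybe a loop invariant"
 * reading of the 2+ assertions rule. The function itself is made up. */
size_t count_nonzero(const int *buf, size_t len)
{
    assert(buf != NULL);          /* precondition */

    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        assert(count <= i);       /* loop invariant: can't count more than seen */
        if (buf[i] != 0)
            count++;
    }

    assert(count <= len);         /* postcondition */
    return count;
}
```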

    • revskill a month ago

      Or replace all with zig.

      • rwmj 25 days ago

        Airbus writes all their flight control software in C, but it's not C as you would find in say your Linux distro, more like a tightly controlled subset of C with its own compilers, toolchains and formal verification. So they're already writing it in a different language, just one that superficially looks like C.

      • pjmlp a month ago

        Zig's safety is at the level of Object Pascal, Modula-2 and similar.

        Way better than plain old C, yet there are some weaknesses in the armour, and we have known better since AT&T's Cyclone project.

      • SkiFire13 25 days ago

        I would advise against writing safety-critical software in a language that has yet to get a stable release.

        • MaxBarraclough 25 days ago

          Right. If you want a safer language than C for your safety-critical embedded code, the obvious choice is Ada. The article even mentions Ada specifically.

          • lenkite 25 days ago

            How does one make a REST service using Ada without paying for frameworks/tools?

            • MaxBarraclough 25 days ago

              No doubt you could implement a modern web API in Ada, but you wouldn't really be playing to Ada's strengths. Ada is more commonly associated with safety-critical embedded work than with web development.

              Ada Web Server (AWS) [0][1] is Free and Open Source software, but that framework doesn't seem to get much use, and it doesn't inspire confidence that [0] mentions SOAP but doesn't mention JSON. I'm not aware of any proprietary/payware Ada web server solutions.

              [0] https://www.adacore.com/gnatpro/toolsuite/ada-web-server

              [1] https://github.com/AdaCore/aws

          • nicce 25 days ago

            Using Ada efficiently just requires too much $$$ for small projects.

            • MaxBarraclough 25 days ago

              We're talking about developing safety-critical software. C and Ada have their pros and cons, but Zig isn't even in the running.

      • otabdeveloper4 a month ago

        Yes, rewriting all the world's software in boutique untested languages is surely the way to write safe and reliable software.

        Because, you know, all the crusty idiots writing software before you didn't know about the wonders of syntactic sugar and automatically installed dependencies.

        • GoblinSlayer 25 days ago

          Today boutique languages are those that target PDP-7.

  • simne 25 days ago

    > The original clearly describes that the coding rules primarily target C

    Yes, this is because space engineers don't much like unpredictable things such as garbage collectors, which are used by nearly all functional languages by definition.

    In pure C, one could write code with 100% static allocation, so any step could be checked and ensured to run within a very exact time limit. A typical GC, by contrast, can introduce pauses that grow unpredictably with the amount of garbage.
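
A 100% static allocation scheme along those lines might look like this sketch (the pool size and `msg` type are made up for illustration): all memory is reserved at compile time, and "allocation" is a bounded scan of a fixed-size pool, so worst-case time and memory are known exactly:

```c
#include <assert.h>
#include <stddef.h>
#include <stdbool.h>

#define POOL_SIZE 8  /* assumed compile-time bound for this sketch */

struct msg { int id; int payload; };

/* Everything is statically allocated: no malloc, no GC, no hidden growth. */
static struct msg pool[POOL_SIZE];
static bool in_use[POOL_SIZE];

/* Bounded O(POOL_SIZE) "allocation" from the static pool. */
struct msg *msg_alloc(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!in_use[i]) {
            in_use[i] = true;
            return &pool[i];
        }
    }
    return NULL;  /* pool exhausted: caller must handle it explicitly */
}

void msg_free(struct msg *m)
{
    assert(m >= pool && m < pool + POOL_SIZE);  /* must point into the pool */
    in_use[m - pool] = false;
}
```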

    Second reason: in most aerospace companies, the aerodynamics (hardware) engineers are dominant, and all others are considered lower priority (even though, in practice, software now accounts for more than half the cost of a typical plane), and most hardware engineers don't understand functional programming.

  • cratermoon 25 days ago

    Failure to understand why things are the way they are (aka Chesterton's fence) is why we have some supposedly technically savvy people going on about supposedly fraudulent 150-year-old Social Security beneficiaries born in 1875.

  • RossBencina a month ago

    Came here to cite the Power of 10 paper.

    There are a couple of talks by Dr Holzmann on youtube regarding JPL high reliability software development process: "Mars Code" is the one that I remember: https://www.youtube.com/watch?v=16dQLBgOwbE well worth the watch I think.

  • nine_k 25 days ago

    In other words, C is not a great language for the task, but, being forced to use it, NASA had to devise rules that helped static checking, at the cost of making the code harder to write and read.

    Had they been able to use Ada or even Modula-2 more widely, much of that wouldn't have been needed.

throwaway2037 a month ago

The final paragraph is brilliant:

    > If the rules seem Draconian at first, bear in mind that they are meant to make it possible to check code where very literally your life may depend on its correctness: code that is used to control the airplane that you fly on, the nuclear power plant a few miles from where you live, or the spacecraft that carries astronauts into orbit. The rules act like the seat-belt in your car: initially they are perhaps a little uncomfortable, but after a while their use becomes second-nature and not using them becomes unimaginable.
  • atoav a month ago

    As an electronics guy I have to say that even if you code perfectly, your system can still fuck up, because your assumptions are wrong. E.g. I had a project physically tear itself apart, because apparently when the USB cable was disconnected at just the right moment, the servo would interpret the cut-off PWM pulse as a position cue, one that by the code alone should have been unreachable.

    What we learn from that is that even well written code cannot guarantee everything and you need physical fallbacks like end-switches that remove power from the servos if activated (and whoever thought pulse length is a reliable way to set servos is probably wrong).

    • colechristensen a month ago

      Especially when writing software for hardware in space, managing "unreachable" code paths is tested and considered, because with cosmic rays and other sorts of radiation, no code path is actually unreachable.

      • atoav 25 days ago

        Radiation hardened computing is a whole other (fascinating) can of worms, I can only imagine how hard that can get.

    • arijo a month ago

      Wouldn’t a digital servo mitigate the problem?

      • atoav 25 days ago

        No, they were digital servos. As far as I know the only difference between analog and digital servos is in how they process the received signal, the signals themselves are very much the same.

        But if by digital you meant some hypothetical servo that needs to receive its data as bytes with a checksum — yeah that would work, as long as the thing can do a graceful shutdown on powerloss. But I am not aware of such servos (although I wouldn't be surprised if they existed, on that project I was just a programmer).

        • arijo 20 days ago

          I realize now my comment was dumb.

          Thanks for the correction.

  • pests a month ago

    It’s interesting how the seat belt comment dates this piece. Seat belts have been required for basically my entire life, and neither I nor anyone else I know thinks twice about them.

    But here it’s being used as a familiar pain point the author assumes everyone deals with.

    • danielscrubs a month ago

      Fun fact, Volvo created and patented the three point safety belt in use today after years of R&D and testing expenses, but immediately made it free for all other car makers.

      I wonder if companies would do that today without heavy incentives. I can’t imagine for example a VC backed company doing that.

      https://www.forbes.com/sites/brentdykes/2025/01/28/the-data-...?

      • mingusrude a month ago

        In the same vein, at Volvo's factory outside of Gothenburg they have the obligatory museum. It's just that they don't showcase old, famous models. The museum is entirely built around car safety and how Volvo has worked on it. It is interesting to see a company that has been so dedicated to its core values for such a long time.

        • euroderf 25 days ago

          Aren't Volvos the cars that are (or were) so rigid that collision shocks were transmitted rather too directly to passengers ?

          • Sharlin 25 days ago

            I think that was a problem with many if not most cars before modern energy-dissipating crumple zones were developed. The front was built rigid to prevent the engine block from entering the cockpit and crushing the driver/passenger (which was a big safety problem at some point) but turns out that too much rigidity wasn't a good idea either…

      • switch007 25 days ago

        If SV developed seatbelts:

        SaaS - seatbelts as a service, available in 3 packages:

        - Starter: 4 tightenings* per day, $2.99/mo

        - Standard: 6 tightenings* per day, $5.99/mo

        - Premium: unlimited** tightenings, $9.99/mo

        *Once exceeded, belt will remain slack. Check your local laws. We will not be held liable

        **Subject to a fair usage policy. Policy may change at any time. Please check before driving

        All plans support a maximum of 2 occupants. Please subscribe to additional plans for more occupants.

      • throwaway2037 25 days ago

            > I wonder if companies would do that today without heavy incentives.
        
        Didn't Tesla give away all of their patents at some point?
        • mjevans 25 days ago

          Offhand, they made the charging system related patents free to use because a standard charging system is a great thing for any vehicle.

          • WhyNotHugo 25 days ago

            Doesn’t this fall under “commoditise your complement”? It’s good business for Tesla to do so.

          • throwaway2037 25 days ago

            Dumb question. How exactly do you make something "patent free"? Do you get the patent (so no one else can troll with it), then promise not to enforce "violations"?

      • edanm a month ago

        I mean, a modern example might be the various companies open-sourcing LLMs (e.g. Meta). I don't think "public-mindedness" is exactly the right explanation, it's probably driven by strategic thinking... but then, maybe the same was true of Volvo.

    • ryandrake a month ago

      Definitely an artifact of its time. I have older family members who still resent seatbelt laws, who still have those "buckle only" defeat devices that stop the chiming. "The government can't tell me what to do" can be such a deeply ingrained attitude!

      • technofiend 25 days ago

        I have a friend who spent a few years skydiving. He thinks he's a fantastic driver but well let's just say opinions vary. He will happily speed on the freeway and dive between cars like a maniac. What's funny is his pre and post attitude towards seatbelts after skydiving. He's now all about safety equipment and using it every time, probably because a backup chute saved his life. Even so, he's a skeptic about Automatic Activation Devices (AADs) since they came after his time. It's funny how being personally impacted changes your attitude, and then the next safety device comes along and people are back to not trusting it.

      • yellowapple a month ago

        It ain't just older people, either. A lot of friends in my age group and younger default to not putting on their seatbelts, and only do so if I explicitly tell them to do so (or if they notice my truck beeping about it).

        • pests a month ago

          My state doesn't require rear belts but I still wear mine when sitting in the back and encourage others to do the same. My sibling had a classmate in high school who got thrown out of the rear window while the car was in a roll, crushing and killing him.

          It's a little crazy to me that people are perfectly comfortable going 80+ down the freeway with no belts on in the back. Like the two seat backs are enough. It reminds me of the "no smoking sections" in restaurants that were sectioned off by a half wall.

          • Symbiote 25 days ago

            I would refuse to start or continue a journey if any passenger behind or beside me won't wear their seat belt.

            Old UK safety video on this: https://www.youtube.com/watch?v=TWLmoeoHrP4

            • PlunderBunny 25 days ago

              In New Zealand, the driver is responsible for ensuring all passengers are wearing seatbelts, and I believe (someone please correct me) that they can be held liable if any passenger is not wearing one.

            • fingerlocks 25 days ago

              I don’t understand what happened in that video.

              Are we supposed to believe that the front seat will somehow move forward from the force of the rear passenger hitting it? And this force will be so great that it will crush the front driver’s skull against the steering wheel? Is that really the take-away here?

              If so, that’s a PSA about poor engineering and design of the driver seat and less about rear passenger seat belt safety.

              • MaxBarraclough 25 days ago

                I have no particular knowledge of this topic but from this 1998 BBC article http://news.bbc.co.uk/1/hi/uk/128684.stm :

                    > It is estimated that if all rear seat belts were worn, 120 deaths and
                    > 1,000 serious injuries could be prevented each year. Back seat
                    > passengers are three times more likely to die in an accident if they
                    > are not strapped in, according to the AA.
                    >
                    > The organisation says each year more than 50 people in the front seats
                    > of cars are killed after being hit by back seat passengers who were
                    > not wearing seatbelts.
                
                edit This 2018 article from the RAC also says it's real: https://www.rac.co.uk/drive/news/motoring-news/drivers-warne...
              • cesnja 25 days ago

                Kinetic energy increases with the square of the velocity, so in a head-on collision everything not buckled on the back seat becomes a missile heading towards the passenger seats. Even a bag with a laptop is dangerous and you should put it in the foot compartment. And that's like 3kg while an average person will weigh around 70kg.

          • yellowapple 25 days ago

            Even the folks up front need to be wearing one. That's how my cousin died: got thrown through the windshield in a head-on collision.

      • sneak 25 days ago

        I wear my seatbelt 100% of the time and demand all my friends and loved ones do, too. I also resent seatbelt laws because they are abhorrent. Nothing should invoke state violence without a victim.

        That said, and more on topic: this isn't so much about laws as it is attitudes; plenty of people even today don't bother to wear seatbelts.

    • patrick451 25 days ago

      I still find seatbelt laws to be one of the most ridiculous examples of government overreach. People think the government shouldn't be in their bedroom, but for some inexplicable reason have no problem with the government being inside their car.

      • throwaway2037 25 days ago

        I am struggling to take this comment seriously. How do you feel about requiring minors (< 18yrs old in most places) to wear a seat belt? Do you also think that crash test safety is an overreach of gov't... or speed limits?

  • moffkalast 25 days ago

    1. Be polite

    2. Be efficient

    3. Have a plan for every edge case you meet

    • MaxBarraclough 25 days ago

      That almost works as a manifesto for Ada SPARK, just replace polite with formal.

      • moffkalast 25 days ago

        The Nvidia firmware language? I think that one only has two steps:

        1. The more you buy the more you save

        2. Black leather jacket

  • NetOpWibby a month ago

    Yeah I think I’ll stick to building websites and web apps.

    • callc a month ago

      You never know, your website may just be mission critical, like if there’s some code on the ISS running:

        while true; do wget $NetOpWibby_website || sudo shutdown; done
      • FpUser a month ago

        Sure, they have to worry that some imbecile makes mission critical code depend on some dating website.

        • yellowapple a month ago

          Someday scientists are going to want to study human reproduction in microgravity in order to test the feasibility of space colonization, and "some dating website" will indeed be something on which mission critical code depends.

          • cess11 a month ago

            Why would colonial powers let subjects control their own reproduction?

            • defrost a month ago

              Typically when an independent judiciary gets its nose rubbed so deep in evidence that they have to rule against it.

              • cess11 25 days ago

                Can you give five examples?

                • yellowapple 24 days ago
                  • defrost 24 days ago

                    Indeed .. and they're both US examples.

                    In my initial comment I was thinking of four main western countries (there are others), each with multiple court cases that hammered home the core human rights violations inherent in anti-miscegenation laws and forced, often deceptive, abortion policies.

                    Had the GP commenter here simply asked for an example or an expansion I'd have provided that .. but the "five examples" demand was just .. odd.

                    They've wandered off with no reply so I suspect that might have been the limit of their rhetoric .. such as it was.

                • defrost 25 days ago

                  You can't find examples of countries ending miscegenation laws and forced abortion eugenic policies after they became publically embarrassing through lawsuits?

                  Why five?

            • yellowapple 25 days ago

              "Let" ain't exactly the right word for something people are gonna do no matter how hard a colonial government cracks down on it.

              People be fuckin'.

    • pjmlp a month ago

      Where I stand, it is quite common for websites and apps to undergo security assessments with pentesting and code checking; dependencies are validated and may be refused based on security clearance.

      Software is critical for many businesses: even if no one dies, millions may be lost, driving the company into insolvency.

AndyKelley a month ago

If I made my own criticism of these rules it would be very different from OP's. It was difficult to take the article seriously from the get-go when it defended setjmp/longjmp. That pattern is so obviously broken to anyone who has ever had to go near it. The article makes an argument like this:

1. setjmp/longjmp is exception handling

2. exception handling is good

and I take serious issue with that second premise.

Also the loop thing obviously means to put a max iteration count on every loop like this:

    for (0..N) |_| {
    }

where N is a statically determined max iteration count. The 10^90 thing is silly and irrelevant. I didn't read the article past this point.

If I were to criticize those rules, I'd focus on these points:

* function body length does not correlate to simplicity of understanding, or if anything it correlates in the opposite way the rules imply

* 2 assertions is completely arbitrary, it should assert everything assertable, and sometimes there won't be 2 assertable things

  • kevin_thibedeau a month ago

    Hardcoded loop bounds are a requirement for running on hardware that will not be 100% reliable due to operating in the space radiation environment. Paranoid levels of defensive programming are a requirement for this regime. I would trust NASA's experience here rather than an armchair expert.

    • olalonde a month ago

      Space radiation and hardware reliability are not the given reason for the upper-bound.

      From the NASA document:

      > Rule: All loops must have a fixed upper-bound. It must be trivially possible for a checking tool to prove statically that a preset upper-bound on the number of iterations of a loop cannot be exceeded. If the loop-bound cannot be proven statically, the rule is considered violated.

      > Rationale: The absence of recursion and the presence of loop bounds prevents runaway code. This rule does not, of course, apply to iterations that are meant to be non-terminating (e.g., in a process scheduler). In those special cases, the reverse rule is applied: it should be statically provable that the iteration cannot terminate. One way to support the rule is to add an explicit upper-bound to all loops that have a variable number of iterations (e.g., code that traverses a linked list). When the upper-bound is exceeded an assertion failure is triggered, and the function containing the failing iteration returns an error. (See Rule 5 about the use of assertions.)
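
The linked-list traversal from the rationale could be sketched like this, with an assumed `MAX_NODES` bound standing in for whatever the system's real limit would be. Even a list corrupted into a cycle cannot make the loop run forever, and a static checker can prove the iteration count is bounded:

```c
#include <assert.h>
#include <stddef.h>
#include <stdbool.h>

#define MAX_NODES 64  /* assumed static bound for this sketch */

struct node { int value; struct node *next; };

/* Bounded traversal in the style the rationale describes: the loop can
 * never exceed MAX_NODES iterations. On bound violation an assertion
 * fires (in debug builds) and the function returns an error value. */
bool list_contains(const struct node *head, int wanted)
{
    int budget = MAX_NODES;
    for (const struct node *p = head; p != NULL; p = p->next) {
        if (--budget < 0) {
            assert(false && "loop bound exceeded");  /* Rule 5 style assert */
            return false;                            /* error path in release */
        }
        if (p->value == wanted)
            return true;
    }
    return false;
}
```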

    • AnimalMuppet a month ago

      I think radiation unreliability is orthogonal to fixed upper bounds. Radiation is as likely to flip a bit in the check of the upper bound as anywhere else. Having that upper bound hardcoded won't protect against that at all, as far as I can see.

      • kevin_thibedeau a month ago

        It will ensure a more rapid recovery to a safe state. This is why, in principle, you don't check for loop termination with "==" when ">=" or "<=" will better protect against a corrupted loop counter.
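
A small illustration of that defensive termination test (the corrupted counter is simulated by simply starting it past the bound; the function name is invented):

```c
/* With an inequality test the loop exits immediately even if the counter
 * has been corrupted past the limit. With "i == limit" as the exit
 * condition, a counter flipped beyond the limit would keep running until
 * it wrapped around - potentially billions of extra iterations. */
int iterations_with_ge(int start, int limit)
{
    int done = 0;
    for (int i = start; !(i >= limit); i++)  /* ">=" exit, not "==" */
        done++;
    return done;
}
```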

        • AnimalMuppet a month ago

          On such a system, would the code execute out of ROM? Or would the ROM image be loaded into RAM? If the latter, then radiation could flip a bit in the code that represents the upper limit of the loop.

          Still, if you're going to use words like "better protect" instead of "protect", I'm not sure I can say you're wrong...

          • kevin_thibedeau a month ago

            RAM and ROM can be nominally protected with ECC, which many rad hard processors provide. It's the internal state of the processor (registers and control logic) which is not fully protected. The inherent hardening of the process makes faults less likely, but they are still guaranteed to happen. There are design approaches to improve reliability with triplicate flip-flops but you pay a power and area penalty. Higher level redundancies aren't always practical also due to mass and volume constraints. At the end of the day software has to account for running on an unreliable system and minimizing bad stuff happening as much as possible.

    • cjbgkagh a month ago

      Don’t they have voting CPUs? If one goes out of sync discard that result and restart it.

      And wouldn’t the value end up on the CPU the same if it was hardcoded or not?

      Perhaps there could be some verification done with hardcoded loop bounds for making sure things work in real time. I’ll read the article before commenting further. Edit: Yeah, the hardcoded bounds is to give some sort of guarantee as to how long a function will take.

      • pauldino a month ago

        That's not the case for JPL missions (the paper originally is from JPL) which generally have 2 separate independent computers where 1 is active at a time.

        Since they're independent, the 2 computers don't actually have to run the same software, I believe during Mars entry descent and landing the standby compute element runs a different less sophisticated but easier to validate version of the EDL code to take over if any fault is detected while the primary software is running. (I was going to do a quick check on dataverse.jpl.nasa.gov to confirm that but it seems to be down)

        Also I think a few years ago on the Mars Curiosity rover (2012) there was some corruption in the flash storage on one of the computers that prevents the full flight software from being loaded on to it, so instead it runs a stripped-down version of the code with very limited functionality to function as a lifeboat in case the fully-working computer ever fails. https://ieeexplore.ieee.org/document/9843266

      • mr_toad a month ago

        I’ve only heard of voting CPUs being used on launchers. I think the level of redundancy on other craft varies.

        • fuzztester a month ago

          iirc, I read about voting CPUs (like 4 of them) being used on space shuttles or earlier spacecraft. I could be wrong though.

    • dooglius a month ago

      Can you elaborate on the failure mode described here? If we take a model where registers (including PC) or memory can get bit-flipped, it seems like all bets are off.

      • kevin_thibedeau a month ago

        High energy radiation causes single event upsets in digital logic when you get an unlucky hit in the right part of a circuit. When that part is a flip-flop holding state or a bus transmitting data, you get persistent corruption. That can cause any and everything to go awry or it may resolve itself depending on what was hit. Various design approaches are used to address these faults through redundancy mechanisms but it isn't always practical to employ them.

        For basic software protection you may want to depend on a watchdog timer and filling unused memory with NOP slides to trap the processor until the timer reboots. If you have hardware controlling something more risky like explosive bolts, you may want stronger assurances that the hardware won't fail by adding lower level redundancies.

      • AlotOfReading a month ago

        This is actually a pretty common part of the threat model for high integrity computing (e.g. the ECU in your car and airplane avionics). Part of the standard solution is to run processors in lockstep and throw errors if any part of the cpu state diverges between the cores.

  • bluGill a month ago

    Exception handling in non-trivial code is more performant than all the if/else sequences needed to handle errors - if you even handle errors, which odds are you won't.

    But it needs to be in the language so that the tricky code is written once.

    • throw-qqqqq a month ago

      The part about “non trivial code” does a lot of lifting here. I would say your statement is untrue in general in at least a few languages.

      In .NET and Java it costs at least 100x as much time to throw an exception as to return an error code.

      Other languages may have cheaper exceptions, but for many mainstream languages, you pay a big performance price for exceptions.

      That price is often offset by other positive things, so the tradeoff is made willingly and with eyes open.

      • mkleczek 25 days ago

        > In .NET and Java it costs at least 100x as much time to throw an exceptions as to return an error code.

        Don't know about .NET, but in Java it is not throwing exceptions that is heavy but filling in the stack trace during construction of a Throwable. It can be made much more performant by using the constructor that disables the stack trace: https://docs.oracle.com/en/java/javase/21/docs/api/java.base...

      • jayd16 a month ago

        > In .NET and Java it costs at least 100x as much time to throw an exceptions as to return an error code.

        The conjecture is that "if" checks are always executed, while exceptions are free when not thrown.

        That said, predicted branches are also fairly free so arguing about this in the abstract is pointless.

        • rocqua a month ago

          At the margin, branches are more likely to sit close to the hot path, so they are more likely to 'pollute' the instruction cache, whilst exceptions have less of this.

          Though profile guided optimization will probably catch this. And it might be that even without, compilers still optimize around this.

      • tempodox a month ago

        You wouldn't run .NET or Java on a spacecraft MCU and setjmp/longjmp in C are as cheap as it gets for exception handling.
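
For reference, the setjmp/longjmp pattern under discussion looks roughly like this minimal sketch (the function names and fault code are invented for illustration): setjmp marks the "try" point and longjmp is the "throw", unwinding straight back to the handler with no per-call overhead on the happy path:

```c
#include <setjmp.h>

static jmp_buf recover;  /* saved context: the "try" point */

enum fault { FAULT_NONE = 0, FAULT_RANGE = 1 };

static int scale(int x)
{
    if (x < 0)
        longjmp(recover, FAULT_RANGE);  /* "throw": jump back to setjmp */
    return x * 2;
}

int run(int x)
{
    int fault = setjmp(recover);  /* returns 0 on the initial call, the
                                     fault code after a longjmp */
    if (fault != 0)
        return -fault;            /* "catch": map fault to an error value */
    return scale(x);
}
```

Note the well-known hazards: locals modified between setjmp and longjmp must be volatile to remain valid, and nothing is cleaned up during the jump, which is much of why the pattern is criticized.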

      • elygre 25 days ago

        You pay the cost only when the exception is raised. If that happens on more than 1% of your runtime checks, it probably is not (aka should not be) an exception.

    • AlotOfReading a month ago

      Realtime programming often compromises throughput ("performance") in order to make static deadline guarantees possible. Software exceptions, especially the tabled variety that are actually performant, are notoriously difficult to balance here and extremely uncommon even these days.

    • eptcyka 25 days ago

      How will your code decide to throw? Continue down the happy path until the OS sends a signal? How does your business logic choose to throw if not by checking input and output values?

  • cratermoon 25 days ago

    > sometimes there won't be 2 assertable things

    It seems like if there's 1 assertable thing then it's trivial to also assert an inverse condition. Before arguing that it would be redundant, remember that in the presence of random bitflips and other anomalies caused by radiation exposure, logic may not operate as deterministically as expected.

  • lowbloodsugar a month ago

    Where did it say exception handling is good?

  • AnimalMuppet a month ago

    Yeah, this read like he really wanted this to be a general set of guidelines instead of a C-specific set, and since they didn't explicitly say that they are C-specific, he's playing stupid "gotcha" games for whatever reason.

  • caspper69 a month ago

    Hi Andy, it's been a long time since I pestered you. Glad to see Zig coming along so well. Congratulations on the progress! I'll apologize in advance for the long post.

    Wrt exceptions, you didn't elaborate. Exception handling can be a hot button issue with passionate opinions. My comments are general in nature; they are not specifically directed toward you or the Zig language.

    IMHO, exception handling is inherently neither good nor bad- it is merely a mechanism; it has upsides and downsides, and the context in which they are being discussed / analyzed / used should be the guiding factor.

    C provides no bounds or overflow checking. In the context of a system which must not fail, you are then forced to use some combination of rigorous assertions, overflow flag checking, and return value verification. It's cumbersome and ugly.

    Rust provides both bounds checking and overflow checks (in debug mode at least). Asserts are still available, but are not required in as many instances as in C.

    Neither of those languages provide what we would consider to be exceptions or exception handling. An argument could be made that if you squint the right way that Rust's resume after panic is a form of exception handling, but that's a semantic debate, and it would certainly be considered non-traditional if it were to be categorized as such. Wrt C, it had never occurred to me that setjmp()/longjmp() could be used as an exception handling mechanism. I have always seen them used as a context-switching mechanism for e.g. tasking (green-threads).

    In languages that provide exception handling, both runtime and user defined exception conditions are handled by the runtime itself (oftentimes behind the scenes). This mechanism allows one to provide a catch-all lexical scope for handling exceptions, which can be cleaner and more ergonomic from a programmer perspective and can be simpler to reason about (although this is certainly debatable). It also provides an opportunity to handle conditions that might be unknown or unforeseen at the time the code is being written. Defer semantics in a language without exceptions might be considered a mid-ground approach.

    The usual complaints with exceptions are: (1) that they essentially constitute hidden control flow and hidden code execution that happens outside of the plain reading of the source; (2) that you pay the performance penalty for checking / verifying the exception conditions at every line of code (also see #1); (3) that the conditions considered "exceptional" by most implementations are instead just run-of-the-mill error conditions that should be handled explicitly; and (4) that there is no well-defined structure or convention for where and when to handle exceptions and when to pass them up the stack (again, see #1) which results in a spaghettification of sorts.

    C++ (optional), Java, JS, Python and C# (among many others) all provide exception handling and are all mainstream.

    My general rule of thumb (which is always subject to situational variance) is that application code benefits from robust exception handling while systems level or performance critical code should not use exceptions, or at a minimum should be very judicious with usage.

    Exceptions can be abused and misused like any other feature, but the reduction in repetitive manual error checking (see Go) can be a win for many teams.

    YMMV of course- we have all been dragged into the 7th circle of hell at one time or another, and programming features are a lot like liquor; once you've gotten sick on one, it's near impossible to go back.

zoogeny a month ago

> People using Ada, Pascal (Delphi), JavaScript, or functional languages should also declare types and functions as locally as possible.

My own personal approach in JavaScript is to avoid defining functions in a nested manner unless I explicitly want to capture a value from the enclosing scope.

This is probably due to an outdated mental model I had where it was shown in performance profiling that a function would be redefined every time the enclosing function was called. I doubt this is how any reasonable modern JavaScript interpreter works, although I haven't kept up. Since the introduction of arrow functions (a long time ago now in relative terms) their prolific use has probably led to deep optimizations that render this old mental model completely useless.

But old habits die hard, I guess, and now I keep any named function that does not capture local variables at file/module scope.

A lot of the other notes are interesting and very nit-picky in the "technically correct is the best kind of correct" way that older engineers eat up. I feel the general tone of carefulness that the NASA rules are trying to communicate is very good, and I would support most of them in the context in which they are enforced.

  • ww520 a month ago

    For most languages, the inner function body is parsed/compiled once, when first defined, with the unbound free variables collected in a list. The next time the function definition is evaluated, the compiled function body is reused and only the free variables are bound to the variables in the outer environment (closure). It's quite efficient. Depending on the implementation, it could be just a two-element struct pointing to the compiled function body and to the outer environment. Or it could be a small call thunk embedding the setup of the variable bindings and the jump to the address of the function body.
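    The two-element-struct representation described above can be sketched in C (all names here are illustrative):

```c
/* Sketch of a closure as a pair: a pointer to the once-compiled function
   body plus a pointer to the captured environment. */
typedef struct {
    int (*code)(void *env, int arg);  /* compiled function body, reused */
    void *env;                        /* bound to the outer environment */
} closure;

static int add_env(void *env, int arg) {
    return *(int *)env + arg;         /* free variable reached via env */
}

closure make_adder(int *captured) {
    /* Creating the closure is cheap: two pointers, no recompilation. */
    return (closure){ add_env, captured };
}

int call_closure(closure c, int arg) {
    return c.code(c.env, arg);
}
```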

  • jbreckmckye a month ago

    FYI inline functions are reused these days. So are object / array literals within a certain size.

    In some cases inline functions (if they fall within V8 optimisations) can be re-used more cheaply than non-inline alternatives like binding or currying.

  • Gibbon1 a month ago

    I use gcc's nested functions a fair amount. Because it reduces cutting pasting and makes the code much easier to understand.

    I use variable argument macros to implement debug print functions for debugging. And I just leave them in the code as a form of documentation. Looking at the debug print statements tells you a lot about what the code is doing.

    • shakna a month ago

      Unfortunately, as useful as they really are, GCC's nested functions are usually done via trampolines, which means an executable stack. That's not really a safe thing when dealing with critical systems, as NASA regularly does.
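      For illustration, here is a minimal GCC nested function (a GNU extension, not ISO C) whose address is taken, which is exactly the case that forces a stack trampoline:

```c
/* GNU C extension: compile with gcc, not clang. Taking the address of a
   nested function that captures a local forces GCC to build a trampoline
   on the stack, which is why an executable stack is required. */
static void for_each(const int *a, int n, void (*f)(int)) {
    for (int i = 0; i < n; i++)
        f(a[i]);
}

int sum_array(const int *a, int n) {
    int sum = 0;
    void add(int x) { sum += x; }  /* nested function capturing `sum` */
    for_each(a, n, add);           /* address taken -> trampoline */
    return sum;
}
```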

      • LiamPowell a month ago

        This isn't the case for Ada on GCC, so the support is there, it just hasn't been extended to C:

        > The use of trampolines requires an executable stack, which is a security risk. To avoid this problem, GCC also supports another strategy: using descriptors for nested functions. Under this model, taking the address of a nested function results in a pointer to a non-executable function descriptor object. Initializing the static chain from the descriptor is handled at indirect call sites.

        > On some targets, including HPPA and IA-64, function descriptors may be mandated by the ABI or be otherwise handled in a target-specific way by the back end in its code generation strategy for indirect calls. GCC also provides its own generic descriptor implementation to support the -fno-trampolines option. In this case runtime detection of function descriptors at indirect call sites relies on descriptor pointers being tagged with a bit that is never set in bare function addresses. Since GCC’s generic function descriptors are not ABI-compliant, this option is typically used only on a per-language basis (notably by Ada) or when it can otherwise be applied to the whole program.

        > For languages other than Ada, the -ftrampolines and -fno-trampolines options currently have no effect, and trampolines are always generated on platforms that need them for nested functions.

        From: https://gcc.gnu.org/onlinedocs/gccint/Trampolines.html

        • shakna a month ago

          Aye, "usually" up there was the implication.

          You can actually do it in C, too! It just requires setting up a few macros, and getting it right requires a fairly decent understanding of the underlying hardware. (via TARGET_CUSTOM_FUNCTION_DESCRIPTORS).

        • Gibbon1 25 days ago

          This is only a problem if you pass the nested function as a function pointer and you have a machine with a no-execute stack.

          The processors I program for don't have no-execute stacks, so the use of a trampoline is of no consequence.

          Also, my memory is that no-execute stacks were heralded as the solution to stack-smashing attacks: no-execute stacks, problem solved. And it turned out that if you're vulnerable to stack smashing, no-execute stacks won't save you.

          So I feel we threw out something good, nested functions for some cargo cult security. And since no one uses them the reactionary luddites on WG14 are free to hope they go away.

  • ravenstine a month ago

    Yeah, in JavaScript I'll pretty much only write nested functions to use them as scope control, and usually I don't intend for that function to be callable elsewhere. Otherwise, I prefer writing functions that are as "pure" as possible, even if that means having to pass in many parameters each call.

bumby a month ago

Just for context, these aren’t really “rules” as much as proposed practices. Note that official “rules” are in documents with names like “NPR” aka “NASA procedural requirements.”[1] So, while someone may use the document in the featured article to frame a discussion, a developer is not bound to comply (or alternatively waive) those “rules” and could conceivably just dismiss them.

[1] e.g. https://nodis3.gsfc.nasa.gov/displayDir.cfm?t=NPR&c=7150&s=2...

  • westurner a month ago

    awesome-safety-critical lists a number of specs: https://awesome-safety-critical.readthedocs.io/en/latest/#so...

    • bumby a month ago

      Just be aware that some of the NASA-specific ones fall into a similar category. NASA “guidebooks” and “handbooks” aren’t generally hard requirements.

      • westurner 25 days ago

        From "The state of Rust trying to catch up with Ada [video]" https://news.ycombinator.com/item?id=43007013 :

        >> The MISRA guidelines for Rust are expected to be released soon but at the earliest at Embedded World 2025. This guideline will not be a list of Do’s and Don’ts for Rust code but rather a comparison with the C guidelines and if/how they are applicable to Rust

        /? Misra rust guidelines:

        - This is a different MISRA C for Rust project: https://github.com/PolySync/misra-rust

        - "Bringing Rust to Safety-Critical Systems in Space" (2024) https://arxiv.org/abs/2405.18135v1

        ...

        > minimum of two assertions per function.

        Which guidelines say "you must do runtime type and value checking" of every argument at the top of every function?

        The SEI CERT C Guidelines are far more comprehensive than the OT 10 rules TBH:

        "SEI CERT C Coding Standard" https://wiki.sei.cmu.edu/confluence/plugins/servlet/mobile?c...

        "CWE CATEGORY: SEI CERT C Coding Standard - Guidelines 08. Memory Management (MEM)" https://cwe.mitre.org/data/definitions/1162.html

        • bumby 25 days ago

          Sorry, I’m not following your point. When I said “NASA-specific” I meant those in your link like “NASA Software Engineering and Assurance Handbook” and “NASA C Style Guide” (emphasis mine). Those are not hard requirements in spaceflight unless explicitly defined as such in specific projects. Similarly, NASA spaceflight software does not generally get certified to FAA requirements etc. The larger point being, a NASA developer does not have to follow those requirements simply by the nature of doing NASA work. In other words, they are recommendations but not specifications.

          • westurner 25 days ago

            Are there SAST or linting tools to check that the code is compliant with the [agency] recommendations?

            Also important and not that difficult, formal design, implementation, and formal verification;

            "Formal methods only solve half my problems" https://news.ycombinator.com/item?id=31617335

            "Why Don't People Use Formal Methods?" https://news.ycombinator.com/item?id=18965964

            Formal Methods in Python; FizzBee, Nagini, deal-solver: https://news.ycombinator.com/item?id=39904256#39958582

            • bumby 25 days ago

              I’m not aware of any tools for analysis geared to NASA requirements specifically, but static analysis is a requirement for some types of development.

              • westurner 25 days ago

                Why isn't there tooling to support these recommendations; why is there no automated verification?

                SAST and DAST tools can be run on_push with git post-receive hooks or before commit with pre commit. (GitOps; CI; DevOpsSec with Sec shifted left in the development process is DevSecOps)

                • bumby 25 days ago

                  I don’t work there so I can’t speak definitively, but much of it probably stems from the sheer diversity of software. For example, ladder logic typically does not have the same tools as structured programming but is heavily used in infrastructure. NASA is also sometimes restricted from specifying a framework, leaving contractors to develop in whatever they want.

layer8 a month ago

The rule about recursion is likely also to ensure a statically known bound on needed stack space, in addition to a statically known runtime bound (in conjunction with the other rules).

While the criticism of rule 3 is right in that there is a dependence on the compiler, it is still a prerequisite for deriving upper bounds for the runtime by static analysis on the binary. This is actually something that is done for safety-critical systems that require a guaranteed response time, based on the known timing characteristics of the targeted microprocessor.

  • ajross a month ago

    > The rule about recursion is likely also to ensure a statically known bound on needed stack space

    It's definitely that. Static stack size analysis in the embedded world is a long-standing paradigm and recursion and function pointer indirection defeat that.

    If the point of rule was that NASA was arguing that recursive constructions are always harder to reason about than iterative ones, then obviously NASA is wrong.

    • shakna a month ago

      Just to quote the original rationale:

      > Rationale: Simpler control flow translates into stronger capabilities for verification and often results in improved code clarity. The banishment of recursion is perhaps the biggest surprise here. Without recursion, though, we are guaranteed to have an acyclic function call graph, which can be exploited by code analyzers, and can directly help to prove that all executions that should be bounded are in fact bounded. (Note that this rule does not require that all functions have a single point of return – although this often also simplifies control flow. There are enough cases, though, where an early error return is the simpler solution.) [0]

      [0] https://spinroot.com/gerard/pdf/P10.pdf

    • AnimalMuppet a month ago

      If hard realtime and stack space are part of your constraints, then recursive constructions are in fact always harder to reason about.

      • ajross a month ago

        To clarify: it's not stack "space" that's at issue. It's the requirement that the lack of stack overflow at runtime be formally verified. If functions exist in a static call tree[1] you can do this with a little disassembly wizardry and a few hundred lines of python. If you allow them to call themselves recursively, or even to indirect through a function pointer, it becomes undecidable.

        [1] There are some other related rules like "no alloca()".

        • rcxdude 25 days ago

          Function pointers are possible to deal with, with varying levels of automation. You can detect when pointers to functions are taken, and tie them to call sites with type information or some manual work (I have a static analysis tool which allows function pointers for callbacks by linking the functions passed into one function to the list of possible destinations for a given callsite). Recursion is harder, but it's still possible in principle to prove a bound in a lot of cases, given a proof framework. Not something I'd encourage, though.

davemp 25 days ago

> Since programmers these days typically read their code on-screen, not on paper, it's not clear why the size of a sheet of paper is relevant any longer.

Folks spent a while (centuries?) iterating on the standard page and character sizes. It makes sense to me that what we landed on wasn’t solely due to the limitations of paper, but also the limitations of humans.

dooglius a month ago

Title should indicate that this is a _criticism_ of the rules.

  • Enginerrrd a month ago

    Yeah and also one that I'm not very impressed with.

    The criticism is (mostly) super contrived and totally misses the wisdom behind why some of these rules were made in the first place. A lot of the author's points are very reminiscent of the same classic rebuttal against criticisms of the C language for being insecure: "It's perfectly fine if you're good and don't make stupid mistakes." That's just not a very mature view of working with groups of human beings.

    Most of these rules are designed to reduce errors that are difficult for humans to see by making the code more readable, deterministic, and avoiding situations that can lead to unintended behavior that is subtle in its true complexity. Creating a series of "Gotchas" where it perhaps negates that idea in an obscure situation doesn't really mean that the rules don't tend to produce code that is more reliable and auditable.

    Some of these rules really do seem kind of anachronistic, but... then again there's still a lot of old FORTRAN code and the like running on NASA hardware.

    • wduquette 25 days ago

      Most of these rules are designed to make it possible to debug errors from millions of miles away, with extremely limited visibility into the program’s state, so that failure modes can be predicted, understood, and resolved on the ground, and worked around as needed, so that we don’t lose a spacecraft in deep space.

readthenotes1 a month ago

Someone in NASA once told me that it was easier to teach a mechanical engineer to program than to teach a software developer mechanical engineering.

From that perspective, the avoidance of recursion is more compelling. Plus, Fortran didn't support it...

  • grandempire a month ago

    This is a common attitude in other engineering fields, but they also don't tend to have outstanding software and don't care to.

  • paulluuk a month ago

    I'm curious, how does that perspective make the avoidance of recursion more compelling?

    • airbreather a month ago

      Recursion can be good, but predicting whether you will ever run out of stack can be hard.

      It's a little hard to duck out to Mars to reboot the lander after you bricked it with a recursive function that never exited.

    • readthenotes1 a month ago

      Recursion was a filter topic when I went through computer science training. Some people couldn't get it.

jcarrano 25 days ago

GCC, at least, allows you to get stack usage (after compilation) and caller-callee relationships (-fstack-usage, -fcallgraph-info, etc.). From this one can infer maximum stack size. It will not work with recursion or function pointers, unless one manually incorporates additional data.

setjmp() and longjmp() are a poor way of handling exceptions, as any cleanup code won't be executed. Of course, following the spirit of these rules, one would not have resources that need cleaning up, but still.
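A small sketch of that cleanup problem (all names here are mine): the buffer below leaks on the error path, because longjmp unwinds straight past the code that would free it.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf on_error;

static void parse(int should_fail) {
    char *buf = malloc(64);    /* leaked on the longjmp path */
    if (buf == NULL || should_fail)
        longjmp(on_error, 1);  /* unwinds directly to setjmp, no cleanup */
    free(buf);                 /* only runs on the success path */
}

int try_parse(int should_fail) {
    if (setjmp(on_error) == 0) {  /* returns 0 on the initial call */
        parse(should_fail);
        return 0;                 /* success */
    }
    return -1;                    /* longjmp landed here; buf was leaked */
}
```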

The main issue, that is never mentioned, because it will be different in each application, is what to do when something goes wrong. Say an iteration limit is exceeded, or the fixed resources allocated at startup are not enough.

ozim a month ago

Nothing special as those are like MISRA C rules from automotive industry.

I see that this is a repackaging of low-level rules and not some „magical 10 development rules” that would apply to your yet another CRUD app.

Not to bash article just informing people who would write „software engineering is not real engineering” - it is real engineering and there are norms. Just because someone did not hear about standards and rules and practices doesn’t mean there are none.

  • scraptor a month ago

    Of course the more forty year old rules you follow the more engineering it is. Using a better programming language that doesn't depend on organisational process for safety is not engineering - there's no standard for that.

    • ozim 25 days ago

      How did you come up with such conclusion?

      My argument is only that there is a lot of proper engineering in software development - whether it is implementing memory safety by default in new languages or having organizational processes. It is all there, and I only dislike opinions that say software development is an immature new field. It is not.

      The article was also about NASA organization rules, so pointing to MISRA was the argument that it is not only NASA, and that those are not such special rules.

jsrcout a month ago

I work with a lot of embedded and embedded-adjacent software, and even I think several of these rules are too much. Having said that, Holzmann's rules are from 2006, and embedded / space qualified hardware has improved quite a bit since then. These days planetary probes run C++, and JWST even uses JavaScript to run command scripts. Things are changing.

manmal a month ago

> All loops must have a fixed upper-bound

Things like spinlocks or CAS (Compare-And-Swap) are elegant and safe solutions for concurrency, and AFAIK you can’t really limit their upper bound. Others in the thread have pointed out that those are more guidelines than rules - still, not sure about this one.
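One way to square a CAS loop with the fixed-bound rule is to cap the retries and surface failure to the caller. A sketch (MAX_CAS_RETRIES is an assumed tuning constant, not from the rules):

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_CAS_RETRIES 1000  /* assumed contention bound */

/* Increment a shared counter with a bounded compare-and-swap loop.
   Returns false instead of spinning forever under pathological contention. */
bool bounded_increment(atomic_int *counter) {
    for (int i = 0; i < MAX_CAS_RETRIES; i++) {
        int old = atomic_load(counter);
        if (atomic_compare_exchange_weak(counter, &old, old + 1))
            return true;   /* CAS took effect */
    }
    return false;          /* bound exhausted; the caller must decide */
}
```

Whether returning failure is acceptable depends entirely on the system; the point is only that the worst case becomes statable.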

  • PaulDavisThe1st a month ago

    Historically, the right limit for a loop around a spinlock/CAS was related to the effective time cost of a context switch (taking into account both the register save/restore and the TLB flush). This is not a fixed number across all hardware, but it is a knowable upper bound.

    • manmal a month ago

      That's an interesting perspective. I'm not sure I'd ever be able to act on this outside of some embedded environments - even there, we now have actual OSs and the variance that comes with them.

      • PaulDavisThe1st 25 days ago

        Context switches on most general purpose OS's are non-preemptible. It's relatively easy to find out what they cost. The TLB hit is application dependent - it depends on the working set (memory use footprint) after the switch. This is not some hard to know thing, certainly not for the purposes of bounding a spinlock.

  • TheBlight a month ago

    I suspect multi-threaded programming for spacecraft is frowned upon at NASA.

    • RossBencina a month ago

      Not sure about now, but it apparently was allowed when the Mars Pathfinder priority inversion problem happened.

      • TheBlight 23 days ago

        AIUI that involved the scheduler for multiple processes not threads.

HarHarVeryFunny 25 days ago

It seems they also need:

11. Use strict typing for all scalar types. Do not mix imperial and metric units.

  • esafak 25 days ago

    > Do not mix imperial and metric units.

    Value types to the rescue. I use them liberally.

wileydragonfly a month ago

The physical constraints of a sheet of paper are pleasing to the eye. Otherwise the dimensions of paper would be different. Oh well, it was good for a chuckle.

procaryote 25 days ago

This is written with a weird attitude of trying to find problems with the rules. Take this for example:

> This does a bounded number of iterations. The bound is N^10. In this case, that's 10^90. If each iteration of the loop body takes 1 nsec, that's 10^81 seconds, or about 7.9×10^72 years. What is the practical difference between “will stop in 7,900,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 years” and “will never stop”?

... yes, you're very smart and have found a way to technically satisfy the requirement while building a broken program

Clearly the upper bound rule is to make it easy to reason about worst-cases so setting a very high one isn't really the gotcha they seem to think it is. It just means you can look at the high bound and go "this is too high, fix it"
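Read that way, the rule pairs a generous but reviewable bound with an assertion, e.g. (MAX_NODES and the list type are illustrative, not from the rules):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 1024  /* assumed sanity limit, visible at review time */

typedef struct node { struct node *next; int value; } node;

/* Sum a list with an explicit upper bound so a corrupted (e.g. cyclic)
   list cannot loop forever. The bound is a safety net, not the exit path. */
int list_sum(const node *p) {
    int sum = 0;
    for (int i = 0; i < MAX_NODES && p != NULL; i++, p = p->next)
        sum += p->value;
    assert(p == NULL);  /* fires if the bound, not NULL, ended the loop */
    return sum;
}
```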

jeffreygoesto a month ago

I don't like the "I criticize you for not having my use case" attitude.

rollcat 25 days ago

I think some of these recommendations, while overall excellent, are somewhat self-contradictory. For example, functions that are no longer than one screenful, AND ban on function pointers, AND the general rule to keep the codebase readable.

How do you implement a state machine, like a byte code interpreter? You need some way to dispatch functions based on opcodes. You could argue it has no place in mission-critical code, but well-understood, time-bound state machines (like Plan9-style regexes or BPF) can actually greatly increase code readability.

Also: Greenspun's tenth rule.

jmull 25 days ago

    int const N = 1000000000;
    for (x0 = 0; x0 != N; x0++)
    for (x1 = 0; x1 != N; x1++)
    ...
    for (x9 = 0; x9 != N; x9++)
        -- do something --;
Rule 1 would disallow this. Probably other rules as well, depending on the details for your code analysis tools.

Also the article seems to imply the rule about recursion was about knowing your program would terminate, but the rationale from the rules says it's about verifying execution is bounded. That's a somewhat different concept. With recursive calls you have the stack to consider.

dehrmann a month ago

Friendly reminder that you are most likely not NASA and have different goals, so you should approach the problem differently. The cost of a stack overflow (even at FAANG-scale) for you is likely many orders of magnitude cheaper than for NASA, and the cost of delaying a project for a year is much worse.

Unless you work on commercial aircraft avionics. NASA flying a probe into Mars makes them look dumb. Flying an aircraft into a mountain haunts people for the rest of their lives.

matu3ba a month ago

Example 1 is a deficit of C with missing computed goto and switch continue. Example 2 review is ambiguous on practicality. Example 3 reads very odd, an over-approximated upper stack bound is very possible https://github.com/ziglang/zig/issues/157#issuecomment-76395... and tighter computation with advanced analysis as well. Example 4,5,6,7,8,9,10 yes. Overall good read.

kazinator a month ago

These rules are obviously for something critical.

I don't think they are saying that if you write some kind of service program at NASA, it cannot have an infinite loop for taking requests. Or that if you write a line-by-line file processing utility, it has to cap out at a maximum number of lines so that the loop is statically bounded. Or that if you write some tool to process syntax, perhaps a compiler, you cannot use recursion.

These are not embedded, safety-critical things.

  • bumby a month ago

    NASA requires their software to be classified based on risk. The requirements that get levied are supposed to be commensurate with the risk. (E.g., human-rated safety-critical software has more stringent requirements than basic business software, although they write both.)

emorning3 a month ago

Looking over the list I determined that I already regularly follow all the rules but the first three.

But, as a C#/C++/TypeScript developer, I don't have the first clue how I might go about implementing those rules.

Anybody know of patterns for this kind of safe programming that I can use?

Does just using a language like Zig automatically make my code 'safe'?

Anybody know if generating code from a proof checking language like Lean would satisfy all these rules?

  • bluejekyll a month ago

    Rust pretty much nails all of those.

    • nicce 25 days ago

      Not really. By default allocators will panic if there isn't physical memory available. Recursive functions can cause panic at certain depth. Code generated by macros isn't very visible for the developers and recursive macros are very common. Return types are checked only if the developer adds #[must_use].

      You can overcome a lot if you invest heavily in the type system, but that depends on the developer.

jmclnx a month ago

nice, saved because these days who knows how long the page will be available :)

christophberger 25 days ago

The first rule of writing safety-critical systems in C: Don't write safety-critical systems in C.

vasco 25 days ago

Safer to put 50 interns each working on their own version of the code and then have it run 50 times in parallel and choose the answer that most of them agree on. No need for standards then. Probably also more robust to radiation bit flips.

  • procaryote 25 days ago

    This assumes the 50 interns have no systematic error. In practice, inexperienced developers tend to make similar mistakes

    • pasc1878 25 days ago

      Especially if they are now using AI to check or write the code.

einpoklum a month ago

Well, I was expecting to see:

Rule 1: Don't write any code that can make the ship go ka-boom.

Rule 2: We need to consistently use the f'ing metric system and SI units - and don't you forget it.

Rule 3: ... You haven't already forgotten about rule #2, have you?

  • shwoopdiwoop a month ago

    What’s wrong with using the metric system?

    • Swizec a month ago

      It's a reference to this famous incident of a Mars orbiter slamming into Mars due to mismatched units.

      https://everydayastronaut.com/mars-climate-orbiter/

      • kanbankaren a month ago

        Not sure whether they used a value pattern.

        typedef struct { float value; } meter;

        typedef struct { float value; } feet;

        Using strong typing, you can avoid passing values in wrong units.
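        Extending that sketch, the wrapper types make passing the wrong unit a compile-time error (the function names here are hypothetical):

```c
/* Value types for units: a feet value cannot be passed where meters are
   expected, because the struct types are incompatible. */
typedef struct { float value; } meters;
typedef struct { float value; } feet;

meters feet_to_meters(feet f) {
    return (meters){ f.value * 0.3048f };
}

float descent_rate(meters altitude) {   /* hypothetical consumer */
    return altitude.value / 10.0f;
}

/* descent_rate((feet){ 300.0f });                  -- compile error
   descent_rate(feet_to_meters((feet){ 300.0f }));  -- OK */
```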

        • dgoodell 25 days ago

          The Mars probe unit problem was a mismatch between different software programs. One was telemetry reported by the spacecraft to the ground; the other was software on the ground that used this telemetry to perform calculations and reported the results in the incorrect units.

          Maybe the answer is that strong typing should somehow continue outside of the individual programs and be embedded in file formats as well?

          • kanbankaren 25 days ago

            Yes. Any message exchanges should have type encoded like in ASN.1, but not sure whether NASA used such a format.

        • einpoklum a month ago

          In C++, there are a few very nice libraries for computation with units, e.g.:

          https://github.com/mpusz/mp-units

          which does all sorts of checking, allows arithmetic with proper accounting for units and so on. But yes, the basic notion is wrapping things in structs which can't be simply assigned to each other.

    • marcosdumay a month ago

      NASA often forget they use it.

    • naturlich0 a month ago

      Nothing lol but people tend to be obnoxious about using it

IshKebab 25 days ago

Very C centric.

> The assertion density of the code should average to a minimum of two assertions per function.

I would actually say the opposite in modern languages like Rust. Assertions are a sign you have failed to encode requirements in the type system.

  • nicce 25 days ago

    > I would actually say the opposite in modern languages like Rust. Assertions are a sign you have failed to encode requirements in the type system.

    Maybe. You still shouldn't write unsafe code without asserts.

raylus a month ago

Note this is NASA/JPL, not NASA-wide, JPL is a NASA center (FFRDC).

  • moffkalast 25 days ago

    "I'd just like to interject for a moment. What you're referring to as NASA, is in fact, NASA/JPL, or as I've recently taken to calling it, NASA plus JPL. JPL is not a government agency unto itself, but rather another free component of a fully functioning NASA agency made useful by the labs, production facilities.."

alkonaut 25 days ago

This looks more like a set of rules for C development, in order to make reliable software in C (which is honestly pretty brave to begin with once you get past a certain complexity threshold).

redtriumph a month ago

My ex-boss completed his PhD thesis with the author of this doc as his advisor. So some of the ideas seem relevant, since they popped up in a few of the work-related conversations with my boss.

rednafi a month ago

This makes me love Go even more.

Gotta appreciate the things that the compiler takes care of for me.

While I wish the stop-the-world GC pause were more predictable and took less time, I’ll gladly pay the cost for safety.

Zig seems interesting, too. It feels like a better C, and I’m closely following its development as it marches toward 1.0.

  • glitchc a month ago

    Stop-the-world GC pause can never work for a safety critical system (motor controller, elevator system). That pause could mean a crushed human appendage or worse.

    • rednafi 25 days ago

      I don’t work in a safety critical system. If it works for Cockroach DB, it works for my use case. It’s a tradeoff I make to be able to use a nicer and safer language.

      • Narishma 25 days ago

        Then I don't see what your point was, since these rules are specifically for safety-critical systems written in C.

ccosmin 25 days ago

No function pointers? That severely restricts the utility of the language…

fallingmeat a month ago

fun fact, the same guy (G. Holzmann) also made the Spin model checker

toolslive a month ago

from the original document: "critical code is written in C." The document is not dated, but it's probably quite old (I'm guessing 30-something years). Writing critical code in C is probably a mistake, but once you find yourself in that situation, you will find these rules are too tight (which is also what the criticism is about). You should probably read them as "try to avoid ...".

So I would just prepend the document with "Rule 0: Try to avoid writing critical code in C."

  • airbreather a month ago

    Sometimes, eg aspects of automotive functional safety, MISRA C might be all you get.

    These NASA principles are more about enabling better possible static analysis of the code and ease of someone else, maybe decades later, debugging or pushing changes to something likely on another planet.

    Also, you have to remember space-based computing lags well behind terrestrial computing because of radiation hardening. They are often still dealing with legacy systems that might be 8-bit with very limited memory; they were still in the hardware-expensive, engineers-cheap mode until well into the nineties, if not later. RAD750s run at 400 MIPS and were, and maybe still are, the preferred choice of processor.

    • toolslive a month ago

      These days, there are compilers for embedded systems that can prove for certain code (for example) that it runs in constant time and constant space. As an example, galois.com has been doing this for Haskell, not just for embedded systems, but also for even more low level things like FPGAs.

jonesn11 24 days ago

Good thing to give to LLMs as well.

stevoski a month ago

Did I just go back 40 years in time?

  • rollcat 25 days ago

    All of these guidelines are perfectly applicable to modern mission-critical or embedded code, which is a big chunk of what NASA does (sending stuff into space, including people, and keeping it operational / bringing them back), and has been doing for far longer than 40 years.

    Even then, the essence (code clarity, robustness, tooling/static analysis, etc.) is a good guideline for general code. You can play fast & loose with prototypes/MVPs or one-off scripts, but once it's time to bring it to production and maintain it long-term, you will be grateful to yourself for keeping things clean.

procaryote 25 days ago

> Note that setjmp() and longjmp() are how C does exception handling, so this rule bans any use of exception handling.

In theory yeah, but I've very rarely encountered it in the wild.

It's hard to use right in any code that has cleanup to do, without building a lot of infrastructure for resource handling.

nealabq a month ago

No recursion means no Erlang. Which means no RabbitMQ?

  • moffkalast 25 days ago

    Thankfully, yes. Spacecraft use outdated hardware with heavy limitations where a stack overflow is a genuine possibility. You're not even supposed to use a heap in general, static vars all the way. It's like writing stuff for a shitty microcontroller where you run out of RAM if you take one wrong turn.

  • AlotOfReading a month ago

    Erlang seems like a strange choice for deeply embedded hard realtime systems that aren't supposed to ever crash. It's a different set of tradeoffs than Erlang makes.

    • PaulRobinson a month ago

      Erlang was designed for running telephone exchanges - about as deeply embedded hard realtime a system as you can get, that needs to be fault tolerant otherwise 911 goes down.

      • AlotOfReading a month ago

        Erlang is typically used for soft realtime systems on fairly powerful non-embedded hardware. There have been attempts to use it in hard realtime systems, but no successful ones that I'm aware of.

  • ajross a month ago

    You say this as if it's somehow an obvious thing that... we clearly need RabbitMQ on spacecraft?

gorfian_robot a month ago

now raise your hand if you are actually in the business of writing code being deployed on JPL missions .....

bueller? bueller???

  • fallingmeat a month ago

    Could apply to any application involving functional safety requirements

EVa5I7bHFq9mnYK a month ago

I'm terrified - do they really use C at NASA? I thought it was all Ada. In fact, most of the rules would be automatically satisfied by an early FORTRAN, as it had no heap, no stack, and no recursion.