flohofwoe 2 years ago

There's now an interesting alternative to Emscripten called WaJIC:

https://github.com/schellingb/wajic

It enables most of the "Emscripten magic" (like embedding JavaScript code in C/C++ files), but in a more bare-bones package (apart from clang, it essentially just uses the wasm-opt tool from Binaryen for post-processing).

(To be clear, WAjic has fewer out-of-the-box features than Emscripten, but it might be an alternative for very small projects which don't need all the compatibility shims provided by Emscripten, while still providing features to simplify calling between C/C++ and WASM.)

Also on an unrelated note: I found working with WASM a lot more enjoyable after I switched from C++ to plain C. The resulting WASM blobs are usually smaller, and interfacing between JS and WASM is a lot easier down on the C API level.

  • singularity2001 2 years ago

    There is also that alternative called clang, which has builtin WASM support since version __.0

    (but you need to write js glue code yourself)

    • flohofwoe 2 years ago

      Both WAjic and Emscripten depend on Clang to compile to WASM, but Clang alone isn't very useful except for very simple WASM libraries that don't need to call out into web APIs. WAjic adds exactly the part that simplifies writing the JS glue.

    • glandium 2 years ago

      IIRC, it's since version 8.0.

codeflo 2 years ago

I have a clarification: Dereferencing a null pointer in C++ doesn’t reliably crash anymore, unfortunately, and if you still believe it does, then I do not want to run your C++ code. I assume that the author understands this and is only trying to simplify. My problem is that this is already such a widespread and dangerous myth that I’m sad to see it perpetuated in an otherwise great article.

For anyone who’s wondering, I’m referencing “UB” here (which is short for Undefined Behavior, but don’t be confused by the English language meaning, it’s a precise technical term in the spec). Skipping the details, there’s a surprising (and growing) amount of situations where a null pointer access leads to silent incorrect code execution instead of a crash already, with standard compilers and CPUs. C++ programmers need to deal with that on any platform. As I’m sure the author is aware, what the WASM compilers do here is well within the spec.

  • 10000truths 2 years ago

    Compilers following the spec is necessary, but not sufficient. Compilers are expected to be useful, and in practice, that means being able to fail reliably when spec violations occur. Let’s highlight what I personally expect, from my experience:

    1. Pretty much every non-embedded CPU architecture has a page-based MMU.

    2. Pretty much every compiler treats NULL as a zero value.

    3. Every operating system using an MMU will leave the zero page unmapped, and load/stores at those addresses will cause a segfault.

    4. If the pointer address cannot be inferred at compile time (e.g. via address of stack variables, or constant propagation), the compiler cannot optimize away the dereference.

    Now WASM is a little bit different because it's a virtual bytecode. But 99% of WASM builds use Emscripten, and Emscripten's primary use case is porting C programs to the web with minimal changes. So in order to be useful, either Emscripten (via instrumentation) or WASM (via a memory protection feature) needs to accommodate common target-specific behaviors like memory protection.

    • Kranar 2 years ago

      >If the pointer address cannot be inferred at compile time (e.g. via address of stack variables, or constant propagation), the compiler cannot optimize away the dereference.

      It can't optimize away the dereference, but it can optimize away any NULL checks, so for example the compiler can transform this:

          if(y != NULL) {
            printf("Hello");
          }
          int x = *y;
      
      Into this:

          printf("Hello");
          int x = *y;
      
      It's possible that you never expect "Hello" to be printed if y is NULL, since there is a clear check for NULL before the printf, and yet... because of undefined behavior and compiler optimizations, the compiler is allowed to make assumptions about runtime behavior even if that behavior precedes an undefined operation.

      https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...

      • comex 2 years ago

        Compilers would be allowed to do this… but they don't do it. See for yourself:

        https://gcc.godbolt.org/z/4b5azx5ac

        I replaced the printf with a simple write to a global variable, since as-is, the optimization might actually be illegal on the grounds that the printf might never return (e.g. if stdout is a blocked pipe). But none of GCC, Clang, or MSVC elide the check at max optimization level.

        If you put the dereference before the null check, then all of those compilers do elide it. This is pretty easy to justify, since any program without address 0 mapped will always crash before reaching the check. This optimization did cause a brouhaha a decade ago, when it was discovered that it was being applied to the Linux kernel. At the time, it was possible for a user program to map address 0 and have that be directly dereferenceable by the kernel, and an optimized-away null pointer check made an otherwise unexploitable kernel bug exploitable. [1] Linux now passes a flag to turn the optimization off (-fno-delete-null-pointer-checks), which solves that problem without denying user programs the benefits of the optimization.

        That said, it appears that Clang elides the check in the dereference-before-check version even on WebAssembly; this is probably a bug.

        [1] https://lwn.net/Articles/342330/

        • codeflo 2 years ago

          It's dangerous to assume that you know better than someone like Raymond Chen, the author of the article the GP linked. This is not a Clang bug, it's just the nature of UB optimizations we're talking about. :)

          I've found a simple example. Code order and compiler options are very finicky with this sort of thing, but as of posting this, clang optimizes away the null dereference here entirely: https://gcc.godbolt.org/z/zjhY1W67Y

          (Edit: Simplified example even a bit more.)

      • 10000truths 2 years ago

        In your example, even though the behavior is undefined according to the spec, the program will still reliably segfault as long as the target it runs on has the appropriate memory protections for the zero page. The crash will occur immediately after the elided null check, so the failure is localized to the problematic area and a debugger will show the problem.

        • Kranar 2 years ago

          First, your statement is unfortunately just untrue; the idea that any behavior can be relied upon when performing an undefined operation is why, to this day, we have very buggy software written in C and C++ that leads to security vulnerabilities. There is nothing reliable about undefined behavior.

          Second, even if we accept for the sake of argument that your statement is true, the entire point of my example is that by the time you get to the dereference which produces a segfault, it's already too late. Any operation prior to that segfault that assumed a non-null pointer (such as the print statement) will have already executed and produced potential side effects. There is nothing local about this; for the sake of an example I wrote a small code snippet, but the article I linked to has examples of bugs that are very non-local, both spatially and temporally, and that you could not reliably track down in a debugger.

        • repsilat 2 years ago

          Maybe off-topic, but here's an example of a null dereference that usually won't segfault because it never actually loads the address:

            #include <iostream>
            struct X { int foo() { return 1; } };
            int main() {
              X *x = nullptr;
              // UB, but typically prints 1: foo() never loads *x
              std::cout << x->foo() << std::endl;
            }
        • int_19h 2 years ago

          It won't occur immediately after the elided check, that's exactly the problem. It will allow the I/O to occur first. Now imagine if that was a database update.

    • IX-103 2 years ago

      WASM is a little more different than that, because 0 is a valid memory address -- the beginning of the heap. As you would expect this has all kinds of wonderful ramifications for code that assumes 0 is nullptr.

      • gpderetta 2 years ago

        Note that the constant 0 being the null pointer is not an assumption; it is guaranteed by the standard. Whether that maps to address 0 is an implementation detail, but any implementation that doesn't do that is looking for trouble.

      • 10000truths 2 years ago

        Yes - as mentioned in the article, 0 being a valid memory address is an unexpected gotcha. My point is that, even though the C spec allows for such an environment, it still causes lots of problems in practice, so it should be addressed.

        • int_19h 2 years ago

          It should be noted that such environments were extremely common back when C was designed, and for a couple of decades thereafter. This isn't something new that has been sprung unexpectedly on C coders.

    • apaprocki 2 years ago

      Re: 3. AIX (that you could buy today from IBM using the latest POWER chips) maps the zero page as read-only, so dereferencing NULL will not segfault.

  • sanxiyn 2 years ago

    Dereferencing a null pointer in C++ doesn't reliably crash, but dereferencing a null pointer in GCC and Clang C++ compiled with -fsanitize=null does reliably crash, with useful error messages, and it is documented to be so.

    You are not required to write standard C++. Making uses of implementation defined features is a valid engineering choice.

  • AshamedCaptain 2 years ago

    > Dereferencing a null pointer in C++ doesn’t reliably crash anymore

    Anymore? I would say the opposite: these days it tends to crash more reliably if anything, as the size of the redzones increases. Even on Windows 9x 0 was a perfectly valid address.

    • sumtechguy 2 years ago

      0 is usually a 'valid' address. What's at that address, though, can be interesting bits that may or may not contain valid instructions/data. On some processors you can lock it out so it segfaults if you access it.

  • kllrnohj 2 years ago

    Null pointer derefs in C/C++ reliably crash on anything that isn't an embedded device. The times when the compiler "avoids" it are just times when the deref isn't actually used, and is thus dead-code eliminated. That doesn't tend to result in incorrect code execution, but rather in crashes later on that are harder to understand (like having "this" end up being null inside a member function).

  • adrian_b 2 years ago

    While you are right that in theory a C or C++ compiler is free to do anything when Undefined Behavior is specified in the standard, any careful programmer should always compile all C/C++ programs with options like "-fsanitize=undefined -fsanitize-undefined-trap-on-error".

    This guarantees that null pointers or out-of-bounds array accesses will crash the program.

  • nix0n 2 years ago

    > standard compilers and CPUs

    I'm not sure what you mean by that, are you including modern Windows/Linux/Mac PCs?

    > silent incorrect code execution

    Do you know of any non-contrived pieces of code that silently do the wrong thing, on platforms that could otherwise run Electron?

    • codeflo 2 years ago

      "Non-contrived" is a matter of definition; it's almost necessary to give toy examples, but I hope you can be convinced that the same structural pattern can appear in real code.

      I hope Clang on x86_64 is common enough for you to make my point. In this example, Clang reasons (backwards in time) that the if couldn't have been taken and dereferences whatever was in p, instead of dereferencing null: https://gcc.godbolt.org/z/zjhY1W67Y

kllrnohj 2 years ago

> In Wasm a null pointer just refers to memory[0] and it is a legal address to read and write from.

Uh, can anyone explain the reasoning behind this? Because it looks like WASM just made an absolutely brain dead decision here that flies against everything computers have done for the last ~20 years.

Yes yes there exist systems without MMUs where 0x0 is a valid address (along with all the low range addresses for that matter). But it definitely isn't a friendly or "modern" design, so why on earth would WASM go with it? Why wouldn't they just have a fixed offset? I mean really the brk based memory scheme was already bad enough, but to start at 0x0?

EDIT: Oh, also worth noting this doesn't just affect compiled languages like C/C++. But also languages like Java. How do you think NullPointerException works? The runtime isn't inserting if checks before every object access, that'd be murderously slow. No, it's just letting the deref trap and converting the segfault into an NPE exception. So WASM is also a PITA for high level languages that don't expose pointers at all. Well, that's once WASM code can catch traps in the first place, which currently it cannot.

  • titzer 2 years ago

    Not having an unmapped zero page by default is a result of Web Platform integration. There isn't an easy interop story with a JS ArrayBuffer that has inaccessible holes. And it's not just JS (and therefore a JSVM issue); a lot of code in the web platform (e.g. Chromium) uses the memory of ArrayBuffers raw and is not prepared to encounter holes (OS-level signals).

    That said, we are working on this:

    https://github.com/WebAssembly/memory-control

  • s-macke 2 years ago

    WASM is not designed to work around the shortcomings of C. The fact that nowadays a program crashes with a segmentation fault after a null pointer dereference is only because modern operating systems are being nice to you.

    But proper error handling could still be implemented in WASM. Maybe the following proposal on memory control will add the option:

    https://github.com/WebAssembly/memory-control/blob/master/pr...

    • kllrnohj 2 years ago

      > WASM is not designed to work around the shortcomings of C.

      That's not the question. The question is why was WASM designed to be hostile to C (or C-likes)? There's seemingly no benefit since WASM has to be relocated anyway, the actual memory addresses aren't 0x0 after all. So the WASM runtime is already doing an offset. Therefore there's no cost (and seemingly no implementation burden) to not starting at 0x0, so why would you? Why would you be aggressively hostile to what's going to almost certainly be your 2 largest users?

      • OskarS 2 years ago

        My guess is that it's probably a performance issue. You would have to do an `if (addr == 0) trap()` branch on EVERY memory access in the WASM runtime, which is not free. I guess it would still have to do some kind of bounds check to see that you're not overflowing your allotted buffer, but I could imagine that adding that kind of null check to every memory access is not something you just get for free.

        The other option would maybe be to make the entire first page of memory non-readable, and then have the MMU execute the trap for you (this is how real OSes do this null check, right?), but then you're wasting 4kb in every WASM runtime. Doesn't seem ideal.

        I agree with you, this is a real shame, and it's going to lead to a LOT of bugs that would have been discovered otherwise. But I can see the reasoning why you wouldn't do this in WASM.

        • kllrnohj 2 years ago

          > The other option would maybe be to make the entire first page of memory non-readable, and then have the MMU execute the trap for you (this is how real OSes does this null check, right?), but then you're wasting 4kb in every WASM runtime. Doesn't seem ideal.

          You're only wasting virtual address space, not actual RAM. So the only cost is your max memory possible decreases by 4kb. Hardly a noteworthy expense.

          And this is an expense that WASM runtimes already have to spend anyway, since none of them are running on bare metal with address 0x0 being available to them. So there's already address translations in play. Multiple of them, in fact, as you have to first translate from the WASM heap to the host's address space, and then from that through the MMU to physical pages.

          And of course the WASM runtime already needs to trap for invalid addresses (eg, -1 right out the gate). How would trapping the first few pages be any more expensive than any other unmapped range? You're still working with a single contiguous region (at least until mmap finally lands). There doesn't seem to be any benefit to starting from 0x0 vs. starting from any other constant.

          • OskarS 2 years ago

            > You're only wasting virtual address space, not actual RAM.

            Fair enough, that's true.

            > And of course the WASM runtime already needs to trap for invalid addresses (eg, -1 right out the gate). How would trapping the first few pages be any more expensive than any other unmapped range?

            Memory locations are presumably all unsigned, so -1 is not an issue. The thing I'm saying is that you're turning a test that was `if (addr < LIMIT)` into `if (addr != 0 && addr < LIMIT)`, which is not nothing. And it's on EVERY memory access.

            Thinking about it, there's no way you could rely on the MMU to be the ONLY memory check obviously. Like, the JIT can't translate a reference to `x = memory[ptr]` into just a `mov x, [BASE + ptr]`, because it could reference non-WASM memory, which is obviously a security no-no (unless it could statically guarantee `ptr < LIMIT`). You need some kind of runtime check in addition for security and stability, you can't get around it. Given that, I think I agree with you, might as well just do it. It's, like, two or three extra instructions or something? But I would not say that it's free.

            • kllrnohj 2 years ago

              > The thing I'm saying is that you're turning a test that was `if (addr < LIMIT)` into `if (addr != 0 && addr < LIMIT)`, which is not nothing. And it's on EVERY memory access.

              But that's not what I'm saying. I'm saying every memory access would become:

                  addr = addr - START_OFFSET  // eg, 0x1000
                  if (addr < LIMIT) { do stuff }

              There's just an extra sub in there. And you can definitely optimize that, same as any other bounds check optimization. Which the runtime is already tasked with doing since all your loads are already (wasm_address + host_array_buffer_ptr). If that add is free, the early sub likely is, too. They fall into the same category here.

              But you can also make it completely free by just ensuring there's a dead zone before the host array buffer location, such that anything in the range 0x0-0x1000 (or whatever) just lands before the array buffer's start location which has been mmap'd to trap. Then you don't need any changes at all.
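              This subtract-then-fold trick can be sketched in C (a toy sketch; START_OFFSET and LIMIT are made-up values, not anything from the wasm spec):

              ```c
              #include <assert.h>
              #include <stdbool.h>
              #include <stdint.h>
              #include <stdio.h>

              /* Hypothetical numbers, not from the wasm spec. */
              #define START_OFFSET 0x1000u  /* dead zone reserved below the heap */
              #define LIMIT        0x10000u /* size of the valid heap region */

              /* One unsigned subtraction folds "addr >= START_OFFSET" into the
                 existing bounds check: addresses below START_OFFSET wrap around
                 to huge values and fail the "< LIMIT" comparison. */
              static bool in_bounds(uint32_t addr) {
                  return (addr - START_OFFSET) < LIMIT;
              }

              int main(void) {
                  assert(!in_bounds(0));        /* null pointer: rejected */
                  assert(!in_bounds(0x8));      /* small offset from null: rejected */
                  assert(in_bounds(0x1000));    /* first real heap address */
                  assert(!in_bounds(0x11000));  /* one past the heap end */
                  puts("offset check behaves as described");
                  return 0;
              }
              ```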

            • cesarb 2 years ago

              > Thinking about it, there's no way you could rely on the MMU to be the ONLY memory check obviously. Like, the JIT can't translate a reference to `x = memory[ptr]` into just a `mov x, [BASE + ptr]`, because it could reference non-WASM memory, which is obviously a security no-no (unless it could statically guarantee `ptr < LIMIT`). You need some kind of runtime check in addition for security and stability, you can't get around it.

              On 64-bit systems, you actually can, since WASM is for now 32-bit only; just reserve a 4GB (plus 1 page) block of virtual memory starting at BASE, and there's no way for BASE+ptr (assuming ptr is 32-bit unsigned) to reach outside it (the extra 1 page after the end is to catch unaligned accesses at the very end of that 4GB).

              That is, you can statically guarantee "ptr < LIMIT" if "ptr" is 32 bits and LIMIT is 2^32 or more.
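              The reservation argument can be checked arithmetically (a sketch; the base address and guard-page size are arbitrary stand-ins for whatever the runtime's mmap would actually return):

              ```c
              #include <assert.h>
              #include <stdint.h>
              #include <stdio.h>

              int main(void) {
                  /* Hypothetical base of a reserved 4 GiB + 1 page region. */
                  const uint64_t base  = 0x200000000ull;
                  const uint64_t limit = base + (1ull << 32) + 4096; /* +1 guard page */

                  /* Whatever 32-bit value the wasm code uses as a pointer, the
                     host address base+ptr can never escape the reservation,
                     even for an unaligned access at the very end. */
                  const uint32_t ptrs[] = { 0u, 1u, 0xFFFu, 0x80000000u, 0xFFFFFFFFu };
                  for (int i = 0; i < 5; i++) {
                      uint64_t host = base + (uint64_t)ptrs[i];
                      assert(host >= base && host + sizeof(uint32_t) <= limit);
                  }
                  puts("all 32-bit offsets fall inside the reserved region");
                  return 0;
              }
              ```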

        • sudosysgen 2 years ago

          You can use the MMU of the CPU to trap a null pointer dereference for free.

          • kevingadd 2 years ago

            Not all execution targets for wasm can do this.

            • sudosysgen 2 years ago

              There are very few such targets, thus they will simply have to pay the extra cost. It's a very, very good tradeoff to make.

              If you're trying to get every ounce of performance out of a CPU lacking very basic memory management facilities, it's probably a good idea not to use wasm.

      • s-macke 2 years ago

        Yes, even in the proof of concept, you could have simply added an offset parameter when defining the memory area. They did not.

    • josefx 2 years ago

      > WASM is not designed to work around the shortcomings of C.

      Memory access protection on modern systems is not about C segfaults. But if WASM embraces MS-DOS-era memory access management, then that is certainly a choice; it might even still be an improvement over plain JavaScript. Does it at least have near and far pointers?

      • int_19h 2 years ago

        It's not segmented, so no... or rather, not yet.

        The wasm spec already accommodates to some extent the notion of multiple "memories" (i.e. distinct flat heaps), although it only allows for one in practice:

        https://webassembly.github.io/spec/core/syntax/modules.html#...

        And there's an active proposal to allow for multiple memories:

        https://github.com/WebAssembly/multi-memory/blob/main/propos...

        In an environment like that, you'd need full-fledged pointers to carry both the memory index and the offset; and then you might want a non-fat "pointer to same memory" alternative for perf. Might as well call them far and near.

      • s-macke 2 years ago

        Well, the WASM binary format uses variable-length integer encoding. You can see this as an enhanced version of near and far pointers.

  • shadowofneptune 2 years ago

    It seems reasonable to me based on how memory is used in WebAssembly, and the constraints it runs under. The address space of a WASM module is limited to just that module. A null pointer in one module will never be the same as one in another module, so it's already unlike native code.

    Mandating that address 0 cannot be used places more constraints on what the browser engine can do when it compiles the module, and for some languages compiled to WASM this feature may not even be necessary.

    EDIT: As for your own edit, since you cannot do arithmetic on a pointer in Java, this does make it easier. NULL could be set to -1, which unless that module is using 4 gigabytes of memory will cause an out-of-bounds exception.

    • kllrnohj 2 years ago

      > Mandating that address 0 cannot be used places more constraints on what the browser engine can do when it compiles the module, and for some languages compiled to WASM this feature may not even be necessary.

      How do you figure? Just say the WASM heap starts at 0x1000 instead of 0x0 or whatever. Bam, done & done. The browser engine is already incorporating offsets to all the memory accesses anyway, the "real" addresses do not start at 0x0 of course.

      Edit: > EDIT: As for your own edit, since you cannot do arithmetic on a pointer in Java, this does make it easier. NULL could be set to -1,

      Well, you'd use something a bit farther away so that the internal engine also doesn't need to worry about small offsets from null (you're not always dereferencing 0x0 itself, but rather a field "nearby", after all)

      And yes, Java can do that. So can C, for that matter, by defining NULL to be something else. But if everyone has to reimplement guard pages in the upper 4GB address range, it gets back to: why the hell didn't WASM just standardize that like every other MMU-based system? Why did WASM have to be an unfriendly snowflake?

      And of course once an mmap-like happens you lose your guard page safety. The runtime could still use the address range you hoped was invalid for something valid. You really need the runtime to guarantee the dead zone. Which a simple heap offset would have trivially achieved.

      • shadowofneptune 2 years ago

        That actually is not a bad solution. Keeps everything in alignment. I assumed your solution would keep addresses 1, 2, 3, etc... as valid.

      • flohofwoe 2 years ago

        > by defining NULL to be something else

        I bet that around 99% of C code in the real world assumes that NULL == 0. I'm sure that if it were easy to make nullptr accesses trap in WASM browser runtimes without killing performance, the Emscripten team would have come up with a solution.

        • josefx 2 years ago

          > I bet that around 99% of C code in the real world assumes that NULL == 0.

          That is the correct assumption, as 0 is interpreted as a null pointer constant in pointer context. As far as I understand, you have to do something like memset(&ptr, 0, sizeof(int*)) to bypass this conversion and get a pointer to memory location 0 instead of a NULL pointer.

          • saurik 2 years ago

            Yeah, but that memset is super common due to how people clear structs.
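            A quick check of the distinction (a sketch; the second assertion relies on null being stored as all-zero bits, which mainstream platforms do but the C standard does not require):

            ```c
            #include <assert.h>
            #include <stddef.h>
            #include <stdio.h>
            #include <string.h>

            int main(void) {
                void *p = 0;  /* null pointer constant: guaranteed to be NULL */

                void *r;
                memset(&r, 0, sizeof r);  /* all-zero object representation */

                assert(p == NULL);  /* required by the C standard */

                /* NOT required by the standard, but true wherever null is
                   stored as all-zero bits (x86, ARM, wasm, ...): */
                assert(r == NULL);

                puts("null is all-zero bits on this platform");
                return 0;
            }
            ```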

  • flohofwoe 2 years ago

    Emscripten allows compiling in a mode where null-pointer accesses trap, but this involves a check before each pointer access and costs a lot of performance. I think null being a valid address is a side effect of the WASM heap being a JavaScript ArrayBuffer, where zero is a valid index (and I guess it's not possible to create an ArrayBuffer where the zero index is out of bounds).

    Maybe in non-browser WASM runtimes (where the WASM heap doesn't need to be wrapped in an ArrayBuffer but can be directly mapped to virtual memory) it's possible to let null-pointer accesses segfault without performance cost.

  • jokoon 2 years ago

    Out of curiosity, how does C handle this? Is that at the OS level?

    Because if it is, you can't really compare WASM to C, since WASM is not designed to be run by an OS, so there are other protections.

    WASM comes at some cost I guess, and I'm not sure if it makes it so much more difficult to debug.

    • mananaysiempre 2 years ago

      In a typical protected memory situation, yes, this is done by the OS setting up a faulting mapping of OS-dependent size at low addresses, though the C runtime startup code could do it if the OS did not. (Not sure how Win9x handled it, by the way—wasn’t it supposed to map the DOS VM in the first 1M?..) In bare metal environments, there is usually something at address zero—if you’re lucky, it’s ROM, and you get a bus fault if you try to write there; if you’re not (as on the 8086), it is part of an architecturally significant part of RAM such as the interrupt vector table, and spectacular amounts of chaos can ensue if you touch it at the wrong time.

      Technically, while the C standard does require that

        void *p = 0;
      
      makes p be a special pointer value equal to any other one obtained that way and unequal to any normal pointer value, it does not require the memory backing the variable p to actually store all zero bits. Neither does it require that p be equal to q or r in

        int n = 0; void *q = (void *)n;
        void *r; memset(&r, 0, sizeof r);
      
      nor, of course, that actually dereferencing p cause any particular behaviour like a crash. (POSIX may be stricter, I’m not sure.)

      Therefore the designers of the WASM C ABI could have made (void *)0 an all-ones pointer instead, for example, or better yet that minus some headroom, and guarantee a crash on dereference that way. Of course, this would be a DeathStation-level perversity, and I doubt Clang would be capable of dealing with such a platform out of the box.

      ...

      Now that I’m thinking about it, a more realistic solution would be to have (void *)n be stored as n but actually refer to WASM memory address n - 64K or something like that during codegen. Perfectly ordinary platform as far as C is concerned, possibly a bit of binary bloat(?). Don’t know why they didn’t do it that way.

smartmic 2 years ago

> But I think many people with similar C++ experience would agree with me that it's pretty rare for it to actually be the right tool to use for software these days.

Is this really the case? Personally, I see C++ (in general, not related to WASM) as a very powerful, modern language, plus it is standardized. Counting on the Lindy effect, I expect it to be around and of value for more than the next 40 years.

  • kllrnohj 2 years ago

    C++20 is honestly a pretty good language. So much of the anti-C++ sentiment seems to be from people who don't realize C++98 isn't what's in common use anymore (and I would absolutely agree C++98 was kinda shit in many ways)

    But realistically anything where performance is the primary goal, especially graphical or media things, C++ is still going to be on your very very short list of languages to use. Rust is a compelling alternative, but... that's kinda it. It's really a choice between those 2. And you certainly can't be faulted for going with the far more established and far more widely used C++ over the new upstart Rust.

    • tialaramex 2 years ago

      > C++20 is honestly a pretty good language.

      > C++98 was kinda shit in many ways

      > And you certainly can't be faulted for going with the far more established and far more widely used C++ over the new upstart Rust.

      This cake-and-eat-it approach doesn't work. Either you're choosing C++20, and you get modern luxuries like a starts_with() method on strings, but your language is so fresh out of the box that your tools don't work properly yet and your compiler vendor fixes swathes of fundamental bugs in core C++20 features every release; or you're choosing to say, well, C++ is really one language, and then, ew... this library I'm using thinks raw pointers signify ownership because it was first written in 2008.

      Also, although C++ cleaned up lots of things since C++ 98 (for example Bjarne's terrible "look at me, I have operator overloading" I/O Streams is pushed aside by a modern formatting library in C++ 20) the fundamental ethos remains the same - C++ will always pick the wrong defaults. I'm not talking about a couple of unlucky choices, it's impossible to be this unlucky, they must surely be doing it on purpose. Without Vittorio's Epochs, which they very deliberately did not take for C++ 20, they won't be able to fix the defaults.

      • kllrnohj 2 years ago

        You're kinda ignoring the middle ground there? Intentionally, maybe? Libraries written in 2008 and still maintained have long since moved to C++11 or C++14 or newer. The baseline is easily post-C++11, which was a huge turning point.

        > C++ will always pick the wrong defaults

        Oh definitely. Bad defaults are probably my only remaining complaint with modern C++.

  • qsort 2 years ago

    Absolutely not, that quote is nuts.

    It is true that the use cases where you wouldn't pick C++ as your first choice are more popular now than they used to be, and it's probably also true that if you only know C++, learning something like Ruby or Python to complement your skillset is likely to be a smart career move.

    But come on, almost all the languages that are hip and get a lot of mindshare wish they had a fraction of C++'s usage and influence.

  • snovv_crash 2 years ago

    Agreed. If you want to do anything with actual production deployment of image processing, for example, and you can't run it in a Docker container, then C++ is the only option to get access to the myriad libraries available.

AshamedCaptain 2 years ago

> For example, null handling in basically every non-C language doesn't rely on help from the CPU!

I am quite sure that Java does rely on page faults to detect null access (i.e. a SIGSEGV causes a NullPointerException), as probably do most Javascript implementations, and I would even dare say most languages' runtimes...

Cause if they don't, that's an optimization waiting to happen (e.g. for optional types).

  • evmar 2 years ago

    [author here] Very interesting, do you have a reference on how Java implements this?

    I just poked at v8 a bit and they seem to have an explicit 'null' object in the VM, though I don't know a lot about it, so it's possible it works as you suggest.

    It seems it might be hard to distinguish language-level null pointers from null pointers within the VM implementation. E.g. v8 runs within a Chrome process that is processing HTML and other things, so if v8 had some special signal handling of null pointers it'd need to be able to distinguish those from HTML processing bugs etc., which I think you might only be able to do by examining the stack? I haven't thought this through but it seems difficult.

    • wahern 2 years ago

      You can distinguish where a NULL pointer exception occurred by the program counter/instruction pointer. JIT'd code lives in particular memory regions (e.g. those explicitly mmap'd with PROT_EXEC) dynamically managed by the application. From there you may or may not care to examine the stack(s) for more specific details.

      Alternatively, you could simply only enable NULL pointer handling around specific blocks of code where you're prepared to handle it, and disable it everywhere else (causing the kernel to simply kill the program). I've used this technique to elide explicit array overflow checks in some very performance critical code, dramatically increasing performance (there would have been at least as much overflow checks as semantic operations). None of the critical code used malloc/free or called into any libraries, so on SIGSEGV I would simply longjmp back to a safe place and either grow the data structures and reset the state, or return failed, depending on how large things had grown.[1]

      The hard part of all of this isn't detecting where or even why a fault occurred--at least presuming you've architected things deliberately--but rather handling asynchronicity. If you have access to mechanisms like userfaultfd it's much easier, but it's doable on most systems. On most Unix systems threading primitives are NOT async-signal safe, but there are plenty of syscalls that are, and at least historically the semantics of Unix signal handlers were carefully defined to permit these sorts of tricks. It's a fine needle that needs to be threaded, but one that can be done correctly nonetheless.

      [1] Note that this wasn't accidental. I wrote the code very carefully this way. I had also worked on at least two projects before where someone tried to get clever and add SIGSEGV handlers to recover from code that was never designed to be recoverable, with predictably horrendous results (e.g. endless, time-wasting bug reports and "fixes").

      • gpderetta 2 years ago

        AFAIK, GNAT and GCJ (GCC front ends for Ada and Java respectively, the latter now dead) implement null pointer exceptions by simply raising a language level exception from the SIGSEGV handler (after, I assume, checking it happened in user code instead of the runtime).

        Completely non portable of course, but with the help of glibc, the Itanium ABI exception support and non-call-exceptions it can evidently be made to work reliably, at least on Linux.

      • evmar 2 years ago

        Thank you for explaining! I think the piece I was missing is that the SIGSEGV handler is passed the instruction pointer of the faulting instruction.

    • ArchOversight 2 years ago

      v8 is JavaScript, which is not a JVM. They are two very different beasts.

      • evmar 2 years ago

        The comment I was replying to made statements about both Java and JavaScript, so my questions were also about both (and whether they were different). I just added a line break to make this clearer.

        • ArchOversight 2 years ago

          Ah, apologies, it seemed like you were accidentally conflating the two. Thanks for clarifying!

titzer 2 years ago

FWIW there is a proposal in the works to add page-based protection, which will allow unmapping the 0 page, restoring the trap-on-null-deref behavior that is important for many languages with safety checks.

https://github.com/WebAssembly/memory-control

HexDecOctBin 2 years ago

Has there been any meaningful progress in doing DOM manipulation (and accessing other web APIs) from WASM?

  • flohofwoe 2 years ago

    Depends on what you mean. The DOM still strictly lives on the Javascript side (and IMHO that's a good thing), but the calling overhead from WASM to JS (and back) has been reduced in browsers (ca 2018/19) so that the overhead doesn't matter much anymore, and Emscripten has features to keep the JS shim close to the C/C++ side (by embedding Javascript directly into C/C++ source files). That way it's quite trivial to build a C or C++ library which manipulates the DOM by calling out into small JS source snippets.

    To the user of such a library it doesn't make a difference whether WASM manipulates the DOM directly or goes through a JS shim.

    WASM proposals like Interface Types (or whatever it is called currently) will reduce the amount of housekeeping needed by the JS shim, but that would only affect the library maintainer, not the library user.

    (IMHO: using WASM for a framework which spends most of its time manipulating the DOM is wasted effort, JS is much better suited for this type of "dynamic workload", this type of application might be better done with a hybrid approach where Javascript and WASM work hand in hand)

    • gspr 2 years ago

      > (IMHO: using WASM for a framework which spends most of its time manipulating the DOM is wasted effort, JS is much better suited for this type of "dynamic workload", this type of application might be better done with a hybrid approach where Javascript and WASM work hand in hand)

      Well, I suspect there might be a non-trivial number of us who have absolutely no interest in learning JS, but would like to do DOM manipulation in web apps written 100% in [Personal Favorite Language That Compiles To WASM].

      • flohofwoe 2 years ago

        FWIW if you look around, C++ and Rust libraries for DOM manipulation exist (I haven't searched for other languages which compile to WASM):

        https://github.com/mbasso/asm-dom

        https://github.com/sycamore-rs/sycamore

        I think solving the problem of DOM access on the library level is exactly the right way to tackle this problem. The library user doesn't need to care about specific WASM features, and the library implementation can be simplified when those WASM features become available (and can also implement per-browser fallback paths).

        • tcfhgj 2 years ago

          Those libs/frameworks would benefit massively from WASM DOM access.

          • flohofwoe 2 years ago

            I doubt it. What would "DOM access in WASM" even look like?

            The DOM is exposed as a tree of high level Javascript objects with properties, but WASM has no notion of objects, properties and their relationships; it only has a few integer and floating point primitive types and heap addresses. All that "DOM access from WASM" would mean is a new primitive "external reference" type which allows an opaque Javascript object reference to be tunneled through WASM and back to JS while pinning the JS object for the JS garbage collector. But one couldn't really do anything useful with that reference on the WASM side without calling through JS (AFAIK there are some very high level proposals for this too, which could basically create the required JS shim code automatically, but that would just add more runtime complexity for little actual gain - the same shim-generation work could be done offline in the compiler toolchain).

            All that the "external reference" type would enable is a slightly simplified Javascript shim, because it would no longer need to map index handles to Javascript objects. But that's just a small part of what the JS shim does.

            It's a bit like asking if x86 machine code has support for the Win32 window system API, the answer is both 'yes' and 'no' ;)

            • tcfhgj 2 years ago

              > The DOM is exposed as a tree of high level Javascript objects with properties,

              I don't understand the reasoning. Of course it should be exposed in a way that best fits into WASM tech

              • flohofwoe 2 years ago

                The entire concept of the DOM (-API) is built around the Javascript object model, which is on a totally different abstraction level than WASM. It's like asking to create a CPU which can directly run Javascript, and on top has special instructions to "manipulate the DOM". It just doesn't make much sense on a practical level.

                Creating a library in a high level language which wraps DOM manipulation makes a lot more sense, and nothing in WASM prevents this (and the rest are just implementation details of the library).

                • gspr 2 years ago

                  Does the question make sense on the level of sandboxing, though? I thought part of the problem was that the WASM VM doesn't even have access to the DOM itself?

                  • flohofwoe 2 years ago

                    The WASM VM can only communicate with the outside world through what is essentially a jump table of "Javascript FFI functions" (or native functions for WASM VMs that run outside the browser), and those functions can only have simple arguments and return values like integers and floats (no pointers, no strings, no structs, no object references, etc...).

                    Creating a Javascript object on the "JS side" requires a mapping of the Javascript object to a simple WASM integer type (usually this is performed by the JS shim by storing the object in a dictionary with the integer handle as key). The integer handle is then passed back to the WASM side. Whenever WASM wants to call a Javascript API function involving a JS object, it passes the integer handle back to the Javascript shim, which looks up the associated JS object in the dictionary, and so on and so forth...

                    One proposed improvement is to be able to directly tunnel Javascript object references as "opaque handles" through WASM, so that the object dictionary housekeeping code wouldn't be necessary.

                    The whole thing seems terribly convoluted, but it's not much different than FFI mechanisms where a very high level language needs to call into C APIs (just reversed).

          • irrational 2 years ago

            But, why? You would have to recompile the WASM code every time a change was made to the DOM. Add a new DIV that you now need to address? Add a reference to it in the WASM code and recompile. What a pain that would be.

            • HexDecOctBin 2 years ago

              A div's name is equivalent to a pointer to a widget in a GUI library. HTML isn't some special sauce, we've known how to do GUI in C/C++ for a long time.

            • gspr 2 years ago

              I don't understand. Can you elaborate?

      • NohatCoder 2 years ago

        So you want to avoid the horrors of the web by inserting some language-warping piece of glue code that is guaranteed to only ever get in your way. The irony.

        In any case, the DOM and CSS is like 90% of the reason to hate the basic web stack. The other 90% of the reason that people hate JavaScript for is the myriad of frameworks that some people opt to use. You don't have to, and if you come from C you will probably feel right at home not doing so.

        • gspr 2 years ago

          > So you want to avoid the horrors of the web by inserting some language-warping piece of glue code that is guaranteed to only ever get in your way. The irony.

          No, I in particular want to avoid glue code!

          > In any case, the DOM and CSS is like 90% of the reason to hate the basic web stack. The other 90% of the reason that people hate JavaScript for is the myriad of frameworks that some people opt to use. You don't have to, and if you come from C you will probably feel right at home not doing so.

          Perhaps. In my case, it's simply that I abhor the idea that this particular platform requires one specific language. I think that's rather dumb. I'd prefer to treat web browsers as just another platform. I do understand that this platform is a bit different from the classical ones (Linux, Windows, Mac, the other unices, take the cartesian product with hardware architectures if you want) in that it doesn't have a notion of file access (and a bunch of other things), but still – just another (a bit esoteric) platform nonetheless. The platform shouldn't care what language you write in, as long as you have a compiler targeting it.

          I pick my language—any with a compiler that targets the platforms I require—and then I perhaps write different frontends depending on the platform. Maybe on the unices the frontend reads and writes files. Maybe on Windows it's GUI based. And maybe on the browser one interacts with it through DOM manipulation. That last detail shouldn't, IMHO, dictate the language used. We don't accept that in other situations, so why do we when it comes to the web?

          • NohatCoder 2 years ago

            > We don't accept that in other situations, so why do we when it comes to the web?

            Because it is the only thing we have sufficiently sandboxed so that we can actually run it untrusted.

            I totally get what you want, but we already have a bunch of X-to-JavaScript compilers that pretty much all deliver the experience of: "It works, but it is a bit gnarly, so why would you use this when you can just write JavaScript". And I don't think WebAssembly is going to change that experience much, mainly because it is a pretty poor VM, seemingly designed around the misconception that a lack of features makes it a good compilation target.

            I honestly think we could give compiled languages a much better time on the web with a new intermediate language, but someone has to design it for that purpose rather than cargo cult Assembly similarity.

            • gspr 2 years ago

              > Because it is the only thing we have sufficiently sandboxed so that we can actually run it untrusted.

              I just don't get why that property couldn't be language-agnostic. Isn't the WASM VM also meant to be sandboxed?

      • xwolfi 2 years ago

        I think trivial. Manipulating the DOM without Javascript is as useful to understand it as manipulating a car without hands. It can work other ways, you can prefer it but it wasn't made for that amount of flexibility. Better use wasm for what it's good at: obfuscation of proprietary code, high intensity mathematics, port of existing code. Not adding a span in a div please.

        • gspr 2 years ago

          > I think trivial.

          Neither one of us can back up our claims, but: Surely there's a non-trivial amount of people out there who abhor JS and who would still want to be able to target the web as a platform?

          > Manipulating the DOM without Javascript is as useful to understand it as manipulating a car without hands.

          Why? If you mean that because "things are the way they are", then sure, but this isn't a fundamental limitation.

          > Better use wasm for what it's good at: obfuscation of proprietary code,

          That's not what WASM is for.

          > high intensity mathematics, port of existing code. Not adding a span in a div please.

          But why? Why should that last task be the domain of only JS?

          I think this is exactly the attitude that has kept me shying away from the web as a platform. On classical OS platforms, they may well be better suited and less suited languages for a given task – but declaring that you mustn't do something in a given language is just strange to me. Yet it seems like the norm on the web.

          • esperent 2 years ago

            > Surely there's a non-trivial amount of people out there who abhor JS

            The amount of people for whom the hate is deserved and not just a knee jerk reaction is probably trivial though (at least once you include Typescript and ES6). Modern JS is not a bad language and it's time for hating on it to go out of fashion.

            • gspr 2 years ago

              > The amount of people for whom the hate is deserved and not just a knee jerk reaction is probably trivial though (at least once you include Typescript and ES6). Modern JS is not a bad language and it's time for hating on it to go out of fashion.

              I'm not saying it's a bad language any more than I'm saying that strawberry ice cream is bad. I'm merely stating the fact that there's a non-trivial amount of people who abhor strawberry ice cream and JS. I find your reply, that strawberry ice cream is perfectly fine, a strange one.

              • esperent 2 years ago

                I didn't say it's perfectly fine, I said it's not worthy of being "abhored".

    • HexDecOctBin 2 years ago

      > the calling overhead from WASM to JS (and back)

      This is what I meant: is it still necessary to go through JS, or do we now get low level access to DOM directly through WASM? I guess we don't.

hoseja 2 years ago

The problem with NULL not being 0 is you can't do things like `if(ptr)`

  • ncmncm 2 years ago

    Easily handled, in translation.

    The compiler understands it is a pointer, and that you are checking if it is equal to nullptr, and generates the right code.

    Another smart adjustment when making a new ABI is to represent true as -1. This matches how AVX* and SIMD instructions do things. (The RISC-V people missed this opportunity, among others.) Of course when converting to int, it gets turned into 1, but compilers have absolutely no difficulty with that.

  • qsort 2 years ago

    ???

    Pointers, as per the standard, have an implicit conversion to bool that evaluates to true if and only if the pointer is not exactly nullptr. Code like:

      if (ptr) {
        ...
      }
    
    is guaranteed to work the way you expect it to, regardless of how null pointers are implemented.

  • teddyh 2 years ago

    In C, NULL is always 0. Or, more correctly, the constant ‘0’ in a pointer context is always the NULL pointer. It’s then up to the compiler to convert the NULL pointer to the actual memory address constant used for the current architecture. From pure C code, the NULL pointer will always appear to == 0, and you should never be able to tell otherwise.

  • janekm 2 years ago

    Indeed, I do make an effort to write if(ptr!=NULL) (partially because it more explicitly encapsulates the intent, much like writing ptr=NULL when "clearing" the ptr) but changing the developer expectations around this sounds like a terrible idea.