| Svelte Hacker News

pcwalton 6 years ago

We tried this in Rust and found it was slower than 1:1 threading.

pron 6 years ago

What you tried wasn't "this", though. It was one particular implementation of lightweight threading that has to cope with Rust's peculiarities, special requirements and compilation targets. There is absolutely nothing essential about lightweight threads that prevents them from emitting essentially the same code as the stackless-coroutine approach. It's just that in Rust it might be very hard or even not worth it, given the language's target audience.
- matklad 6 years ago
  
  Fibers under the magnifying glass [1] might be a relevant paper here. Its conclusion, after surveying many different implementations, is that lightweight threads are slower than stack less coroutines.
  [1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p136...
  
  pron 6 years ago
  
  No, its conclusion is that fibers with certain properties in C/C++ are slower -- and particularly hard to implement correctly -- than stackless coroutines in C/C++. That's because of the particular characteristics of those languages. In fact, you'll note that the only negative thing he says about Go is that it incurs an overhead when interacting with non-Go code.
  
  pcwalton 6 years ago
  
  And that overhead is a deal-breaker in Rust.
  
  pron 6 years ago
  
  Sure, but that overhead is also not essential, but a feature of Go's particular implementation. Fibers aren't one thing and there are many, many ways of implementing them. As I said before, implementing them for Rust well would have likely required changes to LLVM and Web Assembly, and even then it would be harder than async/await, perhaps to the point of being too hard to be worth it and probably against aspects of Rust's philosophy (I would say that that is the main difference between the two: achieving similar performance is much easier for the language implementors with async/await). But it's just not true that there is something essential about them that makes them slower. After all, you're running all of your code inside a particular implementation of threads.
  
  pcwalton 6 years ago
  
  > Sure, but that overhead is also not essential, but a feature of Go's particular implementation.
  The only way to get around the FFI performance problem would be for all fibers to have big stacks. At that point you've thrown away their biggest selling points: high scalability and fast spawning.
  
  pron 6 years ago
  
  > The only way to get around the FFI performance problem would be for all fibers to have big stacks.
  I don't know all of Rust's specific constraints, but it is not the case in general. There are two levels for FFI support in this context, based on whether you want to allow FFI to block the lightweight thread (perhaps through an upcall), or not. Only if you want to allow that do you need "big stacks", but even then they can be "virtually big" but "physically small". If you don't, then all you need to do is to temporarily run FFI code on a "big stack", but you know that all the FFI frames are gone by the time you want to block. Depending on your FFI, if you don't allow the FFI code to hold pointers into you language's stack, you're all good.
  
  Rusky 6 years ago
  
  > Depending on your FFI, if you don't allow the FFI code to hold pointers into you language's stack, you're all good.
  Rust does this with virtually all FFI.
- pcwalton 6 years ago
  
  I don't understand what your objection is. It's a given that what I wrote applies to Rust. This is a thread about Rust. I didn't say that M:N threading is always slower than 1:1.
  Besides, fibers don't emit essentially the same code as async code. One has a stack, and the other doesn't. That's a significant difference.
  
  silon42 6 years ago
  
  If the stack could be sufficiently small, it's not that different from heap allocated async state. But you probably needs segmented stacks, or at least separate stack async preemptible or non-async-preemptible code (has anyone tried making a system like this?)
  
  pron 6 years ago
  
  It's isn't a given that M:N threading is slower than 1:1 threading even in Rust. A particular implementation you tried exhibited that behavior.
  > One has a stack, and the other doesn't. That's a significant difference.
  They both have some memory area to which they write state. Calling it "a stack" refers to the abstraction in the programmer's mind, not to how the memory is actually written/read. It is true that in order to support recursion, a thread might need to dynamically allocate memory, but so would async/await, except it'll make it more explicit.
  
  pcwalton 6 years ago
  
  > It's isn't a given that M:N threading is slower than 1:1 threading even in Rust. A particular implementation you tried exhibited that behavior.
  I don't see any way around the problems of segmented stacks and FFI. There is no way to implement stack growth by reallocating stacks and rewriting pointers in Rust, even in theory. It would break too much code: there is a lot of unsafe (and even safe!) code out there that assumes that stack pointer addresses are stable. In fact, async/await in Rust had to introduce a new explicit pinning concept in order to solve this exact problem while remaining backwards compatible. And when calling the FFI, you have to switch to a big stack, which was an insurmountable performance problem. Rust code by its nature is FFI-heavy; it's part of the niche that Rust finds itself in.
  
  pron 6 years ago
  
  You can make what are virtually zero-cost copies from what you call a "big stack" to a resizable stack with virtual memory tricks. You don't even need to copy the entire stack, but cleverly rewrite the return address stored on the stack to do this kind of "code-switching". But it does mean doing backend manipulations in a platform-dependent way. There are several good ways to do this, none of them particularly easy. What is perhaps impossible is allowing FFI code to block the lightweight thread, but async/await doesn't solve this, either.
  
  pcwalton 6 years ago
  
  > You can make what are virtually zero-cost copies from what you call a "big stack" to a resizable stack with virtual memory tricks. You don't even need to copy the entire stack, but cleverly rewrite the return address stored on the stack to do this kind of "code-switching".
  We tried it. It was too slow.
  
  pron 6 years ago
  
  OK. It really is hard when you're what you call "FFI-heavy" and don't like a significant runtime. So Rust has several * self-imposed* constraints (whether they're all essential for its target domains is a separate discussion, but some of those constraints certainly are) that makes this task particularly hard, but my point is that there is nothing fundamental to n:m threading that makes it slower than async/await, and async/await does fundamentally come at the significant cost of a particularly viral form of accidental complexity.