I posit that while more programmer-friendly async abstractions (such as Go's) are good in the short term, ideally async should be far less scary even at the lower levels. Programmers should shy away from async not because it is hard, but only after carefully considering whether the architecture can afford to be sync instead. And if they aren't doing async themselves, they likely rely on other code that should be handling async well.
Alan Kay said (paraphrasing roughly) that programmers use multithreading when they don't understand how to write programs as the state machines they are. Then you get all the surprise and suspicion about distributed locking (featured on HN today!) and whatnot. There is much essential complexity to be tackled here. If robust software is a priority (it really should be), we need to be deliberate about whether async is exposed at a given layer of abstraction. We shouldn't be cowed into writing subpar programs.
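To make Kay's framing concrete, here's a toy sketch (in Go, with names invented for this example) of a program written as an explicit state machine: the states and transitions are plain data plus a pure step function, rather than being implicit in the stack of a blocked thread.

```go
package main

import "fmt"

// A toy connection lifecycle as an explicit state machine.
// State and Event are plain data; step is a pure transition function.
type State int

const (
	Idle State = iota
	Connecting
	Connected
	Closed
)

type Event int

const (
	Dial Event = iota
	Ack
	Close
)

// step returns the next state for an event; events that don't
// apply in the current state are simply ignored.
func step(s State, e Event) State {
	switch {
	case s == Idle && e == Dial:
		return Connecting
	case s == Connecting && e == Ack:
		return Connected
	case e == Close:
		return Closed
	}
	return s
}

func main() {
	s := Idle
	for _, e := range []Event{Dial, Ack, Close} {
		s = step(s, e)
	}
	fmt.Println(s == Closed) // true
}
```

The whole behavior is inspectable in one place, which is the property Kay is pointing at; the trade-off is that you now own every transition, including the ones a scheduler would otherwise handle for you.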
Was Kay thinking of local code or distributed code? For local code I can see his point. But once we start talking to networks, or even just hard drives, you need scatter-gather semantics. Otherwise you're either running lots of threads, which undercuts his point, or writing your own multitasking state machine, which is just a poor implementation of concurrency where you're responsible for all of the bugs and deadlocks instead of just some of them.
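For reference, a minimal scatter-gather sketch in Go using goroutines and a channel; `fetch` is a hypothetical stand-in for a real I/O call:

```go
package main

import (
	"fmt"
	"sync"
)

// fetch stands in for a real network or disk request
// (hypothetical; invented for this sketch).
func fetch(id int) int { return id * 10 }

// scatterGather fans the requests out across goroutines (scatter)
// and collects all results on a buffered channel (gather).
func scatterGather(ids []int) []int {
	out := make(chan int, len(ids))
	var wg sync.WaitGroup
	for _, id := range ids {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			out <- fetch(id)
		}(id)
	}
	wg.Wait()
	close(out)
	var results []int
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	fmt.Println(len(scatterGather([]int{1, 2, 3}))) // 3
}
```

Goroutines make this cheap enough to look like "lots of threads" without the usual cost, which is exactly the kind of abstraction the thread is debating: the runtime is running an async state machine on your behalf.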
(A caveat on what I'm about to say: I'm primarily concerned with writing robust and highly performant programs, and while I believe that should be a broad focus, it's a practical niche.)
That's the thing, though. It's arguably even more important for distributed code. If we abstract away the state machine too much, the code becomes difficult to reason about precisely because of the abstraction: the complexity that was explicit in the state machine just causes confusing behavior in the abstracted version. Using lightweight threads or another high-level abstraction that approximates blocking code will get a program out the door faster, but at lower quality. Two examples to illustrate my point: first, you mention scatter-gather, but that concept is orthogonal to sync/async; however, I/O is characteristically async, so the underlying mechanisms are async anyway. Second, io_uring is showing that async can be good for performance without being a difficult interface.
Sync code makes things easier to reason about, but also kinda not. I think the big issues with async are (1) that the OS hasn't done a good job of allocating responsibility for async interfaces, and (2) the fundamental difficulty of async itself. The former makes async seem less efficient than it could be, which is true in that a sync-over-async interface is better than an async-over-sync-over-async one, but we should have async interfaces directly accessible. The latter probably feeds a bias against touching async at all, even where mixing async and sync would be the best blend of performance and programmability.