Weird article.
Hasn't most code compiled in the last few decades been using the x86 frame pointer register (ebp) as a regular register? And C also worked just fine on CPUs that didn't have a dedicated frame pointer.
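(As a rough illustration, assuming GCC/Clang on 32-bit x86: the ebp-based prologue/epilogue sketched in the comments below is exactly what -fomit-frame-pointer, which optimizing builds on many targets enable by default, gets rid of, leaving ebp free as an ordinary register.)

    /* Build with e.g.:  cc -m32 -O2 -S sum.c
     *
     * With -fno-omit-frame-pointer the generated assembly contains the
     * classic frame setup:
     *     push  %ebp          # save caller's frame pointer
     *     mov   %esp, %ebp    # establish this function's frame
     *     ...
     *     pop   %ebp
     *     ret
     * With -fomit-frame-pointer locals are addressed relative to %esp
     * instead, and %ebp is available to hold ordinary data. */
    int sum(const int *a, int n) {
        int s = 0;
        for (int i = 0; i < n; ++i)
            s += a[i];
        return s;
    }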
AFAIK the concepts of a stack and 'subroutine call instructions' existed before C because those concepts are also useful when writing assembly code: subroutines that return to the caller's location are about the only way to isolate and share reusable pieces of code, and a stack is useful for deep call hierarchies. For instance, some very old CPUs with a dedicated on-chip call stack or 'return register' only allowed a limited call depth, or even just a single level of calls.
Also, it's not like radical approaches to CPU design aren't tried all the time; they just usually fail because hardware and software are not developed in a vacuum, they heavily depend on each other. How much C played a role in that symbiosis can be argued about, but the internal design of modern CPUs doesn't have much in common with their ISA (which is more or less just a thin compatibility wrapper around the actual CPU and really doesn't take up much die area).
It's because the historical perspective in this article is really lacking, despite that being the premise of the article.
Not only does the author seem to believe that C was the first popular high-level language; the claim that hardware provided stack-based CALL and RETURN instructions also wasn't universally true. Many systems had no call/return stack, or supported only a single level of subroutine calls (essentially useless for a high-level language, but maybe useful for some hand-written machine code).
FORTRAN compilers often worked by dedicating certain statically allocated RAM locations near a function's machine code for the arguments, return value, and return address for that function. So a function call involved writing to those locations and then performing a branch instruction (not a CALL). This worked absolutely fine if you didn't need recursion. The real benefit of a stack is to support recursion.
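A rough C rendering of that scheme, purely illustrative (the routine and slot names are made up): the "activation record" is fixed static storage, a call is just stores plus a transfer of control, and re-entering the routine before it returns would clobber it.

    #include <stdio.h>

    /* Statically allocated "activation record" for SQSUM, as an old
       FORTRAN compiler might lay it out next to the routine's code. */
    static double sqsum_arg_a;
    static double sqsum_arg_b;
    static double sqsum_result;

    /* The routine reads its inputs from, and writes its output to, those
       fixed locations; nothing is pushed or popped.  (In real object code
       the return address would be stored the same way, and the transfer
       would be a plain branch, not a CALL.) */
    static void sqsum(void) {
        sqsum_result = sqsum_arg_a * sqsum_arg_a + sqsum_arg_b * sqsum_arg_b;
    }

    int main(void) {
        /* "Calling" SQSUM: store the arguments, branch, read the result. */
        sqsum_arg_a = 3.0;
        sqsum_arg_b = 4.0;
        sqsum();
        printf("%f\n", sqsum_result);   /* 25.0 */

        /* A recursive or re-entrant call would overwrite the slots above
           before the first activation finished - hence "no recursion". */
        return 0;
    }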
> Look at a modern CPU die. See those frame pointer registers? That stack management hardware? That’s real estate. Silicon. Transistor budget.
You don't. What you see is caches, lots of caches. Huge vector register files. Massive TLBs.
Get real - learn something about modern chips and stop fighting 1980s battles.
This. They're shouting at the sky from whatever squirrels are running around their brains without understanding anything about real CPUs. It's somewhere between a shitpost and AI slop. They offered no better alternative except to bitch about how "imperfect" things are that work. I'd like to see their FOSHW RTL design, test bench, and formal verification for a green field multicore, pipelined, superscalar, SIMD RISC design competitive to RISC-V RV64GC or Arm 64.
Given this is rewinding to the 1970s, I expected a mention of CSP [0], or Transputers [1], or systolic arrays [2], or Connection Machines [3], or... The history wasn't quite as one-dimensional as this makes it seem.
[0] https://en.wikipedia.org/wiki/Communicating_sequential_proce...
[1] https://en.wikipedia.org/wiki/Transputer
[2] https://en.wikipedia.org/wiki/Systolic_array
[3] https://en.wikipedia.org/wiki/Connection_Machine
XMOS is still very much alive. But CSP comes with its own set of challenges, which ironically have forced people back to using shared memory.
>Pong had no software. Zero. It was built entirely from hardware logic chips - flip-flops, counters, comparators. Massively parallel.
And that is what FPGAs are for.
This strikes me as the author lacking hardware knowledge but still trying to write a post about hardware.
Would some message passing hardware actually be "better" in terms of performance, efficiency, and ease of construction? I thought moving data between hardware subsystems is generally pretty expensive (e.g. atomic memory instructions, there's a significant performance penalty).
Disclaimer that I'm not a hardware engineer though.
You see innovation in this space a lot in research. For example, Dalorex [1] or Azul [2] for operations on sparse matrices. Currently a more general version of Azul is being developed at MIT with support for arbitrary matrix/graph algorithms with reconfigurable logic.
[1] https://ieeexplore.ieee.org/document/10071089
[2] https://ieeexplore.ieee.org/document/10764548
If you disallow recursion, or put an upper bound on recursion depth, you can statically allocate all "stack based" objects at compile time. Modula I did this, allowing multithreading on machines which did not have enough memory to allow stack growth. Statically analyzing worst case stack depth is still a thing in real time control.
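A tiny C sketch of that idea, with made-up names: if neither function can ever be active twice at once, each one's "frame" can be a statically allocated object, so the worst-case memory for locals is fixed at build time and no runtime stack growth is needed.

    #include <stdio.h>

    /* One statically allocated frame per function - legal precisely
       because no function here can be active twice at once. */
    static struct { int i; long acc; } frame_sum;
    static struct { long tmp; }        frame_scale;

    static long scale(long x) {
        frame_scale.tmp = x * 2;
        return frame_scale.tmp;
    }

    static long sum_doubled(const int *v, int n) {
        frame_sum.acc = 0;
        for (frame_sum.i = 0; frame_sum.i < n; frame_sum.i++)
            frame_sum.acc += scale(v[frame_sum.i]);
        return frame_sum.acc;
    }

    int main(void) {
        int v[] = { 1, 2, 3 };
        printf("%ld\n", sum_doubled(v, 3));   /* 12 */
        return 0;
    }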
Not clear that this would speed things up much today.
Statically analyzing the whole program to determine the required stack size is too constraining. You can't have calls via function pointers (even virtual calls), can't call third-party code without the sources available, and can't call any code from a shared library (since analyzing it is impossible). This may work only in very constrained environments, like the embedded world.
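A minimal C illustration of the function-pointer problem (the callbacks here are made up): once a call goes through a pointer, the worst-case stack depth below that point depends on which target actually runs, which the analysis can't bound without knowing every possible target.

    #include <stdio.h>

    /* Two hypothetical callbacks with very different stack footprints. */
    static int cheap(int x)  { return x + 1; }   /* a few bytes of stack */
    static int costly(int x) {                   /* ~16 KiB of locals    */
        char buf[16384];
        snprintf(buf, sizeof buf, "%d", x);
        return buf[0];
    }

    /* A static analyzer sees only "call through fn": without the full set
       of possible targets it cannot bound the stack usage of process(). */
    static int process(int x, int (*fn)(int)) { return fn(x); }

    int main(void) {
        printf("%d\n", process(1, cheap));
        printf("%d\n", process(2, costly));
        return 0;
    }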