I listened to the Hinton podcast a few days ago. He mentioned (IIRC) that "analog" AIs are bad because the models can't be transferred/duplicated losslessly, like you can with a .gguf file; every analog system is built slightly differently, so you'd have to re-learn/re-train it somehow.
Do TSUs have the same issue?
It is a hardware RNG they are building. The claim is that their solution is going to be more computationally efficient for a narrow class of problems (de-noising step for diffusion AI models) vs current state of the art. Maybe.
This is what they are trying to create, more specifically:
https://pubs.aip.org/aip/apl/article/119/15/150503/40486/Pro...
It's not just a "hardware RNG". An RNG outputs a uniform distribution. This hardware outputs randomness with controllable distributions, potentially extremely complex ones, many orders of magnitude more efficiently than doing it the traditional way with ALUs. The class of problems that can be solved by sampling from extremely complex probability distributions is much larger than you might naively expect.
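To make "the traditional way with ALUs" concrete, here's a rough Python sketch of my own (Metropolis-Hastings over a toy double-well distribution, nothing to do with Extropic's actual circuits): every non-uniform sample costs a pile of arithmetic on top of the uniform RNG, which is the work this kind of hardware claims to push into the physics.

```python
import math, random

def energy(x):
    # toy double-well potential: two modes near x = -1 and x = +1
    return (x * x - 1.0) ** 2

def metropolis(n_samples, temperature=0.2, step=0.5):
    """Sample from p(x) proportional to exp(-energy(x)/T) using only uniform RNG plus arithmetic."""
    x, samples = 0.0, []
    for _ in range(n_samples):
        proposal = x + random.uniform(-step, step)
        # accept with probability min(1, p(proposal)/p(x))
        if random.random() < math.exp((energy(x) - energy(proposal)) / temperature):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(10_000)
print(sum(1 for s in samples if s > 0) / len(samples))  # roughly 0.5: half the mass in each well
```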
I was skeptical of Extropic from the start, but what they've shown here exceeded my low expectations. They've made real hardware which is novel and potentially useful in the future after a lot more R&D. Analog computing implemented in existing CMOS processes that can run AI more efficiently by four orders of magnitude would certainly be revolutionary. That final outcome seems far enough away that this should probably still be the domain of university research labs rather than a venture-backed startup, but I still applaud the effort and wish them luck.
An old concept indeed! I think about this Ed Fredkin story a lot... In his words:
"Just a funny story about random numbers: in the early days of computers people wanted to have random numbers for Monte Carlo simulations and stuff like that and so a great big wonderful computer was being designed at MIT’s Lincoln laboratory. It was the largest fastest computer in the world called TX2 and was to have every bell and whistle possible: a display screen that was very fancy and stuff like that. And they decided they were going to solve the random number problem, so they included a register that always yielded a random number; this was really done carefully with radioactive material and Geiger counters, and so on. And so whenever you read this register you got a truly random number, and they thought: “This is a great advance in random numbers for computers!” But the experience was contrary to their expectations! Which was that it turned into a great disaster and everyone ended up hating it: no one writing a program could debug it, because it never ran the same way twice, so ... This was a bit of an exaggeration, but as a result everybody decided that the random number generators of the traditional kind, i.e., shift register sequence generated type and so on, were much better. So that idea got abandoned, and I don’t think it has ever reappeared."
RIP Ed. https://en.wikipedia.org/wiki/Edward_Fredkin
And still today we spend a great deal of effort trying to make our randomly-sampled LLM outputs reproducibly deterministic:
https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
It's funny because that did actually reappear at some point with rdrand. But it's still only really used for cryptography; if you just need a random distribution, almost everyone uses a PRNG (a non-cryptographic one is a lot faster still, besides being deterministic).
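For a sense of scale: a non-cryptographic PRNG is just a handful of integer ops per output. A toy xorshift64* in Python (purely illustrative; you'd do this in native code if speed actually mattered):

```python
MASK64 = (1 << 64) - 1

def xorshift64star(seed=0x9E3779B97F4A7C15):
    """Non-cryptographic PRNG: three shift/xor steps and one multiply per 64-bit output."""
    state = seed & MASK64
    while True:
        state ^= state >> 12
        state ^= (state << 25) & MASK64
        state ^= state >> 27
        yield (state * 0x2545F4914F6CDD1D) & MASK64

gen = xorshift64star()
print([next(gen) % 6 + 1 for _ in range(5)])  # five (negligibly biased) die rolls
```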
Generating randomness is not a bottleneck and modern SIMD CPUs should be more than fast enough. I thought they’re building approximate computation where a*b is computed within some error threshold p.
Generating enough random numbers with the right distribution for Gibbs sampling, at incredibly low power, is what their hardware does.
I think that's underselling it a bit, since there are lots of existing ways to have a hardware RNG. They're trying to use lots and lots of hardware RNGs to solve probabilistic problems a little more probabilistically.
I tried this, but not with the "AI magic" angle. It turns out nobody cares because CSPRNGs are random enough and really fast.
https://en.wikipedia.org/wiki/Lavarand
If you want to understand exactly what we are building, read our blogs and then our paper:
https://extropic.ai/writing https://arxiv.org/abs/2510.23972
I was hoping the preprint would explain the mysterious ancient runes on the device chassis :(
i dig it.
people are so scared of losing market share because of an art choice that they make all of their products smooth dark grey rectangles with no features.
ugly.
at least this one has some sense of beauty, the courage to make a decision about what looks good to them and act on it. they'll probably have to change the heptagon shape though, because there's no way that becomes a standard
it costs so little to add artistic flair to a product, it's really a shame so few companies do
The answer is that they're cosplaying sci-fi movies, in attempt to woo investors.
Why are you replying under every other comment here in this low effort, negative manner?
i think they're a hater
What, is a bit of whimsy illegal?
A product of dubious niche value that has this much effort put into window dressing is suspicious.
how much effort is it really to draw some doodles on the 3d model?
can you play doom on it, yet?
I don't really understand the purpose of hyping up a launch announcement and then not making any effort whatsoever to make the progress comprehensible to anyone without advanced expertise in the field.
That's the intention. Fill it up with enough jargon and gobbledegook that it looks impressive to investors, while hiding the fact that there's no real technology underneath.
You not comprehending a technology does not automatically make it vaporware.
>jargon and gobbledegook
>no real technology underneath
They're literally shipping real hardware. They also put out a paper + posted their code too.
Flippant insults will not cut it.
Nice try. It's smoke and mirrors. Tell me one thing it does better than a 20 year old CPU.
This hardware is an analog simulator for Gibbs sampling, which is an idealized physical process that describes random systems with large-scale structure. The energy-efficiency gains come from the fact that it's analog. It may seem like jargon, but Gibbs sampling is an extremely well known concept with decades of work and connections to many areas of statistics, probability theory, and machine learning. The algorithmic problem they need to solve is how to harness Gibbs sampling for large scale ML tasks, but arguably this isn't really a huge leap: it's very similar to EBM learning/sampling, but with the advantage of being able to sample larger systems for the same energy.
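For anyone who hasn't met it: Gibbs sampling just resamples one variable at a time from its conditional distribution given its neighbours. A toy software version for a 1D Ising-style EBM (my own illustration of the general idea, not a model of their chip):

```python
import math, random

def gibbs_ising(n=16, coupling=0.5, sweeps=1000):
    """Gibbs sampling on a ring of +/-1 spins with nearest-neighbour coupling.

    Each spin is resampled from its exact conditional given its two neighbours;
    the pitch is that analog hardware can do this local update in physics
    instead of spending arithmetic on it.
    """
    spins = [random.choice([-1, 1]) for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            field = coupling * (spins[(i - 1) % n] + spins[(i + 1) % n])
            p_up = 1.0 / (1.0 + math.exp(-2.0 * field))  # P(spin_i = +1 | neighbours)
            spins[i] = 1 if random.random() < p_up else -1
    return spins

print(gibbs_ising())
```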
> The algorithmic problem they need to solve is how to harness Gibbs sampling for large scale ML tasks, but arguably this isn't really a huge leap,
Is it?
The paper is pretty dense, but Figure 1 is Fashion-MNIST, which is "28x28 grayscale images" - that does not seem very real-life to me. Can they work on bigger data? I assume not yet, otherwise they'd put something more impressive in Figure 1.
In the same way, it is totally unclear what kind of energy they are talking about in absolute terms - if you say "we've saved 0.1J on training jobs", that is simply not impressive enough. And how much overhead is there? Amdahl's law is a thing: if you super-optimize a step that takes 1% of the time, the overall improvement will be negligible even if the savings for that step are enormous.
I've written a few CS papers myself back in the day, and the general idea was to always put the best results at the front. So they are either bad communicators, or they don't highlight answers to my questions because they don't have many impressive things (yet?). Their website is nifty, so I suspect the latter.
More insults and a blanket refusal to engage with the material. Ok.
If you think comparing hardware performance is an insult, then you have some emotional issues or are a troll.
Ah, more insults. This will be my final reply to you.
I'll say it again. The hardware exists. The paper and code are there. If someone wants to insist that it's fake or whatever, they need to come up with something better than permutations of "u r stoopid" (your response to their paper: https://news.ycombinator.com/item?id=45753471). Just engage with the actual material. If there's a solid criticism, I'd like to hear it too.
The fact that there's real hardware and a paper doesn't mean the product is actually worth anything. It's very possible to make something (especially some extremely simplified 'proof of concept' which is not actually useful at all) and massively oversell it. Looking at the paper, it looks like it may have some very niche applications but it's really not obvious that it would be enough to justify the investment needed to make it better than existing general purpose hardware, and the amount of effort that's been put into 'sizzle' aimed at investors makes it look disingenuous.
>The fact that there's real hardware and a paper doesn't mean the product is actually worth anything.
I said you can't dismiss someone's hardware + paper + code solely based on insults. That's what I said. That was my argument. Speaking of which:
>disingenuous
>sizzle
>oversell
>dubious niche value
>window dressing
>suspicious
For the life of me I can't understand how any of this is an appropriate response when the other guy is showing you math and circuits.
"no really technology underneath" zzzzzzzzzzz
What's not comprehensible?
It's just miniaturized lava lamps.
A lava lamp that just produces randomness, i.e. for cryptography purposes, is different from the benefit here, which is to produce specific randomness at a low energy cost.
This seems to be the page that describes the low level details of what the hardware aims to do. https://extropic.ai/writing/tsu-101-an-entirely-new-type-of-...
To me, the biggest limitation is that you’d need an entirely new stack to support a new paradigm. It doesn’t seem compatible with using existing pretrained models. There’s plenty of ways to have much more efficient paradigms of computation, but it’ll be a long while before any are mature enough to show substantial value.
I've been wondering how long it would take for someone to try probabilistic computing for AI workloads - the imprecision inherent in the workload makes it ideally suited for AI matrix math with a significant power reduction. My professor in university was researching this space and it seemed very interesting. I never thought it could supplant CPUs necessarily, but massive compute applications that don't require precise math, like 3D rendering (and now AI), always seemed like a natural fit.
I don't think it does AI matrix math with a significant power reduction; it just seems to provide RNG? I may be wrong - my knowledge here is limited - but I don't think what you're saying is true. Maybe someone can say what the reality is: whether it can do AI matrix math with a significant power reduction, and whether that's even their goal right now. To me it currently feels like a lava-lamp-equivalent thing, as another commenter said.
The paper talks about some quite old-school AI techniques (the kind of thing I learned about in university a decade ago, when it was already on its way out). It's not anything to do with matrix multiplications (well, nothing to do with computing them faster directly); instead it's about being able to sample from a complex distribution more efficiently by having dedicated circuits simulate elements of that distribution in hardware. So it won't make your neural nets any faster.
I'm still waiting for my memristors.
there is also Normal Computing[0], which is trying different approaches to chips like this. Anyway, these are very difficult problems, and Extropic has already abandoned some of its initial claims about superconductors to pivot to more classical CMOS circuits[1]
[0]: https://www.normalcomputing.com
[1]: https://www.zach.be/p/making-unconventional-computing-practi...
The cool thing about Silicon Valley is that serious people try stuff that may seem wild and unlikely, and on the off chance it works, all of humanity benefits. This looks like Atomic Semi, Joby Aviation, maybe even OpenAI in its early days.
The bad thing about Silicon Valley is that charlatans abuse this openness and friendly spirit and swindle investors out of millions with pipe dreams and worthless technology. I think the second is inevitable as Silicon Valley becomes more famous and higher status without a strong gatekeeping mechanism, which is also anathema to its open ethos. Unfortunately this company is firmly in the second category: a performative startup, "changing the world" to satisfy the neurosis of its founders, who desperately want to be seen as taking risks to change the world. In reality it will change nothing and end up in the dustbin of history. I hope he enjoys his 15 minutes of fame.
What makes you so sure that extropic is the second and not the first?
Fundamentally, gut feel from following the founder on Twitter. But if I had to explain: I don't understand the point of speeding up or getting true RNG; even for diffusion models this is not a big bottleneck, so it sounds more like a buzzword than actual practical technology.
Having a TRNG is easy: you just reverse-bias a zener diode, or use any number of other strategies that rely on physics for noise. Hype is a strategy they're clearly taking, and people in this thread are dismissive (and I get why - Extropic has been vague-posting for years and makes it sound like vaporware), but what does everyone think they're actually doing with the money? It's not a better dice roller...
What is it if not a better dice roller though? Isn't that what they are claiming it is? And also that this better dice rolling is very important (and I admittedly am not someone who can evaluate this)
Yes, I think they claim they are a far better dice roller in randomness and speed, and that this is very important. The first might be true, but I don't see why the second is in any way true. All of these need to be true for this company to make sense:
1. They build a chip that does random sampling far better than any GPU (is this even proven yet?)
2. They use a model architecture that exploits this sampling advantage, which means most of the computation must be concentrated in sampling. This might be true for energy-based models or some future architecture we have no idea about. AFAIK, this is not even true for diffusion.
3. This model architecture must outcompete autoregressive models in economically useful tasks, whether language modeling or robotics etc.; right now autoregressive transformers are still king across all tasks.
And then their chip will be bought by hyperscalers and their company will become successful. There are just so many ifs outside of them building their core technology that this whole project makes no sense. You could say that this is true for all startups; I don't think that's the case - this is just ridiculous.
This gives me Devs vibes (the 2020 TV series) - https://www.indiewire.com/awards/industry/devs-cinematograph...
Such an underrated TV show.
How should we think about how much effective compute is being done with these devices compared to classical (GPU) computing? Obviously FLOPs doesn't make sense, so what does?
Is this the new term for analog VLSI?
Or if we call it analog is it too obvious what the problems are going to be?
Question for the experts in the field: why does this need to be a CPU and not a dongle you plug into a server and query?
Hype aside, if you can get an answer to a computing problem with error bars in significantly less time, where precision just isn’t that important (such as LLMs) this could be a game changer.
Precision actually matters a decent amount in LLMs. Quantization is used strategically in places that will minimize performance degradation, and models are smart enough that some loss in performance still leaves a good model. I'm skeptical how well this would turn out, though it's probably always possible to remedy precision loss with a sufficiently larger model.
LLMs are inherently probabilistic. Things like ReLU throw out a ton of data deliberately.
No, that isn't throwing out data. Activation functions perform a nonlinear transformation to increase the expressivity of the function. If you did two matrix multiplications without a ReLU in between, your function would contain less information than with a ReLU in between.
How are you calculating "less information"?
I think what they meant was:
Two linear transformations compose into a single linear transformation. If you have y = W2(W1*x) = (W2*W1)*x = W*x where W = W2*W1, you've just done one matrix multiply instead of two. The composition of linear functions is linear.
The ReLU breaks this because it's nonlinear: ReLU(W1*x) can't be rewritten as some W*x, so W2(ReLU(W1*x)) can't collapse either.
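A tiny numpy check of that argument (just an illustration):

```python
import numpy as np

W1 = np.array([[1.0, -1.0],
               [0.0,  1.0]])
W2 = np.array([[1.0, 1.0]])
relu = lambda v: np.maximum(v, 0)

x = np.array([1.0, 2.0])

# Without a nonlinearity, two layers collapse into one matrix multiply.
print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))   # True: composition of linear maps is linear

# With ReLU in between, the map is no longer linear, so no single W can replace it.
f = lambda v: W2 @ relu(W1 @ v)
y = np.array([2.0, 1.0])
print(np.allclose(f(x) + f(y), f(x + y)))           # False: additivity fails, hence not linear
```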
Without nonlinearities like ReLU, many layers of a neural network could be collapsed into a single matrix multiplication. This inherently limits the function approximation that it can do, because linear functions are not very good at approximating nonlinear functions. And there are many nonlinearities involved in modeling speech, video, etc.
I like this but based on what I am seeing here and the THRML readme, I would describe this as "an ML stack that is fully prepared for the Bayesian revolution of 2003-2015." A kind of AI equivalent of, like, post-9/11 airport security. I mean this in a value-neutral way, as personally I think that era of models was very beautiful.
The core idea of THRML, as I understand it, is to present a nice programming interface to hardware where coin-flipping is vanishingly cheap. This is moderately useful to deep learning, but the artisanally hand-crafted models of the mid-2000s did essentially nothing at all except flip coins, and it would have been enormously helpful to have something like this in the wild at that time.
The core "trick" of the era was to make certain very useful but intractable distributions built on something called "infinitely exchangeable sequences" merely almost intractable. The trick, roughly, was that conditioning on some measure space makes those sequences plain-old iid, which (via a small amount of graduate-level math) implies that a collection of "outcomes" can be thought of as a random sample of the underlying distribution. And that, in turn, meant that the model training regimens of the time did a lot of sampling, or coin-flipping, as we have said here.
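(The "small amount of graduate-level math" is essentially de Finetti's representation theorem; roughly, in LaTeX:)

```latex
% de Finetti: an infinitely exchangeable sequence is a mixture of iid sequences,
% i.e. conditioned on a latent parameter/measure theta the observations are iid.
P(X_1 = x_1, \ldots, X_n = x_n)
  = \int_{\Theta} \prod_{i=1}^{n} P(X_i = x_i \mid \theta)\, \mu(\mathrm{d}\theta)
```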
Peruse the THRML README[1] and you'll see the who's who of techniques and modeling procedures of the time: "Gibbs sampling", "probabilistic graphical models", "energy-based models", and so on. All of these are weaponized coin flipping.
I imagine the terminus of this school of thought is basically a natively probabilistic programming environment. Garden-variety deterministic computing is essentially probabilistic computing where every statement returns a value with probability 1. So in that sense, probabilistic computing is a full generalization of deterministic computing, since an `if` might return a value with some probability other than 1. There was an entire genre of languages like this, e.g., Church. And now, 22 years later, we have our own hardware for it. (Incidentally, this line of inquiry is also how we know that conditional joint distributions are Turing complete.)
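Something like this toy Python sketch of the Church-style idea (my own illustration, not any particular language's actual API):

```python
import random

def flip(p=0.5):
    """A probabilistic primitive: True with probability p, so `if flip(p):` is a branch taken with probability p."""
    return random.random() < p

def model():
    rainy = flip(0.3)
    sprinkler = flip(0.1) if rainy else flip(0.5)
    wet_grass = flip(0.9) if (rainy or sprinkler) else flip(0.01)
    return rainy, wet_grass

# crude inference by rejection sampling: estimate P(rainy | grass is wet)
runs = [model() for _ in range(100_000)]
rainy_given_wet = [r for r, w in runs if w]
print(sum(rainy_given_wet) / len(rainy_given_wet))
```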
Tragically, I think, this may have arrived too late. This is not nearly as helpful in the world of deep learning, with its large, ugly, and relatively sample-free models. Everyone hates to hear that you're cheering from the sidelines, but this time I really am. I think it's a great idea, just too late.
[1]: https://github.com/extropic-ai/thrml/blob/7f40e5cbc460a4e2e9...
Really informative insight, thanks. I'm not too familiar with those models, is there any chance that this hardware could lead to a renaissance of sample-based methods? Given efficient hardware, would they scale to LLM size, and/or would they allow ML to answer some types of currently unanswerable questions?
Any time something costs trillionths of a cent to do, there is an enormous economic incentive to turn literally everything you can into that thing. Since the 50s “that thing” has been arithmetic, and as a result, we’ve just spent 70 years trying to turn everything from HR records to images into arithmetic.
Whether “that thing” is about to be sampling is not for me to say. The probability is certainly not 0 though.
Has anyone received the dev board? What did you do with it? Curious what this can do.
Looks like an artifact from Assassin's Creed or Halo.
i've followed them for a while and as just a general technologist and not a scientist, i have a probably wrong idea of what they do, but perhaps correcting it will let others write about it more accurately.
my handwavy analogy interpretation was that they were in effect building an analog computer for AI model training, using some ideas that originated in quantum computing. their insight is that since model training is itself probabilistic, you don't need discrete binary computation to do it; you just need something that implements the sigmoid function for training a NN.
they had some physics to show they could cause a bunch of atoms to polarize (conceptually) instantaneously using the thermodynamic properties of a material, and the result would be mostly deterministic over large samples. the result is what they are calling a "probabilistic bit", or pbit, which is an inferred state over a probability distribution. where the inference is incorrect, they just "get it in post": pushing training data through a network of these pbits is so much more efficient that it's faster to augment and correct the result in the model afterwards than to compute it directly with classical clock cycles.
It's finally here! Extropic has been working on this since 2022. I'm really excited to see how this performs in the real world.
This looks really amazing, if not unbelievable - to the point where it's almost too good to be real.
I have not seen benchmarks of Extropic's new computing hardware yet, and I'd need to hear from experts in the field of AI infrastructure at the semiconductor level whether this is legit.
I'm 75% convinced this is real, with 25% skepticism, and will reserve judgement until others have tried the hardware.
So my only question for the remaining 25%:
Is this a scam?
Too good to be true, incomprehensible jargon to go along...
I mean it sure looks like a scam.
I really like the design of it though.
I doubt it’s a scam. Beff might be on to something or completely delusional, but not actively scamming.
The best conmen have an abundance of confidence in themselves.
This one just released a prototype platform, "XTR-0", so if it's a fraud the jig will shortly be up.
https://extropic.ai/writing/inside-x0-and-xtr-0
I think it's more a concern that the hardware isn't useful in the real world, rather than that the hardware doesn't meet the description they provide of it.
Nice!
This is "quantum" computing, btw.
Actually it's not. Here's some stuff to read to get a clearer picture! https://extropic.ai/writing
It strictly is not, as no quantum phenomenon is being measured (hence why I used the quotes); but if all goes well w/ extropic you'll most likely end up doing quantum again.
Usually there's a negative correlation between the fanciness of a startup webpage and the actual value/product they'll deliver.
This gives "hype" vibes.
Interesting you say that, I had an instinctual reaction in that vein as well. I chalked it up to bias since I couldn’t think of any concrete examples. Something about the webpage being so nice made me think they’ve spent a lot of time on it (relative to their product?) Admittedly I’m nowhere close to even trying to understand their paper, but I’m interested in seeing what others think about it
I've seen it as well. One thing that's universally true about potential competitor startups in the field I work in is that the ones who don't actually have anything concrete to show have way nicer websites than ours (some have significantly more funding and still nothing to show).
I have a passing familiarity with the areas they talk about in the paper, and it feels... dubious. Mainly because of the dedicated accelerator problem. Even dedicated neural net accelerators are having difficulty gaining traction against general purpose compute units in a market that is ludicrously hot for neural net processing, and this is talking about accelerating Monte-Carlo processes which are pretty damn niche in application nowadays (especially in situations where you're compute-limited). So even if they succeed in speeding up that application, it's hard to see how worthwhile it would be. And it's not obvious from the publicly available information whether they're close to even beating the FPGA emulation of the concept which was used in the paper.
I'm more impressed that my laptop fans came on when I loaded the page.
It's the one attached to my TV that just runs movies/YT - I don't recall the last time I heard the fans.
They did say thermodynamic computing.
Same on a serious dev machine. That page just pegs a core at max, it's sort of impressive.