I listened to the Hinton podcast a few days ago. He mentioned (IIRC) that "analog" AIs are bad because the models can't be transferred/duplicated losslessly, like you can with a .gguf file; every analog system is built slightly differently, so you'd have to re-learn/re-train it somehow.
Do TSUs have the same issue?
It is a hardware RNG they are building. The claim is that their solution is going to be more computationally efficient for a narrow class of problems (de-noising step for diffusion AI models) vs current state of the art. Maybe.
This is what they are trying to create, more specifically:
https://pubs.aip.org/aip/apl/article/119/15/150503/40486/Pro...
It's not just a "hardware RNG". An RNG outputs a uniform distribution. This hardware outputs randomness with controllable distributions, potentially extremely complex ones, many orders of magnitude more efficiently than doing it the traditional way with ALUs. The class of problems that can be solved by sampling from extremely complex probability distributions is much larger than you might naively expect.
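To make "the traditional way with ALUs" concrete, here's a rough Python sketch of my own (Metropolis-Hastings over a toy double-well distribution, nothing to do with Extropic's actual circuits): every non-uniform sample costs a pile of arithmetic on top of the uniform RNG, which is the work this kind of hardware claims to push into the physics.

```python
import math, random

def energy(x):
    # toy double-well potential: two modes near x = -1 and x = +1
    return (x * x - 1.0) ** 2

def metropolis(n_samples, temperature=0.2, step=0.5):
    """Sample from p(x) proportional to exp(-energy(x)/T) using only uniform RNG plus arithmetic."""
    x, samples = 0.0, []
    for _ in range(n_samples):
        proposal = x + random.uniform(-step, step)
        # accept with probability min(1, p(proposal)/p(x))
        if random.random() < math.exp((energy(x) - energy(proposal)) / temperature):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(10_000)
print(sum(1 for s in samples if s > 0) / len(samples))  # roughly 0.5: half the mass in each well
```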
I was skeptical of Extropic from the start, but what they've shown here exceeded my low expectations. They've made real hardware which is novel and potentially useful in the future after a lot more R&D. Analog computing implemented in existing CMOS processes that can run AI more efficiently by four orders of magnitude would certainly be revolutionary. That final outcome seems far enough away that this should probably still be the domain of university research labs rather than a venture-backed startup, but I still applaud the effort and wish them luck.
An old concept indeed! I think about this Ed Fredkin story a lot... In his words:
"Just a funny story about random numbers: in the early days of computers people wanted to have random numbers for Monte Carlo simulations and stuff like that and so a great big wonderful computer was being designed at MIT’s Lincoln laboratory. It was the largest fastest computer in the world called TX2 and was to have every bell and whistle possible: a display screen that was very fancy and stuff like that. And they decided they were going to solve the random number problem, so they included a register that always yielded a random number; this was really done carefully with radioactive material and Geiger counters, and so on. And so whenever you read this register you got a truly random number, and they thought: “This is a great advance in random numbers for computers!” But the experience was contrary to their expectations! Which was that it turned into a great disaster and everyone ended up hating it: no one writing a program could debug it, because it never ran the same way twice, so ... This was a bit of an exaggeration, but as a result everybody decided that the random number generators of the traditional kind, i.e., shift register sequence generated type and so on, were much better. So that idea got abandoned, and I don’t think it has ever reappeared."
RIP Ed. https://en.wikipedia.org/wiki/Edward_Fredkin
And still today we spend a great deal of effort trying to make our randomly-sampled LLM outputs reproducibly deterministic:
https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
It's funny because that did actually reappear at some point with rdrand. But it's still only really used for cryptography; if you just need a random distribution, almost everyone uses a PRNG (a non-cryptographic one is a lot faster still, besides being deterministic).
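For a sense of scale: a non-cryptographic PRNG is just a handful of integer ops per output. A toy xorshift64* in Python (purely illustrative; you'd do this in native code if speed actually mattered):

```python
MASK64 = (1 << 64) - 1

def xorshift64star(seed=0x9E3779B97F4A7C15):
    """Non-cryptographic PRNG: three shift/xor steps and one multiply per 64-bit output."""
    state = seed & MASK64
    while True:
        state ^= state >> 12
        state ^= (state << 25) & MASK64
        state ^= state >> 27
        yield (state * 0x2545F4914F6CDD1D) & MASK64

gen = xorshift64star()
print([next(gen) % 6 + 1 for _ in range(5)])  # five (negligibly biased) die rolls
```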
Generating randomness is not a bottleneck and modern SIMD CPUs should be more than fast enough. I thought they’re building approximate computation where a*b is computed within some error threshold p.
Generating enough random numbers with the right distribution for Gibbs sampling, at incredibly low power, is what their hardware does.
I think that's underselling it a bit, since there are lots of existing ways to have a hardware RNG. They're trying to use lots and lots of hardware RNGs to solve probabilistic problems a little more probabilistically.
I tried this, but not with the "AI magic" angle. It turns out nobody cares because CSPRNGs are random enough and really fast.
https://en.wikipedia.org/wiki/Lavarand
If you want to understand exactly what we are building, read our blogs and then our paper:
https://extropic.ai/writing https://arxiv.org/abs/2510.23972
I was hoping the preprint would explain the mysterious ancient runes on the device chassis :(
i dig it.
people are so scared of losing market share because of an art choice that they make all of their products smooth dark grey rectangles with no features.
ugly.
at least this one has some sense of beauty, the courage to make a decision about what looks good to them and act on it. they'll probably have to change the heptagon shape though, because there's no way that becomes a standard
it costs so little to add artistic flair to a product, it's really a shame so few companies do
The answer is that they're cosplaying sci-fi movies, in attempt to woo investors.
Why are you replying under every other comment here in this low effort, negative manner?
i think they're a hater
What, is a bit of whimsy illegal?
A product of dubious niche value that has this much effort put into window dressing is suspicious.
how much effort is it really to draw some doodles on the 3d model?
can you play doom on it, yet?
I don't really understand the purpose of hyping up a launch announcement and then not making any effort whatsoever to make the progress comprehensible to anyone without advanced expertise in the field.
That's the intention. Fill it up with enough jargon and gobbledegook that it looks impressive to investors, while hiding the fact that there's no real technology underneath.
You not comprehending a technology does not automatically make it vaporware.
>jargon and gobbledegook
>no real technology underneath
They're literally shipping real hardware. They also put out a paper + posted their code too.
Flippant insults will not cut it.
Nice try. It's smoke and mirrors. Tell me one thing it does better than a 20 year old CPU.
This hardware is an analog simulator for Gibbs sampling, which is an idealized physical process that describes random systems with large-scale structure. The energy-efficiency gains come from the fact that it's analog. It may seem like jargon, but Gibbs sampling is an extremely well known concept with decades of work and connections to many areas of statistics, probability theory, and machine learning. The algorithmic problem they need to solve is how to harness Gibbs sampling for large scale ML tasks, but arguably this isn't really a huge leap: it's very similar to EBM learning/sampling, but with the advantage of being able to sample larger systems for the same energy.
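For anyone who hasn't met it: Gibbs sampling just resamples one variable at a time from its conditional distribution given its neighbours. A toy software version for a 1D Ising-style EBM (my own illustration of the general idea, not a model of their chip):

```python
import math, random

def gibbs_ising(n=16, coupling=0.5, sweeps=1000):
    """Gibbs sampling on a ring of +/-1 spins with nearest-neighbour coupling.

    Each spin is resampled from its exact conditional given its two neighbours;
    the pitch is that analog hardware can do this local update in physics
    instead of spending arithmetic on it.
    """
    spins = [random.choice([-1, 1]) for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            field = coupling * (spins[(i - 1) % n] + spins[(i + 1) % n])
            p_up = 1.0 / (1.0 + math.exp(-2.0 * field))  # P(spin_i = +1 | neighbours)
            spins[i] = 1 if random.random() < p_up else -1
    return spins

print(gibbs_ising())
```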
> The algorithmic problem they need to solve is how to harness Gibbs sampling for large scale ML tasks, but arguably this isn't really a huge leap,
Is it?
The paper is pretty dense, but Figure 1 is Fashion-MNIST, which is "28x28 grayscale images" - that does not seem very real-life to me. Can they work on bigger data? I assume not yet, otherwise they'd put something more impressive in Figure 1.
In the same way, it is totally unclear what kind of energy they are talking about in absolute terms - if you say "we've saved 0.1J on training jobs", that is simply not impressive enough. And how much overhead is there? Amdahl's law is a thing: if you super-optimize a step that takes 1% of the time, the overall improvement will be negligible even if the savings for that step are enormous.
I've written a few CS papers myself back in the day, and the general idea was to always put the best results at the front. So they are either bad communicators, or they don't highlight answers to my questions because they don't have many impressive things (yet?). Their website is nifty, so I suspect the latter.
More insults and a blanket refusal to engage with the material. Ok.
If you think comparing hardware performance is an insult, then you have some emotional issues or are a troll.
Ah, more insults. This will be my final reply to you.
I'll say it again. The hardware exists. The paper and code are there. If someone wants to insist that it's fake or whatever, they need to come up with something better than permutations of "u r stoopid" (your response to their paper: https://news.ycombinator.com/item?id=45753471). Just engage with the actual material. If there's a solid criticism, I'd like to hear it too.
The fact that there's real hardware and a paper doesn't mean the product is actually worth anything. It's very possible to make something (especially some extremely simplified 'proof of concept' which is not actually useful at all) and massively oversell it. Looking at the paper, it looks like it may have some very niche applications but it's really not obvious that it would be enough to justify the investment needed to make it better than existing general purpose hardware, and the amount of effort that's been put into 'sizzle' aimed at investors makes it look disingenuous.
>The fact that there's real hardware and a paper doesn't mean the product is actually worth anything.
I said you can't dismiss someone's hardware + paper + code solely based on insults. That's what I said. That was my argument. Speaking of which:
>disingenuous
>sizzle
>oversell
>dubious niche value
>window dressing
>suspicious
For the life of me I can't understand how any of this is an appropriate response when the other guy is showing you math and circuits.
"no really technology underneath" zzzzzzzzzzz
What's not comprehensible?
It's just miniaturized lava lamps.
A lava lamp that just produces randomness, i.e. for cryptography purposes, is different from the benefit here, which is to produce specific randomness at a low energy cost.
This seems to be the page that describes the low level details of what the hardware aims to do. https://extropic.ai/writing/tsu-101-an-entirely-new-type-of-...
To me, the biggest limitation is that you’d need an entirely new stack to support a new paradigm. It doesn’t seem compatible with using existing pretrained models. There’s plenty of ways to have much more efficient paradigms of computation, but it’ll be a long while before any are mature enough to show substantial value.
I've been wondering how long it would take for someone to try probabilistic computing for AI workloads - the imprecision inherent in the workload makes it ideally suited for AI matrix math with a significant power reduction. My professor in university was researching this space and it seemed very interesting. I never thought it could supplant CPUs necessarily, but massive compute applications that don't require precise math, like 3D rendering (and now AI), always seemed like a natural fit.
I don't think it does AI matrix math with a significant power reduction; it just seems to provide RNG? I may be wrong - my knowledge here is limited - but I don't think what you're saying is true. Maybe someone can say what the reality is: whether it can do AI matrix math with a significant power reduction, and whether that's even their goal right now. To me it currently feels like a lava-lamp-equivalent thing, as another commenter said.
The paper talks about some quite old-school AI techniques (the kind of thing I learned about in university a decade ago, when it was already on its way out). It's not anything to do with matrix multiplications (well, nothing to do with computing them faster directly); instead it's about being able to sample from a complex distribution more efficiently by having dedicated circuits simulate elements of that distribution in hardware. So it won't make your neural nets any faster.
I'm still waiting for my memristors.
there is also Normal Computing[0], which is trying different approaches to chips like this. Anyway, these are very difficult problems, and Extropic has already abandoned some of its initial claims about superconductors to pivot to more classical CMOS circuits[1]
[0]: https://www.normalcomputing.com
[1]: https://www.zach.be/p/making-unconventional-computing-practi...
The cool thing about Silicon Valley is that serious people try stuff that may seem wild and unlikely, and on the off chance it works, all of humanity benefits. This looks like Atomic Semi, Joby Aviation, maybe even OpenAI in its early days.
The bad thing about Silicon Valley is that charlatans abuse this openness and friendly spirit and swindle investors out of millions with pipe dreams and worthless technology. I think the second is inevitable as Silicon Valley becomes more famous and higher status without a strong gatekeeping mechanism, which is also anathema to its open ethos. Unfortunately this company is firmly in the second category: a performative startup, "changing the world" to satisfy the neurosis of its founders, who desperately want to be seen as taking risks to change the world. In reality it will change nothing and end up in the dustbin of history. I hope he enjoys his 15 minutes of fame.
What makes you so sure that extropic is the second and not the first?
Fundamentally, gut feel from following the founder on Twitter. But if I had to explain: I don't understand the point of speeding up or getting true RNG; even for diffusion models this is not a big bottleneck, so it sounds more like a buzzword than actual practical technology.
Having a TRNG is easy: you just reverse-bias a zener diode, or use any number of other strategies that rely on physics for noise. Hype is a strategy they're clearly taking, and people in this thread are dismissive (and I get why - Extropic has been vague-posting for years and makes it sound like vaporware), but what does everyone think they're actually doing with the money? It's not a better dice roller...
What is it if not a better dice roller though? Isn't that what they are claiming it is? And also that this better dice rolling is very important (and I admittedly am not someone who can evaluate this)
Yes, I think they claim they are a far better dice roller in randomness and speed, and that this is very important. The first might be true, but I don't see why the second is in any way true. All of these need to be true for this company to make sense:
1. They build a chip that does random sampling far better than any GPU (is this even proven yet?)
2. They use a model architecture that exploits this sampling advantage, which means most of the computation must be concentrated in sampling. This might be true for energy-based models or some future architecture we have no idea about. AFAIK, this is not even true for diffusion.
3. This model architecture must outcompete autoregressive models in economically useful tasks, whether language modeling or robotics etc.; right now autoregressive transformers are still king across all tasks.
And then their chip will be bought by hyperscalers and their company will become successful. There are just so many ifs outside of them building their core technology that this whole project makes no sense. You could say that this is true for all startups; I don't think that's the case - this is just ridiculous.
This gives me Devs vibes (the 2020 TV series) - https://www.indiewire.com/awards/industry/devs-cinematograph...
Such an underrated TV show.
How should we think about how much effective compute is being done with these devices compared to classical (GPU) computing? Obviously FLOPs doesn't make sense, so what does?
Is this the new term for analog VLSI?
Or if we call it analog is it too obvious what the problems are going to be?
Question for the experts in the field: why does this need to be a CPU and not a dongle you plug into a server and query?
Hype aside, if you can get an answer to a computing problem with error bars in significantly less time, where precision just isn’t that important (such as LLMs) this could be a game changer.
Precision actually matters a decent amount in LLMs. Quantization is used strategically in places that will minimize performance degradation, and models are smart enough that some loss in performance still leaves a good model. I'm skeptical how well this would turn out, though it's probably always possible to remedy precision loss with a sufficiently larger model.
LLMs are inherently probabilistic. Things like ReLU throw out a ton of data deliberately.
No, that isn't throwing out data. Activation functions perform a nonlinear transformation to increase the expressivity of the function. If you did two matrix multiplications without a ReLU in between, your function would contain less information than with a ReLU in between.
How are you calculating "less information"?
I think what they meant was:
Two linear transformations compose into a single linear transformation. If you have y = W2(W1*x) = (W2*W1)*x = W*x where W = W2*W1, you've just done one matrix multiply instead of two. The composition of linear functions is linear.
The ReLU breaks this because it's nonlinear: ReLU(W1*x) can't be rewritten as some W*x, so W2(ReLU(W1*x)) can't collapse either.
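A tiny numpy check of that argument (just an illustration):

```python
import numpy as np

W1 = np.array([[1.0, -1.0],
               [0.0,  1.0]])
W2 = np.array([[1.0, 1.0]])
relu = lambda v: np.maximum(v, 0)

x = np.array([1.0, 2.0])

# Without a nonlinearity, two layers collapse into one matrix multiply.
print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))   # True: composition of linear maps is linear

# With ReLU in between, the map is no longer linear, so no single W can replace it.
f = lambda v: W2 @ relu(W1 @ v)
y = np.array([2.0, 1.0])
print(np.allclose(f(x) + f(y), f(x + y)))           # False: additivity fails, hence not linear
```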
Without nonlinearities like ReLU, many layers of a neural network could be collapsed into a single matrix multiplication. This inherently limits the function approximation that it can do, because linear functions are not very good at approximating nonlinear functions. And there are many nonlinearities involved in modeling speech, video, etc.
I like this but based on what I am seeing here and the THRML readme, I would describe this as "an ML stack that is fully prepared for the Bayesian revolution of 2003-2015." A kind of AI equivalent of, like, post-9/11 airport security. I mean this in a value-neutral way, as personally I think that era of models was very beautiful.
The core idea of THRML, as I understand it, is to present a nice programming interface to hardware where coin-flipping is vanishingly cheap. This is moderately useful to deep learning, but the artisanally hand-crafted models of the mid-2000s did essentially nothing at all except flip coins, and it would have been enormously helpful to have something like this in the wild at that time.
The core "trick" of the era was to make certain very useful but intractable distributions built on something called "infinitely exchangeable sequences" merely almost intractable. The trick, roughly, was that conditioning on some measure space makes those sequences plain-old iid, which (via a small amount of graduate-level math) implies that a collection of "outcomes" can be thought of as a random sample of the underlying distribution. And that, in turn, meant that the model training regimens of the time did a lot of sampling, or coin-flipping, as we have said here.
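(The "small amount of graduate-level math" is essentially de Finetti's representation theorem; roughly, in LaTeX:)

```latex
% de Finetti: an infinitely exchangeable sequence is a mixture of iid sequences,
% i.e. conditioned on a latent parameter/measure theta the observations are iid.
P(X_1 = x_1, \ldots, X_n = x_n)
  = \int_{\Theta} \prod_{i=1}^{n} P(X_i = x_i \mid \theta)\, \mu(\mathrm{d}\theta)
```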
Peruse the THRML README[1] and you'll see the who's who of techniques and modeling procedures of the time: "Gibbs sampling", "probabilistic graphical models", "energy-based models", and so on. All of these are weaponized coin flipping.
I imagine the terminus of this school of thought is basically a natively probabilistic programming environment. Garden-variety deterministic computing is essentially probabilistic computing where every statement returns a value with probability 1. So in that sense, probabilistic computing is a full generalization of deterministic computing, since an `if` might return a value with some probability other than 1. There was an entire genre of languages like this, e.g., Church. And now, 22 years later, we have our own hardware for it. (Incidentally, this line of inquiry is also how we know that conditional joint distributions are Turing complete.)
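Something like this toy Python sketch of the Church-style idea (my own illustration, not any particular language's actual API):

```python
import random

def flip(p=0.5):
    """A probabilistic primitive: True with probability p, so `if flip(p):` is a branch taken with probability p."""
    return random.random() < p

def model():
    rainy = flip(0.3)
    sprinkler = flip(0.1) if rainy else flip(0.5)
    wet_grass = flip(0.9) if (rainy or sprinkler) else flip(0.01)
    return rainy, wet_grass

# crude inference by rejection sampling: estimate P(rainy | grass is wet)
runs = [model() for _ in range(100_000)]
rainy_given_wet = [r for r, w in runs if w]
print(sum(rainy_given_wet) / len(rainy_given_wet))
```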
Tragically, I think, this may have arrived too late. This is not nearly as helpful in the world of deep learning, with its large, ugly, and relatively sample-free models. Everyone hates to hear that you're cheering from the sidelines, but this time I really am. I think it's a great idea, just too late.
[1]: https://github.com/extropic-ai/thrml/blob/7f40e5cbc460a4e2e9...
Really informative insight, thanks. I'm not too familiar with those models, is there any chance that this hardware could lead to a renaissance of sample-based methods? Given efficient hardware, would they scale to LLM size, and/or would they allow ML to answer some types of currently unanswerable questions?
Any time something costs trillionths of a cent to do, there is an enormous economic incentive to turn literally everything you can into that thing. Since the 50s “that thing” has been arithmetic, and as a result, we’ve just spent 70 years trying to turn everything from HR records to images into arithmetic.
Whether “that thing” is about to be sampling is not for me to say. The probability is certainly not 0 though.
Has anyone received the dev board? What did you do with it? Curious what this can do.
Looks like an artifact from Assassin's Creed or Halo.
i've followed them for a while and as just a general technologist and not a scientist, i have a probably wrong idea of what they do, but perhaps correcting it will let others write about it more accurately.
my handwavy analogy interpretation was that they were in effect building an analog computer for AI model training, using some ideas that originated in quantum computing. their insight is that since model training is itself probabilistic, you don't need discrete binary computation to do it; you just need something that implements the sigmoid function for training a NN.
they had some physics to show they could cause a bunch of atoms to polarize (conceptually) instantaneously using the thermodynamic properties of a material, and the result would be mostly deterministic over large samples. the result is what they are calling a "probabilistic bit", or pbit, which is an inferred state over a probability distribution. where the inference is incorrect, they just "get it in post": pushing training data through a network of these pbits is so much more efficient that it's faster to augment and correct the result in the model afterwards than to compute it directly with classical clock cycles.
It's finally here! Extropic has been working on this since 2022. I'm really excited to see how this performs in the real world.
This looks really amazing, if not unbelievable - to the point where it's almost too good to be real.
I have not seen benchmarks of Extropic's new computing hardware yet, and I'd need to hear from experts in the field of AI infrastructure at the semiconductor level whether this is legit.
I'm 75% convinced this is real, with 25% skepticism, and will reserve judgement until others have tried the hardware.
So my only question for the remaining 25%:
Is this a scam?
Too good to be true, incomprehensible jargon to go along...
I mean it sure looks like a scam.
I really like the design of it though.
I doubt it’s a scam. Beff might be on to something or completely delusional, but not actively scamming.
The best conmen have an abundance of confidence in themselves.
This one just released a prototype platform, "XTR-0", so if it's a fraud the jig will shortly be up.
https://extropic.ai/writing/inside-x0-and-xtr-0
I think it's more a concern that the hardware isn't useful in the real world, rather than that the hardware doesn't meet the description they provide of it.
Nice!
This is "quantum" computing, btw.
Actually it's not. Here's some stuff to read to get a clearer picture! https://extropic.ai/writing
It strictly is not, as no quantum phenomenon is being measured (hence why I used the quotes); but if all goes well w/ extropic you'll most likely end up doing quantum again.
Usually there's a negative correlation between the fanciness of a startup webpage and the actual value/product they'll deliver.
This gives "hype" vibes.
Interesting you say that, I had an instinctual reaction in that vein as well. I chalked it up to bias since I couldn’t think of any concrete examples. Something about the webpage being so nice made me think they’ve spent a lot of time on it (relative to their product?) Admittedly I’m nowhere close to even trying to understand their paper, but I’m interested in seeing what others think about it
I've seen it as well. One thing that's universally true about potential competitor startups in the field I work in is that the ones who don't actually have anything concrete to show have way nicer websites than ours (some have significantly more funding and still nothing to show).
I have a passing familiarity with the areas they talk about in the paper, and it feels... dubious. Mainly because of the dedicated accelerator problem. Even dedicated neural net accelerators are having difficulty gaining traction against general purpose compute units in a market that is ludicrously hot for neural net processing, and this is talking about accelerating Monte-Carlo processes which are pretty damn niche in application nowadays (especially in situations where you're compute-limited). So even if they succeed in speeding up that application, it's hard to see how worthwhile it would be. And it's not obvious from the publicly available information whether they're close to even beating the FPGA emulation of the concept which was used in the paper.
I'm more impressed that my laptop fans came on when I loaded the page.
It's the one attached to my TV that just runs movies/YT - I don't recall the last time I heard the fans.
They did say thermodynamic computing.
Same on a serious dev machine. That page just pegs a core at max, it's sort of impressive.