mindcrime 11 hours ago

I'll probably get flamed to death for saying this, but I like Jürgen. I mean, I don't know him in person (never met him) but I've seen a lot of his written work and interviews and what-not and he seems like an alright guy to me. Yes, I get it... there's that whole "ooooh, Jürgen is always trying to claim credit for everything" thing and all. But really, to me, it doesn't exactly come off that way. Note that he's often pointing out the lack of credit assigned even to people who lived and died centuries before him.

His "shtick" to me isn't just about him saying "people didn't give me credit" but it seems more "AI people in general haven't credited the history of the field properly." And in many cases he seems to have a point.

  • godelski 11 hours ago

    I think you sum up my feelings about him as well. He's a bit much sometimes but it's hard to deny that he's made monumental contributions to the field.

    It's also funny that we laugh at him when we also joke that in AI we just reinvent what people did in the '80s. He's just being more specific about the what and the who.

    Ironically, I think the problem is that we care too much about credit. It ends up getting hoarded rather than shared. We then just oversell our contributions, because if you make the incremental improvements that literally everyone makes, you get your work rejected for being incremental.

    I don't know what it is about CS specifically, but we have a culture problem of attribution and hype. We build on open source (it's libraries all the way down) but act like we did it all alone. We jump on bandwagons as if there were one right and immutable way to do certain things, until the bubbles pop and we laugh at how stupid anyone was to do such a thing. Yet we don't contribute back to the projects that form our foundation, we laugh at the "theory" we stand on, and we listen to the same hype-train people who got it wrong last time instead of turning to those who got it right. Why? It runs directly counter to the ideals of a group that loves to claim rationalism, "working from first principles", and "I care about what works".

    • voidhorse 10 hours ago

      > we laugh at the "theory" we stand on

      This aspect of the industry really annoys me to no end. People in this field are so allergic to theory (which is ironic because CS, of all fields, is probably one of the ones in which theoretical investigations are most directly applicable) that they'll smugly proclaim their own intelligence and genius while showing you a pet implementation of ideas that have been around since the 70s or earlier. Sure, most of the time they implement it in a new context, but this leads to a fragmented language in which the same core ideas are implemented N times with everyone's own particular, ignorant terminology choices (see for example, the wide array of names for basic functional data structure primitives like map, fold, etc. that abound across languages).
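
      As a concrete (if trivial) illustration of that fragmentation, here's the same fold primitive in Python, with the names it goes by elsewhere noted in comments (the cross-language names are from memory, so treat them as illustrative):

        from functools import reduce

        # The same "fold" primitive hides behind a different name almost everywhere:
        #   Haskell: foldl / foldr      Ruby: inject / reduce
        #   C# LINQ: Aggregate          OCaml: List.fold_left
        #   JavaScript: Array.prototype.reduce
        total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4], 0)  # => 10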

      • godelski 9 hours ago

        My favorite Knuth quote[0]

          If you find that you're spending almost all your time on theory, start turning some attention to practical things; it will improve your theories. If you find that you're spending almost all your time on practice, start turning some attention to theoretical things; it will improve your practice. 
        
        But yeah, in general I hate how people treat theory, acting as if it has no economic value. Certainly both matter, no one is denying that. But there's a strong bias against theory and I'm not sure why. Let's ask ourselves: what is the economic impact of calculus? What about just the work of Leibniz or Newton? I'm pretty confident that it's significantly north of billions of dollars a year. And we what... want to do less of this type of impactful work? It seems a handful of such examples more than covers any money wasted on research that has failed (or "failed").

        The problem I see with our field, which leads to a lot of hype, is the belief that everything is simple. This just creates "yes men" and people who do not think. Which I think ends up with people hearing "no" when someone is just acting as an engineer. The job of an engineer is to problem solve. That means you have to identify problems! Identifying them and presenting solutions is not "no", it is "yes". But for some reason it is interpreted as "no".

          > see for example, the wide array of names for basic functional data structure primitives like map, fold, etc. that abound across languages
        
        Don't get me started... but if a PL person goes on a rant here, just know, yes, I upvoted you ;)

        [0] You can probably tell I came to CS from "outside". I have a PhD in CS (ML) but undergrad was Physics. I liked experimental physics because I came to the same conclusion as Knuth: Theory and practice drive one another.

      • mindcrime 10 hours ago

        I get weird looks sometimes lately when I point out that "agents" are not a new thing, and that they date back at least to the 1980's and - depending on how you interpret certain things[1] - possibly back to the 1970's.

        People at work have, I think, gotten tired of my rant about how people who are ignorant of the history of their field have a tendency to either re-invent things that already exist, or to be snowed by other people who are re-inventing things that already exist.

        I suppose my own belief in the importance of understanding and acknowledging history is one reason I tend to be somewhat sympathetic to Schmidhuber's stance.

        [1]: https://en.wikipedia.org/wiki/Actor_model

        • godelski 9 hours ago

          Another interesting thing I see is how people will refuse to learn history thinking it will harm their creativity[0].

          The problem with these types of interpretations is that they're fundamentally authoritarian, whereas research itself is fundamentally anti-authoritarian. To elaborate: trust but verify. You trust the results of others, but you replicate and verify. You dig deep and get to the depth (progressive knowledge necessitates higher orders of complexity). If you do not challenge or question results then yes, I'd agree, knowledge harms. But if you're willing to say "okay, it worked in that exact setting, but what about this change?" then there is no problem[1]. In that setting, more reading helps.

          I just find these mindsets baffling... Aren't we trying to understand things? You can really only brute force new and better things if you are unable to understand. We can make so much more impact and work so much faster when we let understanding drive as much as outcome.

          [0] https://bsky.app/profile/chrisoffner3d.bsky.social/post/3liy...

          [1] Other than Reviewer #2

          • bluefirebrand 9 hours ago

            > Aren't we trying to understand things?

            Unfortunately, for most of us, no. We are trying to deliver business units to increase shareholder value

            • godelski 8 hours ago

              I think you should have continued reading from where you quoted.

                >> Aren't we trying to understand things? ***You can really only brute force new and better things if you are unable to understand. We can make so much more impact and work so much faster when we let understanding drive as much as outcome.***
              
              I'm arguing that if you want to "deliver business units to increase shareholder value" that this is well aligned with "trying to understand things."

              Think about it this way:

                If you understand things:
                  You can directly address shareholder concerns and adapt readily to market demands. You do not have to search, you already understand the solution space.
              
                If you do not understand things:
                  You cannot directly address shareholder concerns and must search over the solution space to meet market demands.
              
              Which is more efficient? It is hard to argue that search through an unknown solution space is easier than path optimization over a known solution space. Obviously this is the highly idealized case, but this is why I'm arguing that these are aligned. If you're in the latter situation you advantage yourself by trying to get to the former. Otherwise you are just blindly searching. In that case technical debt becomes inevitable and significantly compounds unless you get lucky. It becomes extremely difficult to pivot as the environment naturally changes around you. You are only advantaged by understanding, never harmed. Until we realize this we're going to continue to be extremely wasteful, resulting in significantly lower returns for shareholders or any other measure of value.

        • voidhorse 9 hours ago

          I'm in the same boat. At least there's a couple of us who think this way. I'm always amazed when I run into people who think neural nets are a relatively recent thing, and not something that emerged back in the 1940s-50s. People tend to implicitly equate the emergence of modern applications of ideas with the emergence of the ideas themselves.

          I wonder at times if it stems back to flaws in the CS pedagogy. I studied philosophy and literature in which tracing the history of thought is basically the entire game. I wonder if STEM fields, since they have far greater operational emphasis, lose out on some of this.

          • mindcrime 9 hours ago

            > people who think neural nets are a relatively recent thing, and not something that emerged back in the 1940s-50s

            And to bring this full circle... if you really (really) buy into Schmidhuber's argument, then we should consider the genesis of neural networks to date back to around 1800! I think it's fair to say that that might be a little bit of a stretch, but maybe not that much so.

            • godelski 6 hours ago

              Tbf, he literally says that in the interview

              > Around 1800, Carl Friedrich Gauss and Adrien-Marie Legendre introduced what we now call a linear neural network, though they called it the “least squares method.” They had training data consisting of inputs and desired outputs, and minimized training set errors through adjusting weights, to generalize on unseen test data: linear neural nets!
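
              To make that concrete, here's a minimal sketch (my own NumPy illustration, not from the interview) of such a "linear neural net": a single weight vector trained by gradient descent on squared error, which converges to the same weights as the closed-form least-squares solution of Gauss and Legendre:

                import numpy as np

                rng = np.random.default_rng(0)
                X = rng.normal(size=(100, 3))                  # training inputs
                w_true = np.array([2.0, -1.0, 0.5])
                y = X @ w_true + 0.01 * rng.normal(size=100)   # desired outputs

                # "Linear neural net": one weight layer, no hidden units, no nonlinearity.
                w = np.zeros(3)
                lr = 0.01
                for _ in range(2000):
                    grad = 2 * X.T @ (X @ w - y) / len(y)      # gradient of mean squared error
                    w -= lr * grad

                # Closed-form least squares gives (numerically) the same weights.
                w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
                print(np.allclose(w, w_ls, atol=1e-3))         # True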

  • noosphr 11 hours ago

    It's a clash of cultures.

    He is an academic who cares about understanding where ideas came from. His detractors need to be the smartest people in the room to get paid millions and raise billions.

    It's not very sexy to say 'Oh yes, we are just using an old Soviet learning algorithm on better hardware. Turns out we would have lost the Cold War if the USSR had access to a 5090', which won't get you the billions you need to build the supercomputers that push the state of the art today.

    • ks2048 10 hours ago

      It seems his "detractors" (or at least his foes) are also academics - i.e. the same culture - they just cite Hinton and LeCun instead of Schmidhuber.

      • noosphr 9 hours ago

        It helps your career to cite the head of AI at Facebook and the former head of Google. Not so much some academician who worked in the 1970s in the Kazakh Soviet Socialist Republic.

        • ks2048 9 hours ago

          I believe the Schmidhuber-ignoring (according to him) began before those two were at Google/Meta. But, I suppose NYU/Bell Labs and U-Toronto will be more likely to be cited than somewhere in Munich or Switzerland...

  • bjornsing 6 hours ago

    Yeah I have the same feeling. I also think it’s weird to say he’s clearly wrong. I mean, it’s a very subtle question exactly where you cross the line that comes with an obligation to credit others. All ideas build on each other and are to some extent mashups of ideas that came before them. It’s not something you can be clearly wrong about in an objective sense, and Jürgen’s view seems consistent.

    Looking at how credit and attribution works in science today (google "citation rings" for example) I can honestly say that I'd much prefer to live in a world where Jürgen did invent most of AI, rather than the one we're in now.

goldemerald 12 hours ago

No discussion with Schmidhuber is complete without the infamous debate at NIPS 2016 https://youtu.be/HGYYEUSm-0Q?t=3780 . One of my goals as an ML researcher is to publish something and have Schmidhuber claim he's already done it.

But more seriously, I'm not a fan of Schmidhuber because even if he truly did invent all this stuff early in the 90s, his inability to see its application to modern compute held the field back by years. In principle, we could have had GANs and self-supervised models years earlier if he had "revisited his early work". It's clear to me no one read his early papers when developing GANs/self-supervision/transformers.

  • Vetch 9 hours ago

    > his inability to see its application to modern compute held the field back by years.

    I find Schmidhuber's claim on GANs to be tenuous at best, but his claim to have anticipated modern LLMs is very strong, especially if we are going to be awarding Nobel Prizes for Boltzmann Machines. In https://people.idsia.ch/%7Ejuergen/FKI-147-91ocr.pdf, he really does concretely describe a model that unambiguously anticipated modern attention (technically, either an early form of hypernetworks or a more general form of linear attention, depending on which of its proposed update rules you use).
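
    For the curious, here is a rough sketch (my own paraphrase in NumPy, with stand-in shapes rather than the paper's notation) of the outer-product fast-weight update from that 1991 report, the rule that later work reframed as unnormalized linear attention: a slow network emits key/value/query vectors that program and read a fast weight matrix.

      import numpy as np

      d = 4
      W_fast = np.zeros((d, d))   # fast weights, reprogrammed at every time step
      rng = np.random.default_rng(0)

      for t in range(10):
          # In the 1991 scheme a trained "slow" net produces these from the input;
          # random projections here just to show the shape of the update rule.
          k, v, q = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)

          W_fast += np.outer(v, k)   # "write": additive outer-product update
          y = W_fast @ q             # "read": apply the fast net to a query

      # Unrolled, y_t = sum over i <= t of v_i * (k_i . q_t), i.e. unnormalized
      # linear attention over all previous key/value pairs.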

    I also strongly disagree with the idea that his inability to practically apply his ideas held anything back. In the first place, it is uncommon for a discoverer or inventor to immediately grasp all the implications and applications of their work. Secondly, the key limiter was parallel processing power; it's not a coincidence ANNs took off around the same time GPUs were transitioning away from fixed-function pipelines (and Schmidhuber's lab were pioneers there too).

    In the interim, when most derided neural networks, his lab was one of the few that kept research on neural networks and their application to sequence learning going. Without their contributions, I'm confident Transformers would have happened later.

    > It's clear to me no one read his early papers when developing GANs

    This is likely true.

    > self-supervision/transformers.

    This is not true. Transformers came after lots of research on sequence learners, meta-learning, generalizing RNNs and adaptive alignment. For example, Alex Graves' work on sequence transduction with RNNs eventually led to the direct precursor of modern attention. Graves' work was itself influenced by work with and by Schmidhuber.

  • andy99 12 hours ago

    It's very common in science for people to have had results whose significance they didn't understand, which were later popularized by someone else.

    There is the whole thing with Damadian claiming to have invented MRI (he didn't) when the Nobel prize went to Mansfield and Lauterbur (see the Nobel prize part of the article). https://en.m.wikipedia.org/wiki/Paul_Lauterbur

    And I've seen other less prominent examples.

    It's a lot like the difference between ideas and execution and people claiming someone "stole" their idea because they made a successful business from it.

  • godelski 11 hours ago

      > if he had "revisited his early work".
    
    Given that you're a researcher yourself, I'm surprised by this comment. Have you not yourself experienced the harsh rejection of "not novel"? That sounds like a great way to get stuck in review hell. (I know I've experienced this even when doing novel things, just by relating them too closely to other methodologies when explaining: "oh, it's just ____".)

    The other part seems weird too. Who isn't upset when their work doesn't get recognized and someone else gets the credit? Are we not all human?

  • nextos 11 hours ago

    I think he did understand both the significance of his work and the importance of hardware. His group pioneered porting models to GPUs.

    But personal circumstances matter a lot. He was stuck at IDSIA in Lugano, i.e. a relatively small and not-so-well-funded corner of academia.

    He could have done much better in industry, with access to lots of funding, a bigger headcount, and serious infrastructure.

    Ultimately, models matter much less than infrastructure. Transformers are not that important; other architectures such as deep SSMs or xLSTM are able to achieve comparable results.

  • chermi 8 hours ago

    I don't understand how he's at fault for the field being behind where it maybe could've been, especially the language "held back". Did he actively discourage people from trying his ideas as compute grew?

  • cma 10 hours ago

    His group actually used GPUs early on, earlier than most, and won a competition but didn't get the same press.

jandrewrogers 7 hours ago

As someone who was heavily involved in AI research in the 1990s, of a Schmidhuber-flavored variety though I’ve never interacted with him, I do think there is some underlying truth to his general point. Most of the current theory isn’t actually new, and at least some of the people who are kind of pretending it was invented out of whole cloth recently should know better. But at the time a lot of this was invented, the hardware simply wasn’t capable enough to reduce it to practice — it is why I got out — and a lot of people who are piling into the field are simultaneously disinterested in that history and claiming every idea that crosses their mind as a novel invention.

This is not unique to AI. Many other subfields of computer science have a similar dynamic e.g. databases. I’ve seen people claiming novelty in ideas that were fully proven out in real systems in the 1970s (and abandoned, for good reason) but are oblivious to this fact because if it isn’t trivially discoverable on the Internet then it doesn’t exist. There is still a lot of interesting computer science research that only exists on physical paper, if you can find a copy. Maybe we should be better about digitizing the pre-Internet research but the value of that research isn’t always obvious and the terminology has changed. We don’t give enough credit to how clever some of those early researchers actually were, working from much more primitive foundations.

Having gone spelunking a number of times into the old literature, it never ceases to amaze me the number of times I have found an insight that is neither known nor cited in modern literature. Literally lost knowledge. It is ironic that computer science, of all fields, should suffer from this.

noosphr 11 hours ago

Before people say that he is claiming credit for things he didn't do, or that he invented everything, please read his own paper on the subject:

https://people.idsia.ch/~juergen/deep-learning-history.html

The history section starts in 1676.

  • swyx 11 hours ago

    > 1676: The Chain Rule For Backward Credit Assignment

    Schmidhuber is nothing if not a stickler for backward credit assignment

nharada 12 hours ago

Doesn’t he know the Turing Award is really just a generalization of the Fields Medal, an award that actually came years earlier?

  • triceratops 11 hours ago

    I chuckled but I also maybe didn't understand. Is the joke that computer science is a generalization of math? That can't be right.

    • dgacmu 11 hours ago

      The joke is that schmidhuber is known for (rightly or wrongly) pointing to modern contributions in deep neural networks and saying they're just a trivial generalization/adaptation/etc. of work he did 30 years ago.

  • logicchains 12 hours ago

    I'm sure he wouldn't object to a Fields Medal either.

belval 12 hours ago

Every so often Schmidhuber is brought back to the front page of HN; people will argue that he "invented it all" while others will say that he's a posteriori claiming all the good ideas were his.

Relativity Priority Dispute: https://en.wikipedia.org/wiki/Relativity_priority_dispute

We all stand on the shoulders of giants, things can be invented and reinvented and ideas can appear twice in a vacuum.

  • kleiba 12 hours ago

    But as far as I understand, Schmidhuber's claim is more severe: namely that Bengio, Hinton and LeCun intentionally failed to cite prior work by others (including himself) and instead only cited each other in order to boost their respective scientific reputations.

    I personally think that he's not doing himself or his argument any favors by presenting it the way he does. While he basically argues that science should be totally objective and neutral, there's no denying that if you put yourself in a less likeable light, you're not going to make any friends.

    On the other hand, he has gone to great lengths to compile detailed references to support his points. I can appreciate that because it makes his argument a lot less hand-wavy: you can go to his blog and compare the cited references yourself. Except that I couldn't, because I'm not an ML expert.

    • srean 21 minutes ago

      > Hinton and LeCun intentionally failed to cite prior work by others (including himself)

      This is a well established and easy to verify fact.

      A peer of mine from Hinton's lab once told a story of how one researcher (presumably from his lab or from one of his coworkers' labs) intentionally misspelled citations so that the author's citation count would not go up on Google Scholar, CiteSeer, etc.

      He did not name names, though.

    • jll29 11 hours ago

      I have seen many cases where people -- accidentally as well as intentionally -- copied or re-invented the work of others (a friend posted on LinkedIn that someone else had plagiarized his whole Ph.D. thesis, including the title, so that only the name was changed; only the references at the end, to the separately published papers on which the individual chapters were based, still had my friend's name in them, so you could see it was a fake thesis).

      If a bona fide scientist makes a mistake about missing attribution, they would correct it as soon as possible. Many, however, would not correct such a re-discovery, because it's embarrassing.

      But the worst is when people don't even imagine anything like what they are working on could already exist, and they don't even bother finding and reading related work -- in other words, ignorance. Science deserves better, but there are more and more ignorant folks around that want to ignore all work before them.

      • godelski 10 hours ago

          > Many, however, would not correct such a re-discovery, because it's embarrassing.
        
        This is a culture thing that needs to change.

        I'm a pretty big advocate of open publishing and avoiding the "review process" as it stands today. The reason is that we shouldn't be chasing these notions of novelty and "impact". They are inherently subjective and lead to these issues of credit. Your work isn't diminished because you independently invented it; rather, that strengthens your work. There's more evidence! Everything is incremental, and so all this stuff does is make us focus more on trying to show our uniqueness rather than showing our work. The point of publishing is to communicate. The real peer review only happens after communicating: when people review, replicate, build on, or build against. We're just creating an overly competitive environment. It is only "embarrassing" because it "undermines" the work. It only "undermines" the work because of how we view credit.

        Consider this as a clear example. Suppose you want to revisit a work, but just scale it up and run it on modern hardware. You could get state-of-the-art results, but if you admit to such a thing with no claimed changes (let's say you literally just increase the number of layers) you'll never get published. You'll get responses about how we "already knew this" and "obviously it scales". But no one tested it... right? That's just bad for science. It's bad if we can't do mundane boring shit.

  • Lerc 12 hours ago

    I can see how someone could feel like that if they looked at the world in a particular way.

    I have had plenty of ideas in the last few years that I have played with that I have seen published in papers in the following months. Rather than feeling like "I did it first" I feel gratified that not only was I on the right track, but someone else has done the hard slog.

    Most papers are not published by people who had the idea the day before. Their work goes back further than that. Refining the idea, testing it and then presenting the results takes time, sometimes years, occasionally decades.

    If this happens to you, don't think "Hey! That idea belongs to me!". Thank them for proving you right.

    Now if they patent it, that's a different story. I don't think the ideas that sometimes float through my brain belong to me, but I'm not keen on them belonging to someone else either.

    • kleiba 12 hours ago

      I think that's slightly misrepresenting Schmidhuber's case though, because he does not just say "oh, I already had that same idea before you, I just never followed up on it". He is usually referring to work that he or members of his group (or third-party researchers for that matter) did, in fact, already publish.

    • Kranar 10 hours ago

      This claim is sheer hubris. There is a big difference between spending years researching and working on a nascent subject and publishing it in academic journals, only to have someone else come along and use most of your ideas without attribution and reaping huge rewards for doing so... and having some random ideas float about in your head on a nice summer afternoon by the lake, and then a few months later find out that someone else also shared those same ideas and managed to work through them into something fruitful.

      Now whether what Schmidhuber claims is what actually happened or not I don't know... but that is his claim and it's fundamentally different from what you are describing.

esafak 10 hours ago

The fact that the field of machine learning keeps "discovering" things already established in other fields and christening them with new names does lend some credence to Schmidhuber. The field is more industrial than academic, and cares about money more than credit, and industrial-scale data theft is all in a day's work.

As another commenter said, his misfortune is being in a lab with no industrial affiliation.

FL33TW00D 11 hours ago

If you guys were the inventors of Facebook, you’d have invented Facebook

banq 11 hours ago

Why did Google give birth to the Transformer? Because Google created an ecosystem where everything could flourish, while the old man in Switzerland lacked such an environment—what could even the smartest and greatest individual do against that?

As an organization, fostering an organically growing context is like governing a great nation with delicate care. A bottom-up (organic growth) environment is the core context for sustained innovation and development!

starchild3001 7 hours ago

He may be a Turing Award-worthy researcher (hey, he invented the LSTM, and even I knew his name!) but modesty surely isn't his biggest strength :))

Mond_ 12 hours ago

Oh boy, I sure can't wait to see the comments on this one!

Schmidhuber sure seems to be a personality, and so far I've mostly heard negative things about his "I invented this" attitude to modern research.

  • kleiba 12 hours ago

    A lot of this is because nobody likes braggers - however, in all fairness, his argument is that a lot of what is considered modern ML is based on many previous results, including but not limited to his own research.

voidhorse 10 hours ago

I haven't read the article or paper yet, but if the gist I'm getting from the comments is correct, Schmidhuber is generally correct about industry having horrible citation practices. I even see it at a small scale at work. People often fail or forget to mention the others that helped them generate their ideas.

I would not be at all surprised if this behavior extended to research papers published by people in industry as opposed to academia. Good citation practice simply does not exist in industry. We're lucky if any of the thousand blog posts that reimplement some idea cranked out ages ago in academic circles are even aware of the original effort, let alone cite it. Citations are few and far between in industry literature generally. Obviously there are exceptions, and this is just my personal observation; I haven't done or found any kind of meta literary study illustrating such.

ur-whale 11 hours ago

If there ever was an example of a terrible personality getting in the way of a career, Schmidhuber is most definitely it.

You may have had many brilliant ideas, but if everyone makes an abrupt 180 when they see the tip of your beard turn the corner at conferences, that can't be a good signal for getting awards.

  • cycomanic 7 hours ago

    You do see how your statement is problematic when we're talking about scientific attribution and scientific prizes for achievements? I'd also point out that people who point out inconvenient truths are often not popular; that does not make them wrong, though (in the general sense: I don't know enough about the foundations of ML to pass judgement here).