There's someone with this comment in every thread. Meanwhile, nobody bothers to answer because they're busy getting value from it. Please take the time to learn; it will give you value.
I’m a consultant. Having looked at several enterprises, there's a lot of work going into building things that don't really work.
The bigger the ambition, the harder they're failing. Some well designed, isolated use cases are OK. Mostly things about listening to and summarizing text to aid humans.
I have yet to see a successful application that is generating good content. IMO, replacing the first draft of content creation and having experts review and fix it is, like, the stupidest strategy you can pick. The people you replace are the people at the bottom of the pyramid who are supposed to do this work to upskill and become domain experts so they can later review stuff. If they're no longer needed, you're going to one day lose your reviewer, and with it, the ability to assess your generated drafts. It's a footgun.
I mean, no, not generally. But the success rate of other tools is much higher.
A lot of companies are trying to build these general purpose bots that just magically know everything about the company and have these big knowledge bases, but they just don't work.
I'm someone who generally was a "doubter", but I've dramatically softened my stance on this topic.
Two things:
I was casually watching Andreas Kling's streams on Ladybird development (where he was developing a JIT compiler for JS) and was blown away at the accuracy of completions (and the frequency of those completions)
Prior to this, I'd only ever copypasta'd code from ChatGPT output on occasion.
I started adopting the IDE/Editor extensions and prototyping small projects.
There are now small tools and utilities I've written that I'd not have written otherwise, or that would have taken twice the time had I not used these tools.
With that said, they'd be of no use without oversight, but as a productivity enhancement, the benefits are enormous.
> Meanwhile, no one answers this because they are getting value.
You're literally doing the same thing you're accusing others of. Every HN thread is full of AI boosters claiming AI to be the future with no backing evidence.
Riddle me this. If all these people are "getting value", why are all these companies losing horrendous amounts of money? Why has nobody figured out how to be profitable?
> Please take the time to learn, it will give you value.
Yeah, yeah, just prompt engineer harder. That'll make the stochastic parrot useful. Anyone who has criticism just does so because they're dumb and you're smart. Same as it always was. Everyone opposed to the metaverse just didn't get it bro. You didn't get NFTs bro. You didn't get blockchain bro.
None of these previous bubbles had money in them (beyond scamming idiots). If AI wants to prove it's not another empty tech bubble, pay up. Show me the money. It should be easy, if it's automating so many expensive man-hours of labour. People would be lining up to pay OpenAI.
> Riddle me this. If all these people are "getting value", why are all these companies losing horrendous amounts of money? Why has nobody figured out how to be profitable?
While I agree that LLMs are not currently working great for most envisioned use cases, this premise is not a good argument. Large LLM providers are not trying to be profitable at the moment. They're trying to grow, and that's pretty sensible.
Uber was the poster child of this, and for all the mockery, Uber is now, without qualification, a profitable company.
> Why has nobody figured out how to be profitable?
From what I've seen claimed about OpenAI finances, this is easy: It's a Red Queen's race — "it takes all the running you can do, to keep in the same place".
If their financial position was as simple as "we run this API, we charge X, the running cost is Y", then they're already at X > Y.
But if that was all OpenAI were actually doing, they'd have stopped developing new versions or making the existing models more efficient some time back, while the rest of the industry kept improving their models and lowering their prices, and they'd be irrelevant.
> People would be lining up to pay OpenAI.
They are.
Not that this is either sufficient or necessary to actually guarantee anything about real value. For lack of sufficiency: people collectively paid a lot for cryptocurrencies and NFTs, too (and, before then and outside tech, homeopathic tinctures and sub-prime mortgages). For lack of necessity: there are plenty of free-to-download models.
I get a huge benefit even just from the free chat models. I could afford to pay for better models, but why bother when free is so good? Every time a new model comes out, the old paid option becomes the new free option.
That's what puzzles me now. Everyone with a semblance of engineering expertise knows that if you start with a tool and try to find a problem it could solve, you're doing it wrong. The right way is the opposite: you start with a problem and find the best tool to solve it, and if that's the new shiny tool, so be it, but most of the time it's not.
Except the whole tech world starting with the CEOs seems to do it the "wrong" way with LLMs. People and whole companies are encouraged to find what these things might be actually useful for.
• Build toys that would otherwise require me to learn new APIs (I can read python, but it's not my day job)
• Learn new things like OpenSCAD
• To improve my German
• Learn about the world by allowing me to take photos of things in this world that I don't understand and ask them a question about the content, e.g. why random trees have bands or rectangles of white paint on them
• Help with shopping, by taking a photo of the supermarket I happen to be in and asking them where I should look for some item I can't find
• Help with meal prep, by allowing me to get a recipe based on what food and constraints I've got at hand rather than the traditional method of "if you want x, buy y ingredients"
Even if they're just an offline version of Wikipedia or Google, they're already a more useful interface for the same actual content.
For a company that sees itself as the undisputed leader and that wants to raise $7 trillion to build fabs, they deserve some of the heaviest levels of scrutiny in the world.
If OpenAI's investment prospectus relies on them reaching AGI before the tech becomes commoditized, everyone is going to look for that weakness.
Everyone's comparing o1 and Claude, but in my experience neither really works well enough for coding to justify paying for them. What I really want is a mode where they ask clarifying questions, ideally many of them, before spitting out an answer. That would go a long way toward producing something with more value than an auto-complete.
Just tell it to do that and it will. Whenever I ask an AI for something and I'm pretty sure it doesn't have all the context I literally just say "ask me clarifying questions until you have enough information to do a great job on this."
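If you want to bake that in rather than typing it every time, here's a minimal sketch using the current openai Python SDK. The model name and the exact instruction wording are just placeholders, not a recommendation; it assumes OPENAI_API_KEY is set in the environment.

    # Sketch: force a clarifying-questions pass before any answer.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    SYSTEM = (
        "Before answering, ask me clarifying questions until you have enough "
        "information to do a great job. Only answer once I've replied to them."
    )

    def ask(question: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": question},
            ],
        )
        return resp.choices[0].message.content

    print(ask("Refactor my auth module to support SSO."))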
And that chain of prompts, combined with the improved CoT reasoner, would produce much better results. More in line with what the coming agentic era promises.
Yes. You can only do so much with the information you get in. The ability to ask good questions, not just of itself in internal monologue style, but actually of the user, would fundamentally make it better since it can get more information in.
As it is now, it has a bad habit of, if it can't answer the question you asked, instead answering a similar-looking question which it thinks you may have meant. That is of course a great strategy for benchmarks, where you don't earn any points for saying you don't know. But it's extremely frustrating for real users, who didn't read their question from a test suite.
I know multiple people that carefully prompt to get that done. The model outputs tokens in order and can't turn around, so you need to make sure the clarifying questions strictly come before the answer; otherwise the system can and will come up with post-hoc "reasoning".
Just today I got Claude to convert a company’s PDF protocol specification into an actual working python implementation of that protocol. It would have been uncreative drudge work for a human, but I would have absolutely paid a week of junior dev time for it. Instead I wrote it alongside AI and it took me barely more than an hour.
The best part is, I’ve never written any (substantial) python code before.
I have to agree. It's still a bit hit or miss, but the hits are a huge time and money saver especially in refactoring. And unlike what most of the rather demeaning comments in those HN threads state, I am not some 'grunt' doing 'boilerplate work'. I mostly do geometry/math stuff, and the AIs really do know what they're talking about there sometimes. I don't have many peers I can talk to most of the time, and Claude is really helping me gather my thoughts.
That being said, I definitely believe it's only useful for isolated problems. Even with Copilot, I feel like the AIs just lack a bigger context of the projects.
Another thing that helped me was designing an initial prompt that really works for me. I think most people just expect to throw in their issue and get a tailored solution, but that's just not how it works in my experience.
It would seem you don't care too much about verifying its output or about its correctness. If you did, it wouldn't take you just an hour. I guess you'll let correctness be someone else's problem.
In my intuition it makes sense that there is going to be some significant friction in LLM development going forward. We're talking about models that will cost upwards of $1bn to train. Save for a technological breakthrough, GPT-6/7 will probably have to wait for hardware to catch up.
I think the main bottleneck right now is training data - they've basically exhausted all public sources of data, so they have to either pay humans to generate new data from scratch or pay for the reasoning models to generate (less useful) synthetic training data. The next bottleneck is hardware, and the least important bottleneck is money.
What I find odd is that o1 doesn't support attaching text documents to chats the way 4o does. For a model that specializes in reasoning, reading long documents seems like a natural feature to have.
If Sama ever reads this: I have no idea why users don't seem to focus on this, but it would be really good to prioritise being able to select which model you use with the custom myGPTs. I know this may be hard or not possible without recreating them, but as far as I can tell it still isn't possible.
I don't think most customers realise how much better the models work with custom GPTs.
"When using custom instructions or files, only GPT-4o is available". Straight out of the ChatGPT web interface when you try to select which model you want to use.
Train for what? For making videos? Train on people's comments? There's a lot of garbage and AI slop on YouTube; how would that be sifted out? I think there's more value here on HN in terms of training data, but even then, to what avail?
YouTube is such a great multimodal dataset—videos, auto-generated captions, and real engagement data all in one place. That’s a strong starting point for training, even before you filter for quality. Microsoft’s Phi-series models already show how focusing on smaller, high-quality datasets, like textbooks, can produce great results. You could totally imagine doing the same thing with YouTube by filtering for high-quality educational videos.
Down the line, I think models will start using video generation as part of how they “think.” Picture a version of GPT that works frame by frame—ask it to solve a geometry problem, and it generates a sequence of images to visualize the solution before responding. YouTube’s massive library of visual content could make something like that possible.
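Even a crude first pass over video metadata gets you surprisingly far before any expensive model-based scoring. A toy sketch of that kind of filter; the field names, keywords, and thresholds are made up for illustration, not from any real pipeline:

    from dataclasses import dataclass

    @dataclass
    class Video:
        title: str
        duration_s: int
        views: int
        likes: int
        has_captions: bool

    EDU_HINTS = ("lecture", "tutorial", "explained", "course", "proof")

    def looks_educational(v: Video) -> bool:
        engagement = v.likes / max(v.views, 1)
        return (
            v.has_captions
            and 120 <= v.duration_s <= 3600           # skip shorts and marathons
            and engagement > 0.01                     # crude usefulness signal
            and any(h in v.title.lower() for h in EDU_HINTS)
        )

    videos = [
        Video("Linear algebra explained", 900, 200_000, 9_000, True),
        Video("CRAZY prank compilation", 400, 5_000_000, 20_000, False),
    ]
    print([v.title for v in videos if looks_educational(v)])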
How about just an updated GPT-4o with all the newer data? It would go a long way. Currently it doesn't know about anything after Oct 2023 (without having to do a web search).
What we can reasonably assume from statements made by insiders:
They want a 10x improvement from scaling and a 10x improvement from data and algorithmic changes
The sources of public data are essentially tapped
Algorithmic changes will be an unknown to us until they release, but from published research this remains a steady source of improvement
Scaling seems to stall if data is limited
So with all of that taken together, the logical step is to figure out how to turn compute into better data to train on. Enter strawberry / o1, and now o3
They can throw money, time, and compute at thinking about and then generating better training data. If the belief is that N billion new tokens of high quality training data will unlock the leap in capabilities they’re looking for, then it makes sense to delay the training until that dataset is ready
With o3 now public knowledge, imagine how long it’s been churning out new thinking at expert level across every field. OpenAI’s next moat may be the best synthetic training set ever.
At this point I would guess we get 4.5 with a subset of this - some scale improvement, the algorithmic pickups since 4 was trained, and a cleaned and improved core data set but without risking leakage of the superior dataset
When 5 launches, we get to see what a fully scaled version looks like with training data that outstrips average humans in almost every problem space
Then the next o-model gets to start with that as a base and reason? It's likely to be remarkable
"With o3 now public knowledge, imagine how long it’s been churning out new thinking at expert level across every field."
I highly doubt that. o3 is many orders of magnitude more expensive than paying subject matter experts to create new data. It just doesn't make sense to pay six figures in compute to get o3 to make data a human could make for a few hundred dollars.
Yes, I think they had to push this reveal forward because their investors were getting antsy with the lack of visible progress to justify continuing rising valuations. There is no other reason a confident company making continuous rapid progress would feel the need to reveal a product that 99% of companies worldwide couldn't use at the time of the reveal.
That being said, if OpenAI is burning cash at lightspeed and doesn't have to publicly reveal the revenue they receive from certain government entities, it wouldn't come as a surprise if they let the government play with it early on in exchange for some much needed cash to set on fire.
EDIT: The fact that multiple sites seem to be publishing GPT-5 stories similar to this one leads one to conclude that the o3 benchmark story was meant to counter the negativity from this and other similar articles that are just coming out.
Someone needs to dress up Mechanical Turk and repackage it as an AI company…..
That’s basically every AI company that existed before GPT3
That’s an interesting idea. What if OpenAI funded medical research initiatives in exchange for exclusive training rights on the research?
It would be orders of magnitude cheaper to outsource to humans.
Not as sexy to investors though
> With o3 now public knowledge, imagine how long it’s been churning out new thinking at expert level across every field. OpenAI’s next moat may be the best synthetic training set ever.
Even taking OpenAI and the benchmark authors at their word, they said it consumes at least tens of dollars per task to hit peak performance. How much would it cost to have it produce a meaningfully large training set?
That's the public API price isn't it?
There is no public API for o3 yet; those are the numbers they revealed in the ARC-AGI announcement. Even if they were public API prices, we can't assume they're making a profit on them for as long as they're billions in the red overall every year; it's entirely possible that the public API prices are less than what OpenAI is actually paying.
> OpenAI’s next moat
I don't think oai has any moat at all. If you look around, QwQ from Alibaba is already pushing o1-preview performance. I think oai is only ahead by 3-6 months at most.
I’m curious how, if at all, they plan to get around compounding bias in synthetic data generated by models trained on synthetic data.
Everyone's obsessed with new training tokens... It doesn't need to be more knowledgeable, it just needs to practice more. Ask any student: practice is synthetic data.
That leads to overfitting in ML land, which hurts overall performance.
We know that unique data improves performance.
These LLM systems are not students…
Also, which students graduate and are immediately experts in their fields? Almost none.
It takes years of practice in unique, often one-off, situations after graduation for most people to develop the intuition needed for a given field.
It's overfitting when you train too large a model on too many details. Rote memorization isn't rewarding.
The more concepts the model manages to grok, the more nonlinear its capabilities will be: we don't have a data problem, we have an educational one.
Claude 3.5 was safety trained by Claude 3.0, and it's more coherent for it. https://www.anthropic.com/news/claudes-constitution
Overfitting can be caused by a lot of different things. Having an overabundance of one kind of data in a training set is one of those causes.
It’s why many pre-processing steps for image training pipelines will add copies of images with random rotations, varying amounts of blur, and different crops.
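For concreteness, here's roughly what that augmentation step looks like with torchvision, as a sketch assuming a standard PyTorch image pipeline; the parameters are arbitrary examples, not tuned values.

    # Each epoch the model sees randomly rotated, blurred, and re-cropped
    # variants of the same images, which reduces overfitting to any one
    # presentation of the data.
    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomRotation(degrees=15),                    # "weird rotations"
        transforms.GaussianBlur(kernel_size=3, sigma=(0.1, 2.0)), # varying blur
        transforms.RandomResizedCrop(size=224, scale=(0.7, 1.0)), # different crops
        transforms.ToTensor(),
    ])
    # e.g. pass transform=augment to an ImageFolder dataset (path hypothetical)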
> The more concepts the model manages to grok, the more nonlinear its capabilities will be
These kinds of hand-wavy statements like “practice,” “grok,” and “nonlinear its capabilities will be” are not very constructive as they don’t have solid meaning wrt language models.
So earlier when I was referring to compounding bias in synthetic data I was referring to a bias that gets trained on over and over and over again.
That leads to overfitting.
> These kinds of hand-wavy statements like “practice,” “grok,” and “nonlinear its capabilities will be” are not very constructive as they don’t have solid meaning wrt language models.
So, here's my hypothesis, as someone who is adjacent to ML but hasn't trained DNNs directly:
We don't understand how they work, because we didn't build them. They built themselves.
At face value this can be seen as an almost spiritual position, but I am not a religious person and I don't think there's any magic involved. Unlike traditional models, the behavior of DNNs is based on random changes that failed up. We can reason about their structure, but only loosely about their functionality. When they get better at drawing, it isn't because we taught them to draw. When they get better at reasoning, it isn't because the engineers were better philosophers. Given this, there will not be a direct correlation between inputs and capabilities, but some arrangements do work better than others.
If this is the case, higher-order capabilities should continue to increase with training cycles, as long as those cycles are performed in ways that don't interfere with what has already been successfully learned. People lamented the loss of capability that GPT-4 suffered as they increased safety. I think Anthropic has avoided this by choosing a less damaging way to tune a well-performing model.
I think these ideas are supported by Wolfram's reduction of the problem at https://writings.stephenwolfram.com/2024/08/whats-really-goi...
Your whole argument falls apart at
> We don't understand how they work, because we didn't build them. They built themselves.
We do understand how they work; we did build them. The mathematical foundations of these models are sound. The statistics behind them are well understood.
What we don’t exactly know is which parameters correspond to which results, as that differs across models.
We work backwards to see which parts of the network seem to relate to what outcomes.
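To make the "well understood" part concrete: the pre-training objective itself is nothing exotic, just next-token cross-entropy over the corpus, where x_t are the tokens and theta the model parameters. What's opaque is the billions of fitted parameters, not the math of the objective:

    \mathcal{L}(\theta) = -\sum_{t} \log p_\theta\left(x_t \mid x_{<t}\right)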
> When they get better at drawing, it isn't because we taught them to draw. When they get better at reasoning, it isn't because the engineers were better philosophers.
Isn’t this the exact opposite of reality?
They get better at drawing because we improve their datasets, topologies, and training methods, and in doing so, teach them to draw.
They get better at reasoning because the engineers and data scientists building training sets do get better at philosophy.
They study what reasoning is and apply those learnings to the datasets and training methods.
That’s how CoT came about early on.
Synthetic data is fine if you can ground the model somehow. That's why o1/o3's improvements are mostly in reasoning, maths, etc.: you can easily tell whether the data is wrong or not.
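A minimal sketch of what that grounding can look like for math-style data: generate candidate question/answer pairs, then keep only the ones a cheap programmatic check can verify. The "generator" below is a toy stand-in, not how o1/o3 actually produce data.

    import random

    def generate_example():
        a, b = random.randint(1, 99), random.randint(1, 99)
        claimed = a * b + random.choice([0, 0, 0, 1])   # sometimes wrong on purpose
        return {"question": f"What is {a} * {b}?", "answer": claimed, "a": a, "b": b}

    def verified(ex) -> bool:
        # Cheap, exact ground truth the generator can't fake
        return ex["answer"] == ex["a"] * ex["b"]

    dataset = [ex for ex in (generate_example() for _ in range(1000)) if verified(ex)]
    print(len(dataset), "verified examples kept")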
I completely don't understand the use for synthetic data. What good is it to train a model basically on itself?
The value of synthetic data relies on having non-zero signal about which generated data is "better" or "worse". In a sense, this is what reinforcement learning is about: generate some data, have that data scored by some evaluator, and then feed the data back into the model with higher weight on the better stuff and lower weight on the worse stuff.
The basic loop is: (i) generate synthetic data, (ii) rate synthetic data, (iii) update model to put more probability on better data and less probability on worse data, then go back to (i).
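A toy version of that loop, just to make the generate, rate, reweight shape concrete: the "model" here is only a probability table over phrases and the "evaluator" a hand-written score, both stand-ins rather than anything a real lab would use.

    import random

    model = {"the sky is blue": 1.0, "the sky is green": 1.0, "2+2=4": 1.0, "2+2=5": 1.0}

    def evaluator(sample: str) -> float:
        # Stand-in for a learned reward model or a programmatic checker
        return 1.0 if sample in ("the sky is blue", "2+2=4") else 0.0

    def sample_from(model, k=50):
        phrases, weights = zip(*model.items())
        return random.choices(phrases, weights=weights, k=k)

    for step in range(20):
        batch = sample_from(model)                    # (i) generate synthetic data
        for s in batch:                               # (ii) rate it
            r = evaluator(s)
            model[s] *= 1.1 if r > 0 else 0.9         # (iii) reweight toward better data

    total = sum(model.values())
    print({k: round(v / total, 3) for k, v in model.items()})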
Thanks, that makes a lot more sense.
Counterpoint: o1-Pro is insanely good -- subjectively, it's as far above GPT4 as GPT4 was above 3. It's almost too good. Use it properly for an extended period of time, and one begins to worry about the future of one's children and the utility of their schooling.
o3, by all accounts, is better still.
Seems to me that things are progressing quickly enough.
Not sure what you are using it for, but it is terrible for me for coding; claude beats it always and hands down. o1 just thinks forever to come up with stuff it already tried the previous time.
People say that's just prompting without pointing to real million line+ repositories or realistic apps to show how that can be improved. So I say they are making todo and hello world apps and yes, there it works really well. Claude still beats it, every.. single.. time..
And yes, I use the Pro version of all of them, and yes, I do assume coding is done for most people. Become a plumber or electrician or carpenter.
That's so weird; it seems like everybody here prefers Claude.
I’ve been using Claude and OpenAI in Copilot, and I find even 4o seems to understand the problem better. o1 definitely seems to get it right more often for me.
I try to sprinkle 'for us/me' everywhere as much as I can; we work on LoB/ERP apps mostly. These are small frontends to massive multi-million-line backends. We carved out a niche by providing the frontends on these backends live at the client's office, via a business consultant of ours: they simply solve UX issues for the client on top of a large ERP by using our tool and prompting. Everything looks modern, fresh and nice, unlike basically all the competitors in this space. It's fast and no frontend people are needed for it; the backend is another system we built, which takes a lot longer of course, as those are complex business rules.
Both Claude and o1 turn up something that looks similar, but only the Claude version will work and be, after less prompting, correct. I don't have shares in either and I want open source to win; we have all-open (more open) solutions doing the same queries and we evaluate all of them, but Claude just wins. We did manage big wins even with OpenAI davinci in 2022 (or so; before ChatGPT), but this is a massive boost, allowing us to upgrade most people to business consultant and have them build with clients in real time, while the tech guys, including me, manually add tests and proofs (where needed) to know if we are actually fine.
It works so much better than the slog with clients before; people are so bad at explaining what they need, it was slowly driving me insane after doing it for 30+ years.
They're both okay for coding, though for my use cases (which are niche and involve quite a lot of mathematics and formal logic) o1/o1-Pro is better. It seems to have a better native grasp of mathematical concepts, and it can even answer very difficult questions from vague inputs, e.g.: https://chatgpt.com/share/676020cb-8574-8005-8b83-4bed5b13e1...
Claude also has a better workflow UI. It’ll maintain conversation context while opening up new windows to present code suggestions.
When I was still subscribing to OpenAI (about 4 months ago) this didn’t exist.
It exists as of last week with Canvas.
Different languages maybe? I find Sonnet v2 to be lacking in Rust knowledge compared to 4o 11-20, but excelling at Python and JS/TS. O1's strong side seems to be complex or quirky puzzle-like coding problems that can be answered in a short manner, it's meh at everything else, especially considering the price. Which is understandable given its purpose and training, but I have no use for it as that's exactly the sort of problem I wouldn't trust an LLM to solve.
Sonnet v2 in particular seems to be a bit broken with its reasoning (?) feature. The one where it detects it might be hallucinating (what's even the condition?) and reviews the reply, reflecting on it. It can make it stop halfway into the reply and decide it wrote enough, or invent some ridiculous excuse to output a worse answer. Annoying, although it doesn't trigger too often.
I find that o1 and Sonnet 3.5 are good and bad quite equally on different things. That's why I keep asking both the same coding questions.
We do the same (all requests go to o1, Sonnet and Gemini, and we store the results to compare later), automatically, for our research: Claude always wins. Even with specific prompting on both platforms. Especially for frontend, o1 really seems terrible.
Every time I try Gemini, it's really subpar. I found that qwen2.5-coder-32b-instruct can be better.
Also, for me it's 50/50 between Sonnet and o1, though I'm not 100% sure about it; I think o1 is better with longer and more complicated (C++) code and debugging. At least from my brief testing. Also, OpenAI models seem to be more verbose. Sometimes that's better, where I'd like additional explanation of chosen fields in a SQL schema; sometimes it's too much.
EDIT: Just asked both o1 and Sonnet 3.5 the same QML coding question, and Sonnet 3.5 succeeded, o1 failed.
Wins? What does this mean? Do you have any results? I see the claim that Claude is better for coding a lot, but I've been using it alongside Gemini 2.0 Flash and o1, and it sure doesn't seem like it.
Claude is trained on principles. GPT is trained on billions of edge cases. Which student do you prefer?
I keep reading this on HN so I believe it has to be true in some ways, but I don't really feel like there is any difference in my limited use (programming questions or explaining some concepts).
If anything I feel like it's all been worse compared to the first release of ChatGPT, but I might be wearing rose colored glasses.
It’s the same for me. I genuinely don’t understand how I can be having such a completely different experience from the people who rave about ChatGPT. Every time I’ve tried it’s been useless.
How can some people think it’s amazing and has completely changed how they work, while for me it makes mistakes that a static analyser would catch? It’s not like I’m doing anything remarkable, for the past couple of months I’ve been doing fairly standard web dev and it can’t even fix basic problems with HTML. It will suggest things that just don’t work at all and my IDE catches, it invents APIs for packages.
One guy I work with uses it extensively and what it produces is essentially black boxes. If I find a problem with something “he” (or rather ChatGPT) has produced it takes him ages to commune with the machine spirit again to figure out how to fix it, and then he still doesn’t understand it.
I can’t help but see this as a time-bomb, how much completely inscrutable shite are these tools producing? In five years are we going to end up with a bunch of “senior engineers” who don’t actually understand what they’re doing?
Before people cry “o tempora o mores” at me and make parallels with the introduction of high-level languages, at least in order to write in a high-level language you need some basic understanding of the logic that is being executed.
> How can some people think it’s amazing and has completely changed how they work, while for me it makes mistakes that a static analyser would catch?
There are a lot of code monkeys working on boilerplate code, these people used to rely on stack overflow and now that chatgpt is here it's a huge improvement for them
If you work on anything remotely complex or which hasn't been solved 10 times on stack overflow chatgpt isn't remotely as useful
I found it very useful for writing a lexer and parser for a search DSL and React component recently:
https://github.com/williamcotton/search-input-query
The first time I tried it, I asked it to find bugs in a piece of very well tested C code.
It introduced an off-by-one error by miscounting the number of arguments in an sprintf call, breaking the program. And then proceeded to fail to find that bug that it introduced.
> How can some people think it’s amazing and has completely changed how they work, while for me it makes mistakes that a static analyser would catch? It’s not like I’m doing anything remarkable, for the past couple of months I’ve been doing fairly standard web dev and it can’t even fix basic problems with HTML.
Part of this is, I think, anchoring and expectation management: you hear people say it's amazing and wonderful, and then you see it fall over and you're naturally disappointed.
My formative years started off with Commodore 64 basic going "?SYNTAX ERROR" from most typos plus a lot of "I don't know what that means" from the text adventures, then Metrowerks' C compiler telling me there were errors on every line *after but not including* the one where I forgot the semicolon, then surprises in VisualBasic and Java where I was getting integer division rather than floats, then the fantastic oddity where accidentally leaning on the option key on a mac keyboard while pressing minus turns the minus into an n-dash which looked completely identical to a minus on the Xcode default font at the time and thus produced a very confusing compiler error…
So my expectations have always been low for machine generated output. And it has wildly exceeded those low expectations.
But the expectation management goes both ways, especially when the comparison is "normal humans" rather than "best practices". I've seen things you wouldn't believe...
Entire files copy-pasted line for line, "TODO: deduplicate" and all,
20 minute app starts passed off as "optimized solutions."
FAQs filled with nothing but Bob Ross quotes,
a zen garden of "happy little accidents."
I watched iOS developers use UI tests
as a complete replacement for storyboards,
bi-weekly commits, each a sprawling novel of despair,
where every change log was a tragic odyssey.
Google Spreadsheets masquerading as bug trackers,
Swift juniors not knowing their ! from their ?,
All those hacks and horrors… lost in time,
Time to deploy.
(All true, and all pre-dating ChatGPT).
> It will suggest things that just don’t work at all and my IDE catches, it invents APIs for packages.
Aye. I've even had that with models forgetting the APIs they themselves have created, just outside the context window.
To me, these are tools. They're fantastic tools, but they're not something you can blindly fire-and-forget…
…fortunately for me, because my passive income is not quite high enough to cover mortgage payments, and I'm looking for work.
> In five years are we going to end up with a bunch of “senior engineers” who don’t actually understand what they’re doing?
Yes, if we're lucky.
If we're not, the models keep getting better and we don't have any "senior engineers" at all.
The ones who use it extensively are the same that used to hit up stackoverflow as the first port of call for every trivial problem that came their way. They're not really engineers, they just want to get stuff done.
Same here: on every release from OpenAI or Anthropic I keep reading how the new model is so much better (insert hyperbole here) than the previous one, yet when I use it, it feels mostly the same as last year.
I'd say the same. I've tried a bunch of different AI tools, and none of them really seem all that helpful.
One use-case: They help with learning things quickly by having a chat and asking questions. And they never get tired or emotional. Tutoring 24/7.
They also generate small code or scripts, as well as automate small things, when you're not sure how, but you know there's a way. You need to ensure you have a way to verify the results.
They do language tasks like grammar-fixing, perfect translation, etc.
They're 100 times easier and faster than search engines, if you limit your uses to that.
They can't help you learn what they don't know themselves.
I'm trying to use them to read historical handwritten documents in old Norwegian (Danish, pretty much). Not only do they not handle the German-style handwriting, but what they spit out looks like the sort of thing GPT-2 would produce if you asked it to write Norwegian (only slightly better than the Muppets' Swedish Chef's Swedish). It seems the experimental tuning has made it worse at the task I most desperately want to use it for.
And when you think about it, how could it not overfit in some sense, when trained on its own output? No new information is coming in, so it pretty much has to get worse at something to get better at all the benchmarks.
> perfect translation
Hah, no. They're good, but they definitely make stuff up when the context gets too long. Always check their output, just the same as you already note they need for small code and scripts.
If you've ever used any enterprise software for long enough, you know the exact same song and dance.
They release version Grand Banana. Purported to be approximately 30% faster with brand new features like Algorithmic Triple Layering and Enhanced Compulsory Alignment. You open the app. Everything is slower, things are harder to find and it breaks in new, fun ways. Your organization pays a couple hundred more per person for these benefits. Their stock soars, people celebrate the release and your management says they can't wait to see the improvement in workflows now that they've been able to lay off a quarter of your team.
Have there been improvements in LLMs over time? Somewhat, and most of them were concentrated at the beginning (because they siphoned up a bunch of data in a dubious manner). Now it's just part of their sales cycle: keep pumping up numbers while no one sees any meaningful improvement.
O1 is effective, but it’s slow. I would expect GPT-5 and a mini variant to work as quickly as the 4-series models.
I had a 30 min argument with o1-pro where it was convinced it had solved the halting problem. Tried to gaslight me into thinking I just didn’t understand the subtlety of the argument. But it’s susceptible to appeal to authority, and when I started quoting snippets of textbooks and MathOverflow it finally relented and claimed there had been a “misunderstanding”. It really does argue like a human though now...
I had a similar experience with regular o1 about an integral that was divergent. It was adamant that it wasn't, and would respond to any attempt at persuasion with variants of "it's a standard integral" with a "subtle cancellation". When I asked for any source for this standard integral, it produced references to support its argument that existed but didn't actually contain the integral. When I told it the references didn't have the result, it backpedalled (gaslighting!) to "I never told you they were in there". When I pointed out that in fact it had, it insisted this was just a "misunderstanding". It only relented when I told it Mathematica agreed the integral was divergent. It still insisted it had never said that the books it pointed to contained this (false, nonsensical) result.
This was new behaviour for me to see in an LLM. Usually the problem is these things would just fold when you pushed back. I don't know which is better, but being this confidently wrong (and "lying" when confronted with it) is troubling.
What do you use it for?
The world is figuring out how to make this technology fit and work and somehow this is "behind" schedule. It's almost comical.
Reminds me of this Louis CK joke:
"I was on an airplane and there was high-speed Internet on the airplane. That's the newest thing that I know exists. And I'm sitting on the plane and they go, open up your laptop, you can go on the Internet.
And it's fast, and I'm watching YouTube clips. It's amazing. I'm on an airplane! And then it breaks down. And they apologize, the Internet's not working. And the guy next to me goes, 'This is bullshit.' I mean, how quickly does the world owe him something that he knew existed only 10 seconds ago?"
https://www.youtube.com/watch?v=me4BZBsHwZs
The investors need their returns now!
Soon, all the middle class jobs will be converted to profits for the capital/data center owners, so they have to spend while they can before the economy crashes due to lack of spending.
People who say "it's bullshit" are the ones that push technological progress forward.
Not invariably. Some of those people are the ones who want to draw 7 red lines, all perpendicular, some with green ink, some with transparent ink, and one that looks like a kitten.
For anyone who hasn't seen what this comment is referencing: https://www.youtube.com/watch?v=BKorP55Aqvg
No, people who say "it's bullshit" and then do something to fix the bullshit are the ones that push technology forward. Most people who say "it's bullshit" instantly when something isn't perfect for exactly what they want right now are just whingers and will never contribute anything except unconstructive criticism.
Sounds like "yes, but" rather than "no"; otherwise you're responding to a self-created straw man.
Phrased another way, the world is still trying to figure out what this technology is actually good for besides generating spam. The "schedule" is finding an actual rationalization for the billions of dollars being pumped into it before the bubble pops.
There's someone with this comment in every thread. Meanwhile, no one answers this because they are getting value. Please take the time to learn, it will give you value.
I’m a consultant. Having looked at several enterprises, there’s a lot of work being done to make a lot of things that don’t really work.
The bigger the ambition, the harder they’re failing. Some well designed isolated use cases are ok. Mostly things about listening and summarizing text to aid humans.
I have yet to see a successful application that is generating good content. IMO, replacing the first draft of content creation and having experts review and fix it is, like, the stupidest strategy you can pursue. The people you replace are the people at the bottom of the pyramid who are supposed to do this work to upskill and become domain experts so they can later review stuff. If they're no longer needed, you're going to lose your reviewers one day, and with them, the ability to assess your generated drafts. It's a foot gun.
> Having looked at several enterprises, there’s a lot of work being done to make a lot of things that don’t really work.
Is this a new phenomenon that started post-LLM?
I mean, no, not generally. But the success rate of other tools is much higher.
A lot of companies are trying to build these general-purpose bots that just magically know everything about the company and have these big knowledge bases, but they just don't work.
I'm someone who generally was a "doubter", but I've dramatically softened my stance on this topic.
Two things: I was casually watching Andreas Kling's streams on Ladybird development (where he was developing a JIT compiler for JS) and was blown away by the accuracy of the completions (and the frequency of those completions).
Prior to this, I'd only ever copypasta'd code from ChatGPT output on occasion.
I started adopting the IDE/Editor extensions and prototyping small projects.
There are now small tools and utilities I've written that I'd not have written otherwise, or that would have taken twice the time had I not used these tools.
With that said, they'd be of no use without oversight, but as a productivity enhancement, the benefits are enormous.
It gives me value but I am not even sure it is $20 a month of value at this point.
It was in 2023, but I've since picked all the low-hanging fruit.
More importantly though, where is all the great output from the people who are getting so much value out of the models?
Is it all privately held? How can that be, with millions of people using these models?
> Meanwhile, no one answers this because they are getting value.
You're literally doing the same thing you're accusing others of. Every HN thread is full of AI boosters claiming AI to be the future with no backing evidence.
Riddle me this. If all these people are "getting value", why are all these companies losing horrendous amounts of money? Why has nobody figured out how to be profitable?
> Please take the time to learn, it will give you value.
Yeah, yeah, just prompt engineer harder. That'll make the stochastic parrot useful. Anyone who has criticism just does so because they're dumb and you're smart. Same as it always was. Everyone opposed to the metaverse just didn't get it bro. You didn't get NFTs bro. You didn't get blockchain bro.
None of these previous bubbles had money in them (beyond scamming idiots). If AI wants to prove it's not another empty tech bubble, pay up. Show me the money. It should be easy, if it's automating so many expensive man-hours of labour. People would be lining up to pay OpenAI.
There’s clearly some value. People are paying for something.
> AI start-ups generate money faster than past hyped tech companies
https://www.ft.com/content/a9a192e3-bfbc-461e-a4f3-112e63d0b...
> Riddle me this. If all these people are "getting value", why are all these companies losing horrendous amounts of money? Why has nobody figured out how to be profitable?
While I agree that LLMs are not currently working great for most envisioned use cases; this premise here is not a good argument. Large LLM providers are not trying to be profitable at the moment. They’re trying to grow and that’s pretty sensible.
Uber was the poster child of this, and for all the mockery it received, Uber is now an unambiguously profitable company.
> Why has nobody figured out how to be profitable?
From what I've seen claimed about OpenAI finances, this is easy: It's a Red Queen's race — "it takes all the running you can do, to keep in the same place".
If their financial position was as simple as "we run this API, we charge X, the running cost is Y", then they're already at X > Y.
But if that was all OpenAI were actually doing, they'd have stopped developing new versions or making the existing models more efficient some time back, while the rest of the industry kept improving their models and lowering their prices, and they'd be irrelevant.
> People would be lining up to pay OpenAI.
They are.
Not that this is either sufficient or necessary to actually guarantee anything about real value. For lack of sufficiency: people collectively paid a lot for cryptocurrencies and NFTs, too (and, before then and outside tech, homeopathic tinctures and sub-prime mortgages). For lack of necessity: there are plenty of free-to-download models.
I get a huge benefit even just from the free chat models. I could afford to pay for better models, but why bother when free is so good? Every time a new model comes out, the old paid option becomes the new free option.
That's what puzzles me now. Everyone with a semblance of engineering expertise knows that if you start with a tool and try to find a problem it could solve, you are doing it wrong. The right way is the opposite: you start with a problem and find the best tool to solve it, and if it's the new shiny tool, so be it, but most of the time it's not.
Except the whole tech world starting with the CEOs seems to do it the "wrong" way with LLMs. People and whole companies are encouraged to find what these things might be actually useful for.
I use them to:
• Build toys that would otherwise require me to learn new APIs (I can read python, but it's not my day job)
• Learn new things like OpenSCAD
• To improve my German
• Learn about the world by allowing me to take photos of things in this world that I don't understand and ask them a question about the content, e.g. why random trees have bands or rectangles of white paint on them
• Help me with shopping, by taking a photo of the supermarket I happen to be in at the time and asking them where I should look for some item I can't find
• Help with meal prep, by allowing me to get a recipe based on what food and constraints I've got at hand rather than the traditional method of "if you want x, buy y ingredients"
Even if they're just an offline version of Wikipedia or Google, they're already a more useful interface for the same actual content.
For a company that sees itself as the undisputed leader and that wants to raise $7 trillion to build fabs, they deserve some of the heaviest levels of scrutiny in the world.
If OpenAI's investment prospectus relies on them reaching AGI before the tech becomes commoditized, everyone is going to look for that weakness.
Everyone's comparing o1 and Claude, but in my experience neither really works well enough to justify paying for them for coding. What I really want is a mode where they ask clarifying questions, ideally many of them, before spitting out an answer. This would greatly improve their utility, producing something with more value than an auto-complete.
Just tell it to do that and it will. Whenever I ask an AI for something and I'm pretty sure it doesn't have all the context I literally just say "ask me clarifying questions until you have enough information to do a great job on this."
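For anyone who wants to script it rather than type it each time, here's a rough sketch of the same idea with the standard OpenAI Python client; the model name, prompts, and the "done" convention are just placeholders, not anything official:

    # Rough sketch: make the model ask clarifying questions before it answers.
    # Uses the official OpenAI Python client; model name and prompts are placeholders.
    from openai import OpenAI

    client = OpenAI()

    SYSTEM = (
        "Before writing any code, ask me clarifying questions until you have "
        "enough information to do a great job. Only produce the solution once "
        "I have answered them."
    )

    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Write a script that syncs two directories."},
    ]

    while True:
        reply = client.chat.completions.create(model="gpt-4o", messages=messages)
        text = reply.choices[0].message.content
        print(text)
        messages.append({"role": "assistant", "content": text})
        answer = input("> ")  # answer its questions; type "done" to stop
        if answer.strip().lower() == "done":
            break
        messages.append({"role": "user", "content": answer})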
And this chain of prompts, combined with the improved CoT reasoner, would produce much better results, more in line with what the coming agentic era promises.
Yes. You can only do so much with the information you get in. The ability to ask good questions, not just of itself in internal monologue style, but actually of the user, would fundamentally make it better since it can get more information in.
As it is now, it has a bad habit of, if it can't answer the question you asked, instead answering a similar-looking question which it thinks you may have meant. That is of course a great strategy for benchmarks, where you don't earn any points for saying you don't know. But it's extremely frustrating for real users, who didn't read their question from a test suite.
I know multiple people who carefully prompt to get that done. The model outputs in strict token order and can't turn around, so you need to make sure the clarifying questions actually come before the answer; otherwise the system can and will come up with post-hoc "reasoning".
Have you used them to build a system to ask you clarifying questions?
Or even instructed them to?
have you tested that this helps? seems pretty simple to script with an agent framework
Just today I got Claude to convert a company’s PDF protocol specification into an actual working python implementation of that protocol. It would have been uncreative drudge work for a human, but I would have absolutely paid a week of junior dev time for it. Instead I wrote it alongside AI and it took me barely more than an hour.
The best part is, I’ve never written any (substantial) python code before.
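Not my actual code, but the general shape of that workflow is something like the sketch below; it assumes the official anthropic Python client and pypdf, and the file name, model, and prompt are illustrative. The output still needs hand review and tests.

    # Sketch only: extract a PDF spec and ask Claude for a first-pass implementation.
    # Assumes the official anthropic client and pypdf; spec.pdf and the prompt are illustrative.
    from pypdf import PdfReader
    import anthropic

    spec_text = "\n".join(page.extract_text() for page in PdfReader("spec.pdf").pages)

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": "Implement this wire protocol as a Python module with "
                       "encode/decode functions and docstrings. Spec follows:\n\n" + spec_text,
        }],
    )

    print(response.content[0].text)  # review, test, and iterate on this by hand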
I have to agree. It's still a bit hit or miss, but the hits are a huge time and money saver especially in refactoring. And unlike what most of the rather demeaning comments in those HN threads state, I am not some 'grunt' doing 'boilerplate work'. I mostly do geometry/math stuff, and the AIs really do know what they're talking about there sometimes. I don't have many peers I can talk to most of the time, and Claude is really helping me gather my thoughts.
That being said, I definitely believe it's only useful for isolated problems. Even with Copilot, I feel like the AIs just lack a bigger context of the projects.
Another thing that helped me was designing an initial prompt that really works for me. I think most people just expect to throw in their issue and get a tailored solution, but that's just not how it works in my experience.
It would seem you don't care too much about verifying its output or about its correctness. If you did, it wouldn't take you just an hour. I guess you'll let correctness be someone else's problem.
Archive.is does not work for this article, does anyone have a workaround?
Right. "You have been blocked", is what I get.
But this works: https://www.msn.com/en-us/money/other/the-next-great-leap-in...
this one does https://archive.md/L7fOF (it is just the previous snapshot)
Intuitively, it makes sense that there is going to be some significant friction in LLM development going forward. We're talking about models that will cost upwards of $1bn to train. Save for a technological breakthrough, GPT-6/7 will probably have to wait for hardware to catch up.
I think the main bottleneck right now is training data - they've basically exhausted all public sources of data, so they have to either pay humans to generate new data from scratch or pay for the reasoning models to generate (less useful) synthetic training data. The next bottleneck is hardware, and the least important bottleneck is money.
What I find odd is that o1 doesn't support attaching text documents to chats the way 4o does. For a model that specializes in reasoning, reading long documents seems like a natural feature to have.
If Sama ever reads this: I have no idea why no users seem to focus on this, but it would be really good to prioritise being able to select which model you can use with the custom myGPTs. I know this may be hard or not possible without recreating them, but as far as I can tell it still isn't possible.
I don't think most customers realise how much better the models work with custom GPTs.
You can use the new project feature for that. That's a way of grouping conversations, adding files, etc. Should work with o1 pro as well apparently.
"When using custom instructions or files, only GPT-4o is available". Straight out of the ChatGPT web interface when you try to select which model you want to use.
It seems Google has a massive advantage here, since they can tap all of YouTube to train on. I wonder what OpenAI is using for its video data source.
Train for what? For making videos? Train from people’s comments? There’s a lot of garbage AI slop on YouTube; how would this be sifted out? I think there’s more value here on HN in terms of training data, but even that, to what avail?
YouTube is such a great multimodal dataset—videos, auto-generated captions, and real engagement data all in one place. That’s a strong starting point for training, even before you filter for quality. Microsoft’s Phi-series models already show how focusing on smaller, high-quality datasets, like textbooks, can produce great results. You could totally imagine doing the same thing with YouTube by filtering for high-quality educational videos.
Down the line, I think models will start using video generation as part of how they “think.” Picture a version of GPT that works frame by frame—ask it to solve a geometry problem, and it generates a sequence of images to visualize the solution before responding. YouTube’s massive library of visual content could make something like that possible.
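To make "filter for high-quality educational videos" concrete, here's a toy sketch over a metadata dump; every field name and threshold is hypothetical, just to illustrate the idea:

    # Toy sketch of a quality filter over video metadata; fields and thresholds are
    # entirely hypothetical, just to illustrate "filter for educational quality".
    from dataclasses import dataclass

    @dataclass
    class Video:
        title: str
        category: str
        has_captions: bool
        likes: int
        views: int

    def looks_educational(v: Video) -> bool:
        # Keep captioned videos in lecture-like categories with decent engagement.
        return (
            v.has_captions
            and v.category in {"Education", "Science & Technology"}
            and v.views > 10_000
            and v.likes / max(v.views, 1) > 0.02
        )

    catalog = [
        Video("Linear algebra lecture 3", "Education", True, 40_000, 900_000),
        Video("Reaction compilation #57", "Entertainment", False, 2_000, 1_000_000),
    ]
    keep = [v for v in catalog if looks_educational(v)]
    print([v.title for v in keep])  # -> ['Linear algebra lecture 3']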
How about just an updated GPT-4o with all the newer data? It would go a long way. Currently it doesn't know about anything since Oct 2023 (without having to do a web search).
Good that we already have AGI in o3.
probably because it isn't any better
Meta question: @dang, can we ban MSN links and instead link directly to the original source?
https://archive.md/jKbLs