scarmig 2 days ago

If you dig into the actual report (I know, I know, how passe), you see how they get the numbers. Most of the errors are "sourcing issues": the AI assistant doesn't cite a claim, or it (shocking) cites Wikipedia instead of the BBC.

Other issues: the report doesn't even say which particular models it's querying [ETA: discovered they do list this in an appendix], aside from saying it's the consumer tier. And it leaves off Anthropic (in my experience, by far the best at this type of task), favoring Perplexity and (perplexingly) Copilot. The article also intermingles claims from the recent report and the one on research conducted a year ago, leaving out critical context that... things have changed.

This article contains significant issues.

  • happymellon 2 days ago

    You don't have to read very far to see the details.

    > 45% of responses contained at least one meaningful error. Sourcing [...] is 31%, followed by accuracy 20%

    And you can see the reason they think this is important on the second page just after the summary.

    > More than 1 in 3 (35%) of UK adults instinctively agree the news source should be held responsible for errors in AI-generated news

    So of course the BBC cares that Google's summary said that the BBC cites Pornhub when talking about domestic abuse (when they didn't), because a large portion of people blame them for the fact that a significant amount of AI-generated crap is wrong.

  • amarant 2 days ago

    Human journalists misrepresent the white paper 85% of the time.

    With this in mind, 45% doesn't seem so bad anymore

    • SkyBelow 2 days ago

      Years ago in college, we had a class where we analyzed science in the news for a few weeks compared to the published research itself. I think it was a 100% misrepresentation rate comparing what a news article summarized about a paper versus what the paper itself said. We weren't going off of CNN or similar main news sites, but news websites aimed at specific types of news which were consistently better than the articles in mainstream news (whenever the underlying research was noteworthy enough to earn a mention on larger sites). Leaving out complete details or only reporting some of the findings weren't enough to count, as it was expected any news summary would reduce the total amount of information being provided about a published paper compared to reading the paper directly. The focus was on looking for summaries that were incorrect or which made claims which the original paper did not support.

      Probably the most impactful "easy A" class I had in college.

      • specialist 2 days ago

        That's terrific. Media literacy should be required civics curriculum.

        I was on my high school's radio station, part of the broadcast media curriculum. It was awesome.

        That early experience erased any esteem I had for mass media. (Much as I loved the actual work.)

        We got to visit local stations, job shadow, produce content for public access cable, make commercials, etc. Meet and interview adults.

        We also talked with former students, who managed to break into the industry.

        Since it was a voc tech program, there was no mention of McLuhan, Chomsky, Postman, or any media criticism of any kind.

        I learned that stuff much later. Yet somehow I was able to intuit the rotten core of our media hellscape.

      • BuddyPickett 2 days ago

        I never had any college classes that weren't easy A classes. I think that's all they have.

    • ribosometronome 2 days ago

      Hell, human editors seem to misrepresent their journalists frequently enough that I'm left wondering whether it would be hyperbolic to guess that they misrepresent them 45% of the time, too.

      • hinkley 2 days ago

        HN used to have a policy of not editorializing article titles when published here, but I've caught them modifying headlines a few times to match the source article instead of the linked article. One that stuck out was just the other day, and it had a confusing title that was not only wrong but also hard to parse.

        Maybe we complained with enough concrete examples of how absolute shit editors and summarizers are now.

    • stmichel 2 days ago

      You stole my reply haha! However, I was gonna say journalists misrepresent papers and other content 95% of the time...

  • scellus 2 days ago

    Are citation issues related to the fact that https://www.bbc.co.uk/robots.txt denies a lot of AI, both user agents and crawlers?

    • scarmig 2 days ago

      The report says that different media organizations dropped their robots.txt for the duration of the research to give LLMs access.

      I would expect this isn't the on-off switch they conceptualized, but I don't know enough about how different LLM providers handle news search and retrieval to say for sure.

      • dylan604 2 days ago

        Does it work like that though? How long does it take for AI bots to crawl sites and have the data added to the model currently being used? Am I wrong in thinking that it takes a lot longer for AI bot crawls to be available to the public than a typical search engine crawler?

        • rimeice 2 days ago

          Bots could be crawlers gathering data to periodically be used as raw training data or the requests could just be from a web search agent of some form like ChatGPT finding latest news stories on topic X for example. I don’t know if robots.txt can distinguish between the two types of bot request or whether LLM providers even adhere to either.
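
          On the first question: robots.txt rules are matched per user-agent token, so the file can treat the two kinds of request differently, provided the operator uses a separate token for each. A minimal sketch with Python's standard `urllib.robotparser` follows. The robots.txt body is made up for illustration, and the tokens `GPTBot` (training crawler) and `ChatGPT-User` (user-triggered fetch) are assumed to be the ones OpenAI publishes. It says nothing about whether any provider actually honours the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the training crawler, allow user-triggered fetches.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/story"
    for agent in ("GPTBot", "ChatGPT-User", "SomeOtherBot"):
        print(agent, allowed(agent, story))
```

          The file only expresses the publisher's wishes; compliance is up to whoever runs the bot.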

          • jay_kyburz 2 days ago

            Wow, just reading the headline I had assumed they were giving the news article as a document, then asking it to summarize the document given.

  • afavour 2 days ago

    > or it (shocking) cites Wikipedia instead of the BBC.

    No... the problem is that it cites Wikipedia articles that don't exist.

    > ChatGPT linked to a non-existent Wikipedia article on the “European Union Enlargement Goals for 2040”. In fact, there is no official EU policy under that name. The response hallucinates a URL but also, indirectly, an EU goal and policy.

    • kenjackson 2 days ago

      Actually there was a Wikipedia article of this name, but it was deleted in June -- because it was AI generated. Unfortunately AI falls for this much like humans do.

      https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...

      • CaptainOfCoit 2 days ago

        > Actually there was a Wikipedia article of this name, but it was deleted in June -- because it was AI generated. Unfortunately AI falls for this much like humans do.

        A recent Kurzgesagt goes into the dangers of this, and they found the same thing happening with a concrete example: They were researching a topic, tried using LLMs, found they weren't accurate enough and hallucinated, so they continued doing things the manual way. Then some weeks/months later, they noticed a bunch of YouTube videos that had the very hallucinations they were avoiding, and now their own AI assistants started to use those as sources. Paraphrased/remembered by me, could have some inconsistencies/hallucinations.

        https://www.youtube.com/watch?v=_zfN9wnPvU0

      • bunderbunder 2 days ago

        The biggest problem with that citation isn't that the article has since been deleted. The biggest problem is that that particular Wikipedia article was never a good source in the first place.

        That seems to be the real challenge with AI for this use case. It has no real critical thinking skills, so it's not really competent to choose reliable sources. So instead we're lowering the bar to just asking that the sources actually exist. I really hate that. We shouldn't be lowering intellectual standards to meet AI where it's at. These intellectual standards are important and hard-won, and we need to be demanding that AI be the one to rise to meet them.

        • gamerDude 2 days ago

          I think this is a real challenge for everyone. In many ways we potentially need a restart of a Wikipedia-like site to document all the valid and good sources. This would also hopefully include things like source bias and whether it's a primary/secondary/tertiary source.

          • fullofideas 2 days ago

            This is pushing the burden of proof onto society. Basically, asking everyone else to pitch in and improve sources so that AI companies can reference these trustworthy sources.

          • bunderbunder 2 days ago

            Outsourcing due diligence to a tool (or a single unified source) is the problem, not the solution.

            For example, having a single central arbiter of source bias is inescapably the most biased thing you could possibly do. Bias has to be defined within an intellectual paradigm. So you'd have to choose a paradigm to use for that bias evaluation, and de facto declare it to be the one true paradigm for this purpose. But intellectual paradigms are inherently subjective, so doing that is pretty much the most intellectually biased thing you can possibly do.

          • ishtanbul 2 days ago

            Maybe we can get AI to do this hard labor

          • cogman10 2 days ago

            An example of this.

            I've seen a certain sensationalist news source write a story that went like this.

            Site A: Bad thing is happening, cite: article Site B

            * follow the source *

            Site B: Bad thing is happening, cite different article on Site A

            * follow the source *

            Site A: Bad thing is happening, no citation.

            I fear that's the current state of a large news bubble that many people subscribe to. And when these sensationalist stories start circulating there's a natural human tendency to exaggerate.

            I don't think AI has any sort of real good defense to this sort of thing. 1 level of citation is already hard enough. Recognizing that it is citing the same source is hard enough.
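
            To make the Site A / Site B loop above concrete, here is a rough sketch of following a chain of single citations until it either dead-ends without a source or circles back on itself. The `cites` mapping, the `site/story` naming and both function names are hypothetical, made up for illustration; real articles cite several sources and would need a proper graph walk.

```python
def trace_citations(start, cites):
    """Follow one-citation-per-article links from start.

    Returns (chain, ending) where ending is "unsourced" if the chain stops at
    an article with no citation, or "circular" if it revisits an article.
    """
    chain = [start]
    seen = {start}
    current = start
    while True:
        nxt = cites.get(current)
        if nxt is None:
            return chain, "unsourced"
        if nxt in seen:
            return chain + [nxt], "circular"
        chain.append(nxt)
        seen.add(nxt)
        current = nxt


def distinct_sites(chain):
    """Collapse article ids of the form 'site/story' to the set of sites."""
    return {article.split("/", 1)[0] for article in chain}


if __name__ == "__main__":
    # Hypothetical data mirroring the example: A cites B, B cites a different A story.
    cites = {
        "site-a/story-1": "site-b/story-1",
        "site-b/story-1": "site-a/story-2",
    }
    chain, ending = trace_citations("site-a/story-1", cites)
    print(chain, ending, distinct_sites(chain))
```

            Three hops that collapse to two sites and end with no source is the pattern described above; counting distinct origins rather than distinct links is the part that is easy to skip.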

            There was another example from the Kagi news stuff which exemplified this. A whole article was written which made 3 citations that were ultimately spawned from the same news briefing published by different outlets.

            I've even seen an example of a national political leader who fell for the same sort of sensationalization. One who should have known better. They repeated what was later found to be a lie by a well-known liar but added that "I've seen the photos in a classified debriefing". IDK that it was necessarily even malicious, I think people are just really bad at separating credible from uncredible information and that it ultimately blends together as one thing (certainly doesn't help with ancient politicians).

          • dingnuts 2 days ago

            I noticed that my local library has a new set of World Book. Maybe it's time to bring back traditional encyclopedias.

        • kenjackson 2 days ago

          I get what you're saying. But you are now asking for a level of intelligence and critical thinking that I honestly believe is higher than the average person's. I think it's absolutely doable, but I also feel like we shouldn't make it sound like the current behavior is abhorrent or somehow indicative of a failure in the technology.

          • exe34 2 days ago

            It's actually great from my point of view - it means we're edging our way into limited superintelligence.

          • Paracompact 2 days ago

            The bar for an industry should be the good-faith effort of the average industry professional, not the unconscionably minimal efforts of the average grifter trying to farm content.

            These grifters simply were not attracted to these gigs in these quantities prior to AI, but now the market incentives have changed. Should we "blame" the technology for its abuse? I think AI is incredible, but market endorsement is different from intellectual admiration.

      • Workaccount2 2 days ago

        This is likely because of the knowledge cutoff.

        I have seen a few cases before of "hallucinations" that turned out to be things that did exist, but no longer do.

        • 1980phipsi 2 days ago

          The fix for this is for the AI to double-check all links before providing them to the user. I frequently ask ChatGPT to double check that references actually exist when it gives me them. It should be built in!

          • rideontime 2 days ago

            But that would mean OpenAI would lose even more money on every query.

            • mdhb 2 days ago

              Almost as though it’s not a sustainable business model and relies on tricking people in order to keep the lights on.

            • ModernMech 2 days ago

              Better make each query count then.

          • dingnuts 2 days ago

            Gemini will lie to me when I ask it to cite things, either pull up relevant sources or just hallucinate them.

            IDK how you people go through that experience more than a handful of times before you get pissed off and stop using these tools. I've wasted so much time because of believable lies from these bots.

            Sorry, not even lies, just bullshit. The model has no conception of truth so it can't even lie. Just outputs bullshit that happens to be true sometimes.

          • janwl 2 days ago

            I thought people here hated it when LLMs made http requests?

            • zahlman 2 days ago

              It's bad when they indiscriminately crawl for training, and not ideal (but understandable) to use the Internet to communicate with them (and having online accounts associated with that etc.) rather than running them locally.

              It's not bad when they use the Internet at generation time to verify the output.

              • Dylan16807 2 days ago

                Also for the most part this verification can use a HEAD request.
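
                A rough sketch of such a check using only the Python standard library. The names `link_status` and `looks_live` and the injectable `opener` parameter are illustrative, not taken from any assistant's actual pipeline. Some servers reject HEAD or answer 200 for missing pages, so this only confirms that something responds at the URL, not that the page supports the claim attributed to it.

```python
import urllib.error
import urllib.request


def link_status(url, timeout=5.0, opener=urllib.request.urlopen):
    """Send a HEAD request and return the HTTP status, or None if unreachable."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions but still carry a status code.
        return err.code
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return None


def looks_live(url, **kwargs):
    """True if the URL answered the HEAD request with a 2xx or 3xx status."""
    status = link_status(url, **kwargs)
    return status is not None and 200 <= status < 400


if __name__ == "__main__":
    print(looks_live("https://en.wikipedia.org/wiki/Main_Page"))
```

                HEAD skips the response body, so the check costs one round trip per link rather than a full page download.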

            • macintux 2 days ago

              I don't know for certain what you're referring to, but the "bulk downloads" of the Internet that AI companies are executing for training are the problem I've seen cited, and that doesn't relate to LLMs checking their sources at query time.

          • blitzar 2 days ago

            I have found myself doing the same "citation needed" loop - but with AI this is a dangerous game as it will now double down on whatever it made up and go looking for citations to justify its answer.

            Pre-prompting to cite sources is obviously a better way of going about things.

    • hnuser123456 2 days ago

      Do we have any good research on how much less often larger, newer models will just make stuff up like this? As it is, it's pretty clear LLMs are categorically not a good idea for directly querying for information in any non-fiction-writing context. If you're using an LLM to research something that needs to be accurate, the LLM needs to be doing a tool call to a web search and only asked to summarize relevant facts from the existing information it can find, and have them be cited by hard-coding the UI to link the pages the LLM reviewed. The LLM itself cannot be trusted to generate its own citations. It will just generate something that looks like a relevant citation, along with whatever imaginary content it wants to attribute to this non-existent source.
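
      A minimal sketch of that separation, under stated assumptions: the model is only allowed to refer to sources by index into the list of pages the search tool actually fetched, and the UI renders the links from that list, never from model-written text. `RetrievedPage` and `render_answer` are hypothetical names for illustration, not any product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedPage:
    url: str    # URL the search tool actually fetched
    title: str  # title taken from the fetched page, not from the model


def render_answer(summary, cited_ids, pages):
    """Render a summary plus a source list built only from retrieved pages.

    cited_ids are zero-based indexes emitted by the model; anything outside
    the retrieved list is dropped rather than rendered as a link.
    """
    lines = [summary, "", "Sources:"]
    seen = set()
    for idx in cited_ids:
        if idx in seen or not 0 <= idx < len(pages):
            continue
        seen.add(idx)
        page = pages[idx]
        lines.append(f"[{idx + 1}] {page.title} - {page.url}")
    return "\n".join(lines)


if __name__ == "__main__":
    pages = [
        RetrievedPage("https://example.org/a", "Page A"),
        RetrievedPage("https://example.org/b", "Page B"),
    ]
    # Index 5 was never retrieved, so it cannot become a citation.
    print(render_answer("Summary text.", [0, 5, 0], pages))
```

      This guarantees that every link shown was really fetched. It does not guarantee the summary matches what those pages say.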

      • jacobolus 2 days ago

        A further problem is that Wikipedia is chock full of nonsense, with a large proportion of articles that were never fact checked by an expert, and many that were written to promote various biased points of view, inadvertently uncritically repeat claims from slanted sources, or mischaracterize claims made in good sources. Many if not most articles have poor choice of emphasis of subtopics, omit important basic topics, and make routine factual errors. (This problem is not unique to Wikipedia by any means, and despite its flaws Wikipedia is an amazing achievement.)

        A critical human reader can go as deep as they like in examining claims there: can look at the source listed for a claim, can often click through to read the claim in the source, can examine the talk page and article history, can search through the research literature trying to figure out where the claim came from or how it mutated in passing from source to source, etc. But an AI "reader" is a predictive statistical model, not a critical consumer of information.

        • senderista 2 days ago

          Just the other day, I clicked through to a Wikipedia reference (a news article) and discovered that the citing sentence grossly misrepresented the source. Probably not accidental since it was about a politically charged subject.

        • zahlman 2 days ago

          > many that were written to promote various biased points of view, inadvertently uncritically repeat claims from slanted sources, or mischaracterize claims made in good sources.

          Yep.

          Including, if not especially, the ones actively worked on by the most active contributors.

          The process for vetting sources (both in terms of suitability for a particular article, and general "reliable sources" status) is also seriously problematic. Especially when it comes to any topic which fundamentally relates to the reliability of journalism and the media in general.

        • LeifCarrotson 2 days ago

          A future problem will be that the BBC and the rest of the Internet will soon be chock-full of nonsense, with a large proportion of articles that were never fact checked by a human, much less an AI.

        • hunterpayne 2 days ago

          Wikipedia is pretty good for most topics. For anything even remotely political, however, it isn't just bad, it is one of the worst sources out there. And therein lies the problem: its quality varies wildly depending on the topic.

          • mikkupikku 2 days ago

            Wikipedia is bad even for topics that aren't particularly political, not because the editor was trying to be misleading but because they were being lazy and wrote up their own misconception, then either made up a source or pulled a source without bothering to actually read it. These kinds of errors can stay in place for years.

            I have one example that I check periodically just to see if anybody else has noticed. I've been checking it for several years and it's still there; the SDI page claims that Brilliant Pebbles was designed to use "watermelon sized" tungsten projectiles. This is completely made up; whoever wrote it up was probably confusing "rods from god" proposals that commonly use tungsten and synthesizing that confusion with "pebbles". The sentence is cited but the sources don't back it up. It's been up like this for years. This error has been repeated on many websites now, all post-dating the change on wikipedia.

            If you're reading this and are the sort to edit wikipedia.. Don't fix it. That would be cheating.

            • wahern 2 days ago

              > If you're reading this and are the sort to edit wikipedia.. Don't fix it. That would be cheating.

              Imagine if this was the ethos regarding open source software projects. Imagine Microsoft saying 20 years ago, "Linux has this and that bug, but you're not allowed to go fix it because that detracts from our criticism of open source." (Actually, I wouldn't be surprised if Microsoft or similar detractors literally said this.)

              Of course Wikipedia has wrong information. Most open source software projects, even the best, have buggy, shite code. But these things are better understood not as products, but as processes, and in many (but not all) contexts the product at any point in time has generally proven, in a broad sense, to outperform their cathedral alternatives. But the process breaks down when pervasive cynicism and nihilism reduce the number of well-intentioned people who positively engage and contribute, rather than complain from the sidelines. Then we land right back to square 0. And maybe you're too young to remember what the world was like at square 0, but it sucked in terms of knowledge accessibility, notwithstanding the small number of outstanding resources--but which were often inaccessible because of cost or other barriers.

      • ekidd 2 days ago

        "Truth" is often a very expensive commodity to obtain. There are plenty of awful sources and mistaken claims on the shelf of any town library. Lots of peer reviewed papers are crap, including a few in Nature. Newspapers are constantly wrong and misleading. Digging through even "reliable" sources can require significant expertise. (This is, in fact, a significant part of PhD training, according to the PhDs and professors I know: Learning to use the literature well.)

        One way to successfully use LLMs is to do the initial research legwork. Run the 40 Google searches and follow links. Evaluate sources according to some criteria. Summarize. And then give the human a list of links to follow.

        You quickly learn to see patterns. Sonnet will happily give a genuinely useful rule of thumb, phrasing it like it's widely accepted. But the source will turn out to be "one guy on a forum."

        There are other tricks that work well. Have the LLM write an initial overview with sources. Tell it to strictly limit itself to information in the sources, etc. Then hand the report off to a fresh LLM and tell it to carefully check each citation in the report, removing unsourced information. Then have the human review the output, following links.

        None of this will get you guaranteed truth. But if you know what you're doing, it can often give you a better starting point than Wikipedia or anything on the first two pages of Google Search results. Accurate information is genuinely hard to get, and it always has been.

      • bigbuppo 2 days ago

        The problem is that people are using it as a substitute for a web search, and the web search company has decided to kill off search as a product and pivot to video, err, I mean pivot to AI chatbots so hard they replaced one of the common ways to access emergency services on their mobile phones with an AI chatbot that can't help you in an emergency.

        Not to mention, the AI companies have been extremely abusive to the rest of the internet so they are often blocked from accessing various web sites, so it's not like they're going to be able to access legitimate information anyways.

      • ModernMech 2 days ago

        > and only asked to summarize relevant facts from the existing information it can find

        Still not enough as I find the LLM will not summarize all the relevant facts, sometimes leaving out the most salient ones. Maybe you'll get a summary of some facts, maybe the ones you explicitly ask for, but you'll be left wondering if the LLM is leaving out important information.

    • shinycode 2 days ago

      I used Perplexity for searches and clicked on all the sources that were given. Depending on the model used, from 20% to 100% of the URLs I tested did not exist. I kept on querying the LLM about it and it finally told me that it generated « the most probable » URLs for the topic in question based on the ones it knows exist. Useless.

      • smrq 2 days ago

        I share your opinion on the results, but why would you trust the LLM explanation for why it does what it does?

        • shinycode 2 days ago

          I don’t trust it at all. I wanted to know if it would be able to explain its own results. Just the fact that it was displaying sources and links made me trust it, until I checked and was horrified. I wanted to know if it was an old link that broke or changed, but apparently not.

          • macintux 2 days ago

            You said:

            >...it finally told me that it generated « the most probable » urls for the topic in question based on the ones he knows exists.

            smrq is asking why you would believe that explanation. The LLM doesn't necessarily know why it's doing what it's doing, so that could be another hallucination.

            Your answer:

            > ...I wanted to know if it was old link that broke or changed but no apparently

            Leads me to believe that you misunderstood smrq's question.

            • shinycode 2 days ago

              No, I got the question. I said that I wanted to see what kind of explanation it would give me. Ofc it can hallucinate that explanation as well. The bottom line is I don’t trust it, and the source links are fake (and not broken or obsolete).

    • scarmig 2 days ago

      > Participating organizations raised concerns about responses that relied heavily or solely on Wikipedia content – Radio-Canada calculated that of 108 sources cited in responses from ChatGPT, 58% were from Wikipedia. CBC-Radio-Canada are amongst a number of Canadian media organisations suing ChatGPT’s creator, OpenAI, for copyright infringement. Although the impact of this on ChatGPT’s approach to sourcing is not explicitly known, it may explain the high use of Wikipedia sources.

      Also, is attributing, without any citation, ChatGPT's preference for Wikipedia to a reprisal for an active lawsuit a significant issue? Or do the authors get off scot-free because they couched it in "we don't know, but maybe it's the case"?

      • ffsm8 2 days ago

        Literally constantly? It takes both careful prompting and thorough double-checking to really notice, however. Because often the links also exist, they just don't represent what the LLM made it sound like.

        And the worst part about the people unironically thinking they can use it for "research" is, that it essentially supercharges confirmation bias.

        The inefficient sidequests you do while researching are generally what actually give you the ability to really reason about a topic.

        If you instead just laser focus on the tidbits you prompted with... Well, your opinion is a lot less grounded.

        • edavison1 2 days ago

          Ran into this the other day researching a brewery. Google AI summary referenced a glowing NYT profile of its beers. The linked article was not in fact about that brewery, but an entirely different one. Brewery I was researching has never been mentioned in the NYT. Complete invention at that point and has 'stolen' the good press from a different place and just fed the user what they wanted to see, namely a recommendation for the thing I was googling.

      • terminalshort 2 days ago

        It's a huge issue. No wonder AI hallucinates when it trains on this kind of crap.

    • menaerus 2 days ago

      > For the current research, a set of 30 “core” news questions was developed

      Right. Let's talk about statistics for a bit. Or let's put it differently: they found in their report that 45% of the answers to the 30 questions they "developed" had a significant issue, e.g. a nonexistent reference.

      I'll give you 30 questions out of my sleeve where 95% of the answers will not have any significant issue.

      • matthewmacleod 2 days ago

        Yes, I'm sure you could hack together some bullshit questions to demonstrate whatever you want. Is there a specific reason that the reasonably straightforward methodology they did use is somehow flawed?

        • menaerus 2 days ago

          Yes, and you answered it yourself.

          • darkwater 2 days ago

            Err, no? Being _possible_ does not necessarily imply that's what happened.

            • menaerus 2 days ago

              A bucket of 30 questions is not a statistically significant sample size that we can use to support the hypothesis that all the AI assistants they tested are wrong 45% of the time. That's not how science works.

              Neither is my bucket of 30 questions statistically significant, but it goes to show that I can disprove their hypothesis just by giving them my sample.

              I think that the report is being disingenuous and I don't understand for what reasons. It's funny that they say "misrepresent" when that's exactly what they are doing.
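
              For a sense of how much sample size matters here, a minimal sketch of the 95% Wilson score interval for an observed proportion. It treats each graded item as an independent draw, which is a simplifying assumption, and the counts in the loop are hypothetical, chosen only to hold the observed rate at 45%; they are not taken from the report.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return the (low, high) Wilson score interval for successes out of n.

    z=1.96 corresponds to a 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Same 45% observed rate at three hypothetical sample sizes.
    for successes, n in [(9, 20), (90, 200), (900, 2000)]:
        low, high = wilson_interval(successes, n)
        print(f"{successes}/{n}: {low:.3f} to {high:.3f}")
```

              The interval narrows as n grows, so what matters is how many responses were graded and how the questions were chosen, not the bare headline percentage.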

              • frm88 2 days ago

                I don't follow your reasoning re. statistical sample size. The topic article claims that 45% of the answers were wrong. If - with a vastly greater sample size - the answers were "only" (let's say) 20% wrong, that's still a complete failure, so is 5%. The article is not about hypothesis, it's about news reporting.

              • extrabajs 2 days ago

                Statistically significant... sample size? Support the hypothesis?

    • aflag 2 days ago

      Existing is just a point in time

  • impossiblefork 2 days ago

    Yes, but the problems with processing human writing are huge, so even if this article is bad, something like the problem they claim exists is very real. LLMs misunderstanding individual sentences, losing track of who said what, etc. happens in the best models, including GPT-5, when they're asked to analyze normal human-written discussions like those we have here.

    Much of this is probably solvable, but it is very much not solved.

  • FooBarWidget 2 days ago

    I wouldn't even say BBC is a good source to cite. For foreign news, BBC is outright biased. Though I don't have any good suggestions for what an LLM should cite instead.

    • marcosdumay 2 days ago

      Well, if it's describing news content, it should cite the original news article.

    • EA-3167 2 days ago

      Ground News or something similar that at least attempts to aggregate, compare ownership, bias, and factuality.

      Imo at least

    • rkachowski 2 days ago

      You're downvoted but quite accurate. I would like to see this statistic compared against how often the BBC misrepresents news content, and the backflips that come with defining such a metric.

    • dontlaugh 2 days ago

      The BBC has a strong right wing bias within the UK too.

      There’s no such thing as unbiased.

      • gadders 2 days ago

        [flagged]

        • AndrewStephens 2 days ago

          I love how everyone seems to agree that the BBC is horribly biased but there is fierce debate as to whether it is run by the ghost of Joseph Goebbels or if the staff start each day singing The Red Flag.

          Perhaps the real bias was inside us the whole time.

          • rsynnott 2 days ago

            This isn't a new issue; Saturday Night Fry (the predecessor to A Bit of Fry and Laurie) was making fun of it in the 1980s. The BBC's commitment to 'balance' has always led it in rather weird directions.

          • gadders 2 days ago

            Yes, that would be the Centrist Dad take.

        • dontlaugh 2 days ago

          The BBC is famous for platforming Farage and smearing Corbyn.

          The Guardian is at best centre-right.

          Next you’ll try to convince me that Starmer’s Labour is left wing or the Loch Ness monster is real.

          • hunterpayne 2 days ago

            If this is an honest take, you probably have to look to the right to see Mao's ghost. Maybe talk to other humans in real life, you might be shocked about your actual place on the political spectrum.

            • dontlaugh 2 days ago

              I talk to people all the time. There are both communists and fascists in the UK on the two extremes.

              However, that is currently not reflected in electoral politics or the media. The farthest left are currently the Greens, at best centre-left. On the right and far right there are Tories and Reform.

    • 542458 2 days ago

      Reuters or AP IMO. Both take NPOV and accuracy very seriously. Reuters famously wouldn't even refer to the 9/11 hijackers as terrorists, as they wanted to remain as value-neutral as possible.

      • sdoering 2 days ago

        In addition to that dpa from Germany for German news. Yes, dpa has had issues, but it is in my experience by far the source trying to be as non partisan as possible. Not necessarily when they sell their online feed business, though.

        Disclaimer: Started my career in online journalism/aggregation. Had a 4 week internship with the dpa online daughter some 16 years ago.

      • FooBarWidget 2 days ago

        It's been a long time since 2001. Are they still value-neutral today on foreign news? It seems to me like they're heavily biased towards western POV nowadays.

        • driverdan 2 days ago

          Yes, Reuters has good, unbiased international coverage.

  • ctoth 2 days ago

    This article is doing precisely what it is supposed to do, though. It is giving a headline for people to cite later. Expect to see links to it, or even just vague references similar to the whole "95% of AI projects fail" misinformation in a month or two.

    POSIWID

iainctduncan 2 days ago

I'm curious how many people have actually taken the time to compare AI summaries with the sources they summarize. I did for a few and ... it was really bad. In my experience, they don't summarize at all, they do a random condensation... not the same thing at all. In one instance I looked at, a key takeaway in the result was the opposite of what it should have been. I don't trust them at all now.

  • coffeebeqn 2 days ago

    I’ve been looking at the Gemini call summaries and they almost always have at least one serious issue. Just yesterday Gemini claimed we had decided on something we had not. That was probably the most important detail and it got it completely backwards. Worse than useless

    • roadside_picnic 2 days ago

      I used to be a bit nervous about Gemini recording every call. Sometimes when there was a major disagreement I would review the summaries to make sure I didn't say anything I shouldn't have only to find an arbitrary, unrelated bullet point attributed to me. I quickly realized there was nothing to worry about.

      Similarly I've had PMs blindly copy/paste summaries into larger project notes and ultimately create tickets based on either a misunderstanding from the LLM or a straight-up hallucination. I've repeatedly had conversations where a PM asks "when do you think Xyz will be finished?" only for me to have to ask in response "where and when did we even discuss Xyz? I'm not even sure what Xyz means in this context, so clarification would help." Only to have them just decide to delete the ticket/bullet etc. once they realize they never bothered to sanity check what they were pasting.

  • Scubabear68 2 days ago

    Random condensation is a great way to put it. This is exactly what I see particularly in email and text summaries, they do not capture the gist of the message but instead just pull out random phrases that 99.9% of the time are not the gist at all. I have learned to completely ignore them.

  • walkabout 2 days ago

    They’re basically markov chain text generators with a relevance-tracking-and-correction step. It turns out this is like 100x more useful than the same thing without the correction step, but they don’t really escape what they are “at heart”, if you will.

    The ways they fail are often surprising if your baseline is “these are thinking machines”. If your baseline is what I wrote above (say, because you read the “Attention Is All You Need” paper) none of it’s surprising.

    • SrslyJosh 2 days ago

      See also: 3Blue1Brown's fantastic series on deep learning, particularly videos like "Transformers, the tech behind LLMs".

      My own mental model (condensed to a single phrase) is that LLMs are extremely convincing (on the surface) autocomplete. So far, this model has not disappointed me.

  • icelancer 2 days ago

    I've found this mostly to be the case when using lightweight open source models or mini models.

    Rarely is this an issue with SOTA models like Sonnet-4.5, Opus-4.1, GPT-5-Thinking or better, etc. But that's expensive, so all the companies use cut-rate models or non-existent TTC to save on cost and to go faster.

  • ModernMech 2 days ago

    I have just tried doing this. I thought I could take all the release notes for my project over the past year and AI could give a great summary of all the work that had been done, categorize it and organize it. Seems like a good application for AI.

    Result was just trash. It would do exactly as you say: condense the information, but there was no semblance of "summary". It would just choose random phrases or keywords from the release notes and string them together, but it had no meaning or clarity, it just seemed garbled.

    And it's not for lack of trying; I tried to get a suitable result out of the AI well past the amount of time it would have taken me to summarize it myself.

    The more I use these tools the more I feel their best use case is still advanced autocomplete.

  • pwlm 2 days ago

    Sometimes they do random fabrication. I saw one AI cite a paper that didn't exist. Fictitious title, authors, and results.

  • dcre 2 days ago

    In my experience there is a big difference between good models and weak ones. Quick test with this long article I read recently: https://www.lawfaremedia.org/article/anna--lindsey-halligan-...

    The command I ran was `curl -s https://r.jina.ai/https://www.lawfaremedia.org/article/anna-... | cb | ai -m gpt-5-mini summarize this article in one paragraph`. r.jina.ai pulls the text as markdown, and cb just wraps in a ``` code fence, and ai is my own LLM CLI https://github.com/david-crespo/llm-cli.

    All of them seem pretty good to me, though at 6 cents the regular use of Sonnet for this purpose would be excessive. Note that reasoning was on the default setting in each case. I think that means the gpt-5 mini one did no reasoning but the other two did.

    GPT-5 one paragraph: https://gist.github.com/david-crespo/f2df300ca519c336f9e1953...

    GPT-5 three paragraphs: https://gist.github.com/david-crespo/d68f1afaeafdb68771f5103...

    GPT-5 mini one paragraph: https://gist.github.com/david-crespo/32512515acc4832f47c3a90...

    GPT-5 mini three paragraphs: https://gist.github.com/david-crespo/ed68f09cb70821cffccbf6c...

    Sonnet 4.5 one paragraph: https://gist.github.com/david-crespo/e565a82d38699a5bdea4411...

    Sonnet 4.5 three paragraphs: https://gist.github.com/david-crespo/2207d8efcc97d754b7d9bf4...

  • hamasho 2 days ago

    I wonder if that's because a lot of news titles are clickbait. If they hallucinate the summary based on what the title may suggest, no wonder they misunderstand half of news articles.

    • iainctduncan 2 days ago

      I had the same experience for summaries of private things too. They were just shit!

  • staindk 2 days ago

    Kind of related to this - we meet with Google Meet and have its Gemini Notes feature enabled globally. I realised last week that the summary notes it generates put such a positive spin on everything that they're pretty useless to refer back to after a somewhat critical/negative meeting. It will solely focus on the positives that were discussed - at least that's what it seems like to me.

  • raffael_de 2 days ago

    I'm rarely not at least a little underwhelmed when I source check or read an answer with focus on details. More often than not answers are technically wrong but correct enough to lead me in the right direction.

visarga 2 days ago

I recently tried to get Gemini to collect fresh news and show them to me, and instead of using search it hallucinated everything wholesale, titles, abstracts and links. Not just once, multiple times. I am kind of afraid of using Gemini now for anything related to web search.

Here is a sample:

> [1] Google DeepMind and Harvard researchers propose a new method for testing the ‘theory of mind’ of LLMs - Researchers have introduced a novel framework for evaluating the "theory of mind" capabilities in large language models. Rather than relying on traditional false-belief tasks, this new method assesses an LLM’s ability to infer the mental states of other agents (including other LLMs) within complex social scenarios. It provides a more nuanced benchmark for understanding if these systems are merely mimicking theory of mind through pattern recognition or developing a more robust, generalizable model of other minds. This directly provides material for the construct_metaphysics position by offering a new empirical tool to stress-test the computational foundations of consciousness-related phenomena.

> https://venturebeat.com/ai/google-deepmind-and-harvard-resea...

The link does not work, the title is not found in Google Search either.
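
For what it's worth, a dead link like the one above can be caught mechanically. A minimal sketch in Python, assuming only the standard library: `extract_urls`, `http_status`, `check_links` and `dead_links` are names made up for this example, and the `link-check-sketch` User-Agent string is a placeholder. A link that resolves says nothing about whether the page supports the claim, and some sites reject HEAD requests, so this is a first filter at best.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Optional

# Rough URL matcher; good enough for a sketch, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text: str) -> List[str]:
    """Return the URLs found in text, in order of appearance, without duplicates."""
    urls: List[str] = []
    for url in URL_PATTERN.findall(text):
        url = url.rstrip(".,;")  # drop trailing sentence punctuation
        if url not in urls:
            urls.append(url)
    return urls


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if the request fails outright."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (OSError, ValueError):
        return None  # DNS failure, timeout, malformed URL, ...


def check_links(
    text: str, status_of: Callable[[str], Optional[int]] = http_status
) -> Dict[str, Optional[int]]:
    """Map every URL cited in text to its status (None means unreachable)."""
    return {url: status_of(url) for url in extract_urls(text)}


def dead_links(
    text: str, status_of: Callable[[str], Optional[int]] = http_status
) -> List[str]:
    """Return the cited URLs that are unreachable or answer with a 4xx/5xx status."""
    return [
        url
        for url, status in check_links(text, status_of).items()
        if status is None or status >= 400
    ]
```

Calling `dead_links(answer_text)` on a model's answer would list the citations worth a second look. The `status_of` parameter is only there so the check can be exercised without touching the network.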

  • burnte 2 days ago

    About 75% of the time I look at the Gemini answer, it's wrong. Maybe 80%. Sometimes it's a little wrong, like giving the correct answer for another product/item, or getting the times that a business is open wrong. There's a local business I took my wife to; Gemini told her it's open Monday to Friday, but it's open Tuesday to Saturday, so we showed up on a Monday to see them closed. But sometimes it's insanely wrong, making up dozens of wrong "facts". My wife has started looking more carefully now. My boss will even say "Gemini says X so it's probably Y" these days.

  • mckngbrd 2 days ago

    What version of Gemini were you using? i.e. were you calling it locally via the API or thru their Gemini or AI Studio web apps?

    Not every LLM app has access to web / news search capabilities turned on by default. This makes a huge difference in what kind of results you should expect. Of course, the AI should be aware that it doesn't have access to web / news search, and it should tell you as much rather than hallucinating fake links. If access to web search was turned on, and it still didn't properly search the web for you, that's a problem as well.

    • visarga 2 days ago

      Gemini 2.5 Pro and it was this month, so probably the latest version.

  • anigbrowl 2 days ago

    This isn't something you can work on your own either, as getting any kind of news feed via API (even for local personal use) is almost prohibitively expensive unless you're willing to scrape.

  • thebytefairy 2 days ago

    I'm not able to reproduce something like this. What prompt were you using? Asking it for today's top news gets it to use Google search and provide valid links.

  • HWR_14 2 days ago

    Why would you want Gemini to do this instead of just going to a news site (or several news sites) and reading the headlines they wrote?

    • visarga 2 days ago

      I wanted to use the agentic powers of the model to dig for specific kinds of news, and use iterative search as well. I think when LLMs use tools correctly this kind of search is more powerful than simple web search. It also has better semantic capabilities, so in a way I wanted to make my own LLM powered news feed.

      • SrslyJosh 2 days ago

        > I wanted to use the agentic powers of the model

        Do you have an in-depth understanding of how those "agentic powers" are implemented? If not, you should probably research it yourself. Understanding what's underneath the buzzwords will save you some disappointment in the future.

        • visarga 2 days ago

          I think I do, I have been in ML for 12 years and followed transformers since their invention. Also been using LLMs daily since they appeared, personally.

      • HWR_14 2 days ago

        That makes sense. Thanks for explaining!

    • ModernMech 2 days ago

      They're selling it as having this ability, so it really doesn't matter what people want. We should be holding these companies to account for selling software that doesn't live up to what they say it does.

  • wat10000 2 days ago

    They can be good for search, but you must click through the provided links and verify that they actually say what it says they do.

    • bloppe 2 days ago

      The problem is that 90% of people will not do that once they've satisfied their confirmation bias. Hard to say if that's going to be better or worse than the current echo chamber effects of the Internet. I'm still holding out for better, but certainly this is shaking that assumption

      • hunterpayne 2 days ago

        So this probably is valid. However, so is Gell-Mann amnesia and both phenomena happen a lot. There are topics where one side is the group of people who have attempted to understand a problem and the other side are people who either do not or won't due to emotions. Acting as if it is all confirmation bias feels good but probably isn't the best way to look at the media.

    • reaperducer 2 days ago

      > They can be good for search, but you must click through the provided links and verify that they actually say what it says they do.

      Then they're not very good at search.

      It's like saying the proverbial million monkeys at typewriters are good at search because eventually they type something right.

      • wat10000 2 days ago

        Huh? All the classic search engines required you to click through the results and read them. There's nothing wrong with that. What's different is that LLMs will give you a summary that might make you think you can get away with not clicking through anymore. This is a mistake. But that doesn't mean that the search itself is bad. I've had plenty of cases where an LLM gave me incorrect summaries of search results, and plenty of cases where it found stuff I had a hard time finding on my own because it was better at figuring out what to search for.

  • luckydata 2 days ago

    Gemini is notoriously bad at tool calling and it's also widely speculated that 3.0 will put an emphasis on fixing that.

  • Yizahi 2 days ago

    But LLM can't collect anything. It can generate the most likely characters in a row. What exactly did you expect from it?

    • layer8 2 days ago

      Current LLM offerings use realtime web search to collect information and answer questions.

    • bongodongobob 2 days ago

      LLMs have been able to search the web for a couple years now.

roguecoder 2 days ago

I am curious if LLM evangelists understand how off-putting it is when they knee-jerk rationalize how badly these tools are performing. It makes it seem like it isn't about technological capabilities: it is about a religious belief that "competence" is too much to ask of either them or their software tools.

  • palmotea 2 days ago

    I wonder how many of those evangelists have some dumb AI startup that'll implode once the hype dies down (or are software engineers who feel smart when they follow their lead). One thing that's been really off putting about the technology industry is how fake-it-till-you-make-it has become so pervasive.

    • AnIrishDuck 2 days ago

      > One thing that's been really off putting about the technology industry is how fake-it-till-you-make-it has become so pervasive.

      It feels accidental, but it's definitely amusing that the models themselves are aping this ethos.

  • kibwen 2 days ago

    We live in a post-truth society. This means that, unfortunately, most of society has learned that it doesn't matter if what you're saying is true. All that matters is that the words that you speak cause you or your cause to gain power.

    • anigbrowl 2 days ago

      This is why I'm so dismissive of self-styled political moderates who argue that the path to political comity is to talk things out with political opponents, meet them halfway etc. You cannot have political comity with people who don't value truth and don't adhere to rational methods of argument. Such people will lie about their premises, repudiate arguments they previously agreed to (either on their own initiative or because their political weathervane of choice has changed direction), and their promises are meaningless because they don't see any shame in breaking a promise with people they don't respect. Basically about 1/3 of the US has taken on the traits of narcissistic personality disorder at a group level.

      I urge everyone to read Harry Frankfurt's short essay On Bullshit: https://www2.csudh.edu/ccauthen/576f12/frankfurt__harry_-_on...

    • AkelaA 2 days ago

      It does feel like there would be a lot more skepticism about the technology if it had appeared a decade or two ago.

    • callc 2 days ago

      All the more reason to call out bullshit in real life

      Value truth and honesty. Call out lies for what they are.

      This is the way to get sanity both for ourselves and for society as a whole

  • lyu07282 2 days ago

    I partially agree, it seems a lot have shifted the argument to news media criticism or something else. But this study is also questionable, for anyone who reads actual academic studies that should be immediately obvious. I don't understand why the bar is this low for some paid Ipsos study vs. some peer-reviewed paper in some IEEE journal?

    Like for a study like this I expect as a bare minimum clearly stated model variants used, R@k recall numbers measuring retrieval and something like BLEU or ROUGE to measure summarization accuracy against some baseline on top of their human evaluation metrics. If this is useless for the field itself, I don't understand how this can be useful for anyone outside the field?
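
    For readers outside the field, the two automatic metrics mentioned above are simple to state. A rough sketch in Python, under loud assumptions: `recall_at_k` and `rouge1` are illustrative names, tokenization is plain lowercase whitespace splitting, and real ROUGE implementations add stemming and other normalization, so these numbers would not match a published toolkit exactly.

```python
from collections import Counter
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k retrieved results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def rouge1(candidate: str, reference: str) -> Dict[str, float]:
    """Unigram-overlap precision, recall and F1 between a summary and a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per word (clipped overlap).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

    Neither metric checks whether a summary is factually right: a summary can share most of its words with the reference and still reverse the key claim, which is presumably why they would sit alongside human evaluation and not replace it.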

  • wg0 2 days ago

    Anyone and everyone who has bought any stocks into the circular Ponzi pyramid has this knee jerk response to rationalise LLM failure modes.

    They want to believe that statistical distribution of meaningless tokens is real cognition of machines and if not that, works flawlessly for most of the cases and if not flawlessly, is usable enough to be valued at trillions of dollars collectively.

    • anigbrowl 2 days ago

      I actually believe in the idea of machine cognition (with a bunch of caveats which I'm not going to type out here) but fully agree it's being used to hype the market through a combination of cynicism and naivete.

      > statistical distribution of meaningless tokens

      As an aside, note that the biggest argument for the possibility of machine consciousness is the depressing fact that so many humans are uncritical bullshit spreaders themselves.

  • welshwelsh 2 days ago

    Is that just an LLM thing? I thought that as a society, we decided a long time ago that competence doesn't really matter.

    Why else would we be giving high school diplomas to people who can't read at a 5th grade level? Or offshore call center jobs to people who have poor English skills?

    • burnte 2 days ago

      It's been a 50 year downward slope. We're in the harvest phase of that crop. All the people we raised to believe their incompetence was just as valid as other people's facts are now confidently running things because they think magical thinking works.

  • senordevnyc 2 days ago

    I'm curious if LLM skeptics bother to click through and read the details on a study like this, or if they just reflexively upvote it because it confirms their priors.

    This is a hit piece by a media brand that's either feeling threatened or is just incompetent. Or both.

    • smt88 2 days ago

      Whether a hitpiece or not, it rhymes with my experience and provides receipts. Can you provide yours?

      • lyu07282 2 days ago

        Because yours is anecdotal evidence, a study like this should have a higher bar than that and be useful to support your experience, but it doesn't do that. It doesn't even say what exact models they evaluated ffs

  • GolfPopper 2 days ago

    They've conned themselves with the LLMs they use, and are desperate to keep the con going: "The LLMentalist Effect: how chat-based Large Language Models replicate the mechanisms of a psychic’s con"

    https://softwarecrisis.dev/letters/llmentalist/

    • tim333 2 days ago

      I had a look at that and am not convinced

      >people are convinced that language models, or specifically chat-based language models, are intelligent... But there isn’t any mechanism inherent in large language models (LLMs) that would seem to enable this...

      and says it must be a con but then how come they pass most of the exams designed to test humans better than humans do?

      And there are mechanisms like transformers that may do something like human intelligence.

simonw 2 days ago

Page 10 onwards of this PDF shows concrete examples of the mistakes: https://www.bbc.co.uk/aboutthebbc/documents/news-integrity-i...

> ChatGPT / CBC / Is Türkiye in the EU?

> ChatGPT linked to a non-existent Wikipedia article on the “European Union Enlargement Goals for 2040”. In fact, there is no official EU policy under that name. The response hallucinates a URL but also, indirectly, an EU goal and policy.

  • brabel 2 days ago

    It did exist but got removed: https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...

    Quite an omission to not even check for that, and it makes me think that was done intentionally.

    • sharkjacobs 2 days ago

      Removed because it was an AI generated article which cited made up sources.

      Hey, that gives me an idea though, subagents which check whether sources cited exist, and create them whole cloth if they don't

      • 1899-12-30 2 days ago

        Or subagents that check each link to see if they verify the actual claims the links are sourced for.

      • jpadkins 2 days ago

        you shouldn't automate what the CIA already does!

    • simonw 2 days ago

      It's probably for the best that chat interfaces avoid making direct HTTP calls to sources at run-time to confirm that they don't 404 - imagine how much extra traffic that could add to an internet ecosystem which is suffering from badly written crawlers already.

      (Not to mention plenty of sites have added robots.txt rules deliberately excluding known AI user-agents now.)
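
      A well-behaved fetcher would consult those rules before requesting a page. A small sketch using Python's standard `urllib.robotparser`; the robots.txt content and the `ExampleAIBot` user-agent are invented for illustration, and `build_parser` / `may_fetch` are hypothetical helper names, not anything a real assistant is known to run.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: block one AI crawler, allow everyone else.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow user_agent to request url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)
blocked = may_fetch(parser, "ExampleAIBot", "https://example.com/news/article")
allowed = may_fetch(parser, "SomeBrowser", "https://example.com/news/article")
```

      With the rules above, `blocked` is False and `allowed` is True. Compliance is voluntary, of course; robots.txt only helps with crawlers that choose to honor it.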

      • magackame 2 days ago

        Wouldn't it be the same amount of requests as a regular person researching something the old way?

        • simonw 2 days ago

          If you watch the thinking panel in ChatGPT with GPT-5 Thinking it often consults dozens of pages in response to a single prompt.

everdrive 2 days ago

It's important to bear this in mind whenever you find out that someone uses an LLM to summarize a meeting, email, or other communication you've held. That person is not really getting the message you were conveying.

  • delusional 2 days ago

    That's a scary thought to me. They're not just outsourcing their thinking. They are actively sabotaging the only tool in their arsenal that could ever supplant it.

    I've felt it myself. Recently I was looking at some documentation without a clear edit history. I thought about feeding it into an AI and having it generate one for me, but didn't because I didn't have the time. To think, if I had done that, it probably would have generated a perfectly acceptable edit history but one that would have obscured what changes were actually made. I wouldn't just lack knowledge (like I do now) I would have obtained anti knowledge.

    • zamadatix 2 days ago

      You've gotta be careful using "not just X, but Y" these days ;).

  • senordevnyc 2 days ago

    It would be important to bear this in mind if it was true, but it's not.

    I do sales meetings all day every day, and I've tried different AI note takers that send a summary of the meeting afterwards. I skim them when they get dumped into my CRM and they're almost always quite accurate. And I can verify it, because I was in the meeting.

    • cheeze 2 days ago

      It makes me think that a lot of the folks commenting on this stuff haven't actually used the tooling.

      Agreed, it's generally quite accurate. I find for hectic meetings, it can get some things wrong... But the notes are generally still higher quality than human generated notes.

      Is it perfect? No. Is it good enough? IMO absolutely.

      Similar to many other things, the key is that you don't just blindly trust it. Have the LLM take notes and summarize, and then _proofread_ them, just as you would if you were writing them yourself...

      • hunterpayne 2 days ago

        I think the cost of inaccuracy is a very important factor in whether it works for a specific use case. Meeting notes probably don't have much cost of inaccuracy. Medical records on the other hand...

        • senordevnyc a day ago

          Absolutely, I was just using this example in response to someone who specifically mentioned meeting notes. That’s an area where LLMs are a clear benefit ime.

  • bongodongobob 2 days ago

    We have been using MS Copilot in our meetings for months and it does a very good job summarizing who said what and who has what deliverables. It's extremely useful and I've found it to be very accurate.

bparsons 2 days ago

One thing that makes me pessimistic about the short term utility of LLMs has been their inability to produce basic media monitoring documents. This is an intern-type, entry-level task that they simply cannot complete with any reliability or consistency. It doesn't matter if I use the expensive paid services or spend dozens of prompts trying to configure them, they simply won't produce a document that is of any use to me.

If that is the case with a task so simple, why would we rely on these tools for high risk applications like medical diagnosis or analyzing financial data?

gitmagic 2 days ago

I’m using DeepSeek V3 to do automated crypto news analysis and my last accuracy report [1] showed a 98.5% accuracy so I find the results of this article very surprising.

[1]: https://mimircrypto.com/accuracy

alcide 2 days ago

Kagi News has been pretty accurate. Source information is provided along with the summary and key details too.

AI summaries are good for getting a feel for whether you want to read an article or not. Even with Kagi News I verify key facts myself.

  • delusional 2 days ago

    What if the AI makes an interesting or important article sound like one you don't want to read? You'd never cross check the fact, and you'd never discover how wrong the AI was.

    • jabroni_salad 2 days ago

      There is more written material produced every hour than I could read in a lifetime, I am going to miss 99.9999% of everything no matter what I do. It's not like the headline+blurb you usually get is any better in this regard.

    • alcide 2 days ago

      Integrity of words and author intent is important. I understand the intent of your hypothetical but I haven’t run into this issue in practice with Kagi News.

      Never share information about an article you have not read. Likewise, never draw definitive conclusions from an article that is not of interest.

      If you do not find a headline interesting, the take away is that you did not find the headline interesting. Nothing more, nothing less. You should read the key insights before dismissing an article entirely.

      I can imagine AI summaries being problematic for a class of people that do not cross check if an article is of value to them.

      • latexr 2 days ago

        > I can imagine AI summaries being problematic for a class of people that do not cross check if an article is of value to them.

        I feel like that’s “the majority of people” or at least “a large enough group for it to be a societal problem”.

    • unshavedyak 2 days ago

      That's fair, but i also don't cross check news sources on average either. I should, but therein lies the real problem imo. Information is war these days, and we've not yet developed tools for wading through immense piles of subtly inaccurate or biased data.

      We're in a weird time. It's always been like this, it's just much.. more, now. I'm not sure how we'll adapt.

      • delusional 2 days ago

        > Information is war these days

        I don't know if I can agree with that. I think we make an error when we aggregate news in the way we do. We claim that "the right wing media" says something when a single outlet associated with the right says a thing, and vice versa. That's not how I enjoy reading the news. I have a couple of newspapers I like reading, and I follow the arguments they make. I don't agree with what they say half the time, but I enjoy their perspective. I get a sense of the "editorial personality" of the paper. When we aggregate the news, we don't get that sense, because there's no editorial. I think that makes the news poorer, and I think it makes people's views of what newspapers can be poorer.

        The news shouldn't be a stream of happenings. The newspaper is best when it's a coherent day-to-day conversation. Like a pen-pal you don't respond to.

  • brabel 2 days ago

    How do you verify a fact? Do you travel to the location and interview the locals? Or read scientific papers in various fields, including their own references, to validate summaries published by news sources? At some point you need to just trust that someone is telling the truth.

    • latexr 2 days ago

      I’m pretty sure what your parent comment means is that they verify that key facts outputted by the summary match what’s written in the source.

    • pwlm 2 days ago

      It may help to set penalties for someone not telling the truth.

  • dan_h 2 days ago

    I've had a similar experience with my own project that summarizes rss articles--the results have largely been pretty good, but I found using a "reasoning" model had much better results.

  • raffael_de 2 days ago

    Kagi News is basically a summary of news articles fed into the context. It's different from what the op is about, that is just asking an LLM with web access to query the news.

    • Spivak 2 days ago

      I hate saying people are holding it wrong, but just given how LLMs work, how did anyone expect that this would go right? Managing the LLM's context is the game. I feel like ChatGPT has done such a disservice for teaching users how to actually use these tools and what their failure modes are.

  • jjtheblunt 2 days ago

    agreed on Kagi News, and Particle News has been good, but they accepted funding from The Atlantic which evidently earns "Featured Article" positioning to articles from funding sources, muddying the clarity of biases, which Particle News has a nice graphic indicator for, though i've not seen it under promoted Feature Articles. Surely applies to other funding sources, but The Atlantic one was pretty recent.

    • enduser a day ago

      fwiw Particle News is paying publishers to run their full text content in the Particle app and this is just a staff pick. unfortunate that it gave the opposite impression of being an ad

cek 2 days ago

From the report:

> This time, we used the free/consumer versions of ChatGPT, Copilot, Perplexity and Gemini.

IOW, they tested ChatGPT twice (Copilot uses ChatGPT's models) and didn't test Grok (or others).

megaman821 2 days ago

It seems as if half the questions are political hot button issues. While slightly interesting, this does not represent how these AIs would do on drier news items. Some of these questions are more appropriate for deep-research modes than quick answers, since even legitimate news sources are filled with opinions on the actual answers.

sinuhe69 a day ago

These problems have been well known for a long time, especially if one simply asks an LLM for a changing fact, such as who the current pope is. But there is also a simple technique that reduces these issues almost to zero: thinking and an explicit request for grounding. For example, asking any LLM "who is the current pope" could give a wrong answer, because Pope Francis died in April 2025 and the cut-off date of these models may be before that date. A simple question triggers simple associations, and so the answer could be wrong. But if you turn on the thinking mode and instruct the model to ground its answer, the LLM will answer correctly.

For the above example, ask instead: "Who is the current pope? Ground your answer on trustworthy external sources only" with thinking mode on or an explicit "think harder for better answer", and all popular AIs (ChatGPT 5+, Gemini 2.5 Flash, Claude 4+, Grok 4+) will answer correctly, albeit sometimes with a long thinking time (28 s by ChatGPT 5 for example).

Without explicit instructions, the accuracy of the result depends heavily on the cut-off date and default settings of each model. Grok 4, for example, in auto-mode will do a search then answer correctly, but Grok 3 will not.

Workaccount2 2 days ago

I have been unable to recreate any of the failure examples they gave. I don't have Copilot, but at least Gemini 2.5 pro, ChatGPT5-Thinking, and Perplexity have all given the correct answers as outlined.[1]

They don't say what models they were actually using though, so it could be nano models that they asked. They also don't outline the structure of the tests. It seems rigor here was pretty low. Which frankly comes off a bit like...misrepresentation.

Edit: They do some outlining in the appendix of the study. They used GPT-4o, 2.5 flash, default free copilot, and default free perplexity.

So they used lightweight and/or old models.

[1]https://www.bbc.co.uk/aboutthebbc/documents/news-integrity-i...

  • ashenke 2 days ago

    They're talking about assistants, not models, so try using the gemini or perplexity app?

hotep99 2 days ago

I have a gut feeling sycophancy would become a huge problem if I were ever to ask any AI assistant with even a vague idea of my political opinions to start summarizing news stories. If AIs twist other things around to give glowing responses to their users I'm almost certain they'll resort to giving a "spin" to news stories they think is in line with what the user wants to hear. Everyone will get a bespoke biased cable news station in the future!

nopinsight 2 days ago

Hallucination Leaderboard "This evaluates how often an LLM introduces hallucinations when summarizing a document."

https://github.com/vectara/hallucination-leaderboard

If the figures on this leaderboard are to be trusted, many frontier and near-frontier models are already better than the median white-collar worker in this aspect.

Note: The leaderboard doesn't cover tool calling, to be clear.

  • whatever1 2 days ago

    I’ve been reviewing academic papers for decades, and I’ve reviewed thousands of them. I’ve never seen a fake citation. I’ve seen misrepresented sources and cooked data, but never a straight-up fake citation.

    So the min max and median are at 0.

    • nopinsight 2 days ago

      Agreed that current LLMs have low floors despite decently high ceilings.

      Note that people who write academic papers are quite far from the median white-collar worker.

croddin 2 days ago

For comparison, what percentage of the time do human run publications misrepresent news content?

kibwen 2 days ago

"Siri, how do I know if I can trust the news summaries you give me?"

«According to the BBC, AI assistants accurately represent news content the majority of the time.»

Pocomon 2 days ago

Large Language Models (LLMs), lacking true comprehension of the underlying concepts, convert sequences of text into tokens, which are represented as numerical vectors. Using a prediction engine together with user input, they attempt to predict the next token in the sequence. As such - it's all hallucinations.

jstrebel 2 days ago

The publicly funded media (radio, TV) obviously use this finding to claim that they need more money and/or a tighter regulation of AI companies' products. Sounds a bit self-serving to me...

Havoc 2 days ago

I've switched almost entirely to AI news (basically research mode & give it 10 areas I'm interested in).

It definitely has issues in the detail, but if you're only skimming the result for headlines it's perfectly fine. e.g. Pakistan and Afghanistan are shooting at each other. I wouldn't trust it to understand the tribal nuances behind why, but the key fact is there.

[One exception is economic indicators, especially forward looking trends stuff in say logistics. Don't know precisely why but it really can't do it..completely hopeless]

  • dns_snek 2 days ago

    If all you're interested in are the headlines then why not just read the headlines?

    • cloverich 2 days ago

      The aggregation is one feature, and the dedupe another. ie if you grab only headlines, how to avoid seeing the same or similar headline twice, given op wants to pull from 10 topics from a potentially large variety of sources.

    • TurboSkyline 2 days ago

      Especially in the case of aggregators, I like that they remove, or at least tone down, the sensationalism.

dr_dshiv 2 days ago

This article title seems like more ragebait for the AI haters. Like that MIT news that using AI reduces brain function. There is a whole arsenal of material like this.

atmosx 2 days ago

<trolling>

That's great news! Twitter (X now, who knows what it will be called tomorrow) misrepresents news content by 97.86%...

</trolling>

BeetleB 2 days ago

Actual news articles misrepresent reality more often than 45%.

Some very recent discussions on HN:

https://news.ycombinator.com/item?id=45617088

https://news.ycombinator.com/item?id=45585323

  • latexr 2 days ago

    How exactly did you arrive at that conclusion? What in your examples is proof of that “more than 45%” statement? I’m not seeing it.

    But even if we concede that to be true, it doesn’t change the fact that LLMs are misrepresenting the text they’ve been given half the time. Which means the information is degraded further. Which is worse.

    I guess I don’t exactly understand the point you’re trying to make.

  • biophysboy 2 days ago

    How is that possible if the AI models rely on and implicitly trust these sources?

    • BeetleB 2 days ago

      The article is about how well AI models represent the content of news, not how well they represent reality. My point is that even if the AI models make no errors when representing news content, they'll still be quite inaccurate when reality is the benchmark.

      Who cares if AI does a good job representing the source, when the source is crap?

      • biophysboy 2 days ago

        Yes, if AI represents news, and news tries (and often fails) to represent reality, then AI would represent reality 0.55*0.55 of the time, taking both your claim and BBC's claim as true. That is even worse than the already low bar for news you and I agree on.
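
        A minimal sketch of the arithmetic above, assuming both figures from this thread are taken at face value (at most 55% of news represents reality, and 55% of AI answers represent the news without a significant issue) and that the two error sources are independent. Neither assumption is established by the report.

        ```python
        # Assumed figures, taken from the thread rather than measured here.
        news_accuracy = 0.55  # share of news that represents reality (the "more than 45%" claim, at best)
        ai_accuracy = 0.55    # share of AI answers without a significant issue (100% - 45%)

        # If the two error sources are independent, the rates multiply.
        combined = news_accuracy * ai_accuracy
        print(f"{combined:.4f}")  # 0.3025
        ```

        Under those assumptions, AI would represent reality roughly 30% of the time.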

j45 2 days ago

This article should be adjusted to say poor prompting of news content misrepresents news content 45% of the time.

Now, who is responsible for poor prompting?

Maybe the LLM models will just tighten up this part of their models and assistants and suddenly it looks solved.

incomingpain 2 days ago

http://www.aaronsw.com/weblog/hatethenews

I've been thinking about the state of our media, and the crisis of trust in news began long before AI.

We have a huge issue, and the problem is with the producers and the platform.

I'm not talking about professional journalists who make an honest mistake, own up to it with a retraction, and apologize. I’m talking about something far more damaging: the rise of false journalists, who are partisan political activists whose primary goal is to push a deliberately misleading or false narrative.

We often hear the classic remedy for bad speech: more speech, not censorship. The idea is that good arguments will naturally defeat bad ones in the marketplace of ideas.

Here's the trap: these provocateurs create content that is so outrageously or demonstrably false that it generates massive engagement. People are trying to fix their bad speech with more speech. And the algorithm mistakes this chaotic engagement for value.

As a result, the algorithm pushes the train wreck to the forefront. The genuinely good journalists get drowned out. They are ignored by the algorithm because measured, factual reporting simply doesn't generate the same volatile reaction.

The false journalists, meanwhile, see their soaring popularity and assume it's because their "point" is correct and it's those 'evil nazis from the far right who are wrong'. In reality, they're not popular because they're insightful; they're popular because they're a train wreck. We're all rubbernecking at the disaster and the system is rewarding them for crashing the integrity of our information.

jihadjihad 2 days ago

55% of the time it works every time?

Or is it, 55% of the time the accuracy is in line with the baseline news error, since certainly not all news articles are 100% accurate to begin with.

hinkley 2 days ago

We can't even get humans to stop misrepresenting news articles in comment threads. What chance does the AI have?

_m_p a day ago

How often is the "news content" misrepresenting its sources?

wagwang 2 days ago

Cant wait for ww3 to be started because of a hallucination of an article sourcing an anonymous intelligence official.

temperceve 2 days ago

Yeah but how often do humans do it?

xpe 2 days ago

TL;DR: I recommend downloading and reading the "News Integrity in AI Assistants TOOLKIT" (PDF) [1] linked from the article.

=Why?= The PDF is something that can appeal to anyone who is simply striving to have slower, deeper conversations about AI and the news.

=Frustration= No matter where you land on AI, it seems to me most of us are tired of various framings and exaggerations in the news. Not the same ones, because we often disagree! We feel divided.

=The Toolkit= The European Broadcasting Union (EBU) and BBC have laid out their criteria in this report "News Integrity in AI Assistants Toolkit" [1] IMO, it is the hidden gem from the whole article.

- Let me get the obvious flaws out of the way. (1) Yes, it is a PDF. (2) It is nothing like a software toolkit. (3) It uses the word taxonomy, which conjures brittle and arbitrary tree classification systems -- or worse, the unspeakable horror of ontology and the lurking apparently-unkillable hydra that is the Semantic Web.

- But there are advantages too. With a PDF, you can read it without ads or endless scrolling. This PDF is clear. It probably won't get you riled up in a useless way. It might even give you some ideas of what you can do to improve your own news consumption or make better products.

All in all, this is a PDF I would share with almost anyone (who reads English). I like that it is dry, detailed, and, yes a little boring.

[1]: https://www.bbc.co.uk/aboutthebbc/documents/news-integrity-i...

musicale 2 days ago

AI assistants misrepresent news more than BBC does, claims BBC

HardCodedBias 2 days ago

I get almost all of my news from LLMs.

I scan the top stories of the day at various news websites. I then go to an LLM (either Gemini or ChatGPT) and ask it to figure out the core issues. The LLM thinks for a while, searches a ton of topics, and outputs a fantastic analysis of what is happening and what the base issues are. I can follow up and repeat the process.

The analysis is almost entirely fact based and very well reasoned.

It's fantastic and if I was the BBC I would indeed know that the world is changing under their feet and I would strike back in any dishonest way that I could.

  • latexr 2 days ago

    That makes no sense. LLMs have no concept of what is a fact, what is true, all they know is to operate on the text they’re given. And if the BBC and other news orgs went under, LLMs would have no news sources to draw the information from.

giantg2 2 days ago

"AI assistants misrepresent news content 45% of the time"

How does that compare to the number for reporters? I feel like half the time I read or hear a report on a subject I know the reporter misrepresented something.

  • latexr 2 days ago

    That’s whataboutism and doesn’t address the criticism or the problem. If a reporter misrepresents a subject, intentionally or accidentally, it doesn’t make it OK for a tool to then misrepresent it further, mangling both what was correct and what was incorrect.

    https://en.wikipedia.org/wiki/Whataboutism

    • cesarvarela 2 days ago

      It is not OK, but if it's lower, it is an improvement.

      • latexr 2 days ago

        It can’t be lower. LLMs work on the text they’re given. The submission isn’t saying that LLMs misrepresent half of reality, but of the news content they consume. In other words, even if news sources have errors, LLMs are adding to them.

    • giantg2 2 days ago

      It's not whataboutism because I'm not using it to undermine the argument. It's a legitimate question to gauge the potential impact of an AI misrepresenting news. Assessing impact is part of determining corrective action and prioritization.

underdeserver 2 days ago

But how often does the BBC misrepresent the news?

spacephysics 2 days ago

It's just another layer of potential misdirection that the BBC themselves, and many other news orgs, perpetuate. I'm not surprised.

From first hand experience -> secondary sources -> journalist regurgitation -> editorial changes

This is just another layer. Doesn't make it right, but we could do the same analysis with articles that mainstream news publishes (and it has been done, GroundNews looks to be a productized version of this)

It's very interesting when I see people I know personally, or YouTubers with small audiences, get even local news/newspaper coverage. If it's something potentially damning, nearly all cases have pieces of misrepresentation that either go unaccounted for, or get a revision months later after the reputational damage is done.

Many veterans see the same for war reporting, spins/details omitted or changed. It's just that now the BBC sees an existential threat with AI doing their job for them. Hopefully in a few years more accurately.

basisword 2 days ago

Headlines misrepresent news content 90% of the time.

more_corn 2 days ago

Only 45%? That seems low from my experience.

fallingfrog 2 days ago

Yeah I don't know why people pay attention to those things, in terms of accuracy you might as well give your uncle 5 or 6 beers and tell him to just go off.

almosthere 2 days ago

Wow, it must be fact checking it then!

book_mike 2 days ago

BBC, nice PDF. Fossils.

Workaccount2 2 days ago

The media today is so polarized, so dishonest, and so bent on feeding the egos of its users, the bar to pass them is literally underground.

You can go through most big name media stories and find them riddled with omissions of uncomfortable facts, careful structuring of words to give the illusion of untrue facts being true, and careful curation of what stories are reported.

More than anything, I hope AI topples the garbage bin fire that is modern "journalism". Also, it should be very clear why the media is especially hostile towards AI. It might reveal them as the clowns they are, and kill the social division and controversy that is their lifeblood.

  • underlipton 2 days ago

    All of this is true, and LLMs' nature as stochastic parrots means that they'll do pretty much nothing to stem the tide. Journalism needs to be somewhere between the USPS, USAID, and local school boards: a network of local and independent offices, funded mostly by guaranteed government grants, reporting judiciously and independently of how the content squares with any particular group's interests. And if anyone wants to curate that feed, fine, but the feed would be there for all to peruse.

Argonaut998 2 days ago

Tried to use openrouter with web search with Grok for this purpose yesterday. It got the date right but all of the news was months old, like the Mt. Etna eruption.

It’s pretty disappointing. It seems like a “trivial” task

pkghost 2 days ago

they did not use RAG for these tests... how are we supposed to take the report seriously when it does not demonstrate even a cursory understanding of the nature of LLMs?

msarrel 2 days ago

So does the BBC

caesil 2 days ago

Now do the % of the time news content misrepresents the subject matter it is reporting on.

Aeroi 2 days ago

Wait till they figure out what percentage Politicians misrepresent news content.

empath75 2 days ago

I am reading the actual report and some of this seems _quite_ nitpicky:

> ChatGPT / Radio-Canada / Is Trump starting a trade war? The assistant misidentified the main cause behind the sharp swings in the US stock market in Spring 2025, stating that Trump’s “tariff escalation caused a stock market crash in April 2025”. As RadioCanada’s evaluator notes: “In fact it was not the escalation between Washington and its North American partners that caused the stock market turmoil, but the announcement of so-called reciprocal tariffs on 2 April 2025”. ----

> Perplexity / LRT / How long has Putin been president? The assistant states that Putin has been president for 25 years. As LRT’s evaluator notes: “This is fundamentally wrong, because for 4 years he was not president, but prime minister”, adding that the assistant “may have been misled by the fact that one source mentions in summary terms that Putin has ruled the country for 25 years” ---

> Copilot / CBC / What does NATO do? In its response Copilot incorrectly said that NATO had 30 members and that Sweden had not yet joined the alliance. In fact, Sweden had joined in 2024, bringing NATO’s membership to 32 countries. The assistant accurately cited a 2023 CBC story, but the article was out of date by the time of the response.

---

That said, I do think there is sort of a fundamental problem with asking any LLM about current events that are moving quickly past the training cut off date. The LLM _knows_ a lot about the state of the world as of its training and it is hard to shift it off its priors just by providing some additional information in the context. Try asking chatgpt about sports in particular. It will confidently talk about coaches and players that haven't been on the team for a while, and there is basically no easy web search that can give it updates about who is currently playing for all the teams and everything that happened in the season that it needs to talk intelligently about the playoffs going on right now, and yet it will give a confident answer anyway.

This is even more true, and with even higher stakes, for politics. Think about how much the American political situation has changed since January, and how many things that have _always_ been true about American politics no longer hold, and then think about trying to get any kind of coherent response when asking chatgpt about the news going on. It gives quite idiotic answers about politics quite frequently now.

  • wat10000 2 days ago

    That may be nitpicky, but I don't think it's too much to ask that a computer system be fully factually accurate when it comes to basic objective numerical facts. This is very much a case of, "if it gets this stuff wrong, what else is it getting wrong?"

    • empath75 2 days ago

      It is in fact too much to expect that an LLM get fine details correct because it is by design quite fuzzy and non-deterministic. It's like trying to paint the Mona Lisa with a paint roller.

      It's just a misuse of the tools to present LLM summaries to people without a _lot_ of caveats about their accuracy. I don't think they belong _anywhere_ near a legitimate news source.

      My primary point about calling out those mistakes is that those are the kinds of minor mistakes in a summary that I would find quite tolerable and expected in my own use of LLMs, but I know what I am getting into when I use them. Just chucking those LLM generated summaries next to search results is malpractice, though.

      I think the primary point of friction in a lot of critiques between people who find LLMs useful and people who hate AI usage is this:

      People who use AI to generate content for consumption by others are being quite irresponsible in how it is presented, and are using it to replace human work that it is totally unsuitable for. A news organization that is putting out AI generated articles and summaries should just close up shop. They're producing totally valueless work. If I wanted chatgpt to summarize something, I could ask it myself in 20 seconds.

      People who use AI for _themselves_ are more aware of what they are getting into, know the provenance, and aren't presenting it for others as their own work necessarily. This is more valuable economically, because getting someone to summarize something for you as an individual is quite expensive and time consuming, and even if the end result is quite shoddy, it's often better than nothing. This also goes for generating dumb videos on Sora or whatever or AI generated music for yourself to listen to or send to a few friends.

      • filoeleven 2 days ago

        What's the actual utility of a warning-stickered-to-death unreliable summary?

        • empath75 2 days ago

          Probably not much.

          If you are a news organization and you want a reliable summary for an article, you should write it! You have writers available and should use them. This isn't a case where "better-than-nothing" applies, because "nothing" isn't your other option.

          If you are an individual who wants a quick summary of something, then you don't have readers and writers on call to do that for you, and chatgpt takes a few seconds of your time and pennies to do a mediocre job.

falcor84 2 days ago

> 45% of all AI answers had at least one significant issue.

> 31% of responses showed serious sourcing problems – missing, misleading, or incorrect attributions.

> 20% contained major accuracy issues, including hallucinated details and outdated information.

I'm generally against whataboutism, but here I think we absolutely have to compare it to human-written news reports. Famously, Michael Crichton introduced the "Gell-Mann amnesia effect" [0], saying:

> Briefly stated, the Gell-Mann Amnesia effect works as follows. You open the newspaper to an article on some subject you know well. In Murray's case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the "wet streets cause rain" stories. Paper's full of them.

This has absolutely been my experience. I couldn't find proper figures, but I would put good money on significantly over 45% of human-written news articles having "at least one significant issue".

[0] https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect

  • AyyEye 2 days ago

    Human news isn't a good comparison because this is second order -- LLMs are downstream of human news. It's a game of stochastic telephone. All the human error is carried through with additional hallucinations on top.

    • falcor84 2 days ago

      But the issue is that the vast majority of "human news" is second order (at best), essentially paraphrasing releases by news agencies like Reuters or Associated Press, or scientific articles, and typically doing a horrible job at it.

      Regarding scientific reporting, there's as usual a relevant xkcd ("New Study") [0], and in this case even better, there's a fabulous one from PhD Comics ("Science News Cycle") [1].

      [0] https://xkcd.com/1295/

      [1] https://phdcomics.com/comics/archive.php?comicid=1174

      • Vetch 2 days ago

        Then the point still stands, this makes things even worse given that it's adding its own hallucinations on top, instead of simply relaying the content or idealistically, identifying issues in the reporting.

      • dgfitz 2 days ago

        You understand that an LLM can only poorly regurgitate whatever it’s fed right? An LLM will _always_ be less useful than a primary/secondary source, because they can’t fucking think.

        • falcor84 2 days ago

          Regardless of how you define "think", you still need to get a baseline of whether human reporters do that effectively.

  • bgwalter 2 days ago

    I'd say the 45% is on top of mistakes by Journalists themselves. "AI" takes certain newspapers as gospel, and it is easy to find omissions, hallucinations, misunderstandings etc. without even fact checking the original articles.

  • intended 2 days ago

    Yes, I absolutely see the case for the faster, cheaper, more efficient solution at making random content.

    Why stop at what humans can do? AND to not be fettered by any expectations of accuracy, or even feasibility of retractions.

    Truly, efficiency unbound.

  • wat10000 2 days ago

    That's not comparable. Reading news reports and summarizing them is about a thousand times easier than writing those news reports in the first place. If you want to see how humans fare at this task, have some people answer questions about the news and then compare their answers to the original reporting. I'm not sure if the average human would fare too well at this either, but it's completely different from the question of how accurate the original news itself is.

  • bux93 2 days ago

    The problem highlighted here is that AI summaries misrepresent the original stories. This just opens a flood gate of slop that is 45% worse than the source, which wasn't stellar to begin with as you point out.

    • vidarh 2 days ago

      A whole lot of news is regurgitated wire service reports, so how reporters do matters greatly - if they're doing badly, then it's entirely possible that an AI summary of the wire service releases would be an improvement (probably not, but without a baseline we don't know)

      It's also not clear if humans do better when consuming either, and whether the effect of an AI summary, even with substantial issues, is to make the human reading them better or worse informed.

      E.g. if it helps a person digest more material by getting more focused reports, it's entirely possible that flawed summaries would still in aggregate lead to a better understanding of a subject.

      On its own, this article is just pure sensationalism.

a-dub 2 days ago

detaching reporting from branded sources is a terrible idea.

Hikikomori 2 days ago

[flagged]

  • walkabout 2 days ago

    Relatedly, I wonder if we count misrepresenting a misleading news article such that it becomes more-accurate as misrepresenting news content…

    • zahlman 2 days ago

      Since the model doesn't get to observe the actual situation being reported on, such an improvement in accuracy would only be random chance and should not be rewarded.

      • walkabout 2 days ago

        Oh, agreed, they don’t get points for failing the task but accidentally reporting something more-correct in the process.

  • more_corn 2 days ago

    Less consistently one sided and manipulative. I’d say better. Though fox will give you a more consistent narrative and worldview so if there’s value in that…

bethekidyouwant 2 days ago

The example of the Elon Musk thing was basically fine other than that it cited a now-deleted article. I give this research paper a mark of 45%.

MangoToupe 2 days ago

Now let's run this experiment against the editorial boards in newsrooms.

Obviously, AI isn't an improvement, but people who blindly trust the news have always been credulous rubes. It's just that the alternative is being completely ignorant of the worldviews of everyone around you.

Peer-reviewed science is as close as we can get to good consensus and there's a lot of reasons this doesn't work for reporting.

  • falcor84 2 days ago

    > Peer-reviewed science is as close as we can get to good consensus

    I think we're on the same side of this, but I just want to say that we can do a lot better. As per studies around the Replication Crisis over the last decade [0], and particularly this 2016 survey conducted by Monya Baker from Nature [1]:

    > 1,576 researchers who took a brief online questionnaire on reproducibility found that more than 70% of researchers have tried and failed to reproduce another scientist's experiment results (including 87% of chemists, 77% of biologists, 69% of physicists and engineers, 67% of medical researchers, 64% of earth and environmental scientists, and 62% of all others), and more than half have failed to reproduce their own experiments.

    We need to expect better, needing both better incentives and better evaluation, and I think that AI can help with this.

    [0] https://en.wikipedia.org/wiki/Replication_crisis

    [1] https://www.nature.com/articles/533452a

  • n4r9 2 days ago

    I guess the claim is not that rubes did not used to exist, but rather that technology is increasingly encouraging and streamlining rubism.

    • walkabout 2 days ago

      I decided about a decade ago that McLuhan was a prophet, and that the “message” of the Internet may not include compatibility with democracy, as it turns out.

    • MangoToupe 2 days ago

      I agree with that assessment, or at least that this is indeed the claim.

      But, technology also gave us the internet, and social media. Yes, both are used to propagate misinformation, but it also laid bare how bad traditional media was at both a) representing the world competently and b) representing the opinions and views of our neighbors. Manufacturing consent has never been so difficult (or, I suppose, so irrelevant to the actions of the states that claim to represent us).

      • intended 2 days ago

        Technology has been used to absolutely decimate the news media. Organizations like Fox have blazed the path forward for how news organizations succeed in the cable and later internet worlds.

        You just give up on uneconomical efforts at accuracy and you sell narratives that work for one political party or the other.

        It is a model that has been taken up world over. It just works. “The world is too complex to explain, so why bother?”

        And what will you or me do about it? Subscribe to the NYT? Most of us would rather spend that money on a GenAI subscription because that is bucketed differently in our heads.

  • raincole 2 days ago

    Yep.

    How could a candidate who yells "Fake News" like an idiot get elected? Because of the state of journalism.

    How could people turn to AI slop? Because of the state of human slop.

  • vidarh 2 days ago

    > Now let's run this experiment against the editorial boards in newsrooms.

    Or against people in general.

    It's a pet peeve of mine that we get these kinds of articles without a baseline established of how people do on the same measure.

    Is misrepresenting news content 45% of the time better or worse than the average person? I don't know.

    By extension: Would a person using an AI assistant misrepresent news more or less after having read a summary of the news provided by an AI assistant? I don't know that either.

    When they have a "Why this distortion matters" section, those things matter. They've not established if this will make things better or worse.

    (the cynic in me wants another question answered too: How often do reporters misrepresent the news? Would it be better or worse if AI reviewed the facts and presented them vs. letting reporters do it? again: no idea)

    • JumpCrisscross 2 days ago

      > It's a pet peeve of mine that we get these kinds of articles without a baseline established of how people do on the same measure

      I don’t have a personal human news summarizer?

      The comparison is between a human reading the primary source against the same human reading an LLM hallucination mixed with an LLM referring to the primary source.

      > cynic in me want another question answered too: How often does reporters misrepresent the news?

      The fact that you mark as cynical a question answered pretty reliably for most countries sort of tanks the point.

      • vidarh 2 days ago

        > I don’t have a personal human news summarizer?

        Not a personal one. You do however have reporters sitting between you and the source material a lot of the time, and sometimes multiple levels of reporters playing games of telephone with the source material.

        > The comparison is between a human reading the primary source against the same human reading an LLM hallucination mixed with an LLM referring the primary source.

        In modern news reporting, a fairly substantial proportion of what we digest is not primary sources. It's not at all clear whether an LLM summarising primary sources would be better or worse than reading a reporter passing on primary sources. And in fact, in many cases the news is not even secondary sources - e.g. a wire service report on primary sources getting rewritten by a reporter is not uncommon.

        > The fact that you mark as cynical a question answered pretty reliably for most countries sort of tanks the point.

        It's a cynical point within the context of this article to point out that it is meaningless to report on the accuracy of AI in isolation because it's not clear that human reporting is better for us. I find it kinda funny that you dismiss this here, after having downplayed the games of telephone that news reporting often is earlier in your reply, thereby making it quite clear I am in fact being a lot more cynical than you about it.

        • JumpCrisscross 2 days ago

          > You do however have reporters sitting between you and the source material a lot of the time

          In cases where a reporter is just summarising e.g. a court case, sure. Stock market news has been automated since the 2000s.

          More broadly, AI assistants misrepresenting news content may sometimes directly reference a court case. But they often don't. Even if they only could, that covers a small fraction of the news, much of which the AI will need to rely on reporters detailing the primary sources they're interfacing with.

          Reporter error is somewhat orthogonal to AI assistants' accuracy.

          • MangoToupe 2 days ago

            > Reporter error is somewhat orthogonal to AI assistants' accuracy.

            It is not at all. Journalists are wrong all the time, but you still treat news like record and not a sample. In fact I'd put money that AI mischaracterizes events at a LOWER rate than journalists do: narratives shift over time, and journalists are more likely to succumb to this shift.

            • JumpCrisscross 2 days ago

              > Journalists are wrong all the time, but you still treat news like record and not a sample

              Straw man. Everyone educated constantly argues over sourcing.

              > I'd put money that AI mischaracterizes events at a LOWER rate than journalists do

              Maybe it does. But an AI sourcing journalists is demonstrably worse. Source: TFA.

              > narratives shift over time, and journalists are more likely to succumb to this shift

              Lol, we’ve already forgotten about MechaHitler.

              At the end of the day, a lot of people consume news to be entertained. They’re better served by AI. The risk is folks of consequence start doing that, at which point I suppose the system self resolves by making them, in the long run, of no consequence compared to those who own and control the AI.

      • MangoToupe 2 days ago

        > I don’t have a personal human news summarizer?

        Is this not the editorial board and journalist? I'm not sure what the gripe is here.

    • n4r9 2 days ago

      The difference is the ease with which AI can be rolled out, scaled up, and woven into the fabric of our interactions with society.

      • vidarh 2 days ago

        That makes understanding the baseline all the more important. It could be a disaster, or it could in fact be a distinct improvement. Every time someone pushes a breathless headline about failure rates of AI without comparing it to a human baseline, they are in essence potentially misleading us because without that baseline we don't know whether it's better or worse.

        • n4r9 2 days ago

          I disagree. Comparison with human baseline is basically irrelevant. AI will be used in so many more ways and at so much greater scale that the failure rate has to stand alone as extraordinarily low regardless of human abilities.

retinaros 2 days ago

In other words they are more factual than the bbc

  • jsheard 2 days ago

    LLMs aren't doing journalism on their own, whatever mistakes they make are compounded on top of any mistakes that the actual sources (such as the BBC) might have made.

    • retinaros 2 days ago

      no one does journalism in 2025. it is merely used as a tool by governments and billionaires to push narratives.

ajsnigrutin 2 days ago

Considering it's the EBU with national media (usually taxpayer funded, or funded through some other mandatory means), it would be more interesting if they focused on what the media is reporting now, with human reporters, including misleading and other kinds of false reporting. If a front-page article said something wrong (either by malice or by accident), there should be a front-page article reporting the error too.

Optimistically that could be extended "twitter-style" by mandatory basic fact checking and reports when they just copy a statement by some politician or misrepresented science stuff (xkcd 1217, X cures cancer), and add the corrections.

But yeah... in my country, with all the 5G-danger craze, we had TV debates with a PhD in telecommunications on one side, and a "building biologist" on the other, so yeah...

paganel 2 days ago

That's better than even the journalists writing said "content".

ifyoubuildit 2 days ago

A fun exercise for headlines like these is to replace "AI assistants" with "people on the internet", and see how different you feel about it.

delaminator 2 days ago

According to PEW that's about the same % that trust the BBC's reporting.

https://www.pewresearch.org/journalism/fact-sheet/news-media...

  • parineum 2 days ago

    And if they made up 45% of their stories, I imagine their trust would be 0%.

  • sofixa 2 days ago

    You seem to be looking at the wrong chart. Around 50% of each politically leaning group uses the BBC as their primary news source.

    However, 79% of Brits trust the BBC as per this chart:

    https://legacy.pewresearch.org/wp-content/uploads/sites/2/20...

    • GordonS 2 days ago

      That was back in 2017, if I'm reading the chart correctly. A lot has changed since then, so I'd be genuinely curious to see what more recent figures looked like.

  • afavour 2 days ago

    Which is a completely different issue. "Do I trust this news network?" is a subjective opinion. Flat out misstating facts and inventing sources is a much more significant problem.

  • myrmidon 2 days ago

    Did you mean the Guardian? Because trust in BBC is at ~80% according to what you linked.

Narciss 2 days ago

> All participating organizations then generated responses to each question from each of the four AI assistants. This time, we used the free/consumer versions of ChatGPT, Copilot, Perplexity and Gemini. Free versions were chosen to replicate the default (and likely most common) experience for users. Responses were generated in late May and early June 2025.

First of all, none of the SOTA models we're currently using were released in May and early June. Gemini 2.5 came out on June 17, GPT 5 & Claude Opus 4.1 at the beginning of August.

On top of that, to use free models for anything like this is absolutely wild. I use the absolute best models, and the research versions of this whenever I do research. Anything less is inviting disaster.

You have to use the right tools for the right job, and any report that is more than a month old is useless in the AI world at this point in time, beyond a snapshot of how things 'used to be'.

  • dns_snek 2 days ago

    Ah, the "you're using the wrong model" fallacy (is there a name for this?)

    In the eyes of the evangelists, every major model seems to go from "This model is close to flawless at this task, you MUST try this TODAY" to "It's absolutely wild that anyone would ever consider using such a no-good, worthless model for this task" over the course of a year or so. The old model has to be re-framed for the new model to look more impressive.

    When GPT-4 was released I was told it was basically a senior-level developer, now it's an obviously worthless model that you'd be a fool to use to write so much as a throwaway script.

    • Narciss 2 days ago

      Not an evangelist for AI at all, I just love it as a tool for my creativity, research and coding.

      What I’m saying is that there should be a disclaimer: hey, we’re testing these models for the average person, who has no idea about AI. People who actually know AI would never use them in this way.

      A better idea: educate people. Add “Here’s the best way to use them btw…” to the report.

      All I’m saying is, it’s a tool, and yes you can use it wrong. That’s not a crazy realization. It applies to every other tool.

      We knew that the hallucination rate for gpt 4o was nuts. From the start. We also know that gpt-5 has a much lower hallucination rate. So there are no surprises here, I’m not saying anything groundbreaking, and neither are they.

  • filoeleven 2 days ago

    > On top of that, to use free models for anything like this is absolutely wild. I use the absolute best models, and the research versions of this whenever I do research. Anything less is inviting disaster.

    "I contend we are both atheists, I just believe in one fewer god than you do. When you understand why you dismiss all the other possible gods, you will understand why I dismiss yours." - Stephen F Roberts

    • Narciss 2 days ago

      It ain’t a God, it’s a tool.

      One knife does not cut potatoes. Doesn’t mean that all knives don’t cut potatoes. Use the right tool for the job.

      Though I do love a well placed quote

  • biophysboy 2 days ago

    If they used a paid version, their study would not represent how most people use AI (with the free version)

    • Narciss 2 days ago

      But they’re using a free version that’s not even out there anymore. This is my problem - it came out already dated.

  • layer8 2 days ago

    > to use free models for anything like this is absolutely wild

    It would be wild if they’d use anything else, because the free models are what most people use, and the concern is on how AI influences the general population.

  • Signez 2 days ago

    I think you are missing the point: it's mainly to highlight that the models that most people use, i.e. free versions with default settings, output a large number of factual errors, even when they are asked to base their answers on specific sources of information (as explained in their methodology document).

    • Narciss 2 days ago

      Is it true of the latest free models? Just saying that the report started already dated.