Ask HN: What will stop AI generated content from flooding the internet?

36 points by lemonlym a year ago

Recently I got into a discussion with a friend about how AI generation allows for the mass production of unique, lengthy content that is being, and will increasingly be, used to disseminate misinformation and advertisements.

My question is: what protections are in place to stop people from using a mix of SEO and AI generation to flood the internet and our search results with machine-produced content promoting some agenda?

Basically, what protections/strategies are being used to keep our search results from being dominated by non-human-written text?

dzdt a year ago

Short answer: nothing. Just as the first atmospheric nuclear tests marked the end of the era of natural C14 dating, the introduction of GPT and other advanced AI language models marks the end of being able to assume free-flowing natural-language text has a human origin. Thus begins the age of Artificial Inanity. (See Neal Stephenson's Anathem for the origin of that phrase and an interesting take on the effect.)

Longer answer: there may still be a period of some months before models approaching the complexity of GPT-3 are available at scale to truly black-hat actors like the worst of the SEO firms. Training such a beast is still an expensive endeavor, so there is some gatekeeping yet.

  • midoridensha a year ago

    Between AI and nuclear weapons, we're rapidly approaching our Great Fermi Filter.

RareBean a year ago

Will we be searching the internet for content that an AI can write?

For example: If I can ask an AI to make me a cake recipe, find an error in my code or tell me about the Hindenburg disaster, and the response is faster, more accurate and easier to read than what I’d typically find on Google, then maybe I simply won’t use Google for that anymore (especially since with the AI, I can ask follow-up questions...)

So maybe bad actors using AI to generate their content will just be yelling into the void. (Until their content gets picked up and incorporated by my AI in its next training session.)

  • wnkrshm a year ago

    Maybe the APIs for generating that AI content will include advertising paragraphs woven free-form into the prose, not neatly separated where you could block the ad.

    The tech allows for seamless integration of ads.

  • sho_hn a year ago

    I really like this take. Maybe the mid-term future direction for search engines is finding and surfacing novel thought past the reach of the AI models.

    Probably implemented by asking a model if the content is novel to the model ...

keiferski a year ago

I think you will see more of a focus on the creator's individual personality, livestreams, etc. and a shift away from anonymous, generic sources of information. Individuals will become authority sources rather than search engines; e.g. "I listen to Joe's podcast daily and trust his recommendations on products," rather than "I am going to type what I need in Google and then look at the results in order to make a decision." It will be a long, long time until AI can convincingly generate the quirkiness and unpredictability of a person doing a livestream video.

This seems bizarre and futuristic, but in a way, it's actually just a return to pre-technological social norms, except with the benefit of broadcasting to a wider audience.

  • downboots a year ago

    So, trust

    • keiferski a year ago

      More specifically, Weber’s concept of charismatic authority.

tsol a year ago

I think the place where you'll be able to escape this will be curated communities. As it becomes harder to tell a human and a bot apart, people will put more value on knowing they're talking to a real person. I think this opens up a place for smaller, invite-only communities.

  • thr0wawayf00 a year ago

    But how do you know that you're talking to a human? This question will become much, much more difficult to answer pretty soon.

    These models are capable of generating high quality discussion on a variety of topics. I think the days of human-only text-based forums are numbered, especially as AI model trainers look for high-quality content to train their systems with.

    • sho_hn a year ago

      > But how do you know that you're talking to a human? This question will become much, much more difficult to answer pretty soon.

      We'll probably come up with a verification and ID scheme. It'll be difficult not to make it spoofable, of course, and it will probably have to be rooted in physical f2f contact and secret information.

      Honestly, what you should be more concerned with:

      (1) All of us in these convos tend to assume that AI participation will be very widespread, but for that to happen, someone will need to foot the bill for running the compute. That means they will need to find a business model that will afford it, and those business models might be pretty terrible in the short term.

      (2) If we head toward pervasive ID verification in order to be able to trust, what do people do who require anonymity to evade harm by other humans? How can we keep an avenue open for whistleblowers?

      • thr0wawayf00 a year ago

        > That means they will need to find a business model that will afford it, and those business models might be pretty terrible in the short term.

        I think AI is going to enable spammers and scammers in ways that we haven't seen before. Once spammers can fully integrate these chatbots into their email systems, it's going to put pressure on email providers to find new ways to detect spam. Thinking of the purely cynical, nefarious case, scammers are going to make a fortune robbing people with this stuff.

        Maybe this is what will force us to acknowledge and address that using the internet for social interaction isn't scalable from a content moderation standpoint.

slater a year ago

Boredom. People will get bored of AI-generated content.

And/or... maybe a new industry of "Certified Human-made Content®" will pop up? :)

  • lemonlym a year ago

    As the content becomes even more indistinguishable from human-written content, wouldn't it be valuable to bad actors who wish to pump out heaps of unique writing representing their perspective? Not trying to fearmonger (although I may sound like it), and I agree 99.9% of people will stop posting "AI wrote this blog post" etc. blogs, but aren't there bound to be parties that begin mass-producing content and infiltrating the non-tech-savvy person's timeline?

    • 32gbsd a year ago

      Sounds like fearmongering. It's not something you need to worry about, because the internet is already filled with paid influencers parroting the same info.

ggm a year ago

We are told by news sources that the PRC attempted to flood the internet with porn spam to somehow alter the balance of news reportage from mainland China. It didn't work. So, unless I misunderstand something, simply having more dreck does not automatically mean you can't see quality content. It may not always work out that way, but the current generation of 'flood the net with crap and see what happens' hasn't interfered with my browsing markedly, with one exception: I increasingly believe the feedback I get on any forum is 50/50 trolls seeking karma points rather than real people (present company excepted).

  • vineyardmike a year ago

    > the current generation of 'flood the net with crap and see what happens' hasn't interfered with my browsing markedly

    Wow, really? Whenever I search anything now, half the sources I click have the same (verbatim) content, just different branding. And SEO has gutted a ton of content and replaced it with trash.

    > I increasingly believe my feedback on any forum is 50/50 with a troll seeking karma points

    Forums like this are the last bastion of human speech left on the internet, and already we see HN posts that are just GPT.

  • lemonlym a year ago

    Interesting case, and I appreciate you raising the point. I definitely see what you mean; however, I could see the spam going from irrelevant content to relevant but incorrect content. Again though, you're right that many current systems do seem to work to a reasonable extent.

    • ggm a year ago

      I'd look at what "karma" is providing: an indication, at some level, of inherent trust in what people reflect on what they see. The problem, of course, is that if the bots start voting on karma, then karma will inherently decline as a source of value.

      I'd probably go to verified identity. That's going to upset a lot of people who want pseudonymous posting to be underpinned by substantive anonymity behind it.

      I think in the case of ChatGPT and GPT, I want the AIs to have a "Hologram" H tattoo on their foreheads for some time yet.

  • melagonster a year ago

    You'd need to remove the good search engines from the market, too.

mikewarot a year ago

GPT and its kin are just doing the same thing we had low-wage people doing in the past, but with slightly better-quality output in some cases, and possibly a lower cost per megabyte of text.

The way around this is curation, and an end to one-dimensional rankings of postings. In the future, I see the ability to attach a number of ratings to a given post and to the person who sent it, allowing communities to curate content instead of a simple OK/ban approach.

You know how YouTube puts "fighting disinfo" type stuff under some videos? Imagine if instead of some star-chamber buried in the bowels of a corporation, you could choose your own ratings/content verification providers.

terrycody a year ago

It already is. You can't stop them, and Google likely won't stop it either; the director of web search even said AI content can be useful, lol.

You can literally find lots of spammy AI sites in 5 minutes; search quality is worse than it was before GPT-3 was born.

seanwilson a year ago

Don't standard tools like PageRank help here, where new articles need to get links from established sites in order to rank? Search engines already have to deal with https://en.wikipedia.org/wiki/Article_spinning, duplicate/near-duplicate content, low-quality content and black-hat SEO tricks, so what's the difference here?
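
For context, the core PageRank idea can be sketched in a few lines: rank flows along links, so a page nobody links to stays lowly ranked no matter how much text it has. A rough power-iteration sketch over a made-up three-page link graph (the site names are invented for illustration):

```python
# Minimal PageRank power iteration over a tiny, made-up link graph.
links = {
    "established.com": ["new-article.com", "other.com"],
    "other.com": ["established.com"],
    "new-article.com": [],  # a fresh page nobody links to yet
}
pages = list(links)
d = 0.85  # damping factor
rank = {p: 1 / len(pages) for p in pages}

for _ in range(50):  # iterate until ranks stabilize
    new = {p: (1 - d) / len(pages) for p in pages}
    for p, outs in links.items():
        if not outs:  # dangling page: spread its rank across all pages
            for q in pages:
                new[q] += d * rank[p] / len(pages)
        else:  # otherwise split this page's rank among its out-links
            for q in outs:
                new[q] += d * rank[p] / len(outs)
    rank = new

print(rank)  # the well-linked page ends up ranked above the unlinked one
```

Even in this toy version, mass-produced pages with no inbound links from established sites accumulate little rank, which is the gatekeeping effect the comment above is pointing at.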

  • lemonlym a year ago

    I suppose the difference here is that the content will look unique enough to fool these tools. Bots copy and paste stuff all the time, but now it seems it would be easier to fake a surge of unique human responses.

    • seanwilson a year ago

      What high-quality websites are going to be linking to this content to make it rank though?

ok_dad a year ago

I think we’ll go back to things like webrings and personal blogs and stuff, once it gets really bad. Some people may not care that anything is human made someday. Perhaps AI will do all our reading and writing eventually. Things change, though, and it’s not always bad. Used to be that no human wrote text, everything was oral. Maybe our technology can help us surpass the written word while still maintaining its benefits, someday.

xlii a year ago

A passing thought: language.

GPT is built on English, which is international and, I'd bet, has the most content available to train on.

Maybe at some point Chinese will be added, but as far as I know Chinese isn't as homogenized, and depending on fragmentation it might not be viable either.

Being in a country where Siri isn't yet available, I don't expect AI generators to be available anytime soon, if ever (due to different language complexity; not sure if that matters, though).

  • under-Peter a year ago

    Fun fact: I randomly asked ChatGPT to repeat an answer in German yesterday, and to my surprise it just did! Then I continued the conversation (I asked it to be a pretend DM for my first, guided D&D experience) in German, and it worked about as well as in English. The same was true when I switched to French during the 'conversation'.

    I absolutely didn't expect that, because I've never seen it in the hyped-up examples here and on Twitter.

    Of course your point still stands with smaller languages and/or markets.

    • xlii a year ago

      Hadn’t had the chance to try it but it’d be interesting to know what languages were used and for languages with regional variety (e.g. Canadian/French French) can it distinguish between them.

    • danielscrubs a year ago

      Because of the Germanic roots. GPT-2 could learn Swedish very quickly too.

      Not that it’s difficult with other languages, just that it’s not the best example.

thiht a year ago

I'm really scared for the future of the internet.

I believe the answer will be some form of certified (maybe not officially) content creators. We'll need a Google alternative indexing only by whitelist instead of crawling the whole internet, and giving better ranking to articles and content produced by domain experts. "Who" wrote it will become a lot more important.

soheil a year ago

While AI technology has made great strides in recent years, it is still not capable of producing content that is indistinguishable from human-generated content. This means that AI-generated content is likely to be easily identifiable and may be less credible or trustworthy than content produced by humans. As a result, it may not gain as much traction or attention online.

  • ancientworldnow a year ago

    This reads like a poor quality AI response already.

  • sockaddr a year ago

    For now.

    But they are probably including future improvements in their question about the future so I’m not sure why you think this is worth saying. I’m actually legitimately wondering if your comment is an output of ChatGPT.

    [Cue the Spider-Man pointing meme]

jasfi a year ago

It could be that the only way to stop it is regulation. This could be helpful for both people and AI itself: if AI unknowingly trains on AI-generated content, the results could seriously degrade over time.

It could be as simple as adding an HTML tag recognizable by both people and machines to any pages with such generated content.
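
Nothing like this tag exists as a standard today, but here is a minimal sketch of what the idea might look like (the "ai-generated" meta name is invented for illustration): a crawler scanning a page for such a declaration.

```python
from html.parser import HTMLParser

# Hypothetical page that declares its content machine-generated.
# The "ai-generated" meta name is made up for this sketch.
page = """
<html><head>
  <meta name="ai-generated" content="true">
  <title>Example</title>
</head><body><p>Machine-written text.</p></body></html>
"""

class MetaScanner(HTMLParser):
    """Scans a page for the hypothetical ai-generated meta tag."""

    def __init__(self):
        super().__init__()
        self.ai_generated = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name") == "ai-generated":
            self.ai_generated = a.get("content") == "true"

scanner = MetaScanner()
scanner.feed(page)
print(scanner.ai_generated)  # a crawler could downrank or label such pages
```

The obvious weakness, as the thread's discussion of bad actors suggests, is that such a tag only works for publishers willing to self-declare.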

badrabbit a year ago

Flood? Nothing, but IMO much of it will be noise. For images, for example: stock photos, wallpapers, porn, etc., sure. But stuff that has meaning to people, like charts, profile photos, descriptive images (think the picture of a monument on Wikipedia) or even paid porn, will be made by humans, because meat people like meat people.

  • melagonster a year ago

    It's easy to generate a deepfake photo and then insert it into Wikipedia.

    • badrabbit a year ago

      Yeah, but there are enough incentives there for humans to fight/counter it.

wnkrshm a year ago

Not just that, the technology allows for the seamless integration of ads into content that you'd like to read / look at.

Product-placement (Edit: even elegant and subtle product placement) can be automated now.

powerapple a year ago

This is a very interesting question. We need to change the way we consume information. Even without AI, we have a problem with exploding information and a low signal-to-noise ratio. Maybe we won't have a browser anymore; information will be consumed by a model, and we'll only receive processed information. Currently we try to provide as much information as possible on a topic, hoping to limit the effect of misinformation and bias. We are not reducing the content; we are providing more content. There is a limit to how much information we can process, and it is already time for processed information.

daveevad a year ago

Is there perhaps some automatic feedback loop where AI trained on AI content will manifest as an obvious uncanny valley?

6gvONxR4sf7o a year ago

What protections are in place for non-automated versions of the same? It’s already everywhere.

quickthrower2 a year ago

Facts will be the new gold. If the AI is just sucking up content, then someone has to write the OG "x celebrity died" or "Trump wins election" etc. in the first place. Google could watch who first reports facts and genuine analysis.

Obviously you would need a way to verify trustworthiness. Search might become a lot more curated and require people to think. Maybe Google will employ a million researchers and have to spend all its money defending search instead of all the other stuff.

Given a bunch of facts, you will then have "write this story with these facts in a left-leaning, irreverent but caring voice" etc. for various personas. Newspapers with opposing views might merge into one corporation!

Biganon a year ago

dang could maybe delete the bazillion "I used GPT to answer your question" comments, I would be happier already

gardenhedge a year ago

Here's what chatgpt says for your question:

there are several protections and strategies in place to prevent AI-generated content from dominating search results and being used to spread misinformation and advertisements. These include content moderation, fact-checking, user feedback, and transparency.

  • roland35 a year ago

    I asked ChatGPT to respond to your response:

    I agree with this statement. There are indeed several protections and strategies in place to prevent AI-generated content from dominating search results and being used to spread misinformation and advertisements. These measures, such as content moderation, fact-checking, user feedback, and transparency, can help to ensure that search results are accurate and reliable, and that users are not exposed to misleading or harmful content.

    However, it is also important to recognize that these protections and strategies are not foolproof and that there is still potential for AI-generated content to be used for nefarious purposes. It is therefore important for users to be aware of the potential risks associated with AI-generated content, and to use critical thinking and fact-checking to evaluate the information they encounter online. Additionally, it is crucial for companies and organizations to continue to invest in and improve these protections and strategies in order to mitigate the risks associated with AI-generated content.

Hani1337 a year ago

They don't need AI; they've got the mainstream media.

verdenti a year ago

Will we revert to direct video and phone calls? Dare I say we will return to face-to-face interactions?