nharada 2 years ago

The logbook is awesome: https://github.com/facebookresearch/metaseq/blob/main/projec...

This is the true secret sauce -- all the tricks on how to get these things to train properly that aren't really published.

  • axg11 2 years ago

    It really is great that Meta released the notes/logbook. Credit where credit is due. Very few other academic or industry labs release materials like this, especially when the reality is so messy.

    Some interesting takeaways:

    - Meta aren't using any dedicated software for scientific logbooks, just prepending notes to a document

    - So many hardware/cluster issues.

    - Hot-swapping algorithms is common and likely underreported (in this case activation functions and optimization method)

    - A well-resourced team didn't resolve enough issues to fully utilize its compute resources until >50% of the way into the project

    • sanxiyn 2 years ago

      If you are interested in hot-swapping, I highly recommend the OpenAI Five paper: https://arxiv.org/abs/1912.06680. It includes a 10-month log of how they adapted the model to the releases of Dota 2 versions 7.19, 7.20, and 7.21.

      I agree hot-swapping is underreported, but there are some good existing reports. In fact, the OpenAI Five paper today is probably more valuable for its details on hot-swapping than for its details on the main model, which used an LSTM rather than a transformer.

      • hoseja 2 years ago

        Weird that they felt the need to keep up with the version updates, since they used a very limited subset of Dota anyway.

    • domenicrosati 2 years ago

      I wonder what software would be good for a logbook like this... I just use Google Docs for these kinds of things. Sure, wandb and Jupyter notebooks are good, but they are not so good for notes, ideas, and documentation.

      • Gigachad 2 years ago

        Sometimes the generic solution is just the best. No one requires special training on Google Docs and it just comes with handy features like version control and live updates.

      • EricLeer 2 years ago

        Personally I use Notion for this. Not dedicated software, but the extra options for formatting/structuring and linking can make a document like this a lot more readable.

      • semi-extrinsic 2 years ago

        We tried jupyter notebooks for this kind of thing once, and after the equivalent of ~50 pages it becomes unusably slow in my experience.

      • asadlionpk 2 years ago

        LogSeq? I use it for this kinda work...

  • sp527 2 years ago

    This really does read like 'DevOps: Nightmare Edition'

    > CSP fat fingered and deleted our entire cluster when trying to replenish our buffer nodes

    Yikes

  • zubspace 2 years ago

    I have no knowledge of such things, but it seems they run CUDA jobs on about 150 nodes?

    But why do they have so many problems keeping this cluster stable? Network failures? Bad GPUs? Bad drivers? Bad software?

    Running fixmycloud and going after all those cryptic errors every day seems like a nightmare to me...

    • semi-extrinsic 2 years ago

      Seems kind of par for the course for an HPC cluster, no?

      It makes sense to think of these things like Formula 1 cars, they are trying to eke out the absolute maximum performance, and reliability suffers because of that.

      "Ordinary" cloud is more like a Toyota where you optimize for fuel economy and low maintenance.

  • amelius 2 years ago

    This all makes me wonder: how reproducible is the final output model?

    • screye 2 years ago

      Not this one, but Google's PaLM (which is 4x OPT3) trains semi-deterministically.

      These kinds of large transformers can be relatively reproducible in results and benchmarks. However, making them converge to the exact same parameter set might not be a reasonable expectation.

    • ur-whale 2 years ago

      > how reproducible is the final output model?

      Not much, but it also depends on what you mean by "reproducible".

      Do you expect a similar internal representation (weights) or a similar behavior?

    • dekhn 2 years ago

      The details of that specific model they ended up with? Irreproducible, unless the system was carefully designed and every detail required for a fully reproducible computation was recorded and replayed. But they could easily produce a bunch of models that all end up in roughly the same place and perform the same, ideally reducing the number of things they needed to change ad hoc during training.

  • gxqoz 2 years ago

    Any particular highlights from this?

    • fny 2 years ago

      Honestly, no.

mgraczyk 2 years ago

Really concerning to me that people find the luddite argument so persuasive and that it gets so much play in the press. The crux of the argument from the outside "ethicists" quoted in the article is something like:

"This new piece of technology might be dangerous and we don't fully understand it, so we should not poke at it or allow people to study it."

Maybe there's just something about my personality that is deeply at odds with this sentiment, but it's also about the lack of testable predictions coming from people like this. Their position could be taken about literally anything with the same logical justification. It's a political and emotional stance masquerading as a technical or scientific process.

  • trention 2 years ago

    Here is one prediction: Because of language models, the amount of fake news online will increase by an order of magnitude (at least) before this decade ends. And there is no interpretation of that development as anything else but a net negative.

    Another more broad prediction: In a decade, the overall influence of language models on our society will be universally seen as a net negative.

    • notahacker 2 years ago

      Cost per word of fake news is already very low though, and humans are much better at tailoring it to the appropriate audience and agenda (and not just restating stuff that's already out there that might actually be true)

      GPT type models are much better suited to low effort blogspam, and whilst that's not a good thing, they produce better blogspam than existing blogspamming techniques. I think we underestimate how bad the Internet already is, and at worst text generated by AI is simply going to reflect that.

      • NoMAD76 2 years ago

        It's about being first to publish. It is mainly used during live press conferences. No human can snap a photo (at the very least), write a short update in a dedicated article, and so on... all in one second.

        Been there (as an independent press member years ago); you simply cannot beat that.

        • notahacker 2 years ago

          First to publish matters, but GPT-3 is neither necessary nor sufficient to achieve that. If you're producing fake news related to a press conference, speed of content generation is entirely unimportant, because you don't have to wait for the press conference to start before writing the fake article/update/tweet. If you care about fidelity to the message of the press conference, consider a human who has anticipated its likely message(s) and has pre-drafted paragraphs about "striking a conciliatory tone" or "confirmed the reversal of the policy", ready to combine with a choice quote or two as soon as they're uttered. I don't see many situations in which that human isn't significantly better than (and as quick as) a GPT-type model prompted by the speech-to-text of the press conference. Sure, more reliable publications will want to add validation steps and at least wait for the conference to finish before summarising its message, but those apply to bots as well as journalists (or not, for the publications that prioritise being first over being right).

          • NoMAD76 2 years ago

            You have a solid point, but I wasn't talking about summarizing or excerpting from a press release (those are handed out beforehand as press kits anyway, with NDA agreements and so on).

            Real human journalists have a delay of about a minute before getting a short tweet out. Funny (or not), something similar appeared on the "live update" article page in less than 10 seconds. Including photo(s). I was at quite a lot of tech conferences/live events and earned a decent living back then as an independent tech journalist (but then I got bored, and really it was a one-man show).

            Another personal observation (from the field): this was not happening prior to 2010-2012, the years we all got Siri, Cortana...

            You can connect the dots.

        • tqi 2 years ago

          Personally, I've never understood why being first to publish matters. As far as I can tell, the only people who care are other journalists, who seem to think that any story about a breaking news item MUST credit the person who wrote about it first (see: ESPN's NBA and NFL "insiders").

          • redtexture 2 years ago

            The unpersuasive argument is "you (second-to-publish-person) copied my stupendous idea to get your derivative result".

          • tonypace 2 years ago

            If people are looking for news on the topic, they will FB or tweet you first. That has a snowball effect.

    • brian_cloutier 2 years ago

      What is the issue with fake news?

      Language models can now pass as human in many situations, but there are already billions of humans capable of writing fake news; this isn't a new capability.

      We have already created mechanisms for deciding which voices to trust and no matter how good language models get they will not be able to prevent you from visiting economist.com

      • Viliam1234 2 years ago

        > there are already billions of humans capable of writing fake news

        You have to pay them, and most of them are not very good at writing. Even with a big budget, you get a limited number of good articles per day.

        If you can make writing fake news 100x cheaper, and then just throw everything at social networks and let people sort out the most viral stuff, that can change the game.

        Also, computers can be faster. If something new happens today and a hundred articles are written about it, a computer can process them and generate a hundred more articles on the same topic far more quickly than a group of humans could. (Many humans can do the writing in parallel, but each of them has to individually read the things they want to react to.)

        • nomel 2 years ago

          > that can change the game.

          I don't think people, organizations, and governments pushing false narratives is some new game. I think it's a game that people are dangerously unaware they're already playing. Destroying trust in content on the internet, and thereby making people more diligent about what they believe, is, I think, almost certainly a net positive.

          But, as a counter argument to myself, people are lazy and will, instead, just go to a news source that they "trust", and listen without ever questioning.

          Either way, I don't see the fake news, itself, as being anything but a fleeting problem for society. The problem will continue to be people, organizations, and governments taking advantage of laziness.

          For example, imagine DALL-E 2 was released for political use. For a few weeks, we would have a flood of fakes of every politician doing every imaginable act. Society would very quickly believe none, rather than believing all.

          • AlexCoventry 2 years ago

            > Destroying trust in content on the internet, and thereby making people more diligent about what they believe, is, I think, almost certainly a net positive.

            That didn't seem to be how it worked with the election-theft narrative. People kept believing garbage from discredited sources.

            • tsimionescu 2 years ago

              Yes, because the election-theft narrative was coming from sources they initially trusted (the President of the United States). That is not the same thing as trusting the hundreds of fake news articles that GPT-3 will be able to generate a second.

              • MichaelZuo 2 years ago

                Agree, this is the same as if IEEE suddenly announced a breakthrough in cold fusion. A large chunk of even the highly educated and technical audience would tend to believe it, or at least give it the benefit of the doubt, simply because of the cachet and reputation IEEE itself has built up. At least until those are exhausted.

                GPT-3 can't create a second IEEE so it's not an issue to be worried about.

          • macintux 2 years ago

            > Society would very quickly believe none, rather than believing all.

            Authoritarians benefit from a society-wide lack of trust in there being any sort of consensus/objective truth. When people don't know what to believe, they either turn to conspiracy theories that claim to offer a peek behind the curtain, or more likely just tune out and hope that their favorite strongman can simplify the chaos.

            There's been speculation on Twitter that the conflicting narratives from Russia itself during the invasion aren't really a political problem, since confusion makes it more difficult for the public to rally behind a counter-narrative.

        • brian_cloutier 2 years ago

          We both agree that language models allow for the quick creation of large amounts of content.

          There already exists far more content than anybody can read. We have developed mechanisms for filtering out content which isn't worth our time, and language models don't have any special ability to force you to read their creations.

          And even if content can be placed somewhere you will read it: nothing I put into this box will suddenly make you believe the unbelievable, whether I wrote it or a language model did.

          • tonypace 2 years ago

            Is it unbelievable that pressing a bar will bring you long-term happiness? Surely it is, but primates will press the heroin bar anyway. This has happened with humans as well. You cannot simply dismiss the effectiveness of well-targeted propaganda.

        • Vetch 2 years ago

          The vast majority of fake news today comes from content mills, copy-paste mills, and general SEO spammers. Political misinformation is the other big generator. The economics of it, not "ethical" gatekeeping, is what will affect their output. Realistically, normal people don't have the ability to coordinate hordes of proxied IPs to attack social networks that require account sign-ups and serve custom feeds.

          The value of exercising critical thinking, checking trusted curated sources, practicing information hygiene, and recognizing and avoiding manipulation tactics in news and ads will have to go up. The internet is already almost entirely unreliable without any fancy AI being involved. These skills will be necessary regardless of any increase in politically manipulative content, advertisements, or product misrepresentations.

          • tonypace 2 years ago

            Rich people often try to purchase political influence and respect for their opinions. Historically, this has had very mixed results, but you can't deny that it has sometimes worked. The problem is that this does not have a market mechanism. If the rich person has funds from another source, economics can go out the window, and usually does.

      • numpad0 2 years ago

        I kind of agree - a lot of "fake" news believers seem to be actively seeking out contrarian views purely for the sake of it, with indoctrination as an incentive offered for the labor of reading, rather than as harm done unto themselves. In that sense, factual accuracy - the "fake" part - doesn't seem to be the point, and the volume of text that NN generators enable may be less of an issue.

      • jmathai 2 years ago

        Your argument holds true in theory but does not always work in practice.

        The issue many people have with fake news is that it's a tool that can sway public opinion without any basis in facts. I'm not sure, from your response, whether you find that to be problematic or not.

        I think we've recently found that people haven't decided which voices to trust and can be led to believe things placed in front of them. Paired with the ability to spread that information - there is significant impact on society.

        That's the reason some people have issues with fake news, from my experience.

        Also, getting a computer to do something will always scale several orders of magnitude more than having billions of people do it.

        • tsimionescu 2 years ago

          > I think we've recently found that people haven't decided which voices to trust and can be led to believe things placed in front of them. Paired with the ability to spread that information - there is significant impact on society.

          > That's the reason some people have issues with fake news, from my experience.

          From my experience, most people who believe Fake News is a significant problem are people who dislike WHOM some have chosen to trust.

          Few see fake news coming from their preferred media sources, or supporting their preferred narrative, as highly concerning, and instead usually treat it as benign (human errors happen etc etc).

          However, when it's coming from their political adversaries, or from sources they dislike, it suddenly is presented as some huge issue.

          In reality, it is all the same, and people are decently good at filtering it out when it contradicts what they want to believe, and very bad at filtering it out when it agrees with them.

          For an example of fake news from outlets like the NYT, the most egregious recent one has been the dismissal of the Hunter Biden laptop, calling it a Russian hoax, calling it "fake news", listing intelligence agencies (of all people) as proof of this, getting Twitter and Facebook to outright ban the New York Post article on it - when in fact everything in their article was 100% true.

          How many people (especially Dem voters) have taken this story as "dangerous spread of Fake News by the NYT, Washington Post etc"?

          • antonvs 2 years ago

            That's quite the false equivalence. A big part of the issue is quantity. You cherry-picked one example for the NYT, but for Fox News you could find examples on a daily if not hourly basis.

            • tsimionescu 2 years ago

              I'm not claiming all news sources are equally trustworthy - obviously the NYT is a better news source than Fox, and both are better than "my primary school friend on Facebook".

              My point was that we all have blindspots when the news sources we have chosen to trust are feeding us false information. Also, the problem isn't that people don't choose a news source to trust, it is that they choose a bad news source to trust, and this can happen for very different reasons.

      • texaslonghorn5 2 years ago

        I think the issue could be volume (and also that the vast majority of humans aren't actively exercising their ability to write fake news at present). Also that language models might be far more convincing.

    • NoMAD76 2 years ago

      Fake news and AI-generated news have been all over the place for a good amount of time now. It's faster and cheaper to have AI write news from a press release.

      My prediction is that in the next 10 years we will really struggle to distinguish between fake people and real humans. There will be an explosion of fake identities posting in ever more human-like ways.

      But I'm not Nostradamus so I could be very very off here.

      • shon 2 years ago

        You’re probably right but it will be an arms race with ever more sophisticated mitigation techniques being deployed to filter.

        I’d say Neal Stephenson has a pretty good take on what this might look like in his recent book Fall, wherein everyone has a "feed" and those who are more savvy/wealthy have better editors (AI, etc.) for their feed.

        • NoMAD76 2 years ago

          It's all about having the right tools, but I wonder how long can we "beat the machine" :)

      • jerf 2 years ago

        I'm not particularly convinced by the Dead Internet Theory [1] as of 2022, in the sense that I don't think it is completely true right now. But I am convinced it is building around us, and even now the correct frame for the question isn't whether it is true, but how true it is. There are too many entities with too much reason to build it for it not to be happening. And the nature of it is such that it doesn't need to be one entity doing it for one unified reason; dozens or hundreds can all be participating and fighting with each other on different levels, and the sum total of all that puts it together that much more quickly.

        You know, the PGP web of trust idea may yet take off all these decades later, not because we need a web of trust to send 100.0000000% safely encrypted messages to each other to protect from governments, but because we need a web of trust just to know who the real humans are.

        [1]: https://forum.agoraroad.com/index.php?threads/dead-internet-...

        • datadata 2 years ago

          Curious to what degree, and how, you think the web of trust idea could help here. Assume you could use it to prove whether or not an article was signed by a real person. I think this would solve the problem of articles being published with a false author attribution. However, it would not prevent actual people from publishing AI-written articles under their own identity. It would also not (directly) do anything to establish whether the facts in an article are correct.

          • jerf 2 years ago

            Specifically, a web of trust designed for exactly that assertion: This person is a real human. Signatures here serve to assert what identity something came from.

            There would be some of the usual web of trust problems, e.g., trying to explain to Joe Q. Public that you only sign for people you know, beyond doubt, are human. Preferably in person. Many other problems, too.

            I guess you could say, my thought here isn't that this would solve the problems. The problems at this point are somewhat well known. What has been missing is any benefit significant enough to motivate us to get past those problems. If there is, it's obviously many years away. Wouldn't exactly suggest building a startup around this idea right now, if you get my drift. We still need to go through a phase of the problem getting larger before we even get to the phase where people start to realize this is a problem and start demanding that people online prove they are actually people, and goodness knows "I'm a human" is merely the lowest of low bars itself, not the solution to all trust problems.
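
            (Mechanically, the attestation itself is trivial; the hard part is all the social machinery above. A toy sketch in Python, with the statement format and key handling entirely made up:)

              from cryptography.exceptions import InvalidSignature
              from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

              my_key = Ed25519PrivateKey.generate()   # my long-lived identity key
              friend_pubkey = b"..."                  # the key I'm vouching for (placeholder)

              # "I assert the holder of this key is a real human."
              statement = b"human:" + friend_pubkey
              signature = my_key.sign(statement)

              # Anyone who already trusts *my* key can check the attestation.
              try:
                  my_key.public_key().verify(signature, statement)
                  print("attestation verifies")
              except InvalidSignature:
                  print("bad signature")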

        • agar 2 years ago

          I love the idea but hate that I'm so cynical about the likely outcome.

      • paganel 2 years ago

        It depends. On forums like this it would basically take a machine that would pass the Turing test in order not to be seen as an AI in any "meaningful" conversation that it might join (so, not just a comment posted as a reply here and there).

        And even if the powers that be manage to get those future AI bots to post stuff that will very much resemble what we now post in here, it is my belief that the uncanny valley will be, in the end, impossible to pass (in fact that's one of the main motifs of many of Asimov's books when it comes to robots).

    • jerf 2 years ago

      That has already happened. manimino linked me to this great page on another thread on HN a few days ago: https://cookingflavr.com/should-you-feed-orioles-all-summer/ But consider that a particularly easy to detect version of the problem. Click some of the other links on that site and have a look. I especially suggest you click something on a topic you know nothing about.

      I've been hitting these sites in searches accidentally more and more over the past few months. Goodness help you if you don't realize it's totally fake; some of what I've seen is dangerous, like, bad electrical advice being blithely generated by whichever exact transformer variant is spewing that stuff.

      • Vetch 2 years ago

        There is an economic incentive to detect machine generated output and curate trusted sites since the feedback loop of training models on an unfiltered internet of mostly generated junk output will eventually lead the models into converging on a useless degenerate state.

        • tonypace 2 years ago

          But here it is in our field of view. This model has very limited look ahead. I don't think we can rely on that in the future.

        • shon 2 years ago

          Useless Degenerate State... I think I just found the name of my new band!

    • mortenjorck 2 years ago

      I keep seeing this prediction, but have yet to see a convincing argument as to how this content is supposed to circumvent existing trust networks.

      News outlets are nothing without a track record. People trust names they recognize. You can spin up as many CMS instances, domains, and social media profiles for fake news as you want, but without a history shared with its core audience, all the language models in the world aren't going to convince anyone but the most credulous when the content is coming from unfamiliar sources.

      • vintermann 2 years ago

        Well, how is it done today? Twitter is instructive. You have mostly-automated accounts that like, retweet and reply to content from a specific perspective. Often it is an incendiary perspective, and usually it's a perspective underrepresented in regular news, because that's more useful for getting people to hit like and follow on the account's post.

        Once the account has gained enough real followers, they can carefully start to push payload content, the content the operator really cares about.

        This drives polarisation. There's this idea that sinister foreign adversaries are trying to "spread chaos", but I don't buy it. I think it's merely a by-product of building an audience. Russia (to pick an example) doesn't care about BLM, anti-BLM, US culture wars or the Assange case. Rather, it cares about exactly the things you'd expect it to care about: sanctions, conflicts it is involved in, allies etc.

        (For that matter, ad-driven media does much the same. Gawker before its demise had really gone all-in on "outrage bait" stories.)

        They're going to start with the most credulous, maybe, but they'll build confidence in the same way as everyone else who's an unknown at start.

      • ramblenode 2 years ago

        Most people now get their news from downstream social media, not from the newsroom. You don't have to sway opinion with fake articles, just a sea of astroturf that supports some opinion and creates the illusion of a crowd, which has traditionally been the main signal for credibility of an idea.

      • NoMAD76 2 years ago

        Our great-great-grandfathers' generation believed most things because, you know, "I heard it in church". Our great-grandfathers' generation believed most things because, you know, "I read it in the newspaper". Our grandfathers' generation believed most things because, you know, "I heard it on the radio". Our parents' generation believed most things because, you know, "I saw it on TV". Our generation believes most things because, you know, "I read/saw/watched it on social media".

        I don't know about the next generation(s); the only thing the above had in common was that they were all made by a real human.

    • bufferoverflow 2 years ago

      People don't get their news from random AI-generated blogs.

      The actually bad consequence is high-quality SEO spam. You can now generate a hundred articles a minute on any topic.

      • uni_rule 2 years ago

        We are already seeing a lot of fucked up SEO spam rising to the top these days. IMO it might actually start picking at Google's market share because prior to this the product actually seemed infallible to the average layman.

    • xmprt 2 years ago

      What if that's the push that brings people out into the physical world, where they don't have to deal with all this crap online?

      • carschno 2 years ago

        Slightly less optimistic, but perhaps more realistic thought: what if that's the push that makes people validate their (internet) sources? Seems like it might become clear that random web pages are just automatically generated content. If you really want to learn something credible at all, you'll really have to be more specific about your sources than "the internet".

        • ninkendo 2 years ago

          Yeah, that was gonna be my contrarian HN hot-take too. Basically if it becomes really obvious some day that basically all online news publication is noise written by computers, maybe people will stop actually trusting it so much?

    • l33t2328 2 years ago

      I have already generated fake news-esque things with gpt-3 to send to friends.

      A lot of the outputs look incredibly genuine. We live in interesting times.

    • radford-neal 2 years ago

      "Because of language models, the amount of fake news online will increase by an order of magnitude (at least) before this decade ends. And there is no interpretation of that development as anything else but a net negative."

      That's not at all clear. You're assuming people will continue to give credence to random stuff they read. But once fake AI-generated content is common, people will surely become less trusting. The end result could easily be that fewer people than before believe fake news is real. Presumably, fewer people will believe real news too, but the result could still be net positive.

      • joe_the_user 2 years ago

        Bogus content has been available in abundance online for a while and people often are credulous of it still.

        But still, to claim AI will seriously have an impact, you have to assume that the cost of creating fake news is the primary limit on its appearing in front of people, and that's not at all obvious.

    • jstummbillig 2 years ago

      > And there is no interpretation of that development as anything else but a net negative.

      Sure there is: An inevitability.

      I am hoping we will increasingly turn attention towards how to handle it (although I am relatively certain that's already going on at rapidly growing scale at fb, google and openai).

      I could see updated legislation doing a lot of heavy lifting – strict rules against automated mass-disinformation – but more action is certainly going to be required.

      • trention 2 years ago

        The fact that something is "an inevitability" (which is definitely not the case here) has nothing to do with whether it's a net negative or not.

        This is like saying "you should classify this as blue" and then a response like "no, it's heavy".

        • jstummbillig 2 years ago

          In actuality, it's more like saying "you can only classify this as blue", being awkwardly wrong with the premise, proceeding to not see reason and then get cute.

    • Aeolos 2 years ago

      This has already happened, to a large extent.

      Search for most any topic on google, e.g. recipes, and the first two pages will be chock-full of AI-generated copy.

    • mgraczyk 2 years ago

      Alternative hypothesis for which I have at least as much evidence:

      Because of large language models, detecting Fake News becomes trivial and cheap. Building and doing inference on language models is too expensive for most attackers, so they give up. Only well financed state actors are capable of disseminating fake news, and they are no better at it than they are today because content generation is not a bottleneck.

    • AmericanChopper 2 years ago

      Hot take: The majority of information consumed by people is composed of low-quality opinions, misrepresentations, and outright lies. It has always been this way, always will be this way, and every single person who clutches at their pearls over this fact also believes things would be improved by having the correct people control access to information.

    • textcortex 2 years ago

      I would not worry about that too much. There are already models that can predict whether text is transformer-generated or not. At the same time, Google has started penalizing transformer-generated text: https://youtu.be/EZWx05O6Hss
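
      For anyone who wants to poke at one, here's a minimal sketch using a publicly available detector (assuming the RoBERTa-based GPT-2 output detector is still on the Hugging Face Hub under this name; labels and scores vary by checkpoint):

        from transformers import pipeline

        # Classifier fine-tuned to separate GPT-2 output from human-written text.
        detector = pipeline("text-classification", model="roberta-base-openai-detector")

        print(detector("The quick brown fox jumped over the lazy dog."))
        # e.g. [{'label': 'Real', 'score': 0.98}] -- treat the score as a hint, not proof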

      • hhmc 2 years ago

        It doesn't seem like there's any real guarantee that the 'generated text detectors' will outpace the 'generated text generators'

    • wruza 2 years ago

      Plot twist: people stop relying on what the news and blogs tell them to think and create a system of decision-making on another level, one that actually works for them and not for the smoothest talkers. You learn to swim by getting wet.

      • tonypace 2 years ago

        In the shallow end, typically. You can easily stand up and breathe.

  • api 2 years ago

    This is the hydrogen bomb of propaganda.

    Imagine assigning every single living human being a dedicated 24/7 con artist to follow them around and convince them of something. That's what will soon be possible if not already. It will be intimate con artistry at scale driven by big data, a massive DDOS attack on human cognition and our ability to conduct any form of honest discourse.

    What hustlers, trolls, and completely amoral companies will do is bad enough. Now throw in state sponsored intelligence agencies, propaganda farms, militaries, special interest groups, and political parties.

    Usually I'm anything but a luddite, but with this I can't help but think of many more evil uses than good ones. It doesn't help that the principal business models of the (consumer) Internet seem to center around surveillance, advertising, propaganda, and addictive forms of entertainment (like slot-machine-like mobile games) designed to suck money or time out of people.

    Lesser but also very bad concerns include: the end of useful search engines due to a deluge of continuously learning adversarial SEO spam, the collapse of pretty much any open online forum due to same, and addictive "virtual friend" / "virtual relationship partner" hyper-sophisticated chatbots that hook vulnerable lonely people and then empty their bank accounts in various ways.

    I really don't fear AI itself. I fear what human beings will do with AI.

    • Konohamaru 2 years ago

      > Imagine assigning every single living human being a dedicated 24/7 con artist to follow them around and convince them of something.

      This sounds like traditional Christian teaching of the role of demons.

      • api 2 years ago

        We invented Satan and his minions to make people click ads.

    • mgraczyk 2 years ago

      > It will be intimate con artistry at scale driven by big data, a massive DDOS attack on human cognition and our ability to conduct any form of honest discourse.

      This is an unfounded fear. For one thing, if the value in doing this is high then it's already cheap enough to be practical. The Chinese govt can pay millions of people to do this to dozens of people each. They basically do this already, for specific topics and issues. LLMs won't significantly move the needle here.

      Second, are you proposing that attempts to stop Facebook from releasing models will somehow slow down or stop the Chinese, US, or Russian governments? What's the goal, to buy us 6 months? I would much rather the technology be out in the open for everyone to research and understand vs accessible only to state actors or huge tech companies.

      • api 2 years ago

        The difference between this and a troll farm is like the difference between machine guns and lines of soldiers manually loading and firing muskets. Yes both can be used to gun down a lot of people, but machine guns are much faster and cheaper. Mechanized warfare is coming to propaganda and con artistry.

        I'm not necessarily arguing for intervention to stop this release or something like that. The cat is out of the bag. There's no stopping it. This is going to happen, so get ready for it.

        Oh, and throw in deepfakes. You'll have automatic con artistry at scale that can incorporate personalized fake audio and video on demand depicting any supporting detail it needs. It'll be like assigning each person a con artist who's also supported by a staff of content producers.

        • mgraczyk 2 years ago

          I guess, but on the flip side there are potentially transformative positive applications that we already know about and have yet to discover. Fundamentally some people are more optimistic and risk-loving when it comes to new technology. I believe the "good" will overwhelmingly outweigh the "bad" that you're pointing out. I think it mostly comes down to personality.

          • api 2 years ago

            I can think of some positive applications. The thing that makes me cynical here is that all the evil applications seem like they're going to be far more profitable in terms of either money or power.

            This would be a continuation of what's happened to the Internet in the last 10-15 years. The Internet is amazing and has tons of incredibly positive uses but all the money is in mass surveillance, addictive "engagement maximizing" stuff, and gambling and scams.

  • phphphphp 2 years ago

    History has shown that humans are terrible judges of the outcomes of our behaviour; your belief that we can confidently understand the risks of anything through experimentation might work in theory but hasn’t been borne out in practice.

    Extremists exist at both ends of the spectrum and serve to balance each other out: without people positing the worst-case scenarios, the people positing the best-case scenarios would run full steam ahead without any consideration for what could happen.

    Perhaps if the proponents of (various flavours of) AI were doing careful experimentation and iteratively working towards a better understanding, then maybe the loud voices against it would be less valuable, but as we've seen through the last 20 years, progress in technology is being made without a second thought for the consequences — and what Facebook are doing here is a bare minimum, so it's reasonable for opponents to be somewhat cynical about the long-term consequences.

  • jonas21 2 years ago

    It's much easier to sit on the sidelines and come up with reasons why you shouldn't do something new than it is to actually do it.

    Some people have figured this out and built careers on it. This wouldn't be a problem, except that this opposition eventually becomes their professional identity - they derive prestige from being the person who is fighting against whatever. So even after researchers address their concerns, they have to ignore the progress or move the goalposts so they can keep on opposing it.

    • eszaq 2 years ago

      This doesn't seem like a bad thing to me. In the same way we have public defenders who professionally defend scoundrels, it seems good to have people who professionally critique new technologies.

      I'm old enough to remember the naive optimism around the internet in the 2000s. "The long tail", "cognitive surplus", "one laptop per child", Creative Commons, the Arab "Spring", breathless Youtube videos about how social media is gonna revolutionize society for the better, etc. Hardly anyone forecasted clickbait, Trump tweets, revenge porn, crypto scams, or social media shaming. If we had a few professional critics who were incentivized to pour cold water on the whole deal, or at least scan the horizon for potential problems, maybe things would've turned out better.

  • rglover 2 years ago

    Read Ted Kaczynski's (yes, that one) "Industrial Society and Its Future" with a neutral mind and you will start to understand why it's compelling.

  • armchairhacker 2 years ago

    My understanding is that when companies say “we aren’t releasing this / limited access / restrictions / for ethical reasons” they really mean “we aren’t releasing this because a) it’s expensive to run these models, b) it was even more expensive to create them and we might be able to profit, and c) maybe it’s bad for our ethics, which affects our funding and relations, and also, ethics.”

  • adamsmith143 2 years ago

    That's not the argument at all. Rather, it's that the technology is progressing so fast that it could become dangerous far faster than we can make it safe. Therefore it's worth seriously thinking about the risks that scenario poses. Stopping research or further work completely is one potential solution, but given the monetary investments involved it's extremely unlikely to be implemented.

    There are lots of very serious people seriously looking at these issues and to dismiss them as simple luddites is frankly insulting.

    • mgraczyk 2 years ago

      But nobody is failing to take the risks seriously. The people who actually work on these models understand the risks far better than the outside ethicists. I work in ML research. Reading their work is like listening to a drunk relative at Thanksgiving ranting about Bill Gates putting nanobots in vaccines. It's completely uninformed pseudoscience that comes from a place of strong political bias.

      For example, Timnit's "parrots" paper confused training with inference and GPUs with TPUs, making specific quantitative estimates that were off by orders of magnitude. If she had talked to a single person working on large language models, she would have recognized the error. But these people work in a bubble where facts don't matter and identity politics is everything.

      • trention 2 years ago

        There are enough people criticizing both current language models and the overall "quest" towards AGI who come from a non-political (unless you subscribe to an Aristotelian everything-is-politics) perspective. I personally don't think any of the companies with significant AI research is actually doing anything meaningful in terms of safety. Also, from their public "utterings", it's quite clear to me that both Altman and Hassabis (not to mention LeCun) don't actually care about safety or consequences.

        • mgraczyk 2 years ago

          > I personally don't think any of the companies with significant AI research is actually doing anything meaningful in terms of safety

          I assume this is just speculation on your part? Do you have any reason to make that claim? I personally know multiple people doing this full time at large tech companies.

          I can give you some examples of serious safety oriented criticism of large language models to contrast with what plays out in the press and amongst "ethicists".

          It's well understood that one can generate so-called "adversarial examples" for image classifiers. These adversarial examples can be chosen so that to a human they look like thing A, but the model classifies them as thing B with high probability. Methods of finding these adversaries are well understood. Methods of preventing them from being problematic are less developed but rapidly advancing.
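
          (For concreteness, a minimal sketch of FGSM, one of the standard methods for finding such adversarial examples; `model` here is assumed to be any differentiable image classifier.)

            import torch
            import torch.nn.functional as F

            def fgsm_adversary(model, image, true_label, epsilon=0.03):
                # Perturb `image` so it still looks like `true_label` to a human,
                # but the classifier's loss on the true label increases.
                image = image.clone().detach().requires_grad_(True)
                loss = F.cross_entropy(model(image), true_label)
                loss.backward()
                # One step in the sign of the gradient, within a small budget epsilon.
                adversarial = image + epsilon * image.grad.sign()
                return adversarial.clamp(0, 1).detach()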

          For language models, the situation is much worse. I don't know of any effective way to prevent a large language model from being searched for adversarial inputs, trivially. That is, an attacker could find inputs from large curated input spaces that cause the model to output a specific, desired sequence. For example, an attacker with access to the model weights could probably find an innocuous looking input that causes the model to output "kill yourself".

          Is this a risk that AI researchers are aware of? Yes, of course. But the difference between AI researchers and "ethicists" is that AI researchers understand the implications of the risk and will work on mitigations. "Ethicists" do not care about mitigating risk, and they don't care that the people who build the models already understand them and are comfortable with them.

          • trention 2 years ago

            >I personally know multiple people doing this full time at large tech companies.

            The failure mode of internal "ethical" control at private enterprises is well-known and has already played out (at least) once when we tried to regulate medical experiments in the two decades after WW2. I personally consider the current AI safety positions to be just blatant whitewashing. The Lemoine fiasco is a particularly hilarious case in point, combining a) a person who was utterly incompetent and biased for that position and b) a total failure of leadership to adequately engage with the issue (or even admit it's possible in principle). At the current point, AI safety is roughly as useful as tobacco lobbying (exaggerated for effect).

          • adamsmith143 2 years ago

            >I can give you some examples of serious safety oriented criticism of large language models to contrast with what plays out in the press and amongst "ethicists".

            To clarify, I think the poster above was talking about the AI alignment/control problem and not the specific failure modes of particular models (LLMs, CNNs, etc.). Very few people at OpenAI or DeepMind, for example, are seriously engaging with alignment. Paul Christiano at least acknowledges the problem, but seems to think solutions will be available in time to avert serious consequences, which may or may not be the case. The folks at MIRI certainly don't seem optimistic.

      • adamsmith143 2 years ago

        Well I definitely wasn't talking about people like Timnit but rather researchers like Stuart Russell who actually are at the forefront of the field and discuss AI safety broadly.

  • humanistbot 2 years ago

    > "ethicists"

    > It's a political and emotional stance masquerading as a technical or scientific process.

    I don't think you understand what ethics is.

    • vintermann 2 years ago

      I used to say that academic ethics is the study of what you can get away with.

      If you want guidance on being good, you need a saint, not an ethicist.

  • godmode2019 2 years ago

    It's a business play.

    They are asking to be regulated because they have finished writing their models.

    With regulation it will be harder for up and coming models to gain traction.

    It's getting so much coverage because it's paid press; I read about it in my newspaper BEFORE tech YouTube and here.

  • dalbasal 2 years ago

    So...

    I'm with you on being dispositionally pro-tech and anti-luddite, but...

    >> lack of testable predictions coming from people like this

    I think this is a disingenuous line of argument, more often than not. Popperian science is great, but it is not everything. The majority of our opinions, knowledge and conclusions are not based on testable hypotheses and falsifiable statements.

    Take the sentence "One person should not have absolute power." It's not a falsifiable statement, or scientific in other ways. It's based on anecdote and folk wisdom, not science.

    I agree with you. And, I think we need to argue back. But, don't argue meta. Meet the arguments head on.

  • skybrian 2 years ago

    It seems like there is a difference between "let's release it for researchers to investigate" and "let's release it for the general public to use and abuse, including all the script kiddies and malware authors and 4chan trolls and spammer sites out there."

    Unfortunately, that difference can only be maintained through some kind of gatekeeping.

    I like to try out the new algorithms, but I'm mostly just playing, and I don't see how they make it available to me without letting any random troll use it.

  • JohnHaugeland 2 years ago

    The movie Star Trek 2 has done tremendous damage to people's understanding of what a luddite was.

    In modern terms, the Luddites wanted a tax on automation to offset the losses of tradesworkers who were too old to retrain.

    It's similar to when bankers wanted a pension buyout at the introduction of the ATM.

    The Luddites were pro-technology, not anti. They just wanted the productivity gains distributed to wages.

    They're basically early unions.

  • guelo 2 years ago

    The attitude that I have trouble understanding is "A company spent millions of dollars researching and developing a new technology, they must make it available to me or else they are evil."

    • ipaddr 2 years ago

      Spent millions on tech that could be a net negative for society. Keeping details secret makes people think they are evil because that's how coverups happen.

  • amelius 2 years ago

    One day, these people will be right!

    (And then we know one solution to the Fermi paradox.)

  • drcode 2 years ago

    The other great apes remain alive primarily due to our pity for their plight - they were once the highest-IQ species in the world, but no more.

    Probably, we will be in the same situation relatively soon: And there is little reason to expect the AI systems to have the same pity

    Sorry I can't set up a double-blind, testable, peer-reviewed study to help convince you of this

bravura 2 years ago

That access form was... refreshing.

Here's why this matters to me, an independent researcher who wants to start publishing again.

In 2008, Ronan Collobert and Jason Weston had published some work that made neural network training of word vector representations really fast. But only ML people read that paper. Yoshua Bengio and Lev-Arie Ratinov and I plugged old-school cluster based as well as fancy-but-uncool-and-icky neural network word representations into a variety of NLP models. It worked awesome. Before "transformers go brrrrrr" our paper told the NLP community, basically, self-supervised learning and neural networks go "brrrrrrr". People finally started paying attention in the language world, ML stopped being treated with suspicion and the field moved rapidly, our paper racked up 2700 cites and an ACL 10 Year "Test Of Time" award, and here we are.

I don't work in a big research lab but I still publish. I pay for my GPUs the old fashioned way. You know, out of pocket.

It took me ages to get access to GPT-3. Ilya was a colleague of mine, so I messaged him on fb, but no dice. Why? I know I could pull other strings through my network but, like, really? Is this where we are right now?

All I'm saying is: it's nice to fill out a form asking about my intended use and my previous related publications as the means of gatekeeping. The access process feels more transparent and principled. Or maybe I'm just being grouchy.

  • ninjin 2 years ago

    Thank you Turian, your ACL 2010 paper [1] was what got me somewhat early into word representations when it appeared during the first year of my PhD and ultimately gave me a head start on “deep learning thinking” that I suspect helped me greatly on my path to a faculty position many years later. My own lab at the time was not really a hardcore machine learning lab, but we read “Natural Language Processing (Almost) from Scratch” as well. However, it really was your paper that became the gateway to representation learning (and ultimately deep learning) for myself and several other members of the group.

    [1]: https://aclanthology.org/P10-1040

    I clearly remember Christopher Manning explicitly mentioning your paper at the first ever (to the best of my knowledge) deep learning tutorial at a natural language processing conference (EMNLP 2012), at the very end before we left the room, with something along the lines of: “If there is only one thing you take away from this tutorial: Read Turian’s paper and add word representations as features to your classifiers for whatever task you may have – it works!”
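
    (For readers who never saw what that advice amounts to in practice, here is a rough sketch of the recipe with modern stand-ins; gensim's downloadable GloVe vectors are just a convenient substitute for the 2010-era embeddings and Brown clusters:)

      import numpy as np
      import gensim.downloader
      from sklearn.linear_model import LogisticRegression

      vectors = gensim.downloader.load("glove-wiki-gigaword-100")  # pretrained word vectors

      def featurize(text):
          # Represent a document as the average of its words' vectors.
          words = [w for w in text.lower().split() if w in vectors]
          return np.mean([vectors[w] for w in words], axis=0) if words else np.zeros(100)

      # Toy data; the 2010 point was that this one extra feature set boosted
      # taggers, chunkers, and NER systems almost for free.
      texts = ["the film was wonderful", "a dull and tedious movie"]
      labels = [1, 0]
      clf = LogisticRegression().fit([featurize(t) for t in texts], labels)
      print(clf.predict([featurize("a wonderful and fun film")]))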

    For those less familiar with the history of things, this was prior to Mikolov’s word2vec that arrived in 2013, which apart from popularising neural vector representations (“king - man + woman ~= queen”) above all made fast training of these embeddings feasible – Wikipedia overnight really, compared to the days, weeks, or even months of training for both the cluster-based and neural-based alternatives that were used back then.

    Outstanding work and very much ahead of its time. If you are ever around London, my group runs one of the largest public natural language processing talk series in the UK [2] and we would be more than happy to host you if you would be up for example to give a talk on a retrospective on the early days of things up until now.

    [2]: https://www.meetup.com/UCL-Natural-Language-Processing-Meetu...

jackblemming 2 years ago

Yandex and Facebook are both more open than OpenAI? And the world isn’t ending because large language models were released? Shocking.

  • jacooper 2 years ago

    OpenAI is basically only open in name.

    • enlyth 2 years ago

      I guess "Closed source pay-as-you-go AI" didn't have quite a ring to it

    • thesiniot 2 years ago

      It reminds me of that time the US Air Force designed an "open source jet engine".

      https://www.wpafb.af.mil/News/Article-Display/Article/201113...

      Their definition of "open source" turned out to be: "the government owns the source IP instead of some defense contractor. No, you can't see it."

      In fairness, I'm impressed that they even got that far. How do you think the defense contractor lobbyists responded to that program?

    • dqpb 2 years ago

      It’s basic run-of-the-mill gaslighting.

  • Judgmentality 2 years ago

    You dare question the gatekeepers of our future AI overlords?

  • BbzzbB 2 years ago

    What are the odds they (OpenAI) keep their self-imposed returns capped at 100x? Is there a betting pool somewhere?

    • hhmc 2 years ago

      Returns on what?

      • BbzzbB 2 years ago

        Their investment, based on the first round. That's how they justified flip-flopping away from being an NPO[0].

        If I had to bet, just as they pivoted from non-profit to capped-profit, they'll pivot from capped-profit to regular-for-profit when the time comes. Not that I have something against for-profits, but it all (name & mission) feels hypocritical and plain branding/marketing.

        0: https://openai.com/blog/openai-lp/

        > Returns for our first round of investors are capped at 100x their investment (commensurate with the risks in front of us), and we expect this multiple to be lower for future rounds as we make further progress.

        • hhmc 2 years ago

          I agree that it's pretty meaningless if you can just pivot at the appropriate time. Public benefit corporation feels like the best compromise.

blip54321 2 years ago

On the ethics front:

* Yandex released everything as full open

* Facebook released open with restrictions

* OpenAI is completely non-transparent, and to add insult to injury, is trying to sell my own code back to me.

It seems like OpenAI has outlived its founding purpose, and is now a get-rich-quick scheme.

What I really want is a way to run these on a normal GPU, not one with 200GB of RAM. I'm okay with sloooow execution.

  • TrinaryWorksToo 2 years ago

    Have you looked into HuggingFace Accelerate? People have supposedly been able to make that tradeoff with it, although you still need to download the huge models.

    • leereeves 2 years ago

      Can confirm. HuggingFace Accelerate's big model feature[1] has some limits, but it does work. I used it to run a 40GB model on a system with just 20GB of free RAM and a 10GB GPU.

      All I had to do was prepare the weights in the format Accelerate understands, then load the model with Accelerate. After that, all the rest of the model code worked without any changes.

      But it is incredibly slow. A 20 billion parameter model took about a half hour to respond to a prompt and generate 100 tokens. A 175 billion parameter model like Facebook's would probably take hours.

      1: https://huggingface.co/docs/accelerate/big_modeling
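
      For anyone curious, the rough shape of what that looks like (a minimal sketch; the model name, checkpoint path, and offload folder are illustrative rather than my exact setup):

        from accelerate import init_empty_weights, load_checkpoint_and_dispatch
        from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

        # Build the model skeleton without allocating real weights.
        config = AutoConfig.from_pretrained("facebook/opt-30b")
        with init_empty_weights():
            model = AutoModelForCausalLM.from_config(config)

        # Fill it in from a sharded checkpoint, spreading layers across the GPU,
        # CPU RAM, and disk; whatever doesn't fit gets offloaded to `offload/`.
        model = load_checkpoint_and_dispatch(
            model,
            "path/to/opt_checkpoint",               # folder of converted weight shards
            device_map="auto",
            offload_folder="offload",
            no_split_module_classes=["OPTDecoderLayer"],
        )

        tokenizer = AutoTokenizer.from_pretrained("facebook/opt-30b")
        inputs = tokenizer("The logbook shows that", return_tensors="pt")
        output = model.generate(inputs.input_ids.to(0), max_new_tokens=100)  # inputs on GPU 0
        print(tokenizer.decode(output[0], skip_special_tokens=True))

      Generation stays as slow as described above, since the offloaded layers have to stream in from disk for every forward pass.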

    • blip54321 2 years ago

      Thank you for the pointer. I've been poking at it with a fork for the past few hours, and realized I forgot to respond.

  • YetAnotherNick 2 years ago

    I don't understand why OpenAI has so many restrictions on its API. Aren't things like erotic writing, unlabelled marketing, etc. good money for them with minimal chance of litigation? Is it for PR?

    • bpodgursky 2 years ago

      It's because it was genuinely founded as an organization worried about misaligned AI.

      • dmix 2 years ago

        The critique is that the type of ethics they concern themselves with is borderline moral-panic/Victorian era. Not the Laws of Robotics kind of stuff.

        Maybe it's my personality, but I get the impression, since AI is rather limited in 2022, that all the paid AI ethicists spend 90% of their time on bullshit problems because there aren't many real threats. And these get amplified because the news is always looking for a FUD angle in every AI story.

        The priority seems to be protecting random people's feelings from hypothetical scenarios they invent, when IRL these are research tools released on a long-term R&D timeline... GPT-3 isn't a consumer product they are releasing. It's a baby step on a long road to something way bigger. Crippling that progress out of hyper-sensitivity to people who get offended easily seems ridiculous to me.

        • mr_toad 2 years ago

          > I get the impression, since AI is rather limited in 2022, that all the paid AI ethicists spend 90% of their time on bullshit problems because there aren't many real threats. And these get amplified because the news is always looking for a FUD angle in every AI story.

          I think we’re about due for an AI-ethics winter.

        • c7DJTLrn 2 years ago

          Also, it's pointless. OpenAI might be a leader right now, but it won't be forever. It can't control a technology. It's like restricting fire because it can burn down houses... yeah it can, but good luck with that; all we need is some friction or flint. As time goes on, that flint will become easier to find.

          If OpenAI wants to concern itself with the ethics of machine learning, why not develop tools to fight misuse?

        • rm_-rf_slash 2 years ago

          There are more than enough unaddressed ethics issues in ML/DS, from racial bias in criminal sentencing to de-anonymization of weights, to keep ethicists busy without needing Skynet.

          • dmix 2 years ago

            Seems like that time would be better spent working for local justice orgs and the ACLU than blocking OpenAI/Google from releasing chatbots or image generators because they fear someone might voluntarily type some wrongthink words into an input box and blame them for letting it happen.

  • chessgecko 2 years ago

    That already exists depending on your definition of slow. Just get a big ssd, use it as swap and run the model on cpu.

    • leereeves 2 years ago

      A comment below said this model uses fp16 (half-precision). If so, it won't easily run on CPU because PyTorch doesn't have good support for fp16 on CPU.

      • netr0ute 2 years ago

        Parent never claimed it was going to be fast.

        • leereeves 2 years ago

          It would probably just fail with an error "[some function] not implemented for 'Half'"

          • chessgecko 2 years ago

            fp16 models run inference just fine in fp32. I was sorta joking in my original comment, though; it would potentially take weeks for this to run one input. You're better off trying to make something like HuggingFace Accelerate work (like the comment above), which swaps layers of the model on and off the disk.
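
            As a tiny illustration of that cast (using a small OPT checkpoint as a stand-in; the 175B weights would still need the offload tricks above):

              import torch
              from transformers import AutoModelForCausalLM, AutoTokenizer

              name = "facebook/opt-125m"   # small stand-in for the big checkpoints
              tok = AutoTokenizer.from_pretrained(name)

              # Load the fp16 weights as shipped, then cast to fp32 so every op
              # has a CPU kernel available.
              model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)
              model = model.float().eval()

              inputs = tok("Large language models are", return_tensors="pt")
              with torch.no_grad():
                  out = model.generate(**inputs, max_new_tokens=20)
              print(tok.decode(out[0], skip_special_tokens=True))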

  • option 2 years ago

    On the ethics front Yandex should provide more details on the data they’ve used.

  • guelo 2 years ago

    I don't see giving spammers, marketers and scammers more powerful tools as the ethical stance.

    • shon 2 years ago

      That's an understandable viewpoint. However, “security through obscurity” just doesn't work. Worse, trying to keep something from people really only punishes/limits the rule followers.

      The bad guys get it anyway so this gives the good guys a chance.

      • trention 2 years ago

        I am curious what the reasoning is behind "giving 'good guys' access to language models will {deus ex machina} and thus allow us to prevent the spam and abuse".

        • leereeves 2 years ago

          Automated tools to distinguish AI generated text from human writing and hide the AI spam.
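
          A rough sketch of one such tool, purely illustrative: score text by its perplexity under a reference LM and flag text that looks "too predictable" (the model choice and thresholding here are assumptions, not a production detector):

            import torch
            from transformers import AutoModelForCausalLM, AutoTokenizer

            name = "gpt2"                       # small reference LM, just for illustration
            tok = AutoTokenizer.from_pretrained(name)
            lm = AutoModelForCausalLM.from_pretrained(name).eval()

            def perplexity(text: str) -> float:
                ids = tok(text, return_tensors="pt").input_ids
                with torch.no_grad():
                    loss = lm(ids, labels=ids).loss   # mean per-token cross-entropy
                return torch.exp(loss).item()

            # Lower perplexity = more "LM-like"; a real detector would calibrate a
            # threshold (or train a classifier) on labeled human vs. generated text.
            print(perplexity("The quick brown fox jumps over the lazy dog."))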

          • shon 2 years ago

            This ^^ + many other mitigation/analytics use cases.

          • numpad0 2 years ago

            Can humans be trained en masse to output text that is less distinguishable from that of an NN?

      • guelo 2 years ago

        There's not much obscurity here. If you have tens of millions of dollars to throw at compute and a bunch of PhDs you could develop similar tech. I don't understand the idea that ethics somehow requires existing private models to be made available to everybody.

        • shon 2 years ago

          Yeah I was responding to a post asking why we should allow open access, given that some of those with access will do bad things.

          I agree with you. Ethics doesn't demand that existing private tech be made available. Who's saying that??

          OpenAI is just catching shade because their initial founding mission was to democratize access to AI tech and they've gone pretty far the other way.

    • remram 2 years ago

      Almost certainly they are getting it, OpenAI will just get paid for it.

    • dqpb 2 years ago

      Better take away the internet then

    • sarahhudson 2 years ago

      They won't, but the cat is out of the bag. It is data, and data gets leaked, shared in the open, shared in the dark. Researchers can be bribed.

      It is like: you can not talk to your kids about drugs and pretend they don't exist ... or you can.

westoncb 2 years ago

This makes for some pretty excellent counter-marketing against OpenAI:

"so Meta's GTP-3 is open?"

"correct"

"and the original is not?"

"correct"

"and the original is made by 'OpenAI'?"

"correct"

"hmm"

Tepix 2 years ago

The article is from May 3rd, which is why Yandex's 100B model release is not mentioned.

lee101 2 years ago

Haven't got access to the OPT-175B model yet as they are prioritising researchers. BLOOM is going to be huge, but the OPT models they did release are, IMO, not a step forward from the research done by GPT-Neo/EleutherAI and AI21. It's a great step forward that they're shared/open, but I don't see the models themselves having a big impact.

They just seem to loop around a lot, I don't know why... I think they trained on datasets with duplicate content or lots of repeating characters.

Check out https://text-generator.io which is orders of magnitude cheaper than GPT-3 and still does creative writing/code autocomplete without as many of the looping issues you'd see with OPT models.

For that repetition you can dial up the repetition penalty, or increase N to generate more sequences in a single request (and you are only charged by the request, not by characters/tokens, which helps). Generating N results is often much more creative than generating one long result, since the generated output gets fed back into the input when generating long text, which often causes that kind of repetitive looping.

Some tactics to mitigate that repetitiveness are:

* Don't use OPT...

* repetition_penalty/retries/seed

* Generate N results and combine them instead of doing one big generate, as you don't know how many results you'll really get, and they are less creative/more likely to contain shared info/repetitiveness.

* Creative input prompts

There are probably other creative ways of getting variety, like splitting the generation into different calls with a higher repetition penalty as the text gets longer, or looping detection, etc. It's easy to detect repetition in the structured case like chat, but hard when doing longer text generation/creative writing.
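
As a rough illustration of those knobs with the HuggingFace generate API (small OPT checkpoint as a stand-in; the exact values are just starting points to play with):

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  name = "facebook/opt-125m"           # small stand-in; same knobs apply to larger models
  tok = AutoTokenizer.from_pretrained(name)
  model = AutoModelForCausalLM.from_pretrained(name).eval()

  inputs = tok("Once upon a time", return_tensors="pt")
  with torch.no_grad():
      outs = model.generate(
          **inputs,
          do_sample=True,              # sample instead of greedy decoding
          max_new_tokens=60,           # several short generations rather than one long one
          repetition_penalty=1.2,      # down-weight tokens that were already generated
          no_repeat_ngram_size=3,      # hard block on repeating 3-grams
          num_return_sequences=4,      # the "generate N results" approach
      )
  for o in outs:
      print(tok.decode(o, skip_special_tokens=True))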

needle0 2 years ago

Is the word "GPT-3" now being used like a generic term, simply referring to any large-enough language model? From the article, Meta's OPT model seems to be intentionally designed to match the capabilities of OpenAI's original GPT-3, but it doesn't say anywhere that it shares any code lineage.

  • nodja 2 years ago

    That's because the code is unimportant.

    The GPT-2 code is very simple; I think the official release by OpenAI is only a couple hundred lines of code. The challenge of GPT-3 is the scale: the GPT-3 paper basically says "This is GPT-2, but we made changes in the model so we can run it on dozens of GPUs". The changes mostly don't matter, because if you had a big enough GPU (~1.5TB of VRAM) you could just up the hyperparameters of GPT-2 and you'd end up with the same results.

    So what's novel about GPT-3 that warrants a new name? The discovery here is that after the model reaches a certain size, it's able to do many tasks without any task-specific training. You can literally ask it to translate from one language to another and it'll do a decent job; if you give it a few examples it'll work even better.

    Now, that doesn't mean GPT-3 is the final model; it's still not good enough for many tasks. For example, Copilot is based on GPT-3, but it was specifically fine-tuned for the task of auto-completing code.

    So yes, if you can figure out how to scale a GPT-2 model to 175B parameters, you have a GPT-3 clone.
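
    A back-of-the-envelope check, using the published GPT-2 XL and GPT-3 shapes (the formula is the usual rough approximation, not an exact count):

      # ~12 * n_layer * d_model^2 covers the attention + MLP blocks;
      # the embedding table adds vocab * d_model on top.
      def approx_params(n_layer: int, d_model: int, vocab: int = 50257) -> float:
          return 12 * n_layer * d_model ** 2 + vocab * d_model

      print(f"GPT-2 XL: {approx_params(48, 1600) / 1e9:.1f}B")    # ~1.6B
      print(f"GPT-3:    {approx_params(96, 12288) / 1e9:.1f}B")   # ~174.6B, i.e. "175B"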

slowhadoken 2 years ago

Is this an advertisement for developers to work on Facebook's AI for free or am I being cynical?

  • charcircuit 2 years ago

    No, it means that researchers can now have access to Facebook's large language model. No one is forcing researchers to do research using it.

  • rexpop 2 years ago

    Of course it is! That's the premise of every open-source initiative, too. It's not too cynical, it's plain economics. Pretty sure it's the explicit purpose, too.

    No one really thinks open-source sponsorships are charity, do they?

4oh9do 2 years ago

What does "inviting" mean? It sounds like it means "Facebook wants free labor instead of paying for formal and expensive security audits".

  • whoisjuan 2 years ago

    Meta/Facebook has given the world React, PyTorch, GraphQL, Jest, and other fantastic technologies, and you are just boiling down their open source efforts to "Facebook wanting free labor."

    Not everything in tech is a sinister capitalistic plot. Open Source and Open Research are truly one of the best ways to accelerate technology advancements, in particular software technology advancements.

mrandish 2 years ago

While I don't care at all for Facebook and never go there, this is a solid move by Meta which I appreciate. So far, I've been kind of turned off by OpenAI's apparent attitude when it comes to openness around GPT3, DALL-E, etc.

bribri 2 years ago

So happy more stuff like this is open. Kudos to Meta

vegai_ 2 years ago

Translation: Meta wants researchers to work for free for them, just like all of their content creators are doing right now.

That company (and all the other parasitic social media companies like it) needs to be taxed way more heavily than it is right now, based on the amount of content the people of every nation are generating for them.

annadane 2 years ago

Oh really? Now you invite researchers instead of shutting down legitimate projects to investigate your algorithms?

crabbygrabby 2 years ago

Ah yes, please work for free so Meta can generate news articles, chatbots, and better-targeted advertisements on the internet for billions of dollars a year. Thanks, Meta...

makz 2 years ago

Beta testing for free?

rdedev 2 years ago

> Meta AI audited OPT to remove some harmful behaviors

How is this usually done in practice?

option 2 years ago

The 175B model is for research only, and as far as I understood, their ToU does not allow commercial usage.

Currently, the largest LLM that is both free and commercially usable (Apache 2.0) is the 100B YaLM from Yandex (Russia's copy of Google). However, they did not publish any details on their training data.

  • timmg 2 years ago

    Dumb question: does 175B parameters mean the number of bytes (or floats?) in the model? Does that also mean you need the whole model in memory to do inference (in practice)?

    If so, not many machines have that much RAM. Makes it hard to "play" with.

    • lostmsu 2 years ago

      They're float16s or bfloat16s, so storage is roughly 2x that many bytes.

      You can infer using DeepSpeed.
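
      Roughly: 175B is the number of weights (16-bit floats), not bytes, and all of them are used in a forward pass, though offloading means they don't all have to sit in GPU RAM at once. A quick back-of-the-envelope:

        params = 175e9          # 175 billion weights
        bytes_per_param = 2     # fp16 / bf16 storage
        print(f"{params * bytes_per_param / 1e9:.0f} GB just to hold the weights")  # ~350 GB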

  • charcircuit 2 years ago

    >However, they did not publish any details on their training data.

    Yes they did. It's in the README.

    • option 2 years ago

      All I can see is “1.7 TB of online texts, books, and countless other sources in both English and Russian.”

      if there are more details, can you please share a link?

      I am worried that “other sources” may include Yandex.news, which is a cesspool of anti-West and anti-Ukraine propaganda.

      • 533474 2 years ago

        The Pile dataset is used for the English-language portion.

abrax3141 2 years ago

Yet another confabulation generator with pretty good grammar.

anothernewdude 2 years ago

How much are they paying for this service?

sudden_dystopia 2 years ago

Didn’t we already learn our “free” lesson from this company?

ArrayBoundCheck 2 years ago

Why is Facebook using GPT-3? To generate fake content it wants to push out?

  • Jensson 2 years ago

    Meta wants to create a metaverse; AI-generated content like text would be very helpful in making the metaverse more fleshed out.

    • smeagull 2 years ago

      Does anybody know what that even means? AI generated NFTs?