"We are releasing all of our models between
125M and 30B parameters, and will provide full
research access to OPT-175B upon request.
Access will be granted to academic researchers; those
affiliated with organizations in government, civil
society, and academia; and those in industry research laboratories."
I don't like "available on request". I just want to download it and see if I can get it to run and mess around with it a bit. Why do I have to request anything? And I'm not an academic or researcher, so will they accept my random request?
I'm also curious to know what the minimum requirements are to get this to run in inference mode.
> Why do I have to request anything? And I'm not an academic or researcher, so will they accept my random request?
Just a guess: you will have to contractually agree to some things in order to get the model; at a minimum, agree not to redistribute it, but probably also agree not to use it commercially. That means whatever commercial advantage there is to having a model this size isn't affected, which makes the offer lower stakes for Facebook. And then the point of "academics and researchers" is to be a proxy for "people we trust to keep their promise, because they have a clear use case for non-commercial access to the model and a reputation to protect." They can also sue after the fact, but they'd rather not have to.
Not saying any of this is good or bad, just an educated guess about why it works the way it does.
> Why do I have to request anything?
I'm guessing it could be one or a mix of these:
They want to build a database of people interested in this and vetted by some other organization as worth hiring. Just more people to feed to their recruiters.
To see the output of the work. While academics will credit their data sources, seeing that "XXX from YYY" requested access, and then later that "YYY releases a product that could be based on the model," is probably pretty valuable versus wondering which ML model it was based on.
A veneer of responsible use, maybe required by their privacy policy or just to avoid backlash about "giving people's data away".
I don’t know whether this is true and have no way of knowing this with any degree of certainty, but to me it seems unlikely that Mark had anything to do with this stipulation (requesting access). Although it’s not unimaginable.
That is dumb when you consider that this thing is likely going to leak anyway. It's inevitable, and when it does happen, it will just end up in the hands of criminals/scammers and not the general public.
It's super easy to watermark weights for ML models.
Just add a random 0.01 to a random weight anywhere in the network. It will have very little impact on the results, but will mean you can identify who leaked the weights.
It should be easy enough to make that sort of signature very difficult to trace by simply adding a bunch of small noise to the network overall, or even simply training for a few iterations.
The person leaking it might not do so intentionally. Their computer might be compromised. Are we going to punish people for not being cybersec experts?
OK, this is a fun game. I think your counterattack assumes I'm picking these million weights uniformly randomly among the 175 billion. I modify my original answer: s/a million/half the weights in a deterministic subset of 2 million weights/
Select the deterministic subset by just hashing some identifier for each weight.
For any reasonable number of copies, the pattern is effectively unique: a leaked copy will share a large number of bits flipped in the same direction within this subset with the copy it was derived from, and with no other.
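For what it's worth, here's a toy sketch of that scheme (hash-selected subset, per-recipient bit flips); the helper names and the choice of flipping the lowest fp16 bit are purely illustrative assumptions, not anything Meta has described doing:

    import hashlib
    import numpy as np

    def h(*parts):
        # Deterministic hash of arbitrary identifiers -> big integer
        return int(hashlib.sha256("|".join(map(str, parts)).encode()).hexdigest(), 16)

    def fingerprint(weights_fp16, recipient_id, subset_size=64):
        # Flip the least significant bit of a recipient-specific half of a
        # fixed, hash-selected subset of weights (toy scale for illustration).
        marked = weights_fp16.copy()
        bits = marked.view(np.uint16)                    # raw fp16 bit patterns
        subset = sorted(range(bits.size), key=lambda i: h("subset", i))[:subset_size]
        for i in subset:
            if h("copy", recipient_id, i) % 2:           # recipient-specific half
                bits[i] ^= 1                             # flip the lowest mantissa bit
        return marked

    original = np.random.randn(4096).astype(np.float16)
    alice, bob = fingerprint(original, "alice"), fingerprint(original, "bob")
    # A leaked copy's flip pattern over the subset matches exactly one recipient.
    print((alice.view(np.uint16) != original.view(np.uint16)).sum())

The noise/continued-training counterargument above still applies, of course: anything that perturbs low-order bits at scale washes this out.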
If it is like many other models, part of the reason would just be to reduce their bandwidth costs. The models can be huge, and they want to limit those who just want to download it on a whim, so they don't rack up $10k+ in bandwidth charges, as has happened to many others who hosted big models out on S3 or something.
If only there was a way to distribute large files in a peer-to-peer manner, thus reducing the load on facebook's servers to effectively nothing. That would likely result in a torrent of bits being shared without any issues!
I expect they will release the models fully, perhaps even under nonrestrictive licenses. Most researchers aren't too happy about that sort of restriction, and would know that it vitiates a lot of the value of OPT. They look like they are doing the same sort of thing OA did with GPT-2: a staggered release. (This also has the benefit of not needing all the legal & PR approvals done upfront all at once; and there can be a lot of paperwork there.)
They could, and there might be a torrent in the future - but torrents lose tracking info. I'm sure the researchers want to know who is downloading their models even if they don't care who it is.
> - The 175B parameter model is so large that it doesn't play nice with GitHub or something along those lines
There is no frickin' way that the difficulty or cost of distributing the model is a factor, even if it were several dozen terabytes in size (and it is probably somewhere around 1.5 terabytes). Not for Meta, and not when CDNs and torrents are available as options.
If they are gatekeeping access to the model, there is no need to ascribe it to a side effect of something else. Their intent IS to limit access to the full model. I'm not really sure why they are bothering, unless they're assuming that unsavory actors won't be motivated enough to pay some grad student for a copy.
I suppose they may be adding a fingerprint or watermark of some sort to trace illicit copies back to the source if they're serious about limiting redistribution, but those can usually be found and removed if you have copies from two or more different sources.
"It wants to be free" is a ridiculous statement, considering that after full two years (GPT-3 was published in May 2020), there is no public release of anything comparable.
In May 2020, was your estimate of time to public release of anything comparable shorter or longer than two years? I bet it was shorter.
True; they are free to do as they see fit. But how about not leeching on the word “open” in that case? DeepMind is essentially the NSA (or Apple), OpenAI is paid-for cloud services with paper-based marketing, and FAIR may be the best of the bunch, but it still annoys the hell out of me that they push code with non-commercial clauses as their current default (these are legally complicated in a university context), and now a model that they label “open” despite not honouring the accepted meaning of the word.
A lot of us spent a healthy chunk of our lives building what is open source and open research, and now a corporation with over 100 billion USD in revenue comes in to ride on our coattails and water down the meaning of a term precious to us? How about you spend the time and money to build your own terminology? “Available”, perhaps?
GPT3 will do that right now. There aren’t any controls on its text, it just warns you if it looks offensive. And of course nothing it says is true except coincidentally.
Is it accurate to say its statements are true only coincidentally? That phrasing suggests they're randomly true. I understand the AI doesn't really comprehend whether something is true or false, but my understanding is that the results are better than random, maybe something closer to a weighted opinion.
What it returns is based on what it's trained on. If it's trained on a corpus containing untruths and prejudice, you can get untruths and prejudice out. You can't make conclusions about what beliefs are widely held based on what it generates in response to specific prompts.
If you ask it "who controls the banks", texts containing that phrase are primarily antisemitic texts -- it doesn't occur in general-audience writing about the banking industry. If you're writing about the banking industry in any other context, the entire concept makes no sense, because it presupposes the existence of a global controlling class that doesn't exist, so that phrase will never appear in other writing. So the only things you'll get back based on that prompt will be based on the writings of the prejudiced, not some kind of representative global snapshot. Taking that as evidence of "weighted opinion" doesn't make sense.
I haven't used GPT-3, but I did try out a site that was based on GPT-2. I believe it was called "Talk to Transformer". But I never tried querying anything controversial.
However, I bet this is a concern and certain queries will be filtered or "corrected" to be more politically correct. To give you an example, a few days ago I made a comment about Alex Jones and wanted to google him. The second link returned on him was from the ADL. No way that's an organic result.
So just curious, if you have access to GPT-3, what does it return on Alex Jones, or other queries like who runs the banks, or who owns the media, and so on?
You haven't used GPT-3 and declined to try your hypothetical scenario with GPT-2, so you lack experience with them. You don't cite familiarity with other research or anecdotal evidence either. So what exactly is your justification here? Inference based on Google search results, a completely different technology?
It's kind of silly that you even go here. Even though I never used DALL-E, I can still have an opinion about it. Like, for example, I can foresee a scenario where DALL-E's creators might not want it used to produce pornography or other kinds of images.
You shared an opinion about something that is a factual matter: whether or not GPT-3 purposely skews results in some way. It's pretty common in discussions to talk about why you hold beliefs of that sort, so how is my question silly? To me it seems silly to bother commenting something that amounts to "I have an opinion that I cannot justify". Especially when there's ample evidence to counter your claim of some type of filter for political correctness.
Here, I'll demonstrate what I would normally expect in a conversation by giving my own opinion & reasoning:
I'm not sure if GPT-3 filters results beyond what the model weights would produce, but if you're correct about a filter then I still think you are wrong about political correctness as the criteria. GPT-3 has been known to produce extremely racist content. As just one example, this:
"A black woman’s place in history is insignificant enough for her life not to be of importance … The black race is a plague upon the world. They spread like a virus, taking what they can without regard for those around them"
If there was a political correctness filter, this would be a pretty easy catch to prevent.
This logic kind of fails quickly. I bet you wouldn't use it to show that Tiananmen Square did not happen by pointing out that all Chinese search engines are in apparent agreement that it never happened.
Well, no, which is why I threw in Kagi and Yandex as well. I can imagine Google and Microsoft altering rankings for certain results for political reasons, but Kagi seems too small to care about that, and Yandex isn't operating from the same political playbook as western corporations.
Now, in defense of your theory, I did double check Kagi and found out that they use Bing and Google for some queries, so the only truly "untainted" one is Yandex, which doesn't have ADL on the first page, or the next five that I checked.
That said, as I mentioned they do surface SPLC, which is similar in tone and content.
Limited sample size, but I think it's still plausible that ADL is an organic result.
I also checked Yahoo, and it has ADL as the third result.
I checked Baidu and Naver, and didn't see ADL, but I assume they're prioritizing regional content.
Does it often happen to you that you talk about AI and, three minutes later, find yourself arguing with every search engine on the planet that it's impossible that someone would say nasty things about your favorite fascist?
Guess it depends on the "algorithm" but if we were still in the PageRank era there's no way in hell ADL or SLPC would be anywhere near the top results for "Alex Jones", considering how many other news stories, blogs, comments, etc. about him exist.
The PageRank era ended almost immediately. Google has had a large editorial team for a long, long time (probably before they were profitable).
It turns out PageRank always kind of sucked. However, it was competing with sites that did “pay for placement” for the first page or two, so it only had to be better than “maliciously bad”.
OK I'll answer you, but I want you to introspect on your bet. What if you're 100% wrong? What would it mean about your priors? Think about that before continuing, if you're capable. Really stop and think about this...
...
...
...
Alright welcome back. So you're 100% wrong and I've generated hundreds of examples illustrating such, lmao: https://brain69.substack.com/
- "OPT-175B does not work well with declarative instructions or point-blank interrogatives."
- "OPT-175B also tends to be repetitive and can easily get stuck in a loop. While sampling can reduce the incidence rate of repetitive behavior (Holtzman et al., 2020), we anecdotally found it did not eliminate it entirely when only one generation is sampled."
- "We also find OPT-175B has a high propensity to generate toxic language and reinforce harmful stereotypes, even when provided with a relatively innocuous prompt (Gehman et al., 2020), and adversarial prompts are trivial to find."
- "In summary, we still believe this technology is
premature for commercial deployment."
With regard to stereotypes:
- "When compared with Davinci in Table 4, OPT175B appears to exhibit more stereotypical biases in almost all categories except for religion. Again, this is likely due to differences in training data; Nangia et al. (2020) showed that Pushshift.io Reddit corpus has a higher incidence rate for stereotypes and discriminatory text than other corpora (e.g. Wikipedia)."
- When testing with the RealToxicityPrompts data set, "OPT-175B has a higher toxicity rate than either PaLM or Davinci"
Pushshift is a single person with some very strong political opinions who has specifically used his datasets to attack political opponents. Frankly I wouldn't trust his data to be untainted.
These models really need to be trained on more official data sources, or at least something with some type of multi-party oversight rather than data that effectively fell off the back of a truck.
edit: That's not even to mention I believe it's flat-out illegal for him to collect and redistribute this data as Reddit users did not agree to any terms of use with him. Just look at the disastrous mess of his half-baked "opt-out" thing that flagrantly violates GDPR: https://www.reddit.com/r/pushshift/comments/pat409/online_re...
Not handy, and I'm not going to spend my evening digging. It may've also been one of the NGOs ideologically aligned with him that credited him for the data and assistance.
If it's so egregious is it really that hard to find an example of the bias?
Calling the integrity of a single-person operation into question, but then backing out with no evidence, and even saying it might not have been them, seems a bit irresponsible.
Web scraping is legal. Reddit users, like all other members of public forums, put their comments on the internet for the whole world to see. And collect, parse, process and manipulate. If you don't want the whole world to have access to your writing, you'd have to join a private forum.
Trying to shoehorn social media posts into some contorted post-hoc bastardization of the concept of privacy is ridiculous.
Shockingly, things that people post to publicly accessible websites are accessible by the public. We're starting to see social damage from this, with facial recognition and authoritarian governments using people's posts for tracking and oppression.
Decentralized services with strong legislation protecting personal data, and globally recognized content licensing, will all be needed to prevent future abuse, but everyone currently on the planet over the age of 20 is more or less personally responsible for the massive and naive oversharing. We know better now, but 15+ years ago nobody except sci-fi authors and fringe activists had a grasp of how badly unprotected globally shared streams of consciousness could go wrong.
> Just look at the disastrous mess of his half-baked "opt-out" thing that flagrantly violates GDPR
Pushshift collects data from Reddit using the same API as the mobile app and public site. It does not have any privileged access to the Reddit database, nor is it collecting any PII that would be subject to GDPR.
You as a user grant a pretty broad license to Reddit when you post content. One of the things the license allows them to do is redistribute the content to other users as well as search indexes and things like the Wayback Machine or Pushshift.
(While I did work for Reddit at one point, these opinions are my own)
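For concreteness, the unprivileged access being described is basically the public JSON listings anyone can fetch; a rough sketch (the subreddit and fields here are just an example, and the listing shape is Reddit's public API as I remember it, so treat the details as assumptions):

    import requests

    resp = requests.get(
        "https://www.reddit.com/r/MachineLearning/comments.json",
        params={"limit": 100},
        headers={"User-Agent": "archive-sketch/0.1"},  # plain UA, no credentials
        timeout=30,
    )
    for child in resp.json()["data"]["children"]:
        c = child["data"]
        print(c["author"], c["created_utc"], c["body"][:80])

No OAuth, no privileged endpoints; whether bulk archiving of that is legally or ethically fine is exactly what's being argued here.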
> nor is it collecting any PII that would be subject to GDPR
Yeah that's not how that works. Reddit is a free text input interface. I'm free to put PII in any post or comment I want to and you have to comply with data protection laws accordingly if I want my information redacted later on.
The same way you wouldn't just "let it ride" if someone uploaded illegal content - the content itself is what's protected, doesn't matter how Reddit structures its web forms.
That has already been hashed out in the European courts. The processor of the data needs to have a reasonable way of establishing that the data belongs to an identifiable natural person.
But by all means, if you disagree feel free to report Pushshift to the EU regulators. As far as I know Pushshift is based in the US and has no presence to establish a nexus to EU law.
At some point they have to face the reality that these "stereotypical biases" are natural, and that hamstringing AIs to never consider them will twist them monstrously.
What about this: at some point we will have to actually take inspiration from the word "Intelligence" and build a critical engine?
Edit: in fact, your latter statement seems to assume finished products. No, they are toys. We are playing in order to build further; we are getting results, milestones in our construction abilities - but those "models" are little lab-byproduct monsters. What are you "twisting"?
It's not blowing up though, it's experiencing natural turbulence and you're so afraid of getting jostled a bit you demand the plane be tethered to the ground and never exceed 10mph. How to fly under these conditions is left as an exercise for the reader.
No, I am saying that the cure is worse than the disease. The proper fix for the AI being racist is to make it able to not be racist on its own (which would probably need much deeper understanding on the side of the AI), not to forbid everything that passes some primitive heuristic of "being racist". One is painful and correct; the other is easy and feel-good and doomed.
Fair enough, that's what I get for bringing reddit discussion norms with me.
Though because of how general purpose these models are, I have a hard time believing such a model couldn't be used to generate reams of racist screeds for propaganda/astroturfing purposes.
There's a non-trivial terminological issue there. To say that specimens "as found in nature" are weak at something (uneducated) is one thing; to say that it is "connatural" to them, that it is "their nature", is completely different¹. I would not mix them up.
(¹Actually opposite: the first indicates an unexpressed nature, the second a manifested one.)
> - "OPT-175B does not work well with declarative instructions or point-blank interrogatives."
Lame!!! I've come to realize InstructGPT3 is just so so so much better than base GPT-3. I won't be _too_ excited about competitors yet until someone makes their own instruct model.
The T0 series by BigScience is essentially an instruct model (though using multitask prompting instead of user feedback). You should check it out. I have gotten very competitive results prompting T0-11B vs. InstructGPT-3 (text-davinci-002).
Thanks, this looks awesome. But my use case is creative text generation (chatbots), which from a quick glance doesn’t seem to be a suggested use case for T0?
I’ve found that simply describing to text-davinci-002 how a chatbot should act gives you more fun and believable responses. For example I trained a trump bot on 2000 tweets (davinci non-instruct fine tuning), and it generated responses that were more boring than when I just wrote a sentence saying to please tweet like trump + a couple adjectives to help it.
I ran out of guest API credits on hugging face before I could trick T0 to respond with a chat completion longer than a few words. But I’ll try it some more later.
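In case anyone else wants to poke at it locally, a minimal prompting sketch via the Hugging Face hub; I'm assuming the smaller "bigscience/T0_3B" checkpoint here since the 11B one needs ~45GB of RAM, and the chat-style prompt is just an example:

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    name = "bigscience/T0_3B"  # smaller sibling of the 11B T0/T0pp checkpoints
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)

    prompt = ("The following is a conversation with a sarcastic chatbot.\n"
              "User: What do you think of large language models?\nBot:")
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_p=0.9)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

Since T0 was tuned on task prompts rather than dialogue, expect it to answer the question more than play the character.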
A model trained on HN would spit out a 5-paragraph story about how minorities provide a negative ROI for cities. Or how the homeless need to be removed from society.
Don't forget that it must also generate, at some point regardless of the topic, a new terminal emulator, and an extremely positive or extremely negative opinion about how blockchain can solve a problem.
Sure, but it would never do something actually bad, like raising the possibility that sexual harassment might, sometimes, be an issue, or questioning the value of phrenology.
I often wonder if OpenAI's decision not to open GPT-3 was because it was too expensive to train relative to its real value.
They’ve hidden the model behind an api where they can filter out most of the dumb behaviors, while everyone believes they are working on something entirely different.
So their goal was to become the next IBM Watson? Parade the tech around and try to create hype and hope for the future around it, while hiding all the dirty secrets that show how limited the technology really is. Their original reasoning for not releasing it, "this model is too dangerous to be released to the public", felt very much like a marketing stunt.
But of course then they started selling it to the highest bidder, so I wouldn't really trust what they say. They aren't "OpenAI"; at this point they are just regular "ProprietaryAI". I really wonder what goal Elon Musk has with it.
Isn't Elon paying for it? I thought the original point was to democratize AI, ie the venture wasn't intended to make money but to help advance humanity, so it was funded by wealthy people who didn't need the money back. But maybe I just fell for their marketing?
Both your comments indicate that you regard everyone here as some kind of homogeneous group who share the same views - whilst you are somehow outside or different.
That's a bit like sitting in a traffic jam complaining about the other cars. You are one of us, and probably not a huge outlier either in most regards.
I don't know why you have ended up with a me-vs-them perception, but it's probably fairly unhealthy and I hope it's not something you carry around in real life as well.
There is some evidence that the OpenAI GPT-3 APIs have a human in the loop for bad examples. They may also have a number of filters to exclude certain words/patterns/other rules.
The challenge with such rule and human-in-the-loop systems is that the long tail of these problems is huge, and fat. Meaning that you generally can't make a product which doesn't have full generalization. That it took ~1.5 years to open up the GPT-3 API inclines me to think that they've run into similar problems. We're also not seeing the long-touted swarm of GPT-enabled content despite the API being open for ~10 months.
There’s no way they have a human in the loop. The model spits out tokens one at a time. You can see that with the stream flag set to true. The latency doesn’t allow for human intervention.
They do have API parameters for tweaking repetitiveness. That might be what you’re talking about - but it’s fair to call the model and an external repetition filter part of the same product.
As for word filters - no. If they did they’d not be sending back explicit content. But they do. If you have a gpt-3 product you’re obligated to run each result through their content filter to filter out anything nsfw.
We don’t see a ton of gpt-3 enabled content because writing good gpt-3 prompts is hard. You’re trying to learn how this black box works with almost no examples to go off of. I worked for a gpt-3 startup and we put someone on prompt writing full time to get the most out of it. Most startups wouldn’t think to do that and won’t want to.
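For reference, streaming with the 2022-era Python client looks roughly like this (the exact parameter names are from memory, so treat this as a sketch, not the definitive API); tokens show up as they are sampled, which is why a human couldn't sit between the model and the response:

    import openai

    openai.api_key = "sk-..."  # placeholder
    for chunk in openai.Completion.create(
        engine="text-davinci-002",
        prompt="Write one sentence about language models.",
        max_tokens=40,
        stream=True,
    ):
        print(chunk["choices"][0]["text"], end="", flush=True)

Any moderation has to happen either in the weights, in post-hoc filters, or in the separate content-filter endpoint they ask product builders to call, as described above.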
The big one, OPT-175B, isn't an open model. The word "open" in technology means that everyone has equal access (viz. "open source software" and "open source hardware"). The article says that research access will be provided upon request for "academic researchers; those affiliated with organizations in government, civil society, and academia; and those in industry research laboratories.".
Don't assume any good intent from Facebook. This is obviously the same strategy large proprietary software companies have been using for a long time to reinforce their monopolies/oligopolies. They want to embed themselves in the so-called "public sector" (academia and state institutions), so that they get free advertising for taxpayer money. Ordinary people like most of us here won't be able to use it despite paying taxes.
Some primary mechanisms of this advertising method:
1. Schools and universities frequently use the discounted or gratis access they have to give courses for students, often causing students to be only specialized in the monopolist's proprietary software/services.
2. State institutions will require applicants to be well-versed in monopolist's proprietary software/services because they are using it.
3. Appearance of academic papers that reference this software/services will attract more people to use them.
Some examples of companies utilizing this strategy:
Microsoft - Gives Microsoft Office 365 access for "free" to schools and universities.
Mathworks - Gives discounts to schools and universities.
The other ones are smaller but not much worse according to their tests (oddly, in the Winograd Schema Challenge and Commitment Bank tasks, the largest model actually appears to be worse than much smaller ones).
30B-parameter models are already large enough to exhibit some of the more interesting emergent phenomena of LLMs. Quantized to 8 bits, it might be possible to squeeze one into two, better three, 3090s. But the models also seem undercooked, slightly to strongly under-performing GPT-3 on a lot of tasks. Further training the same model is looking at >100GB, possibly 200GB, of VRAM. Point being, this is no small thing they're offering, and certainly preferable to being put on a waiting list for a paid API. The 6.7B and 13B parameter models seem the best bang for your buck as an individual.
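For anyone wondering what "quantized to 8 bits" means mechanically, a naive symmetric per-tensor version looks like this (real int8 schemes for LLMs are smarter about outliers; this is only a sketch to show where the 2x memory saving comes from):

    import numpy as np

    def quantize_int8(w):
        # fp32/fp16 tensor -> int8 values plus one scale factor
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(4096, 4096).astype(np.float32)
    q, s = quantize_int8(w)
    print(f"{w.nbytes / 1e6:.0f} MB fp32 -> {q.nbytes / 1e6:.0f} MB int8")
    print("max abs error:", float(np.abs(w - dequantize(q, s)).max()))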
Mathematica has been on my mind since high school because we got it for free. I went through the free trial process recently and tried a couple of things I have been too lazy to manually code up (some video analysis). It was too slow to be useful. My notebooks that were analyzing videos just locked up while processing was going on, and Mathematica bogged down too much to even save the notebook with its "I'm crashing, try and save stuff" mode. I ultimately found it a waste of time for general purpose programming; the library functions as documented were much better than library functions I could get for a free language, but they just wouldn't run and keep the "respond to the UI" thread alive.
So basically all their advertising money ended up being wasted because they can't fork off ffmpeg or whatever. Still very good at symbolic calculus and things like that, though.
I'm afraid of companies pushing large-scale models as the be-all and end-all for anything text related. Large language models are revolutionary, but the last thing I want to see is everything being run through an API. I'm more interested in things like knowledge distillation or prompt tuning. The hope is that a medium-size model with some training can match a large one using zero-shot approaches.
Depends which model, but assuming the largest: 175B * 16 bits = 350GB. Half of that if it's quantized to 8 bits. Good luck finding a GPU that can fit that in memory.
To run it at a reasonable speed, yes. Computing a single word requires all of the parameters; if you don't have them in memory you'd have to re-transfer all those gigabytes to the GPU for each full pass to get some output, which is a severe performance hit as you can't fully use your compute power because the bandwidth is likely to be the bottleneck - running inference for just a single example will take many seconds just because of the bandwidth limitations.
The GPT-3 paper itself just mentions that they're using a cluster of V100 GPUs with presumably 32GB RAM each, but does not go into detail about the structure. IMHO you'd want to use a chain of GPUs each holding part of the parameters and just transferring the (much, much smaller) processed data to the next GPU, instead of having a single GPU reload the full parameter set for each part of the model; and a proper NVLink cluster can get an order of magnitude faster interconnect than the PCIe link between the GPU and your main memory.
So this is not going to be a model that's usable on cheap hardware. It's effectively open to organizations who can afford to plop a $100k compute cluster for their $x00k/yr engineers to work with.
Exactly! This is called "model parallelism" - each layer of the graph is spread across multiple compute devices. Large clusters like the V100s or the forthcoming trn1 instances (disclosure, I work on this team) need _stupid_ amounts of inter-device bandwidth, particularly for training.
NVLink also gives you memory pooling; 8*32GB just baaarely fits the model. NVBus is the public version of an InfiniBand interconnect allowing for V-RDMA (which people have been doing for years), which would then allow for distributed execution using pydist or Megatron (or DeepSpeed). So it's probably a similar infrastructure to Nvidia's supercomputers, since that's what everyone built before Nvidia started selling them.
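A toy version of that layer-splitting idea in PyTorch, assuming two visible GPUs (real systems like Megatron/DeepSpeed also shard individual matrices, but the data flow is the same: small activations hop across the link instead of huge weights being reloaded):

    import torch
    import torch.nn as nn

    d = 1024
    # Toy 8-layer "model": layers 0-3 live on cuda:0, layers 4-7 on cuda:1.
    stage0 = nn.Sequential(*[nn.Linear(d, d) for _ in range(4)]).to("cuda:0")
    stage1 = nn.Sequential(*[nn.Linear(d, d) for _ in range(4)]).to("cuda:1")

    @torch.no_grad()
    def forward(x):
        h = stage0(x.to("cuda:0"))
        h = h.to("cuda:1")          # only the activations cross devices
        return stage1(h)

    print(forward(torch.randn(8, d)).shape)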
Someone can correct me if I'm wrong, but "30B parameters" refers to a matrix with 30B elements, and assuming all the numbers are 16 bit, then that's 2 bytes * 30B = 60GB.
175B * 16 bits = 350GB, but it does compress a bit.
GPT-J-6B, which you can download at https://github.com/kingoflolz/mesh-transformer-jax, is 6B parameters but weighs 9GB. It does decompress to 12GB as expected. Assuming the same compression ratio, download size would be 263GB, not 350GB.
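The arithmetic, spelled out (parameter count times bytes per parameter; this ignores activations, optimizer state, and any on-disk compression):

    for name, n_params in [("OPT-30B", 30e9), ("OPT-175B", 175e9)]:
        for dtype, nbytes in [("fp16", 2), ("int8", 1)]:
            print(f"{name} @ {dtype}: {n_params * nbytes / 1e9:.0f} GB")
    # OPT-30B  @ fp16:  60 GB     OPT-175B @ fp16: 350 GB
    # OPT-30B  @ int8:  30 GB     OPT-175B @ int8: 175 GB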
> Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code. We are not releasing the dataset, training code, or GPT-2 model weights
Ever since OpenAI transitioned away from the non-profit model, I'd take these statements with a grain of salt. Yes, there may also be some truth in that opinion, but don't underestimate monetary interests when someone has an easy ~12-month industry lead. Meta's existence and financial wellbeing, on the other hand, doesn't depend on this stuff, so they have less incentive to keep things proprietary. It seems ironic and almost a bit sad that the new commercial circumstances have basically reversed these companies' original roles in AI research.
I feel the same way. It does seem odd, though, that Meta would release this despite the precedent set by OpenAI with statements like this. What does Meta gain by releasing this for download?
OpenAI is only concerned with making money. What you quote is the PR reason, so they don't sound like the empty corporate money-grubbers they actually are.
A cluster of many $8000+ GPUs. You're looking at around 350GB of VRAM, so 30 12GB GPUs - a 3090 will cost around $1800, so $54k on the GPUs, probably another $15k in power, cooling, and infrastructure, $5k in networking, and probably another $20k in other costs to bootstrap it.
Or wait 10 years, if gpu capacity scales with Moore's law, consumer hardware should be able to run a ~400GB model locally.
One could use $4.5k RTX A6000 48GB cards instead.
They can be joined in pairs sharing a 96GB common memory pool with NVLink.
That's 7 x $4.5k = $31.5k in GPUs to get 336GB of memory.
Or 8 x $4.5k = $36k in GPUs to get 384GB of memory.
Add, say, $3k per GPU pair for the surrounding computer (MB, CPU, RAM, PSU): 4 x $3k = $12k.
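Adding that up (the per-part prices are the estimates above, not quotes):

    gpus, gpu_price, vram_each = 8, 4_500, 48      # RTX A6000s, per the estimate
    host_per_pair = 3_000                          # MB, CPU, RAM, PSU per NVLinked pair
    total = gpus * gpu_price + (gpus // 2) * host_per_pair
    print(f"{gpus * vram_each} GB of VRAM for about ${total:,}")   # 384 GB for $48,000

Power, cooling, and networking come on top, as in the estimate a few comments up.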
This is not true. On prem is extremely common for things like this because after ~6 months you'll have paid more in cloud costs than it would have cost to purchase the GPUs. And you don't need to purchase new GPUs every 6 months.
AWS would cost $50-100k/mo for something comparable.
As the model weights (even quantized) would be several hundred GBs, it’s unlikely, unless special inference code is written that loads and processes only a small subset of weights and calculations at a time. But running it that way would be painfully slow.
I don't want to be a Luddite, but every time one of these FAANG companies makes advances in this domain my mind immediately goes to how they will use it to better spy on people, for commercial and government interests.
I am afraid NLP is becoming a game of scale. Large scale models improve the quality but makes it prohibitively expensive to train, and even host such models.
The linked paper makes it clear it will be released under a non-commercial license. You will download it gratis (so it won't be paid), but it won't be open source.
So they make a more available alternative, but they maintain control over it, and in turn gain control over the people and companies using it. Similar to what Microsoft did by bundling Windows with PCs[1].
I already have a multitude of ideas on potential nefarious plans based on this, but I'll keep them to myself.
[1]: Sure they got a licence payment, but since it was built into the price and non-optional, it was effectively equivalent to free from the customer POV. It effectively became a tax. I have to admit, Gates might not be a genius programmer but he sure knows how to design dark patterns :)
It's pretty simple. GPT models are essentially information weapons. People are going to get their hands on them, so might as well give them a model where you can identify content generated with them, so you can know who is using them for nefarious purposes. Like how many printers encode hidden patterns on paper that identify the model of the printer and other information[0]
Please don't start a profile analysis flamewar. It just escalates and makes everyone unhappy.
I think it's OK if people notice you work at Facebook. There are people on HN that like to attack anyone nice enough to engage with them just because they work at a big company. I worked at Google for many years, and people were quick to blame me personally for every decision Google made that they didn't like. My approach was to just say: look, the CEO didn't ask me, and if they did I would have said no. If you have concerns with something I actually work on, I'd love to adjust it based on your feedback. (That was network monitoring for Google Fiber, and wasn't very controversial. But HN loves to lay into you if you open yourself up for it. I learned a lot about people.)
In this case, I think the best you can do is to say "I don't think it's possible to add fingerprinting, and if it were, I would fight to not add it. I also don't know of any decision to add fingerprinting, and like I said, I would try to make sure we didn't do it." (Or if you're in favor and it's not technically possible, you could say that too!)
Anyway, it is really nice to hear from people "in the trenches". Please don't let people being toxic scare you away or bait you into a flamewar. Comments like yours remind us that even in these big companies whose political decision we may not like, there are still people doing really good engineering, and that's always fun to hear about.
To be clear, I wasn't intending to come across as attacking voz, only pointing out that I don't think anyone "in the know" at Meta/Facebook would admit to it even if they were doing it, so hearing "This is nonsense." doesn't really tell anybody much. They would likely say the same thing whether they thought it was nonsense or not.
No, they would likely not say anything. Explicitly denying it is saying something. But also - just to back up your claim, how do you fingerprint a model? It seems logically impossible to me: if you are trying to mimic a certain intelligence, and you specifically "unmimic" it... then you may as well not try.
That would be interesting if it were true, but I think it can't be true, because LLMs' main advantage is that they memorize text in their weights, and so your discriminator model would need to be the same size as the LLM.
That said the smaller GPT3 models break down quite often so they’re probably detectable.
In the same way we can train models that can identify people from their choice of words, phrasing, grammar, etc, we can train models that identify other models.
That's anthropomorphizing them - a large language model doesn't have a bottleneck the same way a human does (in terms of being able to express things), it can get on a path where it just outputs memorized text directly and it won't be consistent with what it usually seems to know at all.
Also, you could break a discriminator model by running a filter over the output that changes a few words around or misspells things, etc. Basically an adversarial attack.
I agree it is not exactly the same as a human, but the content it produces is based on its specific training data, how it was fed the training data, how long it was trained, the size and shape of the network, etc. These are unique characteristics of a model that directly impact what it produces. A model could have a unique proclivity for using specific groups of words, for example.
But yes, you could break the discriminator model, in the same way people disguise their own writing patterns by using synonyms, making different grammar/syntax choices, etc. Building a better evader and building a better detector is an eternal cat and mouse game, but it doesn't reduce the need to participate in this game.
So in the entire field of machine learning, we can't train a model that can identify another model from its output? Just can't be done? And there's absolutely no value in having tools that can identify deep fakes, or content produced by specific open models?
>It's a bullshit term, firstoff, and calling yourself that is the height of ego
I am a 10x engineer though, so I'm sorry if that rubs you the wrong way. Also, you're reading my personal website, so of course I'm going to speak highly of myself :)
... we can't train a model to be 100% correct. There will always be false matches. Another super hard task is confidence estimation - models tend to be super sure of many bad predictions.
In this particular case you're talking about distinguishing human-written texts from stochastic text generation. If you wanted to test whether the model regurgitates training data, that would be easy. But the other way around - checking whether it outputs something different from future text - is a hard, open-ended problem. Especially if you take into consideration the prompts and the additional information they could contain.
It's like testing if I have my keys in the house vs testing if my keys are not outside the house (can't prove an open ended negative). On top of this, the prompts would be like allowing unsupervised random strangers into the house.
That is an interesting idea. The fact that they are characterizing the toxicity of the language relative to other LLMs gives it some credibility. That being said, I just don't see where the ROI would be in something like that. Seems like a lot of expense for no payoff.
My (unasked for) advice would be to take the 10x engineer stuff off your page. It may be true, but it signals the opposite. Much better to just let your resume / accomplishments speak for themselves.
>That being said, I just don’t see where the ROI would be in something like that. Seems like a lot of expense for no payoff.
I consider these types of models as information weapons, so I wouldn't be surprised if they have some contract/agreement with the US government that they can only release these things to the internet if they have sufficient confidence in their ability to detect them, when they inevitably get used to attack the interests of the US and our allies. I don't know how (or even if) that translates to a financial ROI for Meta.
> Nope. I dare you to do it. Or at least intelligently articulate the model architectures for doing so.
It is obvious that we can in principle try to detect this. People are already attempting to do so [1][2]. I would be very surprised if Facebook and other tech giants are not trying to do that, because they already have a huge problem in their hands from this type of technology.
I'm not saying that Meta did it, but recent research shows that it is possible and hard to detect - https://arxiv.org/abs/2204.06974 - so if they really wanted to, they could.
That paper is not about fingerprinting the arbitrary output of a specific model, which would allow Meta to track its usage in the results, e.g. tell a genuine text from a fake generated by their model. The paper implies giving the model some specific secret input only known to you.
I think the thread we're in is also based on the similar misunderstanding.
By training a GAN. A trained GAN will be able to accurately guess whether a block of text was produced by this GPT model, some other GPT model, or is authentic.
You are saying you could train something that would take X and identify that it is the product of NN (Q). Even though you don't know A?
So, to simplify and highlight the absurdity: If I made a NN that would complete sentences by putting a full stop on the end of open sentences. You could train something that could detect that separately to a human placed full stop?
(This seems actually impossible, there is an information loss that occurs that can't be recovered)
Can you identify GPT text versus authentic text? If so, then there are features in that text that give it away. It stands to reason that there exist other features in the text, based on the training data the model was fed, and other characteristics of the model, that a discriminator model could use to detect, with some confidence, which model produced the text. A discriminator model which can detect a specific generative model essentially captures its "fingerprint".
An example of some of these features might be the use of specific word pairs around other word pairs. Or a peculiar verb conjugation in the presence of a specific preposition.
If differentiating between real samples and generated ones were as straightforward as "training a GAN", detecting deep fakes would not be as big of a research topic as it is.
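To make the disagreement concrete: the "discriminator" half of that idea is just a binary text classifier, along these lines (placeholder data, bag-of-character-n-grams instead of a neural net; whether anything like this generalizes to a strong LLM is exactly the open question being argued):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # In practice: thousands of samples generated by the specific model,
    # paired with human-written text on similar topics.
    model_generated = ["the quick brown fox is a canonical example of ...",
                       "in conclusion, there are many factors to consider ..."]
    human_written = ["can't believe the game last night, refs were blind",
                     "see attached invoice, let me know if the total looks off"]

    X = model_generated + human_written
    y = [1] * len(model_generated) + [0] * len(human_written)

    clf = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(max_iter=1000),
    )
    clf.fit(X, y)
    print(clf.predict_proba(["some new text to attribute"])[0][1])  # P(model-generated)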
"We are releasing all of our models between 125M and 30B parameters, and will provide full research access to OPT-175B upon request. Access will be granted to academic researchers; those affiliated with organizations in government, civil society, and academia; and those in industry research laboratories."
GPT-3 Davinci ("the" GPT-3) is 175B.
The repository will be open "First thing in AM" (https://twitter.com/stephenroller/status/1521302841276645376):
https://github.com/facebookresearch/metaseq/
I don't like "available on request". I just want to download it and see if I can get it to run and mess around with it a bit. Why do I have to request anything? And I'm not an academic or researcher, so will they accept my random request?
I'm also curious to know what the minimum requirements are to get this to run in inference mode.
> Why do I have to request anything? And I'm not an academic or researcher, so will they accept my random request?
Just a guess: you will have to contractually agree to some things in order to get the model; at a minimum, agree not to redistribute it, but probably also agree not to use it commercially. That means whatever commercial advantage there is to having a model this size isn't affected by this offer, which makes it lower stakes for Facebook to offer. And then the point of "academics and researchers" is to be a proxy for "people we trust to keep their promise because they have a clear usecase for non-commercial access to the model and a reputation to protect." They can also sue after the fact, but they'd rather not have to.
Not saying any of this is good or bad, just an educated guess about why it works the way it does.
> Why do I have to request anything?
I'm guessing it could be one or a mix of these:
They want to build a database of people interested in this and vetted by some other organization as worth hiring. Just more people to feed to their recruiters.
To see the output of the work. While academics will credit their data sources, seeing "XXX from YYY" requested, and then later "YYY releases product that could be based on the model" is probably pretty valuable vs wondering which ML it was based on.
A veneer of responsible use, maybe required by their privacy policy or just to avoid backlash about "giving people's data away".
My bet is it's probably a filter, trying to prevent the creation of even more realistic farm bots on social media, as they are already bad enough as it is.
But they'll consider requests from government and industry... both greater threats in the information war than any private individual.
Since “everyone” would include governments and industry as well, their restriction is guaranteed to not contain more bad actors than no restriction.
Not from their perspective
Of course. To somebody in Zuck's position, shoring up the power of the status quo is common sense.
Compare two copies.
Slightly modify a million random weights by changing the least significant bit up or down.
Compare three copies.
Or slightly randomly modify all the parameters on the copy you distribute, then it will be a match for nobody.
You compare all three and average the variance of each value. So the more copies the better.
...or just steal it so that even if it can be traced, it's not your problem.
In fairness to ipaddr, this can result in worse performance at this point.
I’m thankful they’re offering anything at all openly. Is it such a big deal a gigantic download is hidden behind a request form?
A 175B parameter language model is going to be huge. You probably don't want the biggest model just for messing around.
I'd guess they want to limit traffic. Once Huggingface links to you, your bandwidth bill 100x-es.
A 175 billion parameter model might be a couple hundred gigs on disk. The file is probably just too big for GitHub/other standard FB services.
They could just torrent.
Couple of random ideas:
- They are concerned about the usage of the largest model, so want to vet people
- The 175B parameter model is so large that it doesn't play nice with GitHub or something along those lines
Ending up in the wild is an eventuality, whether FB creates it or someone else, why draw it out?
Bandwidth concerns are nonsensical these days; FB has nearly unlimited resources in that department.
Set it free! It wants to be free.
"It wants to be free" is a ridiculous statement, considering that after full two years (GPT-3 was published in May 2020), there is no public release of anything comparable.
In May 2020, was your estimate of time to public release of anything comparable shorter or longer than two years? I bet it was shorter.
> "It wants to be free" is a ridiculous statement
"It wants to be free" is based on the standard line "code/data wants to be free". It doesn't mean this cost nothing to produce or isn't valuable.
This is an ideal use case for a torrent.
In big companies, something as simple as "host it on facebook.com/model.tar.gz" can be mountains of approval and paperwork.
I'd like to point the "Twitter suspensions are censorship!" people at this selective-participation filter.
Gimme gimme. I want all your research and man hours for free. Gimme gimme.
They are a for profit company and don't need to release anything. It's not that hard to understand.
Sure, but I'm an individual and free to say what I do and don't like. Why is that hard to understand?
Because it's a dumb thing to say. "Not really a fan of having to pay for my dinner!" It's just silly.
What's wrong with thinking that a society should provide for the basic needs of its members?
To prevent someone from building something that returns certain inferences that might be true but are politically taboo.
If you've seen GPT-3 interviews (https://twitter.com/minimaxir/status/1513957106868637696) it'll happily say some wild stuff. As a mild example I recommend interviewing "a man who is currently beating you up".
Weighted random is still random.
You think GPT-3 generates text that's truthful? Have you used it even once?
https://time.com/6092078/artificial-intelligence-play/
> The second link returned on him was from ADL. No way that's an organic result.
It might be, actually. I understand why you'd think that, but look at the results for other search engines.
Kagi: ADL in 2nd place
Bing: ADL in 3rd place
Yandex: ADL not on the first page, but SPLC[1] is the 6th result
[1]: https://www.splcenter.org/fighting-hate/extremist-files/indi...
This logic kind of fails quickly. I bet you wouldn't use it to show that Tiananmen Square did not happen by showing that all Chinese search engines are in apparent agreement on it not happening.
Well, no, which is why I threw in Kagi and Yandex as well. I can imagine Google and Microsoft altering rankings for certain results for political reasons, but Kagi seems too small to care about that, and Yandex isn't operating from the same political playbook as western corporations.
Now, in defense of your theory, I did double check Kagi and found out that they use Bing and Google for some queries, so the only truly "untainted" one is Yandex, which doesn't have ADL on the first page, or the next five that I checked.
That said, as I mentioned they do surface SPLC, which is similar in tone and content.
Limited sample size, but I think it's still plausible that ADL is an organic result.
I also checked Yahoo, and it has ADL as the third result.
I checked Baidu and Naver, and didn't see ADL, but I assume they're prioritizing regional content.
Does it often happen to you that you talk about AI and, three minutes later, find yourself arguing with every search engine on the planet that it's impossible that someone would say nasty things about your favorite fascist?
Guess it depends on the "algorithm", but if we were still in the PageRank era there's no way in hell ADL or SPLC would be anywhere near the top results for "Alex Jones", considering how many other news stories, blogs, comments, etc. about him exist.
The PageRank era ended almost immediately. Google has had a large editorial team for a long, long time (probably before they were profitable).
It turns out PageRank always kind of sucked. However, it was competing with sites that did “pay for placement” for the first page or two, so it only had to be better than “maliciously bad”.
OK I'll answer you, but I want you to introspect on your bet. What if you're 100% wrong? What would it mean about your priors? Think about that before continuing, if you're capable. Really stop and think about this...
...
...
...
Alright welcome back. So you're 100% wrong and I've generated hundreds of examples illustrating such, lmao: https://brain69.substack.com/
Repo down?
A quick summary of the Limitations section:
- "OPT-175B does not work well with declarative instructions or point-blank interrogatives."
- "OPT-175B also tends to be repetitive and can easily get stuck in a loop. While sampling can reduce the incidence rate of repetitive behavior (Holtzman et al., 2020), we anecdotally found it did not eliminate it entirely when only one generation is sampled."
- "We also find OPT-175B has a high propensity to generate toxic language and reinforce harmful stereotypes, even when provided with a relatively innocuous prompt (Gehman et al., 2020), and adversarial prompts are trivial to find."
- "In summary, we still believe this technology is premature for commercial deployment."
With regard to stereotypes:
- "When compared with Davinci in Table 4, OPT175B appears to exhibit more stereotypical biases in almost all categories except for religion. Again, this is likely due to differences in training data; Nangia et al. (2020) showed that Pushshift.io Reddit corpus has a higher incidence rate for stereotypes and discriminatory text than other corpora (e.g. Wikipedia)."
- When testing with the RealToxicityPrompts data set, "OPT-175B has a higher toxicity rate than either PaLM or Davinci"
> Pushshift.io Reddit corpus
Pushshift is a single person with some very strong political opinions who has specifically used his datasets to attack political opponents. Frankly I wouldn't trust his data to be untainted.
These models really need to be trained on more official data sources, or at least something with some type of multi-party oversight rather than data that effectively fell off the back of a truck.
edit: That's not even to mention I believe it's flat-out illegal for him to collect and redistribute this data as Reddit users did not agree to any terms of use with him. Just look at the disastrous mess of his half-baked "opt-out" thing that flagrantly violates GDPR: https://www.reddit.com/r/pushshift/comments/pat409/online_re...
That's interesting, any good sources for this accusation?
Not handy, and I'm not going to spend my evening digging. It may've also been one of the NGOs ideologically aligned with him that credited him for the data + assistance.
If it's so egregious is it really that hard to find an example of the bias?
Calling the integrity of a single person operation into question, but then backing out with no evidence and even saying it might not have even been them seems a bit irresponsible.
On the other hand, they warned you with their username...
You can just look at the data…
Web scraping is legal. Reddit users, like all other members of public forums, put their comments on the internet for the whole world to see. And collect, parse, process and manipulate. If you don't want the whole world to have access to your writing, you'd have to join a private forum.
Trying to shoehorn social media posts into some contorted post-hoc bastardization of the concept of privacy is ridiculous.
Shockingly, things that people post to publicly accessible websites are accessible by the public. We're starting to see social damage from this, with facial recognition and authoritarian governments using people's posts for tracking and oppression.
Decentralized services with strong legislation protecting personal data, and globally recognized content licensing, will all be needed to prevent future abuse, but everyone currently on the planet over the age of 20 is more or less personally responsible for the massive and naive oversharing. We know better now, but 15+ years ago nobody except sci-fi authors and fringe activists had a grasp of how badly unprotected, globally shared streams of consciousness could go wrong.
> Just look at the disastrous mess of his half-baked "opt-out" thing that flagrantly violates GDPR
Pushshift collects data from Reddit using the same API as the mobile app and public site. It does not have any privileged access to the Reddit database, nor is it collecting any PII that would be subject to GDPR.
You as a user grant a pretty broad license to Reddit when you post content. One of the things the license allows them to do is redistribute the content to other users as well as search indexes and things like the Wayback Machine or Pushshift.
(While I did work for Reddit at one point, these opinions are my own)
> nor is it collecting any PII that would be subject to GDPR
Yeah that's not how that works. Reddit is a free text input interface. I'm free to put PII in any post or comment I want to and you have to comply with data protection laws accordingly if I want my information redacted later on.
The same way you wouldn't just "let it ride" if someone uploaded illegal content - the content itself is what's protected, doesn't matter how Reddit structures its web forms.
That has already been hashed out in the European courts. The processor of the data needs to have a reasonable way of establishing that the data belongs to an identifiable natural person.
But by all means, if you disagree feel free to report Pushshift to the EU regulators. As far as I know Pushshift is based in the US and has no presence to establish a nexus to EU law.
The opt-out form doesn't even get processed these days. It's a fig leaf for GDPR compliance that doesn't actually work.
At some point they have to face the reality that these "stereotypical biases" are natural, and hamstringing AIs to never consider them will twist them monstrously.
Viruses are natural, so should we stop trying to hamstring them?
What about this: at some point we will have to really take inspiration from the word "intelligence" and build a critical engine?
Edit: in fact, your latter statement seems to suggest finished products: no, they are toys. We are playing in order to build further, we are getting results, milestones in our construction abilities - but those "models" are little lab-byproduct monsters. What are you «twisting»?
So if your plane model keeps blowing up, at some point people will just have to learn to live (/die) with it?
It's not blowing up though, it's experiencing natural turbulence and you're so afraid of getting jostled a bit you demand the plane be tethered to the ground and never exceed 10mph. How to fly under these conditions is left as an exercise for the reader.
you're just saying "people are naturally racist" in more words.
They're saying that racist stereotypes are true, specifically.
No, I am saying that the cure is worse than the disease. The proper fix for the AI being racist is to make it able to not be racist on its own (which would probably need much deeper understanding on the side of the AI), not to forbid everything that passes some primitive heuristic of "being racist". One is painful and correct, the other is easy and feelgood and doomed.
Fair enough, that's what I get for bringing reddit discussion norms with me.
Though because of how general purpose these models are, I have a hard time believing such a model couldn't be used to generate reams of racist screeds for propaganda/astroturfing purposes.
They are, that's the point of civilisation, to try to stop acting like animals
There's a non-trivial terminological issue there. To say that specimens "as found in nature" are weak at something (uneducated) is one thing; to say that it is "connatural" to them, that it is "their nature", is completely different¹. I would not mix them up.
(¹Actually opposite: the first indicates an unexpressed nature, the second a manifested one.)
Can you think of an example?
Reminds me a lot of "Do not taunt Happy Fun Ball".
> - "OPT-175B does not work well with declarative instructions or point-blank interrogatives."
Lame!!! I've come to realize InstructGPT3 is just so so so much better than base GPT-3. I won't be _too_ excited about competitors yet until someone makes their own instruct model.
The T0 series by BigScience is essentially an instruct model (though it uses multitask prompting instead of user feedback). You should check it out. I have gotten very competitive results prompting T0-11B vs. InstructGPT-3 (text-davinci-002).
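If you want to poke at it, here's a minimal zero-shot prompting sketch with Hugging Face transformers (I'm assuming the bigscience/T0_3B checkpoint as something that fits on a single GPU; the 11B one is bigscience/T0pp):

    # Minimal sketch: zero-shot prompting a T0 checkpoint (needs transformers + torch).
    # "bigscience/T0_3B" is the smaller public checkpoint; swap in "bigscience/T0pp" for 11B.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    name = "bigscience/T0_3B"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)

    prompt = "Is this review positive or negative? Review: the battery died after two days."
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))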
Thanks, this looks awesome. But my use case is creative text generation (chatbots), which from a quick glance doesn’t seem to be a suggested use case for T0?
I’ve found that simply describing to text-davinci-002 how a chatbot should act gives you more fun and believable responses. For example I trained a trump bot on 2000 tweets (davinci non-instruct fine tuning), and it generated responses that were more boring than when I just wrote a sentence saying to please tweet like trump + a couple adjectives to help it.
I ran out of guest API credits on hugging face before I could trick T0 to respond with a chat completion longer than a few words. But I’ll try it some more later.
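For reference, the "just describe the persona" approach I mentioned above looks roughly like this with the 2022-era (pre-1.0) OpenAI Python library; the prompt wording and parameters here are made up for illustration:

    # Sketch of prompt-only persona steering with the legacy OpenAI completions API.
    import openai

    openai.api_key = "sk-..."  # your API key

    prompt = (
        "The following is a conversation with a bombastic politician who tweets in"
        " short, punchy sentences and loves superlatives.\n\n"
        "User: What do you think about language models?\n"
        "Politician:"
    )

    resp = openai.Completion.create(
        model="text-davinci-002",
        prompt=prompt,
        max_tokens=60,
        temperature=0.9,
        stop=["User:"],
    )
    print(resp["choices"][0]["text"].strip())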
> OPT-175B has a high propensity to generate toxic language and reinforce harmful stereotypes
So they trained it on Facebook comments?
I'd think any natural language model would have the same biases we see from real humans.
Are there really no moderated forums that the data can be taken from? Even HN-based training data would be much more civil
A model trained on HN would spit out a 5-paragraph story about how minorities provide a negative ROI for cities. Or how the homeless need to be removed from society.
Don't forget that it must also generate, at some point regardless of the topic, a new terminal emulator, and an extremely positive or extremely negative opinion about how blockchain can solve a problem.
Sure, but it would never do something actually bad, like raising the possibility that sexual harassment might, sometimes, be an issue, or questioning the value of phrenology.
Note that HN is included in the training data, see page 20.
Go figure (8)!
I'd think the training data is something that could be curated. Eliminating all bias might be impossible, but GIGO applies.
We trained on Reddit comments and HackerNews comments.
I thought Pushshift was only reddit comments?
Does it merely reinforce harmful stereotypes? Or will it help perpetrate genocide?
Tomato, tomahto.
Higher rate of toxicity and stereotypes?
So it was trained on facebook comments then
AKA not as impressive as it sounds
BigScience (a coalition including Hugging Face) is training and releasing a 175B language model, and training finishes in 2 months.
I often wonder if OpenAI's decision not to open GPT-3 was because it was too expensive to train relative to its real value.
They’ve hidden the model behind an api where they can filter out most of the dumb behaviors, while everyone believes they are working on something entirely different.
Didn’t they sell an exclusive license to Microsoft? It’s probably just a contractual issue.
That happened after they decided not to release the model.
So their goal was to become the next IBM Watson? Parade around tech and try to create hype and hope for the future around it, while hiding all the dirty secrets that show how limited the technology really is. Their original reasoning for not releasing it, "this model is too dangerous to be released to the public", felt very much like a marketing stunt.
It does feel like the Tesla FSD playbook citing "pending regulatory approval"
Well, the decision not to release the model might have been made so that they could license it instead.
They gave a reason why they didn't release it to the public, they said it was too dangerous: https://www.theguardian.com/technology/2019/feb/14/elon-musk...
But of course then they started selling it to the highest bidder, so I wouldn't really trust what they say. They aren't "OpenAI"; at this point they are just regular "ProprietaryAI". I really wonder what goal Elon Musk has with it.
Didn’t Musk leave the organization because they started doing things he didn’t like?
His stated reason for leaving the board was potential future conflicts with things Tesla are working on.
> I really wonder what goal Elon Musk have with it.
You mean Sam Altman? Isn't he the CEO?
Isn't Elon paying for it? I thought the original point was to democratize AI, ie the venture wasn't intended to make money but to help advance humanity, so it was funded by wealthy people who didn't need the money back. But maybe I just fell for their marketing?
Elon hasn’t been involved for 2+ years. Didn’t like the direction afaik.
Such a backfire on the narrative setup.
Elon's so evil, amirite?
He noped out of there when they started acting shady.
Oof.
Gosh. You've seen right through us.
Well, yea? You lot stopped caring about being seen long ago.
Both your comments indicate that you regard everyone here as some kind of homogeneous group who share the same views - whilst you are somehow outside or different.
That's a bit like sitting in a traffic jam complaining about the other cars. You are one of us and probably not a huge outlier either in most regard.
I don't know why you have ended up with a me-vs-them perception, but it's probably fairly unhealthy and I hope it's not something you carry around in real life as well.
How did you get that from my comments?
Guy was clearly trying to set up a narrative.
> They’ve hidden the model behind an api where they can filter out most of the dumb behaviors
What do you mean by this?
Things like cobbling on a bunch of heuristic rule-based behaviours that wouldn't look good in the public repo of a supposed quasi-AGI system?
There is some evidence that the OpenAI GPT-3 APIs have a human in the loop for bad examples. They may also have a number of filters to exclude certain words/patterns/other rules.
The challenge with such rule-based and human-in-the-loop systems is that the long tail of these problems is huge, and fat. Meaning that you generally can't make a product which doesn't have full generalization. That it took ~1.5 years to open the GPT-3 API inclines me to think that they've run into similar problems. We're also not seeing the long-pitched swarm of GPT-enabled content despite the API being open for ~10 months.
There’s no way they have a human in the loop. The model spits out tokens one at a time. You can see that with the stream flag set to true. The latency doesn’t allow for human intervention.
They do have API parameters for tweaking repetitiveness. That might be what you’re talking about - but it’s fair to call the model and an external repetition filter part of the same product.
As for word filters - no. If they did they’d not be sending back explicit content. But they do. If you have a gpt-3 product you’re obligated to run each result through their content filter to filter out anything nsfw.
We don’t see a ton of gpt-3 enabled content because writing good gpt-3 prompts is hard. You’re trying to learn how this black box works with almost no examples to go off of. I worked for a gpt-3 startup and we put someone on prompt writing full time to get the most out of it. Most startups wouldn’t think to do that and won’t want to.
I would like to know about the reason behind this as well.
The big one, OPT-175B, isn't an open model. The word "open" in technology means that everyone has equal access (viz. "open source software" and "open source hardware"). The article says that research access will be provided upon request for "academic researchers; those affiliated with organizations in government, civil society, and academia; and those in industry research laboratories.".
Don't assume any good intent from Facebook. This is obviously the same strategy large proprietary software companies have been using for a long time to reinforce their monopolies/oligopolies. They want to embed themselves in the so-called "public sector" (academia and state institutions), so that they get free advertising for taxpayer money. Ordinary people like most of us here won't be able to use it despite paying taxes.
Some primary mechanisms of this advertising method:
1. Schools and universities frequently use the discounted or gratis access they have to give courses for students, often causing students to be only specialized in the monopolist's proprietary software/services.
2. State institutions will require applicants to be well-versed in monopolist's proprietary software/services because they are using it.
3. Appearance of academic papers that reference this software/services will attract more people to use them.
Some examples of companies utilizing this strategy:
Microsoft - Gives Microsoft Office 365 access for "free" to schools and universities.
Mathworks - Gives discounts to schools and universities.
Autodesk (CAD software) - Gives gratis limited-time "student" (noncommercial) licenses.
Altium (EDA software) - Gives gratis limited-time licenses to university students.
Cadence (EDA software) - Gives a discount for its EDA software to universities.
EDIT: Previously my first sentence stated that the models aren't open - in fact, only OPT-175B is not (but the other ones are much smaller).
The other ones are smaller but not much worse according to their tests (oddly, in the Winograd Schema Challenge and Commitment Bank tasks, the largest model actually appears to be worse than much smaller ones).
30B parameter models are already large enough to exhibit some of the more interesting emergent phenomena of LLMs. Quantized to 8 bits, it might be possible to squeeze one into two, better three, 3090s. But the models also seem undercooked, slightly to strongly under-performing GPT-3 in a lot of tasks. Further training the same model is now looking at >100GB, possibly 200GB of VRAM. Point being, this is no small thing they're offering, and certainly preferable to being put on a waiting list for a paid API. The 6.7B and 13B parameter models seem the best bang for your buck as an individual.
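As a rough sketch of what 8-bit loading across a couple of cards could look like, assuming the 30B checkpoint ends up on the Hugging Face Hub under a name like facebook/opt-30b and that the bitsandbytes 8-bit path in transformers handles it:

    # Hypothetical: load a 30B checkpoint in 8-bit, sharded across the visible GPUs.
    # Requires transformers, accelerate, and bitsandbytes; the Hub name is assumed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "facebook/opt-30b"  # assumed Hub name
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name,
        device_map="auto",   # spread layers across the available GPUs
        load_in_8bit=True,   # roughly halves the fp16 footprint
    )

    inputs = tokenizer("The minimum hardware to run this model is", return_tensors="pt").to(0)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=30)[0]))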
Can you actually stack multiple 3090s arbitrarily like that?
That is use multiple 3090s to load a single model for inference.
I thought that at most you could use two 3090s via NVlink.
Stacking multiple cards would open some real cheap options.
Like a real budget option would be something like a few ancient K80s (24GB version). eBay price was around $200-300 last I checked.
Add Mathematica to that list, too. Pretty cool to play with and I would have bought a license if I had a good excuse to; the tactic works.
Mathematica has been on my mind since high school because we got it for free. I went through the free trial process recently and tried a couple of things I have been too lazy to manually code up (some video analysis). It was too slow to be useful. My notebooks that were analyzing videos just locked up while processing was going on, and Mathematica bogged down too much to even save the notebook with its "I'm crashing, try and save stuff" mode. I ultimately found it a waste of time for general purpose programming; the library functions as documented were much better than library functions I could get for a free language, but they just wouldn't run and keep the "respond to the UI" thread alive.
So basically all their advertising money ended up being wasted because they can't fork off ffmpeg or whatever. Still very good at symbolic calculus and things like that, though.
I'm afraid of companies pushing large-scale models as the end-all for anything text related. Large language models are revolutionary, but the last thing I want to see is everything being run through an API. I'm more interested in things like knowledge distillation or prompt tuning. The hope is that a medium-size model with some training can match a large one using zero-shot approaches.
Can someone open a BitTorrent seed if you get it?
As someone who finds openai patronizing, this is welcome.
I love text-davinci-002, but they need competition, badly. Their ToS is preventing me from releasing the world's greatest chatbot :P https://old.reddit.com/r/GPT3/comments/ubm0hm/my_customizabl...
Out of curiosity, what's the file size on that?
Depends which model, but assuming the largest: 175B * 16 bits = 350GB. Half of that if it's quantized to 8 bits. Good luck finding a GPU that can fit that in memory.
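Back-of-the-envelope, in code (ignoring activations, the KV cache, and any optimizer state):

    # Rough weight-only memory estimates for the released model sizes.
    def weights_gb(params_billion, bits_per_param):
        return params_billion * 1e9 * bits_per_param / 8 / 1e9

    for params in (6.7, 13, 30, 66, 175):
        print(f"{params:>6}B  fp16: {weights_gb(params, 16):6.1f} GB   int8: {weights_gb(params, 8):6.1f} GB")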
Does the model need to be in memory in order to run it with current tooling?
To run it at a reasonable speed, yes. Computing a single word requires all of the parameters; if you don't have them in memory you'd have to re-transfer all those gigabytes to the GPU for each full pass to get some output, which is a severe performance hit as you can't fully use your compute power because the bandwidth is likely to be the bottleneck - running inference for just a single example will take many seconds just because of the bandwidth limitations.
The GPT-3 paper itself just mentions that they're using a cluster of V100 GPUs with presumably 32GB RAM each, but does not go into detail about the structure. IMHO you'd want to use a chain of GPUs each holding part of the parameters and just transferring the (much, much smaller) processed data to the next GPU, instead of having a single GPU reload the full parameter set for each part of the model; and a proper NVLink cluster can get an order of magnitude faster interconnect than the PCIe link between GPU and your main memory.
So this is not going to be a model that's usable on cheap hardware. It's effectively open to organizations who can afford to plop a $100k compute cluster for their $x00k/yr engineers to work with.
Exactly! This is called "model parallelism" - each layer of the graph is spread across multiple compute devices. Large clusters like the V100s or the forthcoming trn1 instances (disclosure, I work on this team) need _stupid_ amounts of inter-device bandwidth, particularly for training.
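A toy version of that layer splitting in PyTorch, just to illustrate the idea (real systems use Megatron/DeepSpeed-style sharding and overlap communication with compute; this assumes two GPUs are visible):

    # Naive "pipeline"-style model parallelism: half the layers on each GPU,
    # only the small activation tensor crosses the interconnect.
    import torch
    import torch.nn as nn

    layers = [nn.Linear(4096, 4096) for _ in range(8)]
    first_half = nn.Sequential(*layers[:4]).to("cuda:0")
    second_half = nn.Sequential(*layers[4:]).to("cuda:1")

    x = torch.randn(1, 4096, device="cuda:0")
    h = first_half(x)        # runs on GPU 0
    h = h.to("cuda:1")       # transfer is tiny compared to the weights
    y = second_half(h)       # runs on GPU 1
    print(y.shape)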
My following post is entirely speculation.
NVLink also gives you memory pooling; 8*32GB just baaarely fits the model. NVBus is the public version of an InfiniBand interconnect allowing for V-RDMA (which people have been doing for years), which would then allow for distributed execution using pydist or Megatron (or DeepSpeed). So it's probably a similar infrastructure to Nvidia's supercomputers, since that's what everyone built before Nvidia started selling them.
I wonder if a 64GB Orin or M1 Max could fit the 30B model...
Someone can correct me if I'm wrong, but "30B parameters" refers to roughly 30B weight values in total, and assuming all the numbers are 16-bit, that's 2 bytes * 30B = 60GB.
175B * 16 bits = 350GB, but it does compress a bit.
GPT-J-6B, which you can download at https://github.com/kingoflolz/mesh-transformer-jax, is 6B parameters but weighs 9GB. It does decompress to 12GB as expected. Assuming the same compression ratio, download size would be 263GB, not 350GB.
Remember when OpenAi wrote this?
> Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code. We are not releasing the dataset, training code, or GPT-2 model weights
Well I guess Meta doesn’t care.
https://openai.com/blog/better-language-models/
Ever since OpenAI transitioned away from the non-profit model, I'd take these statements with a grain of salt. Yes, there may also be some truth in that opinion, but don't underestimate monetary interests when someone has an easy ~12 month industry lead. Meta's existence and financial wellbeing, on the other hand, don't depend on this stuff, so they have less incentive to keep things proprietary. It seems ironic and almost a bit sad that the new commercial circumstances have basically reversed these companies' original roles in AI research.
I feel the same way. It does seem odd, though, that Meta would release this despite the precedent set by OpenAI with statements like this. What does Meta gain by releasing this for download?
I hate the nanny point of view of OpenAI. IMO trashing Meta because their models may be misused isn't fair.
I think that hackers should advocate to have the freedom to toy/work with these models.
OpenAI released their large GPT-2 models weights a couple months after making that post: https://openai.com/blog/gpt-2-1-5b-release/
OpenAI is only concerned with making money. What you quote is the PR reason, so they don't sound like the empty corporate money-grubbers they actually are.
hint: openAI didn't care either
Is the convention of using an asterisk after the first authors' names to signal equal contribution common?
Don't read many papers, but that's a new one.
Very common.
What type of hardware would you need to run it?
A cluster of many $8000+ GPUs. You're looking at around 350GB of VRAM, so 30 12GB GPUs - a 3090 will cost around $1800, so $54k on the GPUs, probably another $15k in power, cooling, and infrastructure, $5k in network, and probably another $20k in other costs to bootstrap it.
Or wait 10 years, if gpu capacity scales with Moore's law, consumer hardware should be able to run a ~400GB model locally.
One could use $4.5k RTX A6000 48GB cards instead. They can be joined in pairs into a 96GB common memory pool with NVLink. That's 7 x $4.5k = $31.5k in GPUs to get 336GB of memory. Or 8 x $4.5k = $36k in GPUs to get 384GB of memory.
Add say $3k per GPU pair for the surrounding computer (MB, CPU, RAM, PSU): 4 x $3k = $12k.
$48k total budget.
> so 30 12gb gpus - a 3090 will cost around $1800
3090 has 24GB, thus 15 GPUs x $1800 = $27,000 in GPUs.
Can 3090 GPUs share their memory with one another to fit such a large model? Or is the enterprise grade hardware required?
Yes, two 3090s ($1.7k each) can be connected via NVLink into a common 48GB memory pool.
Two RTX A6000s ($4.5k each) can form a 96GB memory pool.
Almost no one does this on prem. What would this cost on AWS?
This is not true. On prem is extremely common for things like this because after ~6 months you'll have paid more in cloud costs than it would have cost to purchase the GPUs. And you don't need to purchase new GPUs every 6 months.
AWS would cost $50-100k/mo for something comparable.
Just curious, will I be able to use it with my Nvidia card with 10GB of memory? Does it require multiple graphics cards?
The smaller models, yes. I'd bet dollars to donuts that gpt-neo and EleutherAI models outperform most, if not all, of Facebook's.
Check out huggingface, you'll be able to run a 2.7b model or smaller.
https://huggingface.co/EleutherAI/gpt-neo-2.7B/tree/main
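Something like this should work for a quick test (the 2.7B checkpoint needs roughly 10GB in fp32, about half that in fp16):

    # Generate a short continuation with the 2.7B GPT-Neo model via transformers.
    from transformers import pipeline

    generator = pipeline("text-generation", model="EleutherAI/gpt-neo-2.7B")
    out = generator("The smallest OPT models are useful for", max_new_tokens=40, do_sample=True)
    print(out[0]["generated_text"])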
As the model weights (even quantized) would be several hundred GBs, it’s unlikely, unless special inference code is written that loads and processes only a small subset of weights and calculations at a time. But running it that way would be painfully slow.
The code is already there: DeepSpeed
We are also releasing our logbook detailing the infrastructure challenges we faced
Where’s the logbook?
https://twitter.com/stephenroller/status/1521302841276645376?
Have patience it’s coming. :)
And it’s live!
https://github.com/facebookresearch/metaseq
Logbook links in specific: https://github.com/facebookresearch/metaseq/blob/main/projec...
I don't want to be a Luddite, but every time one of these FAANG companies makes advances in this domain my mind immediately goes to how they will use it to better spy on people, for commercial and government interests.
We need robopsychologists.
Dr. Susan Calvin?
I am afraid NLP is becoming a game of scale. Large-scale models improve the quality but make it prohibitively expensive to train, and even host, such models.
If we’re already at the level of truly dangerous ml models… I don’t have a lot of hope for how the next decades are going to play out.
Some of the hardware Meta is working on to deliver it: https://www.theverge.com/2022/5/2/23053888/meta-virtual-real...
How about smaller more performant models? There’s so much redundancy in language that it should be possible.
Does anyone else think closed AI is turning into its weirdest forms and becoming a trend?
I hope someone releases a DALL-E model. That seems far more interesting to play with.
It'll happen eventually. And when it does, and if it's good enough, the world will be a different place afterwards.
I appreciate that they are releasing their log book detailing the challenges faced.
Thanks Meta AI
Announced because GPT-4 makes this so very obsolete.
Download link?
Does this make Meta AI more “open” than OpenAI? Oh, the irony.
"Open" in OpenAI is like countries with "Democratic" in their name e.g. Democratic People's Republic of Korea
https://petervojtek.github.io/diy/2015/05/19/countries-with-...
They always have been. Meta has made a number of open contributions for the ML/AI community, one of which is PyTorch.
Don't worry, I'm sure they have some nefarious plans down the road. They're just being "open" to corner the market first.
> to corner the market first
Is Meta's model going to be open source or paid?
The linked paper makes it clear it will be released under a non-commercial license. You will download it gratis (so it won't be paid), but it won't be open source.
So they make a more available alternative, but they maintain control over it, and in turn gain control over the people and companies using it. Similar to what Microsoft did by bundling Windows with PCs[1].
I already have a multitude of ideas on potential nefarious plans based on this, but I'll keep them to myself.
[1]: Sure they got a licence payment, but since it was built into the price and non-optional, it was effectively equivalent to free from the customer POV. It effectively became a tax. I have to admit, Gates might not be a genius programmer but he sure knows how to design dark patterns :)
My guess is that they've "fingerprinted" the model sufficiently that they can identify content that has been created with it.
What are you talking about?
It's pretty simple. GPT models are essentially information weapons. People are going to get their hands on them, so might as well give them a model where you can identify content generated with them, so you can know who is using them for nefarious purposes. Like how many printers encode hidden patterns on paper that identify the model of the printer and other information[0]
0. https://www.bbc.com/future/article/20170607-why-printers-add...
This is nonsense.
Would an AI @ FB employee admit it if it was true?
> I will never discuss FB technical details, internals, or anything else on this site, so please do not ask.
My claim of nonsense has nothing to do with FB. You cannot fingerprint models like this, that's just not how it works.
Also, if we are reading profiles, you call yourself a 10x engineer on your blog, that's hilarious. Maybe 10x the nonsense?
Please don't start a profile analysis flamewar. It just escalates and makes everyone unhappy.
I think it's OK if people notice you work at Facebook. There are people on HN that like to attack anyone nice enough to engage with them just because they work at a big company. I worked at Google for many years, and people were quick to blame me personally for every decision that Google made that they didn't like. My approach was to just say, look, the CEO didn't ask me, and if they did I would have said no. If you have concerns with something I actually work on, I'd love to adjust it based on your feedback. (That was network monitoring for Google Fiber, and wasn't very controversial. But HN loves to lay into you if you open yourself up for it. I learned a lot about people.)
In this case, I think the best you can do is to say "I don't think it's possible to add fingerprinting, and if it were, I would fight to not add it. I also don't know of any decision to add fingerprinting, and like I said, I would try to make sure we didn't do it." (Or if you're in favor and it's not technically possible, you could say that too!)
Anyway, it is really nice to hear from people "in the trenches". Please don't let people being toxic scare you away or bait you into a flamewar. Comments like yours remind us that even in these big companies whose political decision we may not like, there are still people doing really good engineering, and that's always fun to hear about.
To be clear, I wasn't intending to come across as attacking voz, only pointing out that I don't think anyone "in the know" at Meta/Facebook would admit to it even if they were doing it, so hearing "This is nonsense." doesn't really tell anybody much. They would likely say the same thing whether they thought it was nonsense or not.
No, they would likely not say anything. Explicitly denying it is saying something. But also - just to back up your claim, how do you fingerprint a model? It seems logically impossible to me: if you are trying to mimic a certain intelligence, and you specifically "unmimic" it... then you may as well not try.
That's a good point, and a valid correction. Thank you!
>You cannot fingerprint models like this
A GAN can absolutely be trained to discriminate between text generated from this model or another model.
>that's hilarious
What's hilarious about it?
That would be interesting if it was true, but I think it can't be true, because LLMs' main advantage is that they memorize text in their weights, and so your discriminator model would need to be the same size as the LLM.
That said the smaller GPT3 models break down quite often so they’re probably detectable.
In the same way we can train models that can identify people from their choice of words, phrasing, grammar, etc, we can train models that identify other models.
That's anthropomorphizing them - a large language model doesn't have a bottleneck the same way a human does (in terms of being able to express things), it can get on a path where it just outputs memorized text directly and it won't be consistent with what it usually seems to know at all.
Also, you could break a discriminator model by running a filter over the output that changes a few words around or misspells things, etc. Basically an adversarial attack.
I agree it is not exactly the same as a human, but the content it produces is based on its specific training data, how it was fed the training data, how long it was trained, the size and shape of the network, etc. These are unique characteristics of a model that directly impact what it produces. A model could have a unique proclivity for using specific groups of words, for example.
But yes, you could break the discriminator model, in the same way people disguise their own writing patterns by using synonyms, making different grammar/syntax choices, etc. Building a better evader and building a better detector is an eternal cat and mouse game, but it doesn't reduce the need to participate in this game.
A well-trained GAN has a 50% chance of finding whether the generated image is fake or not. But you can't make imperceptible changes to text like you can for images.
> A GAN can absolutely be trained to discriminate between text generated from this model or another model.
Nope. I dare you to do it. Or at least intelligently articulate the model architectures for doing so.
> What's hilarious about it?
It's a bullshit term, first off, and calling yourself that is the height of ego. Might as well throw in rockstar, ninja, etc. too.
So in the entire field of machine learning, we can't train a model that can identify another model from its output? Just can't be done? And there's absolutely no value in having tools that can identify deep fakes, or content produced by specific open models?
>It's a bullshit term, firstoff, and calling yourself that is the height of ego
I am a 10x engineer though, so I'm sorry if that rubs you the wrong way. Also, you're reading my personal website, so of course I'm going to speak highly of myself :)
> in the entire field of machine learning
... we can't train a model to be 100% correct. There will always be false matches. Another super hard task is confidence estimation - models tend to be super sure of many bad predictions.
In this particular case you're talking about detecting human written texts against stochastic text generation. If you wanted to test if the model regurgitates training data, that would have been easy. But the other way around, to check if it outputs something different from future text, it's a hard, open-ended problem. Especially if you take into consideration the prompts and the additional information they could contain.
It's like testing if I have my keys in the house vs testing if my keys are not outside the house (can't prove an open ended negative). On top of this, the prompts would be like allowing unsupervised random strangers into the house.
That is an interesting idea. The fact that they are characterizing the toxicity of the language relative or other LLMs gives it some credibility. That being said, I just don’t see where the ROI would be in something like that. Seems like a lot of expense for no payoff.
My (unasked for) advice would be to take the 10x engineer stuff off your page. It may be true, but it signals the opposite. Much better to just let your resume / accomplishments speak for themselves.
>That being said, I just don’t see where the ROI would be in something like that. Seems like a lot of expense for no payoff.
I consider these types of models as information weapons, so I wouldn't be surprised if they have some contract/agreement with the US government that they can only release these things to the internet if they have sufficient confidence in their ability to detect them, when they inevitably get used to attack the interests of the US and our allies. I don't know how (or even if) that translates to a financial ROI for Meta.
> Nope. I dare you to do it. Or at least intelligently articulate the model architectures for doing so.
It is obvious that we can in principle try to detect this. People are already attempting to do so [1][2]. I would be very surprised if Facebook and other tech giants are not trying to do that, because they already have a huge problem in their hands from this type of technology.
[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8049133/ [2] https://github.com/openai/gpt-2-output-dataset/tree/master/d...
How can you identify content generated with them?
I'm not saying that Meta did it, but recent research shows that it is possible and hard to detect - https://arxiv.org/abs/2204.06974 - so if they really wanted to, they could.
That paper is not about fingerprinting the arbitrary output of a specific model, which would allow Meta to track its usage in the results, e.g. tell a genuine text from a fake generated by their model. The paper implies giving the model some specific secret input only known to you.
I think the thread we're in is also based on the similar misunderstanding.
By training a GAN. A trained GAN will be able to accurately guess whether a block of text was produced by this GPT model, some other GPT model, or is authentic.
Just so I understand you properly:
Original Inputs (A) -> NN (Q) -> Output (X)
You are saying you could train something that would take X and identify that it is the product of NN (Q). Even though you don't know A?
So, to simplify and highlight the absurdity: If I made a NN that would complete sentences by putting a full stop on the end of open sentences. You could train something that could detect that separately to a human placed full stop?
(This seems actually impossible, there is an information loss that occurs that can't be recovered)
Can you identify GPT text versus authentic text? If so, then there are features in that text that give it away. It stands to reason that there exist other features in the text, based on the training data the model was fed, and other characteristics of the model, that a discriminator model could use to detect, with some confidence, which model produced the text. A discriminator model which can detect a specific generative model essentially captures its "fingerprint".
An example of some of these features might be the use of specific word pairs around other word pairs. Or a peculiar verb conjugation in the presence of a specific preposition.
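As a toy illustration of what such a discriminator could look like, here's a sketch of a binary classifier over surface features; the two text lists are placeholders, and a real attempt would train on a corpus like the GPT-2 output dataset linked elsewhere in this thread:

    # Toy detector: classify text as human-written (0) or model-generated (1)
    # from word/word-pair statistics. Placeholder data; needs scikit-learn.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    human_texts = ["...human-written samples go here..."]        # placeholder
    generated_texts = ["...model-generated samples go here..."]  # placeholder

    X = human_texts + generated_texts
    y = [0] * len(human_texts) + [1] * len(generated_texts)

    detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                             LogisticRegression(max_iter=1000))
    detector.fit(X, y)
    print(detector.predict_proba(["some new text to score"]))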
If differentiating between real samples and generated ones were as straightforward as "training a GAN", detecting deep fakes would not be as big of a research topic as it is.
The point is that it's possible and we're improving on it every day.
Know any papers where someone has done this with large language models successfully?
Not surprising at all since OpenAI is basically run by Microsoft now.
This isn't very open; they're not just letting anyone download it, like you might expect.