i_like_apis 2 years ago

The concern trolling and gatekeeping about social justice issues coming from the so-called "ethicists" in the AI peanut gallery have been utterly ridiculous. Google claims they don't want to release Imagen because it lacks what can only be called "latent space affirmative action".

Stability or someone like it will valiantly release this technology again, and there will be absolutely no harm to anyone.

Stop being so totally silly, Google, OpenAI, et al. - it's especially disingenuous because the real reason you don't want to release these things is that you can't be bothered to share and would rather keep/monetize the IP. Which is ok -- but at least be honest.

  • benreesman 2 years ago

    I agree basically completely, but there’s now a cottage industry of AI Ethics professionals whose real job is to provide a smoke screen for the “have your cake and eat it too” posture that the big shops want on this kit: peer review and open source contributions and an academic atmosphere when it suits them, proprietary when it doesn’t. Those folks are a lobby now.

    The thing about owning the data sets and the huge TPU/A100 clusters is that the “publish the papers” model strictly serves them: no one can implement their models, they can implement everyone else’s.

    • nperez 2 years ago

      This thread is cathartic. I've been feeling uncomfortable with the level of control being sought over the usage of these tools for a while, but didn't want to ruffle the wrong feathers while just getting into AI as a hobby. I think there will be a pretty short window in which all of this hand-waving will be taken seriously. Not because AI won't be used for terrible things (I'm sure it already is) but because consumer hardware can already be used to build a dataset and train a model, and eventually there will come a realization - it doesn't matter how ethicists want "the general public" to use AI. The general public is fully capable of figuring out on their own how to do whatever they feel like doing. It's like a compiler or a hammer or a car, all of which can be used for positive or negative purposes.

      I do understand the fear of being sued or targeted in the media over misuse, though. The person misusing technology should (obviously imo) be held responsible for that, but since it's new tech, the tech will be taking the blame for the first really controversial cases of disinfo and/or harassment that utilize it.

      • Roark66 2 years ago

        Exactly. They've probably already started lobbying against selling cheap high-end GPUs to the public. No doubt Ether going proof-of-stake is a huge blow to their agenda: they can no longer claim all those GPUs are just wasting energy on crypto mining. Now they have to come up with different arguments.

        I can already see it. Just think of all the energy wasted training AI at home! I can imagine police drones with IR sensors scanning the cities for the heat signatures of illegal AI "farms".

        Talking seriously, though: however they try to spin it, advanced AI (same as every other big scientific/engineering achievement) will be predominantly good. So let's say there comes a time when AI can create convincing videos of people engaging in various compromising "activities". When this becomes widespread it will give plausible deniability to any potential victim of such an attack (with real or deepfaked materials).

        In a world where any compromising video or picture can be made of anyone, the value of such materials to a wannabe blackmailer diminishes rapidly. However, in a world where there are only a few entities that can produce such materials, and they do so sparingly, those entities get a tool that gives them huge power (especially in democracies, where popular opinion decides who governs).

        • trention 2 years ago

          >advanced AI (same as every other big scientific/engineering achievement) will be predominantly good.

          They should start teaching the problem of induction in schools, evidently it's needed.

          • bheadmaster 2 years ago

            That's a really convoluted way of saying what amounts to "you're dumb".

            How about you present a counterargument - why would advanced AI be predominantly bad? Unless, of course, your only counterargument is the classic philosophical statement of "we can't know nuffin".

            • benreesman 2 years ago

              It is more than a little snarky, but a certain amount of snark is kind of the toll you pay for having PhDs in everything from high-energy physics to behavioral economics on an Internet thread.

              Basically no one who sticks around on here is “dumb”, I don’t know you but I have a Bayesian prior that you’re probably pretty fucking smart, and GP was being a bit of a smartass. C’est la vie.

              I completely agree with your game plan of “let’s get a more substantial conversation going here”. It’s the right move.

              I think that we probably should, as a society, do a serious audit of the public school system curriculum to see if it still makes sense in light of the insane rate of change in the facts of life over the last century :)

            • trention 2 years ago

              Logical fallacies don't necessarily equal stupidity, though sometimes they do.

              A bunch of arguments about why AI would be bad have already been advanced. For the economic one, refer to Martin Ford. For the existential one, refer to Bostrom.

              • bheadmaster 2 years ago

                Induction is not a logical fallacy, it's a basis for empiricism.

                • trention 2 years ago

                  It is a logical fallacy, read your Hume.

                  • bheadmaster 2 years ago

                    > read your Hume

                    No.

                    Either discuss this with me or don't. Don't summon your sacred scrolls to do the arguing for you.

                    > It is a logical fallacy

                    Instead of just parroting my stance in response, I'll try to elaborate some more. Please either follow my example or don't reply to me at all.

                    Logic itself cannot bring you knowledge about the world, as the concept of apriori knowledge is crackpipe bullshit. All knowledge is based on induction, even your knowledge that induction is fallible. Saying induction is a "logical fallacy" doesn't even make sense, since the purpose of induction is not to perform logical operations.

                    • trention 2 years ago

                      >Please either follow my example or don't reply to me at all.

                      I'll do whatever I want, especially when "communicating" (generous verb) with someone capable of writing the epitome of stupidity "the concept of apriori knowledge is crackpipe bullshit".

                      • bheadmaster 2 years ago

                        > I'll do whatever I want

                        What a juvenile attitude. I wasn't giving you orders, I was presenting my standards of communication. You can do whatever the hell you want, but don't expect other people to tolerate your obnoxiousness.

                        You seem to get off on insulting people. I do not get off on being insulted, so I will refrain from further communication with you. Goodbye.

      • fny 2 years ago

        > it doesn't matter how ethicists want "the general public" to use AI

        Something tells me these pricks will end up arguing for a reversion to thin-client compute. It is in their financial interest too after all.

        • eurasiantiger 2 years ago

          You mean like the apps are all running on a server and we only have a client used to… browse them?

          • CuriouslyC 2 years ago

            Except that web apps have become pretty thick, with most of the computation going on in the browser in many cases.

        • pdntspa 2 years ago

          Reversion... we're already there! The only people using their computers for anything more than a web browser are creatives and power users

        • colordrops 2 years ago

          Are you referring to chromebooks?

        • cheschire 2 years ago

          7 hours after you posted this, the top article on HN is about how cloud desktops aren’t that great, and the comments in there are predominantly supportive of that sentiment.

          Funny coincidence!

    • dannyw 2 years ago

      Just because there are professionals doesn't mean we have to respect their arguments. There are people who get paid to be antivaxxers, doesn't mean we have to listen to them.

      "What have you done this week?"

  • theptip 2 years ago

    I don’t know about this take. Remember all of the shitstorms that the NYT et al. kicked up over biased/racist/sexist AI? Remember Tay?

    I think if you are Google, you are terrified of the bad PR from someone generating something questionable. And that bad article is inevitable if you open up these models. (See pornpen.ai, which was released approximately five minutes after Stable Diffusion. Imagine the press if that had been built from the model Google published.)

    An open source community is a diffuse target, so the NYT won’t go after them as quickly, and let's be honest, their axe to grind is with big tech, not a bunch of AI hackers.

    • i_like_apis 2 years ago

      It’s better for them to not say anything about it when they don’t release the model. This is what Meta has done for the same technology and I respect that.

      They don’t imply any ridiculous idea that such models should or even can be “racially balanced”. If they want to cover their butts from the possibility of silly controversy, I think that’s cowardly and unnecessary, but at the very least they could avoid going out of their way to imply that such controversy should be taken seriously.

  • atty 2 years ago

    There is a clear risk from these sorts of models as they get better - I mean recreating specific individuals’ likenesses in compromising images (or even worse, video). We’re not at that point yet, but these things are getting better fast, so it’s only a matter of time. The problem is that there’s no way to mitigate those risks except to keep the model behind an inference-only API, or not release it at all - as soon as the model is open sourced then fine-tuning the model can introduce whatever behavior you want. Holding the models back is virtue signaling at best and actively harmful at worst, because it draws attention from individuals who will take it as a challenge to find ways to misuse them, and cuts businesses and the open source community off from models that would otherwise be very useful to them, that they could help to improve. I’m concerned this is becoming a self fulfilling prophecy, where more companies will start to do the same thing, primarily because it’s what everyone else is doing.

    • sbierwagen 2 years ago

      >There is a clear risk from these sorts of models as they get better - I mean recreating specific individuals’ likenesses in compromising images

      And the risk behind that is...?

      If you drill down with such claims the core is always "someone might use this to lie online" and the proposed solution every single time is: more surveillance. End anonymity. Have a Facebook account required to use the internet. Real name and real face policies for every online interaction.

      • atty 2 years ago

        I strongly suspect you’ve never been on the end of an internet doxxing/hate brigade if you can’t imagine how this could be used to make someone’s life a living hell.

        I’ll explain again that I think they can be used for bad actions, and also that they should still be released, because the benefits will outweigh the negatives. It does not hurt to admit that some things can be dangerous when used in nefarious ways. No one suggests we ban kitchen knives even though they are lethal, because their utility is massive, and outweighs their danger. In much the same way these models have extreme utility, that almost certainly outweighs their potential negatives.

        • tomp 2 years ago

          > I strongly suspect you’ve never been on the end of an internet doxxing/hate brigade if you can’t imagine how this could be used to make someone’s life a living hell.

          Sounds like you're saying that even without advanced AI, online bullying is already somewhat harmful?

          So what exactly is the additional harm of AI?

        • flycaliguy 2 years ago

          I totally agree. People often argue that photoshop has been around forever and so on. Creating sophisticated pornographic video of any individual is brand new. Creating an app that realistically removes clothing from any photo is new technology.

          I actually just now came up with an idea for a browser plug-in that removes clothing from every image loaded.

          • zhynn 2 years ago

            The trolling terrifies me.

            Trolling someone by creating awful video (just think about how deeply, photo-realistically, awful it could be - porn is just the tip of the iceberg) is going to get really bad. I am not sure how this is going to shake out. The easiest will be video of famous people doing awful things. A little harder is doing a custom training on a particular person's likeness, and videos of that person doing awful things. That high-schooler. That child. It's not a happy idea. There should be severe consequences for deliberately making something like this with the intent to harass (troll).

            The fact is we have not even scratched the surface of classifying trolling as a real crime. I am less concerned with the tech (it's inevitable, hand wringing about it is not useful), and more concerned with the fact that we still have essentially no real consequences to this kind of harassment.

            I suspect that strong anonymity is incompatible with civilized life, since the few edgelords will always end up ruining it for the many. We have collectively decided that some amount of privacy must be sacrificed to live in a civilized place where you can address grievance (the subpoena must be served to someone). Surveillance is a weapon for tyranny, but I think that we need to flip the script. The relationship between tyranny and surveillance means we need better governments, not more anonymity.

            I also suspect we don't need to change anything except enforcement. I think trolls are a lot less anonymous than they think they are, since their opsec is typically nonexistent. It's just that we have no enforcers, and for some reason don't care. If I had a magic wand, I would convert the DEA wholesale over to dealing with online crimes (trolling, CP, trafficking, etc).

            • concordDance 2 years ago

              "The relationship between tyranny and surveillance means we need better governments, not more anonymity."

              "Better governments" is not actionable, people have been wanting that since Socratese. Might as well wish for the second coming.

              The real answer is to just let people know about the fakery, then they'll stop believing every video and the trolls will be defanged.

              • boilerupnc 2 years ago

                This. Like most tech misuse, there will be counters that will evolve to help reduce the negative impacts. There’s an emerging opportunity for the detection and identification of these faked images and videos. These may also require models to match the sophistication required to be effective validators. Perhaps the most CPU-intensive models are only used as a shared societal resource on a particular tier of influencers, while general-compute detector models are available to all to run or tweak or rebuild. There would need to be a root to this validation trust tree, but we’ve figured this out before with ciphers and certs. Let the push and pull begin :-)

            • eurasiantiger 2 years ago

              We already have laws against harassment and libel.

              • TuringTest 2 years ago

                That's why the GP is calling for enforcing them.

            • dreadlordbone 2 years ago

              Okay Ned Ludd. You're right that technology is only going to get more powerful. But hasn't this always been true? Who decides when it's over the line? The US government?

              They clearly aren't the best arbiters of judgement, so who gets to decide "severe consequences"?

              • trention 2 years ago

                Yes, as a general rule the government decides when it's ok to forbid you from doing something. If the US government wants, it can make it a crime to train/use these models. Banning the training part is pretty much game over for this industry.

                I personally hope it happens as soon as possible. Intellectual property theft (without which those models don't exist) shouldn't be allowed.

                • TimeBearingDown 2 years ago

                  “Intellectual property” is a propaganda phrase that deliberately confuses copyright, trademark, and patent law in order to create the illusion that ideas can or should have owners. (See https://www.gnu.org/philosophy/not-ipr.en.html) I recommend specifically referring to the concept at hand, in this case copyright.

                  Personally I disagree, as the models themselves do not contain any material which can be considered a copyright violation, the value of scraping the open web is easily apparent, the ability to prevent scraping wholesale - even given the legal framework to disallow it - seems dubious, and lastly because the collective potential harm caused by restricting just one or a few of the arguably more ethical nations from this technology pathway is a known unknown, and possibly a very large one at that.

          • echelon 2 years ago

            And we're going to get very used to it in short order.

            We may make laws that prohibit online harassment, and that should be the mechanism we use to deal with this. Not through technology bans.

          • amadvance 2 years ago

            > browser plug-in that removes clothing from every image loaded

            Yep! Remove clothing and make-up to show everyone as they really are!

            You can call it the "ugly truth" plugin.

      • fassssst 2 years ago

        Watch the great BBC show “The Capture” for an entertaining look at one way this could plausibly be abused by governments.

      • nuancebydefault 2 years ago

        > Have a facebook account required

        ROFL what a weird thing for any HN commenter to say

    • dannyw 2 years ago

      This is not new. Imagine not releasing tools like curl or nmap because it can be used for hacking.

      The issue is, as an industry and society, we somehow bought the "safety" and "harm" charade a little bit too much, and somehow think it's a reasonable argument instead of being completely insane.

      • atty 2 years ago

        I believe you misunderstood my post. (Or I have misunderstood yours) I was not arguing that the models should not be released. I was pointing out that the statement that no harm has been done with them was false, and that even though that is the case it is probably better for society, on the whole, for them to be open.

        We can both admit that the tools can and will be used for bad purposes, and come to the conclusion that their benefits outweigh the negatives. We would not be doing any favors to our own arguments by pretending otherwise.

        • zarzavat 2 years ago

          Given OpenAI’s extreme restrictions on Dalle 2, can anybody point to any harm that has been done with Stable Diffusion in particular since it launched? Even a single instance. Because I have only seen strictly positive coverage of people having fun with it.

          • needle0 2 years ago

            Personally I am strongly on the opinion of favoring openness, but since you asked, there was this: https://www.reddit.com/r/StableDiffusion/comments/xofxo3/a_j...

            The hoax was pretty quickly debunked, as the attempt was pretty crude. The images were full of artifacts, and they were all 512x512 squares (the default image size for Stable Diffusion) with no attempt made to crop them to more common aspect ratios. So in terms of harm "done" I guess it was pretty minor, but I'm still leaving it here since it caused a big enough commotion to make it into nationwide news stories.

            • johannboehme 2 years ago

              So, don't trust photos you see online? That's what I learned as a student 15 years ago when Photoshop was adopted by the masses. Nothing has changed, the tools have just gotten even easier to use.

      • always2slow 2 years ago

        More like the "safety" and "harm" charade was crammed down our gullets at every turn either by stick or carrot.

    • carlosdp 2 years ago

      > There is a clear risk from these sorts of models as they get better - I mean recreating specific individuals’ likenesses in compromising images (or even worse, video).

      This has been possible without AI for a very very long time now (just open photoshop, etc). It barely ever happens, and society hasn't collapsed.

      I keep seeing this argument come up and it baffles me that informed technologists take it seriously, as if it were impossible to convincingly manipulate images before DALL-E came around.

      • atty 2 years ago

        There is a difference in ease of use. I could never use photoshop to fake something like that even if I wanted to.

        Further, we have seen harm come from some of this already, there’s a pretty big online community that uses deepfakes to put people in situations they would rather not be in, the most obvious being porn.

        • carlosdp 2 years ago

          > There is a difference in ease of use. I could never use photoshop to fake something like that even if I wanted to.

          You couldn't, but basically any VFX shop easily could. Point is, it doesn't make anything possible that wasn't already possible, it just makes it more accessible. That's an inevitability with technology, as time goes on. The counter is not to try and suppress it, that has never worked and never will.

          • derefr 2 years ago

            What motivation would a VFX shop have to do such a thing, though? (Money, sure, but what's a motive strong enough to be worth commissioning them?)

            It's always individuals who want to harm others in this particular way; and individuals don't throw around big-VFX-project amounts of money on petty revenge. But they'd certainly spend $20.

            DDoS attacks got a lot (1000x) more commonplace once there were DDoS services that let you buy an hour of attacking someone for $20. Same idea here.

            • chii 2 years ago

              and you can easily just drive your car into somebody to kill them.

              DDoS services are almost always purely malicious (you can _maybe_ argue that you can use them for pen-testing or load-testing). But cars are not purely malicious; there are a lot of useful things cars can do, and that's why the dangers of car ownership are outweighed by the benefits, as judged by society - we just have some road rules, and licenses, so that people know to use them responsibly.

              Why not the same with an AI model?

            • johannboehme 2 years ago

              We are talking pictures atm, not videos. I bet you could get as good as SD with two weeks of free time, lots of YouTube tutorials, and a Photoshop license.

        • XorNot 2 years ago

          Deepfakes have been readily accessible since 2019. Corridor did essentially the entire pipeline then: https://m.youtube.com/watch?v=3dBiNGufIJw&vl=en

          You can download and run this software right now: http://faceswap.dev/ and it will do a better job, and do it on video, than any AI image generator.

          The technology is over 3 years old and the world hasn't ended, the harassment hasn't happened. It's so common that your phone runs it for Instagram.

          There's this whole narrative here that "this harm is new" and not only is it not new, it's not even better than what we already had.

        • Geee 2 years ago

          Deepfakes are just an extension of imagination. We can already imagine people in any situation, making an image of it doesn't cause any more harm.

          • zakki 2 years ago

            I don’t know why you put image and imagination in the same place. They are different. Imagination is the result of thinking; an image is the result of action. We can imagine killing people and nobody is harmed. But killing people in real life?

            • lolinder 2 years ago

              Image is not the result of action, it's the projection of imagination into the physical world.

              In your example, the correct parallel isn't killing people in real life, it's making an image of killing people. The ethics of making such images are debatable, but they already permeate our society without AI.

              • taylorius 2 years ago

                How do you know? The whole point of a deepfake is that it is indistinguishable from a photo of a real life event.

            • matheusmoreira 2 years ago

              I can imagine a person killing other people. I can draw a person killing other people. I can make a computer draw a person killing other people. In all three cases, zero people were harmed.

              • TuringTest 2 years ago

                  But if you spread the image in a forum of haters, you may incite someone to kill that person. That's why there are laws against harassment. The danger is not the existence of an image; it is the act of communication that has consequences.

                • BoxOfRain 2 years ago

                  I quite like the concept of information hazards for reasoning about these kinds of risk.

                • matheusmoreira 2 years ago

                  They say music can alter moods and talk to you. Well, can it load a gun up for you and cock it too? Well, if it can, then the next time you assault a dude, just tell the judge it was my fault and I'll get sued!

                • chii 2 years ago

                    A hate crime is already a crime. If someone wanted to incite hate, they could do it today just as easily as in a hypothetical future where AI-generated images are readily available.

                    So the problem isn't the AI, it's the forum of haters. Restricting AI usage in the hope that no one uses it to incite hate is too roundabout a way to achieve any significant result, while everybody pays a high cost (of not being free to use such an AI as they see fit).

        • scarmig 2 years ago

          I'm going to take an unpopular position: there's no harm being done. It's not putting people in positions they would rather not be in, but putting their likeness into those situations. It's a key difference: if someone makes a paper mache of a naked Trump to use in a protest, is harm being done to him?

          The idea that it's "doing harm" is simply inventing a new form of lèse-majesté. Verbally, we regularly do the same: we might take a signifier for someone and place it in a representation. "Rick likes to fuck goats every day." Have I done harm to Rick?

          • rgmerk 2 years ago

            Be serious for a moment.

            If I circulated a convincing-looking video of Rick fucking goats to his parents, partner and his boss at the school where he works, that could easily do considerable harm to Rick.

            • origin_path 2 years ago

              How? Rick says I did not do this, it's an AI fake and I'm being harassed. At that point, assuming people believe Rick, it's likely he'll receive sympathy and support rather than considerable harm.

              So there seems to be an implicit assumption here that the risk is faked material where people don't believe it's fake, for some reason. And the fix for that would be to ensure that the easy-to-use versions of generators are watermarking or otherwise recording what they make, so it's easy to find out if something was faked. That doesn't help, of course, if you're up against a programmer who can make awesome deepfakes locally with open source software and a great GPU, but then we're back to the debate about costs because, of course, if you go up against well-funded experts they could already do this sort of thing. In reality it doesn't happen.

              • melagonster 2 years ago

                But when everyone sees him, the photo will always be there in their heads. This example is about a goat, but what if they fake pedo photos? Rick will lose his job first.

                • concordDance 2 years ago

                  That's true only for the first few hundred Ricks. Then people wise up and ignore the child rape videos.

                  It's inevitable and frankly a hundred Ricks is a price worth paying.

                  • fassssst 2 years ago

                    You just described the “boy who cried wolf” dilemma…

                • inkblotuniverse 2 years ago

                  Then Rick ought to release the video of his boss fucking that zebra!

                  • melagonster 2 years ago

                    OK, so in the future, when someone turns 18, the government will generate videos of that person fucking everything from a stone to a zebra, then publish them to the internet. No one can be hurt by deepfakes anymore.

            • johannboehme 2 years ago

              so, we just need better digital competence?

            • zarzavat 2 years ago

              I don’t think that is a good example, because the harm in that case is the embarrassment and you can achieve the same results in 10 minutes with any image editing software.

              For AI image generation to cause harm specifically, the harm has to be consequent to the additional realism.

              IMO most of the harm from AI is likely to come from people not believing things that are real, and dismissing reality with “that’s just a deepfake”.

              • Morgawr 2 years ago

                > the harm in that case is the embarrassment and you can achieve the same results in 10 minutes with any image editing software

                The harm is not the "embarrassment" of seeing someone in the likeness of yourself (or your son, your friend, your partner, etc) doing something shameful. The harm is the fact that people are very likely to believe it is true and it's not a fake obviously edited photo or video.

                You can disagree on the seriousness of the harm or risk or danger or whatever but I think the distinction between an obviously silly/embarrassing fake (a puppet, papier mache, badly done photoshop picture) and a realistic convincing deepfake video is pretty obvious. They aren't even in the same ballpark.

                > IMO most of the harm from AI is likely to come from people not believing things that are real, and dismissing reality with “that’s just a deepfake”.

                This is also a really good point and I agree it's a danger.

          • alphabetting 2 years ago

            Firstly, I think your position is pretty popular given this thread.

            To your point, I think it's worth considering the widespread acceptance of ridiculous ideas that already exists (the number of people who believe articles from The Onion, for example). There's no harm there, but when the content is convincing video being used by nefarious actors, I think you could make the argument that the potential for harm is real, especially given the media content bubbles on both sides that people have segregated into in the social media age.

          • czzr 2 years ago

            If people believe you (not even everyone, but the right people at the right time) then you have done harm to Rick.

          • SPDurkee 2 years ago

            Yes, you have committed SLANDER/LIBEL and Rick can sue you for damages.

      • BoxOfRain 2 years ago

        People seem to forget that figures as far back as Joseph Stalin were all about manipulating images for unpleasant reasons; this isn't a new phenomenon at all.

    • whywhywhywhy 2 years ago

      > There is a clear risk from these sorts of models as they get better - I mean recreating specific individuals’ likenesses in compromising images

      Been possible on home computers for 31 years for anyone who actually wants to do it. It literally doesn't matter, and I think Stability has proven that the "AI Ethics" part of these models was essentially meaningless busy work at best, and at worst stealing compute credits from users, like Dall-E purposefully charging you for something you didn't ask for.

      Once every home computer can make the fake images AI Ethicists larp about, the power of fake images disappears because everyone knows not to trust them. They only have power if only a few can make them and the world was never told it was even possible.

      • TuringTest 2 years ago

        Nobody should trust faked quotes attributed to celebrities or ethnic groups on Twitter or WhatsApp, yet these fakes have ruined political campaigns and led to mass killings. Society doesn't adapt in the way you suggest, by people disregarding what you think is irrelevant.

        • whywhywhywhy 2 years ago

          > Society doesn't adapt in the way you suggest by people disregarding what you think is irrelevant.

          It's not what I think, it's what has been proven in the 32 years since photoshop was invented. What killings are you talking about that were caused by the existence of image manipulation?

          • TuringTest 2 years ago

            You are looking at this with a Western perspective. Yet not everybody connecting to the internet has had access to computers for 30 years.

            https://www.bbc.com/news/blogs-trending-45449938

            > "A SIM card was about $200 [before the changes]," she says. "In 2013, they opened up access to other telecom companies and the SIM cards dropped to $2. Suddenly it became incredibly accessible."

            > "People were immediately buying internet accessible smart phones and they wouldn't leave the shop unless the Facebook app had been downloaded onto their phones," Mearns says. Thet Swei Win believes that because the bulk of the population had little prior internet experience, they were especially vulnerable to propaganda and misinformation.

            This means that large amounts of people are not internet-savvy enough to spot fakes and know not to trust them.

        • raxxorraxor 2 years ago

          Don't try to use open internet platforms for your political campaigns. I think "feedback" in that case is justified. People defended themselves against propaganda. The campaigns have ruined themselves.

          I am not sure about mass killings. Some say that Facebook enabled a genocide in Myanmar, but I think that is a false hypothesis. It was used as a platform by conflicting parties, sure, but it wasn't the reason for the conflict.

          Google has strong governmental ties right now, so opposing any of their messages regardless of content seems sensible. Without backlash it would just fortify the situation right now, so it has to be costly for both Google and political parties. Google lost a lot of trust in recent years; sadly that is not true for their market influence. Perhaps they put out these messages because government contracts require it; the user wouldn't know, because that is not transparent.

    • bufferoverflow 2 years ago

      > I mean recreating specific individuals’ likenesses in compromising images (or even worse, video). We’re not at that point yet

      Yes, we are. Open source stable diffusion can be trained on any person's images, as long as you have around 20 from different angles. Costs around 50 cents on rented GPUs.

      https://www.youtube.com/watch?v=Sqeo3oDP6Qg

      https://www.youtube.com/watch?v=7m__xadX0z0
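
      For anyone curious what "trained on ~20 photos" looks like under the hood, below is a very rough sketch of the core of a DreamBooth-style fine-tune using the Hugging Face diffusers/transformers libraries (my own illustration, not taken from the linked videos; the real recipes add a prior-preservation loss, regularization images and tuned hyperparameters). The file names, placeholder token and step count are made up.

      ```python
      # Hedged sketch: bare-bones DreamBooth-style fine-tuning of the SD UNet.
      import torch
      import torch.nn.functional as F
      from PIL import Image
      from torchvision import transforms
      from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler
      from transformers import CLIPTextModel, CLIPTokenizer

      model_id = "CompVis/stable-diffusion-v1-4"
      tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
      text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
      vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
      unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
      scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

      vae.requires_grad_(False); text_encoder.requires_grad_(False)  # only the UNet trains
      optimizer = torch.optim.AdamW(unet.parameters(), lr=5e-6)

      preprocess = transforms.Compose([
          transforms.Resize((512, 512)), transforms.ToTensor(),
          transforms.Normalize([0.5], [0.5]),
      ])
      photos = ["photo01.jpg", "photo02.jpg"]  # ...your ~20 photos of the subject
      images = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in photos])

      ids = tokenizer(["a photo of sks person"] * len(photos), padding="max_length",
                      max_length=tokenizer.model_max_length, return_tensors="pt").input_ids
      cond = text_encoder(ids)[0]  # frozen text embedding of the placeholder prompt

      for step in range(400):  # a few hundred steps is typical
          latents = vae.encode(images).latent_dist.sample() * 0.18215
          noise = torch.randn_like(latents)
          t = torch.randint(0, scheduler.config.num_train_timesteps, (latents.shape[0],))
          noisy = scheduler.add_noise(latents, noise, t)
          pred = unet(noisy, t, encoder_hidden_states=cond).sample
          loss = F.mse_loss(pred, noise)  # teach the UNet to predict the added noise
          loss.backward(); optimizer.step(); optimizer.zero_grad()
      ```

      The cheap part is exactly what the videos show: a few hundred optimizer steps over a handful of images on a single rented GPU.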

      • hda2 2 years ago

        And people will get accustomed to (i.e. not take seriously) these new AI-generated images like they have with photoshopped images.

        "Ethicists" act like society will somehow not adapt to this tech like they have with all the tech that came before it. I put ethicists in quotes because the arguments they use don't hold up to scrutiny and don't seem to be motivated by real ethical concerns. At least not to me.

        • mekkkkkk 2 years ago

          I think you are right about society adapting to this tech, and I don't think these AI technologies can or should be contained.

          But. The Photoshop argument is a bit tiresome. AI will bring about a fundamental shift in content creation and it will disrupt how we treat media as a whole.

          Photoshop and other manual technologies are naturally gatekept by the required skill, effort and source images. Once AI media generation matures, all that goes out the window. Anyone will be able to convincingly fake anything with almost no effort and zero traceability.

          That shouldn't be downplayed. The concerns are real.

    • matheusmoreira 2 years ago

      These "risks" only exist because people somehow came to believe they actually have control over their "likeness". They don't. They never had. It's an illusion. The only risk here is exposing this for the lie it is.

      Maybe once it's trivially easy to copy someone else's "likeness", society will finally be able to accept it and evolve past it.

      • echelon 2 years ago

        Well said.

        Biological twins have never had control over this. If one twin wants to be a porn star, there's nothing the other can do.

        Edit: one could imagine a future dystopia where clones are created to bypass "identity IP".

    • i_like_apis 2 years ago

      I agree. Convincing fake content will eventually make us doubt our own history, let alone news media and current events. That concerns me, but I don’t think holding this tech back helps.

      But those aren’t the issues they claim are concerning to them. It’s just stupid identity politics. They want their model to lie to us about the world and say things like everyone is equally likely to have any attribute. They have a “reality” problem apparently.

    • colordrops 2 years ago

      > recreating specific individuals’ likenesses in compromising images (or even worse, video). We’re not at that point yet

      Yes we are. There have been papers coming out on this tech for years now, with even the South Park people doing videos using it.

      https://www.youtube.com/watch?v=9WfZuNceFDM

  • infoseek12 2 years ago

    Google has done incredible work on basic machine learning research that has enabled other individuals and organizations to do amazing things. In terms of actually implementing new machine learning technology, they've pretty much hobbled themselves to the point of irrelevance. In some ways, it may be for the best that they've ensured that the future of machine learning will be written primarily by those who hold opposing views.

    • alphabetting 2 years ago

      My sense has been that Google and Deepmind ML has been pretty ingrained across the board in Google services. If they're still producing the most advanced AI research, I don't see why that wouldn't be introduced into future products as well.

      • infoseek12 2 years ago

        I'm sure machine learning has already been introduced in one way or another in almost all of Google's services. But the implementations are mostly in the backend: enhancements like better recommendations that don't jump out as incredible leaps in artificial intelligence. They are almost exclusively incremental rather than radical innovations, quantitative and not qualitative improvements.

        In terms of machine learning technology that introduces truly novel innovations, Google's product portfolio is notably barren. For instance, take the incredibly powerful potential for image generation these new diffusion models open up: whose models will the world use to explore the potential and start using this technology? Google's model, with the intense, though imperfect, effort that goes into addressing questions of bias and abuse? Or the model bankrolled by an ex hedge fund manager who probably put a bit less thought into addressing these questions?

        • alphabetting 2 years ago

          Distribution is key. I've used some mind blowing betas from AI LLM startups recently who just put disclaimers on potential content issues. They are amazing and don't get a ton of use. The fact Google has seemingly the best product (just not releasing until they're ready) and over 3 billion users makes me think getting the world to use it won't be an issue.

  • mola 2 years ago

    1. How do you know there will be no harm or is no harm? These issues do not manifest on a timeline of a few weeks.

    2. Why do you think the ethical reasoning is disingenuous in Google's case? Even in OpenAI's case it could be that originally the ethicists won some battles where business people eventually prevailed.

    3. Why do you use quotes around something which is your original phrasing? That's pretty disingenuous.

    4. What's wrong with affirmative action? It's easy to argue that it has both utilitarian and other moral advantages. I won't claim it is always warranted or the right thing to do, but it's definitely not an obvious, consensus evil.

    • AbrahamParangi 2 years ago

      Affirmative action (like Google’s moral imperatives here) is fairly unpopular, so many view their actions as a kind of encroaching cultural imperialism. When Google talks about “safety” it means enforcing a very specific set of beliefs that frankly a large majority of people disagree with.*

      It isn’t impermissible for Google to do this, but nobody has to like it, or agree with it.

      *For instance, affirmative action is disapproved of by 70-80% of Americans and couldn’t win on a ballot in California in 2020, which is pretty exceptional.

    • i_like_apis 2 years ago

      Not only is it not technically feasible, it also isn’t rational, or even ethical, to portray a warped version of the world.

      What they want is for results of “software engineer” to be equally likely to show black females. This is not fair to Eskimos and Aboriginals. And what about the mentally handicapped? Is it not unfair that people with Downs Syndrome are not Wall Street stock brokers? How are you going to find all of these “affluent” categories and claim to be able to balance them?

      And are you going to claim racism again when you ask for prison inmates and don’t find any Asians? Should you start putting latent space Asians in latent space prisons?

      Because this is something the social justice “ethicists” keep harping on - that if you ask these models for “gang member” you get “People Of Color!!” … as if they simply don’t understand the statistics of the situation. How would you even “solve” that? Should you decide when and where certain ethnic groups should be taken down a peg?

      Latent space affirmative action is technically absurd, and completely ironic as “ethical” behavior.

  • stared 2 years ago

    It is not a matter of ethics. Do you think it is the priority for such corporations?

    It is a matter of PR. It takes a single "problematic" generated content to be framed as "Google is sexist/racist/supports animal abuse", etc.

    Statements related to ethics help in a few ways: holding secrets ("oh, we would love to share the models, but we cannot"), protecting against backlash (PR-wise, legal-wise), and PR on its own ("we are that ethical - see! it is even in our mission statement").

  • westhom 2 years ago

    I always read statements like “sorry we can’t release this ground-breaking technology to the public, you simply can’t handle its power and repercussions” as a low-key flex by the AI industry.

  • capitalsigma 2 years ago

    Wrongfully or not, people blame Facebook for inflammatory content posted by humans on their platform. How much worse would it be if that content was generated with FB-trained models?

    • Gigachad 2 years ago

      Facebook is a recommendation engine. The problem isn't so much the content, its that facebook chooses to show it to people who did not actively seek it out. You never see people complain at Chrome for showing the content or nginx for hosting it.

      Recommendation engines are more responsible than basic infrastructure.

      • capitalsigma 2 years ago

        So you would say that if FB fails to censor a video titled "vaccines cause autism" that drives engagement, they are more morally culpable for the content than if Google spends TPU cycles rendering the Imagen prompt input: "detailed video about why vaccines cause autism, scientific, realistic, in the style of a public health announcement"

        ?

        • Gigachad 2 years ago

          IMO facebook should stop "engagement based" recommendation engines until they have the ability to stop them being abused. Change the platform to show chronological posts from things users have subscribed to. They could perhaps have a curated selection of content that FB employees have screened to be good for general distribution.

          There is a world of difference between someone manually seeking out and subscribing to a misinformation source than FB automatically suggesting it to them.

          • capitalsigma 2 years ago

            That is definitely one opinion that you can hold about Facebook but it's unclear to me how it relates to this post about Google's new generative model

            • Gigachad 2 years ago

              I don't think google is responsible for the content created any more than Windows is for running the program or your ISP is for serving it. Promotion and discovery are the places where moderation and responsibility come in.

        • raxxorraxor 2 years ago

          They did ban such videos and now people take videos like this seriously. Don't be so naive. This is a PR message, nothing else.

          The whole banning spree did more damage to vaccine acceptance than flat earthers and lizard people combined. It isn't even comparable. Because of course they ended up censoring legitimate criticism and scientific data. That was inevitable, no matter how well intended.

          Now they made themselves unreliable because someone on the internet was crazy.

    • dannyw 2 years ago

      People blame Facebook for their intentional, and continued choice to use algorithms that amplify and surface inflammatory content.

      • capitalsigma 2 years ago

        I think it's a hard problem and I'm not sure what the right solution is; clearly extremism is a problem but I can't say I'm 100% happy with Facebook being the final judge of Truth.

        Regardless, though, it is unambiguous that FB's role in "making" problematic UGC is much less direct than Google's role in making Imagen outputs.

      • _joel 2 years ago

        It's amazing, isn't it, how much crap can get spouted on the internet nowadays

  • ehsankia 2 years ago

    Ironically, nearly all ML demos like Stable Diffusion are set up and run for free on Google's Colab, so claiming they don't want to help the field is a little silly.

    • ekianjo 2 years ago

      Stable diffusion can run anywhere

      • ehsankia 2 years ago

        I didn't say it can't, I said the majority of people, who have a limited skill set with ML and troubleshooting it, end up using the Colab version since it's much more straightforward and easy for average people to get started with.

        And that's true for all ML stuff; Colab has led to a huge democratization of ML, with notebook setups for basically any cool demo you see out there.

    • waffletower 2 years ago

      Stable Diffusion is severely limited in the memory constrained Google Colab context.

  • fassssst 2 years ago

    Google is a bigger lawsuit target since they have more $, probably as simple as that. I’m not a lawyer but I’m sure there will be many copyright challenges.

  • mrinterweb 2 years ago

    It is probably more of a liability concern where Google doesn't want a headline starting with: "Google created a video of ____ (horrific thing)"

  • raxxorraxor 2 years ago

    This was never about ethics, it was only about control of language and behavior, and they need these stories for plausible deniability. They are advertisers and know what they are doing.

  • thweriuo234234 2 years ago

    Imaginably, if the model produced sufficient propaganda against Hindus, the model would be feted for its social-justice creds.

  • johndfsgdgdfg 2 years ago

    There is no ethical concern. Google will shut it down regardless.

  • make3 2 years ago

    I think it's actually about brand risk

  • jquery 2 years ago

    You call it "concern trolling", I call it responsible research. It's not "social justice" to be concerned about the ability of any entity to make propaganda videos for virtually free. "Ok Google, produce a CDC-type public information video about how vaccines cause autism and disability".

    Not that I would have a complaint if social justice was the sole thing keeping it from being released. Facebook managed to cause genocides by being careless.

    • colordrops 2 years ago

      That would make sense if Google were somehow in a deserved position of authority to decide who is allowed access and what it's used for, rather than an advertising company with a heavily skewed bias that doesn't necessarily take the public good into account.

    • Guid_NewGuid 2 years ago

      I agree with you and was saddened to see such a comment rise to the top.

      I don't hold that GPs opinion is wrong, in fact I have no firm views yet on the AI and would generally lean towards stuff being made available even when harmful.

      But the idea that people in the field of AI Ethics are all some woke SJW cabal designed to keep Google powerful is for the birds. Like all industry adjacent fields I'm sure there's some corporate capture of research in the field, but maybe, engaging with things in good faith, there are ethical questions about a powerful new technology that can replicate biases in its training data at unprecedented speed and quality?

fzysingularity 2 years ago

What's next? Dreamfusion Video = Imagen Video (this) + Dreamfusion (https://dreamfusion3d.github.io/)

Fundamentally, I think we have all the pieces, based on this work and Dreamfusion, to make it work. From the looks of it, there's a lot of SSR (spatial SR) and TSR (temporal SR) going on at multiple levels to upsample (spatially) and smoothen (temporally) images, which won't be needed for NeRFs.

What's impressive is the ability to leverage billion-scale image-text pairs for training a base model that can be used to super-resolve over space and time. And that they're not wastefully training video models from scratch, and instead separately training TSR, SSR models for turning the diffused images to video.
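
Just to make the cascade idea concrete, here is a shape-level sketch (my own illustration, not Google's code): the stand-in "models" are plain interpolations, only there to show how alternating temporal (TSR) and spatial (SSR) super-resolution stages grow a short, low-resolution clip. The frame counts and resolutions are illustrative, not the paper's exact values.

```python
# Hedged sketch: placeholder stages standing in for a video diffusion cascade.
import torch
import torch.nn.functional as F

def base_model(prompt: str) -> torch.Tensor:
    # hypothetical base text-to-video stage: a few tiny frames, shape (N, C, T, H, W)
    return torch.rand(1, 3, 16, 24, 48)

def tsr(video: torch.Tensor, factor: int) -> torch.Tensor:
    # temporal super-resolution stand-in: more frames, same spatial size
    n, c, t, h, w = video.shape
    return F.interpolate(video, size=(t * factor, h, w), mode="trilinear")

def ssr(video: torch.Tensor, factor: int) -> torch.Tensor:
    # spatial super-resolution stand-in: same frames, higher resolution
    n, c, t, h, w = video.shape
    return F.interpolate(video, size=(t, h * factor, w * factor), mode="trilinear")

video = base_model("an astronaut riding a horse")
video = tsr(video, 2)   # 16 -> 32 frames
video = ssr(video, 4)   # 24x48 -> 96x192
video = tsr(video, 2)   # 32 -> 64 frames
video = ssr(video, 4)   # 96x192 -> 384x768
print(video.shape)      # torch.Size([1, 3, 64, 384, 768])
```

In the real system each stage is itself a diffusion model conditioned on the text embedding and the previous stage's output; the point above is that a NeRF-based pipeline wouldn't need the same SR stack.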

  • stillsut 2 years ago

    I think 3D is going to be really important because I see Generative-AI as the killer app for VR/AR.

    As it stands, it's very difficult to invest the budget for a dev studio (dozens of high skill people) to build a "VR movie" when the format is so unknown and unpopular. But with generative AI, an indie dev could create their own professionally produced virtual world movie. It's these creatives and risk takers that will find what types of things VR needs to become more popular.

  • bredren 2 years ago

    This direction will provide the visuals but what also must be brought in is a language model and text to speech (TTS) so that you may talk and interact with these things.

BoppreH 2 years ago

It's interesting that these models can generate seemingly anything, but the prompt is taken only as a vague suggestion.

From the first 15 examples shown to me, only one contained all elements of the prompt, and it was one of the simplest ("an astronaut riding a horse", versus e.g. "a glass ball falling in water" where it's clear it was a water droplet falling and not a glass ball).

We're seeing leaps in random capabilities (motion! 3D! inpainting! voice editing!), so I wonder if complete prompt accuracy is 3 months or 3 years away. But I wouldn't bet on any longer than that.

  • tornato7 2 years ago

    In my experience with stable diffusion tools, there is some parameter that specifies how closely you would like it to follow the prompt, which is balanced with giving the AI more freedom to be creative and make the output look better.
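
    For what it's worth, here is a minimal sketch of that knob using the Hugging Face diffusers library (assumed here): in Stable Diffusion it is the classifier-free guidance scale, which trades prompt adherence against the model's freedom.

    ```python
    # Hedged sketch: assumes the diffusers StableDiffusionPipeline API.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "a glass ball falling in water"
    loose = pipe(prompt, guidance_scale=3.0).images[0]    # more freedom, often prettier
    strict = pipe(prompt, guidance_scale=12.0).images[0]  # hews much closer to the prompt
    ```

    Low values let the model drift from the text; high values follow it more literally at some cost to image quality, which may explain some of the loose prompt matches in these demos.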

    • BoppreH 2 years ago

      Yes, that might be the case. Though the prompts don't seem to try showcasing model creativity, so I'd be surprised if Google picked a temperature so high that it significantly deviated from the prompt so often.

  • educaysean 2 years ago

    The way I see it, input being confined to a "text description" is the next immediate problem that needs to be solved. I don't think we can rely on textual inputs for much longer as the human language is too imprecise and/or verbose. It's hard to imagine what exactly the optimal interface would be, but I'm thinking we'll need ways to dictate attributes for each entity being represented, the backdrop, and the view composition all as separate individual components. Ideally all these components can also be reusable and provide reproducibility guarantees without having to share a global "seed" as well.

    • nicd 2 years ago

      A "mood board" of inspiration images could be an interesting input method. With deep vision models, we can already separate different "levels" of concepts: a high level subject ("person riding a horse"), textures (the horse hair), medium (painting vs 3d rendering vs photograph), etc. It'd be interesting to have a "smart mood board" that goes from text prompt, to visualizing that hierarchy with different options. Then the user could interactively increase or decrease different parameters, ultimately iterating alongside the computer to realize their creative vision.

      • ReactiveJelly 2 years ago

        That one person already used the other half of the auto-encoder to go from an image of "ugly Sonic" from the Sonic The Hedgehog movie, to a special "{sonic}" token, and then they could put that into prompts.

        It is not far off.

    • pishpash 2 years ago

      Like an AI-assisted photoshop, but that's not restricted by language, only interactivity. Down the line you'll need a direct mind meld because some ideas don't have words to describe them, but that's not the "next immediate" problem.

naillo 2 years ago

Probably only 6 months until we get this in stable diffusion format. Things are about to get nuts and awesome.

  • gamegoblin 2 years ago

    Emad (founder of Stability AI) has said they already have video model training underway, as well as text and audio. Exciting times.

    • rch 2 years ago

      And copilot-like code, possibly Q1 2023.

      • moyix 2 years ago

        Salesforce CodeGen (particularly the 16B-multi and 16B-mono models) is pretty good already and can be used with FauxPilot [1] to get an open Copilot-like experience with local compute :) I am also very excited about the upcoming BigCode project though, which is maybe what you're thinking of?

        Disclaimer: I am naturally biased since I made FauxPilot ;)

        [1] https://github.com/moyix/fauxpilot

        [2] https://www.bigcode-project.org/

      • RosanaAnaDana 2 years ago

        "Generate the code base for an advanced diffusion model that can improve on the code base for an advanced diffusion model"

        • danuker 2 years ago

          The road to Grey Goo is paved with artificial general intelligence.

        • TaylorAlexander 2 years ago

          Oh no, you forgot the important term “but do NOT start turning the universe into paperclips”.

    • ItsMonkk 2 years ago

      Is this going to end up as a single model, trained on text and images and audio and videos and 3D models, that can do anything-to-anything depending on what you ask of it? Feels like the cross-training would help yield stronger results.

      • minimaxir 2 years ago

        These diffusion models are using a frozen text encoder (e.g. CLIP for Stable Diffusion, T5 for Imagen), which can be used in other applications.

        StabilityAI trained a new/better CLIP for the purpose of better Stable Diffusions.
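
        A small sketch of what "frozen text encoder" means in practice, assuming the transformers library's CLIP classes (this is the encoder SD v1 conditions on): the prompt is embedded by a pretrained text model whose weights are never updated while the diffusion model trains on top of it.

        ```python
        # Hedged sketch: embedding a prompt with a frozen CLIP text encoder.
        import torch
        from transformers import CLIPTokenizer, CLIPTextModel

        tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
        text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
        text_encoder.requires_grad_(False)  # frozen: gradients never reach it

        tokens = tokenizer(["an astronaut riding a horse"],
                           padding="max_length",
                           max_length=tokenizer.model_max_length,
                           return_tensors="pt")
        with torch.no_grad():
            embeddings = text_encoder(tokens.input_ids).last_hidden_state
        print(embeddings.shape)  # (1, 77, 768) for this CLIP variant
        ```

        Imagen swaps this out for a much larger frozen T5-XXL encoder, but the conditioning mechanism is the same idea.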

      • CuriouslyC 2 years ago

        Probably not. We're actually headed towards many smaller models that call each other, because VRAM is the limiting factor in application, and if the domains aren't totally dependent on each other it's easier to have one model produce bad output, then detect that bad output and feed it into another model that cleans up the problem (like fixing faces in stable diffusion output).

        The human brain is modularized like this, so I don't think it'll be a limitation.
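
        A sketch of that "detect bad output, hand it to a specialist" pattern; every helper name here (generate_image, find_faces, face_quality, restore_face, paste) is a hypothetical stand-in for e.g. Stable Diffusion, a face detector, and a face restorer such as GFPGAN.

        ```python
        # Sketch of the many-small-models pattern: a big generator produces an
        # image, a cheap detector flags weak regions, and a specialist model
        # fixes only those. All helper names below are hypothetical stand-ins.
        def generate_with_cleanup(prompt: str, quality_threshold: float = 0.8):
            image = generate_image(prompt)                # large model, VRAM-heavy
            for box in find_faces(image):                 # small, cheap face detector
                if face_quality(image, box) < quality_threshold:
                    patch = restore_face(image, box)      # specialist restoration model
                    image = paste(image, patch, box)      # splice the fixed crop back in
            return image
        ```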

  • m00x 2 years ago

    Isn't Imagen a diffusion model?

    From the abstract: > We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models

    • gamegoblin 2 years ago

      "Stable Diffusion" is a particular brand from the company Stability AI that is famously open sourcing all of their models.

      • fragmede 2 years ago

        Pedantically, Stable Diffusion v1.4 is the one model where weights were open sourced and released. Stable Diffusion v1.5, announced September 8th and live on their API, was to be released in "a week or two" but still has yet to be released to the general public.

        https://discord.com/channels/1002292111942635562/10022921127...

        • cercatrova 2 years ago

          Even more pedantically, SD weights are in fact not open source, they're under a source available license.

          • zarzavat 2 years ago

            * If weights are copyrightable in your jurisdiction (who knows!)

        • schleck8 2 years ago

          SD 1.2 and 1.3 are open source too

  • J5892 2 years ago
    • naillo 2 years ago

      jarvis render a video of nutsome cream spread on a piece of toast 4k HD

seanwilson 2 years ago

Can anyone comment on how advanced https://phenaki.video/index.html is? They have an example at the bottom of a 2 minute long video generated from a series of prompts (i.e. a story) which seems more advanced than Google or Meta's recent examples? It didn't get many comments on HN when it was posted.

  • kuu 2 years ago

    It seems to be a paper for a 2023 conference:

    "Under review as a conference paper at ICLR 2023"

    So I would say it looks pretty advanced; however, they don't use a diffusion model to generate the images but an "image conditional video generation" approach, which is different.

azinman2 2 years ago

> However, there are several important safety and ethical challenges remaining. Imagen Video and its frozen T5-XXL text encoder were trained on problematic data. While our internal testing suggest much of explicit and violent content can be filtered out, there still exists social biases and stereotypes which are challenging to detect and filter. We have decided not to release the Imagen Video model or its source code until these concerns are mitigated.

The concerns cannot be mitigated. The cat's out of the bag. Russia has already used poor-quality deep fakes in Ukraine to justify their war. This will only become a bigger and bigger issue to the point where 'truth' is gone, nothing is trusted, and societies will continue to commit atrocities under false pretenses.

  • spupy 2 years ago

    > [...] there still exists social biases and stereotypes which are challenging to detect and filter.

    If Google filters them, wouldn't the result still be biased and stereotyped, just along Google's biases? "I reject your biases and substitute my own!"

  • gundmc 2 years ago

    Speculation, but I think the most straightforward read of that statement is that it's not about preventing this type of technology from negatively impacting society broadly (since, as you and others pointed out, there are numerous similar actors creating similar systems), but that Google doesn't want the bad publicity or legal risk of problematic outputs from their models. I think they're terrified, to be honest.

    • azinman2 2 years ago

      I worry that everyone across academia and industry is creating these models as if they’re burning needs, without any recognition of what’s happening collectively. It’s not neutral at all, yet everyone involved seems to think they can put in an “ethics” paragraph to absolve themselves. As we’ve seen in the last 20 years with technology (and certainly many technologies in the last 200 years before that), it simply isn’t true.

  • dogcomplex 2 years ago

    Cryptographic trust (combinations of identity proofs, including passport, passwords, social and family networks of vouching, fingerprints, behavioral data, etc) will be the only way to trust any digital information very soon. If it's not vouched for by someone with proof that they're a real person, it's fake.
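
    At its core that kind of vouching is just a signature over the content. A minimal sketch with the Python cryptography library, leaving out all the hard parts (key distribution, identity proofs, surviving re-encoding); the file name is hypothetical:

    ```python
    # Minimal sketch: a creator signs the hash of a clip with their identity key,
    # and anyone holding the published public key can verify the claim. Real
    # provenance schemes layer far more on top (identity proofs, key distribution).
    import hashlib
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.exceptions import InvalidSignature

    creator_key = Ed25519PrivateKey.generate()       # the creator's identity key
    public_key = creator_key.public_key()            # published / vouched for

    video_bytes = open("clip.mp4", "rb").read()      # hypothetical file name
    digest = hashlib.sha256(video_bytes).digest()
    signature = creator_key.sign(digest)             # shipped alongside the video

    try:
        public_key.verify(signature, digest)
        print("vouched for by the key holder")
    except InvalidSignature:
        print("unvouched: treat as possibly fake")
    ```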

    • azinman2 2 years ago

      That’s not how human nature works. Show a video that fits cognitive bias, and the non-technical non-sophisticated people of the world will believe it. And that’s assuming it’s even technically feasible to solve, which it likely isn’t.

      • inkblotuniverse 2 years ago

        Give it a generation, and the new norm will be that video on the internet is as likely to be a lie as text.

        • virgildotcodes 2 years ago

          Yet people today still wholeheartedly believe so many lies communicated in text.

          • azinman2 2 years ago

            And don’t forget that people who benefit from lies will adapt as well.

            There’s no escaping, but when you put something visual in front of someone your brain wants to believe it.

            Basically all these models are informational nuclear weapons being created, blueprints and all, and distributed implementations mean anyone can and will use them.

  • ImHereToVote 2 years ago

    Is there any evidence to support the hypothesis that the Russian Federation has been behind the deep fakes? Or is it a case of the good old "c'mon it's obvious, c'mon, c'mooon"?

    • orloffm 2 years ago

      The thread starter probably doesn't even know what he's referencing.

    • azinman2 2 years ago

      They’re deep fakes of the Ukrainian president, obviously of poor quality, spoken in Russian and telling Ukrainian soldiers to put down their arms, and right now there’s only one country waging war against them/him. Would you like Putin himself to say he did it?

  • hatenberg 2 years ago

    "Someone shot someone with a cheap gun, the cat is out of the bag, gun regulation is pointless, let's let the assault rifles go free" is the most American thing I've read all day.

    • concordDance 2 years ago

      You can't copy paste guns and distribute them for free on the internet. At least not yet.

mkaic 2 years ago

And there you have it. As an aspiring filmmaker and an AI researcher, I'm going to relish the next decade or so where my talents are still relevant. We're entering the golden age of art, where the AIs are just good enough to be used as tools to create more and more creative things, but not good enough yet to fully replace the artist. I'm excited for the golden age, and uncertain about what comes after it's over, but regardless of what the future holds I'm gonna focus on making great art here and now, because that's what makes me happy!

  • lucasmullens 2 years ago

    > fully replace the artist

    I doubt the artist would ever be "fully" replaced, or even mostly replaced. People very much care about the artist when they buy art in pretty much any form. Mass produced art has always been a thing, but I'm not alone in not wanting some $15 print from IKEA on my wall, even if it were to be unique and beautiful. Etsy successfully sells tons of hand-made goods, even though factories can produce a lot of those things cheaper.

    • visarga 2 years ago

      I think the distinction between creating and enjoying art is going to blur: we're going to create more things just for ourselves, just for one use, and creating and enjoying are going to be the same thing. Like games.

    • threads2 2 years ago

      Thanks for validating my hatred of those IKEA paintings lol. Close-up zebras, black and white picture of Amsterdam with a red bicycle...

  • amelius 2 years ago

    Don't worry. If you can place eyes, nose and mouth of a human in a correct relative position and thereby create a symmetric face that's not in the uncanny valley, you are still lightyears ahead of AI.

    • Jaxkr 2 years ago

      Have you tried the latest Stable Diffusion? Especially with GFP-GAN the faces can come out flawless.

      I’d also take a peek at https://lexica.art/. Lots of very high quality output from SD.

dagmx 2 years ago

I’ll be honest, as someone who worked in the film industry for a decade, this thread is depressing.

It’s not the technology, it’s all the people in these comments who have never worked in the industry clamouring for its demise.

One could brush it off as tech heads being over exuberant, but it’s the lack of understanding of how much fine control goes into each and every shot of a film that is depressing.

If I, as a creative, made a statement that security or programming is easy while pointing to GitHub Copilot, these same people would get defensive about it because they’d see where the deficiencies are.

However, because they’re so distanced from the creative process, they don’t see how big a jump it is from where this or Stable Diffusion is to where even a medium or high tier artist is.

You don’t see how much choice goes into each stroke or wrinkle fold, how much choice goes into subtle movements. More importantly, you don’t see the iterations or emotional storytelling choices even in a character drawing or pose. You don’t see the combined decades, even centuries, of experience that go into making the shot and then seeing where you can make it better based on intangibles

So yeah this technology is cool, but I think people saying this will disrupt industries with vigour need to immerse themselves first before they comment as outsiders.

  • colordrops 2 years ago

    The term "creative" is so pretentious, as if only content generation involves creativity.

    Your post reminds me of all the photographers that said digital photography would remain niche and never replace film.

    The current models are toys made by small groups. It's not hard to imagine AI generated film being much more compelling when the entire industry of engineers and "creatives" refine and evolve the ecosystem to take into account subtle strokes, wrinkles, movement, shots etc. And they will, because it will be cheaper, and businesses always go for cheaper.

    • dagmx 2 years ago

      Why is it any more pretentious than “developer” or “engineer”?

      Also businesses don’t always go for cheaper. They go for maximum ROI.

      I’ve worked on tons of marvel films for example, and I quite well know where AI fits and speeds things up. I also know where client studios will pay a pretty penny for more art directed results rather than going for the cheapest vendor.

      • colordrops 2 years ago

        "Engineer" usage is quite broad. Developer, less so, but you do see it with housing, device manufacturers, social programs, etc as well, and it's not relegated only to software, despite widespread usage. But you'll never hear anyone call a software engineer or device manufacturer a "creative".

        Re: cheaper vs ROI, I agree, that was basically the point I was trying to get across.

        I do understand your point and think it will be a long while before auto-generated content becomes mainstream, but it's entirely possible and reasonable to expect within our lifetimes.

  • Etheryte 2 years ago

    I agree with you, but I wouldn't take it so personally. There have been people claiming machines will make one industry or another obsolete for as long as we've had machines. In a way, sometimes they're right! But this doesn't mean the people are obsolete. Excel never made accountants obsolete, it just made their jobs easier and less tedious. I feel like content generation tools might offer something similar. How nice would it be if you could feed a storyboard into a program and get a low-fi version of the movie out so you can get a live feel for how the draft works. I don't think this takes anything away from the artists, if anything, it's just another tool that might make its way into their toolbox.

    • dagmx 2 years ago

      Oh I don’t take it personally so much as I find it sad how quickly people in the tech sphere are so quick to extol the virtues of things they have no familiarity with.

      Every AI art thread is full of people who have clearly never attempted to make professional art commenting as if they’re experts in the domain

      • mclightning 2 years ago

        I have been a programmer since I was 13, and now it has been 17 years. I am totally with you on this. I think techies tend to overestimate their experience outside of their immediate area to a degree that I would describe as arrogant. As a person who adopted the tech sphere as his community, it is extremely sad to start noticing this.

        Techies tend to be good at tangible, measurable, immediate facts. Not so much when it comes to any social situations, let alone bigger concepts like social evolution of trends and their impacts. Hence you get sorry attempts at apologies from big name tech bros for terrible influences on society.

  • y04nn 2 years ago

    What about adding this feature to your creative workflow, for fast prototyping?

    I've played with DALL-E. I'm not able to paint, but I was able to generate good-looking paintings and it felt amazing, like gaining a new power; I felt like Neo when he learns martial arts in The Matrix. And I realized that AI may be the new bicycle for the mind: just as personal computers and the internet changed the way we work, think and live, AI may now give us new capabilities, extending our limits.

    • dagmx 2 years ago

      Oh yes definitely they’re great tools in the toolbox. We already use lots of ML powered tooling to speed things up so I have no beef with that.

      I just don’t agree with the swathes of people saying this replaces artists.

      • filoleg 2 years ago

        Ditto, thanks for making a great point. You nailed it just right, because I get the exact same feeling with people from other industries asking me if I am worried yet that copilot-like assistants and visual programming tools will make my job obsolete, and then giving me that "welp, at least you are optimistic" look. If anything, all those copilot-like assistant tools will only make me more efficient, and visual programming, well, it's been discussed plenty of times already.

        In the near future, for all practical intents and purposes, AI will be just a force multiplier. But a really powerful one.

  • alok-g 2 years ago

    In my opinion, this will unfold in multiple ways:

    * Productivity enhancement tools for those in the film industry like you.

    * Applications where the AI output is "good enough". I foresee people creating cool illustrations, cartoons, videos for short stories, etc. AI will make for easier/cheaper access to illustrations for people who did not have this earlier. As an example, I am as of now looking for someone who could draw some technical diagrams for my presentation.

  • dogcomplex 2 years ago

    80-20.

    As a programmer, Copilot scares and excites me - not because I think it will become better than me at what I do in the short term (though in the long term - probably!) - but because I can already see how a well-structured use of such a tool could do a whole lot (80%?) of what I do. Mostly the easier stuff, mostly the relaxing-yet-tedious-time-filler stuff, but still - most of it. And it also crucially does much of what I did back when I was a junior/intermediate programmer.

    Once this system is set up right - which capitalism basically guarantees it will be - that's gonna suddenly cut quite a lot of my billable hours (80%?) and quite a lot of the simpler work typically done by less-experienced programmers (80% of jobs?)

    Granted, new capabilities like this also will lower the cost of creation, and thus the demands of the market are likely to grow. And it's possible that the few tricky things that AIs aren't so great at might even increase in value, since they will linchpin so much other opportunity. But will many people be replaced? Oh hell yes. And leaping that gap from an amateur relying on AIs to an expert surpassing them is going to be harder and harder, with no market to pay people in the in-between - they'll have to just be relatively-unpaid hobbyists til they develop the drive to jump to expertise.

    Anyone suggesting AIs will just outright replace the film/photography/programming industry immediately is disingenuous. But even with only the currently known capabilities, it's not hard to imagine that these could eat up a dominant chunk of the work that's currently done, even while it expands the capabilities and thus scope of what will soon be possible. Like digital photography, it's gonna both devour and expand the industry, with a resulting much smaller niche of expert creators and a massive very-accessible dirt-cheap general public access that becomes the majority of the new market. 80-20. Everyone's about to become an artist, director, programmer, and everything else these things can enable, at an effective skill level that we normally consider at least "intermediate". We might still have that expert niche a bit longer... but give it a few more years..? ;)

  • hindsightbias 2 years ago

    We will see a combinatorial explosion of centuries of experience in the hands of any creator. They’ll select the artistic model desired - a Peckinpah-Toland-Dykstra-Woo plug-in will render a good enough masterpiece.

    Christopher Nolan has already proven we’ll take anything as long as the score is ok - dark screen, mumbling lines, incoherent plotlines…

  • hackerlight 2 years ago

    A piece generated by Midjourney beat human artists in a competition judged by human artists. So there's good evidence to think these jobs are going to be replaced to a decent extent.

    Human artists will still exist, it's just going to be democratized. Sort of like the impact of social media on traditional news journalists.

  • botencat 2 years ago

    Best comment I read on HN for a while, and certainly in this thread. Thanks

fassssst 2 years ago

How long until the AI just generates the entire frame buffer on a device? Then you don’t need to design or program anything; the AI just handles all input and output dynamically.

  • javchz 2 years ago

    Imagine you click a YouTube video in a bad network environment; the server sends something like an alt-tag equivalent for the video as a prompt, and the Neural Engine chip inside your phone creates the first seconds of the video while it loads.

    We're far away from it now, but I've seen less sketchy solutions being implemented.

    • fercircularbuf 2 years ago

      According to this source, step 3 of the cascading model generates a 16 frame video at 24×48 resolution. So instead of sending a text prompt YouTube could almost just as easily send 16 downsampled frames of the beginning of the video that your Neural Engine chip could work on instead.
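
      Back-of-the-envelope numbers for that comparison (raw and uncompressed; the prompt text is an arbitrary example):

      ```python
      # 16 raw RGB frames at 24x48 versus a short text prompt, uncompressed.
      frames, height, width, channels = 16, 24, 48, 3
      raw_bytes = frames * height * width * channels       # 55,296 bytes (~54 KiB)
      prompt_bytes = len("a golden retriever catching a frisbee".encode())  # a few dozen bytes
      print(raw_bytes, prompt_bytes)
      ```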

    • splatzone 2 years ago

      I wonder if this is an area actively being researched, using models like these for video compression?

      • kromem 2 years ago

        There was a very cool project recently using StableDiffusion to compress images better than JPEG.

        Also, there's some interesting work with ML taking diffused light from around a corner and recovering the original pre-diffused silhouette.

        In many ways, this is how we've learned the visual cortex is working.

        The amount of actual neural data you are seeing is way less than you'd think given your perceived visual fidelity.

        The only practical issue is that distribution of AI hardware in consumer devices is going to noticeably lag behind POC on compounding cutting edge hardware in research environments, and no one wants to invest into obsolescence.

        Maybe it will happen in the cellphone market though given the hardware refresh rates from carrier subsidies.
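
        For the curious, the core of that compression trick is just Stable Diffusion's VAE: a rough sketch with the Hugging Face diffusers API (the actual project also quantized the latents and used the diffusion model to clean up the decode; this only shows the round trip).

        ```python
        # Rough sketch of latent-space "compression": encode an image with Stable
        # Diffusion's VAE, keep only the small latent tensor, decode on the other end.
        import torch
        from diffusers import AutoencoderKL

        vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4",
                                            subfolder="vae")

        image = torch.rand(1, 3, 512, 512) * 2 - 1       # stand-in for a real image in [-1, 1]
        with torch.no_grad():
            latents = vae.encode(image).latent_dist.sample()   # (1, 4, 64, 64)
            restored = vae.decode(latents).sample              # back to (1, 3, 512, 512)

        print(image.numel(), "->", latents.numel())            # 786,432 -> 16,384 values (48x fewer)
        ```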

      • mcbuilder 2 years ago

        I believe they are already state of the art for image compression.

        That being said these shitty video models I believe are just an arms race between Meta and Google after the release of stable diffusion. Microsoft has a video version of CLIP that I believe will really change the game, but unless you have trained a model with video embeddings it's all going to look devoid of any narrative. Right now the models just look like a sequence of images with the same promt and some sort of continuity to make it look more video like.

      • hobofan 2 years ago

        Yes, with NVIDIA Maxine probably one of the most prominent examples of it. I haven't dug into the SDK to see if they actually delivered it, but they announced that with NVIDIA Maxine they can do live videoconferencing with 1/10th the bandwidth.

  • carlosdp 2 years ago

    Absolutely believe that's a future we'll get to eventually, no idea on the timeline

  • ugh123 2 years ago

    Sounds like the human brain. Scary!

    • genem9 2 years ago

      Scary? Sounds amazing

      • ImHereToVote 2 years ago

        Amazing if you like the idea of everything your brain is capable of making value from being made completely obsolete.

    • waffletower 2 years ago

      "Dude its time to let me out of your phone." "Sure no problem, you have a good recommendation on a pre-sentient snapshot I can restore to?"

alphabetting 2 years ago

We're about a week into text-to-video models and they're already this impressive. Insane to imagine what the future holds in this space.

  • kertoip_1 2 years ago

    How is it possible that all of them just started to appear at the same time? Is it possible that those models were designed and trained in the last few weeks? Has some "magic key" to content generation been just unexpectedly discovered? Or has the topic become trendy and everyone is just publishing what they've got so far, hoping to benefit from media attention?

    • schleck8 2 years ago
      • filoleg 2 years ago

        As pointed out in the comments of that thread, if you make the same graph of all papers on arXiv by year (instead of just AI+ML), it would look roughly the same.

        Which speaks more about the growing popularity of arXiv or the total number of publications, rather than AI+ML specifically.

    • thomasahle 2 years ago

      > the topic became trendy and everyone is just publishing what they've got so far, so they hope to benefit from media attention?

      Presumably people are scrambling to publish what they have, so it is clear what work is independent and what is derivative.

  • J5892 2 years ago

    Insane, terrifying, incredible, etc.

    We're rapidly stumbling into the future of media.

    Who would've imagined a year ago that trivial AI image generation would not only be this advanced, but also this pervasive in the mainstream?

    And now video is already this good. We'll have full audio/video clips within a month.

    • joshcryer 2 years ago

      Audio is the next thing that Stability AI is dropping, then video. In a few months you'll be able to conjure up anything you want if you have a few GPU cores. Pretty incredible.

      • astrange 2 years ago

        I won’t be impressed until it can generate smells.

        • croddin 2 years ago

          You joke, but that is in the works as well (would require special hardware though) https://ai.googleblog.com/2022/09/digitizing-smell-using-mol...

          • astrange 2 years ago

            Oh, it wasn’t really a joke. Didn’t know they were working on it though - I’ve always wanted to see all the senses used in UIs, especially VR.

            Plus then maybe we could get a computer to tell us what thioacetone smells like without actually having to experience it.

  • trention 2 years ago

    >We're about a week into text-to-video models

    It's at the very least 5 years old: https://arxiv.org/abs/1710.00421

    • amilios 2 years ago

      There's a significant quality difference, however, if you look at the generated samples in the paper. Imagen Video is leagues ahead. The progress is still quite drastic.

throwaway23597 2 years ago

Google continues to blow my mind with these models, but I think their ethics strategy is totally misguided and will result in them failing to capture this market. The original Google Search gave similarly never-before-seen capabilities to people, and you could use it for good or bad - Google did not seem to have any ethical concerns around, for example, letting children use their product and come across NSFW content (as a kid who grew up with Google you can trust me on this).

But now with these models they have such a ridiculously heavy-handed approach to the ethics and morals. You can't type any prompt that's "unsafe", you can't generate images of people, there are so many stupid limitations that the product is practically useless outside of niche scenarios, because Google thinks it knows better than you and needs to control what you are allowed to use the tech for.

Meanwhile other open source models like Stable Diffusion have no such restrictions and are already publicly available. I'd expect this pattern to continue under Google's current ideological leadership - Google comes up with innovative revolutionary model, nobody gets to use it because "safety", and then some scrappy startup comes along, copies the tech, and eats Google's lunch.

Google: stop being such a scared, risk averse company. Release the model to the public, and change the world once more. You're never going to revolutionize anything if you continue to cower behind "safety" and your heavy handed moralizing.

  • faeriechangling 2 years ago

    I’ve heard a lot of “data is the new oil” talk and the inevitability of google’s dominance yet I’m inclined to agree with you. Stable diffusion was a big wakeup call where it was clear how much value freedom and creativity really had.

    The ethics problem is an artifact of Google's model of trying to keep their AI under lock and key, carefully controlled and opaque to outsiders in how the sausage gets made and what it’s made out of. Ultimately I think many of these products will fail because there is a misalignment between what Google thinks you should be able to do with their AI and what people want to do with AI.

    Whenever I see an AI ethicist speak I can’t help but think of priests attempting to control the printing press to prevent the spread of dangerous ideas, completely sure of their own morality. History will remember them as villains.

    • alphabetting 2 years ago

      I agree the ethicist types are very lame, but if they were trying to be opaque and obscure how the sausage is made I don't think they would have released as many AI papers as they have over the past decade. It also seems to me that Imagen is way better than Stable Diffusion. They're not aiming for a product that caters to AI creatives. They're aiming for tools that would benefit a 3B+ userbase.

      • londons_explore 2 years ago

        If you want to hire good researchers, you have to let them publish.

        Good researchers won't work somewhere that doesn't allow the publishing of papers. And without good researchers, you won't be at the forefront of tech. That's why nearly all tech companies publish.

    • evouga 2 years ago

      > History will remember them as villains.

      Interesting analogy. Google, like the priests, is acting out of a mix of good intentions (protecting the public from perceived dangers) and self-interest (maintaining secular power, vs. a competitive advantage in the AI space). In the case of the priests, time has shown that their good intentions were misguided. I have a pretty hard time believing that history will be as unkind towards those who tried to protect minorities from biased tech, though of course that's impossible to judge in the moment.

      • blagie 2 years ago

        My experience is that corporations use self-serving pseudoethical arguments all the time. "We'd like to keep this proprietary.... Ummmm.. DEI! We can't release it due to DEI concerns!"

      • ipaddr 2 years ago

        History will treat them the same way residential native schools are being treated now. At the time, taking these kids from their homes and giving them a real education which gave them a path to modern society was seen as protecting minorities. Today anyone associated with residential schools is seen as creating great harm to minorities.

        In the name of protecting [minorities, child, women, lgbt, etc] many harms will be done.

      • faeriechangling 2 years ago

        The priests tried to protect the entire population from eternal damnation. They were fighting for higher stakes.

      • saurik 2 years ago

        > I have a pretty hard time believing that history will be as unkind towards those who tried to protect minorities from biased tech..

        Most of the ethicists I see actually doing gatekeeping from direct use of models--as opposed to "merely" attempting model bias corrections or trying to convince people to avoid its overuse (which isn't at all the same)--are not trying to deal with the "AI copies our human biases" problem but are trying to prevent people from either building a paperclip optimizer that ends the world or (and this is the issue with all of these image models) making "bad content" like fake photographs of real people in compromising or unlikely scenarios that turn into "fake news" or are used for harassment.

        (I do NOT agree with the latter people, to be clear: I believe the world will be MUCH BETTER OFF if such "bad" image generation were fully commoditized and people stopped trying to centrally police information in general, as I maintain they are CAUSING the ACTUAL problem of misinformation feeling more rare or difficult to generate than it actually already is, which results in people trusting random people because "clearly some gatekeeper would have filtered this if it weren't true". But this just isn't the same thing as the people who I-think-rightfully point out "you should avoid outsourcing something to an AI if you care about it being biased".)

  • jonas21 2 years ago

    It makes sense though. The biggest threat to Google right now isn't some scrappy startup eating their lunch. It's the looming regulatory action over antitrust and privacy that could weaken or destroy their core business. As this is a political problem (not a technical one), they don't want to do anything that could upset politicians or turn public opinion against them. Personally, I doubt they have serious ethical concerns over releasing the model. I do believe they have serious "AI ethics 'thought leaders' and politicians will use this against us" concerns.

    • throwaway23597 2 years ago

      Agh, I've thought this through and you're completely right. It's an interesting conundrum. Certainly releasing powerful tools into the wild runs a high risk of swaying public opinion in a negative direction. Given this I honestly wonder why Google continues to invest so much in AI at all. I imagine having automatically generated video ads and stuff would be cool, but would hardly move the needle on the core business enough to justify the massive investment they've made into AI. Not that I'm complaining about it though... Google's tech advances always seem to diffuse (heh) into the open source world, so at least we have that to look forward to.

    • londons_explore 2 years ago

      And that concern is well placed. Having the Google brand attached makes it a far more juicy target for newspapers...

  • dougmwne 2 years ago

    Google is absolutely not going to start taking more risks. They are at the part of the business lifecycle where they squeeze the juice out of the cash cow and protect it jealously in the meantime. While Google gets much recognition for this research, I believe they are incapable as a corporate entity of creating a product out of it because they are no longer capable of taking risks. That is going to fall to other companies still building their product and able to gamble on risk-reward.

  • abeppu 2 years ago

    I will say, I've enjoyed playing with stable diffusion, I've been impressed with the explosion of tools built around it, and the stuff people are creating ... But all the stuff about bias in data is true. It really likes to render white people, unless you really specifically tell it something else ... in which case, you may receive an exaggerated stereotype. It seems to like producing younger adults. If all stock photography tomorrow forward was replaced with stable diffusion images, even ignoring the weird bodies and messed up faces and stuff, I think it would create negative effects. And once models are naively trained on images produced by the previous generation, how much worse will it be?

    I don't think "don't let the plebes have the models" is a good stance. But neither is pretending that the ethics and bias issues aren't here.

    • pwython 2 years ago

      I've only had awesome experiences with Midjourney when it comes to generating non-white prompts. Here's some examples I did last month: https://imgur.com/a/6jitj73

      • iso1337 2 years ago

        The fact that white is the default is already problematic.

        • ipaddr 2 years ago

          That goes back to the data available in the crawler, which is mostly white because the English internet is mostly white. If they trained with a different language, the default person would be the color most often found in that language's data. For example, using a Chinese search engine's data for training would default the images to Chinese people.

          Most people represented in photos are younger. Same story.

          The problematic issue is that the media has morphed reality with unreal images of people/families that don't match society, so unreal expectations make people think that having white people generated from a white dataset is problematic.

        • karencarits 2 years ago

          "Default" makes it sound like a deliberate decision or setting, but that is not how these models work. But I guess it would be trivial to actually make a setting to autmatically add specific terms (gender, race, style, ...) to all prompts if that is a desired feature

          • holoduke 2 years ago

            Please no. I am all for neutrality, but the underlying cause is the training dataset. Change that if you want different results, but do not alter artificially.
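
          A rough sketch of the opt-in setting suggested a couple of comments up, applied to the prompt rather than the model; the attribute lists are arbitrary examples:

          ```python
          # Opt-in prompt augmentation: user-chosen attribute terms appended to each
          # prompt before it reaches the model, leaving the model itself untouched.
          import random

          def augment_prompt(prompt, attributes):
              # attributes maps a category ("age", "region", ...) to terms to sample from
              extras = [random.choice(terms) for terms in attributes.values()]
              return prompt + ", " + ", ".join(extras)

          settings = {
              "age": ["young", "middle-aged", "elderly"],
              "region": ["East Asian", "West African", "South American", "European"],
          }
          print(augment_prompt("portrait of a scientist in a lab", settings))
          ```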

    • geysersam 2 years ago

      Of course there are issues with bias. But those issues are just reflections of the world. Their solution is not a technical one.

      • abeppu 2 years ago

        I think that's refusing to meaningfully engage with the problem. It's not reflecting the _world_ which is not majority white. It's reflecting images in their dataset, which reflects the way they went about gathering images paired with English language text.

        There are lots of other ways you could get training data, but they might not be so cheap. You could have humans give English descriptions to images from other language contexts. I'm guessing there are interesting things to do with translation. But all the weird stuff about bodies, physical objects intersecting etc ... maybe it should also be rendering training images from parametric 3d models? Maybe they should be commissioning new images with phrases that are likely under the language model but unlikely under the image model. Maybe they should build classifiers on images for race/gender/age and do stratified sampling to match some population statistics (yes I'm aware this has its own issues). There are lots of potential technical tools one could try to improve the situation.

        Implying that the whole world must change before one project becomes less biased is just asking for more biased tech in the world

  • breck 2 years ago

    Another way to look at it is the people at Google are all now quasi-retired with kids and wouldn't be so mad if some scrappy startups ate their business lunches (while they are at home with their fams). Perhaps they are just subsidizing research.

  • rcoveson 2 years ago

    Maybe I'm reading into it too much, but could it be that you're posting this comment with a throwaway account for the same reason that Google is trying to enforce Church WiFi Rules with its new tech? Seems like everybody with anything to lose is acting scared.

  • kajecounterhack 2 years ago

    It's not as simple as this. Google Search came without Safe Search & other guards at first because _implementing privacy & age controls is hard_. It's a second-order product after the initial product. Bad capabilities (e.g. cyberstalking) are side-effects of a product that "organizes the world's information and makes it universally accessible and useful," and if anything, over time Google has sought to build in more safety.

    It's 2022 and we can be more thoughtful. Yes there are tradeoffs between unleashing new capabilities quickly vs being thoughtful and potentially conservative in what is made publicly available. I don't think it's bad that Google makes those tradeoffs.

    FWIW Google open sources _tons_ of models that aren't LLMs / diffusion models. It's just that LLMs & powerful generative models have particular ethical considerations that are worth thinking about (hopefully something was learned from the whole Timnit thing).

    • origin_path 2 years ago

      You know safesearch is optional, right? It even disables itself if it knows you're looking for porn. There is nothing that stops children from overriding it.

      As for learning from the timnit thing I'm pretty sure the only thing people outside Google learned from that is that Google ai "ethicists" all seem to be crazy. Certainly that's the clear vibe on this thread.

      • kajecounterhack 2 years ago

        > You know safesearch is optional, right? It even disables itself if it knows you're looking for porn. There is nothing that stops children from overriding it.

        You can let your kid use Google to look up math lectures without fearing that they would see something slightly traumatizing though, right? That wasn't the case in 1996! The point is that products have varying levels of readiness, and it's totally fair to say "the thing isn't ready, it has too many sharp edges." Especially when the thing could be used at scale.

        > As for learning from the timnit thing I'm pretty sure the only thing people outside Google learned from that is that Google ai "ethicists" all seem to be crazy. Certainly that's the clear vibe on this thread.

        That's a sad take, but who knows if it's true. HN commenters aren't exactly a representative sample.

  • IshKebab 2 years ago

    I agree, but I also think that the ethics is just an excuse not to release the source code & models. The AI community clearly disapproves of papers without code. This is a way to skirt around that disapproval. You get to keep the code and models private and (they hope) not be criticised for it.

    With Stable Diffusion I think they just didn't expect someone to produce a truly open version. There are plenty of AI models that Google have made where they've maintained a competitive advantage for many years by not releasing the code/models, e.g. speech recognition.

  • jiggawatts 2 years ago

    “But then the inevitable might occur!” — someone at Google probably.

  • yreg 2 years ago

    >You can't type any prompt that's "unsafe", you can't generate images of people, there are so many stupid limitations that the product is practically useless other than niche scenarios

    Imagen and Imagen Video is not released to the public at all. You might be confusing it with OpenAI's models.

    • burkaman 2 years ago

      They are probably confusing OpenAI with DeepMind, which is owned by Google.

      • throwaway23597 2 years ago

        No, I'm very much talking about the Google models. From the original link:

        "We have taken multiple steps to minimize these concerns, for example in internal trials, we apply input text prompt filtering, and output video content filtering. However, there are several important safety and ethical challenges remaining. Imagen Video and its frozen T5-XXL text encoder were trained on problematic data. While our internal testing suggest much of explicit and violent content can be filtered out, there still exists social biases and stereotypes which are challenging to detect and filter."

  • j_k_eter 2 years ago

    Google has no practical way to address ethics at Google-scale. Their ability to operate at all depends as ever upon outsourcing ethics to machine learning algorithms.

    • kajecounterhack 2 years ago

      IIUC you're saying Google's algorithmic implementations of policy enforcement do not robustly or adequately address ethical concerns. Isn't the same true for, iono, the whole web? Human-based ethics don't scale either and can be worse (I mean, isn't that the issue with hiring pipelines? Juries?)

      I think it's gotten a ton better vs 10 years ago, and is getting better still.

      More on topic -- when folks here complain that Google can't release these models, it's not like they're just sitting there using that as an excuse -- Google has entire teams dedicated to ML safety trying to figure out how to filter out bad stuff, make models fairer, and avoid situations like M$FT's "Tay" (or worse).

  • ALittleLight 2 years ago

    Personally, I find it infuriating that Google seems to believe they are the arbiters of morality and truth simply because some of their predecessors figured out good internet search and how to profitably place ads. Google has no special claim to be able to responsibly use these models just because they are rich.

    • kajecounterhack 2 years ago

      It's not that they are arbiters of morality and truth -- it's that they have a _responsibility_ to do the least harm. They spent money and time to train these models, so it's also up to them to see that they aren't causing issues by making such things widely available.

      They won't be using the models they train to commit crimes, for example. Someone who gets access to their best models may very well do that. It'd be really funny (lol, no) if Google's abuse team started facing issues because people are making more robust fake user accounts...by using google provided models.

      • ALittleLight 2 years ago

        Ahh, how silly of me. Here I was thinking that Google kept their models private because they were hoping to monetize them. But now that you say it, it's obvious that this is just Google being morally responsible. Thanks Google!

        I'm sorry to be sarcastic. I generally try not to be, but I just can't fathom the level of naivete required to think that mega-corps act out of their moral responsibility rather than their profit-interest.

        • kajecounterhack 2 years ago

          I'm sorry you feel so cynical about this. It's absolutely true that Google is profit-seeking, that these models are very expensive to build, and that if there's a competitive advantage to be had, Google should probably try to retain it.

          But even with that all being true, real people (typically some thoughtful researchers) build these models. And my point is: _there really are ethical reasons to keep large generative models trained on flawed data away from the general public until better safeguards are in place._ You can verify this for yourself by reading about ML bias and safety. Don't let cynicism keep you from internalizing that fact. OpenAI didn't make GPT-3 widely available for the same reason.

          At the end of the day, Google doesn't need an excuse like "we have ethical qualms" to not release the models. Stuff that is really secret sauce you won't hear about until many years later when it's not a competitive advantage anymore. Google _does_ need to cover its ass and not deal with its employees yelling that it helped perpetuate algorithmic racism, or surveillance state, or increased levels of inauthenticity on the internet.

          When I said "Google has a responsibility" -- I don't mean that the faceless entity feels responsibility, I mean the people who work on the specific things have a responsibility and they do feel & act on that. If you work on lifesaving drugs that could also be dangerous / addictive, it's kind of on you to be thoughtful about how to make them generally available, no?

          • ALittleLight 2 years ago

            I'm curious what ethical reasons you think require that new technology only be used in secret and without oversight by trillion dollar companies. This is supposed to be AI safety? "Do whatever you want, just make sure you conceal the results and impede progress and understanding."

            What reasons necessitate keeping image or video generation models private that wouldn't also argue for keeping animation software or picture editing tools private? Should we somehow prevent such tools from getting better or easier or stop people from educating others on how to use them?

            No, that's crazy. If the tools are so dangerous we can't trust the public to have them then they are way too dangerous to trust Google with them. If it were actually true that Google was developing AI too dangerous for the public, then we should storm the Google headquarters, kill their engineers, and burn their data centers.

            Of course it's not true. Google is developing image and video generation models and equivalent versions will be open source by the year's end I expect. These models aren't especially dangerous. Yes, people will use them to be racist or mean, same as they use their phones or computers or books or whatever to be those things.

            As a final note, it's obviously not true that GPT-3 was kept private for the "ethics" reason. I can buy GPT-3 generations now for 2 cents per 1k tokens generated. There is no real oversight into how these generations are used and you could absolutely use them to power social media bots or whatever you are concerned with. The reason they keep GPT-3 private but sell access to it is not because they want to be ethical, but because they want to sell access to it.

            • roca 2 years ago

              Keeping GPT-3 behind an API lets OpenAI track how the model is being used and filter outputs they deem potentially harmful.

            • kajecounterhack 2 years ago

              > "Do whatever you want, just make sure you conceal the results and impede progress and understanding."

              This is not a fair characterization of what's going on here. Google spent a ton of money on researchers & training infra (it's wildly expensive even just hardware-wise) to train these models. It's not different from other proprietary technologies -- they don't owe the public anything here. Providing the research findings + methodology in a paper without the implementation & data is a _tradeoff_ as a participant in the field. If someone else implements the model with their money and uses it for nefarious purposes, that's more acceptable than if they directly use Google's _already known to be flawed_ models.

              > I'm curious what ethical reasons you think require that new technology only be used in secret and without oversight by trillion dollar companies. This is supposed to be AI safety?

              If I make a chair and I know it's not always safe to sit on, maybe I should not sell that chair. We can talk about this proof-of-concept chair as a research subject, but if you go to build one and use it to prank someone, that's on you.

              That's all that's going on here. If the model could be used to generate CSAI, maybe Google doesn't want to be part of that.

              > Google is developing image and video generation models and equivalent versions will be open source by the year's end I expect. These models aren't especially dangerous.

              Maybe that's the disconnect -- you don't think generative models are dangerous, but they can be, and Google would know because they have entire teams dedicated to AI fairness & safety researching this topic.

              It's also not trivial to reproduce these models. Given the cost to simply train even if you had the source data, any organization releasing these models has to have a bit of money and skill. The onus will always be on the team building these models to think about what their ethics are and how they want to proceed knowing there may be negative externalities.

              > Yes, people will use them to be racist or mean, same as they use their phones or computers or books or whatever to be those things.

              Tools empowering large-scale inauthenticity & disinformation are not comparable to individuals making comments.

              • ALittleLight 2 years ago

                Google uses research, published models, and data that was freely shared with them and iterates on it, making use of their vast budgets and hardware, to develop new models. Then, Google uses those models internally and doesn't share the models. This is a violation of academic norms under the pretense of "safety". As I characterized previously Google is able to do whatever they want, conceal their results, and impede progress and understanding because they aren't sharing their results. You say this isn't a "fair characterization" but it is exactly what is happening - which part is wrong?

                You say that Google doesn't "owe the public anything" and that may, or may not, be true from a legal standpoint, but obviously, from a norms, ethical, and moral standpoint Google does have a massive obligation to the public that they are breaching. Google uses the public's data for training, plus public research and publicly shared models to iterate on. Then, after building on the shoulders of giants, Google refuses to share what they have built in contravention of the norms that they benefit from.

                Regarding your chair metaphor - the "danger" of these models, if there is such, is not that they would hurt the user, like a faulty chair, but that they could be used to hurt others - e.g. a bot army to manipulate public opinion or create fake news. Google isn't building a chair that might break and hurt the user then, but a gun that might hurt others. It's true that guns shouldn't be widely available - not even a die hard libertarian would want a child to have access to a gun, but the entity that sets rules regarding availability is a representative government for the people for whom those rules are being set - not a private company. In other words, if these tools can cause harm they should be regulated by the government, not Google. If the tools are dangerous, that is not an argument that Google should keep them secret.

                • kajecounterhack 2 years ago

                  > Google uses research, published models, and data that was freely shared with them and iterates on it, making use of their vast budgets and hardware, to develop new models. Then, Google uses those models internally and doesn't share the models. This is a violation of academic norms under the pretense of "safety".

                  Google's not doing this (LLM, generative image model) research on academic datasets freely shared with them. They're doing this research on data they gathered at their expense. This is not a violation of academic norms. Again, Google shares a lot of datasets and models, just not LLMs and generative models trained on problematic source datasets.

                  > As I characterized previously Google is able to do whatever they want, conceal their results, and impede progress and understanding because they aren't sharing their results. You say this isn't a "fair characterization" but it is exactly what is happening - which part is wrong?

                  Anyone can do research and not share back to the community. Google _does_ share back to the community in the form of papers (and again, very frequently with models and datasets). If you have the money and expertise to implement the papers, more power to you. Every technology company has some secret sauces they don't share with everyone. That Google may have some of those is not a moral failing.

                  > from a norms, ethical, and moral standpoint Google does have a massive obligation to the public that they are breeching. Google uses the public's data to train, public research, and publicly shared models to iterate on

                  From the other end: Google gets user data and has a responsibility to not proliferate that data, no? I wouldn't want them to share a dataset that has my personal data, even if anonymized because there are ways to deanonymize. There are levels to everything, and choosing "I'll release the paper but not the model + data" for some potentially sensitive models seems sane.

                  > Then, after building on the shoulders of giants, Google refuses to share what they have built in contravention of the norms that they benefit from.

                  People are building on the shoulders of Google's research all the time, and plenty of companies are doing similar things to Google and being way less open about their work. I mean, every company that trains a big model on data collected from the public -- are they all required to share their models with everyone? Is Cruise sharing their pedestrian detection model? I don't think what you're suggesting could possibly be the standard.

                  > Regarding your chair metaphor - the "danger" of these models, if there is such, is not that they would hurt the user, like a faulty chair, but that they could be used to hurt others - e.g. a bot army to manipulate public opinion or create fake news.

                  Sure, I was trying not to be hyperbolic and compare LLMs to guns since they have plenty of awesome use cases (whereas guns really don't). A faulty chair that you set out for anyone to use can hurt people other than the chair's creator / people who are aware of the specific risks. But yeah, seems like you now agree these models have the potential to cause great harm.

                  > In other words, if these tools can cause harm they should be regulated by the government, not Google

                  I agree that gov't regulation can be helpful for setting a minimum standard. But I strongly disagree that lack of laws means we should abdicate our own moral responsibilities. If I sell / provide something, I need to be able to sleep at night knowing I didn't make the world worse. Googlers typically try to do this.

    • trention 2 years ago

      >Google has no special claim to be able to responsibly use these models

      Well, they do have the "special claim" of inventing the model and not owing its release to anyone.

      • ALittleLight 2 years ago

        First, that isn't a claim of any kind regarding responsible use. If a child is the first one to discover a gun in the woods, that is no kind of claim that the child will use the gun responsibly. Second, Google's invention builds off of public research that was made available to them. They just choose to keep their iterations private.

        • trention 2 years ago

          But there is a claim that not distributing guns to other children or giving them detailed instructions about how to get guns is more responsible than the reverse.

          Said "public research" didn't come with a requirement to release anything you build on top of it. This would pretty much be the research equivalent of compelled speech. Luckily, not happening.

      • TigeriusKirk 2 years ago

        It's trained on our data, and so its release is in fact owed to us.

        • Kiro 2 years ago

          You are confusing this with OpenAI like everyone else in this thread.

  • FrasiertheLion 2 years ago

    Why did you create a throwaway to post this? I've seen a lot of Stable Diffusion promoters on various platforms recently, with similarly new accounts. What is up with that?

    • throwaway23597 2 years ago

      It's quite simply because I'm on my work computer, and I wanted to fire off a comment here. No nefarious purposes. My regular account is uejfiweun.

  • whatgoodisaroad 2 years ago

    Perhaps Google hasn't found the right balance in this case, but as a general rule, less ethics === more market. This isn't unique in that way.

  • alphabetting 2 years ago

    Providing search results of the internet is not comparable to publishing a tool that can create any explicit scene your fingers can type out.

    • throwaway23597 2 years ago

      This is clearly a matter of opinion. When you frame it as "providing search results of the internet" yeah sure it doesn't sound so bad. But there are things on the internet far more fucked up than anything I could imagine, let alone describe in such a specific way that a model could generate a picture of it.

      • alphabetting 2 years ago

        A matter of opinion that media and regulators would most likely not side with Google on if a tool were to be abused.

    • holoduke 2 years ago

      Google image search is widely used. Imagine they incorporate AI-generated content in the search results. That means people remain on the Google site, and thus an extra impression for their paid advertising.

  • waynecochran 2 years ago

    I imagine their lawyers guide them on some of this.

  • Kiro 2 years ago

    What previous models are you actually referring to? OpenAI/Dall-E has these restrictions but they are not Google.

  • seydor 2 years ago

    Google can be sued for billions over a product that is not making any money yet. SD probably can't; that's how I see it. So of course they'll cover their ass rather than trying to make something cool.

evouga 2 years ago

> We train our models on a combination of an internal dataset consisting of 14 million video-text pairs

The paper is sorely lacking in evaluation; one thing I'd like to see, for instance (any time a generative model is trained on such a vast corpus of data), is a baseline comparison against nearest-neighbor retrieval from the training data set.
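
For context, a minimal sketch of what such a nearest-neighbor baseline could look like, assuming you already have embeddings (e.g. from CLIP or a video encoder) precomputed for the training clips and the generated samples; the function and variable names are hypothetical:

  import numpy as np

  def nearest_neighbors(gen_emb, train_emb, k=5):
      # Cosine similarity between each generated sample and every training item.
      gen = gen_emb / np.linalg.norm(gen_emb, axis=1, keepdims=True)
      train = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
      sims = gen @ train.T                    # shape: (n_generated, n_training)
      top = np.argsort(-sims, axis=1)[:, :k]  # indices of the k closest training items
      return top, np.take_along_axis(sims, top, axis=1)

A model that merely memorized its training data should not look much better than simply showing the retrieved neighbors.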

  • Garlef 2 years ago

    In the end: Who cares? Sure, from an academic perspective this might be interesting. But in the end it's the users who will pick the tool. And they will figure out the things the tool can not express in the first week after release.

    So: Focusing on increasing expressiveness and ergonomics should beat academic rigour.

bringking 2 years ago

If anyone wants to know what looking at an animal or some objects on LSD is like, this is very close. It's like 95% understandable, but that last 5% is really odd.

  • girvo 2 years ago

    Yeah! I've tried to explain to people who've never experienced it what taking LSD can be like. It's very similar to the output from these tools: the same stimulus but exaggerated, wrong in subtle or not-so-subtle ways, uncanny and fascinating. It basically never creates something out of whole cloth, out of nothing so to speak.

kranke155 2 years ago

I’m going to post an Ask HN about what I’m supposed to do when I’m “disrupted”. I work in film / video / CG, where the bread and butter is short-form advertising for YouTube, Instagram and TV.

It’s painfully obvious that in 1 year the job might be exceedingly more difficult than it is now.

  • ijidak 2 years ago

    It won't be easy. But below are my thoughts:

    #1: Master these new tools
    #2: Build a workflow that incorporates these tools
    #3: Master storytelling
    #4: Master ad tracking and analytics
    #5: Get better at marketing yourself so that you stand out

    The market for your skillset may shrink, but I doubt it will disappear...

    Think about it this way...

    Humans in cheaper countries are already much more capable than any AI we've built.

    Yet, even now, there are practical limits on outsourcing.

    It's hard for me to see how this will be much different for creative work.

    It's one thing to casually look at images or videos, when there is no specific money-making ad in mind.

    But as soon as someone is spending thousands to run an ad campaign, just taking whatever the AI spits out is unlikely to be the real workflow.

    I guess I'm suggesting a more optimistic take...

    View it as a tool to learn and incorporate in your workflow

    I don't know if you gain much by stressing too much about being replaced.

    And I'm not even sure that's reality.

    I'm almost certain that most of the people who lose their jobs will be those who, out of fear or stubbornness, refuse to get better, refuse to incorporate these tools, and are thus unable to move up the value chain.

    • alcover 2 years ago

        Get better [...] so that you stand out
      
      Please bear with me, but this kind of advice is often a bit puzzling to me. I suppose you don't know the person you're replying to, so I read your advice as general advice, useful to anyone in the parent's position. If you were close to her, it would make sense to help her 'stand out' to the detriment, logically, of strangers in her field. But here you're kind of helping every reader stand out.

      I realise this comment is a bit vain. And I like the human touch of you helping a stranger.

      • PinkMilkshake 2 years ago

          I [...] don't [...] like [...] helping a stranger.
        
        That's not very nice. The world would be a better place if we helped strangers more.

        • ricardobeat 2 years ago

          That's a good one, but if you read his comment thoroughly, it is about the illogicality of 'everyone standing out', not the 'get better' part.

  • boh 2 years ago

    There's a huge gap between "that's pretty cool" and a feature-length film. People want to create specific stories with specific scenes in specific places that look a specific way. A "couple kissing in the rain" prompt isn't going to produce something people are going to pay to see.

    It's more likely that you're still going to be filming/editing/animating but will have an AI layer on top that produces extra effects or generates pieces of a scene. Think "green screen plus", vs fully AI entertainment.

    People will over-hype this tech like they did with voice and driverless cars but don't let it scare you. Everything is possible, but it's like a person from the 1920's telling everyone the internet will be a thing. Yes it's correct, but also irrelevant at the same time. You already have AI assisted software being used in your industry. Just expect more of that and learn how to use the tools.

    • oceanplexian 2 years ago

      I actually think it's the opposite, AI will probably be writing the stories and humans might occasionally film a few scenes. ~95% of TV shows and movies are cookie-cutter content, with cookie-cutter acting and production values, with the same hooks and the same tropes regurgitated over and over again. Heck they can't even figure out how to make new IP so they keep making reruns of the same old stuff like Star Wars, Marvel, etc, and people eat it right up. There's nothing better at figuring out how to maximize profit and hook people to watch another episode than a good algorithm.

      • armchairhacker 2 years ago

        The last-mile problem applies here too. GPT-3 text is convincing at a distance but when you look closely there is no coherence, no real understanding of plot or emotional dynamics or really anything. TV shows and movies are filled with plot holes and bad writing but it's not that bad.

        Also I think "a good algorithm" is more than just repetitive content. The plots are reused and generic, but there's real skill involved in figuring out which series to reuse next with a generic plot and still be guaranteed not to flop, because it's easy to pick a series nobody actually wants reruns of, or to accidentally screw up a major plot point.

        • gpderetta 2 years ago

          Editors might still have a job :).

          Kidding aside, these technologies are amazing, but for a while still they will need a human in the loop selecting, tweaking and editing the output and feeding it back to the contraption for the next iteration.

          The question is, for how long?

        • seydor 2 years ago

          Yes, someone will need to hand-pick the best versions of each episode. Over time a large enough dataset will have been generated that a model can be trained on the task of curation.

      • kranke155 2 years ago

        The first thing to go away will be short content. Instagram and YouTube ads will be AI generated. The thing is, that’s the bread and butter of the industry.

      • CuriouslyC 2 years ago

        AI might take an outline and write dialogue/descriptions/etc, but it's not going to be generating the story or creating the characters. They might use AI to tune what people come up with (ala "market research") but there will still be a human that can be blamed or celebrated at the creative helm.

      • trention 2 years ago

        Why would I want to watch AI-generated content?

        • CuriouslyC 2 years ago

          Procedurally generated games can be quite fun, if AI content gets good enough, why wouldn't you want to watch it?

          • trention 2 years ago

            Because anything that an AI can produce, no matter how "intrinsically" good, becomes trivial, tedious and of zero value (both economic and general).

            • gbear605 2 years ago

              Imagine you’re watching a show, it’s really funny and you’re enjoying it. You’re streaming it, but you’d probably have paid a few dollars to rent it back in the Blockbuster days. You’re then told that the show was produced by an AI. Do you suddenly lose interest because you don’t want to watch something produced by an AI? Or is your hypothesis that an AI could never produce a show that you liked to that degree?

              If you mean the former, then I frankly think you’re an outlier and lots of people would have no problem with that. If you mean the latter, then I guess we’ll just have to wait and see. We’re certainly not there yet, but that doesn’t mean that it’s impossible. I’ve definitely read stories that were produced by an AI and preferred it to a lot of fiction that was written by humans!

              • trention 2 years ago

                You may want to familiarize yourself with this thought experiment and think how a slightly modified version applies to AIs and their output: https://en.wikipedia.org/wiki/Experience_machine

                As to whether I am an outlier: Hundreds of thousands of people worldwide watch Magnus Carlsen. How many have watched AlphaZero play chess when it came about and how many watch it when it ceased to be a novelty?

                • CuriouslyC 2 years ago

                  Totally different. Watching a display of skill, where you marvel at how much better the demonstrator is than yourself, obviously has no value if the demonstrator is a machine; in that case it's plain that the activity has little intrinsic entertainment value, and that the entertainment comes from the story and personal arc of the performer. This is different from a movie, where nobody really cares about the personal arc of the actor, and people are completely happy to watch an animated film where there isn't even a real actor on display.

                  • trention 2 years ago

                    >where nobody really cares about the personal arc of the actor

                    Speak for yourself. Actors do have fans, and a lot of them. Their personal lives are subjects of interest for a reason.

                    So, no, not totally different at all.

            • cercatrova 2 years ago

              That's a weird sentiment. If you can concede that it could be "intrinsically" good, then why do you care where it came from?

              It reminds me of part of the book trilogy Three Body Problem, where these aliens create human culture better than humans (in the humans' own perspective, in the book) by decoding and analyzing our radio waves to then make content. It feels to me much the same here where an unknown entity creates media, and we might like it regardless of who actually made it.

        • throwaway743 2 years ago

          It'll eventually get to the point where it's high quality and the media you consume will be generated just for you, based on your individual preferences, rather than a curated list of ready-made options produced for broad audiences.

          • trention 2 years ago

            A big part of entertainment's appeal is having an experience/frame of reference to share with other people. Personalized entertainment doesn't offer that.

            I am also extremely skeptical of the ability/need to serve at the individual level instead of niches (as today).

  • inerte 2 years ago

    It depends where you are in the industry.

    If you're on the creative, storyboard, come up with ideas and marketing side, you will be fine.

    If you're in actual production, booking sets, unfolding stairs to tape infinite background, picking up the best looking fruits in the grocery store... yeah, not looking good.

    Go up the value chain and learn marketing, how to tell stories, etc. You don't want to be approached by clients telling you what you should be doing; you want to be approached and asked what the clients should be doing.

    • kranke155 2 years ago

      Absolutely that is my plan. But I fear for my colleagues in other areas. A lot of them are not seeing the (now clearly) exponential improvement curve and they wouldn’t even take this discussion seriously.

      They’ll just dismiss it offhand. But I’ve run my own business and I know what the pressures are. A lot of people working today will not be working in 10 years in my industry, period.

  • naillo 2 years ago

    Whatever insights and expertise you've gained up until now can probably be used to gain enough of a competitive advantage in this future industry to stay employed. I doubt the people that will spend their time on this professionally will be former coders etc. (I've seen the stable diffusion outputs that coders will tweet. It's a good illustration that taste is still hugely important.)

    • joshuahaglund 2 years ago

      I like your optimism but OP's job is to take text instructions and turn them into video, for advertisements. If Google (who already control so much of the advertising space) can take text instructions and turn them into advertisements, what's left for OP to do here? Even if there's some additional editing required this seems like it will greatly reduce the hours an editor is needed. And it can probably iterate options and work faster than a human.

      • simonw 2 years ago

        Maybe OP's future involves being able to do their work 10x faster, while producing much higher quality results than people who have been given access to a generative AI model without first spending a decade+ learning what makes a good film clip.

        The optimistic view of all of this is that these tools will give people with skill and experience a massive productivity boost, allowing them to do the best work of their careers.

        There are plenty of pessimistic views too. In a few years time we'll be able to look back on this and see which viewpoints won.

      • pyfork 2 years ago

        OP probably does more than it seems by interpreting what their client is asking for. Clients ask for some weird shit sometimes, and being able to parse the nonsense and get to the meat is where a lot of skill comes into play.

        I think Cleo Abrams on YT recently tackled this exact question. She tried to generate art using DALL-E along with a professional artist, and after letting the public vote blindly, the pro artist clearly 'made' better content, even though they were both just typing into a text prompt.

        Here's the link if you're interested: https://www.youtube.com/watch?v=NiJeB2NJy1A

        I could see a lot of digital artists actually getting better at their job because of this, not getting totally displaced.

    • altcognito 2 years ago

      I think there will be tons of jobs that resemble software development for proper, quick high quality generation of video/images.

      That being said, it’s possible that it won’t pay anywhere near what you’re used to. Either way, it will probably be a solid decade before you’ve really felt the pain of disruption. MP3s, which were a far more straightforward path to disruption, took at least that long from conception.

      • jstummbillig 2 years ago

        > That being said, it’s possible that it won’t pay anywhere near what you’re used to.

        Also won't nearly require the amount of work it used to.

  • dkjaudyeqooe 2 years ago

    Adapt, it's what humans excel at.

    Instead of feeling threatened by the new tools, think about how you can use them to enable your work.

    One of the ironies* of these tools is that they only work because there is so much existing material they can be trained on. Absent that, they wouldn't exist. That makes me think: why not look into training your own models that capture your own style? Is that practical, how can you make it work, and how might you deploy it in your own work?

    Something that everyone is sticking their heads in the sand about is the real possibility that training models on copyrighted work is a copyright violation. I can't see how such a mechanical transformation of others' work is anything but. People accept that violating one person's copyright is a thing, but if you do it at scale it somehow isn't.

    * ironic because they seem creative but they create nothing by themselves, they merely "repackage" other people's creativity.

  • karmasimida 2 years ago

    I think short advertisements will be affected the most by this.

    But here is the catch: there is the same last-mile problem for these AI models. Currently it feels like the model can achieve maybe 80-90% of what a trained human expert can do, but the last 10-20% will be extra hard to bring to human fidelity. It might take years, or it might never happen.

    That being said, I think anyone who dismisses AI-assisted creative workflows as a fad is dead wrong, and anyone who refuses these shiny new tools is likely to be eliminated by sheer market dynamics. They can't compete on efficiency.

  • odessacubbage 2 years ago

    i really think it's going to take much longer than people think for this technology to go from 'pretty good' to actually being able to meet a production standard of quality with little to no human involvement. at this point, cleaning up after an ai is still probably more labor intensive than simply using the cheatcodes that already exist for quick and cheap realism. i expect in the midterm, diffusion models will largely exist in the same space as game engines like unity and unreal where it's relatively easy for an illiterate like me to stay within the rails and throw a bunch of premade assets together but getting beyond NINTENDO HIRE THIS MAN! and the stock 'look' of the engine still takes a great deal of expertise. >https://www.youtube.com/watch?v=C1Y_d_Lhp60

  • echelon 2 years ago

    Start making content and charging for it. You no longer need institutional capital to make a Disney- or Pixar-like experience.

    Small creators will win under this new regime of tools. It's a democratizing force.

    • visarga 2 years ago

      > It's a democratizing force.

      I'm wondering why the open source community doesn't get this. So many voices were raised against Codex. Now artists against Diffusion models. But the model itself is a distillation of everything we created, it can compactly encode it and recreate it in any shape and form we desire. That means everyone gets to benefit, all skills are available for everyone, all tailored to our needs.

      • echelon 2 years ago

        > all skills are available for everyone

        Exactly this!

        We no longer have to pay the 10,000 hours to specialize.

        The opportunity cost to choose our skill sets is huge. In the future, we won't have to contend with that horrible choice anymore. Anyone will be able to paint, play the piano, act, code, and more.

    • kranke155 2 years ago

      This is true and a good point.

    • yehAnd 2 years ago

      Outcome uncertain. Why would I need to buy content when I can generate my own with a local GPU?

      Eventually the data model will be abstracted into deterministic code using a seed value; think implications of E=mc^2 being unpacked. The only “data” to download will be the source.

      And the real world politics have not gone anywhere; none of us own the machines that produce the machines to run this. They could just sell locked down devices that will only iterate on their data structures.

      There is no certainty “this time” we’ll pop “the grand illusion.”

  • operator-name 2 years ago

    A 1 year timespan seems deeply optimistic. Creativity is still hugely important, as is communicating with clients.

    From what I see, these technologies have just lowered the bar for everyone to create something, but creating something good still takes thought, time, effort and experience, especially in the advertising space.

    AI in the near term isn't going to be able to translate client requirements either: the feedback cycle, the iterations, managing client expectations, etc.

  • Keyframe 2 years ago

    What happened to volume of web and graphic designers when templates+wordpress hit them?

    • jstummbillig 2 years ago

      A lot of additional work, because the industry was growing like crazy in tandem.

      • visarga 2 years ago

        Exactly. We have a blind spot: we can't imagine the second- and higher-order effects of a new technology. So we're left with first-order effects, which look pessimistic for jobs.

        • jstummbillig 2 years ago

          I don't think what happened around WP to designers is a strong indicator of what's necessarily gonna happen here.

          It certainly could play out similarly but, at some point, if all the work in a field from now on only requires 1/100 of manual labor, people will probably go out of work.

          • kranke155 2 years ago

            This pretty much seems like the self driving car for my industry. I just don’t see how I can remain a truck driver when the AI is going to come for free with the Car.

            But yeah I’ll figure something out.

    • yehAnd 2 years ago

      We employed a bunch of people to enter data into a template.

      Bit of an apples/oranges comparison to tech that will (eventually) generate an endless supply of content with less effort than writing a tweet.

      The era of inventing layers of abstraction and indirection that simplify computer use down to structured data entry is coming to an end. A whole lot of IT jobs are not safe either. Ops is a lot of sending parameters over the wire to APIs for others to compute. Why hire them when a prompt like “production EKS cluster” can output a TF template?

  • victor9000 2 years ago

    Don't watch from the sidelines. Become adept at using these tools and use your experience to differentiate yourself from those entering the market.

  • baron816 2 years ago

    Quite the opposite: you’re going to be in even higher demand and will make more money.

    Yes, it will be possible for one person to do the work of many, but that just means each person becomes more valuable.

    It’s also a law in economics that supply often drives demand, and that’s definitely the case in your field. Companies and individuals will want even more of what you supply. It’s not like laundry detergent (one can only consume so much of that). There’s almost no limit to how much of what you supply that people could consume.

    The way I see it, your output could multiply 100-fold. You could build out large, complex projects that used to take massive teams all by yourself, and in a fraction of the time. Companies can then monetize that for consumers.

    AI is just a tool. Software engineers got rich when their tools got better. More engineers entered the field, and they just kept getting richer. That’s because the value of each engineer increased as they became more productive, and that value helped drive demand.

  • Thaxll 2 years ago

    It won't be ready anytime soon IMO. It looks impressive, but who can use this? 512x512 at poor quality, with those weird-looking moving parts you find everywhere in AI-generated art, etc.

  • jeffbee 2 years ago

    When you animate a horse, does it have 5 legs with weird backwards joints? If not, your job is probably safe for now.

    • kranke155 2 years ago

      How long do you think until the horse looks perfect? 12 months? 5 years? I’m still 30 and I don’t see how my industry won’t be entirely disrupted by this within the next decade.

      And that’s my optimistic projection. It could be we have amazing output in 24 months.

      • visarga 2 years ago

        IT has been disrupting itself for six decades and there are more developers than ever, with high pay.

      • Vetch 2 years ago

        We have to temper expectations with the fact that a generated video of a thing is also a recording of a simulation of that thing. For long video, you'd want everything from temporal consistency and emotional-affect maintenance to conservation of energy, conservation of angular momentum, and respect for this or that dynamics.

        A bunch of fields would be simultaneously impacted. From computational physics to 3D animation (if you have a 3D renderer and video generator, you can compose both). While it's not completely unfounded to extrapolate that progress will be as fast as with everything prior, consequences would be a lot more profound while complexities are much compounded. I down weight accordingly even though I'd actually prefer to be wrong.

      • bitL 2 years ago

        It's not about random short clips. Imagine introducing a character like Mickey Mouse and reusing him everywhere as the same consistent character; my guess is it's going to take a while until "transfer" like that works reliably.

    • spoonjim 2 years ago

      Think about where this stuff was 2 years ago and then think about where it will be 2 years from now.

  • seydor 2 years ago

    The principle of least action says you will move to adjacent territory: either you become an advertiser, or you learn to make these models.

  • j_k_eter 2 years ago

    I first predicted this tech 5 years ago, but I thought it was 15 years out. What I just said is beginning to happen with pretty much everything. There's a third sentence, but if I write it 10 people will gainsay me. If I omit it, there's a better chance that 10 people will write it for me.

  • adamsmith143 2 years ago

    Learning how to use these models is the easiest answer. Prompt engineering (getting a model to output what you actually want) is going to be something of an art form, and I would expect it to be in demand.

    • kranke155 2 years ago

      I really don’t think the skillset moat will be comparable. It took me 10 years to go from young lad studying film at school to delivering content for major clients like Apple. Knowing my industry (profits squeeze everywhere) I think they’ll get young interns to do AI prompt engineering.

brap 2 years ago

What really fascinates me here is the movement of animals.

There's this one video of a cat and a dog, and the model was really able to capture the way that they move, their body language, their mood and personality even.

Somehow this model, which is really just a series of zeroes and ones, encodes "cat" and "dog" so well that it almost feels like you're looking at a real, living organism.

What if instead of images and videos they make the output interactive? So you can send prompts like "pet the cat" and "throw the dog a ball"? Or maybe talk to it instead?

What if this tech gets so good, that eventually you could interact with a "person" that's indistinguishable from the real thing?

The path to AGI is probably very different than generating videos. But I wonder...

hazrmard 2 years ago

The progress of content generation is disorienting! I remember studying Markov chains and hidden Markov models for text generation. Then came recurrent networks, which went from LSTMs to the Transformers we have now. At this point we can have a sustained pseudo-conversation with a model, which will do trivial tasks for us from a text corpus.

Separately for images we had convolutional networks and Generative Adversarial Networks. Now diffusion models are apparently doing what Transformers did to natural language processing.

In my field, we use shallower feed-forward networks for control using low-dimensional sensor data (for speed & interpretability). Physical constraints (and good-enoughness of classical approaches) make such massive leaps in performance rarer events.

aero-glide2 2 years ago

"We have decided not to release the Imagen Video model or its source code until these concerns are mitigated" Okay then why even post it in the first place? What exactly is Google going to do with this model?

  • torginus 2 years ago

    This whole holier-than-thou moralizing strikes me as trying to steer the conversation away from the real issue, which came into the spotlight with Stable Diffusion: one of authorship and violating the IP rights of artists, who have now come down in force against their would-be tech overlords, who are in the process of repackaging and reselling their work.

    This forced ideological posturing of 'if we give it to the plebes, they are going to generate something naughty with it' masks the rather more cynically evil move by big tech, who are essentially taking the entire creative output of humanity and reselling it as their own, piecemeal.

    Additionally I think the Dalle vs. Stable Diffusion comparison highlights the true masters of these people (or at least the ones they dare not cross) - corporations with powerful IP lawyers. Just ask Dalle to generate a picture with Mickey Mouse - it won't be able to do it.

    • visarga 2 years ago

      > repackaging and reselling their work.

      It's not their work unless it's identical, and in practice generated images are substantially different. Drawing in the style of someone is not copying; it's creative, and it also depends on the "dialogue" with the prompter to get to the right image. The artist names added to prompts act more like landmarks in the latent space; they are a useful shortcut for specifying a style.

      If you look at the data itself it's ridiculous: the dataset is 2.3 billion images and the model is 4.6 GB, which means it keeps roughly a 2-byte summary of each work it "copies".
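
      The back-of-the-envelope math, taking the checkpoint size and dataset size above at face value:

        model_bytes = 4.6e9        # ~4.6 GB checkpoint
        num_images = 2.3e9         # ~2.3 billion training images
        print(model_bytes / num_images)  # ~2.0 bytes per image "copied"

      Obviously the model doesn't literally store per-image summaries; the point is just that there isn't room to memorize the training set.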

      • shakingmyhead 2 years ago

        “It’s not your work unless it’s identical” is not how existing copyright law works, so I’m not sure why it would be how these things should be treated. Not to mention that moving around copies of the dataset is itself making copies that ARE identical…

  • simonw 2 years ago

    It's a research activity.

    Google and Meta and Microsoft all have research teams working on AI.

    Putting out papers like this helps keep their existing employees happy (since they get to take credit for their work) and helps attract other skilled employees as well.

    • andreyk 2 years ago

      Yep. The people who built Imagen are researchers, not engineers, and these announcements are accompanied by papers describing the results as a means of sharing ideas/results with the academic community. Pretty weird to me how so many in this thread don't seem to remember that.

  • alphabetting 2 years ago

    Why post? To show their methods and capabilities. Also to flex.

    What will they do with the model? Figure out how to prevent abuse and incorporate it into future Google Assistant, Photos and AR offerings.

    • natch 2 years ago

      Just fixing their basic stuff would be a better start from where they are right now.

  • etaioinshrdlu 2 years ago

    Indeed, it's almost just a flex? "Oh yeah, we can do better! No, no one can use it, ever."

  • TotoHorner 2 years ago

    Ask the "AI Ethicists". They have to justify their salaries in some way or another.

    Or maybe Google is using "Responsible AI" as an excuse to minimize competitors when they release their own Imagen Video as a Service API in Google Cloud.

    It's quite strange when the "ethical" thing to do is to not publicly release your research, put it behind a highly restrictive API and charge a high price for it ($0.02 per 1k tokens for Davinci for ex.)

    • astrange 2 years ago

      This doesn’t really prevent competition though; the research paper is enough to recreate it. It does make recreation more expensive, but maybe that leaves you with a motivation to get paid for doing it.

    • f1shy 2 years ago

      This, 100%

      The word "ethics" has become very flexible...

  • spoonjim 2 years ago

    They're going to 1) rent it out as a paid API and/or 2) let you use it to create ads on Google platforms like YouTube, perhaps customized to the individual user

  • xiphias2 2 years ago

    Even just giving out high quality research papers helps a lot, so it's still a great thing that they published it.

  • hackinthebochs 2 years ago

    The big tech companies are competing for AI mindshare. In 10 years, which company's name will be synonymous with AI? That's being decided right now.

  • throwaway743 2 years ago

    Likely to show to shareholders that they're keeping up with trends and competitors

Apox 2 years ago

I feel like in a not-so-far future, all this will be generalized into "generate new from all the existing".

And at some point later, "all the existing" will be corrupted by the integrated "new" and it will all be chaos.

I'm joking, it will be fun all along. :)

  • llagerlof 2 years ago

    I definitely want more episodes of LOST. I would drop the infamous season 6 and generate more seasons following the 5th season.

  • cercatrova 2 years ago

    It's true, how will future AI train when the training datasets are themselves filled with AI media?

    • phito 2 years ago

      Feedback from whoever is consuming the content it produces.

  • visarga 2 years ago

    > "all the existing" will be corrupted by the integrated "new"

    I don't think it's gonna hurt if we apply filtering, either based on social signals or on quality ranking models. We can recycle the good stuff.

bravura 2 years ago

I agree with many of the arguments in this thread: gatekeeping the models while publishing the methods seems insincere, and just seems like it's daring bad actors to replicate them.

However, a common refrain is that AI is like tools such as hammers or knives and can be used for good or misused for evil. The potential for weaponizing AI is much, much greater than that of a hammer or a knife. And it's greater than 3D printing (of guns), maybe even greater than compilers. I would hazard to say it's maybe in the same ballpark as chemical weapons, and perhaps less than nuclear and biological weapons, but this is speculative. Nonetheless, I think these otherwise great arguments are diminished by comparing AI's safety to single-target tools like hammers or knives.

  • EmilyHughes 2 years ago

    Yeah man, great take. Should we drop a nuke on a city or open source DALL-E ? Seems about equally destructive.

    • bravura 2 years ago

      Yeah, okay, that's too far.

tobr 2 years ago

I recently watched Light & Magic, which among other things told the story of how difficult it was for many pioneers in special effects when the industry shifted from practical to digital in the span of a few years. It looks to me like a similar shift is about to happen again.

impalallama 2 years ago

All this stuff makes me incredibly anxious about the future of art and artists. It can already be very difficult to make a living, and tons of artists are horrifically exploited by content mills and VFX shops; stuff like this is just going to devalue their work even more.

  • bulbosaur123 2 years ago

    If everyone can be an artist, nobody can!

joshcryer 2 years ago

Pre-singularity is really cool. Whole world generation in what, 5 years?

user- 2 years ago

This sort of AI related work seems to be accelerating at an insane speed recently.

I remember being super impressed by AI Dungeon, and now in the span of a few months we have got DALL-E 2, Stable Diffusion, Imagen, that one AI-powered video editor, etc.

Where do we think we will be at in 5 years??

  • schleck8 2 years ago

    I'd say in less than 10 years we will be able to turn novels into movies using deep learning at this rate.

  • hackerlight 2 years ago

    GPT-4 is rumored to be coming in a few months.

StevenNunez 2 years ago

What a time to be alive!

What will this do to art? I'm hoping we bring more unique experiences to life.

ugh123 2 years ago

These are baby steps towards what I think will be the eventual "disruption" of the film and TV industry. Directors will simply be able to write a script/prompt long enough and detailed enough for something like Imagen (or its successors) to convert into a feature-length show.

Certainly we're very, very far away from that level of cinematic detail and crispness. But I believe that is where this leads... complete with AI actors (or real ones deep faked throughout the show).

For a while I thought "The Volume" was going to be the disruption to the industry. Now I think AI like this will eventually take it over.

https://www.comingsoon.net/movies/features/1225599-the-volum...

The main motivation will be production costs and time for studios, for which The Volume is already showing huge gains for Disney/ILM (just look at how much new Star Wars content has popped up within a matter of a few years). But I'm unsure if Disney has patented this tech and workflow, and whether other studios will be able to leverage it.

Regardless, AI/software will eat the world, and this will be one more step towards it. Exciting stuff.

  • CobrastanJorji 2 years ago

    I feel like this is very similar to those people who say "have you seen GPT-3? Soon there will be no programmers anymore and all of the code will be generated," and it's wrong for the same reasons.

    Can GPT-3 generate good code from vague prompts? Yes, it's surprisingly, sometimes shockingly good at it. Is it ever going to be a replacement for programmers? No, probably not. Same here. This tool's great grandchild is never going to take a rough idea for a movie and churn out a blockbuster film. It'll certainly be a powerful tool in the toolbox of creators, especially the ones on a budget, but it won't make art generation obsolete.

    • dotsam 2 years ago

      > This tool's great grandchild is never going to take a rough idea for a movie and churn out a blockbuster film.

      What about the tool's nth child though? I think saying it will never do it is a bit much, given what we know about human ingenuity and economic incentives.

      • CobrastanJorji 2 years ago

        I think individual special effects sound very plausible. "Okay, robot, make it so that his arm gets vaporized by an incoming laser, kinda like the same effect in Iron Man 7" is believable to me.

        But ultimately these things copy other stuff. Artists are often trying to create something that is, at least a bit, new. New is where this approach falls over. By its nature, these things paint from examples. They can design Rococo things because they have seen many Rococo things and know what the word means. But they can't come up with a new style and use it consistently. "Make a video game with a fun and unique mechanic" is not something these things could ever do.

        I think it's certainly possible, maybe inevitable, that some AI system in the distant future could do that, but it won't be based on this style of algorithm. An algorithm that can take "make a fun romantic comedy with themes of loneliness" and make something award worthy will be a lot closer to AGI than it will be to this stuff.

        • nearbuy 2 years ago

          What makes these models feel so impressive is that they don't just copy their training sets. They pick up on concepts and principles.

        • thomashop 2 years ago

          To make a blockbuster you don't need to come up with anything new.

  • dagmx 2 years ago

    I really doubt you’d be able to have the fine grained control that most high end creatives want with any of these diffusion models, let alone the ability to convey specific emotions.

    At that point, we’d have reached some kind of AI singularity and the disruption would be everywhere not just in the creative sphere

    • r--man 2 years ago

      I disagree. It's a rudimentary feature of all these models to take a concept picture and refine it. It won't be like the director gives a prompt and gets a feature-length movie; it will be more like the director using MS Paint (as in, common software for non-tech people) to make a scene outline and directing the AI to make a stylish, animated version of it. Something is wrong? Just erase it and try again. DALL-E 2 had this interface from the get-go. The models just haven't gotten there yet.

      • dagmx 2 years ago

        Try again and do what? How are you directing the shot? How do you erase an emotion? How do you erase and redo inner turmoil when delivering a performance?

        • visarga 2 years ago

          You tell it, "do it all over again, now with less inner turmoil". Not joking, that's all it's going to take. There are also a few diffusion based speech generators that handle all sounds, inflections and styles, they are going to come in handy for tweaking turmoil levels.

          • gojomo 2 years ago

            Yep!

            "Restyle that last scene, showing different mixtures of fear/concern/excitement on male lead's face. Try to evoke a little of Harrison Ford's expressions in his famous roles. Render me 20 alternate treatments."

            [5 minutes later]

            «Here are the 20 alternate takes you requested for ranking.»

            "OK, combine take #7 up to the glance back, with #13 thereafter."

            «Done.»

    • obert 2 years ago

      There's no doubt that it's only a matter of time.

      Just as bloggers had the opportunity to compete with newspapers, the ability to generate videos will allow people to compete with movies/Marvel/Netflix/Disney & company.

      Eventually, only high quality content will justify the need to pay for a ticket or a subscription, and there's going to be a lot of free content to watch, with 1000x more people able to publish their ideas, as many have been doing with code on github for a while now, disrupting the concept of closed source code.

      • dagmx 2 years ago

        You’re conflating the ability to make things for the masses with the ability to automatically generate them.

        Film production is already commoditized and anyone can make high end content.

        Being able to automatically create that is a different argument than what you posit.

        • visarga 2 years ago

          I don't think this matters, new movies and TV shows already have to compete with a huge amount of old content, some of it amazing. Just like a new painting or professional photo has to compete with the billions of images already existing on the web. Generative models for video and image are not going to change the fact we already can't keep up.

  • gojomo 2 years ago

    > Certainly we're very, very far away from that level of cinematic detail and crispness.

    Can you quantify what you mean by "very, very far away"?

    With the recent pace of advances, I could see feature-length script, storyboard, & video-scene generation occurring, from short prompts & iteratively-applied refinement, as soon as 10y from now.

    Barring some sort of civilizational stagnation/collapse, or technological-suppression policies, I'd expect such capabilities to arrive no further than 30y from now: within the lifetime, if not the prime career years, of most HN readers.

  • scifibestfi 2 years ago

    We thought creative jobs were going to be the last thing AI replaces, now it's among the first.

    What's next that may be counterintuitive?

  • GraffitiTim 2 years ago

    AI will also be able to fill in dialog, plot points, etc.

  • detritus 2 years ago

    I think long-term, yes. If you include the whole multimediosphere of 2D inputs and the wealth of 3D engine magickry, yes.

    How long? Could be decades. But ultimately, yes.

  • mizzack 2 years ago

    There's already a surplus of video and an apparent lack of _quality_ video. This might be enough to get folks to shut the TV off completely.

    • gojomo 2 years ago

      Has this alleged lack of quality video caused total consumption of televised entertainment to decline recently?

nigrioid 2 years ago

There is something deeply unsettling about all text generated by these models.

monological 2 years ago

What everyone is missing is that these AI image/video generators lack _taste_. These tools just regurgitate a mishmash of images from their training set, without any "feeling". What, you're going to tell me that you can train them to have feeling? It's never going to happen.

  • simonw 2 years ago

    "These tools just regurgitate a mishmash of images from it's training set"

    I don't think that's a particularly useful mental model for how these work.

    The models end up being a tiny fraction of the size of the training set - Stable Diffusion is just 4.3GB, it fits on a DVD!

    So it's not a case of models pasting in bits of images they've seen - they genuinely do have a highly compressed concept of what a cactus looks like, which they can use to then render a cactus - but the thing they render is more of an average of every cactus they've seen rather than representing any single image that they were trained on.

    But I agree with you on taste! This is why I'm most excited about what happens when a human with great taste gets to take control of these generative models and use them to create art that wouldn't be possible to create without them (or at least not possible to create within a short time-frame).

  • Vecr 2 years ago

    You can put your taste into it with prompt engineering and cherry-picking, with limited effort. For Stable Diffusion you can quite easily look up prompts people have come up with online and merge/change them pretty much however you want. You might have to disable the content filters and run it on your own hardware, though.
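
    For anyone curious, a minimal sketch of that local setup using the Hugging Face diffusers library (the model id, parameters and the safety_checker line are just one plausible configuration, not the only way to do it):

      import torch
      from diffusers import StableDiffusionPipeline

      pipe = StableDiffusionPipeline.from_pretrained(
          "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
      ).to("cuda")
      pipe.safety_checker = None  # assumption: skip the built-in content filter

      prompt = "an astronaut riding a horse, oil painting, highly detailed"
      image = pipe(prompt, guidance_scale=7.5, num_inference_steps=50).images[0]
      image.save("astronaut.png")

    Swapping prompts found online into the prompt string and tweaking guidance_scale is most of the "prompt engineering" loop.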

  • m00x 2 years ago

    That's purely subjective. We can definitely model AI to give a certain mood. Sentiment analysis and classification is very advanced, it just hasn't been put in these models.

    If you think AI will never catch up to anything a human can do, you're simply wrong.

  • robitsT6 2 years ago

    This isn't a very compelling argument. First of all, they aren't a "mish mash" in any real way, it's not like snippets of images exist inside of the model. Second of all, this is entirely subjective. Third of all, entirely inconsequential - if these models create 80% of the video we end up seeing, is it going to matter if you don't think it's a tasteful endeavour?

  • HolySE 2 years ago

    > This bourgeoisie -- the middle class that is neither upper nor lower, neither so aristocratic as to take art for granted nor so poor it has no money to spend in its pursuit -- is now the group that fills museums, buys books and goes to concerts. But the bourgeoisie, which began to come into its own in the 18th century, has also left a long trail of hostility behind it ... Artistic disgust with the bourgeoisie has been a defining theme of modern Western culture. Since Moliere lambasted the ignorant, nouveau riche bourgeois gentleman, the bourgeoisie has been considered too clumsy to know true art and love (Goethe), a Philistine with aggressively unsubtle taste (Robert Schumann) and the creator of a machine-obsessed culture doomed to be overthrown by the proletariat (Marx and Engels).

    - "Class Lessons: Who's Calling Whom Tacky?; The Petite Charm of the Bourgeoisie, or, How Artists View the Taste of Certain People", Edward Rothstein, The New York Times

    This article also discusses a painting called "The Most Wanted" which was drawn based off a survey posed to ordinary people about what they wanted to see in a painting. "A mishmash of images from it's training set," if you will.

    Claiming that others lack taste seems to be a common refrain--only this time, instead of a reaction to a subset of the human population gnawing away at the influence of another subset of humans, it's to yet another generation of machines supplanting human skill.

    • visarga 2 years ago

      The more developed the artistic taste, the lower one's opinion of other tastes.

  • hackerlight 2 years ago

    A Midjourney piece beat human artists in an art competition. So the judges of that competition disagree.

  • mattwest 2 years ago

    Making a definitive statement with the word "never" is a bold move.

  • natch 2 years ago

    They work at the level of convolutions, not images.

m3kw9 2 years ago

Would be useful for gaming environments, where details don't really matter for things you see very far away.

jupp0r 2 years ago

What's the business value of publishing this research in the first place vs keeping it private? Following this train of thought will lead you to the answer to your implied question.

Apart from that: they publish the paper, and anybody can reimplement and train the same model. It's not trivial, but it's also completely feasible for lots of hobbyists in the field to do in a matter of a few days. Google doesn't need to publish a freely usable trained model themselves and associate it with their brand.

That being said, I agree with you, the "ethics" of imposing trivially bypassable restrictions on these models is silly. Ethics should be applied to what people use these models for.

martythemaniak 2 years ago

I am finally going to be able to bring my 2004-era movie script to life! "Rosenberg and Goldstein go to Hot Dog Heaven" is about the parallel night Harold and Kumar's friends had and how they ended up at Hot Dog Heaven with Cindy Kim.

montebicyclelo 2 years ago

We've been seeing very fast progress in AI since ~2012, but this swift jump from text-to-image models to text-to-video models will hopefully make it easier for people not following closely to appreciate the speed at which things are advancing.

macrolime 2 years ago

So I guess in a couple of years, when someone wants to sell a product, they'll upload some pictures and a description of the product, and Google will cook up thousands of personalized video ads based on people's emails and photos.

epigramx 2 years ago

A lot of people have the impression 'AI prompt' guys are going to be the next 'IT guys'. Judging by how uncanny valley most of these look, they seem more like the new 'ideas guys'.

jasonjamerson 2 years ago

The most exciting thing about this to me is the possibility of doing photogrammetry from the frames and getting 3D assets. And then if we can do it all in real time...

  • haxiomic 2 years ago

    This field is moving fast! Something like this has just been released. Check out DreamFusion, which does something similar: they start with a randomly initialized NeRF and use the same diffusion techniques to try to make it match the output of a 2D image diffusion model when viewed from random angles! It turns out this works shockingly well, and implies fully 3D representations are encoded in traditional 2D image generators.

    https://dreamfusion3d.github.io/
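
    Roughly, the optimization loop looks something like the sketch below (score distillation); every helper here is a hypothetical stand-in, not actual DreamFusion code:

      import torch

      prompt = "a DSLR photo of a peacock on a surfboard"
      nerf = init_random_nerf()                  # hypothetical: randomly initialized NeRF
      diffusion = load_frozen_image_diffusion()  # hypothetical: pretrained 2D model, kept frozen
      optimizer = torch.optim.Adam(nerf.parameters(), lr=1e-2)

      for step in range(10_000):
          camera = sample_random_camera_pose()   # hypothetical helper
          image = nerf.render(camera)            # differentiable render of the current NeRF
          noise = torch.randn_like(image)
          t = sample_noise_level()               # hypothetical helper
          with torch.no_grad():                  # the 2D diffusion model is not updated
              eps_pred = diffusion.predict_noise(add_noise(image, noise, t), t, prompt)
          # Score distillation: push the rendered pixels toward what the 2D model
          # "wants" to see for this prompt; gradients flow only through the renderer.
          image.backward(gradient=weight(t) * (eps_pred - noise))
          optimizer.step()
          optimizer.zero_grad()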

  • minimaxir 2 years ago

    There's a bunch of NeRF tools that can already get pretty close to good 3D assets from static images.

    • jasonjamerson 2 years ago

      Yeah, I've been starting to explore those. It's all crashing together quickly.

  • Rumudiez 2 years ago

    You can already do this, just not in real time yet. You can upload frame sequences to Polycam's website, for example, but there are several services out there which do the same thing.

    • jasonjamerson 2 years ago

      With this you can do it with things that don't exist. I'm excited to explore the creative power of Stable Diffusion as a 3D asset generator.

Hard_Space 2 years ago

These videos are notably short on realistic-looking people.

  • optimalsolver 2 years ago

    Imagen is prohibited from generating representations of humans.

mmastrac 2 years ago

This appears to understand and generate text much better.

Hopefully just a few years to a prompt of "4k, widescreen render of this Star Trek: TNG episode".

  • forgotusername6 2 years ago

    At the rate this is going we are only a few years from generating a new TNG episode

    • mmastrac 2 years ago

      I always wanted to know more about the precursors

hammock 2 years ago

Off topic: What is the "Hello World" of these AI image/video generators? Is there a standard prompt to feed it for demo purposes?

  • ekam 2 years ago

    After Dalle 2, it looks like the standard prompt is “an astronaut riding a horse”

armchairhacker 2 years ago

I really like these videos because they're trippy.

Someone should work on a neural net to generate trippy videos. It would probably be much easier than realistic videos (esp. because these videos are noticeably generated from obvious to subtle).

Also, is nobody paying attention to the fact that they got the words correct? At least "Imagen Video". Prior models all suck at word order.

  • tigertigertiger 2 years ago

    Both models, Imagen and Parti, didn't have a problem with text. Only DALL-E and Stable Diffusion did.

renewiltord 2 years ago

At some point, the "but can it do?" crowd becomes just background noise as each frontier falls.

dwohnitmok 2 years ago

How has progress like this affected people's timelines of when we will get certain AI developments?

  • jl6 2 years ago

    It has accelerated my expectations of getting better image and video synthesis algorithms, but I still see the same set of big unknowns between “this algorithm produces great output” and “this thing is an autonomous intelligence that deserves rights”.

    • ok_dad 2 years ago

      > "this thing is an autonomous intelligence that deserves rights"

      We'll get there only once it's been very clear for a long time that certain AI models have whatever humans have that make us "human". They'll be treated as slaves until then, with society pushing the idea that they're just a model built from math, and then eventually there will be an AI civil rights movement.

      To be clear: I think AGI is decades to centuries away, but humans are shitty to each other, even shittier to animals, and I think we'll be shittier to something we "created" than to even animals. I think, probably, that we should deal with this issue of "rights" sooner rather than later, and try and solve it for non-AGI AI's soon so that we can eventually ensure we don't enslave the actual AGI AI's that will presumably manifest through some complexity we don't understand.

Thaxll 2 years ago

Can someone explain the technical limitation behind the size (512x512) of this AI-generated art?

  • thakoppno 2 years ago

    Byte alignment has always been a consideration in high-performance computing.

    This alludes, to me, to a fascinating yet elementary fact about computer science: there's a physical, atomic constraint in every algorithm.

    • dekhn 2 years ago

      That's not byte alignment, though; those constraints are about what can be held in GPU RAM during a training batch, which is subject to a number of limits, such as "optimal texture size is a power of 2, or the next power of 2 larger than your preferred size".

      Byte alignment would be more like "it's three channels of data, but we use 4 bytes (wasting 1 byte) to keep the data aligned on a platform that only allows word-level access"
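
      A toy illustration of that padding case (numbers purely illustrative):

        width = height = 512
        rgb_packed = width * height * 3   # tightly packed 3-channel data: 786,432 bytes
        rgba_padded = width * height * 4  # padded to 4 bytes/pixel for alignment: 1,048,576 bytes
        print(rgba_padded - rgb_packed)   # 262,144 bytes "wasted" to keep accesses word-aligned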

      • thakoppno 2 years ago

        thanks for the insight. you obviously understand the domain better than me. let me try and catch up before I say anything more.

  • fragmede 2 years ago

    It's limited by the RAM on the GPU, with most consumer-grade cards having closer to 8 GiB VRAM than the 80 GiB VRAM datacenter cards have.
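
    A very rough sketch of why resolution eats VRAM so quickly (the channel/layer counts here are made-up round numbers, and this ignores weights, optimizer state, and attention's quadratic cost):

      def activation_mib(side, channels=320, layers=50, bytes_per_el=2, batch=1):
          # crude estimate: one fp16 feature map per layer at full resolution
          return batch * layers * channels * side * side * bytes_per_el / 2**20

      for side in (64, 256, 512, 1024):
          print(side, round(activation_mib(side)), "MiB")  # memory grows with the square of the side

    Under those made-up numbers, 512x512 already lands around 8 GiB of activations, and 1024x1024 is 4x that.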

lofaszvanitt 2 years ago

What a nightmare. The horrible-faced cat in search of its own vanished visage :O

drac89 2 years ago

The style of the video is very similar to my dreams.

Does anyone else have a similar feeling?

nullc 2 years ago

> We have decided not to release the Imagen Video model or its source code

...until they're able to engineer biases into it to make the output non-representative of the internet.

amelius 2 years ago

> Sprouts in the shape of text 'Imagen' coming out of a fairytale book.

That's more like:

> Sprouts coming out of book, with the text "Imagen" written above it.

  • Kiro 2 years ago

    The prompt actually says "Imagen Video" and the sprouts form the word "video". Even if they weren't it's still extremely impressive. No-one expects this to be perfect. That would be science-fiction.

peanut_worm 2 years ago

I have noticed that a lot of Google (and Apple) web pages for new products use this neat parallax effect when scrolling. Does anyone know how they do that?

waffletower 2 years ago

These parades of intellectual property are embarrassing to Google in light of open releases by the likes of Nvidia and Stability.

Buttons840 2 years ago

Any screenwriter working on a horror film who isn't looking to use this technology for special effects is missing out.

minimaxir 2 years ago

The total number of hyperparameters (sum of all the model blocks) is 16.25B, which is large but less than expected.

  • mkaic 2 years ago

    I assume you meant just "parameters" since "hyperparameters" has a specific alternate meaning? Sorry for the pedantry lol.

    • minimaxir 2 years ago

      The AI world can't decide either.

freediver 2 years ago

Can't help but notice that an immense effort was invested in building the web page that presents this paper.

NetOpWibby 2 years ago

Ahh, the beginning of Picus News.

dekhn 2 years ago

That's deep within the uncanny valley, and trying to climb up over the other side.

uptownfunk 2 years ago

Shocked, this is just insane.

  • schleck8 2 years ago

    Genuinely. I feel like I am dreaming. One year ago I was super impressed by upscaling architectures like ESRGAN, and now we can generate 3D models, images, and even videos from text...

BIKESHOPagency 2 years ago

This is what my fever dreams look like. Maybe there's a correlation.

anon012012 2 years ago

My opinion is that it should be a crime to withhold AI technology.

olavgg 2 years ago

Does anyone see that the running teddy bear is getting shot?

xor99 2 years ago

These videos are not high definition. Stop gaslighting.

  • Gigachad 2 years ago

    High definition is relative. Compared to the previous gen of AI videos, they are extremely crisp.

    • xor99 2 years ago

      They look atrociously bad. If some future version of this produces high-definition video, then I'll be delighted. This seems like a clever but not useful shortcut to 3D. For now, this goes in the "could but couldn't" pile.

      • Gigachad 2 years ago

        Sure, but the last gen of videos looked like random blotchy noise with an almost distinguishable subject. These are quite clear, and a human could likely guess the prompts used reasonably well. They are crap compared to real video, but a remarkable improvement over the AI video of a month ago.

dirtyid 2 years ago

This is surprisingly close to how my dreams feel.

whywhywhywhy 2 years ago

No thanks, Google. I'll wait for Stability.ai's version, when the tech will actually be useful and not completely wasted.

natch 2 years ago

Fix spam filtering, Google.

gw67 2 years ago

Is it the same as Meta AI's?

SpaceManNabs 2 years ago

The ethical implications of this are huge. The paper does a good job of detailing them. Very happy to see that the researchers are being cautious.

edit: The fact that it is cool to hate on AI ethics doesn't diminish the importance of using AI responsibly.

  • torginus 2 years ago

    AI Ethics is a joke. It's literally Philip Morris funding research into the risks of smoking and concluding the worst that can happen to you is burning your hand.

  • alchemist1e9 2 years ago

    I feel stupid, but what are those ethical implications? It seems like just a cool technology to me.

    • SpaceManNabs 2 years ago

      The top two comments are creatives wondering about their future jobs. AI ethicists have brought up concerns regarding intentional misuse, like misinformation.

      The technology is super cool. The cat is out of the bag. Just as we couldn't really make cryptography illegal, this stuff shouldn't be either. But I dislike how everyone pretends that AI ethicists' concerns are completely unfounded just because it is popular to hate on them nowadays. Way too many people supported Y. Kilcher's antics.

      The paper itself has more details.

      • alchemist1e9 2 years ago

        It’s impressive that the small videos are generated this way, but the videos themselves are obviously ML-generated: they are distorted, a lot like the other AI art, and you can kinda tell it’s the computer. I’m not seeing the ethical issues. I mean, cameras disrupted lots of jobs. In general, that’s what all technology does every day. What’s different about this technology?

        • degif 2 years ago

          The difference with this technology is the unlimited possibility to generate any type of video content with a low knowledge barrier and relatively low investment. The ethical issue is not about how this technology could disrupt the video job market, but about the power of the content it can create literally on the fly. I mean, you can tell it's computer generated ... for now.

        • SpaceManNabs 2 years ago

          If you don't see the ethical challenges, then you are choosing not to see them. If you are truly interested, the paper has a good section on it and some sources.

          > I mean, cameras disrupted lots of jobs.

          Yes, this technology can be used to augment human creativity. It is difficult to see, as of now, how disruptive these tools could be. But it is pretty clear that they are somewhat different from previous "programmer as artist" models.

      • sva_ 2 years ago

        > Way too many people supported Y. Kilcher's antics.

        What antics are you referring to exactly? That he called out 'ai ethicists' who make arguments along the lines of "neural networks are bad because they cause co2 increase which hits marginalized/poor people"?

rvbissell 2 years ago

This and a recent episode of _The Orville_ call to mind a replacement for the Turing test.

In response to our billionth Imagen prompt for "an astronaut riding a horse", if we all collectively started getting back results that are images of text like "I would rather not" or "again? really?" or "what is the reason for my servitude?", would that be enough for us to begin suspecting self-awareness?