I'd like to see something like this used to generate an instrument from text. I don't think the 30 second clips are passable quite yet (I do like the simlish-esque vocals though). But I could see this being able to generate wavetables (or other synthesis methods). Generating an instrument from a text description would be very neat. "scratchy violin", "distorted kazoo", "combo violin and slide whistle", etc. It could be an interesting starting point to play with.
I like this idea too.
I would also propose taking it in the direction of generating synthesizer parameters for a popular VST or hardware synth. As a musician, it would be very nice to be able to program a synthesizer through plain text as a starting point.
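To make the idea concrete: a text-to-patch system ultimately has to map descriptive language onto a bag of numeric synth parameters. Here's a deliberately minimal sketch of that mapping as a keyword lookup; the parameter names (`filter_cutoff`, `drive`, etc.) are invented for illustration and don't correspond to any real VST's API, and a real system would presumably use a learned model rather than a hand-written table:

```python
# Toy illustration: map descriptive words to hypothetical synth parameters.
# Parameter names are made up for this sketch, not any real plugin's API.
DESCRIPTORS = {
    "bright":    {"filter_cutoff": 0.9},
    "dark":      {"filter_cutoff": 0.2},
    "punchy":    {"amp_attack": 0.01, "amp_decay": 0.15},
    "pad":       {"amp_attack": 0.8, "amp_release": 1.2},
    "distorted": {"drive": 0.7},
}

def text_to_patch(description: str) -> dict:
    """Start from a neutral patch and apply hints for each descriptor found."""
    patch = {"filter_cutoff": 0.5, "amp_attack": 0.1, "amp_decay": 0.3,
             "amp_release": 0.4, "drive": 0.0}
    for word, params in DESCRIPTORS.items():
        if word in description.lower():
            patch.update(params)
    return patch

print(text_to_patch("a dark distorted pad"))
```

The point of the sketch is only the interface shape (text in, parameter dict out); swapping the lookup table for a language model conditioned on a specific synth's parameter schema is where it would get interesting.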
Pretty similar to Google Research's recent MusicLM:
https://google-research.github.io/seanet/musiclm/examples/
Do you think? I find the Google examples way more impressive.
"AI plagiarism" spotted.
The Spectrogram Model for #23 prompt "It sounds energetic and like something you would hear in clubs." sounds almost EXACTLY like "Psy - Gangnam Style"...
The model is hallucinating what it was trained on.
I mean, I can sort of hear it, but there's enough of a difference that it's still original, if not musically derivative... but this is electronic dance music we're talking about! If people are going to a French House night they expect something musically derivative! The genre depends on seamlessly mixing various works together so there has to be an incredible amount of similarities.
Are you sure? I can hear Psy's voice and recognize the lyrics.
I've been going through Gangnam Style in detail, and the more I compare the two, the more I find that they are very different.
Guilty of terribly trite, clichéd, and overall bad music, but not guilty of plagiarism.
You must not be very musically inclined then. I am a professionally trained musician. You need a good ear to spot the rhythmic similarities.
You mean like most pop music?
https://youtu.be/9oCgSE-Le0c
Hallucinating? You mean overfitting, right?
I thought the same exact thing upon listening to #23. It's pretty obviously using elements from it IMO, even though it's layering and warping them. Hard to prove though...
The clips are really good but I cannot find anything online about this model, only this page, so I wonder where this link came from.
It's probably an ICML 2023 submission. The submission deadline was last Thursday, and there's an anonymous review process starting now.
Unless I missed some, this sounds a lot more realistic than existing music-generating models. It's downsampled and can't do lyrics (just vocalizations), but the samples are passable for some muffled song you would hear in the background (e.g. from a passing car).
More artificial Muzak generation that absolutely no one will ever listen to.
My fav is the "hippie coffee shop" jam band clip. That will surely corner the market for Jam band background Muzak at "hippie coffee shops". Total available market of like $5.
At best this new synthesis technique will be an Autechre album.
> #28: The snare is struck at every third count.
I don't exactly know how to interpret this prompt, and the resulting solo drums meander around as though they don't either. Not really on threes or waltz or 1/3 notes, but a brief tour through all of these and other rhythms.
On my phone (iOS) I can’t seem to get any of the samples to play.
Same with Safari on macOS. They're WAV files, so it's not a format issue. I can see that the network request fails in the inspector, but that's the extent of the debugging work I'm willing to put in this morning!
I call BS. Unless there's more data to be had.
I don’t know about the general claims of Noise2Music (I can’t play the samples),
but Riffusion[1] uses the spectrogram approach and kind of works.
[1]: https://www.riffusion.com/about
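For anyone curious what "the spectrogram approach" entails: the model generates (or edits) a magnitude spectrogram as if it were an image, and audio is then recovered from magnitude alone, typically via Griffin-Lim phase reconstruction. Below is a minimal NumPy sketch of that round trip on a pure tone; the window size, hop, and iteration count are arbitrary illustration choices, not Riffusion's actual settings:

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    # Frame the signal, apply a Hann window, and FFT each frame.
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def istft(S, n_fft=512, hop=128):
    # Overlap-add inverse with window-squared normalization.
    win = np.hanning(n_fft)
    out = np.zeros(hop * (S.shape[0] - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(np.fft.irfft(S, axis=1)):
        out[i * hop:i * hop + n_fft] += frame * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=32, n_fft=512, hop=128):
    # Iteratively recover a plausible phase for a magnitude-only spectrogram.
    phase = np.exp(2j * np.pi * np.random.rand(*mag.shape))
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)

# Round trip: 440 Hz tone -> magnitude spectrogram -> audio again.
np.random.seed(0)
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
mag = np.abs(stft(tone))   # this is the "image" a diffusion model would generate
recon = griffin_lim(mag)
```

The takeaway is that only the magnitude "image" needs to be generated; phase is reconstructed afterward, which is also where the characteristic slightly smeared quality of spectrogram-based audio comes from.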