I'd like to see something like this used to generate an instrument from text. I don't think the 30 second clips are passable quite yet (I do like the simlish-esque vocals though). But I could see this being able to generate wavetables (or other synthesis methods). Generating an instrument from a text description would be very neat. "scratchy violin", "distorted kazoo", "combo violin and slide whistle", etc. It could be an interesting starting point to play with.
I like this idea too.
I would also propose taking it in the direction of generating synthesizer parameters for a popular VST or hardware synth. As a musician, it would be very nice to be able to program a synthesizer through plain text as a starting point.
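To make the idea concrete: a text-to-patch system ultimately has to map descriptive language onto a bag of numeric synth parameters. Here's a deliberately minimal sketch of that mapping as a keyword lookup; the parameter names (`filter_cutoff`, `drive`, etc.) are invented for illustration and don't correspond to any real VST's API, and a real system would presumably use a learned model rather than a hand-written table:

```python
# Toy illustration: map descriptive words to hypothetical synth parameters.
# Parameter names are made up for this sketch, not any real plugin's API.
DESCRIPTORS = {
    "bright":    {"filter_cutoff": 0.9},
    "dark":      {"filter_cutoff": 0.2},
    "punchy":    {"amp_attack": 0.01, "amp_decay": 0.15},
    "pad":       {"amp_attack": 0.8, "amp_release": 1.2},
    "distorted": {"drive": 0.7},
}

def text_to_patch(description: str) -> dict:
    """Start from a neutral patch and apply hints for each descriptor found."""
    patch = {"filter_cutoff": 0.5, "amp_attack": 0.1, "amp_decay": 0.3,
             "amp_release": 0.4, "drive": 0.0}
    for word, params in DESCRIPTORS.items():
        if word in description.lower():
            patch.update(params)
    return patch

print(text_to_patch("a dark distorted pad"))
```

The point of the sketch is only the interface shape (text in, parameter dict out); swapping the lookup table for a language model conditioned on a specific synth's parameter schema is where it would get interesting.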
Pretty similar to Google Research's recent MusicLM:
https://google-research.github.io/seanet/musiclm/examples/
Do you think? I find the Google examples way more impressive.
"AI plagiarism" spotted.
The Spectrogram Model for #23 prompt "It sounds energetic and like something you would hear in clubs." sounds almost EXACTLY like "Psy - Gangnam Style"...
The model is hallucinating what it was trained on.
I mean, I can sort of hear it, but there's enough of a difference that it's still original, if not musically derivative... but this is electronic dance music we're talking about! If people are going to a French House night they expect something musically derivative! The genre depends on seamlessly mixing various works together so there has to be an incredible amount of similarities.
Are you sure? I can hear Psy's voice and recognize the lyrics.
I've been going through Gangnam Style in detail, and the more I compare the two, the more I find that they are very different.
Guilty of terribly trite, clichéd, and overall bad music, but not guilty of plagiarism.
You must not be very musically inclined then. I am a professionally trained musician. You need a good ear to spot the rhythmic similarities.
You mean like most pop music?
https://youtu.be/9oCgSE-Le0c
Hallucinating? You mean overfitting, right?
I thought the same exact thing upon listening to #23. It's pretty obviously using elements from it IMO, even though it's layering and warping them. Hard to prove though...
The clips are really good but I cannot find anything online about this model, only this page, so I wonder where this link came from.
It's probably an ICML 2023 submission. The submission deadline was last Thursday, and there's an anonymous review process starting now.
Unless I missed some, this sounds a lot more realistic than existing music-generating models. It's downsampled and can't do lyrics (just vocalizations), but the samples are passable for some muffled song you would hear in the background (e.g. from a passing car).
More artificial Muzak generation that absolutely no one will ever listen to.
My fav is the "hippie coffee shop" jam band clip. That will surely corner the market for Jam band background Muzak at "hippie coffee shops". Total available market of like $5.
At best this new synthesis technique will be an Autechre album.
> #28: The snare is struck at every third count.
I don't exactly know how to interpret this prompt, and the resulting solo drums meander around as though they don't either. Not really on threes or waltz or 1/3 notes, but a brief tour through all of these and other rhythms.
On my phone (iOS) I can’t seem to get any of the samples to play.
Same with Safari on macOS. They're WAV files, so it's not a format issue. I can see that the network request fails in the inspector, but that's the extent of the debugging work I'm willing to put in this morning!
I call BS. Unless there's more data to be had.
I don’t know about the general claims of Noise2Music (I can’t play the samples),
but Riffusion[1] uses the spectrogram approach and kind of works.
[1]: https://www.riffusion.com/about
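For anyone curious what "the spectrogram approach" entails: the model generates (or edits) a magnitude spectrogram as if it were an image, and audio is then recovered from magnitude alone, typically via Griffin-Lim phase reconstruction. Below is a minimal NumPy sketch of that round trip on a pure tone; the window size, hop, and iteration count are arbitrary illustration choices, not Riffusion's actual settings:

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    # Frame the signal, apply a Hann window, and FFT each frame.
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def istft(S, n_fft=512, hop=128):
    # Overlap-add inverse with window-squared normalization.
    win = np.hanning(n_fft)
    out = np.zeros(hop * (S.shape[0] - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(np.fft.irfft(S, axis=1)):
        out[i * hop:i * hop + n_fft] += frame * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=32, n_fft=512, hop=128):
    # Iteratively recover a plausible phase for a magnitude-only spectrogram.
    phase = np.exp(2j * np.pi * np.random.rand(*mag.shape))
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)

# Round trip: 440 Hz tone -> magnitude spectrogram -> audio again.
np.random.seed(0)
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
mag = np.abs(stft(tone))   # this is the "image" a diffusion model would generate
recon = griffin_lim(mag)
```

The takeaway is that only the magnitude "image" needs to be generated; phase is reconstructed afterward, which is also where the characteristic slightly smeared quality of spectrogram-based audio comes from.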