StemRoller – Isolate vocals, drums, bass, and other stems from any song

391 points by nikolay 3 years ago

Not to be dismissive, but as far as I can tell, the heavy work is done by facebook's demucs and this is an electron front end to run the demucs cli (and I guess search youtube for videos to download). The demucs project page has more information.

https://github.com/facebookresearch/demucs

lapink 3 years ago

Original Demucs author here. Thanks for putting forward our research!
I’m definitely happy to see more front ends for Demucs being developed and to read that it has been useful to other musicians!
We are working on the next iteration of the model, and with more sources, hopefully released by the end of the year :)
If you are interested in this research you can follow my Twitter (@honualx) or star the Demucs repo.
- game-of-throws 3 years ago
  
  I'm curious, what is the business justification for funding development of Demucs, if you don't mind me asking? It doesn't seem very related to FB's core business.
  
  FLT8 3 years ago
  
  Solving problems like audio source separation (e.g. distinguishing multiple speakers in a noisy environment, or picking speech out of a background where music is playing) seems very much in FB's wheelhouse.
  
  lapink 3 years ago
  
  The goal of Meta AI Research is to do open research, not necessarily with direct applications at the time we start it. That said, the architecture, or the lessons learnt working on it, can become useful to the company later, for instance in VR remote presence, to isolate the main speaker from their environment (https://arxiv.org/pdf/2206.15423.pdf).
  
  nickserv 3 years ago
  
  Just a guess here, but I wouldn't be surprised if it's used to better spy on your messenger audio conversations. They already listen in and will pick up keywords to populate your FB ad stream.
  
  ec109685 3 years ago
  
  That’s absolutely not true. Facebook does not listen to your conversation: https://twitter.com/jspujji/status/1474797770871615497?s=20&...
  
  discordance 3 years ago
  
  If I can reconstruct your conversation (through other meta information), without listening to sounds of your voices, have I not listened to your conversation?
- dekhn 3 years ago
  
  Hi, I just downloaded demucs yesterday and started using it. It's amazing! I really appreciate all the work you put into making it easy to install and understand.
  Is there any chance you can disentangle guitar and keyboard? I work a lot with Grateful Dead music and I'd like to be able to pull Jerry's guitar apart from the keyboard in live shows. Similarly, it would be cool if you could parse Shpongle into its constituent tracks, but I think that's probably impossible.
- braindead_in 3 years ago
  
  Is there something similar for separating different voices from spoken audio?
  
  lapink 3 years ago
  
  Yes, there are. You can have a look at https://github.com/etzinis/sudo_rm_rf, for instance, for two-speaker separation. There is also this one for three speakers: https://huggingface.co/speechbrain/sepformer-whamr
pininja 3 years ago

There’s no need to be dismissive since they say this in the first sentence. Preparing an easy to use app for all platforms probably does get this into more creative hands, and that’s a net-positive contribution I can appreciate.
- SebaSeba 3 years ago
  
  They did not prepare it for all platforms though. Linux is missing.
  
  pininja 3 years ago
  
  Looks like someone’s on it. https://github.com/stemrollerapp/stemroller/pull/2
- amelius 3 years ago
  
  It should be in the title.
- TimTheTinker 3 years ago
  
  It does seem rather disingenuous that the product page makes no mention that the author didn't do the heavy lifting, and that at the same time it features a prominent donation button.
  If I didn't know who did the real work and benefitted a lot from this tool, I'd give to StemRoller in proportion to my gratitude -- which I'm sure others are liable to do.
  
  frob 3 years ago
  
  It's in the first paragraph of the README in the GitHub repo and the second paragraph on the website. I'm not sure what more can be asked of the author.
  
  TimTheTinker 3 years ago
  
  Thanks - I didn't even notice there was content below the fold on the web site (I'm on a desktop browser).
  How about saying so above the fold?
  
  wszfahwbwbaha 3 years ago
  
  How about you stop being so pedantic and admit it when you're wrong instead of digging a deeper, dumber hole.
  
  TimTheTinker 3 years ago
  
  This is a real design problem - many web sites do this.
  Saying so and suggesting an alternative isn't being pedantic.
  
  drivebycomment 3 years ago
  
  > It does seem rather disingenuous that the product page makes no mention that the author didn't do the heavy lifting,
  It does seem rather disingenuous that your replies make no admission of having made a factually incorrect statement in the first reply.
  Not a big deal to me personally, but it is not surprising that some people see this as being petty.
  
  TimTheTinker 3 years ago
  
  Thanks for clarifying. I did think saying "Thanks" to the person who corrected me is a straightforward admission that I was wrong... if not - yes, I was wrong.
  To the downvoters above: is a "thanks" to a correcting comment not enough on HN?
  Also, was it rude to say hiding content "below the fold" is a design problem?
  This is a very odd thread to me - like I was being chased down by pedants who are calling me pedantic.
  
  wszfahwbwbaha 3 years ago
  
  Seems you think everyone besides yourself is always the issue...maybe self reflect a bit
  

thisiswater 3 years ago

Tried splitting a complex arrangement (Chicago by Sufjan Stevens). Drums, bass and vocals come out fairly well, though the drums stem seems to lack percussion outside the core rock drum kit (e.g. tambourine), and cymbal hits are clipped rather than ringing. The 'other' stem, the rest of the instrumentation, keeps a fair bit of the percussion and there's bleed from the vocal melody.

The backing vocals seem to have disappeared for the most part, and are only audible in the vocals stem when the lead vocal is present (like they're reverse-ducked? Been a while since I did any production, the terms have escaped me...).

Not much use with complex arrangements to be honest, I was hoping to get things like the strings section separated from the rest of the arrangement.

Original: https://www.youtube.com/watch?v=tWX3El-slpY

Output: https://file.io/etpOQt57ziKe

pcf 3 years ago

Did you use a FLAC/WAV file? That should yield the best results.
(Only asking because you linked to YouTube, and I'm not sure if you used the YouTube audio for your source.)
- thisiswater 3 years ago
  
  Perhaps you're right, I'd have to check.
  I typed the song in the search and pressed the first likely result, which is the youtube video I linked. Using the software as intended I believe.
  
  marssaxman 3 years ago
  
  YouTube audio is optimized for bit rate, not quality (128K MP3). You will get better results with a higher-bitrate MP3 (320K would be good), better still with a lossless format like FLAC or WAV.
- BitPirate 3 years ago
  
  Makes sense. MP3 tries to discard only information a human can't hear, but algorithms can still make use of that information.
- NonNefarious 3 years ago
  
  I can't find any way to do this.
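For a sense of scale behind marssaxman's numbers, here is a quick calculation of the bitrate of uncompressed CD-quality PCM (44.1 kHz, 16-bit, stereo, an assumed reference format) next to the two lossy bitrates mentioned. It is plain arithmetic, not a measure of separation quality; how much the discarded information actually hurts Demucs is an empirical question the thread doesn't settle.

```python
def pcm_kbps(sample_rate=44_100, bits=16, channels=2):
    """Bitrate of uncompressed PCM audio (e.g. a CD-quality WAV) in kbit/s."""
    return sample_rate * bits * channels / 1000


cd_kbps = pcm_kbps()  # 44,100 samples/s * 16 bits * 2 channels = 1411.2 kbit/s
for lossy_kbps in (128, 320):
    # How many times smaller the lossy stream is than CD-quality PCM.
    print(f"{lossy_kbps} kbit/s is about 1/{cd_kbps / lossy_kbps:.1f} of {cd_kbps} kbit/s")
```

FLAC files usually come in well under 1411.2 kbit/s because FLAC compresses losslessly; they decode back to exactly the same PCM samples as the WAV.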

djcannabiz 3 years ago

I tried throwing some underground rap artists at this app, as stem splitters usually struggle with them

I split https://www.youtube.com/watch?v=DDaL7KBjkDI

And it gave me this https://www.dropbox.com/sh/inyk38n2jrp5i45/AACpB0xXNFxamEmP3... I noticed some weird hissing with the 808s, but other than that it sounded pretty good

For more of a challenge, I inputted https://www.youtube.com/watch?v=uAwQ3njiU4M

and it came up with https://www.dropbox.com/sh/97lzke0puh9dzeo/AACE75vsbNS43UqqH... It was able to separate some of the kicks from the 808s, which is really impressive to me!

Overall, I'm very impressed! This sounds much better than lalal.ai to me

polishdude20 3 years ago

I'd like to take a moment to mention how great dropbox's audio seeking thing is. It's super fast and works as intended. Great work whoever implemented this.
pelagic_sky 3 years ago

I’ve found Lala to be my go to. If this is better, then I’m very interested in trying it out.
- pelagic_sky 3 years ago
  
  Just a follow up. My two conversions so far, Lalal.ai has been better. Especially separating drums from instruments. I'll give Stemroller a few more tries as I am always looking for options.
  
  pelagic_sky 3 years ago
  
  Update number three. I now just use both lalal and stemroller because each one seems to do better in certain cases. If I hadn’t paid for lalal, I’d probably just use stemroller as it’s way better than RX9
  
  djcannabiz 3 years ago
  
  what genre of music, may i ask?
  
  pelagic_sky 3 years ago
  
  RNB, NeoSoul, Trap
metadat 3 years ago

Why do vocals.wav, other.wav, and instrumental.wav all start out the exact same (with vocal sounds)?
squeaky-clean 3 years ago

Super impressive splitting there, wow. Just curious, was your source a lossless or compressed file?
- djcannabiz 3 years ago
  
  The second file was lossless, the first was ripped from a CD.

elaus 3 years ago

This seems to run just fine under Linux as well, though not completely out of the box: it's basically missing builds and config for Linux, which can be built analogously to the existing Win/Mac setup.

You also have to build the demucs-cxfreeze dependency (as described in its repo, https://github.com/stemrollerapp/demucs-cxfreeze).

elaus 3 years ago

It's almost eerie how well this works with electronic music. Coming from an age where your best bet for separating a track was an equalizer, I didn't have high hopes.
Trying it out with Alan Walker's Alone, it separates the vocals and drums almost perfectly. Bass is really fine as well; only the instrumental and 'other' stems were a bit mixed up in my try.
knicholes 3 years ago

Whenever I see an "##Installation" section with more than one step, I immediately call DOCKER!

dylan604 3 years ago

"Download and extract the latest ffmpeg snapshot from evermeet.cx and place the ffmpeg executable inside"

Why? Why can't this just point to wherever ffmpeg is already installed rather than making a copy of it? A symlink might work, but just do a $(which ffmpeg) or ask the user for the path (~/bin/ffmpeg, /usr/local/bin/ffmpeg, etc.).
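A minimal sketch of that suggestion in Python (StemRoller itself is an Electron app, so this is illustrative only): use a path the user configured if it points at an executable file, otherwise search PATH with shutil.which, the standard-library equivalent of $(which ffmpeg). The function name resolve_ffmpeg and its parameters are hypothetical.

```python
import os
import shutil


def resolve_ffmpeg(preferred=None, which=shutil.which):
    """Return a path to an ffmpeg executable (hypothetical helper).

    preferred: a user-configured path such as ~/bin/ffmpeg, tried first.
    which: lookup function, injectable for testing; shutil.which searches
    PATH the way `which ffmpeg` does in a shell.
    """
    if preferred:
        path = os.path.expanduser(preferred)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    found = which("ffmpeg")
    if found is None:
        raise FileNotFoundError("ffmpeg not found on PATH; set its location explicitly")
    return found
```

This only finds a binary; it says nothing about whether that binary accepts the arguments the app will pass, which is the concern raised in the reply that follows.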

PaulDavisThe1st 3 years ago

ffmpeg has not had a stable command line interface for some time. It can be a problem to assume that the system-installed version accepts the arguments you plan to give it.
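One way to act on that concern, sketched in Python: run ffmpeg -version and parse the first line before trusting a system binary, falling back to a bundled copy if the check fails. The regex assumes release builds print something like "ffmpeg version 5.1.2 ..." (distribution builds may append a suffix, and some tagged builds prefix an "n"), while git snapshot builds typically report "N-<revision>-g<hash>" and carry no release number. The helper names and the minimum-version policy are hypothetical.

```python
import re
import subprocess

# Assumed first line of `ffmpeg -version` for release builds, e.g.
# "ffmpeg version 5.1.2 Copyright (c) 2000-2022 the FFmpeg developers".
_VERSION_RE = re.compile(r"^ffmpeg version n?(\d+)\.(\d+)(?:\.(\d+))?")


def parse_ffmpeg_version(first_line):
    """Return (major, minor, patch), or None if no release number is present."""
    m = _VERSION_RE.match(first_line)
    if m is None:
        return None  # e.g. git snapshots: "ffmpeg version N-<rev>-g<hash> ..."
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def ffmpeg_version(binary="ffmpeg"):
    """Run `<binary> -version` and parse the first line of its output."""
    out = subprocess.run([binary, "-version"], capture_output=True, text=True, check=True)
    lines = out.stdout.splitlines()
    return parse_ffmpeg_version(lines[0]) if lines else None


def is_acceptable(version, minimum=(5, 0, 0)):
    """Hypothetical policy: unknown versions (e.g. snapshots) are not trusted."""
    return version is not None and version >= minimum
```

Tuple comparison makes the minimum check lexicographic, so (5, 1, 2) >= (5, 0, 0) while (4, 4, 2) does not pass.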
Rodeoclash 3 years ago

It's even easier than that. There are a few npm libraries dedicated to shipping a copy of ffmpeg with Electron.
- dylan604 3 years ago
  
  Even easier than what I already have on my system? I don't follow what you're saying here.
linux2647 3 years ago

Maybe there’s some feature of bleeding edge ffmpeg that’s required for the app

setgree 3 years ago

Open Culture recently posted a link to Abbey Road with only Paul's bass lines, but the actual content got taken down. [0] It was really cool though, in part because it's not nearly as precise as I would have thought, which made it feel really organic.

[0] https://www.openculture.com/2022/04/hear-the-beatles-abbey-r...

TylerE 3 years ago

In the real world where tracks are cut live, there is a fair bit of microphone bleed
- salmo 3 years ago
  
  I imagine studio-era Beatles in particular would be difficult.
  Microphone bleed, lots of overdubs (especially vocals), and repeated re-layering tracks on tape over and over due to channel limitations. They really were doing crazy stuff with limited tech.
  I think this would be hard for bands that really fill the spectrum and don’t have that clean treble, mid, bass separation. Or recordings really compressed into a frequency range.
  Now this makes me want to see what happens with like My Bloody Valentine and Husker Du :).
- hammock 3 years ago
  
  Especially in the day and style that the Beatles recorded. Today, not so much
tiagod 3 years ago

Found this playlist on YouTube https://youtube.com/playlist?list=PLKy1OUnHJvRZr0jnL0T2Fa5WY...

phonescreen_man 3 years ago

Been using demucs for a couple of weeks now, mostly taking my early productions, whose project files I have since lost, and giving them a remix and update. Gotta say I have been blown away by how good demucs is. I installed it following the repo instructions and then created a zsh alias to run it with any file name, e.g. $ ai_split mySong.mp3

Wait fifteen minutes and out pop four stems. Flawless so far; I've even been messing around with mainstream tracks and using Ableton with warp applied to quickly build out remixes. Demucs is going to be (is already) a game changer!
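A rough Python equivalent of that alias, for anyone not on zsh: it builds a demucs command line and runs it once per file. It only uses the demucs CLI's -o (output directory) and optional --two-stems options; check demucs --help for the version you have installed. The script name ai_split.py and the helper names are hypothetical. Demucs writes the stems under the output directory, in a subfolder named after the model and the track.

```python
import shlex
import subprocess
import sys


def build_demucs_cmd(path, out_dir="separated", two_stems=None):
    """Build a demucs command for one audio file (hypothetical helper).

    two_stems: e.g. "vocals" to get vocals plus everything else instead of
    the default four stems (drums, bass, other, vocals).
    """
    cmd = ["demucs", "-o", out_dir]
    if two_stems:
        cmd.append(f"--two-stems={two_stems}")
    cmd.append(path)
    return cmd


def split(path, **options):
    """Run demucs on one file; raises CalledProcessError if demucs fails."""
    cmd = build_demucs_cmd(path, **options)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


def main(argv):
    if not argv:
        print("usage: python ai_split.py FILE [FILE ...]")
        return
    for path in argv:
        split(path)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage: python ai_split.py mySong.mp3. Passing the command as a list rather than a shell string keeps file names with spaces intact, and any local file demucs can read (WAV, FLAC, MP3) works the same way.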

eyelidlessness 3 years ago

This testimonial almost has me wanting to try it on an “album”[1] I recorded when I was in a “band”[2] in high school. I too lost all of the source files[3].
1: On second thought maybe not. It has not aged well.
2: Me and another kid, with a guitar, a pre-OS X Mac, a pirated copy of Rebirth, a pirated copy of SoundEdit 16, and literally the mic that Apple used to include with (some?) Macs. I’d back-reference[1], but our equipment was not the problem. Well, except for [3].
3: I learned my lesson: I should have been older and had a job that would afford me a backup drive, so I could sample the sounds of that dying HDD and retcon the samples into my “album”[1].
pininja 3 years ago

That’s awesome! I wonder if there are projects to create a repository of pre-split public domain music? Seems like something the internet archive could host once created.

phoe-krk 3 years ago

Are there any public examples of the split audio files?

Cerium 3 years ago

From the Demucs project page: https://soundcloud.com/honualx/sets/source-separation-in-the...
- gaudat 3 years ago
  
  That playlist cover can definitely pass as an album art.

Dwedit 3 years ago

Let's see how long it takes for some new Neil Cicierega remixes to appear now.

intvocoder 3 years ago

With a tool like this, you could get back into the animutation scene. (Edit: I guess it's a bit of a non-sequitur, but I enjoyed Suzukisan, so there's that.)

chriscjcj 3 years ago

Is there a way to process my own audio file rather than choosing one from YouTube?

NonNefarious 3 years ago

A couple of commenters have mentioned using lossless files, but so far no one has said HOW.
- nextaccountic 3 years ago
  
  Maybe use demucs directly?
  https://github.com/facebookresearch/demucs
  
  NonNefarious 3 years ago
  
  Thanks. The comments seemed specific to this front-end, but maybe.

nr2x 3 years ago

How is this similar/different than the Deezer one?

ksherlock 3 years ago

I just did a quick test of demucs vs spleeter:4stems. demucs is significantly slower but the output is better.
in a semi blind comparison, I prefer demucs for all 4 tracks (drum, bass, vocals, and other). bass and other stand out the most so let me say a couple words about them.
bass - the demucs bass has less bleed from other instruments and the volume is consistent throughout. with spleeter, the volume varies a lot and there are multiple sections of 1-2 bars where it just drops out completely. In Capo, the demucs spectrogram is nice and clear whereas spleeter tends to look like pencil smudges for the most part.
other - with spleeter, whenever there are vocals, the other instruments turn to mush. demucs is much better. Oh, you can tell people are singing -- the instruments get muffled -- but you can still hear them.
- anigbrowl 3 years ago
  
  It's pretty decent. I threw a drum'n'bass track at it to see how it would cope with heavily produced material and the results were surprisingly good.
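The bass dropouts ksherlock describes can also be checked numerically rather than by ear or spectrogram. This Python/NumPy sketch computes the RMS level of a mono stem in fixed windows and reports stretches that stay below a floor for at least a couple of seconds. The window length, the -50 dBFS floor and the 2-second minimum (one bar at 120 BPM in 4/4) are assumptions to tune, find_dropouts is a hypothetical helper, and loading the WAV into a float array is left out.

```python
import numpy as np


def find_dropouts(x, sr, win_s=0.5, floor_db=-50.0, min_len_s=2.0):
    """Return (start_s, end_s) spans where a mono signal stays below floor_db.

    x: samples as floats in [-1, 1]; sr: sample rate in Hz.
    Levels are RMS per non-overlapping window, in dB relative to full scale.
    """
    x = np.asarray(x, dtype=float)
    win = int(win_s * sr)
    n = len(x) // win  # a trailing partial window is ignored
    rms = np.sqrt((x[: n * win].reshape(n, win) ** 2).mean(axis=1))
    level_db = 20 * np.log10(np.maximum(rms, 1e-12))  # avoid log10(0)
    quiet = level_db < floor_db

    spans, start = [], None
    for i, q in enumerate(np.append(quiet, False)):  # sentinel closes a final run
        if q and start is None:
            start = i
        elif not q and start is not None:
            if (i - start) * win_s >= min_len_s:
                spans.append((start * win_s, i * win_s))
            start = None
    return spans
```

Running it on the bass stem from each tool over the same song gives a rough, repeatable way to compare how often each one drops out.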
CharlesW 3 years ago

I'd also be interested in how it compares to iZotope RX's Music Rebalance (examples from earlier releases here: https://www.izotope.com/en/learn/stem-isolation-music-rebala...).
- avis 3 years ago
  
  I'd be interested to know how it compares to iZotope as well as phonicmind.
- pcf 3 years ago
  
  I just checked "Californication" (used for all their other examples here: https://soundcloud.com/honualx/sets/source-separation-in-the...) in RX9 Music Rebalance with the setting to "best", and I wasn't very impressed.
  Seems like this tool might be better than Izotope's.

eshack94 3 years ago

I dabble in audio production in my free time outside of work, and I typically will use iZotope RX 9 or Neural Mix Pro for isolating vocals or stems. However, these are paid products, and it's encouraging to see more open source projects being built around this space.

I like the opportunity to view the source code and learn from it, as opposed to most paid products which are typically closed-source and a bit of a "black box".

Sure - this is mostly just an accessible frontend for Demucs, but that's still okay. The author clearly indicates that in his repo, giving credit where credit is due. Additionally, this helps less-technical creators be creative in new ways.

Thanks to all who contributed.

yarg 3 years ago

Honestly, this sort of thing is cool; but why (in general) is it necessary in the first place?

If the elements of the song are recorded in isolation, which they are in all studio versions, why can't we just move to a format that supports the layering?

gavinray 3 years ago

Musicians and studios don't generally tend to offer the public access to original stems for songs (why would they?)
Say that you want to make a remix, mashup, or otherwise use sound bites from a song. The easiest thing to do is use a tool like Spleeter/Demucs to separate the source layers so that you can then further process them in your DAW.
This is what I do, but I just use the Demucs CLI because it's simple enough.
https://github.com/facebookresearch/demucs
- pabs3 3 years ago
  
  Are there no communities of "open source" music? It sounds like the stems are part of the "source code" for tracks.
  
  jononor 3 years ago
  
  Many niches in electronic music have small, tight-knit communities of creators and producers who regularly remix each other's stuff. But it is not an open community: you need a decent standing (from making your own music or prior remixes) before someone is willing to send you their stems. For any musician who has a label/publisher, the label also needs to be in the loop to handle royalties. So sharing stems happens regularly in the music industry, but it is not easily accessible, which makes tools like the one mentioned very useful for everyone else who would like to participate.
osigurdson 3 years ago

It isn't really in the best interest of the artist to provide this. The final mix is part of the overall product / work of art. Providing all of the individual tracks (there could be 30 or more in total) would also take up a lot of space / increase processing requirements while benefiting very few.
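To put rough numbers on the space argument, here is a back-of-the-envelope calculation. The session format is an assumption, not something stated in the comment: a 4-minute song with 30 mono tracks at 48 kHz / 24-bit, compared with a stereo master at CD resolution (44.1 kHz / 16-bit), everything uncompressed.

```python
def pcm_bytes(seconds, sample_rate, bits, channels):
    """Size in bytes of uncompressed PCM audio."""
    return seconds * sample_rate * (bits // 8) * channels


song_s = 4 * 60  # assumed 4-minute song
# Assumed session: 30 mono tracks recorded at 48 kHz / 24-bit.
multitrack = 30 * pcm_bytes(song_s, 48_000, 24, 1)
# Assumed final stereo master at CD resolution (44.1 kHz / 16-bit).
master = pcm_bytes(song_s, 44_100, 16, 2)
print(f"multitrack: {multitrack / 1e9:.2f} GB, master: {master / 1e6:.1f} MB, "
      f"ratio: {multitrack / master:.1f}x")
# prints: multitrack: 1.04 GB, master: 42.3 MB, ratio: 24.5x
```

Lossless compression such as FLAC would shrink both figures somewhat, and real sessions vary widely in track count and resolution.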
spyrefused 3 years ago

I usually use this kind of tools to get the bass score of some songs, for example. With the isolated elements it is much easier to know exactly what notes are sounding (I don't have a good ear). The same for drums or synth notes.
Since sound quality doesn't matter much to me for this, I usually use iZotope RX, but I will try this tool.
amelius 3 years ago

This is like asking why we need decompilers.
- yarg 3 years ago
  
  > (In general)
  Yes, I agree.

atoav 3 years ago

For all who are looking for something like this, iZotope RX (the audio repair software) has a function called "Music Rebalance" which is great for reducing spill or changing the balance in a live recording.

ccn0p 3 years ago

talk about a missed opportunity without examples. did I miss them somewhere?

nerfhammer 3 years ago

I've always wanted a way to extract just the kick drums in realtime but I don't understand this field well enough to understand whether it would be remotely possible or not.

jononor 3 years ago

You want just the beat, i.e. the time markers of each kick? Or you want the isolated sound (i.e. audio) of each kick? Both are generally possible today, though the approach will differ a little bit.
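As a rough illustration of the first approach (time markers), here is a deliberately naive kick-onset detector in Python with NumPy: low-pass the signal so mostly the kick's low end survives, measure energy per frame, and report frames where the energy first rises above a fraction of the loudest frame. The 150 Hz cutoff, frame size and threshold are assumptions, kick_onsets is a hypothetical name, and this is a classic signal-processing heuristic, not how Demucs or any particular product works; a bass line in the same range will fool it. It processes a whole buffer at once; a realtime version would run the same per-frame logic on blocks as they arrive, with a running threshold instead of one based on the loudest frame of the whole buffer.

```python
import numpy as np


def kick_onsets(x, sr, cutoff_hz=150.0, frame=512, threshold_ratio=0.3, min_gap_s=0.1):
    """Return estimated kick onset times (seconds) in a mono float signal x."""
    x = np.asarray(x, dtype=float)
    # One-pole low-pass filter: attenuates hi-hats and most melodic content.
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / sr)
    y = np.empty_like(x)
    acc = 0.0
    for i, s in enumerate(x):  # plain Python loop: simple but slow on long files
        acc += alpha * (s - acc)
        y[i] = acc
    # Mean-square energy per non-overlapping frame.
    n = len(y) // frame
    if n == 0:
        return []
    energy = (y[: n * frame].reshape(n, frame) ** 2).mean(axis=1)
    if energy.max() == 0.0:
        return []
    above = energy > threshold_ratio * energy.max()
    # Rising edges: frames above the threshold whose previous frame was not.
    rising = np.flatnonzero(above & ~np.concatenate(([False], above[:-1])))
    onsets = []
    for f in rising:
        t = float(f * frame / sr)
        if not onsets or t - onsets[-1] >= min_gap_s:
            onsets.append(t)
    return onsets
```

Timing resolution is one frame (512 samples, about 12 ms at 44.1 kHz), which is coarse but enough to see whether the idea works on a given track.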

screech 3 years ago

Just wow! There were methods extracting acapellas from tracks, but this tool here is another level. Fascinating how good the results are.

polishdude20 3 years ago

This is awesome! Tried it out on Rush's Tom Sawyer and it splits out the vocals great! I can see this being super useful!

abbusfoflouotne 3 years ago

Would appreciate an easier way to download and run this! The steps on the readme are pretty long, at least for me (Mac user)

interestica 3 years ago

How does it compare to lalal.ai ?

threefour 3 years ago

It's free.
- amelius 3 years ago
  
  And otherwise identical?
  
  kbob 3 years ago
  
  Demucs did a much better job of isolating the bass on a blues track than LALAL. The bass actually sounded like a bass. LALAL got the note pitches but lost their attacks.

colecut 3 years ago

Anyone else just getting 'failed' on every song they try?

NonNefarious 3 years ago

How do you load a local file?

diimdeep 3 years ago

There is no support for such a thing; this is software in the year 2022: never local, online first.
- NonNefarious 3 years ago
  
  Hahah, I know, right? People actually believe that shit... until they get jacked by a service provider.

volkse 3 years ago

Is there a VST front end?

anderfernandes1 3 years ago

Wow

raydiatian 3 years ago

How does it perform compared to Deezer Spleeter or lalal.ai

Else who cares