For music listening, the YouTube Music app has settings for higher audio quality under "Settings > Playback".
According to their documentation, this is "Upper bound of 256kbps AAC & OPUS": https://support.google.com/youtubemusic/answer/9076559?hl=en...
EDIT: FWIW, I'm a Premium member. I'm not sure if this is a standard feature.
The worse problem for YouTube Music is that some of the source material is not the same as what you may have on physical media. There are entire labels that provide worse or at least different materials than they published to record stores. And then there is remaster roulette. What am I going to hear if I press play on "Rumours": the original that was born perfect, or one of the half-dozen remasters that have been issued as the original engineer progressively loses his hearing?
One of the advantages of Apple Music's Digital Masters program: it requires no clipping, which alone vastly improves the majority of albums.
Where did you hear this from? For the vast majority, Apple Music gets the same masters as sent to every other service.
They describe the entire program in detail here: https://www.apple.com/apple-music/apple-digital-masters/docs...
Even if they receive the same sources, the encoding process is their own and aims for zero clipping.
Yeah, the ADM-certified stuff, as far as I know, is a small percentage of the music on there. The distributors I've worked with don't let you have a separate Apple Music version in the same release (a lot of them don't even let you pick which stores it goes to, it's just bulk-submitted everywhere), so it's extra work and/or cost to make an AM-specific version. And only the top mastering talent seem to be ADM certified; I don't see many advertising it (some do). I figure if an ADM master is made, that one is submitted everywhere, but it's really a minority of releases. Most probably conform by happenstance, just by virtue of being a nice-sounding master, but I'm not aware of Apple rejecting loud releases, for example. You can submit a song mastered at -6 LUFS, but they will just turn it down when someone plays it.
Apple Digital Masters indeed recommends no clipping. But the program is based on Apple reviewing the process that to-be-certified mastering engineers use. You could pass that review, get certified, and then decide that clipping sounds better/louder/whatever, and the file will still get the ADM stamp.
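That turn-it-down step is plain loudness normalization, which is easy to reproduce offline. A minimal sketch using the third-party pyloudnorm package; the package choice, file names, and the -16 LUFS target are my assumptions, not anything Apple documents:

    import soundfile as sf
    import pyloudnorm as pyln

    data, rate = sf.read("master.wav")             # hypothetical loud master
    meter = pyln.Meter(rate)                       # ITU-R BS.1770 loudness meter
    loudness = meter.integrated_loudness(data)     # e.g. -6.0 LUFS for a hot master
    normalized = pyln.normalize.loudness(data, loudness, -16.0)  # assumed target
    sf.write("normalized.wav", normalized, rate)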
This is unfortunately the case on other platforms like Spotify too. :(
Due to what I assume are music licensing oddities, one song in my Spotify library now has an entirely different singer. It was taken down for a while, then later re-uploaded as a new recording with new vocals. I've also seen copies of my songs changing to remix/cover versions, presumably due to metadata adjustments by the artists. Usually, though, I can search again, find the original, and add it back to my library.
A few songs just have new outros/intros. One song in my library now has an additional several seconds of silence on the end. One has a new longer intro that I think is really from the music video version of the song.
As convenient as these music services are, I hate that my library changes beneath me without my control. There are dozens of songs just straight-up missing from my Spotify library now, and a small handful that have changed audio. These are almost always indie songs, and Spotify hides deleted songs by default in the UI, so most users don't notice.
One of my favorite albums of the last year [0] has had its 5 tracks split into 3 parts each on streaming platforms. It’s kinda weird, but I assume it’s because of the way streaming services pay per track and not per second played.
[0]: https://open.spotify.com/album/1xJ3AfTOdsBltWGFDubJZ9
This affects all digital stores. Many labels appear to have lost their uncompressed masters, especially for material first released on the internet in the early 2000s. I regularly catch them with their pants down when I buy something sold as lossless from a digital music store and receive a FLAC file that shows clear signs of MP3 compression. It usually results in a refund from the store in question. For this reason I'm buying less and less through those stores, and instead try to hunt down cheap used CD copies, if at all possible.
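One way to catch these fakes programmatically is to look for the hard spectral cliff an MP3 encoder leaves behind (often around 16-20 kHz). A rough numpy/soundfile sketch; the file name and threshold are illustrative, not a calibrated detector:

    import numpy as np
    import soundfile as sf

    data, rate = sf.read("suspect.flac")            # hypothetical purchase
    if data.ndim > 1:
        data = data.mean(axis=1)                    # mix down to mono

    spectrum = np.abs(np.fft.rfft(data))
    freqs = np.fft.rfftfreq(len(data), d=1.0 / rate)

    def band_level(lo, hi):
        return spectrum[(freqs >= lo) & (freqs < hi)].mean()

    # True lossless masters keep energy up to Nyquist; MP3 transcodes fall off
    # a cliff above the encoder's low-pass. Compare midrange vs. the top band.
    drop_db = 20 * np.log10(band_level(1e3, 1e4) / (band_level(19e3, 22e3) + 1e-12))
    print(f"mid-to-top drop: {drop_db:.1f} dB (a huge drop suggests a transcode)")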
I started this process for all the major labels in 1999. For the five years I was involved it was just "here's the CD, do your best, we don't have easy access to anything better than this. good luck." We ripped to WAV so we could re-encode as each new tech came out.
I wonder: if I upload a video to standard YouTube and tag it 'music' so it shows up over on YouTube Music, does it get better audio quality than the video encode gets?
I don't think you need to tag it as music. You can listen to any Youtube video in the Youtube Music app.
A regular user can get up to format YT-251, which is described in the article: ~135 kbps Opus (I think the target is higher, 160 kbps or so). But the bitrate itself is good enough; on good headphones I can't distinguish FLAC from 110 kbps Opus, except on very rare killer samples. Yet YouTube audio clearly lacks detail despite its higher bitrate, so 256 kbps probably would not help much.
Wondering if yt-dlp picks up the higher codec formats too.
You can easily verify that: yt-dlp -F link-to-youtube-video
-F (uppercase!) lists all available audio and video formats to download.
See "Why not use graphs / frequency analysis to compare codecs?" https://wiki.hydrogenaud.io/index.php?title=FAQ#:~:text=Why%...
"I decided that analysis should focus on the higher, more conventional rates – 48k and 44k1" - opus is always 48khz, so that doesn't mean much.
This. Your ears are not an oscilloscope. Lossy audio codecs are designed to exploit two major weaknesses in human hearing:
1. Poor sensitivity in bass and treble. See:
https://en.wikipedia.org/wiki/Equal-loudness_contour
2. Limited ability to hear multiple sounds simultaneously, or almost simultaneously. See:
https://en.wikipedia.org/wiki/Auditory_masking
Bernhard Seeber has some videos on Youtube with demonstrations of auditory masking:
https://www.youtube.com/watch?v=R9UZnMsm9o8
https://www.youtube.com/watch?v=bU0_Kaj7cPk
The only fair way to evaluate lossy codecs is with double blind listening tests.
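The bookkeeping for an ABX run is trivial; all the care goes into level matching and honest playback. A toy sketch, with playback left as a stub since it depends on your audio setup (file names hypothetical):

    import random

    def abx_trial(play, a="reference.wav", b="lossy.wav"):
        """One ABX round: X is secretly A or B; the listener guesses which."""
        x = random.choice([a, b])
        for label, path in (("A", a), ("B", b), ("X", x)):
            print("playing", label)
            play(path)
        guess = input("X sounds like (a/b)? ").strip().lower()
        return (guess == "a") == (x == a)

    # Over 16 trials, 12+ correct is strong evidence of an audible difference;
    # a score near 8/16 is consistent with guessing, i.e. transparency.
    score = sum(abx_trial(play=lambda path: None) for _ in range(16))  # stub player
    print(f"{score}/16 correct")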
The problem with these weaknesses is that they are different in each human being. For me, listening to an MP3 is like sanding my ears, while some people don't even hear the difference between one and a live performance.
> For me, listening to an MP3 is like sanding my ears
There is such a broad spectrum within the word MP3 that you need to be more specific. I can absolutely pick the difference between certain high-bitrate MP3 encodings and WAV files, while a different MP3 at exactly the same bitrate is effectively indistinguishable.
Bad MP3 encoding is a problem, though not one I have experienced recently. I think the bigger issue is that people rip a music video from YouTube and then, instead of extracting the existing audio stream into its own container, re-encode it. MP3-to-MP3 encoding is lossy all over again, just like any other lossy encode.
Can you ABX it?
https://abx.digitalfeed.net/lame.320.html
I wonder how many people there are who don't have these weaknesses,
and who are therefore forced to hear terrible audio because the compression method only considers the majority.
If people have notches in their hearing sensitivity at low or mid frequencies, sounds at those frequencies might fail to mask other sounds as expected. You could simulate this by applying notch filters to otherwise transparent lossy audio and seeing if it exposes compression artifacts. But I think this kind of hearing loss is uncommon. Normal age-related hearing loss does not cause any problems with lossy audio compression.
I have YouTube Premium, which gives me access to experimental features, and they happen to offer one called high-quality audio. So they are working on improving it.
Description:
High-Quality Audio
Available until February 22
With high-quality audio, you can listen to music on YouTube in the best audio quality.
How it works: Watch an eligible music video on YouTube and enjoy the benefits of higher-quality audio.
Only available on iOS and Android.
why only on mobile? very curious
likely end-to-end drm
There's no "end-to-end drm" for audio. Any audio can be captured near losslessly by plugging in a USB-C to 3.5mm adapter, and a sound card on the other end. There's also no HDCP-like mechanism for sound, so if you're willing to put the work into it, you could make a fake USB DAC that produces a bit perfect capture of the audio output.
Just use a DAC with S/PDIF output and another one with S/PDIF input. Either way, it's hard enough to block casual users, so... it works.
Once the headphone jack is removed from laptops, it will be streamed to laptops as well... </irony>
Anyone uploading audio to different platforms might find Streamliner useful (I'm nowhere near the point where details at that level matter). https://adptraudio.com/product/streamliner/
It's wild how many details sound people have to keep track of. I know that when I upload to YouTube, things get noticeably smoothed compared to, say, SoundCloud. Probably because I've mastered over their -14 LUFS target.
I wonder how much of the modern 'everyone needs closed captioning to watch TV now' phenomenon comes from the streaming services' codecs and other decisions, rather than just the A/V sound people's choices. Do movie sound people now need to audition every decision through something like the Streamliner above?
My theory about the more widespread use of subtitles when watching modern film and TV productions is that it is in large part due to screenwriters trying to write (and directors to direct) more realistic dialogue. Just by itself, everything else equal, realistic speech is necessarily harder to grok than “played out” theatrical. In real life people who can hear generally benefit from drastically higher fidelity and wider dynamic range of sound, as well as many more cues and context compared to a linear film (plus the ability to ask someone to repeat). This clashes with screenwriters having to[0] put crucial information in the dialogue, the rule of only saying something once, and other writing and production practices (which also exist for a reason). Realistic dialogue that is easy to grok requires cooperation from many sides.
That said, technical aspects matter, recording practices, mastering for loudness with little dynamic range (so that compressed voice blends with foley and the rest), and indeed encoding can definitely affect the ease of understanding human speech. Speaking of encoding… I couldn’t find it on the site, is Streamliner a one-time purchase?
[0] Well, not every film has to (Upstream Color comes to mind, as an example of a film with relatively little plot-advancing dialogue), but it seems that the majority of productions rely on dialogue and it isn’t going away.
Here's a really good article that proposes most of the reasons:
https://www.slashfilm.com/673162/heres-why-movie-dialogue-ha...
For so many words, I don't think it says anything really new beyond attributing it to art style: “realistic dialogue” in writing/direction, recording, mixing, mastering, and/or encoding.
With SoundCloud I've found you have to pay for the 'Plus+++' subscription or whatever it is to not get audio with the higher frequencies absolutely butchered, unless you follow a very specific upload process that bypasses their conversion.
Upload 320kbit encoded MP3? Sounds great.
Upload a high-sample-rate WAV? It gets butchered; the top end turns to glittery noise.
Maybe others have different experiences, but honestly it felt like I'd been duped: I paid for the subscription, still got trash-quality audio, and then had to pay even more.
YouTube probably has stats on viewer happiness vs. bitrate.
My guess is that higher bitrate = longer loading times, and viewers care far more about a few extra seconds of buffering than about audio quality, especially when they don't have the original to compare against.
The audio data is minuscule compared to the video data, and its size is tied to the video quality level. And everything is streamed in chunks. It'd only amount to milliseconds of extra buffering.
Not as insignificant as you might imagine, especially if you are talking about surround sound audio.
With newer codecs, doing 4k with 2Mb/s isn't unheard of.
For audio, on the other hand, 32 or 64kbps per channel isn't unheard of.
At 1080p60, YouTube uses 12Mbps, and 251Kbps Opus or 192Kbps AAC. That's a factor 50 smaller. Insignificant.
> At 1080p60, YouTube uses 12Mbps
It does not. That's the recommended bitrate for a live streamer (streamer -> YouTube).
Coming out of YouTube, the numbers are quite different.
For example, a random 4k video I just pulled had a video bitrate of 4.5Mbps.
A 720p video I pulled of a talking head had a 369kbps video stream.
That is to say, a podcast style video is likely to have a nearly 50:50 split on audio/video.
But now you're comparing median video bitstream with peak audio bitstream.
YouTube uses variable bitrate for audio, which can vary dramatically in size. Your example of podcasts or "talking heads" is actually perfect. Encoders are extremely efficient at compressing voices, since a voice's fundamental spans only roughly 85-300 Hz and speech has far less variation than images.
Image encoding is just very complex. It'll get better and better, but audio encoders of the same generation will also improve.
251 is a codec format ID, not a bitrate. The real bitrate is around ~135 kbps.
Totally missed that, thanks!
Couldn't you do adaptive bitrate and start streaming at a low bitrate for a few seconds, then switch to higher quality once the video is already playing?
Yes, it's very possible to do this. Without seeking support it's trivial: just instruct the encoder to encode at a low bitrate for a few seconds and then increase it.
To support seeking, you could encode a low-bitrate stream, a high-quality stream, and a number of ramps between them. When the user seeks, you start on the low-bitrate stream and, after a few time units, move up the ramp to the high-quality stream.
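A toy sketch of the player-side chunk selection that scheme implies; the stream names, chunk numbering, and ramp depth are all made up:

    # Hypothetical time-aligned encodings: a cheap stream, a good stream,
    # and a ramp of intermediate steps between them.
    RAMP = ["audio_48k", "audio_96k", "audio_160k"]      # low -> high quality

    def stream_for_chunk(chunks_since_seek: int) -> str:
        """Right after a seek, serve the cheap stream, then walk up the ramp."""
        return RAMP[min(chunks_since_seek, len(RAMP) - 1)]

    # Seeking to chunk 15: it and its successors arrive as 48k, 96k, 160k, 160k...
    for i in range(5):
        print(f"chunk {15 + i}: {stream_for_chunk(i)}")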
The stream is already being split into chunks (identified in the m3u8 file) so encoding tricks like this won't be necessary.
While nothing that's been said here is inherently wrong per se, a sample YT page load is ~5s to DOMContentLoaded, and without counting the video content, transfers ~7 MiB worth of requests & ~95 requests for me, and visually, the entire page feels like it loads twice. (I thought it was redirecting, but the inspector says nope, that's a single page load.)
… while yeah… a lower bitrate upfront might lower the required bandwidth and thus, latency, to get enough of a buffer to start playback … all the bloat on the page would be a better first port of call.
I assume most viewers will already have most of the youtube javascript warm in the cache...
How much is loaded for a navigation from one video to the next?
YouTube's most common audio codecs, Opus and AAC, use variable bitrate (VBR) by default.
pretty sure they already do this.
Nice analysis, for 1x playback speed. If you're playing back at a different speed, for example, for music practice, YouTube audio is awful.
Why doesn't this huge AV platform use a better audio time stretch algorithm?
I think time stretching is done natively by your browser, not by YouTube at all. I use https://github.com/igrigorik/videospeed on sites; it allows any media to be stretched. Did you try another browser?
That's no excuse for YouTube because (a) audio processing can be done in JS/WASM and (b) they have the influence to improve browser playbackRate implementations to something better [1].
Besides, their Android and iOS apps render slowed music as badly as the web player, if not worse.
[1] https://bungee.parabolaresearch.com/compare-audio-stretch-te...
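For a sense of the gap: even an off-the-shelf phase vocoder sounds better than what the player does. A three-line sketch using the third-party librosa package (my choice of library; file names hypothetical):

    import librosa
    import soundfile as sf

    y, sr = librosa.load("song.wav", sr=None)            # keep the native sample rate
    slow = librosa.effects.time_stretch(y, rate=0.75)    # 75% speed, pitch preserved
    sf.write("song_75pct.wav", slow, sr)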
> audio processing can be done in JS/WASM
If there's reason to believe this is a useful way to handle time stretching, then there's reason to believe the same browser could do it natively just fine.
There is no reason to believe that browsers do it just fine when there is evidence to the contrary.
Which one in that list is better than the default one in Chrome? What media serving sites that you know of intentionally don't use native browser APIs?
Because this is such a niche use, the number of users taking advantage of this "feature" would be too small to get anybody a promotion.
Changing playback speed is a heavily used feature. They’ve also refined it several times, recently adding 0.05 speed increments not just 0.25.
Took me a while to discover that you can long-press on mobile YT to get 2x; maybe there's even more than 2x, but I haven't figured out the finger incantation.
Keyboard commands still make it jump in 0.25 steps.
Does anyone know a YouTube frontend that lets me change the playback speed in smaller steps using the keyboard?
How did you become this convinced that it's some niche, unused feature? I'd go as far as to say it's essential. There are so many videos out there, especially courses, that really need that 1.25x playback-rate boost.
The niche was the people using YT to learn to play a song. You're now applying it to something else, so that's a bit of goalpost moving.
Also, your use of 'need' is odd, and you seem to have convinced yourself that the world is wrong and only you're right. If it were needed, the creators would have made it that way.
> Also, your use of 'need' is odd, and you seem to have convinced yourself that the world is wrong and only you're right. If it were needed, the creators would have made it that way
It's called having personal experiences and an opinion. You should try them sometime. It's almost as if appropriate tempo was in the "eye" of the "viewer", and so I was very clearly not suggesting my needs and essentials are universal objective truths in the first place. Getting extremely tired of having to insert "I think", "I believe", "in my opinion" to signal subjectivity in what - I think - are ostensibly subjective contexts, just so that I can avoid subsequent bikeshedding like this.
> You're now applying it to something else, so that's a bit of goalpost moving
This will be crazy I know, but instead of this delightfully malice-assuming explanation, I simply missed the words where they said "music practice". Didn't help that "timestretch for music practice" is not a feature of YouTube, only timestretch is (as part of the playback rate adjustment feature), so when you were (according to my personal impression of the wording of your previous comment) generally addressing the feature, I replied in kind.
If I was being extra prickly, I'd accuse you of intentionally writing in a way so that you could accuse me of strawmanning you later (goalpost moving is a very loose fit here) for an easy dunk, but of course as someone who reaches for fallacies immediately, you wouldn't do that, right?
This is what distrust sown between people, as well as simply not being able to know your discussion partner, looks like. It's been increasingly frustrating me, and it looks like it's having an effect on you too.
> If I was being extra prickly, I'd accuse you of intentionally writing in a way so that you could accuse me of strawmanning you later
That’s the most bizarre comment I think I’ve ever seen. You didn’t have to reply to my comment. You’re now saying that I assume that the reading comprehension of everyone is so bad that I make comments specifically as gotchas. WTF is that logic? People that post comments in threads without taking the whole thread into consideration are like people that butt their way into a conversation based on the last sentence heard. It never goes well. It’s called social etiquette.
Just admit you didn’t read the full thread and that based on now understanding the full context of the conversation that your comment is out of place and have a nice day
It's also basic etiquette to not assume malice or that the other party is lying, but you explicitly and proudly continue to fail at that. So it's a bit tough for me to accept criticism from you regarding this.
> the reading comprehension of everyone is so bad
No, I am not saying this. This is just your headcanon, it is not even a remotely necessary presumption to have.
> You didn’t have to reply to my comment.
I felt compelled to after being told that I'm "moving goalposts". Obviously. Again, a subjectively perceived need.
> based on now understanding the full context of the conversation that your comment is out of place and have a nice day
Even with the additional context your original comment still rings unreasonable and self-absorbed. It is true however that I do not need you to explain why anymore. Have a nice day indeed.
It's the browser that does this on the client, not YouTube on the server.
Let's be fair, for anything music related, you'll be doing 1x speed... higher than that is usually speech only, where it doesn't matter as much.
If you're learning guitar, drums, piano, trumpet, etc., you'll want to start playing along at about 75% speed, then work up until you can play at 110% of the speed you actually need. YouTube's built-in audio time stretch makes this a painful exercise.
What's the benefit of learning to play faster than you need to? I understand wanting to play slower, but why faster?
You want to play within your abilities, not right at the very edge of them. This technique nudges the edges out.
Fair enough; I'd never heard of this technique before. Note that it wouldn't translate to other skills like dancing, singing, or sawing a piece of wood. I can't imagine doing those things faster would help improve how you do them at normal speed.
Reliability builds on skill redundancy.
Including 44.1kHz in the analysis ignores the fact that most consumer audio equipment made in the past decade or two can no longer switch its clock source and physical playback sample rate to it. It's all 48kHz (or multiples of that, for snake oil) these days, starting from when Intel cemented HDA on a 24MHz clock. Anything at 44.1kHz just gets resampled at some point before hitting the DAC, in most cases in software.
> It's all 48kHz (or multiples of that, for snake oil)
Oversampling can be a useful internal detail for ADCs and DACs. For example, if the digital audio stream is mathematically converted to 96 kHz with a high-quality FIR low-pass filter and then fed to a DAC, then the analog low-pass filter can have a much shallower roll-off and be more easily designed. Same goes for ADCs, where the analog filter can be simple and gentle, then digitized at 96 kHz, then downsampled digital to 48 kHz with high-quality but more computationally intensive filters. ( https://en.wikipedia.org/wiki/Oversampling , https://en.wikipedia.org/wiki/Delta-sigma_modulation , etc.)
But yes, listening to or distributing audio at anything over 48 kHz is a complete waste of resources. Monty@Xiph.Org explained very well in: https://people.xiph.org/~xiphmont/demo/neil-young.html
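The resampling itself is benign when done with a decent polyphase filter, since 48000/44100 reduces to the exact rational ratio 160/147. A scipy sketch (file names hypothetical):

    import soundfile as sf
    from scipy.signal import resample_poly

    y, rate = sf.read("track_44k1.flac")
    assert rate == 44100
    # 48000 / 44100 = 160 / 147, so a rational polyphase resampler is exact.
    y48 = resample_poly(y, up=160, down=147, axis=0)
    sf.write("track_48k.wav", y48, 48000)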
I swear that within the last 24 months, most of the new YouTube videos I saw had 256 kbps AAC-LC audio. I thought it was generous of them to up the audio quality (along with 4 Mbps 1080p50 AVC video, which is also quite high quality).
Now I just checked YouTube again and they are back to 128/130 kbps for AAC-LC.
I noticed the same thing.
When I use an FFT to view the spectrogram on YouTube music videos, it is very obvious that YouTube applies a lowpass filter at 16kHz on all videos (true since 2023 at least).
While this does retain the majority of useful information, it explains why the youtube version of your song feels just a little more 'lifeless' than the high quality version you have elsewhere.
The original recording contains high frequency detail that got lost. Your human body uses that high frequency detail to orient itself in space with respect to sound sources (like reverb, reflections, or ambient sounds).
It is interesting from a data storage point of view, because this could result in massive savings. Consider: audio is recorded at 44.1 or 48 kHz but is effectively stored as if at 32 kHz. They have effectively saved 25% in audio file storage at marginal cost to the customer experience.
> it explains why the youtube version of your song feels just a little more 'lifeless' than the high quality version you have elsewhere
Having hearing sensitivity over 16 kHz is unusual. If you're under 15 years old and kept your ears pristine by not listening to loud noises, you might be able to hear it. Older people are out of luck.
Moreover, even if you can hear above 16 kHz in loud pure tones, there is so little content in real audio/music above 16 kHz that it makes no practical difference.
> massive savings ... effectively saved 25%
Not really. Going from a 48 kHz sampling rate to 32 kHz is indeed 2/3× the size for uncompressed PCM audio. But for lossily compressed audio? Not remotely the same. Even in old codecs like MP3, high frequency bands have heavy quantizers applied and use far fewer bits per hertz than low frequency bands. Analogously, look at how JPEG and MPEG have huge quantizers for high spatial frequencies (i.e. small details) and small quantizers for broad, large visual features.
It is true that your ears do not perceive the higher frequencies in the same way (through pitch detection). However, if you put on a headset and apply only frequencies above 16kHz, you will distinctly notice a change in the pressure in your headset's ear cups.
Good point about the savings. I was using uncompressed format as the reference, but it is indeed unlikely that YouTube serves out lossless audio.
I also should have used the word "delivery" instead of data storage. Those are two separate problems: where the original asset is stored (and how, if they don't store raw originals), and also how the asset is delivered over the web.
> However, if you put on a headset and apply only frequencies above 16kHz, you will distinctly notice a change in the pressure in your headset's ear cups.
If you put something above 16 kHz at full scale and/or if you play it extremely loud then maybe. With typical music content at typical volumes, I doubt it.
> When I use an FFT to view the spectrogram on YouTube music videos, it is very obvious that YouTube applies a lowpass filter at 16kHz on all videos (true since 2023 at least).
Maybe with a browser that doesn't support Opus and gets AAC instead (Safari?). With Firefox or Chromium on Linux, I get content up to 20 kHz, which by design is the upper limit of the Opus codec.
A good way of hearing the grass grow is a null test.
Take unprocessed audio and process it. Then load both the processed and unprocessed audio into your audio software (e.g. Audacity). Now flip the polarity of one of the signals.
This lets you listen to the difference between the two signals, and if there is nothing there, guess what: they are the same, or the differences are so small that they are inaudible.
This is a great way to anger people with expensive hi-fi gold cables, because what is true in the digital domain also works in the analog domain.
You only need to make sure both audio signals are at the same level (by minimizing the level of the difference).
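A minimal sketch of that null test with numpy; alignment is done by cross-correlation since two files rarely line up sample-exactly (file names hypothetical, and np.correlate here is O(n²), so use a short clip):

    import numpy as np
    import soundfile as sf

    ref, rate = sf.read("original.wav")
    test, _ = sf.read("processed.wav")
    if ref.ndim > 1: ref = ref[:, 0]                # use the left channel only
    if test.ndim > 1: test = test[:, 0]

    # Align: find the lag that maximizes cross-correlation, then trim.
    n = min(len(ref), len(test))
    lag = int(np.argmax(np.correlate(test[:n], ref[:n], mode="full"))) - (n - 1)
    test = test[lag:] if lag > 0 else test
    ref = ref[-lag:] if lag < 0 else ref
    n = min(len(ref), len(test))

    # Match levels with a least-squares gain, subtract (the polarity flip),
    # and report the residual: the quieter, the closer the two signals are.
    gain = np.dot(ref[:n], test[:n]) / (np.dot(test[:n], test[:n]) + 1e-12)
    null = ref[:n] - gain * test[:n]
    print(f"residual: {20 * np.log10(np.sqrt(np.mean(null ** 2)) + 1e-12):.1f} dBFS")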
This kind of analysis is interesting but must be considered rapidly outdated since we won't know what's changing behind the scenes at YouTube. Have they ever revealed whether their VCU handles the audio?
Video takes up almost all of the bitrate/CPU time/ASIC budget, audio is a rounding error.
Server CPUs can encode audio at hundreds of times faster than realtime so there’s no need for hardware acceleration. Back in the iPod era DSPs were used to decode MP3/AAC but now only the most CPU or battery constrained devices like AirPods need hardware acceleration.
Do you have information on this or are you just supposing? The VCU accelerator is supposedly 100x faster than software for VP9 and they have up to 20 accelerators per host. At those ratios it's not clear that a machine would be able to keep up with the audio processing, or that it would be easy to manage the division of labor between accelerators and host CPUs. Also there definitely are broadcast transcoders on the market that do the audio in hardware.
see: https://openbenchmarking.org/test/pts/encode-opus
Modern CPUs seem to be able to encode Opus at around 266x speed; in other words, they can encode 266 seconds of audio in 1 second. The test results also don't scale with core count, so the encoder itself is probably single-threaded, and therefore you could go even faster with concurrent streams. It's highly unlikely that even with VCUs, a server can encode video streams at several thousand times faster than playback speed.
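Easy to sanity-check on your own machine. A rough timing sketch shelling out to ffmpeg's libopus encoder; it assumes ffmpeg on PATH and a local test file, and the ratio will vary wildly by CPU:

    import subprocess
    import time
    import soundfile as sf

    duration = sf.info("in.wav").duration       # seconds of audio in the test file
    start = time.perf_counter()
    subprocess.run(["ffmpeg", "-y", "-i", "in.wav",
                    "-c:a", "libopus", "-b:a", "128k", "out.opus"],
                   check=True, capture_output=True)
    elapsed = time.perf_counter() - start
    print(f"{duration / elapsed:.0f}x realtime")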
From 2022, so a similar analysis today would be revealing if substantial differences were seen/heard.
Sure. I think it's hard to control for every relevant factor though. We won't really know what stream they choose to serve under given circumstances to various clients, for example.
Why would it be 6.5 ms late? It's not much, but way larger than I'd expect. I never knew encoding might shift the audio track.
Short answer is 'bugs', possibly even in the reviewer's own software. Opus (and I would fully expect AAC-LC) preserves time alignment. Something unknown, somewhere unknown, in an unknown part of the unknown software chain caused an unknown shift, and it's not necessarily by an integer number of samples. You can't use this 'enh, whatever, good enough' approach and expect to do meaningful null analysis, even if you're using it inappropriately.
In all seriousness, every aspect of this comparison is somewhere between deeply flawed and invalid. No point dwelling on just one part.
I imagine the lossy encoding process could use a bunch of stacked filters to cut away imperceptible frequencies, but filters in the general case are implemented by summing delayed copies of the input and can smear the output in the phase domain. (There’s a branch of filters that avoid this but require computing the introduced delay and shifting the output to compensate.)
It’s a noticeable problem in audio production if e.g. a filtered kick drum goes out of phase and sucks amplitude when mixed with the original.
Is it possible to run the recorded input through the filters twice, doing the second pass in reverse to cancel out the phase shift?
Yeah, I think that’s traditionally the route when you’re not running with near-real-time constraints. I’m out of practice with DSP/filter math but I think there’s a constraint such that any theoretical filter that doesn’t impact phase must be symmetrical around the time axis such that it requires “knowledge from the future” that’s not available in real time.
EDIT: And I think with the two-pass approach you need to calculate the filter such that you get the desired effect after two applications instead of one.
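For reference, scipy ships exactly this forward-backward trick as filtfilt: filter once, reverse, filter again, reverse back. The phase shift cancels, and as noted above, the magnitude response is applied twice:

    import numpy as np
    from scipy.signal import butter, filtfilt, lfilter

    rate = 48000
    b, a = butter(4, 16000, btype="low", fs=rate)    # 4th-order 16 kHz low-pass

    t = np.arange(rate) / rate                       # 1 second of samples
    x = np.sin(2 * np.pi * 1000 * t)                 # 1 kHz test tone

    causal = lfilter(b, a, x)        # one forward pass: output is phase-shifted
    zero_phase = filtfilt(b, a, x)   # forward + reversed pass: zero phase shift,
                                     # at the cost of squaring the magnitude response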
https://www.youtube.com/watch?v=StUxyyISJBA
Oh, look a new analysis of YouTube quality that, surely, has learned something from all the past discussions...
[reads]
...Jesus H. F. Christ....
Every generation thinks they discover sex and audio analysis for the first time.
[And don't call me Shirley]
(2022)
I always thought that YT audio was pretty bad.
Billions of dollars at their disposal, a near-monopsony (or at least a massive buyer) of infrastructure, and some of the smartest people in the space. Yet the best they could do in 2022 is a lossy codec at 120-190 kbps?
I got better audio quality ripping songs from limewire or Napster in the 2000s.
Why do we settle for this substandard quality? Oh wait, YT barely has any competition and is subsidized by G. No need to compete. Just shove ads down users' throats and sell off their usage data.
> I got better audio quality ripping songs from limewire or Napster in the 2000s.
Nah, you didn't, at least not reliably. Half of that were recodes and upcodes that used Blade, FhG or Xing.
Only with torrent technology and community-driven trackers did we get reliable distribution that surpassed the official non-physical channels.
Most people don't hold a high standard for audio quality, and YouTube has taken advantage of that. Those who do naturally self-host. Plus, if YouTube started serving FLAC, it would be bandwidth-intensive.
Lossy codecs at 120-190k in 2022 are probably transparent.
Opus _is_ transparent at 128 kbps, with one big if: encoded properly. YouTube does some weird shit instead of just re-encoding.
> YouTube audio quality – How good does it get? (2022)
Good. YouTube audio quality is crap, plain and simple. 320 kbps MP3 sounds better than anything YouTube offers, and 320 kbps MP3 is not even "quality".