"It turns out that the round-trip time from an audio interface, through a computer (DAW) and back to the speakers takes a few hundreds of milliseconds, making direct audio processing impossible using consumer hardware." - uh, what? Real-time audio processing has been a thing for at least a couple of decades. It doesn't work by default on Windows, but you can get free drivers (ASIO4All) which make it work on pretty much any hardware. And it works out of the box on Macs.
"Latency seems to shift by a few tens of milliseconds when restarting the application." - this makes me think you are using the wrong API for your sound input/output. With modern realtime audio support, your total latency from input to output should be less than 10ms total.
"I expected that memory usage would get out of hand quite fast due to the ever growing dictionary of arrays containing audio data, but this does not happen in practice. I suspect that the good performance is caused by highly optimized memory management of Python and modern OSes." - without concrete figures it's quite hard to evaluate this, but what did you expect to happen? With a 44.1KHz stereo audio stream, you should be storing 88.2 thousand samples a second. Say you're using 64-bit floats, as a worst case. Your audio storage should be growing at about 689KB/sec, plus a bit extra for object overhead. How much is it actually growing by? Of course Python is probably doing a bunch of allocation and deallocation for temporary objects behind the scenes, but hopefully you should not need to lean too hard on 'highly optimized memory management' - ideally, you should hardly be allocating anything at all. Also, why a dict, rather than just a large array that you can occasionally make bigger?
Finally ... I'm sure you already know that Python is possibly the worst mainstream language you could pick for realtime audio processing. But that is fine. I have tried to build audio stuff in Python too! Sometimes using the wrong tool for the job is part of the fun.
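For what it's worth, the storage estimate above is easy to sanity-check. A minimal sketch, assuming 44.1 kHz stereo stored as 64-bit floats (the worst case mentioned above) and that nothing is ever discarded; the function name is made up for illustration:

```python
SAMPLE_RATE_HZ = 44_100   # frames per second
CHANNELS = 2              # stereo
BYTES_PER_SAMPLE = 8      # 64-bit float, the assumed worst case


def growth_bytes_per_second(sample_rate=SAMPLE_RATE_HZ,
                            channels=CHANNELS,
                            bytes_per_sample=BYTES_PER_SAMPLE):
    """Raw audio bytes stored per second, ignoring Python object overhead."""
    return sample_rate * channels * bytes_per_sample


per_second = growth_bytes_per_second()
print(f"{per_second / 1024:.0f} KiB per second")
print(f"{per_second * 60 / 1024**2:.1f} MiB per minute")
print(f"{per_second * 3600 / 1024**3:.2f} GiB per hour")
```

That works out to roughly 40 MiB per minute of recording, which is a useful baseline to compare against whatever the process is actually using.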
+1. In Ableton on Windows you can get your latency down to ~40ms without a dedicated sound card using ASIO. Mac's drivers are even better with sub ~20 ms on my m2 pro IIRC.
+1 to the comments here. Part of the issue here is running these applications in Python. It's not really optimized to handle these loads and do DSP-based compute efficiently.
You seem surprised, but any sort of live production requires this. Check out SonoBus, it achieves adequately low end-to-end latency even with network delays in the mix.
When you see someone using Python for something as real-time and latency-sensitive as audio, don't you expect more wacky red flags on top of the fact that Python is going to be 50x to 100x slower than a native program?
Crazy numbers on top of a dictionary of arrays? It's all there.
Half the fun of this project has been doing it in Python and performance hasn't been an issue so far, which says something about how fast Python is already. And indeed native would be ~50x-100x faster.
I must defend the design choice for the dictionary of arrays though, this has been a very conscious choice:
- The "dictionary-of-arrays" approach allows lookups in constant time, O(1), irrespective of how much data has already been stored (compared to one big array)
- The dictionary structure allows me to throw away data in the middle easily (without having to handle growing arrays), because the "dictionary-of-arrays" has already been chunked. The audio looper will use only some parts of the recorded audio, leaving big parts in between unused.
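The chunked layout described in the two points above could look roughly like this. It is only a sketch, not the actual code from the project: the class name, method names and the use of the standard-library `array` type are all assumptions made for illustration. Worth noting that a flat array also gives constant-time indexing; the practical difference shown here is how cheap it is to drop unused chunks in the middle.

```python
from array import array


class ChunkStore:
    """Hypothetical dictionary of audio chunks keyed by callback index."""

    def __init__(self):
        self._chunks = {}

    def put(self, index, samples):
        # One entry per audio callback; 'd' stores 64-bit floats.
        self._chunks[index] = array("d", samples)

    def get(self, index):
        # Dictionary lookup: average O(1) however many chunks are stored.
        return self._chunks.get(index)

    def discard_range(self, start, stop):
        # Drop chunks start..stop-1 without shifting or copying the rest.
        for index in range(start, stop):
            self._chunks.pop(index, None)

    def __len__(self):
        return len(self._chunks)


store = ChunkStore()
for i in range(10):
    store.put(i, [float(i)] * 4)

store.discard_range(3, 7)   # throw away an unused stretch in the middle
print(len(store), "chunks kept")
```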
Not necessarily. "I convinced this language/system to do something it really wasn't designed to do by optimising everything" is a well established genre of article :)
Interesting comment! I'm going to figure out if using another driver allows me to get under 20 ms in latency. Right now I'm measuring around 300 ms in latency round-trip, which is not a problem because I can correct for it. (I'm using a Focusrite Scarlett 2i2 with default drivers.)
The reasoning behind my comment about round-trip time was as follows:
- Right now I'm measuring around 300 ms round-trip time, without processing in between
- In the past I've tried to do live effects in Ableton with ASIO drivers (guitar in -> Ableton effects -> out), and the delay was too noticeable. I couldn't play that way without making my ears bleed and I've switched back to pedals since.
One follow up: how could I achieve a total round-trip latency of around 10 ms total, as you describe? If I use a buffer of 500 samples @ 44.1 kHz, then I am spending already 11 ms just filling the buffer. So then the buffers need to become really small, causing more processing overhead, right? Not sure if this is the way to go.
Yeah, your Scarlett should be capable of single-digit ms latency. If you're on Windows, you need to install its ASIO drivers and figure out how to use them from Python. Then, yes, use tiny buffers and run your audio processing very fast - which is where Python's slowness will probably become a real problem.
10ms latency is how long sound takes to travel 3-and-a-bit metres. So if your amp is a few metres from you, you would experience that delay between hitting the guitar strings and hearing the amplified sound. This should barely be noticeable. If you were noticing a delay greater than that in your Ableton effects setup, your settings needed tweaked. All of this is completely possible - I had a PC-based electronic drum setup in 2006, running through the Reason DAW, which had 8ms latency between hitting a pad and hearing the result.
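The arithmetic in this exchange is simple enough to write down. A small sketch, assuming a speed of sound of 343 m/s (dry air at about 20 °C); the function names are made up for illustration. The same number is also the time budget for one audio callback, since each callback has to finish before the next buffer is due.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C


def buffer_latency_ms(frames, sample_rate_hz):
    """Time needed to fill one buffer of `frames` samples, in milliseconds."""
    return 1000.0 * frames / sample_rate_hz


def sound_distance_m(latency_ms, speed=SPEED_OF_SOUND_M_PER_S):
    """Distance sound travels through air during `latency_ms`."""
    return speed * latency_ms / 1000.0


for frames in (500, 256, 128, 64):
    ms = buffer_latency_ms(frames, 44_100)
    print(f"{frames:>4} frames: {ms:5.2f} ms per buffer, "
          f"about {sound_distance_m(ms):.2f} m of air")
```

Keep in mind that this is the delay for a single buffer. The round trip includes at least one input and one output buffer plus driver and converter overhead, so the measured figure will be higher.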
Hmm, I wonder if Cython (static Python-to-C compiler) would make writing audio code easier/more possible?
With Ableton and the default ASIO configuration on my Scarlett I get 96 ms combined input+output latency without any processing in between, so that's probably what made my ears bleed before. Tweaking the sample rate and buffer size gets me indeed single digit latencies in Ableton. So I'm definitely going to adjust the section about latency, thanks for this!
I'm a bit on the fence about what this means for the difficult latency calibration routine in the application. Ideally I could throw the calibration routine away, but then I require that users have ASIO installed, while the app now also works with non-ASIO drivers. And indeed Python itself might become a bottleneck (making this work in Python has been half the fun).
Even without ASIO you should be able to hit 40 ms latency on pretty much any Windows audio hardware, including motherboard built-in.
If you get 300 ms you're doing something wrong. Note that Windows has multiple audio APIs; 300 ms is about the latency of the old MME API. You need to use the newer one, WASAPI.
I apparently only have the old Windows MME drivers indeed (and ASIO, on Win10). Need to look into why I can't find WASAPI and if I can assume other Windows users have those by default.
WASAPI has been available since Windows Vista. It isn’t its own set of drivers but rather a unifying layer for the WDM driver and the preceding mishmash of Windows audio APIs (MME, DirectSound, etc). WASAPI supports low-ish latencies with Exclusive Mode and then something like 10 ms buffering in Shared Mode through the Windows audio server, I recall.
Put another way: any Windows audio device supports WASAPI unless it only ships with an ASIO driver which is unlikely, even in the pro audio space.
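If the app talks to the sound card through a PortAudio wrapper, picking the host API explicitly instead of taking the default is one way to avoid ending up on MME. A rough sketch under several assumptions: that the list of host APIs is a sequence of dicts with `name` and `devices` keys (the shape `sounddevice.query_hostapis()` returns), that the names match the ones PortAudio reports on Windows, and that the preference order below is reasonable. `pick_host_api` is a made-up helper, not part of any library.

```python
# Assumed preference order, lowest expected latency first.
PREFERRED = ("ASIO", "Windows WASAPI", "Windows WDM-KS",
             "Windows DirectSound", "MME")


def pick_host_api(hostapis, preference=PREFERRED):
    """Return (index, name) of the most preferred host API that has devices.

    `hostapis` is assumed to be a sequence of dicts with 'name' and
    'devices' keys. Returns None when nothing in `preference` is usable.
    """
    usable = {
        api["name"]: index
        for index, api in enumerate(hostapis)
        if api.get("devices")  # skip host APIs that expose no devices
    }
    for name in preference:
        if name in usable:
            return usable[name], name
    return None


# Hand-written example data standing in for a real query result.
example = [
    {"name": "MME", "devices": [0, 1]},
    {"name": "Windows WASAPI", "devices": [2, 3]},
    {"name": "ASIO", "devices": []},
]
print(pick_host_api(example))
```

This keeps the app usable for people without ASIO installed while still preferring it when it is there.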
Try a Clarett interface. It also comes with preamps which will make your sound less noisy; the Scarlett preamps are just absolutely terrible. You can debug your DAW to see how it uses drivers and make a Python module which exposes similar functions to Python. You will likely still want delay compensation to make things seem free of any latency, but it will be doing _much_ less compensating.
Maybe there's an open-source DAW if you want to skip reversing driver calls from a debugger.
Debugging an existing DAW to see how they do it under the hood is an interesting idea. Haven't done that yet.
About another interface: I do want to keep the application supporting cheaper interfaces such as the Scarlett, because the target audience (hobby musicians) will be using those. Still would be a nice upgrade for me!
I was once bitten by not understanding that there is a difference between "regular" clocks and high performance clocks/timers that a developer can take advantage of. At the time I needed a sampling routine to run at precisely once per second. My inexperience led me to go with something like thread.sleep(1000), and I learned quickly that I was mistaken in thinking it'd run with little jitter. As others are pointing out, there are also similar lessons and solutions when dealing with audio processing pipelines.
Indeed, it is not a guarantee that the "sleep" will be exactly that long. In the code I'm not "sleeping" in any sensitive places, instead I'm relying on the callback to the audio stream object, which just needs to finish before the next one starts (less of a timing constraint).
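For the once-per-second case, the usual fix is to sleep until an absolute deadline computed from a monotonic clock rather than sleeping for a fixed interval, so the time spent in the task and any oversleep do not accumulate. A minimal sketch; `run_periodic` is a made-up name, and the `clock` and `sleep` parameters exist only so the behaviour can be checked with a fake clock. Individual wake-ups will still jitter, but the schedule does not drift.

```python
import time


def run_periodic(task, period_s, iterations,
                 clock=time.perf_counter, sleep=time.sleep):
    """Call `task` `iterations` times, aiming at one call per `period_s`."""
    start = clock()
    for n in range(1, iterations + 1):
        task()
        # Sleep until the n-th absolute deadline, not for a fixed interval.
        remaining = start + n * period_s - clock()
        if remaining > 0:
            sleep(remaining)


# Simulated time: the task itself takes 0.25 s on every call.
now = [0.0]


def fake_clock():
    return now[0]


def fake_sleep(seconds):
    now[0] += seconds


def slow_task():
    now[0] += 0.25


run_periodic(slow_task, 1.0, 3, clock=fake_clock, sleep=fake_sleep)
print(now[0])  # a fixed sleep of 1.0 s per loop would have ended at 3.75
```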
I love Python a lot and I’m first to criticise people complaining about performance when it’s irrelevant (if you’re doing long calculations in NumPy or whatever then the Python overhead is small relative to the hot path).
But as others have said, it’s not a great tool for real time audio since that overhead starts to be a larger % of the time spent. You might want to try compiling parts of it with Cython and using the C API of numpy with that (you can do cimport numpy in your Python code and it’ll remove much of the overhead as it’ll call the C functions directly rather than their wrappers).
Cython and related tools will be the direction to take if performance becomes a bottleneck indeed! Interestingly enough, the audio callback is being handled fast enough on my not-impressive laptop and CPU usage has been low on average (<1%), so Python trickery hasn't been needed yet
> Mac's drivers are even better with sub ~20 ms on my m2 pro IIRC.
Just to be clear that you're measuring apples to apples with OP:
You are measuring less than 40ms roundtrip latency on your Mac. Is this correct?
:)
It is really a hammer for a screw.
Can take a peek at how Tracktion engine does it too
Tracktion looks like an interesting project, thanks for pointing me to this!
I don't know Windows audio, but on Mac audio that's wildly high latency for a Scarlett interface.
I would disable any services and programs running in the background as well. Years ago I disabled the Windows print spooler and it greatly improved jitter. Not sure if that's still the case these days though, that was probably 10 years ago.
So far CPU usage hasn't been an issue at all (<1% usually on my not-very-impressive laptop), which surprised me as well
The damage that Python not having a JIT has done.
At least BASIC was designed for native code compilation from day one, and after the 8 bit home computers generation passed by, getting compilers for 16 bit home computers was rather easy.
30 years later, people insist on using bytecode-interpreted languages for the wrong use cases.
What about psyco? Anyway it's a very odd take that the early development of python should have been concerned with a jit. There were many far more pressing issues at the time.
Python has existed for 33 years...
Yes and I've been using it for a lot of that time. Maybe you too. At the time tools like psyco were useful but never got enough traction to persuade core developers that it was a compelling direction. It never felt like an obviously wrong decision.
I only use it as a Perl alternative for UNIX scripts, nothing else, unless forced otherwise.
There are enough alternatives with the same dynamism and native code generation.