Ask HN: Best method to classify short audio events in real time?

7 points by desertraven 3 years ago

I don't have too much experience with statistics (or ML), but a lot of the articles I've found are quite complicated for something I expected to be simple.

There are four distinct sounds I need to detect in real time with an embedded device. Think a clap sensor, but with 4 different sounding claps.

How might I go about this? How much training data (if any) do I need to collect? Is there an off-the-shelf method to classify a few different audio events to a high degree of accuracy, and then run that on a microcontroller (or even a computer at this point)?

Thanks!

tkanarsky 3 years ago

Edge Impulse does everything you described. It has a really nice web UI that lets you collect and annotate data, extract features, train a model, and bake it into a microcontroller image for inference. It supports a good chunk of microcontrollers and SBCs out there.

https://docs.edgeimpulse.com/docs/development-platforms/full...

t0mas88 3 years ago

This book has a good example of using TensorFlow Lite on a microcontroller to recognize a few spoken commands, which would probably work for different sounds too: https://www.amazon.com/TinyML-Learning-TensorFlow-Ultra-Low-...

(And it's a nice book overall, very easy to read, and the examples are easy to follow along with.)

desertraven 3 years ago

Thanks for the recommendation, I'll buy that.

simne 3 years ago

I think this task is a lot more about DSP and very little about ML, just Bayes classification (I will write about it later).

Best book I know on DSP:

The Scientist and Engineer's Guide to Digital Signal Processing By Steven W. Smith, Ph.D.

simne 3 years ago

There are at least two possible approaches:
1. Just sample with the ADC and work with the raw time series. This is easy to do on modern software/hardware, but slow.
2. Compute a Fourier transform, or use digital filters, to filter out noise (usually by filtering out some frequencies) and get some sort of time-frequency table. Then it is easy to match your sound with Bayes, using a simple formula, something like: result = k[0]*f[0] + k[1]*f[1] + ... + k[n]*f[n], where f[i] is the amplitude of frequency bin i from the FFT. Then just: if (result > 0.8): matched = true. You just need to find the right coefficients. You may need to do two-dimensional matching over a sequence of consecutive FFTs, but this is not much harder.
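
To make the weighted-sum formula in approach 2 concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the function names are hypothetical, the frame length of 64 samples is arbitrary, the weights are picked by hand rather than learned from recordings, and a naive DFT stands in for a real FFT routine so the example needs nothing beyond the standard library. On a real device you would likely use an optimized FFT and fit the coefficients from labelled data.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2, normalized by the frame length."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc) / n)
    return mags

def score(mags, weights):
    """result = k[0]*f[0] + k[1]*f[1] + ... + k[n]*f[n]"""
    return sum(w * m for w, m in zip(weights, mags))

def matches(frame, weights, threshold=0.8):
    """Threshold the weighted sum, as in 'if (result > 0.8): matched = true'."""
    return score(magnitude_spectrum(frame), weights) > threshold

def score_sequence(frames, weight_rows):
    """2-D variant: one weight row per consecutive frame, scores summed."""
    return sum(score(magnitude_spectrum(f), w)
               for f, w in zip(frames, weight_rows))

# Hypothetical test signals: pure tones that land exactly on bins 8 and 20.
N = 64
tone_bin8 = [math.sin(2 * math.pi * 8 * t / N) for t in range(N)]
tone_bin20 = [math.sin(2 * math.pi * 20 * t / N) for t in range(N)]

# Hand-picked weights: only bin 8 counts. A unit-amplitude sine has a
# normalized magnitude of 0.5 in its own bin, so a weight of 2.0 gives a
# score of about 1.0 for the matching tone.
weights = [0.0] * (N // 2 + 1)
weights[8] = 2.0

print(matches(tone_bin8, weights))   # tone at the weighted bin
print(matches(tone_bin20, weights))  # tone at a different bin
```

For four different sounds, one option would be to keep one weight vector per sound and pick the highest score above the threshold. Real claps are short and broadband, so the weights would need to cover many bins and probably several consecutive frames, which is what `score_sequence` hints at.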