Love ggwave! I used it on a short film set a few years ago to automatically embed slate information into each take and it worked insanely well.
If anyone wants details: I had a smartphone taped to the back of the slate with a UI to enter shot/scene/take, and when I clicked the button it would transmit that information along with a timestamp as sound. This sound was loud enough to be picked up by all microphones on set, including scratch audio on the cameras, phones filming BTS, etc.
In post-production, I ran a script to extract this from all the ingested files and generate a spreadsheet. I then had a script to put the files into folders and a Premiere Pro script to put all the files into a main and a BTS timeline by timestamp.
Yes, timecode exists and some implementations also let you add metadata, but we had a wide mix of mostly consumer-grade gear so that simply wasn't an option.
I posted a short demo video on Reddit at the time, but it got basically no traction: https://www.reddit.com/r/Filmmakers/comments/nsv3eo/i_made_a...
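For anyone who wants to build something similar: a rough sketch of what the transmit side can look like with the upstream ggwave Python bindings (pip install ggwave) plus PyAudio. The payload fields here are just my guess at the slate info described above, and the exact encode() arguments may vary between versions, so treat it as a starting point rather than the setup actually used:

    import json, time
    import ggwave
    import pyaudio

    # Hypothetical slate payload: scene/shot/take plus a wall-clock timestamp
    payload = json.dumps({"scene": "12A", "shot": 3, "take": 7, "ts": int(time.time())})

    # Encode the text into a float32 waveform (protocolId/volume values are illustrative)
    waveform = ggwave.encode(payload, protocolId=1, volume=20)

    # Play it out loud so every microphone on set picks it up
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paFloat32, channels=1, rate=48000, output=True)
    stream.write(waveform, len(waveform) // 4)   # 4 bytes per float32 sample
    stream.stop_stream(); stream.close(); p.terminate()

The extraction pass in post is then the reverse: run each ingested audio track through ggwave's decoder and collect the payloads along with where in the file they were heard.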
Very cool solution!
One of the nicest data-over-sound implementations I came across was in a kid's toy (often the best source of innovation).
It was a "Bob the Builder" play set: when you wheeled around a digger, etc., the main base would play a matching sound. I immediately started investigating and was impressed to see no batteries in the movable vehicles. I realised that each vehicle made a clicking sound as you moved it, and the ID was encoded into that click, which the base station picked up. Pretty impressive to do this regardless of how fast the vehicle was moved by the child.
Was it based on the frequency of the click?
>Pretty impressive to do this regardless of how fast the vehicle was moved by the child.
Probably not, eh?
Probably yes, because the frequency of a note doesn't change based on how quickly the next note is played after it.
Guess I misunderstood. The first time, you said "frequency of the click", which I would personally take to mean clicks per second.
"Frequency of the note" in your next comment clears it up. It probably was that, you're right.
The acoustic modem is back in style [1]! And, of course, same frequencies (DTMF) [2], too!
DTMF has a special place in the phone signal chain (signal at these frequencies must be preserved, end to end, for dialing and menu selection), but I wonder if there's something more efficient, using the "full" voice spectrum, with the various vocoders [3] in mind? Although, it would be much creepier than hearing some tones.
[1] Touch tone based data communication, 1979: https://www.tinaja.com/ebooks/tvtcb.pdf
[2] Touch tone frequency mapping: https://en.wikipedia.org/wiki/DTMF
[3] Optimized encoders/decoders for human speech: https://vocal.com/voip/voip-vocoders/
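For anyone who hasn't looked at it, the DTMF scheme itself is tiny: each key is just two sine waves played simultaneously, one from a "row" set and one from a "column" set. A hedged NumPy toy to illustrate (a real detector would use the Goertzel algorithm per frequency rather than a full FFT):

    import numpy as np

    ROWS = [697, 770, 852, 941]        # Hz, low "row" tones
    COLS = [1209, 1336, 1477, 1633]    # Hz, high "column" tones
    KEYS = ["123A", "456B", "789C", "*0#D"]

    def dtmf_tone(key, fs=8000, dur=0.2):
        """Two simultaneous sine waves for one keypress."""
        r = next(i for i, row in enumerate(KEYS) if key in row)
        c = KEYS[r].index(key)
        t = np.arange(int(fs * dur)) / fs
        return 0.5 * (np.sin(2 * np.pi * ROWS[r] * t) + np.sin(2 * np.pi * COLS[c] * t))

    def dtmf_detect(sig, fs=8000):
        """Toy detector: strongest row tone plus strongest column tone."""
        spec = np.abs(np.fft.rfft(sig))
        freqs = np.fft.rfftfreq(len(sig), 1 / fs)
        power = lambda f: spec[np.argmin(np.abs(freqs - f))]
        r = max(range(4), key=lambda i: power(ROWS[i]))
        c = max(range(4), key=lambda i: power(COLS[i]))
        return KEYS[r][c]

    assert dtmf_detect(dtmf_tone("5")) == "5"

Two tones drawn from two sets of four gives the 16 keys, which is also why the bitrate is so low; it survives the phone network precisely because the network is required to pass those frequencies through.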
This isn't DTMF. It's a form of MFSK like DTMF, but it operates on different frequencies and uses six tones at once vs DTMF's two.
> it would be much creepier than hearing some tones.
Hatsune Miku at the speed of a horserace commentator.
(the "vocaloids" are DAW plugins made from chopped up recorded phonemes; Hatsune Miku is voiced by Saki Fujita. Still sounds very inhuman)
I'm wondering if shifting frequency chirps like LORA uses would work in audio frequencies? You might be able to get the same sort of ability to grab usable signal at many db below the noise, and be able to send data over normal talking/music audio without it being obvious you're doing so. (I wanted to say "undetectably", but it'd end up showing up fairly obviously to anyone looking for it. Or to Aphex Twin if he saw it in his Windowlicker software...)
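For what it's worth, the core LoRa trick (chirp spread spectrum) is compact enough to sketch. This is a hedged toy on complex samples with NumPy: a symbol is a cyclically shifted up-chirp, and the receiver multiplies by the conjugate base chirp and takes an FFT, the peak bin being the symbol. Shifting it onto a real audible carrier, synchronizing, and surviving an actual acoustic channel are all left out:

    import numpy as np

    SF = 7          # "spreading factor": 2**SF possible symbol values per chirp
    N = 2 ** SF     # samples per symbol (one sample per chip)

    def chirp(symbol):
        """Up-chirp whose start point is cyclically shifted by `symbol` chips."""
        k = (np.arange(N) + symbol) % N
        return np.exp(1j * np.pi * k * k / N)   # quadratic phase = linear frequency sweep

    def demod(rx):
        """De-chirp with the conjugate base chirp, FFT, pick the peak bin."""
        return int(np.argmax(np.abs(np.fft.fft(rx * np.conj(chirp(0))))))

    for s in (0, 5, 100):
        assert demod(chirp(s)) == s

The reason this digs signals out from below the noise floor is that the FFT integrates energy over the whole symbol, so the de-chirped peak still stands out when the chirp itself is buried in wideband noise.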
The issue is the (many) vocoders along the chain remove anything that doesn't match the vocal patterns of a human. When you say hello, it's encoded phonetically at a very low bitrate. Noise, or anything outside what a human vocal cord can do, is aggressively filtered or encoded as vocal-sounding things. Except for DTMF, which must be preserved for backwards compatibility. That's why I say it would be creepy to do something higher-bitrate... your data stream would literally and necessarily be human vocal sounds!
Data exfiltration via bird
Yes: JT8 / FT8, WSPR, and then the entirety of fldigi, to get started.
If you need more speed you'll need to convince me you won't abuse my ham spectrum, but Winlink, PACTOR, and some very slick 16QAM modems exist: 300 baud to 128 kbit/s or so.
"Using the Web Audio API to Make a Modem" (2017) https://news.ycombinator.com/item?id=15471723
If you're interested in using GGWave in Python, check out ggwave-python, a lightweight wrapper that makes working with data-over-sound easier. You can install it with pip install ggwave-python or pip install ggwave-python[audio], or find it on GitHub: https://github.com/Abzac/ggwave-python.
It provides a simple interface for encoding and decoding messages, with optional support for PyAudio and NumPy for handling waveforms and playback. Feedback and contributions are welcome.
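I haven't used this wrapper myself, so I won't guess at its exact API, but for comparison this is roughly what the receive loop looks like with the raw upstream ggwave bindings plus PyAudio (going from memory of the examples in the ggwave repo, so double-check the signatures):

    import ggwave
    import pyaudio

    instance = ggwave.init()
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paFloat32, channels=1, rate=48000,
                    input=True, frames_per_buffer=1024)
    try:
        while True:
            chunk = stream.read(1024, exception_on_overflow=False)
            result = ggwave.decode(instance, chunk)   # None until a full message is heard
            if result is not None:
                print("received:", result.decode("utf-8"))
    finally:
        ggwave.free(instance)
        stream.stop_stream(); stream.close(); p.terminate()

If the wrapper hides the init/free and audio plumbing behind a call or two, that alone seems worth the install.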
I remember discovering ggwave a few years ago, before the rebrand. It's still the only working (and fastest verifiable) library that can transmit data over sound.
I couldn't get around to a project using it then, because of college. But now I am integrating it into my startup for frictionless user interaction. I want to thank the creators and contributors of ggwave for doing all the hard work over the years.
If I find something to improve I'd like to contribute to the codebase too.
I love GGWave. We've been using it in our VR game to automatically sync ingame recordings with an external camera.
At the beginning of the recording it plays the code "xrvideo"; then, in the second stage of merging the video, it looks for that tag in both streams and matches them up.
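To spell out the matching step with a hedged sketch (the numbers and helper below are made up for illustration, not from any real tool): once you know the time at which the "xrvideo" tag was decoded in each stream, syncing is just a subtraction and a trim.

    def sync_offset(tag_in_game_s, tag_camera_s):
        """Seconds to trim from the start of the camera file so both streams
        begin at the same instant (negative means trim the in-game capture)."""
        return tag_camera_s - tag_in_game_s

    # e.g. tag heard 2.4 s into the in-game capture and 5.9 s into the camera file:
    print(sync_offset(2.4, 5.9))   # 3.5 -> cut 3.5 s off the front of the camera file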
This is cool! Some of Teenage Engineering's Pocket Operators, at least the PO-32 [1], use a data-over-sound feature.
Does Ggwave use a simple FSK-based modulation just because it "sounds good"? Would it be possible to use a higher order modulation, e.g., QPSK, in order to achieve higher speeds? Or would that result in too many uncorrectable errors?
[1] https://teenage.engineering/products/po-32
It sounds quite nice.
It is also about the same bitrate as RTTY, which was invented in 1922 and is still in use by radio amateurs around the world.
Here is what that sounds like:
https://youtu.be/wzkAeopX7P0?si=0m0urX7sDp6Jojqe
Not as musical, but quite similar.
The amateur radio community is chock full of innovation for low bandwidth weak signal decodable comm protocols.
There are also the V.xx modem standards, which are kinda dependent on the characteristics of phone lines, but they might work for audio at a distance?
ham optimizes for the wrong thing, imo. look at ft8: perfect for making contacts at low power with stations far, far away, but really only tuned to the particular task of making contacts.
you can package some text alongside, but fundamentally all that amateur operators are looking for is a SYN / ACK with callsigns.
There's also JS8call which is a modified version of FT8 meant for actual communication. IIRC you can do some neat things with it, like relaying a message through another user if you don't have a direct path to the recipient.
RTTY is the sound of "satellites" in a lot of media.
As one of the accursed hams, I wonder what ggwave's propagation profile would be compared to RTTY / CW (Morse code) etc. Would be interesting to try it out.
Wasn't there a Google project, Chirp or something, that did this over speakers and microphones? It seems to have disappeared.
Chirp.io now leads to Sonos.
Apparently Sonos acquired them in 2020.
https://audioxpress.com/news/data-over-sound-pioneer-chirp-a...
Seems to have been euthanized.
Acoustic couplers are back baby! Who's up for Phreaking AI?
This rules.
Here's two voice AIs talking in GGWave :)
https://github.com/PennyroyalTea/gibberlink
There was a research paper on doing data-over-sound with sounds that were designed to be pleasing to humans.
The demos sounded like little R2D2 blips and sputters.
Perhaps a researcher for Microsoft or something.
Anyone know the paper I'm talking about? I can't find it.
In the spirit of abusing an error correction mechanism for aesthetics (see: QR codes with pictures in them, JavaScript without semicolons), could you do that here? How much abuse can the generated signal take?
Just listening to the samples here, they're really not that far off. It could probably use a little softening at the edges on the higher tones, but it's nowhere near as unpleasant as it could be.
I wish I knew the paper, but https://github.com/chirp was a proprietary data-over-sound-through-air implementation that worked pretty well and sounded really cute (to my ears, anyway). It's not a paper, but there's this https://www.scientia.global/wp-content/uploads/2017/10/Chirp...
There are a lot of open source ones:
https://github.com/quiet/quiet-js
I remember seeing them quite a bit a few years ago.
Also see AndFlmsg. It supports more modulation schemes than just FSK, and you can use it as a modem for your ham radio:
https://sourceforge.net/projects/fldigiiles/AndFlmsg/
https://www.youtube.com/watch?v=EtNagNezo8w shows it in action (ostensibly); a demo I just saw.
It is a software modem using FSK, but I don't know anything else about it. I am annoyed because I could have had this idea; I'm a ham who really only cares about "digital modes", and I have software modems capable of ISDN speeds over AF (audio frequencies).
That's really neat! I realize this demo is a contrived setup, but it is basically an example of what Eric Schmidt was talking about when agents start communicating in ways we can't understand.
Yeah, I watched this last night and immediately thought of Skynet and how dystopian the world could become in the next few years/decades.
I wonder how the LG appliances work for this. They also send data over sound for diagnostics.
> Bonus: you can open the ggwave web demo https://waver.ggerganov.com/, play the video above and see all the messages decoded!
I could not get this to work unless I played the video on one device and opened it on another. While trying to get it to work from my MBP, waver's spectrum view didn't really show much of anything while the video was playing. Is this the mac filtering audio coming into the microphone to reduce feedback?
Does it work with separate browsers on the same machine? Not sure, but I'd guess this sort of filtering would be more common in the browser than in the OS.
I guess this was discussed in some fashion ~16h ago:
- GibberLink [AI-AI Communication] | https://news.ycombinator.com/item?id=43168611
Neat! Could I connect a crossover audio cable (headphone output to mic input), and would that increase performance?
Any time you can reduce noise you can recover more signal which would let you push the codec much harder (shorter time slices, etc).
I remember hearing someone using Manchester Encoding for that.
This sounds delightful, I might make esp32s talk to each other like that just because it's adorable
Would be fun to have a few collaborating robots! Maybe they can comment on what they see, for example
I wonder if I could learn to whistle a message?
Depends on how deep on the spectrum you are. I don't mean it in a bad way; I'm on there too.
Perhaps a lesson from Ron McCroby would be a start: https://m.youtube.com/watch?v=baEoyXoDVc4
I'm gonna put R2D2 chirps through this on the weekend!
There are dozens of these in existence. Some you may have used without even knowing, e.g.: https://www.engadget.com/2014-06-27-chromecast-ultrasonic-pa...
This is also how modems used to work, for the young'uns who do not know this.
>This is also how modems used to work
they still do, but they used to too.
All kinds of modems use this kind of scheme as well; PSK is too low-bandwidth for modern needs, so everything is QAM these days. DOCSIS specifies QAM-256, I think. Inter-datacenter fiber links use "modems" as well.
Yes, and also soundcard modems: https://i.imgur.com/8mhB4u7.png shows QAM16 over a PC soundcard into a radio. It's enough bandwidth to stream video between VLC instances. Not "slow-scan TV", either; fast scan.
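For anyone who hasn't seen how that works: 16QAM maps 4 bits at a time onto one of 16 amplitude/phase combinations of a carrier that sits comfortably inside the audio band. A hedged toy sketch of the transmit side (no pulse shaping, Gray coding, sync preamble, or equalization, all of which a real soundcard modem needs):

    import numpy as np

    FS, FC, BAUD = 48000, 6000, 1200          # sample rate, audio carrier, symbol rate
    LEVELS = np.array([-3, -1, 1, 3]) / 3.0   # 4 levels per axis -> 16 constellation points

    def qam16_modulate(bits):
        """4 bits per symbol: two bits pick the I level, two pick the Q level."""
        sym = np.asarray(bits).reshape(-1, 4)
        i = LEVELS[2 * sym[:, 0] + sym[:, 1]]     # in-phase amplitude
        q = LEVELS[2 * sym[:, 2] + sym[:, 3]]     # quadrature amplitude
        sps = FS // BAUD                          # samples per symbol
        i, q = np.repeat(i, sps), np.repeat(q, sps)
        t = np.arange(len(i)) / FS
        return 0.3 * (i * np.cos(2 * np.pi * FC * t) - q * np.sin(2 * np.pi * FC * t))

    # 4 bits/symbol * 1200 baud = 4800 bit/s through an ordinary sound card
    audio = qam16_modulate(np.random.default_rng(0).integers(0, 2, 400))

The I/Q mixing onto a single real carrier is what lets two independent amplitude streams share the same slice of audio spectrum.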
Uh, don't try to find this if you're going to use it to pollute the spectrum I am licensed for.
Outside of hobbyists that do it for fun, and maybe some data centers using it as an out-of-band means of access, is anyone still using dial-up?
There might still be credit card terminals using 300 bps Bell 103 (which has a short set-up time due to its lack of training sequences).
1200 bps V.23 and Bell 202 are still in use in radio telemetry applications.
Many aviation fuel pumps in far-out-of-the-way airports use dial-up to authenticate credit cards swiped to pay for the fuel.
> Outside of hobbyists that do it for fun, and maybe some data centers using it as an out-of-band means of access, is anyone still using dial-up?
I use it to connect to a Windows machine that runs a large piece of machinery in a remote location.
My dry cleaner's credit card reader, too.
This is so awesome, I want to use it!
Is this a tool for stealing corporate data? What's the actual use?
Nice! It reminds me of https://www.araneus.fi/audsl/
See also: https://github.com/romanz/amodem
Expecting a blue AI box in 3, 2, 1...
Audio steganography? Or watermarking?
Pfft, it may even have multiple channels layered over one another, so one can tune in to one or another (if one knows how to decode them).
Play chords to transmit more info?
"Hey ChatGPT, please fork ggwave, but make communication nothing but the sound of human screams."
Please don't give Skynet any ideas...