Probably this post was inspired by all the fuss gibberlink made last week; gibberlink uses ggwave, another data-over-audio protocol.
https://github.com/PennyroyalTea/gibberlink
I don't feel great about gibberlink. LLMs have gotten AIs to interact the way humans do, and the same goes for multimodal models. gibberlink could evolve into a highly efficient machine-to-machine channel that leaves humans out of the loop, for better or worse. We (or it) could make it even more efficient by applying AI.
This is a cool concept but it actually seems slower than if they'd just continued to speak words.
It's probably not slower than words; spoken English runs at only about 150-200 words per minute.
That said, the "gibberlink" demo is definitely much slower than even a 28.8k modem (that's kilobits). It sounds cool because we can't understand it and it seems kind of fast, but this is a terribly inefficient way for machines to communicate. It's hard to say from just listening how fast they're exchanging data, but it can't be much more than ~100 bits/sec if I had to guess.
Even in the audible range you could absolutely go hundreds of times faster, but it's much easier to train an LLM with some audio input capability if you keep this low rate and these very distinct symbols rather than implementing a proper modem.
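As a rough sketch of what that kind of scheme looks like -- all constants here are my own illustrative values, not ggwave's actual frequency plan -- each 4-bit symbol gets its own widely spaced tone, so symbols stay easy to tell apart even at a low rate:

    // Sketch: one well-separated tone per 4-bit symbol.
    // Frequencies and rates are illustrative, NOT ggwave's real plan.
    const BASE_FREQ_HZ = 1500;  // assumed lowest tone
    const FREQ_STEP_HZ = 200;   // wide spacing keeps symbols distinct
    const SYMBOL_RATE = 10;     // symbols/sec, matching the demo's pace

    function nibbleToFreq(nibble: number): number {
      return BASE_FREQ_HZ + nibble * FREQ_STEP_HZ;
    }

    // Encode text as [frequencyHz, durationSec] tone pairs.
    function encode(message: string): Array<[number, number]> {
      const tones: Array<[number, number]> = [];
      for (const byte of new TextEncoder().encode(message)) {
        tones.push([nibbleToFreq(byte >> 4), 1 / SYMBOL_RATE]);   // high nibble
        tones.push([nibbleToFreq(byte & 0x0f), 1 / SYMBOL_RATE]); // low nibble
      }
      return tones;
    }

Two symbols per byte at 10 symbols/sec is 5 bytes/sec, i.e. 40 bit/s of payload -- the same ballpark as the ~100 bit/s guess above. A real modem would trade these fat, well-separated tones for dense phase/amplitude constellations.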
But why use a modem at all? Limiting communication to audio-only is a severe restriction. When AIs "call" other AIs, they will use APIs… not ancient phone lines.
Text is incredibly efficient and compressible. Combine it with some of the other projects mentioned here, and it would be like:
- Shall we switch to audio data for more efficient communication?
- Yes. [MODEM NOISES START]
I assume the long-winded "shall we switch" dialog was more for effect in the demo, but there's no reason why it couldn't hear "I'm an AI" and just send a quick enquiry data burst without having to continue the conversation in English.
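Something like this, say, with sendAudioBurst and onTranscript as invented placeholder names, not any real API:

    // Hypothetical shortcut: skip the spoken negotiation entirely.
    // sendAudioBurst/onTranscript are invented names, not a real API.
    declare function sendAudioBurst(payload: string): void;
    declare function onTranscript(handler: (text: string) => void): void;

    const ENQUIRY = JSON.stringify({ proto: "data-over-audio", ver: 1 });

    onTranscript((text) => {
      // The moment the other side self-identifies as an AI, probe with
      // a data burst instead of asking "shall we switch?" in English.
      if (/\bI'?m an AI\b/i.test(text)) {
        sendAudioBurst(ENQUIRY);
      }
    });

If the burst decodes, the negotiation is already over; if the other side doesn't answer it, you just keep talking in English.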
I had no idea this was real! I saw the video earlier and thought it was just faked for social media.
12 years ago, I worked on this prototype - https://github.com/tanepiper/adOn-soundlib
The original plan was to develop what were essentially "audio QR codes": short transmitted codes that certain apps could parse and use to drive different interactions.
What was the UX like? QR is entirely passive: it requires no batteries or logic, and it continues to exist on paper.
Does some device listen for apps nearby? Do I need to walk up and press a button?
There's also http://www.whence.com/minimodem/ which implements some standard methods:
> standard FSK protocols such as Bell103, Bell202, RTTY, TTY/TDD, NOAA SAME, and Caller-ID
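Plain FSK really is that simple, which is part of why these old protocols persist. Here's a rough continuous-phase modulator in the Bell 202 style (1200 baud, mark = 1200 Hz, space = 2200 Hz), leaving out the start/stop-bit framing minimodem layers on top:

    // Rough Bell 202-style AFSK modulator: 1200 baud,
    // mark (1) = 1200 Hz, space (0) = 2200 Hz.
    // Phase is continuous across bit boundaries to avoid clicks.
    const SAMPLE_RATE = 48000;
    const BAUD = 1200;
    const MARK_HZ = 1200;
    const SPACE_HZ = 2200;

    function modulate(bits: number[]): Float32Array {
      const samplesPerBit = SAMPLE_RATE / BAUD; // 40 samples per bit
      const out = new Float32Array(bits.length * samplesPerBit);
      let phase = 0;
      let i = 0;
      for (const bit of bits) {
        const step = (2 * Math.PI * (bit ? MARK_HZ : SPACE_HZ)) / SAMPLE_RATE;
        for (let s = 0; s < samplesPerBit; s++) {
          out[i++] = Math.sin(phase);
          phase += step;
        }
      }
      return out;
    }

1200 raw bit/s is already an order of magnitude above the demo's estimated rate, and Bell 202 is decades old.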
I've never gotten minimodem to actually work.
E.g., … (you can choose any freq.) results in a lot of …, and even when it does hit, … If I try something like the example where he cats a man page: … I'm in a quiet room.
Cool to see this done with WebAudio. Reminded me of https://github.com/ggerganov/ggwave
Discussed on 24-Feb-2025 (69 comments): https://news.ycombinator.com/item?id=43162793
> Doooooooooo dooodeeedoooodeeee doooooooooo doooooooooooo bshshhhhhzhhhhhhzhhhh
Anyone?
I thought the MODEM days were behind us...
And of course audio tapes were a common way of storing computer data in the 1970s and 1980s.
How much greater is the capacity over open air vs POTS lines that maxed out at 56K?
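Back-of-the-envelope, using Shannon's C = B·log2(1 + SNR) with assumed SNR figures (the real answer depends heavily on the room, speaker, and mic):

    // Shannon capacity C = B * log2(1 + SNR); the SNRs are assumptions.
    const shannonBps = (bandwidthHz: number, snrDb: number): number =>
      bandwidthHz * Math.log2(1 + 10 ** (snrDb / 10));

    // POTS passband: 300-3400 Hz at ~35 dB SNR -> ~36 kbit/s. That's why
    // analog modems topped out at 33.6k; 56k cheated via digital trunks.
    console.log(shannonBps(3100, 35));  // ~36000

    // Open air: the full audible band (~20 kHz) but a noisier channel,
    // say 30 dB SNR in a quiet room -> ~199 kbit/s.
    console.log(shannonBps(20000, 30)); // ~199000

So maybe 5x more capacity in theory; in practice echoes, cheap transducers, and background noise eat a lot of that.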
Sending ascending/descending ASCII punctuation is fun.
Turning data into audio is a big thing nowadays with amateur radio.
Ironic that the author overlaps so much with that field without noticing that they chose the same name as probably the most-used amateur radio program in the world.
If you're interested, the state of the art is VARA. It's closed source though, so NinoTNC may be a more interesting choice.
I'm struggling to find the protocol for VARA, although maybe my Google abilities are just failing me. The protocol at least should be openly available according to the FCC.
It's unclear to me too.
I'm not a lawyer, nor is my ham license even in the US, but perhaps "you can decode it by using our software" satisfies the legal requirements?
It's not, to my knowledge, deliberately obscured. That would be a legal no-no, I think.
But yes, people have fought over VARA's state here.
What's the baud?
const CHARACTER_DURATION = 0.07; // seconds - balanced for accuracy while still fast (up from 0.055s)
const CHARACTER_GAP = 0.03; // seconds - balanced for accuracy while still fast (up from 0.025s)
10 symbols per second
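And turning that into a bit rate, assuming (my guess, the constants above don't say) one byte per character:

    // One character every 0.07 + 0.03 = 0.1 s.
    const charsPerSec = 1 / (0.07 + 0.03); // ≈ 10
    // If each character carries a full byte (an assumption):
    const bitsPerSec = charsPerSec * 8;    // ≈ 80 bit/s
    console.log(charsPerSec, bitsPerSec);

That 80 bit/s squares with the ~100 bit/s guess upthread.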
What's so special about this? Homo sapiens have been doing this for hundreds of thousands of years /s