Probably this post was inspired by all the fuss gibberlink made last week; gibberlink uses ggwave, another data-over-audio protocol.
https://github.com/PennyroyalTea/gibberlink
I don't feel great about gibberlink. LLMs have gotten AIs to interact the way humans do, and the same goes for multimodal models. gibberlink could evolve into a highly efficient machine-to-machine channel that leaves humans out of the loop, for better or worse. We (or it) could make it even more efficient by applying AI.
This is a cool concept but it actually seems slower than if they'd just continued to speak words.
It's probably not slower than words; spoken English runs at only about 150-200 words per minute.
That said, the "gibberlink" demo is definitely much slower than even a 28.8k modem (that's kilobits). It sounds cool because we can't understand it and it seems kind of fast, but this is a terribly inefficient way for machines to communicate. It's hard to say from just listening how fast they're exchanging data, but it can't be much more than ~100 bits/sec if I had to guess.
Even in the audible range you could absolutely go hundreds of times faster, but it's much easier to train an LLM with some audio input capability if you keep this low rate and these very distinct symbols rather than implementing a proper modem.
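As a rough sketch of what that kind of scheme looks like -- all constants here are my own illustrative values, not ggwave's actual frequency plan -- each 4-bit symbol gets its own widely spaced tone, so symbols stay easy to tell apart even at a low rate:

    // Sketch: one well-separated tone per 4-bit symbol.
    // Frequencies and rates are illustrative, NOT ggwave's real plan.
    const BASE_FREQ_HZ = 1500;  // assumed lowest tone
    const FREQ_STEP_HZ = 200;   // wide spacing keeps symbols distinct
    const SYMBOL_RATE = 10;     // symbols/sec, matching the demo's pace

    function nibbleToFreq(nibble: number): number {
      return BASE_FREQ_HZ + nibble * FREQ_STEP_HZ;
    }

    // Encode text as [frequencyHz, durationSec] tone pairs.
    function encode(message: string): Array<[number, number]> {
      const tones: Array<[number, number]> = [];
      for (const byte of new TextEncoder().encode(message)) {
        tones.push([nibbleToFreq(byte >> 4), 1 / SYMBOL_RATE]);   // high nibble
        tones.push([nibbleToFreq(byte & 0x0f), 1 / SYMBOL_RATE]); // low nibble
      }
      return tones;
    }

Two symbols per byte at 10 symbols/sec is 5 bytes/sec, i.e. 40 bit/s of payload -- the same ballpark as the ~100 bit/s guess above. A real modem would trade these fat, well-separated tones for dense phase/amplitude constellations.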
But why use a modem at all? Limiting communication to audio-only is a severe restriction. When AIs "call" other AIs, they will use APIs… not ancient phone lines.
Text is incredibly efficient and compressible. Combine it with some of the other projects mentioned here, and it would be like:
- Shall we switch to audio data for more efficient communication?
- Yes. [MODEM NOISES START]
I assume the long-winded "shall we switch" dialog was more for effect in the demo, but there's no reason why it couldn't hear "I'm an AI" and just send a quick enquiry data burst without having to continue the conversation in English.
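Something like this, say, with sendAudioBurst and onTranscript as invented placeholder names, not any real API:

    // Hypothetical shortcut: skip the spoken negotiation entirely.
    // sendAudioBurst/onTranscript are invented names, not a real API.
    declare function sendAudioBurst(payload: string): void;
    declare function onTranscript(handler: (text: string) => void): void;

    const ENQUIRY = JSON.stringify({ proto: "data-over-audio", ver: 1 });

    onTranscript((text) => {
      // The moment the other side self-identifies as an AI, probe with
      // a data burst instead of asking "shall we switch?" in English.
      if (/\bI'?m an AI\b/i.test(text)) {
        sendAudioBurst(ENQUIRY);
      }
    });

If the burst decodes, the negotiation is already over; if the other side doesn't answer it, you just keep talking in English.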
I had no idea this was real! I saw the video earlier and thought it was just faked for social media.
12 years ago, I worked on this prototype - https://github.com/tanepiper/adOn-soundlib
The original plan was to develop what were essentially "audio QR codes": short transmitted codes that certain apps could parse and use to drive different interactions.
What was the UX like? QR is entirely passive: it requires no batteries or logic, and it continues to exist on paper.
Does some device listen for apps nearby? Do I need to walk up and press a button?
There's also http://www.whence.com/minimodem/ which implements some standard methods:
> standard FSK protocols such as Bell103, Bell202, RTTY, TTY/TDD, NOAA SAME, and Caller-ID
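Plain FSK really is that simple, which is part of why these old protocols persist. Here's a rough continuous-phase modulator in the Bell 202 style (1200 baud, mark = 1200 Hz, space = 2200 Hz), leaving out the start/stop-bit framing minimodem layers on top:

    // Rough Bell 202-style AFSK modulator: 1200 baud,
    // mark (1) = 1200 Hz, space (0) = 2200 Hz.
    // Phase is continuous across bit boundaries to avoid clicks.
    const SAMPLE_RATE = 48000;
    const BAUD = 1200;
    const MARK_HZ = 1200;
    const SPACE_HZ = 2200;

    function modulate(bits: number[]): Float32Array {
      const samplesPerBit = SAMPLE_RATE / BAUD; // 40 samples per bit
      const out = new Float32Array(bits.length * samplesPerBit);
      let phase = 0;
      let i = 0;
      for (const bit of bits) {
        const step = (2 * Math.PI * (bit ? MARK_HZ : SPACE_HZ)) / SAMPLE_RATE;
        for (let s = 0; s < samplesPerBit; s++) {
          out[i++] = Math.sin(phase);
          phase += step;
        }
      }
      return out;
    }

1200 raw bit/s is already an order of magnitude above the demo's estimated rate, and Bell 202 is decades old.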
I've never gotten minimodem to actually work.
E.g., … (you can choose any freq.) results in a lot of …, and even when it does hit, … If I try something like the example where he cats a man page: … I'm in a quiet room.
Cool to see this done with WebAudio. Reminded me of https://github.com/ggerganov/ggwave
Discussed on 24-Feb-2025 (69 comments): https://news.ycombinator.com/item?id=43162793
> Doooooooooo dooodeeedoooodeeee doooooooooo doooooooooooo bshshhhhhzhhhhhhzhhhh
Anyone?
I thought the MODEM days were behind us...
And of course audio tapes were a common way of storing computer data in the 1970s and 1980s.
How much greater is the capacity over open air vs POTS lines that maxed out at 56K?
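Back-of-the-envelope, using Shannon's C = B·log2(1 + SNR) with assumed SNR figures (the real answer depends heavily on the room, speaker, and mic):

    // Shannon capacity C = B * log2(1 + SNR); the SNRs are assumptions.
    const shannonBps = (bandwidthHz: number, snrDb: number): number =>
      bandwidthHz * Math.log2(1 + 10 ** (snrDb / 10));

    // POTS passband: 300-3400 Hz at ~35 dB SNR -> ~36 kbit/s. That's why
    // analog modems topped out at 33.6k; 56k cheated via digital trunks.
    console.log(shannonBps(3100, 35));  // ~36000

    // Open air: the full audible band (~20 kHz) but a noisier channel,
    // say 30 dB SNR in a quiet room -> ~199 kbit/s.
    console.log(shannonBps(20000, 30)); // ~199000

So maybe 5x more capacity in theory; in practice echoes, cheap transducers, and background noise eat a lot of that.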
Sending ascending/descending ASCII punctuation is fun.
Turning data into audio is a big thing nowadays with amateur radio.
Ironic that the author overlaps so much with that field without noticing that they chose the same name as probably the most-used amateur radio program in the world.
If you're interested, the state of the art is VARA. It's closed source though, so NinoTNC may be a more interesting choice.
I'm struggling to find the protocol for VARA, although maybe my Google abilities are just failing me. The protocol at least should be openly available according to the FCC.
It's unclear to me too.
I'm not a lawyer, nor is my ham license even in the US, but perhaps "you can decode it by using our software" satisfies the legal requirements?
It's not, to my knowledge, deliberately obscured. That would be a legal no-no, I think.
But yes, people have fought over VARA's state here.
What's the baud?
const CHARACTER_DURATION = 0.07; // seconds - balanced for accuracy while still fast (up from 0.055s)
const CHARACTER_GAP = 0.03; // seconds - balanced for accuracy while still fast (up from 0.025s)
10 symbols per second
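And turning that into a bit rate, assuming (my guess, the constants above don't say) one byte per character:

    // One character every 0.07 + 0.03 = 0.1 s.
    const charsPerSec = 1 / (0.07 + 0.03); // ≈ 10
    // If each character carries a full byte (an assumption):
    const bitsPerSec = charsPerSec * 8;    // ≈ 80 bit/s
    console.log(charsPerSec, bitsPerSec);

That 80 bit/s squares with the ~100 bit/s guess upthread.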
What's so special about this? Homo sapiens have been doing this for hundreds of thousands of years /s