I wish there were an easier way to uniformly rewrite links across applications. One that annoys me is Reddit - they seem incapable of consistently linking to content on their own site. Some of it could be fixed with a Greasemonkey script, but that doesn't work on mobile or across apps.
But this is part of a wider pattern of the internet becoming deep fried: instead of a link we get a short code to a Facebook page with a bot reposting TikTok videos of a phone screen-recording of an editorialized livestream of somebody watching a screen-recording of the video on YouTube - and of course it starts halfway through, then loops round and plays twice, with three sets of black bands, content-creator @names, watermarks, wifi and signal indicators, etc.
If we could all do one thing to help stop the spread of this cancer, that would be great: de-duplication via content-addressable links/tags.
But it'll never happen.
> the internet becoming deep fried
Good phrase. I've frequently heard the term "Kentucky Fried Education" applied to the decay of schools and universities under the influence of big-tech mediocrity. My Digital Vegan response would be that "heavily processed" content harms your intellectual health because it's not fresh and has no vitamins :)
Ted Nelson anticipated this "death by indirection". Some of his early writings on hypertext still seem like they're from a future we might one day get to. The idea that links should be bi-directional, and how different the Internet would be, is still mind-blowing.
Most of the problems you describe stem from the mediating platforms being fundamentally controlling and dishonest. Everything is a work-around to a work-around... all the way down.
> will never happen
I get the shrugging cynicism. But to me it's a good thing they're too broken to change. Let these festering sewers of low-quality content digest themselves and make room for new growth.
Couldn't a bi-directional link still be written into HTML? Like any other tag, it could simply go unused while being available so nobody feels obligated.
I'm not sure. It seems a reasonable question. I think the obstacle is the browser monopoly. For the same reasons the "semantic web" and RDF knowledge graphs failed in the wild - the current paradigm is frozen in a "works for me" state of arrested development. You could theoretically build this on top of the existing "web", but it would get no critical purchase.
Just installed this in my own network. I already had my own CA, so I just took all the supported domains and generated a certificate for them. The list kind of pollutes my Pi-hole domain overrides, but that's alright by me.
It works well! I'm running it with Nginx in a Proxmox LXC container where I've allocated a meager 64MB of RAM for it and it still has RAM to spare. I wish I could say the same of other "small" tools from across the web that I'm running.
I like the minimalist web pages, the fact it auto-resolves multiple redirects on its own (bit.ly -> msft.it -> aka.ms -> ...) without making you wait for each page to load, and the fact it removes tracking parameters for you. I know there are online tools and extensions that do the same, but those are a pain to install on mobile.
This is awesome, thanks for taking the time to try it out! I honestly threw it together over a few hours this weekend, so it's in a very rough state: there are no unit tests, and the code isn't commented or structured that well, so there are probably a lot of edge cases it doesn't account for.
But still, glad it worked!
If you want to hear about one of the worst URL-forwarding "services," Facebook is it. They got so bothered by people stripping all the tracking data from links they were sharing that they embedded the tracking within the primary portion of the URL itself. If you want the truly original URL, the only way to get it is to abuse one of their APIs.
That 9-hop shortening example is disgusting. I wonder if it could be alleviated by introducing some protocol:
1. Make all shortening services append a `This-is-a-shortening-service: true` header to all the responses they send.
2. When a link is added to a shortening service, check if the response from the link has the header above and resolve the destination, recursively.
awesome-url-shortener: https://github.com/738/awesome-url-shortener
/? shorturl api OpenAPI https://www.google.com/search?q=shorturl+api+openapi
- TinyURL OpenAPI: https://tinyurl.com/app/dev
- GH topic: url-shortener: https://github.com/topics/url-shortener
A https://schema.org/Thing may have zero or more https://schema.org/url and/or https://schema.org/identifier values; and then there's the ?s subject URI, specified with the `@id` property in JSON-LD RDF.
You can add string, schema:Thing, or URI tags/labels with the https://schema.org/about property.
It's disgusting, but unfortunately it's not a bug - it's a feature which allows each redirect hop to collect (and sell/use) some data about you.
Nice writeup, and a nice demonstration of one of WebPKI's limitations!
I understand why both HPKP and Expect-CT have been obsoleted, but it's a bummer that we still don't have a good enforcement mechanism for CA/cert pinning for a particular site. CT itself does a reasonable job of mitigating the "globally visible mis-issuing CA" problem, but does nothing to help users whose certificate stores contain all kinds of mystery enterprise or application-installed CAs.
Is your argument that it shouldn't be possible for a user to intercept t.co in this way? Seems like a perfectly valid use case (sidecar process to unwrap 9 layers of redirects from an anonymous browsing context). If the sidecar is validating the original t.co certs and you trust it then what's the problem?
> Is your argument that it shouldn't be possible for a user to intercept t.co in this way?
Not necessarily; the argument is that it's indistinguishable from a malicious MiTM. I think this is a great and legitimate use, but it's also probably something that website providers should be able to make themselves resilient against (or, at the least, be able to audit when it happens).
If I configure my computer that I own to use my CA and proxy certain traffic, I don't see how that's any website's business. They have as much right to audit how I've set up my computers as I do theirs.
There are different threat models here: for every person like you who's trying to reasonably proxy their local traffic, there's a nation state, overbearing educational software provider, &c. who's trying to get access to sensitive and potentially life-affecting communications. When it comes to things like finances, private chats, &c., I think there's a reasonable argument to be made that it's in the website's (and my!) interests to be able to detect and prevent these kinds of man-in-the-middling.
The problem is that in practice, at least in the US, the most realistic threats are from websites you visit delivering drive-by malware (e.g. spyware and adware), which they actually do constantly. It's such a common practice that it's not even usually phrased that way, but just imagine if you exploited eBay's web servers to port scan their internal network, which is exactly what they did to customers. The responsible employees should be criminally charged for that.
It doesn't matter if it's in the website's interests. The client computer does not belong to them, and it's definitely not in the owner's interests to let others "audit" them - just like it's not in web hosts' interests to let us "audit" their nginx configs.
I think you're setting up a false dichotomy here: I believe strongly in client filtering and in empowering users to do whatever they need to do to flush out the junk that comes with the modern online experience. I do it on my own devices, through both browser extensions and a local DNS server. I'd even consider doing it with a root CA, if it came to that (but so far it hasn't).
When I say "audit," I mean in the sense that existing ecosystems like CT already provide automatic auditability of certificate issuance. We're not talking about a private company sleuthing through your computer; we're talking about a way to enforce the stated security model that most users expect when a connection is described as "encrypted."
IIRC comcast and friends used to intercept plaintext HTTP connections and add advertising to them, so I don’t see why you’d consider this scenario uncommon.
OCSP stapling should help in situations where the website provider wants to ensure clients are not being MITMed. From experience, some SaaS providers are actively using it (enterprise SSL-inspection products play havoc with it).
One thing I'd neglected to mention in the post is that the sidecar uses a public DNS resolver to get the actual t.co link, but it's making the assumption that Go's stdlib enforces this: https://github.com/djhworld/theunwrapper/blob/main/unwrap/un... and doesn't fall back to the system one.
So there is that issue... I guess one way to mitigate it would be to run the sidecar outside the network, or at least have a clean DNS config and not have my custom CA in the root store - i.e. you'd want to be doubly sure you're going to the real thing and only accepting certs signed by a trusted root.
Clever solution! The author keeps emphasizing how hacky and what a terrible idea this is, but it's really not provided they trust their own infrastructure. This is exactly how SSL decryption in corporate environments works as well - MITM traffic using a cert issued by your own CA.
I think I was more bracing myself for a deluge of disapproving comments, so just wanted to make sure to emphasise I know the drawbacks of the approach and understand the risks etc :)
Well, it seems you've been disappointed!
I thought it was odd you kept calling it "bad" and "awful". It's exactly as secure as any other certificate on the web. Arguably more, as you're aware of any access to the keys. The only differentiation is a commercial interest, nothing "bad" at all.
Simple, elegant. Love it.
I wonder if there would be some way to get this kind of redirect even if you weren't on your home network... You mention using a VPN? Which one?
For anyone who wants to check a specific link, https://wheregoes.com/ does a good job of tracing where a Twitter link (or any other redirected link) goes. I just tried it and it works on t.co links.
Fritter (https://github.com/jonjomckay/fritter), a frontend for Twitter, automatically unwraps all t.co URLs and displays the original ones.
Have you considered using services like https://wheregoes.com/ to fetch the final destination and navigate?
Author here, yeah I mention this at the start, there's quite a few of these link uncloakers.
The annoying part is having to copy the link, navigate to the website, paste in the link etc.
I was looking for something more "seamless" that works cross-device (e.g. on a phone, in the Twitter app, etc.), not just in browsers. With this you just click the t.co link and the result is there instantly.
It's a dumb solution but was fun to write.
Got it. The process of pipelining and piecing it all together is interesting. Thank you for the post.
I have a messy python script that does something like this for a bunch of sites. I've been wanting to rewrite it into a browser extension someday. Doing it at DNS level hadn't occurred to me. The python script is able to clean up a lot of tracking parameters though, while leaving the base domain the same, as well as bypassing some redirectors like t.co.
Maybe I’m missing something, but it seems like this doesn’t need to be a DNS rewrite at all. Couldn’t you just set a proxy server for outgoing requests that passes through the ones that aren’t a redirect? Or is the trade-off there performance (since you’re proxying every request instead of just certain DNS names)?
If you're already running a Squid-style proxy server, you'd need to MITM the TLS requests regardless. If you're just doing it for a single site, the DNS solution is a lot more lightweight.
If (hypothetically at least) the "go service" is running on the device, then if one controls DNS on the device, one can set the A record for the t.co domain name to the address of the go service. In that case, does one still need a "reverse proxy"?
Yeah, I was running the service on a different port though, the proxy was just to work around that.
Why does the service need to run on a different port, i.e., not 80 or 443? What about running the service on the loopback, 127.0.0.2 or whatever? I ask because I have run into similar problems.
Using nginx, caddy, apache, etc. means you can offload the SSL handling there and let the application just be an application. In the age of containers, setting up a reverse proxy to handle it is minimal effort.
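A minimal nginx sketch of that offloading - the certificate paths and the backend port 8080 are assumptions for illustration, not the post's actual config:

```nginx
# Terminate TLS for t.co with the private-CA cert, then hand plain
# HTTP to the unwrapping service listening on its own port.
server {
    listen 443 ssl;
    server_name t.co;

    ssl_certificate     /etc/ssl/private/t.co.crt;
    ssl_certificate_key /etc/ssl/private/t.co.key;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
    }
}
```

With this in front, the Go service never touches certificates, and the DNS override only needs to point t.co at the proxy host.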
I use forward proxy, not reverse proxy. It requires minimal effort. No containers needed.
Not a caddy user, but I believe caddy, with a plugin, can function as a forward proxy.
> I only really noticed this properly when my DNS sinkholing server (Adguard home) started blocking t.co links and I was getting an error when say, clicking a linked news article
Mission accomplished!
Awesome writeup! It's short, but I've learned a lot.
There is the FastForward browser extension if you don't want to compromise your device's security with self-signed certificates.
What is this trying to protect from? You see the actual URL in the tweet. Are you worried about redirects, if so why?
You don't always see the URL; sometimes it's truncated or just the t.co link. It's not very consistent.
Additionally, if I want to share a URL with friends it's usually polluted with all sorts of nonsense query parameters; this tool strips them and gives you a nice clean one.
As for the tracking element, yeah - the link-shortening services will still see my IP connecting to them and doing a HEAD request, just with a different user agent (whatever one Go's stdlib sets). One extra layer could be to move the Go service out of the network, but tbh I didn't start this project with these considerations in mind.
Your clicks being logged by Twitter.
It’s not like the original URL is somehow encoded in the short URL, so this bot still has to request from t.co (and everywhere in between) to get the target.
The proxy is resolving all links, therefore denying Twitter the information on which links you actually click (see also AdNauseam).
?
It only resolves the links navigated to, right? It shows you the interstitial page, but only when you try to navigate to the link.
It does enable you to get the clean link and share that.
And google, facebook, reddit, everyone. I suppose it is another thing to push back against, though.
Will this show up in Certificate Transparency logs?
Unlikely as the logs are appended to by CAs when they issue certificates.
Browsers could in theory contribute data, but the infrastructure to support that would likely be orders of magnitude bigger. I'm scared even thinking about going down that rabbit hole due to the expectation of what would be found (MITM evidence). :_(
I thought browsers did contribute. Otherwise any CA could issue as many certificates as they like and simply not report them. Is that really possible?
> I run my own self-signed Certificate Authority (CA) and my reverse proxy uses certs signed by this CA
There's no need to track certificates from private CAs. In order for your browser to trust the certificates you need to manually trust the CA (or your system admin will configure it). Everyone else will see it as invalid.
[flagged]
Author here, thanks for reading.
Just quit twitter and it's all good huh? Why is everyone not leaving Twitter?
It’s not just Twitter. Maybe this tool could be made to uncloak a list of known URL shorteners.
It actually does work this way, you just need to set up your DNS for the other shorteners too :)
List of supported ones is here: https://github.com/djhworld/theunwrapper/blob/main/config/un...