Show HN: I wrote a new BitTorrent tracker in Elixir

github.com

379 points by dahrkael a day ago

Hello everyone!

I'm currently on a journey to learn and improve my Elixir and Go skills (my daily job uses C++), and looking through my backlog for projects to take on, I decided Elixir is the perfect language to write a highly-parallel BitTorrent tracker. So I have spent my free time these last 3 months writing one! Now I think it has enough features to present it to the world (and a docker image to give it a quick try).

I know some people see trackers as relics of the past now that DHT and PEX are common but I think they still serve a purpose in today's Internet (purely talking about public trackers). That said there is not a lot going on in terms of new developments since everyone just throws opentracker in a vps and calls it a day (honorable exceptions: aquatic and torrust).

I plan to continue development for the foreseeable future and add some (optional) esoteric features along the way so if anyone currently operates a tracker please give it a try and enjoy the lack of crashes.

note: only swarm_printout.ex has been vibe coded, the rest has all been written by hand.

nesarkvechnep a day ago

I really wished to see an OTP-first design. Unfortunately for me, the code is almost procedural as it's touching ETS or Application, which is built on ETS, in nearly every operation.

If the author wishes to learn how to design services in Elixir, or any BEAM language, with OTP, they can take a look at "Designing Elixir Systems with OTP" by James Edward Gray and Bruce Tate, and "Functional Web Development with Elixir, OTP, and Phoenix" by Lance Halvorsen.

  • dahrkael a day ago

    On my first try I did write it in a more OTP-y style but the scaling potential for this very specific flow is just not the same. In the end a torrent tracker is just a specialized database and handling the data as fast as possible is the top objective. That said I'll give the books a go.

  • salviati 21 hours ago

    If, like me, you don't know what OTP means in this context, here it is:

    OTP stands for Open Telecom Platform, although it's not that much about telecom anymore (it's more about software that has the property of telecom applications, but yeah.) If half of Erlang's greatness comes from its concurrency and distribution and the other half comes from its error handling capabilities, then the OTP framework is the third half of it.

    https://learnyousomeerlang.com/what-is-otp

  • andyleclair 6 hours ago

    Building a BitTorrent tracker out of GenServers makes zero sense. ETS is the correct choice to make here 100% (for a single server configuration). Obviously it doesn't scale horizontally but it will scale vertically to the moon and chances are, yagni anyway

  • Zarathu 16 hours ago

    Out of curiosity, what would an "OTP-first" design look like?

    ETS is built into OTP, so how is using ETS not "OTP-first"? What's wrong with using ETS? It's just an in-memory store.

    I looked through the code and didn't find it to be anywhere close to procedural in style.

b0a04gl 16 hours ago

learned elixir in a week for an interview. didn’t clear it, but that week changed how i write code. understood state isolation for the first time. no shared data. fail and restart clean. pattern matching everywhere. structs over classes. pipes for everything. after that, i started writing code top-down. move side effects out. keep logic close to the data. elixir kinda rewired that for me.

after seeing this i saw that same mindset. not flashing any big genservers. simplified with fast procs, raw ETS tables. simple flow, but still fault aware. still clean.

voicedYoda a day ago

Well done. Couple quick notes, move to a logger instead of using IO.puts. Also consider adding OTel.

s-mon a day ago

Love Elixir so much, building a kick-ass notification engine with it now. It's so, so good.

  • mikehostetler a day ago

    nice, private or OSS? Elixir needs a better notification engine badly

    • rhgraysonii a day ago

      What do you mean exactly? If you need a notification engine, reaching for a pubsub implementation is very easy with phoenix’s popularity and quite battle tested. I’ve implemented notifications at scale a few times in the ecosystem. What problems are you encountering that you don’t feel you have a tool in the shed to work with in this case?

abrookewood a day ago

Hey congrats on the launch! Can you provide any details on how it runs compared to opentracker? I'm really interested in the performance etc.

  • dahrkael a day ago

    For small trackers opentracker is probably faster and uses a bit less memory. Where extracker is gonna shine compared to it is when core count starts having 2 digits. I still have to do a proper benchmark though.

arch-choot a day ago

Interesting! I'd done something similar in Typescript to learn more about BT, and then redid it in rust to learn rust (https://github.com/ckcr4lyf/kiryuu).

However I decided to just use redis as the DB. It sounds like your entire DB is in memory? Any interesting design decisions you made and/or problems faced in doing so?

(My redis solution isn't great since it does not randomize peers in subsequent announces afaik)

  • dahrkael a day ago

    in my case using the in-memory ETS has been the best decision, it lets me read&write the peer's data concurrently each on its own process so contention and latency are minimal. the only sequential part is when a new swarm is initially created but that doesn't happen a lot so it's fine. there's sadly no native support for taking random rows directly from the tables, so for now i grab the whole swarm and then take a random subset (https://github.com/Dahrkael/ExTracker/blob/master/lib/ex_tra...)
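
    A minimal sketch of the "grab the whole swarm, then take a random subset" step described above, written in Python rather than Elixir to keep it language-neutral. The `pick_peers` name, the shape of `swarm`, and the limit of 50 are illustrative assumptions, not ExTracker's actual code.

```python
import random

def pick_peers(swarm, requester, limit=50):
    """Return up to `limit` random peers from `swarm`, excluding the requester."""
    # Materialise the whole swarm first; this is the full read the comment
    # would like to avoid if the table could hand back random rows directly.
    candidates = [peer for peer in swarm if peer != requester]
    # random.sample raises ValueError when k exceeds the population, so clamp it.
    return random.sample(candidates, min(limit, len(candidates)))

# Hypothetical swarm: a set of (ip, port) tuples for one info hash.
swarm = {("10.0.0.%d" % i, 6881) for i in range(1, 201)}
peers = pick_peers(swarm, requester=("10.0.0.1", 6881), limit=50)
```

    The cost is proportional to the swarm size on every announce, which is the trade-off the comment accepts for now.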

    • toast0 15 hours ago

      I don't remember if there's a way to see how many slots an ets table has, but if you're ok with imperfect distribution, you could maybe pick a slot at random and use ets:slot/2 to get all the items in that slot, then select from those.

      You might be able to get the slot count from ets:info(Table, stats), although that's not intended for production use, so the format may change without notice.
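
      A rough sketch of the "pick a slot at random, then select from it" idea, using a toy Python hash table as a stand-in for an ETS table. `BucketTable`, `sample_from_random_slot`, and the retry count are hypothetical names and values for illustration; none of this is ETS's API.

```python
import random

class BucketTable:
    """Toy hash table whose buckets ("slots") can be read directly."""

    def __init__(self, slot_count):
        self.slots = [[] for _ in range(slot_count)]

    def insert(self, key, value):
        self.slots[hash(key) % len(self.slots)].append((key, value))

    def slot(self, index):
        # Return every item stored in one bucket, without touching the others.
        return list(self.slots[index])

def sample_from_random_slot(table, limit, max_tries=8):
    """Pick random slots until a non-empty one turns up, then sample from it."""
    for _ in range(max_tries):
        items = table.slot(random.randrange(len(table.slots)))
        if items:
            return random.sample(items, min(limit, len(items)))
    return []  # empty table or bad luck; a caller could fall back to a full scan

table = BucketTable(slot_count=16)
for i in range(1000):
    table.insert(("10.0.%d.%d" % (i // 250, i % 250), 6881), {"left": 0})
picked = sample_from_random_slot(table, limit=10)
```

      The distribution is imperfect in the way the comment warns: every returned peer comes from the same slot, and peers in sparsely filled slots are picked more often than peers in crowded ones.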

TheJoeMan 20 hours ago

Could you please clarify what DHT and PEX are? I'm having trouble searching "tracker PEX".

  • atmosx 19 hours ago

    DHT (Distributed Hash Table) and PEX (Peer Exchange) let torrent clients find peers without centralised trackers. Hence, you don't need a central place / public tracker anymore

    • Thaxll 18 hours ago

      You still need a central server though...

      • perching_aix 17 hours ago

        Yes, they just don't track the individual torrents anymore. They only play a role during the initial peer discovery stage (bootstrapping). Peers find torrent swarms on their own, the bootstrap servers are excluded from all that.

      • LtdJorge 17 hours ago

        If you are connected to the DHT network, you don’t. Unless you mean for DNS and such.

        • perching_aix 17 hours ago

          No, they mean specifically that in order to connect to such a network, you need to hit some specific central nodes first.

          • LtdJorge 17 hours ago

            The bootstrap nodes. But those don't _need_ to be centralized, even if they have historically been to some degree. There could be millions.

            • perching_aix 6 hours ago

              You still need to have a list of a subset of them, and you just have to trust they connect you to the "same network", to the extent that even makes sense, no?

            • immibis 17 hours ago

              If there are millions of bootstrap nodes, how do you find them?

              • toast0 15 hours ago

                If they listen on a well known port, and there are millions, send out a few thousand probes to 'random' IPv4 addresses and you'll most likely find one.

                If you get and keep a list of bootstrap nodes when you find one, then you can random select from the bootstrap addresses rather than all routable IPv4 addresses.
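
                A back-of-the-envelope check of the probing claim above, as a small Python sketch. It assumes probes are independent and uniformly random over the full 32-bit address space (the routable space is smaller, which would only raise the odds); the node and probe counts are illustrative, not measurements.

```python
def hit_probability(nodes, probes, address_space=2**32):
    """Chance that at least one of `probes` random addresses is a bootstrap node."""
    p_single = nodes / address_space          # chance that one probe lands on a node
    return 1.0 - (1.0 - p_single) ** probes   # complement of "every probe misses"

# Illustrative figures: 2 million nodes, 3,000 probes.
chance = hit_probability(nodes=2_000_000, probes=3_000)
```

                With these assumed figures the chance is roughly three in four, and with one million nodes it drops to about one half, so "a few thousand probes" works best when the node count really is several million.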

                • immibis 13 hours ago

                  What's your IPv6 plan?

                  • toast0 12 hours ago

                    Probing IPv6 would be pretty difficult. You could make some assumptions that might hold, like assume only the lowest /64 out of a /48 is used and ::0 or ::1 for the end of the address. Likely you'll still need too many probes to be feasible.

                    You'd need to probe v4 space through NAT64 and exchange v6 addresses after that, or include a cache of viable v6 addresses with clients. That gets you close to centralized service again, because how do you get the viable addresses to distribute with the client? Probably by running a supernode and dumping the list of supernodes into the client source every so often; but starting off with just that node listed.

              • LtdJorge 13 hours ago

                You can embed them in the torrent, same as the tracker. But the tracker has to handle all the state, while a bootstrap node just hooks you up with a few (or one) node in the DHT network. After that, it's fully decentralized.

                There could be other ways, embedding them on SRV DNS records, etc. It's the same issue as with getting a DNS server. You could in theory get addresses of bootstrap nodes from your ISP through DHCP (lol, sure).

vivzkestrel a day ago

- how did you start - did you refer to other projects - how long did it take - how much functionality do you think works compared to say qbittorrent?

  • dahrkael a day ago

    I started because I needed a tracker for another project but the tracker turned out to be more fun to make. I did glance over other trackers' code but it tends to be either overly complex or too simple, so not very useful. So far it's been 3 months of revenge bedtime procrastination. While this is not a client like qbittorrent I have ideas for a seedbox-oriented client project in the future.

  • lionkor a day ago

    it's a tracker, not a torrenting client.

    • NooneAtAll3 a day ago

      what does tracker mean?

      • devoutsalsa a day ago

        A torrent tracker is basically the world’s most antisocial matchmaking service that knows who has what files but refuses to actually store anything itself, like that friend who always knows where the party is but never hosts one. When your BitTorrent client asks “hey who’s got that Linux ISO,” the tracker dumps a list of IP addresses faster than a startup pivoting after their Series A falls through. Your client then connects to these strangers (seeders with complete files and leechers still downloading) and starts exchanging data while the tracker pretends nothing happened. It’s like Tinder but for file sharing, except everyone’s anonymous and probably downloading something weird at 3am.
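
        The matchmaking role described above can be sketched in a few lines of Python. This is a toy model only: `ToyTracker` is a hypothetical name, the string info hash is a placeholder for the real 20-byte hash, and a real tracker also handles wire encoding, peer expiry, and response limits.

```python
from collections import defaultdict

class ToyTracker:
    """In-memory map from info hash to the peers that announced for it."""

    def __init__(self):
        self.swarms = defaultdict(dict)  # info_hash -> {(ip, port): bytes_left}

    def announce(self, info_hash, ip, port, left, event=None):
        swarm = self.swarms[info_hash]
        peer = (ip, port)
        if event == "stopped":
            swarm.pop(peer, None)   # the peer is leaving the swarm
        else:
            swarm[peer] = left      # left == 0 means seeder, otherwise leecher
        # No file data is stored; the reply is just the other peers' addresses.
        return [other for other in swarm if other != peer]

tracker = ToyTracker()
tracker.announce("linux-iso", "198.51.100.1", 6881, left=0)              # a seeder
peers = tracker.announce("linux-iso", "198.51.100.2", 6881, left=1024)   # a leecher
```

        After the second call, `peers` holds only the seeder's address; the two clients then talk to each other directly and the tracker is out of the picture.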

        • vjerancrnjak a day ago

          not anonymous at all, while interacting with the tracker can be done with https, all of the communication between peers is unencrypted.

          • immibis a day ago

            There's an optional encryption extension, with no BEP because the BitTorrent company (which issues BEPs) is ideologically opposed to encryption.

jhgg a day ago

Really cool! You looking to write Elixir as your main job?

  • dahrkael a day ago

    It's one of my options yes, I'm sure I would enjoy it more than C++.

  • pdimitar a day ago

    Not OP but I have been working with Elixir for 9 years and 2 months now. Know Rust and Golang as well. You hiring?

andyleclair 6 hours ago

This is really good, my dude. I took a peruse through the code and I'm definitely impressed, this is like, code I would expect my senior engineers with elixir experience to turn in. Great job!

KomoD a day ago

I tried it, couldn't get HTTPS to work.

Also my console gets spammed with:

04:43:20.160 [warning] invalid 'event' parameter: size: 6 value: "paused"

but it seems to work. I would've liked to see HTTP stats too but I guess UDP is fine (though I have it disabled)

  • bill876 a day ago

    The "paused" event is part of BEP 21. Clients send it to the tracker to let it know that the client is still incomplete, but won't download anymore. For example, because a user only wants some files from the torrent. Readme of the project shows that support for BEP 21 is not implemented.

    • KomoD 21 hours ago

      > Readme of the project shows that support for BEP 21 is not implemented.

      Ah, missed that.

  • dahrkael a day ago

    Telemetry for the HTTP side is on my to-do list, yes; since I'm using a 3rd party library for the webserver I still need to figure out how to do it right. For HTTPS to work you need to provide a valid certificate path in :https_keyfile but right now I would recommend sticking Caddy or Nginx in front of the tracker if you want HTTPS. I have certbot integration planned but it is not a priority since most of the torrent peers use UDP.

IlikeKitties a day ago

Now that's neat. The Beam VM sounds like a natural fit for a torrent tracker

  • dahrkael a day ago

    I feel like ETS has been the real killer feature to pull this off, being able to concurrently read and write from protected tables makes the whole thing incredibly parallel

bavell a day ago

Very cool! Is this suitable for using as a private tracker?

  • dahrkael a day ago

    not out of the box but it can be done. All the required moving parts are there (hash whitelist support, udp path parsing, peer rejection, etc).

desireco42 a day ago

Now this is serious business, congrats on the project! I can see how this is a perfect fit for elixir...

guywithahat a day ago

There's something about C++ developers that makes them love Go and Elixir (and I include myself in this demographic). I think it's something about the people who are attracted to C++ for performance are attracted to Go/Elixir for its multithreaded performance. Really cool project

  • uncircle a day ago

    Not sure about C++ devs, but Erlang/Elixir are great to handle parsing of protocols, with its implementation of pattern matching. Also, makes the code much cleaner because pattern matching basically eliminates most branching and thus depth of the code base.

    The let it crash philosophy allows you to ignore most corner cases with the knowledge that, if they are encountered or a cosmic ray flips a bit, the crash is localised to a single client. I have worked with Elixir almost a decade at this point, and I have never seen an unexpected downtime of the apps I deployed. Aside from maintenance and updates, they all have 100% uptime. How cool is that?

    This is how I sell it to clients. “Will you be using Python, Go?” Me: “What about Elixir and the promise that your service won’t ever crash? And you get cool dashboards with it.” Them: “Sold.”

    I wish there was a systems language that allows you to pattern match on structs and enums, and in function signatures like Elixir

    • dahrkael a day ago

      Indeed. when your daily job is tracking down memory stomps, deadlocks, invalid pointers and unexpected state in very big codebases then using Elixir feels like "why is this so easy? it just works?". Also i'm a network programmer so the binary pattern matching is very much appreciated.

    • Thaxll 21 hours ago

      "The let it crash philosophy allows you to ignore most corner cases"

      This is such a dangerous take. Also Elixir is not strongly typed, so...

      • ricketycricket 20 hours ago

        It's not though. Processes can be supervised and crashes can just lead to "restart with good state" behavior. It's not that you don't try handling any errors at all, you just can be confident that anything you missed won't bring the system down.

        And Elixir is strongly typed by most definitions. Perhaps you mean static?

        • immibis 17 hours ago

          You can be more confident. But remember that time an Ericsson switch crashed upon handling a message that it sends to adjacent switches every time it restarts? That crashed the whole network, and you could still do that in Erlang.

eatbitseveryday 20 hours ago

Trackers are not relics - they're used exclusively in private tracker websites. Public-access torrents would more commonly use DHT and PEX for discovery.

  • dewey 14 hours ago

    But also private trackers are far from popular these days. I’m a heavy user, but I know it’s a niche.