How realistic is it for the Trifecta Tech implementation to start displacing the "official" implementation used by linux distros, which hasn't seen an upstream release since 2019?
Fedora recently swapped the original Adler zlib implementation with zlib-ng, so that sort of thing isn't impossible. You just need to provide a C ABI compatible with the original one.
If it hasn't seen an upstream release since 2019, doesn't that mean the implementation is just... finished?
Maybe there's no more bugs to fix and features to add. And in that case, I don't see what's wrong with it.
Isn't 10-15% faster compression, and 5-10% faster decompression, a very nice "feature"?
> [...] doesn't that mean the implementation is just... finished?
I don't think that it _necessarily_ means that; e.g., not all projects that haven't had a release since 2019 are finished. Probably most of them are simply abandoned?
On the other hand, a finished implementation is certainly a _possible_ explanation for why there have been no releases.
In this specific case, there are a handful of open bugs on their issue tracker. So that would indicate that the project isn't finished.
"Isn't 10-15% faster compression, and 5-10% faster decompression, a very nice «feature»?"
Although much code can be optimized to get it to run 10-15% faster, if that comes at the expense of legibility then such "features" get rejected nowadays. Translating an existing codebase into a language that makes things more difficult¹ and (because of that) has (and most likely will have) fewer engineers willing to work in it looks very much akin to applying legibility-affecting optimizations to me.
> Although much code can be optimized to get it to run 10-15% faster, if that comes at the expense of legibility then such "features" get rejected nowadays.
Makes sense, and I'd probably make the same call if I was a maintainer and someone submitted a patch which increased performance at the cost of maintainability...
> Translating an existing codebase into a language that makes things more difficult¹ and (because of that) has (and most likely will have) fewer engineers willing to work in it looks very much akin to applying legibility-affecting optimizations to me.
Here I have to personally disagree. I think that Rust is easier than both C and C++. Especially when coming into an already existing project.
The chance of me contributing to a Rust project is higher than to a C project, because I feel more comfortable in knowing that my code is "correct". It's also easier to match the "style" since most Rust projects follow the same structures and patterns, whereas in C it can vary wildly.
E.g. I contributed my first feature to Valkey (Redis fork, C codebase) recently, and figuring out how the tests worked took me quite some time. In fact, in the end I couldn't figure out how to run a specific test so I just ran all tests and grepped the output for my test name. But the tests take tens of minutes to run so this was sub-optimal. On the other hand, 99% of all Rust projects use `cargo test`, and to run a single test I can just click the play button in my editor (Zed) that shows up next to the test (or `cargo test "testname"`).
(with this said, I think that Valkey is a really well structured code base! Great work on their part)
Anyhow, this is just to illustrate my experience. I'm sure that for someone more used to C or C++ they would be more productive in that. And I could go on for ages on all the features that make me miss Rust every day at work when I have to work in other languages, especially algebraic data types!
1) This is a cool project and I wish them success. It would be really cool if these became the default utilities some day soon.
2) I think the MIT license was a mistake. These are often cloning GNU utilities, so referencing GNU source in its original language and then re-implementing it in Rust would be the obvious thing to do. But porting GPL-licensed code to an MIT licensed project is not allowed. Instead, the utilities must be re-implemented from scratch, which seems like a waste of effort. I would be interested in doing the work of porting GNU source to Rust, but I'm not interested in re-writing them all from scratch, so I haven't contributed to this project.
"Seems like a waste of effort" in a vacuum yes, but
1 - GNU utilities is ancient crufty #IFDEF'd C that's been in maintenance mode for decades. You want code to handle quirks of Tru64 and Ultrix? You got it.
2 - Waving your hands around 'the community will take care of it' is magical thinking. C developers don't grow on trees. C tooling is kinda weird and doesn't resemble anything modern - good luck finding enough VOLUNTEER C developers to make your goals happen.
I hadn't heard of tokei before, so I tried it on a small project of mine.
Tokei _finishes_ before cloc can print its help text. I wrote this post in less time than it took `cloc .` to count all the files in my project, probably because it doesn't know to ignore `target/`.
I absolutely hate it when people call their tools a "replacement" for something that is part of core standards, something that did just fine for decades.
ripgrep is an excellent tool. But it's not a grep replacement. And should not ever be.
The GNU utils were a replacement for the BSD utils which were a replacement for the original AT&T utils. Every replacement added new functionality and improvements, and every time someone complained that they didn't stick closer to the thing they replaced. Looking specifically at grep, there used to be new versions like egrep and fgrep that added functionalities beyond standard grep's, but those were eventually pulled into "standard" grep (GNU or BSD). If we stuck with standards we'd all still be using the Bourne shell. The GNU utilities have been around long enough that they feel like the standard now, but I'm glad that we're coming into a new phase of innovation in command-line utilities. And this didn't start with Rust - the new generation of search utilities started with ack (Perl) and then ag (C).
Please forgive me my ignorance but what's wrong with bash? I'm still using it on all servers and workstations, I constantly write scripts for it, some fairly complex. It's not an obsolete project and it looks like a mainstream shell for me. Am I wrong?
Update: yeah, I realize now that this was about the original Bourne Shell, not bash.
Bash is not Bourne, and that's the point. Bash is the Bourne Again Shell, a shell written to improve and replace the Bourne shell in the GNU ecosystem. Modern bash is a huge improvement over the original Bourne shell, and I'm convinced you use bash-only features basically every day and would be very annoyed if someone forced you to use the actual Bourne shell.
Ah, right! I do remember the original Bourne Shell, though. I wouldn't like to get back to using it. Though I might agree provided I get as many years of my age back.
Bash isn't the Bourne shell (sh)! It's a replacement (Bourne Again Shell). But it's interesting that the replacement has become so entrenched that folk assume that it was the original.
Maybe because the speed up is easier to attain in a language where you aren't constantly worrying about introducing bugs? Maybe development is easier in a language with more modern tooling?
Interoperability runs both ways, everyone currently taking a dependency on the C library can swap in the rust library in its place and see the same benefits
Why the hate? It's a genuine question. When you rewrite something, you need to justify the effort somehow. The GNU coreutils started out as "the BSD utilities, but with the GPL!".
Because the reimplementation authors skip all the complexities of designing the tool in the first place while getting right to the fun part (which is coding), and then they get to call themselves authors of a well known infrastructure tool.
Compare "I have typed a setuid() wrapper in rust" vs "I'm the author of sudo-rs".
I didn't call ripgrep a replacement. Other people do. Because it does actually replace their usage of grep in some or all cases, depending on their usage patterns.
Those "core standards" that you talk about didn't spring fully formed from the earth. They came about from competition and beating out and replacing the old "core standards" that lots of people argued very strongly for should not ever be replaced. When I was starting out my career I was told by experienced people that I should not learn to rely on the GNU tool features, since they're far from ubiquitous and probably won't be installed on most systems I'll be working on.
That's right and still true: GNU tools are still not ubiquitous on mainstream computers. And I'm not talking about that Ultrix box still churning. For example the latest macOS carries bsd tar, so.
I briefly looked at this and there's already cargo-c configuration, which is good, but it's currently namespaced differently, so it won't get automatically detected by C programs as `libbz2`:
I'm not familiar enough with the symbols of bzip2 to say anything about ABI compatibility.
I have a toy project to explore things like that, but it's difficult to set aside the amount of time needed to maintain an implementation of the GNU operating system. I would welcome pull requests though:
The commenters below are confusing two things - Rust binaries can be dynamically linked, but because Rust doesn’t have a stable ABI you can’t do this across compiler versions the way you would with C. So in practice, everything is statically linked.
Rust's stable ABI is the C ABI. So you absolutely can dynamically link a Rust-written binary and/or a Rust-written shared library, but the interface has to be pure C. (This also gives you free FFI to most other programming languages.) You can use lightweight statically-linked wrappers to convert between Rust and C interfaces on either side and preserve some practical safety.
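A rough sketch of the "lightweight wrapper" idea: a function with a C-shaped signature (raw pointer plus length) and a safe Rust wrapper on top of it. The names `demo_checksum` and `checksum` are hypothetical and not taken from any real library, and the export attribute a real shared library would need is left out so the snippet runs as a plain program.

```rust
use std::slice;

/// C-shaped entry point: raw pointer and length in, plain integer out.
/// Uses the C calling convention; a real shared library would additionally
/// export it under an unmangled symbol name (omitted in this sketch).
pub unsafe extern "C" fn demo_checksum(data: *const u8, len: usize) -> u32 {
    if data.is_null() {
        return 0;
    }
    // SAFETY: the caller promises `data` points to `len` readable bytes.
    let bytes = unsafe { slice::from_raw_parts(data, len) };
    bytes.iter().fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}

/// Safe wrapper: a slice guarantees that pointer and length agree,
/// so callers on the Rust side never touch `unsafe` themselves.
pub fn checksum(data: &[u8]) -> u32 {
    // SAFETY: pointer and length come from a valid slice.
    unsafe { demo_checksum(data.as_ptr(), data.len()) }
}

fn main() {
    println!("{}", checksum(b"abc"));
}
```

The boundary itself stays "pure C" (no slices, no generics, no `Result`), and the safety checks live in the thin layer on each side.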
> but the interface has to be pure C. (This also gives you free FFI to most other programming languages.)
Easy, not free. In many languages, extra work is needed to provide a C interface. Strings may have to be converted to zero terminated byte arrays, memory that can be garbage collected may have to be locked, structs may mean having to be converted to C struct layout, etc.
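The string case is easy to show from the Rust side. This is only a sketch using the standard `CString`/`CStr` types, with `to_c_string` and `c_visible_len` as made-up helper names: the conversion allocates, appends the terminating NUL, and can fail if the string already contains a NUL byte.

```rust
use std::ffi::{CStr, CString};

/// Convert a Rust string to a NUL-terminated C string.
/// Returns None if the input has an interior NUL byte, which C cannot represent.
fn to_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}

/// Length of the string as a C callee would see it through the raw pointer.
fn c_visible_len(s: &CString) -> usize {
    let ptr = s.as_ptr();
    // SAFETY: `ptr` comes from a live CString, so it is valid and NUL-terminated.
    let view = unsafe { CStr::from_ptr(ptr) };
    view.to_bytes().len()
}

fn main() {
    let owned = to_c_string("bzip2").expect("no interior NUL");
    println!("{}", c_visible_len(&owned));
    println!("{}", to_c_string("bz\0ip2").is_none());
}
```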
A culture issue: in the C++ world of the Apple and Microsoft ecosystems, shipping binary C++ libraries is a common business, even if it is compiler-version dependent.
This is why Apple made such a big point of having a better ABI approach on Swift, after their experience with C++ and Objective-C.
On the Microsoft side, you will notice that all of Victor Ciura's talks at Rust conferences list dealing with the ABI as one of the key points Microsoft is working through in the context of Rust adoption.
Static linking doesn't produce smaller binaries. You are literally adding the symbols from a library into your executable rather than simply mentioning them and letting the dynamic linker figure out how to map those symbols at runtime.
The sum size of a dynamic binary plus the dynamic libraries may be larger than one static linked binary, but whether that holds for more static binaries (2, 3, or 100s) depends on the surface area your application uses of those libraries. It's relatively common to see certain large libraries only dynamically linked, with the build going to great lengths to build certain libraries as shared objects with the executables linking them using a location-relative RPATH (using the $ORIGIN feature) to avoid the extra binary size bloat over large sets of binaries.
Static linking does produce smaller binaries when you bundle dependencies. You're conflating two things - static vs dynamic linking, and bundled vs shared dependencies.
They are often conflated because you can't have shared dependencies with static linking, and bundling dynamically linked libraries is uncommon in FOSS Linux software. It's very common on Windows or with commercial software on Linux though.
You know how the page cache works? Static linking makes it not work. So 3000 processes won't share the same pages for the libc but will have to load it 3000 times.
Kind of off-topic. But yeah it's a good idea for operating systems to guarantee the provision of very commonly used libraries (libc for example) so that they can be shared.
Mac does this, and Windows pretty much does it too. There was an attempt to do this on Linux with the Linux Standard Base, but it never really worked and they gave up years ago. So on Linux if you want a truly portable application you can pretty much only rely on the system providing very old versions of glibc.
It's hardly a fair comparison with old linux distros when osx certainly will not run anything old… remember they dropped rosetta, rosetta2, 32bit support, opengl… (list continues).
And I don't think you can expect windows xp to run binaries for windows 11 either.
So I don't understand why you think this is perfectly reasonable to expect on linux, when no other OS has ever supported it.
I wonder what happens in the minds of people who just flatly contradict reality. Are they expecting others to go "OK, I guess you must be correct and the universe is wrong"? Are they just trying to devalue the entire concept of truth?
[In case anybody is confused by your utterance, yes of course this works in Rust]
That would have been a good post if you'd stopped at the first paragraph.
Your second paragraph is either a meaningless observation on the difference between static and dynamic linking or also incorrect. Not sure what your intent was.
Go may or may not do that on Linux depending what you import. If you call things from `os/user` for example, you'll get a dynamically linked binary unless you build with `-tags osusergo`. A similar case exists for `net`.
Static linking produces huge binaries, it lets you do LTO but the amount of optimisation you can actually do is limited by your RAM. Static linking also causes the entire archive to need constant rebuilds.
You don't need LTO to trim static binaries (though LTO will do it), `-ffunction-sections -fdata-sections` in compiler flags combined with `--gc-sections` (or equivalent) in linker flags will do it.
This way you can get small binaries with readable assembly.
COM is actually good though. Or if you want another object system, you can go with GObject, which works fine with Rust, C++, Python, JavaScript, and tons of other things.
Plenty do not, especially on Apple and Microsoft platforms because they always favoured other approaches to bare bones UNIX support on their dynamic linkers, and C++ compilers.
Rust cannot dynamically link to Rust. It can dynamically link to C and be dynamically linked by C. If you combine the two you can cheat, but it is still C that you are dealing with, not Rust, even if Rust is on both sides.
Rust can absolutely link to Rust libraries dynamically. There is no stable ABI, so it has to be the same compiler version, but it will still be dynamically linked.
You can use dynamic linking in Rust with C ABI. Which means going through `unsafe` keyword - also known as 'trust me bro'. Static linking directly to Rust source means it is checked by compiler so there is no need for unsafe.
ripgrep is one of the best grep replacement you can find, maybe even the best, and also one of the most famous Rust projects.
I don't know of a sed equivalent, but I guess that would be easy to implement as Rust has good regex support (see ripgrep), and 90%+ of sed usage is search-and-replace. The other commands don't look hard to implement and because they are not used as much, optimizing these is less of a priority.
I don't know about awk, it is a full programming language, but I guess it is far from an impossible task to implement.
Now the real hard part is making a true, bug-for-bug compatible replacement of the GNU version of these tools, but while good to have, it is not strictly necessary. For example, Busybox is very popular, maybe even more so than GNU in terms of number of devices, and it has its own (likely simplified) version of grep, sed and awk.
What's the reason for using bz2 here? Wouldn't it be faster to do a one off conversion to zstd? It beats bzip2 in every metric at higher compression levels as far as I know.
bzip2 (particularly parallel implementations thereof) is already relatively competitive for compression. The decompression time is where it lags behind, because LZ77-based algorithms can be incredibly fast at decompression.
There's certainly a contrast between the "Oops a huge file causes a runtime failure" reported for that crate and a bunch of "Oops we have bounds misses" in C. I wonder how hard anybody worked on trying to exploit the bounds misses to get code execution. It may or may not be impossible to achieve that escalation.
But it does apply to the bzip2 crate, which is the topic of discussion. Its new pure-rust implementation is libbz2-rs-sys, not bzip2-rs. The last sentence is irrelevant.
i'd be curious if they're using the same llvm codegen (with the same optimization) backend for the c and rust versions. if so, where the speedups are coming from?
(ie, is it some kind of rust auto-simd thing, did they use the opportunity to hand optimize other parts or is it making use of newer optimized libraries, or... other)
Yeah, that was Bryan Cantrill's realization when, for the sake of learning, he rewrote a part of dtrace in Rust and was shocked to see his naive reimplementation being significantly faster than his original code. The answer boiled down to “I used a BTreeMap in Rust because it's in std”.
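To make the "it's in std" point concrete, here is a minimal sketch of the kind of ordered lookup `BTreeMap` gives you with no extra dependency. It is not the code from that anecdote; the table contents and the `demo_table` and `lookup` names are made up for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical table mapping start addresses to names.
fn demo_table() -> BTreeMap<u64, &'static str> {
    let mut table = BTreeMap::new();
    table.insert(0x1000, "start");
    table.insert(0x2000, "middle");
    table.insert(0x3000, "end");
    table
}

/// Return the name whose start address is the greatest key <= `addr`.
fn lookup(table: &BTreeMap<u64, &'static str>, addr: u64) -> Option<&'static str> {
    // `range` walks keys in sorted order, so the last key in `..=addr`
    // is the closest one at or below `addr`.
    table.range(..=addr).next_back().map(|(_, name)| *name)
}

fn main() {
    let table = demo_table();
    println!("{:?}", lookup(&table, 0x2abc));
    println!("{:?}", lookup(&table, 0x0fff));
}
```

In C you would typically reach for a third-party tree or a sorted array plus a hand-written binary search to get the same "nearest key below" query.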
hmm.. i wonder how it would compare then with clang+linux, clang+stl or hotspot+j2ee.
reminds me a bit of the days when perl programs would often outrun native c/c++ for common tasks because ultimately they had the most efficient string processing libraries baked into the language built-ins.
how is space efficiency? last i checked, because of big libraries and ease of adding them to projects, a lot of rust binaries tend to be much larger than their traditional counterparts. how might this impact overall system performance if this trade-off is made en-masse? (even if more ram is added to counteract loss of vm page cache, does it also start to impact locality and cache utilization?)
i'd be curious how something like redox benchmarks against traditional linux for real world workloads and interactivity measures.
pretty cool! in isolation looks awesome! i'm still a little curious about the impacts of increased executable image size, especially in a complete system.
if all the binaries are big, does it start to crowd out cache space? does static linking make sense for full systems?
C is honestly a pretty bad language for writing modern high performance code. Between C99 and C23, there was a ~20 year gap where the language just didn't add features needed to idiomatically target lots of the new instructions added (without inline asm). Just getting good abstract machine instructions for clz/popcnt/clmul/pdep etc helps a lot for writing this kind of code.
Popcount, clz, and ctz are provided as nonstandard functions in GCC (and clang might also support them in GNU mode, but I don't know for sure). PDEP and PEXT do not seem to be, but I think they should be (and PEXT is something that INTERCAL already had, anyways) (although PDEP and PEXT can be used with -mbmi2 on x86, but are not available for general use). The MOR and MXOR of MMIX are also something that I would want to be available as built-in functions.
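For comparison, a small sketch of what "good abstract machine instructions" look like on the Rust side: popcount, clz and ctz are ordinary methods on the integer types in the standard library. The `bit_summary` helper is just a name for this example, and whether each call compiles to a single instruction depends on the target and its enabled features.

```rust
/// Return (popcount, leading zeros, trailing zeros) for a 32-bit value.
fn bit_summary(x: u32) -> (u32, u32, u32) {
    (x.count_ones(), x.leading_zeros(), x.trailing_zeros())
}

fn main() {
    // 0b1011_0000 has three set bits, 24 leading zeros and 4 trailing zeros.
    println!("{:?}", bit_summary(0b1011_0000));
    // For zero, both zero counts are defined and equal the bit width.
    println!("{:?}", bit_summary(0));
}
```

Unlike the GCC builtins, the zero input case is well defined here, so no special-casing is needed before calling them.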
I hope they or Prossimo will also look at reimplementing, in a similar fashion, the core Internet protocols: BGP, OSPF and RIP, other routing implementations, DNS servers, and so on.
About not having perf on macOS: you can get quite far with dtrace for profiling. That’s what the original flame graph script in Perl mentions using and what the flame graph Rust reimplementation also uses. It does not have some metrics like cache misses or micro instructions retired but still it can be very useful.
Does anyone know if it supports parallel decompression, lbzip2-style? (or just iterators doing pre-scanning for the block magic that allow doing parallel decompression on top).
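For anyone wondering what "pre-scanning for the block magic" involves: bzip2 blocks start with the 48-bit value 0x314159265359, and blocks are not byte-aligned, so the scan has to work bit by bit. Below is a minimal sketch of such a scanner, assuming MSB-first bit order; `find_block_starts` is a hypothetical name and this is not how any particular crate does it. The pattern can also occur by chance inside compressed data, so a real tool would still have to validate each candidate.

```rust
/// 48-bit magic that begins every bzip2 compressed block.
const BLOCK_MAGIC: u64 = 0x3141_5926_5359;
const MASK_48: u64 = (1 << 48) - 1;

/// Return the bit offsets (from the start of `data`) where the magic begins.
fn find_block_starts(data: &[u8]) -> Vec<u64> {
    let mut starts = Vec::new();
    let mut window: u64 = 0; // the most recent 48 bits seen
    let mut bits_seen: u64 = 0;
    for &byte in data {
        // Feed bits most-significant first.
        for shift in (0..8).rev() {
            let bit = u64::from((byte >> shift) & 1);
            window = ((window << 1) | bit) & MASK_48;
            bits_seen += 1;
            if bits_seen >= 48 && window == BLOCK_MAGIC {
                starts.push(bits_seen - 48);
            }
        }
    }
    starts
}

fn main() {
    // The magic shifted right by four bits, so it starts mid-byte.
    let data = [0x03, 0x14, 0x15, 0x92, 0x65, 0x35, 0x90];
    println!("{:?}", find_block_starts(&data));
}
```

Each offset found this way is a candidate starting point that could be handed to a separate worker.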
I like Rust and have an ambition to learn it as well (I've had a few false starts...). One of my issues that I have is that every (slight exaggeration) library that I seem to come across is still at version 0.x.y. Take this library as an example. 0.1.0 was released in 2014 and it still hasn't had a 1.0.0 release, is there an aversion to get to 1.0.0 in the rust community?
Serious answer: For some, they do change semi-often and don't feel compelled to declare stability. In other cases, it's a stable + widely used 0.x package, and bumping it to 1.0 usually implies _some_ kind of breaking change. (I don't know if that _should_ be the case, but I know that if I see a dependency has bumped from 0.x to 1.0 I'm going to be cautious and wait to update it until I have more time).
In general: People usually aren't too concerned about it.
This lists Zig as an entry, despite the Zig project having very clear plans[0] for a 1.0 release. That's not 0ver, it's just the beta stage of semver.
Yes, in rust, the package manager has built in rules about when to update a package. It won’t auto update a major version change because it implies a change that breaks something. As long as your package is safe to auto update you don’t want to change the major version number.
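The rule Cargo applies by default (a "caret" requirement) can be sketched in a few lines: an update is allowed only if it does not change the left-most non-zero component. This is a simplified illustration that ignores pre-release tags and build metadata; `Version` and `caret_compatible` are names invented for the example.

```rust
/// (major, minor, patch)
type Version = (u64, u64, u64);

/// Would a dependency declared as `req` be allowed to update to `candidate`?
fn caret_compatible(req: Version, candidate: Version) -> bool {
    // Never go below the requested version (tuples compare lexicographically).
    if candidate < req {
        return false;
    }
    match req {
        // 0.0.z: every patch release counts as breaking.
        (0, 0, patch) => candidate == (0, 0, patch),
        // 0.y.z: the minor version acts as the "major" one.
        (0, minor, _) => candidate.0 == 0 && candidate.1 == minor,
        // x.y.z with x > 0: only the major version must match.
        (major, _, _) => candidate.0 == major,
    }
}

fn main() {
    println!("{}", caret_compatible((1, 2, 3), (1, 9, 0))); // same major
    println!("{}", caret_compatible((0, 1, 0), (0, 2, 0))); // 0.x minor bump
}
```

This is also why a crate can sit at 0.x for years and still signal breaking changes: bumping 0.1 to 0.2 is treated like a major bump.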
1. The uutils project didn’t also make all locales cases for sort faster even though the majority of people will be using UTF-8, C or POSIX where it is indeed faster
2. There’s a lot of debating about different test cases which is a never ending quibble with sorting routines (go look at some of the cutting edge sort algorithm development).
This complaint is hyperfocusing on 1 of the many utilities they claim they’re faster on and quibbling about what to me are important but ultimately minor critiques. I really don’t see the debacle.
As for the license, that’s more your opinion. Rust as a language generally has dual licensed their code as MIT and Apache2 and most open source projects follow this tradition. I don’t see the conspiracy that you do. And just so I’m clear, the corporation you're criticizing here as the amorphous evil entity funding this is Ubuntu, right?
>1. The uutils project didn’t also make all locales cases for sort faster even though the majority of people will be using UTF-8, C or POSIX where it is indeed faster
locale != encoding.
Try sorting a phone book with tr_TR.UTF-8 vs en_US.UTF-8
You should of course verify these results in your scenario. However, I somewhat doubt that the person exists who cares greatly about performance, and is still willing to consider bzip2. There isn't a point anywhere in the design space where bzip2 beats zstd. You can get smaller outputs from zstd in 1/20th the time for many common inputs, or you can spend the same amount of time and get a significantly smaller output, and zstd decompression is again 20-50x faster depending. So the speed of your bzip2 implementation hardly seems worth arguing over.
Without commenting on whether an LLM is the right approach, I don't think this task is particularly hard to audit. There is almost assuredly a huge test suite for bzip2 archives; fuzzing file formats is very easy; and you can restrict / audit the use of unsafe by the translator.
I suspect attempting to debug it would be a nightmare though. Given the LLM could hallucinate anything anywhere you’d likely waste a ton of time.
I suspect it would be faster to just try and write a new implementation based on the spec and debug that against the test suite. You’d likely be closer.
In fact, since they used c2rust, they had a perfectly working version from the start. From there they just had to clean up the Rust code and make sure it didn’t break anything. Clearly the best of the three options.
They kicked off the article saying that no one uses bzip2 anymore. A million cycles saved for something no one uses (according to them) is still 0% battery life saved.
If modern CPUs are so power efficient and have so many spare cycles to allocate to e.g. eye candy no one asked for, then no one is counting and the comparison is irrelevant.
It sounds like the main motivation for the conversion was to simplify builds and reduce the chance of security issues. Old parts of protocols that no one pays much attention to anymore do seem to be a common place where those pop up. The performance gain looks more like just a nice side effect of the rewrite; I imagine they were at most targeting performance parity.
The Wikipedia data dumps [0] are multistream bz2. This makes them relatively easy to partially ingest, and I'm happy to be able to remove the C dependency from the Rust code I have that deals with said dumps.
The same could be said of many things that, nonetheless, are still used by many, and will continue to be used by many for decades to come. A thing does not need to be best to justify someone wanting to make it a bit better.
“Best” is measured along a lot more axes than just performance. And you don’t always get to choose what format you use. It may be dictated to you by some 3rd party you can’t influence.
bzip2 is still pretty good if you want to optimize for:
- better compression ratio than gzip
- faster compression than many better-than-gzip competitors
- lower CPU/RAM usage for the same compression ratio/time
This is a niche, but it does crop up sometimes. The downside to bzip2 is that it is slow to decompress, but for write-heavy workloads, that doesn't matter too much.
So? If I need to consume a resource compressed using bz2, I'm not just going to sit around and wait for them to use zstd. I'm going to break out bz2. If I can use a modern rewrite that's faster, I'll take every advantage I can get.
You know it is just Wirth's law in action: "Software gets slower faster than hardware gets faster." [^1]
In fact Jevons Paradox: When technological progress increases the efficiency with which a resource is used, but the rate of consumption of that resource rises due to increasing demand - essentially, efficiency improvements can lead to increased consumption rather than the intended conservation. [^2][^3]
I think it goes deeper. There is a certain level of slowness that causes pain to users. When that level is hit, market forces cause attention to software efficiency.
Hardware efficiency just gives more room for software to bloat. The pain level is a human factor and stays the same.
So time to adapt Wirth's law: Software gets slower >exactly as much< as hardware gets faster
It seems to me like binary file format parsing (and construction) is probably a good place for using languages that aren't as prone to buffer-overflows and the like. Especially if it's for a common format and the code might be used in all sorts of security-contexts.
Buffer overflows are more a library problem, not a language problem, though for newer ecosystems like Rust the distinction is kind of lost on people. But point being, if you rewrote bzip2 using an equivalent to std::Vec, you'd end up in the same place. Unfortunately, the norm among C developers, especially in the past, was to open code most buffer manipulation, so you wind up with 1000 manually written overflow checks, some of which are wrong or outright missing, as opposed to a single check in a shared implementation. Indeed, even that Rust code had an off-by-one (in "safe" code), it just wasn't considered a security issue because it would result in data corruption, not an overflow.
What Rust-the-language does offer is temporal safety (i.e. the borrow checker), and there's no easy way to get that in C.
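A small sketch of the "single check in a shared implementation" idea: the parser below never writes its own bounds comparison and relies on the slice API to reject out-of-range reads. `read_u16_le` is a hypothetical helper, not code from any bzip2 implementation.

```rust
/// Read a little-endian u16 at `offset`, or None if it would run past the buffer.
fn read_u16_le(buf: &[u8], offset: usize) -> Option<u16> {
    // checked_add guards against offset + 2 wrapping around.
    let end = offset.checked_add(2)?;
    // `get` performs the one shared bounds check; no manual comparison here.
    let bytes = buf.get(offset..end)?;
    Some(u16::from_le_bytes([bytes[0], bytes[1]]))
}

fn main() {
    let buf = [0x34, 0x12, 0xff];
    println!("{:?}", read_u16_le(&buf, 0)); // in range
    println!("{:?}", read_u16_le(&buf, 2)); // would read past the end
}
```

As the comment says, this guards against overflowing the buffer but not against logic errors: asking for the wrong (in-range) offset still silently returns the wrong data.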
Pretty incredible for such a short argument to be so inconsistent with itself. Complaining about counting CPU cycles and actually measuring performance because... modern software development is bad and doesn't care about performance?
you're just an end user, you don't have to maintain the suite.
In OSS every hour of volunteer time is precious Manna from heaven, flavored with unicorn tears. So any way to remove Toil and introduce automation is gold.
Rust's strict compiler and an appropriate test suite guarantee a level of correctness far beyond C. There's less onus on the reviewer to ensure everything still works as expected when reviewing a pull request.
> lot of this "rewrite X in Rust" stuff feels like
Indeed. You know the React-Angular-Vue-nevermind churn? It appears that the trend of people pushing stuff because it benefits their careers is coming to the low-level world.
I for one still find it mystifying that Linus Torvalds let these people into the kernel. Linus, who famously banned C++ from the kernel not because of C++ in itself, but to ban C++ programmer culture.
It's a lot like X11 vs. Wayland. The current graphics developers, who trend younger, don't want to maintain the boomer-written C code in the X server. Too risky and time-consuming. So one of the goals of Wayland is to completely abolish X so it can be replaced with something more long-term maintainable. Turns out, current systems-level developers don't want to maintain boomer-written GNU code or any C code at all, really, for similar reasons. C is inherently problematic because even seasoned developers have trouble avoiding its footguns. So an unstated, but important, goal of Rust is to abolish all critical C code and replace it with Rust code. Ubuntu is on board with this.
Except Wayland was developed by the same people who worked for years on X. And they don't dislike X because of C. And they didn't write Wayland in Rust.
> Except Wayland was developed by the same people who worked for years on X.
Yes, and they hated it and "worked hard to kill it" per Jordan Petridis. Note that the maintainers of X in the Wayland era are not really the same people as the original authors of X.
They didn't just maintain it, they did years of work on it. And again, it was not because it was C. It's because it was literally millions of lines of C from the 80s and early 90s, with a sub-optimal architecture.
And there is likely a reason the original people didn't continue to work on it.
You're not telling me anything new. And I'm not trying to claim that X is abandoned because it's in C, although convoluted C code does add to the maintenance difficulties.
I'm drawing an analogy between the X and Wayland situation and the ongoing effort to replace core userland code written in C with new code written in Rust.
So you say younger programmers don't have the required coding kung fu to cope with C code? I hope you are wrong. The prospect of having Rust-like things on everyday devices really frightens me. C is like a lingua franca for computers. Nearly any person working close to the hardware can READ it. I am one of these Boomers and I am not able to properly READ Rust code, because the syntax is so academic. The fact that more and more code is written in Rust reduces the number of people who can read programs.
It's more like no programmer of any age has the required coding kung fu to cope with C code. It is inevitable that they will introduce problematic code. We have decades of examples illustrating this. We lived with it because there was no truly competitive alternative for so long.
I can read and write C code from the times when there weren't any competitive alternatives. I have no problem reading or writing Rust code. In fact it communicates more to me than C code does or can and I can immediately understand more about the code written in Rust than I can about code written in C.
How realistic is it for the Trifecta Tech implementation to start displacing the "official" implementation used by linux distros, which hasn't seen an upstream release since 2019?
Fedora recently swapped the original Adler zlib implementation with zlib-ng, so that sort of thing isn't impossible. You just need to provide a C ABI compatible with the original one.
If it hasn't seen an upstream release since 2019, doesn't that mean the implementation is just... finished? Maybe there's no more bugs to fix and features to add. And in that case, I don't see what's wrong with it.
Isn't 10-15% faster compression, and 5-10% faster decompression, a very nice "feature"?
> [...] doesn't that mean the implementation is just... finished?
I don't think that it _necessarily_ means that; surely not all projects that haven't had a release since 2019 are finished? Probably most of them are simply abandoned.
On the other hand, a finished implementation is certainly a _possible_ explanation for why there have been no releases.
In this specific case, there are a handful of open bugs on their issue tracker. So that would indicate that the project isn't finished.
ref: https://sourceware.org/bugzilla/buglist.cgi?product=bzip2
"Isn't 10-15% faster compression, and 5-10% faster decompression, a very nice «feature»?"
Although much code can be optimized to get it to run 10-15% faster, if that comes at the expense of legibility then such "features" get rejected nowadays. Translating an existing codebase into a language that makes things more difficult¹ and (because of that) has (and most likely will have) fewer engineers willing to work in it looks very much akin to applying legibility-affecting optimizations to me.
¹ ...and not only. I've already described in other comments some of the issues I see with it as a software development tool, here https://news.ycombinator.com/item?id=31565024 here https://news.ycombinator.com/item?id=33390634 and here https://news.ycombinator.com/item?id=42241516
> Although much code can be optimized to get it to run 10-15% faster, if that comes at the expense of legibility then such "features" get rejected nowadays.
Makes sense, and I'd probably make the same call if I was a maintainer and someone submitted a patch which increased performance at the cost of maintainability...
> Translating an existing codebase into a language that makes things more difficult¹ and (because of that) has (and most likely will have) fewer engineers willing to work in it looks very much akin to applying legibility-affecting optimizations to me.
Here I have to personally disagree. I think that Rust is easier than both C and C++. Especially when coming into an already existing project.
The chance of me contributing to a Rust project is higher than to a C project, because I feel more comfortable in knowing that my code is "correct". It's also easier to match the "style" since most Rust projects follow the same structures and patterns, whereas in C it can vary wildly.
E.g. I contributed my first feature to Valkey (Redis fork, C codebase) recently, and figuring out how the tests worked took me quite some time. In fact, in the end I couldn't figure out how to run a specific test so I just ran all tests and grepped the output for my test name. But the tests take tens of minutes to run so this was sub-optimal. On the other hand, 99% of all Rust projects use `cargo test`, and to run a single test I can just click the play button in my editor (Zed) that shows up next to the test (or `cargo test "testname"`).
(with this said, I think that Valkey is a really well structured code base! Great work on their part)
Anyhow, this is just to illustrate my experience. I'm sure that for someone more used to C or C++ they would be more productive in that. And I could go on for ages on all the features that make me miss Rust every day at work when I have to work in other languages, especially algebraic data types!
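To make the `cargo test "testname"` workflow mentioned above concrete, here is a minimal sketch. The function `parse_level` and the test names are made up for illustration and come from no real project.

```rust
/// Parses a bzip2-style compression level (1 through 9).
/// Hypothetical helper, used only to have something to test.
pub fn parse_level(s: &str) -> Option<u8> {
    match s.trim().parse::<u8>() {
        Ok(n) if (1..=9).contains(&n) => Some(n),
        _ => None,
    }
}

#[cfg(test)]
mod tests {
    use super::parse_level;

    #[test]
    fn accepts_levels_one_to_nine() {
        assert_eq!(parse_level("1"), Some(1));
        assert_eq!(parse_level(" 9 "), Some(9));
    }

    #[test]
    fn rejects_out_of_range_levels() {
        assert_eq!(parse_level("0"), None);
        assert_eq!(parse_level("10"), None);
        assert_eq!(parse_level("fast"), None);
    }
}
```

With this in a crate, `cargo test` runs both tests, while `cargo test rejects_out_of_range` runs only the tests whose name contains that string.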
This is where Rust is going to win out. The significantly larger standard library increases the number of legible improvements one can make.
Ubuntu is using Rust sudo so it's definitely possible.
It's not. At least not yet. It's planned for 25.10, but thankfully sudo will be packaged and available for a few versions after that as promised [1].
[1] https://discourse.ubuntu.com/t/adopting-sudo-rs-by-default-i...
They do provide a compatible C ABI. Someone "just" needs to do the work to make it happen.
I think that is the goal of uutils.
https://uutils.github.io/
1) This is a cool project and I wish them success. It would be really cool if these became the default utilities some day soon.
2) I think the MIT license was a mistake. These are often cloning GNU utilities, so referencing GNU source in its original language and then re-implementing it in Rust would be the obvious thing to do. But porting GPL-licensed code to an MIT licensed project is not allowed. Instead, the utilities must be re-implemented from scratch, which seems like a waste of effort. I would be interested in doing the work of porting GNU source to Rust, but I'm not interested in re-writing them all from scratch, so I haven't contributed to this project.
Plenty of people dislike the perceived bloat in GNU utils - for them a rewrite from scratch is a feature, not a bug.
"Seems like a waste of effort" in a vacuum yes, but
1 - GNU utilities is ancient crufty #IFDEF'd C that's been in maintenance mode for decades. You want code to handle quirks of Tru64 and Ultrix? You got it.
2 - Waving your hands around 'the community will take care of it' is magical thinking. C developers don't grow on trees. C tooling is kinda weird and doesn't resemble anything modern - good luck finding enough VOLUNTEER C developers to make your goals happen.
What's the point of mentioning you suspect it is happening other than as a dig at them?
You're 100% right. I removed it.
I hope some are improved too.
The performance boost in tools like ripgrep and tokei is insane compared to the tools they replace (grep and cloc respectively).
I hadn't heard of tokei before, so I tried it on a small project of mine.
Tokei _finishes_ before cloc can print its help text. I wrote this post in less time than it took `cloc .` to count all the files in my project, probably because it doesn't know to ignore `target/`.
> Tokei _finishes_ before cloc can print its help text.
cloc is a Perl script, so it has the interpreter startup time.
I absolutely hate it when people call their tools a "replacement" for something that is part of core standards, something, that did just fine for decades.
ripgrep is an excellent tool. But it's not a grep replacement. And should not ever be.
The GNU utils were a replacement for the BSD utils which were a replacement for the original AT&T utils. Every replacement added new functionality and improvements, and every time someone complained that they didn't stick closer to the thing they replaced. Looking specifically at grep, there used to be new versions like egrep and fgrep that added functionalities beyond standard grep's, but those were eventually pulled into "standard" grep (GNU or BSD). If we stuck with standards we'd all still be using the Bourne shell. The GNU utilities have been around long enough that they feel like the standard now, but I'm glad that we're coming into a new phase of innovation in command-line utilities. And this didn't start with Rust - the new generation of search utilities started with ack (Perl) and then ag (C).
> we'd all still be using the Bourne shell
Please forgive me my ignorance but what's wrong with bash? I'm still using it on all servers and workstations, I constantly write scripts for it, some fairly complex. It's not an obsolete project and it looks like a mainstream shell for me. Am I wrong?
Update: yeah, I realize now that this was about the original Bourne Shell, not bash.
Bash is not Bourne, and that's the point. Bash is the Bourne Again Shell, a shell written to improve and replace the Bourne shell in the GNU ecosystem. Modern bash is a huge improvement over the original Bourne shell and I'm convinced you use bash-only features basically every day, and would be very annoyed if someone forced you to use the actual Bourne shell.
Ah, right! I do remember the original Bourne Shell, though. I wouldn't like to get back to using it. Though I might agree provided I get as many years of my age back.
Bash isn't the Bourne shell (sh)! It's a replacement (Bourne Again Shell). But it's interesting that the replacement has become so entrenched that folk assume that it was the original.
Yeah, got it now lol
> which were a replacement for the
You have a point here. I have to agree.
"X but rewritten in Z" is a terrible marketing, though. Makes me instantly want to hate the tool and its authors. (Love rust. Hate the vibe).
It’s a rust crate designed to be a native rust replacement for a rust c wrapper crate. It’s faster and easier to link to in rust projects.
How would you even tell people you made a better rust crate without using the word “rust?”
Why not spend the efforts to speed up the real zlib? So that the whole world actually gets to spin a bit faster.
Rust folks are claiming excellent interoperability with C binaries. Why the need for a rewrite then?
Maybe because the speed up is easier to attain in a language where you aren't constantly worrying about introducing bugs? Maybe development is easier in a language with more modern tooling?
Interoperability runs both ways, everyone currently taking a dependency on the C library can swap in the rust library in its place and see the same benefits
you would say "rewritten FOR rust" instead of "rewritten IN rust".
It’s a rust crate that depended on c and is now literally “rewritten in rust”
> rust crate
Why the hate? It's a genuine question. When you rewrite something, you need to justify the effort somehow. The GNU coreutils started out as "the BSD utilities, but with the GPL!".
Because the reimplementation authors skip all the complexities of designing the tool in the first place while getting right to the fun part (which is coding), and then they get to call themselves authors of a well known infrastructure tool.
Compare "I have typed a setuid() wrapper in rust" vs "I'm the author of sudo-rs".
> If we stuck with standards we'd all still be using the Bourne shell
or Korn shell.
I didn't call ripgrep a replacement. Other people do. Because it does actually replace their usage of grep in some or all cases, depending on their usage patterns.
https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#can...
> And should not ever be.
Those "core standards" that you talk about didn't spring fully formed from the earth. They came about from competition and beating out and replacing the old "core standards" that lots of people argued very strongly for should not ever be replaced. When I was starting out my career I was told by experienced people that I should not learn to rely on the GNU tool features, since they're far from ubiquitous and probably won't be installed on most systems I'll be working on.
That's right and still true: GNU tools are still not ubiquitous on mainstream computers. And I'm not talking about that Ultrix box still churning. For example the latest macOS carries bsd tar, so.
I briefly looked at this and there's already cargo-c configuration, which is good, but it's currently namespaced differently, so it won't get automatically detected by C programs as `libbz2`:
https://github.com/trifectatechfoundation/libbzip2-rs/blob/8...
I'm not familiar enough with the symbols of bzip2 to say anything about ABI compatibility.
I have a toy project to explore things like that, but it's difficult to set aside the amount of time needed to maintain an implementation of the GNU operating system. I would welcome pull requests though:
https://github.com/kpcyrd/platypos
> You just need to provide a C ABI compatible with the original one.
How does this interact with dynamic linking? Doesn't the current Rust toolchain mandate static linking?
The commenters below are confusing two things - Rust binaries can be dynamically linked, but because Rust doesn’t have a stable ABI you can’t do this across compiler versions the way you would with C. So in practice, everything is statically linked.
Rust's stable ABI is the C ABI. So you absolutely can dynamically link a Rust-written binary and/or a Rust-written shared library, but the interface has to be pure C. (This also gives you free FFI to most other programming languages.) You can use lightweight statically-linked wrappers to convert between Rust and C interfaces on either side and preserve some practical safety.
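As a rough sketch of the "pure C interface plus thin wrapper" pattern described above: the core logic stays in safe Rust and a small `extern "C"` function exposes it. The names `byte_sum` and `demo_byte_sum` are hypothetical. A real shared library would additionally need the `cdylib` crate type and a `#[no_mangle]` attribute on the exported function (spelled `#[unsafe(no_mangle)]` in the 2024 edition). The attribute is left out here so the snippet compiles unchanged on any edition.

```rust
use std::slice;

/// Safe, idiomatic Rust core: sums the bytes of a slice (wrapping on overflow).
pub fn byte_sum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}

/// C-ABI entry point (hypothetical name) that a C caller could link against.
///
/// # Safety
/// `ptr` must point to `len` readable bytes. A null `ptr` or a zero `len`
/// is treated as empty input and yields 0.
pub unsafe extern "C" fn demo_byte_sum(ptr: *const u8, len: usize) -> u32 {
    if ptr.is_null() || len == 0 {
        return 0;
    }
    // The only unsafe step: trusting the caller's pointer/length pair.
    let data = unsafe { slice::from_raw_parts(ptr, len) };
    byte_sum(data)
}
```

Everything behind the boundary is ordinary checked Rust. Only the conversion from a raw pointer to a slice relies on the caller keeping its side of the contract.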
> but the interface has to be pure C. (This also gives you free FFI to most other programming languages.)
Easy, not free. In many languages, extra work is needed to provide a C interface. Strings may have to be converted to zero terminated byte arrays, memory that can be garbage collected may have to be locked, structs may mean having to be converted to C struct layout, etc.
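For the string case mentioned above, the extra work in Rust looks roughly like this. It is a minimal sketch using the standard library's `CString` and `CStr`. The helper names `to_c_string` and `from_c_ptr` are made up for illustration.

```rust
use std::ffi::{CStr, CString, NulError};
use std::os::raw::c_char;

/// Copies a Rust string into a freshly allocated, NUL-terminated buffer.
/// Fails if the input contains an interior NUL byte, which C cannot represent.
pub fn to_c_string(s: &str) -> Result<CString, NulError> {
    CString::new(s)
}

/// Copies a C string back into an owned Rust `String`.
///
/// # Safety
/// `ptr` must point to a valid NUL-terminated string that stays alive
/// for the duration of the call.
pub unsafe fn from_c_ptr(ptr: *const c_char) -> String {
    let c_str = unsafe { CStr::from_ptr(ptr) };
    // Invalid UTF-8 is replaced rather than rejected in this sketch.
    c_str.to_string_lossy().into_owned()
}
```

Both directions involve a copy and a validity check, which is the kind of cost the comment above is pointing at.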
Specifically, the rust dependencies are statically linked. It's extremely easy to dynamically link anything that has a C ABI from rust.
A culture issue: in the C++ world of the Apple and Microsoft ecosystems, shipping binary C++ libraries is a common business, even if it is compiler version dependent.
This is why Apple made such a big point of having a better ABI approach on Swift, after their experience with C++ and Objective-C.
While on Microsoft side, you will notice that all talks from Victor Ciura on Rust conferences have dealing with ABI as one of the key points Microsoft is dealing with in the context of Rust adoption.
Static linking also produces smaller binaries and lets you do link-time-optimisation.
Static linking doesn't produce smaller binaries. You are literally adding the symbols from a library into your executable rather than simply mentioning them and letting the dynamic linker figure out how to map those symbols at runtime.
The sum size of a dynamic binary plus the dynamic libraries may be larger than one static linked binary, but whether that holds for more static binaries (2, 3, or 100s) depends on the surface area your application uses of those libraries. It's relatively common to see certain large libraries only dynamically linked, with the build going to great lengths to build certain libraries as shared objects with the executables linking them using a location-relative RPATH (using the $ORIGIN feature) to avoid the extra binary size bloat over large sets of binaries.
Static linking does produce smaller binaries when you bundle dependencies. You're conflating two things - static vs dynamic linking, and bundled vs shared dependencies.
They are often conflated because you can't have shared dependencies with static linking, and bundling dynamically linked libraries is uncommon in FOSS Linux software. It's very common on Windows or with commercial software on Linux though.
You know how the page cache works? Static linking makes it not work. So 3000 processes won't share the same pages for the libc but will have to load it 3000 times.
Kind of off-topic. But yeah it's a good idea for operating systems to guarantee the provision of very commonly used libraries (libc for example) so that they can be shared.
Mac does this, and Windows pretty much does it too. There was an attempt to do this on Linux with the Linux Standard Base, but it never really worked and they gave up years ago. So on Linux if you want a truly portable application you can pretty much only rely on the system providing very old versions of glibc.
The standard library is the whole distro :)
It's hardly a fair comparison with old linux distros when osx certainly will not run anything old… remember they dropped rosetta, rosetta2, 32bit support, opengl… (list continues).
And I don't think you can expect windows xp to run binaries for windows 11 either.
So I don't understand why you think this is perfectly reasonable to expect on linux, when no other OS has ever supported it.
Care to explain?
You can still statically link all your own code but dynamically link libc/other system dependencies.
Not with rust…
I wonder what happens in the minds of people who just flatly contradict reality. Are they expecting others to go "OK, I guess you must be correct and the universe is wrong"? Are they just trying to devalue the entire concept of truth?
[In case anybody is confused by your utterance, yes of course this works in Rust]
Can you run ldd on any binary you currently have on your machine that is written in rust?
I eagerly await the results!
I mean, sure, but what's your point?
Here's nu, a shell in Rust:
And here's the Debian variant of ash, a shell in C:
Well seems I was wrong about linking C libraries from rust.
The problem of increased RAM requirements and constant rebuilds are still very real, if only slightly less big because of dynamically linking C.
That would have been a good post if you'd stopped at the first paragraph.
Your second paragraph is either a meaningless observation on the difference between static and dynamic linking or also incorrect. Not sure what your intent was.
Why do facts offend you?
I’m genuinely curious now, what made you so convinced that it would be completely statically linked?
I think people often talk about Rust only supporting static linking so he probably inferred that it couldn't dynamically link with anything.
Also Go does produce fully static binaries on Linux and so it's at least reasonable to incorrectly guess that Rust does the same.
Definitely shouldn't be so confident though!
Go may or may not do that on Linux depending what you import. If you call things from `os/user` for example, you'll get a dynamically linked binary unless you build with `-tags osusergo`. A similar case exists for `net`.
go by default links libc
It doesn't. See the sibling comment.
Static linking produces huge binaries, it lets you do LTO but the amount of optimisation you can actually do is limited by your RAM. Static linking also causes the entire archive to need constant rebuilds.
> Static linking also causes the entire archive to need constant rebuilds.
Only relinking, which you can make cheap for your non-release builds.
Dynamic linking needs relinking everytime you run the program!
You don't need LTO to trim static binaries (though LTO will do it), `-ffunction-sections -fdata-sections` in compiler flags combined with `--gc-sections` (or equivalent) in linker flags will do it.
This way you can get small binaries with readable assembly.
C++ binaries should be doing the same. Externally, speak C ABI. Internally, statically link Rust stdlib or C++ stdlib.
Exporting a C API from a C++ project to consume in another C++ project is really painful. This is how you get COM.
(which actually slightly pre-dates C++, I think?)
> This is how you get COM. (which actually slightly pre-dates C++, I think?)
No. C++ is from 1985 (https://en.wikipedia.org/wiki/C%2B%2B), COM from 1993 (https://en.wikipedia.org/wiki/Component_Object_Model)
COM is actually good though. Or if you want another object system, you can go with GObject, which works fine with Rust, C++, Python, JavaScript, and tons of other things.
OWL, MFC, Qt, VCL, FireMonkey, AppFramework, PowerPlant...
Plenty do not, especially on Apple and Microsoft platforms because they always favoured other approaches to bare bones UNIX support on their dynamic linkers, and C++ compilers.
Rust cannot dynamically link to Rust. It can dynamically link to C and be dynamically linked by C - if you combine the two you can cheat, but it is still C that you are dealing with, not Rust, even if Rust is on both sides.
Rust can absolutely link to Rust libraries dynamically. There is no stable ABI, so it has to be the same compiler version, but it will still be dynamically linked.
It might help to think of it as two IPC 'servers' written in rust that happen to have the C ABI interfaces as their communication protocol.
No. https://doc.rust-lang.org/reference/linkage.html#r-link.dyli...
Rust lets you generate dynamic C-linkage libraries.
Use crate-type=["cdylib"]
Dynamic linking works fine if you target the C ABI.
Rust importing Rust must be statically linked, yes. You can statically link Rust into a dynamic library that other libraries link to, though!
You can use dynamic linking in Rust with C ABI. Which means going through `unsafe` keyword - also known as 'trust me bro'. Static linking directly to Rust source means it is checked by compiler so there is no need for unsafe.
i wait until they come to the hard stuff like awk, sed and grep.
ripgrep is one of the best grep replacements you can find, maybe even the best, and also one of the most famous Rust projects.
I don't know of a sed equivalent, but I guess that would be easy to implement as Rust has good regex support (see ripgrep), and 90%+ of sed usage is search-and-replace. The other commands don't look hard to implement and because they are not used as much, optimizing these is less of a priority.
I don't know about awk, it is a full programming language, but I guess it is far from an impossible task to implement.
Now the real hard part is making a true, bug-for-bug compatible replacement of the GNU version of these tools, but while good to have, it is not strictly necessary. For example, Busybox is very popular, maybe even more so than GNU in terms of number of devices, and it has its own (likely simplified) version of grep, sed and awk.
There is sd, not a drop in replacement though.
https://github.com/chmln/sd
What would be the point?
I use this crate to process 100s of TB of Common Crawl data, I appreciate the speedups.
What's the reason for using bz2 here? Wouldn't it be faster to do a one off conversion to zstd? It beats bzip2 in every metric at higher compression levels as far as I know.
Common Crawl delivers the data as bz2. Indeed I store intermediate data in zstd with ZFS.
That assumes you're processing the data more than once.
Is this data available as torrents?
Yeah came here to say a 14% speed up in compression is pretty good!
bzip2 (particularly parallel implementations thereof) are already relatively competitive for compression. The decompression time is where it lags behind because lz77 based algorithms can be incredibly fast at decompression.
It's blazingly fast
Anyone know if this will by default resolve the 11 outstanding CVEs?
Ironically there is one CVE reported in the bzip2 crate
[1] https://app.opencve.io/cve/?product=bzip2&vendor=bzip2_proje...
There's certainly a contrast between the "Oops a huge file causes a runtime failure" reported for that crate and a bunch of "Oops we have bounds misses" in C. I wonder how hard anybody worked on trying to exploit the bounds misses to get code execution. It may or may not be impossible to achieve that escalation.
> The bzip2 crate before 0.4.4
They're releasing 0.6.0 today :>
[flagged]
But it does apply to the bzip2 crate, which is the topic of discussion. Its new pure-rust implementation is libbz2-rs-sys, not bzip2-rs. The last sentence is irrelevant.
This article is about the bzip2 crate, not the bzip2-rs crate, despite the repo for the former having the name of the latter.
[dead]
FTA:
> Why bother working on this algorithm from the 90s that sees very little use today?
What's in use nowadays ? zstd ?
ahh saw this: https://quixdb.github.io/squash-benchmark/
i'd be curious if they're using the same llvm codegen backend (with the same optimization) for the c and rust versions. if so, where are the speedups coming from?
(ie, is it some kind of rust auto-simd thing, did they use the opportunity to hand optimize other parts or is it making use of newer optimized libraries, or... other)
Just speculating: Rust can hand over more hints to the code generator. Eg you don't have to worry about aliasing as much as with C pointers. See https://en.wikipedia.org/wiki/Aliasing_(computing)#Conflicts...
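A tiny illustration of the aliasing point, as a sketch only (the function name `add_twice` is made up). In C, with parameters `int *a` and `const int *b`, the compiler generally has to assume the two pointers might refer to the same object unless `restrict` is used, so it must reload `*b` after the first store. In Rust, a `&mut` reference is guaranteed to be exclusive, so the optimizer is allowed to read `*b` once.

```rust
/// Adds `*b` to `*a` twice.
/// Because `a` is `&mut` and `b` is `&`, the two cannot refer to the same
/// `i32`, so the compiler may keep `*b` in a register across both additions.
pub fn add_twice(a: &mut i32, b: &i32) {
    *a += *b;
    *a += *b;
}
```

Calling `add_twice(&mut x, &x)` is rejected by the borrow checker, which is what makes the no-alias assumption sound. Whether this matters for bzip2 specifically is not something this sketch can show.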
This makes a lot of sense to me, though I don’t know the official answer so I’m just sort of guessing along too.
Linked from the article is another on how they used c2rust to do the initial translation.
https://trifectatech.org/blog/translating-bzip2-with-c2rust/
For our purposes, it points out places where the code isn’t very optimal because the C code has no guarantees on the ranges of variables, etc.
It also points out a lot of people just use ‘int’ even when the number will never be very big.
But with the proper type the Rust compiler can decide to do something else if it will perform better.
So I suspect your idea that it allows unlocking better optimizations though more knowledge is probably the right answer.
Ergonomics of using the right data structures and algorithms can also play a big role. In C, everything beyond a basic array is too much hassle.
Yeah, that was Bryan Cantrill's realization when for the sake of learning he rewrote a part of dtrace in Rust and was shocked when he saw his naive reimplementation being significantly faster than his original code, and the answer boiled down to “I used a BTreeMap in Rust because it's in std”.
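To show how little ceremony an ordered map takes when it ships with the standard library, here is a small sketch. The address-to-symbol scenario and the name `lookup_symbol` are hypothetical and not taken from the dtrace rewrite.

```rust
use std::collections::BTreeMap;

/// Finds the symbol whose start address is the greatest one not above `addr`.
/// `symbols` maps start addresses to names.
pub fn lookup_symbol<'a>(
    symbols: &BTreeMap<u64, &'a str>,
    addr: u64,
) -> Option<(u64, &'a str)> {
    symbols
        .range(..=addr) // all entries with key <= addr, in sorted order
        .next_back() // the last of those is the closest one
        .map(|(&start, &name)| (start, name))
}
```

The lookup is logarithmic in the number of entries. Doing the same in C means picking, vendoring or writing a balanced-tree implementation first.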
And even non standard library crates are really easy to use in Rust.
hmm.. i wonder how it would compare then with clang+linux, clang+stl or hotspot+j2ee.
reminds me a bit of the days when perl programs would often outrun native c/c++ for common tasks because ultimately they had the most efficient string processing libraries baked into the language built-ins.
how is space efficiency? last i checked, because of big libraries and ease of adding them to projects, a lot of rust binaries tend to be much larger than their traditional counterparts. how might this impact overall system performance if this trade-off is made en-masse? (even if more ram is added to counteract loss of vm page cache, does it also start to impact locality and cache utilization?)
i'd be curious how something like redox benchmarks against traditional linux for real world workloads and interactivity measures.
For whatever it's worth, details of my findings are in [0].
[0] https://bcantrill.dtrace.org/2018/09/28/the-relative-perform...
pretty cool! in isolation looks awesome! i'm still a little curious about the impacts increased executable image size, especially in a complete system.
if all the binaries are big, does it start to crowd out cache space? does static linking make sense for full systems?
The kernel will only load the parts of the binary you actually run, and can drop the disk cache for those parts that haven't been run in a while.
So more than the absolute size of the binary, you should worry about how much is actually in the 'active set'.
C is honestly a pretty bad language for writing modern high performance code. Between C99 and C23, there was a ~20 year gap where the language just didn't add features needed to idiomatically target lots of the new instructions added (without inline asm). Just getting good abstract machine instructions for clz/popcnt/clmul/pdep etc helps a lot for writing this kind of code.
Popcount, clz, and ctz are provided as nonstandard functions in GCC (and clang might also support them in GNU mode, but I don't know for sure). PDEP and PEXT do not seem to be, but I think they should be (and PEXT is something that INTERCAL already had, anyways) (although PDEP and PEXT can be used with -mbmi2 on x86, but are not available for general use). The MOR and MXOR of MMIX are also something that I would want to be available as built-in functions.
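For comparison, Rust exposes popcount, clz and ctz as ordinary methods on its integer types, with no compiler-specific builtins. A minimal sketch follows. The helper names are made up, and whether each call becomes a single instruction depends on the target CPU and compiler flags.

```rust
/// Returns (population count, leading zeros, trailing zeros) of `x`.
pub fn bit_stats(x: u32) -> (u32, u32, u32) {
    (x.count_ones(), x.leading_zeros(), x.trailing_zeros())
}

/// Number of bits needed to represent `x` (0 for `x == 0`).
pub fn bits_needed(x: u32) -> u32 {
    32 - x.leading_zeros()
}
```

PDEP and PEXT have no portable method like this. On x86 they are reachable through the architecture-specific intrinsics in `std::arch`.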
any rewrite, in X, Y, Z language gives you the opportunity to speed things up, there is nothing inherent to rust
I hope they or Prossimo will also look and reimplement in the similar fashion the core Internet protocols - BGP, OSPF and RIP, other routing implementations, DNS servers, and so on.
Check out
https://nlnet.nl/project/current.html https://www.sovereign.tech/programs/fund
There's been good support over the last couple of years to fund rewriting critical internet & OS tools into safer languages like Rust.
Eg BGP in Rust https://www.nlnetlabs.nl/projects/routing/rotonda/
Thank you, precisely what I had in mind! Somehow I missed this project. As well as Holo[1] (routing)
[1] https://github.com/holo-routing/holo
https://www.memorysafety.org/initiative/ this page mentions TLS and DNS which goes some way towards your suggestion.
Is that domain actually about memory safety or about Rust?
One guy did Ironsides DNS in SPARK Ada which has stronger proofs.
Nothing against Ada, it's a good language. The only problem would be finding contributors in that case.
[dead]
About not having perf on macOS: you can get quite far with dtrace for profiling. That’s what the original flame graph script in Perl mentions using and what the flame graph Rust reimplementation also uses. It does not have some metrics like cache misses or micro instructions retired but still it can be very useful.
Does anyone know if it supports parallel decompression, lbzip2-style? (or just iterators doing pre-scanning for the block magic that allow doing parallel decompression on top).
Edit : it probably doesn't.
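For anyone curious what "pre-scanning for the block magic" involves, here is a rough sketch. It is not part of the crate discussed here, and the function name is made up. bzip2 starts each compressed block with the 48-bit value 0x314159265359, and blocks are not byte-aligned, so the scan has to run at bit granularity, reading bits most significant first.

```rust
/// 48-bit marker at the start of every bzip2 compressed block.
const BLOCK_MAGIC: u64 = 0x3141_5926_5359;
const MASK_48: u64 = (1 << 48) - 1;

/// Returns the bit offsets (from the start of `data`) at which the block
/// magic occurs. Bits within each byte are consumed most significant first.
pub fn find_block_magic_bits(data: &[u8]) -> Vec<usize> {
    let mut hits = Vec::new();
    let mut window: u64 = 0; // the last 48 bits seen
    let mut bits_seen = 0usize;
    for &byte in data {
        for shift in (0..8).rev() {
            let bit = u64::from((byte >> shift) & 1);
            window = ((window << 1) | bit) & MASK_48;
            bits_seen += 1;
            if bits_seen >= 48 && window == BLOCK_MAGIC {
                hits.push(bits_seen - 48);
            }
        }
    }
    hits
}
```

The same 48-bit pattern can in principle show up by chance inside compressed data, so a parallel decompressor built on a scan like this has to treat each hit as a candidate and verify it, for example with the block CRC.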
We should rewrite Rust in Javascript
Lbzip2 had much faster decompressing speed, using all available CPU cores.
It's 2025, and most programs like Python are stuck at one CPU core.
Thanks for showing us you have no understanding of python's situation.
rust aside, I really enjoy seeing all these different implementation benchmarks, very satisfying to read
I like Rust and have an ambition to learn it as well (I've had a few false starts...). One issue I have is that every (slight exaggeration) library that I seem to come across is still at version 0.x.y. Take this library as an example. 0.1.0 was released in 2014 and it still hasn't had a 1.0.0 release, is there an aversion to get to 1.0.0 in the rust community?
https://0ver.org/#notable-zerover-projects
Serious answer: For some, they do change semi-often and don't feel compelled to declare stability. In other cases, it's a stable + widely used 0.x package, and bumping it to 1.0 usually implies _some_ kind of breaking change. (I don't know if that _should_ be the case, but I know that if I see a dependency has bumped from 0.x to 1.0 I'm going to be cautious and wait to update it until I have more time).
In general: People usually aren't too concerned about it.
This lists Zig as an entry, despite the Zig project having very clear plans[0] for a 1.0 release. That's not 0ver, it's just the beta stage of semver.
[0] https://github.com/ziglang/zig/milestone/2
Yes, in rust, the package manager has built in rules about when to update a package. It won’t auto update a major version change because it implies a change that breaks something. As long as your package is safe to auto update you don’t want to change the major version number.
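The update rule can be sketched as a small function. This is a simplified model of Cargo's default (caret) version requirements that ignores pre-release tags. The names `Version` and `caret_allows` are made up. The key detail for 0.x crates is that the leftmost non-zero component acts as the "major" version.

```rust
/// (major, minor, patch)
pub type Version = (u64, u64, u64);

/// Returns true if `candidate` satisfies a default caret requirement on `req`,
/// e.g. a dependency written as `"0.4.4"` or `"1.2.3"` in Cargo.toml.
pub fn caret_allows(req: Version, candidate: Version) -> bool {
    if candidate < req {
        return false; // never go below the requested version
    }
    let (rmaj, rmin, rpatch) = req;
    let (cmaj, cmin, cpatch) = candidate;
    if rmaj > 0 {
        cmaj == rmaj // 1.2.3 allows any later 1.x.y
    } else if rmin > 0 {
        cmaj == 0 && cmin == rmin // 0.4.4 allows any later 0.4.x
    } else {
        cmaj == 0 && cmin == 0 && cpatch == rpatch // 0.0.3 allows only 0.0.3
    }
}
```

So a crate that stays on 0.x can still ship breaking changes by bumping the minor version, and going from 0.x to 1.0 counts as an incompatible update that is never picked up automatically.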
[flagged]
> After the uutils debacle
Which debacle?
[flagged]
So what I’m getting is
1. The uutils project didn’t also make all locale cases for sort faster even though the majority of people will be using UTF-8, C or POSIX where it is indeed faster
2. There’s a lot of debating about different test cases which is a never ending quibble with sorting routines (go look at some of the cutting edge sort algorithm development).
This complaint is hyperfocusing on 1 of the many utilities they claim they’re faster on and quibbling about what to me are important but ultimately minor critiques. I really don’t see the debacle.
As for the license, that’s more your opinion. Rust as a language generally has dual licensed their code as MIT and Apache2 and most open source projects follow this tradition. I don’t see the conspiracy that you do. And just so I’m clear, the corporation you're criticizing here as the amorphous evil entity funding this is Ubuntu right?
>1. The uutils project didn’t also make all locales cases for sort faster even though the majority of people will be using UTF-8, C or POSIX where it is indeed faster
locale != encoding.
Try sort a phone book with tr_TR.UTF-8 vs en_US.UTF-8
I know. UTF-8, C and POSIX are locales (at least those are the locale strings)
So what was I supposed to get from that 4chan wannabe site? That the project is not currently as fast as GNU? Where is the lying?
[flagged]
[flagged]
You should of course verify these results in your scenario. However, I somewhat doubt that the person exists who cares greatly about performance, and is still willing to consider bzip2. There isn't a point anywhere in the design space where bzip2 beats zstd. You can get smaller outputs from zstd in 1/20th the time for many common inputs, or you can spend the same amount of time and get a significantly smaller output, and zstd decompression is again 20-50x faster depending. So the speed of your bzip2 implementation hardly seems worth arguing over.
Sure there is: someone provided you bzip2 files. Or required you give them files in that format.
Then you don’t have a choice.
And if you have to use it, 14% is a really nice speed up.
Did they use an LLM to transpile the C to Rust?
If you're going to use tools to transpile, don't use something that hallucinates. You want it to be precise.
https://github.com/immunant/c2rust reportedly works pretty well. Blog post from a few years ago of them transpiling quake3 to rust: https://immunant.com/blog/2020/01/quake3/. The rust produced ain't pretty, but you can then start cleaning it up and making it more "rusty"
They indeed used c2rust for the initial transpile according to https://trifectatech.org/blog/translating-bzip2-with-c2rust/
Task that requires precision and potentially hard to audit? Exactly where I'd use an LLM /s
Without commenting on whether an LLM is the right approach, I don't think this task is particularly hard to audit. There is almost assuredly a huge test suite for bzip2 archives; fuzzing file formats is very easy; and you can restrict / audit the use of unsafe by the translator.
You’re right, there is a large existing test suite. It’s mentioned in an article linked from this one.
https://trifectatech.org/blog/translating-bzip2-with-c2rust/
I suspect attempting to debug it would be a nightmare though. Given the LLM could hallucinate anything anywhere you’d likely waste a ton of time.
I suspect it would be faster to just try and write a new implementation based on the spec and debug that against the test suite. You’d likely be closer.
In fact, since they used c2rust, they had a perfectly working version from the start. From there they just had to clean up the Rust code and make sure it didn’t break anything. Clearly the best of the three options.
> and you can restrict / audit the use of unsafe by the translator.
No. You need to audit for correctness in addition to safety.
A lot of this "rewrite X in Rust" stuff feels like burning your own house down so you can rebuild and paint it a different color.
Counting CPU cycles as if it's an accomplishment seems irrelevant in a world where 50% of modern CPU resources are allocated toward UI eye candy.
> Counting CPU cycles as if it's an accomplishment seems irrelevant in a world where 50% of modern CPU resources are allocated toward UI eye candy.
That's the kind of attitude that leads to 50% of modern CPU resources being allocated toward UI eye candy.
Every cycle saved is longer battery life. Someone paid the one time cost of porting it, and now we can enjoy better performance forever.
They kicked off the article saying that no one uses bzip2 anymore. A million cycles saved for something no one uses (according to them) is still 0% battery life saved.
If modern CPUs are so power efficient and have so many spare cycles to allocate to e.g. eye candy no one asked for, then no one is counting and the comparison is irrelevant.
It sounds like the main motivation for the conversion was to simplify builds and reduce the chance of security issues. Old parts of protocols that no one pays much attention to anymore do seem to be a common place where those pop up. The performance gain looks more like a nice side effect of the rewrite; I imagine they were at most targeting performance parity.
Exactly, even if we can't remove "that one dependency" (https://xkcd.com/2347/), we can reinforce everything that uses it.
Isn't bzip used quite a bit, especially for tar files?
The Wikipedia data dumps [0] are multistream bz2. This makes them relatively easy to partially ingest, and I'm happy to be able to remove the C dependency from the Rust code I have that deals with said dumps.
[0]: https://meta.wikimedia.org/wiki/Data_dump_torrents#English_W...
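To illustrate why multistream helps with partial ingestion, here is a small sketch using Python's standard `bz2` module. The two "articles" and the offset bookkeeping are invented for the example; with a real dump you would take the stream offsets from whatever index accompanies it.

```python
import bz2

# Two independent payloads, each compressed as its own bzip2 stream.
articles = [b"first article " * 100, b"second article " * 100]
streams = [bz2.compress(article) for article in articles]

# A multistream file is simply the streams concatenated back to back.
archive = b"".join(streams)

# Assumed to come from an index; here we know it because we built the file.
offset_of_second = len(streams[0])

# Start decoding at the second stream without touching the first one.
decompressor = bz2.BZ2Decompressor()
second = decompressor.decompress(archive[offset_of_second:])

# Decompressing the whole file still works and yields both payloads.
whole = bz2.decompress(archive)

print(second == articles[1])
print(whole == articles[0] + articles[1])
```

Each stream is self-contained, so seeking to a known offset is enough to pull out one chunk.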
If so, only by misguided users. Why would anyone choose bz2 in 2025?
To unpack an archive made from the time when bz2 was used?
Of course no one uses systems, tools and files created before 2025!
bzip2 hasn't been the best at anything in at least 20 years.
The same could be said of many things that, nonetheless, are still used by many, and will continue to be used by many for decades to come. A thing does not need to be best to justify someone wanting to make it a bit better.
I use plain old zip files almost every day.
“Best” is measured along a lot more axes than just performance. And you don’t always get to choose what format you use. It may be dictated to you by some third party you can’t influence.
bzip2 is still pretty good if you want to optimize for:
This is a niche, but it does crop up sometimes. The downside to bzip2 is that it is slow to decompress, but for write-heavy workloads, that doesn't matter too much.
So? If I need to consume a resource compressed using bz2, I'm not just going to sit around and wait for them to use zstd. I'm going to break out bz2. If I can use a modern rewrite that's faster, I'll take every advantage I can get.
Personally, I find the part about "Enabling cross-compilation" a lot more relevant; in my opinion it is important and a win.
The same goes for exported symbols and being able to compile to wasm easily.
> Counting CPU cycles as if it's an accomplishment seems irrelevant in a world where 50% of modern CPU resources are allocated toward UI eye candy.
That attitude is what leads to Electron apps replacing native ones, and I hate it. I am not buying better CPUs and more RAM just to have them wasted like this.
You know it is just Wirth's law in action: "Software gets slower faster than hardware gets faster." [^1]
In fact, it's also the Jevons paradox: technological progress increases the efficiency with which a resource is used, but the rate of consumption of that resource rises due to increasing demand. Essentially, efficiency improvements can lead to increased consumption rather than the intended conservation. [^2][^3]
[^1]: https://www.comp.nus.edu.sg/~damithch/quotes/quote27.htm
[^2]: https://www.greenchoices.org/news/blog-posts/the-jevons-para...
[^3]: https://quickonomics.com/terms/jevons-paradox/
I think it goes deeper. There is a certain level of slowness that causes pain to users. When that level is hit, market forces cause attention to software efficiency.
Hardware efficiency just gives more room for software to bloat. The pain level is a human factor and stays the same.
So, time to adapt Wirth's law: software gets slower >exactly as much< as hardware gets faster.
It seems to me like binary file format parsing (and construction) is probably a good place for using languages that aren't as prone to buffer-overflows and the like. Especially if it's for a common format and the code might be used in all sorts of security-contexts.
Buffer overflows are more a library problem, not a language problem, though for newer ecosystems like Rust the distinction is kind of lost on people. But point being, if you rewrote bzip2 using an equivalent to std::Vec, you'd end up in the same place. Unfortunately, the norm among C developers, especially in the past, was to open code most buffer manipulation, so you wind up with 1000 manually written overflow checks, some of which are wrong or outright missing, as opposed to a single check in a shared implementation. Indeed, even that Rust code had an off-by-one (in "safe" code), it just wasn't considered a security issue because it would result in data corruption, not an overflow.
What Rust-the-language does offer is temporal safety (i.e. the borrow checker), and there's no easy way to get that in C.
Those cycles translate directly to $ saved in a few places. Mostly in places far away from having any UI at all.
Pretty incredible for such a short argument to be so inconsistent with itself. Complaining about counting CPU cycles and actually measuring performance because... modern software development is bad and doesn't care about performance?
you're just an end user, you don't have to maintain the suite.
In OSS every hour of volunteer time is precious Manna from heaven, flavored with unicorn tears. So any way to remove Toil and introduce automation is gold.
Rust's strict compiler and an appropriate test suite guarantee a level of correctness far beyond C. There's less onus on the reviewer to ensure everything still works as expected when reviewing a pull request.
It's a win-win situation.
I fully agree with you on the first statement, and I am at a loss for words at the second...
> lot of this "rewrite X in Rust" stuff feels like
Indeed. You know the React-Angular-Vue-never-mind churn? It appears that the trend of people pushing stuff because it benefits their careers is coming to the low-level world.
I for one still find it mystifying that Linus Torvalds let these people into the kernel. Linus, who famously banned C++ from the kernel not because of C++ in itself, but to ban C++ programmer culture.
It's like "adapting" Akallabêth so you can tell your own empowering story for modern audiences.
It's a lot like X11 vs. Wayland. The current graphics developers, who trend younger, don't want to maintain the boomer-written C code in the X server. Too risky and time-consuming. So one of the goals of Wayland is to completely abolish X so it can be replaced with something more long-term maintainable. Turns out, current systems-level developers don't want to maintain boomer-written GNU code or any C code at all, really, for similar reasons. C is inherently problematic because even seasoned developers have trouble avoiding its footguns. So an unstated, but important, goal of Rust is to abolish all critical C code and replace it with Rust code. Ubuntu is on board with this.
Except Wayland was developed by the same people who worked for years on X. And they don't dislike X because of C. And they didn't write Wayland in Rust.
> Except Wayland was developed by the same people who worked for years on X.
Yes, and they hated it and "worked hard to kill it" per Jordan Petridis. Note that the maintainers of X in the Wayland era are not really the same people as the original authors of X.
They didn't just maintain it, they did years of work on it. And again, it was not because it was C. It's because it was literally millions of lines of C from the 80s and early 90s, with a sub-optimal architecture.
And there is likely a reason the original people didn't continue to work on it.
You're not telling me anything new. And I'm not trying to claim that X is abandoned because it's in C, although convoluted C code does add to the maintenance difficulties.
I'm drawing an analogy between the X and Wayland situation and the ongoing effort to replace core userland code written in C with new code written in Rust.
So you're saying younger programmers don't have the coding kung fu required to cope with C code? I hope you are wrong. The prospect of having Rust-like things on everyday devices really frightens me. C is like a lingua franca for computers. Nearly anyone who works close to the hardware can READ it. I am one of these boomers, and I am not able to properly READ Rust code, because the syntax is so academic. The fact that more and more code is written in Rust reduces the number of people who can read programs.
It's more like no programmer of any age has the required coding kung fu to cope with C code. It is inevitable that they will introduce problematic code. We have decades of examples illustrating this. We lived with it because there was no truly competitive alternative for so long.
I can read and write C code from the times when there weren't any competitive alternatives. I have no problem reading or writing Rust code. In fact it communicates more to me than C code does or can and I can immediately understand more about the code written in Rust than I can about code written in C.
> Counting CPU cycles
And that's assuming they aren't lying about the counting: https://desuarchive.org/g/thread/104831348/#q104831479
Do you have any reason to think their numbers are wrong, or is your argument "someone else once lied, maybe they are too"?
Rust devs continuing to use misleading benchmarks? I, for one, am absolutely shocked. Flabbergasted, even.