Fd and ripgrep/rg are the two "new" alternatives I use on a regular basis, and they are just huge improvements to life. Both of these find/search programs respect your .gitignore files, which helps enormously and makes searching my department's entire codebase really fast.
Fd is featured on Julia Evans' recent "New(ish) command line tools"[1]
https://github.com/chmln/sd: "sd uses regex syntax that you already know from JavaScript and Python. Forget about dealing with quirks of sed or awk - get productive immediately."
It would be interesting to test the ~1.5GB of JSON the author uses for the benchmark against sed, but there are no details on how many files nor what those files contain.
When trying something relatively small and simple, sd appears to be slower than sed. It also appears to require more memory. Maybe others will have different results.
# using dash, not bash
echo j > 1
time sed s/j/k/ 1
time -p sed s/j/k/ 1
time sd j k 1
time -p sd j k 1
I have the opposite problem from the sd author. For system tasks, I'm more familiar with the faster sed and awk than with the slower Python and JavaScript, so I wish that Python and JavaScript regexes looked more like sed and awk, i.e., BRE and occasionally ERE. Someone in the NetBSD core group once wrote a find(1) alternative that had C-like syntax, similar to how awk uses a C-like syntax. Makes sense because C is the systems language for UNIX. Among other things, most of the system utilities are written in it. If the user knows C then she can read the system source and modify/repair the system where necessary, so it is beneficial to become familiar with it. Is anyone writing system utility alternatives in Rust that use a Rust-like syntax?
Agree, I've started replacing my `perl -pe s/.../.../g`s with `sd`. It seems it's actually slightly faster than the equivalent Perl for the same substitutions (which it should be since it does less).
It is somewhat notable that rg and fd differ significantly here: rg is an almost perfect superset of grep in terms of features (some might be behind different flags, etc.), but fd explicitly has a narrower feature set than find.
Yeah, this was very intentional. Because this is HN, I'll say some things that greps usually support that ripgrep doesn't:
1) greps support POSIX-compatible regexes, which come in two flavors: BREs and EREs. BREs permit back-references and have different escaping rules that tend to be convenient in some cases. For example, in BREs, '+' is just a literal plus-sign but '\+' is a regex meta character that means "match one or more times." In EREs, the meanings are flipped. POSIX-compatible regexes also use "leftmost longest," whereas ripgrep uses "leftmost first." For example, 'sam|samwise' will match 'sam' in 'samwise' with "leftmost first," but will match 'samwise' with "leftmost longest."
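Both differences are easy to poke at from a shell. This is a quick sketch assuming GNU grep; the ripgrep line is left commented since its output reflects leftmost-first semantics:

```shell
# BRE escaping: '+' is a literal plus, '\+' means "one or more"
printf 'aaa\n' | grep -o 'a\+'                 # prints: aaa
printf 'a+b\n' | grep -o 'a+'                  # prints: a+

# POSIX "leftmost longest": the longest alternative wins
printf 'samwise\n' | grep -oE 'sam|samwise'    # prints: samwise
# "leftmost first" (ripgrep): the first alternative wins
# printf 'samwise\n' | rg -o 'sam|samwise'     # would print: sam
```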
2) greps have POSIX locale support. ripgrep intentionally just has broad Unicode support and ignores POSIX locales completely.
3) ripgrep doesn't have "equivalence classes." For example, `echo 'pokémon' | grep 'pok[[=e=]]mon'` matches.
4) grep conforms to a standard---POSIX---whereas ripgrep doesn't. That means you can (in theory) have multiple distinct implementations that all behave the same. (Although, in practice, this is somewhat rare because some implementations add a lot of extra features and it's not always obvious when you use something that is beyond what POSIX itself strictly supports.)
I think that probably covers it, although this is all off the cuff. I might be forgetting something. I suppose the main other things are some flag incompatibilities. For example, grep has '-h' as short for '--no-filename'. Also, since ripgrep does recursive search by default, there are no -r/-R flags. Instead, -r does replacements and -R is unused. -L is used for following symlinks (like 'find').
The specific reason is hard to articulate precisely, but it basically boils down to "difficult to implement." The UTS#18 spec is a tortured document. I think it's better that it exists than not, but if you look at its history, it's undergone quite a bit of evolution. For example, there used to be a "level 3" of UTS#18, but it was retracted: https://unicode.org/reports/tr18/#Tailored_Support
And to be clear, in order to implement the Turkish dotless 'i' stuff correctly, your implementation needs to have that "level 3" support for custom tailoring based on locale. So you could actually elevate your question to the Unicode consortium itself.
I'm not plugged into the Unicode consortium and its decision making process, but based on what I've read and my experience implementing regex engines, the answer to your question is reasonably simple: it is difficult to implement.
ripgrep doesn't even have "level 2" support in its regex engine, nevermind a retracted "level 3" support for custom tailoring. And indeed, most regex engines don't bother with level 2 either. Hell, many don't bother with level 1. The specific reasoning boils down to difficulty in the implementation.
OK OK, so what is this "difficulty"? The issue comes from how regex engines are implemented. And even that is hard to explain because regex engines are themselves split into two major ideas: unbounded backtracking regex engines that typically support oodles of features (think Perl and PCRE) and regex engines based on finite automata. (Hybrids exist too!) I personally don't know so much about the former, but know a lot about the latter. So that's what I'll speak to.
Before the era of Unicode, most things just assumed ASCII and everything was byte oriented and things were glorious. If you wanted to implement a DFA, its alphabet just consisted of the obvious: 256 byte values. That means your transition table had states as rows and each possible byte value as a column. Depending on how big your state pointers are, even this is quite massive! (Assuming state pointers are the size of an actual pointer, then on x86_64 targets, just 10 states would use 10x256x8=~20KB of memory. Yikes.)
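Spelling out that arithmetic:

```shell
# states x byte values x bytes per state pointer
echo $((10 * 256 * 8))    # prints: 20480 (i.e. ~20KB)
```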
But once Unicode came along, your regex engine really wants to know about codepoints. For example, what does '[^a]' match? Does it match any byte except for 'a'? Well, that would be just horrendous on UTF-8 encoded text, because it might give you a match in the middle of a codepoint. No, '[^a]' wants to match "every codepoint except for 'a'."
So then you think: well, now your alphabet is just the set of all Unicode codepoints. Well, that's huge. What happens to your transition table size? It's intractable, so then you switch to a sparse representation, e.g., using a hashmap to map the current state and the current codepoint to the next state. Well... Ouch. A hashmap lookup for every transition when previously it was just some simple arithmetic and a pointer dereference? You're looking at a huge slowdown. Too huge to be practical. So what do you do? Well, you build UTF-8 into your automaton itself. It makes the automaton bigger, but you retain your small alphabet size. Here, I'll show you. The first example is byte oriented while the second is Unicode aware:
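A rough way to observe the byte-vs-codepoint distinction is to switch locales with GNU grep; the bytes are written in octal so the example doesn't depend on this page's encoding (the second line assumes a C.UTF-8 locale is available):

```shell
# 'caf\303\251' is "café" in UTF-8 ('é' = bytes 0xC3 0xA9)
# Byte oriented (C locale): [^a] matches c, f, 0xC3, 0xA9 -- four bytes
printf 'caf\303\251\n' | LC_ALL=C grep -o '[^a]' | wc -l       # prints: 4
# Codepoint oriented (UTF-8 locale): [^a] matches c, f, é -- three codepoints
printf 'caf\303\251\n' | LC_ALL=C.UTF-8 grep -o '[^a]' | wc -l
```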
This doesn't look like a huge increase in complexity, but that's only because '[^a]' is simple. Try using something like '\w' and you need hundreds of states.
But that's just codepoints. UTS#18 level 2 support requires "full" case folding, which includes the possibility of some codepoints mapping to multiple codepoints when doing caseless matching. For example, 'ß' should match 'SS', but the latter is two codepoints, not one. So that is considered part of "full" case folding. "simple" case folding, which is all that is required by UTS#18 level 1, limits itself to caseless matching for codepoints that are 1-to-1. That is, codepoints whose case folding maps to exactly one other codepoint. UTS#18 even talks about this[1], and that specifically, it is difficult for regex engines to support. Hell, it looks like even "full" case folding has been retracted from "level 2" support.[2]
The reason why "full" case folding is difficult is because regex engine designs are oriented around "codepoint" as the logical units on which to match. If "full" case folding were permitted, that would mean, for example, that '(?i)[^a]' would actually be able to match more than one codepoint. This turns out to be exceptionally difficult to implement, at least in finite automata based regex engines.
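You can see the absence of full case folding in practice: GNU grep, like most engines, only does the 1-to-1 kind, so a caseless 'ß' never finds 'SS' ('ß' is written out as its UTF-8 bytes):

```shell
# Full case folding would equate 'ß' (U+00DF) with "SS"; simple folding doesn't
printf 'STRASSE\n' | grep -ci "$(printf '\303\237')"    # prints: 0 (no match)
```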
Now, I don't believe the Turkish dotless-i problem involves multiple codepoints, but it does require custom tailoring. And that means the regex engine would need to be parameterized over a locale. AFAIK, the only regex engines that even attempt this are POSIX and maybe ICU's regex engine. Otherwise, any custom tailoring that's needed is left up to the application.
The bottom line is that custom tailoring and "full" case matching don't tend to matter enough to be worth implementing correctly in most regex engines. Usually the application can work around it if they care enough. For example, the application could replace dotless-i/dotted-I with dotted-i/dotless-I before running a regex query.
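A sketch of that application-level workaround, folding the Turkish letters to their ASCII counterparts with sed before a caseless search (byte sequences written in octal; the folding direction is only illustrative):

```shell
# 'İ' (U+0130) is \304\260 in UTF-8; 'ı' (U+0131) is \304\261
printf '\304\260stanbul\n' |
  sed "s/$(printf '\304\260')/I/g; s/$(printf '\304\261')/i/g" |
  grep -i 'istanbul'    # prints: Istanbul
```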
The same thing applies for normalization.[3] Regex engines never (I'm not aware of any that do) take Unicode normal forms into account. Instead, the application needs to handle that sort of stuff. So nevermind Turkish special cases, you might not find a 'é' when you search for an 'é':
$ echo 'é' | rg 'é'
$ echo 'é' | grep 'é'
$
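(If the two 'é's above look identical, that's the point: one is likely the composed form and the other 'e' plus a combining accent, and copy-paste tends to normalize them into one form.) The effect can be reproduced with explicit byte sequences:

```shell
# Decomposed 'é' is 'e' + U+0301 (bytes \314\201); composed 'é' is U+00E9 (\303\251)
printf 'e\314\201\n' | grep -c "$(printf '\303\251')"     # prints: 0 (no match)
printf 'e\314\201\n' | grep -c "$(printf 'e\314\201')"    # prints: 1
```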
Unicode is hard. Tooling is littered with footguns. Sometimes you just have to work to find them. The Turkish dotless-i just happens to be a fan favorite example.
I use Frawk (https://github.com/ezrosent/frawk) a decent amount too! I downloaded it to do some parallel CSV processing and i've just kind of kept it ever since.
I had someone ask me (a self-described grep monkey) how I navigate grepping very long lines (minified js for example) to which I replied 'lol I just ignore them'. I'd love 'only select 200 chars if longer than 200 chars', but to my knowledge there's no easy way to do this with grep. I'd love to hear suggestions on how people navigate this.
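One workaround sketch (my suggestion, not from the comment): use grep -o with a bounded window around the pattern, or ripgrep's -M/--max-columns; 'needle', 'minified.js', and the window sizes are placeholders:

```shell
# Print a small window of context around each hit instead of the whole line
printf 'xxxxxneedleyyyyy\n' | grep -oE '.{0,3}needle.{0,3}'   # prints: xxxneedleyyy
# ripgrep: suppress (or preview) matching lines longer than 200 columns
# rg -M 200 --max-columns-preview needle minified.js
```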
It works well outside of git repos automatically. And can search across multiple git repos while respecting each repo's respective gitignores automatically. ripgrep also tends to be faster, although the absolute difference tends to be lower with 'git grep' than a simple 'grep -r', since 'git grep' does at least use parallelism.
There are other reasons to prefer one over the other, but they are somewhat more minor.
Here's one benchmark that shows a fairly substantial difference between ripgrep and the other tools (git grep, ugrep, ag, and plain GNU grep):
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ git rev-parse HEAD
3b5e1590a26713a8c76896f0f1b99f52ec24e72f
$ git remote -v
origin git@github.com:torvalds/linux (fetch)
origin git@github.com:torvalds/linux (push)
$ time rg '\w{42}' | wc -l
1957843
real 0.706
user 7.110
sys 0.462
maxmem 300 MB
faults 0
$ time git grep -E '\w{42}' | wc -l
1957843
real 7.678
user 1:49.03
sys 0.729
maxmem 411 MB
faults 0
$ time ugrep -r --binary-files=without-match --ignore-files '\w{42}' | wc -l
1957841
real 10.570
user 46.980
sys 0.502
maxmem 344 MB
faults 0
$ time ag '\w{42}' | wc -l
1957806
real 3.423
user 8.288
sys 0.695
maxmem 79 MB
faults 0
$ time grep -E -r '\w{42}' ./ | wc -l
grep: ./.git/objects/pack/pack-c708bab866afaadf8b5da7b741e6759169a641b4.pack: binary file matches
grep: ./.git/index: binary file matches
1957843
real 47.441
user 47.137
sys 0.290
maxmem 4 MB
faults 0
The GNU grep comparison is somewhat unfair because it's searching a whole lot more than the other tools. (Although notice that there are no additional matches outside of binary files.) But it's a good baseline, and it also demonstrates the experience a lot of folks have: most just tend to compare a "smarter" grep with the "obvious" grep invocation and see that it's an order of magnitude faster.
It's also interesting that all tools agree on match counts except for ugrep and ag. ag at least doesn't have any kind of Unicode support, so that probably explains that. (I don't have time to track down the discrepancy with ugrep to see who is to blame.)
And if you do want to search literally everything, ripgrep can do that too. Just add '-uuu':
$ time rg -uuu '\w{42}' | wc -l
1957845
real 1.288
user 8.048
sys 0.487
maxmem 277 MB
faults 0
And it still does it better than GNU grep. And yes, this is with Unicode support enabled. If you disable it, you get fewer matches and the search time improves. (GNU grep gets faster too.)
$ time rg -uuu '(?-u)\w{42}' | wc -l
1957810
real 0.235
user 1.662
sys 0.374
maxmem 173 MB
faults 0
$ time LC_ALL=C grep -E -r '\w{42}' ./ | wc -l
grep: ./.git/objects/pack/pack-c708bab866afaadf8b5da7b741e6759169a641b4.pack: binary file matches
grep: ./.git/index: binary file matches
1957808
real 2.636
user 2.362
sys 0.269
maxmem 4 MB
faults 0
Now, to be fair, '\w{42}' is a tricky regex. Searching something like a literal brings all tools down into a range where they are quite comparable:
$ time rg ZQZQZQZQZQ | wc -l
0
real 0.073
user 0.358
sys 0.364
maxmem 11 MB
faults 0
$ time git grep ZQZQZQZQZQ | wc -l
0
real 0.206
user 0.291
sys 1.014
maxmem 134 MB
faults 1
$ time ugrep -r --binary-files=without-match --ignore-files ZQZQZQZQZQ | wc -l
0
real 0.199
user 0.847
sys 0.743
maxmem 7 MB
faults 16
I realize this is beyond the scope of what you asked, but eh, I had fun.
How fast is magic wormhole? In my experience most of the new(er) file transfer apps based on WebRTC are just barely faster than Bluetooth and are unable to saturate the bandwidth. I am not sure if the bottleneck is in the WebRTC stack or whether there is something fundamentally wrong about the protocol itself.
The singular habit I picked up way back in the day was to simply cope with what was available.
There's all sorts of utilities and such. Emacs was a grand example at the time as well. Lots of better mousetraps.
But when you bounce around to a lot of different machines, machines not necessarily in your control, "lowest common denominator" really starts to rear its ugly head.
The vast majority of my command line concoctions are burned into muscle memory.
Today, I think the baseline installs of modern *nixes are richer than they were back in the day, but the maxim of working with what they have out of the box still applies.
> The singular habit I picked up way back in the day was to simply cope with what was available.
Nothing wrong with that. There are other ends of the spectrum of "make the things I do often as easy as I can", too. Both work.
It reminds me of my father; when he got in a car that wasn't his, he would NOT CHANGE ANYTHING. You couldn't tell he was in it. Safety issues aside, there's an argument to be made to make it as comfortable for you as you can; seat, mirrors, environment, etc. to minimize any distractions.
I see this too in my (software development) communities; some people like to tailor their development experience to the n'th degree to extract as much personal enjoyment/optimization/etc. out of it as they can. Others like to use whatever they're given and be happy with that. Both work.
Myself, I type for a living so I like to use keyboards I like. I bring my own with me in my "go bag" for when I'm out so I don't have to use the (IMO!) crappy Apple Laptop KB's. I /can/ use it, I just choose not to. Other people either like them, or have learned they don't care enough. All good.
Counterpoint, deciding not to put up with old things that are a pain in the ass is part of unseating the existing momentum so we can finally move on to better things.
Knowing how to do things the annoying way doesn't mean that has to be the preferred way. Being open to retooling is part of staying relevant
What I would like to see in that case is a "next gen utils" bundling that makes it likely to find a number of these tools together on future servers.
Really the "right" way to go about it would be to employ an existing package manager (there's enough already, we don't need another) and some magic glue on top that makes it easy.
For example, you have your configuration of packages, in the ephemeral cloud somewhere, and you do the really dangerous no good thing of piping through bash with some kind of uuid that's assigned to your account, something like (totally made up url)
curl packman.info/users/aed1242faed60a | bash
And it sniffs the architecture, version of binaries to install, which ones are there, and then puts it into an install directory configured by you.
This is like 97% existing things with a little glue and interface polish so you can easily bring in an environment.
There's certainly other ways but the idea remains the same
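A minimal sketch of that glue, with the tool list and everything else made up; it only prints what it would do:

```shell
#!/bin/sh
# Hypothetical installer glue: sniff the platform, check which tools from a
# pinned list are missing, and say where they would be installed.
set -eu
os=$(uname -s | tr '[:upper:]' '[:lower:]')
arch=$(uname -m)
dest="${HOME}/.local/bin"
mkdir -p "$dest"
for tool in rg fd sd; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: already present"
  else
    echo "$tool: would fetch ${os}/${arch} build into $dest"
  fi
done
```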
YES. That isn't a completely new shell. I keep trying the fishes and z-shells of the world and I keep coming back to "my own hacked up bash" because of muscle memory on random edge case things.
That's why I stuck with vi and sh back in the day: I knew they were on every machine I might telnet to (this was before ssh, sigh).
On machines I controlled, I mostly used ksh, but it wasn't available on all machines; I cannot remember if it was the SunOS boxes or the older HP-UX or the Apollos, but there were a few. (csh? Go away. No, just go.)
Nowadays, vim and bash are everywhere I need them to be, even if I have to kludge around some version differences.
My only real gripe about find is the awkwardness of pruning multiple hierarchies. After you've written
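For reference, the sort of incantation that gripe is about; every extra hierarchy to skip means another -path ... -o clause inside the parens (the directory names are illustrative):

```shell
# Build a throwaway tree, then list files while pruning two subtrees
mkdir -p demo/src demo/.git demo/node_modules
touch demo/src/a.c demo/.git/junk demo/node_modules/junk
find demo \( -path demo/.git -o -path demo/node_modules \) -prune \
  -o -type f -print    # prints: demo/src/a.c
rm -r demo
```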
This is fine and all, but there are also subtle differences in the standard CLI tools depending on the implementation. I'm used to GNU coreutils, and butt heads with the macOS and BusyBox implementations.
Yes, to get by with what is available is a useful trait. No matter how good these tools are, I will often arrive at a prompt where they are unavailable.
If you ever give ripgrep a go, stumble over something and are inclined to: please post a Discussion question[1]. "beginner" or "stupid" questions are welcome. If you can show me what you know works with grep, for example, and are curious about an equivalent rg command, that could be a good question. I might be able to give you some "fundamental" answers to it that let you reason about the tools more from first principles, but on your terms.
I aliased grep and find to their newer alternatives. Sure, the syntax will be off from time to time but due to muscle memory I couldn’t relearn the new tools otherwise.
I think about a month after I learned enough Vim to be dangerous RHEL (8, I think) started shipping nano as the default editor. Ah well, now I can scroll with the home row on my local box.
I want to love fd - I'm a big believer in the idea that CLIs don't have to be scary, intimidating things (see normals using Slack with /commands and keyboard shortcuts), and find has a gigantic hairball of a UI.
The thing is, though, I know find well enough to not notice the terrible UI that much, and I know I can rely on it being everywhere. With fd that isn't true.
So it's hard for me to justify making the move.
Same thing happens with things like the fish and oil shells - I have little doubt their UX is better than Bash's, but Bash is pretty ubiquitous.
Emacs has this problem too, as an Emacs user. The UX is completely alien by current standards, but if you update the defaults you'll break a lot of people's existing configs.
How do you get around backwards compatibility / universality UX roadblocks like this?
I was more thinking of scripts, yeah, like you describe.
Where I get hung up is, if I need to keep the traditional syntaxes in my head for scripting, why bother storing another one in my head for interactive use?
...that said, I do use ag and rg for other interactive tools, like cross-project search in Emacs.
I know what you mean. I use fd and rg on my machine, but for scripts, Dockerfiles etc I tend to use find and grep, just because this is the „lingua franca“ of Unix/Linux.
Same, and I'm the author of ripgrep! Unless the script is one that I wrote for myself in ~/bin, I use grep and find and standard tooling in shell scripts.
The only exception is if there is a specific need for something ripgrep does. Usually it's for speed, but speed isn't always needed.
I think the solution that NixOS uses would work for Emacs too. Just define a single variable that declares which default values to use (in NixOS, it's system.stateVersion).
Then packages (including internal packages) can update their defaults based on the version declared there. Basically a protocol version field but for your Emacs configuration.
Distros probably need a different strategy for improving core utils, though.
Fish is awesome if, like me, you hate Bash's arcane syntax. It improves my script productivity by 100%.
One rule of thumb, though: use it only for personal use, and stick with it to see if it lives long enough. If you're working with a team, just use Bash.
Old grep is still muscle memory for me, and that’s what I use in scripts.
But the newer greps are so much faster! I scoffed initially, but after a couple of uses I was hooked. I try to install these new tools in my personal ~/bin on systems I spend much time using.
It feels like we're in a third wave of innovation in Unix CLI tools. The first was from BSD in the late 70s/early 80s which considerably improved the original Unix utilities, then the GNU tools rewrite in the late 80s, then there were the dark ages of System V. I give ack (and Andy) credit for starting this latest wave around 2005 but it's really taken off lately with tools being rewritten in Rust and challenging the old status quo.
The only way for a third wave to work is if multiple distros agree to adopt it. Since these tools don't even agree on an interface, IMHO it wouldn't be much different from what we have. I also don't like the fact that some tools natively skip things like what's in ".gitignore". I don't want a tool that does that by default. If there were a consortium to standardize a new *nix CLI, then maybe it could get some traction.
There was a consortium, and it did standardise Unix, and that's when everything stopped moving in the 90s. Standards are compromises and so all the commercial Unixes had to implement the lowest common denominator. Thankfully GNU didn't care, and the BSD tools were already better.
Pick one... during the Unix Wars [0] there was X/Open [1] vs Unix International (aka AT&T) [2] and the Open Software Foundation [3] (which eventually merged with X/Open to form The Open Group). And then the IEEE got involved with POSIX, which ultimately "won" as the lowest of the LCDs. [4]
Thanks! Once I moved from University to corporate america, I never looked back at the history of *nix standardization. And looking at these links, I feel awfully naive suggesting another consortium.
Well, like zsh, and many other improvements on the shell, and gawk and other tooling that doesn't match other awk engines and so forth... you end up having two parallel realities. One for scripting, where you use the bare minimum that is acceptable to run on any server and is guaranteed to be there, and then your user env where you have all your fun tools.
The cool part is network transparency and forwarding environments and other things that plan9 plays with so that you can work locally, remotely.
I know it's asking for the world, but some way to do better "built in modularity" would be great. Like "whatever new shell" plus a standardized "plugin system."
I love fd, but somehow I always get tripped up on this:
fd # prints tree of current directory
fd somedir/ # results in an error
find # prints tree of current directory
find somedir/ # prints tree of somedir/
There are --search-path and --base-path options, so if you alias say fd='fd --search-path' you can then have the required first argument be the path to search. Personally, I find changing to the directory I want to search in less annoying than typing out the directory to search (I know the options exist from script use).
If it wasn't for find's muscle memory, fd has it right: you'd usually list what you want to do first on the CLI and then list an arbitrary number of targets at the end.
I like the contemporary alternatives to the classics. They make a lot of things so much easier.
I have a little mental block, though. It's related to the realities of the stuff I work on. Since I find myself logged into other people systems, keeping the old, standard tools hot in my head does really take some of the load off. It's a pretty common refrain, but it's real and practical when you've got embedded systems, bsds, linuxes, macs, etc. Even the difference between gnu and mac is clunky when I don't practice enough.
For the same reason, with the notable exception of git, I use practically no aliases.
If I could invent a product, maybe it would be one that enables effectively "forwarding" CLIs to a remote host shell.
I'm with you. I feel like I know Linux like the back of my hand because I can fluidly interface with the stock tools. These new tools are great, but I just don't see them widely spread across the many remote systems that I manage. Just managing those packages across a fleet sounds like a pain in the ass.
Nothing like logging into a freshly installed Solaris system and having to configure it using Bourne shell, which didn't have job control or history. At least it had vi. Usually the first thing you would do is get enough networking going to download bash and the GNU tools. But there were always some old timers around who wanted to haze the youngsters by forcing you to do everything with "native" tools.
I spend a lot of time remoting into fresh *nix systems, so I also have almost no aliases, with one notable exception: ll (aliased to ls -lah).
It's just so engrained into my muscle memory that I do it without thinking about it most of the time.
And the workaround I found for it is adding a macro on my keyboard (through QMK but can be done with anything) that just types out 'alias ll="ls -lah"\n'.
Yeah, same here. I am doing a good amount of ops/SRE stuff these days while supporting my services and find myself ssh'ing into:
- very locked down bastions
- hosts through a secure remote access VM thing that makes file transfer difficult
- random docker containers in EKS (often through both of the above)
Getting good at the basic tools is just unavoidable. I find myself manually typing `alias k=kubectl` a lot though :p
Oh, I thought I was kind of dumb for doing that :) Happy to see that it's normal, haha. I always find myself trying to remember: is it one dash? Is it "name" or "iname" or whatnot... piping to grep is way easier.
Maybe I’m just old fashioned but all these new command line utilities strike me as solutions in search of a problem.
Standard ‘find’ works great. It finds files. It can filter by any criteria I have ever had to look for and the syntax seems very intuitive to me (maybe I am just used to it). It is flexible and powerful.
I’d love to be told I’m wrong, because I feel like I’m missing something.
The "new CLI utilities" aren't solving new problems. They are solving old problems with a different user experience, driven by changes in how folks do development (particularly the scales).
Notice that I said "different" UX and not "better" UX. Reasonable people can disagree about whether the newer tools have better UX as an objective fact. Many folks very rightly find it very surprising that these tools will skip over things automatically by default. What is true, however, is that there are a lot of people who do see the UX of these tools as better for them.
As the author of ripgrep, I hear the same thing over and over again: ripgrep replaced their ~/bin grep wrapper with a bunch of --exclude rules that filtered out files they didn't want to search. Because if you don't do that, a simple 'grep -r' can take a really fucking long time to run if you're working on a big project. And guess what: a lot of programmers these days work on big projects. (See previous comment about changes in scale.) So you don't really have a choice: you either write a wrapper around grep so that it doesn't take forever, or you use a "smarter" tool that utilizes existing information (gitignore) to effectively do that for you. That smarter tool typically comes with other improvements, because your simple "grep wrapper" probably doesn't use all of the cores on your machine. It could, but it probably doesn't. So when you switch over, it's like fucking magic: it does what you wanted and it does it better than your wrapper. Boom. Throw away that wrapper and you're good to go. That's what I did anyway. (I had several grep wrappers before I wrote ripgrep.)
Every time these tools are discussed here, people say the same thing: "I don't see the point." It's a failure of imagination to look beyond your own use cases. If all you ever work on are smaller projects, then you're never going to care about the perf difference between ripgrep and grep. ripgrep has other improvements/changes, but nearly all of them are more niche than the UX changes and the perf improvements.
Definitely agree. I really like that ripgrep is fast, but I mainly use it for its better UX for what I do every day: search code. If ripgrep wasn't any faster than grep, but was still recursive by default and ignored .gitignore et al, it'd still be worth using for me. (In fact, I used to use "ack", which is basically ripgrep but slow. :-)
'find' is slow to the point of being useless for me. It can never find the file I'm looking for before I give up waiting for it to finish running. So I'm excited if fd can provide the same functionality but run much faster.
One thing to remember is that these fun utils won't exist on production servers. You also don't want them there for obvious reasons. I find it better to use the most commonly available set of unix tools and I end up being far more effective due to that.
What is the obvious reason? Security? The guy provides sources and it took me 20 minutes to see what it's doing in there. I'd definitely put this on a production server, after testing intensively of course.
How frequently are you and your team willing to spend that 20 minutes? Are you confident you evaluated it precisely? Are you always going to install the latest release? What do you do as the codebase grows and that 20 minutes turns into an hour?
Did you know there are businesses out there still using programs made in the DOS era (especially those made in FoxPro) that work perfectly? Or have you taken a look at ATMs and seen that the majority still have that Windows XP look'n'feel? Not everybody needs the latest and greatest, you know?
In that spirit, let's see your questions:
1. Once in a lifetime; 2. Yes; 3. No; 4. Nothing, see the first answer.
Hmm, I already have a shell alias that does 90% of this. Doesn't parse .gitignore, but it's not a big problem for me. If it was I'd do `make clean` in the project.
This is always installed and ready to go on any box I have my dotfiles.
I suppose that is why these perfectly good improvements have a hard time getting traction: the older stuff is still so flexible.
Those are a lot of dependencies for such a simple tool. I'm a Rust user myself, but some of those dependencies really should be part of a good standard library. Actually, the NPM-like ecosystem is my biggest pain point with Rust.
[dependencies]
ansi_term = "0.12"
atty = "0.2"
ignore = "0.4.3"
num_cpus = "1.13"
regex = "1.5.5"
regex-syntax = "0.6"
ctrlc = "3.2"
humantime = "2.1"
lscolors = "0.9"
globset = "0.4"
anyhow = "1.0"
dirs-next = "2.0"
normpath = "0.3.2"
chrono = "0.4"
once_cell = "1.10.0"
[dependencies.clap]
version = "3.1"
features = ["suggestions", "color", "wrap_help", "cargo",
"unstable-grouped"]
If you happen to be using macOS, there's mdfind, which uses the spotlight database which is always kept up to date, unlike locate/updatedb, where updatedb is expensive to run, even if you've run it recently.
I have yet to find a good solution for linux CLI... something that uses an internal database that is kept up to date with all directory structure changes.
Maybe someone else has seen something cool for this? :D
It's still not clear to me whether you want the files inside the database or just the metadata. The stat part suggests the latter?
I guess I can see that, but now you have cache invalidation (someone else linked to Spotlight, which does this as a background process). SQLite files can be larger than any physical medium you can purchase so why not go the distance?
updatedb is expensive to run and is usually run by a cron job, whereas something like mdfind on macos uses spotlight, which is kept up to date with all filesystem changes (or at least reasonably fast).
I use find all the time, but it is such a strange beast - it's as if there were a meeting among all the standard Unix utilities on look and feel and find missed the memo. But it's ubiquitous and I'm too old to change horses now anyway.
I don't like it when non-git tools do that by default. I'm sure it's nice for some people, so make it an option that could be enabled. But to have that behavior by default feels far too opinionated for me.
People generally develop tools for their own needs. I'm not saying "hey, if you want one that uses the defaults you believe are best, create your own", but just keep some empathy for open source developers. :)
Generated files. If you're searching for every place something shows up so you can change it, it's annoying to have to filter out the stuff that'll get updated automatically (and will mess up timestamps if you update manually.)
Also, backup files. The fewer irrelevant results, the more useful a tool is.
To answer your question directly: If you're grepping through source code, you generally do not care about caches, backups, personal workspace configs, node_modules, "compiled" output (JS projects generally "compile" to JS and if you're looking for something in source files, you probably do not care about the bundles you're outputting, just the source), etc. You generally care about your source code, which is the stuff that's not getting added to .gitignore.
Since we're talking about an open source project, try to keep some empathy for developers that are solving their own problems first and not trying to solve everybody's problems. It's a configurable option, anyway.
Hey, fd. I don't use it normally, but it ended up being the easiest and fastest tool for me to delete ~100 million empty files off my disk (a weird situation). It has threading and command execution built in, so I could saturate my cpu pretty easily doing the deletes with `fd -tf --threads=64 --exec r m {}` (I put the space in the rm command on purpose for posting).
I don't use `fd` on the command line because I have very ingrained `find` muscle memory, but it's really made using `projectile-find-file` in Emacs totally usable with the huge monorepos I deal with at work. The same goes for `rg`, I love using it with `consult-ripgrep` in Emacs for searching through mountains of code.
I'm seriously curious, is this the first time this link is being submitted?
Frequently, I try to submit a link, and it shows up as having already been submitted. And I'm quite certain a tool as popular as fd has been featured on HN before. So either this particular link has somehow never been submitted (doubtful), HN allows resubmitting a link after some amount of time, or the resubmit-prevention logic doesn't apply to certain users?
Usually when I can't be bothered to remind myself the syntax for find, my go to these days is `echo **/*pattern*`. Of course, this is mainly just for small searches.
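One wrinkle with this trick: in bash (unlike zsh) the recursive `**` glob is off by default, so the sketch only works after enabling it:

```shell
# bash disables ** by default; zsh has it out of the box
shopt -s globstar
echo **/*pattern*    # expands recursively, a poor man's find
```

Without globstar, bash treats `**` the same as a single `*`, so matches in subdirectories are silently missed.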
This didn't really sound like something I need until I got to the `{.}` syntax, which solves a problem I was just trying and failing to solve with GNU find ten minutes ago (namely that there seems to be no convenient way to use the extension-stripped name of the match in an exec statement, e.g. bash's `${file%.*}` syntax).
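For the record, fd's `{.}` placeholder (the match minus its extension) can be approximated with plain find by spawning a small shell per file. A sketch, using a hypothetical flac-to-opus transcoding task:

```shell
# fd: {} is the match, {.} is the match without its extension
fd -e flac --exec ffmpeg -i {} {.}.opus

# plain find: a sh wrapper gives access to ${1%.*} parameter expansion
find . -name '*.flac' -exec sh -c 'ffmpeg -i "$1" "${1%.*}.opus"' _ {} \;
```

The `_` fills `$0` of the inner shell so that `{}` lands in `$1`, where it can be quoted and expanded safely.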
Well, this is saving me a ton of time as I'm basically migrating UID ownership of files for NFS shares that date back to the 90's. According to my rough benchmarks between find and fd, fd is ~3 times faster.
How does it run faster than find? Can I manually implement that speedup using standard unix tools? I need to run find a lot on many machines I don't have access to install anything on.
You can speed up grep by using 'xargs' or 'parallel' because searching tends to be the bottleneck.
But for 'find', the bottleneck tends to be directory traversal itself. It's hard to speed that up outside of the tool. fd's directory traversal is itself parallelized.
The other reason why 'fd' might be faster than a similar 'find' command is that 'fd' respects your gitignore rules automatically and skips hidden files/directories. You could approximate that with 'find' by porting your gitignore rules to 'find' filters. You could also say that this is comparing apples-to-oranges, which is true, but only from the perspective of comparing equivalent workloads. From the perspective of the user experience, it's absolutely a valid comparison.
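A rough way to get some of that parallelism with standard tools is to fan the match stage out with xargs. This is a sketch; it cannot replicate fd's parallelized traversal, since find itself still walks directories on a single thread:

```shell
# parallelize the grep stage across files with one worker per CPU;
# the directory walk itself remains single-threaded in find
find . -type f -print0 | xargs -0 -P "$(nproc)" grep -l 'pattern'
```

`-print0`/`-0` keeps filenames with spaces or newlines intact, and `-l` prints only the matching filenames.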
I don't like find. Nor do I like vi (not vim, vi) and/or maybe others. I don't wish to rain on the parade of someone who's more accomplished than I am. But I think these "new" tools miss the point.
I use vi because I know it exists on every(?) system ever. It's not like I go out of my way seeking vi. I feel the feeling is similar for find. It works. It works well. It works the same on all systems I work on.
Would I go out of my way to install find on my system? Probably not.
They don't miss the point. We're well aware they aren't ubiquitous and that is indeed one of their costs.[1]
If the old tools are working well for you, then keep using them! I used plain grep for well over a decade before writing ripgrep. Hell, sometimes I still use grep for precisely the reason you describe: it is ubiquitous.
Also, not every grep behaves the same. Not even close. Unless you're being paranoid about how you use grep, it's likely you've used some feature that isn't in POSIX and thus isn't portable.
Ubiquity and portability aren't "the point." Ubiquity is a benefit and portability can be a benefit or a cost, depending on how you look at it.
I should have phrased this as a question, instead of being dismissively declarative.
>If, upon hearing that "ripgrep can replace grep," you actually hear, "ripgrep can be used in every instance grep can be used, in exactly the same way, for the same use cases, with exactly the same bug-for-bug behavior," then no, ripgrep trivially cannot replace grep. Moreover, ripgrep will never replace grep. If, upon hearing that "ripgrep can replace grep," you actually hear, "ripgrep can replace grep in some cases and not in other use cases," then yes, that is indeed true!
Yes, it's a persistent misunderstanding because communication is hard and folks aren't always exactly precise. It is very common to hear from someone, "ripgrep has replaced grep for me." You might even hear people state it more objectively, like, "ripgrep is a grep replacement." The problem is that the word "replace" or "replacement" means different things to different people. So that FAQ item was meant to tease those meanings apart.
Fd and ripgrep/rg are the two "new" alternatives I use on a regular basis, and which are just huge improvements to life. Both of these find/search programs respect your .gitignore files, which helps enormously & makes searching my department's entire codebase really fast.
Fd is featured on Julia Evans' recent "New(ish) command line tools"[1]
[1] https://jvns.ca/blog/2022/04/12/a-list-of-new-ish--command-l... https://news.ycombinator.com/item?id=31009313 (760 points, 37d ago, 244 comments)
It's fd, ncdu and sd (sed alternative) for me.
https://github.com/chmln/sd
https://dev.yorhel.nl/ncdu
A while ago I came across this post: https://towardsdatascience.com/awesome-rust-powered-command-...
I’ve also been using bat and exa which are pretty good replacements for cat and ls, respectively.
https://github.com/sharkdp/bat
https://github.com/ogham/exa
scc is an insanely fast alternative to cloc: https://github.com/boyter/scc
nnn is also my go to file tree navigation / file moving tool these days too: https://github.com/jarun/nnn
For counting coding lines I use tokei and I like it: https://github.com/XAMPPRocky/tokei
https://github.com/kamiyaa/joshuto in favor of nnn.
No file previews yet? I'd stick with ranger or lf.
It makes reading man pages much less painful to have bat as the colorizing pager.
https://github.com/sharkdp/bat#man
This snippet in my ~/.profile has been colorizing man pages for like ten years already:
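The snippet itself didn't survive the thread, but the usual variant of this trick (an assumption, not necessarily the poster's exact lines) overrides less's termcap strings so man's bold and underline come out colored:

```shell
# colorize man pages by overriding less's termcap capabilities
export LESS_TERMCAP_mb=$'\e[1;31m'    # begin blink
export LESS_TERMCAP_md=$'\e[1;36m'    # begin bold (headings, flags)
export LESS_TERMCAP_me=$'\e[0m'       # end bold/blink
export LESS_TERMCAP_so=$'\e[1;44;33m' # begin standout (status bar)
export LESS_TERMCAP_se=$'\e[0m'       # end standout
export LESS_TERMCAP_us=$'\e[1;32m'    # begin underline
export LESS_TERMCAP_ue=$'\e[0m'       # end underline
```

Note the `$'...'` quoting is a bash/ksh/zsh feature; in a strictly POSIX ~/.profile you'd need `printf`-built escape sequences instead.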
https://github.com/chmln/sd: "sd uses regex syntax that you already know from JavaScript and Python. Forget about dealing with quirks of sed or awk - get productive immediately."
It would be interesting to test the ~1.5GB of JSON the author uses for the benchmark against sed, but there are no details on how many files nor what those files contain.
When trying something relatively small and simple, sd appears to be slower than sed. It also appears to require more memory. Maybe others will have different results.
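The quick test in question was along these lines:

```shell
# using dash, not bash
echo j > 1
time sed s/j/k/ 1
time -p sed s/j/k/ 1
time sd j k 1
time -p sd j k 1
```

A single-character file is about the smallest possible workload, so this mostly measures startup overhead rather than substitution throughput.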
Opposite problem as the sd author for me. For system tasks, I'm more familiar with faster sed and awk than with slower Python and JavaScript, so I wish that Python and JavaScript regex looked more like sed and awk, i.e., BRE and occasionally ERE. Someone in the NetBSD core group once wrote a find(1) alternative that had C-like syntax, similar to how awk uses a C-like syntax. Makes sense because C is the systems language for UNIX. Among other things, most of the system utilities are written in it. If the user knows C then she can read the system source and modify/repair the system where necessary, so it is beneficial to become familiar with it. Is anyone writing system utility alternatives in Rust that use a Rust-like syntax?
https://github.com/chmln/sd (corrected URL)
tw from AT&T AST has C-like syntax. https://github.com/att/ast/blob/master/src/cmd/tw/tw.c#L182
ncdu is amazing. I foolishly spent way too much time trying to massage du's output into something human-friendly.
sd is my favorite of the newish command line tools. It's super fast and I like the syntax a lot.
Agree, I've started replacing my `perl -pe s/.../.../g`s with `sd`. It seems it's actually slightly faster than the equivalent Perl for the same substitutions (which it should be since it does less).
It is somewhat notable that rg and fd differ significantly in that rg is almost perfect superset of grep in terms of features (some might be behind different flags etc), but fd explicitly has narrower featureset than find.
Yeah, this was very intentional. Because this is HN, I'll say some things that greps usually support that ripgrep doesn't:
1) greps support POSIX-compatible regexes, which come in two flavors: BREs and EREs. BREs permit back-references and have different escaping rules that tend to be convenient in some cases. For example, in BREs, '+' is just a literal plus-sign but '\+' is a regex meta character that means "match one or more times." In EREs, the meanings are flipped. POSIX compatible regexes also use "leftmost longest" where as ripgrep uses "leftmost first." For example, 'sam|samwise' will match 'sam' in 'samwise' in "leftmost first," but will match 'samwise' in "leftmost longest."
2) greps have POSIX locale support. ripgrep intentionally just has broad Unicode support and ignores POSIX locales completely.
3) ripgrep doesn't have "equivalence classes." For example, `echo 'pokémon' | grep 'pok[[=e=]]mon'` matches.
4) grep conforms to a standard---POSIX---whereas ripgrep doesn't. That means you can (in theory) have multiple distinct implementations that all behave the same. (Although, in practice, this is somewhat rare because some implementations add a lot of extra features and it's not always obvious when you use something that is beyond what POSIX itself strictly supports.)
I think that probably covers it, although this is all off the cuff. I might be forgetting something. I suppose the main other things are some flag incompatibilities. For example, grep has '-h' as short for '--no-filename'. Also, since ripgrep does recursive search by default, there are no -r/-R flags. Instead, -r does replacements and -R is unused. -L is used for following symlinks (like 'find').
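The leftmost-longest vs. leftmost-first difference in point 1 is easy to see at the shell (assuming GNU grep's POSIX-conforming ERE matching, and rg on PATH):

```shell
# POSIX leftmost-longest: the alternation takes the longest overall match
echo samwise | grep -oE 'sam|samwise'   # prints: samwise

# ripgrep leftmost-first: the first alternation branch that matches wins
echo samwise | rg -o 'sam|samwise'      # prints: sam
```

In practice you can often sidestep the difference by ordering alternations longest-first ('samwise|sam').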
> 2) greps have POSIX locale support. ripgrep intentionally just has broad Unicode support and ignores POSIX locales completely.
Does this mean that there's no support for language specific case mappings (e.g. iİ and ıI in Turkic)?
Correct. ripgrep only has Level 1 UTS#18 support: https://unicode.org/reports/tr18/#Simple_Loose_Matches
This document outlines Unicode support more precisely for ripgrep's underlying regex engine: https://github.com/rust-lang/regex/blob/master/UNICODE.md
Thx! Is there a specific reason for the lack of that feature or was this just not implemented yet?
I've added this to the ripgrep Q&A discussion board: https://github.com/BurntSushi/ripgrep/discussions/2221 --- Thanks for the good question!
The specific reason is hard to articulate precisely, but it basically boils down to "difficult to implement." The UTS#18 spec is a tortured document. I think it's better that it exists than not, but if you look at its history, it's undergone quite a bit of evolution. For example, there used to be a "level 3" of UTS#18, but it was retracted: https://unicode.org/reports/tr18/#Tailored_Support
And to be clear, in order to implement the Turkish dotless 'i' stuff correctly, your implementation needs to have that "level 3" support for custom tailoring based on locale. So you could actually elevate your question to the Unicode consortium itself.
I'm not plugged into the Unicode consortium and its decision making process, but based on what I've read and my experience implementing regex engines, the answer to your question is reasonably simple: it is difficult to implement.
ripgrep doesn't even have "level 2" support in its regex engine, nevermind a retracted "level 3" support for custom tailoring. And indeed, most regex engines don't bother with level 2 either. Hell, many don't bother with level 1. The specific reasoning boils down to difficulty in the implementation.
OK OK, so what is this "difficulty"? The issue comes from how regex engines are implemented. And even that is hard to explain because regex engines are themselves split into two major ideas: unbounded backtracking regex engines that typically support oodles of features (think Perl and PCRE) and regex engines based on finite automata. (Hybrids exist too!) I personally don't know so much about the former, but know a lot about the latter. So that's what I'll speak to.
Before the era of Unicode, most things just assumed ASCII and everything was byte oriented and things were glorious. If you wanted to implement a DFA, its alphabet just consisted of the obvious: the 256 possible byte values. That means your transition table had states as rows and each possible byte value as columns. Depending on how big your state pointers are, even this is quite massive! (Assuming state pointers are the size of an actual pointer, then on x86_64 targets, just 10 states would use 10x256x8=~20KB of memory. Yikes.)
But once Unicode came along, your regex engine really wants to know about codepoints. For example, what does '[^a]' match? Does it match any byte except for 'a'? Well, that would be just horrendous on UTF-8 encoded text, because it might give you a match in the middle of a codepoint. No, '[^a]' wants to match "every codepoint except for 'a'."
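The byte-vs-codepoint distinction is visible with grep's own locale handling (a sketch; it assumes GNU grep and that a C.UTF-8 locale is installed):

```shell
# 'é' is two bytes in UTF-8: 0xC3 0xA9

# byte-oriented matching: each byte of 'é' separately matches [^a]
printf 'é\n' | LC_ALL=C grep -o '[^a]' | wc -l      # 2

# codepoint-oriented matching: 'é' is one character, one match
printf 'é\n' | LC_ALL=C.UTF-8 grep -o '[^a]' | wc -l
```

The first count is exactly the "match in the middle of a codepoint" hazard described above.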
So then you think: well, now your alphabet is just the set of all Unicode codepoints. Well, that's huge. What happens to your transition table size? It's intractable, so then you switch to a sparse representation, e.g., using a hashmap to map the current state and the current codepoint to the next state. Well... Ouch. A hashmap lookup for every transition when previously it was just some simple arithmetic and a pointer dereference? You're looking at a huge slowdown. Too huge to be practical. So what do you do? Well, you build UTF-8 into your automaton itself. It makes the automaton bigger, but you retain your small alphabet size. Here, I'll show you. The first example is byte oriented while the second is Unicode aware:
This doesn't look like a huge increase in complexity, but that's only because '[^a]' is simple. Try using something like '\w' and you need hundreds of states.

But that's just codepoints. UTS#18 level 2 support requires "full" case folding, which includes the possibility of some codepoints mapping to multiple codepoints when doing caseless matching. For example, 'ß' should match 'SS', but the latter is two codepoints, not one. So that is considered part of "full" case folding. "Simple" case folding, which is all that is required by UTS#18 level 1, limits itself to caseless matching for codepoints that are 1-to-1. That is, codepoints whose case folding maps to exactly one other codepoint. UTS#18 even talks about this[1], noting specifically that it is difficult for regex engines to support. Hell, it looks like even "full" case folding has been retracted from "level 2" support.[2]
The reason why "full" case folding is difficult is because regex engine designs are oriented around "codepoint" as the logical units on which to match. If "full" case folding were permitted, that would mean, for example, that '(?i)[^a]' would actually be able to match more than one codepoint. This turns out to be exceptionally difficult to implement, at least in finite automata based regex engines.
Now, I don't believe the Turkish dotless-i problem involves multiple codepoints, but it does require custom tailoring. And that means the regex engine would need to be parameterized over a locale. AFAIK, the only regex engines that even attempt this are POSIX and maybe ICU's regex engine. Otherwise, any custom tailoring that's needed is left up to the application.
The bottom line is that custom tailoring and "full" case matching don't tend to matter enough to be worth implementing correctly in most regex engines. Usually the application can work around it if they care enough. For example, the application could replace dotless-i/dotted-I with dotted-i/dotless-I before running a regex query.
The same thing applies for normalization.[3] Regex engines never (I'm not aware of any that do) take Unicode normal forms into account. Instead, the application needs to handle that sort of stuff. So nevermind Turkish special cases, you might not find a 'é' when you search for an 'é':
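The lost example is easy to reconstruct (a sketch using raw UTF-8 byte escapes):

```shell
# 'é' precomposed (NFC): U+00E9 = bytes 0xC3 0xA9
nfc=$(printf 'caf\303\251')
# 'e' + U+0301 combining acute (NFD): bytes 0x65 0xCC 0x81
nfd=$(printf 'cafe\314\201')

# both render identically as "café", but the byte sequences differ,
# so searching for one form silently misses the other
printf '%s\n' "$nfd" | grep -c "$nfc"   # 0
```

Which form you get depends on what produced the text: macOS filesystems historically emit NFD, while most keyboards and editors emit NFC.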
Unicode is hard. Tooling is littered with footguns. Sometimes you just have to work to find them. The Turkish dotless-i just happens to be a fan favorite example.

[1]: https://unicode.org/reports/tr18/#Simple_Loose_Matches
[2]: https://www.unicode.org/reports/tr18/tr18-19.html#Default_Lo...
[3]: https://unicode.org/reports/tr18/#Canonical_Equivalents
Is there a benefit to respecting locale and not just using Unicode?
Probably only if you are on an old legacy system that is using an unusual encoding.
I use Frawk (https://github.com/ezrosent/frawk) a decent amount too! I downloaded it to do some parallel CSV processing and i've just kind of kept it ever since.
I had someone ask me (a self-described grep monkey) how I navigate grepping very long lines (minified JS, for example), to which I replied 'lol I just ignore them'. I'd love an 'only show 200 chars if the line is longer than 200 chars' option, but to my knowledge there's no easy way to do this with grep. I'd love to hear suggestions on how people navigate this.
My go-to is using -o and pre/appending .{100} to the pattern to capture however much context I need
Pipe to cut -c 1-200?
ripgrep has the -M option that will help here.
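Concretely, the three suggestions above look something like this (minified.js is a hypothetical input file):

```shell
# print only the match plus up to 100 chars of context on each side
grep -oE '.{0,100}pattern.{0,100}' minified.js

# or truncate each matching line after 200 columns
grep 'pattern' minified.js | cut -c 1-200

# ripgrep: suppress lines longer than 200 bytes entirely
rg -M 200 'pattern' minified.js
```

Note rg's `-M`/`--max-columns` replaces over-long matching lines with a short notice rather than truncating them; add `--max-columns-preview` to see a truncated preview instead.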
I tend to use `git grep` for that. Is ripgrep better in some way?
It works well outside of git repos automatically. And can search across multiple git repos while respecting each repo's respective gitignores automatically. ripgrep also tends to be faster, although the absolute difference tends to be lower with 'git grep' than a simple 'grep -r', since 'git grep' does at least use parallelism.
There are other reasons to prefer one over the other, but are somewhat more minor.
Here's one benchmark that shows a fairly substantial difference between ripgrep and git-grep and ugrep:
The GNU grep comparison is somewhat unfair because it's searching a whole lot more than the other 3 tools. (Although notice that there are no additional matches outside of binary files.) But it's a good baseline and also demonstrates the experience that a lot of folks have: most just tend to compare a "smarter" grep with the "obvious" grep invocation and see that it's an order of magnitude faster.

It's also interesting that all tools agree on match counts except for ugrep and ag. ag at least doesn't have any kind of Unicode support, so that probably explains that. (Don't have time to track down the discrepancy with ugrep to see who is to blame.)
And if you do want to search literally everything, ripgrep can do that too. Just add '-uuu':
And it still does it better than GNU grep. And yes, this is with Unicode support enabled. If you disable it, you get fewer matches and the search time improves. (GNU grep gets faster too.) Now, to be fair, '\w{42}' is a tricky regex. Searching something like a literal brings all tools down into a range where they are quite comparable. I realize this is beyond the scope of what you asked, but eh, I had fun.

What version of time are you using? I don't recognize the output.
The zsh builtin with a custom TIMEFMT: https://github.com/BurntSushi/dotfiles/blob/965383e6eeb0bad4...
How fast is magic wormhole? In my experience most of the new(er) file transfer apps based on WebRTC are just barely faster than Bluetooth and are unable to saturate the bandwidth. I am not sure if the bottleneck is in the WebRTC stack or whether there is something fundamentally wrong about the protocol itself.
All magic wormhole is doing is agreeing a key, and then moving the encrypted data over TCP between sender and recipient.
So for a non-trivial file this is in principle subject to the same performance considerations as any other file transfer over TCP.
For a very tiny file, you'll be dominated by the overhead of the setup.
Why use ripgrep over silver searcher?
This could have changed in the last few years, but I think rg does tend to (sometimes significantly) outperform ag, see the author's benchmarks [0].
0: https://blog.burntsushi.net/ripgrep/#code-search-benchmarks
>much better single file performance, better large-repo performance and real Unicode support that doesn't slow way down
By ripgrep's dev (https://news.ycombinator.com/item?id=12567484).
The Silver Searcher appears to be if not dead then certainly resting.
fzf too
shout out to 'ack' as well
If you're still using ripgrep, check out ugrep.
Very fast, TUI, fuzzing matching, and actively maintained.
ripgrep is not maintained anymore? that was fast...
I'm the maintainer of ripgrep and it is actively maintained.
Well that was a quick rollercoaster of emotions. Thanks for all that you do.
ripgrep isn't maintained now? That was fast :)
Or is it just done :)
`rg` is maintained. Last commit was 9 days ago by the creator himself.
The singular habit I picked up way back in the day was to simply cope with what was available.
There's all sorts of utilities and such. Emacs was a grand example at the time as well. Lots of better mousetraps.
But when you bounce around to a lot of different machines, machines not necessarily in your control, "lowest common denominator" really starts to rear its ugly head.
The vast majority of my command line concoctions are burned into muscle memory.
Today, I think the base line install of modern *nixes are higher than they were back in the day, but the maxim still applies of working with what they have out of the box.
> The singular habit I picked up way back in the day was to simply cope with what was available.
Nothing wrong with that. There are other ends of the spectrum of "make the things I do often as easy as I can", too. Both work.
It reminds me of my father; when he got in a car that wasn't his, he would NOT CHANGE ANYTHING. You couldn't tell he was in it. Safety issues aside, there's an argument to be made to make it as comfortable for you as you can; seat, mirrors, environment, etc. to minimize any distractions.
I see this too in my (software development) communities; some people like to tailor their development experience to the n'th degree to extract as much personal enjoyment/optimization/etc. out of it as they can. Others like to use whatever they're given and be happy with that. Both work.
Myself, I type for a living so I like to use keyboards I like. I bring my own with me in my "go bag" for when I'm out so I don't have to use the (IMO!) crappy Apple Laptop KB's. I /can/ use it, I just choose not to. Other people either like them, or have learned they don't care enough. All good.
Counterpoint, deciding not to put up with old things that are a pain in the ass is part of unseating the existing momentum so we can finally move on to better things.
Knowing how to do things the annoying way doesn't mean that has to be the preferred way. Being open to retooling is part of staying relevant
What I would like to see in that case is a "next gen utils", a bundling that makes it likely to find a number of these tools together in future servers.
Really the "right" way to go about it would be to employ an existing package manager (there's enough already, we don't need another) and some magic glue on top that makes it easy.
For example, you have your configuration of packages, in the ephemeral cloud somewhere, and you do the really dangerous no good thing of piping through bash with some kind of uuid that's assigned to your account, something like (totally made up url)
And it sniffs the architecture, version of binaries to install, which ones are there, and then puts it into an install directory configured by you. This is like 97% existing things with a little glue and interface polish so you can easily bring in an environment.
There's certainly other ways but the idea remains the same
Yep, I've started keeping a github repo of install scripts / dotfiles / etc that basically amounts to the workflow you described.
YES. That isn't a completely new shell. I keep trying the fishes and z-shells of the world and I keep coming back to "my own hacked up bash" because of muscle memory on random edge case things.
That's why I stuck with vi and sh back in the day: I knew they were on every machine I might telnet to (this was before ssh, sigh).
On machines I controlled, I mostly used ksh, but it wasn't available on all machines; I cannot remember if it was the SunOS boxes or the older HP-UX or the Apollos, but there were a few. (csh? Go away. No, just go.)
Nowadays, vim and bash are everywhere I need them to be, even if I have to kludge around some version differences.
My only real gripe about find is the awkwardness of pruning multiple hierarchies. After you've written
a few times, and have returned to the previous command to add yet another tuple, it gets a little old. But it works. Everywhere.
(* that I need it)
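The elided command was presumably along these lines (a hypothetical reconstruction): each hierarchy to skip needs its own `-path ... -prune` tuple chained with `-o`:

```shell
# skip .git and node_modules: -prune stops descent into them,
# and the trailing -o -print emits everything else; every new
# exclusion means extending the parenthesized chain again
find . \( -path ./.git -o -path ./node_modules \) -prune -o -type f -print
```

The parentheses must be escaped (or quoted) so the shell doesn't interpret them, which is part of why this gets old quickly.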
This is fine and all, but there are also subtle differences in the standard CLI tools depending on the implementation. I'm used to GNU coreutils, and butt heads with the macOS and BusyBox implementations.
Yes, to get by with what is available is a useful trait. No matter how good these tools are, I will often arrive at a prompt where they are unavailable.
Why don't you make them available in ~/bin and have a better shell life?
This.
I have exa and rg and fd all installed but unlearning the find and grep muscle memory is hard.
Occasionally I give the newer stuff a go and then end up stumbling over syntax differences and end up just going back to what I know.
If you ever give ripgrep a go, stumble over something and are inclined to: please post a Discussion question[1]. "beginner" or "stupid" questions are welcome. If you can show me what you know works with grep, for example, and are curious about an equivalent rg command, that could be a good question. I might be able to give you some "fundamental" answers to it that let you reason about the tools more from first principles, but on your terms.
[1] - https://github.com/BurntSushi/ripgrep/discussions
I aliased grep and find to their newer alternatives. Sure, the syntax will be off from time to time but due to muscle memory I couldn’t relearn the new tools otherwise.
Can't you just install Homebrew on your home dir and stop being locked up with the ancient environment?
Or you could just create a git repo with those executables and pull them to your machines?
I think about a month after I learned enough Vim to be dangerous RHEL (8, I think) started shipping nano as the default editor. Ah well, now I can scroll with the home row on my local box.
I found out recently that I can't even count on
to be portable. Got a batch of machines that don't support '-h'.

Agree. For most situations, find and then grep is good enough.
I want to love fd - I'm a big believer in the idea that CLIs don't have to be scary, intimidating things (see normals using Slack with /commands and keyboard shortcuts), and find has a gigantic hairball of a UI.
The thing is, though, I know find well enough to not notice the terrible UI that much, and I know I can rely on it being everywhere. With fd that isn't true.
So it's hard for me to justify making the move.
Same thing happens with things like the fish and oil shells - I have little doubt their UX is better than Bash's, but Bash is pretty ubiquitous.
Emacs has this problem too, as an Emacs user. The UX is completely alien by current standards, but if you update the defaults you'll break a lot of people's existing configs.
How do you get around backwards compatibility / universality UX roadblocks like this?
It just doesn't happen that often anymore that I need to ssh into a system.
And my own systems have automatically synced dotfiles, making it mostly a non issue. (I'm using Syncthing for that)
When writing scripts I usually fallback to traditional shells/commands for compatibilty. Unless I'm really sure I will be the only user.
I was more thinking of scripts, yeah, like you describe.
Where I get hung up is, if I need to keep the traditional syntaxes in my head for scripting, why bother storing another one in my head for interactive use?
...that said, I do use ag and rg for other interactive tools, like cross-project search in Emacs.
I know what you mean. I use fd and rg on my machine, but for scripts, Dockerfiles etc I tend to use find and grep, just because this is the „lingua franca“ of Unix/Linux.
Same, and I'm the author of ripgrep! Unless the script is one that I wrote for myself in ~/bin, I use grep and find and standard tooling in shell scripts.
The only exception is if there is a specific need for something ripgrep does. Usually it's for speed, but speed isn't always needed.
I think the solution that NixOS uses would work for Emacs too. Just define a single variable that declares which default values to use (in NixOS, it's system.stateVersion).
Then packages (including internal packages) can update their defaults based on the version declared there. Basically a protocol version field but for your Emacs configuration.
Distros probably need a different strategy for improving core utils, though.
Fish is awesome if, like me, you hate Bash's arcane syntax. It improves my scripting productivity by 100%.
One rule of thumb though: use it only for personal use, and stick with it to see if it lives long enough. If you're working with the team, just use Bash.
Old grep is still muscle memory for me, and that’s what I use in scripts.
But the newer greps are so much faster! I scoffed initially but after a couple of uses I was hooked. I try to install these new tools in my personal ~/bin on systems I spend much time using.
It feels like we're in a third wave of innovation in Unix CLI tools. The first was from BSD in the late 70s/early 80s which considerably improved the original Unix utilities, then the GNU tools rewrite in the late 80s, then there were the dark ages of System V. I give ack (and Andy) credit for starting this latest wave around 2005 but it's really taken off lately with tools being rewritten in Rust and challenging the old status quo.
The only way for a third wave to work is if multiple distros agree to adopt it. Since these tools don't even agree on an interface, IMHO it wouldn't be much different than what we have. I also don't like the fact that some tools natively skip things like what's in ".gitignore". I don't want a tool that does that by default. If there was a consortium to standardize a new *nix CLI, then maybe it could get some traction.
There was a consortium, and it did standardise Unix, and that's when everything stopped moving in the 90s. Standards are compromises and so all the commercial Unixes had to implement the lowest common denominator. Thankfully GNU didn't care, and the BSD tools were already better.
What was the consortium? I'd like to read up on it. LCD isn't always a bad thing, as it tends to weed out the niche one-offs.
Pick one... during the Unix Wars [0] there was X/Open [1] vs Unix International (aka AT&T) [2] and the Open Software Foundation [3] (which eventually merged with X/Open to form The Open Group). And then the IEEE got involved with POSIX which ultimately "won" as the lowest of the LCDs. [4]
[0] https://en.wikipedia.org/wiki/Unix_wars [1] https://en.wikipedia.org/wiki/X/Open [2] https://en.wikipedia.org/wiki/Unix_International [3] https://en.wikipedia.org/wiki/Open_Software_Foundation [4] https://en.wikipedia.org/wiki/POSIX
Thanks! Once I moved from University to corporate america, I never looked back at the history of *nix standardization. And looking at these links, I feel awfully naive suggesting another consortium.
Well, like zsh, and many other improvements on the shell, and gawk and other tooling that doesn't match other awk engines and so forth... you end up having two parallel realities. One for scripting, where you use the bare minimum that is acceptable to run on any server and is guaranteed to be there, and then your user env where you have all your fun tools.
The cool part is network transparency and forwarding environments and other things that plan9 plays with so that you can work locally, remotely.
I know it's asking for the world, but some way to do better "built in modularity" would be great. Like "whatever new shell" plus a standardized "plugin system."
I love fd, but somehow I always get tripped up on this:
Ah, the same thing happens to me all the time. You are right, the second one seems more intuitive, and many tools, old and new, use this pattern.
That seems unfortunate, since my shell's autocomplete puts that trailing slash in there by default.
The trailing slash is not the issue here. With `fd` the first argument is the pattern to search for, not the starting point like in `find`.
For example to find fonts.conf below /etc, with fd you would do:
And with find: the equivalent command, with the path first. In other words, with find the first argument is always the starting point, and leaving it out implies the current directory as the starting point. There are --search-path and --base-path options, so if you alias, say, fd='fd --search-path', you can then have the required first argument be the path to search. Personally, I find changing to the directory I want to search less annoying than typing out the directory to search (I know the options exist from script use).
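The two invocations being compared can be sketched like this (the find half is standard; the fd half is guarded in case fd isn't installed; the demo tree stands in for /etc):

```shell
# Demo tree standing in for /etc:
mkdir -p /tmp/fd_demo/fonts && touch /tmp/fd_demo/fonts/fonts.conf

# fd: the PATTERN comes first, then (optionally) the starting directory:
if command -v fd >/dev/null; then fd fonts.conf /tmp/fd_demo; fi

# find: the starting directory comes first, the pattern behind a flag:
find /tmp/fd_demo -name fonts.conf

# Both default to the current directory when the path is omitted:
#   fd fonts.conf        vs.        find . -name fonts.conf
```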
If it wasn't for find's muscle memory, fd has it right: you'd usually list what you want to do first on the CLI and then an arbitrary number of targets at the end.
I like the contemporary alternatives to the classics. They make a lot of things so much easier.
I have a little mental block, though. It's related to the realities of the stuff I work on. Since I find myself logged into other people systems, keeping the old, standard tools hot in my head does really take some of the load off. It's a pretty common refrain, but it's real and practical when you've got embedded systems, bsds, linuxes, macs, etc. Even the difference between gnu and mac is clunky when I don't practice enough.
For the same reason, with the notable exception of git, I use practically no aliases.
If I could invent a product, maybe it would be one that enables effectively "forwarding" CLIs to a remote host shell.
I'm with you. I feel like I know Linux like the back of my hand because interfacing with the stock tools is a breeze. These new tools are great but I just don't see them widely spread across the many remote systems that I manage. Just managing those packages across a fleet sounds like a pain in the ass.
These new tools are great but I just don't see them widely spread across many remote systems
I had smart, experienced people tell me not to waste my time using the GNU tools for this exact reason back in the day.
Nothing like logging into a freshly installed Solaris system and having to configure it using Bourne shell, which didn't have job control or history. At least it had vi. Usually the first thing you would do is get enough networking going to download bash and the GNU tools. But there were always some old timers around who wanted to haze the youngsters by forcing you to do everything with "native" tools.
> Just managing those packages across a fleet sounds like a pain in the ass.
Use Homebrew?
I spend a lot of time remoting into fresh *nix systems, so I also almost don't have any aliases, with one notable exception: ll (aliased to ls -lah).
It's just so engrained into my muscle memory that I do it without thinking about it most of the time.
And the workaround I found for it is adding a macro on my keyboard (through QMK but can be done with anything) that just types out 'alias ll="ls -lah"\n'.
Yeah, same here. I am doing a good amount of ops/SRE stuff these days while supporting my services and find myself ssh'ing into:
- very locked down bastions - hosts through a secure remote access VM thing that makes file transfer difficult - random docker containers in EKS (often through both of the above)
Getting good at the basic tools is just unavoidable. I find myself manually typing `alias k=kubectl` a lot though :p
I find `find` so difficult to use that I usually do `find . | grep pattern` when I need to search for a file name.
Even better, try fzf instead: do `find . | fzf`
Isn't that the equivalent of just `fzf`?
It depends on `FZF_DEFAULT_COMMAND`, but yes, fair point
Same. I have an `alias fgr="find|grep"`. Stupid but gets the job done.
Oh, I thought I was kind of dumb for doing that :) happy to see that it's normal haha. I always find myself trying to remember: is it one dash? is it "name" or "iname" or whatnot.. pipe grep is way easier.
What do you find difficult about using `find . -regex pattern`?
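One likely stumbling block, sketched below (assuming GNU find, whose default regex flavor is emacs-style): `-regex` matches the entire path, not the basename.

```shell
# -regex matches the ENTIRE path, not just the file name:
mkdir -p /tmp/regex_demo && touch /tmp/regex_demo/notes.txt

find /tmp/regex_demo -regex 'notes.txt'       # no output: the pattern
                                              # must cover '/tmp/regex_demo/...'
find /tmp/regex_demo -regex '.*notes\.txt'    # prints the file
```

Piping to grep sidesteps this, since grep matches substrings anywhere in the line by default.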
I just use "grep -R", sometimes with "-i" as well. Works for me 99% of the time.
It amazes me how long it took for alternatives like fd to emerge
The old ones must have caused years of wasted time
Maybe I’m just old fashioned but all these new command line utilities strike me as solutions in search of a problem.
Standard ‘find’ works great. It finds files. It can filter by any criteria I have ever had to look for and the syntax seems very intuitive to me (maybe I am just used to it). It is flexible and powerful.
I’d love to be told I’m wrong, because I feel like I’m missing something.
The "new CLI utilities" aren't solving new problems. They are solving old problems with a different user experience, driven by changes in how folks do development (particularly the scales).
Notice that I said "different" UX and not "better" UX. Reasonable people can disagree about whether the newer tools have better UX as an objective fact. Many folks very rightly find it very surprising that these tools will skip over things automatically by default. What is true, however, is that there are a lot of people who do see the UX of these tools as better for them.
As the author of ripgrep, I hear the same thing over and over again: ripgrep replaced their ~/bin grep wrapper with a bunch of --exclude rules that filtered out files they didn't want to search. Because if you don't do that, a simple 'grep -r' can take a really fucking long time to run if you're working on a big project. And guess what: a lot of programmers these days work on big projects. (See previous comment about changes in scale.) So you don't really have a choice: you either write a wrapper around grep so that it doesn't take forever, or you use a "smarter" tool that utilizes existing information (gitignore) to effectively do that for you. That smarter tool typically comes with other improvements, because your simple "grep wrapper" probably doesn't use all of the cores on your machine. It could, but it probably doesn't. So when you switch over, it's like fucking magic: it does what you wanted and it does it better than your wrapper. Boom. Throw away that wrapper and you're good to go. That's what I did anyway. (I had several grep wrappers before I wrote ripgrep.)
Every time these tools are discussed here, people say the same thing: "I don't see the point." It's a failure of imagination to look beyond your own use cases. If all you ever work on are smaller projects, then you're never going to care about the perf difference between ripgrep and grep. ripgrep has other improvements/changes, but nearly all of them are more niche than the UX changes and the perf improvements.
Definitely agree. I really like that ripgrep is fast, but I mainly use it for its better UX for what I do every day: search code. If ripgrep wasn't any faster than grep, but was still recursive by default and ignored .gitignore et al, it'd still be worth using for me. (In fact, I used to use "ack", which is basically ripgrep but slow. :-)
'find' is slow to the point of being useless for me. It can never find the file I'm looking for before I give up waiting for it to finish running. So I'm excited if fd can provide the same functionality but run much faster.
Are you inadvertently hitting a big .git/ with every find?
Call me naive/dumb/whatever, but I can't stand find's interface. Even things as simple as -depth rather than --depth bother me
One thing to remember is that these fun utils won't exist on production servers. You also don't want them there for obvious reasons. I find it better to use the most commonly available set of unix tools and I end up being far more effective due to that.
What is the obvious reason? Security? The guy provides sources and it took me 20 minutes to see what it's doing in there. I'd definitely put this on a production server, after testing intensively of course.
How frequently are you and your team willing to spend that 20 minutes? Are you confident you evaluated it precisely? Are you always going to install the latest release? What do you do as the codebase grows and that 20 minutes turns into an hour?
Did you know there are businesses out there still using programs from the DOS era (especially those made in FoxPro) that work perfectly? Or have you taken a look at ATMs and seen that the majority still have that Windows XP look and feel? Not everybody needs the latest and greatest, you know?
In that spirit, let's see your questions:
1. Once in a lifetime; 2. Yes; 3. No; 4. Nothing, see the first answer.
Hmm, I already have a shell alias that does 90% of this. Doesn't parse .gitignore, but it's not a big problem for me. If it was I'd do `make clean` in the project.
This is always installed and ready to go on any box I have my dotfiles.
I suppose that is why these perfectly good improvements have a hard time getting traction. The older stuff is still so flexible.
Those are a lot of dependencies for such a simple tool. I'm a Rust user myself, but some of those dependencies really should be part of a good standard lib. Actually, the NPM-like ecosystem is my biggest pain point with Rust.
[dependencies]
ansi_term = "0.12"
atty = "0.2"
ignore = "0.4.3"
num_cpus = "1.13"
regex = "1.5.5"
regex-syntax = "0.6"
ctrlc = "3.2"
humantime = "2.1"
lscolors = "0.9"
globset = "0.4"
anyhow = "1.0"
dirs-next = "2.0"
normpath = "0.3.2"
chrono = "0.4"
once_cell = "1.10.0"
[dependencies.clap]
version = "3.1"
features = ["suggestions", "color", "wrap_help", "cargo", "unstable-grouped"]
I just want my entire file system stored in Sqlite so I can query it myself
Oh hey look what I just found https://github.com/narumatt/sqlitefs
>> "mount a sqlite database file as a normal filesystem"
I'm looking to do the opposite. Given a location, recurse it, stat() each file and create the database.
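A rough sketch of that idea, assuming GNU find's `-printf` and the sqlite3 CLI are available. The `|` separator breaks on paths containing `|` or newlines, so treat this as a prototype, not a robust tool:

```shell
# Walk a tree, stat each file via find, load the metadata into SQLite.
mkdir -p /tmp/fsdb_demo && echo hello > /tmp/fsdb_demo/a.txt

find /tmp/fsdb_demo -type f -printf '%p|%s|%T@\n' > /tmp/files.psv

sqlite3 /tmp/files.db <<'SQL'
DROP TABLE IF EXISTS files;
CREATE TABLE files(path TEXT, size INTEGER, mtime REAL);
.separator |
.import /tmp/files.psv files
SQL

# Query it like any other table, e.g. the largest files first:
sqlite3 /tmp/files.db 'SELECT path, size FROM files ORDER BY size DESC;'
```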
If you happen to be using macOS, there's mdfind, which uses the spotlight database which is always kept up to date, unlike locate/updatedb, where updatedb is expensive to run, even if you've run it recently.
I have yet to find a good solution for linux CLI... something that uses an internal database that is kept up to date with all directory structure changes.
Maybe someone else has seen something cool for this? :D
It's still not clear to me whether you want the files inside the database or just the metadata. The stat part suggests the latter?
I guess I can see that, but now you have cache invalidation (someone else linked to Spotlight, which does this as a background process). SQLite files can be larger than any physical medium you can purchase so why not go the distance?
You could also make it as a Sqlite virtual table so it's dynamic.
Also file data is available in osquery: https://osquery.io/schema/5.2.3/#file
updatedb/locate?
updatedb is expensive to run and is usually run from a cron job, whereas something like mdfind on macOS uses Spotlight, which is kept up to date with all filesystem changes (or at least updated reasonably quickly).
Anybody know of something like that for linux?
I'm shocked so many people forget this
https://plocate.sesse.net/ is even better
Sounds like macOS would be right up your alley then.
I have MacOS, where is the sqlite file?
/private/var/db/Spotlight-V100/...
Is the parent post referring to `mdfind` perhaps?
Believe so.
https://github.com/kashav/fsql
I use find all the time, but it is such a strange beast - it's as if there were a meeting among all the standard Unix utilities on look and feel and find missed the memo. But it's ubiquitous and I'm too old to change horses now anyway.
Why would the tool want to ignore patterns in a .gitignore? It isn't a git tool...
I don't like it when non-git tools do that by default. I'm sure it's nice for some people, so make it an option that could be enabled. But to have that behavior by default feels far too opinionated for me.
People generally develop tools for their own needs. I'm not saying "hey, if you want one that uses the defaults you believe are best, create your own", but just keep some empathy for open source developers. :)
Generated files. If you're searching for every place something shows up so you can change it, it's annoying to have to filter out the stuff that'll get updated automatically (and will mess up timestamps if you update manually.)
Also, backup files. The fewer irrelevant results, the more useful a tool is.
I'm not asking why someone would find it useful.
Yes you are. We are talking about a tool, so you are asking why someone would "find it useful".
The tool would want to use .gitignore files because it's useful.
Also, it's far from the only non-VCS tool that uses VCS ignore files.
Again, I'm not asking why someone would find it useful.
> Why would the tool want to ignore patterns in a .gitignore?
> The tool would want to use .gitignore files because it's useful.
Sometimes a question isn't a question.
To answer your question directly: If you're grepping through source code, you generally do not care about caches, backups, personal workspace configs, node_modules, "compiled" output (JS projects generally "compile" to JS and if you're looking for something in source files, you probably do not care about the bundles you're outputting, just the source), etc. You generally care about your source code, which is the stuff that's not getting added to .gitignore.
Since we're talking about an open source project, try to keep some empathy for developers that are solving their own problems first and not trying to solve everybody's problems. It's a configurable option, anyway.
Exactly. And if you wanted a Git aware tool, you could just run 'git ls-files'.
Hey, fd. I don't use it normally, but it ended up being the easiest and fastest tool for me to delete ~100 million empty files off my disk (a weird situation). It has threading and command execution built in, so I could saturate my cpu pretty easily doing the deletes with `fd -tf --threads=64 --exec r m {}` (I put the space in the rm command on purpose for posting).
find with its built-in -delete action would have avoided executing the external rm command millions of times.
well, somehow I missed that when looking for options. I just tried it out, the -delete option is way faster than what I posted before, TIL, thanks.
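A miniature version of that cleanup, for anyone else who missed the option (`-empty` and `-delete` exist in both GNU and BSD find):

```shell
# Set up some empty files next to one that has content:
mkdir -p /tmp/empty_demo
touch /tmp/empty_demo/a /tmp/empty_demo/b
echo keep > /tmp/empty_demo/c

# -empty matches zero-length files; -delete unlinks them inside find
# itself, so there is no fork/exec of an external rm per file.
find /tmp/empty_demo -type f -empty -delete

ls /tmp/empty_demo    # only 'c' survives
```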
I don't use `fd` on the command line because I have very ingrained `find` muscle memory, but it's really made using `projectile-find-file` in Emacs totally usable with the huge monorepos I deal with at work. The same goes for `rg`, I love using it with `consult-ripgrep` in Emacs for searching through mountains of code.
Yup. Integrations with vim are really helpful here. But on the raw CLI it's tough to start unlearning the muscle memory
> The command name is 50% shorter than find :-).
I love this but if enough new tools keep doing this I might have to change some of my bash aliases :(
I use fd and rg a lot, integrated them with my scripts and even have some of them bound to keys.
Insanely good and fast programs. Zero regrets.
For a fuzzy finder I recently replaced fzf with peco. I like it better, it's very customizable.
I'm seriously curious, is this the first time this link is being submitted?
Frequently, I try to submit a link and it shows up as having already been submitted. And I'm quite certain a tool as popular as fd has been featured on HN before. So either this particular link has somehow never been submitted (doubtful), HN allows resubmitting a link after some amount of time, or the link-resubmit prevention logic doesn't apply to certain users?
Usually when I can't be bothered to remind myself the syntax for find, my go to these days is `echo **/*pattern*`. Of course, this is mainly just for small searches.
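One caveat for the `**` trick: zsh recurses by default, but bash needs the globstar option, otherwise `**` silently degrades to a single-level `*`:

```shell
# Build a nested tree to search:
mkdir -p /tmp/glob_demo/deep/deeper
touch /tmp/glob_demo/deep/deeper/match.txt

shopt -s globstar                   # bash >= 4; zsh doesn't need this
echo /tmp/glob_demo/**/*match*      # now finds the nested file
```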
I would love a `find` with reverse polish notation, also known as postfix notation. Something like:
or, for something more complex: I have some little personal CRUD apps and this sort of postfix notation works very well for them. I could write something like this but haven't gotten motivated to do it, though.
Windows test comparison on my machine - for finding all .jpg in my D: drive
1 - classic command prompt: "D:\>dir /s /b *.jpg > 2.txt" - time 5 seconds (4581 files)
2 - this little gizmo: "D:\>fd -e jpg > 1.txt" - time 1 second (same 4581 files)
Conclusion: I have a new tool dropped in my System32 folder from now on. Thank you David Peter
This didn't really sound like something I need until I got to the `{.}` syntax, which solves a problem I was just trying and failing to solve with GNU find ten minutes ago (namely that there seems to be no convenient way to use the extension-stripped name of the match in an exec statement, e.g. bash's `${file%.*}` syntax).
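For reference, fd's `--exec` placeholders ({} full path, {.} path without extension, {/} basename, {/.} basename without extension) line up with bash parameter expansions; a sketch, with the fd half guarded since fd may not be installed:

```shell
# fd spelling (hypothetical flac -> mp3 rename preview):
if command -v fd >/dev/null; then
  fd -e flac --exec echo {.}.mp3   # e.g. music/song.flac -> music/song.mp3
fi

# Plain bash spellings of the same transforms:
file=music/song.flac
echo "${file%.*}.mp3"     # music/song.mp3  (strip extension)
echo "${file##*/}"        # song.flac       (basename)
echo "${file%/*}"         # music           (dirname)
```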
Well this is saving me a ton of time as I'm basically migrating UID ownership of files for NFS shares that date back to the '90s. According to my rough benchmarks between find and fd, fd is ~3 times faster.
See this for a collection of alternatives for a modern unix commands. https://github.com/ibraheemdev/modern-unix
Haven't there been unresolved security issues with chrono, a crate this one depends on?
Chrono hasn't been updated for almost 2 years. Is the issue resolved or is there a security risk in using fd?
How does it run faster than find? Can I manually implement that speedup using standard unix tools? I need to run find a lot on many machines I don't have access to install anything on.
It's faster than find in that I don't need to read the manpage every time I use it.
You can speed up grep by using 'xargs' or 'parallel' because searching tends to be the bottleneck.
But for 'find', the bottleneck tends to be directory traversal itself. It's hard to speed that up outside of the tool. fd's directory traversal is itself parallelized.
The other reason why 'fd' might be faster than a similar 'find' command is that 'fd' respects your gitignore rules automatically and skips hidden files/directories. You could approximate that with 'find' by porting your gitignore rules to 'find' filters. You could also say that this is comparing apples-to-oranges, which is true, but only from the perspective of comparing equivalent workloads. From the perspective of the user experience, it's absolutely a valid comparison.
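The xargs approach mentioned above can be sketched like this: fan grep out over several processes, with `-print0`/`-0` keeping odd filenames safe:

```shell
# One file that matches, one that doesn't:
mkdir -p /tmp/pgrep_demo
printf 'needle\n' > /tmp/pgrep_demo/hit.txt
printf 'hay\n'    > /tmp/pgrep_demo/miss.txt

# -P 4 runs up to four greps at once; -n 50 caps filenames per
# grep invocation; -l prints only the names of matching files.
find /tmp/pgrep_demo -type f -print0 \
  | xargs -0 -P 4 -n 50 grep -l needle
```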
Discussed in 2017: https://news.ycombinator.com/item?id=15429390 (215 comments)
I want a user friendly alternative to find's companion xargs.
Would it make sense to create a utils package with all these new rust based utilities? Something like "rustutils" with a resemblance to "coreutils".
There are some Unix tools I never get around to memorizing the syntax of, and always end up searching "how to"s. find is definitively one of them.
Find is a powerful command but one for which it is hard to find good examples beyond the basics.
So I'm glad for these new kinds of CLI tools.
I don't like find. Nor do I like vi (not vim, vi) and/or maybe others. I don't wish to rain on the parade of someone who's more accomplished than I am. But I think these "new" tools miss the point.
I use vi because I know it exists on every(?) system ever. It's not like I go out of my way seeking vi. The feeling is similar for find. It works. It works well. It works the same on all systems I work on.
Would I go out of my way to install fd on my system? Probably not.
They don't miss the point. We're well aware they aren't ubiquitous and that is indeed one of their costs.[1]
If the old tools are working well for you, then keep using them! I used plain grep for well over a decade before writing ripgrep. Hell, sometimes I still use grep for precisely the reason you describe: it is ubiquitous.
Also, not every grep behaves the same. Not even close. Unless you're being paranoid about how you use grep, it's likely you've used some feature that isn't in POSIX and thus isn't portable.
Ubiquity and portability aren't "the point." Ubiquity is a benefit and portability can be a benefit or a cost, depending on how you look at it.
[1] - https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#pos...
I should have phrased this as a question, instead of being dismissively declarative.
>If, upon hearing that "ripgrep can replace grep," you actually hear, "ripgrep can be used in every instance grep can be used, in exactly the same way, for the same use cases, with exactly the same bug-for-bug behavior," then no, ripgrep trivially cannot replace grep. Moreover, ripgrep will never replace grep. If, upon hearing that "ripgrep can replace grep," you actually hear, "ripgrep can replace grep in some cases and not in other use cases," then yes, that is indeed true!
I think this statement says it all.
Yes, it's a persistent misunderstanding because communication is hard and folks aren't always exactly precise. It is very common to hear from someone, "ripgrep has replaced grep for me." You might even hear people state it more objectively, like, "ripgrep is a grep replacement." The problem is that the word "replace" or "replacement" means different things to different people. So that FAQ item was meant to tease those meanings apart.
Does fd solve the annoying “the order of the command line arguments matters a ton” approach that find uses?
Oh, nice. Currently I just have my shell alias `fn` to `find -name $argv` but this looks cool.
And another one, since ncdu gets mentioned so much: dua is very nice as well.
Since no one has mentioned it: procs. I love it as a ps/top replacement.
find is one of those tools that has me back at square one every time I want to do something non-trivial. Looking forward to giving this a shot.
fatal flaw: written in ruby
It is written in Rust and not Ruby.
Oh ok that’s much better