rektide 2 years ago

Fd and ripgrep/rg are the two "new" alternatives I use on a regular basis, and which are just huge improvements to life. Both of these find/search programs respect your .gitignore files, which helps enormously & makes searching my department's entire codebase really fast.

Fd is featured on Julia Evans' recent "New(ish) command line tools"[1]

[1] https://jvns.ca/blog/2022/04/12/a-list-of-new-ish--command-l... https://news.ycombinator.com/item?id=31009313 (760 points, 37d ago, 244 comments)

  • girishso 2 years ago

    It's fd, ncdu and sd (sed alternative) for me.

    https://github.com/chmln/sd

    https://dev.yorhel.nl/ncdu

    • plandis 2 years ago

      A while ago I came across this post: https://towardsdatascience.com/awesome-rust-powered-command-...

      I’ve also been using bat and exa which are pretty good replacements for cat and ls, respectively.

      https://github.com/sharkdp/bat

      https://github.com/ogham/exa

    • 1vuio0pswjnm7 2 years ago

      https://github.com/chmln/sd: "sd uses regex syntax that you already know from JavaScript and Python. Forget about dealing with quirks of sed or awk - get productive immediately."

      It would be interesting to test the ~1.5GB of JSON the author uses for the benchmark against sed, but there are no details on how many files nor what those files contain.

      When trying something relatively small and simple, sd appears to be slower than sed. It also appears to require more memory. Maybe others will have different results.

         sh # using dash not bash
         echo j > 1
         time sed s/j/k/ 1
         time -p sed s/j/k/ 1
         time sd j k 1
         time -p sd j k 1
      
      Opposite problem from the sd author for me. For system tasks, I'm more familiar with the faster sed and awk than with the slower Python and JavaScript, so I wish that Python and JavaScript regex looked more like sed and awk, i.e., BRE and occasionally ERE. Someone in the NetBSD core group once wrote a find(1) alternative that had a C-like syntax, similar to how awk uses a C-like syntax. Makes sense, because C is the systems language for UNIX. Among other things, most of the system utilities are written in it. If the user knows C then she can read the system source and modify/repair the system where necessary, so it is beneficial to become familiar with it. Is anyone writing system utility alternatives in Rust that use a Rust-like syntax?

    • WalterGR 2 years ago

      ncdu is amazing. I foolishly spent way too much time trying to massage du's output into something human-friendly.

    • pantsforbirds 2 years ago

      sd is my favorite of the newish command line tools. It's super fast and I like the syntax a lot.

      • kbd 2 years ago

        Agree, I've started replacing my `perl -pe s/.../.../g`s with `sd`. It seems it's actually slightly faster than the equivalent Perl for the same substitutions (which it should be since it does less).

  • zokier 2 years ago

    It is somewhat notable that rg and fd differ significantly: rg is an almost perfect superset of grep in terms of features (some might be behind different flags etc), but fd explicitly has a narrower feature set than find.

    • burntsushi 2 years ago

      Yeah, this was very intentional. Because this is HN, I'll say some things that greps usually support that ripgrep doesn't:

      1) greps support POSIX-compatible regexes, which come in two flavors: BREs and EREs. BREs permit back-references and have different escaping rules that tend to be convenient in some cases. For example, in BREs, '+' is just a literal plus-sign but '\+' is a regex meta character that means "match one or more times." In EREs, the meanings are flipped. POSIX-compatible regexes also use "leftmost longest" whereas ripgrep uses "leftmost first." For example, 'sam|samwise' will match 'sam' in 'samwise' in "leftmost first," but will match 'samwise' in "leftmost longest." (There's a quick illustration at the end of this comment.)

      2) greps have POSIX locale support. ripgrep intentionally just has broad Unicode support and ignores POSIX locales completely.

      3) ripgrep doesn't have "equivalence classes." For example, `echo 'pokémon' | grep 'pok[[=e=]]mon'` matches.

      4) grep conforms to a standard---POSIX---whereas ripgrep doesn't. That means you can (in theory) have multiple distinct implementations that all behave the same. (Although, in practice, this is somewhat rare because some implementations add a lot of extra features and it's not always obvious when you use something that is beyond what POSIX itself strictly supports.)

      I think that probably covers it, although this is all off the cuff. I might be forgetting something. I suppose the main other things are some flag incompatibilities. For example, grep has '-h' as short for '--no-filename'. Also, since ripgrep does recursive search by default, there are no -r/-R flags. Instead, -r does replacements and -R is unused. -L is used for following symlinks (like 'find').
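
      To make the "leftmost longest" vs. "leftmost first" point in (1) concrete, here's a quick sketch (assuming GNU grep and a recent ripgrep; -o prints only the matching part):

          $ echo samwise | grep -oE 'sam|samwise'
          samwise
          $ echo samwise | rg -o 'sam|samwise'
          sam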

      • wander_homer 2 years ago

        > 2) greps have POSIX locale support. ripgrep intentionally just has broad Unicode support and ignores POSIX locales completely.

        Does this mean that there's no support for language specific case mappings (e.g. iİ and ıI in Turkic)?

        • burntsushi 2 years ago

          Correct. ripgrep only has Level 1 UTS#18 support: https://unicode.org/reports/tr18/#Simple_Loose_Matches

          This document outlines Unicode support more precisely for ripgrep's underlying regex engine: https://github.com/rust-lang/regex/blob/master/UNICODE.md

          • wander_homer 2 years ago

            Thx! Is there a specific reason for the lack of that feature or was this just not implemented yet?

            • burntsushi 2 years ago

              I've added this to the ripgrep Q&A discussion board: https://github.com/BurntSushi/ripgrep/discussions/2221 --- Thanks for the good question!

              The specific reason is hard to articulate precisely, but it basically boils down to "difficult to implement." The UTS#18 spec is a tortured document. I think it's better that it exists than not, but if you look at its history, it's undergone quite a bit of evolution. For example, there used to be a "level 3" of UTS#18, but it was retracted: https://unicode.org/reports/tr18/#Tailored_Support

              And to be clear, in order to implement the Turkish dotless 'i' stuff correctly, your implementation needs to have that "level 3" support for custom tailoring based on locale. So you could actually elevate your question to the Unicode consortium itself.

              I'm not plugged into the Unicode consortium and its decision making process, but based on what I've read and my experience implementing regex engines, the answer to your question is reasonably simple: it is difficult to implement.

              ripgrep doesn't even have "level 2" support in its regex engine, nevermind a retracted "level 3" support for custom tailoring. And indeed, most regex engines don't bother with level 2 either. Hell, many don't bother with level 1. The specific reasoning boils down to difficulty in the implementation.

              OK OK, so what is this "difficulty"? The issue comes from how regex engines are implemented. And even that is hard to explain because regex engines are themselves split into two major ideas: unbounded backtracking regex engines that typically support oodles of features (think Perl and PCRE) and regex engines based on finite automata. (Hybrids exist too!) I personally don't know so much about the former, but know a lot about the latter. So that's what I'll speak to.

              Before the era of Unicode, most things just assumed ASCII and everything was byte oriented and things were glorious. If you wanted to implement a DFA, its alphabet just consisted of the obvious: the 256 possible byte values. That means your transition table had states as rows and each possible byte value as columns. Depending on how big your state pointers are, even this is quite massive! (Assuming state pointers are the size of an actual pointer, then on x86_64 targets, just 10 states would use 10x256x8=~20KB of memory. Yikes.)

              But once Unicode came along, your regex engine really wants to know about codepoints. For example, what does '[^a]' match? Does it match any byte except for 'a'? Well, that would be just horrendous on UTF-8 encoded text, because it might give you a match in the middle of a codepoint. No, '[^a]' wants to match "every codepoint except for 'a'."

              So then you think: well, now your alphabet is just the set of all Unicode codepoints. Well, that's huge. What happens to your transition table size? It's intractable, so then you switch to a sparse representation, e.g., using a hashmap to map the current state and the current codepoint to the next state. Well... Owch. A hashmap lookup for every transition when previously it was just some simple arithmetic and a pointer dereference? You're looking at a huge slowdown. Too huge to be practical. So what do you do? Well, you build UTF-8 into your automaton itself. It makes the automaton bigger, but you retain your small alphabet size. Here, I'll show you. The first example is byte oriented while the second is Unicode aware:

                  $ regex-cli debug nfa thompson -b '(?-u)[^a]'
                  >000000: binary-union(2, 1)
                   000001: \x00-\xFF => 0
                  ^000002: capture(0) => 3
                   000003: sparse(\x00-` => 4, b-\xFF => 4)
                   000004: capture(1) => 5
                   000005: MATCH(0)
                  
                  $ regex-cli debug nfa thompson -b '[^a]'
                  >000000: binary-union(2, 1)
                   000001: \x00-\xFF => 0
                  ^000002: capture(0) => 10
                   000003: \x80-\xBF => 11
                   000004: \xA0-\xBF => 3
                   000005: \x80-\xBF => 3
                   000006: \x80-\x9F => 3
                   000007: \x90-\xBF => 5
                   000008: \x80-\xBF => 5
                   000009: \x80-\x8F => 5
                   000010: sparse(\x00-` => 11, b-\x7F => 11, \xC2-\xDF => 3, \xE0 => 4, \xE1-\xEC => 5, \xED => 6, \xEE-\xEF => 5, \xF0 => 7, \xF1-\xF3 => 8, \xF4 => 9)
                   000011: capture(1) => 12
                   000012: MATCH(0)
              
              This doesn't look like a huge increase in complexity, but that's only because '[^a]' is simple. Try using something like '\w' and you need hundreds of states.

              But that's just codepoints. UTS#18 level 2 support requires "full" case folding, which includes the possibility of some codepoints mapping to multiple codepoints when doing caseless matching. For example, 'ß' should match 'SS', but the latter is two codepoints, not one. So that is considered part of "full" case folding. "simple" case folding, which is all that is required by UTS#18 level 1, limits itself to caseless matching for codepoints that are 1-to-1. That is, codepoints whose case folding maps to exactly one other codepoint. UTS#18 even talks about this[1], and that specifically, it is difficult for regex engines to support. Hell, it looks like even "full" case folding has been retracted from "level 2" support.[2]

              The reason why "full" case folding is difficult is because regex engine designs are oriented around "codepoint" as the logical units on which to match. If "full" case folding were permitted, that would mean, for example, that '(?i)[^a]' would actually be able to match more than one codepoint. This turns out to be exceptionally difficult to implement, at least in finite automata based regex engines.

              Now, I don't believe the Turkish dotless-i problem involves multiple codepoints, but it does require custom tailoring. And that means the regex engine would need to be parameterized over a locale. AFAIK, the only regex engines that even attempt this are POSIX and maybe ICU's regex engine. Otherwise, any custom tailoring that's needed is left up to the application.

              The bottom line is that custom tailoring and "full" case matching don't tend to matter enough to be worth implementing correctly in most regex engines. Usually the application can work around it if they care enough. For example, the application could replace dotless-i/dotted-I with dotted-i/dotless-I before running a regex query.

              The same thing applies for normalization.[3] Regex engines never (I'm not aware of any that do) take Unicode normal forms into account. Instead, the application needs to handle that sort of stuff. So nevermind Turkish special cases, you might not find a 'é' when you search for an 'é':

                  $ echo 'é' | rg 'é'
                  $ echo 'é' | grep 'é'
                  $
              
              Unicode is hard. Tooling is littered with footguns. Sometimes you just have to work to find them. The Turkish dotless-i just happens to be a fan favorite example.

              [1]: https://unicode.org/reports/tr18/#Simple_Loose_Matches

              [2]: https://www.unicode.org/reports/tr18/tr18-19.html#Default_Lo...

              [3]: https://unicode.org/reports/tr18/#Canonical_Equivalents

      • arjvik 2 years ago

        Is there a benefit to respecting locale and not just using Unicode?

        • thayne 2 years ago

          Probably only if you are on an old legacy system that is using an unusual encoding.

  • hsbauauvhabzb 2 years ago

    I had someone ask me (a self-described grep monkey) how I navigate grepping very long lines (minified js for example) to which I replied ‘lol I just ignore them’. I’d love ‘only select 200 chars if longer than 200 chars’, but to my knowledge there’s no easy way to do this with grep. I’d love to hear suggestions on how people navigate this.

    • karottenreibe 2 years ago

      My go-to is using -o and pre/appending .{100} to the pattern to capture however much context I need
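
      Roughly like this, as a sketch (the {0,100} variant also catches matches near the start or end of a line; 'needle' and 'minified.js' are just stand-ins):

          grep -oE '.{0,100}needle.{0,100}' minified.js
          # same idea with ripgrep
          rg -o '.{0,100}needle.{0,100}' minified.js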

    • burntsushi 2 years ago

      ripgrep has the -M option that will help here.
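
      For example, something like this (a sketch; 'pattern' is a stand-in):

          rg -M 200 --max-columns-preview pattern

      Matching lines longer than 200 bytes get elided or shown as a truncated preview instead of dumping the whole minified blob.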

  • MayeulC 2 years ago

    I tend to use `git grep` for that. Is ripgrep better in some way?

    • burntsushi 2 years ago

      It works well outside of git repos automatically. And can search across multiple git repos while respecting each repo's respective gitignores automatically. ripgrep also tends to be faster, although the absolute difference tends to be lower with 'git grep' than a simple 'grep -r', since 'git grep' does at least use parallelism.

      There are other reasons to prefer one over the other, but they are somewhat more minor.

      Here's one benchmark that shows a fairly substantial difference between ripgrep and git-grep and ugrep:

          $ locale
          LANG=en_US.UTF-8
          LC_CTYPE="en_US.UTF-8"
          LC_NUMERIC="en_US.UTF-8"
          LC_TIME="en_US.UTF-8"
          LC_COLLATE="en_US.UTF-8"
          LC_MONETARY="en_US.UTF-8"
          LC_MESSAGES="en_US.UTF-8"
          LC_PAPER="en_US.UTF-8"
          LC_NAME="en_US.UTF-8"
          LC_ADDRESS="en_US.UTF-8"
          LC_TELEPHONE="en_US.UTF-8"
          LC_MEASUREMENT="en_US.UTF-8"
          LC_IDENTIFICATION="en_US.UTF-8"
          LC_ALL=
          $ git rev-parse HEAD
          3b5e1590a26713a8c76896f0f1b99f52ec24e72f
          $ git remote -v
          origin  git@github.com:torvalds/linux (fetch)
          origin  git@github.com:torvalds/linux (push)
      
          $ time rg '\w{42}' | wc -l
          1957843
      
          real    0.706
          user    7.110
          sys     0.462
          maxmem  300 MB
          faults  0
      
          $ time git grep -E '\w{42}' | wc -l
          1957843
      
          real    7.678
          user    1:49.03
          sys     0.729
          maxmem  411 MB
          faults  0
      
          $ time ugrep -r --binary-files=without-match --ignore-files '\w{42}' | wc -l
          1957841
      
          real    10.570
          user    46.980
          sys     0.502
          maxmem  344 MB
          faults  0
      
          $ time ag '\w{42}' | wc -l
          1957806
      
          real    3.423
          user    8.288
          sys     0.695
          maxmem  79 MB
          faults  0
      
          $ time grep -E -r '\w{42}' ./ | wc -l
          grep: ./.git/objects/pack/pack-c708bab866afaadf8b5da7b741e6759169a641b4.pack: binary file matches
          grep: ./.git/index: binary file matches
          1957843
      
          real    47.441
          user    47.137
          sys     0.290
          maxmem  4 MB
          faults  0
      
      The GNU grep comparison is somewhat unfair because it's searching a whole lot more than the other tools. (Although notice that there are no additional matches outside of binary files.) But it's a good baseline and also demonstrates the experience that a lot of folks have: most just tend to compare a "smarter" grep with the "obvious" grep invocation and see that it's an order of magnitude faster.

      It's also interesting that all tools agree on match counts except for ugrep and ag. ag at least doesn't have any kind of Unicode support, so that probably explains that. (Don't have time to track down the discrepancy with ugrep to see who is to blame.)

      And if you do want to search literally everything, ripgrep can do that too. Just add '-uuu':

          $ time rg -uuu '\w{42}' | wc -l
          1957845
      
          real    1.288
          user    8.048
          sys     0.487
          maxmem  277 MB
          faults  0
      
      And it still does it better than GNU grep. And yes, this is with Unicode support enabled. If you disable it, you get fewer matches and the search time improves. (GNU grep gets faster too.)

          $ time rg -uuu '(?-u)\w{42}' | wc -l
          1957810
      
          real    0.235
          user    1.662
          sys     0.374
          maxmem  173 MB
          faults  0
      
          $ time LC_ALL=C grep -E -r '\w{42}' ./ | wc -l
          grep: ./.git/objects/pack/pack-c708bab866afaadf8b5da7b741e6759169a641b4.pack: binary file matches
          grep: ./.git/index: binary file matches
          1957808
      
          real    2.636
          user    2.362
          sys     0.269
          maxmem  4 MB
          faults  0
      
      Now, to be fair, '\w{42}' is a tricky regex. Searching something like a literal brings all tools down into a range where they are quite comparable:

          $ time rg ZQZQZQZQZQ | wc -l
          0
      
          real    0.073
          user    0.358
          sys     0.364
          maxmem  11 MB
          faults  0
          $ time git grep ZQZQZQZQZQ | wc -l
          0
      
          real    0.206
          user    0.291
          sys     1.014
          maxmem  134 MB
          faults  1
          $ time ugrep -r --binary-files=without-match --ignore-files ZQZQZQZQZQ | wc -l
          0
      
          real    0.199
          user    0.847
          sys     0.743
          maxmem  7 MB
          faults  16
      
      I realize this is beyond the scope of what you asked, but eh, I had fun.
  • melony 2 years ago

    How fast is magic wormhole? In my experience most of the new(er) file transfer apps based on WebRTC are just barely faster than Bluetooth and are unable to saturate the bandwidth. I am not sure if the bottleneck is in the WebRTC stack or whether there is something fundamentally wrong about the protocol itself.

    • tialaramex 2 years ago

      All magic wormhole is doing is agreeing a key, and then moving the encrypted data over TCP between sender and recipient.

      So for a non-trivial file this is in principle subject to the same performance considerations as any other file transfer over TCP.

      For a very tiny file, you'll be dominated by the overhead of the setup.

  • pmoriarty 2 years ago

    Why use ripgrep over silver searcher?

  • jedisct1 2 years ago

    If you're still using ripgrep, check out ugrep.

    Very fast, TUI, fuzzy matching, and actively maintained.

    • aftbit 2 years ago

      ripgrep is not maintained anymore? that was fast...

      • burntsushi 2 years ago

        I'm the maintainer of ripgrep and it is actively maintained.

        • a_wild_dandan 2 years ago

          Well that was a quick rollercoaster of emotions. Thanks for all that you do.

    • winrid 2 years ago

      ripgrep isn't maintained now? That was fast :)

      Or is it just done :)

      • ducktective 2 years ago

        `rg` is maintained. Last commit was 9 days ago by the creator himself.

whartung 2 years ago

The singular habit I picked up way back in the day was to simply cope with what was available.

There's all sorts of utilities and such. Emacs was a grand example at the time as well. Lots of better mousetraps.

But when you bounce around to a lot of different machines, machines not necessarily in your control, "lowest common denominator" really starts to rear its ugly head.

The vast majority of my command line concoctions are burned into muscle memory.

Today, I think the baseline install of modern *nixes is richer than it was back in the day, but the maxim of working with what they have out of the box still applies.

  • michaelcampbell 2 years ago

    > The singular habit I picked up way back in the day was to simply cope with what was available.

    Nothing wrong with that. There are other ends of the spectrum of "make the things I do often as easy as I can", too. Both work.

    It reminds me of my father; when he got in a car that wasn't his, he would NOT CHANGE ANYTHING. You couldn't tell he was in it. Safety issues aside, there's an argument to be made to make it as comfortable for you as you can; seat, mirrors, environment, etc. to minimize any distractions.

    I see this too in my (software development) communities; some people like to tailor their development experience to the n'th degree to extract as much personal enjoyment/optimization/etc. out of it as they can. Others like to use whatever they're given and be happy with that. Both work.

    Myself, I type for a living so I like to use keyboards I like. I bring my own with me in my "go bag" for when I'm out so I don't have to use the (IMO!) crappy Apple Laptop KB's. I /can/ use it, I just choose not to. Other people either like them, or have learned they don't care enough. All good.

  • kristopolous 2 years ago

    Counterpoint, deciding not to put up with old things that are a pain in the ass is part of unseating the existing momentum so we can finally move on to better things.

    Knowing how to do things the annoying way doesn't mean that has to be the preferred way. Being open to retooling is part of staying relevant

    • mongol 2 years ago

      What I would like to see in that case is a "next gen utils" bundle that makes it likely you'll find a number of these tools together on future servers.

      • kristopolous 2 years ago

        Really the "right" way to go about it would be to employ an existing package manager (there are enough already, we don't need another) and some magic glue on top that makes it easy.

        For example, you have your configuration of packages, in the ephemeral cloud somewhere, and you do the really dangerous no good thing of piping through bash with some kind of uuid that's assigned to your account, something like (totally made up url)

            curl packman.info/users/aed1242faed60a | bash
        
        And it sniffs the architecture, version of binaries to install, which ones are there, and then puts it into an install directory configured by you.

        This is like 97% existing things with a little glue and interface polish so you can easily bring in an environment.

        There's certainly other ways but the idea remains the same

        • Frotag 2 years ago

          Yep, I've started keeping a github repo of install scripts / dotfiles / etc that basically amounts to the workflow you described.

      • jrm4 2 years ago

        YES. Something that isn't a completely new shell. I keep trying the fishes and z-shells of the world and I keep coming back to "my own hacked up bash" because of muscle memory on random edge case things.

  • PeterWhittaker 2 years ago

    That's why I stuck with vi and sh back in the day: I knew they were on every machine I might telnet to (this was before ssh, sigh).

    On machines I controlled, I mostly used ksh, but it wasn't available on all machines; I cannot remember if it was the SunOS boxes or the older HP-UX or the Apollos, but there were a few. (csh? Go away. No, just go.)

    Nowadays, vim and bash are everywhere I need them to be, even if I have to kludge around some version differences.

    My only real gripe about find is the awkwardness of pruning multiple hierarchies. After you've written

      find / -path /sys -prune -o -path /dev -prune -o ... -o -type f -exec grep -H WHATEVER {} \;
    
    a few times, and have returned to the previous command to add yet another

      -o -path ... -prune
    
    tuple, it gets a little old.

    But it works. Everywhere.

    (* that I need it)
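
    One way to trim the repetition a bit, still with plain find, is to group the prune paths once (a sketch, with /proc added as another example path):

      find / \( -path /sys -o -path /dev -o -path /proc \) -prune -o -type f -exec grep -H WHATEVER {} \;

    Still awkward, but each new exclusion is then just another "-o -path ..." inside the parens.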

  • jthrowsitaway 2 years ago

    This is fine and all, but there are also subtle differences in standard CLI tools depending on the implementation. I'm used to GNU coreutils, and butt heads with the macOS and BusyBox implementations.

  • mongol 2 years ago

    Yes, to get by with what is available is a useful trait. No matter how good these tools are, I will often arrive at a prompt where they are unavailable.

    • mekster 2 years ago

      Why don't you make them available in ~/bin and have a better shell life?

  • bradwood 2 years ago

    This.

    I have exa and rg and fd all installed but unlearning the find and grep muscle memory is hard.

    Occasionally I give the newer stuff a go and then end up stumbling over syntax differences and end up just going back to what I know.

    • burntsushi 2 years ago

      If you ever give ripgrep a go, stumble over something and are inclined to: please post a Discussion question[1]. "beginner" or "stupid" questions are welcome. If you can show me what you know works with grep, for example, and are curious about an equivalent rg command, that could be a good question. I might be able to give you some "fundamental" answers to it that let you reason about the tools more from first principles, but on your terms.

      [1] - https://github.com/BurntSushi/ripgrep/discussions

    • kaba0 2 years ago

      I aliased grep and find to their newer alternatives. Sure, the syntax will be off from time to time but due to muscle memory I couldn’t relearn the new tools otherwise.

  • mekster 2 years ago

    Can't you just install Homebrew on your home dir and stop being locked up with the ancient environment?

    Or you could just create a git repo with those executables and pull them to your machines?

  • easton 2 years ago

    I think about a month after I learned enough Vim to be dangerous, RHEL (8, I think) started shipping nano as the default editor. Ah well, now I can scroll with the home row on my local box.

  • hinkley 2 years ago

    I found out recently that I can't even count on

        du -sh * | sort -h
    
    to be portable. Got a batch of machines that don't support '-h'.
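
    A more portable fallback, as a sketch assuming only POSIX du and sort:

        du -sk * | sort -n

    You lose the human-readable units, but -k and a plain numeric sort work just about everywhere.
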
  • forgotmypw17 2 years ago

    Agree. For most situations find and then grep is good enough.

NateEag 2 years ago

I want to love fd - I'm a big believer in the idea that CLIs don't have to be scary, intimidating things (see normals using Slack with /commands and keyboard shortcuts), and find has a gigantic hairball of a UI.

The thing is, though, I know find well enough to not notice the terrible UI that much, and I know I can rely on it being everywhere. With fd that isn't true.

So it's hard for me to justify making the move.

Same thing happens with things like the fish and oil shells - I have little doubt their UX is better than Bash's, but Bash is pretty ubiquitous.

Emacs has this problem too, as an Emacs user. The UX is completely alien by current standards, but if you update the defaults you'll break a lot of people's existing configs.

How do you get around backwards compatibility / universality UX roadblocks like this?

  • eterps 2 years ago

    It just doesn't happen that often anymore that I need to ssh into a system.

    And my own systems have automatically synced dotfiles, making it mostly a non issue. (I'm using Syncthing for that)

    When writing scripts I usually fall back to traditional shells/commands for compatibility, unless I'm really sure I will be the only user.

    • NateEag 2 years ago

      I was more thinking of scripts, yeah, like you describe.

      Where I get hung up is, if I need to keep the traditional syntaxes in my head for scripting, why bother storing another one in my head for interactive use?

      ...that said, I do use ag and rg for other interactive tools, like cross-project search in Emacs.

  • rmetzler 2 years ago

    I know what you mean. I use fd and rg on my machine, but for scripts, Dockerfiles etc I tend to use find and grep, just because this is the „lingua franca“ of Unix/Linux.

    • burntsushi 2 years ago

      Same, and I'm the author of ripgrep! Unless the script is one that I wrote for myself in ~/bin, I use grep and find and standard tooling in shell scripts.

      The only exception is if there is a specific need for something ripgrep does. Usually it's for speed, but speed isn't always needed.

  • yepguy 2 years ago

    I think the solution that NixOS uses would work for Emacs too. Just define a single variable that declares which default values to use (in NixOS, it's system.stateVersion).

    Then packages (including internal packages) can update their defaults based on the version declared there. Basically a protocol version field but for your Emacs configuration.

    Distros probably need a different strategy for improving core utils, though.

  • hiepph 2 years ago

    Fish is awesome if, like me, you hate Bash's arcane syntax. It improves my scripting productivity by 100%.

    One rule of thumb though: use it only for personal use, and stick with it to see if it lives long enough. If you're working with the team, just use Bash.

  • mturmon 2 years ago

    Old grep is still muscle memory for me, and that’s what I use in scripts.

    But the newer greps are so much faster! I scoffed initially but after a couple of uses I was hooked. I try to install these new tools in my personal ~/bin on systems I spend much time using.

mprovost 2 years ago

It feels like we're in a third wave of innovation in Unix CLI tools. The first was from BSD in the late 70s/early 80s which considerably improved the original Unix utilities, then the GNU tools rewrite in the late 80s, then there were the dark ages of System V. I give ack (and Andy) credit for starting this latest wave around 2005 but it's really taken off lately with tools being rewritten in Rust and challenging the old status quo.

  • sbf501 2 years ago

    The only way for the 3rd wave to work is if multiple distros agree to adopt it. Since these tools don't even agree on an interface, IMHO it wouldn't be much different than what we have. I also don't like the fact that some tools natively skip things like what's in ".gitignore". I don't want a tool that does that by default. If there was a consortium to standardize a new *nix CLI, then maybe it could get some traction.

    • mprovost 2 years ago

      There was a consortium, and it did standardise Unix, and that's when everything stopped moving in the 90s. Standards are compromises and so all the commercial Unixes had to implement the lowest common denominator. Thankfully GNU didn't care, and the BSD tools were already better.

    • calvinmorrison 2 years ago

      Well, like zsh, and many other improvements on the shell, and gawk and other tooling that doesn't match other awk engines and so forth... you end up having two parallel realities. One for scripting, where you use the bare minimum that is acceptable to run on any server and is guaranteed to be there, and then your user env where you have all your fun tools.

      The cool part is network transparency, forwarding environments, and the other things that plan9 plays with, so that you can work locally or remotely.

    • jrm4 2 years ago

      I know it's asking for the world, but some way to do better "built in modularity" would be great. Like "whatever new shell" plus a standardized "plugin system."

eterps 2 years ago

I love fd, but somehow I always get tripped up on this:

  fd             # prints tree of current directory
  fd somedir/    # results in an error

  find           # prints tree of current directory
  find somedir/  # prints tree of somedir/
  • geodel 2 years ago

    Ah, same thing happens to me all the time. You are right, the second one seems more intuitive, and many tools, old and new, use this pattern.

  • Johnny555 2 years ago

    That seems unfortunate, since my shell's autocomplete puts that trailing slash in there by default.

    • eterps 2 years ago

      The trailing slash is not the issue here. With `fd` the first argument is the pattern to search for, not the starting point like in `find`.

      For example to find fonts.conf below /etc, with fd you would do:

        fd fonts.conf /etc
      
      And with find:

        find /etc -name fonts.conf
      
      In other words with find the first argument is always the starting point. And leaving it out implies the current directory as the starting point.
      • joveian 2 years ago

        There are --search-path and --base-path options, so if you alias say fd='fd --search-path' you can then have the required first argument be the path to search. Personally, I find changing to the directory I want to search in less annoying than typing out the directory to search (I know the options exist from script use).

      • mekster 2 years ago

        If it wasn't for find's muscle memory, fd has it right: you'd usually state what you want to do on the CLI first and then list an arbitrary number of targets at the end.

ibejoeb 2 years ago

I like the contemporary alternatives to the classics. They make a lot of things so much easier.

I have a little mental block, though. It's related to the realities of the stuff I work on. Since I find myself logged into other people's systems, keeping the old, standard tools hot in my head really does take some of the load off. It's a pretty common refrain, but it's real and practical when you've got embedded systems, bsds, linuxes, macs, etc. Even the difference between gnu and mac is clunky when I don't practice enough.

For the same reason, with the notable exception of git, I use practically no aliases.

If I could invent a product, maybe it would be one that enables effectively "forwarding" CLIs to a remote host shell.

  • robohoe 2 years ago

    I'm with you. I find that I feel like I know Linux like the back of my hand because I can fluidly interface with the stock tools with ease. These new tools are great but I just don't see them widely spread across many remote systems that I manage. Just managing those packages across a fleet sounds like a pain in the ass.

    • dagw 2 years ago

      > These new tools are great but I just don't see them widely spread across many remote systems

      I had smart, experienced people tell me not to waste my time using the GNU tools for this exact reason back in the day.

      • mprovost 2 years ago

        Nothing like logging into a freshly installed Solaris system and having to configure it using Bourne shell, which didn't have job control or history. At least it had vi. Usually the first thing you would do is get enough networking going to download bash and the GNU tools. But there were always some old timers around who wanted to haze the youngsters by forcing you to do everything with "native" tools.

    • mekster 2 years ago

      > Just managing those packages across a fleet sounds like a pain in the ass.

      Use Homebrew?

  • nowahe 2 years ago

    I spend a lot of time remoting into fresh *nix systems, so I also almost don't have any aliases, with one notable exception : ll (aliased to ls -lah).

    It's just so engrained into my muscle memory that I do it without thinking about it most of the time.

    And the workaround I found for it is adding a macro on my keyboard (through QMK but can be done with anything) that just types out 'alias ll="ls -lah"\n'.

  • vehementi 2 years ago

    Yeah, same here. I am doing a good amount of ops/SRE stuff these days while supporting my services and find myself ssh'ing into:

    - very locked down bastions
    - hosts through a secure remote access VM thing that makes file transfer difficult
    - random docker containers in EKS (often through both of the above)

    Getting good at the basic tools is just unavoidable. I find myself manually typing `alias k=kubectl` a lot though :p

wodenokoto 2 years ago

I find `find` so difficult to use that I usually do `find . | grep pattern` when I need to search for a file name.

  • b3morales 2 years ago

    Even better, try fzf instead: do `find . | fzf`

    • petepete 2 years ago

      Isn't that the equivalent of just `fzf`?

      • b3morales 2 years ago

        It depends on `FZF_DEFAULT_COMMAND`, but yes, fair point

  • dankle 2 years ago

    Same. I have an `alias fgr="find|grep"`. Stupid but gets the job done.

  • xtracto 2 years ago

    Oh, I thought I was kind of dumb for doing that :) happy to see that it's normal haha. I always find myself trying to remember: is it one dash? Is it "name" or "iname" or whatnot... piping to grep is way easier.

  • g4zj 2 years ago

    What do you find difficult about using `find . -regex pattern`?

  • lazyweb 2 years ago

    I just use "grep -R", sometimes with "-i" as well. Works for me 99% of the time.

  • 7373737373 2 years ago

    It amazes me how long it took for alternatives like fd to emerge

    The old ones must have caused years of wasted time

lumb63 2 years ago

Maybe I’m just old fashioned but all these new command line utilities strike me as solutions in search of a problem.

Standard ‘find’ works great. It finds files. It can filter by any criteria I have ever had to look for and the syntax seems very intuitive to me (maybe I am just used to it). It is flexible and powerful.

I’d love to be told I’m wrong, because I feel like I’m missing something.

  • burntsushi 2 years ago

    The "new CLI utilities" aren't solving new problems. They are solving old problems with a different user experience, driven by changes in how folks do development (particularly the scales).

    Notice that I said "different" UX and not "better" UX. Reasonable people can disagree about whether the newer tools have better UX as an objective fact. Many folks very rightly find it very surprising that these tools will skip over things automatically by default. What is true, however, is that there are a lot of people who do see the UX of these tools as better for them.

    As the author of ripgrep, I hear the same thing over and over again: ripgrep replaced their ~/bin grep wrapper with a bunch of --exclude rules that filtered out files they didn't want to search. Because if you don't do that, a simple 'grep -r' can take a really fucking long time to run if you're working on a big project. And guess what: a lot of programmers these days work on big projects. (See previous comment about changes in scale.) So you don't really have a choice: you either write a wrapper around grep so that it doesn't take forever, or you use a "smarter" tool that utilizes existing information (gitignore) to effectively do that for you. That smarter tool typically comes with other improvements, because your simple "grep wrapper" probably doesn't use all of the cores on your machine. It could, but it probably doesn't. So when you switch over, it's like fucking magic: it does what you wanted and it does it better than your wrapper. Boom. Throw away that wrapper and you're good to go. That's what I did anyway. (I had several grep wrappers before I wrote ripgrep.)
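
    The kind of wrapper I mean looked roughly like this (a hypothetical reconstruction, assuming GNU grep's --exclude flags):

        #!/bin/sh
        # ~/bin/cgrep: recursive grep that skips the usual junk
        exec grep -rn \
          --exclude-dir=.git --exclude-dir=node_modules --exclude-dir=target \
          --exclude='*.min.js' --exclude='*.map' \
          "$@"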

    Every time these tools are discussed here, people say the same thing: "I don't see the point." It's a failure of imagination to look beyond your own use cases. If all you ever work on are smaller projects, then you're never going to care about the perf difference between ripgrep and grep. ripgrep has other improvements/changes, but nearly all of them are more niche than the UX changes and the perf improvements.

    • benhoyt 2 years ago

      Definitely agree. I really like that ripgrep is fast, but I mainly use it for its better UX for what I do every day: search code. If ripgrep wasn't any faster than grep, but was still recursive by default and ignored .gitignore et al, it'd still be worth using for me. (In fact, I used to use "ack", which is basically ripgrep but slow. :-)

  • MichaelDickens 2 years ago

    'find' is slow to the point of being useless for me. It can never find the file I'm looking for before I give up waiting for it to finish running. So I'm excited if fd can provide the same functionality but run much faster.

    • pphysch 2 years ago

      Are you inadvertently hitting a big .git/ with every find?

  • cheeze 2 years ago

    Call me naive/dumb/whatever, but I can't stand find's interface. Even things as simple as -depth rather than --depth bother me

smoothgrammer 2 years ago

One thing to remember is that these fun utils won't exist on production servers. You also don't want them there for obvious reasons. I find it better to use the most commonly available set of unix tools and I end up being far more effective due to that.

  • unnouinceput 2 years ago

    What is the obvious reason? Security? The guy provides sources and it took me 20 minutes to see what's doing in there. I'd definitely put this on production server, after testing intensively of course.

    • butwhywhyoh 2 years ago

      How frequently are you and your team willing to spend that 20 minutes? Are you confident you evaluated it precisely? Are you always going to install the latest release? What do you do as the codebase grows and that 20 minutes turns into an hour?

      • unnouinceput 2 years ago

        Did you know there are businesses out there that are still using programs made in the DOS era (especially those made in FoxPro) and that work perfectly? Or have you taken a look at ATMs and seen that the majority still have that Windows XP look'n'feel? Not everybody needs the latest and greatest, you know?

        In that spirit, let's see your questions:

        1 - Once in a lifetime; 2 - Yes; 3 - No; 4 - Nothing, see the 1st answer.

digisign 2 years ago

Hmm, I already have a shell alias that does 90% of this. Doesn't parse .gitignore, but it's not a big problem for me. If it was I'd do `make clean` in the project.

This is always installed and ready to go on any box I have my dotfiles.

I suppose that is why these perfectly-good improvements have a hard time getting traction: the older stuff is still so flexible.

h_an_smei3 2 years ago

Those are a lot of dependencies for such a simple tool. I'm a Rust user myself, but some of those dependencies really should be part of a good standard lib. Actually, the NPM-like ecosystem is my biggest pain point with Rust.

    [dependencies]
    ansi_term = "0.12"
    atty = "0.2"
    ignore = "0.4.3"
    num_cpus = "1.13"
    regex = "1.5.5"
    regex-syntax = "0.6"
    ctrlc = "3.2"
    humantime = "2.1"
    lscolors = "0.9"
    globset = "0.4"
    anyhow = "1.0"
    dirs-next = "2.0"
    normpath = "0.3.2"
    chrono = "0.4"
    once_cell = "1.10.0"

    [dependencies.clap]
    version = "3.1"
    features = ["suggestions", "color", "wrap_help", "cargo", "unstable-grouped"]

reactjavascript 2 years ago

I just want my entire file system stored in Sqlite so I can query it myself

  • samatman 2 years ago

    Oh hey look what I just found https://github.com/narumatt/sqlitefs

    • reactjavascript 2 years ago

      >> "mount a sqlite database file as a normal filesystem"

      I'm looking to do the opposite. Given a location, recurse it, stat() each file and create the database.
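
      Something along these lines gets most of the way there (a sketch assuming GNU find and a reasonably recent sqlite3 CLI; it breaks on paths containing tabs or newlines):

          # dump path, size and mtime for every file as tab-separated rows
          find . -type f -printf '%p\t%s\t%T@\n' > /tmp/files.tsv
          # load it into a table
          sqlite3 files.db 'CREATE TABLE IF NOT EXISTS files(path TEXT, bytes INTEGER, mtime REAL);'
          printf '.mode tabs\n.import /tmp/files.tsv files\n' | sqlite3 files.db
          # then query it however you like
          sqlite3 files.db 'SELECT path FROM files ORDER BY bytes DESC LIMIT 10;'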

      • neura 2 years ago

        If you happen to be using macOS, there's mdfind, which uses the spotlight database which is always kept up to date, unlike locate/updatedb, where updatedb is expensive to run, even if you've run it recently.

        I have yet to find a good solution for linux CLI... something that uses an internal database that is kept up to date with all directory structure changes.

        Maybe someone else has seen something cool for this? :D

      • samatman 2 years ago

        It's still not clear to me whether you want the files inside the database or just the metadata. The stat part suggests the latter?

        I guess I can see that, but now you have cache invalidation (someone else linked to Spotlight, which does this as a background process). SQLite files can be larger than any physical medium you can purchase so why not go the distance?

  • mi_lk 2 years ago

    updatedb/locate?

    • neura 2 years ago

      updatedb is expensive to run and is usually run by a cron job, whereas something like mdfind on macos uses spotlight, which is kept up to date with all filesystem changes (or at least reasonably fast).

      Anybody know of something like that for linux?

  • jeffbee 2 years ago

    Sounds like macOS would be right up your alley then.

    • reactjavascript 2 years ago

      I have MacOS, where is the sqlite file?

      • jeffbee 2 years ago

        /private/var/db/Spotlight-V100/...

      • ashton314 2 years ago

        Is the parent post referring to `mdfind` perhaps?

        • neura 2 years ago

          Believe so.

tragomaskhalos 2 years ago

I use find all the time, but it is such a strange beast - it's as if there were a meeting among all the standard Unix utilities on look and feel and find missed the memo. But it's ubiquitous and I'm too old to change horses now anyway.

jonnycomputer 2 years ago

Why would the tool want to ignore patterns in a .gitignore? It isn't a git tool...

  • joemi 2 years ago

    I don't like it when non-git tools do that by default. I'm sure it's nice for some people, so make it an option that could be enabled. But to have that behavior by default feels far too opinionated for me.

    • neura 2 years ago

      People generally develop tools for their own needs. I'm not saying "hey, if you want one that uses the defaults you believe are best, create your own", but just keep some empathy for open source developers. :)

  • sfink 2 years ago

    Generated files. If you're searching for every place something shows up so you can change it, it's annoying to have to filter out the stuff that'll get updated automatically (and will mess up timestamps if you update manually.)

    Also, backup files. The fewer irrelevant results, the more useful a tool is.

    • jonnycomputer 2 years ago

      I'm not asking why someone would find it useful.

      • qot 2 years ago

        Yes you are. We are talking about a tool, so you are asking why someone would "find it useful".

      • sfink 2 years ago

        The tool would want to use .gitignore files because it's useful.

        Also, it's far from the only non-VCS tool that uses VCS ignore files.

            rsync --cvs-exclude
            tar --exclude-vcs
        • jonnycomputer 2 years ago

          Again, I'm not asking why someone would find it useful.

          • sodality2 2 years ago

            > Why would the tool want to ignore patterns in a .gitignore?

            > The tool would want to use .gitignore files because it's useful.

            • jonnycomputer 2 years ago

              Sometimes a question isn't a question.

  • neura 2 years ago

    To answer your question directly: If you're grepping through source code, you generally do not care about caches, backups, personal workspace configs, node_modules, "compiled" output (JS projects generally "compile" to JS and if you're looking for something in source files, you probably do not care about the bundles you're outputting, just the source), etc. You generally care about your source code, which is the stuff that's not getting added to .gitignore.

    Since we're talking about an open source project, try to keep some empathy for developers that are solving their own problems first and not trying to solve everybody's problems. It's a configurable option, anyway.

  • EdSchouten 2 years ago

    Exactly. And if you wanted a Git aware tool, you could just run 'git ls-files'.

ungawatkt 2 years ago

Hey, fd. I don't use it normally, but it ended up being the easiest and fastest tool for me to delete ~100 million empty files off my disk (a weird situation). It has threading and command execution built in, so I could saturate my cpu pretty easily doing the deletes with `fd -tf --threads=64 --exec r m {}` (I put the space in the rm command on purpose for posting).

  • donio 2 years ago

    find with its built-in -delete action would have avoided executing the external rm command millions of times.
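
    Something like this (a sketch, assuming GNU or BSD find, since -empty and -delete aren't POSIX):

        find . -type f -empty -delete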

    • ungawatkt 2 years ago

      well, somehow I missed that when looking for options. I just tried it out, the -delete option is way faster than what I posted before, TIL, thanks.

spudlyo 2 years ago

I don't use `fd` on the command line because I have very ingrained `find` muscle memory, but it's really made using `projectile-find-file` in Emacs totally usable with the huge monorepos I deal with at work. The same goes for `rg`, I love using it with `consult-ripgrep` in Emacs for searching through mountains of code.

  • bradwood 2 years ago

    Yup. Integrations with vim are really helpful here. But on the raw CLI it's tough to start unlearning the muscle memory

colpabar 2 years ago

> The command name is 50% shorter* than find :-).

I love this but if enough new tools keep doing this I might have to change some of my bash aliases :(

pdimitar 2 years ago

I use fd and rg a lot, integrated them with my scripts and even have some of them bound to keys.

Insanely good and fast programs. Zero regrets.

For a fuzzy finder I recently replaced fzf with peco. I like it better, it's very customizable.

elromulous 2 years ago

I'm seriously curious, is this the first time this link is being submitted?

Frequently, I try to submit a link, and it shows up as having been submitted. And I'm quite certain a tool as popular as fd has been featured on HN before. So either this particular link has somehow never been submitted (doubtful), HN allows resubmitting a link after some amount of time, or the link resubmit prevention logic doesn't apply to certain users.

broses 2 years ago

Usually when I can't be bothered to remind myself of the syntax for find, my go-to these days is `echo **/*pattern*`. Of course, this is mainly just for small searches.
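
One caveat for that approach (assuming bash; zsh has ** on by default): recursive globbing needs globstar turned on, and printf is a bit safer than echo for odd filenames:

    shopt -s globstar
    printf '%s\n' **/*pattern*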

massysett 2 years ago

I would love a `find` with reverse polish notation, also known as postfix notation. Something like:

    find . --name this --name that --or

or, for more complex:

    find . --name this --name that --or --modified 2022-05-20 --and

I have some little personal CRUD apps and this sort of postfix notation works very well for them.

I could write something like this but haven't gotten motivated to do it though.

unnouinceput 2 years ago

Windows test comparison on my machine - for finding all .jpg in my D: drive

1 - classic command prompt: "D:\>dir /s /b *.jpg > 2.txt" - time 5 seconds (4581 files)

2 - this little gizmo: "D:\>fd -e jpg > 1.txt" - time 1 second (same 4581 files)

Conclusion: I have a new tool dropped in my System32 folder from now on. Thank you David Peter

ghostly_s 2 years ago

This didn't really sound like something I need until I got to the `{.}` syntax, which solves a problem I was just trying and failing to solve with gnu find ten minutes ago (namely that there seems to be no convenient way to use the match minus its extension in an exec statement, the way bash's `${file%.*}` syntax does).
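
For anyone curious, the difference looks roughly like this (a sketch; ffmpeg is just a stand-in command, and the find variant leans on a subshell for the parameter expansion):

    # fd: {} is the match, {.} is the match without its extension
    fd -e flac --exec ffmpeg -i {} {.}.ogg

    # rough find equivalent
    find . -name '*.flac' -exec sh -c 'ffmpeg -i "$1" "${1%.flac}.ogg"' _ {} \;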

HeadlessChild 2 years ago

Well this is saving me a ton of time, as I'm basically migrating UID ownership of files for NFS shares that date back to the '90s. According to my rough benchmarks between find and fd, fd is ~3 times faster.

h_an_smei3 2 years ago

Haven't there been unresolved security issues with chrono, a crate this one depends on?

Chrono hasn't been updated for almost 2 years. Is the issue resolved or is there a security risk in using fd?

yubiox 2 years ago

How does it run faster than find? Can I manually implement that speedup using standard unix tools? I need to run find a lot on many machines I don't have access to install anything on.

  • petepete 2 years ago

    It's faster than find in that I don't need to read the manpage every time I use it.

  • burntsushi 2 years ago

    You can speed up grep by using 'xargs' or 'parallel' because searching tends to be the bottleneck.
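
    Something in this spirit, as a sketch (assuming GNU find, xargs and grep; -print0/-0 keep odd filenames intact, and 'pattern' is a stand-in):

        find . -type f -print0 | xargs -0 -P "$(nproc)" -n 100 grep -H 'pattern'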

    But for 'find', the bottleneck tends to be directory traversal itself. It's hard to speed that up outside of the tool. fd's directory traversal is itself parallelized.

    The other reason why 'fd' might be faster than a similar 'find' command is that 'fd' respects your gitignore rules automatically and skips hidden files/directories. You could approximate that with 'find' by porting your gitignore rules to 'find' filters. You could also say that this is comparing apples-to-oranges, which is true, but only from the perspective of comparing equivalent workloads. From the perspective of the user experience, it's absolutely a valid comparison.

amelius 2 years ago

I want a user friendly alternative to find's companion xargs.

HeadlessChild 2 years ago

Would it make sense to create a utils package with all these new rust based utilities? Something like "rustutils" with a resemblance to "coreutils".

lbrito 2 years ago

There are some Unix tools I never get around to memorizing the syntax of, and always end up searching "how to"s. find is definitively one of them.

racl101 2 years ago

Find is a powerful command but one for which it is hard to find good examples beyond the basics.

So I'm glad for these new kinds of CLI tools.

pseudostem 2 years ago

I don't like find. Nor do I like vi (not vim, vi) and/or maybe others. I don't wish to stamp on someone's parade who's more accomplished than I am. But I think these "new" tools miss the point.

I use vi because I know it exists on every(?) system ever. It's not like I go out of my way seeking vi. I feel it's similar for find. It works. It works well. It works the same on all systems I work on.

Would I go out of my way to install fd on my system? Probably not.

  • burntsushi 2 years ago

    They don't miss the point. We're well aware they aren't ubiquitous and that is indeed one of their costs.[1]

    If the old tools are working well for you, then keep using them! I used plain grep for well over a decade before writing ripgrep. Hell, sometimes I still use grep for precisely the reason you describe: it is ubiquitous.

    Also, not every grep behaves the same. Not even close. Unless you're being paranoid about how you use grep, it's likely you've used some feature that isn't in POSIX and thus isn't portable.

    Ubiquity and portability aren't "the point." Ubiquity is a benefit and portability can be a benefit or a cost, depending on how you look at it.

    [1] - https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#pos...

    • pseudostem 2 years ago

      I should have phrased this as a question, instead of being dismissively declarative.

      >If, upon hearing that "ripgrep can replace grep," you actually hear, "ripgrep can be used in every instance grep can be used, in exactly the same way, for the same use cases, with exactly the same bug-for-bug behavior," then no, ripgrep trivially cannot replace grep. Moreover, ripgrep will never replace grep. If, upon hearing that "ripgrep can replace grep," you actually hear, "ripgrep can replace grep in some cases and not in other use cases," then yes, that is indeed true!

      I think this statement says it all.

      • burntsushi 2 years ago

        Yes, it's a persistent misunderstanding because communication is hard and folks aren't always exactly precise. It is very common to hear from someone, "ripgrep has replaced grep for me." You might even hear people state it more objectively, like, "ripgrep is a grep replacement." The problem is that the word "replace" or "replacement" means different things to different people. So that FAQ item was meant to tease those meanings apart.

krnlpnc 2 years ago

Does fd solve the annoying “the order of the command line arguments matters a ton” approach that find uses?

Symmetry 2 years ago

Oh, nice. Currently I just have my shell alias `fn` for `find -name $argv`, but this looks cool.

0x00101010 2 years ago

And another one, because I've read "ncdu" so much in this thread: dua is very nice as well.

0x00101010 2 years ago

Because no one has mentioned it: procs. I love it as a ps/top replacement.

marbex7 2 years ago

find is one of those tools that has me back at square one every time I want to do something non-trivial. Looking forward to giving this a shot.