zeroxfe 19 days ago

Okay, so when I worked at Sony about 25 years ago, I got assigned this project to fix our order management system, which was extremely slow and kept crashing.

I jumped in and started digging around, and to my horror, the OMS was a giant set of shell scripts running on an AIX server, which had evolved over a decade and then been abandoned. It was over 50,000 lines of code! It was horrendous and shit kept timing out everywhere -- orders, payments, and other information were moved from server to server over FTP, parsed with complicated sed/awk, and inventory was tracked in text files (also FTP'd around).

At the time, Perl seemed like the most practical way for me to migrate the mess -- I rewrote all of the shell piece by piece, starting with the simplest pieces and replacing them with small Perl modules as part of a larger Perl application, refactoring along the way. It took me 3 months and I moved the whole thing to about 5,000 lines of Perl, and it ran 10-100x faster with almost none of the failures of the original system.

As terrible as it was, it's one of the most satisfying things I've ever done. :-)

  • martin-t 19 days ago

    Just 3 months?

    That's deleting 800 lines a day, each day. Did you need to read through the original code, get a deep understanding and match its behavior exactly or did you throw away huge chunks and write new code as you thought it should behave?

    Was there a lot of boilerplate that could be replaced quickly?

    • zeroxfe 17 days ago

      There was tons of duplicate code, unnecessary code, dead code, etc. There was also a lot of code that CPAN modules could entirely replace. (Also, I'm a workaholic who obsesses about a problem until it's fully solved.)

    • almostgotcaught 18 days ago

      > That's deleting 800 lines a day, each day

      50,000/90 = 555 ???

      • ciupicri 18 days ago

        Not all days are working days.

  • shawn_w 19 days ago

    perl is still the most practical way to mitigate shell script abominations like that. Though tcl's a good option too.

    • chubot 19 days ago

      Oils aims to be the absolute best way to migrate shell scripts! (I created the project, and the wiki page being discussed)

      https://www.oilshell.org/

      OSH is the most bash-compatible shell in the world, and YSH is a new language

          ls | sort | uniq | wc -l   # this is both OSH and YSH
      
          var mydict = {foo: 42, bar: ['a', 'b']}   # this is new YSH stuff you can start using
          json write (mydict)
      
      
      The difference between OSH and YSH is exactly a set of "shopt" options [1], although YSH feels like a brand new language too! There is a smooth blend.

      I think it's worth it for two things alone:

      - YSH checks all errors - you never lose an exit code

      - YSH has real arrays and doesn't mangle your variables with word splitting
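
      For contrast, a rough bash-only illustration (not Oils code) of the behaviors those two bullets are about:

          cat missing-file | wc -l    # pipeline exit status is wc's, so cat's failure is silently lost
          files="a b.txt"
          ls $files                   # word splitting: ls gets two arguments, "a" and "b.txt"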

      There's a lot more: modules with namespaces (use mymodule.ysh), buffered I/O that's not slow, etc.

      Gradually upgrading - https://github.com/oils-for-unix/oils/wiki/Gradually-Upgradi... (people are writing new YSH, but not many people have gradually upgraded, so I'd definitely appreciate feedback from people with a big "shell script problem")

      ---

      There is a FAQ here about Perl:

      Are you reinventing Perl? - https://www.oilshell.org/blog/2021/01/why-a-new-shell.html#a...

      Not to say that migrating to Perl is worse in any way, i.e. if you already know Perl or your team knows it.

      But objectively YSH is also a shell, so I think more of the code carries over, and there is a more direct upgrade path.

      ---

      [1] Unix Shell Should Evolve like Perl 5 - https://www.oilshell.org/blog/2020/07/blog-roadmap.html#the-... - i.e. with compatible upgrade options

      • RestartKernel 18 days ago

        That looks great! I moved from Fish and NuShell to Zsh because of its reasonable compatibility with bash, so OSH seems right up my alley.

    • nextos 19 days ago

      Or Ruby, which is essentially Smalltalk for Unix, plus lots of Perl-isms.

      Haskell (e.g. shh) and Clojure (Babashka) are also nice for this use case, but more niche options.

PeterWhittaker 20 days ago

Oh, no, now I have to go dig out some of mine....

The first really big one I wrote was the ~7000 line installer for the Entrust CA and directory, which ran on, well, all Unixes at that time. It didn't initially, of course, but it grew with customer demand.

The installation itself wasn't especially complicated, but upgrades were, a little, and this was back when every utility on every Unix had slight variations.

Much of the script was figuring out and managing those differences, much was error detection and recovery and rollback, some was a very primitive form of package and dependency management....

DEC's Unix (the other one, not Ultrix) was the most baffling. It took me days to realize that all command line utilities truncated their output at column width. Every single one. Over 30 years later and that one still stands out.

Every release of HP-UX had breaking changes, and we covered 6.5 to 11, IIRC. I barely remember Ultrix or the Novell one or Next, or Sequent. I do remember AIX as being weird but I don't remember why. And of course even Sun's three/four OS's had their differences (SunOS pre 4.1.3; 4.1.3; Solaris pre 2; and 2+) but they had great FMs. The best.

  • emmelaich 20 days ago

    That column truncation sounds bizarre. Are you sure the terminal didn't have some sort of sideways scroll available?

    • dspillett 20 days ago

      I think he meant that they truncated the lines even when called from a script, with their output going somewhere other than a terminal, not just when run interactively.

      • emmelaich 19 days ago

        Yep, but I'm curious enough to quiz it.

        Weirdly, today I ran wish in MacOS Sequoia (15.1.x) and had the (exception) output truncated at terminal width!

        • p_l 19 days ago

          Because macOS's closest relative isn't Free/NetBSD, but OSF/1, which, under a few different names, was sold by Digital as Unix for Alpha (there were a few rare builds for MIPS too).

          • skissane 19 days ago

            > Because macOS's closest relative isn't Free/NetBSD, but OSF/1

            What you say here contains some truth (certainly with respect to the kernel), but I doubt that has anything to do with the behaviour the grandparent is reporting–the wish command in Tcl/Tk truncating output at terminal width. That behaviour would be determined by the Tcl/Tk code, nothing inherently to do with the underlying OS.

            > but OSF/1, which, under a few different names, was sold by Digital as Unix for Alpha (there were a few rare builds for MIPS too).

            IBM also briefly sold a port of OSF/1 to IBM mainframes, AIX/ESA: it was available to customers in June 1992, and withdrawn from marketing in June 1993–it was discontinued so quickly due to lack of customer interest, and also because IBM was about to release (in 1994) a UNIX compatibility subsystem for MVS (OpenEdition), which was a more attractive UNIX option for many of their mainframe customers.

            I believe IBM's abortive Workplace OS – shipped in beta form only as OS/2 PowerPC Edition – was also partially derived from the OSF/1 code base.

            • p_l 19 days ago

              I believe the Workplace OS connection was related to the use of the Mach microkernel. I knew IBM was looking into at least one use of OSF/1 as a shipped Unix, but wasn't sure which one (and AIX by the mid-1990s was weird enough to confuse most people...)

          • emmelaich 19 days ago

            Interesting, I thought MacOS was basically a FreeBSD variant.

            But I just tried it again on a resized terminal window and I couldn't reproduce it!

            • p_l 19 days ago

              OSX took in a bit of FreeBSD and NetBSD to modernise some areas, but the main thing for the 10.0 release involved pulling in the latest OSFMK.

              Also, several APIs introduced after BSD 4.4 are very visibly missing, pointing to how little was taken from Free/NetBSD.

        • PeterWhittaker 19 days ago

          dspillett was exactly right: ps, e.g., truncated its output at $COLUMNS and there was no horizontal scroll.

          As suggested above, it did this even when called from a script.

          The fix was easy, set COLUMNS ridiculously large if DEC Unix, but it took days of apparent WTF-grade undefined behavior before I realized how simple the explanation was. It just seemed haphazard: I'd reposition and resize a window so I could run the script in one while manually running the commands in another, get inconsistent results, rinse, repeat...

          ...and eventually realize the common element in each test was me, and the variations I was introducing were window size.

          I cursed their engineers for trying to be "helpful" and keep things "pretty".
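
          Roughly, the workaround (reconstructed from memory, and assuming Digital Unix reports "OSF1" from uname):

              # force a huge width so utilities like ps stop truncating at the window size
              if [ "$(uname -s)" = "OSF1" ]; then
                  COLUMNS=10000
                  export COLUMNS
              fi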

      • nikau 19 days ago

        If he is talking about OSF/1/Tru64, that's the first I've heard of it.

  • throw16180339 20 days ago

    > DEC's Unix (the other one, not Ultrix) was the most baffling. It took me days to realize that all command line utilities truncated their output at column width. Every single one. Over 30 years later and that one still stands out.

    Do you mean OSF/1, Digital Unix, or Tru64 Unix?

    • PeterWhittaker 19 days ago

      Oh, yes, I think it was Digital Unix. IIRC, we toyed with OSF/1, but there wasn't much call for it.

      • p_l 19 days ago

        OSF/1, Digital Unix, and Tru64 are the same OS at different points in time.

        Technically, OSF/1 was supposed to be the commercial BSD answer to System V; in practice only a few niche vendors used it, plus Digital and NeXT (and through NeXT, Apple, which continues the line to this day).

        • PeterWhittaker 19 days ago

          Thanks for the clarification. It was so long ago, it’s all a bit hazy.

          Other than the COLUMNS thing. That is burnt into my memory forever.

  • raffraffraff 20 days ago

    I made it to thousands but more like 2000. At least I only had to support Redhat and Ubuntu (modern ones, at that)

  • banku_brougham 20 days ago

    Thank you for your service, I'm so glad you could share. I'd be interested to read more.

  • PeterWhittaker 19 days ago

    OK, so JOOC I ran wc against the main binary and supporting libraries for a project I did last year: It's a script to manage linear assured pipelines implemented as a series of containers (an input protocol adapter, one or more filters, an output protocol adapter). The whole thing was intended to be useful for people who aren't necessarily experts in either containers or protocols, but who have an idea of how they want to filter/transform files as they transit the pipeline.

    It's 6224 lines, so far.

    There is a top-level binary with sub-functions, sort of like how

       git [git options] <git action> [action options]
    
    or

      systemctl [etc.]
    
    work.

    There is a sub command to add a new sub command, which creates the necessary libraries and pre-populates function definitions from a template; the template includes short and long usage functions, so that

      cbap -h
    
    or

      cbap pipeline -h
    
    give useful and reasonable advice.

    There are subcommands for manipulating base images, components (which are images with specific properties for use as containers in the pipelines), and pipelines themselves. A LOT of code is for testing, to make sure that the component and pipeline definitions are correctly formatted. (Pipelines are specified in something-almost-TOML, so there is code to parse toml, convert sections to arrays, etc., while components are specified as simple key=value files, so there is code to parse those, extract LHS and RHS, perform schema validation, etc.).
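
    A hypothetical sketch of the key=value side of that parsing (file name and all made up, not the actual code):

      # read LHS=RHS pairs, skipping blanks and comments, before schema validation runs
      while IFS='=' read -r lhs rhs; do
          case $lhs in ''|'#'*) continue ;; esac
          printf 'key=%s value=%s\n' "$lhs" "$rhs"
      done < component.conf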

    Since pipeline components can share properties, there is code to find common properties in var and etc files, specify component properties, etc.

    There are a lot of user, group, directory, and FIFO manipulation functions tailored to the security requirements: when a pipeline is set up, users and groups and SEL types and MCS categories are generated and applied, then mapped into the service files that start the components (so there is a lot of systemd manipulation as well).

    Probably the single biggest set of calls are the functions that get/set component properties (which are really container properties) and allow us to use data-driven container definitions, with each property having a get function, a validation function, and an inline (in a pipeline) version, for maximum flexibility.

    Finally, there is code that uses a lot of bash references to set variables either from files, the environment, or the command line, so that we can test rapidly.
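
    For illustration, a sketch of the nameref pattern that kind of layered lookup typically uses (names are invented, not the project's code):

      # resolve a setting from the command line, then the environment, then a file default
      set_from_any() {
          local -n target=$1                  # bash nameref to the caller's variable
          local key=$2 cli_value=$3 conf=$4
          if [ -n "$cli_value" ]; then
              target=$cli_value
          elif [ -n "${!key:-}" ]; then       # indirect lookup in the environment
              target=${!key}
          else
              target=$(awk -F= -v k="$key" '$1 == k { print $2; exit }' "$conf")
          fi
      }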

    It also supports four levels of user: maintainers (people who work on the code itself), developers (people who develop component definitions), integrators (people who build pipelines from components), and operators (people who install pipelines), with the ability to copy and package itself for export to users at any of those levels (there is a lot of data-driven, limited recursive stuff happening therein).

    Since target systems can be any Linux, it uses makeself to package and extract itself.

    For example, an integrator can create a pipeline definition, which will produce a makeself file that, when run on the target system, will create all users, groups, directories, FIFOs (the inter-component IPC), apply DAC and MAC, create systemd files, copy images to each user, and launch the pipeline - with a delete option to undo all of that.

    There is some seccomp in there as well, but we've paused that as we need to find the right balance between allow- and deny- listing.

    (Yes, I use shellcheck. Religiously. :->)

    • ndsipa_pomu 19 days ago

      That sounds both great and horrifying

RodgerTheGreat 20 days ago

At one point I considered writing an interpreter for my scripting language Lil in bash to maximize portability, but quickly realized that floating-point arithmetic would be extremely painful (can't even necessarily depend on bc/dc being available in every environment) and some of the machines in my arsenal have older versions of bash with very limited support for associative arrays. My compromise was to instead target AWK, which is a much more pleasant general-purpose language than most shells, and available in any POSIX environment: https://beyondloom.com/blog/lila.html

  • seiferteric 20 days ago

    > can't even necessarily depend on bc/dc being available in every environment

    Just discovered this myself, also trying to make a language target shell. Was really surprised bc/dc wasn't present, I think in the Ubuntu install under WSL2. Also using awk for floating point math, but just shelling out to it.

    • RodgerTheGreat 20 days ago

      Yep! I considered shelling out to AWK for the same reason, as a bc/dc alternative, but rapidly found that nearly everything else bash could do was easier and less error-prone (and workable on much older systems) if I moved the whole script into pure AWK.

kamaal 20 days ago

As someone who has written and maintained large Perl programs at various points in my career, I can tell you there is a reason why people do this: Java- and Python-like languages work fine when interfaces and formats are defined and you often have zero OS interaction. That is, you use JSON/XML/YAML or interact with a database or other programs via http(s). This creates an ideal situation where these languages can shine.

When people do work that is heavy on text processing and OS interaction, languages like Java and Python are a giant pain, and you begin to notice how Shell/Perl make this kind of work a breeze.

This means nearly every automation task, chaotic non-standard interfaces, working with text/log files, or other data formats that are not structured (or at least not well enough). Add to this Perl's commitment to backwards compatibility, a large install base, and performance, and you have zero alternatives apart from Perl if you are working on these kinds of tasks.

I have long believed that a big reason for so much manual drudgery these days, with large companies hiring thousands of people to do trivially automatable tasks, is that Perl usage dropped. People attempt to use Python or Java for some big automation task and quit soon enough when they are faced with the magnitude of verbosity and the overall size of code they have to churn out and maintain to get it done.

  • stackskipton 19 days ago

    Strong disagree that it's because of "omg, no more Perl". It's just that complexity got cranked up, the Perl person stitching scripts together found it had become their full-time job, and obviously Perl only got you so far. So now you have an additional FTE who is probably expensive.

    Also, if the end user is on Windows, there is already a Perl-like option on their desktop: it's called PowerShell, and it will perform similarly to Perl.

  • GoblinSlayer 19 days ago

    I did a big automation task in native code, because efficiency is desirable in such cases, while bash+grep favor running a new process for every text line. To be efficient you need to minimize work, and thus batch and deduplicate it, which means handling data in a stateful manner while tracking deduplication context. That is easier in a proper programming language, whereas bash+grep favor stateless text processing and thus lead to a lot of duplicated work.

    Another strategy for minimizing work is accurate filtering, which is easier to express imperatively, with nice formatting, in a proper programming language; grep and regexes are completely unsuitable for this. And if you use a line-separated format, git rewards you with escaping to accommodate unusual filenames, which is inconsistently supported and can be disabled by asking for null-terminated output with the -z option. I don't think bash has a good way to handle that, while in a sufficiently low-level language it's natural, and it also allows incremental streaming so you don't have to start a new process for every text line.

    As a bonus you can use single code base for everything no matter if there's http or something else in the line.

  • chubot 19 days ago

    Yes I agree - my favorite language is Python, but it can be annoying/inefficient for certain low-level OS things. This is why I created https://www.oilshell.org (and the linked wiki page)

    A few links for context:

    Are you reinventing Perl?

    https://www.oilshell.org/blog/2021/01/why-a-new-shell.html#a...

    The Unix Shell Should Evolve Like Perl 5 (with compatible upgrade options, rather than a big bang like Perl 6/Raku)

    https://www.oilshell.org/blog/2020/07/blog-roadmap.html#the-...

    A Tour of YSH - https://www.oilshell.org/release/latest/doc/ysh-tour.html

  • hiAndrewQuinn 19 days ago

    I've been seriously considering learning some Perl 5-fu ever since I realized it's installed by default on so many Linux and BSD systems. I think even OpenBSD comes with perl installed.

    That may not seem like a big advantage until you're working in an environment where you don't actually have the advantage of just installing things from the open Internet (or reaching the Internet at all).

ulrischa 20 days ago

I think the main problem with writing large programs as bash scripts is that shell scripting languages were never really designed for complexity. They excel at orchestrating small commands and gluing together existing tools in a quick, exploratory way. But when you start pushing beyond a few hundred lines of Bash, you run into a series of limitations that make long-term maintenance and scalability a headache.

First, there’s the issue of readability. Bash's syntax can become downright cryptic as it grows. Variable scoping rules are subtle, error handling is primitive, and string handling quickly becomes messy. These factors translate into code that’s harder to maintain and reason about. As a result, future maintainers are likely to waste time deciphering what’s going on, and they’ll also have a harder time confidently making changes.

Next, there’s the lack of robust tooling. With more mature languages, you get static analysis tools, linters, and debuggers that help you spot common mistakes early on. For bash, most of these are either missing or extremely limited. Without these guardrails, large bash programs are more prone to silent errors, regressions, and subtle bugs.

Then there’s testing. While you can test bash scripts, the process is often more cumbersome. Complex logic or data structures make it even trickier. Plus, handling edge cases—like whitespace in filenames or unexpected environment conditions—means you end up writing a ton of defensive code that’s painful to verify thoroughly.
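
For example, the whitespace edge case alone pushes you toward defensive idioms like null-delimited filename handling:

    # one common idiom: filenames with spaces or newlines survive the pipe intact
    find . -name '*.log' -print0 | while IFS= read -r -d '' f; do
        gzip -- "$f"
    done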

Finally, the ecosystem just isn’t built for large-scale Bash development. You lose out on modularity, package management, standardized dependency handling, and all the other modern development patterns that languages like Python or Go provide. Over time, these deficits accumulate and slow you down.

I think using Bash for one-off tasks or simple automation is fine — it's what it’s good at. But when you start thinking of building something substantial, you’re usually better off reaching for a language designed for building and maintaining complex applications. It saves time in the long run, even if the initial learning curve or setup might be slightly higher.

  • ndsipa_pomu 20 days ago

    Using ShellCheck as a linter can catch a lot of the common footguns and there are a LOT of footguns and/or unexpected behaviour that can catch out even experienced Bash writers. However, Bash/shell occupies a unique place in the hierarchy of languages in that it's available almost everywhere and will still be around in 30 years. If you want a program that will run almost everywhere and still run in 30 years time, then shell/Bash is a good choice.
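
    A tiny example of the kind of footgun ShellCheck catches (SC2086, if I remember the code right):

        rm -rf $prefix/cache     # unquoted: an empty or space-containing $prefix ends badly
        rm -rf "$prefix/cache"   # the quoted form does what was intended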

    • norir 19 days ago

      I'd almost always prefer C99 to shell for anything more than 100 lines of code or so. There is even a project I saw here recently that can bootstrap tcc in pure shell (which can then be used to bootstrap gcc). I'm somewhat skeptical that bash will still be used for anything but legacy scripts in 30 years, despite its impressive longevity to this point, but I could sadly be proven wrong.

      • ndsipa_pomu 19 days ago

        So, if you wanted to write something that you would be pretty sure could easily run on machines in 30 years time, what would you use?

        I don't think c99 would be a good choice, as processors will likely be different in 30 years' time. If you had your program on e.g. a USB stick and managed to load it onto a machine, it'd only be able to run if you had the same architecture. Even nowadays, you'd run into difficulties with ARM and x86 differences.

        Some kind of bytecode language might seem better (e.g. java), but I have my doubts about backwards compatibility. I wonder if Java code from 20 years ago would just run happily on a new Java version. However, there's also the issue of Java not being installed everywhere.

        • wiseowise 19 days ago

          > I wonder if Java code from 20 years ago would just run happily on a new Java version.

          Absolutely.

          • ndsipa_pomu 19 days ago

            That's good to know. I haven't touched Java myself in years, but at work I hear of developers complaining that our code runs on Java 11 and they haven't been given the time to move it to a more recent version.

            Personally, I've encountered great difficulties with some old SAN software that required a Java 6 web plugin that I couldn't get running on anything other than Internet Explorer - I kept an XP VM with the correct version just for that. I suspect a large part of the problem was that the software incorrectly attempted to check that the version was at least 6, but failed when the version was newer (they obviously didn't test it when later versions got released).

  • JoyfulTurkey 19 days ago

    Dealing with this at work right now. Digging through thousands of lines of Bash. This script wasn’t written a long time ago, so no clue why they went with Bash.

    The script works but it always feels like something is going to break if I look at the code the wrong way.

    • chubot 19 days ago

      If you have thousands of lines of bash, don't like maintaining it, but don't necessarily want to rewrite the whole thing at once, that's what https://www.oilshell.org/ is for!

      See my comment here, with some details: https://news.ycombinator.com/item?id=42354095

      (I created the project and the wiki page. Right now the best bet is to join https://oilshell.zulipchat.com/ if it interests you. People who want to test it out should be comfortable with compiling source tarballs, which is generally trivial because shells have almost no dependencies.)

      The first step is:

          shopt --set strict:all  # at the top of the file
      
      Or to run under bash

          shopt -s strict:all 2>/dev/null || true
      
      And then run with "osh myscript.bash"

      OSH should run your script exactly the same as bash, but with better error messages, and precise source locations.

      And you will get some strictness errors, which can help catch coding bugs. It's a little like ShellCheck, except it can detect things at runtime, whereas ShellCheck can't.

  • anthk 20 days ago

    Bash/ksh have -x as a debug/tracing argument.
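
    For example:

        set -x                                # trace each command as it runs
        PS4='+ ${BASH_SOURCE}:${LINENO}: '    # bash: richer prefix for the trace output
        bash -x ./myscript.sh                 # or trace a whole script from the outside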

voxadam 20 days ago

I'm pretty sure the largest handwritten shell program I used back in the day on a regular basis was abcde (A Better CD Encoder)[1] which clocks in at ~5500 LOC.[2]

[1] https://abcde.einval.com

[2] https://git.einval.com/cgi-bin/gitweb.cgi?p=abcde.git;a=blob...

  • lelandfe 20 days ago

    Not that I'd know anything about it, but this was one of the tools recommended on What.CD back in the day. Along with Max (my friends tell me) https://github.com/sbooth/Max

    • voxadam 20 days ago

      Probably every rip I posted to What.CD, and OiNK before it, was created using abcde.

      Allegedly.

      • lelandfe 19 days ago

        The greatest loss was truly not even What.CD the incredible tracker but the forums. I've never again found a more concentrated group of people with taste.

      • throwup238 20 days ago

        You gotta use the SWIM acronym, for the ultimate callback to the aughts.

        • voxadam 20 days ago

          Honestly, I came so close, so damn close. :)

  • dlcarrier 20 days ago

    I've used that before. It works really well and was pretty easy to use. I had no idea the whole thing is just a giant shell script.

ykonstant 20 days ago

Many of these programs are true gems; the rkhunter script, for instance, is both nice code (it can be improved) and a treasure trove of information*.

Note that much of the code size of these scripts is dedicated to ensuring that the right utilities exist across the various platforms and perform as expected with their various command line options. This is the worst pain point of any serious shell script author, even worse than signals and subprocesses (unless one enjoys the pain).

*Information that, I would argue, would be less transparent if rkhunter had been written in a "proper" programming language. It might be shoved off in some records in data structures to be retrieved; actions might be complex combinations of various functions---or, woe, methods and classes---on nested data structures; logging could be JSON-Bourned into pieces and compressed in some database to be accessed via other methods and so on.

Shell scripts, precisely due to the lack of such complex tools, tend to "spill the beans" on what is happening. This makes rkhunter, for instance, a decent documentation of various exploits and rootkits without having to dig into file upon file, structure upon structure, DB upon DB.

cperciva 20 days ago

The FreeBSD Update client is about 3600 lines of sh code. Not huge compared to some of the other programs mentioned here, but I'm inclined to say that "tool for updating an entire operating system" is a pretty hefty amount of functionality.

The code which builds the updates probably adds up to more lines, but that's split across many files.

xyst 20 days ago

It’s “only” 7.1K LoC, but my favorite is the “acme.sh” script, which is used to issue and renew certs from Let's Encrypt.

https://github.com/acmesh-official/acme.sh/blob/master/acme....

  • Brian_K_White 20 days ago

    already in the list

    • dizhn 20 days ago

      Parent might have meant that they like it. I was going to say the same thing. That one and distrobox are quite impressive in how well they work.

      • Macha 19 days ago

        Personally I abandoned acme.sh for lego because it didn't work well. For example, they lost track of the environment variables they were using for the server in their acme dns plugin across versions, thereby breaking what's supposed to be a fire and forget process.

        That, and the CA that was exploiting shell injection in acme.sh, convinced me it was time to move on.

        • dizhn 18 days ago

          I have also moved everything over to working with Caddy. It's so convenient that for one domain I even set up a little job to copy over the web certificate from it to be used for my smtp/imap.

eschneider 19 days ago

Sometimes shell is the only thing you can guarantee is available and life is such you have to have portability, but in general, if you've got an enormous shell app, you might want to rethink your life choices. :/

  • GuB-42 19 days ago

    The problem is: which shell? Bash is far from being universal. Also, there is usually not much you can do with a shell without commands (find, grep, sed, cat, head, tail, cut...), and commands have their own portability issues.

    Targeting busybox may be your best bet, but once you are leaving your typical Linux system, writing portable (Bourne) shell scripts becomes hard to impossible.
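
    A small illustration of where the pain starts:

        [[ $name == foo* ]] && echo match          # bashism: fails under plain POSIX sh (e.g. dash)
        case $name in foo*) echo match ;; esac     # POSIX: works in any Bourne-style shell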

  • norir 19 days ago

    I hear this fairly often and I'm genuinely curious how often you have shell but _not_ a c compiler or the ability to install a c compiler via the shell. Once you have a c compiler, you can break out of shell and either write c programs that the shell script composes or install a better scripting language like lua. At this point in time, it feels quite niche to me that one would _need_ to exclusively use shell.

    • bhawks 19 days ago

      There are plenty of contexts where you won't have a compiler today - embedded (optimize for space) and very security hardened deployments (minimize attack surface).

      Historically people used to sell compilers - so minimizing installation to dev machines probably was a savings (and in those times space was at a premium everywhere).

      That said - I am with you, give me any other programming language besides shell!

    • chubot 19 days ago

      That's what I thought -- I thought that OS X was the main Unix where it is "annoying" to get a C compiler (huge XCode thing IIRC), and it isn't used for servers much.

      But people have told me stories about working for the government

      (my background was more "big tech", and video games, which are both extremely different)

      Some government/defense systems are extremely locked down, and they don't have C compilers

      So people make do with crazy shell script hacks. This is obviously suboptimal, but it is not that surprising in retrospect!

      • mdaniel 19 days ago

        > it is "annoying" to get a C compiler (huge XCode thing IIRC)

        FWIW, there is an apple.com .dmg "command line tools" package that is much smaller than Xcode proper. But I actually came here to say that, to the very best of my knowledge, a $(ruby -e $(curl ...)) to install brew will then download pre-built binaries from GitHub's docker registry.

sigoden 20 days ago

If you're looking for a tool to simplify the building of big shell programs, I highly recommend using argc (https://github.com/sigoden/argc). It's a powerful Bash CLI framework that significantly simplifies the process of developing feature-rich command-line interfaces.

jefftk 19 days ago

Back when I worked on mod_pagespeed we wrote shell scripts for our end-to-end tests. This was expedient when getting started, but then we just kept using it long past when we should have switched away. At one point I got buy-in for switching to python, but (inexperience) I thought the right way to do it was to build up a parallel set of tests in python and then switch over once everything had been ported. This, of course, didn't work out. If I were doing this now I'd do it incrementally, since there's no reason you can't have a mix of shell and python during the transition.

I count 10k lines of hand-written bash in the system tests:

    $ git clone git@github.com:apache/incubator-pagespeed-mod.git
    $ git clone git@github.com:apache/incubator-pagespeed-ngx.git
    $ find incubator-pagespeed-* | \
         grep sh$ | \
         grep system_test | \
         xargs cat | \
         wc -l
    10579

  • gjvc 19 days ago

    lines ending in | do not require \

    • michaelcampbell 19 days ago

      I (also?) never knew this; thanks.

      My old finger memory will still probably put them in, alas.

    • greazy 19 days ago

      In my experience they do. Could it be related to strict bash mode?

      • gjvc 19 days ago

        dunno. give some evidence.

zabzonk 20 days ago

Don't know about the biggest, although it was quite big, but the best shell program I ever wrote was in ReXX for a couple of IBM 4381s running VM/CMS, and it did distributed printing across a number of physical sites. It saved us a ton of money, as it only needed a cheap serial terminal and printer, when IBM wanted to charge us an ungodly amount for their own printers and associated comms. One of the pieces of software I'm most proud of (written in the mid 1980s), to this day.

  • banku_brougham 20 days ago

    Well, you gotta post this somewhere so we can see

    • zabzonk 20 days ago

      Like much of what i wrote before the days of distributed version control, this is now lost in the mists of time. And the code wouldn't belong to me anyway.

  • walterbell 19 days ago

    Thanks for the reminder that Rexx was open-sourced!

    https://rexxinfo.org

    • kristopolous 19 days ago

      I remember an article I read, probably around 1997, about CGI languages. It considered, I believe, Rexx, Tcl, Perl, and Python. I bet it's at archive.org somewhere.

      hah, found it. Byte, 1998: https://archive.org/details/199806_byte_magazine_vol_23_06_w...

      I tried metacard after reading that. It ran on linux: http://www.sai.msu.su/sal/F/5/metacard.gif ... I think I might have written some things with it. Cool, good luck on me trying to find 26 year old software.

      You can totally still run this if you want btw - just download some old linux ISOs from archive.org and install it in a VM ... hope they survive all their lawsuits; we're so lucky to have them around.

michaelcampbell 19 days ago

Probably my largest one was an order of magnitude smaller than these for the most part, but it checked whether my VPN was up and started it if not. (And restarted various media-based docker containers.)

If it was up, it would do a speedcheck and record that for the IP the VPN was using, then check how that speed compared to the average, with a standard deviation and z-score. It would then calculate how long it should wait before it recycled the VPN client. Slow VPN endpoints would cycle quicker; faster ones would wait longer to cycle. Speeds outside a standard deviation or so would get rechecked sooner than the last delta; speeds within one z would expand the delta before it checked again.
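
A hypothetical sketch of that logic (not the original script), with awk doing the statistics:

    # z-score of the latest speed against the history for this IP, then adjust the delay
    z=$(awk -v s="$speed" '{ n++; sum += $1; sq += $1 * $1 }
        END { m = sum / n; sd = sqrt(sq / n - m * m)
              if (sd > 0) print (s - m) / sd; else print 0 }' speeds.log)
    if awk -v z="$z" 'BEGIN { exit !(z < -1) }'; then
        delay=$(( delay / 2 ))    # more than a standard deviation slow: recycle sooner
    else
        delay=$(( delay * 2 ))    # within a z of the mean: back off before rechecking
    fi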

Another one about that size would, based on current time, scrape the local weather and sunup/sundown times for my lat/long, and determine how long to wait before turning on an outdoor hose, and for how long to run it via X10 with a switch on the laptop that was using a serial port to hook into the X10 devices. The hose was attached to a sprinkler on my roof which would spray down the roof to cool it off. Hotter (and sunnier) weather would run longer and wait shorter, and vice versa. I live in the US South where shedding those BTUs via evaporation did make a difference in my air conditioning power use.

  • Y_Y 19 days ago

    For those of you not familiar with "British Thermal Units", they're about 7e-14 firkin square furlongs per square fortnight.

  • eszed 19 days ago

    These are my two favorite on the page, and seem somehow emblematic of the "hacker spirit".

tpoacher 19 days ago

I'm writing a ticketing manager for the terminal entirely in bash. Reasonably non-trivial project, and it's been pretty enjoyable working "exclusively" with bash. ("exclusively" here used in quotes, because the whole point of a shell scripting language is to act as a glue between smaller programs or core utilities in the first place, which obviously may well have been written in other languages. but you get the point).

Having said that, if I were to start experimenting with an altogether different shell, I would be very tempted to try jshell!

Incidentally, I hate when projects say stuff like "Oils is our upgrade path from bash to a better language and runtime". Whether a change of this kind is an "upgrade" is completely subjective, and the wording is unnecessarily haughty / dismissive. And very often you realise that projects who say that kind of thing are basically just using the underlying tech wrongly, and trying to reinvent the wheel.

Honestly, I've almost developed a knee reflex to seeing the words "upgrade" and "better" in this kind of context by now. Oils may be a cool project but that description is not making me want to find out more about it.

  • bbkane 19 days ago

    Man if you think Bash doesn't need an upgrade, more power to you, but every time I use it for anything slightly complicated I regret it, so I'm firmly in the "dear Lord, let's find a smooth upgrade from Bash" camp and I'm excited about Oils

    • tpoacher 16 days ago

      Well I didn't necessarily mean that Bash is perfect and could never be improved; or that Oils may not have good ideas that are an 'improvement' over some things in bash.

      I meant that, if I come up with a project called "Spills: an improved Oils without all the awful warts", this says nothing good or meaningful about my project itself, and all it really does is make a casual implication that Oils is crap. So such a sentence would (personally) put me off Spills, rather than get me excited about it. Especially if I was already happy with Oils and in fact found it enjoyable, and certainly not having "awful warts".

      Yes you can go 'delve' into the project to figure out if and why Spills is actually better than Oils (and if and why Oils is 'crap' according to your tagline), but a tagline is chosen for a reason. It's a single sentence you use to describe and sell your project with. If your best tagline is "that other product is crap" then I'm not that interested to do any 'delving' in the first place. That was my point.

      Also, I think partly the reason bash gets a bad rep is because it's a scripting language, and the canonical one for that matter. A lot of the time people get frustrated with bash, I find it's because they're trying to use it as a 'system' language and get everything done via bash. But you're not supposed to. It's a scripting language, it's intended as 'glue'. Where system languages rely on external libraries, bash relies on external programs. You want to validate your inputs? Use a validator program. You want to operate on specific types? Use a type-specific program. If you need type-specific validation and you're doing it in bash, then of course you're going to get frustrated; it probably can be done, and possibly even well, but the language was just never designed for that kind of thing and you'd have to get knee-deep into arcane hackery rather than have lovely clean maintainable code.

      • bbkane 13 days ago

        I respect that, and if I was happier with Bash I would probably agree with you :)

oneeyedpigeon 19 days ago

On a macOS machine, this:

  $ file /usr/bin/* | grep "shell script" | cut -f1 -d':' | xargs wc -l | sort -n
gives me:

  6431 /usr/bin/tkcon
but that's another Tk script disguised as a shell script; the next is:

  1030 /usr/bin/dtruss
which is a shell script wrapper around dtrace.

gosub100 19 days ago

Since the topic is shell, can I shamelessly ask a question?

I'm an SRE for a service everyone has heard of. I have inadvertently pasted multi-line text into my terminal prompt multiple times now, which has attempted to run each line as a command. I see there is a way to disable this at the shell for each client, but what about at the server level? That way I could enforce it as a policy, and not have to protect every single user (including myself) individually. Said differently, I want to keep everyone who SSHes into a prod machine from being able to paste and execute multiple lines. But not forbid paste entirely.

The only thing I could think of would be to recompile bash and detect if the input was from a tty. If so, require at least 200ms between commands, and error out if the threshold exceeded. This would still allow the first pasted command to run, however.

  • porridgeraisin 19 days ago

    Bracketed paste might help you. It's an option for readline so it goes in ~/.inputrc. There's a way to set these options in bashrc as well which I don't remember.

    It inserts a control sequence before and after the pasted contents that makes bash not execute all the lines. Instead it keeps them in the command line, after which you can choose to execute all of them in one go with Enter or cancel with Ctrl-C.
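
    The readline option itself, for reference:

      # ~/.inputrc
      set enable-bracketed-paste on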

    • teo_zero 19 days ago

      Everything you can do in inputrc can be done in bashrc if you prepend "bind". In this case:

        bind 'set enable-bracketed-paste on'

  • TacticalCoder 19 days ago

    Not an answer to your question but here's a "fun" thing I used to do... If you want to run a program from the CLI, which blocks you terminal (say an xterm), you can use that terminal as a temporary paste buffer. But with a trick.

    Imagine you want to run, say, Firefox like that (say because you'd like to see the stdin/stderr output of what's going on without having to find which log file it's writing to: it's really just a silly example):

        xterm>  firefox
        <-- the xterm is now "stuck" here (until Firefox exits)
    
    if you now write, into that "blocked" xterm, the things you write shall execute when you exit/kill Firefox:

        Hello, world!
    
    But one thing I used to do all the time and still occasionally do, first do this:

        xterm> firefox
        <-- the xterm is now "stuck" here (until Firefox exits)
        cat > /dev/null
    
    You can now use that xterm as a temp paste buffer.

    So, yup, a good old cat > /dev/null works wonder.

  • fargle 19 days ago

    everybody works differently. what seems like a sensible guardrail for you would be extremely annoying for others.

    so whatever you do, it should be a feature, even defaulted on. but never a policy that you enforce to "everyone who ssh into a prod machine"

    if you find something that works well for you, add it as a suggestion to your developer docs.

    • gosub100 19 days ago

      all good points, and it's not a great way to "make friends and influence people" by screwing with their workflow. After making this mistake at least twice myself (mainly due to fumbling with MacOS mouse/keyboard differences on my machine), I just wanted to prevent a disaster in the future from me or anyone else. But alas, I just need to be more careful and encourage others to learn from my mistakes :)

alsetmusic 20 days ago

I love exploring things like this. The demo for ble.sh interactive text editor made me chuckle with delight.

mulle_nat 20 days ago

I think, for sport, I could wrap all the various mulle-sde and mulle-bashfunction files back into one and make it > 100K lines. It wouldn't even be cheating, because it naturally fractalized into multiple sub-projects with sub-components from a monolithic script over time.

anothername12 20 days ago

I would add Bash Forth to that. String-threaded concatenative programming!

transcriptase 20 days ago

Sometimes I do things I know are cursed for the sheer entertainment of being able to say it worked. E.g. my one absurdly complex R script that would write ungodly long bash scripts based on the output of various domain specific packages.

It began:

# Yeah yeah I know

sn9 20 days ago

I think around a decade ago, I tried installing a copy of Mathematica and the installer from Wolfram was a bash program that was over a GB in size.

I tried opening it up just to look at it and most text editors just absolutely choked on it. I can't remember, but it was either Vim xor Emacs that could finally handle opening it.

  • zertrin 20 days ago

    Most likely it embedded a (g)zip inside the shell script? I've seen this frequently.

  • szszrk 20 days ago

    Some installers include binaries inside their shell scripts. So the script extracts data from itself. Not great for transparency, but works and is single file.
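
    A rough sketch of the usual trick (hypothetical, not any specific installer): everything after a marker line is treated as an archive rather than shell code.

      # $install_dir would have been chosen earlier by the installer
      payload=$(awk '/^__ARCHIVE__$/ { print NR + 1; exit }' "$0")
      tail -n +"$payload" "$0" | tar xzf - -C "$install_dir"
      exit 0
      __ARCHIVE__
      (binary tar.gz bytes appended after this line)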

    • anthk 20 days ago

      shar, shell archives.

      • szszrk 19 days ago

        A bit of a pain in the ass in some corporate environments, where binaries are scanned before use by DLP software ;/

fny 19 days ago

I feel like this merits having a Computer Benchmarks Game for different shells.

svilen_dobrev 19 days ago

Around ~2000, my build/install script had to simulate some kind of OO-like inheritance. There was Python, but no one understood it (and even fewer had it installed), so: bash. Aliases had priority over functions, which had priority over whatever executables were found in PATH. So there you go - a whole 3 levels of it, with the lowest (PATH) level being changeable.
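
A tiny illustration of the precedence being exploited (a sketch, not the original script): bash resolves aliases before functions, and functions before anything in PATH.

    shopt -s expand_aliases            # aliases are off by default in non-interactive shells
    build() { echo "function build"; }
    alias build='echo "alias build"'
    build                              # prints "alias build"; unalias it and the function wins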

khushy 19 days ago

Most shell script installers are works of art

pwdisswordfishz 19 days ago

Surprised not to see Arch Linux’s makepkg on the list, btw.

  • NekkoDroid 17 days ago

    Fun fact: makepkg isn't that big. IIRC it's <5k SLOC (I guess a bit bigger than some of the smaller ones on this list though).

rajamaka 20 days ago

Would love to see the same for batch on Windows

  • denistaran 19 days ago

    If you’re scripting on Windows, it’s better to use PowerShell instead of batch. Compared to Bash, PowerShell is also better suited for large scripts because it works with objects rather than plain text. This makes handling structured data like JSON, XML, or command outputs much easier, avoiding the need for error-prone text parsing.

    • shawn_w 19 days ago

      PowerShell is definitely better for new projects, but there's lots of legacy batch files out there.