Bash one-liner to produce a list of HEX color codes that read like English words

gist.github.com

199 points by ailef 3 years ago

kragen 3 years ago

I tried this a few years ago; http://canonical.org/~kragen/sw/dev3/colors.html has them as foreground colors and http://canonical.org/~kragen/sw/dev3/colors.2.html has them as background colors. I tested 3-letter words as well as 6-letter words, and used 1 as "l" as well as "I", but I didn't try aghasemi's very productive suggestion of using 5 as S. I don't remember if it didn't occur to me or if I tried it and didn't like the results.

Some of them are pretty #bad (#011 doesn't really look much like "oil") and some, though they read quite well, correspond to awful colors; you might even say, #faeca1 colors. Still, I've made my #bed, #0dd as it may be; now I must #11e in it. I think I've #fed you enough #babb1e for today.

js2 3 years ago

The gist is rather a pipeline of Unix commands with no bash necessarily involved. Here it is in shellcheck-compliant 100% bash:

    #!/usr/bin/env bash
    shopt -s nocasematch
    while read -r word; do
        if [[ $word =~ ^[abcdefoi]{6,6}$ ]]; then
            word=${word//o/0}
            word=${word//i/1}
            word=${word^^}
            printf '#%s\n' "$word"
        fi
    done < /usr/share/dict/words

This could be collapsed to one line with semicolons. On the macOS 12.6 dictionary I get 59 words.

Edit: and in sed which someone just asked me for elsewhere:

    sed -n -e '
    /^[abcdefoi]\{6,6\}$/I {
    s/o/0/g;
    s/i/1/g;
    s/^/#/;
    y/abcdef/ABCDEF/;
    p;}' < /usr/share/dict/words

kps 3 years ago
```
     sed -n -e 'y/abcdefOoIi/ABCDEF0011/' -e 's/^[A-F01]\{6\}$/#&/p' /usr/share/dict/words
```
- js2 3 years ago
  
  I’d golf with you but I think you got a hole in one there. I didn’t spend any time thinking about how to make the sed more compact. I sorta just translated what I’d already written in bash.
version_five 3 years ago

Thanks for this. I'd probably call the original the GNU coreutils version. The linked github also has a sed-only version in the comments. It's instructive to see the different versions.
- kps 3 years ago
  
  > I'd probably call the original the GNU coreutils version.
  Why? The only GNUish bit is the grep -P option, which is unnecessary (-E will do as well).
  
  version_five 3 years ago
  
  I would have considered tr to be part of GNU coreutils; awk, not necessarily, but the default on a Mac is gawk, I believe.
  
  kragen 3 years ago
  
  tr predates GNU by about a decade.
- js2 3 years ago
  
  I just added a sed version as well. I'll have to click through and see how closely it resembles what's in the gist.
  bash is actually pretty powerful if you don't mind its baroque syntax. Writing it in POSIX would be a bit more challenging. You could use a case statement for the pattern matching, but I'm not sure about the substitution.

nine_k 3 years ago

Never mind the colors.

This snippet demonstrates how a number of small tools, each doing its narrow job, strung together via the most trivial interface, produces a non-trivial result.

This composability is still unreachable to the vast majority of GUI tools.

vesinisa 3 years ago

The non-trivial part here is actually the source data (the dict file). It is also its pitfall: after adding 5 for S you should see a litany of plurals. Most dict files (for English anyway), however, seem to omit plural nouns. I guess the logic is that in English most plurals are regular, and the naive algorithm for deriving them from the singular forms (correctly most of the time) is quite trivial.
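
A minimal Python sketch of the plural point above, assuming a word list that only holds singular forms. The rule in `naive_plural` is a deliberately simple illustration (it ignores irregular nouns), and the function names and the three sample words are made up for the example, not taken from the gist.

```python
import re

# Letters that read as hex digits once O->0, I->1 and S->5 are applied.
HEX_LIKE = re.compile(r"^[a-fois]{6}$", re.IGNORECASE)
TO_HEX = str.maketrans("OIS", "015")


def naive_plural(word):
    """Regular English plural; irregular nouns are not handled."""
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if word.endswith("y") and word[-2:-1] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"


def hex_codes(words):
    """Yield (word, code) for each word and its naive plural that fits."""
    for word in words:
        for candidate in (word, naive_plural(word)):
            if HEX_LIKE.match(candidate):
                yield candidate, "#" + candidate.upper().translate(TO_HEX)


for word, code in hex_codes(["decaf", "facade", "abbess"]):
    print(word, code)
```

Here "decaf" is too short on its own but its derived plural "decafs" fits, which is the kind of entry a plural-free dictionary would never produce.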
throwing_away 3 years ago

SaaS companies hate this one weird trick!
miohtama 3 years ago

While it is a neat trick as a one-liner, I would recommend against doing anything like this in any software that requires maintenance. The code is hard or impossible to follow and has no comments. It is brittle, and only a few people can understand what it really does. A better option would be 10 lines of Python or JavaScript with some comments.
- kragen 3 years ago
  
  I thought it was trivial to understand, though the comment above it helps a lot, and it's maybe an unfair advantage that I'd done the same thing in pretty much the same way four years ago. It probably depends on your background; I wouldn't write it that way for people who didn't know shell, just like I wouldn't write this comment in English for people who speak only Spanish.
  I'm not convinced that it's easier to understand in Python (even though I simplified it a bit, in part because one piece of the Python 3 braindamage was moving string.maketrans to bytes):

      import re

      def main(words):
          for word in words:
              word = word.strip().upper()
              if re.compile(r'[A-FOI]{6}$').match(word):
                  print('#' + word.replace('I', '1').replace('O', '0'))

      if __name__ == '__main__':
          main(open('/usr/share/dict/words'))

  I think the shell version is clearly better for interactive improvisation, though.
  
  js2 3 years ago
  
  I prefer search with an explicit '^' in the pattern to using match. For a throw-away script I'd probably do this:

      import re

      is_hex_like = re.compile(r"^[a-foi]{6}$", re.I).search
      for word in filter(is_hex_like, open("/usr/share/dict/words")):
          hexword = word.upper().replace("O", "0").replace("I", "1").rstrip()
          print(f"#{hexword}")
  
  Too 3 years ago
  
  findall and multiline mode make it even easier, at the cost of loading the whole file into memory though; for that reason your alternative is probably better.

      import re

      wordlist = open("/usr/share/dict/words").read()
      for word in re.findall(r"^[a-foi]{6}$", wordlist, re.IGNORECASE | re.MULTILINE):
          hexword = word.upper().replace("O", "0").replace("I", "1")
          print(f"#{hexword}")
  
  kragen 3 years ago
  
  That's nicer than my version! I'm curious why you prefer search(), though.
  
  js2 3 years ago
  
  1. I don't have to remember which implicitly anchors to the start of the string and which doesn't.
  2. I prefer the explicitness of '^' (maybe that's just another way of stating (1)).
  3. I can use re.M to modify '^' to match at the start of each line on multiline strings, whereas match will still only match at the front of the string.
  4. The asymmetry of anchoring the front but not the end is weird. Python now has fullmatch, but ugh, just use the pattern for that if you need it.
  5. Off the top of my head, I can't think of another language that has a regex function that implicitly anchors the front.
  
  kragen 3 years ago
  
  Hmm, I see. Interesting! I think of regexps as state machines, so I think of the implicit loop to find a starting position as extra complexity, which can give rise to for example performance problems, though it's true that in many languages you can't avoid it.
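
For readers following the match/search exchange above, here is a small sketch of how the three `re` entry points differ on the same pattern. The test word "bedface" and the two-line sample text are made up for illustration.

```python
import re

unanchored = re.compile(r"[a-foi]{6}", re.IGNORECASE)
anchored = re.compile(r"^[a-foi]{6}$", re.IGNORECASE)

word = "bedface"  # seven letters, all in the class, so it should be rejected

# match() anchors only the start, so the first six letters are accepted.
print(bool(unanchored.match(word)))           # True
# search() is not anchored at all.
print(bool(unanchored.search("xx" + word)))   # True
# fullmatch() anchors both ends.
print(bool(unanchored.fullmatch(word)))       # False
# With explicit ^ and $ in the pattern, search() and match() agree.
print(bool(anchored.search(word)), bool(anchored.match(word)))  # False False

# re.MULTILINE lets ^ match after each newline for search(),
# but match() still only tries position 0.
text = "zebra\nfacade\n"
multiline = re.compile(r"^[a-foi]{6}$", re.IGNORECASE | re.MULTILINE)
print(bool(multiline.search(text)))  # True
print(bool(multiline.match(text)))   # False
```

This is also why the first Python version in this subthread needs the trailing `$`: without it, `match` would accept any word whose first six letters fit.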
- rascul 3 years ago
  
  Comments can be added. Understanding it requires learning the tools. Just like understanding python or javascript requires learning python or javascript. It's not impossible to follow.
- lrvick 3 years ago
  
  I understood it instantly on first read. Probably depends on how much shell you write.
kupopuffs 3 years ago

Ah yes, the Unix Way

pwpwp 3 years ago

It's missing #DADB0D

kragen 3 years ago

I look forward to your improved version that tests against the Cartesian product of /usr/dict/words with itself plus the empty string and maybe some slang words like "bod". I suggest you limit to shortish words before the Cartesian product rather than after.
- mellosouls 3 years ago
  
  https://en.wikipedia.org/wiki/Dad_bod
  
  kragen 3 years ago
  
  Testing against a list of all Wikipedia article titles is indeed also an avenue worth exploring, and I hope you explore it.
- gabrielsroka 3 years ago
  
  I installed the American English large dictionary on Ubuntu. It has `bod`.
  
  kragen 3 years ago
  
  Nice! I'm just using the 102'401-entry version.
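
A rough sketch of the two-word idea from this subthread, following the suggestion to filter down to short hex-like words before taking the Cartesian product rather than after. It only covers ordered pairs of words (not the single-word case that the empty string would add), and the function name and sample list are assumptions for illustration.

```python
import re
from itertools import product

HEX_LIKE = re.compile(r"^[a-foi]+$", re.IGNORECASE)
TO_HEX = str.maketrans("OI", "01")


def two_word_codes(words, total=6):
    """Yield (phrase, code) for ordered word pairs totalling `total` letters."""
    # Filter first: only hex-like words shorter than the target can take part.
    short = sorted({w.lower() for w in words
                    if HEX_LIKE.match(w) and len(w) < total})
    for first, second in product(short, repeat=2):
        if len(first) + len(second) == total:
            phrase = first + second
            yield f"{first} {second}", "#" + phrase.upper().translate(TO_HEX)


sample = ["dad", "bod", "bad", "idea", "zebra", "facade"]
for phrase, code in two_word_codes(sample):
    print(phrase, code)
```

With this sample, the pair "dad bod" yields #DADB0D, provided the word list contains "bod" in the first place.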
kgwxd 3 years ago

Wish I could say the same.

b800h 3 years ago

Is HEX another of these words which gets erroneously capitalised, like SCRUM or GAP analysis?

markrages 3 years ago

I've noticed that for years in embedded (where we use "Intel HEX" formatted files) but I ascribed it to a field full of eccentric loners doing idiosyncratic things, or some kind of DOS 8.3 brain damage.
teo_zero 3 years ago

Or ELO score?

Waterluvian 3 years ago

Does anyone have a link to a guide on how to write Python or Node or Rust programs that behave well with bash? I.e., streaming inputs and outputs and other things I probably don’t know about?

KMnO4 3 years ago

It’s pretty easy. You have three basic streams:
1. Stdin - just iterate through sys.stdin
2. Stdout - regular printing will go there
3. Stderr - print errors here eg with print(…, file=sys.stderr)
And then beyond that, as long as your script gets invoked by the interpreter (i.e. #!/usr/bin/env python), everything will “just work”.
- IgorPartola 3 years ago
  
  Don’t you also have to keep in mind how often you flush outputs/how you buffer? Encoding? Handle EOF correctly?
  Not saying it’s hard but also it’s not 100% covered by what you said.
  
  markrages 3 years ago
  
  Those are advanced topics and you can look them up if you need them.
  Generally, Python does the right thing by default for scripting use: line buffered, system encoding, EOF handled naturally by the iterator protocol.
- gnubison 3 years ago
  
  And preferably use fileinput for the stdin so that you can name files on the command line as well
- Calzifer 3 years ago
  
  And avoid seek. Pipes are not random access. I once tried to use a python library to convert a file from stdin but it failed on a f.seek(0) the library added 'just in case' in the beginning.
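
Pulling the advice in this subthread together, a pipe-friendly filter might look roughly like the sketch below: results go to stdout, status messages to stderr, input is read line by line without seeking, and `fileinput` lets the same script take file names or stdin. The `run` and `main` names and the in-memory demonstration are assumptions for illustration; `main` is defined here but not called.

```python
import fileinput
import io
import re
import sys

HEX_LIKE = re.compile(r"^[a-foi]{6}$", re.IGNORECASE)
TO_HEX = str.maketrans("OI", "01")


def run(lines, out=sys.stdout, err=sys.stderr):
    """Stream lines in, write colour codes to `out`, diagnostics to `err`."""
    count = 0
    for line in lines:  # one line at a time; never seek()
        word = line.strip()
        if HEX_LIKE.match(word):
            print("#" + word.upper().translate(TO_HEX), file=out)
            count += 1
    print(f"{count} matches", file=err)  # keep status messages off stdout
    return count


def main():
    # fileinput reads the files named on the command line, or stdin if none.
    with fileinput.input() as lines:
        run(lines)


# Demonstration on an in-memory stream instead of a real pipe.
demo_out = io.StringIO()
run(io.StringIO("facade\nzebra\noffice\n"), out=demo_out)
print(demo_out.getvalue(), end="")
```

In a real script, `main()` would be called under `if __name__ == "__main__":`, and the script could then sit in a pipeline or be given `/usr/share/dict/words` as an argument.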
jeroenjanssens 3 years ago

My book Data Science at the Command Line has a chapter about this that scratches the surface and lists some resources in case you want to dive deeper [1]. I can also recommend checking out packages such as Rich [2] and Click [3], if only to get an idea of the possibilities when it comes to creating command-line tools with Python.
[1] https://datascienceatthecommandline.com/2e/chapter-4-creatin...
[2] https://github.com/Textualize/rich
[3] https://click.palletsprojects.com/en/8.1.x/
eyelidlessness 3 years ago

This is oddly something that some of the earliest Node interfaces do quite well. (I say “oddly” because Node was mostly promoted early on for network/server use cases.) It’s generally not idiomatic in these days of async/await and Web Streams, but streaming IO was a core async primitive from very early on. 0.1.90 for child processes, unspecified for the main process object so possibly from the first release. Granted the interfaces really show their age in terms of incidental complexity, they’re far from being as simple as their shell equivalents. But as far as behaving well, streaming is solid and there’s a wealth of compatibility affordances depending on how portable your script needs to be.
zokier 3 years ago

For Python, using the fileinput module goes a long way: https://docs.python.org/3/library/fileinput.html
- Too 3 years ago
  
  With argparse.FileType, similar behavior integrates well with argparse https://docs.python.org/3/library/argparse.html#argparse.Fil...

netule 3 years ago

Reminds me of debugging pointer values in C with 0xDEADBEEF.

dwheeler 3 years ago

I appreciate the presence of #C0FFEE.

Can't do computing without that!! :-)

layer8 3 years ago

That color doesn’t look healthy though. ;)

brrrrrm 3 years ago

Similarly, a list of hex words https://jott.live/code/hex_words

silisili 3 years ago

Fun idea. Perhaps could stretch a little like we did in calculators and add 5 for S, or even 7 for T, but that would likely be a bit less readable.

ghasemi 3 years ago

I added a comment for 5 vs S. 7/T looks like it's a bit too much :D
bawolff 3 years ago

You could just do full 1337 speek.
- genewitch 3 years ago
  
  pager code, probably better. "143" = I love you; but 177427*711773 = what time. I don't miss those days. I never had a pager, and i managed to convince all my friends that they shouldn't, either, by pager bombing them. Pagers are still in use, and they're plaintext over the air so if you live near a place that uses pagers (hospitals still use them, for instance), you can get all the messages in real time. It's the frequency. It's in VHF (iirc) so it goes places microwaves cannot; it's also low bandwidth, so the small spectrum carved out for it is usually enough for hundreds of pagers in the area.
  And since there's no real place to mention this elsewhere, there's an HTML color bot on fediverse (botsin.space) that periodically posts two colors that work as complements as foreground and background, and vice versa. I haven't seen it in a while, but our little instance has gotten popular so the feed rate is up near a few hundred posts an hour to sift through.
- mod 3 years ago
  
  Little town I frequently drive through has a population of 1337.
  I always have a little giggle.
  
  hoyd 3 years ago
  
  what town and country?
  
  mod 3 years ago
  
  I like my pseudo-anonymity here.
  It's in the US. Here's the census data to discover many occurrences of "1337"
  https://www.census.gov/data/tables/time-series/demo/popest/2...
  FWIW the town I'm talking about has a different population listed there, a little bit short. The road sign still says 1337, though, as of Thursday.
- silisili 3 years ago
  
  come to think of it, doing a separate list of toLower l -> 1 isn't a bad idea either...

Yenrabbit 3 years ago

It makes me happy that #ACAC1A is about the right colour for the flowers of the sweet acacia tree (a pale yellow).

dspillett 3 years ago

I know this is only looking at single words, so would miss this, but I always like to work ABAD1DEA into PoC work.

eyelidlessness 3 years ago

I like this! I usually try to pick a word/set of words that relates to the subject matter I’m testing, or something off the top of my head when that fails. But ABAD1DEA is a great default for exploratory work.
This is also an 8 character string, which I had wrongly inferred from usage in existing code to be restricted to certain APIs, but I looked it up and it’s evidently part of CSS Color Module Level 4 and has wide browser support. The one-liner could trivially be expanded to support 8-character codes. Not sure how trivial multiple words would be, my gut says “reasonably so but won’t feel quite so reasonable on one line”. Alas I’m on mobile so I’m not gonna try it right now.
- dspillett 3 years ago
  
  Just as RRGGBB has a three-character shorthand, you can use four characters too: RGBA as a shorthand for RRGGBBAA.
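
To make the shorthand concrete, here is a small sketch that expands 3- and 4-digit codes by doubling each digit and passes 6- and 8-digit codes through unchanged. The `expand` function is a hypothetical helper written for this example.

```python
import re

HEX_COLOR = re.compile(r"^#?([0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$")


def expand(code):
    """Expand #RGB / #RGBA to #RRGGBB / #RRGGBBAA; pass long forms through."""
    match = HEX_COLOR.match(code)
    if not match:
        raise ValueError(f"not a hex colour: {code!r}")
    digits = match.group(1).upper()
    if len(digits) in (3, 4):
        digits = "".join(ch * 2 for ch in digits)  # each digit is doubled
    return "#" + digits


print(expand("#FAB"))       # #FFAABB
print(expand("#ABAD1DEA"))  # #ABAD1DEA
```

So a word search over 3-, 4-, 6- and 8-letter words would cover every length this notation accepts.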

1vuio0pswjnm7 3 years ago

Not sure why this is being called "Bash" one-liner. It will work with many shells. It will run noticeably faster in Dash, for example. Test it yourself. Linux chooses Dash for non-interactive use, like this one-line script, because it is faster than Bash.

1vuio0pswjnm7 3 years ago
Some examples of where one finds Dash (NetBSD-derived Almquist shell, or "ash") in Linux
```
   The git.kernel.org repository
   Slackware
   Debian 
   Ubuntu
   Gentoo
   Arch initramfs
   Alpine 
   Tiny Core 
   OpenWRT
   Any other distrib that uses Busybox
   Android
```
What the OP fails to mention is that this shell one-liner (cf. "Bash one-liner"), as written, requires GNU grep, thanks to "-P".
BusyBox grep does not have a "-P" option.
In the case of Android, Google uses NetBSD userland programs, e.g., grep, which also does not include PCRE, i.e., "-P".
https://coral.googlesource.com/android-core/+/3458bb6ce1d3e7...
https://git.kernel.org/pub/scm/utils/dash/dash.git/
```
   curl -O https://mirror.rackspace.com/archlinux/iso/2022.10.01/arch/boot/x86_64/initramfs-linux.img
   xz -dc < initramfs-linux.img|cpio -t|grep -m1 usr/bin/ash
```
- kps 3 years ago
  
  It's written with `-P` but doesn't actually need it. Standard `-E` works just fine instead.
  
  1vuio0pswjnm7 3 years ago
  
  How many "professional" programmers even know the difference between BRE, ERE and PCRE?
  Perhaps this is why use of regex is so controversial amongst a majority of "professional" programmers. They are trying to use PCRE for every pattern matching task, i.e., even ones where it is not necessary, whether it is within their programming language or with command-line utilities. This "Bash one-liner" is a simple example.
  I have reviewed a number of books written about regular expressions and for the most part^1 they focus only on regex as implemented in popular programming languages. That almost invariably is PCRE or some form of PCRE-like pattern matching. There is little distinction, let alone acknowledgment, between PCRE/PCRE-like patterns and anything simpler.
  Not being a "professional" programmer, I use regex every day but I never (intentionally) use PCRE.^2 Too complicated for my tastes, not to mention slow if using backtracking.
  1. I recall one older book that did include an incomplete table attempting to show which type of regex was used by various UNIX utilities in addition to what regex was used by popular programming languages of the day.
  2. For programs that optionally link to a PCRE library, I re-compile them without it.
LambdaComplex 3 years ago

> Linux chooses Dash for non-interactive use
That entirely depends on the Linux distro.

ratsmack 3 years ago

I don't like using multiple commands.

    mawk 'BEGIN{b = "[abcdefois]"; l = "[a-z]"; W = "^" b l l l l l "$"}; $0 ~ W {print "#" toupper($0);}' /usr/share/dict/words

kbr2000 3 years ago

I came up with:

    gawk 'BEGIN {IGNORECASE=1} ((length($1) == 6) && /^[a-fois]+$/) {gsub(/o/,0);gsub(/i/,1);gsub(/s/,5); print toupper("#"$1)}' /usr/share/dict/words

(caveat: it does not filter out duplicates)
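
On the duplicates caveat: a dictionary that lists both "Bodies" and "bodies" would print the same code twice. One way to drop repeats while keeping first-seen order is sketched below in Python, relying on dict keys preserving insertion order. The `unique_codes` name and the sample words are assumptions for illustration.

```python
import re

HEX_LIKE = re.compile(r"^[a-fois]{6}$", re.IGNORECASE)
TO_HEX = str.maketrans("OIS", "015")


def unique_codes(words):
    """Return colour codes in first-seen order, without duplicates."""
    codes = {}
    for word in words:
        word = word.strip()
        if HEX_LIKE.match(word):
            # dict keys keep insertion order and ignore repeated codes
            codes.setdefault("#" + word.upper().translate(TO_HEX), word)
    return list(codes)


print(unique_codes(["Bodies", "bodies", "sobbed", "zebra"]))
```

In a shell pipeline the same effect would usually come from appending `sort -u`, at the cost of losing the original order.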

adrianmonk 3 years ago
You can also do it entirely in sed:
```
    sed -E -e '/^[a-fio]{6}$/!d; y/abcdefioIO/ABCDEF1010/; s/^/#/' /usr/share/dict/words
```
- xertopertha 3 years ago
  
  This produces 35 items. The grep version gives 93
  
  adrianmonk 3 years ago
  
  Yeah, I failed to make the pattern case insensitive.
  Here's a fixed version that also handles S/5:

      sed -E -e '/^[A-FIOSa-fios]{6}$/!d; y/abcdefiosIOS/ABCDEF105105/; s/^/#/' /usr/share/dict/words
Keyframe 3 years ago

you also aren't going to get valid color codes

kgwxd 3 years ago

I wanted a t-shirt that is the color #FAB; and says #FAB; on it, thought it'd be a fun one for digital artists, then I found out how hard it would be to get a t-shirt that matches it just right.

teaearlgraycold 3 years ago

Fun fact: Every Java .class file starts with the magic bytes C0FEBABE

belter 3 years ago

CAFEBABE
"...We used to go to lunch at a place called St Michael’s Alley. According to local legend, in the deep dark past, the Grateful Dead used to perform there before they made it big. It was a pretty funky place that was definitely a Grateful Dead Kinda Place. When Jerry died, they even put up a little Buddhist-esque shrine. When we used to go there, we referred to the place as Cafe Dead. Somewhere along the line, it was noticed that this was a HEX number. I was re-vamping some file format code and needed a couple of magic numbers: one for the persistent object file, and one for classes. I used CAFEDEAD for the object file format, and in grepping for 4 character hex words that fit after “CAFE” (it seemed to be a good theme) I hit on BABE and decided to use it. At that time, it didn’t seem terribly important or destined to go anywhere but the trash can of history. So CAFEBABE became the class file format, and CAFEDEAD was the persistent object format. But the persistent object facility went away, and along with it went the use of CAFEDEAD – it was eventually replaced by RMI...."
- James Gosling
- jrumbut 3 years ago
  
  I had the distinct pleasure of discovering CAFEBABE myself, in high school (not sure what direction this is dating myself in but I'll risk it), when I went on a tear of opening odd things in a hex editor.
  Now I will never be able to see it without thinking of this story: https://aphyr.com/posts/341-hexing-the-technical-interview
- TillE 3 years ago
  
  I've been using that as my own alternative to DEADBEEF for years, I had no idea it was part of the official Java spec. Maybe it got lodged in my brain subconsciously at some point.
tragomaskhalos 3 years ago

It's CAFEBABE
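
The correction is easy to check mechanically: a class file begins with the four bytes CA FE BA BE, read as a big-endian 32-bit integer. A minimal sketch follows, in which the eight sample bytes are made up for the demonstration rather than read from a real file.

```python
import struct

JAVA_CLASS_MAGIC = 0xCAFEBABE


def looks_like_class_file(data):
    """True if `data` starts with the big-endian magic number of a .class file."""
    if len(data) < 4:
        return False
    (magic,) = struct.unpack(">I", data[:4])
    return magic == JAVA_CLASS_MAGIC


print(looks_like_class_file(bytes.fromhex("CAFEBABE0000003D")))  # True
print(looks_like_class_file(bytes.fromhex("C0FEBABE0000003D")))  # False
```

For a real file, passing the first four bytes read in binary mode (`open(path, "rb").read(4)`) would be enough.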

cantSpellSober 3 years ago

nick0garvey 3 years ago

Interesting one liner but would like to see the colors it generates

kps 3 years ago

If your terminal does 24-bit colour, and your shell is bash or ksh or zsh or close,

    sed -n -e 'y/abcdefOoIi/ABCDEF0011/' -e '/^[A-F01]\{6\}$/p' /usr/share/dict/words | while read c; do printf '\033[38;2;%d;%d;%dm#%s\033[0m\n' $((0x${c:0:2})) $((0x${c:2:2})) $((0x${c:4})) $c; done
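
For anyone without bash-style substring expansion, the same 24-bit escape sequence can be built in Python. This is only a sketch of the colouring step, with a hypothetical `swatch` helper and three sample codes mentioned elsewhere in the thread; it still assumes a terminal that understands 24-bit colour.

```python
def swatch(code):
    """Return `code` wrapped in a 24-bit foreground-colour escape sequence."""
    digits = code.lstrip("#")
    # Split RRGGBB into three two-digit hex numbers.
    red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return f"\033[38;2;{red};{green};{blue}m#{digits}\033[0m"


for code in ["#C0FFEE", "#5EABED", "#ACAC1A"]:
    print(swatch(code))
```

Swapping `38` for `48` in the escape sequence would paint the background instead of the text.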

srcreigh 3 years ago

View colors here
https://codepen.io/srcreigh/pen/QWrrgdx
Code thanks to gabrielsroka on the Github thread
- blondin 3 years ago
  
  oh wow, #seabed generated a beautiful blue. what a truly happy accident!
  
  cmehdy 3 years ago
  
  Acacia is green, and fesses (buttocks in French) is pink. Coocoo is the only red in a surrounding of violets, and sobbed is a transparent-y blue like a tear :)
  
  srcreigh 3 years ago
  
  Access is green, acidic is red, and my favourite, cabbie is a nice yellow!
LanternLight83 3 years ago

https://gist.github.com/aileftech/dd4f5598b1f3837651fdf16e5a...
Silverback_VII 3 years ago

Not long ago I saw a link here to a site with the words and the colors...
- amenghra 3 years ago
  
  This maybe? https://news.ycombinator.com/item?id=31673662
- styfle 3 years ago
  
  Also this https://news.ycombinator.com/item?id=14537747

pushedx 3 years ago

What about 7 for T and also 3 for E?

jaclaz 3 years ago

E is a legit hex character:
0123456789ABCDEF
isn't it?
The 3 for E in 1337 speak was on numerical calculators that didn't display letters.
- pushedx 3 years ago
  
  Using 3 you can get more colors with human readable names, and maybe pick the canonical color for any given word based on some criteria of interestingness.
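
One way to read this suggestion: since E is already a valid digit, each E in a word can be left alone or written as 3, so a single word maps to several colours and one of them has to be picked. The sketch below enumerates the spellings and, purely as an assumed stand-in for "interestingness", picks the most saturated one. The function names are hypothetical and the input is assumed to be hex-like already.

```python
import colorsys
from itertools import product


def variants(word):
    """All spellings of a hex-like `word` with each E kept or written as 3."""
    choices = [("E", "3") if ch == "E" else (ch,) for ch in word.upper()]
    return ["#" + "".join(combo) for combo in product(*choices)]


def saturation(code):
    """HSV saturation of a #RRGGBB code, in the range 0..1."""
    red, green, blue = (int(code[i:i + 2], 16) / 255 for i in (1, 3, 5))
    return colorsys.rgb_to_hsv(red, green, blue)[1]


options = variants("decade")
print(options)
print(max(options, key=saturation))
```

A word with n letter Es yields 2^n candidates, so the choice of criterion matters more for words like "decade" than for "facade".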

IgorPartola 3 years ago

No 7 for a T?