dwheeler 3 months ago

I understood the problem, but I found the page's explanation a little confusing at first. In particular, "lexical differential highlighting" misled me, because the word "differential" made me think that his algorithm was comparing lines or tokens in some way, and it doesn't do that.

Basically, this algorithm tokenizes the source code, and tries to color each token so that identical tokens have the same color, but similar-looking tokens have very different colors. When tokenizing it specially handles comments and quoted text.

That's an interesting approach to countering errors from "it's almost the same but I didn't notice they were different". I wonder - if I were trying to review source code that was malicious, maybe I could vary the color algorithm using a random source so that the source code writer couldn't make different tokens look similar in color. That might be an interesting countermeasure to some kinds of underhanded code.
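
To make that concrete, here is a minimal sketch of the idea in Python. It is not the page's actual code: the token pattern, the `;` comment syntax, and the `salt` parameter are all my assumptions. A hash gives identical tokens the same color and unrelated colors to everything else, which tends to separate look-alikes but does not guarantee it.

```python
import hashlib
import re
import secrets

# Hypothetical token pattern: comments and quoted text are kept whole,
# everything else becomes a word, a number, or a single symbol.
TOKEN_RE = re.compile("|".join([
    r";[^\n]*",          # assumed comment syntax: ';' to end of line
    r'"[^"\n]*"',        # double-quoted text
    r"'[^'\n]*'",        # single-quoted text
    r"[A-Za-z_%]\w*",    # words and registers such as %ecx
    r"\d+",              # numbers
    r"\S",               # any other single non-space character
]))


def tokenize(source):
    """Split source text into tokens, skipping whitespace."""
    return TOKEN_RE.findall(source)


def token_color(token, salt=b""):
    """Map a token to an (r, g, b) triple using a salted hash.

    Identical tokens always get the same color for a given salt.
    A reviewer-chosen random salt means the code's author cannot
    predict which tokens will end up looking alike.
    """
    digest = hashlib.sha256(salt + token.encode("utf-8")).digest()
    return digest[0], digest[1], digest[2]


if __name__ == "__main__":
    salt = secrets.token_bytes(16)  # fresh per review session
    for tok in tokenize("pmulhw %mm0, %mm1 ; multiply"):
        print(tok, token_color(tok, salt))
```

Picking the salt on the reviewer's side is the whole point of the countermeasure: the colors change from session to session, so nobody can tune two different tokens to look alike ahead of time.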

  • saagarjha 3 months ago

    Yeah, I thought this would do something like highlight all "mov" derivatives the same way and was somewhat surprised at the brevity of the code at the bottom…

  • shipof123 3 months ago

    That reminds me of something I read in Applied Cryptography when I was young about how one could theoretically pass messages with “ \b” to generate infinite versions of “identical” text to cause collisions

kazinator 3 months ago

This idea is related to "rainbow parentheses" (e.g. for Lisp): different levels of parens just get arbitrary different colors. But matching parens are the same color, just like two occurrences of %ecx in the same line are the same.
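
A rough sketch of how the paren coloring can work, in Python. The palette and the function name are made up for illustration, and this is not how any particular editor implements it: the color is picked by nesting depth, so an opening paren and its match always agree.

```python
# Arbitrary, hypothetical palette; real editors let you configure this.
PALETTE = ["red", "orange", "green", "blue", "purple"]


def color_parens(text):
    """Return (index, char, color) for every paren in text.

    The color depends only on nesting depth, so each closing paren
    gets the same color as the opening paren it matches.
    """
    colored = []
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            colored.append((i, ch, PALETTE[depth % len(PALETTE)]))
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate unbalanced input
            colored.append((i, ch, PALETTE[depth % len(PALETTE)]))
    return colored


if __name__ == "__main__":
    for item in color_parens("(a (b) c)"):
        print(item)
```

The sketch ignores parens inside strings and comments, which a real implementation would have to skip.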

  • human_banana 3 months ago

    In emacs there's a package rainbow-delimiters-mode for parentheses, braces, brackets and what not, and rainbow-identifiers-mode which gives variable names unique colors.

  • andrepd 3 months ago

    It's legitimately one of the best features of Excel. Does anybody know how I can achieve that in Sublime? The few options I found were subpar.

fake-name 3 months ago

There's a sublime text package that does this for a bunch of different languages: https://github.com/vprimachenko/Sublime-Colorcoder

I'm not involved in any way, I just ran it for a while at one point.

  • synthc 3 months ago

    There is also an emacs package that does something similar: https://github.com/jacksonrayhamilton/context-coloring

    • sprobertson 3 months ago
      • synthc 3 months ago

        I think DrRacket also has something like this, but it shows lines between identical variables instead of using colors.

      • xvilka 3 months ago

        Seems dead for many years already.

        • cjs_2 3 months ago

          How many updates per month are you expecting for a package like this?

          • xvilka 3 months ago

            Multiple times a day, like radare2. Seriously, if there is no activity in 6 months - then the project is dead.

            • mikekchar 3 months ago

              This is a lexical highlighter that tries to highlight similar, but different text differently. There's a point in time where there are no new features necessary.

              radare2 is a portable reversing framework. I can't think of 2 projects more dissimilar. Perhaps you were thinking that the highlighter actually did something other than color text in an arbitrary way? Can you give an example of something that you would expect to change about it, especially at the rate of multiple times a day?

  • guessmyname 3 months ago

    > There's a sublime text package that does this for a bunch of different languages

    You don’t need a package for this, Sublime Text 3 already does this automatically [1].

    [1] https://www.sublimetext.com/docs/3/color_schemes.html#hashed...

    • nh2 3 months ago

      How can I use it?

      The simplest way seems to be to use the "Celeste" color scheme which implements this. Is this the only way? I'd like to use a dark theme, like the default Monokai.

    • fake-name 3 months ago

      Well, neat!

      I haven't used the plugin since the ST2 days, so I didn't realize it was no longer needed.

  • soulofmischief 3 months ago

    Webstorm has an option for this and it makes things like dense enclosures or JSON actually parsable.

    • galaxyLogic 3 months ago

      Which feature is that? I've been using WebStorm for some time and wishing for a feature that would highlight all matching parentheses (), [] and {}.

      • _virtu 3 months ago

        - plugin: rainbow brackets

        - preference: semantic highlighting

        • galaxyLogic 3 months ago

          Thanks. I tried it but it did not quite do what I needed so I uninstalled it. (I'm afraid of plugins in general taking performance away). It worked on JS files, but I have HTML documents containing (example) JavaScript etc. code. It seems it did not react to parentheses in them. Also, even in plain JS files you may have strings containing parentheses.

          Standard WebStorm already highlights matching parentheses in JavaScript and does a good job at that.

          • soulofmischief 3 months ago

            I don't use rainbow brackets, but I do use semantic highlighting. It's worth seeing if semantic highlighting would still be useful to you. It greatly helps scanning speed.

  • cylon13 3 months ago

    What made you decide to stop using it?

gpspake 3 months ago

I remember Doug Crockford mentioning the idea of scope based highlighting for JavaScript in a workshop years back and thinking it would be useful. Cool to see it pop back up here.

Edit: Here's a scope based js highlighting repo that cites Crockford as the inspiration but unfortunately he posted the linked description on Google+ so... uh... oops https://github.com/azz/vscode-levels

zokier 3 months ago

Complete tangent but one thing that I've wondered about modernish asm mnemonics is how complex they are, and especially how much type information they encode in a semi-structured way. Taking the author's example of PMULHUW, the core operation is MUL(tiply), P for packed integers, H for high result, U for unsigned, and W for word sized (16 bit). I feel like there must be a better way to express the same thing that wouldn't lead to stuff looking like one word of all-caps alphabet soup. I don't know exactly what that would be; spelling out everything would probably make assembly way too verbose. So some sort of middle ground would be nice.
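
To show how much structure is hiding in there, here is a toy Python sketch that pulls PMULHUW apart into the fields listed above. The tables only cover that one family and are purely illustrative; they are not a description of the real x86 naming scheme, and `split_mnemonic` is a made-up name.

```python
# Field meanings come from the PMULHUW example; the tables are
# deliberately tiny and hypothetical.
PREFIXES = {"P": "packed integers"}
CORES = {"MUL": "multiply"}
SUFFIXES = {
    "H": "high result",
    "U": "unsigned",
    "W": "word sized (16 bit)",
}


def split_mnemonic(mnemonic):
    """Split e.g. 'PMULHUW' into [(part, meaning), ...].

    Returns None when the mnemonic does not fit the toy tables.
    """
    name = mnemonic.upper()
    parts = []
    # Optional prefix.
    for prefix, meaning in PREFIXES.items():
        if name.startswith(prefix):
            parts.append((prefix, meaning))
            name = name[len(prefix):]
            break
    # Mandatory core operation.
    for core, meaning in CORES.items():
        if name.startswith(core):
            parts.append((core, meaning))
            name = name[len(core):]
            break
    else:
        return None  # unknown core operation
    # Remaining letters are one-character modifiers.
    for letter in name:
        if letter not in SUFFIXES:
            return None
        parts.append((letter, SUFFIXES[letter]))
    return parts


if __name__ == "__main__":
    for part, meaning in split_mnemonic("PMULHUW"):
        print(part, "=", meaning)
```

With something like this, PMULHW and PMULHUW differ by exactly one explicit field (`U`), instead of by one letter buried in the middle of a word.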

  • chc4 3 months ago

    > I feel like there must be a better way to express the same thing that wouldn't lead stuff looking like one word all caps alphabet soup.

    Yes, that's called a programming language :^)

    Assembly is usually essentially a macro engine over the actual instructions you are emitting for your processor, and the Intel x86 chip manuals or whatever you're targeting use the outrageously long proper names, so your assembly will too. Heck, the author mentions specifically reading assembly too, so knowing what you're reading is 1:1 with the actual instruction stream is helpful, no matter how bad the official names are.

    Actual programming languages just abstract away some complex instructions like SSE vectorizing (which have famously terrible names) to some high-level API and intrinsic functions. And you should too.

    • zokier 3 months ago

      > the Intel x86 chip manuals or whatever you're targeting use the outrageously long proper names, so your assembly will too.

      I don't see why that has to be the case; why must I use Intel-specified mnemonics instead of my own syntax? While not as radical, the AT&T vs Intel syntax demonstrates that the vendor syntax is not the only option. As long as the syntax captures all the details of instructions to be completely unambiguous then it should be perfectly interchangeable.

      I specifically do not desire higher level of abstraction because I want to maintain that 1:1 relation with the actual machine code. Heck, even Intel mnemonics do not truly have 1:1 relation to machine code, because the instruction (encoding) can depend on operand types.

  • breck 3 months ago

    I’ve done some experiments with tree languages that compile to ASM. I think it’s definitely the way forward.

  • okaleniuk 3 months ago

    Actually, it would be interesting to experiment with coloring all the abbreviations separately. P, then MUL, then H, then U, then W (or UW altogether). Not sure if it works, but it's something worth trying.

lifthrasiir 3 months ago

[1] was a similar idea where color is determined by the prefix, so for example `currentIndex` and `randomIndex` are distinguished from each other but `currentIndex` and `currentIdx` are not.

I'm not sure about both because, i) there are only a handful of mutually distinguishable colors ([1] does mention the same complication), ii) we often want to highlight both the similarity and difference among identifiers and the cutoff is not clear. For i) we may want to leverage more formatting; for ii) I really don't have a good solution.

[1] https://medium.com/@evnbr/coding-in-color-3a6db2743a1e
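
For comparison, a minimal Python sketch of prefix-based coloring. This is my own approximation, not the code from [1]: the four-character prefix length and the function names are assumptions.

```python
import colorsys
import hashlib

PREFIX_LEN = 4  # assumption: how many leading characters decide the color


def color_key(identifier, prefix_len=PREFIX_LEN):
    """The part of the identifier that determines its color."""
    return identifier[:prefix_len].lower()


def prefix_color(identifier, prefix_len=PREFIX_LEN):
    """Return a '#rrggbb' color derived only from the identifier's prefix."""
    key = color_key(identifier, prefix_len)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    hue = int.from_bytes(digest[:4], "big") / 2**32
    # Fixed lightness and saturation so every color stays readable.
    r, g, b = colorsys.hls_to_rgb(hue, 0.5, 0.8)
    return "#{:02x}{:02x}{:02x}".format(
        int(r * 255), int(g * 255), int(b * 255)
    )


if __name__ == "__main__":
    for name in ("currentIndex", "currentIdx", "randomIndex"):
        print(name, prefix_color(name))
```

`currentIndex` and `currentIdx` share the key `curr` and so get the same color, while `randomIndex` hashes a different key. Nothing here addresses point i): two different prefixes can still land on hues too close to tell apart.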

css 3 months ago

Wow, this actually looks amazing for math (though it seems to be stripping out a lot of the code I pasted in): https://i.imgur.com/Iur9FgK.png

How difficult would it be to implement this as a VSCode extension?

  • petschge 3 months ago

    This looks pretty good, but notice how it does not split "log(difference_squared" into two tokens. Adding '(' and ')' as delimiters should fix that.

    • css 3 months ago

      Good point. That helps, but it still strips about half of the lines of my code out for some reason. Specifically, this part: https://i.imgur.com/L117fYm.png

  • BenFrantzDale 3 months ago

    I love that visually I can find usages of, say, `alpha`.

    I do wish it did some syntax highlighting, but one could easily imagine blending between this and conventional syntax highlighting.

panopticon 3 months ago

Tangential, but "Just as every other piece of code on Words and Buttons, it's properly unlicensed." reads like the code is literally unlicensed and not using the Unlicense license.

It's a little weird to me because unlicensed code is very different than the Unlicense license.

  • ChrisSD 3 months ago

    And I'd add that CC0 is more "properly unlicensed" than the Unlicense is. Or at least more thoroughly so.

canadaduane 3 months ago

I think this is also called semantic coloring. Visual Studio Code has it on the roadmap to try this year: https://github.com/Microsoft/vscode/wiki/Roadmap#editor

  • sixplusone 3 months ago

    No, semantic coloring is about the editor having deep knowledge about your code, this is about having very similar looking names or lexemes appear different. FTA:

    > It's fine that mov doesn't look like eax, but I'd rather prefer pmulhw and pmulhuw to be shown as differently as possible.

m0zg 3 months ago

I'm not a fan of this approach in general, but I am a fan of highlighting instructions from different subsets in different colors in asm, and perhaps differentiating the saturation by latency/throughput. I.e. a "heavy" instruction should probably be bright, urgent red, whereas loads, stores, adds, bit ops should probably be more muted.

IshKebab 3 months ago

Something like this is implemented in vscode-clangd. I used it for a bit but it's just too colourful. There are just colours everywhere and it's overwhelming. I went back to normal syntax highlighting.

KuhlMensch 3 months ago

Curious. I mean it sounds like relying simply on contrast rather than the structure. I know our visual system is insane at contrast, and we, as humans, tend to group tokens as a shorthand.

What makes me immediately pause is when I reflect on reading JavaScript: How often do I scan past 3+ lines using colour as my "bridge"? As far as I can remember, not often. Maybe I've overestimated colour-to-lead-me-through-structure. Maybe it is often colour-to-give-me-token-rhythm. Curious.

I'll have to remember to load up CSS or a test suite (with lots of framework calls) using this approach.

SilkySailor 3 months ago

I really like this idea. I always wanted to try to take this to insane levels. For example, for large code bases have different images associated with different modules. So that your brain has more things to latch on to. e.g.: This function from the banana module is calling the teddy bear module. It seems a bit absurd since there is no correlation between the image and the module functionality but I still want to try it.

stochastimus 3 months ago

This is really cool. It kinda looks like rainbow salad, but who cares? For me at least, it is much easier to visually parse.

DarmokJalad1701 3 months ago

Nice to see some MASM32 code in there in one of the examples. That's from a WIN32 app if I am not wrong.

Brings back memories.

FrancisNarwhal 3 months ago

Oh my god this would have saved my bacon two days ago. p_value_default is so visually similar to v_value_default that after sitting there with another developer trying to figure out the problem for 30 mins we rewrote the whole method.

Only the next day after the deadline pressure was gone did I spot the problem.

Avamander 3 months ago

I understand it in the case of assembly, but I don't think it'd work for something like Python better than existing syntax highlighting. So it's nice and I hope things like Radare or IDA adopt it where people even intentionally make syntax highlighting nearly impossible.

ggm 3 months ago

I encourage the original author to find a way to talk about assembly coding in the nuclear industry.

  • gcbw2 3 months ago

    what do you expect to be different from your run-of-the-mill maintenance of outdated industrial automation gig?

    • YeGoblynQueenne 3 months ago

      At a guess, an increased probability of causing a criticality accident as a result of getting a program slightly wrong.

    • exDM69 3 months ago

      I'm assuming the "reading assembly" part is verifying compiler output matches what the programmer thinks and signing it off as a "blessed binary".

      Some safety critical areas of software are done this way, in aerospace for example. But run-of-the-mill automation jobs aren't.

    • ggm 3 months ago

      bit flips from surplus neutrons? TMR? Batshit crazy lack of process checks on 'what does this button do'

      war stories.

      actually, I encourage anyone in coding to share run-of-the-mill maintenance of outdated industrial automation, as a gig. I'd read that blog.

pcwalton 3 months ago

In this particular case, the highlighting is a clever workaround for the fact that x86 register naming conventions are awful. RISC architectures tend to number the registers, which makes things significantly easier to read.

m463 3 months ago

Not code, but I'm surprised that email clients don't have better colorization from the get-go.

I think it would be the single best thing to help a huge amount of people.

gnuvince 3 months ago

There are too many colors in too many places. Everything is highlighted and nothing stands out.

  • galaxyLogic 3 months ago

    I agree. Rather than rainbow the brackets I think a better solution is to highlight the matching brackets with a temporarily different color as user moves the cursor.

    Or at least make it easy to turn the rainbows on and off.

  • Insanity 3 months ago

    which forces you to read everything individually and not miss something. I prefer less highlighting for this reason. I highlight a few keywords but other than that I don't highlight. I find it helps me _read_ the code rather than skim the code. (and for skimming, I'd grep through it most likely looking for something specific rather than trying to understand it.)

Analemma_ 3 months ago

> In 2013 I was working in nuclear power plant automation ... the job required reading a lot of assembly code.

Does anyone else find this terrifying? Nuclear power plant automation should be done in the safest of the safe languages. I would be alarmed at the thought of stuff like this being written in C, never mind in assembly!

  • holy_city 3 months ago

    Not really. There are plenty of chips out there without even a C compiler. Some don't even support Turing Completeness. There's even more that were designed and installed before manufacturers started slapping C compilers together for their DSPs, FPGAs, and MCUs.

    It would be weird to care about memory safety when your board doesn't even have a heap!

  • ARandomerDude 3 months ago

    To me, it's less terrifying than a complete rewrite in a modern language. Modern languages are great. Rewrites are often littered with bugs.

  • pvg 3 months ago

    Systems like that tend to be designed with different kinds of safeties. A mildly silly example - your typical Rails app doesn't have a watchdog timer, your toaster probably does.

    • okaleniuk 3 months ago

      An excellent example!

  • sixplusone 3 months ago

    Yes he said reading assembly, not writing. Whatever they use, I'm glad that someone's having a glance at what the compiler spits out. Also could be talking about microcontrollers, and in an industrial setting PLCs wouldn't be unexpected.

splittingTimes 3 months ago

Does something like this exist for Java in Eclipse?