Show HN: A stab at building my own string diffing library

29 points by alexmacarthur 2 years ago

Took a stab at building my own string diffing JS package.

I built an interactive demo for TypeIt’s website (https://typeitjs.com/build-your-own) a while back. The approach I took to handle user input necessitated a way to calculate the difference between the versions of a user’s text input.

I searched around for a package to help me out and found a couple of good ones (like fast-diff), but I either didn’t really like their API or didn’t want to take on a huge new dependency. Instead, I thought I’d give it a shot myself (famous last words).

I dove into it having no real formal knowledge of the algorithmic approaches to string diffing, and so things got frustrating real fast. But then I figured out a way to build it out using JavaScript symbols to link characters that helped ease the complexity a ton. It all resulted in “striff,” which comes in at just over 600 bytes gzipped, and that I’ve been using in production for a couple of months now.

It’s a little weird looking back now, because I don’t think I’d recommend someone taking the same path in building your own, but at the same time, there’s an immense satisfaction knowing I was able to figure out a pretty reliable approach. Check it out:

https://github.com/alexmacarthur/striff

gus_massa 2 years ago

Clicky:

TipeIt demo: https://typeitjs.com/build-your-own

Diff library: https://github.com/alexmacarthur/striff

The demos get's quite slow if I type too much. Is the search method linear or quadratic?

If I type too much without spaces, like "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab" (that is an evil string for search) then in the typing box the text is wrapped but in the reply box the text is all in the same line and goes outside the box.

samatman 2 years ago

Having looked at the code a bit I stumbled onto this:

https://github.com/alexmacarthur/striff/blob/master/src/util...

This... can't be right.

Can you explain how your algorithm differs from, in particular, this one: https://en.wikipedia.org/wiki/Longest_common_subsequence_pro...

Also, why does it differ from that?

  • romellem 2 years ago

    Aside from the fact that they are basically generating power sets, if you don’t need all the values at once, a better method would be to use a generator function if all you are doing is iterating. That way you don’t need to create all the values in memory first.

    A library that I like for this is https://www.npmjs.com/package/generatorics although there may be more modern ones available.

    • samatman 2 years ago

      > Aside from the fact that they are basically generating power sets

      !

      String diff calls for an M x N cost matrix and simple dynamic programming with tail calls (or you can unwind it into loops if you have to).

      Not... not power sets.

  • melony 2 years ago

    I would recommend using an optimized permutations library/generator on the string start and end indices instead of directly manipulating the slices.

    • samatman 2 years ago

      > permutations

      !

      Hey, you said it.

      I would recommend against permutations

      • melony 2 years ago

        You still need to generate all the permutations, even if you use dynamic programming.

gen_greyface 2 years ago

when i type long sentences, it starts visibly lagging, input reflection is delayed by seconds sometimes.