j1elo 13 days ago

I'll take the chance to bring attention to the maintenance issues that 'jq' has been having in recent years [1]; there hasn't been a new release since 2018, which IMO wouldn't necessarily be a bad thing, if not for the fact that the main branch has been collecting improvements and bug fixes [2] since then.

A group of motivated users is currently discussing what direction to take; a fork is being considered in order to unblock new development and bug fixes [3]. Maybe someone reading this is able and willing to join their efforts.

[1]: https://github.com/stedolan/jq/issues/2305

[2]: https://github.com/stedolan/jq/pull/1697

[3]: https://github.com/stedolan/jq/issues/2550

  • capableweb 13 days ago

    What exactly is missing/broken in jq right now that warrants a fork? I've been using jq daily for years; I can't remember the last time I hit a bug (it must have been many years ago), and I can't recall any features I've felt were missing in all the years I've been using it.

    For me it's kind of done. It could be faster, but when speed matters I tend to program a solution myself instead; otherwise I feel like it's Done Enough.

    • j1elo 12 days ago

      I wouldn't say I need the program to grow with more features, but at the bare minimum they should have been more diligent about cutting releases after accepting bug fixes, instead of letting those contributions languish on the main development branch, out of reach for users.

      I mean, it would be understandable if the maintainers didn't have the time to keep working on it at all, but clearly the review work was done to accept some patches, so why not cut point releases to let the fixed code reach users via their distribution's channels?

    • Calzifer 13 days ago

      What I miss from jq, and what is implemented but unreleased, is platform-independent line delimiters.

      jq on Windows produces \r\n-terminated lines, which can be annoying when used from Cygwin / MSYS2 / WSL. The '--binary' option to not convert line delimiters is one of those pending improvements.

      https://github.com/stedolan/jq/commit/0dab2b18d73e561f511801...

      • orev 12 days ago

        You’ll have a much better experience in Cygwin/MSYS2/WSL if you treat them like isolated environments and don't call programs from outside of them. If you want to use ‘jq’ (or any tool) within Cygwin, install the Cygwin package. If you rely on the Windows install, you're guaranteed to run into problems like this.

    • goranmoomin 13 days ago

      > What exactly is missing/broken in jq right now which warrants a fork

      AFAIK there are quite a few bug fixes and features that have accumulated on the unreleased main branch, or that were opened as PRs but never merged.

      IIRC I hit one of the bugs while trying to check whether an input document is valid JSON.

      I should check out what's happening with the fork. I've never opened a PR or anything, but I've read the source while trying to understand the jq language conceptually, and I'd say it's quite elegant :)

    • strunz 12 days ago

      The README for jj points out how it is dramatically faster than jq. Presumably some of those improvements would help here.

    • jjoonathan 12 days ago

      > It could be faster

      A decaffeinated sloth could be faster.

cristoperb 13 days ago

I like jq, but jj is so fast it's my go-to for pretty-printing large JSON blobs. Its parsing engine is available as a standalone Go module, and I've used it in a few projects where I needed faster parsing than encoding/json:

https://github.com/tidwall/gjson

  • pcthrowaway 13 days ago

    I don't think I've ever been limited by jq's speed, but good to know there are alternatives if it ever becomes a bottleneck.

    Other than that I can't think of a reason to use this over jq; the query language is perhaps a bit more forgiving in some ways, but not as expressive as jq's (and I've spent ~8 years getting pretty familiar with jq's quirks).

    • lathiat 13 days ago

      The limiting speed factor of jq for me is, by far, figuring out how to write the expression I need to parse a fairly small amount of data. I do a bunch of support analysis, and I'm often writing a one-liner for a shell script to extract some bit of JSON to reuse later in the script. Often this will be used only once, by me or a customer, to run some task.

      Followed closely by figuring out the path to the part of the data I'm interested in. "gron" has been a real time saver there: it converts the JSON into single lines of key/value, so you can grep for any string and find its full path.
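
      For example, with a made-up document:

          $ echo '{"data":{"users":[{"id":7,"name":"Ann"}]}}' | gron | grep Ann
          json.data.users[0].name = "Ann";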

      Switching to a GUI that would let me browse the JSON and copy the path to the current value would probably also help there, but I'm usually in the terminal doing a bunch of different tasks, looking through all manner of command outputs, logs, etc. :)

      Relatedly, my primary use of ChatGPT has been asking it to write jq queries for me; it's not too bad at getting close. Its biggest blind spot seems to be key names containing a dash, which you have to write as ["key-name"].
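
      For example, a dashed key trips up the obvious syntax:

          $ echo '{"key-name": 1}' | jq '.key-name'      # fails: parsed as ".key - name"
          $ echo '{"key-name": 1}' | jq '.["key-name"]'
          1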

      • pdimitar 13 days ago

        > Switching to a GUI to browse the JSON that would let you copy the path to the current value would probably also help there

        Try https://jless.io/ then.

      • Simran-B 13 days ago

        I agree that figuring out non-trivial jq expressions takes a lot of time, often accompanied by a consultation of the somewhat lacking docs and some additional googling.

        Nonetheless, it is pretty slow at processing data. For example, converting a 1 GB JSON array of objects to JSON Lines takes ages, if it works at all. Using the streaming features helps; that gets memory consumption under control and doesn't take super long, but they are hard to comprehend, and it still takes way too long for such a trivial task IMO.
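
        For the record, the streaming incantation for the array-to-JSON-Lines case is something like this (quoting the jq manual's idiom from memory, so treat it as a sketch):

            $ jq -cn --stream 'fromstream(1|truncate_stream(inputs))' big.json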

      • bobnamob 13 days ago

        I’m far more likely to parse JSON in a Clojure REPL session and go from there these days. Learning jq for the odd JSON manipulation I need to do seems like overkill.

        • dicknuckle 12 days ago

          For me it's usually some automation task: gathering a list of IDs from some cloud environment to build infra things.

      • Dobbs 13 days ago

        > Switching to a GUI to browse the JSON that would let you copy the path to the current value would probably also help there

        I use an app called OK JSON on the Mac for this. It's okay.

      • AeroNotix 13 days ago

        emacs has a command to get the current path at point.

        • pdimitar 12 days ago

          Which one is it exactly, please? I'd like to use it.

qhwudbebd 13 days ago

Interesting! I tend to use gron to bring JSON into (and out of) the line-based bailiwick of sed and awk, where I'm most comfortable, rather than learning a custom query language like jq that I'd use much more rarely. But I guess that's at the opposite extreme of (in)efficiency from both this and the original jq.

There might be a nice 'edit just this path in-place in gron-style' recipe to be had out of jj/jq + gron together...
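
Something like this already gets most of the way there, with sed standing in for the editing step (made-up file and key):

    $ gron config.json | sed 's/^json.port = 8080;$/json.port = 9090;/' | gron -u > config.new.json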

  • qhwudbebd 13 days ago

    Are there any gron-like tools for XML? I'm aware it's a harder problem (and an increasingly rare one), but perhaps someone has tackled it nonetheless?

  • robertlagrant 13 days ago

    Just looked up gron - thanks. This looks useful.

maleldil 13 days ago

Am I correct in understanding that this can only manipulate (get or set) values at a JSON path? That is, it's not a replacement for jq?

For example, I frequently use jq for queries like this:

    jq '.data | map(select(.age <= 25))' input.json

Or this:

    jq '.data | map(.country) | sort[]' input.json | uniq -c

Is it possible to do something similar with this tool?

This is not a slight at jj. Even if it's more limited than jq, it's still of great value if it's faster or more ergonomic for a subset of cases. I'm just trying to understand how it fits in my toolbox.

  • TymekDev 10 days ago

    It looks like the README in the jj repository doesn't do the available query syntax justice. jj uses gjson (by the same author) and its syntax [0]. From what I saw, the first one can be handled with:

        jj 'data.#(age<=25)#' -i input.json
    
    I don't think there is a way to sort an array, though (there is an option to have keys sorted, however). Personally, I don't find that much of an annoyance: one could just pipe jj output to `sort | uniq -c`.
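
    For the country query, the closest thing might be (untested):

        jj 'data.#.country' -i input.json

    That returns a single JSON array rather than one value per line, though, so you'd still need to split it up before `sort | uniq -c`.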

    I just discovered that gjson supports custom modifiers [1]. So technically, you could fork jj, add another file registering a `@sort` modifier via `gjson.AddModifier`, and have a custom jj version that supports sorting.
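
    A rough sketch of what that could look like (untested; handles arrays of strings only):

        package main

        import (
            "encoding/json"
            "fmt"
            "sort"

            "github.com/tidwall/gjson"
        )

        func main() {
            // Hypothetical @sort modifier: sorts a JSON array of strings.
            gjson.AddModifier("sort", func(jsonStr, arg string) string {
                var items []string
                if err := json.Unmarshal([]byte(jsonStr), &items); err != nil {
                    return jsonStr // not a string array; pass through unchanged
                }
                sort.Strings(items)
                out, _ := json.Marshal(items)
                return string(out)
            })

            fmt.Println(gjson.Get(`{"countries":["Sweden","Norway"]}`, "countries|@sort"))
            // Output: ["Norway","Sweden"]
        }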

    [0]: https://github.com/tidwall/gjson/blob/master/SYNTAX.md

    [1]: https://github.com/tidwall/gjson/blob/master/SYNTAX.md#modif...

  • zimpenfish 12 days ago

    Annoyingly, I think `jq` might still be the only tool capable of these kinds of things. The rest seem to be "query simple paths and print the result" (which is handy, of course; I often use `gron` to get an idea of the keys I'm after, because the linear format is easier to handle than JSON).

harisamin 12 days ago

A while ago I wrote jlq, a utility explicitly for querying/filtering JSONL/JSON log files. It's powered by SQLite. A nice advantage is that it can persist results to a SQLite database for later inspection, or to pass around. Hope it helps someone :)

https://github.com/hamin/jlq

wvh 12 days ago

I've been using the gjson (get) and sjson (set) libraries this is based on for many years in Go code, to avoid deserialising JSON responses. Those libraries act on a byte array and can get only the value(s) you want, without creating structs and other objects all over the place, giving you a speed bump and fewer allocations if all you need is a simple value. It's been working well.
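
A minimal sketch of the pattern (hypothetical payload; gjson to read, sjson to write):

    package main

    import (
        "fmt"

        "github.com/tidwall/gjson"
        "github.com/tidwall/sjson"
    )

    func main() {
        body := []byte(`{"user":{"id":42,"name":"Tom"}}`) // e.g. an HTTP response body

        // Read a single value straight from the byte slice: no structs, no full decode.
        fmt.Println(gjson.GetBytes(body, "user.id").Int()) // 42

        // Write a value the same way; sjson returns an updated byte slice.
        updated, err := sjson.SetBytes(body, "user.name", "Smith")
        if err != nil {
            panic(err)
        }
        fmt.Println(string(updated)) // {"user":{"id":42,"name":"Smith"}}
    }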

This program could be an alternative to jq for simple uses.

BiteCode_dev 14 days ago

For those wondering, the README states it's a lot faster than jq, which may be the selling point.

  • nigeltao 13 days ago

    jj is faster than jq.

    However, jsonptr is even faster and also runs in a self-imposed SECCOMP_MODE_STRICT sandbox (very secure; also implies no dynamically allocated memory).

      $ time cat citylots.json | jq -cM .features[10000].properties.LOT_NUM
      "091"
      real  0m4.844s
      
      $ time cat citylots.json | jj -r features.10000.properties.LOT_NUM
      "091"
      real  0m0.210s
    
      $ time cat citylots.json | jsonptr -q=/features/10000/properties/LOT_NUM
      "091"
      real  0m0.040s
    
    jsonptr's query format is RFC 6901 (JSON Pointer). More details are at https://nigeltao.github.io/blog/2020/jsonptr.html

    • zokier 13 days ago

      Looks neat. One suggestion: add better build instructions to the wuffs README / getting-started guide. I jumped in and tried to build it using the "build-all.sh" script, which seemed convenient, but gave up (for now) after the nth build failure due to yet another missing dependency. It's extra painful because build-all.sh is slow, so maybe also consider a proper build automation tool (seeing as this is a Google project, maybe Bazel?).

      • nigeltao 12 days ago

        Thanks for the feedback. I'll add better build instructions.

        If you just want the jsonptr program, instead of everything in the repo (the Wuffs compiler (written in Go), the Wuffs standard library (written in Wuffs), tests and benchmarks (written in C/C++), etc) then you can use "build-example.sh" instead of "build-all.sh".

          ./build-example.sh example/jsonptr
        
        For example/jsonptr, that should work "out of the box", with no dependencies required (other than a C++ compiler). For e.g. example/sdl-imageviewer, you'll also need the SDL library.

        Alternatively, you could just invoke g++ directly, as described at the very top of the "More details are at [link]" page in the grand-parent comment.

          $ git clone https://github.com/google/wuffs.git
          $ g++ -O3 -Wall wuffs/example/jsonptr/jsonptr.cc -o my-jsonptr

  • rektide 13 days ago

    Presumably the memory footprint is often far less too.

Willuminaughty 12 days ago

Hey there,

Just wanted to drop a quick note to say how much I'm loving jj. This tool is seriously a game-changer for dealing with JSON from the command line. It's super easy to use and the syntax is a no-brainer.

The fact that jj is a single binary with no dependencies is just the cherry on top. It's so handy to be able to take it with me wherever I go and plug it into whatever I'm working on.

And props to you for the docs - they're really well put together and made it a breeze to get up and running.

Keep up the awesome work! Can't wait to see where you take jj next.

Cheers

Rygian 13 days ago

This behaviour looks confusing to me:

    $ echo '{"name":{"first":"Tom","middle":"null","last":"Smith"}}' | jj name.middle
    null

    $ echo '{"name":{"first":"Tom","last":"Smith"}}' | jj name.middle
    null

It can be avoided with option '-r', which should be the default, but is not.

  • planede 13 days ago

    I don't get this behavior for your second command; it just seems to return an empty string.

    edit:

    There are three cases to cover:

    1. The value at the path exists and is not null.

    2. The value at the path exists and is null.

    3. The value at the path doesn't exist.

    jj seems to potentially conflate 1 and 2 without the -r flag ("middle": "null" and "middle": null, more specifically). It probably conflates "middle": "" and a missing value as well; that's 1 and 3.
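
    With -r the string value keeps its quotes, so at least 1 and 2 become distinguishable (inferring -r's behaviour from the examples elsewhere in this thread):

        $ echo '{"middle":"null"}' | jj -r middle
        "null"
        $ echo '{"middle":null}' | jj -r middle
        null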

asadm 13 days ago

I wish this existed when I was trying to look at a 20G Firebase database JSON dump.

  • vmfunction 13 days ago

    That is what gets me: why did the file get to 20 GB? At that point, just ship a SQLite file.

    • capableweb 13 days ago

      Does it matter why? Sometimes files get big, and you don't control the generation, or changing the generation is a bigger task than just dealing with a "big" (I'd argue 20GB isn't that big anyway) file with standard tools.

      • notorandit 12 days ago

        Nope, it matters a lot! Unstructured, unindexed files usually get that big as the result of some design flaw.

notorandit 13 days ago

Interesting. How often do you manipulate a 1+MB JSON file? Maybe I am wrong, but going from 0.01s to 0.001s doesn't motivate me to switch to jj.

  • untech 12 days ago

    In my field (NLP), datasets are often stored in (sometimes gzipped) JSON Lines format. File sizes can reach 100s of GBs.

    • notorandit 12 days ago

      100s of GBs?

      In those cases, querying un-indexed files seems quite a thinko. Even if you can fit it all in RAM.

      If you only scan that monstrous file sequentially, then you don't need jq or jj or any other "powerful" tool. Just read/write it sequentially.

      If you need to make complex scans and queries, I suspect a database is better suited.

      • untech 10 days ago

        Usually you do indeed scan the file sequentially, doing some filtering/transformation. Since you run that transformation on every record, the speed of the tool used (e.g. jq) really matters.

        Databases are not used in this case because they're a complexity overhead compared to plain-text files. The ability to use Unix pipelines and tools (such as grep) is a bonus.
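
        The typical shape of such a pipeline, with made-up field names:

            $ zcat data.jsonl.gz | jq -c 'select(.lang == "en") | {id, text}' > en.jsonl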

Self-Perfection 13 days ago

I would like to see a comparison with jshon. jshon is way faster than jq and has been available in distro repositories for many years.

  • Alifatisk 13 days ago

    Cool, I didn’t know about jshon. How's the query language?

    • Self-Perfection 13 days ago

      Almost non-existent. A couple of excerpts from the man page:

        {"a":1,"b":[true,false,null,"str"],"c":{"d":4,"e":5}}
        jshon [actions] < sample.json
        jshon -e c -> {"d":4,"e":5}
        jshon -e c -e d -u -p -e e -u -> 4 5
      
      Yet this covers like ~50% of possible use cases for jq.