GNU recutils – human readable, plain text databases

149 points by nanna 3 years ago

From the FAQ:

> Why is the logo depicting a pair of copulating turtles?

> > Ask ams@gnu.org.

> What is the name of the turtles?

> > They are called Fred and George. And yes, they are both male.

Three of the five questions in the FAQ are about these two turtles... I just wanted to know how different this was from something like noSQL.

krageon 3 years ago

> [...] how different [is this] from something like noSQL.
This database is relational and might have a real-life use case with worthwhile tradeoffs. Those are big differentiating factors.
CJefferson 3 years ago

I feel like a better FAQ would be "when you keep saying text, what do you mean? ASCII? UTF8?"
moss2 3 years ago

To be fair I was going to ask about the turtles in this thread.
samatman 3 years ago

GNU projects have excellent documentation, and as a result the FAQs tend to be whimsical. Any genuine Frequently Asked Question gets added to the info files.
- masukomi 3 years ago
  
  rather than, you know, actually removing a juvenile picture of animals copulating which pretty much guarantees no-one will take their project seriously.
  nah,... just add it to the faq
feet 3 years ago

Oh boy. First thing on the page lmao

schanzen 3 years ago

We use recutils for the GNUnet and GNU Taler names and numbers registry (https://gana.gnunet.org). Using the recsel and recfmt tools that come with it, you can autogenerate header/source files of constants from that format in whatever language you need. For example, we do that for the GNU Name System record types: https://datatracker.ietf.org/doc/draft-schanzen-gns/ https://git.gnunet.org/gana.git/tree/gnu-name-system-record-...

So, quite useful if you have a protocol specification with constants from which you want to generate code.
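As a rough illustration of that pipeline, here is a Python sketch that turns a registry in rec format into C `#define` lines. It only understands plain `Field: value` lines, and the `Number`/`Mnemonic` field names, the `GNS_TYPE_` prefix and the sample values are hypothetical, not taken from the actual GANA files. With recutils itself, `recfmt` does this job from a template containing `{{Field}}` slots.

```python
def parse_recs(text):
    """Parse simple rec records into dicts.

    Handles 'Field: value' lines, '#' comments and blank-line record separators;
    continuation lines, repeated fields and descriptors are not handled.
    """
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
            current = {}
        elif not line.startswith("#"):
            name, _, value = line.partition(":")
            current[name.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def to_c_header(records, prefix="GNS_TYPE_"):
    """Emit one #define per record from the (hypothetical) Mnemonic/Number fields."""
    lines = ["/* Generated from the registry; do not edit. */"]
    for rec in records:
        lines.append(f"#define {prefix}{rec['Mnemonic']} {rec['Number']}")
    return "\n".join(lines) + "\n"


# Illustrative registry excerpt; names and numbers are made up.
REGISTRY = """\
Number: 1
Mnemonic: ALPHA

Number: 2
Mnemonic: BETA
"""

print(to_c_header(parse_recs(REGISTRY)), end="")
```

The same parsed records could just as easily be rendered as a Go, Rust or Python constants file by swapping the format string.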

l72 3 years ago

I use this for managing products for my very small web store. I scrape my vendors' websites and dump the data into rec files.

It is then super easy to version and compare differences. I am typically interested in new products and price changes for existing products so I can see what I want to purchase from my vendors. Having a weekly snapshot of this that I can go back in time through and visualize is really useful.

Being able to quickly edit rec files in Emacs and run queries on the command line is great.

I suppose if I were dealing with 100k+ products, I'd use a real database, but for the volume I am dealing with, it works really well.
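A minimal Python sketch of the weekly comparison, assuming each snapshot has already been pulled out of the rec files (for instance with `recsel`) into a mapping from a hypothetical `SKU` key to a price. The SKUs and prices are made up.

```python
from decimal import Decimal


def compare_snapshots(old, new):
    """Return (new_products, price_changes) between two {sku: price} snapshots."""
    new_products = sorted(sku for sku in new if sku not in old)
    price_changes = {
        sku: (old[sku], new[sku])
        for sku in sorted(new)
        if sku in old and old[sku] != new[sku]
    }
    return new_products, price_changes


last_week = {"A-100": Decimal("9.99"), "B-200": Decimal("24.50")}
this_week = {"A-100": Decimal("8.99"), "B-200": Decimal("24.50"), "C-300": Decimal("5.00")}

added, changed = compare_snapshots(last_week, this_week)
print(added)    # ['C-300']
print(changed)  # {'A-100': (Decimal('9.99'), Decimal('8.99'))}
```

`Decimal` is used so that prices compare exactly rather than as binary floats.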

bjoli 3 years ago

I use it as a bookmarks manager together with about 30 lines of shell scripts. It has been my bookmark file format since 2013. It is clunky, kind of awkward to use, but I wrote it myself.

I have been wanting to rewrite it in Guile using simple s-expressions as storage, but I am lazy.

euroderf 3 years ago

I've been wanting to rewrite it in Go. This would benefit from multi-pass parsing, to pre-emptively resolve forward references (and flag invalid forward references). How big do the files get, though? Can anyone attest to (say) multi-hundred-MB recutils files?
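The two-pass idea as a Python sketch (a Go version would have the same shape): first collect every key declared for each record type, then check each foreign-key field, so a reference to a record that appears later in the file resolves while a dangling one is flagged. The record types, key fields and the `refs` mapping are hypothetical stand-ins for recutils' `%key` declarations and foreign-key (`rec`) field types.

```python
def check_references(records, keys, refs):
    """Two-pass reference check.

    records: list of (rec_type, fields) tuples in file order
    keys:    {rec_type: key_field}, e.g. {"Author": "Id"}
    refs:    {(rec_type, field): target_type} for foreign-key fields
    Returns a list of (rec_type, field, value) for dangling references.
    """
    # Pass 1: collect all declared keys, so forward references can resolve.
    known = {t: set() for t in keys}
    for rec_type, fields in records:
        key_field = keys.get(rec_type)
        if key_field and key_field in fields:
            known[rec_type].add(fields[key_field])

    # Pass 2: every foreign-key value must name a key collected in pass 1.
    dangling = []
    for rec_type, fields in records:
        for field, value in fields.items():
            target = refs.get((rec_type, field))
            if target is not None and value not in known.get(target, set()):
                dangling.append((rec_type, field, value))
    return dangling


records = [
    ("Book", {"Title": "First Book", "Author": "jdoe"}),  # forward reference
    ("Book", {"Title": "Orphan", "Author": "nobody"}),    # invalid reference
    ("Author", {"Id": "jdoe", "Name": "Jane Doe"}),
]
print(check_references(records, {"Author": "Id"}, {("Book", "Author"): "Author"}))
# [('Book', 'Author', 'nobody')]
```

A single-pass parser would have to either buffer unresolved references or reject the first book; the extra pass avoids that at the cost of reading the records twice.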

nix23 3 years ago

Also interesting, ndb from plan9/9front:

https://9fans.github.io/plan9port/man/man7/ndb.html

https://www.youtube.com/watch?v=e9Y0iDXXQh8

samatman 3 years ago

Has anyone done the work to integrate recutils with SQLite?

The recutils format hits a sweet spot: easy to read, easy to modify by a human, and 'durable' in the sense that copying records around carries the info needed to integrate them later into the same schema.

Being able to export, import, and open recutils files directly would be powerful. Recutils could export an SQLite database schema at fine enough resolution to reconstruct it (with, well, a schema-schema, but that's tractable), which would be great.
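Exporting from SQLite is simple for flat tables. A minimal Python sketch using the standard `sqlite3` module: each row becomes one record, column names become field names, NULLs are omitted, and a `%rec:` descriptor names the record type. It assumes values contain no newlines and that the table name is trusted; a faithful converter would also have to emit `%type`/`%key` descriptors to carry the schema, which is the "schema-schema" part.

```python
import sqlite3


def table_to_rec(conn, table):
    """Dump one SQLite table as rec text: a %rec descriptor, then one record per row.

    Assumes values contain no newlines and that `table` is a trusted identifier.
    """
    cur = conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid')
    columns = [d[0] for d in cur.description]
    chunks = [f"%rec: {table}"]
    for row in cur:
        # NULL columns are simply left out of the record.
        lines = [f"{col}: {val}" for col, val in zip(columns, row) if val is not None]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (Title TEXT, Year INTEGER)")
conn.executemany("INSERT INTO Book VALUES (?, ?)", [("Dune", 1965), ("Untitled", None)])
print(table_to_rec(conn, "Book"))
```

Leaving NULL columns out matches the rec convention that a record simply lacks a field it has no value for.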

nanna 3 years ago

It comes with converters for MS Access and CSV; could those be used as part of a pipeline?
- dzolvd 3 years ago
  
  definitely, there are sqlite csv one-liners.
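For the CSV route, a Python sketch that loads CSV text (standing in for what `rec2csv` prints, assumed to start with a header row of field names) into SQLite with the standard `csv` and `sqlite3` modules. The table name and data are invented. The sqlite3 shell's `.import` command is the usual one-liner alternative.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_text):
    """Create `table` from CSV text (header row = column names) and insert all rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    # Columns get no declared type, so the CSV strings are stored as text.
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    conn.commit()


# Stand-in for the output of something like: rec2csv products.rec
CSV_TEXT = 'Name,Price\nWidget,9.99\n"Gadget, large",24.50\n'

conn = sqlite3.connect(":memory:")
load_csv(conn, "Product", CSV_TEXT)
# Values are text, so cast before comparing numerically.
print(conn.execute("SELECT Name FROM Product WHERE CAST(Price AS REAL) > 10").fetchall())
# [('Gadget, large',)]
```

Without the `CAST`, SQLite would compare the text value against the integer and treat every price as larger than 10.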

MisterTea 3 years ago

This sounds similar to Plan 9's ndb, the network database. Although it is used for storing network and machine configuration, it can be used as a general-purpose database.

Utilities: http://man.postnix.pw/plan_9/8/ndb

Format: http://man.postnix.pw/plan_9/6/ndb

C Library: http://man.postnix.pw/plan_9/2/ndb

big-malloc 3 years ago

I’ve always thought this would be helpful for small in-house ops scripts and things like that, but unfortunately the logo and terminology make this a bit too risqué for a corporate environment. I suppose that’s part of the GNU spirit!

Lio 3 years ago

The logo is a bit childish but still made me chuckle at least[1].
So much of the entire industry already relies on GNU code that I think that, barring licensing issues, no one ever got fired for choosing the GNU option. You know it will work and it will probably work well, forever.
1. I once went to an RMS talk where he said that the whole point of writing your own software was "so that you can give it a funny name". Which I take as good advice! :)
hdjjhhvvhga 3 years ago

I'm not sure if you are being serious or not. You don't need the logo for anything. And as for terminology, the "rec" part stands for (database) records, not "rectum" or whatever one might imagine. I mean, seriously, what's the problem with just grabbing the code and using it if it's useful and the license has been approved by your company?
- dahart 3 years ago
  
  > And as for terminology, the "rec" part stands for (database) records, not "rectum" or whatever one might imagine.
  Why are you assuming the pun wasn’t intentional, and that both meanings aren’t implied? What exactly is the point of talking about male turtles copulating in the FAQ?
  There is nothing wrong with ignoring the logo and just using the code, that isn’t the issue. The issue is the story of the logo & FAQ are intentionally controversial, even if the author only intended humor, and that loading the site and using the code at work would promote conversation about the topic of the logo and it’s narrative.
  
  hdjjhhvvhga 3 years ago
  
  > and that loading the site and using the code at work would promote conversation about the topic of the logo and its narrative.
  Or we could be adults and treat it in the way we treat any peculiar thing in a corporate environment, that is, ignore it and move on to what is actually needed? I've been in such situations quite a few times (mixed male/female environment, some old guys and young interns), and the maximum you could count on was a "well, that's an interesting choice", but most often people would just completely ignore it. We're not in elementary school to shout, "look, copulating turtles!"
  On the other hand, I see your point. People got extremely cautious over the years because of real and perceived harassment attempts and their consequences. I can understand showing this page at work could be perceived by some in the same way as the famous dongle joke at a Python conference.
  
  dahart 3 years ago
  
  You’re lucky! I’ve definitely worked places where supposedly adult people made constant tasteless jokes about sex and homosexuality, and about women while speaking to women, where the environment was actually hostile and the “adults” truly didn’t even know it and thought their conduct was okay. The problem with saying “we could be adults” is that many, many people disagree on what it means to be an adult; it’s a criterion that’s too vague and indirect. Counting on people being adults is what we’ve tried and failed at for, I dunno, hundreds of years? Forever? BTW, I don’t think just showing the page at work is the issue, the issue is talking about it, and that is something that would happen.
- rgoulter 3 years ago
  
  I think GP meant "Selection Expression", abbreviated as "SEX" in the manual. https://www.gnu.org/software/recutils/manual/
  
  mywittyname 3 years ago
  
  Who is going to read the manual on an obscure program used in a shell script somewhere? Most likely, another dev who's never encountered recutils before, and probably long after the OG developer is gone.
  The command itself is a `recsel -e`. Which is not even remotely humorous, unlike, say, `dwarfdump`.
krageon 3 years ago

If your corporate environment can't deal with the reality that a logo is there to be memorable and fundamentally has nothing to do with functionality, is it even worth working there? It sounds like a very stifling environment that takes itself entirely too seriously.
- JasonFruit 3 years ago
  
  Have you worked in a corporate environment? Copulating turtles isn't the half of the "very stifling environment that takes itself entirely too seriously." But they have no choice. How long would it take for someone to run to HR complaining of a "hostile work environment" because of the turtles? If a company doesn't have "policies in place to prevent this sort of imagery", they will as soon as the turtles show up.
  
  klez 3 years ago
  
  I have, and I'm pretty sure a quarter of the corporate people I worked with wouldn't be able to pick out the Java logo in a lineup with other logos. Another quarter wouldn't know what a Java is. The rest wouldn't even know there's a thing that runs their programs. How would the logo of a single library, among the myriad a company uses, even be brought to attention?
  I'm almost positive that if we dig in a modern project dependency tree deep enough we'll find at least a couple that have an inappropriate joke in the readme or in the documentation. So how would recutils be any different?
  
  klez 3 years ago
  
  As a side note, am I the only one who didn't notice the turtles until the FAQ made me double check?
- dahart 3 years ago
  
  > If your corporate environment can’t deal with the reality that a logo is there to be memorable and fundamentally has nothing to do with functionality
  Unfortunately, that argument is entirely a straw man. Most corporate environments are absolutely fine with memorable non-functional logos. The issue is the narrative about gay turtle sex and perhaps the implied suggestion that the “rec” in recutils might stand for rectum.
  In my personal life, I’m comfortable discussing turtles and rectums and sex and homosexuality. But I don’t want to discuss those topics at work with my co-workers, and most corporate environments aren’t just avoiding them for fun because they’re a bunch of stiffs; they’re avoiding such topics because there is a history of them causing actual problems at work, often leading to hurt feelings and the sense that the environment is not welcoming to all, and as a result there are several ways they’re legally required to encourage employees to avoid discussing such topics, and legally required to take disciplinary action if someone complains that someone else made negative comments about gay turtles or sex.
  BTW this is all a side point to the fact that this photograph of turtles makes a terrible logo. It doesn’t work as a small icon or with reduced colors. It has no shape or symbolism. It’s unrelated to the product. It’s hard to see the detail even as a large full color image. The only thing that makes this logo memorable is the FAQ text, the story that it’s gay copulating turtles, which for all we know isn’t even true - turtles sometimes climb over each other, sometimes fight, and there’s no way to identify the sex of the turtles from this photo.
  
  dahart 3 years ago
  
  I really don’t know why this is getting downvoted like so, but in case it’s the “rec” comment, I am of course aware that rec stands for record, I meant to suggest the alternate meaning may be a pun. If this is lame or wrong or offensive for some other reason, I’m curious and open to feedback.
dorfsmay 3 years ago

I did not know about this, but I have used SQLite with bash scripts a lot to do ops tasks. I've given a few presentations about it; sysadmins and ops people are often surprised how easy it is to use.

ttymck 3 years ago

Glad to see so many folks sharing their use cases for recutils!

I had started on a python implementation[0] as a fun project, but got discouraged because I assumed no one used recutils to solve real problems. Hopefully this is the kick I needed to continue filling out the implementation.

[0] https://github.com/tmkontra/python-recutils

em-bee 3 years ago

what is the advantage of this format over yaml, json or other structured data formats?

my preferred format for readable data is s-expressions.

more flexible than json, easier to write than yaml and easy to parse for a program. if you structure the data properly when writing it to the file, then it is easy to read too. yet it has the flexibility to losslessly map most programmable data structures.
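To illustrate the "easy to parse for a program" point, a minimal s-expression reader in Python. It is only a sketch: it handles parenthesized lists, bare atoms and double-quoted strings without escapes, returns every atom as a string, and the sample datum is invented.

```python
import re
from collections import deque

# A token is "(", ")", a double-quoted string without escapes, or a bare atom.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse(tokens):
    """Consume one datum from the front of a deque of tokens."""
    tok = tokens.popleft()
    if tok == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise ValueError("missing )")
        tokens.popleft()  # drop the closing ")"
        return items
    if tok == ")":
        raise ValueError("unexpected )")
    if tok.startswith('"'):
        return tok[1:-1]  # strip the surrounding quotes
    return tok


def read_sexpr(text):
    return parse(deque(TOKEN.findall(text)))


print(read_sexpr('(bookmark (url "https://example.org") (tags lisp data))'))
# ['bookmark', ['url', 'https://example.org'], ['tags', 'lisp', 'data']]
```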

akhmatova 3 years ago

What is the advantage of this format over yaml, json or other structured data formats?
This is a crucial question, actually. At first glance, I would venture this answer:
"The Recutils format is not nearly as universal or flexible as YAML/JSON. It essentially supports just one data structure, analogous to a SQL table or a list of dicts. But for that use case (which is quite important) it does seem to be a more suitable format, and provides additional features that those formats and their tooling do not."
markisus 3 years ago

There are built-in tools in rec for querying, joining / foreign keys, checking data integrity against a schema, and aggregation and reporting.
So the set of advantages over json/yaml are similar to SQL's advantages, except the database format is human readable and grepable.
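A minimal Python sketch of the "list of dicts" view, plus what a selection and a foreign-key join amount to on top of it. The parser assumes only the basics of the format (`Field: value` lines, blank lines between records, `#` comments, `+` continuation lines, repeated fields collected into lists) and does not interpret descriptors. `select`, `join` and the sample data are illustrative; recutils itself provides this through `recsel` selection expressions and its join support, not through code like this.

```python
def parse_rec(text):
    """Parse rec text into a list of {field: [values...]} dicts.

    Handles 'Field: value' lines, blank-line record separators, '#' comments,
    '+' continuation lines and repeated fields. Descriptors such as '%rec:'
    are kept as ordinary fields rather than interpreted.
    """
    records, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current record
            if current:
                records.append(current)
            current, last = {}, None
        elif line.startswith("#"):  # comment line
            continue
        elif line.startswith("+") and last is not None:
            # continuation: "+ text" adds a new line to the previous value
            cont = line[1:]
            current[last][-1] += "\n" + (cont[1:] if cont.startswith(" ") else cont)
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"not a field line: {line!r}")
            last = name.strip()
            current.setdefault(last, []).append(value.strip())
    if current:
        records.append(current)
    return records


def select(records, predicate):
    """Keep records for which predicate(record) is true (cf. recsel -e)."""
    return [r for r in records if predicate(r)]


def join(records, field, targets, key):
    """Inner join on the first value of `field` against the targets' `key`.

    Joined fields get a 'Field_' prefix, e.g. Author_Name.
    """
    index = {t[key][0]: t for t in targets if key in t}  # hash lookup, no nested scan
    joined = []
    for r in records:
        target = index.get(r.get(field, [None])[0])
        if target is None:
            continue  # inner join: drop records without a match
        merged = dict(r)
        merged.update({f"{field}_{k}": v for k, v in target.items()})
        joined.append(merged)
    return joined


BOOKS = """\
# Illustrative data
Title: First Book
Year: 1998
Author: ann

Title: Second Book
Year: 2011
Author: bob
Note: line one
+ line two

Title: Third Book
Year: 2015
Author: zed
"""

AUTHORS = """\
Id: ann
Name: Ann

Id: bob
Name: Bob
"""

books, authors = parse_rec(BOOKS), parse_rec(AUTHORS)
recent = select(books, lambda r: int(r["Year"][0]) > 2000)
for rec in join(recent, "Author", authors, "Id"):
    print(rec["Title"][0], "by", rec["Author_Name"][0])  # Second Book by Bob
```

Every field maps to a list because rec fields may repeat; that is the main structural difference from a plain JSON object per row.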

dotancohen 3 years ago

This looks like a good format for versioning databases in Git. Previously I was storing SQL dumps, but for a small project this might be better. Testing to be done!

dorfsmay 3 years ago

Good point. I was wondering whether it was a good idea to have human-writable records, but having text-diffable dumps definitely is!

ungamedplayer 3 years ago

You can use this nicely from within Emacs; it's very usable.

rurban 3 years ago

Still no built-in on-the-fly index for larger numbers of records? Linear search doesn't fly with more than 5,000 records, and it's not even sorted on the fly for logarithmic search.

ttymck 3 years ago

Could you expand on what an "index on the fly" would look like? I imagine the engine could provision and maintain a separate index file, but the guarantees for keeping it in sync with the recfile would be tenuous. Interested in targeting something like this in my python implementation, since this is also my biggest concern with recutils.
- rurban 3 years ago
  
  on the fly is different from a cached index: just keep a hash table or b-tree in memory, holding only offsets into the text. for range queries a tree would make sense; for lookups, a hash table. the perl variant uses a hash.
  in some of my bigger text engines I prefer a serialized hash table (berkeley db) over real databases because it's 10x faster under 20,000 records. the constant database overhead is huge.
  
  rurban 3 years ago
  
  I've now looked at the code deeper, and whilst the records are only stored in a linked list, there is another mset (multiset) built on demand. So my concerns are void. Good architecture.
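The in-memory index rurban describes can be sketched in Python: scan the text once, remember the byte offset where each record starts, keep a dict (hash table) from key to offset for exact lookups, and binary-search a sorted key list for range queries, standing in for the B-tree. The `Id` key field and the data are hypothetical, blank separator lines are assumed to be truly empty, and this is not a description of recutils' own internals.

```python
import bisect


def build_index(data: bytes, key_field: bytes = b"Id") -> dict:
    """Map each record's key value to the byte offset where that record starts."""
    index = {}
    offset, record_start = 0, None
    for line in data.splitlines(keepends=True):
        stripped = line.strip()
        if not stripped:
            record_start = None  # blank line: the next field line opens a new record
        elif not stripped.startswith(b"#"):
            if record_start is None:
                record_start = offset
            name, _, value = stripped.partition(b":")
            if name == key_field:
                index[value.strip().decode()] = record_start
        offset += len(line)
    return index


def read_record(data: bytes, offset: int) -> str:
    """Seek straight to a record and return its text, up to the next empty line."""
    end = data.find(b"\n\n", offset)
    stop = len(data) if end == -1 else end
    return data[offset:stop].decode()


def range_lookup(sorted_keys: list, low: str, high: str) -> list:
    """Keys k with low <= k <= high, found by binary search (stand-in for a B-tree)."""
    lo = bisect.bisect_left(sorted_keys, low)
    hi = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[lo:hi]


DATA = b"Id: a1\nName: first\n\nId: b2\nName: second\n\nId: c3\nName: third\n"
idx = build_index(DATA)
keys = sorted(idx)  # built once, reused for every range query
print(read_record(DATA, idx["b2"]).splitlines())  # ['Id: b2', 'Name: second']
print(range_lookup(keys, "a1", "b9"))             # ['a1', 'b2']
```

Only offsets are held in memory, so the index stays small even when the records themselves are large.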

gglitch 3 years ago

I'd be thrilled if someone with production experience with both would compare/contrast Recutils with Prolog for a few use cases.

anthk 3 years ago

GNU also has plotutils as an alternative to Gnuplot, but sadly it is not as well integrated with Emacs' calc.

Semiapies 3 years ago

Previous discussion:

https://news.ycombinator.com/item?id=15302035

aspyct 3 years ago

Aaaaaah thanks, I was looking for this a few days ago but couldn't find the name!

Also, is it just me, or are the turtles at the top of the page... Why?

reidjs 3 years ago

Check the FAQ lol
About the logo
Why is the logo depicting a pair of copulating turtles? Ask ams@gnu.org.
What is the name of the turtles? They are called Fred and George. And yes, they are both male.
Why those names?
/join ##English
16:40 <jemarch> Hello.
16:40 <someone> Hi
16:40 <jemarch> How would you name two (paired) gay turtles?
16:42 <someone> Fred & George?

hsn915 3 years ago

The whole point of a database engine is to make efficient use of disk space by allocating pages and distributing the data across them so that inserting or removing small elements does not result in megabytes of data rewrites.

markisus 3 years ago

I think you're getting at only half of the idea of a database engine. The other half is to provide a flexible querying interface to slice up and aggregate the data into reports. The user may be willing to tolerate some inefficiency in storage / modification if they have a small number of records, as long as the querying interface is useful.
- PainfullyNormal 3 years ago
  
  I thought the point of a database engine was transaction processing. What you're describing is an interface that can have many different methods of storage.
- hsn915 3 years ago
  
  If you have a small number of records what do you need an engine for? Just write a for loop.
nh23423fefe 3 years ago

no, the point of a database engine is to respond to queries