The magic of small databases

192 points by topcat31 a year ago

btown a year ago

Buried in here is a fascinating musing on "Market-making Small Databases" - "Imagine a Substack for databases - an easy tool for creating, maintaining and publishing databases with the ability to restrict parts or all of it behind a pay wall. Pair it with the ability to send email updates to your audience about changes and additions..." It's worth a read in full in the original article.

One of my favorite small databases is https://hiregoats.com/ - it's a simple site showing goat herds for rent (for clearing brush in a sustainable way, etc.), monetized with a $35 listing fee and nothing else. There's no e-commerce, no attempt to insert the site into the transaction or funds flow, no bells and whistles. Certainly this doesn't scale to other niches where suppliers are less incentivized to pay a listing fee, but I'd love to see this kind of thing be more common, and incentivize people to curate.

fbdab103 a year ago

I was quite amused when I went to the goats page to see they are expanding into other markets. They now have a sister site of https://hiresheep.com/
- tomcam a year ago
  
  Damnit I need to register hirequokkas.com and hireibexes.com immediately before one of you sharks beats me to it
  
  Gordonjcp a year ago
  
  "unsoliciteddikdik.com" is available. You know you want to send someone an unsolicited dik-dik.
  
  tomcam a year ago
  
  That was perfect. You win HN for today.
- btown a year ago
  
  Much less inventory, though! But it's cool that they're starting somewhere - they have no need to feel sheepish just because their other site is so much more goated.
2h a year ago

uBlock Origin blocks that site for some reason

dmje a year ago

I run a little agency in the UK who works with museums to help them with digital. A large part of this is getting collections online.

Some years ago we commissioned a developer to make CultureObject[0], a free and open source WordPress plugin to make it easier to ingest collections data for display on the web. At the heart it's a glorified data importer, and many people just use the CSV mode to sync and import collections data.

It requires some dev effort - we've built an add-on which makes this easier but there's no denying that search, faceting and display need knowledge of WordPress development.

Three years ago we then launched The Museum Platform[1] which is a more SaaS based model - we take away the need for dev skills and ask clients to just send us a CSV and any related media and we do the hard work. It's WordPress again but a modified version where we also facilitate storytelling and narrative around the ingested collections.

The interesting thing about this journey is that the requirement to "get a collection online" is apparently and theoretically easy. But the reality is it gets hard quite quickly as the need for search / filtering appears, and it gets harder still as scale comes into it. 1000 records is fine. 100,000 gets quite a bit harder.

There are also many subtleties - particularly with museum collections. "Location" of a record could be where it was collected, or where it is now, or where it's on display. Relational stuff is hard, as are taxonomies and authority terms. It's hard to generalise and it's hard to scale.

[0] https://cultureobject.co.uk/

[1] https://themuseumplatform.com/

mmsimanga a year ago

I see you decided on WordPress. If you were going to use a CMS, I think Drupal 7 would have been a good choice. Drupal has the concepts of entities and views. An entity, as the name suggests, is essentially a table, and you can add all sorts of different fields to it: from simple text and number fields to images and fields that look up other entities, thus creating relationships between entities. Views is another construct that lets you choose how to display the entities. A list and a table are two possible views. Most of this can be done in Drupal 7 without writing code. I say Drupal 7 because you mentioned WordPress. Drupal 8 and above is more of a developer framework and requires knowledge of Composer. Backdrop [0] is a fork of Drupal 7.
[0] https://backdropcms.org/
- dmje a year ago
  
  WordPress has custom post types, taxonomies and metafields so is very capable of dealing with complex relationships if you need it to. What's challenging is going from simple columnar data such as CSV to something complex and relational.
  We chose WordPress because of its ubiquity and power - plus it's insanely easy to host and use as a non technical editor, which (last time I looked) can't be said of Drupal.
- justusthane a year ago
  
  This is how ExpressionEngine is structured as well, except they're called channels and templates. I really enjoyed working with EE, although coding is definitely required - you basically have to build your site from the ground up. No themes included.
  That being said, I found it much, much easier to develop than WordPress.
noduerme a year ago

This is a really cool niche... and I love the idea of it being more generally applicable or extensible to the kinds of private collections of objects that the writer is describing. (I really like what the article seems to be arguing for).
It seems like the data storage / search / filtering aspects of your software would be really fun and interesting to develop flexible solutions to. The Wordpress aspects probably wouldn't be so fun to maintain, but it's always pick-your-poison when it comes to CMSs unless you develop your own in-house.
That being said, a collection CMS doesn't necessarily need to have all the plugins and doodads that a Wordpress site does. It could be something bare-bones and extensible that was written to be more tightly coupled to a layer that interpreted the underlying data structure. Just toying with the idea, maybe even something that flattened the data views of the collection into static webpages for deployment so that at least some of the indexing could be handled by naming conventions and directory structure without recourse to database searches.
The world could definitely use an open source kit along these lines, with a GUI backend that would let non-developers build their own table structure and search parameters, draw up some page layouts, and just generate a searchable site that collated CSV records with images.
Some of this actually reminds me of what HyperCard could do... it allowed some really interesting experiments with user-classified data. Like this, from 1989: https://core.ac.uk/download/pdf/225955134.pdf
Relational stuff is hard, as you say, but in a structure built around a collection it seems like you could come up with a DSL that defined which columns needed to relate to other tables (any column with repeating data, for instance), suggest making that column "normalized", and automatically generate a linked table.
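
A minimal sketch of the "suggest a normalized column" idea above, in Python. Everything here is an illustrative assumption rather than an existing tool: the function names, the 0.5 distinct-value threshold, and the sample museum-style rows are all made up. It flags columns whose values repeat heavily, then splits one of them into a linked lookup table.

```python
import csv
import io


def suggest_normalization(rows, max_distinct_ratio=0.5):
    """Return column names whose values repeat often enough to suggest a lookup table."""
    if not rows:
        return []
    suggestions = []
    for column in rows[0]:
        values = [row[column] for row in rows]
        # A column "repeats" when it has few distinct values relative to the row count.
        if len(set(values)) / len(values) <= max_distinct_ratio:
            suggestions.append(column)
    return suggestions


def normalize_column(rows, column):
    """Split `column` into a lookup table and replace its values with integer ids."""
    lookup = {}  # value -> id, assigned in first-seen order
    linked_rows = []
    for row in rows:
        value = row[column]
        if value not in lookup:
            lookup[value] = len(lookup) + 1
        new_row = {k: v for k, v in row.items() if k != column}
        new_row[column + "_id"] = lookup[value]
        linked_rows.append(new_row)
    lookup_table = [{"id": i, column: v} for v, i in lookup.items()]
    return linked_rows, lookup_table


# Made-up sample data standing in for an uploaded collection CSV.
SAMPLE_CSV = """title,maker,material
Teapot,Wedgwood,ceramic
Vase,Wedgwood,ceramic
Spoon,Unknown,silver
Bowl,Wedgwood,ceramic
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
columns = suggest_normalization(rows)          # candidate columns to normalize
linked, makers = normalize_column(rows, "maker")
```

A real tool would probably want a smarter heuristic than a single ratio (free-text columns can repeat by accident), and would let the user confirm each suggestion before the table is split.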
- dmje a year ago
  
  That's a nice idea - point a script at a CSV file and generate a bunch of flat files for each item using some kind of simple templating language. I might take this back to the guys at the Platform and see if we can do a POC for the clients who have zero budget but want to get going with something straightforward... Thanks for the thinking :-)
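
One way the "point a script at a CSV file" idea could look, as a rough Python sketch using only the standard library. The column names (`title`, `description`), the page template, and the `build_site` and `slugify` helpers are assumptions for illustration, not anything the Platform actually uses.

```python
import csv
import html
import io
import re
import tempfile
from pathlib import Path
from string import Template

# A deliberately tiny page template; $title and $description are CSV column names.
PAGE = Template("<html><head><title>$title</title></head>"
                "<body><h1>$title</h1><p>$description</p></body></html>")


def slugify(text):
    """Turn a title into a filesystem- and URL-friendly name."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") or "item"


def build_site(csv_text, out_dir):
    """Write one HTML file per CSV row plus an index.html linking to them."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Escape values so record text cannot inject markup into the page.
        safe = {k: html.escape(v) for k, v in row.items()}
        name = slugify(row["title"]) + ".html"
        (out_dir / name).write_text(PAGE.substitute(safe), encoding="utf-8")
        links.append(f'<li><a href="{name}">{safe["title"]}</a></li>')
    index = "<html><body><ul>" + "".join(links) + "</ul></body></html>"
    (out_dir / "index.html").write_text(index, encoding="utf-8")
    return sorted(p.name for p in out_dir.iterdir())


# Made-up records written to a temporary directory.
SAMPLE_CSV = "title,description\nBronze Axe,Found in 1902 & donated\nClay Pot,Roman\n"
site_dir = tempfile.mkdtemp()
files = build_site(SAMPLE_CSV, site_dir)
```

Two records with the same title would overwrite each other here, so a real version would need a unique id column or a collision check before writing.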
  
  noduerme a year ago
  
  Thanks. I was inspired by this article and by your WP implementations of a similar notion. I can't stand WP but I've written a couple CMSs from scratch. As I wrote the idea, it struck me this concept might be something I'd enjoy building as my next side project. If you do decide to create something along those lines as an open platform, maybe I could contribute.
- tomrod a year ago
  
  Is this not Libre Office, more or less?
  
  noduerme a year ago
  
  I don't understand. How would you use LibreOffice to build a collection of object photos and descriptions, with custom descriptors and normalized data references, that could be deployed as a searchable / filterable website?
  
  tomrod a year ago
  
  Key value store in Libre Calc with backup storage somewhere for file objects and file references in sheet. Deploy as csv.
  
  noduerme a year ago
  
  What I'm describing, and what the article suggests, isn't about a UI for creating key-value stores that can be saved as CSVs. Any database UI wrapper can do that. The concept is about ingesting CSVs, normalizing the data, and turning them into websites. The parent I was responding to built a system (or two systems, apparently) for doing this as SaaS based around Wordpress.
mootpoints a year ago

I'm curious about what you make of Omeka, and whether you think it relates to OP's point. It's quite common in the digital humanities, but I've never seen it used outside that context.
- dmje a year ago
  
  I really like Omeka. It's a very cool project and we did look into it early on. Really though we chose WP because it's ubiquitous - no lock in, very powerful, easy for editors to use. With the right nudging it does all the things Omeka does.
ZephyrBlu a year ago

This is really interesting. The problem you're describing sounds similar to what I wrote about why this kind of thing is hard to generalize in another comment: https://news.ycombinator.com/item?id=34564394.
8n4vidtmkvmk a year ago

maybe I'm being naive, but 100K records doesn't sound hard to search either. maybe at 5 or 10M it starts getting ugly/expensive

breck a year ago

I'm going to plug our related project: TreeBase. It's the public domain software that powers PLDB.com (a Programming Language DataBase).

It's very simple. If your small database was about cars, your structure might look something like this:

    database/
     grammar/
      engine.grammar
      interior.grammar
     things/
      model3.car
      camry.car

The `grammar` files are written in a Tree Language called Grammar. Those are your schema files. You basically create a new syntax-free plain text "language" for storing your data, in this case 1 "car" file per model of car.

It was a pipedream of mine until the M1's came out. Those changed everything, because then it became fast enough to actually do it.

We have a new release coming out soon with a new query language that will change everything. Here is the source code: https://github.com/breck7/jtree/tree/main/treeBase

LunarAurora a year ago

There are categories of “Nocode” online services that could work, more or less, as small databases. Some are already cited in the article:

- DBs platforms (Best for more advanced DB) : Airtable, getgrist.com

- wikis+DB platforms (Best for building a site around the DB) : notion.so, coda.io

- Airtable/GSheet publishing (Best for simple/custom UI) : glideapps.com, siteoly.com

- Bookmarks/Collections (Best for links/References) : Zotero (online groups), are.na

- List sharing (Best for open collaboration?) : listium.com, (ranker.com ?)

- BI platforms (Best for advanced filters/charts) : polymersearch.com, Google Data Studio

- Data Set Hosting (Best for downloading?) : data.world, kaggle.com

All these allow publishing, and some collaboration

nerdponx a year ago

What about Datasette and/or Dolt.
- LunarAurora a year ago
  
  My list included nocode services only.
  
  simonw a year ago
  
  What's your definition of No Code?
  The quickest way to get a CSV file into Datasette these days is like this:
  1. Put the CSV file in a Gist, e.g. https://gist.github.com/simonw/8a2494a3402450716f4c8129d280b...
  2. Paste that into the "Load CSV" dialog on https://lite.datasette.io/ to get this shareable and bookmarkable URL: https://lite.datasette.io/?csv=https%3A%2F%2Fgist.githubuser...
  I'm going to keep ticking away at making this as easy as possible for people (I have a paid SaaS product on the way, but I want people who just want to publish something small for free to be able to do so) but I'm interested in understanding what it would take to qualify as "no code" in your book.
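
Judging from the shareable URL in step 2, the link is the Datasette Lite address with the CSV's URL percent-encoded into a `csv` query parameter. A small Python sketch of building such a link follows; the `datasette_lite_url` helper and the example.com address are hypothetical, and the CSV would presumably need to be hosted somewhere a browser is allowed to fetch it from.

```python
from urllib.parse import quote


def datasette_lite_url(csv_url):
    """Build a Datasette Lite link that loads the CSV found at `csv_url`."""
    # safe="" percent-encodes ":" and "/" as well, matching the %3A%2F%2F form above.
    return "https://lite.datasette.io/?csv=" + quote(csv_url, safe="")


# Hypothetical CSV location, for illustration only.
link = datasette_lite_url("https://example.com/data/goats.csv")
```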
  
  LunarAurora a year ago
  
  Datasette is great! I'm not very familiar, but here is my impression :
  I guess my working definition is about the "default mode". So the (Datasette) Lite version could indeed be "no code" in a similar fashion to the "BI platforms" category above, that is, for read only advanced exploration.
  In a deeper sense, "no-code" is also about how the software is designed from the ground up. This may include visually customising/configuring most of the settings/plugins and all those delicate UX touches for the "average-user".
  Personally, I'm waiting for two things : better facets and support for multivalued columns (I really like my openrefine for local or polymersearch for online)
  
  simonw a year ago
  
  I'm thinking about faceting improvements at the moment - they're such a powerful feature and there's definitely room for making them work and look better.
  What do you mean by multivalued columns?
  Datasette currently does have support for faceting by JSON arrays, e.g. on this page: https://musiccaps.datasette.io/musiccaps/musiccaps_details?_...
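
For readers unfamiliar with the term, here is a rough Python illustration of what faceting by a JSON array column means: count how many rows contain each array value, then filter rows by a chosen value. This is not Datasette's implementation; the helper names and the sample rows are made up.

```python
import json
from collections import Counter


def facet_by_array(rows, column):
    """Count how many rows contain each value in a JSON-array column."""
    counts = Counter()
    for row in rows:
        # set() so a value repeated inside one row is counted once for that row.
        counts.update(set(json.loads(row[column])))
    return counts


def filter_by_facet(rows, column, value):
    """Return the rows whose JSON array includes `value`."""
    return [row for row in rows if value in json.loads(row[column])]


# Made-up rows; the "tags" column stores a JSON array as text.
rows = [
    {"id": 1, "tags": '["guitar", "mellow"]'},
    {"id": 2, "tags": '["piano", "mellow"]'},
    {"id": 3, "tags": '["guitar"]'},
]
facets = facet_by_array(rows, "tags")
mellow_ids = [row["id"] for row in filter_by_facet(rows, "tags", "mellow")]
```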
  
  LunarAurora a year ago
  
  > faceting by JSON arrays
  Yep that's it. I should have read that facet page in the doc (just used the wrong keyword for search months ago)
  btw, I played a bit more with faceting, and it is actually (functionally) good !
  Datasette is (going to be) the "mpv" of DBs : An open (in code and in extensibility) standard for reading them. And that plugins system is a godsend. I will certainly delve into it more.
  
  codeslave13 a year ago
  
  I would consider datasette and its plugins nocode.

xnx a year ago

"Publishing documents to the web is a well-served use case but publishing small indexes, databases and collections to the web is still an incredibly frustrating and under-served use case. Here I outline why I think it matters and a variety of approaches to solving it."

Amen. I'm surprised the post doesn't mention sqlite3 WASM/JS (https://sqlite.org/wasm/doc/trunk/about.md). That, paired with an easy-to-use faceting library, would go a long way.

ZephyrBlu a year ago

I love this. I've been thinking about something similar lately. There are so few good indexes and search engines for niche collections of data.

Imagine if there was a niche search engine for everything, and the search engine was customized for that niche.

I think the main problems here are:

- Data format and ingestion

- Domain-specific indexing/relevance

Most data is super messy and is not accessible through nice APIs, which presents a problem. You might need custom ingestion for each niche and it's pretty likely you'll need some rules to standardize data from multiple sources, neither of which seems easy to generalize and automate because they're very domain-specific.

The other part to this is indexing/relevance so the search feels good to use. Some fields are obviously going to be more important than others and people are going to want to utilize search for things that are hard to predict ahead of time.

To use the author's example of artists in Brooklyn, people might want to search for artists near them. Now you have to gather location data, format it, ingest it, index it and add it to the search UI.
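
As a rough sketch of just the "search near me" step, assuming every record already carries a latitude and longitude (gathering those is the hard part described above), here is a brute-force radius filter in Python using the haversine formula. The record layout, function names, and coordinates are invented for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(records, lat, lon, radius_km):
    """Return names of records within `radius_km`, nearest first."""
    hits = []
    for record in records:
        distance = haversine_km(lat, lon, record["lat"], record["lon"])
        if distance <= radius_km:
            hits.append((distance, record["name"]))
    return [name for _, name in sorted(hits)]


# Made-up records with rough coordinates.
artists = [
    {"name": "Studio A", "lat": 40.7081, "lon": -73.9571},
    {"name": "Studio B", "lat": 40.6782, "lon": -73.9442},
    {"name": "Studio C", "lat": 34.0522, "lon": -118.2437},
]
close = nearby(artists, 40.7000, -73.9500, 5)
```

Scanning every record like this is fine for a small database; only a much larger one would need a spatial index.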

The fact that adding another field to index on is a vertical integration adds a lot of overhead.

All of this stuff in isolation is not difficult, but when you put it together it becomes quite a lot of work that generally isn't easily scalable.

itsmemattchung a year ago

Reminds me of Amazon EBS and a white paper describing the philosophy of deploying millions of tiny databases:

https://assets.amazon.science/c4/11/de2606884b63bf4d95190a3c...

zokier a year ago

Personally I find the whole family of dBase-style, non-SQL, kinda-graphical database systems an interesting historical branch of software that seems to have mostly died out these days. Access probably did quite a lot of damage here, killing off competitors before mostly succumbing itself.

gcanyon a year ago

FileMaker is still a thing. I don't know their internal financials, but they've steadily improved the product over the years. https://www.claris.com
Or if you want to go super-niche, Panorama is still around, and (they say) the longest-running Mac software developer apart from Microsoft. https://www.provue.com
Either one makes it easy to build a database+interface.
- digitalsankhara a year ago
  
  I had a distant memory about this Mac based spreadsheet/database thing but could not remember its name (Panorama). Couldn't surface it in searches either. Thought about it the other week and here we are!
  Odd pricing though = pay in advance credits. Ummm, not something I'd like to use for work when I'm in the middle of an important analysis with a deadline and I (inevitably) run out of credits and have to start faffing about with in-app purchases. Maybe it's not that bad and I'm being unfair.
  
  dangoor a year ago
  
  When I read your comment, I thought the credits system sounded like it was going to be complicated or messy and involving usage. Turns out that they're just trying to make a potentially more affordable subscription model that automatically accounts for times when you're not using the software.
  Each credit allows a month of use. Seems pretty straightforward, but I agree with you that because these aren't an auto-renewing subscription you could find yourself needing to pay at an unexpected time.
gavinmckenzie a year ago

Takes me back to the days of dBase, Clipper, and my favourite FoxPro which was acquired by Microsoft and continued to exist in the 90s. Access definitely destroyed the market for these other products by combining aspects of Visual Basic and database tech.
- pstuart a year ago
  
  FoxPro on the Mac was wonderful. I learned SQL wrangling with analytics on it -- there weren't all the options we have today.

simongray a year ago

This post is an exercise in describing the motivation and features of the Semantic Web seemingly without realising the tech stack already exists.

simonw a year ago

I honestly think that reflects more poorly on the semantic web tech stack than it does on the author of that piece.
I spend almost all of my time thinking about this class of problems and hanging out with other people who do, and sadly it's vanishingly rare to run into anyone outside of academia who's trying to use the classic semantic web stack (RDF and suchlike) to build this kind of thing.
- osi a year ago
  
  i worked at a then-web3.0 startup in the 00’s that had built something that could have been pivoted into this, but instead the CEO wanted to be like Digg instead.
  the commercial community of practice is small for sure.

moehm a year ago

For what it's worth, here is my "small database" attempt, a structured list of worthwhile Wikipedia articles to read.

https://www.mostdiscussed.com

overgard a year ago

People would love this for sports. There's so much interesting data locked up in proprietary databases

roncesvalles a year ago

I'm aware that this may sound dismissive but the solution that the author of the OP is looking for is the World Wide Web itself.

The "small database" in question is, well, an HTML page. It can be shared and passed around by selecting the portions of it that you need and pressing Ctrl+C/Ctrl+V. Search is accomplished by the browser using Ctrl+F. Collaboration can take many forms - wikis, comments, forums, live editing. Links between databases are what URL links are. The database that OP is looking for is a page of text (for unstructured data) or somewhat structured solutions like CSV, JSON, or YAML.

Now, yes, there are certain participants on the WWW who make poor web design choices that cause agreed-upon functionality to break. E.g. unnecessary pagination or accordions breaking Ctrl+F, not offering data for download, not having useful URL paths etc.

Zababa a year ago

I like the idea, but I think one issue is that the database is the easy part. If I look again at the list of requirements, most are not about the database but about how to put data from external sources in the database, how to edit the database, and how to publish it. To me this sounds like an interface problem. But since the whole point is small, specialized collections, interfaces have to be specialized too. That means no single tool can offer a solution. Maybe it's an issue of definition: I call a database something like MySQL or SQLite or even a CSV file, while for the author it's the finished product, the database about <stuff> and the tools that are adapted to <stuff>.

Substack is an interesting example. It's great for written content with a few images, which mostly looks the same everywhere. But it lacks great customisation features that I think a database would need, because that stuff is hard to do.

If I had to propose a solution, it would be this: if you want to do a small database, do it. Experimentation in the cyberspace is very cheap. These days you have lots of resources for everything online. It can be intimidating, and can lead to analysis paralysis. I'm supposed to be a professional developer and still struggle with that. But one thing that has helped me a lot recently is to try stuff, see if it works, if it fails, ask questions (to either real people or ChatGPT/Copilot, Copilot is especially valuable to get in a "just keep writing, editing comes later" mood). It's not always fun, in fact it can be quite frustrating, but that's how things are.

In the end, this is about decentralisation and you can't have proper decentralisation if you don't also decentralise the skills, the know-how. For example, there has been a lot of talk about Mastodon as a decentralised alternative to Twitter. And it is one. But if you simply go from being a user on Twitter to being a user on Mastodon, well you don't regain much control. On the other hand if you try running a small instance, even just a local instance to see how it works, or maybe add a few features to your preferred client (it can be code, but it could also be helping translation, or maybe a color scheme (you wouldn't believe how many color schemes are barely usable when you're colorblind)), well then you start being in control.

dgudkov a year ago

Small databases aren't popular because Excel spreadsheets already occupy that niche. A small database doesn't have to be normalized. Because it's small, it can be denormalized into a flat table that can be conveniently handled in Excel.

FridgeSeal a year ago

That’s not really analogous and kind of misses some of the other aspects the author talks about.
Excel doesn’t cover the publishing and discovery aspect. It is absolutely atrocious from a machine usability and schema perspective, nevermind performance, etc.
Even if you think excel does address those, I think the shortcomings of the format should rule it out. It is better to have a more powerful tool, and fix the usability aspects, rather than trying to proverbially rub glitter on what amounts to a turd of a format.
- omnipath a year ago
  
  Yeah, but I think you're underestimating what the grandparent post is saying. The people who would PAY for some of these functions are already making do with Excel. And Microsoft has responded in kind by increasing Excel's ability to do, well, everything. I'm pretty sure if someone wanted to include those features the article is talking about on top of Excel, they would. Just last week, we saw someone add onto Excel a C# IDE with debugger, and posted about it here. It seems one's limit with Excel is only one's imagination.
  
  FridgeSeal a year ago
  
  And PowerPoint is Turing complete, so I guess we may as well just throw away all these other programming languages and use that going forward because lots of people already use PowerPoint?
  Excel as a format is an awful abomination of XML, and the program itself is an awful experience that people just stick with due to a mix of Stockholm syndrome and inertia. It’s not exactly something I would ever want to “aim for”.
  
  omnipath a year ago
  
  "Excel as a format is an awful abomination of XML"
  You and I may think so, but why would it matter for the end user?
  "and the program itself is an awful experience that people just stick with due to a mix of Stockholm syndrome and inertia"
  Disagree. For the purpose of what Excel was made for, it seems reasonable. Or rather, I've yet to see what benefits alternative experiences produce. For the vast 90% of users, Excel is a 1NF/2NF db, with an emphasis on visually viewing their data. It's dead simple to enter, dead simple to share, and reasonably easy to add some constraints, some linkage, and show data trends.

topcat31 a year ago

Hey OP here, just wanted to say thanks for all the comments (goats and all). There's lots I still need to learn about (actual) databases as a hobby developer...

In the meantime I've made a big update to the Airtable with links to tools, examples and further reading:

https://airtable.com/shrYY94GrqVB4HUsi/tblHPrdomiPbLpod6/viw...

marniewebb a year ago

H2O (https://h2o.law.harvard.edu/) is a now-defunct collaborative syllabus project from Harvard that gets at a lot of this I think. It’s basically a list maker with a lot of additional capabilities. While it’s made for small lists of things, it’s easy to imagine this is a piece of the solution.

082349872349872 a year ago

A search for "filemaker" reveals that Claris is still in business; I'd hope they'd have something that might address this need?

hardwaresofton a year ago

Weirdly enough I haven’t seen too much mention of CMSes — them plus/minus spreadsheet like tools are almost surely the way to handle this kind of use case.

What’s missing is the added search + UI capabilities.

I think about SaaS ideas a lot and this is actually quite a common one (though I’m generally thinking of a specific niche) - enabling people to craft and expose datasets would surely be a great startup.

jerryu a year ago

Having a small database is especially useful when collaborating on data strategy. I have seen some database diagrams with 1000s of tables and it is hard to make sense of it using ERD tools.

Even with advanced views offered by tools like ERDLab.io it is a pain in the ass to collaborate on large schemas at various stages of development.

cavisne a year ago

I feel like this is getting really close. GPT is great at writing SQL queries from text and turning a blob of semi-structured data into an SQL schema.

We just need to somehow tie it together so anyone can explain their use case, and show an example of the data in plain English, then lock in a schema and feed everything in.

aabbcc1241 a year ago

For collections of links with short descriptions of projects/services, there are many awesome lists on GitHub.

For more complex data to be shared, maybe it can be csv/md/mdx shared over git as well?

It can have a stable URL and be searchable from GitHub, search engines, and third-party indices.

vaporup a year ago

https://zed.brimdata.io

maphew a year ago

Makes me think of what something like Datasette fused with Fossil SCM could accomplish.

Trayja-Peter a year ago

"I want to empower more individuals to publish, maintain and collaborate on small indexes. To build a million tiny libraries, community databases, weird collections and indie indexes."

Funnily enough, a friend and I have been building https://Trayja.com, a tool which does this exact thing, with a focus on the "community" aspect. There's a huge amount of wisdom in communities, whose value could be multiplied if it would be aggregated in a structured, indexable, searchable way. This article articulated so much of what I've been trying to explain about my project.

LAC-Tech a year ago

With how fast computers are now, they can work well for small businesses too.