Is an All-in-One Database the Future?

10 points by Etai 2 months ago

Just posting a thought experiment here.

As data challenges have become better defined and more demanding, purpose-built databases have emerged to solve specific problems, leading to a proliferation of specialized solutions. The result is complex, messy data infrastructure patched together with third-party data pipelines and event-streaming products.

Companies rarely rely on a single primary data store; most deploy several databases to cover different needs, creating an intricate web of interconnected data systems.

But the idea of a generalist, all-in-one database that is scalable, performant across contexts, and commercially appealing remains elusive.

Different database types, such as relational OLTP stores, non-relational document stores, and in-memory caches, are optimized for specific use cases and face unique challenges that prevent effective consolidation. Tools like Object-Relational Mappers (ORMs) smooth over differences between relational databases, but they don't extend to non-relational types, underscoring how hard a unified solution is to build.

Postgres, with its extensibility through plugins, comes close but still falls short of being a one-stop shop. Ultimately, the technical and practical hurdles make an all-in-one database infeasible, leaving the modern data stack as a complex but necessary reality.

In theory, an all-in-one database would struggle with optimization, data model overhead, and latency. Each database type is built for specific use cases, making a universal solution inefficient.

Closest Solution Today: Postgres

While not a full one-stop shop, Postgres can be extended with plugins like pgvector for vector search. However, it's not intended to solve every problem efficiently, as evidenced by the complex data stacks at companies that run Postgres.
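
To make the plugin path concrete, here's a minimal sketch, assuming a hypothetical items table (the project is called pgvector; the extension installs under the name vector):

    -- Hypothetical example: vector search inside plain Postgres.
    CREATE EXTENSION IF NOT EXISTS vector;

    CREATE TABLE items (
        id        bigserial PRIMARY KEY,
        embedding vector(3)  -- 3 dimensions only to keep the sketch short
    );

    INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

    -- Nearest neighbors by L2 distance; <-> is pgvector's distance operator.
    SELECT id FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;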

iamlucaswolf 2 months ago

I think this question mixes up two orthogonal dimensions: transactionality (i.e., OLTP vs. OLAP) and data model (relational, document-based, graph, vector, ...).

Regarding transactionality: There is an entire area of research on "hybrid transactional and analytical processing" (HTAP) systems, which unify OLTP and OLAP. Hyper [1] pioneered this path at TU Munich; its successor Umbra [2] recently incorporated as CedarDB [3]. There are lots of others. Most of these systems, AFAIK, are relational.

Regarding data model: What we've seen in the past few decades is that non-relational DBMSs (excluding key-value stores) only make sense in rare edge cases that require huge scale. There has, e.g., been research [4] showing that graph databases are still, well, lacking compared to relational systems. The common pattern seems to be: unless you need to serve very specific workloads at huge scale, SQL is probably enough [5]. Then again, it really comes down to the specifics. If you were to, for example, implement distributed locking using Postgres, you would likely run into problems with MVCC and Xids very quickly.
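
To make the locking example concrete, here's a minimal sketch (the lock table, names, and key are all hypothetical): renewing a lease via UPDATEs churns MVCC row versions and burns an Xid per renewal, whereas Postgres's built-in advisory locks sidestep row versioning:

    -- Naive lease-style lock table (hypothetical schema).
    CREATE TABLE locks (
        name    text PRIMARY KEY,
        holder  text NOT NULL,
        expires timestamptz NOT NULL
    );

    -- Every renewal is an UPDATE: MVCC keeps the old row version around
    -- until vacuum, and each renewal transaction consumes an Xid, so
    -- frequent heartbeats bloat the table and chew through transaction IDs.
    UPDATE locks
       SET holder = 'worker-17', expires = now() + interval '30 seconds'
     WHERE name = 'nightly-job' AND expires < now();

    -- Postgres's built-in alternative: advisory locks, which are held
    -- in memory outside MVCC and create no row versions.
    SELECT pg_try_advisory_lock(12345);  -- returns true if acquired
    SELECT pg_advisory_unlock(12345);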

So, as you already mentioned, there is no silver bullet. But even today, unless you are Meta or Google, SQL is probably enough for a long time and lots of use cases.

(Full disclosure: I'm working on Hyper full-time).

[1]: https://hyper-db.de/
[2]: https://umbra-db.com/
[3]: https://cedardb.com/
[4]: https://homepages.cwi.nl/~boncz/edbt2022.pdf
[5]: https://www.youtube.com/watch?v=VxKt245X_ws

lifeinthevoid 2 months ago

It all depends on the use case: a generalist database can never achieve the performance of specialized databases once the workload gets large enough. It's like asking for a car that's great off-road but also beats an F1 car on the circuit.

tacostakohashi 2 months ago

Hmm, ever used Oracle?

I'm not here to defend it, but in practical terms, it does everything anyone is likely to need out of the box. Relational, XML, JSON, replication, document...

  • Etai 2 months ago

    Where does it stand compared to the more traditional DW systems? Can it handle the workloads Snowflake would for analytics?

yen223 2 months ago

I would be curious to know if anyone has tried using one of the newer data-warehouse databases (ClickHouse, Snowflake, etc.) as an OLTP database, to run their CRUD app.

  • kwillets 2 months ago

    We came close, as we had architects who knew little besides Snowflake.

    While there are all kinds of issues it can't handle for CRUD, such as locking and small updates, it's also expensive for small reads because it relies on auto-scaling instead of timesharing. If every page load of your app issues dozens of queries to populate its widgets, Snowflake will be both slow and expensive.

  • hodgesrm 2 months ago

    You can't use ClickHouse this way. It does not have ACID transactions, and the main table engine (MergeTree) is optimized for reads on append-only workloads that operate on large blocks.

    MySQL & PostgreSQL are vastly better for OLTP use cases.

    • yen223 2 months ago

      I know that you shouldn't, but I'm curious if anyone did.

      I don't think it is impossible to use ClickHouse as the backend of a CRUD app, if you didn't particularly care about things like transactional guarantees or read performance. You can still write rows to it and read rows from it.

      I've also worked in the business long enough to know that people make all kinds of suboptimal engineering decisions all the time.

      • altdataseller 2 months ago

        I'm sure some people do, simply because people do all sorts of things they shouldn't.