Hi, OP. First, congratulations on launching a product, and thank you for giving it strong copyleft! I ran it as directed, and it's pretty slick. I have some detailed comments on the database side of things that I hope you'll take seriously before trying to scale this. I've run a distributed Postgres DB at scale for a well-known company that used `yjs` for precisely the same thing you're doing here, so I have some real-world experience with this.
You do not want to run this in Postgres, or any RDBMS for that matter. I promise you. Here [0] is `y-sweet` [1] discussing (at a shallow level) why persisting the actual content in an RDBMS isn't great. At $COMPANY, we ran Postgres on massive EC2s with native NVMe drives for storage, and they still struggled with this stuff (albeit with the rest of the app also using them). Use an object store, use an LSM-tree solution like MyRocks [2], just don't use an RDBMS, and especially not Postgres. It is uniquely bad at this. I'll explain.
Let's say I'm storing RFC2324 [3]. In TXT format, this is just shy of 20 KB. Even if it's 1/5th that size, it doesn't matter for the purposes of this discussion. As you may or may not know, Postgres uses something called TOAST [4] for storing large amounts of data (by default, any time a tuple hits 2 KB). This is great, except there's an overhead to de-TOAST things. This overhead can add up on retrievals.
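You can see the TOAST behavior for yourself; a sketch, assuming your `pages` table and `text_content` column (the slug is hypothetical):

```sql
-- How big is the stored value? pg_column_size reports the on-disk,
-- possibly compressed size; octet_length the raw text size.
SELECT
    octet_length(text_content)   AS raw_bytes,
    pg_column_size(text_content) AS stored_bytes
FROM pages
WHERE slug_id = 'rfc2324';  -- hypothetical row

-- The backing TOAST table and its total size:
SELECT reltoastrelid::regclass AS toast_table,
       pg_total_relation_size(reltoastrelid) AS toast_bytes
FROM pg_class
WHERE relname = 'pages';
```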
Then there's WAL amplification. Postgres doesn't really do an `UPDATE`; under the hood it's a `DELETE` + `INSERT`, writing out a whole new row version each time. Even worse, it has to write entire pages (8 KB) [5], not just the changed content (there are circumstances in which this isn't true, but assume it is in general). Here's a view of `pg_stat_wal` after I've been playing with it:
docmost=# SELECT wal_fpi, wal_bytes FROM pg_stat_wal;
wal_fpi | wal_bytes
---------+-----------
1641 | 11537465
(1 row)
Now I'll change a single byte in the aforementioned RFC, and run that again:
docmost=# SELECT wal_fpi, wal_bytes FROM pg_stat_wal;
wal_fpi | wal_bytes
---------+-----------
1654 | 11656052
(1 row)
That is nearly 120 KB of WAL written to change one byte. This is of course dependent upon the size of the document being edited, but it's always going to be bad.
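If you want to measure this on your own workload, here's a sketch using psql's `\gset` to diff WAL positions around a single statement (the slug is hypothetical):

```sql
-- Snapshot the WAL insert position before the edit:
SELECT pg_current_wal_insert_lsn() AS before \gset

-- Make a tiny change:
UPDATE pages SET text_content = text_content || ' '
WHERE slug_id = 'rfc2324';

-- How many WAL bytes did that one statement generate?
SELECT pg_wal_lsn_diff(pg_current_wal_insert_lsn(), :'before') AS wal_bytes_written;
```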
Now let's look at the search query [6], which I've reproduced (mostly; I left out creator_id and the ORDER BY) here:
docmost=# EXPLAIN(ANALYZE, BUFFERS, COSTS) SELECT id, title, icon, parent_page_id, slug_id, creator_id, created_at, updated_at, ts_headline('english', text_content, to_tsquery('english', 'method'), 'MinWords=9, MaxWords=10, MaxFragments=10') FROM pages WHERE space_id = '01906698-1b7c-712b-8d4f-935930b03318' AND tsv @@ to_tsquery('english', 'method');
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Seq Scan on pages (cost=0.00..12.95 rows=1 width=192) (actual time=13.473..48.684 rows=3 loops=1)
Filter: ((tsv @@ '''method'''::tsquery) AND (space_id = '01906698-1b7c-712b-8d4f-935930b03318'::uuid))
Rows Removed by Filter: 3
Buffers: shared hit=32
Planning:
Buffers: shared hit=1
Planning Time: 0.261 ms
Execution Time: 48.717 ms
~50 msec for a relatively simple SELECT with no JOINs isn't great, and it's down to the use of `ts_headline`. Unfortunately, that function has to parse the original document, not just the tsvector summary, to produce its snippets. If I remove it from the query, execution plummets to sub-msec times, as I would expect.
It doesn't get better if I forcibly disable sequential scans to get it to favor the GIN index on `tsv` (unsurprising, given the small dataset):
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.pages (cost=106.29..110.56 rows=1 width=192) (actual time=17.983..51.424 rows=3 loops=1)
Recheck Cond: (pages.tsv @@ '''method'''::tsquery)
Filter: (pages.space_id = '01906698-1b7c-712b-8d4f-935930b03318'::uuid)
Heap Blocks: exact=1
Buffers: shared hit=41
-> Bitmap Index Scan on pages_tsv_idx (cost=0.00..106.29 rows=1 width=0) (actual time=1.231..1.231 rows=7 loops=1)
Index Cond: (pages.tsv @@ '''method'''::tsquery)
Buffers: shared hit=25
Planning:
Buffers: shared hit=1
Planning Time: 0.343 ms
Execution Time: 51.647 ms
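The usual mitigation for the `ts_headline` cost is to narrow and LIMIT the matches in a subquery first, so headlines are only computed for the rows you'll actually return. A sketch against your schema:

```sql
-- Find and rank matches cheaply via the tsvector, THEN parse only
-- the surviving documents for headlines:
SELECT id, title,
       ts_headline('english', text_content,
                   to_tsquery('english', 'method'),
                   'MinWords=9, MaxWords=10, MaxFragments=10') AS snippet
FROM (
    SELECT id, title, text_content
    FROM pages
    WHERE space_id = '01906698-1b7c-712b-8d4f-935930b03318'
      AND tsv @@ to_tsquery('english', 'method')
    ORDER BY ts_rank(tsv, to_tsquery('english', 'method')) DESC
    LIMIT 10
) AS matches;
```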
And speaking of GIN indices, while they're great for this, they also need regular maintenance, else you risk massive slowdowns [7]. This was after having inserted a few large-ish documents similar to the RFC, and creating a few short pages organically.
docmost=# SELECT * FROM pgstatginindex('pages_tsv_idx');
version | pending_pages | pending_tuples
---------+---------------+----------------
2 | 23 | 26
Let's force an early cleanup:
docmost=# EXPLAIN (ANALYZE, BUFFERS, COSTS) SELECT gin_clean_pending_list('pages_tsv_idx'::regclass);
QUERY PLAN
--------------------------------------------------------------------------------------
Result (cost=0.00..0.01 rows=1 width=8) (actual time=16.574..16.577 rows=1 loops=1)
Buffers: shared hit=4659 dirtied=47 written=22
Planning Time: 0.322 ms
Execution Time: 16.776 ms
17 msec doesn't sound like a lot, but bear in mind this was only hitting 4659 pages, or 37 MB. It can get worse.
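If you keep the GIN index, the relevant knobs are per-index storage parameters; the values here are illustrative, not a recommendation:

```sql
-- Shrink the pending list so cleanups stay small (value is in KB;
-- the server default is 4 MB via gin_pending_list_limit):
ALTER INDEX pages_tsv_idx SET (gin_pending_list_limit = 512);

-- Or disable the pending list entirely: inserts get slower, but
-- reads never pay a surprise cleanup cost.
ALTER INDEX pages_tsv_idx SET (fastupdate = off);
```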
You should also take a look at the DB config if you're going to keep using Postgres, starting with `shared_buffers`, since it's currently at the default value of 128 MB. That is not going to work well for anyone trying to use this for real work.
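A starting point, assuming a dedicated 16 GB host; the numbers are illustrative and should be tuned against the actual workload:

```sql
ALTER SYSTEM SET shared_buffers = '4GB';          -- ~25% of RAM is the usual rule of thumb
ALTER SYSTEM SET effective_cache_size = '12GB';   -- what the OS page cache can hold
ALTER SYSTEM SET wal_compression = on;            -- softens full-page-image bloat
-- shared_buffers only takes effect after a restart.
```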
You should also optimize your column ordering. EDB has a great writeup [8] on why this matters.
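The short version, as an illustrative table (hypothetical columns): order fixed-width, widest-alignment columns first, variable-length last, so Postgres doesn't insert padding bytes between them.

```sql
-- bad: (uuid, bool, timestamptz, bool, int) pads 7 bytes before the timestamp
-- better:
CREATE TABLE example (
    id         uuid        NOT NULL,  -- 16 bytes
    created_at timestamptz NOT NULL,  -- 8 bytes, 8-byte aligned
    sort_order int         NOT NULL,  -- 4 bytes
    is_locked  bool        NOT NULL,  -- 1 byte
    title      text                   -- variable length, last
);
```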
Finally, I would like to commend you for using UUIDv7. While ideally I'd love (as someone who works with DBs) to see integers or natural keys, at least these are k-sortable. Oh, and foreign keys – thank you! They're so often eschewed in favor of "we'll handle it in the app", but they can absolutely save your data from getting borked.
[0]: https://digest.browsertech.com/archive/browsertech-digest-fi...
[1]: https://github.com/jamsocket/y-sweet
[2]: http://myrocks.io
[3]: https://www.rfc-editor.org/rfc/rfc2324.txt
[4]: https://www.postgresql.org/docs/current/storage-toast.html
[5]: https://wiki.postgresql.org/wiki/Full_page_writes
[6]: https://github.com/docmost/docmost/blob/main/apps/server/src...
[7]: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/4...
This is excellent info.
However, do you think an application like this would be database-bound? It feels like so many more things would limit throughput (and product-market fit) before Postgres.
Given it's buried a bit in their stack, they can always optimize later as well.
I think it sounds like premature optimisation too. To ever hit these problems would mean the project had been a roaring success. And they would be much easier to fix later than complicating installs early on.
I think what the OP is trying to get across is that storing text documents that you're live-editing will cause a lot of updates, and that each update may be relatively big in the database (the size of the whole document) compared to the actual edit (a few characters at a time); the updates can also grow large in number quickly, even with a small number of people live-editing. I don't know this for any kind of fact, but I could see how this is a fundamental architectural problem rather than a (premature) optimization. I get that we don't need to build for infinite scale right out the gate, but I could see that the OP is possibly onto something here worth verifying at the very least.
Precisely this. There's a difference between premature optimization and choosing the correct technology. Even then, I'm willing to compromise; for example, using a DB as a queue. Is it what they're designed to do? Absolutely not, but it's easy to implement, and crucially, it's a lot easier to later shift queues (by definition, ephemeral data) than your persistence model.
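For anyone curious, the usual shape of that compromise is `FOR UPDATE SKIP LOCKED`; a sketch against a hypothetical `jobs` table:

```sql
-- Claim one job without blocking on rows other workers already hold:
BEGIN;
SELECT id, payload
FROM jobs
WHERE status = 'pending'
ORDER BY created_at
FOR UPDATE SKIP LOCKED
LIMIT 1;
-- ...do the work with the claimed id, then:
UPDATE jobs SET status = 'done' WHERE id = :claimed_id;  -- placeholder
COMMIT;
```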
At scale, I can definitively say that yes, Postgres becomes the bottleneck for this. The company I worked for was using the same libraries, the same stack.
Re: optimize it later, your persistence model is the hardest thing to change once it’s built and full of data. I strongly recommend getting that right the first time.
Not OP but thanks for sharing, excellent comment.
I'm a bit surprised. Postgres is one of the possible databases you can use with XWiki, which is quite comparable to Confluence. Postgres seems to do fine, and relational databases have strengths that shouldn't be discarded too early.
Confluence itself uses relational databases including Postgres, and they seem to do well too.
And I know for a fact that both handle huge wikis (hundreds of spaces, millions of document revisions).
I’m by no means suggesting they abandon RDBMS, just that they shift large textual / binary content out of it, and store pointers instead.
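Concretely, something like this (the column name and key format are hypothetical):

```sql
-- Keep the relational metadata; move the blob to object storage
-- and store only a pointer to it:
ALTER TABLE pages ADD COLUMN ydoc_key text;
-- e.g. 's3://some-bucket/spaces/<space_id>/pages/<page_id>.ydoc'
-- The row stays small, UPDATEs touch only the key and metadata,
-- and the WAL no longer carries the whole document on every edit.
```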
You can store essentially anything in an RDBMS, especially Postgres. That doesn’t make it the right choice. It might make it easier, but easier isn’t the same thing as correct.
I am of course biased as a DBRE, since I’m the one who gets paged about problems caused by decisions like this. Then I get to deliver the uncomfortable news that everything needs to be overhauled, and inevitably am told that’s not going to happen, and to just figure it out.
I truly appreciate your in-depth articulation.
Indeed, the Yjs state update can be problematic due to its growing size and constant updates.
I will have a look at MyRocks. Reference pointers sound more plausible.
I spent time analyzing and deciding my usage of uuid7 from different perspectives. From the git logs, you can see it came at the last minute.
As someone who:
- would love to self-host, and would much prefer keeping it to Postgres / Redis, for which there is widespread knowledge on how to host
- would like you to keep your development velocity up
I'd encourage you not to switch databases, or at least to defer this for a while. I can't imagine you'll have issues with the amount of WAL written for quite a while, and by that time, the world could be quite different (OrioleDB might be mature! https://github.com/orioledb/orioledb)
MyRocks speaks MySQL. It’s just a different storage engine behind MySQL, replacing InnoDB. The major change I can think of would be replacing the `uuid` type with `BINARY(16)`, and needing to do the encoding and decoding in the app.
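Worth noting that MySQL 8.0 ships helpers for this, so some of the conversion can stay server-side rather than in the app; a sketch:

```sql
-- UUID_TO_BIN / BIN_TO_UUID convert between the text form and BINARY(16):
INSERT INTO pages (id, title)
VALUES (UUID_TO_BIN('01906698-1b7c-712b-8d4f-935930b03318'), 'RFC 2324');

SELECT BIN_TO_UUID(id) AS id, title FROM pages;
```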
Re: development velocity, IMO there’s a solid base product now. I feel like it’s a great time to make a solid change to help future velocity, but I’m not OP.
I should have been more explicit.
I do not intend to change the Postgres database or introduce a new one. I’m sure this won’t be an issue for the majority use-case.
However, I am open to learning more about alternate ways to efficiently handle Yjs state updates, which may be useful for a cloud version that would run at scale. If I were to go that way, it would not affect self-hosted users and would probably be via a reference pointer and not a database switch.
This is absolutely not an issue at the moment. Nothing to worry about.
Agreed that if you want to keep the relational part, using the DB to store pointers is probably the easiest, though it might require rethinking search.
I appreciate you at least considering the various options. If you do nothing else, tuning Postgres parameters, optimizing column ordering, and removing duplicated indices will be a great step forward that is completely backwards-compatible.
Are these numbers the same for JSON/jsonb data?
Ed: looks like it https://pganalyze.com/blog/5mins-postgres-jsonb-toast
Forgot one other thing: there is at least one duplicated index, where there’s a UNIQUE constraint on a column (I think on Pages?), and then an index on the same column. The unique constraint sets up an index already, so you don’t need both.
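To find and fix it, something like this (the index name is hypothetical; check what's actually there first):

```sql
-- List the indexes on pages to spot the duplicate:
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'pages';

-- The UNIQUE constraint already maintains its own index, so the
-- standalone one is pure write overhead:
DROP INDEX IF EXISTS pages_slug_id_idx;
```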
nice of you to help OP