Show HN: DriftDB – an open source WebSocket backend for real-time apps

367 points by paulgb a year ago

Hey HN! I’ve written a bunch of WebSocket servers over the years to do simple things like state synchronization, WebRTC signaling, and notifying a client when a backend job was run. I realized that if I had a simple way to create a private, temporary, mini-redis that the client could talk to directly, it would save a lot of time. So we created DriftDB.

In addition to the open source server that you can run yourself, we also provide https://jamsocket.live where you can use an instance we host on Cloudflare’s edge (~13ms round trip latency from my home in NY).

You may have seen my blog post a couple months back, “You might not need a CRDT”[1]. Some of those ideas (especially the emphasis on state machine synchronization) are implemented in DriftDB.

Here’s an IRL talk I gave on DriftDB last week at Browsertech SF[2], and a 4-minute tutorial on building a cross-client synchronized slider component in React[3].

[1] https://news.ycombinator.com/item?id=33865672

[2] https://www.youtube.com/watch?v=wPRv3MImcqM

[3] https://www.youtube.com/watch?v=ktb6HUZlyJs

alexisread a year ago

I've not looked at DriftDB in depth (the Cloudflare Worker running this is neat!), but can't MQTT handle this sort of workload?

Obv. there's no Cloudflare Worker running, say, an MQTT server over WebSockets, but you can scope topics with wildcards (https://www.hivemq.com/blog/mqtt-essentials-part-5-mqtt-topi...), replay missed messages on reconnection, and use last-will-and-testament, ACLs, dynamic topic creation, binary messages, etc.

I'm asking because many of these WebSocket projects seem to use custom protocols rather than anything standard (i.e. interoperable).

paulgb a year ago

The problem with MQTT is that most of the use cases I’m interested in involve a web browser as at least one party of the connection, and the browser doesn’t support MQTT. I could wrap MQTT in a WebSocket, but then I’d lose the advantages of MQTT’s compactness and interoperability (unless MQTT-over-WebSocket is a thing?)
The other operation that I haven’t seen elsewhere, but that is vital to enabling stream compaction without a leader, is rolling up a stream only up to a specific sequence number. NATS JetStream, for example, can roll up an entire stream, but if another message hits the stream between when the rollup is computed and when it arrives at the server, that message will be replaced too (IIRC). So I considered using NATS (which already has a WebSocket protocol), but ruled it out.
- elithrar a year ago
  
  > The problem with MQTT is that most of the use cases I’m interested in involve a web browser as at least one party of the connection, and the browser doesn’t support MQTT. I could wrap MQTT in a WebSocket, but then I’d lose the advantages of MQTT’s compactness and interoperability (unless MQTT-over-WebSocket is a thing?)
  We support MQTT over WS (or JSON over WS, or just HTTP) in Cloudflare Pub/Sub, FWIW - https://developers.cloudflare.com/pub-sub/learning/websocket...
  I also agree with the comments re: MQTT being well suited to a lot of these "broadcast" use cases, but that its IoT roots seem to hold it back. MQTT 5.0 is just a great protocol (clear spec, explicit about errors, flexible payloads), which makes it well suited to these broadcast/fan-in/real-time workloads. The traditional cloud providers support MQTT (3.1.1) in their respective IoT platforms but never grew it beyond that.
  
  jconley a year ago
  
  Can I get in on that beta? Submitted the form yesterday. Currently building something that could use it. ;)
- alexisread a year ago
  
  MQTT-over-WebSocket does exist (https://github.com/mqttjs/MQTT.js), and most MQTT brokers support it (Mosquitto, Amazon MQ, etc.). You're right about compaction: MQTT doesn't have anything in its protocol about compaction, and I don't know of any brokers that implement it. That said, you could use an MQTT-to-Kafka bridge.
  Something like Mosquitto + https://github.com/nodefluent/mqtt-to-kafka-bridge + Redpanda in a Docker image would work, though obv. this might be a bit overkill for most. Still, it does open many new avenues for interaction at scale. You pays your money...
  
  fud101 a year ago
  
  What is compaction?
  
  paulgb a year ago
  
  Compaction is where you take a chunk of messages and replace them with a single message.
  For example, one of the DriftDB demos is a counter (https://demos.driftdb.com/counter). State is synchronized by putting increment/decrement events into a stream. When a new user connects, their client gets all the messages in the stream, plays them back, and arrives at the same state as everyone else.
  If that’s all we did, over time, the stream would grow unruly. It would take ages to load the page because we’d have to load every state change. But we only really care about a single numeric value. Compaction takes a chunk of messages that look like this:
  {"apply":"increment"} {"apply":"increment"} {"apply":"decrement"} {"apply":"increment"}
  And replaces them with a message that looks like this:
  {"reset":2}
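  To make replay-then-compact concrete, here is a minimal TypeScript sketch of a client-side reducer for the counter messages above. The names (`CounterAction`, `applyCounterAction`, `replayCounter`, `compactCounter`) are hypothetical, not DriftDB's client API; the point is only that replaying the four raw messages and replaying the single `reset` message arrive at the same value.

```typescript
// Hypothetical message shapes, modeled on the JSON messages above.
type CounterAction = { apply: "increment" | "decrement" } | { reset: number };

// Apply one message to the current counter value.
function applyCounterAction(value: number, action: CounterAction): number {
  if ("reset" in action) return action.reset; // a snapshot replaces all prior state
  return action.apply === "increment" ? value + 1 : value - 1;
}

// A newly connected client replays the whole stream, starting from 0.
function replayCounter(stream: CounterAction[]): number {
  return stream.reduce(applyCounterAction, 0);
}

// Compaction: collapse a chunk of messages into a single reset message.
function compactCounter(chunk: CounterAction[]): CounterAction[] {
  return [{ reset: replayCounter(chunk) }];
}

const rawStream: CounterAction[] = [
  { apply: "increment" },
  { apply: "increment" },
  { apply: "decrement" },
  { apply: "increment" },
];

console.log(replayCounter(rawStream)); // 2
console.log(JSON.stringify(compactCounter(rawStream))); // [{"reset":2}]
```

  Because a `reset` simply overwrites the value, anything appended after the snapshot keeps applying normally on top of it.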
  DriftDB doesn’t know how to compute the compaction; it relies on clients to do that. When a client does something that increases the length of the stream, the server sends back the new length of the stream, so that the client can decide whether to compact it (i.e. if it passes some threshold).
  The important part that I haven’t seen elsewhere is that when a client compacts the stream, it includes the sequence number of the last message that’s part of the compaction. The server preserves messages with sequence numbers greater than that, since they are not part of the compaction.
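  A minimal in-memory sketch of the server side of this, assuming messages are numeric counter deltas and the client picks a length threshold. `CompactableStream`, `push`, `compact`, and `COMPACT_THRESHOLD` are illustrative names, not DriftDB's API. The key line is in `compact`: entries after `upToSeq` are kept, so a message that arrives while a client is computing its snapshot is not lost.

```typescript
// Illustrative names; not DriftDB's actual API.
interface StreamEntry<T> {
  seq: number;
  value: T;
}

class CompactableStream<T> {
  private entries: StreamEntry<T>[] = [];
  private nextSeq = 1;

  // Append a message and report the new length, so the client can decide
  // whether the stream has grown enough to be worth compacting.
  push(value: T): { seq: number; length: number } {
    const seq = this.nextSeq++;
    this.entries.push({ seq, value });
    return { seq, length: this.entries.length };
  }

  // What a newly connected client would be sent.
  replay(): StreamEntry<T>[] {
    return this.entries.slice();
  }

  // Replace every entry with seq <= upToSeq by one snapshot entry, keeping
  // entries after upToSeq (they are not part of the compaction).
  compact(upToSeq: number, snapshot: T): void {
    const later = this.entries.filter((e) => e.seq > upToSeq);
    this.entries = [{ seq: upToSeq, value: snapshot }, ...later];
  }
}

const COMPACT_THRESHOLD = 3; // hypothetical client-side threshold

// Messages are counter deltas; a snapshot is the sum of the deltas it covers.
const counter = new CompactableStream<number>();
counter.push(1);
counter.push(1);
const pushed = counter.push(-1); // { seq: 3, length: 3 }
// This client snapshots seq 1..3 (1 + 1 - 1 = 1) while another client writes:
counter.push(1); // seq 4
if (pushed.length >= COMPACT_THRESHOLD) counter.compact(pushed.seq, 1);

const total = counter.replay().reduce((sum, e) => sum + e.value, 0);
console.log(counter.replay().length, total); // 2 2
```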
- chrisdalke a year ago
  
  Most MQTT implementations do support MQTT-over-Websocket. I use it extensively at work and it's been fairly reliable!
- Kinrany a year ago
  
  Isn't it fairly easy to implement rollups as a stream of rollups with matching stream numbers?
nine_k a year ago

Maybe it's the richness of MQTT that makes it a worse choice for a startup. Offering a conformant MQTT broker is a lot of work, and the semantics come from elsewhere, not geared towards emphasizing your unique advantages.
Building a much simpler, custom-tailored protocol lets you ship faster and improve gradually. If the point is to deploy on Cloudflare in a massively parallel fashion (which is likely harder for a regular MQTT broker), a custom protocol lets you concentrate on that special advantage, not on standards conformance or interoperability with a bevy of existing libraries.
manv1 a year ago

Funny, the IoT space has bought into MQTT but the general internet space has not.
MQTT scales and works. And it's easy, fast, and small.
I've been trying to get our guys to do MQTT-based pub/sub, and they'd rather do their own thing with WebSockets because MQTT is scary. <shrug>.
That's the problem when front-end guys make decisions about tech sometimes, they choose stuff that seems easy to integrate without caring about things like deployment, scalability, capabilities, etc.
- Scottopherson a year ago
  
  Jeez that's a big paint brush you're slinging around.
  That's the problem when non-front-end guys make decisions about tech sometimes, they choose stuff that seems easy to integrate without caring about things like accessibility, design scalability, client device capabilities, etc.
  
  nine_k a year ago
  
  How does a wire protocol relate to UX concerns like accessibility or design scalability?
  Client device capabilities are there, MQTT is neither rocket science nor a resource hog, since it was designed for underpowered IoT devices.
- manv1 a year ago
  
  I mean, it'd be trivial to write stream replay for MQTT. It's literally just stashing messages and sending them back on connect. Not sure what the issue is there.
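  A toy version of "stash messages and send them back on connect", written as a generic in-process broker rather than an MQTT implementation (no wire protocol, QoS, sessions, or topic wildcards). `ReplayBroker` and its history cap are hypothetical.

```typescript
// Generic sketch, not an MQTT broker: no wire protocol, QoS, sessions, or wildcards.
type Listener = (payload: string) => void;

class ReplayBroker {
  private history = new Map<string, string[]>();
  private listeners = new Map<string, Set<Listener>>();

  // maxHistory bounds memory: only the most recent messages per topic are kept.
  constructor(private readonly maxHistory: number) {}

  publish(topic: string, payload: string): void {
    const log = this.history.get(topic) ?? [];
    log.push(payload);
    if (log.length > this.maxHistory) log.shift();
    this.history.set(topic, log);
    this.listeners.get(topic)?.forEach((fn) => fn(payload));
  }

  // Replay the stashed messages to the new subscriber, then deliver live ones.
  // Returns an unsubscribe function.
  subscribe(topic: string, fn: Listener): () => void {
    (this.history.get(topic) ?? []).forEach((payload) => fn(payload));
    const set = this.listeners.get(topic) ?? new Set<Listener>();
    set.add(fn);
    this.listeners.set(topic, set);
    return () => {
      set.delete(fn);
    };
  }
}

const broker = new ReplayBroker(2);
broker.publish("room/1", "a");
broker.publish("room/1", "b");
broker.publish("room/1", "c"); // "a" falls out of the history
const received: string[] = [];
broker.subscribe("room/1", (p) => received.push(p));
broker.publish("room/1", "d");
console.log(received.join(",")); // b,c,d
```

  The cap is where it stops being trivial: once old messages are dropped, a late joiner can no longer rebuild state from the history, which is the gap compaction is meant to fill.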

samhuk a year ago

Looks interesting. Coincidentally, I've just completed the bulk of work on a distributed Websocket network system to synchronize certain bits of state between multiple clients for my own kind of Storybook tool [0]. How interesting!

This kind of tool is exactly what I would have needed, instead of the approach I've taken which is a bit kludgy and grass-roots.

By far the most difficult part of it for me was ensuring that the web socket network can heal from outages of any of the clients or the server. E.g. If a client loses connection, how does it regain knowledge of state? If the server dies, what do clients do with state changes they want to upload? Etc. It was really difficult!
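One common way to handle the client side of those two cases, sketched under stated assumptions: the server assigns increasing sequence numbers and can return every message after a given number, the client queues writes while offline, and reconnect attempts use capped exponential backoff. `ResumableServer`, `ResilientClient`, and `FakeServer` are hypothetical; a real client would sit on a WebSocket, and this does not cover the server itself losing its log.

```typescript
// Hypothetical interfaces; a real client would run over a WebSocket.
interface SeqMsg {
  seq: number;
  body: string;
}

interface ResumableServer {
  send(body: string): void;             // server assigns the next seq
  messagesAfter(seq: number): SeqMsg[]; // replay everything after seq
}

class ResilientClient {
  lastSeq = 0;
  applied: string[] = [];
  private outbox: string[] = [];
  private connected = false;

  constructor(private readonly server: ResumableServer) {}

  // Capped exponential backoff between reconnect attempts.
  backoffMs(attempt: number, baseMs = 250, capMs = 10000): number {
    return Math.min(capMs, baseMs * 2 ** attempt);
  }

  disconnect(): void {
    this.connected = false;
  }

  // On (re)connect: first catch up on missed messages, then flush offline writes.
  reconnect(): void {
    this.connected = true;
    this.server.messagesAfter(this.lastSeq).forEach((m) => this.apply(m));
    this.outbox.splice(0).forEach((body) => this.server.send(body));
  }

  write(body: string): void {
    if (this.connected) this.server.send(body);
    else this.outbox.push(body); // keep local changes until we are back online
  }

  apply(m: SeqMsg): void {
    if (m.seq <= this.lastSeq) return; // ignore duplicates after a resume
    this.lastSeq = m.seq;
    this.applied.push(m.body);
  }
}

// In-memory stand-in for the server, for demonstration only.
class FakeServer implements ResumableServer {
  log: SeqMsg[] = [];
  send(body: string): void {
    this.log.push({ seq: this.log.length + 1, body });
  }
  messagesAfter(seq: number): SeqMsg[] {
    return this.log.filter((m) => m.seq > seq);
  }
}

const srv = new FakeServer();
const client = new ResilientClient(srv);
client.reconnect();
client.disconnect();
srv.send("peer edit");           // seq 1, sent while we were offline
client.write("my offline edit"); // queued locally
client.reconnect();              // applies seq 1, then sends our edit as seq 2
console.log(client.lastSeq, srv.log.length, client.backoffMs(3)); // 1 2 2000
```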

Good work :)

[0] https://github.com/samhuk/exhibitor/pull/22

mrtksn a year ago

How are race conditions handled? If one of the clients sharing the state delivers its input with a delay (network issue, etc.), will it overwrite the other client's state once delivered, or will it be dismissed? Is there a concept of a master/slave client?

Edit:

So, I played a bit, and it appears that if a client is disconnected and makes changes to the state while offline, then once it reconnects those changes are applied to the other client, which had its own changes to the state. So it's working on a "last message" basis? Also, it seems like it can't detect the offline/online status?

I'm curious because the interesting part of this kind of systems is the way races are handled.

paulgb a year ago

> So, I played a bit, and it appears that if a client is disconnected and makes changes to the state while offline, then once it reconnects those changes are applied to the other client, which had its own changes to the state. So it's working on a "last message" basis? Also, it seems like it can't detect the offline/online status?
From the server’s point of view, it’s just an ordered broadcast channel with replay. The conflict semantics are whatever you build on top of that.
The `useSharedState` hook in the React bindings implements last-write-wins. For the `useSharedReducer` hook, the reducer itself determines the semantics, but in the voxel editor demo we also use last-write-wins.
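A minimal sketch of what last-write-wins means on top of an ordered broadcast channel: "last" is simply "latest in the server's order", not the latest wall-clock edit. That matches the behavior described in the question, where an edit made offline and delivered on reconnect overwrites the other client's value. `SetMsg` and `lastWriteWins` are illustrative names; this is not the driftdb-react implementation of `useSharedState`.

```typescript
// Illustrative names, not the driftdb-react implementation.
interface SetMsg {
  seq: number; // position in the server's broadcast order
  key: string;
  value: string;
}

// Apply writes in server order; a later write to the same key overwrites.
function lastWriteWins(messages: SetMsg[]): Map<string, string> {
  const state = new Map<string, string>();
  messages
    .slice()
    .sort((a, b) => a.seq - b.seq)
    .forEach((m) => state.set(m.key, m.value));
  return state;
}

const ordered: SetMsg[] = [
  { seq: 1, key: "color", value: "red" },  // client A, online
  { seq: 2, key: "color", value: "blue" }, // client B's offline edit, delivered on reconnect
];
console.log(lastWriteWins(ordered).get("color")); // blue
```

Since every client applies the same server order, they all converge on the same value without any client acting as a leader.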
> Also it seems like it can't detect the offline/online status?
Online/offline status is exposed in the client libraries, e.g. in the React bindings there is a useConnectionStatus hook: https://driftdb.com/docs/react#useconnectionstatus-hook
> I'm curious because the interesting part of this kind of systems is the way races are handled.
It’s academically the interesting part, but I think it matters less than people assume it does. Here’s a section from a blog post I wrote a couple months ago:
> Developers may find it tempting to treat collaborative applications as any other distributed systems, and in many ways that’s a useful way to look at them. But they differ in an important way, which is that they always have humans-in-the-loop. As a result, many edge cases can simply be deferred to the user.
> For example, every multiplayer application has to decide how to handle two users modifying the same object concurrently. In practice, this tends to be rare, because of something I call social locking: the tendency of reasonable people not to clobber each other’s work-in-progress, even in the absence of software-based locking features. This is especially the case when applications have presence features that provide hints to other users about where their attention is (cursor position, selection, etc.) In the rare times it does occur, the users can sort it out among themselves.
> A general theme of successful multiplayer approaches we’ve seen is not overcomplicating things. We’ve heard a number of companies confess that their multiplayer approach feels naive — especially compared to the academic literature on the topic — and yet it works just fine in practice.
https://driftingin.space/posts/you-might-not-need-a-crdt
- mrtksn a year ago
  
  Good point; in the case of users interacting, it’s probably a non-issue. Thanks for the insight.

rlt a year ago

Neat.

> DriftDB is a real-time data backend that runs on the edge

What does it mean for these backends to be “on the edge”? Do geographically dispersed clients connect to different backends? If so, are messages synchronized between them? And if so, what’s the point of them being on the edge?

paulgb a year ago

By “on the edge”, I mean that if you’re in London and I’m in Amsterdam, and we want to exchange messages, the messages shouldn’t have to do a round-trip through Virginia, they should go through a server closer to both of us. (Of course, if I’m in SF and you’re in London, this is less of a win.)
The way it works in DriftDB is that everything is siloed into “rooms”, which are effectively broadcast channels. The room is started based on the geography of the person who first joins it (Cloudflare handles this part).
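A hypothetical in-memory sketch of that room model: a room is created lazily on first join, each connection belongs to exactly one room, and a message is delivered to every connection in the sender's room (including the sender, a simplification chosen here). Placing the room near its first user is the platform's job on Cloudflare and is not modeled. `RoomRouter` is an illustrative name, not part of DriftDB.

```typescript
// Hypothetical router; names are illustrative, not DriftDB's.
type Send = (msg: string) => void;

class RoomRouter {
  private rooms = new Map<string, Map<string, Send>>(); // roomId -> (connId -> send)
  private roomOf = new Map<string, string>();           // connId -> roomId

  join(roomId: string, connId: string, send: Send): void {
    if (this.roomOf.has(connId)) throw new Error(`${connId} is already in a room`);
    let room = this.rooms.get(roomId);
    if (!room) {
      room = new Map<string, Send>(); // first joiner creates the room
      this.rooms.set(roomId, room);
    }
    room.set(connId, send);
    this.roomOf.set(connId, roomId);
  }

  // Deliver to every connection in the sender's room, including the sender.
  broadcast(connId: string, msg: string): void {
    const roomId = this.roomOf.get(connId);
    if (roomId === undefined) throw new Error(`${connId} has not joined a room`);
    this.rooms.get(roomId)?.forEach((send) => send(msg));
  }

  leave(connId: string): void {
    const roomId = this.roomOf.get(connId);
    if (roomId === undefined) return;
    this.roomOf.delete(connId);
    const room = this.rooms.get(roomId);
    if (!room) return;
    room.delete(connId);
    if (room.size === 0) this.rooms.delete(roomId); // drop empty rooms
  }

  roomCount(): number {
    return this.rooms.size;
  }
}

const router = new RoomRouter();
const inbox: string[] = [];
router.join("room-a", "alice", (m) => inbox.push(`alice:${m}`));
router.join("room-a", "bob", (m) => inbox.push(`bob:${m}`));
router.join("room-b", "carol", (m) => inbox.push(`carol:${m}`));
router.broadcast("alice", "hi");
console.log(inbox.join(",")); // alice:hi,bob:hi
```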
- rlt a year ago
  
  > The room is started based on the geography of the person who first joins it
  Cool, makes a lot of sense because people using a given “room” are often likely to be geographically collocated.
  
  paulgb a year ago
  
  Exactly!
- trollitarantula a year ago
  
  Nice! Would love to see Cloudflare deployment guide. Cloudflare isn't mentioned in the docs.
  
  paulgb a year ago
  
  Ah, you’re right, I haven’t written that up yet. The tl;dr is something like:
    cd driftdb-worker
    npm i
    npm run deploy
  You’ll need to sign in to wrangler if you haven’t already, and will need to have rustc/cargo available (wrangler will install some things and build it into a WebAssembly module).
  
  SpaghettiX a year ago
  
  A bit more detail, since I've used Cloudflare Durable Objects:
  DriftDB's Cloudflare implementation uses Durable Objects, so you need a Workers Paid subscription: https://developers.cloudflare.com/workers/runtime-apis/durab.... It's $5/month.
  Side note: the Durable Objects API is quite verbose, so it's nice to see something building on top of it and encapsulating it. I wonder whether this would compete with Cloudflare's Pub/Sub product, though, and how Cloudflare will handle situations like this. They don't look so good today after taking down a customer: https://news.ycombinator.com/item?id=34639212
  
  kentonv a year ago
  
  Durable Objects is a low-level primitive which you can certainly build all sorts of distributed systems on -- including ones that compete with our own products. In fact, those products of ours are often themselves built on Durable Objects!
  Personally, I would be absolutely thrilled to see people building their own custom versions of these products directly on DO, and I think the rest of the team would agree. Our goal is to productize our physical network (machines in hundreds of locations worldwide). We build high-level products to make it easier for people to use us, but if you want to build your own versions of those products based on our lower-level primitives, that's great!
  (I'm the lead engineer on Workers.)
  (I don't know the story with that other customer from earlier today, so cannot comment there, sorry.)
- HighlandSpring a year ago
  
  Oh, cool! So kinda like IRC?
  
  paulgb a year ago
  
  Yes, the concept of rooms is analogous to rooms in a chat service. One difference from IRC as a protocol (besides being over websocket) is that each connection corresponds to exactly one room (since different rooms may be on different servers.)
fernandopj a year ago

OP must have meant it runs on Cloudflare Edge.
- scaredginger a year ago
  
  Please explain your reasoning here
  
  paulgb a year ago
  
  That’s essentially what I meant. The core database is separate from the Cloudflare parts, so it could in theory run on other edges (I want to get it running on fly.io!), but for now “the edge” can be read as “Cloudflare Workers”.

BTBurke a year ago

This is great. I'm going to use this with something I'm working on. The edge behavior is just what I need.

When you say limitations are a "relatively small number of clients need to share some state over a relatively short period of time," I read in another comment about a dozen or so clients, but what about the time factor? Can it be on the order of hours?

paulgb a year ago

> but what about the time factor? Can it be on the order of hours?
So far I’ve focused on use cases where clients are online for overlapping time intervals. When all the clients go offline, Cloudflare will shut down the worker after some period and the replay ability will be lost. The core data structure is designed such that it could be stored in the Durable Object storage Cloudflare provides, but I haven't wired it up yet.
- BTBurke a year ago
  
  One more thought - any consideration of hooking this to Cloudflare's queue? Then you could optionally connect another worker to that and e.g. persist everything in their D1 SQLite database.
  
  paulgb a year ago
  
  I haven’t looked at the queue specifically, but Durable Objects have a nice key/value storage mechanism that happens to map nicely. It would take a bit of munging to make it work for a stream instead of a single value, but I have a design in mind.
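  For illustration only (this is not the design paulgb mentions), one plausible way to lay a stream out in a key/value store: one key per message with a zero-padded sequence number, so lexicographic key order equals sequence order, plus a metadata key for the last sequence number. The `KeyValue` interface is a hypothetical, synchronous stand-in; Durable Object storage has its own asynchronous API.

```typescript
// Hypothetical synchronous key/value interface, standing in for a real store.
interface KeyValue {
  get(key: string): string | undefined;
  put(key: string, value: string): void;
  delete(key: string): void;
  keysWithPrefix(prefix: string): string[]; // sorted lexicographically
}

class MemoryKV implements KeyValue {
  private data = new Map<string, string>();
  get(key: string): string | undefined {
    return this.data.get(key);
  }
  put(key: string, value: string): void {
    this.data.set(key, value);
  }
  delete(key: string): void {
    this.data.delete(key);
  }
  keysWithPrefix(prefix: string): string[] {
    return Array.from(this.data.keys()).filter((k) => k.startsWith(prefix)).sort();
  }
}

// Zero-pad so that lexicographic key order matches numeric sequence order.
function msgKey(seq: number): string {
  return `msg:${String(seq).padStart(12, "0")}`;
}

class KVStream {
  constructor(private readonly kv: KeyValue) {}

  private lastSeq(): number {
    return Number(this.kv.get("meta:lastSeq") ?? "0");
  }

  append(value: string): number {
    const seq = this.lastSeq() + 1;
    this.kv.put(msgKey(seq), value);
    this.kv.put("meta:lastSeq", String(seq));
    return seq;
  }

  replay(): string[] {
    return this.kv.keysWithPrefix("msg:").map((k) => this.kv.get(k) ?? "");
  }

  // Delete messages up to and including upToSeq, then store the snapshot in
  // that slot; later messages keep their keys.
  compact(upToSeq: number, snapshot: string): void {
    const cutoff = msgKey(upToSeq);
    for (const k of this.kv.keysWithPrefix("msg:")) {
      if (k <= cutoff) this.kv.delete(k);
    }
    this.kv.put(cutoff, snapshot);
  }
}

const kvStream = new KVStream(new MemoryKV());
["a", "b", "c"].forEach((v) => kvStream.append(v));
kvStream.compact(2, "ab"); // snapshot of messages 1..2
kvStream.append("d");
console.log(kvStream.replay().join(",")); // ab,c,d
```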
- BTBurke a year ago
  
  That works perfectly for what I'm using it for. Thanks for building this!

jcq3 a year ago

I couldn't find a use-case section, which is the first thing I read, before code, implementation examples, or anything else. Why is it always missing from SaaS landing pages?

paulgb a year ago

Good feedback, here you go :) https://github.com/drifting-in-space/driftdb/commit/8d946217...
- jcq3 a year ago
  
  Beautiful, now it makes me want to use your tool because I can relate to use cases I might have...

speps a year ago

Reminds me of Colyseus: https://github.com/colyseus/colyseus

Colyseus has support for persistence as well as matchmaking!

dabeeeenster a year ago

This is super interesting! Do you have any data on how well this scales when running on Cloudflare Edge? Can you run more than one instance and have them share state?

paulgb a year ago

Thanks! When hosted on Cloudflare, it uses their Durable Objects product. Rather than running multiple backend instances that share state, it's set up so that all users in the same "room" are connected to the same instance. The instances can then be scaled out horizontally (but Cloudflare takes care of that.)
Within a room, things are a bit more constrained. We haven't found the limit yet, and I suspect it's pretty high, but our design goal was to support on the order of dozens of users in a room, not necessarily beyond that. (Targeting e.g. a shared whiteboard use case)
- tmikaeld a year ago
  
  We also looked at using Cloudflare, but it was prohibitively expensive, because you pay for the duration of each "room" (or connection, depending on how you use it).
  https://developers.cloudflare.com/workers/platform/pricing/#...
  Eventually we went with Centrifuge.
  
  paulgb a year ago
  
  Yeah, it remains to be seen whether it is economical for us to keep the hosted version on CF. I suspect that for users who want to run their own geographically distributed instance of it, CF will be the path that makes sense for the majority of them.
  Who did you end up going with as a hosting provider? (Centrifuge looks to be a library, if I’m looking at the right thing)
  
  tmikaeld a year ago
  
  https://github.com/centrifugal/centrifugo (Server/Admin)
  https://github.com/centrifugal/centrifuge (Server core)
  https://github.com/centrifugal/centrifuge-js (Library)
  It's a complete solution, including server, admin panel and client library.
  We're a European company and use OVH, Hetzner, and others.
  
  e1g a year ago
  
  "Centrifuge" as in https://github.com/centrifugal/centrifugo ?
  
  unraveller a year ago
  
  CF edge wants you to be more one-and-done, very anti-long-lived-connection. Deno is the better-priced edge JS compute for WebSockets, last I checked.
  Probably still worthwhile for DriftDB SaaS if mainly short lived connections are used, even though similar functionality can be had with NATS bridge + an ordered streaming library in your fav language on fly.io
  
  crabmusket a year ago
  
  I can't find any mention of WebSockets in Deno Deploy's documentation. Last I heard, they had not released anything. But I'm really hoping for more providers to copy Cloudflare's edge features, especially Durable Objects.

bruth a year ago

Nice, have you come across NATS? https://nats.io. The server natively supports WebSockets. There are many clients including Deno, Node, WebSockets, Rust, Go, C, Python, etc.

In addition to stateless messaging, it supports durable streams, and optimized API layers on top like key-value, and object storage.

The server also natively supports MQTT 3.1.1.

SpaghettiX a year ago

NATS is not something I see as a competitor for external clients (browsers, mobile apps), primarily because it doesn't handle reconnections / message delivery / quality-of-service / at-least-once or exactly-once delivery (except for MQTT).
> When the connection is lost, your application would have to re-create it and all subscriptions if any. https://github.com/nats-io/stan.go#connection-status
Therefore, I don't see what it adds here. It seems designed for service communication, not client-server. They also don't list browsers as a use case https://docs.nats.io/nats-concepts/overview#use-cases. (though it is of course possible, it's just not ideal IMHO.)
They still have a js/browser client library though if you want to use them: https://github.com/nats-io/nats.ws. And yes, their servers "have websocket support".
- bruth a year ago
  
  In fact it does all of these things now properly! STAN (NATS Streaming) was deprecated two years ago in favor of a new embedded subsystem called JetStream: https://docs.nats.io/nats-concepts/jetstream released in March 2021.
  
  SpaghettiX a year ago
  
  Even with NATS jetstream, NATS has a focus on service communication.
  "It supports websockets" and "qos" does not mean it will work robustly with web apps if nobody uses NATS for that use case. See https://github.com/nats-io/nats.ws/issues/172 for an example issue. If NATS is not used for websockets in browsers, it will have a mine field of issues to fix. And what about all the other clients (mobile, mobile web)? Sure there may be a NATS client library for it, but it won't handle user connectivity issues, because again it's aimed at service communication where the network is great.
  People are using NATS in kubernetes, not web browsers.
  
  bruth a year ago
  
  > Even with NATS jetstream, NATS has a focus on service communication.
  It indeed excels at service communication as well. However, a core use case for NATS is the edge, be it your definition (browsers and mobile), but also in cars, factories, tractors, low-orbit satellites, etc, whether it is running on Kubernetes, k3s, or bare metal.
  The issue you called out is a Firefox-specific issue, but it will be addressed and is not indicative of an inherent limitation of NATS.
  Check out this playlist of a live event I organized last fall with a variety of live demos: https://youtube.com/playlist?list=PLgqCaaYodvKY6xRbvB6ffON0_...
  
  SpaghettiX a year ago
  
  Currently, the only people I see talking about NATS for edge is Synadia - they're also not very specific. In theory/documentation, "edge" is a core NATS use case, but in practice why does NATS compare themselves to Kafka and microservices? Most of that playlist is not edge-focused. Can you explain what concepts and problems you have to solve to support the "edge" - none of that is in your website.
  > The issue you called out is a Firefox-specific issue, but it will be addressed and is not indicative of an inherent limitation of NATS.
  My point is NATS is not being used in browsers, mobile apps or edge use cases. It doesn't even explore the concepts. It looks like it doesn't care about Firefox. For IoT, what does NATS bring on top of MQTT? NATS ends up being an MQTT broker so it will have to compete with all of them.
  Why don't you start comparing yourself to products and technology that serve the edge (other edge-focused companies (ably, pusher, pubnub), and other MQTT brokers)?
  Side note: Would appreciate it if you disclosed your affiliation with Synadia and NATS before advertising it.
  
  bruth a year ago
  
  > Would appreciate it if you disclosed your affiliation with Synadia and NATS before advertising it.
  Fair point, but I did not mention Synadia. FWIW, I have been a NATS user for seven+ years prior to joining Synadia so I was speaking on behalf of myself and experience with the tech.
  Also fair point that the nats.io website does not highlight this strongly. The NATS maintainers are aware (nearly all employed by Synadia) and we are working on it.
  I disagree that, simply because it is not advertised as an "edge" technology, it is not one of the best-in-class techs for edge. It simply means we are doing a poor job at awareness.
  > Why don't you start comparing yourself to products and technology that serve the edge
  The vast majority of people and customers compare NATS to Kafka and the variety of variants out there. Once the push on edge occurs, I suspect comparison to these other tech will occur.
  To be clear, I am not looking for a "winner" in this discussion, rather my original comment was to correct a gap in understanding of what NATS is capable of.
  
  SpaghettiX a year ago
  
  Thanks
  It's nice to hear that NATS is tackling this problem space. I'll give it a try.
  > I disagree that, simply because it is not advertised as an "edge" technology, it is not one of the best-in-class techs for edge. It simply means we are doing a poor job at awareness.
  It's very easy to say "NATS is for everything", which seems to be the case here. It would be great if that was backed up with evidence.
  
  bruth a year ago
  
  It is a top priority. I do appreciate the constructive criticism and conversation. Feel free to reach out to me (byron at synadia dot com) or on the NATS Slack (https://slack.nats.io).
  
  autobeam a year ago
  
  Actually, NATS can be used in the browser in a way that eliminates the need for REST and offers both client/server communication and real-time functionality. A big benefit of that is that full-stack developers will be able to communicate between the web app and backend services in the same way that services talk to each other, in a secure and reliable way. I wrote a blog post about that here: https://www.ahmed.wiki/blog/nats-more-productivity-client-de...
  Not saying that DriftDB is not cool, it is a nice tool; the point is that NATS has a great way of streamlining communication between all components of a system, including client apps (React + Flutter) in my case.
  
  SpaghettiX a year ago
  
  > A big benefit of that is that full-stack developers will be able to communicate between the web app and backend services in the same way that services talk to each other, in a secure and reliable way
  In practice, I have not seen that manifest as a benefit. Services have a dramatically different environment than edge devices. Tools built for services (NATS, Kafka, gRPC) do not translate well to the edge. They are used by people who don't care about or understand the edge-edge-cases: when a user drives through a tunnel and is disconnected, restarts their device, or is throttled by the OS, etc. (One issue I found with grpc-web, the alternative to gRPC that supports browsers, is that it's severely limited by the browser's connection limit, making streams completely useless.) Also, grpc-web is neglected by Google.
  NATS does not look ready, and is not designed for use, in browsers or mobile apps. It's not a use case that Synadia/NATS care much enough to even mention on their website.
  > (react + flutter) in my case
  How are you using NATS in Flutter? Using the client library that hasn't been updated in 20 months, with no link to the repository and 4 upvotes? Or writing your own custom library to connect using WebSockets? If you use WebSockets directly, you'll be writing extra code to handle disconnections, retries, QoS, etc.
  
  paulgb a year ago
  
  How do you prevent a malicious user from taking the wss endpoint you’re using, subscribing to a wildcard, and seeing messages intended for other clients?
- SpaghettiX a year ago
  
  They also don't list themselves as competitors to products that do target this market: ably, firebase messaging, pubnub, pusher. Instead, they compare themselves to Kafka, RabbitMQ, Pulsar, and gRPC, none of which work well on browsers. (Yes grpc-web "works" on the browser, but I suggest everyone avoids it.)
  https://docs.nats.io/nats-concepts/overview/compare-nats
paulgb a year ago

NATS is great, we use it in another project, Plane (https://plane.dev). The reasons I didn’t use it instead of making DriftDB:
- In NATS, unless you set up authentication, any user can subscribe to “>” and get a firehose of every message, even if they don’t know the room IDs.
- NATS JetStream supports rollups, but they roll up the entire stream, rather than up to a certain sequence number. This would break our ability to do leaderless compaction.
- bruth a year ago
  
  Was not aware of Plane, nice! Regarding the two points:
  - A unique room ID/subject is a form of authentication. Essentially anyone having that unique identifier can join, akin to a token. This is straightforward to setup in NATS avoiding the ">" for all problem (which I may now need to write a blog post about ;-)
  - Rollups are supported on a per-subject basis. Each room could be modeled as a subject and individually rolled up.
  
  paulgb a year ago
  
  Re. #1, can you elaborate? The tracking issue for this is still open[1]. As far as I can tell this is a hard blocker for any use case where a random user on the web can connect to NATS, since it means that user can wiretap any room without knowing the room ID.
  Re #2, the problem is that a rollup of a subject in NATS rolls up the whole subject, so there’s a race condition if you try to use it the way DriftDB uses it. If one client is computing a compaction while another client sends a message, that message will be erased by the compaction.
  This works if a single producer is writing to a stream, because that producer can stop emitting messages during the compaction. But in our case, each client can produce messages at any time.
  DriftDB solves this by sending, alongside the rollup, the sequence number of the last message included in it; the server preserves messages after that sequence number.
  [1] https://github.com/nats-io/nats-server/issues/2667
  
  bruth a year ago
  
  #1: This isn't a hard requirement to achieve the desired permissions. For the DriftDB use case, my understanding is that all members in the room have full pub/sub k/v permissions, so that could be achieved by declaring a new permission pinned to the room when the room is created or joined (this can be done dynamically without a config file reload).
  #2 Publishes do support optimistic concurrency control using the `Nats-Expected-Last-Sequence` header (stream level) or `Nats-Expected-Last-Subject-Sequence` (subject level). This ensures that concurrent publishes are serialized: all but one are rejected with a "conflict wrong sequence" error. See the example headers in Rust[0] and WS[1].
  [0]: https://docs.rs/async-nats/latest/async_nats/header/index.ht...
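  Point #1 above amounts to granting each room's members a permission scoped to that room's subjects. A sketch of the corresponding check, using NATS subject-wildcard conventions ("*" matches exactly one token, ">" matches one or more trailing tokens). `subscriptionAllowed` and the `rooms.<id>.>` subject layout are hypothetical, not a NATS server API; the check asks whether every subject a requested subscription could match is also covered by the room's permission, which is what rules out a bare ">".

```typescript
// Hypothetical helper following NATS subject conventions: tokens are separated
// by ".", "*" matches exactly one token, ">" matches one or more trailing tokens.
// Returns true only if every subject `requested` could match is also matched by `allowed`.
function subscriptionAllowed(allowed: string, requested: string): boolean {
  const a = allowed.split(".");
  const r = requested.split(".");
  for (let i = 0; i < a.length; i++) {
    if (a[i] === ">") return i < r.length; // covers any non-empty tail
    if (i >= r.length) return false;       // requested is shorter than allowed
    if (r[i] === ">") return false;        // requested tail is broader than allowed
    if (a[i] === "*") continue;            // any single token (including "*")
    if (r[i] !== a[i]) return false;       // literal must match exactly
  }
  return a.length === r.length;
}

const roomPermission = "rooms.9f2c1e.>"; // hypothetical permission granted on join
console.log(subscriptionAllowed(roomPermission, "rooms.9f2c1e.events")); // true
console.log(subscriptionAllowed(roomPermission, ">"));                   // false
console.log(subscriptionAllowed(roomPermission, "rooms.*.events"));      // false
```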
  
  paulgb a year ago
  
  #1 your understanding of the DriftDB permission model is correct, but I’m not sure what declaring a new permission at runtime entails? Would I be creating a new bearer token for each room, and attaching the room’s permissions to it?
  #2 the optimistic concurrency headers don’t solve the problem here, e.g.:
  - I increment a counter (seq: 1)
  - I increment a counter again (seq: 2, expected last sequence: 1)
  - I begin computing a snapshot, resulting in a counter value of 2
  - You increment the counter (seq: 3, expected last sequence: 2)
  - I complete the snapshot and publish it with a Nats-Rollup header
  - Your event has been lost; the counter value is now 2
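  The steps above can be reproduced in a toy model where a rollup replaces every earlier message on the subject. `Subject`, `Msg`, and `rollup` are hypothetical names for illustration, not the NATS API; the only behavior modeled is "the rollup message becomes the whole history".

```typescript
// Hypothetical model of a whole-subject rollup (not the NATS API).
type Msg = { kind: "inc" } | { kind: "snapshot"; value: number };

class Subject {
  messages: Msg[] = [];

  publish(msg: Msg): void {
    this.messages.push(msg);
  }

  // A whole-subject rollup keeps only the rollup message itself.
  rollup(msg: Msg): void {
    this.messages = [msg];
  }

  // Rebuild the counter by replaying whatever messages are retained.
  counter(): number {
    let value = 0;
    for (const m of this.messages) {
      value = m.kind === "snapshot" ? m.value : value + 1;
    }
    return value;
  }
}

const subject = new Subject();
subject.publish({ kind: "inc" });     // I increment (seq 1)
subject.publish({ kind: "inc" });     // I increment again (seq 2)
const snapshot = subject.counter();   // I compute a snapshot: 2
subject.publish({ kind: "inc" });     // you increment (seq 3); the counter is now 3
subject.rollup({ kind: "snapshot", value: snapshot }); // I publish the rollup
console.log(subject.counter());       // 2: your increment was erased
```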
  
  bruth a year ago
  
  #1: Correct, I'm putting together an example this week to show what this looks like. Pretty straightforward.
  #2: You can combine rollup and expected last sequence header to prevent this, unless I am missing another subtle detail?
  (I am enjoying this thread FWIW :)
  
  paulgb a year ago
  
  Cool, looking forward to the example :)
  I actually didn’t realize that the expected last sequence could be combined with nats-rollup. In the example above (as I understand it) if we added that to the snapshot, NATS would throw out the snapshot, so the log would still be correct, but the work of snapshotting it and sending it over the wire would be lost. If messages were frequent enough, and/or snapshots took a while to compute/transfer, you might never have a roll-up succeed.
  Our approach is that a roll-up will always succeed, we just preserve any messages in the stream with a sequence number greater than the one provided.
  (I’m enjoying it too, and am a user of NATS so I’m happy to learn things I didn’t know about it :)
  
  bruth a year ago
  
  > If messages were frequent enough, and/or snapshots took a while to compute/transfer, you might never have a roll-up succeed.
  Yes, good point. The "snapshotting up to a lagging sequence" could be achieved with two separate subjects to reduce contention, but is a bit more work.
  It sounds like, in Drift's case, the snapshot effectively brings up the tail (snapshot), but the head can still be appended to with new events.
  
  paulgb a year ago
  
  > It sounds like, in Drift's case, the snapshot effectively brings up the tail (snapshot), but the head can still be appended to with new events.
  Yep, exactly. It’s a subtle feature but it makes it possible for multiple clients to attempt to compact without worrying about races.

ArtWomb a year ago

Seems expensive, no? Starting an HTTP container per request? But I suppose it does solve many server-side persistence issues. And I love the power it affords you in creating virtual worlds. Awesome stuff ;)

https://github.com/drifting-in-space/plane

paulgb a year ago

We created Plane, but we’re actually not using it for this! DriftDB stemmed from realizing that a lot of the use cases people were coming to Plane for were simple WebSocket servers, for which spinning up a container is excessive.
Plane is still great (I mean, I’m biased) if you want to run a WebSocket server that implements custom business logic, uses heavy compute, GPUs[1], or is stateful.
[1] teaser: https://canvas.stream/
- ArtWomb a year ago
  
  Blender over WebRTC demo looks fast too ;)
  
  paulgb a year ago
  
  Thanks! My colleagues gave a talk last week on streaming data visualization that you might like: https://www.youtube.com/watch?v=0WyeZ9lKdSU

jwilber a year ago

Awesome stuff. Here's a short video about DriftDB from Browsertech SF (I believe the event is put on by them, "Drifting in Space"): https://www.youtube.com/watch?v=wPRv3MImcqM

winrid a year ago

Reminds me of DerbyJS and ShareDB/Racer. It's a pretty productive stack, but came out at the wrong time. You can plug in different storage engines (mongo, postgres) and it handles conflicts via operational transform.

JohnCClarke a year ago

useState() --> useSharedState()

My brain just exploded with how perfect this DX is! Love it!

globalise83 a year ago

This looks just about perfect for powering all those team online games we all played a lot during lockdown (and still do), is that right?

paulgb a year ago

Yep, in fact, it was making a word game[1] for my family to play on zoom calls early in lockdown that sent me down the rabbit hole of synchronizing state in distributed systems.
[1] https://word.red

Thaxll a year ago

WebSocket is not very good for online games because it's TCP-based; also, there are already millions of WebSocket libraries in every language.
- paulgb a year ago
  
  Right, WebSocket is fine for a chess game, but you wouldn’t use it for a first-person shooter.
  If you do want UDP from a browser (via a WebRTC data channel) you first need a side channel to establish the connection, and DriftDB is handy for that.
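  A minimal sketch of that signaling pattern, assuming only a broadcast room (the thread describes DriftDB rooms as essentially broadcast channels). `BroadcastRoom`, `Peer`, and the `Signal` message shapes are hypothetical, and the SDP strings are placeholders; in a browser they would come from `RTCPeerConnection`. Because every message reaches every subscriber, each message carries `from`/`to` fields and peers ignore traffic addressed to someone else.

```typescript
// Hypothetical signaling messages relayed over a broadcast room.
type Signal =
  | { type: "offer"; from: string; to: string; sdp: string }
  | { type: "answer"; from: string; to: string; sdp: string }
  | { type: "candidate"; from: string; to: string; candidate: string };

// In-memory stand-in for a broadcast room: every message goes to every subscriber.
class BroadcastRoom {
  private listeners: Array<(msg: Signal) => void> = [];

  subscribe(listener: (msg: Signal) => void): void {
    this.listeners.push(listener);
  }

  send(msg: Signal): void {
    for (const listener of this.listeners) listener(msg);
  }
}

class Peer {
  readonly id: string;
  readonly received: Signal[] = [];
  private room: BroadcastRoom;

  constructor(id: string, room: BroadcastRoom) {
    this.id = id;
    this.room = room;
    room.subscribe((msg) => this.onMessage(msg));
  }

  call(other: string): void {
    // In a browser the SDP would come from RTCPeerConnection.createOffer().
    this.room.send({ type: "offer", from: this.id, to: other, sdp: `offer-from-${this.id}` });
  }

  private onMessage(msg: Signal): void {
    // The room broadcasts everything, so skip messages addressed to another peer.
    if (msg.to !== this.id) return;
    this.received.push(msg);
    if (msg.type === "offer") {
      // A real peer would call setRemoteDescription() and createAnswer() here.
      this.room.send({ type: "answer", from: this.id, to: msg.from, sdp: `answer-from-${this.id}` });
    }
  }
}

const room = new BroadcastRoom();
const alice = new Peer("alice", room);
const bob = new Peer("bob", room);
alice.call("bob");
console.log(bob.received.map((m) => m.type).join(","));   // offer
console.log(alice.received.map((m) => m.type).join(",")); // answer
```

  Once the offer/answer (and ICE candidates) have been exchanged this way, the actual game traffic flows over the WebRTC data channel rather than the WebSocket.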

matt-attack a year ago

> DriftDB is a real-time data backend that runs on the edge.

What does "on the edge" mean in this context? Can I just run the server part on my own infrastructure? What if I have multiple pods for redundancy, and client web connections might get connected randomly to any of those pods? How would the pods all share state between each other?

paulgb a year ago

> What does "on the edge" mean in this context?
DriftDB has a concept of “rooms”, which are essentially broadcast channels. By “on the edge”, what I mean is that the authoritative server for each room can be geographically located near the participants in that room. In practice, today that means that it can be compiled to WebAssembly and run as a Cloudflare Worker.
> Can I just run the server part on my own infrastructure?
Kinda. It includes a server that runs locally, but it’s only useful as a development server at this point. Your question about multiple pods is exactly the reason -- unless you have a routing layer that is aware of DriftDB’s “rooms”, it won’t work if you scale it up. We also make https://plane.dev which provides the routing layer, but it might be overkill for a DriftDB use case.
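  One generic way to make a routing layer "room-aware" is to route on the room ID so every client of a room lands on the same pod. The sketch below uses rendezvous (highest-random-weight) hashing; it is not how Plane does routing, and `podForRoom`, `fnv1a`, and the pod names are hypothetical. Any router instance computes the same answer without shared state.

```typescript
// FNV-1a 32-bit hash of a string (fine for illustration; not cryptographic).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Rendezvous hashing: score every pod against the room ID and pick the highest,
// so every router sends all of a room's clients to one authoritative pod.
function podForRoom(roomId: string, pods: string[]): string {
  if (pods.length === 0) throw new Error("no pods available");
  let best = pods[0];
  let bestScore = -1;
  for (const pod of pods) {
    const score = fnv1a(`${roomId}:${pod}`);
    if (score > bestScore) {
      best = pod;
      bestScore = score;
    }
  }
  return best;
}

const pods = ["pod-a", "pod-b", "pod-c"];
const first = podForRoom("room-123", pods);
const second = podForRoom("room-123", [...pods].reverse());
console.log(first === second); // true: the order of the pod list doesn't matter
```

  A useful property of this choice: removing a pod only moves the rooms that were on that pod. The sketch still leaves out what a real deployment needs, such as health checks and handing off a room's state when its pod disappears.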

bufferoverflow a year ago

SurrealDB was supposed to be a websocket real time DB, but it seems they never finished that websocket part.

Glad there's an alternative.

https://surrealdb.com/docs/integration/websockets

nextaccountic a year ago

I see you also created Plane https://plane.dev/ - what's the relationship between driftdb and plane? Can they be used together? Does one depend on another?

paulgb a year ago

The idea for DriftDB came from seeing people try to use Plane for things that it wasn’t intended for (e.g. simple WebSocket backends). DriftDB doesn’t depend on Plane if it’s hosted on Cloudflare. For deploying on your own infrastructure, Plane could be useful, but there are a few pieces missing to do the integration.

Aldipower a year ago

How can something be real-time if there is a WebSocket connection in between? How do you ensure real time? In real-time applications, response times must be guaranteed. That seems impossible to me with WebSocket connections.

paulgb a year ago

I mean real-time apps in the colloquial sense: applications where two people see the same state nearly instantly. In the strict computational sense, it’s true that you can’t guarantee an upper bound for delivery of a message. This isn’t just a limitation of WebSockets; it’s a limitation of TCP/IP, which doesn’t provide a way to reserve bandwidth along a path between hosts (IIRC).

quickthrower2 a year ago

It would be fascinating if you could build Jitsi-like video on top of this.

I think DB in the name is a little misleading due to there being no persistence (I assume?) but that is a small nitpick!

paulgb a year ago

> I think DB in the name is a little misleading due to there being no persistence (I assume?) but that is a small nitpick!
Yes, I feel a bit guilty about that part. When I started it the design looked more like a traditional key/value or durable stream database with real-time capabilities, but over time I realized that the use cases I had in mind usually didn’t actually need long-term persistence. The DB stuck, partly because it turns out if you add “db” as a suffix it’s a lot easier to find available package names and domains :). If it’s any consolation, I still do intend to support persistence eventually.
- quickthrower2 a year ago
  
  Thanks for the reply. Sounds like a neat bit of infrastructure. Well done for getting it done! I almost want to create a project as an excuse to use it ha ha! Also naming stuff is hard of course.

ocimbote a year ago

Plane.dev is mentioned.

Does anyone have experience with it? It seems quite interesting, but I'd like more opinions on what they call "backend sessions"...

avinassh a year ago

This is really cool! But how are conflicts handled?

paulgb a year ago

As far as the server itself is concerned, it’s just a broadcast channel with replay and compaction capabilities, so it’s not directly concerned with conflict resolution. You could use it as a broadcast channel for CRDTs if you wanted to.
The useSharedState react hook is more opinionated, it uses last-write-wins semantics in the case of a conflict. The useSharedReducer hook’s behavior on conflict is up to the reducer provided.
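  A minimal sketch of the difference between the two hooks' conflict behavior, assuming every client replays the same server-ordered stream of messages (the `seq` field stands in for that order). `Ordered`, `lastWriteWins`, and `replayReducer` are hypothetical helpers, not the DriftDB React bindings.

```typescript
// Assumption: the server gives every message in a room a position in one total order.
interface Ordered<T> {
  seq: number;
  clientId: string;
  payload: T;
}

// Last-write-wins: the state is the payload with the highest sequence number.
function lastWriteWins<T>(messages: Ordered<T>[], initial: T): T {
  let best: Ordered<T> | undefined;
  for (const m of messages) {
    if (best === undefined || m.seq > best.seq) best = m;
  }
  return best === undefined ? initial : best.payload;
}

// Shared reducer: every client folds the same ordered actions through the same
// reducer, so how a "conflict" resolves is decided entirely by the reducer.
function replayReducer<S, A>(messages: Ordered<A>[], reducer: (s: S, a: A) => S, initial: S): S {
  return [...messages]
    .sort((a, b) => a.seq - b.seq)
    .reduce((state, m) => reducer(state, m.payload), initial);
}

const writes: Ordered<string>[] = [
  { seq: 2, clientId: "bob", payload: "blue" },
  { seq: 1, clientId: "alice", payload: "red" },
];
console.log(lastWriteWins(writes, "none")); // blue

type Action = { type: "add"; amount: number };
const actions: Ordered<Action>[] = [
  { seq: 1, clientId: "alice", payload: { type: "add", amount: 1 } },
  { seq: 2, clientId: "bob", payload: { type: "add", amount: 1 } },
];
console.log(replayReducer(actions, (n: number, a: Action) => n + a.amount, 0)); // 2
```

  The design consequence: if two clients each read 0 and write 1 through last-write-wins state, the result is 1; expressing the same update as an action through a reducer keeps both increments and ends at 2.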

stmblast a year ago

This is really cool!

Looking forward to seeing how this progresses.

arthurcolle a year ago

This is kinda cool. What needs to happen for persistent state? That would be really nice.

jaime-ez a year ago

For those interested in open source WebSocket servers, check out deepstream.io: data persistence, subscriptions, RPC calls, authorization, permissions, custom connectors... basically everything you need to develop an app.

atentaten a year ago

Can this be used in the Dart/Flutter world?

paulgb a year ago

The server itself speaks a very simple WebSocket protocol[1], so it could be used by anything that can speak WebSocket.
The JS/React bindings that implement the actual data sync patterns (shared state, shared reducers, presence) haven’t been ported to Dart (yet?) though.
[1] https://driftdb.com/docs/api
- rgbrgb a year ago
  
  > presence
  Congrats on the launch! You have a pointer to docs about presence? Use-case is an ephemeral chatroom where I want to show who's online.
  
  paulgb a year ago
  
  Good catch, this should be in the react docs but it's missing. Until then, it's pretty simple. You call `const presence = usePresence({})` and pass in any data you want, and the `presence` value that gets returned is an object that maps client IDs (a unique string for each client) to the values that they passed in to `usePresence`.
  Here's an example from the voxel demo: https://github.com/drifting-in-space/driftdb/blob/af64f62b29...
  And from the canvas demo: https://github.com/drifting-in-space/driftdb/blob/af64f62b29...
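  For the "who's online" use case, here is one hypothetical way a presence map of that shape (client ID to the value each client passed in) could be maintained: each client periodically re-sends its value, and entries not refreshed within a timeout are treated as offline. The heartbeat/timeout scheme, `PresenceTracker`, and its methods are assumptions for illustration, not DriftDB's implementation of `usePresence`.

```typescript
// Hypothetical presence tracker; not DriftDB's usePresence implementation.
class PresenceTracker<T> {
  private entries: Record<string, { value: T; lastSeen: number }> = {};
  private timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Record a heartbeat (or an updated value) from a client.
  update(clientId: string, value: T, now: number): void {
    this.entries[clientId] = { value, lastSeen: now };
  }

  // Return an object mapping client IDs to their values, dropping stale clients.
  snapshot(now: number): Record<string, T> {
    const result: Record<string, T> = {};
    for (const id of Object.keys(this.entries)) {
      const entry = this.entries[id];
      if (now - entry.lastSeen <= this.timeoutMs) {
        result[id] = entry.value;
      } else {
        delete this.entries[id];
      }
    }
    return result;
  }
}

const presence = new PresenceTracker<{ displayName: string }>(5000);
presence.update("c1", { displayName: "Ada" }, 0);
presence.update("c2", { displayName: "Grace" }, 1000);
presence.update("c1", { displayName: "Ada" }, 4000); // Ada's client sends another heartbeat
const online = presence.snapshot(7000);
console.log(Object.keys(online).map((id) => online[id].displayName).join(", ")); // Ada
```

  At time 7000, Grace's client was last seen 6000 ms earlier, which exceeds the 5000 ms timeout, so only Ada is listed.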