points by antirez 7 years ago

What we are doing instead: we just started a Redis Proxy project (designed by Fabio and me, coded by Fabio Nicotra; we started yesterday). We will start by abstracting away a Redis Cluster so that you can talk to it as if it were a single instance, but this is not enough. We are also planning for:

1. Tooling to make it simpler to start a Redis Cluster on a single host if all you want is to leverage all the cores.

2. Smarter AOF rewriting, so that instances on the same host make better use of the disk.

3. A caching mode for Redis Cluster. This is useful for deployments where most accesses are idempotent and you have, say, a cluster of 10 physical hosts each running 32 Redis instances. When a host is no longer reachable we don't want a failover in that case, just reassignment of its hash slots to the other masters. Think of consistent hashing, but on the server side.
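To make point 3 concrete, here is a minimal sketch of server-side slot reassignment without failover (plain Python; the node names and function names are illustrative, not an actual Redis API):

```python
# Hypothetical sketch of the "caching mode" idea: when a master becomes
# unreachable, its hash slots are simply handed to the surviving masters
# instead of triggering a failover. Cached data on those slots is lost,
# which is acceptable because accesses are assumed to be idempotent
# (misses just repopulate the cache).

SLOTS = 16384  # the Redis Cluster key space

def assign_slots(masters):
    """Evenly spread all hash slots across the given masters."""
    return {slot: masters[slot % len(masters)] for slot in range(SLOTS)}

def reassign_on_failure(slot_map, dead_master, masters):
    """Reassign the dead master's slots to the survivors, round-robin."""
    survivors = [m for m in masters if m != dead_master]
    moved = [s for s, owner in slot_map.items() if owner == dead_master]
    for i, slot in enumerate(moved):
        slot_map[slot] = survivors[i % len(survivors)]
    return slot_map

masters = ["node-a", "node-b", "node-c"]
slot_map = assign_slots(masters)
slot_map = reassign_on_failure(slot_map, "node-b", masters)
assert "node-b" not in slot_map.values()
```

The point of the sketch is that no replica promotion happens at all: ownership of the key space simply changes, exactly as it would with client-side consistent hashing, but decided by the servers.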

Two years ago I had a private branch, written in a few days, that threaded Redis with N event loops on N threads. After realizing that it worked and scaled, I considered how many bugs such a Redis would have, how much harder it would be to add features, and how incompatible a single huge instance is with Redis's persistence model. At that point, and after consulting with other engineers, I decided that for a few years our strategy would be Redis Cluster. Maybe I'll change my mind in the future? Maybe, but first let's explore what can be done with a shared-nothing architecture that keeps the promise of continuing to deliver software that hardly ever crashes.

EDIT: Obviously the proxy itself is multithreaded :-D

EDIT2: Btw, jdsully already sent a PR to the original Redis project, so I guess a smart developer getting exposed to the Redis core is a win anyway!

jdsully 7 years ago

The problem with clustering remains lower queries per GB, since instances can't share data. Redis itself runs in RAM, so storage is at a premium.

One of my main reasons for doing multithreading and FLASH in the first place was to make Redis work well with much larger value sizes.

I really think we have different use cases in mind.

  • antirez 7 years ago

    You mean the use case where you have a single very "hot" key? Yes, that's a limitation, but in that case replicas can scale the read load very well. Indeed, this is a special case where threads are a better model: yet it's a very niche use case that ends up being inherently unscalable anyway, because there is a single object to manipulate, so accesses end up being serialized one way or the other.

    I understand the point about Redis on Flash. Redis Labs is also doing a lot of things with this model, and I believe it is super useful for some folks. Yet I believe what makes Redis "Redis" is that there are no tricks: it always runs the same, whatever the access pattern is. I want to stress this part over the ability to work with bigger data sets. Moreover, I believe persistent RAM in huge amounts is on the horizon now.

    • derefr 7 years ago

      > yet this is a very niche use case

      I've always thought that one of the sensible basic data structures Redis could support operations on is an (int -> int) digraph.

      There are a lot of things you can build on top of "digraphs as a toolkit object". Just look at the lengths people go to just to get PostGIS + pg_routing installed in Postgres, only to run some digraph queries against their (non-geographic) RDBMS data.

      IMHO digraphs are very similar to Redis Streams in that sense: they might seem like a single high-level "use case" at first glance, but they're actually a fundamental primitive for many use cases.

      However, unlike a hashmap, list, set, etc., a digraph is necessarily (if you want graphwise computations to run efficiently) one large in-memory object with lots of things holding mutual references to one another, which can't just be updated concurrently in isolation. But people would likely demand that multiple writers be able to write to disjoint "areas" of the graph in parallel, rather than having to linearize those writes just in case they end up touching overlapping parts of the graph data.
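      A possible shape for such a primitive, purely as a sketch: per-node adjacency sets, which is exactly what makes disjoint "areas" of the graph independently updatable. Plain Python stands in here for what would be server-side keys; everything is illustrative, not an actual Redis API.

```python
from collections import defaultdict, deque

class Digraph:
    """An (int -> int) digraph as per-node adjacency sets.

    On a server, each adjacency set could live under its own key
    (conceptually one set per source node), so writers touching
    disjoint nodes never contend with each other."""

    def __init__(self):
        self.out = defaultdict(set)  # node -> set of successor nodes

    def add_edge(self, src, dst):
        self.out[src].add(dst)

    def reachable(self, start):
        """Breadth-first reachability, a typical 'graphwise' computation
        that wants the whole structure resident in one place."""
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in self.out[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        return seen

g = Digraph()
g.add_edge(1, 2); g.add_edge(2, 3); g.add_edge(4, 5)
assert g.reachable(1) == {1, 2, 3}
```

      The tension described above is visible even in this toy: `add_edge` calls on different source nodes touch disjoint sets and could run in parallel, while `reachable` wants a consistent view of all of them at once.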

      If I understand Redis Streams, it was created because there's no real need for things like Kafka to do their own data-structure persistence. Data-structure persistence is not the "comparative advantage" of a thing like Kafka; network message-passing with queuing semantics is. High-level infrastructure pieces like Kafka should rely on lower-level infrastructure pieces like Redis for their data persistence, and then they can just do what they do best. Is that accurate?

      If so, then I would argue the same thing applies to graph databases: persisting and operating on data-structures is not really their job; it's just something they do because there isn't anything to delegate that work to. Redis could be such a thing. But not if it's single-threaded.

      This is, just off the top of my head, a use case (or set of use cases?) that needs threading (or at least "optimistically threaded, with fallback to linearization") for writes. And it's not really all that niche, is it? No more than message queues are niche.

DrJosiah 7 years ago

I think this is an interesting idea.

What I did a year ago was make Redis Cluster able to do Lua scripting transactions on arbitrary keys across the cluster, with rollbacks, like my 2015 patch.

https://news.ycombinator.com/item?id=16294627

Instead of reinventing a bad wheel, you could have something better.

  • nilkn 7 years ago

    This sounds really interesting. Is this work available publicly? Could you comment on how you did this and what the pros and cons were (surely a compromise had to be made somewhere)?

    • DrJosiah 7 years ago

      It's not available for public consumption. After finishing it, I realized two things: SSL/TLS is necessary for nontrivial cluster solutions, and very few people use Redis Cluster in practice (the multi-key issues being the primary reason).

      In 2015 I released a patch to allow for transactions with rollbacks in Lua scripts; it used DUMP/RESTORE to snapshot keys. That idea was extended to become the cluster solution I released last year. The primary compromise is that, as implemented, it more or less relies on the equivalent of a temporary key migration. You want a transaction on server X to involve keys from servers X, Y, Z? You migrate the keys to X explicitly, run your transaction, then migrate them back.
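      A toy model of that migrate-run-migrate-back flow, with rollback via snapshots (plain Python dicts stand in for Redis instances; the real patch uses DUMP/RESTORE and MIGRATE, and every name here is illustrative):

```python
import copy

def cross_node_transaction(nodes, home, keys_by_node, txn):
    """Pull foreign keys onto `home`, run `txn` there, push them back.
    On error, restore the pre-transaction snapshots (rollback)."""
    snapshots = {}
    # 1. Snapshot every involved key, then migrate foreign keys to home.
    for node, keys in keys_by_node.items():
        for k in keys:
            snapshots[(node, k)] = copy.deepcopy(nodes[node].get(k))
            if node != home:
                nodes[home][k] = nodes[node].pop(k)
    try:
        txn(nodes[home])          # 2. Run the transaction locally.
    except Exception:
        # 3a. Rollback: put every key back on its node, original value.
        for (node, k), val in snapshots.items():
            nodes[home].pop(k, None)
            if val is not None:
                nodes[node][k] = val
        raise
    # 3b. Commit: migrate foreign keys back to their owning nodes.
    for node, keys in keys_by_node.items():
        if node != home:
            for k in keys:
                nodes[node][k] = nodes[home].pop(k)

nodes = {"X": {"a": 1}, "Y": {"b": 2}}
cross_node_transaction(nodes, "X", {"X": ["a"], "Y": ["b"]},
                       lambda db: db.update(a=db["a"] + db["b"]))
assert nodes == {"X": {"a": 3}, "Y": {"b": 2}}
```

      The big-key caveat below falls out directly: both the snapshot (`deepcopy` here, DUMP in reality) and the two migrations are proportional to the size of every key involved.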

      Caveat: big keys are slow to snapshot, migrate, prepare for rollback, etc.

      I've got incoming changes to my own fork that allow for fast state transfer for large structures (useful on multiple machines, or just heavily threaded + transactions), more or less eliminating the caveats. But that's also not yet publicly available, and I've spent 5-6 of the last 7 weeks sick (maybe 4 full workdays in that period), so I don't have an ETA on any of it.

e12e 7 years ago

Any reason you need more than HAProxy? (Assuming you get the part about spinning up multiple instances on a server for multicore support out of the way.)

  • antirez 7 years ago

    Redis Cluster can be abstracted away only by understanding the Redis protocol and how clients are supposed to communicate with the different Redis instances. It's not just a matter of sending TCP data to multiple endpoints.

rainhacker 7 years ago

Is abstracting a master-slave-sentinel setup also in scope for the proxy, or only Redis Cluster? For use cases where replicas are used to scale reads, a proxy shipped with Redis could be useful.

xfalcox 7 years ago

The proxy will still not allow running a Lua script that spans keys living on different hosts, right?

  • antirez 7 years ago

    Right, the usual Redis Cluster limitations apply initially, except for MGET, which will probably get support via multi-fetch.
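    For what it's worth, the multi-fetch idea can be sketched like this (plain Python; the slot hashing is a deliberate stand-in, since real Redis Cluster uses CRC16(key) mod 16384 plus the cluster's slot map, and all names are illustrative):

```python
def owner_of(key, nodes):
    # Stand-in for the hash-slot lookup; real Redis Cluster uses
    # CRC16(key) % 16384 and the slot -> node map.
    return nodes[sum(map(ord, key)) % len(nodes)]

def proxy_mget(keys, nodes, backend_mget):
    """Split one MGET by owning node, fan out, reassemble in key order."""
    by_node = {}
    for k in keys:
        by_node.setdefault(owner_of(k, nodes), []).append(k)
    results = {}
    for node, node_keys in by_node.items():
        # One per-node MGET; a real proxy could issue these in parallel.
        results.update(zip(node_keys, backend_mget(node, node_keys)))
    return [results[k] for k in keys]  # restore the caller's key order

# Toy backend: three "instances" holding disjoint key ranges.
nodes = ["n0", "n1", "n2"]
store = {n: {} for n in nodes}
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    store[owner_of(k, nodes)][k] = v
assert proxy_mget(["a", "b", "c"], nodes,
                  lambda node, keys: [store[node][k] for k in keys]) == [1, 2, 3]
```

    The client sees one MGET with one reply, even though the keys live on different instances; that is exactly the single-instance illusion the proxy is meant to provide.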

    • tyingq 7 years ago

      These honest, technically correct answers, where you acknowledge limitations but highlight the pluses, are terrific.

      A really nice contrast to the typical hand-waving, defensiveness, etc, I get from a lot of vendors, product owners, etc.

ohnoesjmr 7 years ago

Is it the same proxy that is used in Redis Enterprise today?

  • antirez 7 years ago

    Nope, a different project created from scratch as part of the OSS project. It is sponsored by Redis Labs, like all the rest of the development Fabio and I are doing.