btilly 2 years ago

I think a few more concrete use cases would help.

First, a key limitation that every architect should pay attention to: Redis reaches the limits of what you can do in well-written single-threaded C. One of those limits is that you really, really, *really* don't want to go outside of RAM. Think about what is stored, and be sure not to waste space. (It is surprisingly easy to leak memory.)

Second, another use case: replication in Redis is cheap. If your data is small and latency is a concern (e.g. this happened to me with an ad server), then you can locate read-only Redis replicas everywhere. The speed of querying off of your local machine is not to be underestimated.
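
The setup is trivial; a rough sketch, with a placeholder hostname, run on each replica:

    > REPLICAOF master.internal 6379
    > CONFIG SET replica-read-only yes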

And third, it is worth spending time mastering Redis data structures. For example, suppose you have a dynamic leaderboard for an active game. A Redis sorted set will happily let you instantly display any page of that leaderboard, live, with 10 million players and tens of thousands of updates per second. There are a lot of features like that which will be just perfect for the right scenario.
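
Roughly, with made-up key and member names, the whole leaderboard interaction is just:

    > ZINCRBY leaderboard 50 player:42          apply a score update, O(log N)
    > ZREVRANGE leaderboard 0 9 WITHSCORES      first page, highest scores first
    > ZREVRANK leaderboard player:42            the player's live rank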

  • koolba 2 years ago

    > One of those limits is that you really, really, really don't want to go outside of RAM. Think about what is stored, and be sure not to waste space. (It is surprisingly easy to leak memory.)

    You can have massive amounts of RAM these days. You're more likely to hit big-O limits from bad architectural decisions than to run out of memory. If you do get to that point, you likely have enough value in your usage to justify scaling out further and sharding.

    > And third, it is worth spending time mastering Redis data structures.

    Bingo. The true secret to properly using Redis: understanding the big-O complexity of each operation (…and ensuring that none of your interactions are more than logarithmic).
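
    For example (key names made up, complexities from the command docs), prefer the cursor-based commands over the O(N) ones, which stall the single thread:

        > KEYS user:*                      O(N) over the whole keyspace, blocks the server
        > SCAN 0 MATCH user:* COUNT 100    bounded work per call
        > SMEMBERS huge-set                O(N) in the set size
        > SSCAN huge-set 0 COUNT 100       the incremental alternative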

    • btilly 2 years ago

      > You can have massive amounts of RAM these days. You're more likely to hit big-O limits from bad architectural decisions than to run out of memory. If you do get to that point, you likely have enough value in your usage to justify scaling out further and sharding.

      Absolute disagreement.

      It is very easy to accidentally leak a few hundred MB per week in a busy Redis system. The code will look and work fine...at first. It is correspondingly hard to track down and clean up the leak a few months later. (Particularly if there are multiple such leaks to track down.) Yes, you can go for years just buying larger and larger EC2 instances. But that will also come with a shocking price tag.

      I know of a number of organizations that this happened to. And pretty much every bad Redis story I hear about had this as a root cause. That is why I brought it up as an important consideration.

      • jasonwatkinspdx 2 years ago

        Yes, this matches my experience.

        Redis excels as a memcached alternative with some useful extra operations. Where people get into trouble with Redis is treating it as a persistent data store: despite its ability to replicate and persist, Redis has constraints you need to work within. At best, think of Redis as something that can hold a materialized view, but one that can become corrupted at any random time, so you'll need the ability to rematerialize it from something else. And you absolutely have to be conscious of how close you are to RAM limits.

      • renonce 2 years ago

        Redis is production-ready, and it has a lot of features to help you track down problems with either memory or CPU usage. For example, `redis-cli --bigkeys` will help you find the very large keys. For smaller keys that occur too often, sampling a few hundred keys should be sufficient to find which types of keys are taking more space than necessary.
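
        A few of the built-ins worth knowing here (the key name is a placeholder):

            $ redis-cli --bigkeys       sample the keyspace for the biggest key of each type
            > MEMORY USAGE user:1000    bytes held by a single key (Redis 4.0+)
            > MEMORY DOCTOR             quick sanity report on memory problems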

        Once you get the Redis database designed well, there are a lot of things you can do before hitting the limit where you can't install any more RAM in a machine. For example, there are no more than a billion .com domains out there. Say a single record takes 100 bytes on average, consisting of the domain name and a glue record pointing to the IP of its authoritative DNS server. Then it takes just 100GB of memory (10^9 records × 100 bytes) to store enough information to handle all queries to .com domains in the world. It's not so hard to obtain a machine with 768GB of memory these days, and 2TB machines are not uncommon.
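
        You can sanity-check that per-record estimate directly; with a hypothetical record:

            > SET dom:example.com "ns1.example.net 192.0.2.53"
            > MEMORY USAGE dom:example.com

        (MEMORY USAGE includes per-key overhead, so the real cost per record is somewhat more than the raw string length.)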

        And if you worry about the price tag - don't use EC2. You can rent a 1TB-RAM dedicated server at https://www.hetzner.com/dedicated-rootserver/ax161/configura... for $600 per month. At Scaleway you can rent one for $1000 per month: https://www.scaleway.com/en/pricing/?tags=baremetal,availabl.... AWS is notoriously hard to make cost-effective.

      • jbboehr 2 years ago

        You can also "leak" rows in a traditional RDBMS or even a filesystem. Why is this particularly notable for Redis?

        • sidlls 2 years ago

          Redis starts to have issues at high scale, even on sophisticated hardware, that can be quite difficult to debug without a lot of additional effort and storage. It’s not just memory, but odd behavior (e.g. randomly dropped connections) with a lot of connected clients, or hot keys/nodes in a cluster configuration, etc.

          These issues can exist in any system, but in my experience it's especially tough (relatively) to identify and diagnose them with Redis. Once you add Lua script usage, it can get even worse.

        • btilly 2 years ago

          A traditional RDBMS or filesystem is designed for high throughput and concurrency, even if some tasks are blocked on data. Additionally, both have options to partition steadily growing data, with old partitions moved off to tape backup, if needed, while the server continues running.

          Redis is a single-threaded program acting against RAM whose philosophy is that it does things fast, then moves on to the next job. If it needs to access memory that got paged to disk, the whole server stops and waits for it. Nobody can do anything.

          Because Redis doesn't have to deal with locking and concurrency, it can run much faster on the same resources. But when concurrency is required, it is stuck, because it simply doesn't have it.

    • itake 2 years ago

      > You can have massive amounts of RAM these days.

      True, but I am finding that balancing CPU and RAM can be tricky. Slapping 128GB of RAM on a 1-core machine means you quickly hit CPU limitations.

      • tomnipotent 2 years ago

        Redis is single-threaded and will have no problem saturating a 10G NIC with a single socket.

        • itake 2 years ago

          My concern is how fast it takes a CPU to scan through all of that memory.

          • tomnipotent 2 years ago

            What "scanning"? That's not how memory access works in a K/V store, and Redis does very little work that demands much of the CPU.

            • mnutt 2 years ago

              There are workloads that will saturate a redis instance's CPU: using it as an LRU cache, you will eventually hit the configured memory limits, and adding new keys will require finding old keys to delete. Eventually it may also require redis to do memory defragmentation, which can be fairly intensive.
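
              For reference, the relevant redis.conf knobs for that kind of setup (the size is a placeholder):

                  maxmemory 4gb
                  maxmemory-policy allkeys-lru
                  activedefrag yes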

              • tomnipotent 2 years ago

                > There are workloads that will saturate a redis instance's CPU

                I might imagine this scenario if you're excessively using SMEMBERS and a few other slow ops, but I have yet to see CPU issues outside of bad EVALs.

                > require finding old keys to delete

                LRU/LFU eviction is not particularly CPU intensive.

                > redis to do memory defragmentation which can be fairly intensive

                Active defrag has relatively negligible overhead, and assuming jemalloc even more so.

                • mnutt 2 years ago

                  Nothing but lots of small (~100-byte) pipelined SETs and a small number of GETs here and there. Only 10MB/s, but at 100k SETs/sec redis's CPU core sits at 60-70%. Active defrag can easily send it into a death spiral.

    • googletron 2 years ago

      > understanding the big-O complexity of each operation (…and ensuring that none of your interactions are more than logarithmic).

      This is a good idea, maybe a prompt for another post.

  • LewisVerstappen 2 years ago

    > Replication in Redis is cheap. If your data is small and latency is a concern (e.g. this happened to me with an ad server), then you can locate read-only Redis replicas everywhere. The speed of querying off of your local machine is not to be underestimated.

    Do you face any consistency issues with doing this?

    • btilly 2 years ago

      No. Replication time was measured in hundredths of a second, and Redis operations are atomic. So all queries got a consistent view of the data, and the lag to update was very reasonable.

      • YetAnotherNick 2 years ago

        It depends on the definition of consistency; it is not strongly consistent in theoretical terms[0]. But the ordering of updates is guaranteed to be the same, so if the master is internally consistent, so is the replica. And that property is enough for almost all use cases, except maybe transactions.

        [0]: https://redis.io/docs/manual/scaling/#:~:text=Redis%20Cluste....

        • adra 2 years ago

          During partitions you may as well throw the playbook away. You could have minority writes on both sides of the cluster and a big nada to reconcile the two when they're mended. Redis is a great system for what it's built for and for the trade-offs that it makes to keep itself fast and lean. Redis is not CP, and it will probably never care to support that. If data resiliency and correctness are important to you, Redis alone isn't sufficient. Several years ago we tried sentinels, mostly to avoid large costly rebuilds when an instance went down, and though it usually worked just fine, we certainly had single network disruptions large enough to throw off the cluster to the point of requiring a manual rebuild.

      • anonymousDan 2 years ago

        So in other words, potentially yes since there is some lag :)?

        • btilly 2 years ago

          For that application, there really wasn't. The results of the read were not used for writes, and the latency from when information was published to when it was available was on par with the time a request to the master would have taken. The time from data published to data available was shorter than the time to switch tabs in a browser and manually check.

          But your requirements will depend on the application. Financial transactions need explicit locking logic and atomic operations, such as is provided by SELECT ... FOR UPDATE in SQL. So another application could have more demanding requirements. Which is why, in addition to answering whether I encountered problems, I gave the actual performance characteristics, so that anyone planning an application can judge whether this is a good enough solution for them.
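
          (For completeness: the closest Redis analogue is optimistic locking with WATCH; the key name below is made up. EXEC returns nil if the watched key changed since WATCH, and the application has to retry:

              > WATCH balance:42
              > GET balance:42
              > MULTI
              > DECRBY balance:42 100
              > EXEC

          That works, but it puts the retry burden on the application.)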

dtertman 2 years ago

At desktop resolution, the floating table of contents menu blocks out two of the (excellent) illustrations (second and second-last). Deleting aside.toc was very helpful.

  • dsmmcken 2 years ago

    Yes, I would suggest increasing the z-index on images so they pass above the TOC. Adding a large drop shadow to the images, the same color as the background, would make it look like each image fades out as it passes by. That's what I did for our blog, which has a similar floating TOC plus images that escape the text width.

googletron 2 years ago

My personal nightmare happened: I accidentally published a rough draft! It has since been updated! Apologies!

  • lalwanivikas 2 years ago

    I haven't read it yet, but the presentation is really beautiful! Is it possible to make it printer-friendly by any chance? The default browser PDF is a mess. Long-form content like this is much easier to read printed.

xnorswap 2 years ago

> Send 1KB over a 1GBps network

This is said to have a 10μs latency in the chart. But I'm fairly sure that is just bandwidth arithmetic: 1KB is roughly 8,000 bits, and 8,000 bits / 1Gbps ≈ 8μs, which rounds to the 10μs shown.

10μs at the speed of light is about 3km, so at most a 1.5km round trip.

For a chart labelled latency, I'm surprised to see bandwidth calculations included. Any network hop would actually have far greater latency, if nothing else because communication typically involves more than a single round-trip for acknowledgement, etc.

It might be worth making it clear some of the numbers are about bandwidth not latency.

  • foota 2 years ago

    Distance is of course a factor, but at a fixed distance size matters a lot, and most applications operate at a more or less fixed distance.

rfrey 2 years ago

A question so noob I'm almost shy to ask it:

The simplest scenario in the article is a single Redis instance residing on the same machine as the application. What's the benefit to this versus just storing data directly within the application?

  • Groxx 2 years ago

    Storing it in a different process at the very least lets you restart, deploy changes to, or crash your application without losing that data. Otherwise you're building your own replication and in-place upgrades, or you're just screwed if it crashes.

  • halotrope 2 years ago

    Sometimes your application has multiple instances, even on the same machine. Scripting languages like Node or Python don't share memory across processes. With Redis they can share state in a high-performance manner.

    • intelVISA 2 years ago

      Database-as-IPC makes me feel uneasy... even if it's kinda OK with Redis.

  • piaste 2 years ago

    Your application and runtime are probably tuned to act as servers, with short-lived requests and little or no persistent state, and they may not play well with keeping a bunch of persistent data around forever.

    I personally first reached for Redis when I needed to asynchronously process a bunch of JSON uploaded by clients via POST. I initially just stuck them in a ConcurrentQueue in memory, but no matter how much I fiddled with HostedServices and BackgroundWorkers and whatever the MS documentation recommended, the ASP.NET Core app would occasionally 'lose' that queue before it could be consumed (or the consuming loop would get stuck, with the same result).

    You are also probably running your app in a pretty high-level language, with bytecode and reflection and all that nice stuff - if not an interpreted language - while Redis is raw C code and will outperform your homebrew doubly-linked list or hash set.

  • louissm_it 2 years ago

    Storing the data directly inside the application still means you need to store it somewhere, likely a SQL database (such as PostgreSQL). These databases are insanely well engineered and very, very fast, but compared to a key-value store such as Redis or Memcached they are comparatively slow and resource-hungry (because they are optimized for different things).

    So if you can fetch some cached data from a Redis key, even if on the same machine, it will cost you significantly less than querying a relational database.

  • halukakin 2 years ago

    Not all applications can store data out of the box. For instance, some PHP setups have embedded caches; others have no cache by default, and you would need to install caching software (for instance APCu). Also, Redis has many different data types; coding something similar to its "hash" type yourself is not trivial.
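
    For instance, a hash gives you field-level reads and atomic increments within a single key (names made up):

        > HSET session:abc user 42 cart 3
        > HINCRBY session:abc cart 1
        > HGETALL session:abc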

  • radicalriddler 2 years ago

    Redis persists to disk (well, it's optional); if you restart your server, I'd assume it'd be able to restore the disk data into memory, versus your application's memory, which would just be lost.

    I'm not a Redis user, but that's based on what I've read

  • ok123456 2 years ago

    Short lived processes/workers.

_gmnw 2 years ago

Really love this style of writing. Pairing the diagrams/illustrations with the easy to grok copy is really helpful for folks like myself who have been mainly focused on the front-end.

What tool do you use for your diagramming, is it all hand-drawn?

  • dsmmcken 2 years ago

    The font for the handwriting is Skippy Sharp, in case anyone else was wondering.

  • googletron 2 years ago

    It's hand-drawn, with some fonts for the titles.

    • stjohnswarts 2 years ago

      This is what I do for my presentations (on a Wacom, of course). I have gotten grief over it, but I work faster and get less distracted than with the eccentricities of PowerPoint and Figma. My handwriting is abysmal, so I do that part in a "handwriting font" to sort of look hand-drawn. Even if I have to convert them later for some bigwig, at least I have my "rough draft". Plus it all feels a little more human.

googletron 2 years ago

I wrote a little post on how Redis works and its various setups! How does everyone set up Redis? Elasticache is a good answer too :P

  • tpmx 2 years ago

    [Potentially inaccurate content removed by author]

    • xnorswap 2 years ago

      It's a new article so it's relatively easy to explain:

      HN automatically combines submissions so that subsequent submissions count as upvotes for the first submission.

      If a popular source posts a new article, users will "rush" to post it to HN to reap that sweet karma and the winner will "catch" the upvotes of the others.

      • tpmx 2 years ago

        That could explain it. Thanks!

        Is this behavior documented anywhere on news.ycombinator.com?

        • mindcrime 2 years ago

          There isn't a ton of documentation per se about HN behavior. There is:

          https://news.ycombinator.com/newsguidelines.html

          and

          https://news.ycombinator.com/newsfaq.html

          and a handful of posts by dang, sama, pg, etc. over the course of the years. Most of the rest is what long-time users have just figured out through observation. There's a Git repo[1] out there that aggregates a lot of that stuff, but keep in mind that it's technically unofficial. That said, I think most of what's there is widely considered to be correct.

          [1]: https://github.com/minimaxir/hacker-news-undocumented

          • tpmx 2 years ago

            Thanks, that's a good summary of what I've seen referenced throughout my years here.

            I can't find any reference to something like "combine the scores of new submissions of the same URL to the first submission's score" though. I guess that's either new information or incorrect.

            • mindcrime 2 years ago

              > I can't find any reference to something like "combine the scores of new submissions of the same URL to the first submission's score" though. I guess that's either new information or incorrect.

              I think that falls into the "noticed through observation" bucket. I'm relatively sure that it is correct, as I've noticed that behavior myself. But, I have no official standing here and I could be totally wrong. But that sure seems to be what happens in my experience.

              • tpmx 2 years ago

                So you may have misunderstood your observations, just like I maybe did.

                • mindcrime 2 years ago

                  That's absolutely possible. This particular pattern has seemed pretty consistent over the years, but unless somebody from the HN admin crew chimes in, I guess we'll never be 100% sure.

            • manigandham 2 years ago

              You can try this yourself. Go to the ‘new’ page and submit an existing URL. You’ll be redirected to the existing post which will now have at least one more vote.

              • tpmx 2 years ago

                > which will now have at least one more vote

                Not automatically.

                • manigandham 2 years ago

                  What does that even mean? Did you observe something different, or are you just arguing?

    • _gmnw 2 years ago

      The saltiness isn't a good look here, especially since he's not the poster.

      It's the HN algorithm, which is probably due to the fact that other posts from his domain have done relatively well, plus the actual poster here has quite a bit of karma.

  • secondcoming 2 years ago

    We use both MemoryStore and normal instances - the latter for a use case where the data is shardable, so we run a Redis process on each core and the client picks the right one. It saves a lot of money over using MemoryStore.

    It also saves you from Google performing maintenance on the machine and deleting all your Lua scripts.

    KeyDB is becoming increasingly popular though.

    The biggest problem with Redis, at least in C++ land, is the client libraries. hiredis doesn’t support Redis Cluster, and other 3rd party clients that do are of unknown quality.

  • bcjordan 2 years ago

    I've been using UpStash's serverless Redis offering and it's worked super well for my needs. Scales to zero/free which was nice for getting started, and using their http SDK didn't need to worry about concurrent connection limits when calling from simultaneous cloud functions. & not a second of downtime in the few months I've used it so far.

    Want to move more of my app's datastore to Redis now that I've learned more about sorted sets etc.

    • tpmx 2 years ago
      • bcjordan 2 years ago

        To be clear, I am not affiliated with that web service other than as a now happy paying user. I only replied with my experience getting started running Redis on it since that was GP's question and I found it useful while first learning Redis and now in production.

xnorswap 2 years ago

I'm not too familiar with redis and this may well help, so thank you.

I see some data types on the right. It surprises me that Redis doesn't have a numeric data type. I understand that at its heart it is just a key-value store and doesn't ever need to do range-based lookups, but it still surprises me.

One consequence of "everything is a string" I've run into (although it's probably a sign I'm "doing it wrong") is serialisation overhead in the client.

If Redis is expecting strings, then it's left to the client to choose an appropriate serialisation, which can have performance or other pitfalls.

  • voxic11 2 years ago

    Numbers in Redis can be natively represented using the BITFIELD command:

        > BITFIELD player:1:stats SET u32 #0 1000
        1) (integer) 0
        > BITFIELD player:1:stats INCRBY u32 #0 -900
        1) (integer) 100
        > BITFIELD player:1:stats GET u32 #0
        1) (integer) 100

    • xnorswap 2 years ago

      OK, that's helpful, thank you.

      That said, all the keys themselves are still strings, and therefore you can't have a SET of numbers or bitfields.

  • morelisp 2 years ago

    How would a native number type avoid any serialization overhead that encoding e.g. 4-byte big-endian keys yourself must pay?

jrm4 2 years ago

As someone who doesn't code for a living but teaches it to mostly novices, this helps (because before this I had no clue what it was except that it had something to do with databases.) Typically for my courses we just use some flavor of SQL and call it a day (and that kind of spoils us because of how declarative it tends to be) -- roughly, what's the "explain like I'm 10" use case for Redis over something else? From what I'm seeing, it's mostly an "efficiency" thing?

  • tmpz22 2 years ago

    The traditional line of thinking is:

    * you're building a web-ish application and need to store session data

    * you don't want to go through the overhead of building a strongly typed relational table

    * you know minimal operations stuff

    * just use redis, its easy to deploy, easy to code for, and available on all major cloud platforms as a managed service

    ---

    The problem is there are tradeoffs and session storage becomes a fundamental architectural decision once your application matures. So something you added as a once-off so you can get back to feature development is now a foundational pillar.

  • hobs 2 years ago

    I have worked at places where every page load hits the database, and we've scaled ok, mainly because it was b2b stuff.

    However, a simple Redis instance in front of the database, serving as a read cache, changes the rules of the game significantly. Depending on the complexity of your calculation and your end result, subsequent "page loads" (or whatever you are doing) can be tens of thousands of times (or more) as efficient, and if you use an expensive database or a cloud database this can help you a lot.

    Eventually the hard part is that you might have bugs synchronizing the state of Redis and your database; look to existing implementations for your stack instead of reinventing the wheel.
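
    The read-cache flow itself is tiny; a sketch with a made-up key and TTL:

        > GET page:home:rendered                  nil means a miss: query the DB, then...
        > SETEX page:home:rendered 300 "<html>"   ...cache the result with a 5-minute TTL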

  • avmich 2 years ago

    Relational databases are optimized for typical operations over data structured in tables: joins and records. However, sometimes you want something simpler - like a LIFO queue - and wouldn't mind having it faster. Redis gives you this; the variety of data structures it has is much bigger than with relational databases. They (Redis and RDBs) both have their uses, of course. Ideally you would structure your system to use each of them where appropriate according to data requests.
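
    For example, a LIFO queue is just a list pushed and popped from the same end:

        > LPUSH jobs job:3
        > LPOP jobs        returns job:3, the most recently pushed item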

  • TheBlight 2 years ago

    If you're not super concerned about reliability but really need speed - that's when Redis makes the most sense, IMO.

  • notyourday 2 years ago

    Blazingly fast serialized access to the shared data structures over the network by multiple writers and readers.

theden 2 years ago

This is great, the visual explanations work really well

One thing that threw me off is that it says a random SSD read is 150μs, but a 1MB sequential read is 1ms. Shouldn't sequential reads be faster, or are two different read sizes being compared? If so, the ambiguity may lead some people to think random reads are faster.

  • sharikous 2 years ago

    My interpretation is that 150μs is the minimum latency for a read of any size; the seek time is listed for comparison with HDs.

  • darkcha0s 2 years ago

    Well, I'm guessing it's referring to the fact that a 1MB sequential read is essentially a bunch of random reads?

    AFAIK, on SSDs there is no concept/guarantee that blocks are adjacent, so a sequential read is just a bunch of random reads.

    • jasonwatkinspdx 2 years ago

      The way the Flash Translation Layer works is complicated, but long story short, there's still an advantage to sequential reads and writes on SSDs. The difference in latency and throughput isn't as dramatic as with spinning disks, but is still there. Random vs sequential writes have big implications for the long term health and performance of the SSD.

omarshammas 2 years ago

Very informative and love the illustrations.

I'm building a new website and am using sidekiq for background job processing which relies on redis behind the scenes to store all the job data. I configured a high availability redis instance with `maxmemory-policy noeviction` to ensure no data is lost.

The website is still in its infancy so not thinking about scale for the next little while but curious if you have any tips or gotchas to keep an eye out for. Thanks!

  • googletron 2 years ago

    I would ensure that the data size is managed: if you hit the limits, with that policy Redis will stop accepting writes to ensure the data it has isn't lost. I would also turn on some sort of persistence for data recovery in case of catastrophic failure. Early on this is totally fine, and I would set up some monitors on Redis data size relative to memory and try to keep 20% overhead; weird things start to happen when systems are memory-constrained.
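
    Concretely, something like this in redis.conf (the 4gb value is a placeholder; size it to leave that ~20% headroom), while watching used_memory from INFO memory:

        maxmemory 4gb
        maxmemory-policy noeviction
        appendonly yes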

topspin 2 years ago

I am thinking of using Redis as a lightweight queuing mechanism. An event source will, inside a MULTI transaction, write a small amount of metadata as a hash and append to a list. Event sinks will BLPOP the list, then retrieve and delete the metadata key. One requirement is that events survive power loss.
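
Roughly, with made-up key names:

    > MULTI
    > HSET event:123 type click payload "..."
    > RPUSH events event:123
    > EXEC

and each sink does:

    > BLPOP events 0
    > HGETALL event:123
    > DEL event:123

For the power-loss requirement I'm assuming AOF persistence with appendfsync always in redis.conf, which I understand costs throughput.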

Is there anything inherently wrong with this? Gotchas? A mockup I've done works great so far.

  • renonce 2 years ago

    If the event sink crashes, or its connection to Redis is lost after the BLPOP, you can lose events. Redis Streams are better designed for use cases where more reliable delivery is needed, and they have a ton more features, though they come with more complexity.
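
    A minimal Streams sketch (names made up): a consumer group tracks deliveries, and entries stay pending until acknowledged, so they can be reclaimed after a crash:

        > XGROUP CREATE events workers $ MKSTREAM
        > XADD events * type click user 42
        > XREADGROUP GROUP workers sink-1 COUNT 10 BLOCK 5000 STREAMS events >
        > XACK events workers <id>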

  • zo1 2 years ago

    RabbitMQ. It's so cheap and easy to start up a super performant queuing broker with Docker these days, and the libraries are all there, async-ready and with established patterns; it's the closest to zero code you can get for this. You'll likely end up reimplementing all those patterns, and the support around them, using Redis.

    If you want something quick and easy and dirty, go with Redis. But switch to Rabbit when you start having to write a lot of handling and other code.

its_bbq 2 years ago

I've been looking into tech stacks for building a collaborative editor, and Redis CRDTs come up a lot. IIUC this requires a Redis db running on each user's machine, and they connect P2P with each other. Do I understand that right? Anyone have good resources for this? I've also seen Riak come up as an alternative. Do they work similarly?

witnesser 2 years ago

Of the first hundred lines of comments I read, the only person who stood out with a concrete use case was an ad-server operator. It's like the California highway system: the billboards are what stand out, while people are remarkably forgiving of the jams and potholes.

  • witnesser 2 years ago

    The above is just a random comment. But I do have a long-standing question: how are cache misses handled?

anton96 2 years ago

Very interesting.

This leads me to think that using Redis as the sole database is very tempting, but the RAM requirement makes me think twice.

Isn't there a database like Redis that keeps only the latest data in memory and the rest in an AOF-style file on disk?

  • remote-dev 2 years ago

    Not to make this an ad, but you can actually do better with Redis Enterprise using Redis on flash (part of the flexible and annual plans). It stores hot data in RAM and "warm" data in flash. Here is a good 68s video on the subject: https://www.youtube.com/watch?v=hFQnhPstqLM

claytn 2 years ago

It's not clear to me why it makes sense to use both RDB files and AOF on the same Redis instance. It seems like AOF would always be the more accurate source of truth. What am I missing?

Great article though!

dsmmcken 2 years ago

Beautifully presented, worth reading just for the illustrations.

Havoc 2 years ago

The white cube in the traditional usage example - what does that represent? App code? Or is it there so that cache misses to the db are handled in some standardized way?

thecosmicfrog 2 years ago

Excellently written and illustrated! We've just added Redis to our platform so guides like this are a fantastic resource. Thanks!

groffee 2 years ago

It's a good article, but a couple of hopefully constructive points:

1. .toc-wrap covers the image on desktop

2. The image is way too busy; there's too much going on

ptbg 2 years ago

I love this style of post. You cover a wide range of topics in an easy to understand way. Keep up the great work!

didip 2 years ago

By the way, if you want painless multi-master Redis, simply install KeyDB.

I am not affiliated, just a happy user.

tunesmith 2 years ago

In what cases would one want to pick memcached instead of redis?