mertleee 2 days ago

s2 is one of the coolest technologies that more people need to be talking about. I'm still begging them to move one layer lower and turn s2 into incredible middleware for edge IoT deployments!

PLEASE, if someone from the team sees this: I would pay so much for an ephemeral object store using your same edge protocol (seen in the sensor example from your blog).

Cheers!

  • shikhar 2 days ago

    Hi mertleee, I'd like to understand the request better. Mind dropping me an email? It's in my profile.

onethumb 3 days ago

This looks super interesting for single-AZ systems (which are useful, and have their place).

But I can't find anything to support the use case for highly available (multi-AZ), scalable, production infrastructure. Specifically, a unified and consistent cache across geos (AZs in the AWS case, since this seems to be targeted at S3).

Without it, you're increasing costs somewhere in your organization: cross-AZ networking costs, larger caches in each AZ for availability, extra compute and cache-coherency traffic across AZs to keep the caches in sync, and so on.

Any insight from the authors on how they handle these issues in their production systems at scale?
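
To put rough numbers on just the cross-AZ piece (AWS lists data transfer at about $0.01/GB in each direction, so a cross-AZ fetch runs ~$0.02/GB; the workload below is made up for illustration):

    cross_az_per_gb = 0.02        # ~$0.01/GB out + ~$0.01/GB in on AWS
    misses_gb_per_day = 1000      # say 1 TB/day of reads served from a peer AZ's cache
    print(misses_gb_per_day * cross_az_per_gb)   # ~$20/day, before sizing the duplicate caches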

  • trueismywork 2 days ago

    Not the author, but: it's a user-side read-through cache, so there's no need for pre-emptive cache coherence as such. There will, however, be a performance penalty when fetching data under write contention, irrespective of whether you have a single AZ or multiple AZs. The only way to mitigate that penalty is accurate predictive fetching that matches your usage patterns.
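
    Roughly, the pattern is this (a minimal Python sketch; fetch_from_backend is a stand-in for whatever S3 client the real library wraps):

        import time

        class ReadThroughCache:
            def __init__(self, fetch_from_backend, ttl_seconds=60):
                self.fetch = fetch_from_backend    # e.g. a wrapper around s3.get_object
                self.ttl = ttl_seconds
                self.entries = {}                  # key -> (expires_at, value)

            def get(self, key):
                entry = self.entries.get(key)
                if entry and entry[0] > time.time():
                    return entry[1]                # hit: no backend round trip
                value = self.fetch(key)            # miss or expired: pay the fetch penalty
                self.entries[key] = (time.time() + self.ttl, value)
                return value

    Note there's no coherence machinery at all here: a writer elsewhere just makes entries stale, and contended keys keep paying the fetch penalty when they expire.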

  • immibis a day ago

    Not the author, but my suggestion is to use a real infrastructure provider. You will save tons of money.

_1tan 3 days ago

Can someone explain when this would be a good solution? We currently store loads of files in S3 and ingest them directly on demand in our Java app API pods. Seems interesting if it could speed up our retrievals, for sure.

  • thinkharderdev 2 days ago

    The basic tradeoff is that you pay an extra tax on every request that is not served by the cache, so something like this helps when you are reading the same data repeatedly. For example, a database built on object storage.
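
    Back-of-envelope with made-up latencies (1 ms for a cache hit, ~50 ms for the S3 round trip, ~1 ms of added overhead on every miss):

        def expected_ms(hit_rate, hit=1.0, s3=50.0, overhead=1.0):
            # a miss pays the S3 round trip plus the cache's own overhead
            return hit_rate * hit + (1 - hit_rate) * (s3 + overhead)

        print(expected_ms(0.9))   # ~6 ms -> big win for repeatedly-read data
        print(expected_ms(0.0))   # 51 ms -> strictly worse than going direct at ~50 ms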

  • Havoc 2 days ago

    If you can have the cache on-site, then it'll likely benefit many things just by virtue of not going through a slow internet link.

OutOfHere 2 days ago

Frankly, any web app I develop has configurable in-memory caching built into it, so I would rather increase its size than add an extrinsic cache. Keeping the cache internal to my application also makes it easier to invalidate keys accurately.
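
The shape of what I mean (a Python sketch, not my actual code; loader stands in for whatever actually reads from S3 or disk):

    from collections import OrderedDict

    class AppCache:
        def __init__(self, max_items=10_000):
            self.max_items = max_items
            self.data = OrderedDict()          # insertion order doubles as LRU order

        def get(self, key, loader):
            if key in self.data:
                self.data.move_to_end(key)     # mark as recently used
                return self.data[key]
            value = loader(key)
            self.data[key] = value
            if len(self.data) > self.max_items:
                self.data.popitem(last=False)  # evict the least recently used entry
            return value

        def invalidate(self, key):
            # the app knows exactly when it writes, so it can drop exactly this key
            self.data.pop(key, None)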

  • perbu 2 days ago

    It's about scalability. If you have 100 instances you really want them to share the cache, so you increase the hit rate and keep egress costs low.
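
    Rough numbers (using AWS's ~$0.09/GB internet egress rate; this matters most when pulling from S3 cross-region or from outside AWS, since same-region EC2-to-S3 transfer is free):

        instances, hot_gb, egress_per_gb = 100, 20, 0.09
        print(instances * hot_gb * egress_per_gb)   # ~$180: every instance warms its own cache
        print(1 * hot_gb * egress_per_gb)           # ~$1.80: a shared cache fetches each object once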

    • OutOfHere 2 days ago

      > If you have 100 instances you really want them to share the cache

      I think that assumes decoupled compute and storage. If I instead couple compute and storage, I can shard the input so each instance only ever sees (and caches) its own slice, and then there is no cache to share across instances (sketch below). I don't think there is one approach that wins every time.

      As for egress fees, that is an orthogonal concern.
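
      The sharding I mean is just deterministic key routing, e.g. (hypothetical pod names, Python sketch):

          import hashlib

          def owner(key, instances):
              # hash the key so every request for it lands on the same instance,
              # whose local cache then only ever holds its own shard of the input
              digest = hashlib.sha256(key.encode()).digest()
              return instances[int.from_bytes(digest[:8], "big") % len(instances)]

          print(owner("bucket/object-a", ["pod-0", "pod-1", "pod-2"]))

      (Plain modulo reshuffles most keys when the instance count changes; consistent hashing fixes that, but the point stands.)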

denis_dolya 2 days ago

An interesting approach to caching: hybrid memory and disk, with support for any S3-compatible backend. Limitations may arise, though, with large data streams and with specific backends.