zbentley 2 days ago

What an incredibly useful feature. Besides the obvious developer experience benefits, it’s huge for network-bound use cases: really heavily optimized uses of RabbitMQ (or less-optimized uses with really big message payloads) end up bottlenecked or paying lots of money for broker network capacity, since a message’s bytes must cross the wire 2 or more times (publish, consume, maybe replication) for it to be processed. Letting consumers push their filtering logic down to the broker helps a lot with that—but workloads should still use separate queues/topics/streams instead whenever they can, of course (I’m sure there will be some one-topic-for-everything abuses enabled by the combination of poor architectural foresight + SQL filtering, but such is life).
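To put rough numbers on the network claim, here’s a back-of-envelope sketch (all figures are hypothetical, picked just to illustrate the shape of the savings): with client-side filtering every consumer receives every message; with broker-side filtering only matching messages cross the wire on the consume leg.

```python
# Back-of-envelope broker bandwidth comparison (hypothetical numbers).
msg_size = 64 * 1024   # 64 KiB payloads
msgs_per_sec = 1_000
consumers = 5
selectivity = 0.10     # each consumer only cares about 10% of messages
replicas = 2           # intra-cluster replication copies

publish = msg_size * msgs_per_sec
replication = publish * replicas

# Client-side filtering: every consumer receives every message.
consume_client_side = publish * consumers
# Broker-side filtering: consumers only receive matching messages.
consume_broker_side = publish * consumers * selectivity

total_client = publish + replication + consume_client_side
total_broker = publish + replication + consume_broker_side
print(f"client-side filtering: {total_client / 2**20:.1f} MiB/s")
print(f"broker-side filtering: {total_broker / 2**20:.1f} MiB/s")
```

With these made-up numbers the broker moves 500 MiB/s versus about 219 MiB/s, and the gap widens as selectivity drops or consumer count grows.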

I am confused, though: why does the bloom filter … er, filter still need to be manually specified by the consumer (filterValues in the example Java)?

As far as the broker filtering query evaluation logic is concerned, bloom-filter enabled fields are just indexes; why can’t the SQL-filter query planner automatically make use of them?

I’m probably missing something, but it seems like a very light query-plan optimization pass would not be hard to implement here; there’s only one kind of index, and it can only be used with equality comparisons, so the implementation complexity doesn’t seem too bad versus needing a fully general SQL optimizing planner.
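The pass I’m imagining could be as simple as the toy sketch below (pure speculation on my part, not anything from RabbitMQ; `derive_filter_values`, the field name, and the SQL subset handled are all made up): scan the filter expression for equality predicates on the bloom-filter field and derive the filter values automatically, falling back to a full scan when the index can’t be used safely.

```python
import re

def derive_filter_values(sql: str, indexed_field: str) -> set[str]:
    """Collect values from `indexed_field = 'literal'` and
    `indexed_field IN (...)` predicates in a SQL-ish filter string.

    Deliberately conservative: any non-equality comparison on the
    indexed field defeats the equality-only bloom filter, so we return
    an empty set, meaning "no index help, scan everything".
    """
    # A predicate like `field > 'a'` can't use an equality-only index.
    if re.search(rf"\b{indexed_field}\s*[<>!]", sql):
        return set()
    values = set(re.findall(rf"\b{indexed_field}\s*=\s*'([^']*)'", sql))
    for group in re.findall(rf"\b{indexed_field}\s+IN\s*\(([^)]*)\)",
                            sql, re.IGNORECASE):
        values.update(re.findall(r"'([^']*)'", group))
    return values

print(derive_filter_values("region = 'emea' AND priority = '3'", "region"))
print(derive_filter_values("region IN ('emea', 'apac')", "region"))
print(derive_filter_values("region > 'a'", "region"))  # index unusable
```

A real pass would of course walk the parsed AST rather than regex the text (an `OR` across different fields also has to disqualify the index), but the point stands: the predicate extraction is mechanical, so requiring the consumer to restate the values by hand seems redundant.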

zbentley 2 days ago

One possible drawback of this kind of system is performance (or broker CPU) getting dragged down by crazy/bad filtering queries.

Normally, those issues are solved the usual way (monitor, identify, fix). It’s rarer to see systems that proactively detect/reject costly arbitrary queries when they’re issued, though.

Proactively detecting potentially bad SQL queries in RDBMSes relies on table statistics (which can’t be known for streams) or on query-text/plan-analysis heuristics (hairy, subjective, error-prone).

But it just occurred to me: could RabbitMQ’s choice of Erlang enable the easy rejection of query plans above a certain cost?

Could the BEAM be easily made to reject a query plan whose reduction count exceeds a user-specified threshold (assuming the plan, or at least a worst-case version of it, can be compiled ahead of time into a loopless/unrolled chunk of BEAM bytecode)?

That might be interesting, if possible. Most runtimes don’t have user-surfaced equivalents of reduction counts, so there might be some mechanical sympathy in RabbitMQ’s case.
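The shape of what I mean, minus the BEAM entirely (this is just a Python toy mimicking the idea, with a made-up AST and cost model; nothing here reflects how RabbitMQ or Erlang actually meters work): charge one "reduction" per evaluation step and abort any filter that blows its caller-supplied budget, rather than letting it hog the scheduler.

```python
# Toy "reduction budget" evaluator for a tiny filter AST:
# ('and', a, b) / ('or', a, b) / ('eq', field, value).
# Each visited node costs one reduction; exceeding the budget aborts.

class BudgetExceeded(Exception):
    pass

def evaluate(expr, msg, budget):
    remaining = [budget]

    def go(node):
        remaining[0] -= 1
        if remaining[0] < 0:
            raise BudgetExceeded
        op = node[0]
        if op == "eq":
            return msg.get(node[1]) == node[2]
        left = go(node[1])
        if op == "and":
            return left and go(node[2])   # short-circuits like SQL AND
        return left or go(node[2])        # "or"

    return go(expr)

msg = {"region": "emea", "tier": "gold"}
cheap = ("eq", "region", "emea")
pricey = ("and", ("eq", "region", "emea"),
                 ("or", ("eq", "tier", "gold"), ("eq", "tier", "silver")))

print(evaluate(cheap, msg, budget=10))   # fits comfortably in budget
try:
    evaluate(pricey, msg, budget=2)      # needs 3+ reductions: rejected
except BudgetExceeded:
    print("rejected: reduction budget exceeded")
```

The nice property (if the BEAM version is feasible) is that the budget is checked by the runtime itself rather than by a heuristic inspecting query text, so the worst-case cost bound holds regardless of how pathological the query is.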

4ndrewl 2 days ago

I guess there's some effect on the broker side wrt resources or efficiency, but I couldn't immediately see anything about this.