bastawhiz 15 days ago

It's a little ironic that a startup that bills itself as

> Detect and prevent performance regressions before they make it to production with continuous benchmarking

Would ever write

> I got feedback from a user that their Bencher Perf Page was taking a while to load. So I decided to check it out, and oh, man were they being nice.

I get that this is an edge case, but it doesn't feel awesome that they had such a painful performance issue that went undetected!

  • epompeii 13 days ago

    Definitely ironic and embarrassing!

    I figured I could either hide in shame or share some hard learned wisdom, and I opted for the latter.

pizza234 15 days ago

1. clickbait (the title reminds me of patent descriptions ;)); the article itself is about routine query optimization

2. there is no "materialized view": the view created is a standard one; subquery materialization != materialized views. See the Wikipedia article (https://en.wikipedia.org/wiki/Materialized_view) for what materialized views actually are
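A quick way to see the difference (a minimal sketch using Python's built-in sqlite3 module; the table and view names here are invented for illustration, not from the article):

```python
import sqlite3

# In SQLite, CREATE VIEW stores only the query text, not its results.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE metric (id INTEGER PRIMARY KEY, value REAL)")
con.execute("INSERT INTO metric (value) VALUES (1.0)")
con.execute("CREATE VIEW metric_view AS SELECT COUNT(*) AS n FROM metric")

print(con.execute("SELECT n FROM metric_view").fetchone()[0])  # 1

# A standard view reflects new data immediately: the defining query is
# re-evaluated on every access, so nothing was ever "materialized".
con.execute("INSERT INTO metric (value) VALUES (2.0)")
print(con.execute("SELECT n FROM metric_view").fetchone()[0])  # 2
```

A materialized view, by contrast, would still report the old count until explicitly refreshed.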

chx 15 days ago

That bonus bug could be read as the perfect case to argue against using an ORM

> .left_join(schema::alert::table.on(view::metric_boundary::boundary_id.eq(schema::alert::boundary_id.nullable())))

instead of

> LEFT OUTER JOIN alert ON (alert.boundary_id = metric_boundary.boundary_id)

  • klabb3 15 days ago

    Is that from an orm? Looks like a query builder to me.

    The main thing you left out is the type conversions with plain sql. So a fair comparison would include the e2e flow, imo.

    That said, your point is very much valid because of the nested parentheses - I think the main issue is that SQL is an “infix”(?) dsl? It’s strange because Rust has perfectly good proc macros for this type of stuff.

  • epompeii 13 days ago

    If anything, this experience has made me appreciate Diesel even more!

    Having strongly typed SQL has prevented these sorts of bugs at compile time. The only reason I ran into trouble here was that I went into `unsafe` land without proper precaution. That's on me, not the ORM.

  • orthoxerox 15 days ago

    metric_boundary.as("m").join(alert.as("a"), $"a.boundary_id" === $"m.boundary_id")

    Scala is sometimes dangerously syntax-sweet, but Spark does some things right.

speed_spread 15 days ago

Any story about making something 1200x faster is going to be about something being 1197x too slow to begin with. It's like celebrating a performance failure.

  • epompeii 13 days ago

    I think it was definitely unoptimized to begin with. The major slow down didn't happen until the model evolved though, as I mention in the Background section. Not catching that change was certainly a performance failure!

  • sharpshadow 14 days ago

    Exactly! If one can optimise for 1200x it was bad from the start.

chmaynard 15 days ago

Nice writeup!

I much prefer using SQLite via its CLI (sqlite3) as opposed to embedded SQL statements via an API. Much easier to read and understand what's going on.

magicalhippo 15 days ago

The culprit for the slowdown is this section, reformatted to be readable:

    LEFT OUTER JOIN (
      boundary 
      INNER JOIN threshold ON (threshold.id = boundary.threshold_id)
      INNER JOIN model ON (model.id = boundary.model_id)
      LEFT OUTER JOIN alert ON (alert.boundary_id = boundary.id)
    ) ON (boundary.metric_id = metric.id)
From what I can gather, it does a full scan on the "boundary" table so it can do the inner joins before doing the left outer join against "metric".

The author then introduces the "metric_boundary" view which does the left outer join between "metric" and "boundary". However the SQL using the "metric_boundary" view is not shown, and from the query planner results I can't see that the new SQL is equivalent?

That is, without the "metric_boundary" view, one immediate thought on how to optimize the original query is to avoid doing the "threshold" and "model" inner joins, instead flattening the query:

    LEFT OUTER JOIN boundary ON (boundary.metric_id = metric.id)
    LEFT OUTER JOIN threshold ON (threshold.id = boundary.threshold_id)
    LEFT OUTER JOIN model ON (model.id = boundary.model_id)
    LEFT OUTER JOIN alert ON (alert.boundary_id = boundary.id)
This should allow the database to just search on the primary keys, similar to the final result in the article. However, this is not equivalent. It will include values from "boundary" even if there are no matches for "threshold" and "model", unlike the original query. In the original, "boundary" rows will not be returned unless there are also linked rows in "threshold" and "model".

However, from the query planner results after introducing the "metric_boundary" view, from what I can see the author has in effect done just the same kind of flattening?

    |--SEARCH boundary USING INDEX sqlite_autoindex_boundary_2 (metric_id=?) LEFT-JOIN
    |--SEARCH threshold USING INTEGER PRIMARY KEY (rowid=?) LEFT-JOIN
    |--SEARCH model USING INTEGER PRIMARY KEY (rowid=?) LEFT-JOIN
Am I missing something here? Did the author not really need to reject "boundary" table rows which didn't have any corresponding rows in the "threshold" and "model" tables, i.e. didn't really need the inner joins, and didn't realize this?

larschdk 15 days ago

I thought this would be about optimizing or tuning SQLite settings, but it's about query and schema optimization.

zb3 15 days ago

Are there materialized views in sqlite?

  • eddd-ddde 15 days ago

    Not like in postgres, but you can do stored computed values.
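
    One reading of "stored computed values" is SQLite's generated columns (a sketch assuming SQLite >= 3.31; the table and column names are invented):

```python
import sqlite3

# A STORED generated column is computed once on write and persisted,
# rather than recomputed on every read like a view would be.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE metric (
        value   REAL,
        doubled REAL GENERATED ALWAYS AS (value * 2) STORED
    )
""")
con.execute("INSERT INTO metric (value) VALUES (21.0)")
print(con.execute("SELECT doubled FROM metric").fetchone()[0])  # 42.0
```

    That covers per-row precomputation, though not cross-table aggregates the way a Postgres materialized view does.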

  • fforflo 15 days ago

    Not really. In this context they're a hint to the optimizer to evaluate the CTE result once, store (i.e. materialize) it somewhere, and use that snapshot whenever the CTE is referenced later in the query.

    The alternative means the CTE is essentially copy-pasted into each reference and then evaluated.

    The closest one can get to materialized views in SQLite is probably with CREATE TABLE AS. Drop and recreate.
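
    The drop-and-recreate pattern might look like this (a minimal sketch in Python's sqlite3; the tables and the refresh helper are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE metric (id INTEGER PRIMARY KEY, value REAL)")
con.execute("INSERT INTO metric (value) VALUES (1.0), (2.0)")

def refresh_summary(con):
    # The "view" is a plain table snapshot, rebuilt on demand
    # rather than kept fresh automatically by the engine.
    con.execute("DROP TABLE IF EXISTS metric_summary")
    con.execute(
        "CREATE TABLE metric_summary AS "
        "SELECT COUNT(*) AS n, SUM(value) AS total FROM metric"
    )

refresh_summary(con)
print(con.execute("SELECT n, total FROM metric_summary").fetchone())  # (2, 3.0)

# New writes stay invisible until the next refresh -- the trade-off
# versus a real materialized view with incremental maintenance.
con.execute("INSERT INTO metric (value) VALUES (3.0)")
print(con.execute("SELECT n FROM metric_summary").fetchone()[0])  # still 2
refresh_summary(con)
print(con.execute("SELECT n FROM metric_summary").fetchone()[0])  # 3
```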