LunaSea 10 hours ago

I would disagree that a table without a primary key is a critical problem.

There are multiple reasons for tables not having primary keys. Log tables are one example.

Excessive sequential scans are also not a problem for small tables.

  • davidw 10 hours ago

    This looks like it's targeted at finding some obvious things, and if you know your table doesn't need a primary key, you could always exclude it from the report.

  • somat 8 hours ago

    There are many good reasons to always have a primary key, even if it is just an automatic serial number, but the one that hit me personally is that it is surprisingly difficult to deduplicate a relational database without one.

    When I was first learning SQL I was pretty firmly in the "use natural keys" camp. And when the natural key was every single column, I would go "what's the point?", shrug, and have no primary key. Until I started getting duplicate rows:

        insert into customer_email (name, address) values ('bob', 'bob@bobco.com');
        insert into customer_email (name, address) values ('bob', 'bob@bobco.com');
    
    Duplicate rows (a) tend to mess up your query results and (b) are surprisingly difficult to remove. If I remember correctly, after spending far too long trying to find a pure SQL solution, I ended up writing a program that would find the duplicates, delete them (all of them, as there was no way to target all but one), re-insert them once, and add that missing primary key.
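
    (For what it's worth, Postgres specifically does have a pure-SQL escape hatch via the hidden ctid system column — a sketch against the table above, assuming non-NULL columns:)

        -- ctid identifies each row's physical location, so even fully
        -- duplicated rows are distinguishable. Keep one copy per group.
        DELETE FROM customer_email a
            USING customer_email b
            WHERE a.ctid > b.ctid
              AND a.name = b.name
              AND a.address = b.address;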

    I still like natural keys more than I probably should. (You still need a key to prevent functional duplicates even when using a surrogate key, so why not cut out the middleman?) But I am no longer so militant about it (mainly because it makes having dependent tables a pain).

    • paulryanrogers 7 hours ago

      I'm a fan of always including unique indexes in the DB, even if they must exclude soft-deleted rows. At a minimum they keep functional duplicates out. Those seem especially insidious when there are races.
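
      For example, a partial unique index that ignores soft-deleted rows (assuming a deleted_at column on the table from the parent comment):

          -- Uniqueness is enforced only among live rows; soft-deleted
          -- rows are excluded by the WHERE clause.
          CREATE UNIQUE INDEX customer_email_address_live_uniq
              ON customer_email (address)
              WHERE deleted_at IS NULL;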

    • ahachete 5 hours ago

      Using natural keys is what actually prevents duplicate rows. In your example above, if the email address is the PK, there can be no duplicates. But adding a surrogate id as the PK would leave your database with duplicates:

      (1, 'bob', 'bob@bobco.com')

      (2, 'bob', 'bob@bobco.com')
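
      A sketch of the natural-key version (same hypothetical table as above):

          -- With the address as the PK, the second insert above fails
          -- with a unique violation instead of silently duplicating.
          CREATE TABLE customer_email (
              name    text NOT NULL,
              address text PRIMARY KEY
          );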

  • 1a527dd5 6 hours ago

    Logical replication requires a primary key. We found that out the hard way. It _is_ a critical problem.

    • ahachete 5 hours ago

      Logical replication does NOT require a primary key. It requires a primary key, a unique index, or a defined replica identity.

      Sure, in most cases that still boils down to having a PK (a full replica identity is normally not a good idea), but there are situations where it wouldn't be.
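
      For example, the unique-index route looks like this (hypothetical table and index names; the index must be unique, non-partial, non-deferrable, and on NOT NULL columns):

          -- Tell logical replication to identify rows by this unique
          -- index instead of a primary key.
          ALTER TABLE app_log
              REPLICA IDENTITY USING INDEX app_log_request_id_key;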

  • PunchyHamster 8 hours ago

    Log tables with a pkey on the date can be searched faster for the typical log-table use of "tell me what happened in this time range", though of course you have to make it unique.

    • s1mplicissimus 6 hours ago

      I'd rather logs not fail because, for some weird reason, the dates of two records are exactly the same. Daylight saving adjustments / clock drift corrections sound like the most obvious candidates to produce such a case. Granted, the chances are not high, but I'd usually prefer knowing that the chance of it failing for this reason is zero.

    • saurik 8 hours ago

      1) But with quite a hefty penalty on writes... I'd think you would be better off without a primary key and just using a BRIN index for that use case?

      2) Even if you did really want that B-tree, you can still have it without having to awkwardly make it unique, as long as you don't make it a "primary" key.
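
      A sketch of that BRIN alternative (hypothetical table and column names):

          -- BRIN stores one small summary per block range, so it is
          -- cheap to maintain on append-mostly log tables and still
          -- speeds up "what happened in this time range" scans.
          CREATE INDEX logs_created_at_brin
              ON logs USING brin (created_at);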

sumibi 10 hours ago

This is inspired by Brent Ozar's First Responder Kit for SQL Server, which is an amazing tool for accidental DBAs who do not know where to start optimizing.

I'm looking forward to trying this out on my postgres databases.

  • rapfaria 4 hours ago

    Damn, those are some 10k-LOC SQL files, with 3k+ pull requests. Talk about battle-tested.

    Hopefully this grows to be the Postgres equivalent.

evanelias 7 hours ago

> Unused Large Indexes - Indexes consuming significant disk space but never used (>10MB, 0 scans)

Is this a typo? 10MB seems ridiculously small for a threshold here.
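
(For context, a check like that presumably boils down to something along these lines — a sketch of the idea, not the tool's actual query:)

    -- Indexes never scanned since the last stats reset, larger than 10MB.
    SELECT schemaname, relname, indexrelname,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
      AND pg_relation_size(indexrelid) > 10 * 1024 * 1024;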

ahachete 5 hours ago

Does the unused-index feature look at replicas? (I guess it doesn't.) It's risky to drop an "unused" index based on a single instance, since it may be used by other read replicas.
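
The pg_stat counters are tracked per instance, so a safe drop decision needs something like this run against the primary and every replica (hypothetical index name):

    -- idx_scan is local to the instance it is queried on; an index
    -- idle on the primary may still be hot on a replica.
    SELECT idx_scan
    FROM pg_stat_user_indexes
    WHERE indexrelname = 'orders_customer_id_idx';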

RedShift1 9 hours ago

Why are indexes on foreign keys required? If I'm doing a join, it's going to look up the primary key of the other table; how will an index on the foreign key help?

  • formerly_proven 8 hours ago

    Referential integrity checks by the DB engine (e.g. when deleting from the referenced table) require reverse look-ups of foreign keys, which would necessarily become full table scans without an index. Apart from that, applications also often do look-ups like this.
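
    A sketch with hypothetical tables:

        CREATE TABLE customers (id bigint PRIMARY KEY);
        CREATE TABLE orders (
            id          bigint PRIMARY KEY,
            customer_id bigint NOT NULL REFERENCES customers (id)
        );
        -- The FK constraint above creates no index of its own. Without
        -- this one, every DELETE FROM customers has to scan all of
        -- orders to verify no rows still reference the deleted id.
        CREATE INDEX orders_customer_id_idx ON orders (customer_id);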

netcraft 13 hours ago

Very nice!

Did you consider making this a view instead? Just curious if there is a reason why you couldn't.

  • singron 11 hours ago

    I'm not the author, but I think you could, by using UNION ALL instead of temp tables. You could also make a view that just calls this function; I'm not sure why it would matter, though.
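
    Something like this, assuming the report is exposed as a set-returning function (check_database() is a made-up name):

        -- A view can wrap a set-returning function directly.
        CREATE VIEW health_report AS
            SELECT * FROM check_database();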