dewey 2 years ago

There are a lot of interesting improvements in PG15, some of my favorites from https://www.postgresql.org/about/news/postgresql-15-beta-1-r... are:

>PostgreSQL 15 introduces the jsonlog format for logging. This allows PostgreSQL logs to be consumed by many programs that perform structured log aggregation and analysis.

>pg_basebackup, a utility used to take full backups of a PostgreSQL cluster, now supports server-side compression using Gzip, LZ4, or Zstandard compression.

>This includes the introduction of parallelization for SELECT DISTINCT statements and improvements in performance to window functions that use row_number(), rank(), and count().

>PostgreSQL 15 builds on its existing support for the SQL/JSON path language by including more standard SQL/JSON functions. These include SQL/JSON constructors, query / introspection functions, and the ability to convert JSON data into a table.

  • thejosh 2 years ago

    I'm also super excited for the points you raised, looks like some great features being released, I really like that JSON is becoming a first class citizen.

    I'm also looking forward to the \copy improvements, especially around loading data into PG.

  • BitPirate 2 years ago

    I think they missed one of the biggest changes: ICU collations can finally be set as default for clusters and databases.

  • mchusma 2 years ago

    I use both SELECT DISTINCT and row_number a lot, so excited for that as well.

    • 1500100900 2 years ago

      SELECT DISTINCT is rarely the best idea. In most cases it means someone is using joins to filter out rows that should have been removed with EXISTS (a semi-join).

      • sa46 2 years ago

        Interesting, I hadn't thought of it that way. Looking through our codebase:

        - We get a lot of mileage out of DISTINCT ON to get the most recent version of a row for some subset of columns. I think this is a different use case than what you refer to.

        - We typically use DISTINCT to get PKs for a table when starting from another table. Like, find all user IDs that commented on a given story by looking at the join table.

            SELECT DISTINCT user_id FROM story_comment WHERE story_id = 4
        • 1500100900 2 years ago

          - "most recent" is usually best done with a max() with an adequate index or, when that's not possible, with a LATERAL JOIN, an ORDER BY and a LIMIT 1

          - SELECT id FROM users WHERE EXISTS (SELECT 1 FROM story_comment WHERE story_id = 4 AND story_comment.user_id = users.id)

      • richbell 2 years ago

        Can you provide an example of how you'd rewrite a query using SELECT DISTINCT with EXISTS?

      • LunaSea 2 years ago

        As far as I know, nested loop semi joins using EXISTS aren't always capable of using indexes.

        Like in the case of a GIN indexed array intersection.

  • baggiponte 2 years ago

    That’s fascinating! Totally orthogonal: how does PG store data? Do they use columnar specifications such as Arrow & Parquet?

    • chrsig 2 years ago

      The main table is stored in a 'heap', which is more similar to the dynamic allocation region of a program than to the heap data structure.

      If it needs to write to the table, it has a free space map listing all of the available fixed size pages. Each page can hold some number of fixed size tuples.

      I'm surprised they've gotten so far with this. It seems ripe for fragmentation issues. I'm surprised they don't at least have a b+tree implementation to provide clustering based on the primary key. They have a b+tree implementation for indices, but not for tuple storage. IIRC, there's a feature to allow fixed size data rows to be stored directly in the index, to help alleviate the need to look in the heap table. Perhaps that's what's kept the heap viable?

      Variable length strings are kept in a separate file (TOAST tables), referenced by the tuple. This keeps records fixed size, and allows for circumventing disk reads for string data for queries that don't require them. I'm not quite sure how the allocation scheme or data structure works there. My (very speculative) guess is that they're stored as a linked list, and on write, the allocator tries to give contiguous pages.

      fwiw, I don't use postgresql on a regular basis, I've just spent a good amount of time squinting at the file format and related code, solely due to my own curiosity. It's been a while, so it's very possible that I'm misrepresenting or misremembering something. Someone has already posted their documentation on the file format -- it's a good read if you're interested.

      • uhoh-itsmaciek 2 years ago

        >Variable length strings are kept in a separate file (TOAST tables), referenced by the tuple. This keeps records fixed size...

        That's not entirely true: variable-width row values under a size threshold are stored inline (possibly compressed).

        The beauty part about TOAST is that they're mostly just regular tables (any table that needs its data TOASTed gets its own TOAST table). You can look them up in the system catalogs and query them directly.

        The schema for these is (chunk_id, chunk_seq, chunk_data). Chunk "pieces" are limited in size so they can be stored inline (obviously there's not much benefit to a TOAST table itself having a TOAST table), so any row values that need to be TOASTed may also be broken up into multiple pieces. Row values stored inline in the "owning" table reference a chunk_id where the TOASTed data is stored, and queries that need to return that data look up all the chunks for a chunk id, ordered by chunk_seq.

        You're right that this scheme avoids reading TOASTed data when you're not actually using it (a good reason to avoid "SELECT *" if you've got big TOASTed data).

      • tomnipotent 2 years ago

        > but not for tuple storage

        Both tables and indexes use the same b+-tree implementation based on Lehman and Yao (prev/next links on internal nodes).

        > Each page can hold some number of fixed size tuples

        Variable size.

        > Variable length strings are kept in a separate file

        Only if the row exceeds the page size, otherwise everything is stuffed into the data leaf node.

        > I'm not quite sure how the allocation scheme or data structure works there

        Values are broken up into DataSize/PageSize "virtual rows" and stitched back together, but everything is still in the same slotted page structure.

        • SigmundA 2 years ago

          >Both tables and indexes use the same b+-tree implementation based on Lehman and Yao (prev/next links on internal nodes).

          Tables in Postgres are not b-trees; they are unordered heaps. The row id in the b-tree index points to a page number + item index in the heap.

          A true clustered index would simply expose the index itself as a table without secondary heap storage, reducing I/O and storage space. Row visibility seems to complicate this in PG as index only scans may require checking the heap for visibility.

    • paulryanrogers 2 years ago

      It's row based, with old row versions kept alongside the current data until they're vacuumed away, unlike a redo-log-based approach.

kabes 2 years ago

A couple of months back there was a post from someone who made a presentation about a lot of things that are 'bad' in postgres (mostly performance-wise) and a github repo that tried to fix that (initially with patches, with the idea of later getting it either upstreamed or shipped as an addon).

I can't find anything of this effort anymore. Does someone know what I'm talking about and what the current status of that is?

EDIT: found it: https://github.com/orioledb/orioledb

I'd love to hear what PG devs here think of his criticism and proposed solutions.

  • fuy 2 years ago

    Agree with you - would love some of the core PG devs to pay attention and discuss this work.

    • ConnorLeet 2 years ago
      • fuy 2 years ago

        That's the point - my understanding is that he created it as a result of the reluctance of the core pg team to make progress on these more low-level and fundamental problems. I'd be interested to know what core pg people think of it - whether it's just too costly to implement these enhancements in pg, or they think these are just not desirable/correct per se.

  • canadiantim 2 years ago

    Yeah I’ve been following orioleDB as well. I think the guys behind it are also going to be offering a managed version of it too. I fully intend to use it in my next project.

nhoughto 2 years ago

Interesting to see there is still some more to squeeze out of the old girl yet. Most seem like things that could have been attempted before now, but I guess they weren't the top pain points to fix. I wonder how many more 4-6% savings there are in there; good to see people going deep and keeping the improvements coming.

  • mattashii 2 years ago

    I'm currently working (on and off) on improving the performance of specific "kinds"/"shapes" of btree indexes by several percent (depending on the shape it can be tens of percent). I hope and expect that it'll be ready for Pg16, but that of course assumes that the concept behind the implementation is acceptable.

    See https://www.postgresql.org/message-id/flat/CAEze2Wg52tsSWA9F... for specifics

    • mchusma 2 years ago

      Thanks so much for your contributions! The community for PG is so great.

  • jeffbee 2 years ago

    Anyone who is using a distro package of pg has the largest opportunity because they are all built wrong. A peak-optimized binary will get you much more than 4% speed up over generic packages.

    • Bootvis 2 years ago

      Can you please elaborate, what’s wrong and what do I need to do when building from source?

      • adgjlsfhk1 2 years ago

        Compiling with -march=native can often be significant.

        • jeffbee 2 years ago

          Compiling for your target is good, not using the PLT for mem* functions helps a ton, and feedback-directed optimization, LTO, and/or post-link optimization help the most. For CPU-bound workloads of course.

        • mayama 2 years ago

          Will the native march option work in the case of cloud where you're not in control of the CPU?

          • jeffbee 2 years ago

            I haven't used a cloud where you don't dictate the exact CPU you want. EC2, GCE, and Azure all provide specific guarantees. Even if you wanted a generic any-cloud binary you would probably compile with -march=skylake or, if truly paranoid, sandybridge.

            Many distro packages are, by contrast, compiled for k8-generic. Freaking Opteron!

            BTW "native" is not really recommendable. One really ought to target a particular platform.

            • brianwawok 2 years ago

              Do google and AWS hosted postgres do this? I assume, but I am not sure how to check.

              • jeffbee 2 years ago

                I don't know but it would be out-of-character for Google to have a C program that wasn't optimized to the hilt.

    • brightball 2 years ago

      I've been using PG for years and this is the first time somebody has made me aware of this, so thank you.

  • hinkley 2 years ago

    In the long tail it's more a matter of choosing a module or concern and squeezing a percent here and a percent there until it adds up to 5%, rather than looking for tall tent poles. Also counting calls for a fixed workload, and making sure this call stack really should call functionB nine times for three results, or whether that should be six or three. The function is still exactly the same cost but is now 1% of the overall cost instead of 3%.

maccard 2 years ago

> The change made here adds a set of new quicksort functions which suit some common datatypes. These quicksort functions have the comparison function compiled inline to remove the function call overhead.

Stuff like this frustrates me. This is what generics are for in most languages. Looking at the commit[0], this is a _perfect_ example of where a sprinkling of C++ features (a very basic template) would outperform the original C version. It's also super disappointing that in 2022 we're still manually marking 10-line functions used in one place and called from one site as inline.

[0] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...

  • JacobiX 2 years ago

    Just to clarify: this is what C++ templates do; they can create specializations for you because templates are processed at compile time. But it's not how generics work in most languages (runtime dispatch, type erasure, etc.).
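
    A minimal sketch of that compile-time specialization (sort_values is a made-up name, nothing from the PostgreSQL source): each use with a distinct type stamps out a separate, fully inlinable copy of the comparison.

        #include <algorithm>
        #include <cstdint>
        #include <vector>

        template <typename T>
        void sort_values(std::vector<T>& v)
        {
            // For each concrete T the comparator is an ordinary,
            // inlinable call; no function pointer is involved.
            std::sort(v.begin(), v.end(),
                      [](T a, T b) { return a < b; });
        }

        int main()
        {
            std::vector<int32_t> a{3, 1, 2};
            std::vector<int64_t> b{30, 10, 20};
            sort_values(a);  // instantiates sort_values<int32_t>
            sort_values(b);  // instantiates sort_values<int64_t>
        }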

  • mort96 2 years ago

    > It's also super disappointing that in 2022 we're still manually marking 10- line functions used in one place and called from one site as inline.

    The `inline` keyword usually isn't necessary anymore to get the compiler to inline stuff; the compiler has some pretty good heuristics for determining whether to inline a function call or not. The `inline` keyword is an extra hint to the compiler, and it affects linkage (i.e. multiple object files defining the same `inline` function do not result in a linker error).

    I don't understand what you're disappointed at, this all seems very reasonable.

    • maccard 2 years ago

      I never said they used the inline keyword; you inferred that. If you look at the diff, they mark the functions as `pg_attribute_always_inline`, which expands to __forceinline or __attribute__((always_inline)) inline depending on your compiler. These specifiers will override any of the compiler's heuristics [0].
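
      For illustration, such a macro usually boils down to something like the sketch below (my_always_inline and min_int32 are made-up names; the exact definition in PostgreSQL's headers may differ):

          #include <cstdint>
          #include <cstdio>

          #if defined(_MSC_VER)
          #define my_always_inline __forceinline
          #elif defined(__GNUC__) || defined(__clang__)
          #define my_always_inline __attribute__((always_inline)) inline
          #else
          #define my_always_inline inline
          #endif

          // The attribute sidesteps the compiler's usual cost model
          // and forces the call to be inlined.
          static my_always_inline int32_t
          min_int32(int32_t a, int32_t b)
          {
              return a < b ? a : b;
          }

          int main()
          {
              std::printf("%d\n", min_int32(3, 7));
          }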

      > I don't understand what you're disappointed at, this all seems very reasonable.

      I'm disappointed by the fact that we're still manually overriding compiler heuristics and manually writing faux copy-pasted generics for performance, when this is a solved problem in so many other languages.

      EDIT: I would _love_ to see some benchmarks without the pg_attribute_always_inline to see whether forcibly inlining the code is even necessary. In my (extended) experience with optimising C++ code, leaning on __forceinline is unnecessary in the vast vast majority of cases and should be used very sparingly.

      [0] https://docs.microsoft.com/en-us/cpp/cpp/inline-functions-cp...

      • gpderetta 2 years ago

        If the profiler tells you to use force_inline, you use force_inline.

        • maccard 2 years ago

          Given this is a blog post explaining their optimisations, you'd expect them to mention profiling with/without this attribute if they did it. Looking at the email thread, they share profiling results of before/after but there's nothing justifying the use of __forceinline in the thread or the blog post as far as I can see.

          • tialaramex 2 years ago

            At least they measured, so that's immediately at least a B+. It is of course often hard to know whether anybody actually measured before doing their work or only afterwards to justify it but we should give benefit of the doubt.

            Not everything you tried is going to be worth explaining, and I get that their whole plan here is "forcibly inline this" so in C, where the compiler won't necessarily get your memo, it seems reasonable to explicitly say so.

          • macdice 2 years ago

            FWIW I just removed pg_attribute_always_inline from those three comparator functions in tuplesort.c and it made no difference to the generated code under GCC 10 at -O2. If we can show that's true for Clang and MSVC too, maybe we should just take them out. Edit: But I don't really mind them myself, they do document the intention of the programmer in instantiating all these specialisations...

            • maccard 2 years ago

              > FWIW I just removed pg_attribute_always_inline from those three comparator functions in tuplesort.c and it made no difference to the generated code under GCC 10 at -O2.

              I experimented with this on various projects and found the same even under lower optimisation levels. I'm definitely in the camp of "profile profile profile", and the guideline that I use is "if it has internal linkage, the compiler will figure it out. If it has external linkage, then I'm profiling".

              > they do document the intention of the programmer in instantiating all these specialisations...

              Is the intent of the programmer to write them inline, or for them to be inlined for performance? Marking something as __forceinline does the former, but implies the latter, when it's not necessarily true to say inline == faster in every case!

    • Sesse__ 2 years ago

      > The `inline` keyword usually isn't necessary anymore to get the compiler to inline stuff, the compiler has some pretty good heuristics for determining whether to inline a function call or not.

      Not really. The heuristics are pretty much based on a rough idea of function size (e.g. if it believes the call overhead is larger than just inlining the function, it will generally be inlined when optimization is on), and/or functions it can prove are used only once (e.g. because they are static). That's it. There's no good idea of e.g. “this looks like a tight loop, and if I inline, I can maybe special-case away this cruft here”. The programmer's hints are very much relevant even in 2022.

  • macdice 2 years ago

    Speaking as the author of that change, and also as a long time user of C++ (in other projects), I hear ya! But I don't have the power to make PostgreSQL switch to C++. Years ago, std::sort() vs qsort() was almost a standard example of how C++ can be faster than C.

    src/lib/sort_template.h is an example of a pseudo template function, but we also have at least one pseudo template container in src/lib/simplehash.h.

    And if you want to see some "policy based design" (C++ community term) in PostgreSQL C code, check out src/backend/exec/nodeHashjoin.c. It uses forced inlining to "instantiate" two different versions of the hash join executor code, one with parallelism and one without, so that we don't have all the "am I in parallel mode?" branching/code size in the generated code.
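
    In C++ the same trick falls out of a non-type template parameter; a rough analogue (run_join is invented for illustration, this is not the actual nodeHashjoin.c code) might look like:

        #include <cstdio>

        // Two copies of the loop are generated at compile time; the
        // Parallel branch is folded away in the serial instantiation.
        template <bool Parallel>
        static int run_join(int nrows)
        {
            int matched = 0;
            for (int i = 0; i < nrows; i++)
            {
                if constexpr (Parallel)
                {
                    /* coordinate with workers here (elided) */
                }
                matched++;  /* stand-in for probing the hash table */
            }
            return matched;
        }

        int main()
        {
            std::printf("%d\n", run_join<false>(10));  // serial
            std::printf("%d\n", run_join<true>(10));   // parallel
        }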

    • maccard 2 years ago

      Thanks for replying!

      > I hear ya! But I don't have the power to make PostgreSQL switch to C++. Years ago, std::sort() vs qsort() was almost a standard example of how C++ can be faster than C.

      Indeed - re-reading my initial comment it sounds like that particular part of my comment was directed at you, but it's really directed at projects like Postgres. The std::sort vs qsort example is exactly what I mean!

  • bfgoodrich 2 years ago

    > This is what generics are for in most languages.

    That goes exactly opposite of this optimization, and is just entirely orthogonal.

    > It's also super disappointing that in 2022 we're still manually marking 10-line functions used in one place and called from one site as inline.

    You misunderstand the change.

    In a nutshell the prior version was a polymorphic function pointer that varied by the sort type.

       (*sort_ptr)(data);
    
    No compiler can inline this, no matter how aggressively you turn up the optimizations.

    Their change adds specialized cases.

       if (dataType == PG_INT32) {
          sort_int32(data);
       } else if (dataType == PG_INT64) {
          sort_int64(data);
       } ... {
       } else {
          (*sort_ptr)(data);
       }
       
    This enabled the inlining, and while it's no harm to add the inline specifier if you want to be very overt in your intentions, most compilers would inline it regardless in most scenarios.

    • vlovich123 2 years ago

      In C++, the sort algorithm is polymorphic and inlined because it automatically generates those sort variants for you.

      OP is not trying to say "the code as is" but "written in C++ style".

      I don’t think they misunderstood anything (and their beef with seemingly unprofiled use of always inline attribute is probably correct too)

      • bfgoodrich 2 years ago

        > I don’t think they misunderstood anything

        The code could have implemented the type-specific sorts as macros. They could have literally written the code blocks in the actual sort function.

        Instead they decided to make them separate functions and mark them inline because that is their overt intention and they thought it was cleaner.

        What is the problem? Honestly, aside from noisy, useless griping, what is possibly the problem with that?

        Their implication seems to be that inline shouldn't be necessary. And the truth is that it almost certainly isn't necessary, and any optimizing compiler would inline regardless. But clarifying their intentions is precisely what developers should do: "We're moving it here for organizational/cleanliness purpose, but it is intentionally written this way to be inlined because that is the entire point of this optimization".

        As to the "unprofiled" use, their specific optimization was made because they knew time was being wasted in function calls, and the optimization was to avoid that.

        • macdice 2 years ago

          (Author of the code here) Yeah :-)

    • wahern 2 years ago

      > In a nutshell the prior version was a polymorphic function pointer that varied by the sort type.

      > (*sort_ptr)(data);

      > No compiler can inline this, no matter how aggressively you turn up the optimizations.

      LTO can inline these just fine, provided the object files contain the necessary info (i.e. the IR or assembly from static archives, which is how -flto works for clang and GCC). I even tested this using a copy[1] of OpenBSD's qsort implementation a few weeks ago, verifying that both the sort algorithm itself as well as the comparator were inlined, just as would happen with C++ template'd vector sorting.

      [1] I had to use a copy of qsort because the compiled libc qsort lacks the accompanying info necessary for LTO. But in theory libc and shared libraries in general could ship critical routines with the necessary metadata required for LTO optimizations. (IOW, a cross between a dynamic and static library.) Fundamentally this is just a toolchain issue, not a language issue. AFAIU, Rust code works much the same way, and its comparator functions tend to be inlined because Rust effectively compiles using LTO (and heavily relies on these implicit optimizations), except that because all Rust code is compiled statically, the necessary IR is always available. The same would be true of Go, but the Go compiler doesn't optimize as heavily. FWIW, AFAIU the C++ standard doesn't guarantee a templated sort comparator would get inlined, either, even if you're using a lambda. But this optimization is extremely likely when the sort algorithm itself is template-expanded and the comparator is simple--as almost all would be.
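
      A toy version of that experiment, for anyone who wants to reproduce it (cmp_int is just an illustrative name; as noted above, the qsort body itself has to be visible to the LTO build, e.g. compiled from source with something like -O2 -flto, for the comparator to actually get inlined):

          #include <cstdio>
          #include <cstdlib>

          // Classic qsort usage: the comparator is passed as a function
          // pointer, which can't be inlined into the library's sort loop
          // without LTO-style whole-program information.
          static int cmp_int(const void *a, const void *b)
          {
              int x = *static_cast<const int *>(a);
              int y = *static_cast<const int *>(b);
              return (x > y) - (x < y);
          }

          int main()
          {
              int v[] = {3, 1, 2};
              std::qsort(v, 3, sizeof(int), cmp_int);
              std::printf("%d %d %d\n", v[0], v[1], v[2]);
          }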

    • samhw 2 years ago

      >> This is what generics are for in most languages.
      > That goes exactly opposite of this optimization, and is just entirely orthogonal.

      What does that mean? Also, how is it "exactly opposite" and "orthogonal"? Maybe let's get rid of the mixed metaphors and speak in concrete terms.

      In your comment below ("why would you want the compiler to do that? you can write out all the monomorphizations yourself!") I'm not totally certain that you understand the point of generics. You can always write out each case yourself. Incidentally that also applies to your precious inlining. But it's a tool to make programmers more productive.

MR4D 2 years ago

It is this type of analytics and discipline that I love about the PG team.

It takes a lot of dedication and patience to do, but I wish more software teams did this. Perhaps schools should put more emphasis on this type of analysis.

Maybe then Call of Duty would load in a reasonable time. (sorry, pet peeve of mine)

  • maccard 2 years ago

    I work in video games and have done my fair share of perf work. Pretty much every game does this kind of work, and all the live service games have had a full team dedicated to ongoing performance work.

    I could easily sit and throw shade that Twitter takes longer to show my timeline than most ps5 games do to boot up and load into game, but that doesn't help anyone.

    • hinkley 2 years ago

      It's typical in non-gaming teams to have any instinct to optimize beaten out of you instead of guided.

      Knuth's comments about premature optimization have been so misquoted at this point that we've gone way beyond people using it as permission not to think, they also use it as a mandate to stop other people from thinking as well.

      Academics are particularly bad at falling back on aphorisms, so changing the curriculum to fix this would also involve changing the teachers, which is an automatic conversation killer.

      In my case I was keeping my own counsel before anyone tried to beat it out of me, and I learned to just keep things to myself. On the plus side I know dozens of ways to refactor for legibility and 'accidentally' improve performance at the same time.

      • HideousKojima 2 years ago

        >On the plus side I know dozens of ways to refactor for legibility and 'accidentally' improve performance at the same time.

        My favorite is "That nested loop is kind of hard to read, how about making a dictionary/hashmap instead?"
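
        In the same spirit, a hypothetical before/after sketch (the Order struct and totals_by_user are invented names): instead of re-scanning the orders list inside a loop over users, build a map of totals once and look it up.

            #include <unordered_map>
            #include <vector>

            struct Order { int user_id; double total; };

            // Before: for each user, loop over every order again (O(n*m)).
            // After: build the totals once, then do O(1) lookups.
            std::unordered_map<int, double>
            totals_by_user(const std::vector<Order>& orders)
            {
                std::unordered_map<int, double> totals;
                for (const auto& o : orders)
                    totals[o.user_id] += o.total;
                return totals;
            }

            int main()
            {
                std::vector<Order> orders{{1, 9.5}, {2, 3.0}, {1, 2.5}};
                auto totals = totals_by_user(orders);
                return totals[1] == 12.0 ? 0 : 1;
            }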

        • hinkley 2 years ago

          I'll split this operation into a variable so it's easier to debug. Oh, now the variable is defined outside of the loop. How did that happen.

          • touisteur 2 years ago

            If your compiler is not doing expression hoisting at -O2 in 2022, we should (collectively) ask for a refund (I know I'd be opening tickets since we pay for compiler support).

            Of course, doing that very often improves readability too, so it might still be a good use of your time.

    • zerocrates 2 years ago

      There's also the factor of what kinds of performance you care about and are thinking about, and what you're not.

      There was the memorable story shared around here a while ago of a player patching GTA V's live service online to eliminate multiple-minute load times [1]. I'm sure Rockstar had/has many people working on performance (the core game is a well-known good performer and now has been on three different console generations since launch) but just, not this.

      (Though to totally miss such truly extensive load times on your main revenue-generating component, that one's a headscratcher. Feels like something you'd end up addressing just for your own internal sanity. Maybe everyone just had monster workstations and didn't notice.)

      [1] https://news.ycombinator.com/item?id=26296339

    • MR4D 2 years ago

      You make a good point, and I am aware of that. Frankly, the engineering that goes into any FPS (even the bad ones) is beyond amazing, especially when you add networking latencies to the complexity.

      I should have chosen websites for my rant, but CoD has been on my mind this week due to the weird way you have to keep getting into a game every time and go through what seems like multiple home screens in a way that is either a bug, or just poorly thought out. Either way, it is a weird waste of time, and annoying when you're trying to play two back-to-back games.

ollysb 2 years ago

It seems a little surprising that GIN indexes have never been improved to allow for efficient sorting. Is this just not in enough demand to receive any attention or is there a technical aspect that makes it particularly difficult?

  • mattashii 2 years ago

    I'm not familiar with the full details, but GIN indeed has some issues that make presorting difficult and expensive:

    It has a guarantee of only one item in the tree for each key. All table rows that share that key are stored under that one item, as one large value. This value can thus become really large (and is stored as a subtree in such cases).

    To sort these tuples, you can either implement that as 'merge key/value pairs when keys are equal during the sort phase' or 'merge key/value pairs when keys are equal during the index load phase'.

    The first requires handling of arbitrarily large KV pairs (potentially larger than the normal page size, which indexes generally don't handle well) and losing sort elements during the sort (due to merges of values; I'm not sure sort algorithms can work with that).

    The second requires a significantly larger temporary space to store the temporary data (potentially several times the size of the original data).

    Additionally, someone must first consider this enough of an issue to try to fix, and apparently nobody has yet thought it worth taking a stab at.

gampleman 2 years ago

Could the specialisation of sort functions be further improved by using e.g. something like Radix sort for integers?
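
For context, a minimal LSD radix sort over unsigned 32-bit keys looks like the sketch below (radix_sort_u32 is just an illustration); a real specialization in a database would also have to deal with signed values, NULL ordering, and the tuple payloads attached to each key, so this is only meant to show the shape of the idea.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Sort 32-bit keys one byte at a time with a stable counting sort.
    static void radix_sort_u32(std::vector<uint32_t>& v)
    {
        std::vector<uint32_t> tmp(v.size());
        for (int shift = 0; shift < 32; shift += 8)
        {
            size_t count[257] = {0};
            for (uint32_t x : v)
                count[((x >> shift) & 0xFF) + 1]++;
            for (int i = 0; i < 256; i++)
                count[i + 1] += count[i];
            for (uint32_t x : v)
                tmp[count[(x >> shift) & 0xFF]++] = x;
            v.swap(tmp);
        }
    }

    int main()
    {
        std::vector<uint32_t> v{42, 7, 100000, 3};
        radix_sort_u32(v);
        for (uint32_t x : v)
            std::printf("%u ", x);
        std::printf("\n");
    }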

Jamie9912 2 years ago

Great stuff. What's the best way to update a major version of Postgres when it comes out?

  • speedgoose 2 years ago

    It's difficult to say without knowing your situation. Personally I would wait a few months at least to be sure that the biggest bugs are killed. Then if it's a new project or something not critical, I may upgrade without any precaution and hope for the best. For something more critical I would rely on the integration tests and manual testing in a pre-production environment for a few weeks. Before the final upgrade, I would finally do a few practice upgrade runs and be ready to roll back if needed.

    • altendo 2 years ago

      > Personally I would wait a few months at least to be sure that the biggest bugs are killed.

      This is my default behavior with any database product. Solid advice.

  • dewey 2 years ago

    This is just a data point, but in my experience over the past few years Postgres updates are usually nothing to worry about. There are other systems where updates come with a lot more baggage attached (like Elasticsearch), where you have to refactor schemas and stay up to date with the documentation way more than with PG.

    • jon_richards 2 years ago

      Homebrew on mac has a long-standing upgrade bug with Postgres extensions like postgis. It uninstalls the old postgis before copying the data over and then gets stuck in a hard to recover state. I always assume I’m going to lose all my data whenever I update major versions.

darksaints 2 years ago

One of the things I have been consistently excited about was the new pluggable storage API, and I had followed a handful of projects that were working on plugins, like zedstore, zheap, and cstore. But now it seems like all of those efforts have stagnated. Any idea what happened?

  • ccleve 2 years ago

    The difficulty is that the new table access method is very complex and hard to get right. I'm building something using a regular index access method now and it's hard enough. Once I get all that infrastructure working properly I may take another crack at a table access method.

    For some purposes you can get away with a foreign data wrapper. FDWs are a lot simpler.

  • jfbaro 2 years ago

    orioleDB is another one

JacKTrocinskI 2 years ago

They added the MERGE statement!

...and better defaults for the public schema.

Really looking forward to Postgres 15.

hinkley 2 years ago

I’m surprised they use quicksort. I would think in a database you’d value predictability over best case behavior.

Also is it a stable quicksort?

zeckalpha 2 years ago

One graph has sorting a 10GB table and refers to it as a “Large sort”… is 10GB large any more?

  • d_watt 2 years ago

    Not for OLAP, but this seems to be about OLTP. If you're trying to sort 10GB of data for a traditional OLTP query, I would consider that "too large" and in need of some data structure optimization.

    Also it's specifically about queries where the sorted data is too big for memory:

    > The final change is aimed at larger scale sorts that exceed the size of work_mem significantly. The algorithm which merges individual tapes was changed to use a k-way merge. When the number of tapes is large, less I/O is required than the original polyphase merge algorithm.

    • hinkley 2 years ago

      It is a common complaint in discussions about sort algorithms that sorting plain numbers is a degenerate case and 'real' cases involve at least pointer chasing, which can make the cost of compares stop being constant time, or give them a constant cost that is greater than log(n) (or worse, n).

      Sometimes locality of reference may increase the total number of comparisons but decrease the average cost considerably. For practical applications like databases, it's the runtime that matters, not the theory.

  • dagw 2 years ago

    "Large" in this context is stated to mean significantly larger than work_mem, which defaults to 4MB out of the box. Interestingly performance (relative to PG14) seems to get worse as work_mem increases and the ratio of table size to work_mem decreases. If this holds it should mean that for a fixed work_mem PG15 performance (again relative to PG14) should improve as table size increases.

    • davidrowley 2 years ago

      (blog author here) Yes, thanks for raising this. I didn't manage to complete my analysis for the regression at 64MB work_mem before publishing the blog. I drew up my analysis on https://www.postgresql.org/message-id/CAApHDvqXpLzav6dUeR5vO... shortly after publishing. Basically it relates to the generation memory context having a slightly larger chunk header than the original aset context combined with the tuple width of my test table being exactly 64 bytes, which results in no power-of-2 rounding savings. This causes slightly fewer tuples to be stored per batch in PG15, which results in more batches (tapes) and slower performance. If I'd added another column to that table the results would have been very different. PG14 would have rounded the allocation size up to 128 bytes and only about half the tuples would have been sorted per tape. The allocation size would be 72-bytes in PG15 with an extra column in the test table.

    • pigcat 2 years ago

      > performance (relative to PG14) seems to get worse as work_mem increases and the ratio of table size to work_mem decreases

      Do you have any source for this? This is the first time I've ever heard this and it seems contrary to everything I've read.

nitinreddy88 2 years ago

"The first test I ran while working on reducing the memory consumption of sort improved performance by 371%"

This is impressive in terms of performance for SQL Server operations. Imagine the end performance benefits for analytical/warehouse model of queries.

  • mattashii 2 years ago

    > SQL Server operations

    I might be misunderstanding you, but this is a post about PostgreSQL, not Microsoft's SQL Server.

    > performance benefits for analytical/warehouse model

    That performance result was for a fully in-memory sort operation, which you are less likely to see in data warehouses.

    • dspillett 2 years ago

      > but this is a post about PostgreSQL, not Microsoft's SQL Server.

      I assume that what was meant is what I'd call a “SQL engine” or “SQL query runner”. It is unfortunate that MS has a habit of using common words in its product names. PG is a SQL server (no capital) in the sense that it is a server that processes SQL statements for retrieving and manipulating data.

aftbit 2 years ago

Ah yes, good old:

    # set client_min_messages TO 'debug1';
winrid 2 years ago

Awesome! Will have to benchmark against the latest Mongo.

The best solution however, if you can afford it, remains to have an index that stores the values pre-sorted. These changes are still cool, though.

mkurz 2 years ago

Great work!