I think the problem with the quote is that everyone forgets the line that comes after it.
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
vvvvvvvvvv
Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.
^^^^^^^^^^
This makes it clear, in context, that Knuth defines "Premature Optimization" as "optimizing before you profile your code"
@OP, I think you should lead with this. I think it gets lost by the time you actually reference it. If I can suggest, place the second paragraph after
> People always use this quote wrong, and to get a feeling for that we just have to look at the original paper, and the context in which it was written.
The optimization part gets lost in the middle, and this, I think, could provide a better hook for those who aren't going to read the whole thing. The way you wrote it works well for attentive readers, but the point (IMO) will be missed by more inattentive ones. The post is good, so this is just a minor critique; I want to see it do better.
Amdahl’s Law is the single best thing I learned in 4 years of university. It sounds obvious when spelled out but it blew my mind.
No amount of parallelization will make your program faster than the slowest non-parallelizable path. You can be as clever as you want and it won’t matter squat unless you fix the bottleneck.
This extends to all types of optimization and even teamwork. Just make the slowest part faster. Really.
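For anyone who wants to see the arithmetic: if a fraction p of the work can be spread across s workers, the best-case speedup is 1 / ((1 - p) + p/s). A minimal sketch (the numbers are arbitrary):

    def amdahl_speedup(p, s):
        """Best-case speedup when a fraction p of the work
        is spread across s parallel workers."""
        return 1.0 / ((1.0 - p) + p / s)

    # A 10% serial portion caps you at 10x, no matter how many workers you add.
    print(amdahl_speedup(0.90, 16))         # 6.4
    print(amdahl_speedup(0.90, 1_000_000))  # ~10.0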
While Amdahl’s Law is very important, its practical effects are very frequently overestimated, at least as frequently as Knuth is misquoted.
Simple problems, e.g. solving a system of equations, will usually include some non-negligible sequential part, which, according to Amdahl’s Law will limit the amount of speed-up provided by hardware parallelism.
On the other hand, complex problems, e.g. designing an integrated circuit, can usually be decomposed into a very great number of simpler subproblems whose dependencies on each other are weaker than the dependencies between the parts of a single subproblem. By distributing the execution of the simple subproblems over parallel hardware, with each subproblem executed sequentially, you can obtain much greater acceleration factors than by attempting to parallelize the execution of each simple subproblem.
With clever decomposition of a complex problem and with good execution planning for its subtasks, it is much easier to approach the performance of an embarrassingly parallel problem, than when trying to find parallel versions of simple algorithms, whose performance is frequently limited to low values by Amdahl’s Law.
Amdahl’s Law frequently prevents you from reducing the execution time of some task from 1 minute to 1 second, but it normally does not prevent you from reducing the execution time of some task from 1 year to 1 week, because a task so complex as to have required weeks, months or years before parallelization normally contains a great enough number of weakly-coupled subproblems.
I find Amdahl's Law very useful for parallelizable work as well.
Or at least, I find it trivially scalable to the parallel case. It has often helped me model how the interconnect could be the limiting element, even for embarrassingly parallel tasks.
Rather, the slowest non-parallelized path. Ultimately you may reach maximum speed on that path, but the assumption that we are close to it often turns out to be poorly considered, or considered before 8 other engineers added bug fixes and features to that code.
From a performance standpoint you need to challenge all of those assumptions. Re-ask all of those questions. Why is this part single threaded? Does the whole thing need to be single threaded? What about in the middle here? Can we rearrange this work? Maybe by adding an intermediate state?
> It sounds obvious when spelled out but it blew my mind.
I think there's a weird thing that happens with stuff like this. Cliches are a good example, and I'll propose an alternative definition to them.
A cliche is a phrase that's so obvious everyone innately knows or understands it; yet, it is so obvious no one internalizes it, forcing the phrase to be used ad nauseam
At least, it works for a subset of cliches. Like "road to hell," "read between the lines," Goodhart's Law, and I think even Amdahl's Law fits (though certainly not others, e.g. some are bastardized, like Premature Optimization or "blood is thicker than water"). Essentially they are "easier said than done," so they require system 2 thinking to resolve, but we act like system 1 will catch them.
Like Amdahl's Law, I think many of these take a surprising amount of work to prove despite the result sounding so obvious. The big question is if it was obvious a priori or only post hoc. We often confuse the two, getting us into trouble. I don't think the genius of the statement hits unless you really dig down into proving it and trying to make your measurements in a nontrivially complex parallel program. I think that's true about a lot of things we take for granted
Well, sort of. The specific point Justice Oliver Wendell Holmes was making is pretty much the same as how it’s used today: some speech is so obviously harmful that it can’t be protected by freedom of speech. The problem is that Holmes was using it as an example of what even “the most stringent protection of free speech” would leave unprotected, and he went on to interpret the First Amendment rather less stringently, ruling that it didn’t protect anti-draft flyers.
Before optimizing, I always balance the time I'll need to code the optimization and the time I (or the users of my code) will effectively gain once the optimization is there (that is, real-life time, not CPU time).
If I need three weeks to optimize a code that will run for 2 hours per month, it's not worth it.
But by not optimizing, you don't grow your profiling/optimizing skills and you miss out on a reduction in how long optimizing takes you for future work. Therefore many more codes will not meet the threshold and your skills may not grow for a long time.
You couldn't know, but my job is 50% about optimizing computational workloads. But many times, when questioning my users, it turns out that they want an optimisation for some code that will run 2 or 3 times. So even though they'll have to wait a week for the computation to run, it would take me just as long to write the optimisation :-)
But if code happens to be used 10 times a week and takes about a day or two to run, it's a no-brainer: spending a month optimizing for a 10% speed increase is worth it!
The one question that needs to be asked is would the users run it more often if it didn't take so long? There is nothing a computer can do that a room full of clerks in 1800 couldn't do, but the runtime would be so slow (or the cost in clerks so high) that nobody dared ask those questions.
Exercise for the reader: given an unlimited budget to hire 1800s clerks, how many FPS could you achieve running Doom? (Obviously the number is too low to make the game playable.)
... to the tune of about 10 cents a second, yeah. If you can hire a dev team to optimize one of those out, you plausibly come out ahead (well, vs face value).
That's $6/minute, $360/hour. Granted, if you're using it at a scale approaching FTE costs, you probably want at least some reserved capacity and you're likely not paying face value, but then the comparison was never going to be rigorous anyway.
Working for a smaller digital agency, first on .NET Framework and then moving to .NET Core (now just .NET), this savings was felt. Using a performance profiler and slightly complicating the code a bit (at least relative to the documentation for the CMS we use), I was able to get the site running on less than the minimum recommended requirements for the base CMS software, without having to alter the source of the CMS. I did this with the Umbraco CMS, which I started using back in 2013 when the documentation was lacking, so reading the source code was a requirement to get as much as possible out of the software.
I didn't get that impression from their response. I mean I could be wrong, but in context of "use a profiler" I don't think anything you said runs counter. I think it adds additional information, and it's worth stating explicitly, but given my read yours comes off as unnecessarily hostile. I think we're all on the same page, so let's make sure we're on the same side because we have the same common enemy: those who use Knuth's quote to justify the slowest piece of Lovecraftian spaghetti and duct tape imaginable
Nearly all the awful spaghetti code I've seen started out as good code, but as requirements changed in unexpected ways fitting the new features of version 5.0 while keeping all the previous version features working resulted in a mess that nobody knows how to clean up without the expensive big rewrite.
This has been my experience as well. A real "the road to hell is paved with good intentions" situation. No one was intentionally acting malicious nor trying to make bad code, but the ultimate truth is that software "rots". We write for static environments, but the environment isn't static.
I'd expect even rather junior programmers to recognize that the more time you spend working on a project the more new and unexpected things you find. New features, better ways to implement things, whatever. And ultimately, we all look back at code we wrote a year ago and are going to say "what idiot wrote this garbage?" I mean, if you don't, it is probably a sign that you aren't improving. It's impossible to write perfect code, and even if it was, what would be "perfect" is a moving target. So either way, you should be writing better code today when compared to yesterday.
> without the expensive big rewrite.
While it isn't possible to completely put this off, I find there's a helpful tactic to reduce spaghetti-creep. "Maintenance is cheaper than repair." People say "don't fix what isn't broken" but I think that's wrong. You should "fix" things before they are broken. Because frankly, it is cheaper to replace your visibly degrading water pipe than it is to wait for it to burst. You not only have to pay for the new pipe, but all the damage it caused when it broke. Maintenance increases longevity and helps you monitor so you can replace things before they break. Doesn't matter if you're plumbing water pipes or plumbing spaghetti, the tactic is the same.
Just remember that there can be hundreds of bottlenecks. Your second slowest matters too. And sometimes there are different requirements - the UI needs to respond within 1/10th of a second (sometimes much faster, sometimes much slower). Users can often live with your main calculation taking 10x longer than the optimized version, so long as there is fast feedback that you are working. Eventually they will want the calculations faster too, but taking a minute off of those is less valuable than a few ms off of your UI feedback.
you're still releasing resources - so you might not become faster overall, but you can compute more in the same time later if the need arises (although that might be somewhat premature, it can be good for library code - so it becomes more applicable in different environments)
and there are some rare but tricky scenarios, like when the hardware is a mobile phone:
- the app seems to be bottlenecked on arithmetic according to the profiler, so it feels obvious to start optimizing there
- in reality, the hardware has a power limit, so it can't give full power to the CPU, GPU and memory all at the same time
- since the app uses too much memory, power has to be redirected there; memory also emits heat, so the CPU gets throttled
- by optimizing memory, everything runs colder -> the CPU gets more power and can sustain higher frequencies, so the app becomes faster
This one is more dangerous, as there may be backend resources in use that could be optimized, which could drop costs drastically. This may not change anything for your users, but is definitely worth exploring when it comes to getting the most out of your investment.
His boss is essentially making the same argument as Knuth: Spend your time optimizing what benefits the most from optimization. Or in other words, prioritize, don't optimize blindly.
It's a single line phrase, I wouldn't interpret it too literally. Usually you got to be pretty liberal when interpreting those easy to remember lines. It's a lot harder to remember the literally multiple books we could fill when bringing up exceptions.
There is also the "death by a thousand cuts" kind of slowness that accrues over time and it doesn't really matter where you start peeling the onion and the part you started is rarely the best.
It is exactly this "lulled into complacency" that I rail against when most people cite that line. Far too many people are trying to shut down dialogue on improving code (not just performance), and they're not above an Appeal to Authority in order to deflect.
"Curiosity killed the cat, but satisfaction brought it back." Is practically on the same level.
If you're careful to exclude creeping featurism and architectural astronautics from the definition of 'optimization', then very few people I've seen be warned off of digging into that sort of work actually needed to be reined in. YAGNI covers a lot of those situations, and generally with fewer false positives. Still false positives though. In large part because people disagree on when "the last responsible moment" is, in part because our estimates are always off by 2x, so by the time we agree to work on things we've waited about twice as long as we should have and now it's all half-assed. Irresponsible.
I'm with you, and been on a bit of a rampage about it lately. Honestly, just too much broken shit, though I'm not sure what the last straw was.
A big thing I rail against is the meaning of an engineer and that our job is about making the best product, not making the most profitable product (most times I bring this up someone will act like there's no difference between these. That itself is concerning). The contention between us engineers and the business people is what creates balance, but I'm afraid we've turned into yesmen instead. Woz needs Jobs, but Jobs also needs Woz (probably more than the other way around). The "magic" happens at the intersection of different expertise.
There's just a lot of weird but subtle ways these things express themselves. Like how a question like "but what about x problem" is interpreted as "no" instead of "yes, but". Or like how people quote Knuth and use it as a thought terminating cliche. In ML we see it with "scale is all you need."
In effect, by choosing to do things the easy way we are choosing to do things the hard way. This really confuses me, because for so long in CS the internalization was to "be lazy." Not in the way that you put off doing the dishes now, but in the way that you recognize that doing the dishes now is easier than doing them tomorrow, when you 1) have more dishes and 2) the dishes you left out are harder to clean as the food hardens on the plate. What happened to that "efficient lazy" mindset, and how did we turn into "typical lazy"?[0]
One of the aphorisms I operate by is that when the order of magnitude of a problem changes, the appropriate solution to that problem may also need to change.
Here we are sitting at four to seven orders of magnitude separated from Knuth, depending on whether you mean number of devs or number of machines or size of problems tackled.
Size of machines is pretty amazing. The PDP-10s and 360/67s Knuth was talking about in 01974 were about 1 MIPS (with 32-bit or 36-bit operations) and could scale as high as 16 mebibytes of RAM. Today you can put 6 tebibytes in a 384-core two-socket AMD server that can do in excess of 10 trillion 32-bit operations per second: 6 orders of magnitude more RAM, 7 orders of magnitude more arithmetic.
But that's really understating the case, because those were large shared mainframes. Today's equivalent would be not a 2-socket rackmount server, or even a whole rack, but quite plausibly a whole data center, three more orders of magnitude. 9 orders of magnitude more RAM, 10 orders of magnitude more arithmetic.
Probably also worth mentioning that the absolute number of computers has also increased a lot; every USB-C power brick contains a multi-MIPS computer.
I agree that the number of devs has increased by only about four or five orders of magnitude. In 01974 I'd guess there might have been 50,000 programmers; my grandparents took programming classes around that time involving batch submission of punched cards, and that was a reasonably common thing at US universities. Today, Roblox has 380 million monthly active users; what percentage of them write their own games in it? And the popular programming environments Microsoft Excel and Google Sheets have over a billion users each.
> It is exactly this "lulled into complacency" that I rail against when most people cite that line. Far too many people are trying to shut down down dialog on improving code (not just performance) and they're not above Appeal to Authority in order to deflect.
Your comment reads like a strawman argument. No one is arguing against "improving code". What are you talking about? It reads like you are misrepresenting any comment that goes against your ideas, no matter how misguided they are, by framing your ideas as obvious improvements that can only be conceivably criticized by anyone who is against good code and in favor of bad code.
It's a rehash of the old tired software developer cliche of "I cannot do wrong vs everyone around me cannot do right".
Ironically, you are the type of people Knuth's quote defends software from: those who fail to understand that using claims of "optimization" as a cheat code to push through unjustifiable changes does not improve software, but actually makes it worse.
> "Curiosity killed the cat, but satisfaction brought it back." Is practically on the same level.
This is the same strawman. It's perfectly fine to be curious. No one wants to take the magic out of you. But your sense of wonder is not a cheat code that grants you the right to push nonsense into production code. Engineers propose changes based on sound reasoning. If the engineers in your team reject a change it's unlikely you're surrounded by incompetent fools who are muffling your brilliant sense of wonder.
> If you're careful to exclude creeping featurism and architectural astronautics from the definition of 'optimization', (...)
Your comment has a strong theme of treating anything not aligned with your personal taste as extremely bad, and everything aligned with it as unquestionably good and opposable only through an almost conspiratorial persecution. The changes you like are "improving code", whereas the ones proposed by third parties that you don't like receive blanket accusations such as "creeping featurism and architectural astronautics".
Perhaps the problems you experience, and create, have nothing to do with optimization? Food for thought.
> No one is arguing against "improving code". What are you talking about?
This is actually a frequent occurrence. But honestly it usually doesn't come with that exact same phrasing. It usually comes with "let's spend our time on this other thing that isn't important but is new therefore important". It's true, sometimes you need to triage, but there's a clear bias that once something "works" (no matter how poorly) there is much less incentive to make it work not poorly.
Some food for thought: your comment entirely relies upon wisdom of the crowds. Which I agree is usually a good bet to go with. But there are conditions where it fails, even at scale. It's actually fairly easy to upend. All you need to do is remove independence of actors. Which, you can probably guess is fairly common in our field. The more homogeneous we become the less reliable wisdom of the crowds is.
Plus, people have different experiences and different environments. What has been true is your experience need not be true in mine or any others.
I'll give you an example of irrationality here. I was working at a big tech company and, as part of breaking down my project, I was playing around with another part of the code. I more than doubled the prediction accuracy on customer data and made the code run faster and use less memory. Objectively better. It even happened "for free" because the tasks involved were part of my original job, though improving it was not explicitly the goal. What happened? Last I know, the PR is still open. Boss never pushed for it because they were more invested in their other work that was yet to be completed but had more buzzwords. That work was only a bit more accurate than mine but 100x slower and required 10x more memory. I even told them what I did would work for the other one too, yet it never happened.
I've seen many instances like this. Sometimes people just don't want things to be better unless it's better in a very specific kind of way (and not necessarily via performance). Sometimes it's just politics.
On the other hand, I've seen very good teams with the exact opposite experience. It's a very common thread that those teams openly discuss and explain why a seemingly good idea is actually bad. Frequently, they'll let you have a go at it too, because it's a no-lose situation. If you're right, we all win. If you're wrong, there's an important lesson about the code to be learned, because there's hidden complexity that makes the seemingly good idea bad, and that's often hard to explain. Either way you end up with a lot more knowledge about the code, which helps you in the long run.
> This is actually a frequent occurrence. But honestly it usually doesn't come with that exact same phrasing. It usually comes with "let's spend our time on this other thing that isn't important but is new therefore important". It's true, sometimes you need to triage, but there's a clear bias that once something "works" (no matter how poorly) there is much less incentive to make it work not poorly.
Exactly. “It technically functions and therefore doesn’t need attention” has become an industry norm. There’s a massive bias towards piling on more features over making older ones good.
> “It technically functions and therefore doesn’t need attention” has become an industry norm.
Most commonly seen with the aphorism "don't fix what isn't broken." But I really hate that aphorism. There's some truth to it, but it is commonly used as a thought terminating cliche. If you see a rusty pipe you should sure as hell fix it. Sure, it isn't "broken" in the sense that it has burst and is leaking water everywhere, but it sure is a ticking time bomb. And boy, is it a hell of a lot cheaper to fix a rusty pipe that isn't leaking than to fix a burst pipe and also pay for all the water damage.
The aphorism misses that "maintenance" is much cheaper than "repair". It misses that you can get major cost savings by doing things differently. Sure, it costs more in the moment, but are we trying to run a business paycheck to paycheck or are we trying to create something sustainable? I sure hope your business is thinking more than just 3 months out. "More expensive now" is far too common an excuse; I've seen it used to justify not implementing things that would have an ROI within 6 months! Passing on anything with under a year's ROI is absolutely batshit insane (unless you're a startup living on the edge or something, but I see this in big tech companies, and I hope we understand that's the context here).
I became the SME for batch processing on this project. The whole project was a read heavy workflow that should have had all decisions made at write time but was so far in the weeds when I got there that I couldn’t pull it out. And everyone I convinced of this fact decided to work somewhere else instead of stay and help fix it.
But there were parts we could sort out at build or deploy time, and I was able to fix a number of those.
One guy was tasked with a project I got turned into an epic: build fallback error pages at deploy time and push them into the CDN. I told him not to build it the way he was, but to copy the one I had been working on. I got busy and he left, and we discovered that the pages hadn't been updating when a customer changed their contact info; we noticed months later that we still had the old info.
The build trigger had no downstream dependencies and there was no alert being sent on failure. The job was timing out for reasons similar to the ones I’d fixed. I tried to bandaid it, but it still errored out an hour into what looked like at least a 100-120 minute workload.
I discovered the main problem was that our customers could have multiple domain names and he was frobbing through three or four service calls trying to find the canonical one, and making one call per customer to see if they were still customers (we had an endpoint that would paginate all active customers, so ~400x more efficient and rising due to turnover).
The main table I had been using had one column I didn’t use, which had a url in it. I confirmed with our SME that this was in fact the canonical domain name, I just had to add the field to the QL and do a `new URL(url).hostname`. At the end of the day I ended up extracting about half the code from my tool, deleting over half of his code and replacing it with calls into mine (do it like I did it, but literally).
Four and a half minutes, and no more intermittent failures, because my version was built on back pressure to throttle requests under high server load, and total outbound requests around 1/8 of the original.
This just kinda made me think: are there fewer war stories being shared on HN? It sure feels like it. What got me thinking is how sharing these types of war stories is an effective way to teach and learn from one another. Maybe it's just my bubble, but I feel like I hear them infrequently. Anyways, this was a good one. Thanks for sharing.
A lot of people are still thinking of Knuth's comment as being about just finding the slow path or function and making it faster. What Knuth has actually talked about, however, and what any senior engineer who cares about performance has either been taught or discovered, is that most real optimization is in the semantics of the system and - frankly - not in optimizing things but in finding ways not to do them at all.
Tuning the JSON parser is not nearly as effective as replacing it with something less stupid, which is, in turn, not nearly as effective as finding ways to not do the RPC at all.
Most really performant systems of a certain age are also full of layer violations for this reason. If you care about performance (as in you are paid for it), you learn these realities pretty quickly. You also focus on retaining optionality for future optimizations by not designing the semantics of your system in a way that locks you permanently into low performance, which is unfortunately very common.
Another example of this is using the right algorithm for the job. I see a lot of code like let's say you have two lists and you need to go through the first one and find the corresponding element by id in the second. The naive implementation uses linear search of the second list, making the algorithm O(n^2). This will stall and take hours, days or longer to run when the lists get large, say into the tens to hundreds of thousands of elements.
If both lists come from your own database then you should have used a foreign key constraint and had the database join them for you. If you can't do that, let's say one or both lists come from a third-party API, you can create a dictionary (hashmap) of the second list to allow O(1) lookup rather than O(n), which brings your full algorithm's complexity to O(n). Now you can have millions of elements and it'll still run in milliseconds or seconds. And if this is something you need to do very often, then consider storing the second list in your database so you can use a join instead. Avoid doing the work, as you say.
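A minimal sketch of the before/after (the field names "id" and "other_id" are made up):

    first_list = [{"other_id": 2}, {"other_id": 1}]
    second_list = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

    # O(n^2): linear search of the second list for every element of the first.
    matches = []
    for a in first_list:
        for b in second_list:
            if b["id"] == a["other_id"]:
                matches.append((a, b))
                break

    # O(n): build a dict once, then every lookup is O(1).
    by_id = {b["id"]: b for b in second_list}
    matches = [(a, by_id[a["other_id"]])
               for a in first_list if a["other_id"] in by_id]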
These are the most common mistakes I see people make. That and situations where you make one request, wait for a response, then use that to make another request, wait for response and so on.
You don't need a profiler to avoid this type of problem, you just need to understand what you're doing at a fundamental level. Just use a dictionary by default, it doesn't matter if it might be slightly slower for very small inputs - it'll be fast enough and it'll scale.
We had one of those, except it was two loops and a python "in" on a list, so actually O(n^3). Changing the "in" to be on a set brought it from around two hours to around five minutes. And the best part is related to the topic at hand: This was a small postprocessing function, and other more experienced developers had tried to speed up the overall process but assumed the slowness was earlier in the process instead of in this function, and after failing they assumed it couldn't be sped up further. They never bothered to check which part of the process was actually slow.
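For anyone who hasn't been bitten by this yet: "in" on a list is a linear scan, "in" on a set is a hash lookup. A tiny, unscientific illustration (the sizes are arbitrary):

    import time

    items = list(range(10_000))
    as_set = set(items)

    start = time.perf_counter()
    sum(1 for x in range(10_000) if x in items)   # O(n) scan per membership test
    print("list:", time.perf_counter() - start)

    start = time.perf_counter()
    sum(1 for x in range(10_000) if x in as_set)  # O(1) hash lookup per test
    print("set: ", time.perf_counter() - start)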
Related, people not understanding that the latency alone from a DB call (especially when the DB has network-based storage, which is most of the cloud offerings) is substantial. I’ve found so many cases where people are unknowingly doing serialized queries because the ORM obscured everything, and they didn’t realize they could get everything in a single query with joins.
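The usual shape of that problem, sketched with the standard library's sqlite3 (the table and column names are invented):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    """)

    # N+1: one query for the users, then one extra round trip per user.
    users = conn.execute("SELECT id, name FROM users").fetchall()
    for user_id, name in users:
        totals = conn.execute(
            "SELECT total FROM orders WHERE user_id = ?", (user_id,)
        ).fetchall()

    # One round trip: let the database do the join.
    rows = conn.execute("""
        SELECT u.name, o.total
        FROM users u JOIN orders o ON o.user_id = u.id
    """).fetchall()

Against an in-memory database the difference is invisible, which is exactly why it sneaks into production, where every round trip costs real milliseconds.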
You can almost simplify it to simply observing that PREMATURE optimization is not the same as OPTIMIZATION.
Most people I see who get offended are reacting to the claim that optimization is never useful. But it's pretty easy to knock that claim over.
I don't deny that plenty of people use adjectives very sloppily, and that much writing is improved by just ignoring them, but Knuth is not one of them.
A pet peeve of mine is how commonly people ignore qualifying words. You can say "most people do 'x'" and you can guarantee that if you post it online, people will come out of the woodwork saying "I don't do 'x'". Sure, "most" claims can often be inaccurate, but those types of responses aren't helpful and aren't meaningfully responsive even if true. "Most" is just a very different word from "all". It implies a distribution, but I don't know why we so often treat things as discrete or binary.
I also see ignoring qualifiers as a frequent cause for online arguments. They're critical words when communicating and dropping them can dramatically change what's being communicated. And of course people are fighting because they end up talking about very different things and don't even realize it.
While I agree with you, that it's common for people to use these words sloppily, I think it is best to default to not ignore them. IME, in the worst case, it aids in communication as you get clarification.
Isn't that clearer, and at least as concise as the original, while using an idiomatic metaphor and avoiding the reference to mythical dark forces?
The operative word was always "premature". As with everything, whether or not a particular optimization is worthwhile depends on its ROI, which in the case of software depends on how often your code is going to be run. If you are building a prototype that's only going to run once, then the only optimizations that are worthwhile are the ones that speed things up by an amount comparable to the time it takes to write them. On the other hand, if you're writing the inner loop of an LLM training algorithm, a 1% speedup can be worth millions of dollars, so it's probably worth it even if it takes months of effort.
> This makes it clear, in context, that Knuth defines "Premature Optimization" as "optimizing before you profile your code"
As with that Google testing talk ("We said all your tests are bad, but that's not entirely true because most of you don't have any tests"), the reality is that most people don't have any profiling.
If you don't have any profiling then in fact every optimisation is premature.
Yes but the second half of the second half is "only after that code has been identified." So the advice is still 'don't waste your time until you profile.'
Same is true for lots of things. A classic example is that we are so leetcode-obsessed because all the people who were leetcode-obsessed got hired and promoted. We've embraced p-hacking, forgetting Goodhart's Law (and the original intention of leetcode-style interviewing: it's just the traditional engineering interview, where you get to see how the interviewee solves problems. It is less about the answer and more about the thought process. It's easy to educate people on answers, but it is hard to teach someone how to think in a different framework... how much money do we waste through this, and by doing so many rounds of interviews?)
I like this article. It’s easy to forget what these classic CS papers were about, and I think that leads to poorly applying them today. Premature optimisation of the kind of code discussed by the paper (counting instructions for some small loop) does indeed seem like a bad place to put optimisation efforts without a good reason, but I often see this premature optimisation quote used to:
- argue against thinking about any kind of design choice for performance reasons, eg the data structure decisions suggested in this article
- argue for a ‘fix it later’ approach to systems design. I think for lots of systems you have some ideas for how you would like them to perform, and you could, if you thought about it, often tell that some designs would never meet them, but instead you go ahead with some simple idea that handles the semantics without the performance only to discover that it is very hard to ‘optimise’ later.
Oh man, I hate how often this is used. Everyone knows there's nothing more permanent than a temporary fix lol.
But what I think people don't realize is that this is exactly what tech debt is. You're moving fast but doing so makes you slow once we are no longer working in a very short timeline. That's because these issues compound. Not only do we repeat that same mistake, but we're building on top of shaky ground. So to go back and fix things ends up requiring far more effort than it would have taken to fix it early. Which by fixing early your efforts similarly compound, but this time benefiting you.
I think a good example of this is when you see people rewrite a codebase. You'll see headlines like "by switching to rust we got a 500% improvement!" Most of that isn't rust, most of that is better algorithms and design.
Of course, you can't always write your best code. There are practical constraints and no code can be perfect. But I think Knuth's advice still fits today, despite a very different audience. He was talking to people who were too obsessed with optimization, while today we're overly obsessed with quickly getting to some checkpoint. But the advice is the same: "use a fucking profiler". That's how you find the balance and know what actually can be put off till later. It's the only way you can do this in an informed way. Yet, when was the last time you saw someone pull out a profiler? I'm betting the vast majority of HN users can't remember, and I'd wager a good number never have.
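And to be concrete about how low the barrier is, at least in Python: cProfile ships with the standard library (slow_function here is just a stand-in for whatever you suspect):

    import cProfile
    import pstats

    def slow_function():
        # stand-in for the code you actually care about
        return sum(i * i for i in range(1_000_000))

    cProfile.run("slow_function()", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)

Or just `python -m cProfile -s cumulative yourscript.py` and read the top of the output.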
I completely agree with most of what you've said, but personally I rarely use a profiler. I don't need it, I just think about what I'm doing and design things to be fast. I consider the time complexity of the code I'm writing. I consider the amount of data I'm working with. I try to set up the database in a way that allows me to send efficient queries. I try to avoid fetching more data than I need. I try to avoid excessive processing.
I realize this is a very personal preference and it obviously can't be applied to everyone. Someone with less understanding might find a profiler very useful and I think those people will learn the same things I'm talking about - as you find the slow code and learn how to make it fast you'll stop making the same mistakes.
A profiler might be useful if I was specifically working to optimize some code, especially code I hadn't written myself. But for my daily work it's almost always good enough to keep performance in mind and design the system to be fast enough from the bottom up.
Most code doesn't have to be anywhere near optimal, it just has to be reasonably fast so that users don't have to sit and stare at loading spinners for seconds at a time. Some times that's unavoidable, some times you're crunching huge amounts of data or something like that. But most of the time, slow systems are slow because the people who designed and implemented them didn't understand what they were doing.
> I consider the time complexity of the code I'm writing
First, I applauded you for doing this. This is very good practice and I want to encourage everyone to do it.
Second, it's important to remember that big O is very naïve, especially when you drop the constants. Your size of n can make a big difference: an O(n) algorithm can be worse than an O(n^2) one for small n, and once you account for constants it's even possible for an O(n^3) algorithm to beat an O(n) one! This throws a wrench in the analysis, making it more time consuming.
But where the profiler really shines is *you don't write all your code from scratch*. So it tells you when to use a library and when to rewrite. The only other option is to painstakingly go through every library you use.
So the profiler is a big time saver. Big O for quick and dirty first pass and profiler for moving from alpha to beta and beyond.
Don't get me wrong, I'm not full of best habits either! I definitely don't do this for most personal projects nor most research code (which is most of what I write, though I profile a different thing...) but a profiler is the most cost effective way to write *production* code. It becomes exponentially more important as code size grows and team size grows. After all, not everyone is using your best practices. So you need practices that scale and work as you relinquish control.
Unfortunately, you need to profile routinely. Luckily you can automate this and attach it to the CI pipeline, so the cost isn't high.
It can be, it isn't necessarily. And I don't care about small values of n. If I'm spinning through two lists of size 5 it doesn't matter if option A is slightly faster than option B; both options will run on the order of nanoseconds. The lower time complexity solution will continue to be reasonably fast as the input grows; pretty soon the difference will be measured in seconds, minutes, hours, days... Using a worse complexity solution for small inputs is a micro optimization you can use sometimes, but it is a micro optimization. Using the best time complexity is what I like to call a macro optimization. It's the default solution. You might deviate from it at times, but you're much better off using complexity to guide your decisions than not doing it. Once you know what you're doing you can deviate from it when appropriate; sometimes worse complexity is better in specific cases. But 99.9% of the time, the best time complexity is good enough either way.
I usually don't need the code I write to be optimal. It doesn't matter if it's a bit slower than it technically could be - as long as it's fast enough and will continue to be fast when the input grows.
Sometimes you may want to squeeze out the absolute maximum performance, and in that case you may be able to micro optimize by choosing a worse complexity solution for cases where you know the inputs will always be small. If that assumption ever breaks, your code is now a bottleneck. It can be a useful thing to do in certain niche situations, but for the vast majority of code you're better off just using the best complexity solution you can. Otherwise you may come back to find that the code that ran in a millisecond when you profiled it now takes 20 minutes because there's more data.
How do you run a profiler in CI? Do you hook it up to your tests? You need some code that runs with actual input so I guess you either profile the test suite or write tests specifically for profiling? Or maybe you write benchmarks and hook a profiler up to that?
This sounds like it could be a useful technique but very time consuming if you have to write the profiler tests/benchmarks specifically, and kind of useless if you just hook it up to your general test suite. I want my tests to run fast so I don't use big data for testing, and you won't really see where the bottlenecks are unless you're testing with a lot of data.
IFF you understand computing fundamentals, and IFF you have a solid understanding of DS&A, then yes, this is a reasonable approach. At that point, a profiler will likely show you the last 1% of optimizations you could make.
Most web devs I’ve met do not meet those criteria, but sadly have also rarely used a profiler. They’re not opposed to doing so, they’ve just never been asked to do so.
I think you're missing "IFF you write all the code and use no libraries."
The strategy works well, as you point out, for a single person but doesn't scale and can't be applied to library code without doing a deep dive into the library. Which at that point, a profiler is far more time effective.
> They’re not opposed to doing so, they’ve just never been asked to do so.
I think the best option is to integrate profiling into the CI framework. Like a git pre-commit hook or, even better, offloading it so that your CI machines profile while you're testing different environments.
I think mostly it is a matter of habit. I mean let's be honest, we probably don't care about profiling our for fun project codes. Where profiling really matters is in production. But I think the best way to normalize it is just to automate it. Which, honestly, I've never seen done and I'm really not sure why.
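Something like this is what I have in mind, as a sketch rather than a recipe (the module name, entry point, and the 30-second budget are all made up; a real setup would track trends across builds instead of a hard cutoff):

    # ci_profile.py - run the hot path under cProfile in CI,
    # keep the stats as a build artifact, and fail on a crude time budget.
    import cProfile
    import pstats
    import sys
    import time

    import myapp  # assumption: the package with the workload worth watching

    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.enable()
    myapp.main()  # assumption: a representative workload
    profiler.disable()
    elapsed = time.perf_counter() - start

    profiler.dump_stats("profile.out")  # archive per build to spot trends
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(20)

    if elapsed > 30.0:  # crude regression gate; tune per project
        sys.exit(f"profiled run took {elapsed:.1f}s, over the 30s budget")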
You also need to remember that back when those classic CS papers were written, the CPU/RAM speed ratios were entirely different from what they are today. Take, for instance, a Honeywell 316 from 1969 [0]: "Memory cycle time is 1.6 microseconds; an integer register-to-register "add" instruction takes 3.2 microseconds". Yep, back in those days, memory fetches could be twice as fast as the most simple arithmetic instruction. Nowadays, even the L1 fetch is 4 times as slow as addition (which takes a single cycle).
No wonder the classical complexity analysis of algorithms generally took memory access to be instantaneous: because it, essentially, was instantaneous.
I 100% agree. I could have written the same comment.
The biggest way I see this is picking an architecture or programming language (cough Python) that is inherently slow. "We'll worry about performance later" they say, or frequently "it's not performance critical".
Cut to two years later, you have 200k lines of Python that spends 20 minutes loading a 10GB JSON file.
I failed to make an important point in the above comment, which is that spending too much time designing the perfect thing can lead to a system that is either never finished, or one that meets the goals but doesn’t do the work it was originally intended to do, and then it’s too late to pivot to doing the right thing.
If you develop in an environment where you have high velocity (eg python) you can much sooner learn that you are building the wrong thing and iterate.
Most systems do not have very high performance requirements. A typical python web application is not going to need to service many requests and the slow stuff it does is likely going to be waiting on responses from a database or other system. The thing to make such a program faster is to send better db queries rather than optimising the O(1 web page) work that is done in python.
I know this was an example, but if you get to that point and haven’t swapped out Python’s stdlib json library for orjson or some other performance-oriented library, that’s on you.
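The swap is about two lines, for what it's worth (orjson is a third-party package; "data.json" is a placeholder):

    import orjson  # pip install orjson

    with open("data.json", "rb") as f:   # orjson works on bytes
        payload = orjson.loads(f.read())

    with open("data.json", "wb") as f:   # dumps() returns bytes
        f.write(orjson.dumps(payload))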
That's exactly the sort of after-the-fact performance workaround that you should avoid. Instead of using a faster JSON parser, you shouldn't have used JSON in the first place.
> It’s easy to forget what these classic CS papers were about, and I think that leads to poorly applying them today.
Notably, pretty much the entire body of discourse around structured programming is totally lost on modern programmers failing to even imagine the contrasts.
It is interesting that there's so much discourse about the effort people have to put into data structure and algorithm study for interviews, but then people refuse to take advantage of the knowledge that studying gives them for trivial-effort optimizations (aka your code can look pretty similar, just using a different data structure under the hood, for example).
That's because people don't get it. You see people saying things like "It's pointless to memorize these algorithms I'll never use" - you're not supposed to memorize the specific algorithms. You're supposed to study them and learn from them, understand what makes them faster than the others and then be able to apply that understanding to your own custom algorithms that you're writing every day.
In practice repetition of this quote more often than not leads to a lazy attitude of "don't think about performance, it's too complicated, measure first and then let's talk".
In my experience all great programmers think about performance from the moment they start thinking about a problem.
Like, you're writing an O(n^2) nested loop over an array that's not going to be tiny? Get out of here! That just shouldn't happen and talking about "premature optimization" points in exactly the wrong direction.
> Yet we should not pass up our opportunities in that critical 3 %
The funny thing is, we forgot how to write programs that spend most of their time in 3% of the code.
Profile a modern application and you see 50 level deep stacks and tiny slices of 0.3% of CPU time spent here and there. And yet these slices amount to 100%.
I think the C# dev team had an interesting way to describe that kind of profile [0]:
> In every .NET release, there are a multitude of welcome PRs that make small improvements. These changes on their own typically don’t “move the needle,” don’t on their own make very measurable end-to-end changes. However, an allocation removed here, an unnecessary bounds check removed there, it all adds up. Constantly working to remove this “peanut butter,” as we often refer to it (a thin smearing of overhead across everything), helps improve the performance of the platform in the aggregate.
This is a bit different because C# is designed and implemented by industry experts - far above the average code monkey in terms of skill, understanding and experience, with added help from community experts, and the software in question is used by millions of systems all over the world.
Practically every part of C# is a hot path because it runs on probably billions of devices every minute of every day. This code is worth micro-optimizing. A 1% improvement in a popular C# library probably saves megawatts of electricity (not to mention the time saves) on a global scale.
Most code is not like that at all. Most code is not worth putting this kind of effort into. Most code is fine being 10% or even 500% slower than it technically could be because computers are lightning fast and can crunch numbers at an incredible speed. Even if the computer is doing 5 times as much work as it needs to it'll still be instant for most use cases. The problem for these kinds of applications arises when it's not doing 5 times as much work as it needs to but 5000 times or 5 million times. Or when the data is just really large and even the optimal solution would take a noticeable amount of time.
Most programs inherently don't have a small bit of code they spend most of their time in. There are exceptions like scientific computing, AI, etc. But most programs have their runtime distributed through lots of code so the idea that you can ignore performance except to profile and optimise some hotspots is nonsense.
Most programs - after dealing with bugs & low hanging fruit - don't have hotspots.
I'm not sure about the "most programs" bit. I think you're forgetting whole swathes of programs like system utilities, databases, tools etc.
But, in any case, this is why we build large programs using modular architectures. You don't profile the Linux kernel hoping to find a hotspot in some obscure driver somewhere, you profile the driver directly. Same for filesystems, networking etc. Similarly we build libraries for things like arithmetic and SAT solvers etc. that will be used everywhere and optimise them directly.
Your comment reads like you're disagreeing with your parent, but in fact you are reinforcing the point. We've forgotten how to build programs like this and end up with big balls of mud that can't be profiled properly.
I completely disagree. If the program doesn't have "hotspots" then it's probably pretty fast and it's following Knuth's advice. But in my experience most applications are completely littered with "hotspots" - endpoints taking 5+ seconds to respond, reports taking minutes or hours to generate, stuff like that. I've never worked on an application where I didn't quickly find multiple easily avoidable hotspots. The average developer, at least in my area of the world, just really doesn't understand how to write performant code at all. They write algorithms with terrible time and/or space complexity, they write inefficient database queries, design their database schemas in ways that don't even allow efficient queries etc.
Then they add caching to compensate for their crappy system design and stuff like that, bloating the system with lots of unnecessary code. They store the data naively then spend significant processing time preparing the data for delivery rather than just storing the data in an ideal format in the first place. Lots of stuff like that.
Pick your top 3 and optimize if you really have to. But if the whole chain is 0.3% here and there, there's very little room for large optimization wins without some redesign.
If you have benchmarked something then optimizations are not premature. People often used to 'optimize' code that was rarely run, often making the code harder to read for no gain.
Beware too of premature pessimization. Don't write bubble sort just because you haven't benchmarked your code to show it is a bottleneck - which is what some will do, and then incorrectly cite premature optimization when you tell them they should do better. Note that any competent language has a sort in the standard library that is better than bubble sort.
Most of the time, the arguments aren’t even that cogent, they’re more along the lines of “I’m not going to specify the columns I need from this SELECT query, because that’s premature optimization.”
This is my all-time favorite paper. It's so easy to read, and there's so much to think about, so much that still applies to everyday programming and language design.
Also there's Knuth admitting he avoids GO TO because he is afraid of being scolded by Edsger Dijkstra.
"It is clearly better to write programs in a language that reveals the control structure, even if we are intimately conscious of the hardware at each step; and therefore I will be discussing a structured assembly language called PL/MIX in the fifth volume of The art of computer programming"
Yeah, it's one of my favourites as well. And it weirds me out that the "premature optimization" bit is the most quoted piece of that paper — arguably, that's the least interesting part of it, compared to the musings on the language design and (semi)automatic transformation of algorithms!
I personally find that "break/continue" with (optionally labelled) loops cover about 95% of GOTO's use cases today (including "one-and-a-half" loops), although e.g. Rust and Zig, with their expression-oriented design (and support for sum types) offer an option to experiment with Zahn's event mechanism... but usually factoring an inner loop body into a separate function is a more clear-to-read approach (as for efficiency? Hopefully you have a decent inliner idk).
The only thing I truly miss, occasionally, is a way to concisely write an one-and-a-half range/FOR loop. Something like this:
    for key, value in enumerate(obj):
        emit(f'{key}={value}')
    between:
        emit(',')
When you have a generic "while True:" loop, conditional "break" works fine enough, but here? Ugh.
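What I actually end up writing is one of these, neither of which I love (plain Python; emit and obj are stand-ins for the snippet above):

    obj = ["a", "b", "c"]
    emit = print  # stand-in for whatever sink the snippet writes to

    # 1. First-iteration guard (here the index doubles as the guard):
    for key, value in enumerate(obj):
        if key:
            emit(',')
        emit(f'{key}={value}')

    # 2. Or sidestep the loop entirely for join-like cases:
    emit(','.join(f'{key}={value}' for key, value in enumerate(obj)))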
I might be missing something, but how would `goto` improve matters on that for-loop? It seems to me that the fundamental problem here is rather than there's no easy way to check whether this is the last iteration or not, but that is orthogonal to goto vs break.
Well, what you actually need is to make the first iteration special — generally, you can detect whether the iteration was last only after you've already finished it, but special-casing the very first iteration can be done. You put the "between" block at the beginning of the loop body, but start the loop by jumping over it into the middle. It works even better with "for" loops, they desugar into:
    i = START;
    if (i > END) goto _after_loop;
    goto _loop_body;
    while (1) {
        BETWEEN;
    _loop_body:
        BODY;
    _loop_test:  // "continue" would desugar into "goto _loop_test".
        i++;
        if (i > END) break;
    }
    _after_loop:
        ...
which, if you throw away the "BETWEEN" part, is a standard way to compile for-loops, with pre-test and loop inversion. But writing this by hand is just... no. I'd rather add either a boolean flag, or throw in "if i > 0: ..." or "if i != END: ..." guard. Sometimes those even get optimized away, see e.g. [0] — both of loops there desugar into pretty much this; but it's brittle, changing "if (i > START)" into "if (i != START)" in the second function duplicates the loop body, and if you make it sufficiently larger than a single function call, the elided conditional branch rematerializes again.
It's trickier with a loop where you must crank the iterator before every iteration; there you have to either duplicate some code, or count your iterations and check the current number.
This statement in the introduction applies to so many things in CS:
> I have the uncomfortable feeling that others are making a religion out of it, as if the conceptual problems of programming could be solved by a single trick, by a simple form of coding discipline!
> Usually people say “premature optimization is the root of all evil” to say “small optimizations are not worth it” but […] Instead what matters is whether you benchmarked your code and whether you determined that this optimization actually makes a difference to the runtime of the program.
In my experience the latter is actually often expressed. What else would “premature” mean, other than you don’t know yet whether the optimization is worth it?
The disagreement is usually more about small inefficiencies that may compound in the large but whose combined effects are difficult to assess, compiler/platform/environment-dependent optimizations that may be pessimizations elsewhere, reasoning about asymptotic runtime (which shouldn’t require benchmarking — but with cache locality effects sometimes it does), the validity of microbenchmarks, and so on.
The way I often hear it expressed has nothing to do with small efficiency changes or benchmarking; it's more of a yagni / anticipating-hyper-scale issue. For example, adding complexity to your code so it can efficiently handle a million users, when you're just fine writing the simple-to-read-and-write version that isn't optimal but will work just fine for the twenty users you actually have.
I think the best way to go about optimization is to maintain a constant baseline level of mechanical sympathy with the computer's architecture.
If you burn into your soul the table of NUMA latencies and really accept the inevitable conclusions regarding practical computation, a lot of the optimization discussion simply boils down to keeping the closest cache as hot as possible as often as possible. Put differently, if you can get the working set to ~fit in L1, then you have nothing to worry about. The CPU is faster than most people can reason about now. You can talk to L1 1000x before you can talk to the GPU 1x. This stuff makes a ridiculous difference when applied correctly.
Context switching and communication between threads is where most developers begin to lose the rabbit. Very often a problem can be solved on one physical core much faster than if you force it to be solved across many physical cores.
It is generally better just to focus on algorithmic complexity - O(xN^k). The first version of the code should bring code to the lowest possible k (unless N is very small then who cares). Worry about x later. Don't even think about parallelizing until k is minimized. Vectorize before parallelizing.
For parallel code, you basically have to know in advance that it is needed. You can't normally just take a big stateful / mutable codebase and throw some cores at it.
Choose an algorithm that's reasonably efficient and easy to implement. Then parallelize if it wasn't fast enough. Then try to find a better algorithm. And only then vectorize. Pick the low-hanging fruit first, and only resort to options that require more work and make the code less maintainable if that wasn't enough.
Software engineers and computer scientists are often reluctant to parallelize their code, because they have learned that it's difficult and full of hidden pitfalls. But then they lose the performance gains from multithreaded loops and other simple approaches, which are often essentially free. Multithreaded code is not particularly difficult or dangerous, as long as you are not trying to be clever.
Not sure I agree. Yes, algorithmic complexity is important and should always be considered. But you shouldn't ignore x. Picking Rust instead of Python only changes x, but it can easily reduce x by a factor of 100 or more, and it's not something you can realistically decide later.
Hmmm. I wasn't thinking about making performance decisions that early in a project - more like an optimization approach for a given feature / work.
In terms of deciding on what programming language to use for a project, that is a complicated one. Usually it is some combination of the team and the problem I suppose.
- Knuth puts sentinels at the end of an array to avoid having to bounds check in the search (a sketch of this trick follows the list).
- Knuth uses the register keyword.
- Knuth custom writes each data structure for each application.
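For anyone who hasn't seen the sentinel trick, a minimal Python sketch (illustrative only; in CPython the interpreter overhead dwarfs the saved bounds check):

    # Sequential search with a sentinel: append the key so the scan is
    # guaranteed to terminate, and the loop needs no "i < len(items)" test.
    def sentinel_search(items, key):
        items.append(key)            # sentinel
        i = 0
        while items[i] != key:
            i += 1
        items.pop()                  # restore the list
        return i if i < len(items) else -1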
When I was young someone pointed out to me how each hype cycle we cherry-pick a couple interesting bits out of the AI space, name them something else respectable, and then dump the rest of AI on the side of the road.
When I got a bit older I realized people were doing this with performance as well. We just call this part architecture, and that part Best Practices.
When Knuth wrote his paper, compilers/optimizers were not nearly as good as today, and CPUs were much more deterministic (only disk access showed the issues we now see with cache misses). Most of his optimizations are things that the compiler will do better than you can (most is not all!) and so are not even worth thinking about. Meanwhile the compiler will almost never turn an O(n^2) into an O(n*ln(n)), despite that resulting in a much bigger speedup than anything the compiler can do.
I understand Knuth's _Premature Optimization_ saying as: (1) find the hottest code (best done with a full heat chart), (2) optimize there. That's all of it, and it really works. E.g. if you have some code that prepares data for a loop that is n^2, and you work hard on this data preparation, profile it etc., this will not give you any visible improvement, because it is one-time code. What's inside the n^2 loop is more important, so optimize there; it can even be worth preparing the data differently so that the loop itself works more efficiently.
Suggest not over-indexing on this approach. The profiler will not always identify an easily fixable hotspot. Performance should not be entirely post hoc. Some upfront investment pays off. Sure, not every algorithm matters, but some basic upfront investment (like not doing O(N^2) when O(N) is possible, or not hitting the DB for every item in a loop) will pay off. The alternative can be weeks / months of profiling and debugging. And the fixed code often introduces other bugs, because code built on top of it has made various assumptions.
I'm speaking from experience in a large code base where people liked to use the "root of all evil" argument quite a bit at one time.
Love this paper and read it several times, most recently around 10 years ago when thinking about whether there were looping constructs missing from popular programming languages.
I have made the same point several times online and in person that the famous quote is misunderstood and often suggest people take the time to go back to the source and read it since it’s a wonderful read.
I happen to have also reread the paper last week. The last time I read it was 30 years ago, closer to when it was written than to the present, and I had forgotten a lot of it. One of the few bits that stuck with me was the Shanley Design Criterion. Another was "n and a half loops".
The bit about sequential search and optimization, the topic of this whole blog post, is kind of a minor detail in the paper, despite being so eloquently phrased that it's the part everyone quotes—sometimes without even knowing what “optimization” is. There's a table of contents on its second page, which is 33 lines long, of which "A Searching Example" and "Efficiency" are lines 4 and 5. They are on pages 266–269, 3 pages of a 41-page paper. (But efficiency is an issue he considers throughout.)
Mostly the paper is about control structures, and in particular how we can structure our (imperative!) programs to permit formal proofs of correctness. C either didn't exist yet or was only known to less than five people, so its break/continue control structures were not yet the default. Knuth talks about a number of other options that didn't end up being popular.
It was a really interesting reminder of how different things looked 51 years ago. Profilers had just been invented (by Dan Ingalls, apparently?) and were still not widely available. Compilers usually didn't do register allocation. Dynamically typed languages and functional programming existed, in the form of Lisp and APL, but were far outside the mainstream because they were so inefficient. You could reliably estimate a program's speed by counting machine instructions. People were sincerely advocating using loops that didn't allow "break", and subroutines without early "return", in the interest of building up their control flow algebraically. Knuth considered a recursive search solution to the N-queens problem to be interesting enough to mention it in CACM; similarly, he explains the tail-recursion optimization as if it's not novel but at least a bit recondite, requiring careful explanation.
He mentions COBOL, BCPL, BLISS, Algol W, Algol 60, Algol 68, other Algols, PL/I (in fact including some example code), Fortran, macro assemblers, "structured assemblers", "Wirth's Pascal language [97]", Lisp, a PDP-10 Algol compiler called SAIL (?), META-II, MIXAL, PL360, and something called XPL, but not Smalltalk, CLU, APL, FORTH, or BASIC.
He points out that it would be great for languages to bundle together the set of subroutines for operating on a particular data type, as Smalltalk and CLU did, but he doesn't mention CLU; it had only been introduced in the previous year. But he's clearly thinking along those lines (p.295):
> (...) it turns out that a given level of abstraction often involves several related routines and data definitions; for example, when we decide to represent a table in a certain way, we also want to specify the routines for storing and fetching data from that table. The next generation of languages will probably take into account such related routines.
Often when I read old papers, or old software documentation like the TENEX EXEC manual, it's obvious why the paths not taken were not taken; what we ended up doing is just obviously better. This paper is not like that. Most of the alternatives mentioned seem like they might have turned out just as well as what we ended up with.
I mean basically what he's saying is: check the impact of your optimization. Every time there's an optimization, there's a complexity and brittleness cost. However, sometimes unoptimized code is actually more difficult to read than the optimized version. I mean it's quite logical tbf.
Wait... people understood "premature optimisation" to mean "small optimisations are not worth it"? I've always understood it to mean exactly what it's supposed to mean, namely don't optimise something until you've shown that it's actually needed. I honestly don't know how it could be interpreted any other way.
It is sad how often people cite premature optimization in the real world even when it isn't premature. Using a good algorithm is never premature optimization, particularly when the good algorithm is likely built into your language as well. Likewise if you have run the profiler it is not premature optimization, and running the profiler to find those places is not premature optimization.
The word “typesetting” does not refer to all aspects of visual appearance, but merely to the crucial decision of where each character goes: what glyph (from the fonts available) is placed at what position on the page. This paper was first published in the ACM Computing Surveys journal, and the typesetting is fine (as you'll find if you pick up a physical copy of the journal) — I think what you're complaining about is that all PDF versions that have been put up online so far (including the official one available via https://doi.org/10.1145/356635.356640) are from the same poor scan of the journal: it's not the typesetting that is atrocious, but the scanning quality (converting the printed page into digital images).
(meta) I am probably wasting my time commenting on linked article here. Nobody does that /s.
I don't think the measurements support the conclusion that well.
What I want to have when I see those measurements:
I want the language's abstract machine and the compiler to not get in the way of how I want code to perform on certain platforms. This is currently not what I get at all. The language is actively working against me. For example, I cannot read a cache line from an address because my object may not span that far. The compiler has its own direction at best. This means there is no way to specify things, and all I can do is test compiler output after each minor update. In a multi-year project such updates can happen dozens of times! The ergonomics of trying to specify things are actually getting worse. The example with assembly matches my other experiences: the compiler now ignores even intrinsics. If it wants to optimize, it does.
I can't run to someone else for magic library solutions every time I need to write code. I need to be able to get things done in a reasonable amount of time. It is my organization's development process that should decide whether the solution I used should become part of some library or not. It usually means that efforts that cover only some platforms and only some libraries are not as universally applicable to my platforms and my libraries as folks at language conferences think /s.
I understood it to mean that optimising for speed at the expense of size is a bad idea unless there are extremely obvious performance improvements in doing so. By default you should always optimise for size.
The famous "premature optimization" quote isn't from a dedicated paper on optimization, but from Knuth's 1974 "Structured Programming with go to Statements" paper where he was discussing broader programming methodology.
I will say I have worked on projects where sales and upper management were both saying the same thing: customers don't like that our product is slow(er than a competitor's), and the devs just shrug and say we've already done everything we can. In one case the most senior devs even came up with charts to prove this was all they were going to get.
Somehow I found 40%. Some of it was clever, a lot of it was paying closer attention to the numbers, but most of it was grunt work.
Besides the mechanical sympathy, the other two main tools were 1) stubbornness, and 2) figuring out how to group changes along functional testing boundaries, so that you can justify making someone test a change that only improves perf by 1.2% because they're testing a raft of related changes that add up to more than 10%.
Most code has orphaned performance improvements in 100 little places that all account for half of the runtime because nobody can ever justify going in and fixing them. And those can also make parallelism seem unpalatable due to Amdahl.
https://dl.acm.org/doi/10.1145/356635.356640 (alt) https://sci-hub.se/10.1145/356635.356640
> faster than the slowest non-parallelizable path
Rather, than the slowest non-parallelized path. Ultimately you may reach maximum speed on that path but the assumptions that we are close to it often turn out to be poorly considered, or considered before 8 other engineers added bug fixes and features to that code.
From a performance standpoint you need to challenge all of those assumptions. Re-ask all of those questions. Why is this part single threaded? Does the whole thing need to be single threaded? What about in the middle here? Can we rearrange this work? Maybe by adding an intermediate state?
Like Amdahl's Law, I think many of these take a surprising amount of work to prove despite the result sounding so obvious. The big question is whether it was obvious a priori or only post hoc. We often confuse the two, getting us into trouble. I don't think the genius of the statement hits unless you really dig down into proving it and trying to make your measurements in a nontrivially complex parallel program. I think that's true about a lot of things we take for granted.
Another commonly misinterpreted one is the `shouting fire in a crowded theatre` quote.
In its original context it means the opposite of how people use it today.
Well, sort of. The specific point Justice Oliver Wendell Holmes was making is pretty much the same as how it’s used today: some speech is so obviously harmful that it can’t be protected by freedom of speech. The problem is that Holmes was using it as an example of what even “the most stringent protection of free speech” would leave unprotected, and he went on to interpret the First Amendment rather less stringently, ruling that it didn’t protect anti-draft flyers.
Before optimizing, I always balance the time I'll need to code the optimization and the time I (or the users of my code) will effectively gain once the optimization is there (that is, real-life time, not CPU time).
If I need three weeks to optimize a code that will run for 2 hours per month, it's not worth it.
But by not optimizing, you don't grow your profiling/optimizing skills and you miss out on a reduction in how long optimizing takes you for future work. Therefore many more codes will not meet the threshold and your skills may not grow for a long time.
You couldn't know, but my job is 50% about optimizing computational workloads. But many times, when questioning my users, it turns out that they want an optimisation for some code that will run 2 or 3 times. So even though they'll have to wait a week for the computation to run, it would take me just as much time to make the optimisation run :-)
But if code happens to be used 10 times a week and takes about a day or two to run, it's a no-brainer: spending a month optimizing for a 10% speed increase is worth it!
The one question that needs to be asked is would the users run it more often if it didn't take so long? There is nothing a computer can do that a room full of clerks in 1800 couldn't do, but the runtime would be so slow (or the cost in clerks so high) that nobody dared ask those questions.
Exercise for the reader: given an unlimited budget to hire 1800s clerks, how many FPS could you achieve running Doom? (Obviously the number is too low to make the game playable.)
Interestingly, you didn't learn the full lesson:
When optimizing, always consider the cost of doing the optimization vs. it's impact.
In a project where you are looking at a 45/30/25 type split, the 45 may actually be well optimized, so the real gains may be in the 30 or 25.
The key is to understand the impact you CAN have, and what the business value of that impact is. :)
The other rule I've learned is: There is always a slowest path.
> The key is to understand the impact you CAN have, and what the business value of that impact is. :)
Like I tell everyone in system design interviews: AWS will rent you a machine with 32TB of RAM. Are you still sure about all this extra complexity?
... to the tune of about 10 cents a second, yeah. If you can hire a dev team to optimize one of those out, you plausibly come out ahead (well, vs face value).
Hopefully if you’re doing something that actually needs that much RAM you’re also making 6-figures per hour in revenue.
I’ve heard that some companies spend hundreds of millions per year on cloud hosting and it’s still worth it. I can’t even imagine that level of scale.
PS: The context in which I bring this up is a design exercise that would deal with maybe 5gb of data per month. People really over-complicate it :)
How do you come out ahead hiring an entire team to save $6/hour?
That's $6/minute, $360/hour. Granted, if you're using it at a scale approaching FTE costs, you probably want at least some reserved capacity and you're likely not paying face value, but then the comparison was never going to be rigorous anyway.
That's not even the main problem with the comparison. Suppose it was actually $6/hour. Even that's >$50k/year, indefinitely.
How long does it take your team once to make that resource consumption go away permanently? An hour? Even a month? You'd still be ahead.
Working for a smaller digital agency, while using .NET Framework and then moving to .NET Core (now just .NET), this savings was felt. Using a performance profiler and slightly complicating the code a bit (at least in terms of the documentation for the CMS we use), I was able to get the code running on less than the minimum recommended requirements for the base CMS software, without having to alter the source of the CMS. I did this with the Umbraco CMS, which I also started using in 2013 when the documentation was lacking, so reading the source code was a requirement to get as much as possible out of the software.
Oh, I’m an idiot. Sorry, ignore me.
I didn't get that impression from their response. I mean I could be wrong, but in context of "use a profiler" I don't think anything you said runs counter. I think it adds additional information, and it's worth stating explicitly, but given my read yours comes off as unnecessarily hostile. I think we're all on the same page, so let's make sure we're on the same side because we have the same common enemy: those who use Knuth's quote to justify the slowest piece of Lovecraftian spaghetti and duct tape imaginable
It is hostile because I've seen people mess up optimization too many times.
Sometimes leaving the spaghetti alone IS correct. Sometimes it isn't.
But most awful spaghetti happens because what it was asked to do was awful, IMHO.
Nearly all the awful spaghetti code I've seen started out as good code, but as requirements changed in unexpected ways fitting the new features of version 5.0 while keeping all the previous version features working resulted in a mess that nobody knows how to clean up without the expensive big rewrite.
This has been my experience as well. A real "the road to hell is paved with good intentions" situation. No one was intentionally acting malicious nor trying to make bad code, but the ultimate truth is that software "rots". We write for static environments, but the environment isn't static.
I'd expect even rather junior programmers to recognize that the more time you spend working on a project the more new and unexpected things you find. New features, better ways to implement things, whatever. And ultimately, we all look back at code we wrote a year ago and are going to say "what idiot wrote this garbage?" I mean, if you don't, it is probably a sign that you aren't improving. It's impossible to write perfect code, and even if it was, what would be "perfect" is a moving target. So either way, you should be writing better code today when compared to yesterday.
While it isn't possible to completely put this off, I find there's a helpful tactic to reduce spaghetti-creep. "Maintenance is cheaper than repair." People say "don't fix what isn't broken" but I think that's wrong. You should "fix" things before they are broken. Because frankly, it is cheaper to replace your visibly degrading water pipe than it is to wait for it to burst. You not only have to pay for the new pipe, but all the damage it caused when it broke. Maintenance increases longevity and helps you monitor so you can replace things before they break. Doesn't matter if you're plumbing water pipes or plumbing spaghetti, the tactic is the same.
A former boss: an optimization made at a non-bottleneck is not an optimization.
Just remember that there can be hundreds of bottlenecks. Your second slowest matters too. And sometimes there are different requirements - the UI needs to respond within 1/10th of a second (sometimes much faster, sometimes much slower). Users can often live with your main calculation taking 10x longer than the optimized version so long as there is fast feedback that you are working. Eventually they will want the calculations faster too, but taking a minute off of those is less valuable than a few ms off of your UI feedback.
it's more nuanced:
you're still releasing resources - so you might not become faster overall, but you can compute more in the same time later if the necessity arises (although that might be somewhat premature, it can be good for library code - so it becomes more applicable in different environments)
and there are some rare but tricky scenarios like:
hardware is mobile phone: app seem to be bottlenecked on arithmetics according to the profiler, so it feels obvious to start optimization there
in reality what happens - hardware has limit on power, so it can't give full power to CPU, GPU and memory all at the same time
since app uses too much memory - it has to redirect power there, also memory emits heat, so CPU becomes throttled
By optimizing memory, everything runs colder -> CPU gets more power, can run sustained higher frequencies - app becomes faster
And if the app becoming faster doesn't mean anything because the app is waiting for user input the whole time, it was a lot of work for naught.
Perhaps restated: If the optimization cannot be felt (ie, impact on the product experience), it is not an optimization worth pursuing.
> And if the app becoming faster doesn't mean anything because the app is waiting for user input the whole time, it was a lot of work for naught.
Oh, that might still be good for battery life (or power consumption in general).
This one is more dangerous, as there may be backend resources in use that could be optimized, which could drop costs drastically. This may not change anything for your users, but is definitely worth exploring when it comes to getting the most out of your investment.
Translation: I'm "The Boss" so it's not a bottleneck unless I say it is.
It's not true, though. Speedups are speedups even if there are still slow parts.
His boss is essentially making the same argument as Knuth: Spend your time optimizing what benefits the most from optimization. Or in other words, prioritize, don't optimize blindly.
It's a single line phrase, I wouldn't interpret it too literally. Usually you got to be pretty liberal when interpreting those easy to remember lines. It's a lot harder to remember the literally multiple books we could fill when bringing up exceptions.
There is also the "death by a thousand cuts" kind of slowness that accrues over time and it doesn't really matter where you start peeling the onion and the part you started is rarely the best.
There is more to it than that.
1. Decide if optimization is even necessary.
2. Then optimize the slowest path
It is exactly this "lulled into complacency" that I rail against when most people cite that line. Far too many people are trying to shut down dialog on improving code (not just performance) and they're not above Appeal to Authority in order to deflect.
"Curiosity killed the cat, but satisfaction brought it back." Is practically on the same level.
If you're careful to exclude creeping featurism and architectural astronautics from the definition of 'optimization', then very few people I've seen be warned off of digging into that sort of work actually needed to be reined in. YAGNI covers a lot of those situations, and generally with fewer false positives. Still false positives though. In large part because people disagree on when "the last responsible moment" is, partly because our estimates are always off by 2x, so by the time we agree to work on things we've waited about twice as long as we should have and now it's all half-assed. Irresponsible.
I'm with you, and been on a bit of a rampage about it lately. Honestly, just too much broken shit, though I'm not sure what the last straw was.
A big thing I rail against is the meaning of an engineer and that our job is about making the best product, not making the most profitable product (most times I bring this up someone will act like there's no difference between these. That itself is concerning). The contention between us engineers and the business people is what creates balance, but I'm afraid we've turned into yesmen instead. Woz needs Jobs, but Jobs also needs Woz (probably more than the other way around). The "magic" happens at the intersection of different expertise.
There's just a lot of weird but subtle ways these things express themselves. Like how a question like "but what about x problem" is interpreted as "no" instead of "yes, but". Or like how people quote Knuth and use it as a thought terminating cliche. In ML we see it with "scale is all you need."
In effect, by choosing to do things the easy way we are choosing to do things the hard way. This really confuses me, because for so long in CS the internalization was to "be lazy." Not in the way that you put off doing the dishes now, but in the way that you recognize that doing the dishes now is easier than doing them tomorrow, when you 1) have more dishes and 2) the dishes you left out are harder to clean as the food hardens on the plate. What happened to that "efficient lazy" mindset and how did we turn into "typical lazy"?[0]
[0] (I'm pretty sure I need to add this) https://en.wikipedia.org/wiki/Rhetorical_question
One of the aphorisms I operate by is that when the order of magnitude of a problem changes, the appropriate solution to that problem may also need to change.
Here we are sitting at four to seven orders of magnitude separated from Knuth, depending on whether you mean number of devs or number of machines or size of problems tackled.
Size of machines is pretty amazing. The PDP-10s and 360/67s Knuth was talking about in 01974 were about 1 MIPS (with 32-bit or 36-bit operations) and could scale as high as 16 mebibytes of RAM. Today you can put 6 tebibytes in a 384-core two-socket AMD server that can do in excess of 10 trillion 32-bit operations per second: 6 orders of magnitude more RAM, 7 orders of magnitude more arithmetic.
But that's really understating the case, because those were large shared mainframes. Today's equivalent would be not a 2-socket rackmount server, or even a whole rack, but quite plausibly a whole data center, three more orders of magnitude. 9 orders of magnitude more RAM, 10 orders of magnitude more arithmetic.
Probably also worth mentioning that the absolute number of computers has also increased a lot; every USB-C power brick contains a multi-MIPS computer.
I agree that the number of devs has increased by only about four or five orders of magnitude. In 01974 I'd guess there might have been 50,000 programmers; my grandparents took programming classes around that time involving batch submission of punched cards, and that was a reasonably common thing at US universities. Today, Roblox has 380 million monthly active users; what percentage of them write their own games in it? And the popular programming environments Microsoft Excel and Google Sheets have over a billion users each.
> It is exactly this "lulled into complacency" that I rail against when most people cite that line. Far too many people are trying to shut down dialog on improving code (not just performance) and they're not above Appeal to Authority in order to deflect.
Your comment reads like a strawman argument. No one is arguing against "improving code". What are you talking about? It reads like you are misrepresenting any comment that goes against your ideas, no matter how misguided they are, by framing your ideas as obvious improvements that can only be conceivably criticized by anyone who is against good code and in favor of bad code.
It's a rehash of the old tired software developer cliche of "I cannot do wrong vs everyone around me cannot do right".
Ironically, you are the type of people Knuth's quote defends software from: those who fail to understand that using claims of "optimization" as a cheat code to push through unjustifiable changes are not improving software, and are actually making it worse.
> "Curiosity killed the cat, but satisfaction brought it back." Is practically on the same level.
This is the same strawman. It's perfectly fine to be curious. No one wants to take the magic out of you. But your sense of wonder is not a cheat code that grants you the right to push nonsense into production code. Engineers propose changes based on sound reasoning. If the engineers in your team reject a change it's unlikely you're surrounded by incompetent fools who are muffling your brilliant sense of wonder.
> If you're careful to exclude creeping featurism and architectural astronautics from the definition of 'optimization', (...)
Your comment has a strong theme of accusing anything not aligned with your personal taste as extremely bad and everything aligned with your personal taste as unquestionably good that can only possibly be opposed if there's an almost conspiratorial persecution. The changes you like are "improving code" whereas the ones proposed by third parties you don't like suddenly receive blanket accusations such as "creeping featurism and architectural astronautics".
Perhaps the problems you experience, and create, have nothing to do with optimization? Food for thought.
Some food for thought: your comment entirely relies upon wisdom of the crowds. Which I agree is usually a good bet to go with. But there are conditions where it fails, even at scale. It's actually fairly easy to upend. All you need to do is remove independence of actors. Which, you can probably guess is fairly common in our field. The more homogeneous we become the less reliable wisdom of the crowds is.
Plus, people have different experiences and different environments. What has been true is your experience need not be true in mine or any others.
I'll give you an example of irrationality here. I was working at a big tech company and, as part of breaking down my project, I was playing around with another part of the code. I more than doubled the prediction accuracy on customer data and made the code run faster and use less memory. Objectively better. It even happened "for free" because the tasks involved were part of my original job, though the goal was not explicitly. What happened? Last I know, the PR is still open. Boss never pushed for it because they were more invested in their other work that was yet to be completed but had more buzzwords. That work was only a bit more accurate than mine but 100x slower and required 10x more memory. I even told them what I did would work for the other one too, yet it never happened.
I've seen many instances like this. Sometimes people just don't want things to be better unless it's better in a very specific kind of way (and not necessarily via performance). Sometimes it's just politics.
On the other hand, I've seen very good teams with the exact opposite experience. It's a very common thread that those teams openly discuss and explain why a seemingly good idea is actually bad. Frequently, they'll let you have a go at it too, because it's a no-lose situation. If you're right, we all win. If you're wrong, there's an important lesson about the code to be learned, because there's hidden complexity that makes the seemingly good idea bad, and it's often hard to explain. But you end up with a lot more knowledge about the code, which helps you in the long run.
> This is actually a frequent occurrence. But honestly it usually doesn't come with that exact same phrasing. It usually comes with "let's spend our time on this other thing that isn't important but is new therefore important". It's true, sometimes you need to triage, but there's a clear bias that once something "works" (no matter how poorly) there is much less incentive to make it work not poorly.
Exactly. “It technically functions and therefore doesn’t need attention” has become an industry norm. There’s a massive bias towards piling on more features over making older ones good.
The aphorism misses that "maintenance" is much cheaper than "repair". The aphorism misses that you can get major cost savings by doing things differently. Sure, it costs more in the moment, but are we trying to run a business paycheck to paycheck or are we trying to create something sustainable? I sure hope your business is thinking more than just 3 months out. "More expensive now" is far too common an excuse, enough that I see it used to justify not implementing things that would have an ROI within 6 months! Not doing anything with an under-a-year ROI is absolutely batshit insane (unless you're a startup living on the edge or something, but I see this in big tech companies, and I hope we understand that's the context here).
I became the SME for batch processing on this project. The whole project was a read heavy workflow that should have had all decisions made at write time but was so far in the weeds when I got there that I couldn’t pull it out. And everyone I convinced of this fact decided to work somewhere else instead of stay and help fix it.
But there were parts we could sort out at build or deploy time, and I was able to fix a number of those.
One guy was tasked with a project I got turned into an epic: build fallback error pages at deploy time and push them into CDN. I told him not to build it the way he was, copy the one I had been working on. I got busy and he left, and we discovered that they hadn’t been updating when a customer changed their contact info and noticed months later that we still had the old info.
The build trigger had no downstream dependencies and there was no alert being sent on failure. The job was timing out for reasons similar to the ones I’d fixed. I tried to bandaid it, but it still errored out an hour into what looked like at least a 100-120 minute workload.
I discovered the main problem was that our customers could have multiple domain names and he was frobbing through three or four service calls trying to find the canonical one, and making one call per customer to see if they were still customers (we had an endpoint that would paginate all active customers, so ~400x more efficient and rising due to turnover).
The main table I had been using had one column I didn’t use, which had a url in it. I confirmed with our SME that this was in fact the canonical domain name, I just had to add the field to the QL and do a `new URL(url).hostname`. At the end of the day I ended up extracting about half the code from my tool, deleting over half of his code and replacing it with calls into mine (do it like I did it, but literally).
Four and a half minutes, and no more intermittent failures, because my version was built on back pressure to throttle requests under high server load, and total outbound requests around 1/8 of the original.
This just kinda made me think: are there fewer war stories being shared on HN? It sure feels like it. But what got me thinking is how sharing these types of war stories is an effective way to teach and share with one another. Maybe it's just my bubble, but I feel like I hear them infrequently. Anyways, this was a good one. Thanks for sharing.
A lot of people are still thinking of Knuth's comment as being about just finding the slow path or function and making it faster. What Knuth has talked about, however, and why any senior engineer who cares about performance has either been taught or discovered, is that most real optimization is in the semantics of the system and - frankly - not optimizing things but finding ways not to do them at all.
Tuning the JSON parser is not nearly as effective as replacing it with something less stupid, which is, in turn, not nearly as effective as finding ways to not do the RPC at all.
Most really performant systems of a certain age are also full of layer violations for this reason. If you care about performance (as in you are paid for it), you learn these realities pretty quickly. You also focus on retaining optionality for future optimizations by not designing the semantics of your system in a way that locks you permanently into low performance, which is unfortunately very common.
Well said.
Another example of this is using the right algorithm for the job. I see a lot of code like let's say you have two lists and you need to go through the first one and find the corresponding element by id in the second. The naive implementation uses linear search of the second list, making the algorithm O(n^2). This will stall and take hours, days or longer to run when the lists get large, say into the tens to hundreds of thousands of elements.
If both lists come from your own database then you should have used a foreign key constraint and had the database join them for you. If you can't do that, let's say one or both lists come from a third-party API, you can create a dictionary (hashmap) of the second list to allow O(1) lookup rather than O(n), which brings your full algorithm's complexity to O(n). Now you can have millions of elements and it'll still run in milliseconds or seconds. And if this is something you need to do very often then consider storing the second list in your database so you can use a join instead. Avoid doing the work, as you say.
These are the most common mistakes I see people make. That and situations where you make one request, wait for a response, then use that to make another request, wait for response and so on.
You don't need a profiler to avoid this type of problem, you just need to understand what you're doing at a fundamental level. Just use a dictionary by default, it doesn't matter if it might be slightly slower for very small inputs - it'll be fast enough and it'll scale.
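A tiny Python sketch of that transformation (field names are made up for illustration):

    # Naive: for each element of the first list, scan the second -> O(n*m).
    def match_naive(firsts, seconds):
        return [(f, next(s for s in seconds if s["id"] == f["id"]))
                for f in firsts]                     # assumes every id matches

    # Index the second list once -> O(n + m); the calling code barely changes.
    def match_indexed(firsts, seconds):
        by_id = {s["id"]: s for s in seconds}        # one pass to build the index
        return [(f, by_id[f["id"]]) for f in firsts]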
We had one of those, except it was two loops and a python "in" on a list, so actually O(n^3). Changing the "in" to be on a set brought it from around two hours to around five minutes. And the best part is related to the topic at hand: This was a small postprocessing function, and other more experienced developers had tried to speed up the overall process but assumed the slowness was earlier in the process instead of in this function, and after failing they assumed it couldn't be sped up further. They never bothered to check which part of the process was actually slow.
Related, people not understanding that the latency alone from a DB call (especially when the DB has network-based storage, which is most of the cloud offerings) is substantial. I’ve found so many cases where people are unknowingly doing serialized queries because the ORM obscured everything, and they didn’t realize they could get everything in a single query with joins.
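For illustration only, here is the shape of that anti-pattern and its fix, using the stdlib sqlite3 module and made-up table names (with a real networked database, each extra round trip also pays the latency described above):

    import sqlite3

    conn = sqlite3.connect(":memory:")               # stand-in for a real, remote DB
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER);
    """)

    # The N+1 / serialized-query shape: one round trip per order.
    def customer_names_slow(order_ids):
        return [conn.execute(
                    "SELECT c.name FROM orders o JOIN customers c "
                    "ON c.id = o.customer_id WHERE o.id = ?", (oid,)).fetchone()
                for oid in order_ids]

    # One round trip: let the database join and filter the whole batch.
    def customer_names_fast(order_ids):
        marks = ",".join("?" * len(order_ids))
        return conn.execute(
            "SELECT o.id, c.name FROM orders o JOIN customers c "
            "ON c.id = o.customer_id WHERE o.id IN (" + marks + ")",
            list(order_ids)).fetchall()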
You can almost simplify it to simply observing that PREMATURE optimization is not the same as OPTIMIZATION.
Most people I see who get offended are reacting to the claim that optimization is never useful. But it's pretty easy to knock that claim over.
I don't deny that plenty of people use adjectives very sloppily, and that much writing is improved by just ignoring them, but Knuth is not one of them.
A pet peeve of mine is how common it is for people to ignore qualifying words. Like, you can say "most people do 'x'" and you can guarantee that if you post it online, people will come out of the woodwork saying "I don't do 'x'". Sure, "most" claims can often be inaccurate, but those types of responses aren't helpful and aren't meaningfully responsive even if true. "Most" is just a very different word from "all". It implies a distribution, but I don't know why we often treat things as discrete or binary.
I also see ignoring qualifiers as a frequent cause for online arguments. They're critical words when communicating and dropping them can dramatically change what's being communicated. And of course people are fighting because they end up talking about very different things and don't even realize it.
While I agree with you, that it's common for people to use these words sloppily, I think it is best to default to not ignore them. IME, in the worst case, it aids in communication as you get clarification.
Pre-profiling optimisation is a stab in the dark.
Isn't that clearer, and at least as concise as the original, while also using an idiomatic metaphor and avoiding the reference to mythical dark forces?
By definition anything premature is going to be bad. It's not any kind of an insight.
You have joined a noble line of Knuth Quote Expanders, https://hn.algolia.com/?dateRange=all&page=3&prefix=true&que...
They are all worth reading.
Twice in the last 2 months. I'm not sure if that's a good thing or a bad thing hahaha
The operative word was always "premature". As with everything, whether or not a particular optimization is worthwhile depends on its ROI, which in the case of software depends on how often your code is going to be run. If you are building a prototype that's only going to run once, then the only optimizations that are worthwhile are the ones that speed things up by times comparable to the time it takes to write them. On the other hand, if you're writing the inner loop of an LLM training algorithm, a 1% speedup can be worth millions of dollars, so it's probably worth it even if it takes months of effort.
> This makes it clear, in context, that Knuth defines "Premature Optimization" as "optimizing before you profile your code"
As with that Google testing talk ("We said all your tests are bad, but that's not entirely true because most of you don't have any tests"), the reality is that most people don't have any profiling.
If you don't have any profiling then in fact every optimisation is premature.
I'm appalled by the number of developers that don't know that profilers exist.
BTW, my junior developers don't know what a debugger is.
Yes but the second half of the second half is "only after that code has been identified." So the advice is still 'don't waste your time until you profile.'
too late, all the slackers got promoted and now are demanding to keep pushing features no one is asking about.
Same is true for lots of things. Classic example is we are so leetcode obsessed because all the people that were leetcode obsessed got hired and promoted. We've embraced p-hacking, forgetting Goodhart's Law (and the original intention of leetcode style interviewing. It's just the traditional engineering interview, where you get to see how the interviewee problem solves. It is less about the answer and more about the thought process. It's easy to educate people on answers, but it is hard to teach someone how to think in a different framework... (how much money do we waste through this and by doing so many rounds of interviewing?))
I like this article. It’s easy to forget what these classic CS papers were about, and I think that leads to poorly applying them today. Premature optimisation of the kind of code discussed by the paper (counting instructions for some small loop) does indeed seem like a bad place to put optimisation efforts without a good reason, but I often see this premature optimisation quote used to:
- argue against thinking about any kind of design choice for performance reasons, eg the data structure decisions suggested in this article
- argue for a ‘fix it later’ approach to systems design. I think for lots of systems you have some ideas for how you would like them to perform, and you could, if you thought about it, often tell that some designs would never meet them, but instead you go ahead with some simple idea that handles the semantics without the performance only to discover that it is very hard to ‘optimise’ later.
But what I think people don't realize is that this is exactly what tech debt is. You're moving fast but doing so makes you slow once we are no longer working in a very short timeline. That's because these issues compound. Not only do we repeat that same mistake, but we're building on top of shaky ground. So to go back and fix things ends up requiring far more effort than it would have taken to fix it early. Which by fixing early your efforts similarly compound, but this time benefiting you.
I think a good example of this is when you see people rewrite a codebase. You'll see headlines like "by switching to rust we got a 500% improvement!" Most of that isn't rust, most of that is better algorithms and design.
Of course, you can't always write your best code. There are practical constraints and no code can be perfect. But I think Knuth's advice still fits today, despite a very different audience. He was talking to people who were too obsessed with optimization, while today we're overly obsessed with quickly getting to some checkpoint. But the advice is the same: "use a fucking profiler". That's how you find the balance and know what actually can be put off till later. It's the only way you can do this in an informed way. Yet, when was the last time you saw someone pull out a profiler? I'm betting the vast majority of HN users can't remember, and I'd wager a good number never have.
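For anyone whose honest answer is "never": in Python the built-in profiler is close to a one-liner (your_script.py and main() are placeholders):

    # Whole-program view, sorted by cumulative time:
    #
    #     python -m cProfile -s cumulative your_script.py
    #
    # Or, from inside the code, profile a single entry point:
    import cProfile
    cProfile.run("main()", sort="cumulative")   # assumes a main() exists in scope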
I completely agree with most of what you've said, but personally I rarely use a profiler. I don't need it, I just think about what I'm doing and design things to be fast. I consider the time complexity of the code I'm writing. I consider the amount of data I'm working with. I try to set up the database in a way that allows me to send efficient queries. I try to avoid fetching more data than I need. I try to avoid excessive processing.
I realize this is a very personal preference and it obviously can't be applied to everyone. Someone with less understanding might find a profiler very useful and I think those people will learn the same things I'm talking about - as you find the slow code and learn how to make it fast you'll stop making the same mistakes.
A profiler might be useful if I was specifically working to optimize some code, especially code I hadn't written myself. But for my daily work it's almost always good enough to keep performance in mind and design the system to be fast enough from the bottom up.
Most code doesn't have to be anywhere near optimal, it just has to be reasonably fast so that users don't have to sit and stare at loading spinners for seconds at a time. Sometimes that's unavoidable, sometimes you're crunching huge amounts of data or something like that. But most of the time, slow systems are slow because the people who designed and implemented them didn't understand what they were doing.
Second, it's important to remember that big O is very naïve, especially when you drop constants. Your size of n can make a big difference. O(n) is worse than O(n^2) for small n. When considering constants it's also reasonable to have O(n^3) algos be better than O(n) ones! This throws a wrench in the analysis, making it more time consuming.
But where the profiler really shines is *you don't write all your code from scratch*. So it tells you when to use a library and when to rewrite. The only other option is to painstakingly go through every library you use.
So the profiler is a big time saver. Big O for quick and dirty first pass and profiler for moving from alpha to beta and beyond.
Don't get me wrong, I'm not full of best habits either! I definitely don't do this for most personal projects nor most research code (which is most of what I write, though I profile a different thing...) but a profiler is the most cost effective way to write *production* code. It becomes exponentially more important as code size grows and team size grows. After all, not everyone is using your best practices. So you need practices that scale and work as you relinquish control.
Unfortunately, you need to profile routinely. Luckily you can automate this and attach it to the CI pipeline, so the cost isn't high.
> O(n) is worse than O(n^2) for small n.
It can be, it isn't necessarily. And I don't care about small values of n. If I'm spinning through two lists of size 5 it doesn't matter if option A is slightly faster than option B; both options will run on the order of nanoseconds. The lower time complexity solution will continue to be reasonably fast as the input grows; pretty soon the difference will be measured in seconds, minutes, hours, days... Using a worse complexity solution for small inputs is a micro optimization you can use sometimes, but it is a micro optimization. Using the best time complexity is what I like to call a macro optimization. It's the default solution. You might deviate from it at times, but you're much better off using complexity to guide your decisions than not doing it. Once you know what you're doing you can deviate from it when appropriate; sometimes worse complexity is better in specific cases. But 99.9% of the time, best time complexity is good enough either way.
I usually don't need the code I write to be optimal. It doesn't matter if it's a bit slower than it technically could be - as long as it's fast enough and will continue to be fast when the input grows.
Sometimes you may want to squeeze the absolute maximum performance, and in that case you may be able to micro optimize by choosing a worse complexity solution for cases where you know the inputs will always be small. If that assumption ever breaks, your code is now a bottleneck. It can be a useful thing to do in certain niche situations, but for the vast majority of code you're better off just using the best complexity solution you can. Otherwise you may come back to find that the code that ran in a millisecond when you profiled it now takes 20 minutes because there's more data.
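If you ever do want to know where the constant factors put the crossover for something like a membership test, it's cheap to just measure it rather than argue about it (a throwaway sketch; sizes and repeat counts are arbitrary and results are machine-dependent):

    import timeit

    for n in (5, 50, 500, 5_000):
        data_list = list(range(n))
        data_set = set(data_list)
        probe = n - 1                          # worst case for the linear scan
        t_list = timeit.timeit(lambda: probe in data_list, number=100_000)
        t_set = timeit.timeit(lambda: probe in data_set, number=100_000)
        print(f"n={n:>5}  list: {t_list:.4f}s  set: {t_set:.4f}s")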
How do you run a profiler in CI? Do you hook it up to your tests? You need some code that runs with actual input so I guess you either profile the test suite or write tests specifically for profiling? Or maybe you write benchmarks and hook a profiler up to that?
This sounds like it could be a useful technique but very time consuming if you have to write the profiler tests/benchmarks specifically, and kind of useless if you just hook it up to your general test suite. I want my tests to run fast so I don't use big data for testing, and you won't really see where the bottlenecks are unless you're testing with a lot of data.
IFF you understand computing fundamentals, and IFF you have a solid understanding of DS&A, then yes, this is a reasonable approach. At that point, a profiler will likely show you the last 1% of optimizations you could make.
Most web devs I’ve met do not meet that criteria, but sadly also have rarely used a profiler. They’re not opposed to doing so, they’ve just never been asked to do so.
I think you're missing "IFF you write all the code and use no libraries."
The strategy works well, as you point out, for a single person but doesn't scale and can't be applied to library code without doing a deep dive into the library. Which at that point, a profiler is far more time effective.
I think the best option is to integrate profiling into the CI framework. Like a git pre-hook or even better, offload so that your VMs profile while you're testing different environments.
I think mostly it is a matter of habit. I mean let's be honest, we probably don't care about profiling our for fun project codes. Where profiling really matters is in production. But I think the best way to normalize it is just to automate it. Which, honestly, I've never seen done and I'm really not sure why.
You also need to remember that back when those classic CS papers were written, the CPU/RAM speed ratios were entirely different from what they are today. Take, for instance, a Honeywell 316 from 1969 [0]: "Memory cycle time is 1.6 microseconds; an integer register-to-register "add" instruction takes 3.2 microseconds". Yep, back in those days, memory fetches could be twice as fast as the simplest arithmetic instruction. Nowadays, even an L1 fetch is about 4 times as slow as an addition (which takes a single cycle).
No wonder the classical complexity analysis of algorithms generally took memory access to be instantaneous: because it, essentially, was instantaneous.
[0] https://en.wikipedia.org/wiki/Honeywell_316#Hardware_descrip...
I 100% agree. I could have written the same comment.
The biggest way I see this is picking an architecture or programming language (cough Python) that is inherently slow. "We'll worry about performance later" they say, or frequently "it's not performance critical".
Cut to two years later, you have 200k lines of Python that spends 20 minutes loading a 10GB JSON file.
I failed to include an important point in the above comment, which is that spending too much time designing the perfect thing can lead to a system that is either never finished, or one that meets the goals but doesn't do the work it was originally intended to do, and by then it's too late to pivot to doing the right thing.
If you develop in an environment where you have high velocity (eg python) you can much sooner learn that you are building the wrong thing and iterate.
Most systems do not have very high performance requirements. A typical python web application is not going to need to service many requests and the slow stuff it does is likely going to be waiting on responses from a database or other system. The thing to make such a program faster is to send better db queries rather than optimising the O(1 web page) work that is done in python.
I know this was an example, but if you get to that point and haven’t swapped out Python’s stdlib json library for orjson or some other performance-oriented library, that’s on you.
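For reference, the swap itself is only a couple of lines (a sketch; orjson is a third-party package, and dump.json is a made-up filename):

    import orjson  # third-party: pip install orjson; a much faster replacement for json.loads

    with open("dump.json", "rb") as f:   # orjson parses bytes directly
        data = orjson.loads(f.read())    # typically several times faster than the stdlib parser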
That's exactly the sort of after-the-fact performance workaround that you should avoid. Instead of using a faster JSON parser, you shouldn't have used JSON in the first place.
> It’s easy to forget what these classic CS papers were about, and I think that leads to poorly applying them today.
Notably, pretty much the entire body of discourse around structured programming is totally lost on modern programmers, who struggle to even imagine what it was being contrasted with.
It is interesting that there's so much discourse about the effort people have to put into data structure and algorithm study for interviews, and yet people refuse to apply the knowledge that study gives them to trivial-effort optimizations (i.e. your code can look pretty much the same, just with a different data structure under the hood).
That's because people don't get it. You see people saying things like "It's pointless to memorize these algorithms I'll never use" - you're not supposed to memorize the specific algorithms. You're supposed to study them and learn from them, understand what makes them faster than the others and then be able to apply that understanding to your own custom algorithms that you're writing every day.
In practice repetition of this quote more often than not leads to a lazy attitude of "don't think about performance, it's too complicated, measure first and then let's talk".
In my experience all great programmers think about performance from the moment they start thinking about a problem.
Like, you're writing an O(n^2) nested loop over an array that's not going to be tiny? Get out of here! That just shouldn't happen and talking about "premature optimization" points in exactly the wrong direction.
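A hypothetical example of the kind of thing meant here (made-up data): the readable version and the fast version are nearly identical, only the data structure changes.

    import random

    orders = [{"id": i, "customer_id": random.randrange(50_000)} for i in range(20_000)]
    allowed_ids_list = list(range(0, 50_000, 7))

    # O(n*m): the membership test rescans the whole list for every order
    slow = [o for o in orders if o["customer_id"] in allowed_ids_list]

    # O(n + m): build a set once, then each membership test is effectively O(1)
    allowed_ids = set(allowed_ids_list)
    fast = [o for o in orders if o["customer_id"] in allowed_ids]

    assert slow == fast  # same result, wildly different running time as the inputs grow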
> Yet we should not pass up our opportunities in that critical 3 %
The funny thing is, we forgot how to write programs that spend most of their time in 3% of the code.
Profile a modern application and you see 50 level deep stacks and tiny slices of 0.3% of CPU time spent here and there. And yet these slices amount to 100%.
I think the C# dev team had an interesting way to describe that kind of profile [0]:
> In every .NET release, there are a multitude of welcome PRs that make small improvements. These changes on their own typically don’t “move the needle,” don’t on their own make very measurable end-to-end changes. However, an allocation removed here, an unnecessary bounds check removed there, it all adds up. Constantly working to remove this “peanut butter,” as we often refer to it (a thin smearing of overhead across everything), helps improve the performance of the platform in the aggregate.
[0]: https://devblogs.microsoft.com/dotnet/performance-improvemen...
They are describing "marginal gains theory". Lots of small improvements to something add up to a larger more tangible improvement.
This is a bit different because C# is designed and implemented by industry experts - far above the average code monkey in terms of skill, understanding and experience, with added help from community experts, and the software in question is used by millions of systems all over the world.
Practically every part of C# is a hot path because it runs on probably billions of devices every minute of every day. This code is worth micro-optimizing. A 1% improvement in a popular C# library probably saves megawatts of electricity (not to mention the time saved) on a global scale.
Most code is not like that at all. Most code is not worth putting this kind of effort into. Most code is fine being 10% or even 500% slower than it technically could be because computers are lightning fast and can crunch numbers at an incredible speed. Even if the computer is doing 5 times as much work as it needs to it'll still be instant for most use cases. The problem for these kinds of applications arises when it's not doing 5 times as much work as it needs to but 5000 times or 5 million times. Or when the data is just really large and even the optimal solution would take a noticeable amount of time.
Most programs inherently don't have a small bit of code they spend most of their time in. There are exceptions like scientific computing, AI, etc. But most programs have their runtime distributed through lots of code so the idea that you can ignore performance except to profile and optimise some hotspots is nonsense.
Most programs - after dealing with bugs & low hanging fruit - don't have hotspots.
I'm not sure about the "most programs" bit. I think you're forgetting whole swathes of programs like system utilities, databases, tools etc.
But, in any case, this is why we build large programs using modular architectures. You don't profile the Linux kernel hoping to find a hotspot in some obscure driver somewhere, you profile the driver directly. Same for filesystems, networking etc. Similarly we build libraries for things like arithmetic and SAT solvers etc. that will be used everywhere and optimise them directly.
Your comment reads like you're disagreeing with your parent, but in fact you are reinforcing the point. We've forgotten how to build programs like this and end up with big balls of mud that can't be profiled properly.
I completely disagree. If the program doesn't have "hotspots" then it's probably pretty fast and it's following Knuth's advice. But in my experience most applications are completely littered with "hotspots" - endpoints taking 5+ seconds to respond, reports taking minutes or hours to generate, stuff like that. I've never worked on an application where I didn't quickly find multiple easily avoidable hotspots. The average developer, at least in my area of the world, just really doesn't understand how to write performant code at all. They write algorithms with terrible time and/or space complexity, they write inefficient database queries, design their database schemas in ways that don't even allow efficient queries etc.
Then they add caching to compensate for their crappy system design and stuff like that, bloating the system with lots of unnecessary code. They store the data naively then spend significant processing time preparing the data for delivery rather than just storing the data in an ideal format in the first place. Lots of stuff like that.
Pick your top 3 and optimize if you really have to. But if the whole chain is 0.3% here and there, there's very little room for large optimization wins without some redesign.
Yep. There are no silver bullets, and the only way through is shaving down the inefficiencies one by one. Really hard to advocate for in an org.
If you have benchmarked something then optimizations are not premature. People often used to 'optimize' code that was rarely run, often making the code harder to read for no gain.
Beware too of premature pessimization. Don't write bubble sort just because you haven't benchmarked your code to show it is a bottleneck, which is what some will do, and then incorrectly cite premature optimization when you tell them they should do better. Note that any competent language has a sort in its standard library that is better than bubble sort.
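Concretely (a trivial sketch with made-up data): the standard-library sort is both less code and asymptotically better than anything hand-rolled.

    records = [("carol", 72), ("alice", 95), ("bob", 88)]

    # O(n log n), already written, tested and tuned for you
    by_score = sorted(records, key=lambda r: r[1], reverse=True)
    print(by_score)  # [('alice', 95), ('bob', 88), ('carol', 72)]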
Most of the time, the arguments aren’t even that cogent, they’re more along the lines of “I’m not going to specify the columns I need from this SELECT query, because that’s premature optimization.”
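For example (a sketch using the stdlib sqlite3 module and a made-up users table), naming the columns is barely more typing than SELECT * and stops the database from shipping data the code never looks at:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, email TEXT, active INTEGER, bio TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'a@example.com', 1, 'a very long bio...')")

    # Pulls every column of every matching row, needed or not
    rows = conn.execute("SELECT * FROM users WHERE active = 1").fetchall()

    # Same query, but only the two columns this code actually uses
    rows = conn.execute("SELECT id, email FROM users WHERE active = 1").fetchall()
    print(rows)  # [(1, 'a@example.com')]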
This is my all-time favorite paper. It's so easy to read, and there's so much to think about, so much that still applies to everyday programming and language design.
Also there's Knuth admitting he avoids GO TO because he is afraid of being scolded by Edsger Dijkstra.
https://pic.plover.com/knuth-GOTO.pdf
Reading Knuth is always a pleasure.
From the paper:
"It is clearly better to write programs in a language that reveals the control structure, even if we are intimately conscious of the hardware at each step; and therefore I will be discussing a structured assembly language called PL/MIX in the fifth volume of The art of computer programming"
Looking forward to that!
Yeah, it's one of my favourites as well. And it weirds me out that the "premature optimization" bit is the most quoted piece of that paper — arguably, that's the least interesting part of it, compared to the musings on the language design and (semi)automatic transformation of algorithms!
I personally find that "break/continue" with (optionally labelled) loops cover about 95% of GOTO's use cases today (including "one-and-a-half" loops), although e.g. Rust and Zig, with their expression-oriented design (and support for sum types) offer an option to experiment with Zahn's event mechanism... but usually factoring an inner loop body into a separate function is a more clear-to-read approach (as for efficiency? Hopefully you have a decent inliner idk).
The only thing I truly miss, occasionally, is a way to concisely write a one-and-a-half range/FOR loop: a counted loop with a "between" block that runs between iterations but not after the last one.
When you have a generic "while True:" loop, a conditional "break" works fine enough, but here? Ugh.
I might be missing something, but how would `goto` improve matters on that for-loop? It seems to me that the fundamental problem here is rather that there's no easy way to check whether this is the last iteration or not, but that is orthogonal to goto vs break.
Well, what you actually need is to make the first iteration special — generally, you can detect whether an iteration was the last one only after you've already finished it, but special-casing the very first iteration can be done. You put the "between" block at the beginning of the loop body, but start the loop by jumping over it into the middle. It works even better with "for" loops: they desugar into a goto shape where a pre-test jumps past the "between" block into the middle of the body on entry, and the back-edge test at the bottom jumps back to the top. Throw away the "BETWEEN" part and that is a standard way to compile for-loops, with pre-test and loop inversion. But writing this by hand is just... no. I'd rather add either a boolean flag, or throw in an "if i > 0: ..." or "if i != END: ..." guard. Sometimes those even get optimized away, see e.g. [0] — both of the loops there desugar into pretty much this; but it's brittle: changing "if (i > START)" into "if (i != START)" in the second function duplicates the loop body, and if you make the body sufficiently larger than a single function call, the elided conditional branch rematerializes again. But yes, the goto doesn't help much with a for-each loop where you must crank the iterator before every iteration; there you have to either duplicate some code, or count your iterations and check the current number.
[0] https://godbolt.org/z/Y4hv4oYdP
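For what it's worth, here's the boolean-flag/guard workaround described above, sketched in Python (an illustration only, not anyone's proposed syntax; str.join already covers this particular case):

    items = ["alpha", "beta", "gamma"]

    # The "n and a half" pattern: the separator belongs *between* iterations,
    # so the "between" block sits at the top of the body behind a first-iteration guard.
    out = []
    for i, piece in enumerate(items):
        if i > 0:            # skipped only on the first pass
            out.append(", ")
        out.append(piece)
    print("".join(out))      # alpha, beta, gamma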
thank you, that's an incredible paper !
This statement in the introduction applies to so many things in CS:
Like Brooks said: No Silver Bullets.
> Usually people say “premature optimization is the root of all evil” to say “small optimizations are not worth it” but […] Instead what matters is whether you benchmarked your code and whether you determined that this optimization actually makes a difference to the runtime of the program.
In my experience the latter is actually often expressed. What else would “premature” mean, other than you don’t know yet whether the optimization is worth it?
The disagreement is usually more about small inefficiencies that may compound in the large but whose combined effects are difficult to assess, compiler/platform/environment-dependent optimizations that may be pessimizations elsewhere, reasoning about asymptotic runtime (which shouldn’t require benchmarking — but with cache locality effects sometimes it does), the validity of microbenchmarks, and so on.
The way I often hear it expressed has nothing to do with small efficiency changes or benchmarking; it’s more of a YAGNI / anticipating-hyper-scale issue. For example, adding some complexity to your code so it can efficiently handle a million users, when the simple-to-read-and-write version, while not optimal, will work just fine for the twenty users you actually have.
I think the best way to go about optimization is to maintain a constant baseline level of mechanical sympathy with the computer's architecture.
If you burn into your soul the table of NUMA latencies and really accept the inevitable conclusions regarding practical computation, a lot of the optimization discussion simply boils down to keeping the closest cache as hot as possible as often as possible. Put differently, if you can get the working set to ~fit in L1, then you have nothing to worry about. The CPU is faster than most people can reason about now. You can talk to L1 1000x before you can talk to the GPU 1x. This stuff makes a ridiculous difference when applied correctly.
Context switching and communication between threads is where most developers begin to lose the rabbit. Very often a problem can be solved on one physical core much faster than if you force it to be solved across many physical cores.
After 20 years of programming, I've come to the following realization:
On a long enough timeline, the probability that someone will call your code in a for-loop approaches 1.
and that is how we end up with this 'accidentally quadratic' software https://news.ycombinator.com/item?id=9217048
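A sketch of how that usually happens (made-up names): the helper looks innocently linear, and only the caller makes it quadratic.

    def find_user(users, user_id):
        """Looks harmless on its own: a single O(n) scan."""
        for u in users:
            if u["id"] == user_id:
                return u
        return None

    def enrich_orders(orders, users):
        # ...until someone calls it once per order: O(len(orders) * len(users)).
        return [(o, find_user(users, o["user_id"])) for o in orders]

    def enrich_orders_fast(orders, users):
        # Build the index once; the whole thing drops back to O(n + m).
        by_id = {u["id"]: u for u in users}
        return [(o, by_id.get(o["user_id"])) for o in orders]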
It is generally better just to focus on algorithmic complexity - O(xN^k). The first version of the code should bring the code to the lowest possible k (unless N is very small, in which case who cares). Worry about x later. Don't even think about parallelizing until k is minimized. Vectorize before parallelizing.
For parallel code, you basically have to know in advance that it is needed. You can't normally just take a big stateful / mutable codebase and throw some cores at it.
Choose an algorithm that's reasonably efficient and easy to implement. Then parallelize if it wasn't fast enough. Then try to find a better algorithm. And only then vectorize. Pick the low-hanging fruit first, and only resort to options that require more work and make the code less maintainable if that wasn't enough.
Software engineers and computer scientists are often reluctant to parallelize their code, because they have learned that it's difficult and full of hidden pitfalls. But then they lose the performance gains from multithreaded loops and other simple approaches, which are often essentially free. Multithreaded code is not particularly difficult or dangerous, as long as you are not trying to be clever.
OMG, this is terrible advice.
Not sure I agree. Yes algorithmic complexity is important and should always be considered. But you shouldn't ignore X. Picking Rust instead of Python only changes X but it can easily reduce X by a factor of 100 or more, and it's not something you can realistically decide later.
Hmmm. I wasn't thinking about making performance decisions that early in a project - more like an optimization approach for a given feature / work.
In terms of deciding on what programming language to use for a project, that is a complicated one. Usually it is some combination of the team and the problem I suppose.
- Knuth puts sentinels at the end of an array to avoid having to bounds check in the search (sketched below).
- Knuth uses the register keyword.
- Knuth custom-writes each data structure for each application.
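The sentinel trick, sketched in Python for readability (the real payoff is in low-level code, where the eliminated bounds check is an actual branch per element):

    def find_with_sentinel(xs, target):
        """Linear search with a sentinel: one comparison per element instead of two.

        Appending `target` guarantees the scan terminates, so the loop needs no
        separate end-of-list check on every iteration.
        """
        xs.append(target)          # plant the sentinel
        i = 0
        while xs[i] != target:
            i += 1
        xs.pop()                   # restore the caller's list
        return i if i < len(xs) else -1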
When I was young someone pointed out to me how each hype cycle we cherry-pick a couple interesting bits out of the AI space, name them something else respectable, and then dump the rest of AI on the side of the road.
When I got a bit older I realized people were doing this with performance as well. We just call this part architecture, and that part Best Practices.
I like to call anything I don't like "anti-patterns".
When Knuth wrote his paper, compilers/optimizers were not nearly as good as today, and CPUs were much more deterministic (only disk access showed the kind of issues we now see with cache misses). Most of his optimizations are things that the compiler will do better than you can (most, not all!) and so are not even worth thinking about. Meanwhile the compiler will almost never turn an O(n^2) into an O(n*ln(n)), despite that resulting in a much bigger speedup than anything the compiler can do.
Knuth does this today.
And a compiler will never make sentinels for you. Most of these tricks still work and have a measurable difference.
The real root of all evil is reasoning by unexamined phrases.
"A clever saying proves nothing." - Voltaire
I understand Knuth's _premature optimization_ saying as: (1) find the hottest code (the best way is to have a full heat chart), (2) optimize there. That's all of it, and it really works. E.g. if you have some code that prepares data for an n^2 loop, and you work hard on that data preparation, profile it, etc., it will not give you any visible improvement, because it is one-time code. What's inside the n^2 loop is what matters: optimize there, and maybe even go back and prepare the data in a shape that lets the loop do its work more efficiently.
I suggest not over-indexing on this approach. The profiler will not always identify an easily fixable hotspot, and performance should not be entirely post hoc. Some upfront investment pays off: sure, not every algorithm matters, but basics like not doing O(N^2) when O(N) is possible, or not hitting the DB for every item in a loop, will pay off. The alternative can be weeks or months of profiling and debugging, and the fixed code often introduces other bugs, because code built on top of it has made various assumptions.
I'm speaking from experience in a large code base where people liked to use the "root of all evil" argument quite a bit at one time.
Love this paper and read it several times, most recently around 10 years ago when thinking about whether there were looping constructs missing from popular programming languages.
I have made the same point several times online and in person that the famous quote is misunderstood and often suggest people take the time to go back to the source and read it since it’s a wonderful read.
I happen to have also reread the paper last week. The last time I read it was 30 years ago, closer to when it was written than to the present, and I had forgotten a lot of it. One of the few bits that stuck with me was the Shanley Design Criterion. Another was "n and a half loops".
The bit about sequential search and optimization, the topic of this whole blog post, is kind of a minor detail in the paper, despite being so eloquently phrased that it's the part everyone quotes—sometimes without even knowing what “optimization” is. There's a table of contents on its second page, which is 33 lines long, of which "A Searching Example" and "Efficiency" are lines 4 and 5. They are on pages 266–269, 3 pages of a 41-page paper. (But efficiency is an issue he considers throughout.)
Mostly the paper is about control structures, and in particular how we can structure our (imperative!) programs to permit formal proofs of correctness. C either didn't exist yet or was only known to less than five people, so its break/continue control structures were not yet the default. Knuth talks about a number of other options that didn't end up being popular.
It was a really interesting reminder of how different things looked 51 years ago. Profilers had just been invented (by Dan Ingalls, apparently?) and were still not widely available. Compilers usually didn't do register allocation. Dynamically typed languages and functional programming existed, in the form of Lisp and APL, but were far outside the mainstream because they were so inefficient. You could reliably estimate a program's speed by counting machine instructions. People were sincerely advocating using loops that didn't allow "break", and subroutines without early "return", in the interest of building up their control flow algebraically. Knuth considered a recursive search solution to the N-queens problem to be interesting enough to mention it in CACM; similarly, he explains the tail-recursion optimization as if it's not novel but at least a bit recondite, requiring careful explanation.
He mentions COBOL, BCPL, BLISS, Algol W, Algol 60, Algol 68, other Algols, PL/I (in fact including some example code), Fortran, macro assemblers, "structured assemblers", "Wirth's Pascal language [97]", Lisp, a PDP-10 Algol compiler called SAIL (?), META-II, MIXAL, PL360, and something called XPL, but not Smalltalk, CLU, APL, FORTH, or BASIC.
He points out that it would be great for languages to bundle together the set of subroutines for operating on a particular data type, as Smalltalk and CLU did, but he doesn't mention CLU; it had only been introduced in the previous year. But he's clearly thinking along those lines (p.295):
> (...) it turns out that a given level of abstraction often involves several related routines and data definitions; for example, when we decide to represent a table in a certain way, we also want to specify the routines for storing and fetching data from that table. The next generation of languages will probably take into account such related routines.
Often when I read old papers, or old software documentation like the TENEX EXEC manual, it's obvious why the paths not taken were not taken; what we ended up doing is just obviously better. This paper is not like that. Most of the alternatives mentioned seem like they might have turned out just as well as what we ended up with.
I mean, basically what he's saying is: check the impact of your optimization. Every time there's an optimization, there's a complexity and brittleness cost. However, sometimes unoptimized code is actually more difficult to read than the optimized version. I mean, it's quite logical tbf.
Wait... people understood "premature optimisation" to mean "small optimisations are not worth it"? I've always understood it to mean exactly what it's supposed to mean, namely don't optimise something until you've shown that it's actually needed. I honestly don't know how it could be interpreted any other way.
It is sad how often people cite premature optimization in the real world even when it isn't premature. Using a good algorithm is never premature optimization, particularly when the good algorithm is likely built into your language as well. Likewise if you have run the profiler it is not premature optimization, and running the profiler to find those places is not premature optimization.
Ironically, all pdfs of the famous paper have atrocious typesetting and are a pain to read.
The word “typesetting” does not refer to all aspects of visual appearance, but merely to the crucial decision of where each character goes: what glyph (from the fonts available) is placed at what position on the page. This paper was first published in the ACM Computing Surveys journal, and the typesetting is fine (as you'll find if you pick up a physical copy of the journal) — I think what you're complaining about is that all PDF versions that have been put up online so far (including the official one available via https://doi.org/10.1145/356635.356640) are from the same poor scan of the journal: it's not the typesetting that is atrocious, but the scanning quality (converting the printed page into digital images).
You may also want to look at the version published in Knuth's collection called Literate Programming (with corresponding errata: https://cs.stanford.edu/~knuth/lp.html#:~:text=Errata,correc... ), and I've just uploaded a scan here: https://shreevatsa.net/tmp/2025-06/DEK-P67-Structured.progra...
My eyes thank you!
The typesetting expert's papers are hard on the eyes. The algorithm expert's programs are not executed.
(meta) I am probably wasting my time commenting on the linked article here. Nobody does that /s.
I don't think the measurements support the conclusion that well.
What I want to have when I see those measurements:
I want the language's abstract machine and the compiler to not get in the way of how I want code to perform on certain platforms. This is currently not what I get at all. The language is actively working against me. For example, I cannot read a cache line from an address because my object may not span that far. The compiler has its own direction at best. This means there is no way to specify things, and all I can do is test the compiler output after each minor update. In a multi-year project such updates can happen dozens of times! The ergonomics of trying to specify things are actually getting worse. The example with assembly matches my other experiences: the compiler ignores even intrinsics now. If it wants to optimize, it does.
I can't run to someone else for magic library solutions every time I need to write code. I need to be able to get things done in a reasonable amount of time. It is my organization's development process that should decide whether the solution I used belongs in some library or not. It usually means that efforts that cover only some platforms and only some libraries are not as universally applicable to my platforms and my libraries as folks at language conferences think /s.
Disclaimer. I work in gamedev.
The reason we're still arguing over this is that Knuth didn't say it concisely; he didn't communicate the idea succinctly enough.
I understood it to mean that optimising for speed at the expense of size is a bad idea unless there are extremely obvious performance improvements in doing so. By default you should always optimise for size.
The famous "premature optimization" quote isn't from a dedicated paper on optimization, but from Knuth's 1974 "Structured Programming with go to Statements" paper where he was discussing broader programming methodology.
That's literally the first sentence of the article.
It's no longer relevant. It was written when people were writing IBM operating systems in assembly language.
Things have changed.
Remember this: "speed is a feature"?
If you need fast software to make it appealing, then make it fast.
I will say I have worked on projects where sales and upper management were both saying the same thing: customers don't like that our product is slow(er than a competitor's), and the devs just shrug and say we've already done everything we can. In one case the most senior devs even came up with charts to prove this was all they were going to get.
Somehow I found 40%. Some of it was clever, a lot of it was paying closer attention to the numbers, but most of it was grunt work.
Besides the mechanical sympathy, the other two main tools were 1) stubbornness, and 2) figuring out how to group changes along functional testing boundaries, so that you can justify making someone test a change that only improves perf by 1.2% because they're testing a raft of related changes that add up to more than 10%.
Most code has orphaned performance improvements in 100 little places that all account for half of the runtime because nobody can ever justify going in and fixing them. And those can also make parallelism seem unpalatable due to Amdahl.