The article seems to make an assumption that the application backend is in the same datacenter as the load balancer, which is not necessarily true: people often put their load balancers at the network edge (which helps reduce latency when the response is cached), or just outsource those to a CDN vendor.
> In addition to the low roundtrip time, the connections between your load balancer and application server likely have a very long lifetime, hence don’t suffer from TCP slow start as much, and that’s assuming your operating system hasn’t been tuned to disable slow start entirely, which is very common on servers.
A single HTTP/1.1 connection can only process one request at a time (unless you attempt HTTP pipelining), so if you have N persistent TCP connections to the backend, you can only handle N concurrent requests. Since all of those connections are long-lived and are sending at the same time, if you make N very large, you will eventually run into TCP congestion control convergence issues.
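To make that "N connections, N concurrent requests" point concrete, here is a minimal Go sketch (hostnames and numbers are made up) of how a client or proxy caps its HTTP/1.1 connection pool to a backend; with MaxConnsPerHost set, in-flight concurrency to that host is bounded by the connection count:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	tr := &http.Transport{
		// Keep a pool of warm, long-lived HTTP/1.1 connections to the backend.
		MaxIdleConns:        64,
		MaxIdleConnsPerHost: 64,
		IdleConnTimeout:     90 * time.Second,
		// Hard cap: with HTTP/1.1 there is no multiplexing, so at most 64
		// requests to this host can be in flight at once.
		MaxConnsPerHost:   64,
		ForceAttemptHTTP2: false, // stick to HTTP/1.1 for this illustration
	}
	client := &http.Client{Transport: tr, Timeout: 10 * time.Second}

	resp, err := client.Get("http://backend.internal:8080/health") // hypothetical backend
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```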
Also, I don't understand why the author believes HTTP/2 is less debuggable than HTTP/1; curl and Wireshark work equally well with both.
I think the more common architecture is for the edge network to terminate SSL, and then transmit to the load balancer which is actually in the final data center? In which case you can use HTTP/2 or 3 on both of those hops without requiring it on the application server.
That said I still disagree with the article's conclusion: more connections means more memory so even within the same dc, there should be benefits of http2. And if the app server supports async processing, there's value in hitting it with concurrent requests to make the most of its hardware, and http1.1 head of line blocking really destroys a lot of possible perf gains when the response time is variable.
I suppose I haven't had a true bake off here though - so it's possible the effect of http2 in the data center is a bit more marginal than I'm imagining.
The maximum number of connections thing in HTTP/1 always makes me think of queuing theory, which gives surprising conclusions like how adding a single extra teller at a one-teller bank can cut wait times by 50 times, not just by 2.
However, I think the problem is the Poisson process isn't really the right process to assume. Most websites which would run afoul of the 2/6/8/etc connections being opened are probably trying to open up a lot of connections at the same time. That's very different from situations where only 1 new person arrives every 6 minutes on average, and 2 new people arriving within 1 second of each other is a considerably rarer event.
And if memory serves, if you care about minimizing latency you want all of your workers running at an average of about 60% occupancy. (Which is also pretty close to when I saw P95 times dog-leg on the last cluster I worked on.)
Most analyses I've read say the threshold is around the 80% mark [1], although it depends on how you model the distribution, and there's nothing magical about the number. The main thing is to avoid getting close to 100%, because wait times go up exponentially as you get closer to the max.
Little's Law is fundamental to queueing theory, but there's also the less well-known Kingman's formula, which incorporates variability of arrival rate and task size [2].
[1] https://www.johndcook.com/blog/2009/01/30/server-utilization-...
[2] https://taborsky.cz/posts/2021/kingman-formula/
Really, both of those models show 60% as about the limit where you're still effectively at the baseline for latency. 80% is just about the limit where you're already into the exponential rise; any higher and things become unusable.
At 0-60% you're still at minimum latency. At 60-80% you're at twice the latency, but it's probably worth the cost savings of the extra compute density since it's still pretty low. Higher than 80% and things are already slowing down and getting exponentially worse with every request.
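As a back-of-the-envelope check of those thresholds, the single-server M/M/1 model puts average time in system at service_time / (1 − utilization). A tiny Go calculation (purely illustrative; real services rarely match M/M/1 assumptions, and the 10 ms service time is invented):

```go
package main

import "fmt"

func main() {
	const serviceMs = 10.0 // hypothetical average service time
	for _, rho := range []float64{0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99} {
		inSystem := serviceMs / (1 - rho) // M/M/1 mean time in system
		fmt.Printf("utilization %2.0f%% -> %6.1f ms (%4.1fx bare service time)\n",
			rho*100, inSystem, inSystem/serviceMs)
	}
}
```

Under that (simplistic) model you're at roughly 2.5x the bare service time at 60%, 5x at 80%, and it blows up rapidly after that.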
Could be that; could also be that the people taking a long time at least aren't causing a bottleneck (assuming there aren't two of them at the same time). So you have a situation like this: the first person takes 10 minutes, while there are 9 waiting in line who take only one minute apiece. With one teller, the average wait time is ~15 minutes. With two tellers, it's now ~5 minutes.
Which is why it is highly annoying when there's only one worker at the coffee stand, and there's always this one jerk at the front of the queue who orders a latte when you just want a coffee. With two workers, the people who just want coffee won't have to wait 15 minutes for the latte people.
And I've also noticed a social effect: when people wait a long time, it seems to reinforce how they perceive the eventual service, that is, they want more out of the interaction, so they take longer. Which makes the situation even worse.
> there's always this one jerk at the front of the queue
Here in the espresso world, that’s not so bad. But the ‘vanilla oat milk decaf, and also a hot muffin with butter’ is tedious.
There is a roaster in Auckland that’s been there since the ‘80s. On the counter it says ‘espresso, flat white or fuck off’. Clear and concise. I like it.
https://millerscoffee.co.nz/
There was a discussion not too long ago about modern banks still operating with archaic practices. I have accounts at two different banks, and if I make a transfer request before 1:45 PT, it is counted as same day. It makes no damn sense to me why that's a limitation at all today. It's not like a human needs to look at it, but even so, why the 1:45 PT cutoff? Is it because it is 4:45 ET? Then why not list it as that? And why does a banking computer system care about timezones or bankers' hours at all? It's all just mind-bogglingly lame.
It's because when you do transfers, the banks will reconcile their accounts at the end of the day (e.g., if one bank deposits more to another, they will need to make up the difference with their own capital).
These cutoffs mean banks have certainty about the transaction, as the reconciliation is batched rather than real-time.
I know my father sometimes had to take 1 am phone calls because the insurance industry runs a lot of batch processing overnight, when the systems aren't competing with OLTP traffic. Banking software may be built the same way.
I find a lot of value in being able to get a water or a coffee, use the restroom, have sidebar conversations with fellow employees, begrudgingly attend meetings, or take a walk to stretch my legs for a minute and think, personally.
Almost every web forum enters a phase where participants bring their pet politics into unrelated discussions. Whether they last or not depends entirely on whether the flamebait/troll creates a large reply structure or a non-existent one. This is why shadowbans are more effective than large groups of people responding angrily. Or, to cite the Deep Magic: "don't feed the trolls".
Personally, I'd like to see more HTTP/2 support. I think HTTP/2's duplex streams would be useful, just like SSE. In theory, WebSockets do cover the same ground, and there's also a way to use WebSockets over HTTP/2 although I'm not 100% sure how that works. HTTP/2 though, elegantly handles all of it, and although it's a bit complicated compared to HTTP/1.1, it's actually simpler than WebSockets, at least in some ways, and follows the usual conventions for CORS/etc.
The problem? Well, browsers don't have a JS API for bidirectional HTTP/2 streaming, and many don't see the point, like this article expresses. NGINX doesn't support end-to-end HTTP/2. Feels like a bit of a shame, as the streaming aspect of HTTP/2 is a more natural evolution of the HTTP/1 request/response cycle versus things like WebSockets and WebRTC data channels. Oh well.
Duplex streams are not really an HTTP/2-only feature. You can do the same bidirectional streaming with HTTP/1.1 too. The flow is always: 1. The client sends a header set. 2. It can then start to stream data in the form of an unlimited-length byte-stream to the server. 3. The server starts to send a header set back to the client. 4. The server can then start to stream data in the form of an unlimited-length byte-stream to the client.
There is not even a fixed order between 2) and 3). The server can start sending headers or body data before the client has sent any body bytes.
What is correct is that a lot of servers and clients (including javascript in browsers) don't support this and make stricter assumptions regarding how HTTP requests are used - e.g. that the request bytes are fully sent before the response happens. I think ReadableStream/WritableStream APIs on browsers were supposed to change that, but I haven't followed the progress in the last few years.
NGINX falls into the same category. Its HTTP/2 support (and gRPC support) was built with a very limited use-case in mind. That's also why various CDNs and service meshes use different kinds of HTTP proxies - so that various streaming workloads don't break in case the way the protocol is used is not strictly request->response.
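As a rough illustration of the flow described above (response starting before the request body is complete), here is a minimal Go handler, not taken from the article, that echoes each chunk back as it arrives. Go's net/http allows this over both HTTP/1.1 (chunked in both directions) and HTTP/2, provided the client cooperates:

```go
package main

import (
	"log"
	"net/http"
)

// echo streams each chunk of the request body back to the client as soon as
// it arrives, i.e. the response begins before the request body has finished.
func echo(w http.ResponseWriter, r *http.Request) {
	flusher, _ := w.(http.Flusher)
	buf := make([]byte, 32*1024)
	for {
		n, err := r.Body.Read(buf)
		if n > 0 {
			if _, werr := w.Write(buf[:n]); werr != nil {
				return
			}
			if flusher != nil {
				flusher.Flush() // push the chunk out instead of buffering it
			}
		}
		if err != nil {
			return // io.EOF or the client went away
		}
	}
}

func main() {
	http.HandleFunc("/echo", echo)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```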
No browser I'm aware of is planning on allowing the request and response bodies to be streamed simultaneously for the same request using ReadableStream and WritableStream. When using streaming request bodies, you have to set the request explicitly to half-duplex.
Anyways, yes, this is technically true, but the streaming semantics are not really that well-defined for HTTP/1.1, probably because it was simply never envisioned. The HTTP/1.1 request and response were viewed as unary entities and the fact that their contents were streamed was mostly an implementation detail. Most HTTP/1.1 software, not just browsers, ultimately treats the requests and responses of HTTP as different and distinct phases. For most uses of HTTP, this makes sense. e.g. for a form post, the entire request entity is going to need to be read before the status can possibly be known.
Even if we do allow bidirectional full-duplex streaming over HTTP/1.1, it will block an entire TCP connection for a given hostname, since HTTP/1.1 is not multiplexed. This is true even if the connection isn't particularly busy. Obviously, this is still an issue even with long-polling, but that's all the more reason why HTTP/2 is simply nicer.
NGINX may always be stuck in an old school HTTP/1 mindset, but modern software like Envoy shows a lot of promise for how architecting around HTTP/2 can work and bring advantages while remaining fully backwards compatible with HTTP/1 software.
HTTP2 works great on the LAN, or if you have really good network.
It starts to really perform badly when you have dropped packets. So any kind of medium quality wifi or 4/5g kneecaps performance.
It was always going to do this, and as webpages get bigger, the performance degradation increases.
HTTP2 fundamentally underperforms in the real world, and noticeably so on mobile. (My company enthusiastically rolled out http2 support when akamai enabled it.)
Personally I feel that websockets are a hack, and frankly HTTP/3 should have been split into three: a file access protocol, an arbitrary TCP-like pipe, and a metadata channel. But web people love hammering workarounds onto workarounds, so we are left with HTTP/3.
HTTP/2, in my experience, still works fine on decent connections, but the advantages definitely start to level out as the connection gets worse. HTTP/2 definitely has some inherent disadvantages over HTTP/1 in those regards. (Though it depends on how much you are constrained by bandwidth vs latency, to be sure.)
However, HTTP/3 solves that problem and performs very well on both poor quality and good quality networks.
Typically, I use HTTP/2 to refer to both HTTP/2 and HTTP/3 since they are basically the same protocol with different transports. Most people don't really need to care about the distinction, although I guess since it doesn't use TCP there are cases where someone may not be able to establish an HTTP/3 connection to a server. Still, I think the forward looking way to go is to try to push towards HTTP/3, then fall back to HTTP/2, and still support HTTP/1.1 indefinitely for simple and legacy clients. Some clients may get less than ideal performance, but you get the other benefits of HTTP/2 on as many devices as possible.
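The usual fallback mechanics are that the TCP listener advertises HTTP/3 via the Alt-Svc header, and clients that can't do QUIC simply keep using h2 or HTTP/1.1. A hedged Go sketch of just that advertising part (the actual HTTP/3 listener would come from a QUIC library such as quic-go and isn't shown; cert paths are placeholders):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Tell capable clients they may retry this origin over HTTP/3 on UDP 443.
		w.Header().Set("Alt-Svc", `h3=":443"; ma=86400`)
		fmt.Fprintf(w, "served over %s\n", r.Proto)
	})
	// Serves HTTP/1.1 and, via ALPN, HTTP/2; an HTTP/3 endpoint would listen
	// on UDP 443 alongside this TCP listener.
	log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", mux))
}
```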
> HTTP 3 should have been split into three: a file access protocol, a arbitrary TCP like pipe and a metadata channel
HTTP3 is basically just HTTP2 on top of QUIC… so you already have the tcp-like pipe, it’s called QUIC. And there’s no reason to have a metadata channel when there are already arbitrary separate channels in QUIC itself.
If you want to stream data inside a HTTP body (of any protocol), then the ReadableStream/WritableStream APIs would be the appropriate APIs (https://developer.mozilla.org/en-US/docs/Web/API/Streams_API) - however at least in the past they have not been fully standardized and implemented by browsers. Not sure what the latest state is.
WebTransport is a bit different - it offers raw QUIC streams that are running concurrently with the requests/streams that carry the HTTP/3 requests on shared underlying HTTP/3 connections and it also offers a datagram API.
I think the problem is that duplex communication on the web is rarely useful except in some special cases, and usually harder to scale as you have to keep state around and can't as easily rotate servers.
For some applications it is important, but for most websites the benefits just don't outweigh the costs.
It seems like the author is agreeing that HTTP/2 is great (or at least good) for browser -> web server communication, but not useful for the REST-style APIs that pervade modern app design. He makes a good case, but HTTP was never really a good choice for API transport _either_, it just took hold because it was ubiquitous.
The big difference for simple applications is that Caddy is easier to set up, and nginx has a smaller memory footprint. Performance is similar between the two.
AFAIK both proxies are capable of serving at line rate for 10Gbps or more at millions of concurrent connections. I can't possibly see how performance would significantly differ if they're properly configured.
nginx's memory footprint is tiny for what it delivers. A common pattern I see for homelab and self-hosted stuff is a lightweight bastion VPS in a cloud somewhere proxying requests to more capable on-premise hardware over a VPN link. Using a cheap < $5/mo VPS means 1GB or less of RAM, so you have to tightly watch what is running on that host.
1 GB should be way more than either should need. I run nginx, unbound, postfix, dovecot plus all the normal stuff (ssh, systemd, etc) for a Linux system on a VPS w/ 500MB of RAM. Currently the system has ~270MB used. It actually has 1GB available due to a plan auto-upgrade, but I have never bothered as I just don't need it.
1GB for a VPC that runs an HTTP load balancer/reverse proxy and a handful of IPsec or Wireguard tunnels back to the app servers (origin) is overkill. You could successfully run that in 512MB, and probably even 256MB. (That's the scenario described).
What needs to run on this that's a memory hog making 512MB too small? By my (very rough) calculations you'd need 50-100MB for kernel + systemd + sshd + nginx base needs + tunnels home. That leaves the rest for per-request processing.
Each request starts needing enough RAM to parse the https headers into a request object, open a connection back to the origin, and buffer a little bit of traffic that comes in while that request is being processed/the origin connection opens. After that you only need to maintain 2 connections plus some buffer space - generously 50KB initially and 10KB ongoing. There's enough space for a thousand concurrent requests in the RAM not used by the system. Proxying is fairly cheap - the app servers (at the origin) may need much much more, but that's not the point of the VPS being discussed.
Also worth noting that the cheap VPS is not a per-project cost - that is the reverse proxy that handles all HTTP traffic into your homelab.
The Go crowd, like the Rust crowd, likes to advertise the language their software is written in. I agree that that specific sentence is a bit ambiguous, though, as if it's some kind of middleware that hooks into Go applications.
It's not, it's just another standalone reverse proxy.
Terraform providers seem to work pretty well, but as far as I know, they're basically separate executables and the main process communicates with them using sockets.
> The Go crowd, like the Rust crowd, likes to advertise the language their software is written in.
Probably because end users appreciate that usually that means a single binary + config file and off you go. No dependency hell, setting up third party repos, etc.
> Probably because end users appreciate that usually that means a single binary + config file and off you go. No dependency hell, setting up third party repos, etc.
Until you have to use some plugin (e.g. cloudflare to manage DNS for ACME checks), now it's exactly "dependency hell, setting up third party repos, etc."
I also fully expect to see a few crashes from unchecked `err` in pretty much any Go software. Also, nginx qualifies for `single binary + config`; it's just that NGINX is for infra people and Caddy is for application developers.
Actually, all of it applies to rust. The only stable ABI in Rust is the C ABI, and IMO at that point it stops being rust. Even dynamically loading a rust lib in a rust application is unsafe and only expected to work when both are compiled with the same version. In a plugin context, it's the same as what Caddy makes you do.
However, the Rust Evangelical Strike Force successfully infiltrated the WASM committee, and when WASM Components stabilize, they can be used for plugins in some cases (see Zed and zellij). (Go can use them as well; rust is just the first (only?) to support the preview-2 components model.)
Yeah, I don't really do dynamic loading in my corner of Rust. And I can always target some MSRV, cargo package versions, and be happy with it. Definitely beats the dependency hell I've had to deal with elsewhere
Don't get me wrong, I love rust and use it almost every day. Doing `cargo run` in a project and having it handle everything is great. This gets lost once you start working in a plugin context. Because now you're not dealing with your neatly organized workspace, you're working across multiple workspaces from different people.
IIRC it's more than just MSRV or even matching the version exactly. It also requires that the flags used to compile rustc match (there is an escape hatch tho).
It shouldn't, which is why I think the wording there is strange. Nginx doesn't market itself as a "platform to serve your sites, services, and apps, written in C". Reading the first sentence I don't even know what Caddy is - what does a platform mean in this context? Arriving on Nginx's site, the first sentence visible to me is:
>nginx ("engine x") is an HTTP web server, reverse proxy, content cache, load balancer, TCP/UDP proxy server, and mail proxy server.
Back when Caddy first came out over 10 years ago, the fact that it was written in Go was just simply more notable. For Go, it also at least tells you the software is in a memory-safe programming language. Now neither of those things is really all that notable, for new software.
I didn't even read the article, but I love the comments on the thread.
Yes. The implementation language of a system should not matter to people in the least. However, they are used as a form of prestige by developers and, sometimes, as a consumer warning label by practitioners.
There's certainly some aspect of that going on, but I think mainly it's just notable when you write something in a programming language that is relatively new.
Does it matter? In theory no, since you can write pretty much anything in pretty much any language. In practice... It's not quite that black and white. Some programming languages have better tooling than others; like, if a project is written in pure Go, it's going to be a shitload easier to cross compile than a C++ project in most cases. A memory-safe programming language like Go or Rust will tell you about the likely characteristics of the program: the bugs are not likely to be memory or stack corruption bugs since most of the code can't really do that. A GC'd language like Go or Java will tell you that the program will not be ideal for very low latency requirements, most likely. Some languages, like Python, are languages that many would consider easy to hack on, but on the other hand a program written in Python probably doesn't have the best performance characteristics, because CPython is not the fastest interpreter. The discipline that is encouraged by some software ecosystems will also play a role in the quality of software; let's be honest, everyone knows that you CAN write quality software in PHP, but the fact that it isn't easy certainly says something. There's nothing wrong with Erlang but you may need to learn about deploying BEAM in production before actually using Erlang software, since it has its own unique quirks.
And this is all predicated on the idea that nobody ever introduces a project as being "written in C." While it's definitely less common, you definitely do see projects that do this. Generally the programming language is more of a focus for projects that are earlier in their life and not as refined as finished products. I think one reason why it was less common in the past is because writing that something is written in C would just be weird. Of course it's written in C, why would anyone assume otherwise? It would be a lot more notable, at that point, if it wasn't.
I get why people look at this in a cynical way but I think the cynical outlook is only part of the story. In actuality, you do get some useful information sometimes out of knowing what language something is written in.
Python certainly was too, back in the day. It feels like it's roughly a "first 10 years of the language" thing, maybe stretched another 5 if there's an underdog aspect (like being interpreted.)
For an open source product, it's fun to say "written in X language". It also advertises the project to developers who may be willing to contribute.
If you put "product made with Go", I'm not going to contribute as I don't know Go, though that wouldn't prevent me from using it should it fit my needs. But if you wrote your project in .NET, I may certainly be willing to contribute.
There is a distinct lack of elegance in the HTTP/2 protocol. It's exceptionally complex and it has plenty of holes in it. That it simply does a job does not earn it "elegant."
Honestly, I don't understand this critique. The actual protocol is pretty straight-forward for what it does. I'm not sure it can be much simpler given the inflexible requirements. I find it more elegant than HTTP/1.1.
Versus HTTP/1.1, some details are simplified by moving the request and status line parts into headers. The same HEADERS frame type can be used for both the headers and trailers on a given stream. The framing protocol itself doesn't really have a whole lot of cruft, and versus HTTP/1 it entirely eliminates the need for the dancing around with Content-Length, chunked Transfer-Encoding, and trailers.
In practice, a lot of the issues around HTTP/2 implementations really just seem to be caused by trying to shoehorn it into existing HTTP/1.1 frameworks, where the differences just don't mesh very well (e.g. Go has some ugly problems here) or just simply a lack of battle-testing due to trouble adopting it (which I personally think is mainly caused by the difficulty of configuring it. Most systems will only use HTTP/2 by default over TLS, after all, so in many cases end-to-end HTTP/2 wasn't being tested.)
From the perspective of a user, where it starts to seem inelegant is when grpc comes into the picture. You get grpc to function but then plain http traffic breaks and vice versa. It seems to be odd implementation details on specific load balancer products. When in theory, all of it should operate the same way, but it doesn’t.
grpc doesn't really do anything special on top of http/2. Load balancers that are aware of http/2 on both sides shouldn't have any trouble with either.
The problem that people run into load balancing grpc is that they try to use a layer 4 load balancer to balance layer 7 requests; that is, if there are 4 backends, the load balancer tells you the address of one of them, and then you wonder why the other 3 backends don't get 25% of the traffic. That's because grpc uses 1 TCP connection and it sends multiple requests over that connection ("channel"). If your load balancer tells you the addresses of all 4 servers, then you can open up 4 channels and load balance inside your application (this was always the preferred approach at google, with a control channel to gracefully drain certain backends, etc.). If your load balancer is aware of http/2 at the protocol level (layer 7), then you open up one channel to your load balancer, which already has one channel for each backend. When a request arrives, it inspects it and picks a backend and proxies the rest of the exchange.
Ordinary http/2 works like this, it's just that you can get away with a network load balancer because http clients open new connections more regularly (compare the lifetime of a browser page with the lifetime of a backend daemon). Each new connection is a load balancing opportunity for the naive layer 4 balancer. If you never make new connections, then it never has an opportunity to load balance.
grpc has plenty of complexity for "let applications do their own load balancing", including built-in load balancing algorithms and built-in service discovery and health discovery (xDS); http/2 doesn't have any of this. Whether these are actually part of grpc or just random add-ons to popular client libraries is somewhat up for debate, however.
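For reference, the "open a channel per backend and balance inside your application" option mostly boils down to letting the resolver return every backend address and enabling a round-robin policy; a hedged Go sketch using grpc-go (the service name and port are made up):

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// "dns:///" makes the gRPC resolver return every A/AAAA record for the
	// service, and round_robin spreads RPCs across a sub-connection per backend.
	conn, err := grpc.Dial(
		"dns:///my-backend.internal:50051", // hypothetical headless service name
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// ... create stubs from conn as usual.
}
```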
The same arguments apply to WebSockets probably— yes the implementation is a little hairy, but if the end result is a clean abstraction that does a good job of hiding that complexity from the rest of the stack, then it's elegant.
Do you want to risk the complexity and potential performance impact from the handshake that the HTTP/2 standard requires for non-encrypted connections? Worst case, your client and server toolings clash in a way that every request becomes two requests (before the actual h2c request, a second one for the required HTTP/1.1 upgrade, which the server closes as suggested in the HTTP/2 FAQ).
If you're going that route, you may as well just do HTTPS again. If you configure your TLS cookies and session resumption right, you'll get all of the advantages of fancy post-quantum crypto without having to go back to the days of manually setting up encrypted tunnels like when IPSec did the rounds.
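For what it's worth, in Go the cleartext path is only a couple of lines via golang.org/x/net/http2/h2c; the wrapper accepts prior-knowledge HTTP/2, the Upgrade dance, and plain HTTP/1.1 on the same port. A minimal sketch, not from the article:

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "hello over %s\n", r.Proto) // "HTTP/1.1" or "HTTP/2.0"
	})
	// h2c.NewHandler serves h2c (prior knowledge or Upgrade) and falls back
	// to HTTP/1.1 for clients that never ask for h2.
	handler := h2c.NewHandler(mux, &http2.Server{})
	log.Fatal(http.ListenAndServe(":8080", handler))
}
```

A prior-knowledge client (e.g. `curl --http2-prior-knowledge http://localhost:8080/`) then skips the extra upgrade round trip entirely.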
There's a security angle: load balancers have big problems with request smuggling. HTTP/2 changes the picture - maybe someone more up to date knows whether it's currently better or worse?
This is why I configured my company's AWS application load balancer to disable HTTP2 when I first saw the linked post, and haven't changed that configuration since then. Unless we have definitive confirmation that all major load balancers have fixed these vulnerabilities, I'll keep HTTP2 disabled, unless I can figure out how to do HTTP2 between the LB and the backend.
An h2 proxy usually wouldn't proxy through the http2 connection; it would instead accept h2 and load-balance each request to a backend over an h2 (or h1) connection.
The difference is that you have a h2 connection to the proxy, but everything past that point is up to the proxies routing. End-to-end h2 would be more like a websocket (which runs over HTTP CONNECT) where the proxy is just proxying a socket (often with TLS unwrapping).
> An h2 proxy usually wouldn't proxy through the http2 connection; it would instead accept h2 and load-balance each request to a backend over an h2 (or h1) connection.
Each connection needs to keep state for all processed requests (the HPACK dynamic headers table), so all requests for a given connection need to be proxied through the same connection. Not sure I got what you meant, though.
Apart from that, I think the second sentence of my comment makes clear there is no smuggling as long as the connection before/past the proxy is http2, and it's not downgraded to http1. That's all that I meant.
Personally, this lack of support doesn't bother me much, because the only use case I can see for it is wanting to expose your Ruby HTTP server directly to the internet without any sort of load balancer or reverse proxy, which I understand may seem tempting, as it's "one less moving piece", but not really worth the trouble in my opinion.
The amusing thing is that HTTP/2 is mostly useful for sites that download vast numbers of tiny Javascript files for no really good reason. Like Google's sites.
Indeed, there is a reason most mapping libraries still support specifying multiple domains for tiles. It used to be common practice to set up a.tileserver.test, b.tileserver.test, c.tileserver.test even if they all pointed to the same IP/server, just to get around the concurrent request limit in browsers.
> bringing HTTP/2 all the way to the Ruby app server is significantly complexifying your infrastructure for little benefit.
I think the author wrote it with encryption-is-a-must in mind, and after he corrected those parts, the article just ended up with these weird statements. What complexity is introduced apart from changing the serving library in your main file?
In a language that uses forking to achieve parallelism, terminating multiple tasks at the same endpoint will cause those tasks to compete. For some workflows that may be a feature, but for most it is not.
So that's Python, Ruby, Node. Elixir won't care and C# and Java... well hopefully the HTTP/2 library takes care of the multiplexing of the replies, then you're good.
A good python web server should be single-process with asyncio, or maybe have a few worker threads or processes. Definitely not fork for every request.
You are correct about the first assumption, but even without encryption dealing with multiplexing significantly complexify things, so I still stand by that statement.
If you assume no multiplexing, you can write a much simpler server.
Datacenters don't typically have high latency, low bandwidth, and varying availability issues. If you have a saturated http/1.1 network (or high CPU use) within a DC you can usually just add capacity.
Surprised not to see this mentioned in the article.
Lots of places (including a former employer) have done tons of work to upgrade internal infrastructure to support HTTP/2 just so they could use gRPC. The performance difference from JSON-over-HTTP APIs was meaningful for us.
I realize there are other solutions but this is a common one.
Probably because it only works correctly outside of the browser. Browsers don't support "native" grpc. You normally use something with specific gRPC support rather than just h2 in a spherical vacuum.
This entirely. When I first read the title, I thought, lets see what they say about gRPC. gRPC is so much nicer working across applications compared to simple REST servers/clients.
The TLS requirement from HTTP2 also hindered http2 origin uptake. The TLS handshake adds latency and is unnecessary on some instances. (This is mentioned in heading "Extra Complexity" in the article)
Correct, to achieve 0-RTT the application needs to perform the handshake/certificate exchange at least once, otherwise, how would it encrypt the payload? This could be cached preemptively iirc, but it is not worth it.
The problem will be that QUIC uses more userland code and UDP is not as optimized as TCP inside kernels. So far, the extra CPU penalty has discouraged me from adopting QUIC everywhere, I've kept it mostly on the edge-out where the network is far less reliable.
Google measured their bandwidth usage and discovered that something like half was just HTTP headers! Most RPC calls have small payloads for both requests and responses.
HTTP/2 compresses headers, and that alone can make it worthwhile to use throughout a service fabric.
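A quick way to see the effect is to run the same header set through an HPACK encoder twice: after the first request the fields sit in the dynamic table, so subsequent requests send only small index references. An illustrative Go sketch using golang.org/x/net/http2/hpack (header names and values are invented):

```go
package main

import (
	"bytes"
	"fmt"
	"strings"

	"golang.org/x/net/http2/hpack"
)

func main() {
	var buf bytes.Buffer
	enc := hpack.NewEncoder(&buf)

	headers := []hpack.HeaderField{
		{Name: ":method", Value: "POST"},
		{Name: ":path", Value: "/rpc/UserService/GetUser"},
		{Name: "content-type", Value: "application/grpc"},
		{Name: "authorization", Value: "Bearer " + strings.Repeat("x", 400)},
	}

	// Encode the same headers for two consecutive "requests" on one connection.
	for i := 1; i <= 2; i++ {
		start := buf.Len()
		for _, hf := range headers {
			enc.WriteField(hf)
		}
		fmt.Printf("request %d: encoded header block = %d bytes\n", i, buf.Len()-start)
	}
}
```

The second block comes out far smaller than the first, which is roughly what you'd expect for repetitive RPC traffic with small payloads.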
Bloom filters are small relative to the amount of data they can hash, but aren't realistic bloom filters still tens of kB at minimum? Might be too heavyweight to send up.
Only if you're constrained on connections. The reason that HTTP2 is much better for websites is because of the slow starts of TCP connections. If you're already connected, you don't suffer those losses, and you benefit from kernel muxing.
Well, IMO h2 streams are more fleshed out and offer better control than websockets, but that's just my opinion. In fact, websockets are your only "proper" option if you want that bidirectional stream to be binary - browsers don't expose that portion of h2 to JS.
Here is a silly thing that is possible with h2 over a single connection, but not with websockets:
Multiple page components (Islands) each have their own stream of events over a single h2 connection. With websockets, you will need to roll your own multiplexing[1].
[1]: I think you can multiplex multiple websockets over a single h2 connection tho, but don't quote me on this.
If your load balancer is converting between HTTP/2 and HTTP/1.1, it's a reverse proxy.
Past the reverse proxy, is there a point to HTTP at all? We could also use SCGI or FastCGI past the reverse proxy. It does a better job of passing through information that's gathered at the first point of entry, such as the client IP address.
I would not use http/3 for LAN. Even the latest Linux kernels struggle with it. Http/1, aka TCP, has fully supported encryption and other offload support.
UDP still consumes much more CPU for the same amount of traffic.
Do you have source for that? I'm very interested.
There is no technical reason for UDP to be slower than TCP (at CPU level).
The only field that is computed in UDP is the checksum; the same exists in TCP, and it must be recomputed each time someone actually re-routes the packet (e.g. bridge to VM) since the TTL is decreased.
So I doubt your assertion.
_____
Writing my comment I understood what you are talking about. There is a bunch of encryption done in user mode in HTTP/3 that doesn't need to be done in user mode. In HTTP/2 it was sometimes done in kernel mode (kTLS), so it was quicker. The slowness comes from the CPU needed to copy it out of kernel mode. I didn't follow the whole story so I trust you on this.
> There is no technical reason for UDP to be slower than TCP (at CPU level).
The technical reason is 30+ years of history of TCP being ≥90% of Internet traffic and services. There's several orders of magnitude in resources more spent to make TCP fast starting at individual symbols on Ethernet links all the way up into applications.
Encryption is one thing (if you run kTLS, which is still not done in most manual setups), but the much bigger factor IIRC is how much of the networking stack needs to run in userspace and has not been given the optimization love of TCP. If you compared non-kTLS h2 with non-kTLS h3 over a low-latency link, the h2 connection could handle a lot more traffic compared to h3.
That is not to say that h3 does not have its place, but the networking stacks are not optimized for it yet.
If we ever get to adopting this, I will send every byte to a separate IPv6 address. Big Tech surveillance wouldn't work so many don't see a point like the author.
Came here to say the same thing; had they read the RFC they'd realize it's not actually a limit, just a suggestion - that's why it's in the "Practical Considerations" section too.
Whoever downvoted you is probably unaware that words like SHOULD have specific meanings in RFCs.
I think this post gets the complexity situation backwards. Sure, you can use a different protocol between your load balancer and your application and it won't do too much harm. But you're adding an extra protocol that you have to understand, for no real benefit.
(Also, why do you even want a load balancer/reverse proxy, unless your application language sucks? The article says it "will also take care of serving static assets, normalize inbound requests, and also probably fend off at least some malicious actors", but frankly your HTTP library should already be doing all of those. Adding that extra piece means more points of failure, more potential security vulnerabilities, and for what benefit?)
> Sure, you can use a different protocol between your load balancer and your application and it won't do too much harm. But you're adding an extra protocol that you have to understand, for no real benefit.
Well, that depends...
At a certain scale (and arguably, not too many people will ever need to think about this), using UNIX sockets (instead of HTTP over TCP) between the application and load balancer can be faster in some cases, as you don't go through the TCP stack...
> Also, why do you even want a load balancer/reverse proxy, unless your application language sucks?
Erm... failover... ability to do upgrades without any downtime... it's extra complexity yes, but it does have some benefits...
> At a certain scale (and arguably, not too many people will ever need to think about this), using UNIX sockets (instead of HTTP TCP) between the application and load balancer can be faster in some cases, as you don't go through the TCP stack...
Sure (although as far as I can see there's no reason you can't keep using HTTP for that). You can go even further and use shared memory (I work for a company that used Apache with Jk back in the day). But that's an argument for using a faster protocol because you're seeing a benefit from it, not an argument for using a slower protocol because you can't be bothered to implement the latest standard.
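For completeness, here is roughly what "keep using HTTP, just not over TCP" looks like on the app side in Go (the socket path is arbitrary); the reverse proxy on the same host then points at that socket rather than a TCP port:

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"os"
)

func main() {
	const sock = "/run/app/app.sock" // arbitrary path for the example
	_ = os.Remove(sock)              // clean up a stale socket from a previous run

	ln, err := net.Listen("unix", sock)
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()

	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello from behind the proxy")
	})

	// Same HTTP/1.1 as before, just carried over a unix domain socket
	// instead of loopback TCP.
	log.Fatal(http.Serve(ln, mux))
}
```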
> using a slower protocol because you can't be bothered to implement the latest standard.
I thought we were discussing HTTP/2 but now you seem to be invoking HTTP/3? It's even faster indeed but brings a whole lot of baggage with it. Nice comparison point though: Do you want to add the complexity of HTTP/2 or HTTP/3 in your backend? (I don't.)
> I thought we were discussing HTTP/2 but now you seem to be invoking HTTP/3?
The article talks about HTTP/2 but I suspect they're applying the same logic to HTTP/3.
> Do you want to add the complexity of HTTP/2 or HTTP/3 in your backend? (I don't.)
I'd like to use the same protocol all the way through. I wouldn't want to implement any HTTP standard by hand (I could, but I wouldn't for a normal application), but I'd expect an established language to have a solid library implementation available.
> why do you even want a load balancer/reverse proxy, unless your application language sucks?
Most load balancer/reverse proxy applications also handle TLS. Security-conscious web application developers don't want TLS keys in their application processes. Even the varnish authors (varnish is a load balancer/caching reverse proxy) refused to integrate TLS support because of security concerns; despite being reverse-proxy authors, they didn't trust themselves to get it right.
An application can't load-balance itself very well. Either you roll your own load balancer as a separate layer of the application, which is reinventing the wheel, or you use an existing load balancer/reverse proxy.
Easier failover with fewer (ideally zero) dropped requests.
If the app language isn't compiled, having it serve static resources is almost certainly much slower than having a reverse proxy do it.
> Security-conscious web application developers don't want TLS keys in their application processes.
If your application is in a non-memory-safe language, sure (but why would you do that?). Otherwise I would think the risk is outweighed by the value of having your connections encrypted end-to-end. If your application process gets fully compromised then an attacker already controls it, by definition, so (given that modern TLS has perfect forward secrecy) I don't think you really gain anything by keeping the keys confidential at that point.
I write application servers for a living, mostly for Python but previously for other languages.
Nobody, nobody, writes application servers with the intent of having them exposed to the public internet. Even if they're completely memory safe, we don't do DOS protections like checking for reasonable header lengths, rewriting invalid header fields, dropping malicious requests, etc. Most application servers will still die to slowloris attacks. [1]
We don't do this because it's a performance hog and we assume you're already reverse proxying behind any responsible front-end server, which all implement these protections. We don't want to double up on that work. We implement the HTTP spec with as low overhead as possible, because we expect to have pipelined HTTP/1.1 connections from a load balancer or other reverse proxy.
Your application server, Gunicorn, Twisted, Uvicorn, whatever, does not want to be exposed to the public internet. Do not expose it to the public internet.
> Nobody, nobody, writes application servers with the intent of having them exposed to the public internet
For rust, go, lua (via nginx openresty) and a few others this is a viable path. I probably wouldn't do it with node (or bun or deno), python, or similar but there are languages where in certain circumstances it is reasonable and might be better.
For Go, net/http is not something you should expose to the public internet, there's no secret sauce in there. It will just die to the first person to hit it with a slowloris or other DOS attack. Same with the common C++ options like boost.beast unless you're writing the logic yourself (but why bother? Just reverse proxy).
I'm unfamiliar with the common rust frameworks for http, but find it unlikely the situation is very different.
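To make the slowloris point concrete: Go's default http.Server ships with no timeouts at all, so a client can hold a connection open by dribbling bytes indefinitely. If you do put net/http directly on the internet, the usual partial mitigation is to set the timeouts yourself - a hedged sketch, not an argument against the reverse proxy:

```go
package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})

	srv := &http.Server{
		Addr:              ":8080",
		Handler:           mux,
		ReadHeaderTimeout: 5 * time.Second,  // caps how long a client may dribble headers
		ReadTimeout:       15 * time.Second, // whole request, body included
		WriteTimeout:      30 * time.Second,
		IdleTimeout:       60 * time.Second, // reap idle keep-alive connections
		MaxHeaderBytes:    64 << 10,
	}
	log.Fatal(srv.ListenAndServe())
}
```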
> We don't do this because it's a performance hog and we assume you're already reverse proxying behind any responsible front-end server
What application servers have you written? I have never seen an application server readme say DON'T EXPOSE DIRECTLY TO THE INTERNET, WE ASSUME YOU USE REVERSE PROXY.
Most of them have a disclaimer in their deployment or tutorial docs, some with more strong language than others. Again, nothing bad happens if you don't, we don't write memory vulnerabilities into these servers. You are just far more vulnerable to DOS attacks.
* "We strongly recommend using Guincorn behind a proxy server" [1]
* "As a general rule, you probably want to: ... run behind Nginx for self-hosted deployments." [2]
* "A reverse proxy such as nginx or Apache httpd should be used in front of Waitress." [3]
For some, like uWSGI, they don't even want to talk HTTP (uWSGI supports its own protocol) and it's just assumed you're using a dedicated webserver to talk to public traffic. [4]
You use a reverse proxy because whenever you "deploy to prod" you'll be using one anyway; thus, by not putting TLS in your app, you avoid building something you don't actually need.
I said "API gateway is a fancy term for a configurable reverse proxy often bought as a service" and both "load balancer" and "API gateway" are common configurations of "configurable reverse proxy", often bought as a service.
Many load balancers have this functionality within them, even ones from years ago that aren't around anymore like Microsoft ISA/TMG. They're not web services, but they can route based on requests.
To make sure that your connections can be snooped on over the LAN? Why is that a positive?
> To have a security layer
They usually do more harm than good in my experience.
> To load balance
Sure, if you're at the scale where you want/need that then you're getting some benefit from that. But that's something you can add in when it makes sense.
> To have rewrite rules
> To have graceful updates
Again, I would expect an HTTP library/framework to handle that.
> To make sure that your connections can be snooped on over the LAN? Why is that a positive?
No, to keep your app from having to deal with SSL. Internal network security is an issue, but sites that need multi-server architectures can't really be passing SSL traffic through to the application servers anyway, because SSL hides stuff that's needed for the load balancers to do their jobs. Many websites need load balancers for performance, but are not important enough to bother with the threat model of an internal network compromise (whether it's on the site owner's own LAN, or a bare metal or VPS hosting vlan).
> Sure, if you're at the scale where you want/need that then you're getting some benefit from that. But that's something you can add in when it makes sense.
So why not preface your initial claims by saying you trust the web app to be secure enough to handle SSL keys, and a single instance of the app can handle all your traffic, and you don't need high availability in failure/restart cases?
That would be a much better claim. It's still unlikely, because you don't control the internet. Putting your website behind Cloudflare buys you some decreased vigilance. A website that isn't too popular or attention-getting also reduces the risk. However, Russia and China exist (those are examples only, not an exclusive list of places malicious clients connect from).
> So why not preface your initial claims by saying you trust the web app to be secure enough to handle SSL keys, and a single instance of the app can handle all your traffic, and you don't need high availability in failure/restart cases?
Yeah, I phrased things badly, I was trying to push back on the idea that you should always put your app behind a load balancer even when it's a single instance on a single machine. Obviously there are use cases where a load balancer does add value.
(I do think ordinary webapps should be able to gracefully reload/restart without losing connections, it really isn't so hard, someone just has to make the effort to code the feature in the library/framework and that's a one-off cost)
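For what it's worth, the "drain without dropping requests" half really is a few lines in a typical library; a hedged Go sketch of the pattern (a supervisor or load balancer still has to start the replacement process and shift traffic):

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080", Handler: http.DefaultServeMux}

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	// Wait for the deploy tooling to ask us to stop.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
	<-stop

	// Stop accepting new connections, finish in-flight requests, then exit.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("forced shutdown: %v", err)
	}
}
```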
> To make sure that your connections can be snooped on over the LAN? Why is that a positive?
Usually your "LAN" uses whole link encryption, so that whatever is accessed in your private infrastructure network is encrypted (being postgres, NFS, HTTP, etc). If that is not the case, then you have to configure encryption at each service level, which is both error prone, time consuming, and not always possible. If that is not case then you can have internal SSL certificates for the traffic between RP and workers, workers and postgres, etc.
Also, you want your SSL server key to be as inaccessible from business logic as possible; having an early termination point and isolated workers achieves that.
Also, you generally have workers access private resources, which you don't want exposed on your actual termination point. It's just much better to have a public termination point RP with a private iface sending requests to workers living in a private subnet accessing private resources.
> > To have a security layer
> They usually do more harm than good in my experience.
Right, maybe you should detail your experience, as your comments don't really tell much.
> To have rewrite rules
> To have graceful updates
> > Again I would expect a HTTP library/framework to handle that.
HTTP frameworks handle routing _for themselves_; this is not the same as rewrite rules, which are often used to glue multiple heterogeneous parts together.
HTTP frameworks are not handling all the possible rewriting and gluing for the very reason that it's not a good idea to do it at the logic framework level.
As for graceful updates, there's a chicken-and-egg problem to solve. You want graceful updates between multiple versions of your own code / framework. How could that work without a third party balancing old / new requests to the new workers one at a time?
You terminate SSL as close to the user as possible, because that round trip time is greatly going to affect the user experience. What you do between your load balancer and application servers is up to you, (read: should still be encrypted) but terminating SSL asap is about user experience.
> You terminate SSL as close to the user as possible, because that round trip time is greatly going to affect the user experience. What you do between your load balancer and application servers is up to you, (read: should still be encrypted) but terminating SSL asap is about user experience.
That makes no sense. The latency from your load balancer to your application server should be a tiny fraction of the latency from the user to the load balancer (unless we're talking about some kind of edge deployment, but at that point it's not a load balancer but some kind of smart proxy), and the load balancer decrypting and re-encrypting almost certainly adds more latency compared to just making a straight connection from the user to the application server.
Say your application and database are in the US West and you want to serve traffic to EU or AUS, or even US East. Then you want to terminate TCP and TLS in those regions to cut down on handshake latency, slow start time, etc. Your reverse proxy can then use persistent TLS connections back to the origin so that those connection startup costs are amortized away. Something like nginx can pretty easily proxy like 10+ Gb/s of traffic and 10s of thousands of requests per second on a couple low power cores, so it's relatively cheap to do this.
Lots of application frameworks also just don't bother to have a super high performance path for static/cached assets because there's off-the-shelf software that does that already: caching reverse proxies.
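That amortization is easy to see in code; a minimal Go sketch (hostnames invented, no caching shown) of a front proxy that terminates client connections locally and reuses warm connections back to the origin:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

func main() {
	origin, err := url.Parse("https://origin.us-west.example.internal") // hypothetical origin
	if err != nil {
		log.Fatal(err)
	}

	proxy := httputil.NewSingleHostReverseProxy(origin)
	proxy.Transport = &http.Transport{
		// Long-lived connections to the origin, so TCP/TLS setup and slow
		// start are paid once, not on every client request.
		MaxIdleConnsPerHost: 128,
		IdleConnTimeout:     90 * time.Second,
		ForceAttemptHTTP2:   true,
	}

	// In production this would be ListenAndServeTLS with the edge certificate.
	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```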
It depends on your deployment and where your database and app servers and POPs are. If your load balancer is right next to your application server, which is right next to your database, you're right. And it's fair to point out that most people have that kind of deployment. However there are some companies, like Google, that have enough of a presence that the L7 load balancer/smart proxy/whatever you want to call it is way closer to you, Internet-geographically, than the application server or the database. For their use case and configuration, your "almost certainly" isn't what was seen empirically.
You usually re-encrypt your traffic after the GW, either by using an internal PKI and TLS or some kind of encapsulation (IPSEC, etc).
Security and availability requirements might vary, so much to argue about. Usually you have some kind of 3rd party service you want to hide, control CORS, Cache-Control, etc headers uniformly, etc. If you are fine with 5-30 minutes of outage (or until someone notices and manually restores service), then of course you don’t need to load balance. But you can imagine this not being the case at most companies.
I've built infrastructure. Indeed I've built infrastructure exactly like this, precisely because maintaining encryption all the way to the application server was a security requirement (this was a system that involved credit card information). It worked well.
Load balancers are nice to have if you want to move traffic from one machine to another. Which sometimes needs to happen even if your application language doesn't suck and you can hotload your changes... You may still need to manage hardware changes, and a load balancer can be nice for that.
DNS is usable, but some clients and recursive resolvers like to cache results for way beyond the TTL provided.
Answers from the article - the "extra" protocol is just HTTP/1.1 and the reason for a load balancer is the ability to have multiple servers:
> But also the complexity of deployment. HTTP/2 is fully encrypted, so you need all your application servers to have a key and certificate, that’s not insurmountable, but is an extra hassle compared to just using HTTP/1.1, unless of course for some reasons you are required to use only encrypted connections even over LAN.
> So unless you are deploying to a single machine, hence don’t have a load balancer, bringing HTTP/2 all the way to the Ruby app server is significantly complexifying your infrastructure for little benefit.
Good to know - neither the parent nor the article mention this. h2c seems to have limited support by tooling (e.g. browsers, curl), which is a bit discouraging.
It does. Just use `--http2` or `--http2-prior-knowledge`; curl deduces cleartext or not from the `http` or `https` URL scheme (cleartext being the default for `http`).
I said limited support and gave curl as an example because curl --http2 sends a HTTP/1.1 upgrade request first so fails in a purely HTTP/2 environment.
Thanks for bringing up --http2-prior-knowledge as a solution!
I'd agree it's not critical, but discard the assumption that requests within the data center will be fast. People have to send requests to third parties, which will often be slow. Hopefully not as slow as across the Atlantic, but still magnitudes worse than an internal query.
You will often be in the state where the client uses HTTP2, and the apps use HTTP2 to talk to the third party, but inside the data center things are HTTP1.1, fastcgi, or similar.
Why does HTTP2 help with this? Load balancers use one keepalive connection per request and don't experience head of line blocking. And they have slow start disabled. So even if the latency of the final request is high, why would HTTP2 improve the situation?
If every request is quick, you can easily re-use connections, file handles, threads, etc. If requests are slow, you will often need to spin up new connections, as you don't want to wait for the response that might take hundreds of milliseconds.
But I did start by saying it's not important. It's a small difference, unless you hit a connection limit.
The article seems to make an assumption that the application backend is in the same datacenter as the load balancer, which is not necessarily true: people often put their load balancers at the network edge (which helps reduce latency when the response is cached), or just outsource those to a CDN vendor.
> In addition to the low roundtrip time, the connections between your load balancer and application server likely have a very long lifetime, hence don’t suffer from TCP slow start as much, and that’s assuming your operating system hasn’t been tuned to disable slow start entirely, which is very common on servers.
A single HTTP/1.1 connection can only process one request at a time (unless you attempt HTTP pipelining), so if you have N persistent TCP connections to the backend, you can only handle N concurrent requests. Since all of those connections are long-lived and are sending at the same time, if you make N very large, you will eventually run into TCP congestion control convergence issues.
Also, I don't understand why the author believes HTTP/2 is less debuggable than HTTP/1; curl and Wireshark work equally well with both.
I think the more common architecture is for edge network to terminate SSL, and then transmit to the load balancer which is actually in the final data center? In which case you can http2 or 3 on both those hops without requiring it on the application server.
That said I still disagree with the article's conclusion: more connections means more memory so even within the same dc, there should be benefits of http2. And if the app server supports async processing, there's value in hitting it with concurrent requests to make the most of its hardware, and http1.1 head of line blocking really destroys a lot of possible perf gains when the response time is variable.
I suppose I haven't had a true bake off here though - so it's possible the effect of http2 in the data center is a bit more marginal than I'm imagining.
The maximum number of connections thing in HTTP/1 always makes me think of queuing theory, which gives surprising conclusions like how adding a single extra teller at a one-teller bank can cut wait times by 50 times, not just by 2.
However, I think the problem is the Poisson process isn't really the right process to assume. Most websites which would run afoul of the 2/6/8/etc connections being opened are probably trying to open up a lot of connections at the same time. That's very different from situations where only 1 new person arrives every 6 minutes on average, and 2 new people arriving within 1 second of each other is a considerably rarer event.
[1]: https://www.johndcook.com/blog/2008/10/21/what-happens-when-...
And if memory serves if you care about minimizing latency you want all of your workers running an average of 60% occupied. (Which is also pretty close to when I saw P95 times dog-leg on the last cluster I worked on).
Queuing theory is really weird.
Most analyses I've read say the threshold is around the 80% mark [1], although it depends on how model the distribution, and there's nothing magical about the number. The main thing is to avoid getting close to 100%, because wait times go up exponentially as you get closer to the max.
Little's Law is fundamental to queueing theory, but there's also the less well-known Kingman's formula, which incorporates variability of arrival rate and task size [2].
[1] https://www.johndcook.com/blog/2009/01/30/server-utilization...
[2] https://taborsky.cz/posts/2021/kingman-formula/
Really both of those models show 60% as about the limit to where you're still effectively at the baseline for latency. 80% is just about the limit to where you're up there in the exponential rise, any higher and things become unusable.
0-60 and you're still at minimum latency. 60-80 you're at twice the latency but it's probably worth the cost savings of the extra compute density since it's still pretty low. Higher than 80 and things are already slowing down and getting exponentially worse by the request
If you look at the chart in the second link, where does the wait time leave the origin? Around 60%.
The first one is even worse; by 80% you're already seeing twice the delay of 70%.
If I were to describe the second chart I'd say 80% is when you start to get into trouble, not just noticing a slowdown.
I said minimize latency, not optimize latency.
Why 60%? I suppose if they are less than 1% occupied then latency will be even lower.
dog food dog leg dog ram lol
dog leg is what people who aren't pretentious prats call an 'inflection point'.
Can't it cut wait times by infinity? For example, if the arrivals are at 1.1 per minute, and a teller processes 1 per minute.
Could be that, could also be that the people taking a long time at least aren't causing a bottleneck (assuming there aren't two of them at the same time). So you have a situation like this: the first person takes 10 minutes, while there are 9 waiting in line who take only one minute apiece. With one teller, the average wait time is ~15 minutes. With two tellers, it's now ~5 minutes.
Which is why it is highly annoying when there's only one worker at the coffee stand, and there's always this one jerk at the front of the queue who orders a latte when you just want a coffee. With two workers, the people who just want coffee won't have to wait 15 minutes behind the latte people.
And I've also noticed a social effect: when people wait a long time, it seems to reinforce how they perceive the eventual service, that is, they want more out of the interaction, so they take longer. Which makes the situation even worse.
> there's always this one jerk at the front of the queue
Here in the espresso world, that’s not so bad. But the ‘vanilla oat milk decaf, and also a hot muffin with butter’ is tedious.
There is a roaster in Auckland that’s been there since the ‘80s. On the counter it says ‘espresso, flat white or fuck off’. Clear and concise. I like it. https://millerscoffee.co.nz/
I'd order a "fuck off espresso" in that situation just to see what happens.
"On the counter it says ‘espresso, flat white or fuck off’"
Sounds a bit pretentious to me. I generally order a coffee, no milk ... ta.
Try ordering a “just a cup of coffee” in AU/NZ and they will look at you with a blank expression. Espresso is the norm there.
Luckily, we don't get stuck behind someone using a check any more.
You've forgotten about banker's hours :)
There was a discussion not too long ago about modern banks still with archaic practices. I have accounts at two different banks, and if I make a transfer request before 1:45PT, it is counted as same day. That makes no damn sense to me why that's a limitation at all today. It's not like a human needs to look at it, but even so, why the 1:45PT cutoff? Is it because it is 4:45ET? Then why not list it as that? And why does a banking computer system care about timezones or bankers' hours at all. It's all just mind-bogglingly lame.
It's because when you do transfers, the banks will reconcile their accounts at the end of the day (e.g., if one bank deposits more to another, they will need to make up the difference with their own capital).
This cutoff means banks have certainty about the transaction, as the reconciliation is batched rather than real-time.
I know my father sometimes had to take 1 am phone calls because the insurance industry runs a lot of batch processing overnight, when the systems aren't competing with OLTP traffic. Banking software may be built the same way.
> Is it because it is 4:45ET? Then why not list it as that?
Because a lot of their customers are too stupid to understand timezones.
I guess that's saying more about the left coast customers then, right? as all of the customers from the other time zones have to do the conversion.
Well, if someone misunderstands and sends their payment by 1:45ET or 1:45CT, it's not a problem for the bank.
I find a lot of value in being able to get a water or a coffee, use the restroom, have sidebar conversations with fellow employees, begrudgingly attend meetings, or take a walk to stretch my legs for a minute and think, personally.
Almost every web forum enters a phase where participants bring in their pet politics into unrelated discussions. Whether they last or not depends entirely on whether the flamebait/troll creates a large reply structure or a non-existent one. This is why shadowbans are more effective than large groups of people responding angrily. Or, to cite the Deep Magic: "don't feed the trolls".
Personally, I'd like to see more HTTP/2 support. I think HTTP/2's duplex streams would be useful, just like SSE. In theory, WebSockets do cover the same ground, and there's also a way to use WebSockets over HTTP/2 although I'm not 100% sure how that works. HTTP/2 though, elegantly handles all of it, and although it's a bit complicated compared to HTTP/1.1, it's actually simpler than WebSockets, at least in some ways, and follows the usual conventions for CORS/etc.
The problem? Well, browsers don't have a JS API for bidirectional HTTP/2 streaming, and many don't see the point, like this article expresses. NGINX doesn't support end-to-end HTTP/2. Feels like a bit of a shame, as the streaming aspect of HTTP/2 is a more natural evolution of the HTTP/1 request/response cycle versus things like WebSockets and WebRTC data channels. Oh well.
Duplex streams are not really a HTTP/2-only feature. You can do the same bidirectional streaming with HTTP/1.1 too. The flow is always: 1. The client sends a header set. 2. It can then start to stream data in the form of an unlimited-length byte-stream to the server. 3. The server starts to send a header set back to the client. 4. The server can then start to stream data in the form of an unlimited-length byte-stream to the client.
There is not even a fixed order between 2) and 3). The server can start sending headers or body data before the client sent any body byte.
What is correct is that a lot of servers and clients (including javascript in browsers) don't support this and make stricter assumptions regarding how HTTP requests are used - e.g. that the request bytes are fully sent before the response happens. I think ReadableStream/WritableStream APIs on browsers were supposed to change that, but I haven't followed the progress in the last few years.
NGINX falls into the same category. Its HTTP/2 support (and gRPC support) had been built with a very limited use-case in mind. That's also why various CDNs and service meshes use different kinds of HTTP proxies - so that various streaming workloads don't break in case the way the protocol is used is not strictly request->response.
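To make that flow concrete, here is a rough Go sketch (my own illustration, not taken from any server mentioned here; the /echo route and port are arbitrary) of a handler that interleaves reads of the request body with writes to the response. HTTP/2 permits this by default; for HTTP/1.x the Go server needs the explicit EnableFullDuplex opt-in added in Go 1.21.

```go
package main

import (
	"io"
	"log"
	"net/http"
)

// echo streams every chunk of the request body straight back into the
// response, so the client can keep sending while it is already receiving.
func echo(w http.ResponseWriter, r *http.Request) {
	rc := http.NewResponseController(w)
	// For HTTP/1.x, opt in to reading the body after the response has started.
	// HTTP/2 is already full-duplex, so an error here is tolerable for a demo.
	_ = rc.EnableFullDuplex()

	buf := make([]byte, 32*1024)
	for {
		n, err := r.Body.Read(buf)
		if n > 0 {
			if _, werr := w.Write(buf[:n]); werr != nil {
				return
			}
			_ = rc.Flush() // push the chunk out instead of buffering it
		}
		if err != nil { // io.EOF means the client finished sending
			if err != io.EOF {
				log.Println("read:", err)
			}
			return
		}
	}
}

func main() {
	http.HandleFunc("/echo", echo)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```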
No browser I'm aware of is planning on allowing the request and response bodies to be streamed simultaneously for the same request using ReadableStream and WriteableStream. When using streaming request bodies, you have to set the request explicitly to half-duplex.
Anyways, yes, this is technically true, but the streaming semantics are not really that well-defined for HTTP/1.1, probably because it was simply never envisioned. The HTTP/1.1 request and response were viewed as unary entities and the fact that their contents were streamed was mostly an implementation detail. Most HTTP/1.1 software, not just browsers, ultimately treat the requests and responses of HTTP as different and distinct phases. For most uses of HTTP, this makes sense. e.g. for a form post, the entire request entity is going to need to be read before the status can possibly be known.
Even if we do allow bidirectional full-duplex streaming over HTTP/1.1, it will block an entire TCP connection for a given hostname, since HTTP/1.1 is not multiplexed. This is true even if the connection isn't particularly busy. Obviously, this is still an issue even with long-polling, but that's all the more reason why HTTP/2 is simply nicer.
NGINX may always be stuck in an old school HTTP/1 mindset, but modern software like Envoy shows a lot of promise for how architecting around HTTP/2 can work and bring advantages while remaining fully backwards compatible with HTTP/1 software.
HTTP2 works great on the LAN, or if you have really good network.
It starts to really perform badly when you have dropped packets. So any kind of medium quality wifi or 4/5g kneecaps performance.
It was always going to do this, and as webpages get bigger, the performance degradation increases.
HTTP2 fundamentally underperforms in the real world, and noticeably so on mobile. (My company enthusiastically rolled out http2 support when akamai enabled it.)
Personally I feel that websockets are a hack, and frankly HTTP/3 should have been split into three: a file access protocol, an arbitrary TCP-like pipe, and a metadata channel. But web people love hammering workarounds onto workarounds, so we are left with HTTP/3.
HTTP/2, in my experience, still works fine on decent connections, but the advantages definitely start to level out as the connection gets worse. HTTP/2 definitely has some inherent disadvantages over HTTP/1 in those regards. (Though it depends on how much you are constrained by bandwidth vs latency, to be sure.)
However, HTTP/3 solves that problem and performs very well on both poor quality and good quality networks.
Typically, I use HTTP/2 to refer to both HTTP/2 and HTTP/3 since they are basically the same protocol with different transports. Most people don't really need to care about the distinction, although I guess since it doesn't use TCP there are cases where someone may not be able to establish an HTTP/3 connection to a server. Still, I think the forward looking way to go is to try to push towards HTTP/3, then fall back to HTTP/2, and still support HTTP/1.1 indefinitely for simple and legacy clients. Some clients may get less than ideal performance, but you get the other benefits of HTTP/2 on as many devices as possible.
> HTTP/3 should have been split into three: a file access protocol, an arbitrary TCP-like pipe and a metadata channel
HTTP3 is basically just HTTP2 on top of QUIC… so you already have the tcp-like pipe, it’s called QUIC. And there’s no reason to have a metadata channel when there are already arbitrary separate channels in QUIC itself.
Yeah, it's a shame you can't take advantage of natural HTTP/2 streaming from the browser. There's the upcoming WebTransport API (https://developer.mozilla.org/en-US/docs/Web/API/WebTranspor...), but it could have been added earlier.
If you want to stream data inside a HTTP body (of any protocol), then the ReadableStream/WritableStream APIs would be the appropriate APIs (https://developer.mozilla.org/en-US/docs/Web/API/Streams_API) - however at least in the past they have not been fully standardized and implemented by browsers. Not sure what the latest state is.
WebTransport is a bit different - it offers raw QUIC streams that are running concurrently with the requests/streams that carry the HTTP/3 requests on shared underlying HTTP/3 connections and it also offers a datagram API.
I think the problem is that duplex communication on the web is rarely useful except in some special cases, and usually harder to scale as you have to keep state around and can't as easily rotate servers.
For some applications it is important, but for most websites the benefits just don't outweigh the costs.
It seems like the author is agreeing that HTTP/2 is great (or at least good) for browser -> web server communication, but not useful for the REST-style APIs that pervade modern app design. He makes a good case, but HTTP was never really a good choice for API transport _either_, it just took hold because it was ubiquitous.
I thought http/2 was great for reducing latency for JS libraries like Turbo Links and Hotwire.
Which is why the Rails crowd want it.
Is that not the case?
H2 still suffers from head of line blocking on unstable connections (like mobile).
H3 is supposed to solve that.
I have an nginx running on my VPS supporting my startup. Last time I had to touch it was about 4 years ago. Quality software
I really like Caddy, but these nginx performance comparisons are never really supported in benchmarks.
There have been numerous attempts to benchmark both (One example: https://blog.tjll.net/reverse-proxy-hot-dog-eating-contest-c... ) but the conclusion is almost always that they're fairly similar.
The big difference for simple applications is that Caddy is easier to set up, and nginx has a smaller memory footprint. Performance is similar between the two.
AFAIK both proxies are capable of serving at line rate for 10Gbps or more at millions of concurrent connections. I can't possibly see how performance would significantly differ if they're properly configured.
nginx's memory footprint is tiny for what it delivers. A common pattern I see for homelab and self-hosted stuff is a lightweight bastion VPS in a cloud somewhere proxying requests to more capable on-premise hardware over a VPN link. Using a cheap < $5/mo plan means 1GB or less of RAM, so you have to tightly watch what is running on that host.
To be fair 1GB is a lot, both caddy and nginx would feel pretty good with it I'd imagine.
1 GB should be way more than either should need. I run nginx, unbound, postfix, dovecot plus all the normal stuff (ssh, systemd, etc) for a Linux system on a VPS w/ 500MB of RAM. Currently the system has ~270MB used. It actually has 1GB available due to a plan auto-upgrade but I have never bothered as I just don't need it.
1GB would be for everything running on the server, not just the reverse proxy.
For small personal projects, you don't usually buy a $5/month VPS just to use as a dedicated reverse proxy.
1GB for a VPS that runs an HTTP load balancer/reverse proxy and a handful of IPsec or WireGuard tunnels back to the app servers (origin) is overkill. You could successfully run that in 512MB, and probably even 256MB. (That's the scenario described.)
What needs to run on this that's a memory hog making 512MB too small? By my (very rough) calculations you'd need 50-100MB for kernel + systemd + sshd + nginx base needs + tunnels home. That leaves the rest for per-request processing.
Each request starts off needing enough RAM to parse the https headers into a request object, open a connection back to the origin, and buffer a little bit of traffic that comes in while that request is being processed/origin connection opens. After that you only need to maintain 2 connections plus some buffer space - generously 50KB initially and 10KB ongoing. There's enough space for a thousand concurrent requests in the RAM not used by the system. Proxying is fairly cheap - the app servers (at the origin) may need much much more, but that's not the point of the VPS being discussed.
Also worth noting that the cheap VPS is not a per-project cost - that is the reverse proxy that handles all HTTP traffic into your homelab.
Why would you use either when there is OpenBSD w/ carp + HAProxy?
There's lots of options out there. I mean, even IIS can do RP work.
Ultimately, I would prefer a PaaS solution over having to run a couple of servers.
You're going to need to show your homework for this to be a credible claim.
Is it just me or did anyone else completely miss Caddy for its opening sentence?
>Caddy is a powerful, extensible platform to serve your sites, services, and apps, written in Go.
To me it reads that if your application is not written in Go, don't bother
The Go crowd, like the Rust crowd, likes to advertise the language their software is written in. I agree that that specific sentence is a bit ambiguous, though, as if it's some kind of middleware that hooks into Go applications.
It's not, it's just another standalone reverse proxy.
When I see software written in Go, I know that it has a very sad plugin support story.
Terraform providers seem to work pretty well, but as far as I know, they're basically separate executables and the main process communicates with them using sockets.
Yes, works very well for terraform. You probably can see why it's not going to work for a webserver?
> The Go crowd, like the Rust crowd, likes to advertise the language their software is written in.
Probably because end users appreciate that usually that means a single binary + config file and off you go. No dependency hell, setting up third party repos, etc.
> Probably because end users appreciate that usually that means a single binary + config file and off you go. No dependency hell, setting up third party repos, etc.
Until you have to use some plugin (e.g. cloudflare to manage DNS for ACME checks), now it's exactly "dependency hell, setting up third party repos, etc."
I also fully expect to see a few crashes from unchecked `err` in pretty much any Go software. Also, nginx qualifies for `single binary + config`; it's just that NGINX is for infra people and Caddy is for application developers.
Fortunately I don't think any of that applies to Rust ;-)
Actually, all of it applies to Rust. The only stable ABI in Rust is the C ABI, and IMO at that point it stops being Rust. Even dynamically loading a Rust lib in a Rust application is unsafe and only expected to work when both are compiled with the same version. In a plugin context, it's the same as what Caddy makes you do.
However, the Rust Evangelical Strike Force successfully infiltrated the WASM committee, and when WASM Components stabilize, they can be used for plugins in some cases (see Zed and zellij). (Go can use them as well; Rust is just the first (only?) to support the preview-2 component model.)
Yeah, I don't really do dynamic loading in my corner of Rust. And I can always target some MSRV, cargo package versions, and be happy with it. Definitely beats the dependency hell I've had to deal with elsewhere
Don't get me wrong, I love Rust and use it almost every day. Doing `cargo run` in a project where it handles everything is good. This gets lost once you start working in a plugin context. Because now you're not dealing with your own neatly organized workspace, you're working across multiple workspaces from different people.
IIRC it's more than just MSRV or even matching the version exactly. It also requires that the flags used to compile rustc match (there is an escape hatch tho).
Why should a reverse proxy give a single shit about what your lang application is written in
It shouldn't, which is why I think the wording there is strange. Nginx doesn't market itself as a "platform to serve your sites, services, and apps, written in C". Reading the first sentence, I don't even know what Caddy is - what does a platform mean in this context? Arriving on Nginx's site, the first sentence visible to me is
>nginx ("engine x") is an HTTP web server, reverse proxy, content cache, load balancer, TCP/UDP proxy server, and mail proxy server.
Which is perfect
when it says 'written in Go', the subtext is - i'm fast, i'm new, i'm modern, go buddies love me please
the better one is 'written in rust', the subtext is - i'm fast, i'm new, i'm futurism, and i'm memory-safe, rust buddies love me please
--- cynicism end ---
i do think sometimes it's worth to note the underlying tech stack, for example, when a web server claims it's based on libev, i know it's non-blocking
Back when Caddy first came out over 10 years ago, the fact that it was written in Go was just simply more notable. For Go, it also at least tells you the software is in a memory-safe programming language. Now neither of those things is really all that notable, for new software.
I didn't even read the article, but I love the comments on the thread.
Yes. The implementation language of a system should not matter to people in the least. However, they are used as a form of prestige by developers and, sometimes, as a consumer warning label by practitioners.
"Ugh. This was written in <language-I-hate>."
"Ooo! This was written in <language-I-love>!"
There's certainly some aspect of that going on, but I think mainly it's just notable when you write something in a programming language that is relatively new.
Does it matter? In theory no, since you can write pretty much anything in pretty much any language. In practice... It's not quite that black and white. Some programming languages have better tooling than others; like, if a project is written in pure Go, it's going to be a shitload easier to cross compile than a C++ project in most cases.
A memory-safe programming language like Go or Rust will tell you about the likely characteristics of the program: the bugs are not likely to be memory or stack corruption bugs since most of the code can't really do that. A GC'd language like Go or Java will tell you that the program will not be ideal for very low latency requirements, most likely. Some languages, like Python, are languages that many would consider easy to hack on, but on the other hand a program written in Python probably doesn't have the best performance characteristics, because CPython is not the fastest interpreter.
The discipline that is encouraged by some software ecosystems will also play a role in the quality of software; let's be honest, everyone knows that you CAN write quality software in PHP, but the fact that it isn't easy certainly says something. There's nothing wrong with Erlang but you may need to learn about deploying BEAM in production before actually using Erlang software, since it has its own unique quirks.
And this is all predicated on the idea that nobody ever introduces a project as being "written in C." While it's definitely less common, you definitely do see projects that do this. Generally the programming language is more of a focus for projects that are earlier in their life and not as refined as finished products. I think one reason why it was less common in the past is because writing that something is written in C would just be weird. Of course it's written in C, why would anyone assume otherwise? It would be a lot more notable, at that point, if it wasn't.
I get why people look at this in a cynical way but I think the cynical outlook is only part of the story. In actuality, you do get some useful information sometimes out of knowing what language something is written in.
> ... and i'm memory-safe...
Go is memory safe..
Pretty sure Ruby on Rails sites were the same way.
Python certainly was too, back in the day. It feels like it's roughly a "first 10 years of the language" thing, maybe stretched another 5 if there's an underdog aspect (like being interpreted.)
For an open source product, it's fun to say "written in X language". It also advertises the project to developers who may be willing to contribute.
If you put "product made with Go", I'm not going to contribute as I don't know Go, though that wouldn't prevent me from using it should it fit my needs. But if you wrote your project in .NET, I may certainly be willing to contribute.
> elegantly
There is a distinct lack of elegance in the HTTP/2 protocol. It's exceptionally complex and it has plenty of holes in it. That it simply does a job does not earn it "elegant."
Honestly, I don't understand this critique. The actual protocol is pretty straight-forward for what it does. I'm not sure it can be much simpler given the inflexible requirements. I find it more elegant than HTTP/1.1.
Versus HTTP/1.1, some details are simplified by moving the request and status line parts into headers. The same HEADERS frame type can be used for both the headers and trailers on a given stream. The framing protocol itself doesn't really have a whole lot of cruft, and versus HTTP/1 it entirely eliminates the need for the dancing around with Content-Length, chunked Transfer-Encoding, and trailers.
In practice, a lot of the issues around HTTP/2 implementations really just seem to be caused by trying to shoehorn it into existing HTTP/1.1 frameworks, where the differences just don't mesh very well (e.g. Go has some ugly problems here) or just simply a lack of battle-testing due to trouble adopting it (which I personally think is mainly caused by the difficulty of configuring it. Most systems will only use HTTP/2 by default over TLS, after all, so in many cases end-to-end HTTP/2 wasn't being tested.)
From the perspective of a user, where it starts to seem inelegant is when grpc comes into the picture. You get grpc to function but then plain http traffic breaks and vice versa. It seems to be odd implementation details on specific load balancer products. When in theory, all of it should operate the same way, but it doesn’t.
grpc doesn't really do anything special on top of http/2. Load balancers that are aware of http/2 on both sides shouldn't have any trouble with either.
The problem that people run into load balancing grpc is that they try to use a layer 4 load balancer to balance layer 7 requests; that is, if there are 4 backends, the load balancer tells you the address of one of them, and then you wonder why the other 3 backends don't get 25% of the traffic. That's because grpc uses 1 TCP connection and it sends multiple requests over that connection ("channel"). If your load balancer tells you the addresses of all 4 servers, then you can open up 4 channels and load balance inside your application (this was always the preferred approach at google, with a control channel to gracefully drain certain backends, etc.). If your load balancer is aware of http/2 at the protocol level (layer 7), then you open up one channel to your load balancer, which already has one channel for each backend. When a request arrives, it inspects it and picks a backend and proxies the rest of the exchange.
Ordinary http/2 works like this, it's just that you can get away with a network load balancer because http clients open new connections more regularly (consider the lifetime of a browser page with the lifetime of a backend daemon). Each new connection is a load balancing opportunity for the naive layer 4 balancer. If you never make new connections, then it never has an opportunity to load balance.
grpc has plenty of complexity for "let applications do their own load balancing", including built-in load balancing algorithms and built-in service discovery and health discovery (xDS); http/2 doesn't have any of this. Whether these are actually part of grpc or just random add-ons to popular client libraries is somewhat up for debate, however.
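As a rough illustration of the "load balance inside your application" option (the DNS target below is a placeholder, not something from this thread), grpc-go can be asked to resolve every backend address and round-robin RPCs across the resulting subchannels, instead of pinning everything to whichever backend a layer-4 balancer handed out:

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// The dns resolver returns all A/AAAA records for the service; the
	// round_robin policy opens a subchannel to each address and spreads
	// RPCs across them.
	conn, err := grpc.Dial(
		"dns:///my-service.internal:50051", // placeholder target
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// Pass conn to a generated client stub; each RPC is then balanced per
	// call rather than per connection.
}
```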
The same arguments apply to WebSockets probably— yes the implementation is a little hairy, but if the end result is a clean abstraction that does a good job of hiding that complexity from the rest of the stack, then it's elegant.
First 80% of the article was great, but it ends a bit handwavey when it gets to its conclusion.
One thing the article gets wrong is that non-encrypted HTTP/2 exists. Not between browsers, but great between a load balancer and your application.
> One thing the article gets wrong is that non-encrypted HTTP/2 exists
Indeed, I misread the spec, and added a small clarification to the article.
Do you want to risk the complexity and potential performance impact from the handshake that the HTTP/2 standard requires for non-encrypted connections? Worst case, your client and server toolings clash in a way that every request becomes two requests (before the actual h2c request, a second one for the required HTTP/1.1 upgrade, which the server closes as suggested in the HTTP/2 FAQ).
most places where you'd use it use h2c prior knowledge, that is, you just configure both ends to only speak h2c, no upgrades or downgrades.
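For example, assuming a Go backend (port and handler are arbitrary, this is just a sketch), the golang.org/x/net/http2/h2c wrapper lets the server accept cleartext HTTP/2 directly, so a proxy configured for h2c prior knowledge never has to go through the Upgrade dance:

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "proto: %s\n", r.Proto) // reports HTTP/2.0 for h2c clients
	})
	// h2c.NewHandler accepts HTTP/2 on plaintext connections (prior knowledge
	// or Upgrade) and falls back to HTTP/1.1 for everything else.
	log.Fatal(http.ListenAndServe(":8080", h2c.NewHandler(mux, &http2.Server{})))
}
```

Something like `curl --http2-prior-knowledge http://localhost:8080/` should then show the request arriving as HTTP/2.0.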
Not according to Edward Snowden, if you're Yahoo and Google.
You can just add encryption to your backend private network (e.g. Wireguard)
Which has the benefit of encrypting everything and avoids the overhead of starting a TLS socket for every http connection.
If you're going that route, you may as well just do HTTPS again. If you configure your TLS cookies and session resumption right, you'll get all of the advantages of fancy post-quantum crypto without having to go back to the days of manually setting up encrypted tunnels like when IPSec did the rounds.
There's a security angle: load balancers have big problems with request smuggling. HTTP/2 changes the picture - maybe someone more up to date can say whether it's currently better or worse?
ref: https://portswigger.net/web-security/request-smuggling
This is why I configured my company's AWS application load balancer to disable HTTP2 when I first saw the linked post, and haven't changed that configuration since then. Unless we have definitive confirmation that all major load balancers have fixed these vulnerabilities, I'll keep HTTP2 disabled, unless I can figure out how to do HTTP2 between the LB and the backend.
In theory request smuggling is not possible with end-to-end HTTP/2. It's only possible if there is a downgrade to HTTP/1 at some point.
A h2 proxy usually wouldn't proxy through the http2 connection, it would instead accept h2, load-balance each request to a backend over a h2 (or h1) connection.
The difference is that you have a h2 connection to the proxy, but everything past that point is up to the proxies routing. End-to-end h2 would be more like a websocket (which runs over HTTP CONNECT) where the proxy is just proxying a socket (often with TLS unwrapping).
> A h2 proxy usually wouldn't proxy through the http2 connection, it would instead accept h2, load-balance each request to a backend over a h2 (or h1) connection.
Each connection needs to keep state for all processed requests (the HPACK dynamic headers table), so all requests for a given connection need to be proxied through the same connection. Not sure I got what you meant, though.
Apart from that, I think the second sentence of my comment makes clear there is no smuggling as long as the connection before/past proxy is http2, and it's not downgraded to http1. That's all that I meant.
Yes HTTP/2 is much less prone to exploitable request smuggling vulnerabilities. Downgrading to H/1 at the load balancer is risky.
Personally, this lack of support doesn’t bother me much, because the only use case I can see for it, is wanting to expose your Ruby HTTP directly to the internet without any sort of load balancer or reverse proxy, which I understand may seem tempting, as it’s “one less moving piece”, but not really worth the trouble in my opinion.
That seems like a massive benefit to me.
The amusing thing is that HTTP/2 is mostly useful for sites that download vast numbers of tiny Javascript files for no really good reason. Like Google's sites.
I've seen noticeable, meaningful speed improvements with HTTP/2 on pages with only 1 Javascript file.
But I'd like to introduce you/them to tight mode:
https://docs.google.com/document/d/1bCDuq9H1ih9iNjgzyAL0gpwN...
https://www.smashingmagazine.com/2025/01/tight-mode-why-brow...
Or small icon/image files.
Anyone remember those sprite files?
You ever had to host map tiles? Those are the worst!
Indeed, there is a reason most mapping libraries still support specifying multiple domains for tiles. It used to be common practice to setup a.tileserver.test, b.tileserver.test, c.tileserver.test even if they all pointed to the same IP/server just to get around the concurrent request limit in browsers.
That’s not quite true… lots of small files still have the overhead of IPC in the browser
CDNs like Akamai still don’t support H2 back to origins.
That’s likely not because of the wisdom in the article per se, but because of rising complexity in managing streams and connections downstream.
> bringing HTTP/2 all the way to the Ruby app server is significantly complexifying your infrastructure for little benefit.
I think the author wrote it with encryption-is-a-must in the mind and after he corrected those parts, the article just ended up with these weird statements. What complexity is introduced apart from changing the serving library in your main file?
In a language that uses forking to achieve parallelism, terminating multiple tasks at the same endpoint will cause those tasks to compete. For some workflows that may be a feature, but for most it is not.
So that's Python, Ruby, Node. Elixir won't care and C# and Java... well hopefully the HTTP/2 library takes care of the multiplexing of the replies, then you're good.
A good python web server should be single process with asyncio , or maybe have a few worker threads or processes. Definitely not fork for every request
I don't think any serious implementation would do forking when using HTTP/2 or QUIC. Fork is a relic of the past.
You are correct about the first assumption, but even without encryption, dealing with multiplexing significantly complexifies things, so I still stand by that statement.
If you assume no multiplexing, you can write a much simpler server.
> So the main motivation for HTTP/2 is multiplexing, and over the Internet ... it can have a massive impact.
> But in the data center, not so much.
That's a very bold claim.
I'd like to see some data that shows little difference with and without HTTP/2 in the datacenter before I believe that claim.
Datacenters don't typically have high latency, low bandwidth, and varying availability issues. If you have a saturated http/1.1 network (or high CPU use) within a DC you can usually just add capacity.
Yet in my experience I see massive speedups on LOCALHOST going from 1.1 to 2 - where are the numbers and tests, OP?
gRPC?
Surprised not to see this mentioned in the article.
Lots of places (including a former employer) have done tons of work to upgrade internal infrastructure to support HTTP/2 just so they could use gRPC. The performance difference from JSON-over-HTTP APIs was meaningful for us.
I realize there are other solutions but this is a common one.
Probably because it only works correctly outside of browser. Browsers don't support "native" grpc. You normally use something with specifically gRPC support rather than just h2 in a spherical vacuum.
This entirely. When I first read the title, I thought, lets see what they say about gRPC. gRPC is so much nicer working across applications compared to simple REST servers/clients.
The TLS requirement from HTTP2 also hindered http2 origin uptake. The TLS handshake adds latency and is unnecessary on some instances. (This is mentioned in heading "Extra Complexity" in the article)
For HTTP/3 you get 0-RTT however which largely mitigates this.
0-RTT resumption (unless I'm mistaken), which doesn't help with the first connection (but that might be OK).
Correct; to achieve 0-RTT the application needs to perform the handshake/certificate exchange at least once - otherwise, how would it encrypt the payload? This could be cached preemptively IIRC, but it is not worth it.
The problem will be that QUIC uses more userland code and UDP is not as optimized as TCP inside kernels. So far, the extra CPU penalty has discouraged me from adopting QUIC everywhere, I've kept it mostly on the edge-out where the network is far less reliable.
Umm, don't you get 0-RTT resumption in all versions of HTTP? It's a TLS feature, not an HTTP feature. It does not require QUIC.
plus in my experience some h2 features behave oddly with load balancers
I don't understand this super well, but could not get keepalives to cross the LB boundary w/ GCP
HTTP keepalive is a feature from HTTP/1.1, not HTTP/2.
ping frames?
Google measured their bandwidth usage and discovered that something like half was just HTTP headers! Most RPC calls have small payloads for both requests and responses.
HTTP/2 compresses headers, and that alone can make it worthwhile to use throughout a service fabric.
i think it probably varies from workload to workload. reducing handshake time and header compression can have substantial effects.
it's a shame server side hinting/push never caught on. that was always one of the more interesting features.
It didn't catch on because it was hard to see the benefits, and if done incorrectly, could actually make things slightly slower.
Essentially, the server had to do things like compute RTT and understand the status of the browser's cache to do optimal push.
hmm, maybe the client could include a same origin cache state bloom filter in the request.
although i suppose it's a solved problem these days.
Bloom filters are small relative to the amount of data they can hash, but aren't realistic bloom filters still tens of kB at minimum? Might be too heavyweight to send up.
1024 capacity, 1 in 1M false positive rate (false positives fail safe - sending something the client already has), 3.6KB
https://hur.st/bloomfilter/?n=1024&p=1.0E-6&m=&k=
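For reference, that size falls out of the usual bloom filter sizing formula m = -n·ln(p)/(ln 2)² bits: with n = 1024 and p = 10⁻⁶ that's roughly 29,500 bits ≈ 3.6 KB, using k = (m/n)·ln 2 ≈ 20 hash functions.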
Ahh, nice. That's not too bad.
It is very useful for long lived (bidirectional) streams.
Only if you're constrained on connections. The reason that HTTP2 is much better for websites is because of the slow starts of TCP connections. If you're already connected, you don't suffer those losses, and you benefit from kernel muxing.
You've missed the bidirectional part.
How is http2 bidirectional streams better than websockets? I thought they were pretty much equivalent.
Well, IMO h2 streams are more fleshed out and offer better control than websockets, but that's just my opinion. In fact, websockets are your only "proper" option if you want that bidirectional stream to be binary - browsers don't expose that portion of h2 to JS.
Here is a silly thing that is possible with h2 over a single connection, but not with websockets:
Multiple page components (Islands) each have their own stream of events over a single h2 connection. With websockets, you will need to roll your own multiplexing[1].
[1]: I think you can multiplex multiple websockets over a single h2 connection tho, but don't quote me on this.
Multiplexing websockets has always seemed trivial to me. Curious to learn more about h2 streams though! Support is coming to JS and has already landed on Firefox and Chrome: https://developer.mozilla.org/en-US/docs/Web/API/WebTranspor...
nah, I'm using HTTP/3 everywhere
Http2 is needed for a GRPC route on OpenShift.
If your load balancer is converting between HTTP/2 and HTTP/1.1, it's a reverse proxy.
Past the reverse proxy, is there a point to HTTP at all? We could also use SCGI or FastCGI past the reverse proxy. It does a better job of passing through information that's gathered at the first point of entry, such as the client IP address.
Keeping everything HTTP makes testing a bit easier.
Hmm it’s weird that this submission and comments are being shown to me as “hours ago” while they are all 2 days old
https://news.ycombinator.com/item?id=26998309
Sometimes the moderators will effectively boost a post that they think is interesting so it gets more views.
Yeah yeah, whatever, just make it work in the browser so I can do gRPC duplex streams, thank you very much.
I remember being bashed on HN for saying that HTTP is hard. Yet I see nonsense here in the comments about HTTP. The whole article is good but:
> HTTP/2 is fully encrypted, so you need all your application servers to have a key and certificate
Nope. h2c is a thing and is official. But the article is right: the value HTTP/2 provides isn't for the LAN, so HTTP/1.1 or HTTP/2, it doesn't matter much.
HTTP/3, however, is fully encrypted; h3c doesn't exist. So yeah, HTTP/3 slows your connection down, it isn't suited for LAN and should not be used there.
BUT if you actually want to encrypt even in your LAN, use HTTP/3, not HTTP/2 encrypted. You will have a small but not negligible gain from 0-RTT.
I would not use HTTP/3 for LAN. Even the latest Linux kernels struggle with it. HTTP/1, aka TCP, has fully supported encryption and other offloads. UDP still consumes much more CPU for the same amount of traffic.
Do you have source for that? I'm very interested. There is no technical reason for UDP to be slower than TCP (at CPU level).
The only field that is computed in UDP is the checksum, and the same exists in TCP; it must be recomputed each time someone actually re-routes the packet (e.g. bridge to VM) since the TTL is decreased.
So I doubt your assertion.
_____
Writing my comment I understood what you are talking about. There is a bunch of encryption done in user mode in HTTP/3 that shouldn't need to be done in user mode. In HTTP/2 it was sometimes done in kernel mode (kTLS), so it was quicker. The slowness comes from the CPU work needed to copy data out of kernel mode. I didn't follow the whole story so I trust you on this.
> There is no technical reason for UDP to be slower than TCP (at CPU level).
The technical reason is 30+ years of history of TCP being ≥90% of Internet traffic and services. There's several orders of magnitude in resources more spent to make TCP fast starting at individual symbols on Ethernet links all the way up into applications.
Encryption is one thing (if you run kTLS, which is still not done in most manual setups) but the much bigger factor IIRC is how much of the networking stack needs to run in userspace and has not been given the optimization love of TCP. If you compared non-kTLS h2 with non-kTLS h3 over a low-latency link, the h2 connection could handle a lot more traffic compared to h3.
That is not to say that h3 does not have its place, but the networking stacks are not optimized for it yet.
https://news.ycombinator.com/item?id=41890784
https://docs.kernel.org/networking/tls-offload.html
If we ever get to adopting this, I will send every byte to a separate IPv6 address. Big Tech surveillance wouldn't work so many don't see a point like the author.
The RFC said "SHOULD not" not "MUST not" couldn't we have just ignored the 2 connection limit?
That's what browsers actually did.
Came here to say the same thing; had they read the RFC they'd realize it's not actually a limit, just a suggestion - that's why it's in the "Practical Considerations" section too.
Whoever downvoted you is probably unaware that words like SHOULD have specific meaning in RFCs
Browsers started ignoring the 2 connection limit on H/1.x long before H2 came along
I think this post gets the complexity situation backwards. Sure, you can use a different protocol between your load balancer and your application and it won't do too much harm. But you're adding an extra protocol that you have to understand, for no real benefit.
(Also, why do you even want a load balancer/reverse proxy, unless your application language sucks? The article says it "will also take care of serving static assets, normalize inbound requests, and also probably fend off at least some malicious actors", but frankly your HTTP library should already be doing all of those. Adding that extra piece means more points of failure, more potential security vulnerabilities, and for what benefit?)
> Sure, you can use a different protocol between your load balancer and your application and it won't do too much harm. But you're adding an extra protocol that you have to understand, for no real benefit.
Well, that depends...
At a certain scale (and arguably, not too many people will ever need to think about this), using UNIX sockets (instead of HTTP TCP) between the application and load balancer can be faster in some cases, as you don't go through the TCP stack...
> Also, why do you even want a load balancer/reverse proxy, unless your application language sucks?
Erm... failover... ability to do upgrades without any downtime... it's extra complexity yes, but it does have some benefits...
> At a certain scale (and arguably, not too many people will ever need to think about this), using UNIX sockets (instead of HTTP TCP) between the application and load balancer can be faster in some cases, as you don't go through the TCP stack...
Sure (although as far as I can see there's no reason you can't keep using HTTP for that). You can go even further and use shared memory (I work for a company that used Apache with Jk back in the day). But that's an argument for using a faster protocol because you're seeing a benefit from it, not an argument for using a slower protocol because you can't be bothered to implement the latest standard.
> using a slower protocol because you can't be bothered to implement the latest standard.
I thought we were discussing HTTP/2 but now you seem to be invoking HTTP/3? It's even faster indeed but brings a whole lot of baggage with it. Nice comparison point though: Do you want to add the complexity of HTTP/2 or HTTP/3 in your backend? (I don't.)
> I thought we were discussing HTTP/2 but now you seem to be invoking HTTP/3?
The article talks about HTTP/2 but I suspect they're applying the same logic to HTTP/3.
> Do you want to add the complexity of HTTP/2 or HTTP/3 in your backend? (I don't.)
I'd like to use the same protocol all the way through. I wouldn't want to implement any HTTP standard by hand (I could, but I wouldn't for a normal application), but I'd expect an established language to have a solid library implementation available.
> why do you even want a load balancer/reverse proxy, unless your application language sucks?
Most load balancer/reverse proxy applications also handle TLS. Security-conscious web application developers don't want TLS keys in their application processes. Even the varnish authors (varnish is a load balancer/caching reverse proxy) refused to integrate TLS support because of security concerns; despite being reverse-proxy authors, they didn't trust themselves to get it right.
An application can't load-balance itself very well. Either you roll your own load balancer as a separate layer of the application, which is reinventing the wheel, or you use an existing load balancer/reverse proxy.
Easier failover with fewer (ideally zero) dropped requests.
If the app language isn't compiled, having it serve static resources is almost certainly much slower than having a reverse proxy do it.
> Security-conscious web application developers don't want TLS keys in their application processes.
If your application is in a non-memory-safe language, sure (but why would you do that?). Otherwise I would think the risk is outweighed by the value of having your connections encrypted end-to-end. If your application process gets fully compromised then an attacker already controls it, by definition, so (given that modern TLS has perfect forward secrecy) I don't think you really gain anything by keeping the keys confidential at that point.
I write application servers for a living, mostly for Python but previously for other languages.
Nobody, nobody, writes application servers with the intent of having them exposed to the public internet. Even if they're completely memory safe, we don't do DOS protections like checking for reasonable header lengths, rewriting invalid header fields, dropping malicious requests, etc. Most application servers will still die to slowloris attacks. [1]
We don't do this because it's a performance hog and we assume you're already reverse proxying behind any responsible front-end server, which all implement these protections. We don't want to double up on that work. We implement the HTTP spec with as low overhead as possible, because we expect to have pipelined HTTP/1.1 connections from a load balancer or other reverse proxy.
Your application server, Gunicorn, Twisted, Uvicorn, whatever, does not want to be exposed to the public internet. Do not expose it to the public internet.
[1]: https://en.wikipedia.org/wiki/Slowloris_(cyber_attack)
As someone who designs load-balancer solutions for a living I cannot agree with this more.
I likewise assume that all servers are insecure, always, and we do not want them exposed without a sane load balancer layer.
Your server was probably not made to be exposed to the public internet. Do not expose it to the public internet.
> Nobody, nobody, writes application servers with the intent of having them exposed to the public internet
For rust, go, lua (via nginx openresty) and a few others this is a viable path. I probably wouldn't do it with node (or bun or deno), python, or similar but there are languages where in certain circumstances it is reasonable and might be better.
For Go, net/http is not something you should expose to the public internet, there's no secret sauce in there. It will just die to the first person to hit it with a slowloris or other DOS attack. Same with the common C++ options like boost.beast unless you're writing the logic yourself (but why bother? Just reverse proxy).
I'm unfamiliar with the common rust frameworks for http, but find it unlikely the situation is very different.
> We don't do this because it's a performance hog and we assume you're already reverse proxying behind any responsible front-end server
What application servers have you written? I have never seen an application server readme say DON'T EXPOSE DIRECTLY TO THE INTERNET, WE ASSUME YOU USE REVERSE PROXY.
Most of them have a disclaimer in their deployment or tutorial docs, some with more strong language than others. Again, nothing bad happens if you don't, we don't write memory vulnerabilities into these servers. You are just far more vulnerable to DOS attacks.
* "We strongly recommend using Guincorn behind a proxy server" [1]
* "As a general rule, you probably want to: ... run behind Nginx for self-hosted deployments." [2]
* "A reverse proxy such as nginx or Apache httpd should be used in front of Waitress." [3]
For some, like uWSGI, they don't even want to talk HTTP (uWSGI supports its own protocol) and it's just assumed you're using a dedicated webserver to talk to public traffic. [4]
[1]: https://docs.gunicorn.org/en/latest/deploy.html
[2]: https://www.uvicorn.org/deployment/
[3]: https://flask.palletsprojects.com/en/stable/deploying/waitre...
[4]: https://uwsgi-docs.readthedocs.io/en/latest/tutorials/Django...
Of course, don't expose to the public Internet.
Also don't expose plain text traffic to the internal corpnet, where most attacks originate.
You use a reverse proxy because whenever you "deploy to prod", you'll be using one anyway; thus by not having TLS in your app, you haven't built something you don't actually need.
Speculative execution cpu bugs. Or whatever the next class of problems is that can expose bits of process memory without software memory bugs.
That's already a fringe case. Do you really think everyone's writing web applications in a language like rust without any unsafe (or equivalent)?
> Also, why do you even want a load balancer/reverse proxy, unless your application language sucks
- To terminate SSL
- To have a security layer
- To load balance
- To have rewrite rules
- To have graceful updates
- ...
- To host multiple frontends, backends and/or APIs under one domain name
> backends and/or APIs under one domain name
On one IP, sure, for one domain you could use an API gateway.
API gateway is a fancy term for a configurable reverse proxy often bought as a service.
No, an API gateway is a web service whose main purpose is routing requests based on their content.
A load balancer's main purpose is to... balance the load across multiple backends.
Just because both can be implemented with a reverse proxy such as NGINX doesn't mean it's the same thing.
I said "API gateway is a fancy term for a configurable reverse proxy often bought as a service" and both "load balancer" and "API gateway" are common configurations of "configurable reverse proxy", often bought as a service.
Many load balancers have this functionality within them, even ones from years ago that aren't around anymore like Microsoft ISA/TMG. They're not web services, but they can route based on requests.
> To terminate SSL
To make sure that your connections can be snooped on over the LAN? Why is that a positive?
> To have a security layer
They usually do more harm than good in my experience.
> To load balance
Sure, if you're at the scale where you want/need that then you're getting some benefit from that. But that's something you can add in when it makes sense.
> To have rewrite rules
> To have graceful updates
Again I would expect a HTTP library/framework to handle that.
> To make sure that your connections can be snooped on over the LAN? Why is that a positive?
No, to keep your app from having to deal with SSL. Internal network security is an issue, but sites that need multi-server architectures can't really be passing SSL traffic through to the application servers anyway, because SSL hides stuff that's needed for the load balancers to do their jobs. Many websites need load balancers for performance, but are not important enough to bother with the threat model of an internal network compromise (whether it's on the site owner's own LAN, or a bare metal or VPS hosting vlan).
> Sure, if you're at the scale where you want/need that then you're getting some benefit from that. But that's something you can add in when it makes sense.
So why not preface your initial claims by saying you trust the web app to be secure enough to handle SSL keys, and a single instance of the app can handle all your traffic, and you don't need high availability in failure/restart cases?
That would be a much better claim. It's still unlikely, because you don't control the internet. Putting your website behind Cloudflare buys you some decreased vigilance. A website that isn't too popular or attention-getting also reduces the risk. However, Russia and China exist (those are examples only, not an exclusive list of places malicious clients connect from).
> So why not preface your initial claims by saying you trust the web app to be secure enough to handle SSL keys, and a single instance of the app can handle all your traffic, and you don't need high availability in failure/restart cases?
Yeah, I phrased things badly, I was trying to push back on the idea that you should always put your app behind a load balancer even when it's a single instance on a single machine. Obviously there are use cases where a load balancer does add value.
(I do think ordinary webapps should be able to gracefully reload/restart without losing connections, it really isn't so hard, someone just has to make the effort to code the feature in the library/framework and that's a one-off cost)
> > To terminate SSL
> To make sure that your connections can be snooped on over the LAN? Why is that a positive?
Usually your "LAN" uses whole link encryption, so that whatever is accessed in your private infrastructure network is encrypted (being postgres, NFS, HTTP, etc). If that is not the case, then you have to configure encryption at each service level, which is both error prone, time consuming, and not always possible. If that is not case then you can have internal SSL certificates for the traffic between RP and workers, workers and postgres, etc.
Also you don't want your SSL server key to be accessible from business logic as much as possible, having an early termination and isolated workers achieves that.
Also, you generally have workers access private resources, which you don't want exposed on your actual termination point. It's just much better to have a public termination point RP with a private iface sending requests to workers living in a private subnet accessing private resources.
> > To have a security layer
> They usually do more harm than good in my experience.
Right, maybe you should detail your experience, as your comments don't really tell much.
> > To have rewrite rules
> > To have graceful updates
> Again I would expect a HTTP library/framework to handle that.
HTTP frameworks handle routing _for themselves_, this is not the same as rewrite rules which are often used to glue multiple heterogeneous parts together.
HTTP frameworks are not handling all the possible rewriting and gluing for the very reason that it's not a good idea to do it at the logic framework level.
As for graceful updates, there's a chicken and egg problem to solve. You want graceful updates between multiple versions of your own code / framework. How could that work without a third party balancing old / new requests to the new workers one at a time?
You terminate SSL as close to the user as possible, because that round trip time is greatly going to affect the user experience. What you do between your load balancer and application servers is up to you, (read: should still be encrypted) but terminating SSL asap is about user experience.
> You terminate SSL as close to the user as possible, because that round trip time is greatly going to affect the user experience. What you do between your load balancer and application servers is up to you, (read: should still be encrypted) but terminating SSL asap is about user experience.
That makes no sense. The latency from your load balancer to your application server should be a tiny fraction of the latency from the user to the load balancer (unless we're talking about some kind of edge deployment, but at that point it's not a load balancer but some kind of smart proxy), and the load balancer decrypting and re-encrypting almost certainly adds more latency compared to just making a straight connection from the user to the application server.
Say your application and database are in the US West and you want to serve traffic to the EU or AUS, or even US East. Then you want to terminate TCP and TLS in those regions to cut down on handshake latency, slow start time, etc. Your reverse proxy can then use persistent TLS connections back to the origin so that those connection startup costs are amortized away. Something like nginx can pretty easily proxy 10+ Gb/s of traffic and tens of thousands of requests per second on a couple of low-power cores, so it's relatively cheap to do this.
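As a rough illustration of the "persistent TLS connections back to the origin" part, this is the kind of client-side tuning involved (Go, values purely illustrative): the expensive cross-region handshakes happen once per pooled connection instead of once per user request.

```go
package edge

import (
	"net/http"
	"time"
)

// Transport used by the in-region proxy to reach the origin. Idle, already
// established TLS connections are kept around and reused, so the cross-region
// TCP + TLS handshake cost is amortized across many user requests.
var originTransport = &http.Transport{
	MaxIdleConns:        512,
	MaxIdleConnsPerHost: 128,
	IdleConnTimeout:     90 * time.Second,
	ForceAttemptHTTP2:   true, // multiplex over one connection if the origin speaks HTTP/2
}
```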
Lots of application frameworks also just don't bother to have a super high performance path for static/cached assets because there's off-the-shelf software that does that already: caching reverse proxies.
It depends on your deployment and where your database and app servers and POPs are. If your load balancer is right next to your application server, which is right next to your database, you're right. And it's fair to point out that most people have that kind of deployment. However, there are some companies, like Google, that have enough of a presence that the L7 load balancer/smart proxy/whatever you want to call it is way closer to you, Internet-geographically, than the application server or the database. For their use case and configuration, your "almost certainly" isn't what was seen empirically.
You usually re-encrypt your traffic after the gateway, either by using an internal PKI and TLS or some kind of encapsulation (IPsec, etc.).
Security and availability requirements vary, so there's much to argue about. Usually you have some kind of 3rd-party service you want to hide, or you want to control CORS, Cache-Control, and other headers uniformly, etc. If you are fine with 5-30 minutes of outage (or until someone notices and manually restores service), then of course you don’t need to load balance. But you can imagine this not being the case at most companies.
Tell me you never built an infrastructure without telling me you never built an infrastructure
The point being that all the code on the stack is not necessarily yours
I've built infrastructure. Indeed I've built infrastructure exactly like this, precisely because maintaining encryption all the way to the application server was a security requirement (this was a system that involved credit card information). It worked well.
Load balancers are nice to have if you want to move traffic from one machine to another. Which sometimes needs to happen even if your application language doesn't suck and you can hotload your changes... You may still need to manage hardware changes, and a load balancer can be nice for that.
DNS is usable, but some clients and recursive resolvers like to cache results for way beyond the TTL provided.
C fast, the rest slow. You don't want to serve static assets in non-C.
If you call sendfile with kTLS I imagine it'd be fast in any language
Answers from the article - the "extra" protocol is just HTTP/1.1 and the reason for a load balancer is the ability to have multiple servers:
> But also the complexity of deployment. HTTP/2 is fully encrypted, so you need all your application servers to have a key and certificate, that’s not insurmountable, but is an extra hassle compared to just using HTTP/1.1, unless of course for some reasons you are required to use only encrypted connections even over LAN.
> So unless you are deploying to a single machine, hence don’t have a load balancer, bringing HTTP/2 all the way to the Ruby app server is significantly complexifying your infrastructure for little benefit.
I've deployed h2c (cleartext) in many applications. No tls complexity needed
Good to know - neither the parent nor the article mention this. h2c seems to have limited support by tooling (e.g. browsers, curl), which is a bit discouraging.
EDIT: Based on the HTTP/2 FAQ, pure h2c is not allowed in the standard as it requires you to implement some HTTP/1.1 upgrade functionality: https://http2.github.io/faq/#can-i-implement-http2-without-i...
Why do you think that curl doesn't support h2c?
It does. Just use `--http2` or `--http2-prior-knowledge`; curl deduces cleartext vs. TLS from the `http` or `https` URL scheme (cleartext for `http`).
I said limited support and gave curl as an example because `curl --http2` sends an HTTP/1.1 upgrade request first, so it fails in a purely HTTP/2 environment.
Thanks for bringing up --http2-prior-knowledge as a solution!
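For anyone curious what h2c looks like in practice, here's a minimal Go server using the golang.org/x/net/http2/h2c wrapper (port and handler are just examples). It accepts both the HTTP/1.1 Upgrade path and prior-knowledge clients over plain TCP, so both `curl --http2` and `curl --http2-prior-knowledge` against an `http://` URL should get HTTP/2 back, no certificates involved.

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// r.Proto reports "HTTP/2.0" when the request arrived over h2c.
		fmt.Fprintf(w, "proto: %s\n", r.Proto)
	})

	// Wrap the handler so plain-TCP connections can speak HTTP/2, either via
	// the HTTP/1.1 Upgrade mechanism or via prior knowledge.
	h2s := &http2.Server{}
	log.Fatal(http.ListenAndServe(":8080", h2c.NewHandler(mux, h2s)))
}
```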
I'd agree it's not critical, but I'd discard the assumption that requests within the data center will be fast. People have to send requests to third parties, which will often be slow. Hopefully not as slow as across the Atlantic, but still orders of magnitude worse than an internal query.
You will often be in the state where the client uses HTTP2, and the apps use HTTP2 to talk to the third party, but inside the data center things are HTTP1.1, fastcgi, or similar.
Why does HTTP2 help with this? Load balancers use one keep-alive connection per in-flight request, so they don't experience head-of-line blocking. And they have slow start disabled. So even if the latency of the final request is high, why would HTTP2 improve the situation?
If every request is quick, you can easily re-use connections, file handles, threads, etc. If requests are slow, you will often need to spin up new connections, as you don't want to wait for a response that might take hundreds of milliseconds.
But I did start by saying it's not important. It's a small difference, unless you hit a connection limit.