The article seems to make an assumption that the application backend is in the same datacenter as the load balancer, which is not necessarily true: people often put their load balancers at the network edge (which helps reduce latency when the response is cached), or just outsource those to a CDN vendor.
> In addition to the low roundtrip time, the connections between your load balancer and application server likely have a very long lifetime, hence don’t suffer from TCP slow start as much, and that’s assuming your operating system hasn’t been tuned to disable slow start entirely, which is very common on servers.
A single HTTP/1.1 connection can only process one request at a time (unless you attempt HTTP pipelining), so if you have N persistent TCP connections to the backend, you can only handle N concurrent requests. Since all of those connections are long-lived and are sending at the same time, if you make N very large, you will eventually run into TCP congestion control convergence issues.
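To make that "N connections, N concurrent requests" point concrete, here is a minimal Go sketch (hostnames and numbers are made up) of how a client or proxy caps its HTTP/1.1 connection pool to a backend; with MaxConnsPerHost set, in-flight concurrency to that host is bounded by the connection count:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	tr := &http.Transport{
		// Keep a pool of warm, long-lived HTTP/1.1 connections to the backend.
		MaxIdleConns:        64,
		MaxIdleConnsPerHost: 64,
		IdleConnTimeout:     90 * time.Second,
		// Hard cap: with HTTP/1.1 there is no multiplexing, so at most 64
		// requests to this host can be in flight at once.
		MaxConnsPerHost:   64,
		ForceAttemptHTTP2: false, // stick to HTTP/1.1 for this illustration
	}
	client := &http.Client{Transport: tr, Timeout: 10 * time.Second}

	resp, err := client.Get("http://backend.internal:8080/health") // hypothetical backend
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```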
Also, I don't understand why the author believes HTTP/2 is less debuggable than HTTP/1; curl and Wireshark work equally well with both.
I think the more common architecture is for the edge network to terminate SSL, and then transmit to the load balancer which is actually in the final data center? In which case you can use HTTP/2 or 3 on both of those hops without requiring it on the application server.
That said I still disagree with the article's conclusion: more connections means more memory so even within the same dc, there should be benefits of http2. And if the app server supports async processing, there's value in hitting it with concurrent requests to make the most of its hardware, and http1.1 head of line blocking really destroys a lot of possible perf gains when the response time is variable.
I suppose I haven't had a true bake off here though - so it's possible the effect of http2 in the data center is a bit more marginal than I'm imagining.
The maximum number of connections thing in HTTP/1 always makes me think of queuing theory, which gives surprising conclusions like how adding a single extra teller at a one-teller bank can cut wait times by 50 times, not just by 2.
However, I think the problem is the Poisson process isn't really the right process to assume. Most websites which would run afoul of the 2/6/8/etc connections being opened are probably trying to open up a lot of connections at the same time. That's very different from situations where only 1 new person arrives every 6 minutes on average, and 2 new people arriving within 1 second of each other is a considerably rarer event.
And if memory serves, if you care about minimizing latency you want all of your workers running at an average of about 60% occupancy. (Which is also pretty close to when I saw P95 times dog-leg on the last cluster I worked on.)
Most analyses I've read say the threshold is around the 80% mark [1], although it depends on how you model the distribution, and there's nothing magical about the number. The main thing is to avoid getting close to 100%, because wait times go up exponentially as you get closer to the max.
Little's Law is fundamental to queueing theory, but there's also the less well-known Kingman's formula, which incorporates variability of arrival rate and task size [2].
[1] https://www.johndcook.com/blog/2009/01/30/server-utilization-...
[2] https://taborsky.cz/posts/2021/kingman-formula/
Really, both of those models show 60% as about the limit where you're still effectively at the baseline for latency. 80% is just about the limit where you're already into the exponential rise; any higher and things become unusable.
At 0-60% you're still at minimum latency. At 60-80% you're at twice the latency, but it's probably worth the cost savings of the extra compute density since it's still pretty low. Higher than 80% and things are already slowing down and getting exponentially worse with every request.
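As a back-of-the-envelope check of those thresholds, the single-server M/M/1 model puts average time in system at service_time / (1 − utilization). A tiny Go calculation (purely illustrative; real services rarely match M/M/1 assumptions, and the 10 ms service time is invented):

```go
package main

import "fmt"

func main() {
	const serviceMs = 10.0 // hypothetical average service time
	for _, rho := range []float64{0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99} {
		inSystem := serviceMs / (1 - rho) // M/M/1 mean time in system
		fmt.Printf("utilization %2.0f%% -> %6.1f ms (%4.1fx bare service time)\n",
			rho*100, inSystem, inSystem/serviceMs)
	}
}
```

Under that (simplistic) model you're at roughly 2.5x the bare service time at 60%, 5x at 80%, and it blows up rapidly after that.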
Could be that; could also be that the people taking a long time at least aren't causing a bottleneck (assuming there aren't two of them at the same time). So you have a situation like this: the first person takes 10 minutes, while there are 9 waiting in line who take only one minute apiece. With one teller, the average wait time is ~15 minutes. With two tellers, it's now ~5 minutes.
Which is why it is highly annoying when there's only one worker at the coffee stand, and there's always this one jerk at the front of the queue who orders a latte when you just want a coffee. With two workers, the people who just want coffee won't have to wait 15 minutes for the latte people.
And I've also noticed a social effect: when people wait a long time, it seems to reinforce how they perceive the eventual service, that is, they want more out of the interaction, so they take longer. Which makes the situation even worse.
> there's always this one jerk at the front of the queue
Here in the espresso world, that’s not so bad. But the ‘vanilla oat milk decaf, and also a hot muffin with butter’ is tedious.
There is a roaster in Auckland that’s been there since the ‘80s. On the counter it says ‘espresso, flat white or fuck off’. Clear and concise. I like it.
https://millerscoffee.co.nz/
There was a discussion not too long ago about modern banks still operating with archaic practices. I have accounts at two different banks, and if I make a transfer request before 1:45 PT, it is counted as same day. It makes no damn sense to me why that's a limitation at all today. It's not like a human needs to look at it, but even so, why the 1:45 PT cutoff? Is it because it is 4:45 ET? Then why not list it as that? And why does a banking computer system care about timezones or bankers' hours at all? It's all just mind-bogglingly lame.
It's because when you do transfers, the banks will reconcile their accounts at the end of the day (e.g., if one bank deposits more to another, they will need to make up the difference with their own capital).
These cutoffs mean banks have certainty about the transaction, as the reconciliation is batched rather than real-time.
I know my father sometimes had to take 1 am phone calls because the insurance industry runs a lot of batch processing overnight, when the systems aren't competing with OLTP traffic. Banking software may be built the same way.
I find a lot of value in being able to get a water or a coffee, use the restroom, have sidebar conversations with fellow employees, begrudgingly attend meetings, or take a walk to stretch my legs for a minute and think, personally.
Almost every web forum enters a phase where participants bring their pet politics into unrelated discussions. Whether they last or not depends entirely on whether the flamebait/troll creates a large reply structure or a non-existent one. This is why shadowbans are more effective than large groups of people responding angrily. Or, to cite the Deep Magic: "don't feed the trolls".
Personally, I'd like to see more HTTP/2 support. I think HTTP/2's duplex streams would be useful, just like SSE. In theory, WebSockets do cover the same ground, and there's also a way to use WebSockets over HTTP/2 although I'm not 100% sure how that works. HTTP/2 though, elegantly handles all of it, and although it's a bit complicated compared to HTTP/1.1, it's actually simpler than WebSockets, at least in some ways, and follows the usual conventions for CORS/etc.
The problem? Well, browsers don't have a JS API for bidirectional HTTP/2 streaming, and many don't see the point, like this article expresses. NGINX doesn't support end-to-end HTTP/2. Feels like a bit of a shame, as the streaming aspect of HTTP/2 is a more natural evolution of the HTTP/1 request/response cycle versus things like WebSockets and WebRTC data channels. Oh well.
Duplex streams are not really an HTTP/2-only feature. You can do the same bidirectional streaming with HTTP/1.1 too. The flow is always: 1. The client sends a header set. 2. It can then start to stream data in the form of an unlimited-length byte-stream to the server. 3. The server starts to send a header set back to the client. 4. The server can then start to stream data in the form of an unlimited-length byte-stream to the client.
There is not even a fixed order between 2) and 3). The server can start sending headers or body data before the client has sent any body bytes.
What is correct is that a lot of servers and clients (including javascript in browsers) don't support this and make stricter assumptions regarding how HTTP requests are used - e.g. that the request bytes are fully sent before the response happens. I think ReadableStream/WritableStream APIs on browsers were supposed to change that, but I haven't followed the progress in the last few years.
NGINX falls into the same category. Its HTTP/2 support (and gRPC support) was built with a very limited use-case in mind. That's also why various CDNs and service meshes use different kinds of HTTP proxies - so that various streaming workloads don't break in case the way the protocol is used is not strictly request->response.
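As a rough illustration of the flow described above (response starting before the request body is complete), here is a minimal Go handler, not taken from the article, that echoes each chunk back as it arrives. Go's net/http allows this over both HTTP/1.1 (chunked in both directions) and HTTP/2, provided the client cooperates:

```go
package main

import (
	"log"
	"net/http"
)

// echo streams each chunk of the request body back to the client as soon as
// it arrives, i.e. the response begins before the request body has finished.
func echo(w http.ResponseWriter, r *http.Request) {
	flusher, _ := w.(http.Flusher)
	buf := make([]byte, 32*1024)
	for {
		n, err := r.Body.Read(buf)
		if n > 0 {
			if _, werr := w.Write(buf[:n]); werr != nil {
				return
			}
			if flusher != nil {
				flusher.Flush() // push the chunk out instead of buffering it
			}
		}
		if err != nil {
			return // io.EOF or the client went away
		}
	}
}

func main() {
	http.HandleFunc("/echo", echo)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```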
No browser I'm aware of is planning on allowing the request and response bodies to be streamed simultaneously for the same request using ReadableStream and WritableStream. When using streaming request bodies, you have to set the request explicitly to half-duplex.
Anyways, yes, this is technically true, but the streaming semantics are not really that well-defined for HTTP/1.1, probably because it was simply never envisioned. The HTTP/1.1 request and response were viewed as unary entities and the fact that their contents were streamed was mostly an implementation detail. Most HTTP/1.1 software, not just browsers, ultimately treats the requests and responses of HTTP as different and distinct phases. For most uses of HTTP, this makes sense. e.g. for a form post, the entire request entity is going to need to be read before the status can possibly be known.
Even if we do allow bidirectional full-duplex streaming over HTTP/1.1, it will block an entire TCP connection for a given hostname, since HTTP/1.1 is not multiplexed. This is true even if the connection isn't particularly busy. Obviously, this is still an issue even with long-polling, but that's all the more reason why HTTP/2 is simply nicer.
NGINX may always be stuck in an old school HTTP/1 mindset, but modern software like Envoy shows a lot of promise for how architecting around HTTP/2 can work and bring advantages while remaining fully backwards compatible with HTTP/1 software.
HTTP2 works great on the LAN, or if you have really good network.
It starts to really perform badly when you have dropped packets. So any kind of medium quality wifi or 4/5g kneecaps performance.
It was always going to do this, and as webpages get bigger, the performance degradation increases.
HTTP2 fundamentally underperforms in the real world, and noticeably so on mobile. (My company enthusiastically rolled out http2 support when akamai enabled it.)
Personally I feel that websockets are a hack, and frankly HTTP/3 should have been split into three: a file access protocol, an arbitrary TCP-like pipe, and a metadata channel. But web people love hammering workarounds onto workarounds, so we are left with HTTP/3.
HTTP/2, in my experience, still works fine on decent connections, but the advantages definitely start to level out as the connection gets worse. HTTP/2 definitely has some inherent disadvantages over HTTP/1 in those regards. (Though it depends on how much you are constrained by bandwidth vs latency, to be sure.)
However, HTTP/3 solves that problem and performs very well on both poor quality and good quality networks.
Typically, I use HTTP/2 to refer to both HTTP/2 and HTTP/3 since they are basically the same protocol with different transports. Most people don't really need to care about the distinction, although I guess since it doesn't use TCP there are cases where someone may not be able to establish an HTTP/3 connection to a server. Still, I think the forward looking way to go is to try to push towards HTTP/3, then fall back to HTTP/2, and still support HTTP/1.1 indefinitely for simple and legacy clients. Some clients may get less than ideal performance, but you get the other benefits of HTTP/2 on as many devices as possible.
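The usual fallback mechanics are that the TCP listener advertises HTTP/3 via the Alt-Svc header, and clients that can't do QUIC simply keep using h2 or HTTP/1.1. A hedged Go sketch of just that advertising part (the actual HTTP/3 listener would come from a QUIC library such as quic-go and isn't shown; cert paths are placeholders):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Tell capable clients they may retry this origin over HTTP/3 on UDP 443.
		w.Header().Set("Alt-Svc", `h3=":443"; ma=86400`)
		fmt.Fprintf(w, "served over %s\n", r.Proto)
	})
	// Serves HTTP/1.1 and, via ALPN, HTTP/2; an HTTP/3 endpoint would listen
	// on UDP 443 alongside this TCP listener.
	log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", mux))
}
```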
> HTTP 3 should have been split into three: a file access protocol, a arbitrary TCP like pipe and a metadata channel
HTTP3 is basically just HTTP2 on top of QUIC… so you already have the tcp-like pipe, it’s called QUIC. And there’s no reason to have a metadata channel when there are already arbitrary separate channels in QUIC itself.
If you want to stream data inside a HTTP body (of any protocol), then the ReadableStream/WritableStream APIs would be the appropriate APIs (https://developer.mozilla.org/en-US/docs/Web/API/Streams_API) - however at least in the past they have not been fully standardized and implemented by browsers. Not sure what the latest state is.
WebTransport is a bit different - it offers raw QUIC streams that are running concurrently with the requests/streams that carry the HTTP/3 requests on shared underlying HTTP/3 connections and it also offers a datagram API.
I think the problem is that duplex communication on the web is rarely useful except in some special cases, and usually harder to scale as you have to keep state around and can't as easily rotate servers.
For some applications it is important, but for most websites the benefits just don't outweigh the costs.
It seems like the author is agreeing that HTTP/2 is great (or at least good) for browser -> web server communication, but not useful for the REST-style APIs that pervade modern app design. He makes a good case, but HTTP was never really a good choice for API transport _either_, it just took hold because it was ubiquitous.
The big difference for simple applications is that Caddy is easier to set up, and nginx has a smaller memory footprint. Performance is similar between the two.
AFAIK both proxies are capable of serving at line rate for 10Gbps or more at millions of concurrent connections. I can't possibly see how performance would significantly differ if they're properly configured.
nginx's memory footprint is tiny for what it delivers. A common pattern I see for homelab and self-hosted stuff is a lightweight bastion VPS in a cloud somewhere proxying requests to more capable on-premise hardware over a VPN link. Using a cheap < $5/mo VPS means 1GB or less of RAM, so you have to tightly watch what is running on that host.
1 GB should be way more than either should need. I run nginx, unbound, postfix, dovecot plus all the normal stuff (ssh, systemd, etc) for a Linux system on a VPS w/ 500MB of RAM. Currently the system has ~270MB used. It actually has 1GB available due to a plan auto-upgrade, but I have never bothered as I just don't need it.
1GB for a VPC that runs an HTTP load balancer/reverse proxy and a handful of IPsec or Wireguard tunnels back to the app servers (origin) is overkill. You could successfully run that in 512MB, and probably even 256MB. (That's the scenario described).
What needs to run on this that's a memory hog making 512MB too small? By my (very rough) calculations you'd need 50-100MB for kernel + systemd + sshd + nginx base needs + tunnels home. That leaves the rest for per-request processing.
Each request starts needing enough RAM to parse the https headers into a request object, open a connection back to the origin, and buffer a little bit of traffic that comes in while that request is being processed/the origin connection opens. After that you only need to maintain 2 connections plus some buffer space - generously 50KB initially and 10KB ongoing. There's enough space for a thousand concurrent requests in the RAM not used by the system. Proxying is fairly cheap - the app servers (at the origin) may need much much more, but that's not the point of the VPS being discussed.
Also worth noting that the cheap VPS is not a per-project cost - that is the reverse proxy that handles all HTTP traffic into your homelab.
The Go crowd, like the Rust crowd, likes to advertise the language their software is written in. I agree that that specific sentence is a bit ambiguous, though, as if it's some kind of middleware that hooks into Go applications.
It's not, it's just another standalone reverse proxy.
Terraform providers seem to work pretty well, but as far as I know, they're basically separate executables and the main process communicates with them using sockets.
> The Go crowd, like the Rust crowd, likes to advertise the language their software is written in.
Probably because end users appreciate that usually that means a single binary + config file and off you go. No dependency hell, setting up third party repos, etc.
> Probably because end users appreciate that usually that means a single binary + config file and off you go. No dependency hell, setting up third party repos, etc.
Until you have to use some plugin (e.g. cloudflare to manage DNS for ACME checks), now it's exactly "dependency hell, setting up third party repos, etc."
I also fully expect to see a few crashes from unchecked `err` in pretty much any Go software. Also, nginx qualifies for `single binary + config`; it's just that NGINX is for infra people and Caddy is for application developers.
Actually, all of it applies to rust. The only stable ABI in Rust is the C ABI, and IMO at that point it stops being rust. Even dynamically loading a rust lib in a rust application is unsafe and only expected to work when both are compiled with the same version. In a plugin context, it's the same as what Caddy makes you do.
However, the Rust Evangelical Strike Force successfully infiltrated the WASM committee, and when WASM Components stabilize, they can be used for plugins in some cases (see Zed and zellij). (Go can use them as well; rust is just the first (only?) to support the preview-2 components model.)
Yeah, I don't really do dynamic loading in my corner of Rust. And I can always target some MSRV, cargo package versions, and be happy with it. Definitely beats the dependency hell I've had to deal with elsewhere
Don't get me wrong, I love rust and use it almost every day. Doing `cargo run` in a project and having it handle everything is great. This gets lost once you start working in a plugin context. Because now you're not dealing with your neatly organized workspace, you're working across multiple workspaces from different people.
IIRC it's more than just MSRV or even matching the version exactly. It also requires that the flags used to compile rustc match (there is an escape hatch tho).
It shouldn't, which is why I think the wording there is strange. Nginx doesn't market itself as a "platform to serve your sites, services, and apps, written in C". Reading the first sentence I don't even know what Caddy is - what does a platform mean in this context? Arriving on Nginx's site, the first sentence visible to me is:
>nginx ("engine x") is an HTTP web server, reverse proxy, content cache, load balancer, TCP/UDP proxy server, and mail proxy server.
Back when Caddy first came out over 10 years ago, the fact that it was written in Go was just simply more notable. For Go, it also at least tells you the software is in a memory-safe programming language. Now neither of those things is really all that notable, for new software.
I didn't even read the article, but I love the comments on the thread.
Yes. The implementation language of a system should not matter to people in the least. However, they are used as a form of prestige by developers and, sometimes, as a consumer warning label by practitioners.
There's certainly some aspect of that going on, but I think mainly it's just notable when you write something in a programming language that is relatively new.
Does it matter? In theory no, since you can write pretty much anything in pretty much any language. In practice... It's not quite that black and white. Some programming languages have better tooling than others; like, if a project is written in pure Go, it's going to be a shitload easier to cross compile than a C++ project in most cases. A memory-safe programming language like Go or Rust will tell you about the likely characteristics of the program: the bugs are not likely to be memory or stack corruption bugs since most of the code can't really do that. A GC'd language like Go or Java will tell you that the program will not be ideal for very low latency requirements, most likely. Some languages, like Python, are languages that many would consider easy to hack on, but on the other hand a program written in Python probably doesn't have the best performance characteristics, because CPython is not the fastest interpreter. The discipline that is encouraged by some software ecosystems will also play a role in the quality of software; let's be honest, everyone knows that you CAN write quality software in PHP, but the fact that it isn't easy certainly says something. There's nothing wrong with Erlang but you may need to learn about deploying BEAM in production before actually using Erlang software, since it has its own unique quirks.
And this is all predicated on the idea that nobody ever introduces a project as being "written in C." While it's definitely less common, you definitely do see projects that do this. Generally the programming language is more of a focus for projects that are earlier in their life and not as refined as finished products. I think one reason why it was less common in the past is because writing that something is written in C would just be weird. Of course it's written in C, why would anyone assume otherwise? It would be a lot more notable, at that point, if it wasn't.
I get why people look at this in a cynical way but I think the cynical outlook is only part of the story. In actuality, you do get some useful information sometimes out of knowing what language something is written in.
Python certainly was too, back in the day. It feels like it's roughly a "first 10 years of the language" thing, maybe stretched another 5 if there's an underdog aspect (like being interpreted.)
For an open source product, it's fun to say "written in X language". It also advertises the project to developers who may be willing to contribute.
If you put "product made with Go", I'm not going to contribute as I don't know Go, though that wouldn't prevent me from using it should it fit my needs. But if you wrote your project in .NET, I may certainly be willing to contribute.
There is a distinct lack of elegance in the HTTP/2 protocol. It's exceptionally complex and it has plenty of holes in it. That it simply does a job does not earn it "elegant."
Honestly, I don't understand this critique. The actual protocol is pretty straight-forward for what it does. I'm not sure it can be much simpler given the inflexible requirements. I find it more elegant than HTTP/1.1.
Versus HTTP/1.1, some details are simplified by moving the request and status line parts into headers. The same HEADERS frame type can be used for both the headers and trailers on a given stream. The framing protocol itself doesn't really have a whole lot of cruft, and versus HTTP/1 it entirely eliminates the need for the dancing around with Content-Length, chunked Transfer-Encoding, and trailers.
In practice, a lot of the issues around HTTP/2 implementations really just seem to be caused by trying to shoehorn it into existing HTTP/1.1 frameworks, where the differences just don't mesh very well (e.g. Go has some ugly problems here) or just simply a lack of battle-testing due to trouble adopting it (which I personally think is mainly caused by the difficulty of configuring it. Most systems will only use HTTP/2 by default over TLS, after all, so in many cases end-to-end HTTP/2 wasn't being tested.)
From the perspective of a user, where it starts to seem inelegant is when grpc comes into the picture. You get grpc to function but then plain http traffic breaks and vice versa. It seems to be odd implementation details on specific load balancer products. When in theory, all of it should operate the same way, but it doesn’t.
grpc doesn't really do anything special on top of http/2. Load balancers that are aware of http/2 on both sides shouldn't have any trouble with either.
The problem that people run into load balancing grpc is that they try to use a layer 4 load balancer to balance layer 7 requests; that is, if there are 4 backends, the load balancer tells you the address of one of them, and then you wonder why the other 3 backends don't get 25% of the traffic. That's because grpc uses 1 TCP connection and it sends multiple requests over that connection ("channel"). If your load balancer tells you the addresses of all 4 servers, then you can open up 4 channels and load balance inside your application (this was always the preferred approach at google, with a control channel to gracefully drain certain backends, etc.). If your load balancer is aware of http/2 at the protocol level (layer 7), then you open up one channel to your load balancer, which already has one channel for each backend. When a request arrives, it inspects it and picks a backend and proxies the rest of the exchange.
Ordinary http/2 works like this, it's just that you can get away with a network load balancer because http clients open new connections more regularly (compare the lifetime of a browser page with the lifetime of a backend daemon). Each new connection is a load balancing opportunity for the naive layer 4 balancer. If you never make new connections, then it never has an opportunity to load balance.
grpc has plenty of complexity for "let applications do their own load balancing", including built-in load balancing algorithms and built-in service discovery and health discovery (xDS); http/2 doesn't have any of this. Whether these are actually part of grpc or just random add-ons to popular client libraries is somewhat up for debate, however.
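For reference, the "open a channel per backend and balance inside your application" option mostly boils down to letting the resolver return every backend address and enabling a round-robin policy; a hedged Go sketch using grpc-go (the service name and port are made up):

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// "dns:///" makes the gRPC resolver return every A/AAAA record for the
	// service, and round_robin spreads RPCs across a sub-connection per backend.
	conn, err := grpc.Dial(
		"dns:///my-backend.internal:50051", // hypothetical headless service name
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// ... create stubs from conn as usual.
}
```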
The same arguments apply to WebSockets probably— yes the implementation is a little hairy, but if the end result is a clean abstraction that does a good job of hiding that complexity from the rest of the stack, then it's elegant.
Do you want to risk the complexity and potential performance impact from the handshake that the HTTP/2 standard requires for non-encrypted connections? Worst case, your client and server toolings clash in a way that every request becomes two requests (before the actual h2c request, a second one for the required HTTP/1.1 upgrade, which the server closes as suggested in the HTTP/2 FAQ).
If you're going that route, you may as well just do HTTPS again. If you configure your TLS cookies and session resumption right, you'll get all of the advantages of fancy post-quantum crypto without having to go back to the days of manually setting up encrypted tunnels like when IPSec did the rounds.
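For what it's worth, in Go the cleartext path is only a couple of lines via golang.org/x/net/http2/h2c; the wrapper accepts prior-knowledge HTTP/2, the Upgrade dance, and plain HTTP/1.1 on the same port. A minimal sketch, not from the article:

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "hello over %s\n", r.Proto) // "HTTP/1.1" or "HTTP/2.0"
	})
	// h2c.NewHandler serves h2c (prior knowledge or Upgrade) and falls back
	// to HTTP/1.1 for clients that never ask for h2.
	handler := h2c.NewHandler(mux, &http2.Server{})
	log.Fatal(http.ListenAndServe(":8080", handler))
}
```

A prior-knowledge client (e.g. `curl --http2-prior-knowledge http://localhost:8080/`) then skips the extra upgrade round trip entirely.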
There's a security angle: load balancers have big problems with request smuggling. HTTP/2 changes the picture - maybe someone more up to date knows whether it's currently better or worse?
This is why I configured my company's AWS application load balancer to disable HTTP2 when I first saw the linked post, and haven't changed that configuration since then. Unless we have definitive confirmation that all major load balancers have fixed these vulnerabilities, I'll keep HTTP2 disabled, unless I can figure out how to do HTTP2 between the LB and the backend.
An h2 proxy usually wouldn't proxy through the http2 connection; it would instead accept h2 and load-balance each request to a backend over an h2 (or h1) connection.
The difference is that you have a h2 connection to the proxy, but everything past that point is up to the proxies routing. End-to-end h2 would be more like a websocket (which runs over HTTP CONNECT) where the proxy is just proxying a socket (often with TLS unwrapping).
> An h2 proxy usually wouldn't proxy through the http2 connection; it would instead accept h2 and load-balance each request to a backend over an h2 (or h1) connection.
Each connection needs to keep state for all processed requests (the HPACK dynamic headers table), so all requests for a given connection need to be proxied through the same connection. Not sure I got what you meant, though.
Apart from that, I think the second sentence of my comment makes clear there is no smuggling as long as the connection before/past the proxy is http2, and it's not downgraded to http1. That's all that I meant.
Personally, this lack of support doesn't bother me much, because the only use case I can see for it is wanting to expose your Ruby HTTP server directly to the internet without any sort of load balancer or reverse proxy, which I understand may seem tempting, as it's "one less moving piece", but not really worth the trouble in my opinion.
The amusing thing is that HTTP/2 is mostly useful for sites that download vast numbers of tiny Javascript files for no really good reason. Like Google's sites.
Indeed, there is a reason most mapping libraries still support specifying multiple domains for tiles. It used to be common practice to set up a.tileserver.test, b.tileserver.test, c.tileserver.test even if they all pointed to the same IP/server, just to get around the concurrent request limit in browsers.
> bringing HTTP/2 all the way to the Ruby app server is significantly complexifying your infrastructure for little benefit.
I think the author wrote it with encryption-is-a-must in mind, and after he corrected those parts, the article just ended up with these weird statements. What complexity is introduced apart from changing the serving library in your main file?
In a language that uses forking to achieve parallelism, terminating multiple tasks at the same endpoint will cause those tasks to compete. For some workflows that may be a feature, but for most it is not.
So that's Python, Ruby, Node. Elixir won't care and C# and Java... well hopefully the HTTP/2 library takes care of the multiplexing of the replies, then you're good.
A good python web server should be single-process with asyncio, or maybe have a few worker threads or processes. Definitely not fork for every request.
You are correct about the first assumption, but even without encryption dealing with multiplexing significantly complexify things, so I still stand by that statement.
If you assume no multiplexing, you can write a much simpler server.
Datacenters don't typically have high latency, low bandwidth, and varying availability issues. If you have a saturated http/1.1 network (or high CPU use) within a DC you can usually just add capacity.
Surprised not to see this mentioned in the article.
Lots of places (including a former employer) have done tons of work to upgrade internal infrastructure to support HTTP/2 just so they could use gRPC. The performance difference from JSON-over-HTTP APIs was meaningful for us.
I realize there are other solutions but this is a common one.
Probably because it only works correctly outside of the browser. Browsers don't support "native" grpc. You normally use something with specific gRPC support rather than just h2 in a spherical vacuum.
This entirely. When I first read the title, I thought, lets see what they say about gRPC. gRPC is so much nicer working across applications compared to simple REST servers/clients.
The TLS requirement from HTTP2 also hindered http2 origin uptake. The TLS handshake adds latency and is unnecessary on some instances. (This is mentioned in heading "Extra Complexity" in the article)
Correct, to achieve 0-RTT the application needs to perform the handshake/certificate exchange at least once, otherwise, how would it encrypt the payload? This could be cached preemptively iirc, but it is not worth it.
The problem will be that QUIC uses more userland code and UDP is not as optimized as TCP inside kernels. So far, the extra CPU penalty has discouraged me from adopting QUIC everywhere, I've kept it mostly on the edge-out where the network is far less reliable.
Google measured their bandwidth usage and discovered that something like half was just HTTP headers! Most RPC calls have small payloads for both requests and responses.
HTTP/2 compresses headers, and that alone can make it worthwhile to use throughout a service fabric.
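A quick way to see the effect is to run the same header set through an HPACK encoder twice: after the first request the fields sit in the dynamic table, so subsequent requests send only small index references. An illustrative Go sketch using golang.org/x/net/http2/hpack (header names and values are invented):

```go
package main

import (
	"bytes"
	"fmt"
	"strings"

	"golang.org/x/net/http2/hpack"
)

func main() {
	var buf bytes.Buffer
	enc := hpack.NewEncoder(&buf)

	headers := []hpack.HeaderField{
		{Name: ":method", Value: "POST"},
		{Name: ":path", Value: "/rpc/UserService/GetUser"},
		{Name: "content-type", Value: "application/grpc"},
		{Name: "authorization", Value: "Bearer " + strings.Repeat("x", 400)},
	}

	// Encode the same headers for two consecutive "requests" on one connection.
	for i := 1; i <= 2; i++ {
		start := buf.Len()
		for _, hf := range headers {
			enc.WriteField(hf)
		}
		fmt.Printf("request %d: encoded header block = %d bytes\n", i, buf.Len()-start)
	}
}
```

The second block comes out far smaller than the first, which is roughly what you'd expect for repetitive RPC traffic with small payloads.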
Bloom filters are small relative to the amount of data they can hash, but aren't realistic bloom filters still tens of kB at minimum? Might be too heavyweight to send up.
Only if you're constrained on connections. The reason that HTTP2 is much better for websites is because of the slow starts of TCP connections. If you're already connected, you don't suffer those losses, and you benefit from kernel muxing.
Well, IMO h2 streams are more fleshed out and offer better control than websockets, but that's just my opinion. In fact, websockets are your only "proper" option if you want that bidirectional stream to be binary - browsers don't expose that portion of h2 to JS.
Here is a silly thing that is possible with h2 over a single connection, but not with websockets:
Multiple page components (Islands) each have their own stream of events over a single h2 connection. With websockets, you will need to roll your own multiplexing[1].
[1]: I think you can multiplex multiple websockets over a single h2 connection tho, but don't quote me on this.
If your load balancer is converting between HTTP/2 and HTTP/1.1, it's a reverse proxy.
Past the reverse proxy, is there a point to HTTP at all? We could also use SCGI or FastCGI past the reverse proxy. It does a better job of passing through information that's gathered at the first point of entry, such as the client IP address.
I would not use http/3 for LAN. Even the latest Linux kernels struggle with it. Http/1, aka TCP, has fully supported encryption and other offload support.
UDP still consumes much more CPU for the same amount of traffic.
Do you have source for that? I'm very interested.
There is no technical reason for UDP to be slower than TCP (at CPU level).
The only field that is computed in UDP is the checksum; the same exists in TCP, and it must be recomputed each time someone actually re-routes the packet (e.g. bridge to VM) since the TTL is decreased.
So I doubt your assertion.
_____
Writing my comment I understood what you are talking about. There is a bunch of encryption done in user mode in HTTP/3 that doesn't need to be done in user mode. In HTTP/2 it was sometimes done in kernel mode (kTLS), so it was quicker. The slowness comes from the CPU needed to copy it out of kernel mode. I didn't follow the whole story so I trust you on this.
> There is no technical reason for UDP to be slower than TCP (at CPU level).
The technical reason is 30+ years of history of TCP being ≥90% of Internet traffic and services. There's several orders of magnitude in resources more spent to make TCP fast starting at individual symbols on Ethernet links all the way up into applications.
Encryption is one thing (if you run kTLS, which is still not done in most manual setups), but the much bigger factor IIRC is how much of the networking stack needs to run in userspace and has not been given the optimization love of TCP. If you compared non-kTLS h2 with non-kTLS h3 over a low-latency link, the h2 connection could handle a lot more traffic compared to h3.
That is not to say that h3 does not have its place, but the networking stacks are not optimized for it yet.
If we ever get to adopting this, I will send every byte to a separate IPv6 address. Big Tech surveillance wouldn't work so many don't see a point like the author.
Came here to say the same thing; had they read the RFC they'd realize it's not actually a limit, just a suggestion - that's why it's in the "Practical Considerations" section too.
Whoever downvoted you is probably unaware that words like SHOULD have specific meanings in RFCs.
I think this post gets the complexity situation backwards. Sure, you can use a different protocol between your load balancer and your application and it won't do too much harm. But you're adding an extra protocol that you have to understand, for no real benefit.
(Also, why do you even want a load balancer/reverse proxy, unless your application language sucks? The article says it "will also take care of serving static assets, normalize inbound requests, and also probably fend off at least some malicious actors", but frankly your HTTP library should already be doing all of those. Adding that extra piece means more points of failure, more potential security vulnerabilities, and for what benefit?)
> Sure, you can use a different protocol between your load balancer and your application and it won't do too much harm. But you're adding an extra protocol that you have to understand, for no real benefit.
Well, that depends...
At a certain scale (and arguably, not too many people will ever need to think about this), using UNIX sockets (instead of HTTP over TCP) between the application and load balancer can be faster in some cases, as you don't go through the TCP stack...
> Also, why do you even want a load balancer/reverse proxy, unless your application language sucks?
Erm... failover... ability to do upgrades without any downtime... it's extra complexity yes, but it does have some benefits...
> At a certain scale (and arguably, not too many people will ever need to think about this), using UNIX sockets (instead of HTTP TCP) between the application and load balancer can be faster in some cases, as you don't go through the TCP stack...
Sure (although as far as I can see there's no reason you can't keep using HTTP for that). You can go even further and use shared memory (I work for a company that used Apache with Jk back in the day). But that's an argument for using a faster protocol because you're seeing a benefit from it, not an argument for using a slower protocol because you can't be bothered to implement the latest standard.
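For completeness, here is roughly what "keep using HTTP, just not over TCP" looks like on the app side in Go (the socket path is arbitrary); the reverse proxy on the same host then points at that socket rather than a TCP port:

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"os"
)

func main() {
	const sock = "/run/app/app.sock" // arbitrary path for the example
	_ = os.Remove(sock)              // clean up a stale socket from a previous run

	ln, err := net.Listen("unix", sock)
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()

	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello from behind the proxy")
	})

	// Same HTTP/1.1 as before, just carried over a unix domain socket
	// instead of loopback TCP.
	log.Fatal(http.Serve(ln, mux))
}
```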
> using a slower protocol because you can't be bothered to implement the latest standard.
I thought we were discussing HTTP/2 but now you seem to be invoking HTTP/3? It's even faster indeed but brings a whole lot of baggage with it. Nice comparison point though: Do you want to add the complexity of HTTP/2 or HTTP/3 in your backend? (I don't.)
> I thought we were discussing HTTP/2 but now you seem to be invoking HTTP/3?
The article talks about HTTP/2 but I suspect they're applying the same logic to HTTP/3.
> Do you want to add the complexity of HTTP/2 or HTTP/3 in your backend? (I don't.)
I'd like to use the same protocol all the way through. I wouldn't want to implement any HTTP standard by hand (I could, but I wouldn't for a normal application), but I'd expect an established language to have a solid library implementation available.
> why do you even want a load balancer/reverse proxy, unless your application language sucks?
Most load balancer/reverse proxy applications also handle TLS. Security-conscious web application developers don't want TLS keys in their application processes. Even the varnish authors (varnish is a load balancer/caching reverse proxy) refused to integrate TLS support because of security concerns; despite being reverse-proxy authors, they didn't trust themselves to get it right.
An application can't load-balance itself very well. Either you roll your own load balancer as a separate layer of the application, which is reinventing the wheel, or you use an existing load balancer/reverse proxy.
Easier failover with fewer (ideally zero) dropped requests.
If the app language isn't compiled, having it serve static resources is almost certainly much slower than having a reverse proxy do it.
> Security-conscious web application developers don't want TLS keys in their application processes.
If your application is in a non-memory-safe language, sure (but why would you do that?). Otherwise I would think the risk is outweighed by the value of having your connections encrypted end-to-end. If your application process gets fully compromised then an attacker already controls it, by definition, so (given that modern TLS has perfect forward secrecy) I don't think you really gain anything by keeping the keys confidential at that point.
I write application servers for a living, mostly for Python but previously for other languages.
Nobody, nobody, writes application servers with the intent of having them exposed to the public internet. Even if they're completely memory safe, we don't do DOS protections like checking for reasonable header lengths, rewriting invalid header fields, dropping malicious requests, etc. Most application servers will still die to slowloris attacks. [1]
We don't do this because it's a performance hog and we assume you're already reverse proxying behind any responsible front-end server, which all implement these protections. We don't want to double up on that work. We implement the HTTP spec with as low overhead as possible, because we expect to have pipelined HTTP/1.1 connections from a load balancer or other reverse proxy.
Your application server, Gunicorn, Twisted, Uvicorn, whatever, does not want to be exposed to the public internet. Do not expose it to the public internet.
> Nobody, nobody, writes application servers with the intent of having them exposed to the public internet
For rust, go, lua (via nginx openresty) and a few others this is a viable path. I probably wouldn't do it with node (or bun or deno), python, or similar but there are languages where in certain circumstances it is reasonable and might be better.
For Go, net/http is not something you should expose to the public internet, there's no secret sauce in there. It will just die to the first person to hit it with a slowloris or other DOS attack. Same with the common C++ options like boost.beast unless you're writing the logic yourself (but why bother? Just reverse proxy).
I'm unfamiliar with the common rust frameworks for http, but find it unlikely the situation is very different.
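To make the slowloris point concrete: Go's default http.Server ships with no timeouts at all, so a client can hold a connection open by dribbling bytes indefinitely. If you do put net/http directly on the internet, the usual partial mitigation is to set the timeouts yourself - a hedged sketch, not an argument against the reverse proxy:

```go
package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})

	srv := &http.Server{
		Addr:              ":8080",
		Handler:           mux,
		ReadHeaderTimeout: 5 * time.Second,  // caps how long a client may dribble headers
		ReadTimeout:       15 * time.Second, // whole request, body included
		WriteTimeout:      30 * time.Second,
		IdleTimeout:       60 * time.Second, // reap idle keep-alive connections
		MaxHeaderBytes:    64 << 10,
	}
	log.Fatal(srv.ListenAndServe())
}
```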
> We don't do this because it's a performance hog and we assume you're already reverse proxying behind any responsible front-end server
What application servers have you written? I have never seen an application server readme say DON'T EXPOSE DIRECTLY TO THE INTERNET, WE ASSUME YOU USE REVERSE PROXY.
Most of them have a disclaimer in their deployment or tutorial docs, some with more strong language than others. Again, nothing bad happens if you don't, we don't write memory vulnerabilities into these servers. You are just far more vulnerable to DOS attacks.
* "We strongly recommend using Guincorn behind a proxy server" [1]
* "As a general rule, you probably want to: ... run behind Nginx for self-hosted deployments." [2]
* "A reverse proxy such as nginx or Apache httpd should be used in front of Waitress." [3]
For some, like uWSGI, they don't even want to talk HTTP (uWSGI supports its own protocol) and it's just assumed you're using a dedicated webserver to talk to public traffic. [4]
You use a reverse proxy because whenever you "deploy to prod" you'll be using one anyway; thus, by not putting TLS in your app, you avoid building something you don't actually need.
I said "API gateway is a fancy term for a configurable reverse proxy often bought as a service" and both "load balancer" and "API gateway" are common configurations of "configurable reverse proxy", often bought as a service.
Many load balancers have this functionality within them, even ones from years ago that aren't around anymore like Microsoft ISA/TMG. They're not web services, but they can route based on requests.
To make sure that your connections can be snooped on over the LAN? Why is that a positive?
> To have a security layer
They usually do more harm than good in my experience.
> To load balance
Sure, if you're at the scale where you want/need that then you're getting some benefit from that. But that's something you can add in when it makes sense.
> To have rewrite rules
> To have graceful updates
Again, I would expect an HTTP library/framework to handle that.
> To make sure that your connections can be snooped on over the LAN? Why is that a positive?
No, to keep your app from having to deal with SSL. Internal network security is an issue, but sites that need multi-server architectures can't really be passing SSL traffic through to the application servers anyway, because SSL hides stuff that's needed for the load balancers to do their jobs. Many websites need load balancers for performance, but are not important enough to bother with the threat model of an internal network compromise (whether it's on the site owner's own LAN, or a bare metal or VPS hosting vlan).
> Sure, if you're at the scale where you want/need that then you're getting some benefit from that. But that's something you can add in when it makes sense.
So why not preface your initial claims by saying you trust the web app to be secure enough to handle SSL keys, and a single instance of the app can handle all your traffic, and you don't need high availability in failure/restart cases?
That would be a much better claim. It's still unlikely, because you don't control the internet. Putting your website behind Cloudflare buys you some decreased vigilance. A website that isn't too popular or attention-getting also reduces the risk. However, Russia and China exist (those are examples only, not an exclusive list of places malicious clients connect from).
> So why not preface your initial claims by saying you trust the web app to be secure enough to handle SSL keys, and a single instance of the app can handle all your traffic, and you don't need high availability in failure/restart cases?
Yeah, I phrased things badly, I was trying to push back on the idea that you should always put your app behind a load balancer even when it's a single instance on a single machine. Obviously there are use cases where a load balancer does add value.
(I do think ordinary webapps should be able to gracefully reload/restart without losing connections, it really isn't so hard, someone just has to make the effort to code the feature in the library/framework and that's a one-off cost)
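For what it's worth, the "drain without dropping requests" half really is a few lines in a typical library; a hedged Go sketch of the pattern (a supervisor or load balancer still has to start the replacement process and shift traffic):

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080", Handler: http.DefaultServeMux}

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	// Wait for the deploy tooling to ask us to stop.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
	<-stop

	// Stop accepting new connections, finish in-flight requests, then exit.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("forced shutdown: %v", err)
	}
}
```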
> To make sure that your connections can be snooped on over the LAN? Why is that a positive?
Usually your "LAN" uses whole link encryption, so that whatever is accessed in your private infrastructure network is encrypted (being postgres, NFS, HTTP, etc). If that is not the case, then you have to configure encryption at each service level, which is both error prone, time consuming, and not always possible. If that is not case then you can have internal SSL certificates for the traffic between RP and workers, workers and postgres, etc.
Also, you want your SSL server key to be as inaccessible from business logic as possible; having an early termination point and isolated workers achieves that.
Also, you generally have workers access private resources, which you don't want exposed on your actual termination point. It's just much better to have a public termination point RP with a private iface sending requests to workers living in a private subnet accessing private resources.
> > To have a security layer
> They usually do more harm than good in my experience.
Right, maybe you should detail your experience, as your comments don't really tell much.
> To have rewrite rules
> To have graceful updates
> > Again I would expect a HTTP library/framework to handle that.
HTTP frameworks handle routing _for themselves_; this is not the same as rewrite rules, which are often used to glue multiple heterogeneous parts together.
HTTP frameworks are not handling all the possible rewriting and gluing for the very reason that it's not a good idea to do it at the logic framework level.
As for graceful updates, there's a chicken-and-egg problem to solve. You want graceful updates between multiple versions of your own code / framework. How could that work without a third party balancing old / new requests to the new workers one at a time?
You terminate SSL as close to the user as possible, because that round trip time is greatly going to affect the user experience. What you do between your load balancer and application servers is up to you, (read: should still be encrypted) but terminating SSL asap is about user experience.
> You terminate SSL as close to the user as possible, because that round trip time is greatly going to affect the user experience. What you do between your load balancer and application servers is up to you, (read: should still be encrypted) but terminating SSL asap is about user experience.
That makes no sense. The latency from your load balancer to your application server should be a tiny fraction of the latency from the user to the load balancer (unless we're talking about some kind of edge deployment, but at that point it's not a load balancer but some kind of smart proxy), and the load balancer decrypting and re-encrypting almost certainly adds more latency compared to just making a straight connection from the user to the application server.
Say your application and database are in the US West and you want to serve traffic to EU or AUS, or even US East. Then you want to terminate TCP and TLS in those regions to cut down on handshake latency, slow start time, etc. Your reverse proxy can then use persistent TLS connections back to the origin so that those connection startup costs are amortized away. Something like nginx can pretty easily proxy like 10+ Gb/s of traffic and 10s of thousands of requests per second on a couple low power cores, so it's relatively cheap to do this.
Lots of application frameworks also just don't bother to have a super high performance path for static/cached assets because there's off-the-shelf software that does that already: caching reverse proxies.
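That amortization is easy to see in code; a minimal Go sketch (hostnames invented, no caching shown) of a front proxy that terminates client connections locally and reuses warm connections back to the origin:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

func main() {
	origin, err := url.Parse("https://origin.us-west.example.internal") // hypothetical origin
	if err != nil {
		log.Fatal(err)
	}

	proxy := httputil.NewSingleHostReverseProxy(origin)
	proxy.Transport = &http.Transport{
		// Long-lived connections to the origin, so TCP/TLS setup and slow
		// start are paid once, not on every client request.
		MaxIdleConnsPerHost: 128,
		IdleConnTimeout:     90 * time.Second,
		ForceAttemptHTTP2:   true,
	}

	// In production this would be ListenAndServeTLS with the edge certificate.
	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```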
It depends on your deployment and where your database and app servers and POPs are. If your load balancer is right next to your application server, which is right next to your database, you're right. And it's fair to point out that most people have that kind of deployment. However there are some companies, like Google, that have enough of a presence that the L7 load balancer/smart proxy/whatever you want to call it is way closer to you, Internet-geographically, than the application server or the database. For their use case and configuration, your "almost certainly" isn't what was seen empirically.
You usually re-encrypt your traffic after the GW, either by using an internal PKI and TLS or some kind of encapsulation (IPSEC, etc).
Security and availability requirements might vary, so much to argue about. Usually you have some kind of 3rd party service you want to hide, control CORS, Cache-Control, etc headers uniformly, etc. If you are fine with 5-30 minutes of outage (or until someone notices and manually restores service), then of course you don’t need to load balance. But you can imagine this not being the case at most companies.
I've built infrastructure. Indeed I've built infrastructure exactly like this, precisely because maintaining encryption all the way to the application server was a security requirement (this was a system that involved credit card information). It worked well.
Load balancers are nice to have if you want to move traffic from one machine to another. Which sometimes needs to happen even if your application language doesn't suck and you can hotload your changes... You may still need to manage hardware changes, and a load balancer can be nice for that.
DNS is usable, but some clients and recursive resolvers like to cache results for way beyond the TTL provided.
Answers from the article - the "extra" protocol is just HTTP/1.1 and the reason for a load balancer is the ability to have multiple servers:
> But also the complexity of deployment. HTTP/2 is fully encrypted, so you need all your application servers to have a key and certificate, that’s not insurmountable, but is an extra hassle compared to just using HTTP/1.1, unless of course for some reasons you are required to use only encrypted connections even over LAN.
> So unless you are deploying to a single machine, hence don’t have a load balancer, bringing HTTP/2 all the way to the Ruby app server is significantly complexifying your infrastructure for little benefit.
Good to know - neither the parent nor the article mention this. h2c seems to have limited support by tooling (e.g. browsers, curl), which is a bit discouraging.
It does. Just use `--http2` or `--http2-prior-knowledge`; curl deduces cleartext or not from the `http` or `https` URL scheme (cleartext being the default for `http`).
I said limited support and gave curl as an example because curl --http2 sends a HTTP/1.1 upgrade request first so fails in a purely HTTP/2 environment.
Thanks for bringing up --http2-prior-knowledge as a solution!
I'd agree it's not critical, but discard the assumption that requests within the data center will be fast. People have to send requests to third parties, which will often be slow. Hopefully not as slow as across the Atlantic, but still magnitudes worse than an internal query.
You will often be in the state where the client uses HTTP2, and the apps use HTTP2 to talk to the third party, but inside the data center things are HTTP1.1, fastcgi, or similar.
Why does HTTP2 help with this? Load balancers use one keepalive connection per request and don't experience head of line blocking. And they have slow start disabled. So even if the latency of the final request is high, why would HTTP2 improve the situation?
If every request is quick, you can easily re-use connections, file handles, threads, etc. If requests are slow, you will often need to spin up new connections, as you don't want to wait for the response that might take hundreds of milliseconds.
But I did start by saying it's not important. It's a small difference, unless you hit a connection limit.
The article seems to make an assumption that the application backend is in the same datacenter as the load balancer, which is not necessarily true: people often put their load balancers at the network edge (which helps reduce latency when the response is cached), or just outsource those to a CDN vendor.
> In addition to the low roundtrip time, the connections between your load balancer and application server likely have a very long lifetime, hence don’t suffer from TCP slow start as much, and that’s assuming your operating system hasn’t been tuned to disable slow start entirely, which is very common on servers.
A single HTTP/1.1 connection can only process one request at a time (unless you attempt HTTP pipelining), so if you have N persistent TCP connections to the backend, you can only handle N concurrent requests. Since all of those connections are long-lived and are sending at the same time, if you make N very large, you will eventually run into TCP congestion control convergence issues.
Also, I don't understand why the author believes HTTP/2 is less debuggable than HTTP/1; curl and Wireshark work equally well with both.
I think the more common architecture is for edge network to terminate SSL, and then transmit to the load balancer which is actually in the final data center? In which case you can http2 or 3 on both those hops without requiring it on the application server.
That said I still disagree with the article's conclusion: more connections means more memory so even within the same dc, there should be benefits of http2. And if the app server supports async processing, there's value in hitting it with concurrent requests to make the most of its hardware, and http1.1 head of line blocking really destroys a lot of possible perf gains when the response time is variable.
I suppose I haven't had a true bake off here though - so it's possible the effect of http2 in the data center is a bit more marginal than I'm imagining.
The maximum number of connections thing in HTTP/1 always makes me think of queuing theory, which gives surprising conclusions like how adding a single extra teller at a one-teller bank can cut wait times by 50 times, not just by 2.
However, I think the problem is the Poisson process isn't really the right process to assume. Most websites which would run afoul of the 2/6/8/etc connections being opened are probably trying to open up a lot of connections at the same time. That's very different from situations where only 1 new person arrives every 6 minutes on average, and 2 new people arriving within 1 second of each other is a considerably rarer event.
[1]: https://www.johndcook.com/blog/2008/10/21/what-happens-when-...
And if memory serves if you care about minimizing latency you want all of your workers running an average of 60% occupied. (Which is also pretty close to when I saw P95 times dog-leg on the last cluster I worked on).
Queuing theory is really weird.
Most analyses I've read say the threshold is around the 80% mark [1], although it depends on how model the distribution, and there's nothing magical about the number. The main thing is to avoid getting close to 100%, because wait times go up exponentially as you get closer to the max.
Little's Law is fundamental to queueing theory, but there's also the less well-known Kingman's formula, which incorporates variability of arrival rate and task size [2].
[1] https://www.johndcook.com/blog/2009/01/30/server-utilization...
[2] https://taborsky.cz/posts/2021/kingman-formula/
Really both of those models show 60% as about the limit to where you're still effectively at the baseline for latency. 80% is just about the limit to where you're up there in the exponential rise, any higher and things become unusable.
0-60 and you're still at minimum latency. 60-80 you're at twice the latency but it's probably worth the cost savings of the extra compute density since it's still pretty low. Higher than 80 and things are already slowing down and getting exponentially worse by the request
If you look at the chart in the second link, where does the wait time leave the origin? Around 60%.
The first one is even worse; by 80% you're already seeing twice the delay of 70%.
If I were to describe the second chart I'd say 80% is when you start to get into trouble, not just noticing a slowdown.
I said minimize latency, not optimize latency.
Why 60%? I suppose if they are less than 1% occupied then latency will be even lower.
dog food dog leg dog ram lol
dog leg is what people who aren't pretentious prats call an 'inflection point'.
Can't it cut wait times by infinity? For example, if the arrivals are at 1.1 per minute, and a teller processes 1 per minute.
Could be that, could also be that the people taking a long time at least aren't causing a bottleneck (assuming there aren't two of them at the same time). So you have a situation like this: the first person takes 10 minutes, while there are 9 waiting in line who take only one minute apiece. With one teller, the average wait time is ~15 minutes. With two tellers, it's now ~5 minutes.
Which is why it is highly annoying when there's only one worker at the coffee stand, and there's always this one jerk at the front of the queue who orders a latte when you just want a coffee. With two workers, the people who just want coffee won't have to wait 15 minutes behind the latte people.
And I've also noticed a social effect: when people wait a long time, it seems to reinforce how they perceive the eventual service, that is, they want more out of the interaction, so they take longer. Which makes the situation even worse.
> there's always this one jerk at the front of the queue
Here in the espresso world, that’s not so bad. But the ‘vanilla oat milk decaf, and also a hot muffin with butter’ is tedious.
There is a roaster in Auckland that’s been there since the ‘80s. On the counter it says ‘espresso, flat white or fuck off’. Clear and concise. I like it. https://millerscoffee.co.nz/
I'd order a "fuck off espresso" in that situation just to see what happens.
"On the counter it says ‘espresso, flat white or fuck off’"
Sounds a bit pretentious to me. I generally order a coffee, no milk ... ta.
Try ordering a “just a cup of coffee” in AU/NZ and they will look at you with a blank expression. Espresso is the norm there.
Luckily, we don't get stuck behind someone using a check any more.
You've forgotten about banker's hours :)
There was a discussion not too long ago about modern banks still with archaic practices. I have accounts at two different banks, and if I make a transfer request before 1:45PT, it is counted as same day. That makes no damn sense to me why that's a limitation at all today. It's not like a human needs to look at it, but even so, why the 1:45PT cutoff? Is it because it is 4:45ET? Then why not list it as that? And why does a banking computer system care about timezones or bankers' hours at all. It's all just mind-bogglingly lame.
It's because when you do transfers, the banks will reconcile their accounts at the end of the day (e.g., if one bank deposits more to another, they will need to make up the difference with their own capital).
This cutoff means banks have certainty about the transaction, as the reconciliation is batched rather than real-time.
I know my father sometimes had to take 1 am phone calls because the insurance industry runs a lot of batch processing overnight, when the systems aren't competing with OLTP traffic. Banking software may be built the same way.
> Is it because it is 4:45ET? Then why not list it as that?
Because a lot of their customers are too stupid to understand timezones.
I guess that's saying more about the left coast customers then, right? as all of the customers from the other time zones have to do the conversion.
Well, if someone misunderstands and sends their payment by 1:45ET or 1:45CT, it's not a problem for the bank.
I find a lot of value in being able to get a water or a coffee, use the restroom, have sidebar conversations with fellow employees, begrudgingly attend meetings, or take a walk to stretch my legs for a minute and think, personally.
Almost every web forum enters a phase where participants bring in their pet politics into unrelated discussions. Whether they last or not depends entirely on whether the flamebait/troll creates a large reply structure or a non-existent one. This is why shadowbans are more effective than large groups of people responding angrily. Or, to cite the Deep Magic: "don't feed the trolls".
Personally, I'd like to see more HTTP/2 support. I think HTTP/2's duplex streams would be useful, just like SSE. In theory, WebSockets do cover the same ground, and there's also a way to use WebSockets over HTTP/2 although I'm not 100% sure how that works. HTTP/2 though, elegantly handles all of it, and although it's a bit complicated compared to HTTP/1.1, it's actually simpler than WebSockets, at least in some ways, and follows the usual conventions for CORS/etc.
The problem? Well, browsers don't have a JS API for bidirectional HTTP/2 streaming, and many don't see the point, like this article expresses. NGINX doesn't support end-to-end HTTP/2. Feels like a bit of a shame, as the streaming aspect of HTTP/2 is a more natural evolution of the HTTP/1 request/response cycle versus things like WebSockets and WebRTC data channels. Oh well.
Duplex streams are not really a HTTP/2-only feature. You can do the same bidirectional streaming with HTTP/1.1 too. The flow is always: 1. The client sends a header set. 2. It can then start to stream data in the form of an unlimited-length byte-stream to the server. 3. The server starts to send a header set back to the client. 4. The server can then start to stream data in the form of an unlimited-length byte-stream to the client.
There is not even a fixed order between 2) and 3). The server can start sending headers or body data before the client sent any body byte.
What is correct is that a lot of servers and clients (including javascript in browsers) don't support this and make stricter assumptions regarding how HTTP requests are used - e.g. that the request bytes are fully sent before the response happens. I think ReadableStream/WritableStream APIs on browsers were supposed to change that, but I haven't followed the progress in the last few years.
NGINX falls into the same category. Its HTTP/2 support (and gRPC support) had been built with a very limited use-case in mind. That's also why various CDNs and service meshes use different kinds of HTTP proxies - so that various streaming workloads don't break in case the way the protocol is used is not strictly request->response.
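To make that flow concrete, here is a rough Go sketch (my own illustration, not taken from any server mentioned here; the /echo route and port are arbitrary) of a handler that interleaves reads of the request body with writes to the response. HTTP/2 permits this by default; for HTTP/1.x the Go server needs the explicit EnableFullDuplex opt-in added in Go 1.21.

```go
package main

import (
	"io"
	"log"
	"net/http"
)

// echo streams every chunk of the request body straight back into the
// response, so the client can keep sending while it is already receiving.
func echo(w http.ResponseWriter, r *http.Request) {
	rc := http.NewResponseController(w)
	// For HTTP/1.x, opt in to reading the body after the response has started.
	// HTTP/2 is already full-duplex, so an error here is tolerable for a demo.
	_ = rc.EnableFullDuplex()

	buf := make([]byte, 32*1024)
	for {
		n, err := r.Body.Read(buf)
		if n > 0 {
			if _, werr := w.Write(buf[:n]); werr != nil {
				return
			}
			_ = rc.Flush() // push the chunk out instead of buffering it
		}
		if err != nil { // io.EOF means the client finished sending
			if err != io.EOF {
				log.Println("read:", err)
			}
			return
		}
	}
}

func main() {
	http.HandleFunc("/echo", echo)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```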
No browser I'm aware of is planning on allowing the request and response bodies to be streamed simultaneously for the same request using ReadableStream and WriteableStream. When using streaming request bodies, you have to set the request explicitly to half-duplex.
Anyways, yes, this is technically true, but the streaming semantics are not really that well-defined for HTTP/1.1, probably because it was simply never envisioned. The HTTP/1.1 request and response were viewed as unary entities and the fact that their contents were streamed was mostly an implementation detail. Most HTTP/1.1 software, not just browsers, ultimately treat the requests and responses of HTTP as different and distinct phases. For most uses of HTTP, this makes sense. e.g. for a form post, the entire request entity is going to need to be read before the status can possibly be known.
Even if we do allow bidirectional full-duplex streaming over HTTP/1.1, it will block an entire TCP connection for a given hostname, since HTTP/1.1 is not multiplexed. This is true even if the connection isn't particularly busy. Obviously, this is still an issue even with long-polling, but that's all the more reason why HTTP/2 is simply nicer.
NGINX may always be stuck in an old school HTTP/1 mindset, but modern software like Envoy shows a lot of promise for how architecting around HTTP/2 can work and bring advantages while remaining fully backwards compatible with HTTP/1 software.
HTTP2 works great on the LAN, or if you have really good network.
It starts to really perform badly when you have dropped packets. So any kind of medium quality wifi or 4/5g kneecaps performance.
It was always going to do this, and as webpages get bigger, the performance degradation increases.
HTTP2 fundamentally underperforms in the real world, and noticeably so on mobile. (My company enthusiastically rolled out http2 support when akamai enabled it.)
Personally I feel that websockets are a hack, and frankly HTTP/3 should have been split into three: a file access protocol, an arbitrary TCP-like pipe, and a metadata channel. But web people love hammering workarounds onto workarounds, so we are left with HTTP/3.
HTTP/2, in my experience, still works fine on decent connections, but the advantages definitely start to level out as the connection gets worse. HTTP/2 definitely has some inherent disadvantages over HTTP/1 in those regards. (Though it depends on how much you are constrained by bandwidth vs latency, to be sure.)
However, HTTP/3 solves that problem and performs very well on both poor quality and good quality networks.
Typically, I use HTTP/2 to refer to both HTTP/2 and HTTP/3 since they are basically the same protocol with different transports. Most people don't really need to care about the distinction, although I guess since it doesn't use TCP there are cases where someone may not be able to establish an HTTP/3 connection to a server. Still, I think the forward looking way to go is to try to push towards HTTP/3, then fall back to HTTP/2, and still support HTTP/1.1 indefinitely for simple and legacy clients. Some clients may get less than ideal performance, but you get the other benefits of HTTP/2 on as many devices as possible.
> HTTP/3 should have been split into three: a file access protocol, an arbitrary TCP-like pipe and a metadata channel
HTTP3 is basically just HTTP2 on top of QUIC… so you already have the tcp-like pipe, it’s called QUIC. And there’s no reason to have a metadata channel when there are already arbitrary separate channels in QUIC itself.
Yeah, it's a shame you can't take advantage of natural HTTP/2 streaming from the browser. There's the upcoming WebTransport API (https://developer.mozilla.org/en-US/docs/Web/API/WebTranspor...), but it could have been added earlier.
If you want to stream data inside a HTTP body (of any protocol), then the ReadableStream/WritableStream APIs would be the appropriate APIs (https://developer.mozilla.org/en-US/docs/Web/API/Streams_API) - however at least in the past they have not been fully standardized and implemented by browsers. Not sure what the latest state is.
WebTransport is a bit different - it offers raw QUIC streams that are running concurrently with the requests/streams that carry the HTTP/3 requests on shared underlying HTTP/3 connections and it also offers a datagram API.
I think the problem is that duplex communication on the web is rarely useful except in some special cases, and usually harder to scale as you have to keep state around and can't as easily rotate servers.
For some applications it is important, but for most websites the benefits just don't outweigh the costs.
It seems like the author is agreeing that HTTP/2 is great (or at least good) for browser -> web server communication, but not useful for the REST-style APIs that pervade modern app design. He makes a good case, but HTTP was never really a good choice for API transport _either_, it just took hold because it was ubiquitous.
I thought http/2 was great for reducing latency for JS libraries like Turbo Links and Hotwire.
Which is why the Rails crowd want it.
Is that not the case?
H2 still suffers from head of line blocking on unstable connections (like mobile).
H3 is supposed to solve that.
I have an nginx running on my VPS supporting my startup. Last time I had to touch it was about 4 years ago. Quality software
I really like Caddy, but these nginx performance comparisons are never really supported in benchmarks.
There have been numerous attempts to benchmark both (One example: https://blog.tjll.net/reverse-proxy-hot-dog-eating-contest-c... ) but the conclusion is almost always that they're fairly similar.
The big difference for simple applications is that Caddy is easier to set up, and nginx has a smaller memory footprint. Performance is similar between the two.
AFAIK both proxies are capable of serving at line rate for 10Gbps or more at millions of concurrent connections. I can't possibly see how performance would significantly differ if they're properly configured.
nginx's memory footprint is tiny for what it delivers. A common pattern I see for homelab and self-hosted stuff is a lightweight bastion VPS in a cloud somewhere proxying requests to more capable on-premise hardware over a VPN link. Using a cheap < $5/mo plan means 1GB or less of RAM, so you have to tightly watch what is running on that host.
To be fair 1GB is a lot, both caddy and nginx would feel pretty good with it I'd imagine.
1 GB should be way more than either should need. I run nginx, unbound, postfix, dovecot plus all the normal stuff (ssh, systemd, etc) for a Linux system on a VPS w/ 500MB of RAM. Currently the system has ~270MB used. It actually has 1GB available due to a plan auto-upgrade but I have never bothered as I just don't need it.
1GB would be for everything running on the server, not just the reverse proxy.
For small personal projects, you don't usually buy a $5/month VPS just to use as a dedicated reverse proxy.
1GB for a VPS that runs an HTTP load balancer/reverse proxy and a handful of IPsec or WireGuard tunnels back to the app servers (origin) is overkill. You could successfully run that in 512MB, and probably even 256MB. (That's the scenario described.)
What needs to run on this that's a memory hog making 512MB too small? By my (very rough) calculations you'd need 50-100MB for kernel + systemd + sshd + nginx base needs + tunnels home. That leaves the rest for per-request processing.
Each request starts off needing enough RAM to parse the https headers into a request object, open a connection back to the origin, and buffer a little bit of traffic that comes in while that request is being processed/origin connection opens. After that you only need to maintain 2 connections plus some buffer space - generously 50KB initially and 10KB ongoing. There's enough space for a thousand concurrent requests in the RAM not used by the system. Proxying is fairly cheap - the app servers (at the origin) may need much much more, but that's not the point of the VPS being discussed.
Also worth noting that the cheap VPS is not a per-project cost - that is the reverse proxy that handles all HTTP traffic into your homelab.
Why would you use either when there is OpenBSD w/ carp + HAProxy?
There's lots of options out there. I mean, even IIS can do RP work.
Ultimately, I would prefer a PaaS solution over having to run a couple of servers.
You're going to need to show your homework for this to be a credible claim.
Is it just me or did anyone else completely miss Caddy for its opening sentence?
>Caddy is a powerful, extensible platform to serve your sites, services, and apps, written in Go.
To me it reads that if your application is not written in Go, don't bother
The Go crowd, like the Rust crowd, likes to advertise the language their software is written in. I agree that that specific sentence is a bit ambiguous, though, as if it's some kind of middleware that hooks into Go applications.
It's not, it's just another standalone reverse proxy.
When I see software written in Go, I know that it has a very sad plugin support story.
Terraform providers seem to work pretty well, but as far as I know, they're basically separate executables and the main process communicates with them using sockets.
Yes, works very well for terraform. You probably can see why it's not going to work for a webserver?
> The Go crowd, like the Rust crowd, likes to advertise the language their software is written in.
Probably because end users appreciate that usually that means a single binary + config file and off you go. No dependency hell, setting up third party repos, etc.
> Probably because end users appreciate that usually that means a single binary + config file and off you go. No dependency hell, setting up third party repos, etc.
Until you have to use some plugin (e.g. cloudflare to manage DNS for ACME checks), now it's exactly "dependency hell, setting up third party repos, etc."
I also fully expect to see a few crashes from unchecked `err` in pretty much any Go software. Also, nginx qualifies for `single binary + config`; it's just that NGINX is for infra people and Caddy is for application developers.
Fortunately I don't think any of that applies to Rust ;-)
Actually, all of it applies to Rust. The only stable ABI in Rust is the C ABI, and IMO at that point it stops being Rust. Even dynamically loading a Rust lib in a Rust application is unsafe and only expected to work when both are compiled with the same version. In a plugin context, it's the same as what Caddy makes you do.
However, the Rust Evangelical Strike Force successfully infiltrated the WASM committee, and when WASM Components stabilize, they can be used for plugins in some cases (see Zed and zellij). (Go can use them as well; Rust is just the first (only?) to support the preview-2 component model.)
Yeah, I don't really do dynamic loading in my corner of Rust. And I can always target some MSRV, cargo package versions, and be happy with it. Definitely beats the dependency hell I've had to deal with elsewhere
Don't get me wrong, I love Rust and use it almost every day. Doing `cargo run` in a project where it handles everything is good. This gets lost once you start working in a plugin context. Because now you're not dealing with your own neatly organized workspace, you're working across multiple workspaces from different people.
IIRC it's more than just MSRV or even matching the version exactly. It also requires that the flags used to compile rustc match (there is an escape hatch tho).
Why should a reverse proxy give a single shit about what your lang application is written in
It shouldn't, which is why I think the wording there is strange. Nginx doesn't market itself as a "platform to serve your sites, services, and apps, written in C". Reading the first sentence, I don't even know what Caddy is - what does a platform mean in this context? Arriving on Nginx's site, the first sentence visible to me is
>nginx ("engine x") is an HTTP web server, reverse proxy, content cache, load balancer, TCP/UDP proxy server, and mail proxy server.
Which is perfect
when it says 'written in Go', the subtext is - i'm fast, i'm new, i'm modern, go buddies love me please
the better one is 'written in rust', the subtext is - i'm fast, i'm new, i'm futurism, and i'm memory-safe, rust buddies love me please
--- cynicism end ---
i do think sometimes it's worth to note the underlying tech stack, for example, when a web server claims it's based on libev, i know it's non-blocking
Back when Caddy first came out over 10 years ago, the fact that it was written in Go was just simply more notable. For Go, it also at least tells you the software is in a memory-safe programming language. Now neither of those things is really all that notable, for new software.
I didn't even read the article, but I love the comments on the thread.
Yes. The implementation language of a system should not matter to people in the least. However, they are used as a form of prestige by developers and, sometimes, as a consumer warning label by practitioners.
"Ugh. This was written in <language-I-hate>."
"Ooo! This was written in <language-I-love>!"
There's certainly some aspect of that going on, but I think mainly it's just notable when you write something in a programming language that is relatively new.
Does it matter? In theory no, since you can write pretty much anything in pretty much any language. In practice... It's not quite that black and white. Some programming languages have better tooling than others; like, if a project is written in pure Go, it's going to be a shitload easier to cross compile than a C++ project in most cases.
A memory-safe programming language like Go or Rust will tell you about the likely characteristics of the program: the bugs are not likely to be memory or stack corruption bugs since most of the code can't really do that. A GC'd language like Go or Java will tell you that the program will not be ideal for very low latency requirements, most likely. Some languages, like Python, are languages that many would consider easy to hack on, but on the other hand a program written in Python probably doesn't have the best performance characteristics, because CPython is not the fastest interpreter.
The discipline that is encouraged by some software ecosystems will also play a role in the quality of software; let's be honest, everyone knows that you CAN write quality software in PHP, but the fact that it isn't easy certainly says something. There's nothing wrong with Erlang but you may need to learn about deploying BEAM in production before actually using Erlang software, since it has its own unique quirks.
And this is all predicated on the idea that nobody ever introduces a project as being "written in C." While it's definitely less common, you definitely do see projects that do this. Generally the programming language is more of a focus for projects that are earlier in their life and not as refined as finished products. I think one reason why it was less common in the past is because writing that something is written in C would just be weird. Of course it's written in C, why would anyone assume otherwise? It would be a lot more notable, at that point, if it wasn't.
I get why people look at this in a cynical way but I think the cynical outlook is only part of the story. In actuality, you do get some useful information sometimes out of knowing what language something is written in.
> ... and i'm memory-safe...
Go is memory safe..
Pretty sure Ruby on Rails sites were the same way.
Python certainly was too, back in the day. It feels like it's roughly a "first 10 years of the language" thing, maybe stretched another 5 if there's an underdog aspect (like being interpreted.)
For an open source product, it's fun to say "written in X language". It also advertises the project to developers who may be willing to contribute.
If you put "product made with Go", I'm not going to contribute as I don't know Go, though that wouldn't prevent me from using it should it fit my needs. But if you wrote your project in .NET, I may certainly be willing to contribute.
> elegantly
There is a distinct lack of elegance in the HTTP/2 protocol. It's exceptionally complex and it has plenty of holes in it. That it simply does a job does not earn it "elegant."
Honestly, I don't understand this critique. The actual protocol is pretty straight-forward for what it does. I'm not sure it can be much simpler given the inflexible requirements. I find it more elegant than HTTP/1.1.
Versus HTTP/1.1, some details are simplified by moving the request and status line parts into headers. The same HEADERS frame type can be used for both the headers and trailers on a given stream. The framing protocol itself doesn't really have a whole lot of cruft, and versus HTTP/1 it entirely eliminates the need for the dancing around with Content-Length, chunked Transfer-Encoding, and trailers.
In practice, a lot of the issues around HTTP/2 implementations really just seem to be caused by trying to shoehorn it into existing HTTP/1.1 frameworks, where the differences just don't mesh very well (e.g. Go has some ugly problems here) or just simply a lack of battle-testing due to trouble adopting it (which I personally think is mainly caused by the difficulty of configuring it. Most systems will only use HTTP/2 by default over TLS, after all, so in many cases end-to-end HTTP/2 wasn't being tested.)
From the perspective of a user, where it starts to seem inelegant is when grpc comes into the picture. You get grpc to function but then plain http traffic breaks and vice versa. It seems to be odd implementation details on specific load balancer products. When in theory, all of it should operate the same way, but it doesn’t.
grpc doesn't really do anything special on top of http/2. Load balancers that are aware of http/2 on both sides shouldn't have any trouble with either.
The problem that people run into load balancing grpc is that they try to use a layer 4 load balancer to balance layer 7 requests; that is, if there are 4 backends, the load balancer tells you the address of one of them, and then you wonder why the other 3 backends don't get 25% of the traffic. That's because grpc uses 1 TCP connection and it sends multiple requests over that connection ("channel"). If your load balancer tells you the addresses of all 4 servers, then you can open up 4 channels and load balance inside your application (this was always the preferred approach at google, with a control channel to gracefully drain certain backends, etc.). If your load balancer is aware of http/2 at the protocol level (layer 7), then you open up one channel to your load balancer, which already has one channel for each backend. When a request arrives, it inspects it and picks a backend and proxies the rest of the exchange.
Ordinary http/2 works like this, it's just that you can get away with a network load balancer because http clients open new connections more regularly (consider the lifetime of a browser page with the lifetime of a backend daemon). Each new connection is a load balancing opportunity for the naive layer 4 balancer. If you never make new connections, then it never has an opportunity to load balance.
grpc has plenty of complexity for "let applications do their own load balancing", including built-in load balancing algorithms and built-in service discovery and health discovery (xDS); http/2 doesn't have any of this. Whether these are actually part of grpc or just random add-ons to popular client libraries is somewhat up for debate, however.
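As a rough illustration of the "load balance inside your application" option (the DNS target below is a placeholder, not something from this thread), grpc-go can be asked to resolve every backend address and round-robin RPCs across the resulting subchannels, instead of pinning everything to whichever backend a layer-4 balancer handed out:

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// The dns resolver returns all A/AAAA records for the service; the
	// round_robin policy opens a subchannel to each address and spreads
	// RPCs across them.
	conn, err := grpc.Dial(
		"dns:///my-service.internal:50051", // placeholder target
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// Pass conn to a generated client stub; each RPC is then balanced per
	// call rather than per connection.
}
```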
The same arguments apply to WebSockets probably— yes the implementation is a little hairy, but if the end result is a clean abstraction that does a good job of hiding that complexity from the rest of the stack, then it's elegant.
First 80% of the article was great, but it ends a bit handwavey when it gets to its conclusion.
One thing the article gets wrong is that non-encrypted HTTP/2 exists. Not between browsers, but great between a load balancer and your application.
> One thing the article gets wrong is that non-encrypted HTTP/2 exists
Indeed, I misread the spec, and added a small clarification to the article.
Do you want to risk the complexity and potential performance impact from the handshake that the HTTP/2 standard requires for non-encrypted connections? Worst case, your client and server toolings clash in a way that every request becomes two requests (before the actual h2c request, a second one for the required HTTP/1.1 upgrade, which the server closes as suggested in the HTTP/2 FAQ).
most places where you'd use it use h2c prior knowledge, that is, you just configure both ends to only speak h2c, no upgrades or downgrades.
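For example, assuming a Go backend (port and handler are arbitrary, this is just a sketch), the golang.org/x/net/http2/h2c wrapper lets the server accept cleartext HTTP/2 directly, so a proxy configured for h2c prior knowledge never has to go through the Upgrade dance:

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "proto: %s\n", r.Proto) // reports HTTP/2.0 for h2c clients
	})
	// h2c.NewHandler accepts HTTP/2 on plaintext connections (prior knowledge
	// or Upgrade) and falls back to HTTP/1.1 for everything else.
	log.Fatal(http.ListenAndServe(":8080", h2c.NewHandler(mux, &http2.Server{})))
}
```

Something like `curl --http2-prior-knowledge http://localhost:8080/` should then show the request arriving as HTTP/2.0.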
Not according to Edward Snowden, if you're Yahoo and Google.
You can just add encryption to your backend private network (e.g. Wireguard)
Which has the benefit of encrypting everything and avoids the overhead of starting a TLS socket for every http connection.
If you're going that route, you may as well just do HTTPS again. If you configure your TLS cookies and session resumption right, you'll get all of the advantages of fancy post-quantum crypto without having to go back to the days of manually setting up encrypted tunnels like when IPSec did the rounds.
There's a security angle: load balancers have big problems with request smuggling. HTTP/2 changes the picture - maybe someone more up to date can say whether it's currently better or worse?
ref: https://portswigger.net/web-security/request-smuggling
This is why I configured my company's AWS application load balancer to disable HTTP2 when I first saw the linked post, and haven't changed that configuration since then. Unless we have definitive confirmation that all major load balancers have fixed these vulnerabilities, I'll keep HTTP2 disabled, unless I can figure out how to do HTTP2 between the LB and the backend.
In theory request smuggling is not possible with end-to-end HTTP/2. It's only possible if there is a downgrade to HTTP/1 at some point.
A h2 proxy usually wouldn't proxy through the http2 connection, it would instead accept h2, load-balance each request to a backend over a h2 (or h1) connection.
The difference is that you have a h2 connection to the proxy, but everything past that point is up to the proxies routing. End-to-end h2 would be more like a websocket (which runs over HTTP CONNECT) where the proxy is just proxying a socket (often with TLS unwrapping).
> A h2 proxy usually wouldn't proxy through the http2 connection, it would instead accept h2, load-balance each request to a backend over a h2 (or h1) connection.
Each connection needs to keep state for all processed requests (the HPACK dynamic headers table), so all requests for a given connection need to be proxied through the same connection. Not sure I got what you meant, though.
Apart from that, I think the second sentence of my comment makes clear there is no smuggling as long as the connection before/past proxy is http2, and it's not downgraded to http1. That's all that I meant.
Yes HTTP/2 is much less prone to exploitable request smuggling vulnerabilities. Downgrading to H/1 at the load balancer is risky.
Personally, this lack of support doesn’t bother me much, because the only use case I can see for it, is wanting to expose your Ruby HTTP directly to the internet without any sort of load balancer or reverse proxy, which I understand may seem tempting, as it’s “one less moving piece”, but not really worth the trouble in my opinion.
That seems like a massive benefit to me.
The amusing thing is that HTTP/2 is mostly useful for sites that download vast numbers of tiny Javascript files for no really good reason. Like Google's sites.
I've seen noticeable, meaningful speed improvements with HTTP/2 on pages with only 1 Javascript file.
But I'd like to introduce you/them to tight mode:
https://docs.google.com/document/d/1bCDuq9H1ih9iNjgzyAL0gpwN...
https://www.smashingmagazine.com/2025/01/tight-mode-why-brow...
Or small icon/image files.
Anyone remember those sprite files?
You ever had to host map tiles? Those are the worst!
Indeed, there is a reason most mapping libraries still support specifying multiple domains for tiles. It used to be common practice to setup a.tileserver.test, b.tileserver.test, c.tileserver.test even if they all pointed to the same IP/server just to get around the concurrent request limit in browsers.
That’s not quite true… lots of small files still have the overhead of IPC in the browser
CDNs like Akamai still don’t support H2 back to origins.
That’s likely not because of the wisdom in the article per se, but because of rising complexity in managing streams and connections downstream.
> bringing HTTP/2 all the way to the Ruby app server is significantly complexifying your infrastructure for little benefit.
I think the author wrote it with encryption-is-a-must in the mind and after he corrected those parts, the article just ended up with these weird statements. What complexity is introduced apart from changing the serving library in your main file?
In a language that uses forking to achieve parallelism, terminating multiple tasks at the same endpoint will cause those tasks to compete. For some workflows that may be a feature, but for most it is not.
So that's Python, Ruby, Node. Elixir won't care and C# and Java... well hopefully the HTTP/2 library takes care of the multiplexing of the replies, then you're good.
A good python web server should be single process with asyncio , or maybe have a few worker threads or processes. Definitely not fork for every request
I don't think any serious implementation would do forking when using HTTP/2 or QUIC. Fork is a relic of the past.
You are correct about the first assumption, but even without encryption, dealing with multiplexing significantly complexifies things, so I still stand by that statement.
If you assume no multiplexing, you can write a much simpler server.
> So the main motivation for HTTP/2 is multiplexing, and over the Internet ... it can have a massive impact.
> But in the data center, not so much.
That's a very bold claim.
I'd like to see some data that shows little difference with and without HTTP/2 in the datacenter before I believe that claim.
Datacenters don't typically have high latency, low bandwidth, and varying availability issues. If you have a saturated http/1.1 network (or high CPU use) within a DC you can usually just add capacity.
Yet in my experience I see massive speedups on LOCALHOST going from 1.1 to 2 - where are the numbers and tests, OP?
gRPC?
Surprised not to see this mentioned in the article.
Lots of places (including a former employer) have done tons of work to upgrade internal infrastructure to support HTTP/2 just so they could use gRPC. The performance difference from JSON-over-HTTP APIs was meaningful for us.
I realize there are other solutions but this is a common one.
Probably because it only works correctly outside of browser. Browsers don't support "native" grpc. You normally use something with specifically gRPC support rather than just h2 in a spherical vacuum.
This entirely. When I first read the title, I thought, lets see what they say about gRPC. gRPC is so much nicer working across applications compared to simple REST servers/clients.
The TLS requirement from HTTP2 also hindered http2 origin uptake. The TLS handshake adds latency and is unnecessary on some instances. (This is mentioned in heading "Extra Complexity" in the article)
For HTTP/3 you get 0-RTT however which largely mitigates this.
0-RTT resumption (unless I'm mistaken), which doesn't help with the first connection (but that might be OK).
Correct; to achieve 0-RTT the application needs to perform the handshake/certificate exchange at least once - otherwise, how would it encrypt the payload? This could be cached preemptively IIRC, but it is not worth it.
The problem will be that QUIC uses more userland code and UDP is not as optimized as TCP inside kernels. So far, the extra CPU penalty has discouraged me from adopting QUIC everywhere, I've kept it mostly on the edge-out where the network is far less reliable.
Umm, don't you get 0-RTT resumption in all versions of HTTP? It's a TLS feature, not an HTTP feature. It does not require QUIC.
plus in my experience some h2 features behave oddly with load balancers
I don't understand this super well, but could not get keepalives to cross the LB boundary w/ GCP
HTTP keepalive is a feature from HTTP/1.1, not HTTP/2.
ping frames?
Google measured their bandwidth usage and discovered that something like half was just HTTP headers! Most RPC calls have small payloads for both requests and responses.
HTTP/2 compresses headers, and that alone can make it worthwhile to use throughout a service fabric.
i think it probably varies from workload to workload. reducing handshake time and header compression can have substantial effects.
it's a shame server side hinting/push never caught on. that was always one of the more interesting features.
It didn't catch on because it was hard to see the benefits, and if done incorrectly, could actually make things slightly slower.
Essentially, the server had to do things like compute RTT and understand the status of the browser's cache to do optimal push.
hmm, maybe the client could include a same origin cache state bloom filter in the request.
although i suppose it's a solved problem these days.
Bloom filters are small relative to the amount of data they can hash, but aren't realistic bloom filters still tens of kB at minimum? Might be too heavyweight to send up.
1024 capacity, 1 in 1M false positive rate (false positives fail safe - sending something the client already has), 3.6KB
https://hur.st/bloomfilter/?n=1024&p=1.0E-6&m=&k=
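For reference, that size falls out of the usual bloom filter sizing formula m = -n·ln(p)/(ln 2)² bits: with n = 1024 and p = 10⁻⁶ that's roughly 29,500 bits ≈ 3.6 KB, using k = (m/n)·ln 2 ≈ 20 hash functions.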
Ahh, nice. That's not too bad.
It is very useful for long lived (bidirectional) streams.
Only if you're constrained on connections. The reason that HTTP2 is much better for websites is because of the slow starts of TCP connections. If you're already connected, you don't suffer those losses, and you benefit from kernel muxing.
You've missed the bidirectional part.
How is http2 bidirectional streams better than websockets? I thought they were pretty much equivalent.
Well, IMO h2 streams are more fleshed out and offer better control than websockets, but that's just my opinion. In fact, websockets are your only "proper" option if you want that bidirectional stream to be binary - browsers don't expose that portion of h2 to JS.
Here is a silly thing that is possible with h2 over a single connection, but not with websockets:
Multiple page components (Islands) each have their own stream of events over a single h2 connection. With websockets, you will need to roll your own multiplexing[1].
[1]: I think you can multiplex multiple websockets over a single h2 connection tho, but don't quote me on this.
Multiplexing websockets has always seemed trivial to me. Curious to learn more about h2 streams though! Support is coming to JS and has already landed on Firefox and Chrome: https://developer.mozilla.org/en-US/docs/Web/API/WebTranspor...
nah, I'm using HTTP/3 everywhere
Http2 is needed for a GRPC route on OpenShift.
If your load balancer is converting between HTTP/2 and HTTP/1.1, it's a reverse proxy.
Past the reverse proxy, is there a point to HTTP at all? We could also use SCGI or FastCGI past the reverse proxy. It does a better job of passing through information that's gathered at the first point of entry, such as the client IP address.
Keeping everything HTTP makes testing a bit easier.
Hmm it’s weird that this submission and comments are being shown to me as “hours ago” while they are all 2 days old
https://news.ycombinator.com/item?id=26998309
Sometimes the moderators will effectively boost a post that they think is interesting so it gets more views.
Yeah yeah, whatever, just make it work in the browser so I can do gRPC duplex streams, thank you very much.
I remember being bashed on HN for saying that HTTP is hard. Yet I see nonsense here in the comments about HTTP. The whole article is good but:
> HTTP/2 is fully encrypted, so you need all your application servers to have a key and certificate
Nope. h2c is a thing and is official. But the article is right: the value HTTP/2 provides isn't for the LAN, so HTTP/1.1 or HTTP/2, it doesn't matter much.
HTTP/3, however, is fully encrypted; h3c doesn't exist. So yeah, HTTP/3 slows your connection down, it isn't suited for LAN and should not be used there.
BUT if you actually want to encrypt even in your LAN, use HTTP/3, not HTTP/2 encrypted. You will have a small but not negligible gain from 0-RTT.
I would not use HTTP/3 for LAN. Even the latest Linux kernels struggle with it. HTTP/1, aka TCP, has fully supported encryption and other offloads. UDP still consumes much more CPU for the same amount of traffic.
Do you have source for that? I'm very interested. There is no technical reason for UDP to be slower than TCP (at CPU level).
The only field that is computed in UDP is the checksum, and the same exists in TCP; it must be recomputed each time someone actually re-routes the packet (e.g. bridge to VM) since the TTL is decreased.
So I doubt your assertion.
_____
Writing my comment I understood what you are talking about. There is a bunch of encryption done in user mode in HTTP/3 that shouldn't need to be done in user mode. In HTTP/2 it was sometimes done in kernel mode (kTLS), so it was quicker. The slowness comes from the CPU work needed to copy data out of kernel mode. I didn't follow the whole story so I trust you on this.
> There is no technical reason for UDP to be slower than TCP (at CPU level).
The technical reason is 30+ years of history of TCP being ≥90% of Internet traffic and services. There's several orders of magnitude in resources more spent to make TCP fast starting at individual symbols on Ethernet links all the way up into applications.
Encryption is one thing (if you run kTLS, which is still not done in most manual setups) but the much bigger factor IIRC is how much of the networking stack needs to run in userspace and has not been given the optimization love of TCP. If you compared non-kTLS h2 with non-kTLS h3 over a low-latency link, the h2 connection could handle a lot more traffic compared to h3.
That is not to say that h3 does not have its place, but the networking stacks are not optimized for it yet.
https://news.ycombinator.com/item?id=41890784
https://docs.kernel.org/networking/tls-offload.html
If we ever get to adopting this, I will send every byte to a separate IPv6 address. Big Tech surveillance wouldn't work so many don't see a point like the author.
The RFC said "SHOULD not" not "MUST not" couldn't we have just ignored the 2 connection limit?
That's what browsers actually did.
Came here to say the same thing; had they read the RFC they'd realize it's not actually a limit, just a suggestion - that's why it's in the "Practical Considerations" section too.
Whoever downvoted you is probably unaware that words like SHOULD have specific meaning in RFCs
Browsers started ignoring the 2 connection limit on H/1.x long before H2 came along
I think this post gets the complexity situation backwards. Sure, you can use a different protocol between your load balancer and your application and it won't do too much harm. But you're adding an extra protocol that you have to understand, for no real benefit.
(Also, why do you even want a load balancer/reverse proxy, unless your application language sucks? The article says it "will also take care of serving static assets, normalize inbound requests, and also probably fend off at least some malicious actors", but frankly your HTTP library should already be doing all of those. Adding that extra piece means more points of failure, more potential security vulnerabilities, and for what benefit?)
> Sure, you can use a different protocol between your load balancer and your application and it won't do too much harm. But you're adding an extra protocol that you have to understand, for no real benefit.
Well, that depends...
At a certain scale (and arguably, not too many people will ever need to think about this), using UNIX sockets (instead of HTTP TCP) between the application and load balancer can be faster in some cases, as you don't go through the TCP stack...
> Also, why do you even want a load balancer/reverse proxy, unless your application language sucks?
Erm... failover... ability to do upgrades without any downtime... it's extra complexity yes, but it does have some benefits...
> At a certain scale (and arguably, not too many people will ever need to think about this), using UNIX sockets (instead of HTTP TCP) between the application and load balancer can be faster in some cases, as you don't go through the TCP stack...
Sure (although as far as I can see there's no reason you can't keep using HTTP for that). You can go even further and use shared memory (I work for a company that used Apache with Jk back in the day). But that's an argument for using a faster protocol because you're seeing a benefit from it, not an argument for using a slower protocol because you can't be bothered to implement the latest standard.
> using a slower protocol because you can't be bothered to implement the latest standard.
I thought we were discussing HTTP/2 but now you seem to be invoking HTTP/3? It's even faster indeed but brings a whole lot of baggage with it. Nice comparison point though: Do you want to add the complexity of HTTP/2 or HTTP/3 in your backend? (I don't.)
> I thought we were discussing HTTP/2 but now you seem to be invoking HTTP/3?
The article talks about HTTP/2 but I suspect they're applying the same logic to HTTP/3.
> Do you want to add the complexity of HTTP/2 or HTTP/3 in your backend? (I don't.)
I'd like to use the same protocol all the way through. I wouldn't want to implement any HTTP standard by hand (I could, but I wouldn't for a normal application), but I'd expect an established language to have a solid library implementation available.
> why do you even want a load balancer/reverse proxy, unless your application language sucks?
Most load balancer/reverse proxy applications also handle TLS. Security-conscious web application developers don't want TLS keys in their application processes. Even the varnish authors (varnish is a load balancer/caching reverse proxy) refused to integrate TLS support because of security concerns; despite being reverse-proxy authors, they didn't trust themselves to get it right.
An application can't load-balance itself very well. Either you roll your own load balancer as a separate layer of the application, which is reinventing the wheel, or you use an existing load balancer/reverse proxy.
Easier failover with fewer (ideally zero) dropped requests.
If the app language isn't compiled, having it serve static resources is almost certainly much slower than having a reverse proxy do it.
> Security-conscious web application developers don't want TLS keys in their application processes.
If your application is in a non-memory-safe language, sure (but why would you do that?). Otherwise I would think the risk is outweighed by the value of having your connections encrypted end-to-end. If your application process gets fully compromised then an attacker already controls it, by definition, so (given that modern TLS has perfect forward secrecy) I don't think you really gain anything by keeping the keys confidential at that point.
I write application servers for a living, mostly for Python but previously for other languages.
Nobody, nobody, writes application servers with the intent of having them exposed to the public internet. Even if they're completely memory safe, we don't do DOS protections like checking for reasonable header lengths, rewriting invalid header fields, dropping malicious requests, etc. Most application servers will still die to slowloris attacks. [1]
We don't do this because it's a performance hog and we assume you're already reverse proxying behind any responsible front-end server, which all implement these protections. We don't want to double up on that work. We implement the HTTP spec with as low overhead as possible, because we expect to have pipelined HTTP/1.1 connections from a load balancer or other reverse proxy.
Your application server, Gunicorn, Twisted, Uvicorn, whatever, does not want to be exposed to the public internet. Do not expose it to the public internet.
[1]: https://en.wikipedia.org/wiki/Slowloris_(cyber_attack)
As someone who designs load-balancer solutions for a living I cannot agree with this more.
I likewise assume that all servers are insecure, always, and we do not want them exposed without a sane load balancer layer.
Your server was probably not made to be exposed to the public internet. Do not expose it to the public internet.
> Nobody, nobody, writes application servers with the intent of having them exposed to the public internet
For rust, go, lua (via nginx openresty) and a few others this is a viable path. I probably wouldn't do it with node (or bun or deno), python, or similar but there are languages where in certain circumstances it is reasonable and might be better.
For Go, net/http is not something you should expose to the public internet, there's no secret sauce in there. It will just die to the first person to hit it with a slowloris or other DOS attack. Same with the common C++ options like boost.beast unless you're writing the logic yourself (but why bother? Just reverse proxy).
I'm unfamiliar with the common rust frameworks for http, but find it unlikely the situation is very different.
> We don't do this because it's a performance hog and we assume you're already reverse proxying behind any responsible front-end server
What application servers have you written? I have never seen an application server readme say DON'T EXPOSE DIRECTLY TO THE INTERNET, WE ASSUME YOU USE REVERSE PROXY.
Most of them have a disclaimer in their deployment or tutorial docs, some with more strong language than others. Again, nothing bad happens if you don't, we don't write memory vulnerabilities into these servers. You are just far more vulnerable to DOS attacks.
* "We strongly recommend using Guincorn behind a proxy server" [1]
* "As a general rule, you probably want to: ... run behind Nginx for self-hosted deployments." [2]
* "A reverse proxy such as nginx or Apache httpd should be used in front of Waitress." [3]
For some, like uWSGI, they don't even want to talk HTTP (uWSGI supports its own protocol) and it's just assumed you're using a dedicated webserver to talk to public traffic. [4]
[1]: https://docs.gunicorn.org/en/latest/deploy.html
[2]: https://www.uvicorn.org/deployment/
[3]: https://flask.palletsprojects.com/en/stable/deploying/waitre...
[4]: https://uwsgi-docs.readthedocs.io/en/latest/tutorials/Django...
Of course, don't expose to the public Internet.
Also don't expose plain text traffic to the internal corpnet, where most attacks originate.
You use a reverse proxy because whenever you "deploy to prod", you'll be using one anyway; thus by not having TLS in your app, you haven't built something you don't actually need.
Speculative execution cpu bugs. Or whatever the next class of problems is that can expose bits of process memory without software memory bugs.
That's already a fringe case. Do you really think everyone's writing web applications in a language like rust without any unsafe (or equivalent)?
> Also, why do you even want a load balancer/reverse proxy, unless your application language sucks
- To terminate SSL
- To have a security layer
- To load balance
- To have rewrite rules
- To have graceful updates
- ...
- To host multiple frontends, backends and/or APIs under one domain name
> backends and/or APIs under one domain name
On one IP, sure, for one domain you could use an API gateway.
API gateway is a fancy term for a configurable reverse proxy often bought as a service.
No, an API gateway is a web service whose main purpose is routing requests based on their content.
A load balancer's main purpose is to... balance the load across multiple backends.
Just because both can be implemented with a reverse proxy such as NGINX doesn't mean it's the same thing.
I said "API gateway is a fancy term for a configurable reverse proxy often bought as a service" and both "load balancer" and "API gateway" are common configurations of "configurable reverse proxy", often bought as a service.
Many load balancers have this functionality within them, even ones from years ago that aren't around anymore like Microsoft ISA/TMG. They're not web services, but they can route based on requests.
> To terminate SSL
To make sure that your connections can be snooped on over the LAN? Why is that a positive?
> To have a security layer
They usually do more harm than good in my experience.
> To load balance
Sure, if you're at the scale where you want/need that then you're getting some benefit from that. But that's something you can add in when it makes sense.
> To have rewrite rules
> To have graceful updates
Again I would expect a HTTP library/framework to handle that.
> To make sure that your connections can be snooped on over the LAN? Why is that a positive?
No, to keep your app from having to deal with SSL. Internal network security is an issue, but sites that need multi-server architectures can't really be passing SSL traffic through to the application servers anyway, because SSL hides stuff that's needed for the load balancers to do their jobs. Many websites need load balancers for performance, but are not important enough to bother with the threat model of an internal network compromise (whether it's on the site owner's own LAN, or a bare metal or VPS hosting vlan).
> Sure, if you're at the scale where you want/need that then you're getting some benefit from that. But that's something you can add in when it makes sense.
So why not preface your initial claims by saying you trust the web app to be secure enough to handle SSL keys, and a single instance of the app can handle all your traffic, and you don't need high availability in failure/restart cases?
That would be a much better claim. It's still unlikely, because you don't control the internet. Putting your website behind Cloudflare buys you some decreased vigilance. A website that isn't too popular or attention-getting also reduces the risk. However, Russia and China exist (those are examples only, not an exclusive list of places malicious clients connect from).
> So why not preface your initial claims by saying you trust the web app to be secure enough to handle SSL keys, and a single instance of the app can handle all your traffic, and you don't need high availability in failure/restart cases?
Yeah, I phrased things badly, I was trying to push back on the idea that you should always put your app behind a load balancer even when it's a single instance on a single machine. Obviously there are use cases where a load balancer does add value.
(I do think ordinary webapps should be able to gracefully reload/restart without losing connections, it really isn't so hard, someone just has to make the effort to code the feature in the library/framework and that's a one-off cost)
> > To terminate SSL
> To make sure that your connections can be snooped on over the LAN? Why is that a positive?
Usually your "LAN" uses whole link encryption, so that whatever is accessed in your private infrastructure network is encrypted (being postgres, NFS, HTTP, etc). If that is not the case, then you have to configure encryption at each service level, which is both error prone, time consuming, and not always possible. If that is not case then you can have internal SSL certificates for the traffic between RP and workers, workers and postgres, etc.
Also you don't want your SSL server key to be accessible from business logic as much as possible, having an early termination and isolated workers achieves that.
Also, you generally have workers access private resources, which you don't want exposed on your actual termination point. It's just much better to have a public termination point RP with a private iface sending requests to workers living in a private subnet accessing private resources.
> > To have a security layer
> They usually do more harm than good in my experience.
Right, maybe you should detail your experience, as your comments don't really tell much.
> > To have rewrite rules
> > To have graceful updates
> Again I would expect a HTTP library/framework to handle that.
HTTP frameworks handle routing _for themselves_, this is not the same as rewrite rules which are often used to glue multiple heterogeneous parts together.
HTTP frameworks are not handling all the possible rewriting and gluing for the very reason that it's not a good idea to do it at the logic framework level.
As for graceful updates, there's a chicken and egg problem to solve. You want graceful updates between multiple versions of your own code / framework. How could that work without a third party balancing old / new requests to the new workers one at a time?
You terminate SSL as close to the user as possible, because that round trip time is greatly going to affect the user experience. What you do between your load balancer and application servers is up to you, (read: should still be encrypted) but terminating SSL asap is about user experience.
> You terminate SSL as close to the user as possible, because that round trip time is greatly going to affect the user experience. What you do between your load balancer and application servers is up to you, (read: should still be encrypted) but terminating SSL asap is about user experience.
That makes no sense. The latency from your load balancer to your application server should be a tiny fraction of the latency from the user to the load balancer (unless we're talking about some kind of edge deployment, but at that point it's not a load balancer but some kind of smart proxy), and the load balancer decrypting and re-encrypting almost certainly adds more latency compared to just making a straight connection from the user to the application server.
Say your application and database are in the US West and you want to serve traffic to the EU or AUS, or even US East. Then you want to terminate TCP and TLS in those regions to cut down on handshake latency, slow start time, etc. Your reverse proxy can then use persistent TLS connections back to the origin so that those connection startup costs are amortized away. Something like nginx can pretty easily proxy 10+ Gb/s of traffic and tens of thousands of requests per second on a couple of low-power cores, so it's relatively cheap to do this.
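As a rough illustration of the "persistent TLS connections back to the origin" part, this is the kind of client-side tuning involved (Go, values purely illustrative): the expensive cross-region handshakes happen once per pooled connection instead of once per user request.

```go
package edge

import (
	"net/http"
	"time"
)

// Transport used by the in-region proxy to reach the origin. Idle, already
// established TLS connections are kept around and reused, so the cross-region
// TCP + TLS handshake cost is amortized across many user requests.
var originTransport = &http.Transport{
	MaxIdleConns:        512,
	MaxIdleConnsPerHost: 128,
	IdleConnTimeout:     90 * time.Second,
	ForceAttemptHTTP2:   true, // multiplex over one connection if the origin speaks HTTP/2
}
```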
Lots of application frameworks also just don't bother to have a super high performance path for static/cached assets because there's off-the-shelf software that does that already: caching reverse proxies.
It depends on your deployment and where your database and app servers and POPs are. If your load balancer is right next to your application server, which is right next to your database, you're right. And it's fair to point out that most people have that kind of deployment. However, there are some companies, like Google, that have enough of a presence that the L7 load balancer/smart proxy/whatever you want to call it is way closer to you, Internet-geographically, than the application server or the database. For their use case and configuration, your "almost certainly" isn't what was seen empirically.
You usually re-encrypt your traffic after the gateway, either by using an internal PKI and TLS or some kind of encapsulation (IPsec, etc.).
Security and availability requirements vary, so there's much to argue about. Usually you have some kind of 3rd-party service you want to hide, or you want to control CORS, Cache-Control, and other headers uniformly, etc. If you are fine with 5-30 minutes of outage (or until someone notices and manually restores service), then of course you don’t need to load balance. But you can imagine this not being the case at most companies.
Tell me you never built an infrastructure without telling me you never built an infrastructure
The point being that all the code on the stack is not necessarily yours
I've built infrastructure. Indeed I've built infrastructure exactly like this, precisely because maintaining encryption all the way to the application server was a security requirement (this was a system that involved credit card information). It worked well.
Load balancers are nice to have if you want to move traffic from one machine to another. Which sometimes needs to happen even if your application language doesn't suck and you can hotload your changes... You may still need to manage hardware changes, and a load balancer can be nice for that.
DNS is usable, but some clients and recursive resolvers like to cache results for way beyond the TTL provided.
C fast, the rest slow. You don't want to serve static assets in non-C.
If you call sendfile with kTLS I imagine it'd be fast in any language
Answers from the article - the "extra" protocol is just HTTP/1.1 and the reason for a load balancer is the ability to have multiple servers:
> But also the complexity of deployment. HTTP/2 is fully encrypted, so you need all your application servers to have a key and certificate, that’s not insurmountable, but is an extra hassle compared to just using HTTP/1.1, unless of course for some reasons you are required to use only encrypted connections even over LAN.
> So unless you are deploying to a single machine, hence don’t have a load balancer, bringing HTTP/2 all the way to the Ruby app server is significantly complexifying your infrastructure for little benefit.
I've deployed h2c (cleartext) in many applications. No tls complexity needed
Good to know - neither the parent nor the article mention this. h2c seems to have limited support by tooling (e.g. browsers, curl), which is a bit discouraging.
EDIT: Based on the HTTP/2 FAQ, pure h2c is not allowed in the standard as it requires you to implement some HTTP/1.1 upgrade functionality: https://http2.github.io/faq/#can-i-implement-http2-without-i...
Why do you think that curl doesn't support h2c?
It does. Just use `--http2` or `--http2-prior-knowledge`; curl deduces cleartext vs. TLS from the `http` or `https` URL scheme (cleartext for `http`).
I said limited support and gave curl as an example because `curl --http2` sends an HTTP/1.1 upgrade request first, so it fails in a purely HTTP/2 environment.
Thanks for bringing up --http2-prior-knowledge as a solution!
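For anyone curious what h2c looks like in practice, here's a minimal Go server using the golang.org/x/net/http2/h2c wrapper (port and handler are just examples). It accepts both the HTTP/1.1 Upgrade path and prior-knowledge clients over plain TCP, so both `curl --http2` and `curl --http2-prior-knowledge` against an `http://` URL should get HTTP/2 back, no certificates involved.

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// r.Proto reports "HTTP/2.0" when the request arrived over h2c.
		fmt.Fprintf(w, "proto: %s\n", r.Proto)
	})

	// Wrap the handler so plain-TCP connections can speak HTTP/2, either via
	// the HTTP/1.1 Upgrade mechanism or via prior knowledge.
	h2s := &http2.Server{}
	log.Fatal(http.ListenAndServe(":8080", h2c.NewHandler(mux, h2s)))
}
```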
I'd agree it's not critical, but I'd discard the assumption that requests within the data center will be fast. People have to send requests to third parties, which will often be slow. Hopefully not as slow as across the Atlantic, but still orders of magnitude worse than an internal query.
You will often be in the state where the client uses HTTP2, and the apps use HTTP2 to talk to the third party, but inside the data center things are HTTP1.1, fastcgi, or similar.
Why does HTTP2 help with this? Load balancers use one keep-alive connection per in-flight request, so they don't experience head-of-line blocking. And they have slow start disabled. So even if the latency of the final request is high, why would HTTP2 improve the situation?
If every request is quick, you can easily re-use connections, file handles, threads, etc. If requests are slow, you will often need to spin up new connections, as you don't want to wait for a response that might take hundreds of milliseconds.
But I did start by saying it's not important. It's a small difference, unless you hit a connection limit.