stevenacreman 5 years ago

I keep a Google sheet updated with feature differences between LinkerD, LinkerD2, Consul Connect and Istio.

https://docs.google.com/spreadsheets/d/1OBaKrwR030G39i0n_47i...

From my own experience I've had some great success with LinkerD in the past on Mesos DC/OS.

Since moving companies, and switching to Kubernetes, we've yet to deploy any service mesh into production.

The blog from Jerome highlights many of the benefits already.

From my perspective, the big ones in the past were:

    - Tracing (with Zipkin)
    - Retries, which removed or fixed dodgy app reconnect logic (see the sketch below)
    - Time series metrics in LinkerDViz showing real-time rates, latency, and errors between services
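
To make the retry point concrete, this is roughly the kind of hand-rolled retry/reconnect wrapper a mesh lets you delete, since the mesh proxy can retry transparently (a minimal sketch; the service name and timings are made up):

    import time
    import requests  # assumes the 'requests' HTTP library is available

    def call_billing_service(payload, attempts=3, backoff=0.5):
        """Ad-hoc retry logic that a mesh proxy would otherwise handle."""
        last_error = None
        for attempt in range(attempts):
            try:
                # 'billing.internal' is a made-up service name for illustration
                response = requests.post("http://billing.internal/charge",
                                         json=payload, timeout=2)
                response.raise_for_status()
                return response.json()
            except requests.RequestException as err:
                last_error = err
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
        raise last_error
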
The reason we haven't used any service mesh at my current company comes down mostly to stability concerns.

Istio gets all of the cool press attention and blogs written about it. Yet, you also read a lot of warnings about it always being 6 months away from being really robust. Even at version 1 we read some horror stories about obscure bugs showing up in fairly standard use cases.

Connectivity between services is too scary to gamble on. It's a similar deal with CNI (we're still on Calico despite arguably cooler stuff being out there) and Ingresses (still on ingress-nginx).

AWS have a service mesh that is probably going to be the one we trial next at work.

Improved observability and retries would definitely be of benefit on our current platform. Another driving factor is our security team wanting mutual TLS between services.

  • coredog64 5 years ago

    Does anyone else think mTLS on the public cloud is a waste of CPU cycles (and therefore money)?

    • rroeserr 5 years ago

      Yes - especially if you have a sidecar which speaks insecurely to your application. Data theft happens from application issues, or employees with access stealing things - not because of unencrypted traffic in a secure network.

    • mattupstate 5 years ago

      There's a cost to ensuring all your data is encrypted in transit, regardless of how you do it.

    • pm90 5 years ago

      Even if it does cost more, it's probably worth it.

discreteevent 5 years ago

Has anyone got an opinion on what makes a service mesh better than using a message broker for most distributed systems? Is it performance? Is it that HTTP has become the lowest common denominator and people just don't want to use any other communication protocol?

  • scraegg 5 years ago

    If you see an airplane that flies, you might conclude airplanes are a good transportation pattern. But if you see a boat that promises to fly, and people have taped a lot of helicopters onto it to make it fly, then you would probably conclude that boats are bad at flying.

    Message Brokers are the airplanes. Even if you don't understand how they work, you can see that many companies that really rely on distributed systems working use them, e.g. telephone companies, banks, trading companies.

    Service Meshes try to solve a problem that K8s was meant to solve in the first place. So K8s is the boat and Service Mesh is the attached helicopter. By itself it might be a good idea to use it, but the way things are taped together right now it's just an anti-pattern.

    If you don't have a pointy-haired boss forcing you to use it, then it's probably better to avoid the whole thing.

    I'd rather see how far the development around k8s-less pods with podman will go and take care of the distributed architecture of my systems myself.

    • geggam 5 years ago

      These layers of abstraction are nice, but they seem to ignore the fact that you still have to manage the systems underneath and the networking around them.

      At some point folks need to realize they aren't Google, and they really don't need to abstract the development layer, because they don't have a team of Google SREs to manage what their pods / containers / mesh run on.

    • solipsism 5 years ago

      This is a strange answer. A message broker is not in the same category as Istio. Istio is for effecting policy changes (regarding security, traffic routing, monitoring, etc.) without updating your code, which is especially useful if your code is a bunch of diverse microservices.

      Message brokers are... not that.

      These are apples and elephants you're comparing.

    • pm90 5 years ago

      Kubernetes is a platform; it lets you do whatever the f you want to. Service meshes are one thing to build on top of that platform.

      • scraegg 5 years ago

        You can do what you want even without k8s.

        The reason to use a tool is to make a task simpler, or to take a task out of your hands in all but the rare edge case. For instance, grep makes it easier to find a certain line in a file.

        If you know networking, storage, and microservice architecture yourself, you don't need any tools.

        In k8s you need to know all this, and on top of that you need to know k8s to achieve the same thing. And then you either need to write k8s plugins, or additionally know all the k8s plugins and whether any of them will actually solve the problem you have.

    • sigmonsays 5 years ago

      This isn't a good analogy. Message brokers by design are asynchronous and don't have the typical request/response flow of HTTP.

  • jatins 5 years ago

    Message brokers only work for "fire-and-forget" kinds of async operations. As in, when you make a request you are not waiting for a response.

    I don't think it is service mesh OR brokers. Both cater to different use cases.

    • wmfiv 5 years ago
      • VectorLock 5 years ago

        While it's certainly possible to use it in this mode, from what I've experienced most people who deploy applications at scale use an async message bus/queue rather than request/response. At that point, is there even much of a benefit vs. simply using a synchronous request/response oriented service?

      • gshulegaard 5 years ago

        I may be off, but asynchronously queuing two messages and implementing a synchronous block in business logic does not sound the same to me as a TCP request/response cycle.
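
        For what it's worth, "queuing two messages with a synchronous block" looks roughly like this: a request message with a correlation ID, then a blocking wait on a reply queue. A minimal in-process sketch, with stdlib queues standing in for a real broker:

            import queue
            import threading
            import uuid

            request_q = queue.Queue()  # stands in for the broker's request queue

            def billing_worker():
                """Consumer side: pull a request, push a reply to the caller's reply queue."""
                while True:
                    msg = request_q.get()
                    msg["reply_to"].put({"correlation_id": msg["correlation_id"],
                                         "result": msg["amount"] * 2})

            threading.Thread(target=billing_worker, daemon=True).start()

            def charge(amount, timeout=2.0):
                """Caller side: queue a message, then block waiting for the correlated reply."""
                reply_q = queue.Queue()
                corr_id = str(uuid.uuid4())
                request_q.put({"correlation_id": corr_id, "amount": amount, "reply_to": reply_q})
                reply = reply_q.get(timeout=timeout)  # the "synchronous block" in business logic
                assert reply["correlation_id"] == corr_id
                return reply["result"]

            print(charge(21))  # prints 42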

        • rroeserr 5 years ago

          TCP doesn't have a request/response cycle - it's a stream of bytes between applications.

          • gshulegaard 5 years ago

            Correct, but TCP provides a reliable data stream between two hosts; the (particular) messaging protocol over it wasn't part of what I was trying to point out.

    • discreteevent 5 years ago

      This is true, but I would think that if you had to pick one, then it is much easier to make a message broker do reliable request-reply/pull than to make a service mesh do reliable events/push.

      • sagichmal 5 years ago
        • discreteevent 5 years ago

          Good article, thanks:

          "A protocol is the rules and expectations of participants in a system, and how they are beholden to each other. A protocol defines who takes responsibility for failure.

          The problem with message brokers, and queues, is that no-one does.

          Using a message broker is not the end of the world, nor a sign of poor engineering. Using a message broker is a tradeoff. Use them freely knowing they work well on the edges of your system as buffers. Use them wisely knowing that the buck has to stop somewhere else. Use them cheekily to get something working.

          I say don’t rely on a message broker, but I can’t point to easy off-the-shelf answers. HTTP and DNS are remarkable protocols, but I still have no good answers for service discovery."

          I would not disagree with any of this, but I don't know enough about service meshes to know whether they remove the need for a protocol (meaning the edges still need to deal with failures after the mesh or broker has retried, etc.).

          One thing a broker gives you is that if the problem is that the other service is busy or down, then it will eventually get the message. So it avoids one specific kind of failure in the case where eventual consistency is acceptable.

  • jordanbeiber 5 years ago

    My take (I might be clueless, but this is my experience):

    The protocol has less to do with it; see gRPC, for example.

    Having run an operationally critical microservice infrastructure with Consul for service discovery, I realized that a mesh makes sense.

    When you start hitting a few hundred services you'll want:

    1. The observability. Scenario: why is the application slow? The application depends on two different services, each with its own set of downstream dependencies. (There's a sketch of what this asks of each service after this list.)

    2. Making the service discovery process more resilient. An outage of Consul without other means of discovery sucks!
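
    On point 1, the mesh sidecars can only stitch a request path together if each service forwards the tracing headers it received; roughly something like this (header names follow Zipkin's B3 convention, the downstream URL is made up):

        import requests

        TRACE_HEADERS = ["x-request-id", "x-b3-traceid", "x-b3-spanid",
                         "x-b3-parentspanid", "x-b3-sampled"]

        def call_downstream(incoming_headers, params):
            """Forward the trace headers so the mesh can attribute latency per hop."""
            outgoing = {h: incoming_headers[h] for h in TRACE_HEADERS if h in incoming_headers}
            # 'inventory.internal' is a hypothetical downstream dependency
            return requests.get("http://inventory.internal/stock",
                                params=params, headers=outgoing, timeout=1)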

    Message brokering, as well as event sourcing, seem to this layman like awesome patterns where eventual consistency is perfectly OK.

    In most scenarios it probably is, but for some it is not.

    Whatever works!

    • fogetti 5 years ago

      I don't really see how the service mesh solves eventual consistency. You might call a service which handles all its tasks asynchronously, similarly to a message in a message broker system.

      Service discovery is also inherent in the routing rules themselves when you use something like AMQP, for example.

      Observability might be something that the broker system lacks, I agree.

      • jordanbeiber 5 years ago

        Oh no, it doesn’t and it wasn’t what I wanted to imply.

        What I meant is that you might have scenarios where you want or require synchronous or continuous direct service-to-service communication.

  • pmlnr 5 years ago

    It's only better when you have a MASSIVE system and the chance of the message broker infrastructure not being capable enough becomes a problem.

nickstinemates 5 years ago

Jerome, I am always amazed at the quality and depth of your knowledge. Thanks for the lesson.

  • WestCoastJustin 5 years ago

    +1 for Jerome. Love reading these.

    Nick, long time no see. Hope things are well. Amazing who you find browsing these threads :)

kissgyorgy 5 years ago

Honestly, I've been learning every single day for 6 years, but there are a bunch of words in this article I simply don't know, and a bunch of abstractions whose need I don't understand. The future (maybe even the present) looks unnecessarily complex to me.

  • folkrav 5 years ago

    The "old" way still works, and won't stop doing so. If your deployment needs are not too fancy, these are still easier to reason with and maintain, too. e.g. a one-man operation who manages a handful of mom-and-pop businesses' websites, it makes no sense to go too fancy. A simple git clone on a simple LAMP setup, or hell, a shared host through FTP, works just fine.

    Introducing these things prematurely is just asking for trouble. They come with their share of complexity and overhead. However, they do solve some problems. In a lot of cases, if you don't know about these things yet, you probably don't need them.

peterwwillis 5 years ago

I think where service mesh goes wrong is when the design is monolithic or proprietary, where your whole cluster usually needs to use one particular set of compatible products.

At its core, a service mesh is just a complicated, multi-tier, higher-level router and VPN. For TCP/IP routing there are lots of protocols, from the local level to LAN to WAN and beyond. Many of them are designed for specific use cases. But even so, traditional routers aren't centralized, they don't care what other routers they connect to, and they all speak common languages, transfer information, make independent decisions, and act in isolation. This is a really good design that I think service meshes should adopt.

I think what we really want from a service mesh is a router for applications with a standard protocol that doesn't share state (or at least, where state is asynchronous and non-consistent, and only verified at the end of an operation, similar to IP routing). Additional features are needed, and those can be codified into the protocol. That would make the system more resilient to change and compatible with different products, rather than today, where each service mesh is basically a unique, incompatible router and protocol.

thewarrior 5 years ago

For someone who has no idea what most of this means, is there some place to get started understanding this stuff?

  • pm90 5 years ago

    No, it is not.

jacques_chester 5 years ago

> Istio was designed to work with Kubernetes; and if you want to use it outside of Kubernetes, you will need to run an instance of the Kubernetes API server (and a supporting etcd service).

Istio is used in Cloud Foundry without needing a Kubernetes master.

mrbonner 5 years ago

I tried to read a few articles on service meshes but still don't understand what they try to solve. Is it just another niche technology (i.e. the nodejs ecosystem)?

  • Bombthecat 5 years ago

    It solves the team / management problem that, for example, you have five teams and five to eight microservices. Maybe even more, pulling data from outside sources.

    Those five teams iterate on their service and product. And maybe even the outside service iterates on its product.

    How do you make sure that everyone is on the same page, is using the right version, and that the right service talks to the right service in the right way? Especially the outside service, for example?

  • imtringued 5 years ago

    Basically, you're running the load balancer on the same node as your app instead of running it centrally. If you have 200 nodes each with 1 Gbit, then a central load balancer needs at least two 200 Gbit network cards. The service mesh just lets service A nodes talk to individual service B nodes directly, which means each node only needs 1 Gbit. Do you have this problem? I doubt it, and even if you do, there are lots of hardware/cloud companies willing to take your money in exchange for solving it without changing the software.
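
    In code terms, the sidecar is doing roughly the client-side balancing sketched below, so no central box has to carry the aggregate traffic (the endpoint list is made up; in a mesh it would come from the control plane / service discovery):

        import random
        import requests

        # Hypothetical addresses for "service B" instances, as returned by service discovery
        SERVICE_B_ENDPOINTS = ["10.0.1.5:8080", "10.0.2.7:8080", "10.0.3.9:8080"]

        def call_service_b(path):
            """Each service A node picks a B node and talks to it directly,
            instead of funnelling every request through one central load balancer."""
            host = random.choice(SERVICE_B_ENDPOINTS)
            return requests.get("http://" + host + path, timeout=1)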

draw_down 5 years ago

I work mostly in JS these days. I would just like to say, it’s funny to read something like this and reflect on frontend development’s reputation for constant churn, always new tools to solve old problems.

While this article does a good job of describing why these services came to be (and the ways in which they are better than what came before), I wonder how many of them didn’t exist 5 years ago, or weren’t in common usage then?

Maybe our worlds are not so different.