JohnMakin 13 days ago

I'm fine with forcing upgrades this way - however, from an operations standpoint, it is an absolute nightmare.

For one, depending on your situation/CRDs/automation, doing these upgrades in-place can be next to impossible. EKS minor version upgrades can only be done one version at a time - e.g., if you want to go from 1.24 -> 1.28, you need to do 1.25, then 1.26, then 1.27, then 1.28. So teams without a lot of resources are probably in a tough spot depending on how far behind they are. Often, it's far more efficient to build an entirely new cluster from scratch and then cut over - which seems ridiculous.
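
(For the curious, the one-version-at-a-time dance looks roughly like this with boto3 - the cluster name is a placeholder, and node groups/add-ons still need their own upgrades between each hop:)

  # Sketch: step the control plane up one minor version at a time.
  import time
  import boto3

  eks = boto3.client("eks")
  cluster = "my-cluster"  # placeholder

  for target in ["1.25", "1.26", "1.27", "1.28"]:
      update_id = eks.update_cluster_version(name=cluster, version=target)["update"]["id"]
      # Wait for this hop to finish before starting the next one; node groups
      # and add-ons still need their own upgrades in between.
      while eks.describe_update(name=cluster, updateId=update_id)["update"]["status"] == "InProgress":
          time.sleep(60)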

Why is upgrading EKS versions such a pain? Well, if you're using any cluster add-ons, for one, all of those need to be upgraded to the correct versions, and the compatibility matrix there can be rough. Stuff often breaks at this stage. Care needs to be taken around PVs, the CNI, and god help you if you have some helm charts or CRDs that rely on a deprecated Kubernetes API - even if the upstream repository has a fix for it, you will often find yourself in a yak-shaving nightmare of fixing all the stuff that breaks when you upgrade that, and then whatever downstream services THAT service breaks - etc.
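
(A rough way to sanity-check the managed add-on side before a hop, assuming boto3 - the add-on name and target cluster version here are just examples:)

  # Sketch: list add-on versions compatible with the target cluster version.
  import boto3

  eks = boto3.client("eks")
  resp = eks.describe_addon_versions(kubernetesVersion="1.28", addonName="vpc-cni")
  for addon in resp["addons"]:
      for v in addon["addonVersions"]:
          print(addon["addonName"], v["addonVersion"])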

What is the solution? I don't know. I'm not a kubernetes architect, but I work with it a lot. I understand there are security patches and improvements constantly, but the release cycle, at least from an infrastructure/operations perspective, IME places considerable strain on teams, to the point where I have literally seen a role in a company whose primary responsibility was upgrading EKS cluster versions.

I have a sneaking suspicion this is to try to encourage people to migrate to more expensive managed container orchestration services.

  • watermelon0 13 days ago

    The EKS release cycle follows the Kubernetes release cycle. I'm not sure it's fair to expect AWS to freely support outdated K8s versions that don't have upstream support.

    If K8s were backwards compatible, upgrading would be a lot easier, and if it supported LTS releases, like other projects do, manual upgrades would only be needed every X years.

    For example, the reason you can run the same PostgreSQL major version for 5 years on RDS is that PostgreSQL actively supports it, and minor versions are non-breaking and can be applied seamlessly (a restart or failover to the standby replica is still needed during the upgrade).
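
    (Roughly what that looks like on the RDS side, assuming boto3 - the instance identifier is a placeholder:)

      # Sketch: opt an instance into automatic, non-breaking minor upgrades.
      import boto3

      rds = boto3.client("rds")
      rds.modify_db_instance(
          DBInstanceIdentifier="my-postgres",  # placeholder
          AutoMinorVersionUpgrade=True,        # minor versions applied automatically
          ApplyImmediately=False,              # wait for the maintenance window
      )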

    • JohnMakin 13 days ago

      Completely understand why it is this way, and like I said I don't know the solution - unless AWS were able or willing to fork Kubernetes in the same way they did Elasticsearch, though it's understandable why they may not want to do that. Was mostly just griping that this process is a complete pain in the ass for tons of people (IME).

  • rho138 13 days ago

    I recently did the upgrade from 1.24 -> 1.28 on a neglected cluster after testing the upgrade in a dev environment, and it was honestly not that terrible. It really comes down to having the capability and the man-hours to manage the procedure. In reality, the longest part was waiting for cluster nodes to upgrade to X version of k8s, but the complete upgrade only took 3 weeks of testing and a single 4-hour outage, with no loss in processing over the period.

    Realistically, the workloads being run would have been better suited to a horizontally-scaling EC2 deployment, but that was a future goal that never came to fruition.

    • JohnMakin 13 days ago

      Like I said, it depends on your situation. Sometimes a v1beta1 API gets deprecated and causes complete chaos for a deployment. Sometimes your IaC is resistant to these kinds of frequent changes. There are really a billion scenarios.

      For reference, I have done upgrades from 1.12 -> 1.28, and most of the time, if I get into a messy project and can get away with it, I will just rebuild the cluster from scratch.

  • easton 13 days ago

    > try to encourage people to migrate to more expensive managed container orchestration services.

    The question is: what service? ECS is the competing Amazon-built service, and it’s entirely free for management; you just pay for compute. We don’t use k8s because ECS is free and we don’t plan on leaving AWS.

    Sure, you’re more locked in with ECS, but if you aren’t doing funky stuff with the APIs you can probably off-ramp to k8s pretty easily. I know I could move us in a week or less; we’d have far bigger problems with the other AWS services we use.

    • JohnMakin 13 days ago

      Lightsail? I’m sure there are more examples other than ECS.

    • pas 13 days ago

      How are you managing/configuring/monitoring/understanding ECS? (Terraform?) To me it's complete spaghetti with a thick WTF-sauce. At least with k8s there's YAML and only YAML. (And there's cdk8s, which supports tests, and spinning up a new cluster to test things is straightforward.)

      Sure, I guess there's a whole industry that offers services to help manage AWS, but at that point the whole thing could just be spun off to the lowest bidder. (Although I understand that people can come up with a never-ending list of reasons (excoughses) to keep paying the AWS tax.)

      ... Okay, I'm probably too salty. If it works, it works; if it's profitable and the business is happy, it's hard to argue with the stack.

      • easton 12 days ago

        CloudFormation for us. CF is wack, but the ECS concepts seemed as easy as k8s.

  • noctarius 13 days ago

    Hadn't thought of this suspicion beforehand, but it doesn't sound like a total miss.

  • pid-1 13 days ago

    As K8s matures it's likely we will get some kind of LTS versioning scheme.

    Having new releases so often for such a core infrastructure component is kinda insane unless it was explicitly architected to allow seamless upgrades.

    • mdaniel 13 days ago

      There's a tiny bit of nuance there about "allow seamless upgrades" in that they do what I think is a fantastic job of version skew toleration between all the parts that interact (kubectl, kubelet, apiserver, etc). So that part, I think, is not the long pole in any such tent, especially because if the control plane gets wiped out, kubelet will continue to manage the last state of affairs it knew about, and traffic will continue to flow to those pods.
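
      (A rough skew sanity check, assuming kubectl is on the PATH and pointed at the cluster:)

        # Sketch: warn if kubectl and the apiserver are more than one minor apart.
        import json
        import subprocess

        out = json.loads(subprocess.check_output(["kubectl", "version", "-o", "json"]))
        client = int("".join(c for c in out["clientVersion"]["minor"] if c.isdigit()))
        server = int("".join(c for c in out["serverVersion"]["minor"] if c.isdigit()))
        if abs(client - server) > 1:
            print("kubectl/apiserver skew exceeds the supported window")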

      The hairy bit is the rando junk that gets shoved into clusters, without any sane packaging scheme to roll it up or back. I even recently had to learn the deep guts of the sh.helm.v1.foo secret because we accidentally left an old release in a cluster that no longer supported its apiVersion. No problem, says I, $(helm uninstall && helm install --version new-thing), but har-de-har-har, helm uses that Secret to fully rehydrate the whole manifest of the release before deleting it, so when helm tries (effectively) kubectl delete thing/v1beta1/oldthing and pukes, well, no uninstall for you, even if those objects are already gone.
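
      (For anyone who ends up in the same place, a rough sketch of peeking inside one of those release Secrets - the secret name below is made up; real ones follow the sh.helm.release.v1.<name>.v<revision> pattern:)

        # Sketch: decode the release payload Helm stores in the Secret.
        import base64
        import gzip
        import json
        import subprocess

        raw = subprocess.check_output([
            "kubectl", "get", "secret", "sh.helm.release.v1.oldthing.v1",  # placeholder
            "-n", "default", "-o", "jsonpath={.data.release}",
        ])
        # .data.release is base64 (from the API) wrapping base64(gzip(json)).
        release = json.loads(gzip.decompress(base64.b64decode(base64.b64decode(raw))))
        print(release["manifest"])  # the full manifest Helm will try to delete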

    • noctarius 13 days ago

      I hope you're right. Apart from that, yes I think it's necessary.

  • cjk2 13 days ago

    Yeah this. My average day when I go near EKS upgrades: Waltz in, fuck up the ALB ingress controller in some new and interesting way, spend all day bouncing AWS support tickets around, find out it was AWS's fault, find half the manifest YAML schema in the universe is now deprecated, sob into my now soaking wet trousers and wonder why the fuck I ended up doing this for a living.

    Yesterday I spent 3 hours trying to fix something, only to find it was an indentation error somewhere.

htrp 14 days ago

This is also the right way to deprecate. Charge people an arm and a leg to keep things running (and eventually force them to migrate).

  • solatic 14 days ago

    100%. People are responsible for an ever-increasing number of things; people will focus on business priorities, and stuff that is working will be left the hell alone. As long as the bills are manageable and the business pays - the lights will be kept on forever. Passing increasing support costs to customers realigns interests between customer and provider without the danger of user impact.

    And for Kubernetes, honestly, charging 6x for extended support is probably a bargain, considering the pace of change and difficulty of hiring engineers for unsexy maintenance work.

    • mdaniel 14 days ago

      I do appreciate that the devil is always in the details, but I'll be straight: their new(?) "Upgrade insights" tab/API <https://docs.aws.amazon.com/eks/latest/userguide/cluster-ins...> goes a long way toward driving down the upgrade risk from a "well, what are we using that's going to get cut in the new version?" standpoint.
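
      (A rough sketch of pulling those insights programmatically, assuming a recent boto3 that includes the EKS insights API - the cluster name is a placeholder:)

        # Sketch: dump upgrade insights for a cluster before planning a bump.
        import boto3

        eks = boto3.client("eks")
        for insight in eks.list_insights(clusterName="my-cluster")["insights"]:
            print(insight["name"], insight["insightStatus"]["status"])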

      We just rolled off of their extended version, and it was about 19 minutes to upgrade the control plane with no downtime, and then anywhere between 10 minutes and over an hour to upgrade the vpc-cni add-on. It seemed just completely random, and there was no cancel button. We also had to manually patch the kube-proxy container version, which OT1H they did document, but OTOH, well, I didn't put those DaemonSets on the Nodes, so why do I suddenly have to manage its version? Weird.
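
      (Roughly what those two chores look like, assuming boto3 plus kubectl - the cluster name, add-on version, and kube-proxy image below are placeholders; take the real values from the EKS docs for your target version:)

        import subprocess
        import boto3

        eks = boto3.client("eks")
        # Managed add-on: bump vpc-cni to a version compatible with the new cluster.
        eks.update_addon(
            clusterName="my-cluster",           # placeholder
            addonName="vpc-cni",
            addonVersion="v1.16.0-eksbuild.1",  # placeholder
            resolveConflicts="OVERWRITE",
        )

        # Self-managed kube-proxy DaemonSet: patch the container image by hand.
        subprocess.check_call([
            "kubectl", "set", "image", "daemonset/kube-proxy", "-n", "kube-system",
            "kube-proxy=<regional-eks-registry>/eks/kube-proxy:<target-version>",  # placeholder
        ])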

      Touching the CNI is always a potentially downtime-inducing event, but for the most part it was manageable.

  • TheP1000 14 days ago

    Agreed. I would imagine the previous approach of forced upgrades ended up burning lots of customers in worse ways than just their pocketbook.

  • noctarius 14 days ago

    True, but I guess it'll be a surprise to many. And, unfortunately, upgrading isn't always the easiest thing with deprecations and stuff

noctarius 14 days ago

Article by Mary Henry. I was shocked to see how much higher the extended support cost (per hour) is for Kubernetes on AWS.

Haven't had that situation myself on AWS yet, but I ran into it a few times on Azure.

I can't remember having paid extra on Azure though, but maybe we did. Certainly not 6x the price, though.

PS: Not sure why it got flagged the first time, but I think it's because I used a different title. Sorry.

  • res0nat0r 13 days ago

    We just got emails yesterday about the EKS price increase. It's another reason we're trying to move the main app to the vendor's SaaS: I don't have enough time and resources to be a full-time k8s admin. The ecosystem moves way too fast, and upgrades/deprecations happen way too quickly to keep up and to have time to test / plan / roll out proper upgrades without breaking our critical production workloads.

  • qqtt 14 days ago

    AWS also recently ended support for MySQL 5, so if you had an RDS instance with that version running past the cutoff, your support costs ballooned exorbitantly.
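
    (A rough way to check whether you're exposed, assuming boto3 - this just lists instances still on 5.7:)

      # Sketch: spot RDS instances headed for extended-support billing.
      import boto3

      rds = boto3.client("rds")
      for page in rds.get_paginator("describe_db_instances").paginate():
          for db in page["DBInstances"]:
              if db["Engine"] == "mysql" and db["EngineVersion"].startswith("5.7"):
                  print(db["DBInstanceIdentifier"], db["EngineVersion"])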

    • VectorLock 14 days ago

      Yup this one hit me hard. USE2-ExtendedSupport:Yr1-Yr2:MySQL5.7 sent my bill up 70%.

      • hughesjj 14 days ago

        How long was it between the notice and you getting charged extra?

    • noctarius 14 days ago

      Seems like I'm one of the lucky ones - using neither RDS nor MySQL. But seriously, ouch. I mean, I get why they want people to migrate to supported versions, but ...

      • SteveNuts 14 days ago

        I wish we could implement this internally via chargebacks. The teams that refuse to upgrade their stuff should be forced to pay for the externalities they cause.

chrisjj 14 days ago

> running unsupported versions makes it harder to get help from a community that’s currently focused on the latest version

Great example of misuse of that simple word 'that'.

Should be 'which'.

  • TecoAndJix 14 days ago

    Always learning something new[1]:

    "The difference between which and that depends on whether the clause is restrictive or nonrestrictive.

    In a restrictive clause, use that.

    In a nonrestrictive clause, use which.

    Remember, which is as disposable as a sandwich wrapper. If you can remove the clause without destroying the meaning of the sentence, the clause is nonessential (another word for nonrestrictive), and you can use which."

    [1] https://www.grammarly.com/blog/which-vs-that/#:~:text=Which%....

thebeardisred 13 days ago

This is something most people don't realize is an aspect of Red Hat's value. Extended Lifecycle Support (ELS) + Extended Update Support (EUS) are available _just in case_ you really can't figure out how to migrate off of those Red Hat Enterprise Linux 6 systems running on x86 (32 bit). https://access.redhat.com/support/policy/updates/errata

VectorLock 14 days ago

Had this bite me on my small-scale personal AWS setup. I have an AWS account I run some personal sites on, a Mastodon instance, etc. Some Billing Alarms I set up alerted me that my bill went from the usual $100 to $180: a $75 charge for USE2-ExtendedSupport:Yr1-Yr2:MySQL5.7. I mean, I'm very used to Amazon's ridiculous fee structure, but even this one threw me for a loop.
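
(For reference, a rough sketch of that kind of billing alarm with boto3 - the threshold and SNS topic ARN are placeholders, and billing metrics only live in us-east-1:)

  # Sketch: alarm when the month-to-date bill crosses a threshold.
  import boto3

  cw = boto3.client("cloudwatch", region_name="us-east-1")
  cw.put_metric_alarm(
      AlarmName="monthly-bill-over-150",
      Namespace="AWS/Billing",
      MetricName="EstimatedCharges",
      Dimensions=[{"Name": "Currency", "Value": "USD"}],
      Statistic="Maximum",
      Period=21600,  # 6 hours
      EvaluationPeriods=1,
      Threshold=150.0,
      ComparisonOperator="GreaterThanThreshold",
      AlarmActions=["arn:aws:sns:us-east-1:111122223333:billing-alerts"],  # placeholder
  )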

  • steelaz 14 days ago

    To be fair to AWS, they announced the deprecation of MySQL 5.7 in January 2021, and many emails warned of this change throughout 2024.

    • VectorLock 5 days ago

      Deprecating a service is one thing.

      Charging an arm and a leg for a deprecated service is another.

  • noctarius 14 days ago

    Ouch. Glad you had the alarm (and that it reacted "early enough"). Anyhow, I think you may not be alone with that surprise.

bushbaba 13 days ago

Why the AWS hate when this is an issue with the K8s/CNCF team's constant churn? There needs to be a CNCF-blessed LTS release of k8s. AWS is just filling a gap here, with all the headaches involved in backporting security patches.

abrookewood 13 days ago

Doesn't just apply to EKS - we are currently going through the same thing with MySQL on RDS. It's a big jump in support cost, but at the same time, I understand why they are doing it.