edelbitter 14 hours ago

Why does the title say "Zero Trust", when the article explains that this only works as long as every involved component of the Cloudflare MitM keylogger and its CA can be trusted? If host keys are worthless because you do not know in advance what key the proxy will have... then this scheme is back to trusting servers merely because they are in Cloudflare address space, no?

  • hedora 10 hours ago

    Every zero trust architecture ends up trusting an unbounded set of machines. Like most marketing terms, it’s probably easier to assume it does the inverse of what it claims.

    My mental model:

    With 1-trust (the default), any trusted machine with credentials is granted access and therefore carries one unit of trust. With 2-trust, we'd need at least two units of trust, so two machines. Equivalently, each credential-bearing machine is half trusted (think ssh bastion hosts or 2FA / mobikeys for 2-trust).

    This generalizes to 1/N, so for zero trust, we place 1/0 = infinite units of trust in every machine that has a credential. In other words, if we provision any one machine for access, we necessarily provision an unbounded number of other machines for the same level of access.
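
    Written out (just formalizing the snark, my notation and nothing official):

        \text{machines required} = N, \qquad \text{trust per machine} = \frac{1}{N}

        N=1:\ 1 \text{ unit each} \qquad N=2:\ \tfrac{1}{2} \text{ unit each} \qquad N \to 0^{+}:\ \tfrac{1}{N} \to \infty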

    As snarky as this math is, I’ve yet to see a more accurate formulation of what zero trust architectures actually provide.

    YMMV.

    • choeger 7 hours ago

      I think your model is absolutely right. But there's a catch: Zero Trust (TM) is about not giving any machine any particular kind of access. So it's an infinite number of machines with zero access.

      The point of Zero Trust (TM) is to authenticate and authorize the human being behind the machine, not the machine itself.

      (Clearly, that doesn't work for all kinds of automated access, and it comes with a lot of questions in terms of implementation details (e.g., do we trust the 2FA device?), but that's the gist.)

  • varenc 12 hours ago

    https://www.cloudflare.com/learning/security/glossary/what-i...

    Zero Trust just means you stop inherently trusting your private network and verify every user/device/request regardless. If you opt in to using Cloudflare to do this then it requires running Cloudflare software.

    • michaelt an hour ago

      Zero Trust means you stop trusting your private network, and start trusting Cloudflare, and installing their special root certificate so they can MITM all your web traffic. To keep you safe.

    • PLG88 11 hours ago

      That's one interpretation... ZT also posits assuming the network is compromised and hostile, and that applies to CF and their cloud/network too. It blows my mind that so many solutions claim ZT while mandating TLS into their infra/cloud, where you have to trust their decryption of your data, and, worst IMHO, they will MITM your OIDC/SAML keys to ensure the endpoint can authenticate and access services... that is a hell of a lot of implicit trust in them, not least the possibility of them being served a court order to decrypt your data.

      Zero trust done correctly does not have those same drawbacks.

      • sshine 11 hours ago

        One element is buzzword inflation, and another is raising the bar.

        On the one hand, entirely trusting Cloudflare isn't really zero trust.

        On the other hand, not trusting any network is one narrow definition.

        I'll give you SSH keys when you pry them from my cold, dead FDE SSDs.

    • bdd8f1df777b 12 hours ago

      But with public key auth I'm already distrusting everyone on my private network.

      • resoluteteeth 12 hours ago

        Technically I guess that's "zero trust" in the sense of meeting the requirement of not trusting internal connections more than external ones, but in practice I guess "zero trust" also typically entails making every connection go through the same user-based authentication system, which uploading specific keys to specific servers manually definitely doesn't achieve.

  • fs111 4 hours ago

    Zero Trust is a marketing label that executives can seek out and buy a product for, because it is the super-hot thing to have these days. That's mostly it.

  • pjc50 an hour ago

    > Cloudflare MitM keylogger

    Would you like to explain what you mean by this?

    • jdbernard an hour ago

      Different responder, but I imagine they are referring to CloudFlare's stated ability to:

      Provide command logs and session recordings to allow administrators to audit and replay their developers’ interactions with the organization’s infrastructure.

      The only way they can do this is if they record and store the session text, effectively a keylogger between you and the machine you are SSH'ing into.

  • ozim 7 hours ago

    “Zero Trust” means not assuming a user has access or is somehow trusted just because they are in a trusted context. So you always check the user's access rights.

    TLS having a trusted CA cert publisher is not what “Zero Trust” is about.

  • znpy 3 hours ago

    > Why does the title say "Zero Trust", when the article explains that this only works as long as every involved component of the Cloudflare MitM keylogger and its CA can be trusted?

    The truthiness of "zero trust" really depends on who's trusting whom.

TechnicalVault 2 hours ago

The whole MITM thing just makes me deeply uncomfortable; it introduces a single point of trust with the keys to the kingdom. If I want to log what someone is doing, I do it server-side, e.g. some kind of rsyslog. That way I can leverage existing log anomaly detection systems to pick up bad behaviour and isolate the server if any is detected.

  • naikrovek 20 minutes ago

    yeah the MITM thing is ... concerning.

    this just moves the trusted component from the SSH key to Cloudflare, and you still must trust something implicitly. except now it's a company that has agency and a will of its own instead of just some files on a filesystem.

    I'll stick to forced key rotation, thanks.

tptacek 13 hours ago

I'm a fan of SSH certificates and cannot understand why anyone would set up certificate authentication with an external third-party CA. When I'm selling people on SSH CA's, the first thing I usually have to convince them of is that I'm not saying they should trust some third party. You know where all your servers are. External CAs exist to solve the counterparty introduction problem, which is a problem SSH servers do not have.
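
For anyone who hasn't set one up: running your own CA is a couple of ssh-keygen invocations plus one sshd_config line. A minimal sketch (paths, principal names and validity are illustrative):

    # one time: create the CA keypair; guard the private half
    ssh-keygen -t ed25519 -f user_ca -C "internal ssh user CA"

    # on every server, in sshd_config: trust certs signed by that CA
    #   TrustedUserCAKeys /etc/ssh/user_ca.pub

    # per user: sign their public key with a short validity and an allowed principal
    ssh-keygen -s user_ca -I alice@example.com -n alice -V +8h ~/.ssh/id_ed25519.pub
    # this writes ~/.ssh/id_ed25519-cert.pub, which the ssh client presents automatically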

  • michaelt an hour ago

    > I'm a fan of SSH certificates and cannot understand why anyone would set up certificate authentication with an external third-party CA.

    I think the sales pitch for these sorts of service is: "Get an SSH-like experience, but it integrates with your corporate single-sign-on system, has activity logs that can't be deleted even if you're root on the target, sorts out your every-ephemeral-cloud-instance-has-a-different-fingerprint issues, and we'll sort out all the reverse-tunnelling-through-NAT and bastion-host-for-virtual-private-cloud stuff too"

    Big businesses pursuing SOC2 compliance love this sort of thing.

  • kevin_nisbet 9 hours ago

    I'm with you. I imagine it's mostly people just drawing parallels: they can figure out how to get a web certificate, so they think SSH is the same thing.

    The second order problem I've found is that when you dig in, there are plenty of people who ask for certs but, when push comes to shove, really want functionality where, when a user's access is cancelled, all active sessions get torn down immediately as well.

  • xyst 13 hours ago

    Same reasons companies are still buying "CrowdStrike" and installing that crapware. It's all for regulatory checkboxes (i.e., FedRAMP certification).

    • tptacek 12 hours ago

      I do not believe you in fact need any kind of SSH CA, let alone one run by a third party, to be FedRAMP-compliant.

nanis an hour ago

> the SSH certificates issued by the Cloudflare CA include a field called ValidPrinciples

Having implemented similar systems before, I was interested to read this post. Then I see this. Now I have to find out if that really is the field, if this was ChatGPT spellcheck, or something else entirely.

  • blueflow an hour ago

    For the others: The correct naming is "principals".
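
    You can see the real field by dumping a cert (output trimmed, values made up):

        $ ssh-keygen -L -f id_ed25519-cert.pub
                Type: ssh-ed25519-cert-v01@openssh.com user certificate
                Key ID: "alice@example.com"
                Valid: from 2024-06-01T09:00:00 to 2024-06-01T17:00:00
                Principals:
                        alice
                Extensions:
                        permit-pty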

blueflow an hour ago

Instead of stealing your password/keypair, the baddies will now have to spoof your authentication with Cloudflare. If that's just a password, you gained nothing. If you have 2FA set up for that, you could equally use it for SSH directly, using an SSH key on a physical FIDO stick. OpenSSH already has native support for that (ecdsa-sk and ed25519-sk key formats).
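
Generating and using one of those, for reference (needs OpenSSH 8.2+ and a FIDO2 token plugged in; the filename is arbitrary):

    ssh-keygen -t ed25519-sk -f ~/.ssh/id_ed25519_sk
    ssh -i ~/.ssh/id_ed25519_sk user@host    # tap the token to complete authentication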

The gain here is minimal.

mdaniel 15 hours ago

I really enjoyed my time with Vault's ssh-ca (back when it had a sane license) but have now grown up and believe that any ssh access is an antipattern. For context, I'm also one of those "immutable OS or GTFO" chaps, because in my experience the next thing that happens after some rando ssh-es into a machine is they launch vi or apt-get or whatever, and now it's a snowflake with zero auditing of the actions taken on it.

I don't mean to detract from this, because short-lived creds are always better, but for my money I hope I never have sshd running on any machine again

  • blueflow 35 minutes ago

    Even for immutable OSes, SSH is a great protocol for bidirectionally authenticated data / file transfer.

  • akira2501 8 hours ago

    > any ssh access is an antipattern.

    Not generally. In one particular class of deployments, allowing ssh access to root-enabled accounts without auditing may be... but this is an exceptionally narrow definition.

    > I hope I never have sshd running on any machine again

    Sounds great for production and ridiculous for development and testing.

  • advael 10 hours ago

    Principle of least privilege trivially prevents updating system packages. Like if you don't want people using apt, don't give people root on your servers?

  • ozim 14 hours ago

    How do you handle the DB?

    Stuff I work on is write heavy, so spawning dozens of app copies doesn't make sense if I just hog the DB with write locks.

    • mdaniel 14 hours ago

      I must resist the urge to write "users can access the DB via the APIs in front of it" :-D

      But, seriously, Teleport (back before they did a licensing rug-pull) is great at that and no SSH required. I'm super positive there are a bazillion other "don't use ssh as a poor person's VPN" solutions

      • zavec 13 hours ago

        This led me to google "teleport license," which sounds like a search from a much more interesting world.

        • aspenmayer 10 hours ago

          You might be interested in Peter F. Hamilton's Commonwealth Saga sci-fi series, then.

          Among other tech, it involves the founding of a megacorp that exploits the discovery and monopolization of wormhole technology for profit, causing a rift between the two founders, who each remind me of Steve Jobs and Steve Wozniak in their cooperation and divergence.

          https://en.wikipedia.org/wiki/Commonwealth_Saga

          • Hikikomori 2 hours ago

            Yo, dudes, how’s it hanging?

            • aspenmayer 2 hours ago

              Is this a reference to the books? It's been a while since I read them.

              • Hikikomori 2 hours ago

                It's what Ozzie or Nigel says over the radio after they landed.

                • aspenmayer 2 hours ago

                  Ah yeah, that's a great scene! The bravado and hubris of gatecrashing an interplanetary livestream to launch your startup out of stealth is just chef's kiss.

  • ashconnor 10 hours ago

    You can audit if you put something like hoop.dev, Tailscale, Teleport or Boundary in between the client and server.

    Disclaimer: I work at Hashicorp.

  • riddley 14 hours ago

    How do you troubleshoot?

    • bigiain 14 hours ago

      I think ssh-ing into production is a sign of not fully mature devops practices.

      We are still stuck there, but we're striving to get to the place where we can turn off sshd on Prod and rely on the CI/CD pipeline to blow away and reprovision instances, and be 100% confident we can test and troubleshoot in dev and stage and by looking at off-instance logs from Prod.

      How important it is to get there, and my motivations for it, is something I ponder - it's clearly not worthwhile if your project is one or 2 prod servers perhaps running something like HA WordPress, but it's obvious that at Netflix-type scale nobody is sshing into individual instances to troubleshoot. We are a long way (a long long long long way) from Netflix scale, and are unlikely to ever get there. But somewhere between dozens and hundreds of instances is about where I reckon the work required to get close to there starts paying off.

      • sleepydog an hour ago

        It's a good mindset to have, but I think ssh access should still be available as a last resort on prod systems, and perhaps trigger some sort of postmortem process, with steps to detect the problem without ssh in the future. There is always going to be a bug, that you cannot reproduce outside of prod, that you cannot diagnose with just a core dump, and that is a show stopper. It's one thing to ignore a minor performance degradation, but if the problem corrupts your state you cannot ignore it.

        Moreover, if you are in the cloud, part of your infrastructure is not under your control, making it even harder to reproduce a problem.

        I've worked with companies at Netflix's scale and they still have last-resort ssh access to their systems.

      • imiric 13 hours ago

        Right. The answer is having systems that are resilient to failure, and if they do fail being able to quickly replace any node, hopefully automatically, along with solid observability to give you insight into what failed and how to fix it. The process of logging into a machine to troubleshoot it in real-time while the system is on fire is so antiquated, not to mention stressful. On-call shouldn't really be a major part of our industry. Systems should be self-healing, and troubleshooting done during working hours.

        Achieving this is difficult, but we have the tools to do it. The hurdles are often organizational rather than technical.

        • bigiain 13 hours ago

          > The hurdles are often organizational rather than technical.

          Yeah. And in my opinion "organizational" reasons can (and should) include "we are just not at the scale where achieving that makes sense".

          If you have single digit numbers of machines, the whole solid observability/automated node replacement/self-healing setup overhead is unlikely to pay off. Especially if the SLAs don't require 2am weekend hair-on-fire platform recovery. For a _lot_ of things, you can almost completely avoid on-call incidents with straightforward redundant (over provisioned) HA architectures, no single points of failure, and sensible office-hours-only deployment rules (and never _ever_ deploy to Prod on a Friday afternoon).

          Scrappy startups, and web/mobile platforms for anything where a few hours of downtime is not going to be an existential threat to the money flow or a big story in the tech press - probably have more important things to be doing than setting up log aggregation and request tracing. Work towards that, sure, but probably prioritise the dev productivity parts first. Get your CI/CD pipeline rock solid. Get some decent monitoring of the redundant components of your HA setup (as well as the Prod load balancer monitoring) so you know when you're degraded but not down (giving you some breathing space to troubleshoot).

          And aspire to fully resilient systems and have a plan for what they might look like in the future, to avoid painting yourself into a corner that makes it harder than necessary to get there one day.

          But if you've got a guy spending 6 months setting up chaos monkey and chaos doctor for your WordPress site that's only getting a few thousand visits a day, you're definitely doing it wrong. Five nines are expensive. If your users are gonna be "happy enough" with three nines or even two nines, you've probably got way better things to do with that budget.

          • Aeolun 12 hours ago

            > For a _lot_ things, you can almost completely avoid on-call incidents with straightforward redundant (over provisioned) HA architectures, no single points of failure, and sensible office hours only deployment rules (and never _ever_ deploy to Prod on a Friday afternoon).

            For a lot of things the lack of complexity inherent in a single VPS server will mean you have better availability than any of those bizarrely complex autoscaling/recovery setups

          • imiric 6 hours ago

            I'm not so sure about all of that.

            The thing is that all companies regardless of their scale would benefit from these good practices. Scrappy startups definitely have more important things to do than maintaining their infra, whether that involves setting up observability and automation or manually troubleshooting and deploying. Both involve resources and trade-offs, but one of them eventually leads to a reduction of required resources and stability/reliability improvements, while the other leads to a hole of technical debt that is difficult to get out of if you ever want to improve stability/reliability.

            What I find more harmful is the prevailing notion that "complexity" must be avoided at smaller scales, and that somehow copying a binary to a single VPS is the correct way to deploy at this stage. You see this in the sibling comment from Aeolun here.

            The reality is that doing all of this right is an inherently complex problem. There's no getting around that. It's true that at smaller scales some of these practices can be ignored, and determining which is a skill on its own. But what usually happens is that companies build their own hodgepodge solutions to these problems as they run into them, which accumulate over time, and they end up having to maintain their Rube Goldberg machines in perpetuity because of sunk costs. This means that they never achieve the benefits they would have had they just adopted good practices and tooling from the start.

            I'm not saying that starting with k8s and such is always a good idea, especially if the company is not well established yet, but we have tools and services nowadays that handle these problems for us. Shunning cloud providers, containers, k8s, or any other technology out of an irrational fear of complexity is more harmful than beneficial.

      • xorcist 5 hours ago

        > at Netflix type scale that nobody is sshing into individual instances to troubleshoot

        Have you worked at Netflix?

        I haven't, but I have worked with large scale operations, and I wouldn't hesitate to say that the ability to ssh (or other ways to run commands remotely, which are all either built on ssh or likely not as secure and well tested) is absolutely crucial to running at scale.

        The more complex and heterogeneous the environments you have, the more likely you are to encounter strange flukes. Handshakes that only fail a fraction of a percent of the time, and so on. Interactions between multiple products and providers. Tools like tcpdump and eBPF become essential.

        Why would you want to deploy on a mature operating system such as Linux and not use tools such as eBPF? I know the modern way is just to yolo it and restart stuff that crashes, but as a startup or small scale you have other things to worry about. When you are at scale you really want to understand your performance profile and iron out all the kinks.

        • Hikikomori 2 hours ago

          Can also use stuff like Datadog NPM/APM, which uses eBPF to pick up most of what you need. It's been a long time since I've needed anything else.

      • naikrovek 8 minutes ago

        > I think ssh-ing into production is a sign of not fully mature devops practices.

        that's great and completely correct when you are one of the very few places in the universe where everything is fully mature and stable. the rest of us work on software. :)

      • otabdeveloper4 6 hours ago

        A whole lot of words to say "we don't troubleshoot and just live with bugs, #yolo".

    • mdaniel 14 hours ago

      In my world, if a developer needs access to the Node upon which their app is deployed to troubleshoot, that's 100% a bug in their application. I am cognizant that being whole-hog on 12 Factor apps is a journey, but for my money get on the train because "let me just ssh in and edit this one config file" is the road to ruin when no one knows who edited what to set it to what new value. Running $(kubectl edit) allows $(kubectl rollout undo) to put it back, and also shows what was changed from what to what
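
      e.g. (deployment name made up):

          kubectl edit deployment/myapp            # a pod-template change lands as a new revision
          kubectl rollout history deployment/myapp # list revisions
          kubectl rollout undo deployment/myapp    # roll back to the previous revision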

      • megous a minute ago

        Your world is very narrow and limited. Some devs also have to deal with customer-provisioned HW infrastructure; with buggy interactions between HW/virtualization solutions that duplicate all packets for a few seconds every 5 minutes; with applications that interact with customer-only on-site HW that you can only reach remotely via the production deployment; with quirky virtualization like VMware stopping the vCPU on you for hundreds of ms if you load it too much, which you'll not replicate locally; with things you can't predict you'll need to observe ahead of time, etc. And it does not involve editing any configs. It's just troubleshooting.

      • yjftsjthsd-h 14 hours ago

        How do you debug the worker itself?

        • mdaniel 14 hours ago

          Separate from my sibling comment about AWS SSM, I also believe that if one cannot know that a Node is sick by the metrics or log egress from it, that's a deployment bug. I'm firmly in the "Cattle" camp, and am getting closer and closer to the "Reverse Uptime" camp - made easier by ASG's newfound "Maximum Instance Lifetime" setting, which makes it basically one-click to get onboard that train

          Even as I type all these answers out, I'm super cognizant that there's not one hammer for all nails, and I am for sure guilty of yanking Nodes out of the ASG in order to figure out what the hell has gone wrong with them, but I try very very hard not to place my Nodes in a precarious situation to begin with so that such extreme troubleshooting becomes a minor severity incident and not Situation Normal

          • __turbobrew__ 11 hours ago

            If accidentally nuking a single node while debugging causes issues you have bigger problems. Especially if you are running kubernetes any node should be able to fall off the earth at any time without issues.

            I agree that you should set a maximum lifetime for a node on the order of a few weeks.

            I also agree that you shouldn't be giving randos access to production infra, but at the end of the day there need to be some people at the company who have the keys to the kingdom, because you don't know what you don't know and you need to be able to deal with unexpected faults or outages of the telemetry and logging systems.

            I once bootstrapped an entire datacenter with tens of thousands of nodes from an SSH terminal after an abrupt power failure. It turns out infrastructure has lots of circular dependencies and we had to manually break that dependency.

            • ramzyo 8 hours ago

              Exactly this. Have heard it referred to as "break glass access". Some form of remote access, be it SSH or otherwise, in case of serious emergency.

          • viraptor 9 hours ago

            Passive metrics/logs won't let you debug all the issues. At some point you either need a system for automatic memory dumps and submitting bpf scripts to live nodes... or you need SSH access to do that.

            • otabdeveloper4 6 hours ago

              This "system for automatic dumps" 100 percent uses ssh under the hood. Probably with some eternal sudo administrator key.

              Personal ssh access is always better (from a security standpoint) than bot tokens and keys.

              • viraptor 3 hours ago

                There's a thousand ways to do it without SSH. It can be built into the app itself. It can be a special authenticated route to a suid script. It can be built into the current orchestration system. It can be pull-based using a queue for system monitoring commands. It can be part of the existing monitoring agent. It can be run through AWS SSM. There's really no reason it has to be SSH.

                And even for SSH, you can have special keys with access authorised for only specific commands, so a service account would be better than a personal one in that case.

        • from-nibly 11 hours ago

          You don't. You shoot it in the head and get a new one. If you need logging / telemetry, bake it into the image.

          • otabdeveloper4 5 hours ago

            Are you from techsupport?

            Actually not every problem is solved with the "have you tried turning it off and back on again" trick.

    • LtWorf 3 hours ago

      He asks the senior developer to do it.

  • namxam 15 hours ago

    But what is the alternative?

    • mdaniel 14 hours ago

      There's not one answer to your question, but here's mine: kubelet and AWS SSM (which, to the best of my knowledge, will work on non-AWS infra; it just needs to be provided creds). Bottlerocket <https://github.com/bottlerocket-os/bottlerocket#setup> comes batteries-included with both of those things, and is cheaply provisioned with (ahem) TOML user-data <https://github.com/bottlerocket-os/bottlerocket#description-...>

      In that specific case, one can also have "systemd for normal people" via its support for static Pod definitions, so one can run containerized toys on boot even without being a formal member of a kubernetes cluster

      AWS SSM provides auditing of what a person might normally type via ssh, and kubelet similarly, just at a different abstraction level. For clarity, I am aware that it's possible via some sshd trickery one could get similar audit and log egress, but I haven't seen one of those in practice whereas kubelet and AWS SSM provide it out of the box
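
      The interactive bit, for anyone who hasn't used it, is roughly this (instance ID made up; needs the SSM agent on the node and the session-manager plugin locally):

          aws ssm start-session --target i-0123456789abcdef0
          # session contents can be shipped to CloudWatch Logs / S3, which is where the audit trail lives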

    • ndndjdueej 11 hours ago

      IaC, send out logs to Splunk, health checks, slow rollouts, feature flags etc?

      Allow SSH in non prod environments and reproduce issue there?

      In prod you are aiming for "not broken" rather than "do whatever I want as admin".

    • candiddevmike 12 hours ago

      I built a config management tool, Etcha, that uses short lived JWTs. I extended it to offer a full shell over HTTP using JWTs:

      https://etcha.dev/docs/guides/shell-access/

      It works well and I can "expose" servers using reverse proxies since the entire shell session is over HTTP using SSE.

      • g-b-r 11 hours ago

        “All JWTs are sent with low expirations (5 seconds) to limit replability”

        Do you know how many times a few packets can be replayed in 5 seconds?

        • candiddevmike 11 hours ago

          Sure, but this is all happening over HTTPS (Etcha only listens on HTTPS), it's just an added form of protection/expiration.

      • artificialLimbs 9 hours ago

        I don’t understand why this is more secure than limiting SSH to local network only and doing ‘normal’ ssh hardening.

        • candiddevmike 44 minutes ago

          None of that is required here? Etcha can be exposed on the Internet with a smaller risk profile than SSH:

          - Sane, secure defaults

          - HTTP-based--no fingerprinting, requires the correct path (which can be another secret), plays nicely with reverse proxies and forwarders (no need for jump boxes)

          - Rate limited by default

          - Only works with PKI auth

          - Clients verify/validate HTTPS certificates, no need for SSHFP records.

antoniomika 14 hours ago

I wrote a system that did this >5 years ago (luckily I was able to open source it before the startup went under [0]). The bastion would record ssh sessions in asciicast v2 format and store them for later playback directly from a control panel. The main issue that still isn't solved by a solution like this is user management on the remote (ssh server) side. In a more recent implementation, integration with LDAP made the most sense and allows for separation of user and login credentials. A single integrated solution is likely the holy grail in this space.

[0] https://github.com/notion/bastion
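
(For anyone unfamiliar with asciicast v2: it's just newline-delimited JSON, a header line followed by timestamped output events, so a recorded session looks roughly like this, values made up:)

    {"version": 2, "width": 120, "height": 40, "timestamp": 1700000000, "title": "ssh prod-web-1"}
    [0.12, "o", "$ uptime\r\n"]
    [0.85, "o", " 14:02:11 up 3 days, 1 user, load average: 0.04, 0.03, 0.00\r\n"]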

  • mdaniel 14 hours ago

    Out of curiosity, why ignore this PR? https://github.com/notion/bastion/pull/13

    I would think even a simple "sorry, this change does not align with the project's goals" -> closed would help the submitter (and others) have some clarity versus the PR limbo it's currently in

    That aside, thanks so much for pointing this out: it looks like good fun, especially the Asciicast support!

    • antoniomika 13 hours ago

      Honestly, I never had a chance to merge or review it. Once the company wound down, I had to move on to other things (find a new job, work on other priorities, etc.) and lost access to do anything with it afterwards. I thought about forking and modernizing it, but that never came to fruition.

shermantanktop 13 hours ago

I didn’t understand the marketing term “zero trust” and I still don’t.

In practice, I get it - a network zone shouldn’t require a lower authn/z bar on the implicit assumption that admission to that zone must have required a higher bar.

But all these systems are built on trust, and if it isn’t based on network zoning, it’s based on something else. Maybe that other thing is better, maybe not. But it exists and it needs to be understood.

An actual zero trust system is the proverbial unpowered computer in a bunker.

  • athorax 10 hours ago

    It means there is zero trust of a device/service/user on your network until they have been fully authenticated. It is about having zero trust in something just because it is inside your network perimeter.

  • wmf 12 hours ago

    The something else is specifically user/service identity. Not machine identity, not IP address. It is somewhat silly to have a buzzword that means "no, actually authenticate users" but here we are.

  • ngneer 13 hours ago

    With you there. The marketing term makes Zero Sense to me.

koutsie 29 minutes ago

How is trusting Cloudflare "zero-trust" ?

keepamovin 7 hours ago

Does this give CloudFlare a backdoor to all your servers? That would not strictly be ZT, as some point out in the comments here.

  • udev4096 2 hours ago

    For Cloudflare, all their fancy ZT excludes themselves. It's just like the well-known MITM they perform using their CA.

  • knallfrosch 5 hours ago

    Everything rests on CloudFlare's key.

udev4096 2 hours ago

> You no longer need to manage long-lived SSH keys

Well, now you are managing CAs. Sure, it's short-lived, but it's no different from having a policy for rotating your SSH keys.

johnklos 13 hours ago

So... don't trust long lived ssh keys, but trust Cloudflare's CA. Why? What has Cloudflare done to earn trust?

If that alone weren't reason enough to dismiss this, the article has marketing BS throughout. For instance, "SSH access to a server often comes with elevated privileges". Ummm... every authentication system ever has whatever privileges come with that authentication system. This is the kind of bull you say / write when you want to snow someone who doesn't know any better. To those of us who do understand this, it's almost AI-level bullshit.

The same is true of their supposed selling points:

> Author fine-grained policy to govern who can SSH to your servers and through which SSH user(s) they can log in as.

That's exactly what ssh does. You set up precisely which authentication methods you accept, you set up keys for exactly that purpose, and you set up individual accounts. Do Cloudflare really think we're setting up a single user account and giving access to lots of different people, and we need them to save us? (now that I think about it, I bet some people do this, but this is still a ridiculous selling point)
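
For the record, stock sshd already expresses that kind of policy; a sketch (users and addresses are obviously examples):

    # /etc/ssh/sshd_config
    PermitRootLogin no
    PasswordAuthentication no
    AllowUsers alice bob deploy

    # the deploy account may only come from the internal range, key auth only
    Match User deploy Address 10.0.0.0/8
        AuthenticationMethods publickey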

> Monitor infrastructure access with Access and SSH command logs

So they're MITMing all of our connections? We're supposed to trust them, even though they have a long history of not only working with scammers and malicious actors, but also protecting them?

I suppose there's a sucker born every minute, so Cloudflare will undoubtedly sell some people on this silliness, but to me it just looks like yet another way that Cloudflare wants to recentralize the Internet around them. If they had their way, then in a few years, were they to go down, a majority of the Internet would literally stop working. That should scare everyone.

amar0c 3 hours ago

Is there anything similar ("central point of SSH access/key management") that is not Cloudflare? I know about Tailscale and its SSH, but recently it introduced so much latency (even though they say it's P2P between A and B) that it is unusable.

Ideally something self-hosted, but that's not a hard requirement.

  • udev4096 an hour ago

    Like an SSH key management CLI?

EthanHeilman 15 hours ago

I'm a member of the team that worked on this; happy to answer any questions.

We (BastionZero) recently got bought by Cloudflare and it is exciting bringing our SSH ideas to Cloudflare.

cyberax 13 hours ago

Hah. I did pretty much all the same stuff in my previous company.

One thing that we did a bit better: we used AWS SSM to provision our SSH-CA certificates onto the running AWS EC2 instances during the first connection.

It would be even better if AWS allowed using SSH CA certs as keys, but alas...
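
A sketch of that pattern, for anyone curious (AWS-RunShellScript is the stock SSM document; the instance ID and paths are illustrative, and the CA public key is elided):

    aws ssm send-command \
      --document-name "AWS-RunShellScript" \
      --instance-ids i-0123456789abcdef0 \
      --parameters '{"commands":[
        "echo ssh-ed25519 AAAA... user-ca > /etc/ssh/user_ca.pub",
        "echo TrustedUserCAKeys /etc/ssh/user_ca.pub >> /etc/ssh/sshd_config",
        "systemctl reload sshd"
      ]}'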

  • pugz 9 hours ago

    FYI I love your work with Gimlet, etc.

    I too would love "native" support for SSH CAs in EC2. What I ended up doing is adding a line to every EC2 userdata script that would rewrite the /home/ec2-user/.ssh/authorized_keys file to treat the provided EC2 keypair as a CA instead of a regular pubkey.
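
    i.e. something like this in user-data (a sketch of the idea; cloud-init has already dropped the keypair's public key into that file):

        #!/bin/bash
        # turn the provisioned pubkey line into an SSH CA line: "cert-authority <pubkey>"
        sed -i 's/^/cert-authority /' /home/ec2-user/.ssh/authorized_keys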

arianvanp 12 hours ago

Zero trust. But they don't solve the more interesting problem: host key authentication.

Would be nice if they could replace TOFU with an SSH host CA as well. Ideally based on device posture of the server (e.g. TPM2 attestation)

INTPenis 5 hours ago

Properly set up IaC that treats Linux as an appliance could get rid of SSH altogether.

I'm only saying this because after 20+ years as a sysadmin I feel like there have been no decent solutions presented. On the other hand, to protect my IaC and GitOps I have seen very decent and mature solutions.

  • otabdeveloper4 4 hours ago

    I don't know what exactly you mean by "IaC" here, but the ones I know use SSH under the hood somewhere. (Except with some sort of "bot admin" key now, which is strictly worse.)

    • INTPenis 2 hours ago

      I mean that you treat Linux servers as appliances: you do everything in IaC at provisioning and you never log in over SSH.

anilakar 5 hours ago

Every now and then a new SSH key management solution emerges and every time it is yet another connection-terminating proxy and not a real PKI solution.

advael 10 hours ago

You know you can just do this with keyauth and a cron job, right?
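
Presumably something along these lines; per-key expiry is a stock authorized_keys option (OpenSSH 7.7+, IIRC), and the cron part is only needed if you also want stale lines pruned (prune-expired-keys is a hypothetical script, dates and paths are examples):

    # ~/.ssh/authorized_keys: this key stops working after the given date
    expiry-time="20240601" ssh-ed25519 AAAA... alice@laptop

    # crontab: tidy up expired entries nightly
    0 3 * * * /usr/local/bin/prune-expired-keys /home/*/.ssh/authorized_keys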

  • wmf 10 hours ago

    And Dropbox is a wrapper around rsync.

    • advael 10 hours ago

      Generally speaking a lot of "essential tools" in "cloud computing" are available as free, boring operating system utilities.

      • kkielhofner 8 hours ago

        It’s a joke from a famous moment in HN history:

        https://news.ycombinator.com/item?id=9224

        • advael 6 hours ago

          That is pretty funny, and the whole idea that you can't make money packaging open-source software in a way that's more appealing to people is definitely funny, given that this is the business model of a lot of successful companies.

          I do, however, think this leads to a lot of problems when those companies try to protect their business models, as we are seeing a lot of today.

xyst 13 hours ago

Underlying tech is “Openpubkey”.

https://github.com/openpubkey/openpubkey

BastionZero just builds on top of that to provide a “seamless” UX for ssh sessions and some auditing/fedramp certification.

Personally, not a fan of relying on CF. Need less centralization/consolidation into a few companies. It’s bad enough with MS dominating the OS (consumer) space. AWS dominating cloud computing. And CF filling the gaps between the stack.

  • datadeft 5 hours ago

    > BastionZero just builds on top of that to provide a “seamless” UX

    Isn't this what many of the companies do?

  • debarshri 5 hours ago

    I think Teleport operates in a similar style.

  • ranger_danger 12 hours ago

    Completely agree. I also don't want to trust certificate authorities for my SSH connections let alone CF. Would not be surprised if it/they were compromised.

rdtsc 4 hours ago

By “ValidPrinciples” did they mean “ValidPrincipals”?

And by ZeroTrust they really mean OneTrust: trust CF. A classic off-by-one error :-)