S3 configuration is awful, full stop. It’s not just that the interface is a disaster and configuration options are full of jargon and access rules are written in JSON.
The problem is that for someone who only periodically uses S3, I’m lost. I’m not lost in other services…Cloudflare, Firebase, Mailgun, and dozens of others somehow manage to allow people to use their service without so much agony.
I’m almost positive my S3 bucket is misconfigured because of how absurdly complex it is.
If you disagree and have spent more than 200 hours working within S3 I submit that it’s because you’re just an expert. I shouldn’t need a certification to upload files and retrieve them securely.
S3 isn't a big deal. It could be you're thinking about it wrong.
More specifically, each possible action on a bucket or file has a permission. You have the option of going granular or not, depending on what you want.
S3 can be thought of as a service and an API. It might be easier for you as a dev to look at the API docs and the various options, then map that to the UI.
For all services there are possible actions and authorization checks for those actions. It's up to you to allow or disallow the actions you want on a per user basis.
Complicated? Sure, it can be. Welcome to modern computing.
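To make the "one permission per action" idea concrete, here is a minimal sketch in Python that builds a bucket policy allowing exactly two actions for one principal. The bucket name, account ID, and user ARN are made-up placeholders, and the helper `build_bucket_policy` is hypothetical; only the general policy document shape (Version, Statement, Effect, Principal, Action, Resource) is standard.

```python
import json


def build_bucket_policy(bucket: str, principal_arn: str, actions: list[str]) -> dict:
    """Return a bucket policy allowing only `actions` on objects, for one principal."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListedActionsForOnePrincipal",
                "Effect": "Allow",
                # A single named principal, never "*".
                "Principal": {"AWS": principal_arn},
                "Action": sorted(actions),
                # Object-level actions apply to keys inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Hypothetical names, for illustration only.
policy = build_bucket_policy(
    bucket="example-app-uploads",
    principal_arn="arn:aws:iam::123456789012:user/example-app",
    actions=["s3:GetObject", "s3:PutObject"],
)
print(json.dumps(policy, indent=2))
```

Anything not listed in an Allow statement is denied, so going granular mostly means deciding which actions to list.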
No offense, but if you've spent 200 hours on S3 and still don't understand it, it's a skill issue.
lol I haven’t spent 200 hours.
I’m just aware of a lot of people who spend a lot of their life working in AWS and then get cranky when everyone isn’t as experty as them. :)
I'm sorry, but this sounds like C++ apologism.
Tech should be simple and fail immediately if used incorrectly. It's a tech issue if the dispersal of customer records is happening frequently enough to be a meme.
S3 does fail safe. The default is to deny access, but the problem is that every so often someone “solves” a problem by granting access broadly rather than learning how the tool works or reading the prominent warnings. This isn’t like C++, it’s more an example of how people don’t consistently respect the ancillary tools they use enough to learn how they work. I’ve seen the same pattern in many other contexts and it always came down to “I need to get my real work done, I don’t have time for this sysadmin stuff!”
You can try to be Rust and refuse such access rules. Yes, I know it sometimes makes sense. Rust rejects perfectly fine code too. People are really into such things.
Rust has unsafe, too, because the implementers know that they need to work in a messy world. S3 has options for globally blocking public access, there are warnings and tools which continuously audit for things like cross-account access, etc. but they have a lot of customers who legitimately need things like that so there has to be a way to do it, just as Rust allows you to say “I know what I’m doing, call libc!”
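For reference, the "block public access" switch can be flipped from code as well. A rough sketch, assuming a boto3-style S3 client: `put_public_access_block` and its four flags are the real bucket-level API, while `block_public_access`, `RecordingClient`, and the bucket name are hypothetical stand-ins so the example runs without AWS credentials.

```python
# All four flags on: ignore and reject public ACLs, reject public bucket
# policies, and restrict access to buckets that already have public policies.
PUBLIC_ACCESS_BLOCK = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def block_public_access(s3_client, bucket: str) -> None:
    """Apply the bucket-level public access block using a boto3-style client."""
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=PUBLIC_ACCESS_BLOCK,
    )


class RecordingClient:
    """Stand-in for a real S3 client; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def put_public_access_block(self, **kwargs):
        self.calls.append(kwargs)


client = RecordingClient()
block_public_access(client, "example-app-uploads")  # hypothetical bucket name
print(client.calls)
```

With real credentials you would pass `boto3.client("s3")` instead of the stand-in. This is the per-bucket setting; the account-wide version goes through a separate API.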
This does a great job of highlighting why properly configuring infrastructure is hard: S3 buckets (one of the most simple cloud infra services) have 70 configuration options.
Imagine you're a junior dev and your manager says "just spin up an S3 bucket and drop the data there, and make sure your app can access it".
S3 does have some sensible defaults, but a lot of Terraform modules do not...imagine somebody who now has to decipher S3's basic properties, ACLs, IAM, etc.
While this is true, a manager really shouldn't be giving an inexperienced dev enough rope to hang himself with.
Totally agree with you on this one, it's the role of specialized platform teams + managers to make sure devs have the tools they need while also accounting for their skill level(s).
Not everyone has access to a specialized platform team or technical manager; very often it is a solo (or small group of) devs just trying to Get Stuff Done
That just can’t work for most companies.
Part of this is a consequence of Amazon's hesitance towards shutting down old features. They are getting better at this lately, but S3 remains an example of a system with too many ways of doing things simply because they don't want to take the step of eliminating legacy functionality.
Introduce S4 [1] with a reduced set of configuration options and nudge customers into using it instead. Then eventually make it the default and encourage migrations.
[1] I know the name doesn't make sense.
Super Simple Storage Service
Simple Secure Storage Service
> one of the most simple cloud infra services
Maybe compared to other AWS offerings S3 is a simple service. But on the scale of all services it's incredibly complex. There is no shortage of providers offering cloud storage that's actually easy to set up, and intuitive to set up correctly
Inexperienced devs/admins are always a risk when it comes to infra. There isn't much difference between a misconfigured mysql test user and a freely accessible S3 bucket. One might be more modern than the other, but that's about it. The only real difference is that with cloud infra, junior employees can do more harm than they used to be able to do with local infra, because stuff has grown since.
This is just a list of 'how to do x with awscli [and if the bucket allows unauthenticated users to do x then you will get a result]'.
Unless I'm missing something there's nothing particularly.. interesting or thought out here? May as well read the docs for available s3/s3api operations - there's more!
The article is a lengthy discussion of something simple.
1) use a proxy or VPN
2) write a bucket guesser in python (use your imagination)
3) run this https://github.com/sa7mon/S3Scanner
Now you have list/read/write status info +/- existence per S3 scanner.
There, see? Didn't need a whole article.
The interesting thing is, most people wouldn't do the same things (say, chmod 777 all the things) on a public NAS.
If this assumption is true, it begs the question. Why do people act like public cloud storage is more secure than "private", on prem storage?
Do users expect safe defaults (as in, "default deny")?
Is it just a matter of attitude, where people think public cloud is more secure because it's not managed by (potentially short-staffed) corporate IT teams, even if it's not completely managed by the cloud provider?
Or is there something else?
Buckets are default deny; it takes several steps to enable public access, with warnings along the way, if you use the web console.
Buckets are default deny, now. But for many many years they were not, and the defaults almost certainly changed due to the many many examples of "accidental exposure".
I'd love to know what the original architects of S3 think now, looking back, of S3 buckets being globally unique.
AWS has certainly enjoyed a class of vulnerabilities caused by the way they allocate resources and expose them over DNS, but S3 is just a simple namespace.
Can you explain what you mean by "for many years they were not [default deny]"? I've been using S3 for 10+ years and I can't remember a time when they were ever open-by-default.
If you mean there were years where it was dangerously easy to accidentally open up a bucket, you'll get no disagreement from me. But I can't think of a time when they weren't default deny.
>If this assumption is true, it begs the question. Why do people act like public cloud storage is more secure than "private", on prem storage?
The #1 risk people are concerned about is data loss, and that's where cloud wins.
That's where "cloud is more secure" comes from. I've not lost any data in GCS or S3.
But the same cannot be said for local copies of data.
A 3-2-1 backup strategy is best for most cases.
IMO, "cloud" usually translates to web based development. I think a lot of HN readers are at a higher level than your average developer, making these types of issues seem crazy to have, and getting into web development seems to be easier than other types of development. This may just be perception, as the place to find web tutorials is already in the environment they will be used in. I know web developers get a lot of hate, and I'm not trying to perpetuate that (especially going on almost 20 years as one), but I do think the ease of getting into web development is a contributing factor to the surprise most of us have at issues like this. In general, I think most web developers don't really know what they are doing, as they try to just get up and running without even knowing to think about security. It seems like a fake it until you make it type of industry, compared to other types of development. I try to keep up in other areas of development, and understand that security isn't something to be bolted on at a later stage of development, but I also started my professional career learning network administration and security before I ever started web development. Lots of developers get their application working, are too afraid to start looking into what they could improve, and that includes looking into security. This has been my personal experience any time I've tried mentoring other developers, and by nature of being web based, security is a much bigger deal than in a private network.
It's because systems are highly complex and there are a thousand little things that you need to know to use them. Once you have experience working with it (or even reading enough docs), you know which of those thousand things should take up the most space in your mind.
But if you are new and are pressed for time, you only look at maybe a fraction of those thousand things, and inevitably you miss some important things.
AWS used to not have sane defaults for S3 buckets.
> The interesting thing is, most people wouldn't do the same things (say, chmod 777 all the things) on a public NAS.
I ran one of those for years in a place where the median user had a science Ph.D. This happened more than once.
People also made public FTP upload folders or had PHP accepting uploads to save time.
I’m skeptical that this is more common in S3, as opposed to reflecting S3’s popularity and the fact that a “temporary troubleshooting” mistake is much more likely to be noticed in S3 than on a private server.
Reconsider your assumption. Why does ransomware work? It's because the same people who have habits and/or systems that make compromise of them possible and likely are also given full write and delete access as well as, of course, read access.
What would a ransomware attack look like if the same kind of employee who downloads bootleg versions of PDF editors was only given read access to the files they need and write access to only their own files? It'd look like a big nothing.
The fact that we see ransomware attacks that affect entire huge corporations and organizations gives an idea of how many "admins" (who don't deserve the title) give 777 permissions to everyone.
I'm sorry, but no, most ransomware attacks are not caused by admins giving their ignorant and irresponsible end users root access to everything.
Most ransomware attacks start by phishing an end user who already has appropriately limited permissions for their job function.
The real damage comes from the attacker exploiting widely known vulnerabilities, almost always in Microsoft Windows, to escalate their own privileges irrespective of the permissions of the end user they phished.
Microsoft Windows is by far the most significant factor here, not dumbass end users with root access.
Of course Windows is a huge factor, but 1) nobody said anything about giving users root access, and 2) this has happened plenty of times with data stored on non-Windows systems, too, that weren't compromised.
Trying to make it an either-or thing is not correct. It's multiple things, but the lack of real permissions is a non-trivial percentage of cases.
> most people wouldn't do the same things (say, chmod 777 all the things)
I wouldn't be so sure about that, it still happens a lot that this is the default reply in many help forums.
On an exposed NAS?
Everything is possible, I know, but the number of hacks related to S3 misconfigurations (https://github.com/nagwww/s3-leaks), including at major companies, still makes me wonder.
Cloud permissions are just more complicated, which means people make mistakes like this. No one is intentionally setting their buckets to be world readable.
The same reason people chmod 777 on Linux: stuff doesn't work, so how do I fix the problem quickly?
Especially when fighting SELinux.
Sadly, clueless devs do chmod 777 all the time.
> Why do people act like public cloud storage is more secure than "private", on prem storage?
Because Microsoft, Google, Amazon told them that the "cloud is more secure".
In 2018 I added S3 bucket monitoring to my SaaS, Cronitor.io but we eventually retired it because AWS seems mostly to have solved this.
It’s hard in the console to make buckets public, it’s obvious when they are, and Amazon sends emails about public buckets just in case you’re not using the console.
There is the AWS S3 service and the S3 protocol. While I agree all AWS S3 buckets are usually created without public access by default, I can't say for other providers offering S3 compatible storage services.
You still have the risk that someone somewhere, without realizing the consequences, is using some random copy/pasted Terraform/CloudFormation recipes or aws cli commands that grant public access, on an account that is bound to an email address nobody ever reads.
Yes good points!
Hah I've had some fun with this, and even submitted bug reports that were never looked at.
I have like the world's largest collection of license plate photos now. :)
I wish AWS showed who has access to every S3 bucket right in the S3 console. It shows permissions but doesn't show the external view.
It also doesn't show effective accesses from users/groups/roles/things-across-accounts; I've found that even Access Analyzer gets it very wrong sometimes. I wrote a bit on this here: https://eng.lyft.com/iam-whatever-you-say-iam-febce59d1e3b.
Access Analyzer is the tool for that.
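As a very rough first pass at the "external view", you can at least scan a bucket policy for Allow statements with a wildcard principal. The sketch below is deliberately naive and is not how Access Analyzer works: it ignores Condition blocks, ACLs, IAM policies, and cross-account grants, so it can both over- and under-report. The function names and the sample policy are hypothetical.

```python
def _as_list(value):
    """Policy fields may be a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def is_wildcard_principal(principal) -> bool:
    """True for "*" or for a mapping such as {"AWS": "*"}."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return any("*" in _as_list(v) for v in principal.values())
    return False


def public_statements(policy: dict) -> list:
    """Return the Allow statements that grant access to everyone."""
    return [
        stmt
        for stmt in _as_list(policy.get("Statement", []))
        if stmt.get("Effect") == "Allow"
        and is_wildcard_principal(stmt.get("Principal"))
    ]


# Made-up policy: one world-readable statement, one scoped to a single user.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "WorldReadable",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-app-uploads/*",
        },
        {
            "Sid": "AppOnly",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:user/example-app"},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::example-app-uploads/*",
        },
    ],
}

flagged = [stmt["Sid"] for stmt in public_statements(sample_policy)]
print(flagged)
```

A check like this only catches the most obvious case; effective access still depends on everything the policy scan skips.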
Thinking about creating intentionally misconfigured buckets with encrypted files that look like they have valuable stuff so the hackers waste tons of resources decrypting them only to see they are worthless
Couldn’t this backfire by possibly generating a large AWS bill?
they would copy the file offline and decrypt it there
It costs money to serve S3 objects out to the internet though. S3 GET request billing + the usual AWS egress fees, after you've burned through the free quotas. Egress is currently $0.09 per GB + tax.
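Back-of-the-envelope version of that, as a small Python sketch. It uses only the $0.09 per GB figure quoted above as a flat rate; GET request charges, tax, and any tiered pricing are left out, and the `free_gb` allowance is a hypothetical parameter you would fill in yourself.

```python
EGRESS_USD_PER_GB = 0.09  # flat rate taken from the comment above


def egress_cost_usd(total_gb: float, free_gb: float = 0.0) -> float:
    """Estimate egress cost, ignoring request fees, tax, and pricing tiers."""
    billable_gb = max(total_gb - free_gb, 0.0)
    return billable_gb * EGRESS_USD_PER_GB


# How the bill grows if scrapers keep downloading the decoy files.
for gb in (10, 1_000, 50_000):
    print(f"{gb:>6} GB -> ${egress_cost_usd(gb):,.2f}")
```

So a decoy bucket stays cheap only as long as the files are small or rarely fetched.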
Make the bucket requester-pays https://docs.aws.amazon.com/AmazonS3/latest/userguide/Reques...
even failed (403) PutObject costs money.
I realize probably many (most?) people open this site on a mobile device, and the design is optimized for that. Still, does it bother anyone that on a desktop monitor, less than a third of the horizontal width is used for content?
I suspect they are using the 50-75 character rule for the text [0]. It makes the page look empty with so much white space. For me 100 characters is a better line length as I prefer to scan text on the web over reading it and the page doesn't look as empty.
[0] https://en.wikipedia.org/wiki/Line_length#Electronic_text
I am not sure it is much better on a mobile device, because even on a desktop monitor rotated 90 degrees, or in a browser window using half the width of the desktop in landscape mode, there is an awful lot of white margin on both sides of the text.
Nice work!