Thank you for your detailed response, you raise some very interesting and valid points!
> JS engines (or even WASM) aren't going to be as fast at this kind of work as native machine code would be
You are right. mCaptcha has both WASM and JS polyfill implementations. Native code will definitely be faster than WASM, but in an experiment I ran for fun[0], I discovered that the WASM implementation was roughly 2s slower than the native one.
> It's also based on the assumption that proof-of-work is going to increase the cost of doing business
mCaptcha is basically a rate-limiter. If an expensive endpoint (say registration, where hashing + other validation is expensive) can handle 4k requests/second and has mCaptcha installed, then the webmaster can force the attacker to slow down to 1 request/second, significantly reducing the load on their server. That isn't to say that the webmaster will be able to protect themselves against a sufficiently motivated attacker with botnets. :)
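The mechanism can be sketched with a toy SHA256 proof-of-work (a minimal illustration, not mCaptcha's actual protocol; the leading-zero-bits scheme and the function names are my assumptions):

```python
import hashlib

def solve(challenge: str, difficulty: int) -> int:
    """Client side: search for a nonce whose SHA256 has `difficulty` leading zero bits."""
    target = 1 << (256 - difficulty)  # any digest below this value qualifies
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(challenge: str, difficulty: int, nonce: int) -> bool:
    """Server side: a single hash, no matter how long the client's search took."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty))

nonce = solve("registration:alice", 12)  # ~2^12 hashes on average
assert verify("registration:alice", 12, nonce)
```

Raising the difficulty by one bit doubles the client's expected work while the server's verification cost stays at one hash, which is what lets the webmaster dial attackers down without adding server capacity.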
> There's also the risk that any challenge that's sufficiently difficult may also make the user's browser angry that a script is either going unresponsive or eating tons of CPU, which isn't much different from cryptocurrency miner behavior.
Also correct. The trick is in finding the optimum difficulty that works for the majority of devices. A survey to benchmark PoW performance of devices in the wild is a WIP[1], which will help webmasters configure their CAPTCHAs better.
[0]: https://mcaptcha.org/blog/pow-performance (the benchmarking platforms weren't optimised for running benchmarks, so kindly take the results with a grain of salt; it was a bored Sunday afternoon experiment)
[1]: https://github.com/mcaptcha/survey
Full disclosure: I'm the author of mCaptcha
> mCaptcha is basically a rate-limiter
This is a much better explanation of what it does than "captcha", which makes me expect a proof-of-human. A PoW-based rate-limiter is a really interesting idea! Usually, the challenge with unauthenticated endpoints (e.g. signups) is that the server has to do more work (db queries) than the client (making an http request), so it's really easy for the client to bring the server down. With PoW, we're essentially flipping that model: the client has to do more work than the server. Good work!
About the benchmark data:
It looks like your pow_sha256 library is using the "sha2" crate, which is a pure-Rust implementation of SHA-2. So your benchmark measures the delta between your library compiled to native code and your library compiled to WASM, which is an interesting benchmark, but I don't think it answers the right question.
A more interesting benchmark would probably answer the question "what would those looking to defeat mCaptcha use and how does that performance compare?" So perhaps an implementation of an mCaptcha challenge solver using OpenSSL would be warranted for that.
That's a good idea, I'll be sure to do that!
Also remember that native code can use multithreading, so if your challenge is something that could be farmed out to multiple CPU threads until one finds the solution, that's another factor in favor of native code performance.
Let's just assume that you solve the "it takes a while to run" thing through some clever bits of hard-to-optimise math that's difficult to parallelise or multithread or whatever.
If all it takes is computer time, then that's a cheap thing for the botnet operator to solve. They can just spin up another instance, and split up the crawling task to another computer (or 200).
A captcha can always be parallelised. All the attacker has to do is attempt n different captchas in parallel.
I haven't encountered a case where multithreading will make the algorithm weaker, but I do have a variation of the benchmark code (on disk, at the moment) that will spin up multiple worker threads to compute the PoW.
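For what it's worth, splitting the nonce search across workers is only a few lines (a sketch using the same illustrative leading-zero-bits scheme, not mCaptcha's actual challenge; note that in CPython the GIL limits the real speedup here, which is exactly why native multithreaded code has the edge):

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor

def parallel_solve(challenge: str, difficulty: int, workers: int = 4) -> int:
    """Race `workers` threads, each scanning a disjoint stride of the nonce space."""
    target = 1 << (256 - difficulty)
    stop = threading.Event()
    results: list[int] = []

    def search(start: int) -> None:
        nonce = start
        while not stop.is_set():
            digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
            if int.from_bytes(digest, "big") < target:
                results.append(nonce)
                stop.set()  # tell the other workers to bail out
                return
            nonce += workers  # stride: worker i covers i, i+workers, i+2*workers, ...

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(workers):
            pool.submit(search, i)
    return results[0]

nonce = parallel_solve("signup:example", 12)
```

With native threads (or processes, or a GPU) the expected wall-clock time drops roughly linearly with worker count, while the per-challenge cost to the defender is unchanged.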
You should compare against a GPU implementation of SHA256 (hashcat has a good one).
You may also want to consider ASICs - although you won't be able to build your own ASIC, you can look at the hashrates offered by bitcoin mining hardware, and extrapolate.
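The extrapolation is back-of-the-envelope arithmetic. With illustrative figures (both hashrates below are rough assumptions, not measurements):

```python
# Illustrative figures only -- not measurements.
browser_hashrate = 1e6    # ~1 MH/s: single-threaded JS/WASM SHA256 (assumed)
asic_hashrate = 100e12    # ~100 TH/s: a modern bitcoin-mining ASIC (assumed)

# A challenge tuned to cost a browser ~2 seconds:
hashes_per_challenge = browser_hashrate * 2

# What that same challenge costs the ASIC:
asic_seconds = hashes_per_challenge / asic_hashrate
challenges_per_second = asic_hashrate / hashes_per_challenge

print(f"{asic_seconds:.0e} s per challenge, {challenges_per_second:,.0f} challenges/s")
```

Under these assumptions the ASIC clears a 2-second browser challenge in tens of nanoseconds, so a difficulty that inconveniences an ASIC-equipped attacker would lock out every real browser.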
> mCaptcha is basically a rate-limiter.
Hmm, is it a better rate limiter than others? I know that nginx, for example, makes it pretty easy to rate limit based on IP address with the `limit_req` and `limit_req_zone` directives.
In essence, nginx's rate limiter also works by making each request consume a resource, but the resource consumed is an IP address (or range) rather than compute. It seems intuitive that a malicious actor would have an easier time scaling compute than IP addresses, while a legitimate user will _always_ have an IP address but might be on a machine with 1/100000th the compute resources of the malicious actor.
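For reference, that nginx setup is only a few lines (a minimal sketch; the zone name, rate, and paths are placeholders):

```nginx
# 10 MB shared-memory zone keyed by client IP, allowing 1 request/second
limit_req_zone $binary_remote_addr zone=signup:10m rate=1r/s;

server {
    location /register {
        # permit short bursts, reject the overflow with 429
        limit_req zone=signup burst=5 nodelay;
        limit_req_status 429;
        proxy_pass http://backend;
    }
}
```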
You can and should use multiple kinds of rate limiters
"Can" is true, and a good point. "Should" is a bit more dubious though; if IP-range-based rate limiting is enough, not wasting your users' battery with PoW-based rate limiting seems like a good thing. It seems like a potentially useful tool in your tool belt, which you should probably only deploy if IP-address-based rate limiting proves to be not enough.
Can you elaborate on why you chose SHA256 as the hash function?
Attackers aren't exactly limited to web APIs, and SHA256 is known to be trivially parallelizable on a GPU. GPU-resistant functions exist; RandomX is one such example[0], which reminds me of a similar initiative called RPC-Pay[1].
[0]: https://github.com/tevador/RandomX
[1]: https://www.monerooutreach.org/stories/RPC-Pay.html
I'd suggest you consider a new name. CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart", and this implementation has little to do with that.