We built a 270M local model to detect phishing URLs

YPCrumble 2 hours ago

Interesting - is it just the URL, or is it actually crawling the phishing site to assess whether it's phishing? And how does it distinguish bank.com from bąnk.com (with a little curly ą) if the second is an identical clone of the first?

rwhaling an hour ago

That's a great question - right now it is only looking at the results from a battery of several dozen indicators that we compute upstream of the model itself (which saves massively on tokens).
As small models continue to improve, and edge hardware becomes more capable, we would really like to run larger models that could incorporate full page content and screengrab data, which would be more likely to catch these kinds of attacks.
But we also find that sites that do one shady thing usually do others, which is a big reason why a tiny model like this can work - and why we are betting on low latency being a differentiating factor in real-world impact.
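
For anyone wondering what URL-only indicators computed upstream of a model might look like for the bąnk.com case, here is a rough sketch. To be clear, these are not the actual indicators described above; `url_indicators`, its field names, and the "skeleton" trick are all made up for illustration, using only the Python standard library.

```python
import unicodedata
from urllib.parse import urlsplit


def url_indicators(url: str) -> dict:
    """Compute a few hypothetical URL-only indicators (illustrative, not the real set)."""
    host = urlsplit(url).hostname or ""
    # Decompose accented letters (NFKD) and drop the combining marks,
    # so "bąnk.com" collapses to the ASCII "skeleton" "bank.com".
    decomposed = unicodedata.normalize("NFKD", host)
    skeleton = "".join(c for c in decomposed if not unicodedata.combining(c))
    return {
        # Any non-ASCII character in the hostname.
        "non_ascii_host": not host.isascii(),
        # Any label already in punycode form.
        "punycode_label": any(label.startswith("xn--") for label in host.split(".")),
        # True when stripping accents changes the hostname.
        "accent_lookalike": skeleton != host,
        "skeleton": skeleton,
    }


print(url_indicators("https://bąnk.com/login"))
```

A small set of booleans like this could be serialized into a short prompt instead of the full page, which is one way an approach like this would save tokens. Note that the accent-stripping step only catches lookalikes built from accented Latin letters; a cross-script substitution such as a Cyrillic "а" has no decomposition and would need a proper confusables table.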