Ask HN: What's a modern way to parse data from webpages at an interval?

4 points by wohnung 2 months ago

I'm doing a personal project where I'd like to get different data from maybe 100 or so webpages every day (or maybe more often) and store them for future use/processing.

It's been a while since I did any programming (I used to do PHP/JS in another lifetime), so I thought I'd ask here: what's the modern way of doing this that's also easy to implement?

I could run it on my own PC, but even better would be somewhere online where I'd set it up and forget about it for a while. Serverless seems to be a thing these days - is there a simple (newbie-friendly) way to set this up with JS snippets somewhere? I've been reading about Cloudflare Workers - are they a good fit for this? Could you retrieve a list of pages at an interval, get their DOM, and send the parsed data somewhere?

AznHisoka 2 months ago

The easiest way in 2019 is the same way you'd do it in 2009, or even 1999: set up a cron job that runs a simple PHP or Ruby script, which curls a set of URLs and saves the HTML/extracted data into a MySQL database. All on a LAMP server.
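A minimal sketch of that recipe in Ruby, stdlib only. The URL list and the `extract_title` "parsing" are illustrative placeholders, and it appends JSON lines to a local file instead of inserting into MySQL (for a real LAMP setup you'd swap the file write for the mysql2 gem):

```ruby
#!/usr/bin/env ruby
# Fetch a list of pages and store a timestamped snapshot of extracted data.
# Meant to be run from cron, e.g.:
#   0 6 * * * /usr/bin/ruby /home/me/fetch_pages.rb

require "net/http"
require "uri"
require "json"
require "time"

# Hypothetical list -- the ~100 pages you want to track.
URLS = [
  "https://example.com/",
].freeze

# Crude extraction for illustration: pull the <title> out of the HTML.
# Real parsing would use something like the nokogiri gem.
def extract_title(html)
  html[%r{<title>(.*?)</title>}m, 1]&.strip
end

def fetch_and_store(urls, out_path)
  File.open(out_path, "a") do |out|
    urls.each do |url|
      html = Net::HTTP.get(URI(url))            # the "curl" step
      out.puts({ url: url,
                 fetched_at: Time.now.utc.iso8601,
                 title: extract_title(html) }.to_json)
    rescue StandardError => e
      warn "#{url}: #{e.message}"               # log the failure, keep going
    end
  end
end

fetch_and_store(URLS, "snapshots.jsonl") if __FILE__ == $PROGRAM_NAME
```

Appending one JSON line per fetch keeps every historical snapshot around for the "future processing" part, the same way one row per fetch would in the MySQL version.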

It's not sexy, but you asked for the easiest way. No need for all this serverless or Cloudflare Workers nonsense :)