Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It's not quite the same, but in the past I've written (in python) scrapers that run off of the cache. E.g. it would extract recipes from web pages that I had visited. The script would run through the cache and run an appropriate scraper based on the url. I think I also looked for json-ld and microdata.

The down sides were that it only works with cached data, and I had to tweak it a couple of times because they changed the format of the cache keys.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: