It's hard to build a readership writing blogs these days. That's pretty demotivating.


It's also difficult to distinguish a blog from a content farm if you are just crawling the web. Any content pattern you select for would likely be adopted quickly by SEOs.


I've found a direct correlation between the chance that a blog is a content farm and the number of ads on it. With 0 ads, the likelihood of a content farm is 0%.


You could use machine learning instead of a hard-coded heuristic.
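A minimal sketch of the difference, assuming a single feature (ad count) and a small hypothetical labeled set. The toy data, the cutoff `AD_THRESHOLD`, and all function names here are made up for illustration. The hard-coded rule uses a fixed cutoff, while a logistic regression learns its weight and bias (and therefore its cutoff) from labeled examples:

```python
import math

# Hypothetical toy data for illustration only (not real measurements):
# (number_of_ads_on_page, label), where label 1 means "content farm".
TOY_DATA = [(0, 0), (0, 0), (1, 0), (2, 0), (5, 1), (7, 1), (9, 1), (12, 1)]

# Hypothetical hand-picked cutoff for the hard-coded heuristic.
AD_THRESHOLD = 3


def heuristic_is_content_farm(ad_count: int) -> bool:
    """Hard-coded rule: flag a page once its ad count reaches a fixed cutoff."""
    return ad_count >= AD_THRESHOLD


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(data, lr: float = 0.1, epochs: int = 2000):
    """Fit p(farm | ads) = sigmoid(w * ads + b) by batch gradient descent
    on the average log loss; returns the learned (w, b)."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_farm_probability(ad_count: int, w: float, b: float) -> float:
    """Learned estimate of the probability that a page is a content farm."""
    return sigmoid(w * ad_count + b)


if __name__ == "__main__":
    w, b = train_logistic(TOY_DATA)
    for ads in (0, 3, 10):
        p = predict_farm_probability(ads, w, b)
        print(ads, heuristic_is_content_farm(ads), round(p, 3))
```

With more labeled pages, the same approach extends to features beyond ad count. It doesn't by itself address the problem that SEOs tend to adopt whatever patterns a classifier selects for.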



