> why can't webmasters spider everything Google returns?

I'm sure you could start caching, client-side, every search you ever run against Google.

But if you're searching enough to eat up Google's bandwidth, they're paying for that data and they're under no obligation to keep serving you as a client (much as any server is under no particular obligation to serve a search spider).
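
A minimal sketch of that kind of client-side cache in Python, assuming you key on the raw query string; the fetch callback and the cache path are hypothetical, not any real API:

    import shelve

    CACHE_PATH = "search_cache"   # hypothetical local cache file

    def cached_search(query, fetch):
        # Return a cached result for `query`, calling fetch(query) only on a miss.
        with shelve.open(CACHE_PATH) as cache:
            if query not in cache:
                cache[query] = fetch(query)   # fetch() is whatever actually runs the search
            return cache[query]

It only saves bandwidth for queries you repeat, of course, which is rather the point of the comment above.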



> But if you're searching enough to eat up Google's bandwidth, they're paying for that data and they're under no obligation to keep serving you as a client

Do you not see the irony?


Sites allow search engines to index them (instead of telling them to go away with robots.txt) because the search traffic is worth it to them.

Search engines don't allow people to scrape them (resorting to blocking after scrapers ignore robots.txt) because they don't get anything similarly valuable in return.
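
For concreteness, a rough sketch of the site-owner half of that bargain; the user-agents and paths here are illustrative, not any real site's robots.txt:

    # Admit a known search crawler, keep everything else out of private paths
    User-agent: Googlebot
    Allow: /

    User-agent: *
    Disallow: /private/

A search engine's own robots.txt typically does the reverse for its result pages (e.g. disallowing /search), which is the "telling scrapers to go away" half.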

(Disclosure: I work for Google, though not on search.)


No, I don't. Can you help clarify it for me?

Search engines crawling millions of sites, each with on average a few MB of data, distributes the cost globally.

Extracting terabytes of index data from a single search engine concentrates the cost entirely on that one repository's bandwidth bill.

These are not symmetrical cost structures.
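
Back-of-envelope, with purely made-up but plausible numbers:

    # Illustrative figures only, to show where the bandwidth bill lands.
    sites = 1_000_000
    mb_per_site = 5                          # assumed average amount crawled per site

    total_mb = sites * mb_per_site           # ~5 TB moved in both cases
    cost_per_site = mb_per_site              # a crawled site pays for ~5 MB of egress
    cost_for_engine = total_mb               # a scraped engine pays for all ~5 TB itself

    print(cost_per_site, "MB vs", cost_for_engine, "MB")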


Our git repository went down when crawlers decided to index it.


But probably not Google. The Google crawler is very careful and backs off as soon as it encounters elevated error rates. Bing appears to do the same.
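
What's being described is essentially error-rate-based backoff. A rough sketch of the idea in Python (not Google's actual logic; the thresholds and names are invented):

    import time

    def polite_crawl(urls, fetch, max_error_rate=0.1, base_delay=1.0):
        # Fetch each URL, backing off on server errors and stopping
        # outright once the observed error rate gets too high.
        errors, seen, delay = 0, 0, base_delay
        for url in urls:
            status = fetch(url)              # assumed to return an HTTP status code
            seen += 1
            if status >= 500:
                errors += 1
                delay = min(delay * 2, 60)   # slow down while the server looks unhealthy
            if seen >= 20 and errors / seen > max_error_rate:
                break                        # give up entirely, the crawl is doing harm
            time.sleep(delay)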



