1. Number of visitors as recorded in Google Analytics
2. Number of loads of a 1x1 pixel served on a different domain
They see higher numbers for (2) than (1), and attribute the difference to users blocking Google Analytics.
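For concreteness, here's a minimal sketch of what a pixel endpoint like that might look like, assuming a plain HTTP handler on a separate domain; the port, path handling, and log output are my own illustration, not the article's actual setup.

```python
# Minimal sketch of the pixel-counting approach, assuming a plain HTTP
# endpoint on a separate domain; details here are illustrative only.
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer

# 1x1 transparent GIF, base64-encoded
PIXEL = base64.b64decode(b"R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Log the hit (with User-Agent) so it can later be compared
        # against the Google Analytics visitor count.
        print(self.client_address[0], self.headers.get("User-Agent", "-"))
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.end_headers()
        self.wfile.write(PIXEL)

if __name__ == "__main__":
    HTTPServer(("", 8080), PixelHandler).serve_forever()
```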
I don't see them describing how they excluded bot traffic, however, and for my sites the majority of hits I get are from bots. Only some bots run JS, so I suspect their figures for GA-blocking users are thoroughly diluted by these bots.
(Disclosure: I work for Google, speaking only for myself)
The article isn't just comparing two totals of raw hits.
The author extracted the browser information from the server logs (presumably from the User-Agent header). If they were able to do that, I'd assume they also filtered bots out of the tally :)
Then you would expect Chrome to show a higher number of GA-blocking "users", since disguised bots usually present a Chrome User-Agent. But that's not the case: the article mentions that the percentage of users blocking GA on Chrome is on par with Safari.
And I don't know what you mean by "serious". The most common crawlers (Google, Baidu, Yandex, etc.) identify themselves as bots very clearly in the User-Agent. Personally, those are the ones I'd call the most "serious", and also the ones I've seen generating the most traffic on servers.
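For what it's worth, filtering those self-identifying crawlers out of a log tally is straightforward. A rough sketch, assuming the common combined log format where the User-Agent is the last quoted field (the token list and regex are illustrative, not exhaustive):

```python
# Rough sketch: count hits from a combined-format access log, skipping
# crawlers that identify themselves in the User-Agent. Illustrative only.
import re

# Well-known crawlers advertise themselves clearly in the User-Agent.
BOT_TOKENS = ("googlebot", "bingbot", "baiduspider", "yandexbot", "duckduckbot")

# Combined log format: the User-Agent is the last quoted field.
LOG_LINE = re.compile(r'"(?P<ua>[^"]*)"\s*$')

def is_known_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)

def count_non_bot_hits(log_path: str) -> int:
    hits = 0
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = LOG_LINE.search(line)
            if m and not is_known_bot(m.group("ua")):
                hits += 1
    return hits
```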
The net is full of unidentified bots scraping content or looking for vulnerabilities (contact forms, WordPress logins, etc.). On many occasions I've had traffic issues and had to dig through logs, and these bots were very hard to block, because they ignore robots.txt, don't advertise themselves in the User-Agent, and use a large pool of IPs.
I don't know why this comment is downvoted; it mirrors my experience. I'm responsible for a few domestic high-traffic websites and have done some log-file analysis to find suspicious traffic, e.g. user agents claiming to be Chrome but not loading images or CSS files, or racking up many page views (say 50, where our average user has 2). It wasn't foolproof, but the false positives were < 10% in my random checks. These bots made up ~10-15% of page views.
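A rough sketch of that kind of heuristic, assuming requests have already been grouped per visitor (e.g. by IP and User-Agent); the thresholds and field names are my own illustration, not what the commenter actually used:

```python
# Heuristic flagging of disguised bots from grouped request logs.
# Thresholds and structure are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Visitor:
    user_agent: str
    paths: list = field(default_factory=list)

ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif")

def looks_like_disguised_bot(v: Visitor, max_pageviews: int = 50) -> bool:
    claims_chrome = "chrome" in v.user_agent.lower()
    # Real browsers fetch static assets alongside pages.
    fetched_assets = any(p.endswith(ASSET_SUFFIXES) for p in v.paths)
    pageviews = sum(1 for p in v.paths if not p.endswith(ASSET_SUFFIXES))
    # Flag visitors that claim to be Chrome but never load assets,
    # or that rack up far more page views than a typical user.
    return (claims_chrome and not fetched_assets) or pageviews >= max_pageviews
```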
I meant bots with sufficiently sophisticated adversarial motives (ad fraud? blog comment spam? automated WordPress exploitation?), which I'd expect to want to avoid being recognized as such.
Ah, I see what you mean. Those are serious-ly malicious bots then! But yeah, completely agreed; those can be a PITA on sites with user-generated content.
But is it your experience that these kinds of bots cause much traffic? Because, from what I've seen, they can make a mess with fake accounts, fake content, fake clicks, etc., but as far as traffic goes, they were completely dwarfed by search engine crawlers and real users' traffic.
Mhm, we have a bunch of non-crawled content that sees a significant minority of request volume from disguised bots. Overall, it is definitely dwarfed by traffic from real users, but it still forces a lot of work to prevent the bots from gaming metrics/analytics.
Yeah, the numbers in the article are so far off my intuition that I'm happy to latch onto any explanation for why they're weird. Being unable to effectively discount bots seems likely.