1. Number of visitors as recorded in Google Analytics
2. Number of loads of a 1x1 pixel served on a different domain
They see higher numbers for (2) than (1), and attribute the difference to users blocking Google Analytics.
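For concreteness, here's a minimal sketch of what a pixel endpoint like that might look like, assuming a plain HTTP handler on a separate domain; the port, path handling, and log output are my own illustration, not the article's actual setup.

```python
# Minimal sketch of the pixel-counting approach, assuming a plain HTTP
# endpoint on a separate domain; details here are illustrative only.
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer

# 1x1 transparent GIF, base64-encoded
PIXEL = base64.b64decode(b"R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Log the hit (with User-Agent) so it can later be compared
        # against the Google Analytics visitor count.
        print(self.client_address[0], self.headers.get("User-Agent", "-"))
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.end_headers()
        self.wfile.write(PIXEL)

if __name__ == "__main__":
    HTTPServer(("", 8080), PixelHandler).serve_forever()
```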
I don't see them describing how they excluded bot traffic, however, and for my sites the majority of hits I get are from bots. Only some bots run JS, so I suspect their figures for GA-blocking users are thoroughly diluted by these bots.
(Disclosure: I work for Google, speaking only for myself)
The article isn't just comparing two totals of raw hits.
The author extracted the browser information from the server logs (presumably from the User-Agent header). If they were able to do that, I'd assume they also filtered bots out of the tally :)
Then you would expect Chrome to show a higher number of GA-blocking "users", since disguised bots usually present a Chrome User-Agent. But that's not the case: the article mentions that the percentage of users blocking GA on Chrome is on par with Safari.
And I don't know what you mean by "serious". The most common crawlers (Google, Baidu, Yandex, etc.) identify themselves as bots very clearly in the User-Agent. Personally, those are the ones I'd call the most "serious", and also the ones I've seen generating the most traffic on servers.
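For what it's worth, filtering those self-identifying crawlers out of a log tally is straightforward. A rough sketch, assuming the common combined log format where the User-Agent is the last quoted field (the token list and regex are illustrative, not exhaustive):

```python
# Rough sketch: count hits from a combined-format access log, skipping
# crawlers that identify themselves in the User-Agent. Illustrative only.
import re

# Well-known crawlers advertise themselves clearly in the User-Agent.
BOT_TOKENS = ("googlebot", "bingbot", "baiduspider", "yandexbot", "duckduckbot")

# Combined log format: the User-Agent is the last quoted field.
LOG_LINE = re.compile(r'"(?P<ua>[^"]*)"\s*$')

def is_known_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)

def count_non_bot_hits(log_path: str) -> int:
    hits = 0
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = LOG_LINE.search(line)
            if m and not is_known_bot(m.group("ua")):
                hits += 1
    return hits
```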
The net is full of unidentified bots scraping content or looking for vulnerabilities (contact forms, WordPress logins, etc.). On many occasions I've had traffic issues and had to dig through logs, and these bots were very hard to block, because they ignore robots.txt, don't advertise themselves in the User-Agent, and use a large pool of IPs.
I don't know why this comment is downvoted; it mirrors my experience. I'm responsible for a few domestic high-traffic websites and have done some log-file analysis to find suspicious traffic, e.g. user agents claiming to be Chrome but not loading images or CSS files, or racking up many page views (say 50, where our average user has 2). It wasn't foolproof, but the false positives were < 10% in my random checks. These bots made up ~10-15% of page views.
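A rough sketch of that kind of heuristic, assuming requests have already been grouped per visitor (e.g. by IP and User-Agent); the thresholds and field names are my own illustration, not what the commenter actually used:

```python
# Heuristic flagging of disguised bots from grouped request logs.
# Thresholds and structure are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Visitor:
    user_agent: str
    paths: list = field(default_factory=list)

ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif")

def looks_like_disguised_bot(v: Visitor, max_pageviews: int = 50) -> bool:
    claims_chrome = "chrome" in v.user_agent.lower()
    # Real browsers fetch static assets alongside pages.
    fetched_assets = any(p.endswith(ASSET_SUFFIXES) for p in v.paths)
    pageviews = sum(1 for p in v.paths if not p.endswith(ASSET_SUFFIXES))
    # Flag visitors that claim to be Chrome but never load assets,
    # or that rack up far more page views than a typical user.
    return (claims_chrome and not fetched_assets) or pageviews >= max_pageviews
```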
I meant bots with sufficiently sophisticated adversarial motives (ad fraud? blog comment spam? automated WordPress exploitation?), which I'd expect to want to avoid being recognized as such.
Ah, I see what you mean. Those are serious-ly malicious bots then! But yeah, completely agreed; those can be a PITA on sites with user-generated content.
But is it your experience that these kinds of bots cause much traffic? Because, from what I've seen, they can make a mess with fake accounts, fake content, fake clicks, etc., but as far as traffic goes, they were completely dwarfed by search engine crawlers and real users' traffic.
Mhm, we have a bunch of non-crawled content that sees a significant minority of request volume from disguised bots. Overall, it is definitely dwarfed by traffic from real users, but it still forces a lot of work to prevent the bots from gaming metrics/analytics.
Yeah, the numbers in the article are so far off my intuition that I'm happy to latch onto any explanation for why they're weird. Being unable to effectively discount bots seems likely.