Detecting Bot Behavior in Google Analytics

This is a follow-up to yesterday’s post about how I stumbled onto a burst of bot traffic in Google Analytics. At the end of that post I linked to an article with helpful tips on filtering known bots based on browser and ISP. This is a practical way to deal with known offenders, and in most cases it’s probably all you need. But it doesn’t offer protection against future bots, so I wanted to see if it was possible to detect bot activity using behavioral signals instead.

I found five characteristics common to bots that can be used to identify bot behavior and build an Advanced Segment.

New Visitor

Generally speaking, bots do not store cookies, so they will almost always appear as New Visitors. You can select New Visitors by using a Count of Visits = 1 filter.

Visit Duration

Visit Duration in Google Analytics is only calculated when a user triggers an event or visits a second page; a visit with a single hit is recorded as zero seconds. Because cookies are required to string together multiple pageviews into a visit, bots will report a Visit Duration of zero seconds. You can filter for this with Visit Duration = 0.
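To see why a single-hit visit ends up at zero, here is a minimal Python sketch of the idea, assuming duration is simply the time between the first and last hit of a visit. The function name and the sample timestamps are made up for illustration; this is not Google Analytics code.

```python
from datetime import datetime


def visit_duration_seconds(hit_times):
    """Seconds between the first and last hit of a visit.

    With fewer than two hits there is nothing to measure between,
    so the duration is reported as 0.
    """
    if len(hit_times) < 2:
        return 0
    return int((max(hit_times) - min(hit_times)).total_seconds())


# Hypothetical visits: a human who views two pages, and a bot with one hit.
human = [datetime(2014, 1, 1, 12, 0, 0), datetime(2014, 1, 1, 12, 2, 30)]
bot = [datetime(2014, 1, 1, 12, 0, 0)]

print(visit_duration_seconds(human))  # 150
print(visit_duration_seconds(bot))    # 0
```

Note that a human who reads one page for five minutes and leaves also gets zero here, which is why this signal is only useful in combination with the others.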

Source

Bot visits will not contain a referrer, meaning they will always show up as Direct Traffic. The filter for this is Source exactly matches (direct).

Page Depth

The explanation here is the same as for Visit Duration: bots will appear as single-page visits (bounces). Note that bots often trigger a visit but fail to trigger a pageview, so you need to filter for a Page Depth value less than or equal to 1: Page Depth ≤ 1.

Gender

This one is fun. Again, one of the defining qualities of bots is that they don’t store cookies. If we could check whether a new visitor already has cookies stored from another site, that would tell us whether we were likely dealing with a human or a bot. How can we do this?

Demographics. Google Analytics demographics data is stored in a cookie. If a visitor has one of these cookies they are probably not a bot. Assuming you have demographics enabled, you can filter for this with Gender does not match regex male|female. If you do not have demographics enabled, you’ll need to leave this filter out.

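All together, the segment combines five conditions: Count of Visits = 1, Visit Duration = 0, Source exactly matches (direct), Page Depth ≤ 1, and Gender does not match regex male|female.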

It’s important to mention that at this point we don’t have a segment of bots; we have a segment of visitors that are mostly bots. First-time human visitors with no Google demographic data who arrive directly and bounce will also fall into this segment. Your mileage may vary, but in my testing the false positive rate was extremely low, probably because most bounces arrive with a referrer in hand (e.g., via search).
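If you want to reason about the segment outside the Analytics interface, the five conditions can be sketched as a single predicate. This is a rough Python illustration, assuming you have visit rows with the fields shown; the Visit class, its field names, and looks_like_bot are hypothetical and not part of any Google Analytics API.

```python
import re
from dataclasses import dataclass

# Same pattern as the segment's "does not match regex" condition.
GENDER_RE = re.compile(r"male|female")


@dataclass
class Visit:
    count_of_visits: int   # 1 means New Visitor
    visit_duration: int    # seconds
    source: str            # "(direct)" for Direct Traffic
    page_depth: int        # pageviews in the visit
    gender: str = ""       # empty when no demographics data is present


def looks_like_bot(v, demographics_enabled=True):
    """True when a visit matches every behavioral signal in the segment."""
    checks = [
        v.count_of_visits == 1,
        v.visit_duration == 0,
        v.source == "(direct)",
        v.page_depth <= 1,
    ]
    # Leave the gender check out if demographics are not enabled.
    if demographics_enabled:
        checks.append(GENDER_RE.search(v.gender) is None)
    return all(checks)


visits = [
    Visit(1, 0, "(direct)", 0),            # no cookies, no pageview
    Visit(1, 0, "google", 1),              # bounce, but with a referrer
    Visit(1, 0, "(direct)", 1, "female"),  # direct bounce with demographics
    Visit(3, 240, "(direct)", 4, "male"),  # returning, engaged visitor
]
print([looks_like_bot(v) for v in visits])  # [True, False, False, False]
```

Requiring all five conditions at once is what keeps false positives low: the second and third visits are bounces, but each fails one check and stays out of the segment.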

You can import this Advanced Segment into your account with these links:

For bonus points, use this segment to create a custom alert to notify you whenever there’s an increase in traffic matching bot behavior.
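The logic behind such an alert can be sketched in a few lines of Python, assuming you have daily counts of visits that match the segment. The function name, the threshold values, and the sample counts are all hypothetical; this illustrates the idea and is not how Google Analytics implements custom alerts.

```python
from statistics import mean


def bot_spike(daily_counts, factor=3.0, min_visits=20):
    """True if the most recent day is well above the average of earlier days.

    daily_counts: bot-like visits per day, oldest first.
    factor:       how many times the baseline counts as a spike (assumed value).
    min_visits:   floor that ignores noise on low-traffic sites (assumed value).
    """
    if len(daily_counts) < 2:
        return False
    *history, today = daily_counts
    baseline = mean(history)
    return today >= min_visits and today > factor * baseline


print(bot_spike([4, 6, 5, 7, 5, 6, 48]))  # True
print(bot_spike([4, 6, 5, 7, 5, 6, 9]))   # False
```

Comparing against a recent baseline rather than a fixed number lets the same rule work for both small and large sites, though the right factor will depend on how noisy your direct traffic is.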

For more stuff like this, follow me on Twitter.