Channel: Online Metrics

Best Strategies for Dealing with Bot Traffic in Google Analytics

Bot traffic on the web has exploded in recent years. In this post you will learn what bot traffic is and what you should do to minimize its negative impact on your data.

I was inspired to write this article by a remark from one of my clients:

“I was meeting with a person from Akamai and she indicated that the majority of our site traffic is likely bots, not people.”

Happily, in all my Google Analytics audits so far I haven’t encountered anything like that.

Bot Traffic in Google Analytics

Sit back and get ready to learn all about bot traffic and practical ways to deal with it.

Bot Traffic Trend

The Distil Networks Bad Bot Report (2018) indicates that bad bots have gone mainstream.

Malicious bot traffic varies per industry, but it is often in the range of 20 to 50+ percent of all traffic (!)

“Hey Paul, but you just said you didn’t encounter these bot traffic numbers in Google Analytics so far?”

That’s right. You need to understand that not all bot traffic shows up in Google Analytics; only a very small percentage of it does.

However, for security and other reasons you should be aware that bots can definitely do harm to your site. Hurting your Analytics stats is just one small part of the bigger issue.

Bad bots include, but are not limited to:

  • DDoS.
  • Site Scraping.
  • Comment Spam.
  • SEO Spam.
  • Fraud.

In the rest of the article we’ll focus on the impact of bots on Google Analytics.

How to Recognize Bot Traffic

You need to know how you can recognize bot traffic before you can properly deal with it.

I have found that in 95%+ of the cases you do well if you monitor two things:

1. Traffic on Main Hostname

Step 1: Navigate to Acquisition > All Traffic > Channels.

Step 2: Change primary dimension to Hostname (or add secondary dimension).

Hostname report Google Analytics

  • You want to see (nearly) 100% of your traffic on your main website domains (the ones that are part of your implementation).
  • (not set) and other unrecognized domains are an indication of bot and/or spam traffic.
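The same check can be run outside the interface on exported report data. The following is a minimal sketch, assuming you have exported the Hostname report as (hostname, sessions) rows; the hostnames, session counts, and the `hostname_shares` helper are invented for illustration.

```python
from collections import Counter

# Hypothetical (hostname, sessions) rows, as exported from the Hostname report.
rows = [
    ("www.example.com", 9400),
    ("shop.example.com", 450),
    ("(not set)", 100),
    ("free-traffic.example.net", 50),
]

# Assumption: these are the only hostnames that carry your tracking code.
KNOWN_HOSTNAMES = {"www.example.com", "shop.example.com"}


def hostname_shares(rows, known):
    """Return the share of sessions on known hostnames and the sessions per other hostname."""
    total = sum(sessions for _, sessions in rows)
    suspicious = Counter()
    known_sessions = 0
    for hostname, sessions in rows:
        if hostname in known:
            known_sessions += sessions
        else:
            suspicious[hostname] += sessions
    share = known_sessions / total if total else 0.0
    return share, dict(suspicious)


known_share, suspicious = hostname_shares(rows, KNOWN_HOSTNAMES)
print(f"Known hostnames: {known_share:.1%}")
for hostname, sessions in suspicious.items():
    print(f"Suspicious: {hostname} ({sessions} sessions)")
```

A share well below 100% on your known hostnames is a signal to look closer at the remaining rows.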

2. ISPs with High Bounce Rate

Step 1: Navigate to Audience > Technology > Network.

Step 2: Add a table filter on ISPs with a substantial number of sessions (e.g. 200 or more) and a bounce rate higher than 90%.

Table filter ISP

Step 3: Review report data.

Service Provider data

As you can see, Google Analytics hasn’t yet blocked Amazon’s AWS bots.

You will find that this portion of “traffic” often comes from the city “Ashburn”. Ashburn, Virginia is home to some of the biggest data centers.

The proportion of bot traffic is rather low in this case so it won’t have a big impact on overall numbers and data-driven business decisions.
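The table filter from Step 2 can also be expressed in a few lines of code if you export the Network report. This is only a sketch: the rows and numbers below are invented, and `suspicious_isps` is a hypothetical helper, not part of any Google Analytics API.

```python
# Hypothetical rows from the Network report:
# (service provider, sessions, bounce rate in percent).
isp_rows = [
    ("amazon.com inc.", 820, 98.4),
    ("amazon technologies inc.", 310, 96.1),
    ("comcast cable communications", 5400, 41.7),
    ("some small isp", 40, 100.0),
]

MIN_SESSIONS = 200      # "substantial" number of sessions, as in Step 2
MIN_BOUNCE_RATE = 90.0  # percent


def suspicious_isps(rows, min_sessions=MIN_SESSIONS, min_bounce=MIN_BOUNCE_RATE):
    """Keep ISPs with enough sessions and a bounce rate above the threshold."""
    return [
        (isp, sessions, bounce)
        for isp, sessions, bounce in rows
        if sessions >= min_sessions and bounce > min_bounce
    ]


flagged = suspicious_isps(isp_rows)
for isp, sessions, bounce in flagged:
    print(f"{isp}: {sessions} sessions, {bounce}% bounce rate")
```

The session threshold matters: a tiny ISP with a 100% bounce rate is usually noise rather than a bot.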

This is a good start to spot potential bot traffic issues in Google Analytics. Additionally, you could look into “source/medium” combinations, but I have found the two strategies above to be more effective.

Large vs Small Websites

Here is the thing: you should always take the impact of bot traffic seriously, no matter whether you are on a small or a large website.

Based on my experience, I can say:

  • The impact on data quality on smaller websites (< 10k sessions/month) can be relatively high. This is due to the relatively large proportion of bot traffic on these sites.
  • Although in general bot traffic in Google Analytics grows when overall traffic grows, it grows at a slower pace.

Note: malicious bot traffic – not measured in GA – is often a bigger threat for relatively high-traffic websites.

How to Not Deal with Bot Traffic

Let’s first have a quick chat on how you should not deal with bot traffic in Google Analytics.

In the past I have come across many articles that advise applying an exclude filter on the “hostname” and/or “source/medium” dimension.

A confusing, but very important topic in relation to include and exclude Google Analytics filters:

  • You can apply multiple exclude filters on the same dimension.
    • Filter 1: exclude medium = cpc.
    • Filter 2: exclude medium = organic.
  • However, applying more than one include filter on the same dimension will result in no data at all.

So if you want to include the mediums cpc and organic, you need to use a single include filter with a regular expression.

An example of such a filter is shown below:

Include cpc and organic
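To see why two include filters on the same dimension leave you with no data, it helps to simulate them. The sketch below assumes that include filters are applied one after another and that a hit must pass each of them; the pattern `^(cpc|organic)$` is my assumption of what a combined filter could look like, and `apply_include_filters` is a hypothetical helper.

```python
import re

mediums = ["cpc", "organic", "referral", "email", "(none)"]


def apply_include_filters(values, patterns):
    """Apply include filters in sequence; a value must pass every one of them."""
    kept = values
    for pattern in patterns:
        regex = re.compile(pattern)
        kept = [value for value in kept if regex.search(value)]
    return kept


# Two separate include filters: nothing is both "cpc" and "organic".
two_filters = apply_include_filters(mediums, [r"^cpc$", r"^organic$"])

# One include filter with a regular expression covering both mediums.
one_filter = apply_include_filters(mediums, [r"^(cpc|organic)$"])

print(two_filters)
print(one_filter)
```

The first list comes back empty because no medium can satisfy both patterns at once, while the single combined pattern keeps both mediums.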

Back to the exclude filter on “hostname” and/or “source/medium”.

The problem here is that the list of potential bots you want to filter keeps on growing (as long as Google doesn’t provide a robust solution).

You need to continuously monitor your Google Analytics data across several dimensions to keep your exclude list up to date. That’s why I limit myself to using exclude filters on the ISP in this case.

How to Deal with Bot Traffic

Now let’s look into actionable steps to greatly reduce bot and spam traffic in your Google Analytics account.

Modify View Level Setting

In Google Analytics, you can tick a box at the view level to filter out “known” bots and spiders.

Bot - Exclude in view settings

I recommend checking this box in all views except the “Raw Data View”.

Note: you can set up an extra view with this box unchecked if you want to know what data is excluded by checking this box.

Hostname Filter

A very useful strategy to reduce the amount of bot and spam traffic is to configure a view level filter.

As a best practice I recommend setting up an include hostname filter on the domains where you have implemented the GA tracking code. Most often this is one domain, but you might have subdomain or cross-domain tracking enabled.

Here is an advanced example:

  • First domain: fastmarathonrunner.com.
  • Second domain: marathonrunningshop.com.

Bot - Include Hostname Filter

This will ensure that traffic with hostname “(not set)” or another unknown hostname won’t appear in your Google Analytics views.
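You can test a hostname pattern before you put it in a filter. Here is a minimal sketch for the two example domains; the exact pattern is an assumption on my side (with or without `www.`), so adapt it to the subdomains you actually track.

```python
import re

# Assumed include pattern for the two example domains, with optional "www.".
HOSTNAME_INCLUDE = re.compile(
    r"^(www\.)?(fastmarathonrunner|marathonrunningshop)\.com$"
)

# Hypothetical hostnames as they might appear in the Hostname report.
hostnames = [
    "www.fastmarathonrunner.com",
    "marathonrunningshop.com",
    "(not set)",
    "fastmarathonrunner.com.spam.example",
]

included = [h for h in hostnames if HOSTNAME_INCLUDE.search(h)]
excluded = [h for h in hostnames if not HOSTNAME_INCLUDE.search(h)]

print("Included:", included)
print("Excluded:", excluded)
```

Anchoring the pattern with `^` and `$` prevents lookalike hostnames that merely contain your domain name from slipping through.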

Once again, don’t apply this filter (and filters in general) to the “Raw Data View”.

Note: never set up two different “include” hostname filters in this case. It will result in zero traffic in your view(s).

ISP Filter

Earlier in this post we discussed how to recognize “bot traffic” in general.

We saw that on the ISP level you can recognize a portion of the “bot traffic” that is sent to Google Analytics.

The two Service Providers that sent a substantial amount of traffic were “amazon.com inc.” and “amazon technologies inc.”.

In this case we want to use an “exclude” ISP filter if the hostname for this traffic matches the hostname filter we set up before.

This is what it looks like:

Bot - Exclude ISP

The filter field “ISP Organisation” corresponds to the “Service Provider” dimension in the reports.

Quick tip: you can directly verify the Regular Expression in the corresponding Google Analytics report:

ISP Regex report GA
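The combination of both filters can be sketched in code as well. The patterns below are assumptions (the original ones are only visible in the screenshots), `keep_hit` is a hypothetical helper, and the non-Amazon provider name is made up.

```python
import re

# Assumed patterns; adapt them to your own domains and service providers.
HOSTNAME_INCLUDE = re.compile(
    r"^(www\.)?(fastmarathonrunner|marathonrunningshop)\.com$"
)
ISP_EXCLUDE = re.compile(r"^amazon(\.com| technologies) inc\.$")


def keep_hit(hostname, service_provider):
    """Keep a hit only if it passes the include hostname and the exclude ISP filter."""
    if not HOSTNAME_INCLUDE.search(hostname):
        return False
    return not ISP_EXCLUDE.search(service_provider)


# Hypothetical (hostname, service provider) pairs.
hits = [
    ("www.fastmarathonrunner.com", "amazon.com inc."),
    ("www.fastmarathonrunner.com", "comcast cable communications"),
    ("marathonrunningshop.com", "amazon technologies inc."),
    ("(not set)", "amazon technologies inc."),
]

kept = [hit for hit in hits if keep_hit(*hit)]
print(kept)
```

Note the escaped dots in the ISP pattern: an unescaped `.` would match any character, which makes the filter broader than intended.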

Note: read this in-depth guide on Regular Expressions if you want to learn more about how I create these filters.

Custom Alerts

The three previously mentioned strategies will get you up to speed with filtering out bot traffic.

Sometimes you want to monitor suspicious traffic inside or outside of Google Analytics.

Within Google Analytics you can use custom alerts to monitor sudden changes in traffic that might not come from real visitors.

Read this post to learn more about Custom Alerts and how to make them extremely useful.
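If you also monitor traffic outside of Google Analytics, the logic of such an alert is easy to reproduce. Below is a rough sketch of a day-over-day check on exported session counts; the dates, numbers, 50% threshold, and the `spike_alerts` helper are all invented for illustration and are not how Google Analytics implements its alerts.

```python
def spike_alerts(daily_sessions, threshold_pct=50.0):
    """Return (day, pct_change) for days that rose more than threshold_pct vs the previous day."""
    alerts = []
    for (_, previous), (day, current) in zip(daily_sessions, daily_sessions[1:]):
        if previous > 0:
            change = (current - previous) / previous * 100
            if change > threshold_pct:
                alerts.append((day, round(change, 1)))
    return alerts


# Hypothetical daily session counts.
daily = [
    ("2018-06-01", 1000),
    ("2018-06-02", 1040),
    ("2018-06-03", 2600),
    ("2018-06-04", 1100),
]

alerts = spike_alerts(daily)
for day, change in alerts:
    print(f"{day}: sessions up {change}% vs previous day")
```

A spike on its own proves nothing; treat it as a trigger to check the hostname and ISP reports for that day.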

Measurement Protocol

Are you an advanced GA user? There is one more thing to take into account then.

Some of you might send data to Google Analytics from other sources, e.g. offline data.

This is where the Measurement Protocol comes into play.

You can – but it is not required – define the Document Host Name when sending other hits to Google Analytics.

Here you can check out the Measurement Protocol Hit Builder if you want to learn more.

Measurement Protocol - Hit parameter details

The required parameters are:

  • Measurement Protocol Version (v).
  • Hit Type (t).
  • Tracking ID (tid).
  • Client ID (cid).

Here comes the issue:

  • Many companies don’t define the Document Host Name for all hits that are sent (as it is not required).
  • This causes the hostname of these hits to show up as (not set).
  • The “include” hostname filter that we previously discussed would then exclude these hits.

Take this into account when using the Measurement Protocol and a hostname filter at the same time!
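The issue can be illustrated with a small sketch that builds two Measurement Protocol payloads, one with and one without the Document Host Name (`dh`) parameter. Nothing is sent here; the tracking ID, client ID, event values, and the `build_hit` and `passes_hostname_filter` helpers are placeholders I made up, and the hostname pattern is an assumed include filter.

```python
import re
from urllib.parse import parse_qs, urlencode

# Assumed include hostname filter for the example domains.
HOSTNAME_INCLUDE = re.compile(
    r"^(www\.)?(fastmarathonrunner|marathonrunningshop)\.com$"
)


def build_hit(tracking_id, client_id, hostname=None):
    """Build an event hit payload; v, t, tid and cid are the required parameters."""
    params = {
        "v": "1",            # Measurement Protocol Version
        "t": "event",        # Hit Type
        "tid": tracking_id,  # Tracking ID
        "cid": client_id,    # Client ID
        "ec": "offline",     # event category (placeholder value)
        "ea": "purchase",    # event action (placeholder value)
    }
    if hostname is not None:
        params["dh"] = hostname  # Document Host Name (optional)
    return urlencode(params)


def passes_hostname_filter(payload):
    """Mimic the include filter: a hit without dh is reported as (not set)."""
    hostname = parse_qs(payload).get("dh", ["(not set)"])[0]
    return bool(HOSTNAME_INCLUDE.search(hostname))


without_dh = build_hit("UA-XXXXX-Y", "555")
with_dh = build_hit("UA-XXXXX-Y", "555", hostname="fastmarathonrunner.com")

print(passes_hostname_filter(without_dh))
print(passes_hostname_filter(with_dh))
```

Only the hit that carries a matching `dh` value survives the filter, so setting the hostname explicitly on offline hits is a cheap way to keep them in your filtered views.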

Concluding Thoughts

Here are a few last remarks regarding bot and spam filtering in Google Analytics.

  • Be aware that not all bot and spam traffic (just a small proportion) will show up in Google Analytics.
  • The impact on data reliability is the largest on sites with low traffic numbers.
  • There are a few ways to demystify where bot traffic in GA is coming from.
  • Not only for filtering bot traffic, but for a lot of other reasons you want to learn more about filters and regular expressions in GA.
  • There are three main ways to filter out bots and spam:
    • Check the box in your Google Analytics views (not in Raw Data View).
    • Set up an include hostname filter.
    • Set up an exclude ISP filter.
    • Extra: use custom alerts to monitor suspicious traffic sources.
  • Be aware that using the Measurement Protocol might conflict with your hostname filter.

This is it from my side! What are your thoughts on bot and spam traffic in Google Analytics? What precautionary measures do you take? Happy to hear your comments!

One last thing... Make sure to get my automated Google Analytics Audit Tool. It contains 25 key health checks on the Google Analytics Setup.

Get Free Access to The Google Analytics Audit Tool

The post Best Strategies for Dealing with Bot Traffic in Google Analytics appeared first on Online Metrics.

