Channel: Online Metrics

How to Deal With Personally Identifiable Information (PII) in Google Analytics

In the last few weeks I have performed several in-depth GA audits and was blown away by the number of times PII was stored in Google Analytics. This post will uncover a ton of things about PII and Google Analytics.

Sending PII to Google Analytics is one of the worst things you can do.

According to the Google Analytics Terms of Service:

“You will not and will not assist or permit any third party to, pass information to Google that Google could use or recognize as personally identifiable information.”

And to further elaborate on that:

“The Analytics terms of service, which all Analytics customers must adhere to, prohibits sending personally identifiable information (PII) to Analytics (such as names, social security numbers, email addresses, or any similar data), or data that permanently identifies a particular device (such as a mobile phone’s unique device identifier if such an identifier cannot be reset). Your Analytics account could be terminated and your data destroyed if you use any of this information.”

Ok, this is something to take very seriously.

By reading this post you will learn about best practices in dealing with Personally Identifiable Information in Google Analytics.

Looking for PII during the setup and testing phase of your Google Analytics implementation is recommended in order to avoid running into any PII collection issues later on.

In the testing phase you can still delete a reporting view without real negative consequences.

PII Check in Google Analytics

Simply checking your content reports to see whether a query parameter contains an email address is far from enough.

Unfortunately, PII can show up in many more places in Google Analytics. And most often, it ends up there unintentionally.

Here is a list of places you should check at a minimum:

1. Query String Parameters

Navigate to Behavior >> Site Content >> All Pages.

pii-content-reports

Do a search on “\?” to find out more about the active query parameters in your account.
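
If you export the All Pages report, a small script can give you an inventory of the parameter names in use. The following is only a sketch: the `pages` list is made-up sample data and `count_query_parameters` is a hypothetical helper, not part of any Google tool.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def count_query_parameters(page_paths):
    """Count how often each query parameter name occurs in the given page paths."""
    counts = Counter()
    for path in page_paths:
        query = urlsplit(path).query
        # keep_blank_values=True so that a parameter without a value is still counted
        for name, _value in parse_qsl(query, keep_blank_values=True):
            counts[name] += 1
    return counts


# Hypothetical rows, as they might appear in an export of the All Pages report
pages = [
    "/thank-you/?email=pii_test@gmail.com",
    "/search/?q=shoes&page=2",
    "/search/?q=boots",
    "/contact/",
]

parameter_counts = count_query_parameters(pages)
for name, count in parameter_counts.most_common():
    print(f"{name}: {count}")
```

Any parameter name in the output that hints at personal data (email, name, phone and so on) deserves a closer look in the report itself.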

2. Data Import

The data import functionality can be extremely powerful, but make sure to not send any PII to Google while importing data.

Use this Google Analytics PII viewer if you want to map data (e.g. the User ID) stored in Google Analytics to PII such as name and email address stored locally.

pii-google-analytics-data-viewer

3. Event Dimensions

Another important feature in Google Analytics is event tracking.

pii-event-dimensions

Within a minute you will see whether there is any PII stored in “Event Category”.

Furthermore, you can easily switch the primary dimension to “Event Action” or “Event Label” to check whether any PII is stored there.

4. Custom Dimensions

Custom dimensions are powerful, but can be risky as well. Google allows you to pass additional information to GA in user-, session-, hit- or product-scoped dimensions.

You can quickly retrieve all (active) custom dimensions in the admin interface:

pii-custom-dimensions-admin

Let’s assume you want to check the “Sales Region” values.

I have quickly set up a custom report on one primary dimension (Default Channel Grouping) and a filter on “Affiliates”.

Since we already know the name of the “custom dimension”, it is easy to filter on this data:

pii-custom-dimension-sales-region

A good understanding of the Google Analytics API (to automatically export this data) and regular expressions (to set up filters or segments) can help you perform a deep and quick analysis. However, a partly manual scan usually cannot be avoided.
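
To give an idea of what such a regular expression scan could look like, here is a minimal sketch that runs a few patterns over exported dimension values. The patterns, the `find_pii` helper and the sample values are my own assumptions for illustration. They will miss some PII and flag some harmless values, so treat the output as a starting point for the manual scan.

```python
import re

# Illustrative patterns only; extend or tighten them for your own data.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # A URL-encoded "@" (%40) often shows up in page paths and labels
    "encoded_email": re.compile(r"[A-Za-z0-9._%+-]+%40[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(values):
    """Return (value, pattern_name) pairs for every value that matches a pattern."""
    hits = []
    for value in values:
        for name, pattern in PII_PATTERNS.items():
            if pattern.search(value):
                hits.append((value, name))
    return hits


# Hypothetical values exported from an event or custom dimension report
dimension_values = [
    "EMEA",
    "newsletter-signup",
    "john.doe@example.com",
    "user%40example.com",
    "123-45-6789",
]

for value, kind in find_pii(dimension_values):
    print(f"possible {kind}: {value}")
```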

5. Campaign Parameters

Be sure not to include PII in campaign parameters.

  • Campaign dimensions: Source, Medium, Keyword, Campaign, Content
  • Campaign parameters: utm_source, utm_medium, utm_term, utm_campaign, and utm_content.

It depends on how your campaign tracking is configured, but running an automated check could save a lot of time in many cases.
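
As a rough idea of such an automated check, the sketch below parses a tagged URL and flags any utm_* value that looks like an email address. The helper name, the simple email pattern and the example URL are assumptions made for this illustration.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Deliberately simple email pattern; an assumption, not a complete validator
EMAIL_LIKE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")
UTM_PARAMETERS = {"utm_source", "utm_medium", "utm_term", "utm_campaign", "utm_content"}


def utm_values_with_email(url):
    """Return {parameter: value} for utm_* parameters whose value looks like an email."""
    flagged = {}
    # parse_qsl decodes percent-encoding, so "%40" is checked as "@"
    for name, value in parse_qsl(urlsplit(url).query):
        if name in UTM_PARAMETERS and EMAIL_LIKE.search(value):
            flagged[name] = value
    return flagged


# Hypothetical campaign URL, e.g. taken from the link list of an email tool
campaign_url = (
    "https://www.pii-rocks.com/?utm_source=newsletter"
    "&utm_medium=email&utm_content=jane%40example.com"
)
print(utm_values_with_email(campaign_url))
```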

6. Site Search Dimensions

Most companies have a Site Search functionality on their website.

You don’t want to have PII captured in either the site search term or site search category dimension.

pii-site-search

And yes, controlling the “site search term” field can be rather difficult.

Simple Tip to Avoid Sending PII

PII most often ends up in Google Analytics through (sign-up) forms.

Last month I audited an account where the email address value was passed in a query string after a newsletter sign-up.

This usually comes down to whether the form is submitted with a GET or a POST request. A GET request appends the form fields to the URL as query parameters, whereas a POST request sends them in the request body.

pii-get-vs-post-request

Talk to your web developer if this is the case and make sure to get it solved properly!
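
If you want to show your developer the difference, the following sketch builds both request types with Python's standard library, without sending anything over the network. The URL and field name are taken from the example used in this post; everything else is only an illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

form_data = {"email": "pii_test@gmail.com"}
encoded = urlencode(form_data)  # percent-encodes the "@" as %40

# GET: the form fields become part of the URL of the page that gets loaded,
# so a pageview recorded on that page can contain them.
get_request = Request("https://www.pii-rocks.com/thank-you/?" + encoded)

# POST: the same fields travel in the request body and the URL stays clean.
post_request = Request(
    "https://www.pii-rocks.com/thank-you/",
    data=encoded.encode("utf-8"),
)

# Nothing is sent here; we only inspect what each request would look like.
for request in (get_request, post_request):
    print(request.get_method(), request.full_url, request.data)
```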

How to Avoid Storing PII

There are four common ways to avoid storing PII in Google Analytics.

These tips help you to keep your content reports clean. Which method(s) to actually use depends on your website and data collection process.

Note:

Tip #1 is by far the most important step to take. Tips #2, #3 and #4 are better than nothing, but far from optimal.

1. Prevent PII From Being Sent

It is better to prevent this data from being sent to Google Analytics in the first place than to rely on fixes or hacks.

Educate the members of your organization on PII and the Google Analytics restrictions. You will be amazed at how many people know little to nothing about the actual rules.

I strongly recommend setting up a separate meeting to explain PII and the possible consequences if something goes wrong. This can save a lot of pain later on!

2. Filter on Specific Query Parameters

Let’s assume you are involved with a site that has a lot of forms.

For some reason there is no option to use a POST request on form submits. This means that most probably additional (sensitive) data is sent as part of the URL.

Before these forms go live, you should identify any query string parameters that would transfer sensitive data to Google Analytics.

Here is an example of the URL after a form submit:

  • www.pii-rocks.com/thank-you/?email=pii_test@gmail.com

You can see that “email” conveys sensitive information to Google Analytics.

To avoid getting in trouble, add the query string parameter “email” to the excluded URL query parameters in the view settings of all views in the affected property:

pii-view-setting-query-parameter

Please note that this is not a cure if the harm is already done!

And if possible, make sure to not just rely on excluding URL query parameters for PII issues.
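
To preview locally what your page paths would look like with a parameter excluded, you could use something like the sketch below. It only approximates the effect of the view setting; `exclude_query_parameters` is a hypothetical helper and the extra “step” parameter is made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def exclude_query_parameters(page_path, excluded):
    """Return page_path without the query parameters named in `excluded`."""
    parts = urlsplit(page_path)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in excluded
    ]
    # Rebuild the path with only the remaining parameters
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(exclude_query_parameters("/thank-you/?email=pii_test@gmail.com&step=2", {"email"}))
print(exclude_query_parameters("/thank-you/?email=pii_test@gmail.com", {"email"}))
```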

3. Remove All Query Parameters

Non-essential query parameters (those that add nothing to an analysis) should be filtered out to avoid storing duplicate pages in Google Analytics.

If you are 100% certain that your website only contains non-essential parameters, you can set up a filter that strips all query parameters from all pages on your website.

pii-strip-all-query-parameters-from-url

This will ensure your content reports won’t carry any sensitive data in a query parameter.

Once again, you can’t undo the outcome of this filter, so be warned!

4. Use a URL Rewrite Filter

A URL rewrite filter is another method you can use to keep your All Pages report clean.

This can be especially useful if you have one or a couple of “base” URLs that contain non-essential and/or sensitive information.

A simple example:

  • www.pii-rocks.com/thank-you/?email=pii_test@gmail.com

By default, Google Analytics would register this page as: /thank-you/?email=pii_test@gmail.com.

You can use a Search and Replace filter to strip the sensitive “email” query parameter and any other parameters that could be sent as part of the URL:

pii-search-and-replace-filter

As you can see, there are many different ways to deal with PII issues. Keep in mind to always test your filters first!
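
One way to do a first test outside Google Analytics is to run the search pattern against a handful of sample URIs. The sketch below assumes a pattern that matches the question mark and everything after it; that pattern is my assumption for this example and not necessarily the one from the screenshot. Python's regular expression engine can also behave differently from the one used in Google Analytics filters, so still verify the filter in a test view.

```python
import re

# Assumed pattern: the "?" and everything after it in the request URI
SEARCH_PATTERN = r"\?.*$"
REPLACEMENT = ""


def apply_search_and_replace(uri):
    """Apply the assumed Search and Replace pattern to a single request URI."""
    return re.sub(SEARCH_PATTERN, REPLACEMENT, uri)


# Sample URIs based on the example above; the "source" parameter is made up
test_uris = [
    "/thank-you/?email=pii_test@gmail.com",
    "/thank-you/?email=pii_test@gmail.com&source=footer",
    "/thank-you/",
]

for uri in test_uris:
    print(uri, "->", apply_search_and_replace(uri))
```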

How to Deal with PII in Your Account

Disclaimer: I am not a lawyer, and this part of the blog post does not constitute legal advice. I recommend seeking advice from legal counsel to confirm the appropriate policies and steps for your organization.

Take the following steps if you have found some form of PII in your account.

  • Work with your developers to immediately stop collecting PII (simply filtering out PII in the Google Analytics interface is only half of the job as Google requires that you stop sending any PII to their servers).
  • Back up your data or migrate your data into Google BigQuery (this service doesn’t have any PII limitations).
  • Create new views (copy the views that contain PII) so that you start collecting PII-free data.
  • Contact Google Support and inform them that your web property has been collecting PII. Google Support is much more likely to take certain measures if they find out themselves. You now also have the option to move your “corrupted” property to a different account.

Well, this is it from my side. I hope you can keep your account PII-free! What is your experience with PII and Google Analytics?

One last thing... Make sure to get my extensive checklist for your Google Analytics setup. It contains 50+ crucial things to take into account when setting up Google Analytics.

Download My Google Analytics Setup Cheat Sheet

The post How to Deal With Personally Identifiable Information (PII) in Google Analytics appeared first on Online Metrics.

