Daily Usage Histograms in Google Analytics
A few years ago I wrote a post about measuring visit frequency in Google Analytics and mentioned that in GA it’s not possible to get a distribution of how many days users were active in the last week or month. Even using the API, there’s no way to get at the data.
The closest we can get is a distribution of the number of sessions users had in the date range. Useful, but not quite the same thing.
Well, there are two updates to this.
One, it is possible to build a day-based visit frequency distribution if you’re using a custom dimension to attach a user id for logged-in users. The user id gives us the hook we need to assemble the distribution, as explained below.
Two, as of last year or so, the clientId
dimension is now accessible via the API and this provides another hook. So it’s now possible to generate the report even if you’re not using a custom dimension.
The latter method using the clientId
is more accessible (doesn’t require a custom dimension) but it’s also going to have a lot of noise in the data. This type of report has much more value if you’re able to stitch GA sessions together with a user id.
Use cases
The point of the technique is to get a sense of user loyalty by measuring daily usage. How many users are visiting every day and how many are visiting once a month?
An L7 report tells us how many days users were active within a week. L28 tell us how many were active in a month.
This can be reported as a percentage, e.g., 12% of users are active 5+ days a week (Sequoia refers to this as L5+/7). Or we can look at it as a distribution, which is what the script below does, and this lets us evalulate the curve.
Daily usage is not the goal of every app but the same concept and technique applies for weekly or monthly use. The point is measuring frequency. The interval is up to you.
How to build the report
At a high level: we’ll make a query for each day in the date range to the get the list of users who visited that day, then build a dictionary/map for each user with the number of visits, then count the number of occurences of each visit number.
For the GA query we’ll use the user id custom dimension (or ga:clientId) for the dimension and Users for the metric. The date range is one day. This will return the list of user ids who were active that day. We’ll loop through the date range, making a query for each day.
The rest is just some basic data manipulation to rearrange the occurence of user ids into a distribution of visit counts. I’ve posted the complete script to Github but here’s the main part:
At this point we have a frequency distribution, not an actual histogram. You can use Matplotlib to plot the histogram directly from the script but for me the charts are too ugly so I copy the counts into charting tools or spreadsheets and generate the histogram there.
If you have any thoughts, feedback, or questions you can find me on Twitter: @robflaherty.