Self-Hosted Web Analytics - When a spare tyre turns into a life vest

Written by Adrian Grigore

In a previous post we looked at the implications and recommended actions for data and analysis teams following Google’s announcement of the sunset timeline for Universal Analytics.

If you’ve not read that one, you can find it here: Analytics 4 and how to prepare for the future.

So, considering the imminent, permanent changes to the landscape, now might be the time that you and your business start thinking about building your own in-house web analytics capabilities.

But bringing your web analytics tech completely in-house can be difficult. It’s a process reliant upon many factors, both within and outside your company’s control. A complex undertaking indeed, and not for the fainthearted.

But is it worth taking the plunge?

There are 3 important points to consider:

First off, any 3rd-party service provider is on your side only as long as it is profitable to them. All is fair in love and business, after all.

Secondly, especially with SaaS models, tools are rarely built to specifications (so you end up implementing workarounds).

Last, but not least, features that you rely on may be discarded at any time. What’s important to you might not necessarily be important to all the other businesses these SaaS companies work with, and those features can be removed without giving you any way to retain the functionality.

So, back to my opening gambit: Is it worth investing in in-house data tech?

For the most part, yes.

If you can handle the work and bear the costs, this is a step you should take if you want to own all your data, end-to-end. There are limitations to this, though – you may not want to build everything from scratch (unless you have hordes of developers & data scientists on stand-by).

But it should be fairly safe to pick some tools you can self-host and decide WHEN, HOW and IF you should change anything about them. It’s important to remember, though, that you now own the responsibility for maintaining and managing the whole connected system – through the good times and the bad.

In case of emergency, you're on your own

Let’s begin with a step back. There are some interesting events occurring in the web analytics world: Universal Analytics is being phased out in just over 1 year; in slightly older news, the data protection authorities in France and Austria have ruled that using Google Analytics is unlawful because it sends EU citizens’ data to the US (due to how data is stored by GA).

This may have a ripple effect across the EU, and Google’s competitors in the space (Adobe Analytics, Mixpanel etc) could suffer the same fate or be forced to adjust their infrastructure. In either case, this series of unfortunate events (for marketers) presents several short and medium-term risks:

  • Less data (already a problem since GDPR)

  • Increased data collection costs

  • Increased data ops complexity

The first point is fairly straightforward – switch off GA in markets that disallow it and, in some cases, you’re operating blind.

Points 2 & 3 are connected – to compensate for being driven out of some markets, Google may have to increase their prices in others, or, left with fewer choices, companies may gravitate towards other, more expensive options. Additionally, due to how popular GA is, a lot of web data analysts are familiar with it and have built their SOPs around it.

While it is true that, at their core, most of these services offer similar features and capabilities, updating technology as ubiquitous as an Analytics platform is bound to be slow, complicated, and expensive.

Even starting from scratch (with no backwards compatibility built into the data model) would cost a B2B company a pretty penny. And while Google Analytics as a whole won’t go away completely, Universal Analytics will, so for all intents and purposes the same struggles remain.

How does the man who drives the snowplough drive to the snowplough?

Volkswagen ran this famous headline with a purpose – their cars were sold as tough, resilient, sturdy. Good enough to work in the absence of the snowplough. At least for a while.

If you look closely, you’ll see that most of these vendors run their competitors’ tools on their own properties (e.g.: Mixpanel’s website runs Google Analytics alongside Mixpanel itself). This is good practice!

Important business assets – and analytics systems should be considered one – are built to be resilient. But even though companies are quick to tout the value of data, they rarely invest enough in this sector (recent years have been the exception rather than the norm).

If we set aside the risks highlighted above for a second, it’s still dangerous to rely on a single analytics tool for the same reason it’s dangerous to rely on any other type of tool – the lack of redundancy leaves you exposed to factors outside your control, such as:

  • Platform biases (e.g.: sampling and measurement errors, black box attribution modelling etc.)

  • Lack of long-term support (e.g.: features being removed or discontinued, such as “Next Page Path”)

  • Lack of control over collected data (e.g.: hiding/not exposing some data dimensions for various reasons)

  • Lack of control over data permanence (e.g.: inability to automatically erase data older than a set amount of time)

Using multiple analytics tools won’t solve all these issues. But it should alleviate the symptoms of most of them.
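The data permanence point is a good example of something a self-hosted setup lets you enforce yourself. Here is a minimal sketch of a retention job, assuming a hypothetical `pageviews` table with a `visited_at` column stored as Unix epoch seconds. The table, the column names and the `purge_older_than` helper are ours, for illustration only, and do not reflect the schema of any particular product.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_older_than(conn, days, now=None):
    """Delete rows from the hypothetical `pageviews` table that are older
    than `days` days and return how many rows were removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = int((now - timedelta(days=days)).timestamp())
    cur = conn.execute("DELETE FROM pageviews WHERE visited_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def demo():
    # In-memory database so the sketch runs without any setup.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pageviews (url TEXT, visited_at INTEGER)")
    now = datetime(2022, 6, 1, tzinfo=timezone.utc)
    rows = [
        ("/old", int((now - timedelta(days=400)).timestamp())),
        ("/recent", int((now - timedelta(days=10)).timestamp())),
    ]
    conn.executemany("INSERT INTO pageviews VALUES (?, ?)", rows)
    deleted = purge_older_than(conn, days=365, now=now)
    remaining = [r[0] for r in conn.execute("SELECT url FROM pageviews")]
    return deleted, remaining


deleted, remaining = demo()
```

In practice a job like this would run on a schedule against the analytics database, with the retention window set by your legal and data teams.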

If you can't make your own, store-bought is fine

A good start, especially for companies in highly regulated fields, is to pick a solution they can manage entirely themselves. When we recently reviewed the sector, four solutions stood out to us, given their features and available integrations:





The tools above provide User/Session/Pageview/Event counts and they can all add media campaign labels to traffic. Matomo and are by far the most feature-rich, and for a first dive into self-hosted analytics, they should be enough.
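To make the campaign labelling part concrete: these labels usually travel as query parameters on the landing page URL. The sketch below assumes the widely used `utm_source` / `utm_medium` / `utm_campaign` convention; individual tools may use their own parameter names, and the `campaign_labels` helper is hypothetical rather than part of any product’s API.

```python
from urllib.parse import urlparse, parse_qs

# Assumed parameter names (the common UTM convention).
CAMPAIGN_KEYS = ("utm_source", "utm_medium", "utm_campaign")


def campaign_labels(url):
    """Return the campaign labels found in the URL's query string."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; keep the first one.
    return {key: query[key][0] for key in CAMPAIGN_KEYS if key in query}


labels = campaign_labels(
    "https://example.com/pricing"
    "?utm_source=newsletter&utm_medium=email&utm_campaign=spring"
)
```

A URL without any of these parameters simply yields an empty dictionary, which a tool would typically treat as unlabelled traffic.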

As you’d expect, there are various advantages and disadvantages to migrating away from Google/Adobe Analytics, depending on your specific use cases. So, these systems should not be considered as replacements unless necessary (as mentioned before, some industries may have to comply with extremely strict legal policies).

Let’s take a closer look at some of the pros and cons of self-hosted analytics solutions.


Pros

  1. Full control over collected data

    (PII can be safely stored on the company's own servers and shared with other systems)

  2. All tracked data visible

    (GA, for instance, hides clientID, sessionID and other dimensions)

  3. Better GDPR compliance

  4. Lower likelihood of being blocked by adblockers

    (tracking requests are sent to the main domain)

  5. Improved Page Speed

  6. Configurable to measure fewer metrics/dimensions

  7. SAML login support

  8. No data sampling

    (GA Reports often estimate metrics to save processing power)
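To illustrate why the last point matters, here is a small sketch comparing an exact count with an estimate scaled up from a sample. The every-nth sampling scheme and the synthetic event list are assumptions chosen to keep the example deterministic; this is not a description of how GA actually samples.

```python
def exact_count(events, predicate):
    """Count every event that matches the predicate."""
    return sum(1 for event in events if predicate(event))


def sampled_estimate(events, predicate, every_nth):
    """Count matches in every n-th event only, then scale the result up."""
    sample = events[::every_nth]
    return exact_count(sample, predicate) * every_nth


# Synthetic traffic: every 7th pageview is a checkout page.
events = ["/checkout" if i % 7 == 0 else "/home" for i in range(1000)]


def is_checkout(path):
    return path == "/checkout"


exact = exact_count(events, is_checkout)                        # 143
estimate = sampled_estimate(events, is_checkout, every_nth=10)  # 150
```

The two numbers are close but not identical, and the gap is the kind of estimation error you avoid when reports are computed on the full dataset.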


Cons

  1. No integration with some media buying/monitoring tools

    (e.g.: Google Ads/Search Console)

  2. Not all CDPs integrate natively with the product

    (but this can be offset if the service hosting the database, e.g.: GCP, AWS, Azure etc., integrates natively with CDPs)

  3. Smaller developer community

  4. Another product for the dev/IT team(s) to maintain

  5. Fewer analysts with a proven track record of using the tool

  6. Licensing costs for some features

Don’t rush in, but don’t take too long either.

So, where does this leave us?

Before considering the above, audit your data and the systems associated with it. Investigate what your stakeholders’ data needs are and how well they are being met. Honestly assess how you’re using (or how you should be using) data in your company. Ask how data literate your company is. Assess the risks and costs of action and inaction.

You may be surprised by what you find.

Need a hand?

If you’re not sure where to start looking, how to make sense of what you find, or what steps to take next, then we’re here to help! Drop us a line and our data team will be happy to set up a call to discuss your analytics setup.

Contact us.


