Analytics superhero (and one of my five favorite people I’ve never met in person) Avinash Kaushik wrote a marvelous blog post recently about reconciling conflicting data between different platforms. Reading it, of course, would scare the bejesus out of the web analytics beginner, and rightfully so. However, I found myself at odds with Avinash when he seemed to recommend taking one platform and running with it.
Now, who am I to question his advice? Well, I think it depends on the audience. There is an immense number of small companies (trust me, I talk to them all the time) that are using analytics but don’t understand that the data is, and always has been, “dirty”. This isn’t an exact science yet, folks. Beginners and casual analytics users may not always get this, which is why I always recommend that people use a few different analytics packages, to understand that the data’s not perfect.
This, of course, leads to inquiry, which the man himself is a huge champion of. Understand how metrics are defined from one platform to the next (they’re rarely the same), and how each collects information. You’d be surprised by the differences. If you have huge discrepancies, go right to your vendor and ask! It’s that simple. Ultimately, I think it’s important to ask the right questions, and when you see the differences (by using multiple platforms), you’ll have a better idea what to ask.
If you haven’t done this before, let me show you a couple of days’ worth of data from a very low-volume site with MSN Analytics, Yahoo Web Analytics and Google Analytics. The metric of interest here is visitor origin.
I’m a big proponent of Google, and find myself working with GA as my primary tool more and more as they continue to make improvements. I’ve found historically, particularly for geographic data, that they are more accurate when compared to other platforms (based on other available data).
Next, Yahoo over the same time frame:
Quite the disparity, huh? For people starting out in analytics, I think this is a necessary lesson. One of the most difficult things in this business is to explain to people that the data’s not perfect. There’s a confidence level that you must come to and then communicate (to your boss, client, or whomever) when you supply these reports. Otherwise, you’re putting yourself, your company, or your client at risk. None of those are good things if you’re scoring at home.
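If you want to put a number on that disparity before you talk to your boss or your vendor, here is a rough Python sketch of how you might line up visitor-origin counts from several platforms. Everything in it is an assumption for illustration: the function name `discrepancy_report`, the placeholder platform names, and the made-up counts are mine, not real figures from any of the tools above.

```python
def discrepancy_report(platform_counts):
    """For each origin, return (lowest count, highest count, relative spread)."""
    # Collect every origin reported by at least one platform.
    origins = set()
    for counts in platform_counts.values():
        origins.update(counts)

    report = {}
    for origin in sorted(origins):
        # A platform that never reported this origin counts as zero.
        values = [counts.get(origin, 0) for counts in platform_counts.values()]
        low, high = min(values), max(values)
        # Relative spread: how far apart the platforms are, as a share of the highest count.
        spread = (high - low) / high if high else 0.0
        report[origin] = (low, high, spread)
    return report


# Made-up numbers, purely for illustration.
sample = {
    "Platform A": {"United States": 40, "Canada": 10},
    "Platform B": {"United States": 50, "Canada": 5, "(not set)": 5},
}

for origin, (low, high, spread) in discrepancy_report(sample).items():
    print(f"{origin}: {low} to {high} visits (spread {spread:.0%})")
```

A large spread for a given origin is not proof that any one platform is wrong. It is simply a pointer to where you should start asking questions.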
It’s been a while since I looked into the technical specifics for each platform (not sure if I’ve ever looked into Microsoft’s too closely, since I don’t use it very often), but Google’s method for calculating location is as follows (as found here):
Google Analytics uses your visitors’ IP address to determine where they are located geographically. Using a 3rd-party datasource, the IP address is translated to a physical location. In most cases, Google Analytics is able to determine where your visitors are coming from; however, if our 3rd party vendor does not have an accurate record of the IP address to determine the location, Google Analytics will display a “(not set)” entry.
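To make that description a little more concrete, here is a minimal Python sketch of the general idea: look an IP address up in a table of known ranges and fall back to “(not set)” when nothing matches. This is not Google’s actual implementation. The `GEO_TABLE` data, the `locate` function, and the locations are hypothetical, and the IP ranges are the reserved documentation ranges rather than real visitor addresses.

```python
import ipaddress

# Hypothetical stand-in for a third-party geolocation datasource.
# The networks below are reserved documentation ranges, not real assignments.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "Chicago, United States"),
    (ipaddress.ip_network("198.51.100.0/24"), "London, United Kingdom"),
]

NOT_SET = "(not set)"


def locate(ip_string):
    """Return a location for the IP address, or "(not set)" if none is known."""
    try:
        address = ipaddress.ip_address(ip_string)
    except ValueError:
        # Malformed input cannot be mapped to a location.
        return NOT_SET
    for network, location in GEO_TABLE:
        if address in network:
            return location
    # The datasource has no record covering this address.
    return NOT_SET


print(locate("192.0.2.10"))
print(locate("203.0.113.5"))
```

Two platforms using different datasources (or different versions of the same one) can easily disagree on where the same visitor came from, which is one plausible source of the differences you see between reports.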
So, my recommendation? Try a few different platforms, get the feel for them and start to understand how to proceed with “dirty” data. Trust me, it’s more fun than it sounds.