So what exactly is BigQuery? BigQuery is a data warehouse where customers can store all their data. The data is stored in large Excel-like tables with millions (sometimes billions) of rows. The most common use is for customers to store data from Google Analytics, but it can also be CRM data. To retrieve the data, you need to tell BigQuery what you want to fetch, which you do by writing SQL code.
With that said, I want to take the opportunity to explain how SQL and BigQuery have made me a better analyst and why you should also learn SQL.
Ok, here we go:
For me personally, it has been an important part of future-proofing myself. How can I call myself an analyst if I can only access data through the Google Analytics interface? With SQL under my belt, I can now retrieve data from other databases such as customer registers, making the analyses even sharper.
Google Analytics only offers averages, which can be misleading and actually doesn't provide much information. If you are in Google Analytics and see that the average Number of Sessions per User is 5.5, it is very difficult to get an idea of how many visitors this actually applies to. 90% of visitors might only make one visit, while a few individuals raise the average to 5.5. This is especially important when it comes to revenue calculations, of course.
Since BigQuery gives you access to all the data that has been collected, it becomes extremely easy to troubleshoot. Retrieving all the hits a visitor has made, neatly sorted from the first visit to the last, makes it easy to find errors in tracking, but also to create customer journeys. It contributes to a better overall understanding of how Google Analytics works on the backend.
BigQuery gives you all the data, which means you avoid annoying signs like “This report is based on 60% of your data.” In regular Google Analytics, you can encounter sampled reports, which means that the numbers you see are calculated from a small sample of all traffic.
For larger sites, it can be a problem to create a good picture of movement patterns. You can't create 1000 page segments to see which pages the visitor has seen. In BigQuery, this is often easy to write with a few lines of code. Fast, unsampled, scalable.
Inside BigQuery, everything is just tables of data, which means you don't have any ready-made graphs to distract you. This means you need to work much more structured as you don't get anything for free.
Last but not least. Since all data is available in BigQuery, you can create your own metrics that best describe customer behavior on your site. How many page views do we have where a visitor has spent more than 30 seconds on a page?
Of course, there are also disadvantages to BigQuery. The high price of acquiring BigQuery means that it is usually larger companies that can afford the service. At the same time, it is almost more or less a must when the sampling becomes so large that the data no longer feels reliable.
BigQuery makes it possible to retrieve data that is otherwise not available in Google Analytics. How you choose to export the data from BigQuery can vary - in some cases, you want the answer aggregated directly. Other times, you might want to extract six million rows to do the analysis with Python and Jupyter Notebook, but that's for another blog post. Stay tuned!