follow us in feedly
Sample SQL Queries For Google Analytics BigQuery Public Dataset

Sample SQL Queries For Google Analytics BigQuery Public Dataset

Got messing around with BigQuery and thought of doing this post around using GA data in BigQuery. The Google merchandise store data is available for access on BQ and some of these queries should you help you. It uses a bit of what I learned off Udemy and StackOverflow.

A lot of dimensions can be swapped here to suit your needs. I have used samples here to stay within my BQ limits. Let's start with examples right away:

How to query all the distinct browsers that were used during a certain period 

#standardSQL find distincy browsers used
SELECT 
DISTINCT device.browser
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN 
'20170701' AND '20170701'
DISTINCT operator to show browsers.JPG

Let's revise this query a bit.

How to query a distinct list of browser where browser = Chrome

Here, we use the LIKE operator

#standardSQL
SELECT 
DISTINCT device.browser
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX 
BETWEEN '20170701' AND '20170701' 
AND device.browser 
LIKE 'Chrome'
LIKE operator to match Chrome.JPG

How to query all possible data against a particular visitId [Session ID]

#standardSQL
SELECT *
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE DATE 
BETWEEN '20170701' AND '20170701' 
AND 
visitId = 1498913144    

How to query using wildcard operator along with LIKE

Here, we use geoNetwork.country LIKE 'United%' - all possible matches AFTER United are included in this match. This yields UAE, UK and USA.

#standardSQL
SELECT 
DISTINCT geoNetwork.country
FROM 
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX 
BETWEEN '20170701' AND '20170701' 
AND
geoNetwork.country
LIKE 'United%'

How to query using wildcard operator before and after a word

In the below query, we'll use wildcard operator to find all distinct countries where the name contains Arab.

#standardSQL
SELECT 
DISTINCT geoNetwork.country
FROM 
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX 
BETWEEN '20170701' AND '20170701' 
AND
geoNetwork.country 
LIKE '%Arab%'
LIKE operator for countries - Arab.JPG

How to query all hits during second half of the day. This query uses UNNEST function to put all hits from an array to separate rows. This considers the fact that hits make up a session. Here's more details on the StackOverflow question I posted.

https://stackoverflow.com/questions/50889045/unable-to-filter-using-where-cannot-access-field-hour-on-a-value-with-type-arra

SELECT 
DISTINCT geoNetwork.country
FROM 
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) hit
WHERE _TABLE_SUFFIX 
BETWEEN '20170701' AND '20170701'
AND
hit.hour > 11;

https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays 

How to query data and SUM it within the query

SELECT 
sum(totals.visits)
FROM 
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
UNNEST(hits) hit
WHERE _TABLE_SUFFIX 
BETWEEN '20170701' AND '20170701';
SUM transactionRevenue.JPG

How to query data and assign a mathematical operator to the field. In this example, we summed the totalTransactionRevenue and used 20% of it as Cost Of Goods Sold. In Big Query, the Revenue value was coming off in scientific notation format. Therefore, the CAST operator uses the nested value of SUM *0.20 and formats it as numeric format. 

SELECT 
cast(SUM(totals.totalTransactionRevenue) * 0.20 as numeric) 
AS COGS
FROM 
`bigquery-public-data.google_analytics_sample.ga_sessions_*` 
WHERE _TABLE_SUFFIX 
BETWEEN '20170701' AND '20170701';
COST Cast Function.JPG

https://stackoverflow.com/questions/50904865/bigquery-convert-scientific-notation-to-decimal-format

In GA, there are four possible scopes: User, Session, Hit and Product.

Within a session, a user can have multiple transactions. In the above BigQuery, it does not need the UNNEST function as we had to in the hour of day query. The answer I got on StackOverflow wa that revenue is already an aggregated field made up of individual transaction and product revenues. Hence, you don't need to UNNEST it and move it from array to rows.

https://stackoverflow.com/questions/50903385/bigquery-why-is-unnest-operator-not-required-for-pulling-transactions-data-in

 

https://www.w3schools.com/sql/func_sqlserver_cast.asp

These are some of the examples that I could come up with it - thanks to Udemy and StackOverflow. Will create a separte post for more examples that I can find.

Anyone know what 'Lifetime Matched Audience Targeting Consumptions by Type' means?

How To Extract Hostname of External Domain in Google Tag Manager

How To Extract Hostname of External Domain in Google Tag Manager