How To Create a Sankey Diagram With Google Analytics Data In R Studio

Let’s say you want to rebuild the behavior flow report from Google Analytics in R Studio. This is only a basic tutorial. Enhancements such as pulling the data directly from the Google Analytics API and building a multi-level Sankey will be covered in a separate post.

What we need for this:

The plotly package, which the basic script comes from:

https://plot.ly/r/sankey-diagram/

Google Analytics Data on landing page by channel. Example:

[Image: Google Analytics data for landing page by channel]

Here, the channels are in rows [nodes 0-4] while the landing pages are in columns [nodes 5-8]. These node IDs are never written out in the R code. They are implied by the position of each label in the node array, and the first element of the array is given a position of 0.
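If you want to double check which ID a label ends up with, here is a small sketch in base R. The variable names labels and node_ids are only for illustration and are not part of the plotly script. Since R vectors count from 1 and the Sankey nodes count from 0, the node ID is the R position minus 1.

```r
# Node labels in the same order as in this post:
# channels first, then landing pages
labels <- c("Organic Search", "Direct", "Referral",
            "Social", "Paid Search",
            "Homepage", "Products", "Services", "Contact")

# R positions start at 1, Sankey node IDs start at 0,
# so subtract 1 from each position to get the node ID
node_ids <- setNames(seq_along(labels) - 1, labels)

# Look up the ID of any label by name
node_ids[["Organic Search"]]
node_ids[["Homepage"]]
node_ids[["Contact"]]
```

The three lookups should match the positions described above: the first channel, the first landing page and the last landing page.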


The node labels [excerpt from the full code at the end of this post]:

  node = list(
    label = c("Organic Search", "Direct", "Referral", 
              "Social", "Paid Search",
              "Homepage","Products","Services","Contact")
  ),

Once you know the node IDs, you just need to connect each source [channel] to each target [landing page] and assign a value to every connection.

  link = list(
#All channels become the sources...so, nodes 0-4
#Each channel is repeated 4 times, once per landing page
    source = c(0,0,0,0,
               1,1,1,1,
               2,2,2,2,
               3,3,3,3,
               4,4,4,4),
#Landing pages become the target...so, nodes 5-8
    target = c(5,6,7,8,
               5,6,7,8,
               5,6,7,8,
               5,6,7,8,
               5,6,7,8),
#Assigning values between nodes...
#example, Node 0 to 5...Organic Search to Homepage value = 400
    value =  c(400,40,15,2,
               100,120,50,30,
               75,12,12,5,
               124,11,11,4,
               120,12,15,0)
  )

In Source, the first row covers Organic Search or node 0…while in Target, the first row lists the landing pages that are getting traffic from Organic Search. These line up, position by position, with the first row in Values [400, 40, 15, 2]…meaning Organic Search had 400 sessions starting on the Homepage, 40 on Products, 15 on Services and 2 on Contact. The same pattern is then repeated for the other channels, so Source, Target and Values must all have the same number of elements [5 channels x 4 landing pages = 20].
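Typing the three arrays by hand gets error-prone as the table grows. Below is a minimal sketch of one way to generate them from the channel by landing page table, using only base R. The names sessions, link_source, link_target and link_value are my own for illustration, and the numbers are the same example values used in this post.

```r
channels <- c("Organic Search", "Direct", "Referral", "Social", "Paid Search")
pages    <- c("Homepage", "Products", "Services", "Contact")

# Sessions table: one row per channel, one column per landing page
sessions <- matrix(
  c(400,  40, 15,  2,
    100, 120, 50, 30,
     75,  12, 12,  5,
    124,  11, 11,  4,
    120,  12, 15,  0),
  nrow = length(channels), byrow = TRUE,
  dimnames = list(channels, pages)
)

n_ch <- length(channels)
n_pg <- length(pages)

# Channels are nodes 0..4, each repeated once per landing page
link_source <- rep(0:(n_ch - 1), each = n_pg)

# Landing pages are nodes 5..8, the whole set repeated once per channel
link_target <- rep(n_ch + 0:(n_pg - 1), times = n_ch)

# Read the table row by row so the values line up with source and target
link_value <- as.vector(t(sessions))
```

These three vectors can then be passed as source, target and value inside link = list(...). The transpose in the last step matters: R stores a matrix column by column, so without t() the values would be read down the columns and would no longer match the channel-by-channel order of the sources.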

Once the code executes, you can publish the chart to a web page and share it with others. For example, I created this one using the code below:

http://rpubs.com/madilkhan/sankey-diagram-channel-landing-page-data

I will create a separate blog post on how to pull GA data directly from the API and convert it to a Sankey [after I learn] and also on a multi-level Sankey diagram [Level 1 - Landing page, Level 2 - Next page path].


Full code below:

install.packages("plotly")
library(plotly)

#create a basic sankey
p <- plot_ly(
  type = "sankey",
  orientation = "h",

#each element is a node here...Organic Search is node 0,
#Direct is node 1,
#Contact is node 8
  node = list(
    label = c("Organic Search", "Direct", "Referral", 
              "Social", "Paid Search",
              "Homepage","Products","Services","Contact"),
#assign colours to each channel [nodes 0-4]
    color = c("green","blue","yellow","pink","purple"),
    pad = 15,
    thickness = 20,
    line = list(
      color = "black",
      width = 0.5
    )
  ),
  
  link = list(
#All channels become the sources...so, nodes 0-4
#Each channel is repeated 4 times, once per landing page
    source = c(0,0,0,0,
               1,1,1,1,
               2,2,2,2,
               3,3,3,3,
               4,4,4,4),
#Landing pages become the target...so, nodes 5-8
    target = c(5,6,7,8,
               5,6,7,8,
               5,6,7,8,
               5,6,7,8,
               5,6,7,8),
#Assigning values between nodes...
#example, Node 0 to 5...Organic Search to Homepage value = 400
    value =  c(400,40,15,2,
               100,120,50,30,
               75,12,12,5,
               124,11,11,4,
               120,12,15,0)
  )
) %>% 
  layout(
    title = "Website Top Landing Pages by Channel",
    font = list(
      size = 10
    )
  )
p
# Create a shareable link to your chart
# Click on Publish in the Viewer tab and set up an RPubs account
# example: http://rpubs.com/madilkhan/sankey-diagram-channel-landing-page-data