Piwik is the web analytics framework for hackers. By providing access to raw page view data, Piwik allows analysts to use general-purpose tools for analysis. Piwik stores all of its data in a MySQL database. I’ve written an R library, piwikr, to download and clean the tables stored in Piwik’s database. To get started, let’s connect to the database:
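library(dplyr)   # provides src_mysql()
library(piwikr)

# Connect to the MySQL database that backs the Piwik installation
my_db <- src_mysql(
  host = "host.com",
  user = "andrew",
  password = "xxxxx",
  dbname = "piwik"
)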
Below I retrieve tables describing all visits to the site and all actions taken by visitors to the site.
visits <- get_visits(my_db)
actions <- get_actions(my_db)
piwikr comes with functions to compute new tables from the primary tables. The four tables constructed below describe visitors to the site, days the site was actively collecting data, pages on the site, and sources of traffic to the site.
visitors <- compute_visitors(actions)
days <- compute_days(actions)
pages <- compute_pages(actions, base_url = "amarder.github.io")
sources <- compute_sources(visits)
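To make the idea of a derived table concrete, here is a minimal sketch of how a per-page visitor count could be built from an action-level table with dplyr. It is not piwikr's implementation: the `toy_actions` data frame and its `visitor_id` and `page` columns are assumptions made purely for illustration.

```r
library(dplyr)

# Hypothetical action-level data: one row per page view.
# Column names are illustrative, not piwikr's actual schema.
toy_actions <- data.frame(
  visitor_id = c("a", "a", "b", "b", "c"),
  page = c("/", "/books/", "/", "/", "/"),
  stringsAsFactors = FALSE
)

# Count distinct visitors per page, most-visited pages first.
# Repeat views by the same visitor (visitor "b" above) count once.
toy_pages <- toy_actions %>%
  group_by(page) %>%
  summarise(visitors = n_distinct(visitor_id)) %>%
  arrange(desc(visitors), page)

print(toy_pages)
```

Counting distinct visitors rather than rows keeps a single enthusiastic reader from inflating a page's rank.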
piwikr also comes with functions for creating graphs. How much traffic has the site generated over time?
graph_visitors_vs_date(days)
nvisitors <- nrow(visitors)
ndays <- as.numeric(max(actions$day) - min(actions$day))
arrival_rate <- nvisitors / ndays
The site has attracted 3076 visitors over 155 days. The overall arrival rate was 19.85 visitors per day.
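The arithmetic behind that rate can be checked directly from the two figures quoted above. Note that `ndays` in the code is the difference between the last and first observed day, so this is visitors per elapsed day.

```r
# Figures quoted in the text above
nvisitors <- 3076
ndays <- 155

# Average number of new visitors per elapsed day
arrival_rate <- nvisitors / ndays
round(arrival_rate, 2)  # 19.85
```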
Which pages on the site have been viewed by the most visitors?
library(dplyr)
library(pander)
pages %>%
  mutate(Page = paste0('<a href="https://amarder.github.io', page, '">', page, "</a>")) %>%
  select(Page, Visitors = visitors) %>%
  head(10) %>%
  pandoc.table(style = "rmarkdown", split.table = Inf, justify = "ll")
Page | Visitors |
---|---|
/power-analysis/ | 2364 |
/clustered-standard-errors/ | 320 |
/responsive-d3js/ | 280 |
/ | 147 |
/analytics/ | 62 |
/piwikr/ | 50 |
/diamonds/ | 48 |
/books/ | 43 |
/big-data/ | 17 |
/data-visualization/ | 17 |
How are visitors finding the site?
sources %>%
  select(Source = source, Visitors = visitors) %>%
  head(10) %>%
  pandoc.table(style = "rmarkdown", justify = "ll")
Source | Visitors |
---|---|
(direct) | 2338 |
 | 327 |
t.co | 155 |
feedly.com | 52 |
flipboard.com | 43 |
news.ycombinator.com | 40 |
popurls.com | 10 |
post.oreilly.com | 9 |
us3.campaign-archive1.com | 8 |
us6.campaign-archive2.com | 8 |
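A table like this groups visits by where they came from. As a rough sketch of that idea, and not a description of how piwikr's `compute_sources` works, the code below reduces a referrer URL to its host name and labels visits without a referrer as `(direct)`. The `toy_referrers` vector and the `referrer_source` helper are hypothetical.

```r
# Hypothetical referrer URLs; "" and NA stand for visits with no referrer.
toy_referrers <- c(
  "https://news.ycombinator.com/item?id=1",
  "",
  "http://t.co/abc",
  NA,
  "https://t.co/xyz"
)

# Illustrative helper: keep only the host part of a URL,
# and label missing referrers as "(direct)".
referrer_source <- function(url) {
  host <- sub("^[a-z]+://([^/]+).*$", "\\1", url)
  ifelse(is.na(url) | url == "", "(direct)", host)
}

toy_sources <- referrer_source(toy_referrers)

# Tally visits per source, largest first
source_counts <- sort(table(toy_sources), decreasing = TRUE)
print(source_counts)
```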
How important is mobile, and how large are visitors’ browser windows?
graph_browser_resolutions(visits)
pct_mobile <- 100 * mean(visits$screen_width < 800, na.rm = TRUE)
14.6% of visits came from a screen less than 800 pixels wide.
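The percentage relies on a common R idiom: taking the mean of a logical vector gives the share of `TRUE` values, and `na.rm = TRUE` drops visits whose screen width was not recorded. A small example with made-up widths (the `toy_widths` vector is hypothetical):

```r
# Made-up screen widths in pixels; NA marks a visit with no recorded width
toy_widths <- c(375, 1920, NA, 1440, 768)

# Comparison yields TRUE, FALSE, NA, FALSE, TRUE
is_narrow <- toy_widths < 800

# Share of narrow screens among visits with a known width: 2 of 4
pct_narrow <- 100 * mean(is_narrow, na.rm = TRUE)
pct_narrow  # 50
```

Because missing widths are dropped rather than counted as wide, the denominator is the number of visits with a known width, not the total number of visits.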
piwikr can also visualize how users navigate from page to page on the site. Each node in the graph below represents a page on the site, and the size of a node is proportional to the number of visitors who have viewed that page. The width of each edge is proportional to the number of visitors who traveled between the two pages.
set.seed(2)
graph_site_structure(actions, base_url = "amarder.github.io", n = 14)
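For intuition about the edge widths, here is one way such page-to-page counts could be derived from ordered page views. This is a sketch under assumed column names (`visitor_id`, `page`, `step`), not the code inside `graph_site_structure`.

```r
library(dplyr)

# Hypothetical page views; `step` orders each visitor's actions.
toy_actions <- data.frame(
  visitor_id = c(1, 1, 1, 2, 2, 3),
  page = c("/", "/a/", "/b/", "/", "/a/", "/a/"),
  step = c(1, 2, 3, 1, 2, 1),
  stringsAsFactors = FALSE
)

toy_edges <- toy_actions %>%
  arrange(visitor_id, step) %>%
  group_by(visitor_id) %>%
  # Pair each page view with the page the same visitor saw just before it
  mutate(from = dplyr::lag(page), to = page) %>%
  ungroup() %>%
  # Drop each visitor's first view and any page reloads
  filter(!is.na(from), from != to) %>%
  # Count each visitor at most once per edge
  distinct(visitor_id, from, to) %>%
  count(from, to, name = "visitors")

print(toy_edges)
```

In this toy data two visitors go from `/` to `/a/` and one continues from `/a/` to `/b/`, so the first edge would be drawn twice as wide as the second.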