Roku & PiHole – A Deep Dive
Update 1: (12/16/2018) – GitHub Repository Made Public Here.
Update 2: (12/16/2018) – Added a new analysis as /u/Anchor-shark within the /r/
This is a work in progress. It’s not perfect but it’s just starting to get cool and I’m digging deeper! I think this is going to be the first post in a series. I say that because I need to get my hands on some older hardware and there are some other gears moving as well. Anyhow, here’s the post.
A while back (years ago), I added a PiHole to my network. The thing is a damn workhorse! If you don’t know what PiHole is, well, you’re wrong and you should! Long story short, it’s a network-wide ad blocker with a ton of features. But most importantly, it has lots of color and looks pretty.
Anyhow, I was recently looking over the data on my PiHole and noticed a serious amount of traffic coming from my Rokus. Before I start getting into the specifics, let me first describe the systems on my network.
I own 3 different Rokus, all of which are 1-2 years old. That is important for a few different reasons:
- They’re all running Roku OS v8+
- Old features such as TCPdump are now unavailable.
- Secret menus contain less functionality.
The Rokus are all on a 192.168.1.x/24 network. This isn’t massively important, but it’s worth noting.
Roku Traffic Analysis
PiHole was showing that a large majority of all the traffic on my home LAN was coming from my three Roku devices. This isn’t too surprising since they’re streaming devices and at any given time one or two of them are active (wife, kids, etc.).
I still decided to investigate and take a deeper look into the data to see what the Rokus were actually doing. First, let’s just take a look at the data within the PiHole Web UI.
The traffic displayed in the images above consists of DNS queries that have been blocked and that have nothing to do with my streaming services (i.e., Netflix, Amazon, etc.). Still, those are only the blocked domains, so a deeper look was necessary.
The next logical thing to do is to pull the logs from the server and start to parse them. The problem is that PiHole only keeps about 5 days of logs by default. So before you can jump right in, you need to change the logrotate configuration. I changed mine to keep 100 days. Full disclosure, the data that I will present here is from a 14 day analysis. I don’t expect a massive difference, but I thought I would put it out there.
I waited 24 days, pulled the logs, and wrote some Python to strip the logs for the data I wanted. My initial criteria to narrow the logs down to a manageable size were:
- Only log entries with the DNS request coming from the Roku IPs.
- Only the Date, IP, URI attributes shall be parsed.
The logs themselves are not in the best form, so first things first: translate the data I want to a CSV and store it to disk for later parsing. Also, this makes it SOO much easier to ingest the data into a pandas dataframe for analysis.
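Here is a minimal sketch of that extraction step. It is not the script from the repository. It assumes the usual dnsmasq line format that PiHole writes to pihole.log, and every IP other than 192.168.1.58 is a made-up placeholder, as are the sample log lines.

```python
import csv
import io
import re

# Placeholder addresses: only 192.168.1.58 appears in this post,
# the other two are hypothetical. Substitute your own Roku IPs.
ROKU_IPS = {"192.168.1.58", "192.168.1.59", "192.168.1.60"}

# Assumed dnsmasq query line, e.g.
# "Dec 16 10:15:01 dnsmasq[1234]: query[A] some.host.name from 192.168.1.58"
QUERY_RE = re.compile(
    r"^(?P<date>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) dnsmasq\[\d+\]: "
    r"query\[(?P<qtype>[^\]]+)\] (?P<domain>\S+) from (?P<ip>\S+)$"
)


def parse_queries(lines, ips=ROKU_IPS):
    """Yield (date, ip, uri) for every query line sent by one of `ips`."""
    for line in lines:
        match = QUERY_RE.match(line.strip())
        if match and match.group("ip") in ips:
            yield match.group("date"), match.group("ip"), match.group("domain")


def write_csv(rows, out):
    """Write the parsed rows to an open text file as CSV with a header."""
    writer = csv.writer(out)
    writer.writerow(["date", "ip", "uri"])
    writer.writerows(rows)


# Made-up sample lines; only the first one is a query from a Roku IP.
SAMPLE = """\
Dec 16 10:15:01 dnsmasq[1234]: query[A] scribe.logs.roku.com from 192.168.1.58
Dec 16 10:15:01 dnsmasq[1234]: /etc/pihole/gravity.list scribe.logs.roku.com is 0.0.0.0
Dec 16 10:15:02 dnsmasq[1234]: query[A] example.com from 192.168.1.20
"""

rows = list(parse_queries(SAMPLE.splitlines()))
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

Note that the log lines carry no year, so the date stays a plain string here and is left for the later analysis step to interpret.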
Once the logs are in CSV and in a dataframe object, we can then parse out the following:
- Log entries that have a Roku logging server listed as the URI.
- Log entries for logging servers on a per IP basis for individual analysis.
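The two filters above can be sketched in plain Python as a stand-in for the pandas step. The rows are assumed to be (date, ip, uri) tuples, the sample data is invented, and 192.168.1.59 is a hypothetical address.

```python
from collections import defaultdict


def is_roku_logging(uri):
    """True when the queried name is a subdomain of logs.roku.com."""
    return uri.lower().rstrip(".").endswith(".logs.roku.com")


def split_logs(rows):
    """Return (roku_rows, per_ip) for an iterable of (date, ip, uri) rows.

    roku_rows holds every row that targets a Roku logging server and
    per_ip maps each client IP to its own share of those rows.
    """
    roku_rows = [row for row in rows if is_roku_logging(row[2])]
    per_ip = defaultdict(list)
    for row in roku_rows:
        per_ip[row[1]].append(row)
    return roku_rows, dict(per_ip)


# Made-up rows for illustration.
ALL_ROWS = [
    ("Dec 16 10:15:01", "192.168.1.58", "scribe.logs.roku.com"),
    ("Dec 16 10:15:02", "192.168.1.58", "example.com"),
    ("Dec 16 10:15:03", "192.168.1.59", "scribe.logs.roku.com"),
    ("Dec 16 10:15:33", "192.168.1.58", "scribe.logs.roku.com"),
]

roku_rows, per_ip = split_logs(ALL_ROWS)
for ip, ip_rows in sorted(per_ip.items()):
    print(ip, len(ip_rows))
```

Each value in per_ip can then be written out to its own file in the same way as the combined list.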
This left me with several CSVs:
- all_logs.csv – Contains all logs parsed from the 14 log files from the Roku IPs.
- roku_logs.csv – All log entries that are *.logs.roku.com
- <ip>.csv – Three logs, one per IP, each a subset of roku_logs.csv
My main goal with this initial analysis is to determine how much traffic the Rokus generate compared to all traffic on my network, and how much of that traffic is Roku logging traffic (i.e., not streaming traffic). The last thing is a differential time analysis: how often are the Rokus beaconing out to the logging servers?
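The first two questions come down to counting. A rough sketch, assuming you have (date, ip, uri) rows for every client on the LAN and not just the Rokus (the rows and the non-Roku addresses below are invented):

```python
def traffic_shares(rows, roku_ips):
    """Return (roku_pct, logging_pct) as percentages of all DNS queries.

    rows is an iterable of (date, ip, uri) tuples covering every client.
    """
    total = roku = logging = 0
    for _date, ip, uri in rows:
        total += 1
        if ip in roku_ips:
            roku += 1
            # Count queries to the Roku logging servers separately.
            if uri.lower().endswith(".logs.roku.com"):
                logging += 1
    if total == 0:
        return 0.0, 0.0
    return 100.0 * roku / total, 100.0 * logging / total


# Made-up rows: two from a Roku (one of them logging), two from other hosts.
LAN_ROWS = [
    ("Dec 16 10:15:01", "192.168.1.58", "scribe.logs.roku.com"),
    ("Dec 16 10:15:02", "192.168.1.58", "example.net"),
    ("Dec 16 10:15:03", "192.168.1.20", "example.com"),
    ("Dec 16 10:15:04", "192.168.1.21", "example.org"),
]

roku_pct, logging_pct = traffic_shares(LAN_ROWS, {"192.168.1.58"})
print(f"Roku share: {roku_pct:.1f}%  logging share: {logging_pct:.1f}%")
```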
When you initially look at the logs, it seems that most of the Rokus beacon out every 30 seconds to their logging servers. Sometimes, well, most times, it’s multiple beacons every 30 seconds to different servers. Here’s an example:
24 Days of aggregated data
- Roku overall traffic made up 34% of all traffic on my LAN
- Roku direct logging traffic made up 14% of all traffic on my LAN
- Total Number of Logging Records: 115,594
- Beacons on average every 18.69 seconds over 24 days
- Total Number of Logging Records: 129,408
- Beacons on average every 16.69 seconds over 24 days
- Total Number of Logging Records: 149,977
- Beacons on average every 14.40 seconds over 24 days
So what does this mean? Well, it means that on average, a Roku is logging information about you and your family about 380 times per hour (2-4 DNS requests every 30-40 seconds) and 8,800 times per day, give or take a few hundred.
Update 1: Roku Logging Allowed
Since this project came from my PiHole logs, I thought I would get some internet constructive criticism from the /r/
Before we look at the data, I want to be as transparent as possible. I have made some adjustments to my timing function. Whereas I was originally looking at only unique timestamps per IP and then obtaining the time differential of the datetime objects via pairs, I am now simply taking the number of records (DNS requests) and the total number of seconds they span, and dividing the seconds by the record count to get an average interval. The total number of seconds is determined by taking the difference between the last record in the DELTA_DATES array (i.e. DELTA_DATES[len(DELTA_DATES)-1]), which is a DateTime object, and the most recent date (i.e., DELTA_DATES). I felt that not only is this much simpler, but it’s more representative, as I am no longer just measuring unique records. I have edited the initial results to reflect the new changes.
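To make the difference between the two timing approaches concrete, here is a small sketch. The function names and the sample timestamps are mine for illustration and are not taken from the repository code.

```python
from datetime import datetime, timedelta


def average_interval(timestamps):
    """Newer approach: total elapsed seconds divided by the record count."""
    ordered = sorted(timestamps)
    if not ordered:
        raise ValueError("no timestamps given")
    span = (ordered[-1] - ordered[0]).total_seconds()
    return span / len(ordered)


def pairwise_interval(timestamps):
    """Earlier approach: mean gap between consecutive unique timestamps."""
    unique = sorted(set(timestamps))
    if len(unique) < 2:
        raise ValueError("need at least two distinct timestamps")
    gaps = [(b - a).total_seconds() for a, b in zip(unique, unique[1:])]
    return sum(gaps) / len(gaps)


# Made-up beacons: two requests at 0s, two at 30s, one at 60s.
base = datetime(2018, 12, 16, 10, 0, 0)
stamps = [base + timedelta(seconds=s) for s in (0, 0, 30, 30, 60)]

print(average_interval(stamps))   # 12.0 seconds per request
print(pairwise_interval(stamps))  # 30.0 seconds between distinct beacons
```

Because several requests often share one timestamp, the record-count version reports a shorter interval than the unique-timestamp version for the same data.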
For the last 4 days, here is the information:
- Roku Overall Traffic Made up 47% of all traffic on my LAN.
- Roku Direct Logging Traffic made up 9% of all traffic on my LAN
- Total Number of Logging Records: 17,694
- Average Beaconing Time Interval: 18.19s
- Total Number of Logging Records: 3,541
- Average Beaconing Time Interval: 90.72s
- Total Number of Logging Records: 6,439
- Average Beaconing Time Interval: 49.87s
This is very interesting to me. The Roku that gets the most use by far is the living room (192.168.1.58) because it’s connected to our large 4K TV and is at the center of everything in our home. This guy is in use pretty much all day when we are at home (i.e., Music, Netflix, Sling, etc.). So it would seem that the more the system is being used, the more it is going to beacon out. It’s also worth noting that the other two Rokus are not used as much, especially the basement system, which is really only used during parties.
If we look specifically at the Living Room Roku and breakdown all traffic over the 4 days by
The data seems to support Reddit’s point that Rokus will phone home more if they’re being blocked. However, if you are using those Rokus frequently, it’s a moot point, as they seem to log just as much, if not more.
Let’s put my assertion to the test. The Living Room analysis above supports my assumption, so if we look at the Master Bedroom, where the Roku gets used before going to bed, we should see a spike around that time. And we do.
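A spike like that is easy to look for by bucketing the logging requests by hour of day. This is only a sketch with invented timestamps, not the code behind the charts:

```python
from collections import Counter
from datetime import datetime


def queries_per_hour(timestamps):
    """Return a 24-item list with the number of requests seen in each hour."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


# Made-up logging requests for one device over two evenings and one morning.
bedroom = [
    datetime(2018, 12, 16, 22, 5),
    datetime(2018, 12, 16, 22, 30),
    datetime(2018, 12, 17, 9, 0),
    datetime(2018, 12, 17, 22, 10),
]

hourly = queries_per_hour(bedroom)
busiest = max(range(24), key=hourly.__getitem__)
print(f"busiest hour: {busiest}:00 with {hourly[busiest]} requests")
```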
I think the big takeaways are:
- The system log activity is directly correlated to use. This is probably obvious to everyone.
- Mild/Medium use of a Roku system generates just as much traffic as blocking DNS requests coming from Roku. We see a minor change in overall traffic from the Living Room host when we allowed all logging (0.5% difference).
- For systems not in regular use, the DNS logging traffic slows by 50-70 seconds per beacon if traffic is allowed, supporting Reddit’s point that blocking traffic is, therefore, causing more traffic. Again, if you use a system frequently, it will generate just as much traffic as it would if you were to block the DNS requests.
What’s Roku Logging?
- Email Address
- Postal Address
- Phone Number
- Birth Date
- Social Media Accounts
- OAuth login information
- Shipping Information
- Purchase information
- Web Cookies
- Roku App Purchases
- Gift purchases
- Credit Card Information
- Personal Information on friends/connections:
- IP address
- Operating System type
- Operating System Version
- WiFi network name
- WiFi networking connection metrics
- Web Cookie Data
My next attempt to pull data from the logging PCAPs was to DNS and ARP cache poison one of the Rokus while running an HTTPS proxy with a self-signed certificate. This just ended in a CA authenticity failure, which was to be expected. Maybe SSLStrip could work? Not sure yet, but this might not be the right path.
Here is what I am currently working on in an attempt to get more information altogether.
Rokus used to have a TCPDump utility when you enabled developer mode. All of my devices have been connected and have auto-update on. There’s also no way to revert the box to an older OS. However, I think I have a good lead on a system that has not been connected for a few years and might be interesting.
Roku’s developer API and BrightScript:
Roku uses their own programming language called BrightScript. The API is pretty well documented and it’s very simple to enable developer mode via secret menus and start pulling XML information from the system. The issue is there is no direct contact with the underlying OS (which is Linux), with the exception of a telnet shell with access to the free command. And trust me, I’ve tried all sorts of command injection with that!
The Roku developer documentation does, though, talk about the way BrightScript applications are sandboxed and have limited access to system functions. BrightScript might be a dead end, but it’s worth a shot, so I will build a few basic apps and see what information I can get from the system or the effective application hypervisor.
Another interesting part of the Roku External API is the capability to send remote commands to the system via HTTP requests (key presses are sent as POST requests to port 8060 on the device). By remote commands I mean the literal Roku Remote (Home, Netflix, Back, Up, etc.). This is interesting because it means that it may be possible to enable developer mode and manipulate settings without physical access to the system. So, if there was a possible exploit vector via a BrightScript application, that path to exploitation could hypothetically be automated.
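For reference, this is roughly what sending a remote button looks like. As far as I can tell from Roku’s External Control Protocol documentation, key presses go to port 8060 as POST requests with an empty body. The helper names here are my own, and the send function only does anything useful when pointed at a Roku on your own network.

```python
import urllib.request

ECP_PORT = 8060  # port the Roku External Control Protocol listens on


def keypress_url(host, key):
    """Build the ECP URL for a single remote key such as Home, Back or Up."""
    return f"http://{host}:{ECP_PORT}/keypress/{key}"


def send_keypress(host, key, timeout=3):
    """POST one key press to the Roku at `host` and return the HTTP status."""
    request = urllib.request.Request(
        keypress_url(host, key), data=b"", method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def send_sequence(host, keys):
    """Send several key presses in order, e.g. to walk through a menu."""
    return [send_keypress(host, key) for key in keys]


# Only builds the URL; call send_keypress() yourself against a real device.
print(keypress_url("192.168.1.58", "Home"))
```

Usage would be something like send_sequence("192.168.1.58", ["Home", "Up", "Back"]), which is what makes scripting a longer menu walk plausible.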
Richard, December 8, 2018 at 11:25 am
I’ve got several Rokus and it’s obvious they are phoning home all the time.
I’m going to bookmark this so I can keep up with your work.
Brandon, January 20, 2019 at 10:42 am
Thanks for this work, and the write-up. I’ll check back for any updates!
charybdis, February 11, 2019 at 10:46 pm
Very interesting work. I have been also looking into a way to strip ssl to see what my roku is beaconing but with no luck yet.
Randy Proctor, March 2, 2019 at 1:35 am
I’m fine with them constantly trying to resolve if they are blocked…..this is BS and I’ll be setting up my own pi-hole on Monday
Co2, May 10, 2019 at 5:34 am
Brilliant write up bookmarking for progress
Harry P. Nyce, June 19, 2019 at 3:04 pm
Thanks for sharing this wonderful project. Looking forward to any further research you’ve been able to find time for. Actually stumbled across this as I was trying to recall how to properly update my Roku device(s), couldn’t remember if I could just whitelist a few domains, or if I’ve had to completely disable the Pi-holes (two for redundancy!) on the network before it’ll allow updates… but that is neither here, nor there.
FYI, the domain I had to whitelist temporarily to confirm update (Roku v9.0.0) was: longview.sw.roku.com