Block YouTube Ads on Apple TV by Decrypting and Stripping Ads from Protobuf

So many ads

In a Nutshell

I discovered that putting a man-in-the-middle proxy between my Apple TV and the world lets me decrypt HTTPS traffic. From there, I can read the Protocol Buffer data Google uses to populate YouTube with ads. It is too CPU-intensive to decode Protobuf on the fly, so instead, I found a flaw in the Protobuf format which allows me to reliably change one byte to obliterate ads.
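For readers new to Protobuf, here is a minimal sketch of the wire-format property that makes a one-byte edit plausible. It is not necessarily the exact byte this post ends up changing (that comes in Part 7), and the message layout and field numbers below are made up. Each field starts with a varint key, `(field_number << 3) | wire_type`, which fits in a single byte for field numbers 1 to 15, and decoders skip (or set aside) field numbers they don't recognise. Rewrite one key byte to an unused field number with the same wire type, and the length and payload still parse, but the consumer no longer sees the field.

```python
from __future__ import annotations


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, position after it)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def field(number: int, payload: bytes) -> bytes:
    """Encode one length-delimited (wire type 2) field: key, length, payload."""
    return encode_varint((number << 3) | 2) + encode_varint(len(payload)) + payload


def decode(buf: bytes, known: set[int]) -> dict[int, list[bytes]]:
    """Walk top-level length-delimited fields, keeping only known field numbers."""
    fields: dict[int, list[bytes]] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type != 2:
            raise ValueError("this sketch only handles wire type 2")
        length, pos = read_varint(buf, pos)
        payload, pos = buf[pos:pos + length], pos + length
        if number in known:  # unknown field numbers are skipped
            fields.setdefault(number, []).append(payload)
    return fields


# Hypothetical message: field 1 = content, field 2 = ad (numbers invented).
msg = field(1, b"video") + field(2, b"ad")
patched = bytearray(msg)
ad_key_offset = len(field(1, b"video"))  # index of field 2's key byte
patched[ad_key_offset] = (15 << 3) | 2   # renumber 2 -> 15, keep wire type 2

print(decode(msg, {1, 2}))             # → {1: [b'video'], 2: [b'ad']}
print(decode(bytes(patched), {1, 2}))  # → {1: [b'video']}
```

Because the edit keeps the wire type and every length prefix intact, enclosing messages stay valid and nothing has to be re-encoded, which is why this kind of change is far cheaper than a full decode.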

What follows is a reference guide for setting up a bare-metal network router to block malicious ads, obnoxious ads, tracking, clickbait, crypto-jackers, scam popups, Windows spying on you, etc., using blocklists to protect all networked devices.


Goal: Let’s build a cryptographically-strong router with FreeBSD and pfSense, then exploit a flaw in the Google Protocol Buffer format to completely block pre-roll, mid-roll, and end-roll YouTube ads on Apple TV and iPhones, network-wide.
Disclaimer: I want to support content creators, so to be fair, after a few months of blocking YouTube ads, I am now paying for YouTube Premium. Just because I can break something doesn’t mean I need to.

Sections

Part 1 – Set Up pfSense on Bare Metal

  1. Why block Ads and Behaviour Tracking?
  2. Required Router Hardware
  3. Unboxing the Hardware
  4. Install pfSense on Bare Metal
  5. First pfSense Boot
  6. Enable the AES-NI Cryptographic Instruction
  7. Enable RAM Disk
  8. Dashboard Widgets
  9. Adblocking with pfBlockerNG
  10. Isolate LANs for Security
  11. Class B IPv4 172.31.1.0/24 Network for Untrusted Devices
  12. Add Firewall Rules

Part 2 – Isolate Network LANs

  1. Set Up the Untrusted Wi-Fi AP
  2. Automatic pfSense Configuration Backups
  3. Unable to Reach 172.31.1.x from 192.168.10.x
  4. Replace Stock Firmware on the AC1200 Wi-Fi Access Point
  5. Archer C5 v2 into the Refuse Bin, R7000 as the New Wi-Fi AP
  6. Set up the Trusted Wireless Network
  7. Network Devices Interconnectivity Check
  8. Windows File Sharing Gotchas
  9. Public Service Announcement: Edge Browser

Part 3 – Set Up DNS Adblocking

  1. Block Clickbait, Incessant Ads, and Dangerous Sites
  2. Intercept all DNS Requests, Even to Hardcoded DNS Servers

Part 4 – Trick the YouTube Ad Algorithm

  1. How to Restrict Apple TV YouTube Ads?
  2. Trick the YouTube Ad Algorithm Instead
  3. Research into YouTube Advertising Spend
  4. New Goal: Convince YouTube I’m 70 and in Italy
  5. Selectively Route Apple TV Over the VPN
  6. Selectively Route Apple TV YouTube Traffic Over the VPN
  7. Gotcha: DNS Race Condition
  8. Gotcha: Authentication Trouble, Forbidden 403 Error
  9. Gotcha: YouTube is Now Showing UK Ads, Not Italian Ads
  10. Find a VPN Exit Node with no ASN Leak
  11. Hijack Google Video DNS Queries
  12. New Goal: Programmatically add IPs to the Firewall Policy Rule
  13. Research Python Methods to Hijack DNS Queries
    i. Rsync Disk Backup
    ii. Install pfSense REST API
    iii. Explore the Unbound Python Module
  14. Smoke Test: A Python DNS Hijacking Script

Part 5 – Decrypt HTTPS Traffic

  1. New Goal: Research and install a Squid-like proxy
    i. Fun fact: Jailbreaking iPhones in Japan
  2. Install a Fake-but-Trusted CA Cert on Apple TV and iPhone?
  3. Experiment with Squid and SquidGuard
  4. Self-Host the CA Certificate
  5. Abandoning Squid: Too Slow, Too Heavy
    i. Rsync Diff of Changes
  6. Install MITMProxy in a FreeBSD Jail
  7. Exploring MITMProxy
  8. Patch MITMProxy Source Code for Server SNI Interrogation

Part 6 – Intercept Apple TV and iOS YouTube Ads

  1. Smoke Test: Intercept YouTube Ads with MITMProxy
  2. Examine uBlock Origin Regex Patterns for Inspiration
  3. Surgically Alter the JSON Response to Remove Ads
  4. The iOS YouTube App Uses Protobuf, not JSON
  5. Timing Analysis to Detect Ad Videos?
  6. Decode the YouTube Protobuf Responses
  7. Ad URL Polymorphism
  8. Smoke Test: Intercept and Decode Protobuf in Python
    i. Pure Python Benchmarks
    ii. Pure C++ Benchmarks
  9. Fuzzing the YouTube Video Ad Responses
  10. Enter Burp Suite Tools for Penetration Testing
  11. Exfil the Proto Schemas from the App, Cleanly?

Part 7 – Reverse-Engineer Protobuf Messages

  1. Hardcore Deep-Dive into Protobuf and Wire Format
  2. Exploit a Protobuf Flaw to Easily Remove All Ads by Changing One Byte
  3. Smoke Test: Remove Ads from Protobuf in O(n)-Time
  4. Analysis of this Successful Adblocking Technique
    i. Summary
    ii. Timing Analysis
    iii. Knock-On Benefits
    iv. Future-Proof
    v. Should Google be Worried?
  5. The MITMProxy YouTube Adblocking Script

Part 8 – Summary

  1. YouTube Premium
    i. Experiment in Ad Viewing
    ii. $0.15 as a Ballpark CPV
    iii. CPV from US Advertising Spend Divided by Total Views
    iv. Is YouTube Premium Worth It?
  2. DMCA, Sony, Viacom
  3. Summary of Accomplishments

Why block Malicious Ads and Behaviour Tracking?

You are a valuable commodity that is bought and sold without your knowledge or consent. You will be tricked with clickbait, distracted with large ads, and enticed to leave the site you are on at every opportunity. Plus, everything you do online is being monitored so your habits and searches can be remarketed and sold over and over again for years.

clickbait

Privacy – Knowing what you like to watch and read, what phone you have, what you watch on Netflix, what you shop for, what you ask Alexa about, your taste in music, etc. is unbelievably valuable to advertisers. Spying on people is such a big problem that Europe passed the GDPR, so every site you visit asks if you are okay with cookies (and we blindly click “ok” to hide the banner). We must wrestle back privacy ourselves.

Bandwidth – If privacy doesn’t concern you, how about this: it is often cited that between 25% and 40% of network traffic is ads, tracking, JavaScript to load trackers (fingerprint.js, googletagmanager.js), websocket traffic to collect how you scroll and what you type (Hotjar), and the like. Do you have a 100 Mbps internet connection? At the high end of that estimate, consider it 60 Mbps!

Clickbait – Then there is clickbait. “You won’t believe what Tom Cruise did. He…” and you may want to click. Then you are in the spider’s web. How about fake news? Or articles that don’t say “sponsored” in size-8 font, but now say “underscored” to be clever. What is even real anymore? As soon as you click on clickbait, you may end up on a page with a dozen more ads that aren’t approved by Google but lead to a dark world of maliciousness. Clickbait is so incredibly profitable to scammers.

Cryptojacking – Some websites will load crypto-mining JavaScript (e.g. CoinHive.js) so while you read, they overheat and abuse your computer to try to make a few pennies. Some sites will load JavaScript that tries to steal from your crypto wallet or trick you into transferring cryptocurrency.

Takeaway: It is highly lucrative yet detrimental to you to track and trick you, and only you can do something about it.



Required Router Hardware

A virtual machine, Docker image, or Raspberry Pi is not performant enough to protect a whole SMB network; we need dedicated hardware with a cryptographic instruction set whose only function is to route, decrypt, and monitor packets in and out. Here is what I used.

  • A mini PC with the AES-NI instruction set (e.g. J4125)
  • Several gigabytes of DDR4 RAM (e.g. 32 GiB)
  • A decent mSATA SSD drive (e.g. 128 GiB)
  • A USB drive to transfer pfSense



Unboxing the Hardware

I’ve ordered a mini J4125 PC from AliExpress, plus 32 GB of DDR4 RAM and a 128 GB mSATA SSD from Amazon, and will assemble them for the first time now.

Warning: Out of caution, I searched diligently for a barebones mini PC that did not include RAM or an SSD; there is nothing stopping an overseas seller from including some generic RAM and SSD but charging Samsung prices.
Tip: 128 GB of disk space on a router? Yup. That should be plenty of space to hold logs and not wear down the SSD too quickly, and to allow beautiful packet capture (and maybe an edge cache for NPM and Docker?).

A beautiful box, isn’t it? It only has 3 LAN ports, but it can be extended with network switches.

The J4125 AES-NI quad-core fanless mini PC
The pfSense router from a J4125 fanless mini PC



Install pfSense on Bare Metal

I’ve never used pfSense before, so we will explore this together. The compressed image is about 360 MB and can be flashed to a USB drive with an AppImage binary of Etcher (very cool). Decisions, decisions: VGA install or serial? Let’s serial into the new router. Why not?

Let’s not serial into the router

Well, that looks painful. It would also be a whole production to serial into the box in case of an emergency because the serial port is inside, and there isn’t even an RS232 or JTAG connector – just some narrow header pins. Yikes. Let’s go with VGA and plug a keyboard into the USB port – get ready to navigate with arrows and tabs.

J4125 mini PC BIOS over a VGA cable

I’ll follow this guide on YouTube. I’ll pass on encrypting the disk since I would like to avoid entering a passphrase each time the mini PC reboots. A stripe disk is fine since there is only one disk. I have no idea what to expect yet, so I will pass on dropping to a shell for a more advanced configuration.



First pfSense Boot

I ejected the USB containing the boot image (important) and rebooted the little box. It played a melody on the internal speaker (there is an internal buzzer and thankfully it isn’t very loud).

Do I need to have a LAN connection already, or can I just start the thing? I’ll just start pfSense and let it complain to me if it wants… and according to the YouTube tutorial, I should guess which port is LAN 1. I’ll do that now.

I figured out that I should set LAN 1 to a static IP address that is not in my existing router’s DHCP range, so I went with 192.168.1.3. Now I can access an admin web portal (admin/pfsense). Hooray.

Yikes, the mini PC beeped at me and informed me that ‘admin’ has logged in. That startled me a bit, but hey that is pretty neat.

First time logging into pfSense admin UI



Enable the AES-NI Cryptographic Instruction

I played around with the wizard, used defaults, and got to the web configurator. The first thing that caught my eye was AES-NI CPU Crypto: Yes (inactive). I went out of my way to get a mini PC with AES-NI. What gives?

Ah, this needs to be enabled in System > Advanced > Miscellaneous. Why not auto-detect this and use the best option? I’m glad I spotted that, or else this mini PC might as well be a Celeron J1900 of yesteryear.



Enable RAM Disk

Having 32 GiB of RAM, let’s take advantage of that and use a generous amount of RAM for /var and /tmp, and since hopefully this 128 GiB SSD has wear levelling, let’s take a RAM Disk backup every hour.

Let’s take advantage of RAM disk

Reboot! AES-NI is now active.



Dashboard Widgets

This dashboard is pretty slick. I’m just discovering that there are widgets that can be added to the Dashboard, including S.M.A.R.T to alert us if the SSD is going bad. Nice.

Final pfSense dashboard after all setup

Hang on, when I added the Services Status widget, something called PC/SC Smart Card Daemon shows up. What is that? Research shows it’s a daemon for hardware smart keys that we can probably do without(?). It can be disabled in the /etc/rc.bootup file like so:

Wait. After some time went by, I noticed the router slowed down, fatally.

IPsec without the Smart Card Service will cripple the router
Warning: Do NOT try to disable the Smart Card Service as it is needed by IPsec; if you start experimenting with an IPsec VPN tunnel and the pcscd daemon is disabled, then your hard disk will fill up with logs and your CPU will run hot.



Adblocking with pfBlockerNG

This unboxing and setup has been fun, but I’d like to block all the bad traffic on my network. I’ve been using a workhorse of a DNS-level adblocker called Pi-hole on a… yes, Pi, but it would be nice if I could reclaim that wee bit of hardware for something else and use a comparable add-on module in pfSense. Let’s explore that now.

pfBlockerNG is a very powerful package for pfSense® which provides advertisement and malicious content blocking along with geo-blocking capabilities.

Question: Do I install the first pfBlockerNG or the pfBlockerNG-devel which feels like a developer version? I’m a software developer, so this is for me, but am I a pfSense developer? No. Maybe it will show me advanced logs or I can mess about with Lua? Let’s Google this.

From here, random people are saying to install the development version. Another blogger advocates using the dev version as well. Meh, I guess we can install jq, rsync, and Python 3.8. This doesn’t feel like a development version since it has exciting dependencies.

Install pfBlockerNG-devel not the other one

That was painless and only added an extra 20 MiB. It seems a lot of the dependencies are part of pfSense already. The knight at the end of Raiders would say that I have chosen wisely (hey, why did Indy age like a normal person up to Indy 4 if he drank the immortality water that the thousand-year-old knight also drank?).

Wizard time.

The pfBlockerNG wizard had four steps but step three is like 50 steps in one

There are a lot of options in step three. This is not like Pi-hole at all. I’m going to come back to this and set up my network first so I can retire my Nighthawk R7000 or give it new life as a Wi-Fi AP.

Fix: If the pfb_dnsbl service won’t start or the status tab states [ Missing CRON task ], try deleting the empty file /var/run/booting (ref).



Isolate LANs for Security

An opportunity has presented itself: I can create real networks on each of the three router Gigabit ports (not VLANs), and should I do so? Yes, yes I should. I would like a dedicated hardware network for all my home-phoning spy devices (Alexas and Apple TV) so they don’t flood my network with metrics info and “sure I’m muted and not listening to you” audio payloads back to their HQs.

I can see it now: A Wi-Fi AP on a hardware LAN that is isolated from everything else and dedicated to these gadgets, and runs through the adblocker and traps hard-coded DNS queries to 1.1.1.1 and 9.9.9.9 and others (I’ll have to explore this) so YouTube on my TV doesn’t sneakily bypass Pi-hole or any other DNS-level blocker. It’s so Utopian an outcome I may not be able to sleep.

I’ve decided that my bottom-shelf TP-Link wireless router that is so old that AC1200 might as well be “A.D. 1200” is going to be my Wi-Fi AP for those IoT spy devices.

In sum, there will be a dedicated hardware LAN:

  • with a wireless AP (AC1200) for Amazon/Apple gadgets and the TV.
  • with a wired switch for all the beefy computers and clusters in my lab.
  • with another wireless AP (R7000) just for iPhones and watches.

As an aside, having done an Offensive Security hacking course in my spare time, I rare-earth-magnet-strongly suggest isolating Wi-Fi devices from any critical LAN segments connected to devices that touch daily banking or stock trading (or crypto wallets).



Class B IPv4 172.31.1.0/24 Network for Untrusted Devices

The IPv4 range 172.16.0.0/12 (172.16.0.0 to 172.31.255.255) is a valid range of private IP addresses under RFC 1918. I’m uncomfortable with Alexa and Apple TV being even on the same class of network as my main LAN segment, so I will banish them to the class B private network at the hardware level, and my more trusted LANs will be on the traditional class C network (192.168/16). This helps mitigate any misconfigured firewall rules, since the two networks share no address space.

Set up a physical network for the untrusted smart devices

Be sure to enable the DHCP server on the physical NIC that will connect smart devices (which mainly just tell me the weather and creepily listen to me sleep).

From this point, DHCP works on this new network and hands out IP addresses, but nothing is routed yet: all traffic on the new interface is blocked by default.
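Python’s standard `ipaddress` module is a quick way to sanity-check an addressing plan like this one. The sketch below (the Trusted subnet 192.168.10.0/24 is the one that appears later in this post) shows that 172.31.1.0/24 falls inside the RFC 1918 block 172.16.0.0/12 rather than inside 172.16.0.0/16, and that the Untrusted and Trusted subnets share no addresses.

```python
from __future__ import annotations

import ipaddress

# The three private IPv4 blocks defined by RFC 1918.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

untrusted = ipaddress.ip_network("172.31.1.0/24")  # Alexa, Apple TV, etc.
trusted = ipaddress.ip_network("192.168.10.0/24")  # Trusted LAN


def private_block(net: ipaddress.IPv4Network) -> ipaddress.IPv4Network | None:
    """Return the RFC 1918 block containing net, or None if it is not private."""
    for block in RFC1918:
        if net.subnet_of(block):
            return block
    return None


print(private_block(untrusted))  # 172.16.0.0/12
print(private_block(trusted))    # 192.168.0.0/16
print(untrusted.subnet_of(ipaddress.ip_network("172.16.0.0/16")))  # False
print(untrusted.overlaps(trusted))  # False
```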



Add Firewall Rules

We need to manually add rules so traffic on the physical NICs goes somewhere.

Our first rule to allow eth3 to access the Internet

There is a logging message. Let me reproduce it below.

Hint: the firewall has limited local log space. Don’t turn on logging for everything.

I read that to mean, “Congratulations on not cheaping out on your SSD. Now go forth and log everything, my son.”

I’m not a new-age, fancy-jazz, coloured-light- or smart-plug-controlling guy who forgot how to turn on a light without his phone, so I do not need to have smart devices on the same network as my phone (why create dozens of wireless attack vectors into your home?). I’m classically trained to actuate an electromechanical current interrupter on the wall and light let there be.



Set Up the Untrusted Wi-Fi AP

How do I reach the admin UI of AC1200 Wi-Fi AP now? I factory reset it, plugged the WAN NIC into the ETH3 NIC of the pfSense router, but both devices are just blinking at me.

I suppose I can just Wi-Fi into the factory-reset AC1200. Yikes, 2016 was a bad year for responsive web UIs I take it. This is horrible; I’ll pull out a netbook for this. One sec.

It seems the Archer C5 has no AP Mode. This is my problem, not yours, but I’m still going to vent.

Oh, and the “refresh” icon on the top of the DHCP Leases page in pfSense is not “refresh”, but “reload service”. Whoops.

Well, I bricked the AC1200 router. I will have to run an Ethernet cable manually… but, wait, my thin notebook has no Ethernet ports and needs a USB-NIC adapter. Happy Friday.

Tip: Connect LAN to LAN, not the AP’s WAN to pfSense’s LAN unless you want to do double NATing.

There were shenanigans, but I set the LAN IP of the AC1200 to 172.31.1.100, the ETH3 NIC IP of the pfSense router to 172.31.1.1/24, and set the pfSense DHCP service on ETH3 to assign addresses 172.31.1.101~150. What failed was setting the AC1200 to 172.31.1.2 as it was unreachable (reason unknown). Oh yes, I had to turn off firewally things and NAT Boost, and basically drop the horsepower of this TP-Link router down to that of a potato battery. The above settings allow me to access the AC1200 remotely now.
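The settings above boil down to a few constraints that are easy to get wrong while juggling web UIs: every address sits in the /24, the AP’s static IP and the gateway stay out of the DHCP pool, and nothing lands on the network or broadcast address. A minimal sketch using Python’s `ipaddress` module with this section’s addresses hard-coded (the check is my own illustration, not a pfSense feature):

```python
import ipaddress

SUBNET = ipaddress.ip_network("172.31.1.0/24")
GATEWAY = ipaddress.ip_address("172.31.1.1")      # pfSense ETH3 interface
AP_STATIC = ipaddress.ip_address("172.31.1.100")  # AC1200 LAN IP
POOL = (ipaddress.ip_address("172.31.1.101"),     # pfSense DHCP range start
        ipaddress.ip_address("172.31.1.150"))     # pfSense DHCP range end


def check_plan() -> list:
    """Return a list of problems with the addressing plan (empty means OK)."""
    problems = []
    start, end = POOL
    if start > end:
        problems.append("DHCP pool start is after its end")
    reserved = (SUBNET.network_address, SUBNET.broadcast_address)
    for name, addr in [("gateway", GATEWAY), ("AP", AP_STATIC),
                       ("pool start", start), ("pool end", end)]:
        if addr not in SUBNET:
            problems.append(f"{name} {addr} is outside {SUBNET}")
        elif addr in reserved:
            problems.append(f"{name} {addr} is the network or broadcast address")
    for name, addr in [("gateway", GATEWAY), ("AP", AP_STATIC)]:
        if start <= addr <= end:
            problems.append(f"{name} {addr} collides with the DHCP pool")
    return problems


pool_size = int(POOL[1]) - int(POOL[0]) + 1
print(check_plan() or "addressing plan OK", f"({pool_size} DHCP leases)")
```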

The other video ran its course, so I started following this YouTube video (set the speed to 1.5x).

There are some good tutorials on advanced pfSense… if you can sit through the ads

One more thing: I installed the nmap package for pfSense and scanned the AC1200 router, and found some sneaky ports open.

Port 20005/tcp is a print server port that I’ve now closed. However, the Archer C5 AC1200 is vulnerable to all kinds of Kali mischief, so it was wise to put it on its own network. I’m not sure how to close port 22 and the SSH daemon on the AC1200 because the stock firmware is ancient and crippled, so I’ll just have to block port 22 on the whole LAN segment.

I’ve also taken care of disallowing private networks to ingress on the WAN (see the next section to set up DMZ).



Unable to Reach 172.31.1.x from 192.168.10.x

Ping and Traceroute are aiding me in my efforts to connect to the AC1200 Wi-Fi AP from my Trusted LAN. I went ahead and added the subnet to the Symantec Firewall rules just in case (Symantec has its place now and then, but yes, definitely have available PC CPU horsepower to spare).

Configure Symantec to allow the Untrusted subnet

Now, it seems ICMP packets are no longer blocked between networks, but I still cannot reach the AP’s web management UI even though I can see the pings in the traffic logs.

I’ve even added an “any to any” firewall rule on the Untrusted network. No change.

Warning: If you run nmap as I did, software firewalls may detect a port scan and kick your network connection for an hour by default.

Disable Symantec port scan detection

Let’s try a stealth scan instead: sudo nmap -sS -v 172.31.1.*.

I think a port scan has still been detected

Nope, pfSense doesn’t like that at all. And, the whole network stopped working. Nice security! Also, dang.

The good news is that I’ve isolated the packet malaise to the TP-Link AC1200 box itself. I suspect that I need to add net.ipv4.ip_forward=1 so it forwards packets that are not addressed to itself, but I’d need root access to the AC1200. Let’s burn it to the ground and rebuild from its sprinkler-soaked ashes.



Replace Stock Firmware on the AC1200 Wi-Fi Access Point

Of course, I cannot actually stop Untrusted LAN devices from reaching the AC1200 as they all exist downstream from the pfSense box.

DD-WRT open-source router firmware, meet my ancient Archer C5 and do your thing.

DD-WRT supports the Archer C5

The Archer C5 did not accept the DD-WRT firmware. Hmm… how about OpenWRT?

OpenWRT supports the Archer C5

The Archer C5 did not accept the OpenWRT firmware either. What the actual facepalm (WTAF)?

Wait. My hardware is revision 2 using the Broadcom chipsets which are notoriously difficult networking chips.

Careful: Devices with Broadcom Wi-Fi chipsets have limited OpenWrt supportability (due to limited FLOSS driver availability for Broadcom chips). (REF: OpenWRT.org)

Alright, so OpenWRT, DD-WRT, and Tomato projects have no firmware for this AC1200 with unpopular Broadcom chipsets. Into the refuse bin it goes.



Archer C5 v2 into the Refuse Bin, R7000 as the New Wi-Fi AP

I’ve dismantled the AC1200 so I do not forget why I threw it out. It’s too bad because it’s so pretty on the inside, and they always say, “It is what is inside that counts… except if you are a router with Broadcom chips.”

Inside the Archer C5 v2 with Broadcom chips

The R7000 is factory reset, and here is the first problem:

Tip: On factory reset, the Nighthawk R7000 is pretty uptight about the format of the password. One rule is that no more than two identical consecutive characters are allowed. Well, thanks Netgear for pretty much providing a Regex to password crackers. Let’s disable all those rules with a few keystrokes to delete the JavaScript “blocking” the form submission. Now my admin password does not conform to the Regex and is super long. Muhahaha to Netgear password crackers.
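Netgear’s “no more than two identical consecutive characters” rule really is a one-line regex. The sketch below is my own illustration (Netgear’s actual form-validation JavaScript may be written differently): it expresses the rule with a backreference, and uses a small dynamic program to count how many length-n passwords over a k-symbol alphabet survive the rule, so `count_allowed(n, k) / k**n` shows how much of the keyspace the rule leaves for a cracker to search.

```python
import re

# Three identical characters in a row, e.g. "aaa" or "111".
TRIPLE = re.compile(r"(.)\1\1", re.DOTALL)


def allowed(password: str) -> bool:
    """True if the password passes the 'no more than two repeats' rule."""
    return TRIPLE.search(password) is None


def count_allowed(length: int, alphabet: int) -> int:
    """Count strings of `length` over `alphabet` symbols with no run of 3+."""
    if length == 0:
        return 1
    ends_run1, ends_run2 = alphabet, 0  # strings ending in a run of 1 or of 2
    for _ in range(length - 1):
        # A new symbol starts a run of 1; repeating the last one extends 1 -> 2.
        ends_run1, ends_run2 = (alphabet - 1) * (ends_run1 + ends_run2), ends_run1
    return ends_run1 + ends_run2


print(allowed("hunter22"), allowed("hunter222"))  # True False
n, k = 12, 95  # 12 characters of printable ASCII
print(f"{count_allowed(n, k) / k**n:.4%} of the keyspace passes the rule")
```

With a full printable alphabet, the excluded fraction is small, so the rule mostly tells a cracker what a password cannot look like rather than shrinking the search much.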

The R7000 is in AP mode, but I can still access the pfSense web management page from the Untrusted network. Let’s lock down the web UI in pfSense under Firewall Rules.

Untrusted network firewall rules



Set up the Trusted Wireless Network

The Untrusted network is now looking good. It’s time to make the other R7000 Nighthawk I have into a Wi-Fi AP as well so my phone and watch have a safe place to connect to, as well as a laptop when I want to RDP into my wired machines from the kitchen. I was saving that for a honeypot AP, but I can come back to that later.

Let’s see if I can Wi-Fi into the Wireless LAN’s R7000…

Tip: Remember to physically unplug the pfSense upstream router from the R7000 because the R7000 is too helpful and will enter into AP mode by sensing any upstream routers, then you cannot get into the web UI anymore.

Since only my trusted devices should be on the Wireless LAN, I’ll turn off 2.4 GHz Wi-Fi because anything recent and wireless should support 5 GHz. Those pesky AliExpress Pineapple Wi-Fi password stealers on the cheap side only use 2.4 GHz, so a neighbour is going to have to put in some effort to snoop on my network. Plus, 5 GHz is blocked more easily by walls and concrete, so I prefer it for averting medium-range snooping. But I am so going to set up a honeypot to brake-check my faith in humanity.

It is normally straightforward to put a Wi-Fi router into AP mode by disabling WAN and DHCP.



Network Devices Interconnectivity Check

Do all my dozens of computers, laptops, Pis, clusters, NAS drives, and the like still connect as before? Most important is my web-scraping bot in a hardened, RAIDed, dedicated machine with its own UPS. But alas, I cannot SSH into it even though the SSH handshake packets make it to the hefty box.

Could this be our old frenemy IPv4 forwarding being disabled? Possibly. I’m able to SSH into the machine from my iPhone (seriously) when on the same network.

Nope. Adding net.ipv4.ip_forward = 1 in the right place with a restart did not yield joy.

According to dmesg -w (to tail dmesg logs), UFW (Uncomplicated Firewall) is not blocking ICMP requests or TCP requests on port 22. When I do something nutty like try to SSH on, say, port 23, then I can see the UFW block logs in dmesg. Confirmed: Packets can reach that machine.

Running tcpdump src 192.168.10.100 on the target machine, where that IP is from the Trusted network, shows it is responding to pings. I’m even getting replies to SSH handshake requests. So now we know that return packets are being dropped. Interesting! Aside: tcpdump is awesome.

Let’s follow the trail. Digging a little deeper, I see replies to ICMP and SSH handshakes are being sent to some IP over HTTPS that I do not recognize. Bizarre. When I run the usual ipinfo tools, I see that replies are going over a VPN that I completely forgot about. Ha. Replies to a different subnet are egressing over the VPN, but cannot return properly. Neat.

VPN causes ACK packets to return over the wrong adapter

Now that I remember what I did in 2019, I re-added NAT alias rules, and it’s showtime again.
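The underlying failure mode is ordinary longest-prefix-match routing: once a VPN client installs a default route, replies to any subnet that isn’t directly attached leave through the tunnel. A minimal sketch with a hypothetical routing table for the SSH target (the interface names and the 192.168.20.0/24 LAN are assumptions for illustration, and the route fix at the end is one option, not the NAT alias rules I actually used):

```python
import ipaddress

# Hypothetical routing table on the SSH target: (destination, interface).
ROUTES = [
    (ipaddress.ip_network("192.168.20.0/24"), "eth0"),  # directly attached LAN (assumed)
    (ipaddress.ip_network("0.0.0.0/0"), "tun0"),        # default route added by the VPN client
]


def egress_interface(dst: str, routes: list) -> str:
    """Pick the route with the longest matching prefix (ignores metrics/policy routing)."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, dev) for net, dev in routes if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(egress_interface("192.168.20.15", ROUTES))   # eth0: same subnet
print(egress_interface("192.168.10.100", ROUTES))  # tun0: reply to the Trusted LAN leaks into the VPN

# One way out: a more specific route for the Trusted LAN via the local interface.
fixed = ROUTES + [(ipaddress.ip_network("192.168.10.0/24"), "eth0")]
print(egress_interface("192.168.10.100", fixed))   # eth0
```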



Windows File Sharing Gotchas

Your path may be smoother, but I always seem to make the Trench Run instead of remote-piloting a handful of lead-filled X-Wings at light speed right through the Death Star’s reactor to make it go boom: the easy way.

I’ve added some rules to allow Static DHCP devices (Windows machines) to talk to each other, but by default, the Private profile rules in Windows Defender Firewall use the local subnet as their scope. That means different subnets are isolated. We can’t just relax the pfSense DHCP subnet mask to, say, 192.168.20.0/16 because it conflicts with another subnet. Instead, just to get file sharing working, I relax the scope in Advanced Settings like below. Be sure to modify both the inbound and outbound rules for SMB and ICMP.

Windows file sharing across subnets

Again, please add whatever subnets you desire instead of any.
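Why the /16 idea is a non-starter is quick to confirm with Python’s standard `ipaddress` module. In this small sketch, 192.168.10.0/24 (the Trusted LAN from earlier) stands in for “another subnet”; substitute your own.

```python
import ipaddress

proposed = "192.168.20.0/16"
try:
    ipaddress.ip_network(proposed)  # strict parsing rejects host bits set beyond the mask
except ValueError as err:
    print("rejected:", err)

widened = ipaddress.ip_network(proposed, strict=False)  # what the mask really covers
trusted = ipaddress.ip_network("192.168.10.0/24")       # example of an existing subnet
print(widened, widened.num_addresses, widened.overlaps(trusted))  # 192.168.0.0/16 65536 True
```

A /16 mask would also make every 192.168.x.x host look on-link to the machines using it, so they would try to reach other subnets directly instead of going through pfSense.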



Public Service Announcement: Edge Browser

Why does the Microsoft Edge browser start automatically and run in the background, and why can’t I kill it when I ctrl+alt+del? If you’ve asked yourself this, you’re not alone. It turns out Edge starts up when you log in and it keeps running in the background. Here is the fix:

Prevent Microsoft Edge from starting or running in the background. Sneaky browser.

I suggest downloading Winaero Tweaker and applying registry tweaks to cut down on the Redmond Spy Machine.

Stop Microsoft from spying on you



Block Clickbait, Endless Ads, and Dangerous Sites

Thanks to web-browser and DNS-level adblockers (e.g. Pi-hole), it’s commonplace to block bad sites, crypto-miners, fingerprinters, trackers, remarketers, banners, pop-ups, fake tech-support scam alerts, and all manner of unscrupulousness designed to take advantage of you. Let’s take pfBlockerNG on pfSense for a spin.

pfBlockerNG blocking ad domains with graphs

The pie chart looks great. I followed this pfBlockerNG tutorial.

Tekgru.com pfBlockerNG tutorial blog

This is important: If you have multiple network interfaces (the mini PC has four), then you need to enable Permit Firewall Rules for multiple interfaces and select them.

DNSBL Permit Firewall Rules for multiple interfaces

Would you like to have discretion over blocklists? Let’s add a DNS blocklist related to gambling and reload pfBlockerNG to see if a poker site is blocked on the Trusted LAN.

Some sketchy poker sites are now blocked

If you would prefer the connection to just close instead of rendering a PHP page, create a new PHP script with the following code and select it in the pfBlockerNG settings page:



Intercept All DNS Requests, Even to Hardcoded DNS Servers

Let’s make sure all clients behind the pfSense router use the local Unbound DNS server so pfBlockerNG can act on them. We do not want apps and home assistants to bypass our DNS server, so we have to add some NAT rules.

Trap all DNS and DNS+TLS requests

First, we have to block DNS over TLS (for now) and only allow local DNS requests (note the rule order):

Overarching DNS rules allowing only internal DNS queries
Note: DNS over TLS must be blocked (for now) for all clients behind the pfSense router in order to allow DNS query trapping to succeed. An iPhone may show a Privacy Warning that the network is blocking encrypted DNS traffic. That is okay because we are encrypting upstream DNS requests to Cloudflare.

We can ignore the Privacy Warning in devices behind the pfSense router

Here is a NAT rule for one interface. I started by making a rule like the one below for each interface except WAN (obviously).

Example rule to trap DNS queries on a given interface
Tip: NAT reflection should be disabled so the wild Internet cannot access our DNS server.

To make life simpler, I made a firewall alias of all non-WAN interfaces called Non_WAN. The following redirect rules cover IPv4 and IPv6 and redirect local DNS queries on port 53 to localhost:

Firewall DNS query redirect rules to localhost
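To confirm the trap works, a client behind pfSense can send a query straight to a hardcoded resolver such as 1.1.1.1 and look at the answer: if the redirect is doing its job, Unbound answers instead, and a blocked domain comes back with the pfBlockerNG sinkhole address rather than a real one. Below is a minimal, standard-library-only sketch that builds and parses a plain RFC 1035 query over UDP port 53 (A records only, no EDNS, no retries); it is a testing aid of my own, not part of pfSense.

```python
from __future__ import annotations

import secrets
import socket
import struct


def build_query(name: str, qtype: int = 1) -> tuple[int, bytes]:
    """Build a minimal DNS query (RFC 1035); qtype 1 = A record."""
    txid = secrets.randbelow(0x10000)
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # RD set, one question
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split("."))
    return txid, header + qname + b"\x00" + struct.pack("!HH", qtype, 1)  # class IN


def skip_name(msg: bytes, pos: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[pos]
        if length & 0xC0 == 0xC0:  # compression pointer: two bytes end the name
            return pos + 2
        pos += 1 + length
        if length == 0:            # root label terminates the name
            return pos


def a_records(msg: bytes) -> list[str]:
    """Extract IPv4 addresses from the answer section of a DNS response."""
    qdcount, ancount = struct.unpack("!HH", msg[4:8])
    pos = 12
    for _ in range(qdcount):
        pos = skip_name(msg, pos) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        pos = skip_name(msg, pos)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[pos:pos + 10])
        pos += 10
        if rtype == 1 and rdlength == 4:
            addresses.append(socket.inet_ntoa(msg[pos:pos + 4]))
        pos += rdlength
    return addresses


def query(server: str, name: str, timeout: float = 3.0) -> list[str]:
    """Send one UDP query to server:53 and return the A records in the reply."""
    txid, packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        reply, _ = sock.recvfrom(4096)
    if struct.unpack("!H", reply[:2])[0] != txid:
        raise ValueError("transaction ID mismatch")
    return a_records(reply)
```

From a client on one of the LANs, `query("1.1.1.1", "www.googletagmanager.com")` should return the sinkhole IP (10.10.10.1 in this setup) if both the redirect and the blocklist are working; a public IP means the query escaped.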

Let’s also log trapped DNS requests. Head to the Services > DNS Resolver page, click “Display Custom Options”, and add the lines:

Well, hello there, Microsoft Windows. What are you up to trying to reach Google Tag Manager? Naughty OS. That request is now black-holed to a non-existent IP at 10.10.10.1.

Windows is trying to reach Google Tag Manager

Let’s turn our attention to the TV and see how it fares under DNS interception.



How to Restrict Apple TV and iPhone YouTube Ads?

YouTube has been showing 7-second and 15-second ads, twice, back-to-back, every few minutes. Why are the ads incessant and so long? I do not mind the occasional ad, similar to live TV, but ads this frequent would warrant FTC complaints if they were on live TV.

YouTube is tricky because ads are also videos that come from the same domain, so domain-name blockers like pfBlockerNG cannot act on them. The best pfBlockerNG and Pi-hole can do is block googleadservices.com only after you watch an ad video and click on the ad.
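A DNS blocklist only ever sees hostnames, which is why it is powerless here. The sketch below uses a common suffix-matching approach (my own simplification, not pfBlockerNG’s code) with hypothetical hostnames: the ad video and the content video come from the same googlevideo.com domain, so they get the same verdict, while googleadservices.com is only contacted once you click an ad.

```python
def is_blocked(host: str, blocklist: set) -> bool:
    """True if host or any parent domain (whole labels only) is on the blocklist."""
    labels = host.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


BLOCKLIST = {"googleadservices.com"}

# Hypothetical hostnames: both streams are served from googlevideo.com.
content_video = "rr1---sn-example.googlevideo.com"
ad_video = "rr2---sn-example.googlevideo.com"
ad_click = "www.googleadservices.com"

for host in (content_video, ad_video, ad_click):
    print(f"{host:40} blocked={is_blocked(host, BLOCKLIST)}")
```

Adding googlevideo.com to the list would not help: it would block the ad and the video you actually wanted, alike.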

Many people opt to use a web browser like Firefox or Chrome with uBlock Origin that acts on JavaScript as a workaround. It might be enough to watch YouTube on a web browser and stream that to a smart TV. However, we cannot restrict ads on the iPhone (without jailbreaking and compromising it).

What are our options? How can we safely restrict YouTube ads on all network devices?



Trick the YouTube Ad Algorithm Instead

Thought Experiment: Among friends, let’s say that English-speaking countries get ads for the most ridiculous things because their residents are assumed to have disposable income. Can we instead make YouTube think we are an undesirable advertising target?

What do ads in other parts of the world look like? Are those living in Antarctica or Low Earth Orbit getting a lot of ads too?

Xkcd.com: Mess with advertisers

What would happen if we leverage the capabilities of this pfSense router to route YouTube Location Tracking information through a VPN that terminates in some remote part of the world with fewer YouTube viewers per capita? In other words, let’s make ourselves undesirable to advertisers and see if we get fewer ads.

Scotty from TNG episode 'Relics' understands the plan



Research into YouTube Advertising Spend

Let’s do some YouTube demographics research to find a part of the world avoided by advertisers.

Mobile advertiser spend by country in 2020 (REF: statista.com)

Let’s also check some YouTube statistics about viewers by country for insights. Thinking about following some Reddit advice and VPN’ing into India? Think again.

Total YouTube views by country in 2019 (REF: ChannelMeter)

That was 2019. This is 2020:

Top ten YouTube countries with population (REF: backlinko.com)

I’m not a digital advertiser, but I can see that people in the UK and Canada watch a large number of videos per sitting. If I were an advertiser though, I’d pump those two countries with video ad after video ad because, statistically, those residents will take the eyeball kicking. All things being equal, I definitely need a VPN to terminate outside of Canada, the UK, and the United States (English-speaking countries) to enjoy YouTube more.

Does age play a factor? Who don’t advertisers want? I want to be that guy on paper.

YouTube age demographics as of 2020 (REF: backlinko.com)



New Goal: Let’s trick YouTube into believing I am a 70-year-old male living in Italy. Yes, that should definitely cut down on the Nespresso and Starbucks ads, at least.

How then to convince YouTube that I am a retired Sicilian living on a small island chain? I embellished that last part. Seventy and in Italy is sufficient.

Let’s do this. In the YouTube account…

I am 71 years old
I am in Italy

It is doubtful that this is all it takes for our goal. Let’s find a VPN exit point in Italy.

NordVPN has 60+ servers in Italy

Nice. NordVPN has about 60 servers in Italy (that’s an affiliate link by the way).



Selectively Route Apple TV Over the VPN

Let’s go through some tutorials to set up OpenVPN in pfSense. Just kidding! We’re going to use WireGuard – we have the Intel AES-NI crypto instruction set because we didn’t go cheap and get a yesteryear J1900 mini PC that sellers are trying to offload.

I’ll now install the FreeBSD WireGuard package.

Install the WireGuard package in pfSense

Next, add a tunnel and enable it. According to this thread and this thread on Reddit, we need to pull some WireGuard and NordLynx settings from a sacrificial Linux VM and transpose them (e.g., the private key) to the pfSense router. No problem.

Connect to Italy over WireGuard
WireGuard config information via wg show

Run sudo wg showconf nordlynx to see the private key that the pfSense tunnel config needs.
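If you would rather not copy keys by eye, the output of wg showconf is simple INI-style text that a few lines of Python can pick apart. This is a minimal sketch, assuming a single [Interface] and a single [Peer] section; the function name parse_wg_showconf and the sample keys and endpoint are made up for illustration.

```python
from __future__ import annotations


def parse_wg_showconf(text: str) -> dict[str, dict[str, str]]:
    """Parse wg(8) showconf output into {"Interface": {...}, "Peer": {...}}."""
    sections: dict[str, dict[str, str]] = {}
    current: dict[str, str] | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            # A repeated section name (e.g. a second [Peer]) is merged into the first.
            current = sections.setdefault(line[1:-1], {})
        elif current is not None and "=" in line:
            # Split on the first "=" only: base64 keys end in "=" padding.
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return sections


# Made-up output; real values come from `sudo wg showconf nordlynx` on the VM.
SAMPLE = """\
[Interface]
ListenPort = 51820
PrivateKey = PRIVATE-KEY-PLACEHOLDER=

[Peer]
PublicKey = PUBLIC-KEY-PLACEHOLDER=
AllowedIPs = 0.0.0.0/0
Endpoint = 203.0.113.10:51820
"""

conf = parse_wg_showconf(SAMPLE)
print(conf["Interface"]["PrivateKey"])  # goes into the pfSense tunnel
print(conf["Peer"]["PublicKey"], conf["Peer"]["Endpoint"])  # go into the pfSense peer
```

On the Linux VM, subprocess.run(["sudo", "wg", "showconf", "nordlynx"], capture_output=True, text=True).stdout could feed the real output into the same function.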

Here are various screenshots that show the steps in more detail.

VPN > WireGuard > Tunnels
VPN > WireGuard > Peers
Tip: Enter 1.0.0.0 with a subnet mask of 0. Do not enter 0.0.0.0, as there is a glitch or bug in the UI or whathaveyou. The result will still be 0.0.0.0/0.
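Why does the workaround give the same result? A /0 mask keeps zero network bits, so any address paired with it describes the entire IPv4 space. Python’s standard ipaddress module makes this visible; it only illustrates the arithmetic and says nothing about how the pfSense UI validates its input.

```python
import ipaddress

# With strict parsing, "1.0.0.0/0" is rejected because it has "host bits" set...
try:
    ipaddress.ip_network("1.0.0.0/0")
except ValueError as err:
    print("strict parse failed:", err)

# ...but masking with /0 zeroes every bit, so the network is all of IPv4.
net = ipaddress.ip_network("1.0.0.0/0", strict=False)
print(net)                                       # 0.0.0.0/0
print(net == ipaddress.ip_network("0.0.0.0/0"))  # True
print(net.num_addresses)                         # 4294967296
```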
VPN > WireGuard > Settings
Interfaces > Interface Assignments

That should be enough to allow Diagnostics to curl Italy.

Successfully connected to NordVPN through WireGuard on pfSense
Successfully connected to Italy and verified

Now that the easy part is out of the way, let’s set some Policy rules to send the Apple TV traffic over the VPN to Italy as a baseline test.

From Netgate, on the order of Firewall/NAT processing:

Traffic from LAN to WAN is processed as described in the following more detailed example.

  • Port forwards or 1:1 NAT on the LAN interface (e.g. proxy or DNS redirects)
  • Firewall rules for the LAN interface:
    • Floating rules inbound on LAN
    • Rules for interface groups including the LAN interface
    • LAN tab rules
  • 1:1 NAT or Outbound NAT rules on WAN
  • Floating rules that match outbound on WAN

I’ll make an alias, for now, to hold some clients that have static DHCP entries and hostnames I gave them in pfSense.

VPN clients in the Firewall > Aliases > IP page

Floating rules have high precedence, so I’ll add some rules below the automatic pfBlockerNG rules that were already created, and I’ll add a nice little blue separator while I’m here.

Floating rule to route select clients over the VPN to Italy

And here is that rule as a very long screenshot:

Firewall > Rules > Floating rule to route select clients over the VPN

Apply. Wait. Let’s try it out using one of my notebooks connected to the Untrusted network.

Google is entirely in Italian now

Google is in Italian. Very cool. Now for the Apple TV.

Apple TV's YouTube reports I am in Italy

Winner winner, chicken dinner. All my YouTube is in Italian. I get some ads, not as many, but because Italians speak slowly and with a kind of sexy accent I do not mind the ads for Nutella at all.

With this technique, I no longer feel manipulated by the ads, since they are no longer in English. I have personalized ads off, but given my new status as a retired gentleman, I should turn that back on to scare away advertising dollars, er, euros. I wonder if Netflix and Amazon Prime behave any differently…

Some Netflix assets are being blocked

Dang. Netflix is having problems. Amazon Prime is even worse. It looks like some CSS or font files are blocked as well, and the thumbnails aren’t loading. It’s time to move to Phase Two: Tunnel only YouTube traffic over the VPN.

Warning: Do not try to send all the Apple TV traffic over a VPN because Netflix, Prime, and others are wise to VPN providers and have gotten great at geofencing.



Selectively Route Apple TV YouTube Traffic Over the VPN

Let’s start by adding Firewall Policy rules to send the most common YouTube domains over the VPN.

As I’m about to add the rules, my hands hover over the keyboard, not knowing which domains to tunnel. They need to be FQDNs (fully-qualified domain names, no wildcards). Let’s open a Chromium-based browser and see what traffic it generates in DevTools.

Add the domains column to DevTools to see where YouTube calls

Here are some candidate FQDNs to add:

But wait, I hear you ask, why accounts.google.com and gstatic.com? This is a preventative measure in case one of those domains is geo-jacked (Geo-IP LoJacking). I wouldn’t put it past Google engineers to geo-jack font domains like fonts.googleapis.com, but I’m betting they don’t, in the interest of serving billions of page views efficiently.

Here are my new rules, where I chain two of them using a tag so that YouTube tunnelling is limited to the same untrusted machines (including the Apple TV).

YouTube domains to tunnel
Use a match rule before the tunnel rule
The first rule matches VPN clients and tags them
The second rule tunnels tagged requests through the VPN
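To see why the tag is needed, here is a toy model of the two chained rules in Python. It is not how pf evaluates rules internally, just the logic: the first (match) rule tags traffic from the VPN clients alias without stopping evaluation, and the second rule only policy-routes packets that carry the tag and are headed for a YouTube IP. The alias contents, tag name, and gateway names (WG_ITALY, WAN_DHCP) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical alias contents: LAN clients and YouTube FQDN resolutions.
VPN_CLIENTS = {"192.168.20.50", "192.168.20.51"}
YOUTUBE_IPS = {"198.51.100.10", "198.51.100.11"}
TAG = "VPN_CLIENT"


@dataclass
class Packet:
    src: str
    dst: str
    tags: set = field(default_factory=set)


def choose_gateway(pkt: Packet) -> str:
    # Rule 1: a "match" rule only tags; evaluation continues to later rules.
    if pkt.src in VPN_CLIENTS:
        pkt.tags.add(TAG)
    # Rule 2: policy-route only when the tag AND a YouTube destination coincide.
    if TAG in pkt.tags and pkt.dst in YOUTUBE_IPS:
        return "WG_ITALY"
    # No policy route matched: the default gateway is used.
    return "WAN_DHCP"


print(choose_gateway(Packet("192.168.20.50", "198.51.100.10")))  # WG_ITALY
print(choose_gateway(Packet("192.168.20.50", "203.0.113.5")))    # WAN_DHCP
print(choose_gateway(Packet("192.168.10.7", "198.51.100.10")))   # WAN_DHCP
```

Splitting the "who" (tag) from the "where" (destination alias) means either alias can change without rewriting the other rule.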

And with that, YouTube thinks I’m in Milan, Netflix and Prime Video think I am still in Canada, and the ads… oh, the ads… they are few and far between, and when they do come on, they are a treat to listen to in Italian, that slow, beautiful language with no harsh aspirates or yelling.

YouTube, Italy



Time goes by…


Gotcha: DNS Race Condition

A day has gone by, and I’ve noticed that I only get Nutella and Ferrero Rocher ads in the middle of videos, not at the start. Odd. I did some research, and this is what I found:

Pertinent information about pfSense and hostname aliases

This means that the firewall resolves the hostnames to IP addresses ahead of time, not per connection, and only those IPs are matched by my VPN tunnelling policy rules.

A hostname entry in a host or network type alias is periodically resolved and updated by the firewall every few minutes. The default interval is 300 seconds (5 minutes), and can be changed by adjusting the value of Aliases Hostnames Resolve Interval on System > Advanced, Firewall & NAT tab. – pfSense

Ah-ha, so I suspect there is a DNS race condition. Let me explain:

This can happen if, say, the Alias Daemon updates the IPs of the FQDNs, and then I turn on the Apple TV for the first time that day. Since the usual TTL (time-to-live) of these DNS records is 1440 seconds (24 minutes), all the YouTube DNS entries will be cache misses and will need to be resolved again. At this point, the IPs returned by the second round of DNS queries may come from a pool and are not guaranteed to match the ones the Alias Daemon holds. When the Alias Daemon checks again in five minutes, it may resolve the FQDNs to yet different IPs!
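A quick way to check whether answers are actually rotating is to resolve a hostname more than once and diff the results against what the alias last saw. This is a minimal sketch with hypothetical helper names (resolve_ipv4, uncovered) and documentation-range example addresses. Note that a local caching resolver keeps returning the same answer until its TTL expires, so back-to-back lookups through pfSense may look stable even when the upstream answers are not.

```python
from __future__ import annotations

import socket


def resolve_ipv4(hostname: str) -> set[str]:
    """Resolve hostname through the system resolver and return its IPv4 addresses."""
    infos = socket.getaddrinfo(hostname, 443, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def uncovered(alias_ips: set[str], client_ips: set[str]) -> set[str]:
    """IPs a client may connect to that an alias snapshot does not contain.

    Any IP returned here would miss the VPN policy rule and leave via the WAN.
    """
    return set(client_ips) - set(alias_ips)


# Example with made-up addresses: the alias was refreshed from one answer,
# and the Apple TV later received a different answer from the pool.
alias_snapshot = {"198.51.100.10", "198.51.100.11"}
apple_tv_answer = {"198.51.100.11", "198.51.100.12"}
print(sorted(uncovered(alias_snapshot, apple_tv_answer)))  # ['198.51.100.12']
```

Calling resolve_ipv4("www.youtube.com") a few minutes apart and passing both results to uncovered() is a cheap way to see whether a hostname’s answers drift.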

Let’s solve this by overriding whatever TTL (time-to-live) YouTube has in its DNS entries:

Do not abide by the minimum TTL of the target DNS entry

And with that, no more DNS lookup race condition.



Gotcha: Authentication Trouble, Forbidden 403 Error

Sometimes videos will not play. For security, YouTube embeds your IP in the googlevideo.com request. I’ve known about this since my 2016 post, Download YouTube 4K Videos with PHP. The new problem is that various JavaScript and “are you human?” assets are tunnelled over the VPN, but those darn hosts like r5---sn-hpa7kn76.googlevideo.com are not tunnelled, so their requests come from the wrong IP. Cue the 403 Forbidden error.
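To confirm which IP a given video URL was issued for, you can read it straight out of the query string. This sketch assumes, as is commonly observed in googlevideo.com videoplayback URLs, that the client IP sits in an ip= parameter; the URL below is made up, and the helper names embedded_ip and will_403 are hypothetical.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlsplit


def embedded_ip(url: str) -> str | None:
    """Return the value of the URL's "ip" query parameter, if present."""
    values = parse_qs(urlsplit(url).query).get("ip")
    return values[0] if values else None


def will_403(url: str, egress_ip: str) -> bool:
    """True if the URL was issued for a different IP than the one we fetch from."""
    ip = embedded_ip(url)
    return ip is not None and ip != egress_ip


# Made-up URL in the usual videoplayback shape.
SAMPLE_URL = (
    "https://r5---sn-hpa7kn76.googlevideo.com/videoplayback"
    "?expire=1600000000&ip=203.0.113.7&id=o-EXAMPLE"
)
print(embedded_ip(SAMPLE_URL))                 # 203.0.113.7
print(will_403(SAMPLE_URL, "198.51.100.20"))   # True: fetched via the wrong exit
```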

YouTube authentication failure

Let’s fail fast with a quick experiment: I’ve looked up the IP of the above subdomain, added it manually to the list of domains/IPs to tunnel over the VPN, applied the change, and refreshed YouTube:

Success. We need to route the mangled domains over the VPN as well.

Excellent. Now, we just need a way to tunnel that wildcard *.googlevideo.com domain. Unfortunately, the NAT and Firewall rules work with IPs, not wildcard domain names. Can we predict or enumerate these domains?

Here is a Wireshark capture of DNS requests to *.googlevideo.com showing that the subdomains are not predictable by eye:

Random SLDs from googlevideo.com

Let’s drop into a web browser with adblocking disabled and walk the HAR waterfall of my interaction with YouTube that led to ads showing up.

Waterfall showing ad interactions coming from www.youtube.com
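Walking a waterfall by eye gets tedious, so a HAR export (DevTools can save the Network panel as a .har file) can be tallied by hostname instead. This is a minimal sketch assuming the standard HAR layout of log.entries[].request.url; the function names are hypothetical, and the sample reuses the generate_204 URL from this post plus two made-up YouTube URLs.

```python
from __future__ import annotations

import json
from collections import Counter
from urllib.parse import urlsplit


def load_har(path: str) -> dict:
    """Parse a HAR file exported from the DevTools Network panel."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def har_hosts(har: dict) -> Counter:
    """Count requests per hostname across all HAR entries."""
    counts: Counter = Counter()
    for entry in har.get("log", {}).get("entries", []):
        host = urlsplit(entry["request"]["url"]).hostname
        if host:
            counts[host] += 1
    return counts


def googlevideo_hosts(counts: Counter) -> list[str]:
    """Pick out the per-video *.googlevideo.com hostnames."""
    return sorted(h for h in counts if h.endswith(".googlevideo.com"))


# Tiny in-memory HAR; a real one would come from load_har("youtube.har").
sample = {"log": {"entries": [
    {"request": {"url": "https://r7---sn-uxa0n-t8ge.googlevideo.com/generate_204"}},
    {"request": {"url": "https://www.youtube.com/"}},
    {"request": {"url": "https://www.youtube.com/watch?v=EXAMPLE"}},
]}}
counts = har_hosts(sample)
print(googlevideo_hosts(counts))  # ['r7---sn-uxa0n-t8ge.googlevideo.com']
```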

What are GET requests like this one doing, exactly?

GET https://r7---sn-uxa0n-t8ge.googlevideo.com/generate_204

I’ll give this problem some thought offline.



Gotcha: YouTube is Now Showing UK Ads, Not Italian Ads

Before I could even solve the previous gotcha, British ads started showing up as often as if we had done nothing. Ads from the UK are even more incessant than those from Canada; the UK trails only the USA and India according to my earlier stats. Getting UK ads would mean the plan has completely failed. Why did this happen suddenly? I’ve opened a fresh browser in a VM and tunnelled all traffic through Italy. The only leak I can find is when I query ipinfo.io on my Italian tunnel and see a UK address in the ASN details. Could this small leak be our undoing?

It is possible the VPN is leaking unintended information

Even with my browser’s language set to en_US and location data off, this is the only leak I can spot. So, in addition to exiting in Italy, the VPN node has to be one whose ASN (Autonomous System Number, which identifies a network in Internet routing) doesn’t give away a different country. Dang, Google, you’re good. I’m going to have to bring my A+ game to this one.



Find a VPN Exit Node with no ASN Leak

By visiting https://nordvpn.com/servers/tools/, I can see the VPN endpoint nodes in Italy. NordVPN has many WireGuard endpoints. Just to move things forward for this exercise, I’ll add an OpenVPN tunnel in pfSense, connect to several VPN nodes, and examine their ASNs. It’s better than nothing, and more importantly, I’d like to rule the ASN in or out as the source of the GeoIP leak. Here is the guide I used.

Through trial and error, I found a VPN node that is registered to an ISP in Italy as found in the Abuse and ASN info.

Found an exit node with no ASN leaks

Beautiful. Bellissimo.

Italian content with Italian ads again



Hijack Google Video DNS Queries

To make any of this work, I need a technique to route the wildcard *.googlevideo.com domain through the VPN.

Thought Experiment: Suppose I write a plugin for pfSense that periodically greps the DNS query log, keeps track of the *.googlevideo.com queries, and adds them to a unique list of aliases for Google Video domains; if backed by an LRU eviction policy, this could keep working indefinitely. However, if each video uses a unique, mangled domain, then this does not work unless I hit refresh on every single video.
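The LRU part of that thought experiment is small enough to sketch. This hypothetical IpLru class (not part of pfSense) remembers the most recently seen IPs and evicts the stalest one once a size cap is reached, so the alias list cannot grow without bound; the cap of 512 is an arbitrary assumption.

```python
from __future__ import annotations

import time
from collections import OrderedDict


class IpLru:
    """Remember recently seen IPs, evicting the least recently seen past max_size."""

    def __init__(self, max_size: int = 512) -> None:
        self.max_size = max_size
        self._seen: OrderedDict[str, float] = OrderedDict()  # ip -> last-seen time

    def add(self, ip: str, now: float | None = None) -> str | None:
        """Record ip as just seen; return the evicted IP, if any."""
        self._seen[ip] = time.time() if now is None else now
        self._seen.move_to_end(ip)  # most recently seen goes last
        if len(self._seen) > self.max_size:
            evicted, _ = self._seen.popitem(last=False)  # oldest is first
            return evicted
        return None

    def ips(self) -> list[str]:
        """Current alias contents, oldest first."""
        return list(self._seen)


lru = IpLru(max_size=2)
for ip in ["203.0.113.1", "203.0.113.2", "203.0.113.1", "203.0.113.3"]:
    evicted = lru.add(ip)
    if evicted:
        print("evict", evicted)  # evict 203.0.113.2
print(lru.ips())  # ['203.0.113.1', '203.0.113.3']
```

An evicted IP is also the natural signal to remove that entry from the firewall alias.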

On the other hand, if I “hold up” the DNS query for the *.googlevideo.com domains, add the IPs to some alias list, then allow the DNS response to finish the round trip, we may be in business!

pfSense DNS resolver has user Python support

Where to even start? Here are some Python example scripts just to get some inspiration. A quick, mental reverse-engineering of a handful of scripts reveals that there are some event hooks available. Nice.

Among friends, let’s say that I can build up the pool of Google video IPs in real-time. How then to add these IPs programmatically to the firewall alias list for YouTube without restarting the firewall? One person actually hacked the PHP scripts in pfSense. Tempting, but I’ll do more research. Another person created a REST API for pfSense. Jackpot!



New Goal: We need to add IPs to the firewall policy rule that routes YouTube videos over a VPN to avoid incessant and obnoxious North American ads, but the IPs keep changing because of the changing, mangled subdomains. Using Python 3 and a REST API, we will monitor the appropriate DNS queries, note the IP(s) in the response, hold the response, add the IP(s) to the VPN tunnelling policy rule, then release the DNS query response.

Research Python Methods to Hijack DNS Requests

Why this approach? It’s future-proof, modular, elegant, maintainable, automated, and it lends itself to a future decision tree that could truly restrict YouTube ads outright.

First, I will enable SSHd in pfSense and take a peek around.

Enable SSHd in pfSense
SSH into pfSense using the GUI credentials

Rsync Disk Backup

Let’s take this opportunity to make a disk backup. du -h or “duh” shows that only 800 MiB is in use on the SSD. Let’s rsync the whole box from our local machine in about four minutes.

Tip: To verify the owners and permissions are set in the extended attributes locally, run
getfattr -d -m ^ -R -- ~/.pfsense-backup

Install pfSense REST API

Now that we have a pfSense backup (I’m told just backing up config.xml works too), let’s install the REST API.

This part had me confused. You see, I was looking at the bottom of the screen wondering how the heck I could copy a truncated hash as a token. After a few tries, I noticed the green message at the top that I had been trained to ignore. It has the token.

Tricky UI screen to get the API token

Next, with the API credentials set up, let’s try out the API:

Successful API test

Explore the Unbound Python Module

Running find / -name "py*" shows that the current version of Python is 3.8.

As for the Unbound DNS Resolver, I had some luck tinkering in nano and writing simple Python 3.8 code to log DNS query messages. We now have both parts needed to dynamically update the firewall aliases and tunnel all YouTube traffic once and for all.

If you are looking for Python module docs for Unbound, here they are:

There are no readily available Python module docs for Unbound

Run these commands to quickly get the documentation.

Warning: The example code is from Python 2.4, so be prepared to run Black and PyCharm code formatting, or run 2to3. Also, the most important part of this whole exercise (getting the IPs from the DNS reply) is missing, so here is the hint: import ipaddress. Don’t forget to manually hack the byte strings to pull out the proper IP addresses in binary form, first.
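Here is what that hint can look like in practice. This is a hedged sketch, not the PoC script from this post: it assumes Unbound’s pythonmod injects the MODULE_* constants and log_info() as globals, and that rk.type_str, an_numrrsets, entry.data.count, and rr_data behave as in the examples bundled with Unbound, with each rr_data entry being bytes that start with a 2-byte length prefix. I have not verified these names against pfSense’s Unbound build. Because Unbound waits for operate() to return before finishing the query, the marked spot is where the “hold the response, update the alias” step would go.

```python
from __future__ import annotations

import ipaddress

# qname_str is a presentation-format name, so it ends with the root dot.
WATCHED_SUFFIX = ".googlevideo.com."


def rdata_to_ip(rdata: bytes) -> ipaddress.IPv4Address | ipaddress.IPv6Address | None:
    """Decode one rr_data entry: a 2-byte RDLENGTH prefix, then the address bytes."""
    length = int.from_bytes(rdata[:2], "big")
    addr = rdata[2:2 + length]
    if len(addr) == length and length in (4, 16):
        return ipaddress.ip_address(addr)  # 4 bytes -> IPv4, 16 bytes -> IPv6
    return None


# The hooks below use Unbound's names; "id" shadows the builtin by convention.
def init(id, cfg):
    return True


def deinit(id):
    return True


def inform_super(id, qstate, superqstate, qdata):
    return True


def operate(id, event, qstate, qdata):
    if event in (MODULE_EVENT_NEW, MODULE_EVENT_PASS):
        # Let the next module (the iterator) resolve the query first.
        qstate.ext_state[id] = MODULE_WAIT_MODULE
        return True
    if event == MODULE_EVENT_MODDONE:
        msg = qstate.return_msg
        if msg and qstate.qinfo.qname_str.endswith(WATCHED_SUFFIX):
            rep = msg.rep
            for i in range(rep.an_numrrsets):
                rrset = rep.rrsets[i]
                if rrset.rk.type_str not in ("A", "AAAA"):
                    continue  # skip CNAMEs and anything else in the answer
                data = rrset.entry.data
                for j in range(data.count):
                    ip = rdata_to_ip(bytes(data.rr_data[j]))
                    if ip is not None:
                        # The reply is still held here: update the alias, then continue.
                        log_info("googlevideo %s -> %s" % (qstate.qinfo.qname_str, ip))
        qstate.ext_state[id] = MODULE_FINISHED  # release the response
        return True
    qstate.ext_state[id] = MODULE_ERROR
    return True
```

Keeping the byte parsing in rdata_to_ip means that part can be tested outside Unbound.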

Now we have Python docs and access to all the capabilities. Excellent.

Successful generation of Unbound Python docs with Sphinx

Next, take a backup of your OS or VM, install libtool and swig wherever you like, run ./configure --with-pythonmodule, run make, fix some errors in the Unbound code, and run make again. You’ll then have the generated Python module (unboundmodule.py), which removes all the red missing-method error lines in PyCharm.

PyCharm can now find the missing methods we don't actually need to worry about
First successful DNS response logging script



Smoke Test: A Python DNS-Hijacking Script

Here is a smoke test of the ability to hijack *.google.com DNS requests with reply IPs that the script has caught in just a few minutes (the timestamps are just to maintain a crude LRU cache):

Smoke test for collecting IP addresses of *.google.com

Duplicate IP addresses are possible, and that is fine. I let the smoke test run overnight. Here is the PoC (proof of concept) script I ran as the Unbound Python module script.

When I woke up, the Unbound DNS resolver service had segfaulted. Here are the logs:

We can see a full FQDN alias re-process on each firewall config update
Failure: Capturing all the IPs from the DNS queries to *.googlevideo.com and *.google.com slows pfSense to a crawl, as all the rules need to be reloaded on each addition.



New Goal: Research and install a Squid-like proxy, create a fake-but-trusted CA certificate, host it, install it in a browser as a PoC, decode TLS traffic, and victory dance.

Actually, it is not illegal to jailbreak most Apple TV boxes, so we could break in, add a root certificate valid for the pfSense box, MITM traffic from the Apple TV, and then Microsoft Bob is your uncle. That works because the pfSense box, as the gateway, can decrypt Apple TV traffic, inspect the request headers for the offending ad hostname, block the request, and re-encrypt other valid requests to Mountain View, California.

But, then my iPhone would still show ads because it is harder to jailbreak, plus banking apps may detect this and not work anymore. Jailbreaking is too extreme, anyway.

Fun fact: I used a jailbroken iPhone all the time in Japan because of a quirky cellphone law. You see, because of icky perverts who like to take photos inappropriately on elevators and escalators, Japan passed a law that made the camera shutter sound mandatory on all photos.
 
Unfortunately, taking a screenshot of a web page also made the same loud, unmutable shutter sound. Imagine you are on a train, you screenshot a Google map, it makes that loud shutter noise, and then you get dirty looks from the other passengers. Yeah, I had to jailbreak and zero out the camera sound file.

Let’s see what it takes to spy on the HTTPS traffic from the Apple TV and iPhone to see if we can block ad URLs that way.



Install a Fake-but-Trusted CA Cert on Apple TV and iPhone?

Not wanting to jailbreak and add self-signed certs to Apple TV and iPhone, how hard would it be instead to add fake-but-trusted Certificate Authority (CA) certificates to each device?

The ‘A’ in CA means there is no one higher to vet such a certificate. The ‘A’ is so powerful that back in 2001, only a Windows patch was able to revoke some dangerous Verisign certificates. As a thought experiment, new CAs must come into existence from time to time; Let’s Encrypt is relatively new, for example. There should then be an in-warranty way to get a fake-but-trusted CA cert onto an Apple TV and iPhone. If that is possible, then an entire world of MITM spycraft opens up: decrypting TLS traffic into the clear and using good ol’ URL blocking on requests like

Let’s see how easy this would be.

We can add fake, trusted CA certs to iPhone too

In fact, there are many, many CAs. Here is a quick find / -name "*.pem" in pfSense:

Many CAs exist already



Experiment with Squid and SquidGuard

I’m aware of mitmproxy, but it would need to be side-loaded onto the pfSense router. Let’s see if the squid3 proxy that is available as a pfSense package can do what we need. First, I will take a bare-metal backup again so I can roll back in case mitmproxy turns out to be better.

Install squid3 and ancillary packages

I’ve installed those packages, and naturally, there are more buttons and options than in a space shuttle. I’ll find a guide.

I’ve followed the steps in the guide. However, since I have a large SSD and generous RAM, I’ve made a dedicated folder, /squid_cache (chown squid:proxy), with 8 GiB of cache and a juicy allowance on the per-item cache size, which should also speed up Docker and NPM downloads. Two birds, one stone. With Transparent HTTPS support, this should be pretty rad.

Tip: If web traffic slows down while using Squid, here are some System Tunables that can make Squid faster (ref):

vfs.read_max 128
kern.ipc.nmbclusters 32768

Also, for the local disk cache, use aufs (asynchronous ufs, great for Docker too), which uses POSIX threads to avoid blocking the main Squid process on disk I/O.

We can actually generate a CA cert in pfSense itself.

Generate a CA in pfSense

Now, how to get it into the Apple TV and iPhone? It should be hosted somewhere, right? How about on the router?



Self-Host the MITM CA Certificate

Self-hosting with a single command is ridiculously easy. From the SSH shell into pfSense, I can create a web folder and server like so:

When I visit //pfsense:8000 I should get a blank page with “Hello”. From here, clients behind the pfSense router can temporarily access static documents.

To make life easier, here is a PHP script that makes the browser download the MITM cert.

As another smoke test, I’ll add the MITM CA to Chrome (manually) and enable the SSL Filtering. The defaults are fine in Squid. Here is the log file when I visit https://ericdraken.com:

Successful capture of TLS requests from a downstream client

Excellent.

However, on every other browser and machine there are HTTPS errors like so:

MITM certificate errors if the CA cert is missing
Locked out? If you get locked out of pfSense with a TLS error, you may have to disable Remote Cert Checks, since the pfSense web configurator uses a self-signed certificate. Alternatively, you can bypass the proxy for the pfSense UI by entering pfsense; pfsense.localdomain under Bypass Proxy for These Destination IPs.



Abandoning Squid: Too Slow, Too Heavy

After a day of painfully setting up Squid and SquidGuard and adding blacklists and even manual regex for things like .+?/pagead/.+, I’m having nothing but issues with Squid. Here are the top pain points:

  • It’s slow. It’s really slow.
  • The ACL (Access Control List) settings are cumbersome.
  • There is an issue with https://http/* (ref).
  • The SquidGuard URL filter takes eons to update a list.
  • The Squid UI is unbelievably lacking.

Squid makes me sad. I don’t get sad, but Squid makes me sad with its promise and ultimate letdown. I’ve now obliterated Squid and restored the router from the rsync backup I made earlier. Here is a handy little script to show a diff of what has been added by Squid and related packages.

Rsync Diff of Changes

The output is something like this under the --dry-run option:
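To make a dry run like this easier to scan, the itemized lines can be bucketed into added, deleted, and changed paths. This sketch follows the --itemize-changes format described in the rsync man page (an update-type character, a file-type character, then attribute columns, with brand-new items shown as all "+"), and assumes the dry run also used --delete so removals appear as "*deleting". The function name and sample paths are hypothetical.

```python
from __future__ import annotations

UPDATE_TYPES = set("<>ch.")  # sent, received, local change, hard link, no update
ITEM_TYPES = set("fdLDS")    # file, directory, symlink, device, special file


def summarise_itemized(lines: list[str]) -> dict[str, list[str]]:
    """Bucket rsync --itemize-changes lines into added, deleted, and changed paths."""
    summary: dict[str, list[str]] = {"added": [], "deleted": [], "changed": []}
    for raw in lines:
        parts = raw.rstrip("\n").split(None, 1)
        if len(parts) != 2:
            continue
        flags, path = parts
        if flags == "*deleting":
            summary["deleted"].append(path)
        elif len(flags) >= 9 and flags[0] in UPDATE_TYPES and flags[1] in ITEM_TYPES:
            # A brand-new item shows "+" in every attribute column.
            if set(flags[2:]) == {"+"}:
                summary["added"].append(path)
            else:
                summary["changed"].append(path)
        # Anything else (e.g. "sent ... bytes") is rsync's summary chatter.
    return summary


SAMPLE_LINES = [
    "sending incremental file list",
    ">f+++++++++ usr/local/etc/squid/squid.conf",
    "cd+++++++++ squid_cache/",
    ">f.st...... cf/conf/config.xml",
    "*deleting   usr/local/bin/old-tool",
    "sent 1,234 bytes  received 56 bytes  2,580.00 bytes/sec",
]
for bucket, paths in summarise_itemized(SAMPLE_LINES).items():
    print(bucket, paths)
```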



Install MITMProxy in a FreeBSD Jail

Even though it is written in Python, I’ll give mitmproxy a try next; at the very least, it can be purpose-built to block YouTube ads with its rich API and Python-hook extensibility. It was a coin toss between mitmproxy and SSLsplit, a standalone penetration-testing tool, for on-the-fly TLS interception, but the former can be scripted with Python and has a satisfying UI. Let’s go.

Careful: Please read the whole section before trying any commands because I backtracked a bit but want to explain why.

You’ll notice that there are only three binaries of about 24 MiB each. As I understand it, they bundle a self-contained Python 3 environment and frozen dependencies. I’d like to jail these binaries because, well, because. First, let’s see if there is a vulnerability report for mitmproxy at vuxml.freebsd.org. Nothing. How about at Exploit-DB? Nothing again. Good.

First, what version of FreeBSD is this pfSense install?

Now, according to this guide, I’ll need to set up jails myself as they are disabled in a default pfSense installation. Not knowing FreeBSD at all before today, I had to hack around to find a URL to download the ezjail package manually. After another bare-metal backup, here are the steps I took:

We need to do some hacking to get jail working on pfSense’s take on FreeBSD because jail is missing completely. What I’ve done is copy the jail binaries from a jail (via ezjail) back to the root system.

Let’s set up a jail for mitmproxy.

This is very important: We must enable raw sockets in this jail to allow transparent proxy mode to work. If not, MITMProxy will report errors like “Transparent mode failure: FileNotFoundError(2, ‘No such file or directory’)” or “Cannot open connection, no hostname given.” This is because raw sockets are inaccessible and server information is unavailable. We can easily edit the ezjail config file per jail like so:

This is also very important: MITMProxy calls sudo -n /sbin/pfctl -s state but there is no sudo in jail. Run pkg install sudo inside the jail.

Sanity Check: Run ping 1.1.1.1 inside the jail. If raw sockets are not available, you may get an error like this: “ssend socket: Operation not permitted”. If ping succeeds, raw sockets are working, since ping needs access to them.

Now we can copy over the mitmproxy binaries and take them for a spin.

Things are getting tricky with this next part. Running any of the binaries above results in:

So, there is no /lib64 folder nor any similar dynamic linker that I could find. I tried this, however:

Apparently, pkg install compat6x can solve this for us (it is unavailable on pfSense), but this is getting ridiculous! Let’s try a new tactic. Since we are in a jail, we are not bound to the crippled (read: secured) pfSense environment. Maybe we can install the mitmproxy package normally in the jail?

pkg install mitmproxy

And, Bingo was his name-o. After this, simply running mitmproxy in the jailed console opens the MITMProxy UI. Nice. Note, this version may be one or two minor versions behind the master branch. Let’s clean up with rm -rf ~/mitm* /lib64 and do another bare-metal backup.



Exploring MITMProxy

This is getting exciting. First, in pfSense, add a virtual IP for 127.0.1.1 attached to localhost. Then, add a NAT rule to temporarily forward port [Private IPs]:8080 to 127.0.1.1:8080 to access the proxy from the LANs.

If not in the jail console, I’ll run

and add the proxy setting 192.168.20.1:8080 to my sacrificial notebook (that is auto-wiped daily). When the browser opens, we can already see colourful log entries in the MITMProxy UI.

First logs of MITMProxy

The next step is to get the auto-generated CA PEM file used by MITMProxy (~/.mitmproxy/mitmproxy-ca-cert.pem). Since any CA cert here is snake oil, I’ll use the provided one. TLS traffic from my devices is safe as long as I use my own proxies.

Let’s put our experience from our previous attempt at self-hosting a CA into action. However, there is no PHP in the jail, so we can use a Python 3 web server instead.
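A minimal sketch of such a server, using only Python’s standard library. It assumes Python 3.7 or later in the jail (for ThreadingHTTPServer and the directory argument) and labels .pem and .crt files with the application/x-x509-ca-cert MIME type, which many clients treat as a certificate to install rather than text to display. The class and function names are hypothetical.

```python
from __future__ import annotations

import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CertHandler(SimpleHTTPRequestHandler):
    """Static file handler that labels .pem/.crt files as CA certificates."""

    # Copy the parent's map so its defaults (including "" on older Pythons) survive.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".pem": "application/x-x509-ca-cert",
        ".crt": "application/x-x509-ca-cert",
    }


def serve(directory: str, port: int = 8000) -> None:
    """Serve `directory` on all interfaces until interrupted with Ctrl+C."""
    handler = functools.partial(CertHandler, directory=directory)
    with ThreadingHTTPServer(("0.0.0.0", port), handler) as httpd:
        httpd.serve_forever()
```

Copy only ~/.mitmproxy/mitmproxy-ca-cert.pem into a dedicated folder (say, a hypothetical /tmp/ca-www) and call serve("/tmp/ca-www"). Do not serve ~/.mitmproxy itself: mitmproxy-ca.pem in that folder contains the CA’s private key.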

Tip: MITMProxy conveniently has onboarding settings to serve the same CA cert, as we did manually, just by visiting mitm.it.

After installing the CA in the Trusted Root Store on my clean notebook (and rebooting), I am treated to this display:

MITM TLS interception is working well

Let’s see if we can get this cert on my iPhone.

Successfully added a root CA to the iPhone

This is incredibly exciting. Can we LoJack the Apple TV box next?

Successfully installed a root CA on the Apple TV

Excellent.

But wait, the router is slowing down. mitmproxy is burning up the CPU… on idle.

MITMProxy is burning up the CPU while on idle

Of course: in CPython, the GIL (Global Interpreter Lock) ensures threads do not actually run Python code concurrently, unless they are blocking on I/O, which is the case here(?). Except, most of the CPU work goes into generating TLS certs on the fly for each new host that is intercepted. Yikes. Running mitmdump forgoes the UI and the extreme logging. Logging all the headers and full responses heavily slows down mitmproxy, whereas mitmdump by default only logs entries like classic Apache logs, which is much kinder on the CPU.

Certificate Pinning: Some advanced, high-security apps and services have trouble with the MITMProxy certificates due to certificate pinning, a technique where the client knows the fingerprint of the expected certificate in advance so it cannot be forged. A workaround is to use the --ignore-hosts option to let those hosts bypass interception.

For my fun, I’ll go with this CLI command:

While on YouTube, we can see the page ads clear as day with their unencrypted headers; can a simple regex now block them? They are exposed, and afraid, and their days have run out.

MITMProxy can see the YouTube ad URLs

We can even see details about each request. For example, all the SAN info is laid out for this wide-reaching certificate. There are curiously a lot of *-cn.com domains covered by this cert.

We can see rich request and response details

Shortly, I’ll write a Python script to block YouTube /pagead/ URLs.
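A minimal sketch of such a blocker is below, written as a mitmproxy addon script that uses module-level event hooks (one of the script styles mitmproxy accepts). The rule table is hypothetical. It is seeded only with the /pagead/ path and the googleadservices.com domain mentioned in this article, and the file name block_pagead.py is also a placeholder. The matching logic is plain Python, so it can be tested without mitmproxy installed.

```python
"""Sketch of a mitmproxy addon that blocks ad URLs (run: mitmdump -s block_pagead.py)."""
from urllib.parse import urlsplit

try:
    from mitmproxy import http
except ImportError:  # lets should_block() be tested without mitmproxy installed
    http = None

# Hypothetical rules: domain -> substrings of path+query that trigger a block.
# An empty string matches every URL on that domain.
BLOCK_RULES = {
    "youtube.com": ("/pagead/",),
    "googleadservices.com": ("",),
}


def should_block(url: str) -> bool:
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    target = parts.path + ("?" + parts.query if parts.query else "")
    for domain, needles in BLOCK_RULES.items():
        # Match the domain itself or any subdomain, but not e.g. "notyoutube.com".
        if host == domain or host.endswith("." + domain):
            if any(needle in target for needle in needles):
                return True
    return False


def request(flow) -> None:
    """mitmproxy request hook: answer matching requests locally with an empty 404."""
    if http is not None and should_block(flow.request.pretty_url):
        flow.response = http.Response.make(404, b"", {"Content-Type": "text/plain"})
```

http.Response.make is the mitmproxy 7.x name. Older releases used http.HTTPResponse.make instead.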



Patch MITMProxy Source Code for Server SNI Interrogation

This step may be optional for most, but here is a reminder to myself. To make --allow-hosts work reliably in Transparent Proxy Mode, the SNI (Server Name Indication) of the intercepted TLS connection needs to be checked against the list of regular expressions. Otherwise, in many cases only the server’s IP is used for matching. Here is a quick patch I made for mitmproxy version 7.0.4. You can apply it directly in the jail shell, or just type the few lines manually:

With the above patch, I can now reliably intercept a few hosts and let all others pass through.

Reliable server host interception in MITMProxy transparent proxy mode
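To show why checking the SNI matters, here is a standalone sketch of the matching idea. It is neither mitmproxy's internal code nor the patch referenced above, and should_intercept and the "host:port" pattern format are assumptions for illustration. In transparent mode, the proxy initially knows only the destination IP. The hostname becomes visible only in the SNI field of the TLS ClientHello, so hostname-based patterns cannot match until the SNI is considered.

```python
import re
from typing import Iterable, Optional


def should_intercept(server_ip: str, port: int, sni: Optional[str],
                     patterns: Iterable[str]) -> bool:
    """Return True if the destination IP *or* the SNI hostname matches a pattern."""
    candidates = [f"{server_ip}:{port}"]          # all that is known without SNI
    if sni:
        candidates.append(f"{sni.lower()}:{port}")  # hostname from the ClientHello
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return any(rx.search(c) for rx in compiled for c in candidates)
```

With a pattern such as (^|\.)youtube\.com:443$, an IP-only check never matches. Adding the SNI candidate makes it match reliably.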



Smoke Test: Intercept YouTube Ads with MITMProxy

After reading the docs and navigating the mitmproxy source code in the PyCharm IDE, I’ve written a little script to block ads and tracking URLs coming from YouTube from my clean notebook. I won’t reproduce the code just yet because it didn’t succeed in blocking ads as hoped, so instead, I’ll spend the time investigating why.

Here are the smoke-test filters I used. For a given top-level domain, URLs containing any of the following partial strings are blocked:

My initial results on blocking are positive. Everything I wanted to be blocked is faithfully blocked. Note: the (failed) entries are due to my script, and the 502 failures are due to pfBlockerNG black-holing the request.

MITMProxy blocking script is working

Even in the DevTools network panel, the requests are truly blocked.

YouTube requests are truly blocked in DevTools network panel

Then how come I am still seeing ads? I’ve disabled HTTP/2 so that subsequent requests on the same channel don’t slide by. Mind you, sometimes the ads skip on their own, or fail to play, but they still show up. Interesting. Could YouTube be using WebSockets? I need some inspiration, so I’ll look at uBlock Origin’s regex filters for some ideas.

Tip: If you see the error OpenSSL Error([('SSL routines', 'ssl3_read_bytes', 'tlsv1 alert internal error')]), then the DNS blocker (i.e. pfBlockerNG) is breaking the upstream TLS handshake for a given domain. Either whitelist the domain in pfBlockerNG so the request goes through, or intercept it and block the connection in mitmproxy. This error happens with black-holed domains because the upstream TLS cert cannot be sniffed. The cleanest strategy is to use transparent MITM mode.



Examine uBlock Origin Regex Patterns for Inspiration

Here are some of the regex/filters that uBlock Origin uses on YouTube.

uBlock Origin YouTube regex/filters from a web browser

At first blush, it seems that a community of like-minded individuals is playing whack-a-mole with YouTube’s HTML and JavaScript. This has got me thinking: How does a video know to play an ad with JavaScript?

How does YouTube know if the ad converts? They must target ads for individuals, so a given video must receive some unique information about an ad, such as the click link and alt text. WebSockets would be a pain to maintain, especially with all the mobile clients. They must be using stateless JSON to relay that pertinent information in an innocuous URL request that has no telltale signs of ad-ness. Let’s hunt for this info in the JSON replies captured by mitmproxy.

Key advertisement information contained in a JSON response

Snap, Crackle, and Pop. We have a new plan: surgically alter the JSON response body to eliminate or Byzantine-up the ad information.



Surgically Alter the JSON Response to Remove Ads

After a bit more playful exploration, I found a trove of block-worthy URLs right there in the JSON payload. In fact, most of what I am trying to block shows up right here:

However, YouTube has booby-trapped their UI, and there is more than one way their obfuscated JavaScript code can pull down the ad details.

Let’s blow it all away right now.

After a lot of fun taking apart the YouTube UI and HTTP workflow, taking into account cookies and naughty service workers, I am successfully able to strip away all the pre-roll, post-roll, mid-video, and, well, all the video ads. Here is a screenshot from mitmdump showing how select REST queries are intercepted, decrypted, modified, put back into the response, and the headers updated (content length, etc.).

Success in removing YouTube ads via decrypted JSON responses
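Here is a minimal sketch of that JSON surgery in plain Python. It assumes the ad details live under a few well-known keys. The names in AD_KEYS are assumptions to verify against your own captures, not a confirmed list from this article. The sketch prunes those keys at any depth, which sidesteps the "more than one way to pull down the ad details" problem as long as every path uses the same key names.

```python
import json
from typing import Any

# Hypothetical ad-related keys; confirm against your own captured responses.
AD_KEYS = frozenset({"adPlacements", "playerAds", "adSlots"})


def prune(node: Any) -> Any:
    """Recursively drop every dict entry whose key is in AD_KEYS."""
    if isinstance(node, dict):
        return {k: prune(v) for k, v in node.items() if k not in AD_KEYS}
    if isinstance(node, list):
        return [prune(item) for item in node]
    return node


def strip_ads(body: bytes) -> bytes:
    """Parse a JSON body, prune ad sections, and re-serialize it compactly."""
    return json.dumps(prune(json.loads(body)), separators=(",", ":")).encode("utf-8")
```

Inside a mitmproxy response hook, the pruned bytes would replace the response body, and the Content-Length header has to match the new length.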

With this new ability, we could even inject JavaScript into the main YouTube web page and subvert their JavaScript in a sort of ECMAScript arms race, possibly even leveraging some of the filters from uBlock Origin. However, we can hang our hats on this accomplishment for today.

Success: We can strip out ads from the JSON payload for YouTube web ads using a router.



The iOS YouTube App Uses Protobuf, not JSON

The YouTube iOS app makes the same API calls as the web version, and I can see very similar data in them. The difference is that the data is encoded as Protocol Buffers (Protobuf) instead of JSON. That complicates things somewhat. We cannot lean on JSONPath to hunt down the advertisement sections, because in Protobuf the keys are just field numbers, and those can even change.

The iOS version of the YouTube app uses Protobuf
Fun fact: YouTube compiles a large list of all the ads you are going to see and sends that to you in a sneaky payload. In fact, it is easier to visualize this when reading Protobuf. If you manage to exhaust that list, then another large list will be coming your way.

I can see strings like “Telus” and “Samsung TV” and “Boxing Week” and “Buy now”. Remember when YouTube was a fun place? A fable about a Golden Goose comes to mind, Alphabet.

What is a Protocol Buffer? Here is an infographic from Data Science Blog.

Protobuf introduction (Credit: Data Science Blog)
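To make "the keys are just numbers" concrete, here is a minimal sketch that hand-encodes a two-field message. The field numbers 1 and 2 and the string "Buy now" are made up for illustration. On the wire, nothing but those numbers identifies the fields, so a reader without the .proto schema has no names to go on.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least-significant group first, MSB = 'more'."""
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def varint_field(field_number: int, value: int) -> bytes:
    # Tag = (field number << 3) | wire type 0 (varint), followed by the value
    return encode_varint(field_number << 3 | 0) + encode_varint(value)


def bytes_field(field_number: int, data: bytes) -> bytes:
    # Tag = (field number << 3) | wire type 2 (length-delimited), then length, then data
    return encode_varint(field_number << 3 | 2) + encode_varint(len(data)) + data


message = varint_field(1, 150) + bytes_field(2, b"Buy now")
print(message.hex(" "))  # 08 96 01 12 07 42 75 79 20 6e 6f 77
```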

Now that I can see unencrypted traffic from my iPhone, I’m taken aback by the sheer amount of tracking information laid bare. It’s like I have electrodes on my head and chest while I’m running on a treadmill, and a bunch of scientists in white lab coats with clipboards are standing shoulder-to-shoulder recording everything about my internals.

Privacy concern: Your apps are tracking you like crazy: what you do, how long you dwell, when you leave a given app, and so much more. The URL https://play.googleapis.com/log/batch shows up a lot in my logs.

The next question is: Does the iOS app protocol behave like the web app?



Timing Analysis to Detect Ad Videos?

The iOS network traffic is not like the web traffic; Google has teams and teams of engineers dedicated to making sure blocking their ads isn’t computationally feasible. Daunted but undeterred, I was staring at network requests to let my mind zone out and wander when I noticed a pattern I had not noticed before.

For the web version of YouTube, I can eyeball which URLs are ads and which are the videos I want to watch. Take a look:

Which are ad videos and which are content videos?

How am I able to eyeball which video URLs are ads in this chaos?

Two ad videos between content videos

Take a look at the query parameter range. For the web version, a chunk of the video I want is fetched from the 0th byte, then immediately another video is fetched with a range starting again at the 0th byte. Both happen near-simultaneously – faster than a human can click on a new video. It turns out this, as well as examining the clen parameter for the length of the full video (short videos are likely ads), can reasonably allow us to detect and doctor ad videos.
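The two web-player signals can be combined into a small heuristic. This is a sketch under stated assumptions. The thresholds SHORT_CLEN_BYTES and BURST_WINDOW_S are hypothetical values, not measured ones, and the class name is made up. It only applies to web-player URLs that carry range and clen query parameters.

```python
import time
from typing import Optional
from urllib.parse import parse_qs, urlsplit

SHORT_CLEN_BYTES = 5_000_000  # hypothetical "short video" cutoff for clen
BURST_WINDOW_S = 1.0          # hypothetical: faster than a human picks a new video


class AdChunkDetector:
    """Flags web-player video fetches that look like ads (range + clen heuristics)."""

    def __init__(self) -> None:
        self._last_zero_fetch: Optional[float] = None

    def check(self, url: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        params = parse_qs(urlsplit(url).query)
        if not params.get("range", [""])[0].startswith("0-"):
            return False  # a later chunk of a video that already started
        # Signal 1: a second "start from byte 0" fetch right after another one.
        burst = (self._last_zero_fetch is not None
                 and now - self._last_zero_fetch < BURST_WINDOW_S)
        self._last_zero_fetch = now
        # Signal 2: the full video length (clen) is short.
        clen = params.get("clen", [""])[0]
        short = clen.isdigit() and int(clen) < SHORT_CLEN_BYTES
        return burst or short
```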

However, the iOS YouTube protocol does not use the range query parameter or even the Range header; video chunks use a counter like &nr=2 and &nr=3 etc. We must reverse engineer the Protobuf responses.



Decode the YouTube Protobuf Responses

Here are some decoded Protobuf log files I created then opened in the PyCharm IDE.

Let’s examine some Protobuf logs in the IDE

After logging decoded Protobuf messages to disk for offline analysis, I did notice something that piqued my interest.

I wonder what would happen if I were to, say, toggle those? This is tantalizing, but it is cheating, and hence no fun. Back to heuristics.

Thought Experiment: As with JSON, can I blow away the Protobuf sections that serve up ads? Could I instead detect the ad videos in the payload, then dynamically modify their responses to be, say, a cached 0.01s video file? The 30s ~ 300s of unskippable ads could be over in the blink of an eye without blocking all those URLs.
Intercepted ad URLs from the Protobuf payload

Let’s start by blocking the ads as intended.



Ad URL Polymorphism

The Protobuf responses are a hot mess of bytes, but there are human-readable URLs that can be grepped.

You’d think a simple LRU cache that blocks soon-encountered ad URLs could be the way to go, but, alas, the ad URLs do not quite match the URLs sent over the wire. Also, who is to say that YouTube won’t randomize the position of query-string parameters one day? We need an O(1) lookup of flagged ad URLs that are polymorphic (and group homomorphic) to live ad URLs.

Detected ad URLs vs intercepted ad URLs

It might be tempting to split a query string into a sorted dictionary and reassemble it, but we have no way of knowing what the query string boundary is. Plus, a live ad URL could add a key and disrupt the sorting.

Additionally, I’ve encountered URLs like this one that purposely try to obfuscate the query params:

https://r4---sn-vgqsrns6.googlevideo.com/videoplayback
/expire/1640607416
/ei/WFrJYdWnFfyTsfIP4s2BsAk
/ip/121.35.98.26
/id/o-AE7swWOPOwXu3GyRght
/source/youtube
/requiressl/yes
/mh/wU/
mm/31,26/…

Notice how /ip/121.35.98.26/ is just &ip=121.35.98.26?
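Path-style parameters can be folded into the same key/value view as regular query strings. Here is a minimal sketch. It assumes, based only on the example above, that the /key/value pairs start right after the videoplayback path segment, and url_params is a hypothetical helper.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlsplit


def url_params(url: str, marker: str = "videoplayback") -> Dict[str, str]:
    """Merge ?key=value pairs with /key/value path pairs that follow `marker`."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    segments = [s for s in parts.path.split("/") if s]
    if marker in segments:
        tail = segments[segments.index(marker) + 1:]
        # Pair up alternating segments; a dangling odd segment is ignored.
        params.update(zip(tail[0::2], tail[1::2]))
    return params
```

With this normalization, /ip/121.35.98.26/ and &ip=121.35.98.26 both become the same entry, "ip": "121.35.98.26".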

I propose heuristically scanning for query and path parameters of ad URLs with high entropy and using those as keys (fingerprints). For example, in

https://rr6---sn-uxa0n-t8gz.googlevideo.com/initplayback?source=youtube
&orc=1&oeis=1&c=IOS&oss=1&oda=1&oad=5500&ovd=5500&oaad=11000&oavd=11000
&ocs=700&oputc=1&oses=1&ofpcc=1&osbr=1&osnz=1&msp=1&odeak=1&odepv=1
&osfc=1&id=58cc678216d6aaca&ip=121.35.98.26&initcwndbps=2125000
&mt=1640373902

One could note the following candidates in descending order of length:

  • rr6---sn-uxa0n-t8gz
  • 58cc678216d6aaca
  • 121.35.98.26
  • 1640373902
  • 2125000

Any or all of them could be lookup keys each pointing to the same dictionary of deconstructed query parameters. A lookup of a live URL would involve the same process of finding the highest entropy parameters and checking the URL dictionary for a match. The cache data structure can even be multi-level with the root keys being just the length of the high-entropy strings.
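Here is a minimal sketch of that fingerprint index. All of its rules are hypothetical assumptions:

- Candidates come from the first host label plus the query values.
- A value must be at least 7 characters long.
- A value must contain a digit. This extra heuristic skips plain words like youtube.
- A value must have a per-character Shannon entropy of at least 1.5 bits.

With these settings, the example URL yields the same five candidates listed above, in the same order. The index uses the two-level layout described: string length first, then the fingerprint itself.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional
from urllib.parse import parse_qsl, urlsplit

MIN_LEN = 7          # hypothetical: ignore short values like "1" or "5500"
MIN_ENTROPY = 1.5    # hypothetical: bits per character; drops values like "0000000"


def shannon_entropy(s: str) -> float:
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def fingerprints(url: str) -> List[str]:
    """High-entropy values (first host label + query values), longest first."""
    parts = urlsplit(url)
    values = [(parts.hostname or "").split(".")[0]]
    values += [v for _, v in parse_qsl(parts.query)]
    keep = {
        v for v in values
        if len(v) >= MIN_LEN
        and any(ch.isdigit() for ch in v)  # heuristic: skip plain words like "youtube"
        and shannon_entropy(v) >= MIN_ENTROPY
    }
    return sorted(keep, key=len, reverse=True)


class FlaggedUrlCache:
    """Two-level index: len(fingerprint) -> fingerprint -> flagged URL's params."""

    def __init__(self) -> None:
        self._index: Dict[int, Dict[str, Dict[str, str]]] = defaultdict(dict)

    def flag(self, url: str) -> None:
        params = dict(parse_qsl(urlsplit(url).query))
        for key in fingerprints(url):
            self._index[len(key)][key] = params  # every key shares one dict

    def lookup(self, url: str) -> Optional[Dict[str, str]]:
        for key in fingerprints(url):
            hit = self._index.get(len(key), {}).get(key)
            if hit is not None:
                return hit
        return None
```

Every fingerprint of a flagged URL points at the same parameter dictionary. A live URL with reordered or extra parameters therefore still hits, as long as it shares one high-entropy value.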

Failure: Even with the ability to block polymorphic URLs, the video ads are still indistinguishable from content video without context from the Protobuf structure.



Smoke Test: Intercept and Decode Protobuf in Python

Python is Slow: Decoding ~500 KiB of raw Protobuf in pure Python is painfully slow.

Decoding ~500 KiB of Protobuf in pure Python usually takes longer than the connection timeout. The slowest part is converting the payload into over 1 MiB of human-readable text in which to find the ad URLs. I’ll run some benchmarks using pure Python vs. the native C++ library.

Pure Python Benchmarks

Pure C++ Benchmarks

If you caught that, it takes about 23s in Python, and 100ms in C++! In this Never Ending Story, we have to find a way to parse the raw Protobuf payloads in Python using the C++ library libprotobuf.so. In the interest of time, I’ll use subprocess.Popen and communicate with the C++ protoc binary directly (since raw decoding is not supported in Python anyway).
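Here is a minimal sketch of that approach. It assumes the protoc binary is installed and on the PATH, and decode_raw is a hypothetical wrapper name. protoc --decode_raw reads a binary message on stdin and prints field numbers and values without needing a schema.

```python
import shutil
import subprocess

PROTOC = shutil.which("protoc")  # assumes protoc is installed and on PATH


def decode_raw(payload: bytes, timeout: float = 5.0) -> str:
    """Decode schema-less Protobuf bytes to text via `protoc --decode_raw`."""
    if PROTOC is None:
        raise FileNotFoundError("protoc not found on PATH")
    proc = subprocess.Popen(
        [PROTOC, "--decode_raw"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    )
    try:
        out, err = proc.communicate(payload, timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()          # don't leave a runaway decoder behind
        proc.communicate()   # reap the process and close its pipes
        raise
    if proc.returncode != 0:
        raise RuntimeError(err.decode("utf-8", "replace").strip())
    return out.decode("utf-8", "replace")
```

For example, decode_raw(b"\x08\x96\x01") should return the text 1: 150, meaning field 1 holds the varint 150.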



Fuzzing the YouTube Video Ad Responses

How about fuzzing the ad video responses? Now that I can isolate ad videos, as a smoke test I sent back 200 responses with empty bodies, and the iOS app went bananas. It behaved as if it were stuck in an infinite loop with no delay, hammering YouTube’s own servers in panic mode to get the next part of the video. I felt bad for their servers, so I stopped. So what would a happy-path response payload look like?

Infinite spin-lock loop of YouTube trying to get the next bytes of the ad video

Try as I might, when I send back empty 200s, 404s, or 503s, truncate response bodies, or just null-out part of the ad video, the iOS app crawls and then crashes spectacularly with the dying breath of a messed-up iOS UI. I now block an error-reporting endpoint at /error_204/ that indicates a “dev assertion failed”, so I don’t make some overworked QA engineer pull out their hair.

Failure: We’ve learned that blocking ad URLs causes the app to deploy countermeasures, and even when those are defeated, the app hangs forever on the ad screen. We’ve also learned that fuzzing ad videos often causes the app to crash. There is even session metadata in the video response chunks.

Let’s go back to what worked with JSON and obliterate the section of the Protobuf responses that contain the array of ad details.



Enter Burp Suite Tools for Penetration Testing

There is a library for Burp Suite called blackboxprotobuf that is designed to decode raw Protobuf wire messages, inject something naughty, then re-encode them to see how a Protobuf endpoint behaves. Get the original Burp Suite version, not the PyPI fork, unless you like infinite recursion bugs. We are going to have so much fun together in this next section.

You may encounter a small world of pain because some forks of blackboxprotobuf will cause a stack overflow due to deep recursion. You can reproduce this quickly by lowering the limit with sys.setrecursionlimit(200).

Compiling the original library source code for Burp Suite and using the C++ bindings will allow us to transcode ~500 KiB of raw Protobuf bytes in just a few seconds.

Tip: At the top of your import chain before you import protobuf, add

to use the C++ libprotobuf.so implementation whenever possible.

It is now possible to generate a best-guess .proto schema with a single function:

The schema isn’t perfect, and it is huge and deeply nested, and takes forever to pretty-print, and is probably wrong, but is just good enough to pull out the ad details like so (Protobuf to JSON in this sample):

Sample Protobuf to JSON showing a section of ads

The Python schema is huge and looks like this for about 250,000 more characters:

Reverse engineering the Protobuf schema sounds good on paper, but our target is spectacularly complex and a moving target.



Exfil the Proto Schemas from the App, Cleanly?

As fun as it is to reverse the Protobuf and generate a best-guess schema, wouldn’t it be more ninja-like to exfil the actual, working .proto or schema files from the smartphone app? Let’s pull out the Protobuf schemas from the Android version of the YouTube app and see if the schemas are the same or compatible.

This is what I tried at first, but it went nowhere with the Protobuf Toolkit (PBTK). I reproduce it here so I remember what I tried:

After installing Qt dependencies (pronounced “cute”), I was treated to a GUI.

PBTK – The Protobuf Toolkit

Next, I got the most recent release of a 100 MiB Android APK file from apkpure.com.

My excitement was in vain. The most PBTK could extract was a 59-byte proto file. Another tool called Apktool also looked promising, but the best it can do is disassemble bytecode, not decompile it. That may be good enough for pen testers, however.

What ended up working for APK decompilation is a combination of a dedicated person’s dex2jar tool and a Java Decompiler. A helpful guide can be found here.

You can see that Google went out of its way to complicate reverse engineering.

YouTube APK reversed into obfuscated Java classes

Google thoughtfully did leave some hints.

All the Protobuf classes laid bare and human-readable

Upon deeper inspection, the Protobuf classes are right here, in Java, decorated with getters and setters. Since we are using Python, and we cannot get the true schema files, I will leave this approach for now.



Hardcore Deep-Dive into Protobuf and Wire Format

After gazing into a sea of decrypted network traffic again, then triggering errors and assertion fails on my iPhone with Protobuf fuzzing, and taking a peek at the error logs being phoned home, I’ve noticed that ads register for “slots” in a given video. They can register for pre-roll, mid-roll, end-roll, full-page, and ad pods (back-to-back ads). Blocking an ad URL causes an error along the lines of “some ad that doesn’t exist booked a slot” and UI panic sets in.

I’m going to Sun Tzu the Protobuf Wire Format and come back in a bit…

I’m back. The Wire Format is surprisingly elegant, except for ZigZag encoding. Through trial and error, I found that editing out chunks of Protobuf with a hex editor is just a no-go, because every enclosing message’s length prefix would have to change too.

Besides being computationally expensive, decoding, editing, and re-encoding without the original schema produces an encoding that differs from the original. This is likely because we cannot detect whether ZigZag encoding is being used, or which scalar type (int32, int64, sint32/64, etc.) a given varint holds. Also, Protobuf serialization is not guaranteed to be deterministic, for example in field order. Here is some Protobuf trivia on the matter:

Protobuf serialization gotchas
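To see why a schema-less round trip changes the bytes, consider how one varint payload can be read. This minimal sketch (the helper names are hypothetical) shows three readings of the same wire value: uint64, int64 (two's complement), and sint64 (ZigZag). A decoder that guesses the wrong type re-encodes a different number of bytes. A negative int64 always takes ten bytes on the wire, while the same value as a ZigZag sint64 can take one.

```python
MASK64 = (1 << 64) - 1


def zigzag_decode(z: int) -> int:
    """sint32/sint64 mapping: 0->0, 1->-1, 2->1, 3->-2, ..."""
    return (z >> 1) ^ -(z & 1)


def zigzag_encode(n: int) -> int:
    """Inverse mapping for a signed 64-bit value."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def interpretations(raw: int) -> dict:
    """Ways a schema-less decoder could read the same varint payload."""
    return {
        "uint64": raw,
        "int64": raw - (1 << 64) if raw >> 63 else raw,  # two's complement
        "sint64": zigzag_decode(raw),
    }


print(interpretations(3))                # {'uint64': 3, 'int64': 3, 'sint64': -2}
print(interpretations((-2) & MASK64))    # {'uint64': 18446744073709551614, 'int64': -2, 'sint64': 9223372036854775807}
```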



Exploit a Protobuf Flaw to Easily Remove All Ads by Changing One Byte

While casually poring over the Protobuf C++ source code, I noticed an interesting comment:

UnknownFieldSet is used to keep track of fields that were seen when parsing a protocol message but whose field numbers or types are unrecognized. This most frequently occurs when new fields are added to a message type and then messages containing those fields are read by old software that was compiled before the new types were added. (ref)

Yes, what to do with unknown fields? What to do indeed. And how easy would it be to, say, change field key 49399797 to 49399796, thus making an entire substructure of advertisement and tracking information suddenly unavailable? Tantalizing.

And, if we can calculate the field tags in bytes with bit-twiddling, then can we use a simple regex to AMF1 the section of ads in O(n) time?

As a motivating example, I’d like to find the field key 49399797, which is not as simple as searching for its hex value 2F1C7F5. Here is an implementation of a tag-scanning algorithm so you can see the bit-twiddling:

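We know the wire type is 2 (length-delimited nested string/message), and one target field key is 49399797. Shifting the key left by three bits and OR-ing in the wire type gives (49399797 << 3) | 2 = 395198378. Varint-encoding that number gives the target tag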

AA FF B8 BC 01

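where the wire type (2) lives in the three lowest bits of the first byte: 0xAA ends in binary 010. Varints store their 7-bit groups least-significant first, so the final 01 is actually the most significant group. In binary, the whole tag is: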

10101010 11111111 10111000 10111100 00000001

Let’s drop the MSB (the continuation bit) from each byte, as per the varint wire format:

.0101010 .1111111 .0111000 .0111100 .0000001

Then we shift and add all five 7-bit groups. The least-significant group comes first, so each successive group is shifted 7 more bits to the left:
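Written out term by term, the sum is shown below. The numbers 42, 127, 56, 60, and 1 are the five 7-bit groups from the previous line read as decimal, and each is weighted by its position.

$$
\begin{aligned}
\text{tag} &= 42 + 127 \cdot 2^{7} + 56 \cdot 2^{14} + 60 \cdot 2^{21} + 1 \cdot 2^{28} \\
&= 42 + 16256 + 917504 + 125829120 + 268435456 \\
&= 395198378
\end{aligned}
$$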

Finally, we shift out the three wire-type bits to get back the field key:

395198378 >> 3 = 49399797

And that, folks, is a taste of how Wire Format works.
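The same walkthrough can be checked in code. Below is a minimal Python sketch that decodes a tag's varint and splits it into field key and wire type (parse_tag is a hypothetical helper name, not part of any Protobuf library). It also shows that renaming 49399797 to 49399796 changes only the first byte, from AA to A2. Subtracting 1 from the key subtracts 8 from the lowest 7-bit group, which here is large enough (42) that nothing carries into the next byte.

```python
from typing import Tuple


def parse_tag(tag_bytes: bytes) -> Tuple[int, int]:
    """Decode a varint-encoded Protobuf tag into (field key, wire type)."""
    value = 0
    for i, byte in enumerate(tag_bytes):
        value |= (byte & 0x7F) << (7 * i)  # drop the MSB; least-significant group first
        if not byte & 0x80:                # MSB clear: this was the last byte
            break
    return value >> 3, value & 0x7         # top bits = field key, low 3 bits = wire type


print(parse_tag(bytes.fromhex("AA FF B8 BC 01")))  # (49399797, 2)
print(parse_tag(bytes.fromhex("A2 FF B8 BC 01")))  # (49399796, 2)
```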

Fantastic. Now, all we have to do is scan the Protobuf bytes for classic ad URL signatures like /pagead/ to bound our field search. From there we move backward until we find the target field tag(s), and thus the field keys, that we would like to denature (e.g. 49399797 → 49399796).

Notice how the Protobuf response payload is 1.87 MiB? As I said, Google makes it computationally expensive to decode, alter, and re-encode without the C++ source proto files, but a quick linear scan takes no effort at all.

Walking backward from the ad marker

Just a quick note: the same field tag can appear more than once, but not every occurrence represents ads. That is why we need to backtrack from the /pagead/ markers.

Multiple identical field tags may be present
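Putting the pieces together, here is a minimal sketch of the scan-and-backtrack idea under stated assumptions. This is not the author's script, and the function names are hypothetical. The assumptions are:

- The marker is /pagead/.
- The target key is passed in (49399797 in this article).
- The backtracking window is a hypothetical 4096 bytes, comfortably more than the ~600 bytes of backtracking reported for the real payload.
- "Denaturing" means renaming the key to key - 1.

As an extra safety check that the article does not describe, the sketch decodes the length prefix after a candidate tag. It patches the tag only if that length-delimited body actually contains the marker, which guards against look-alike tags elsewhere in the payload.

```python
from typing import Tuple, Union

AD_MARKER = b"/pagead/"
WIRE_TYPE_LEN = 2  # length-delimited: strings, bytes, nested messages


def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def decode_varint(buf: Union[bytes, bytearray], pos: int) -> Tuple[int, int]:
    """Return (value, position after the varint); raise ValueError if malformed."""
    result = shift = 0
    while pos < len(buf) and shift < 64:
        byte = buf[pos]
        result |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:
            return result, pos
        shift += 7
    raise ValueError("truncated or over-long varint")


def make_tag(field_key: int, wire_type: int = WIRE_TYPE_LEN) -> bytes:
    return encode_varint((field_key << 3) | wire_type)


def denature_ads(payload: bytes, field_key: int, window: int = 4096) -> Tuple[bytes, int]:
    """Rename field `field_key` to `field_key - 1` wherever it wraps an ad marker."""
    target, renamed = make_tag(field_key), make_tag(field_key - 1)
    if len(renamed) != len(target):
        raise ValueError("renamed tag must keep the same byte length")
    buf = bytearray(payload)
    patched = 0
    pos = buf.find(AD_MARKER)
    while pos != -1:
        # Nearest candidate tag before the marker, within the backtracking window.
        tag_pos = buf.rfind(target, max(0, pos - window), pos)
        if tag_pos != -1:
            try:
                length, body_start = decode_varint(buf, tag_pos + len(target))
            except ValueError:
                length, body_start = 0, 0  # not a real tag; condition below fails
            # Only patch if the length-delimited body really contains the marker.
            if body_start <= pos < body_start + length:
                buf[tag_pos:tag_pos + len(target)] = renamed
                patched += 1
        pos = buf.find(AD_MARKER, pos + len(AD_MARKER))
    return bytes(buf), patched
```

The payload length never changes, so a patched response body needs no Content-Length adjustment.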



Smoke Test: Remove Ads from Protobuf in O(n)-Time

It works! In one pass with no additional memory, I’m able to scan a huge 1.8 MiB chunk of gibberish-looking Protobuf data. In the screenshot below, our target is found only at the 30,593rd byte (of 1.8 MiB), and backtracking ~600 characters from there yields the field key to denature. Not only is this amazing, but I don’t even need to block *.googleadservices.com or URLs with /pagead/ in them anymore, because those requests are never made in the first place.

Successfully able to remove ads from the Protobuf response



Analysis of this Successful Adblocking Technique

Summary

By taking advantage of a feature (flaw?) in Protobuf that allows it to be backward compatible with schema changes, along with the fact that Protobuf is very sensitive to byte changes due to its compact nature, we can change a single byte in a critical location and tell Protobuf that an entire section of deeply-nested data is from a future schema version and it should be ignored.

Timing Analysis

Google returns huge responses in Protobuf (e.g. 1.8 MiB), including even the layout of the iOS app, so only C++/Swift is fast enough to understand it all before the connection times out. I’ve shown that Python is more than two orders of magnitude slower at decoding these Protobuf payloads (about 23 s vs. 100 ms in C++), so connections do time out waiting on Python. With web-based JSON, the whole payload needs to be parsed, edited, and re-serialized. With my Protobuf technique, it takes microseconds thanks to a single linear scan and then ultra-quick backtracking. This technique is suitable for real-time adblocking without blocklists.

Knock-On Benefits

All those *.googleadservices.com and /pagead/* URLs on Apple devices originate from the Protobuf payload. This means they all go away for free – we don’t need to block them. In fact, the YouTube app is zippier because fewer connections are made to ad URLs in the first place. This means we can avoid keeping a blocklist of YouTube ad URLs and stay on the sidelines of the whack-a-mole fun. Ads do not register for video location “slots” on the Apple devices and the content just plays.

Future-Proof

This is a heuristic technique that looks for two strings: /pagead/ and some calculated field tag nearby, so this technique is designed to be future-proof.

Walking backward from the ad marker to find the field key

Even if Google changes the field tag (and breaks millions of apps and Apple TVs before they upgrade), it’s an academic exercise to enhance the following script to discover the new field tag(s) automatically.

Should Google be Worried?

No, not at all.

This is a highly specialized technique to block Apple-device YouTube ads (or for Instagram, WhatsApp, Facebook, etc. tracker blocking). The CPU requirements to decrypt and re-encrypt HTTPS traffic greatly exceed what Raspberry Pis can offer. Even if some company takes my script and considers making and selling a NIC dongle, it would likely not be powerful enough. An Nvidia Shield could handle it, but if you already have Android devices, then just hack the binaries. My technique is for Apple device owners who don’t want to compromise the OS, which further reduces the audience of this technique.



The MITMProxy YouTube Adblocking Script

Here is the MITMProxy addon script that serves as a proof-of-concept to block YouTube ads on networked Apple devices. The script can be run as follows (note the prerequisites in the script and be sure to install them first). Name it youtube.py and run the following command:

mitmdump --listen-port 8080 --listen-host 127.0.0.1 -s "youtube.py"

Here is the script, including a fairness function to allow ads 5% of the time:

This script is written in Python simply because the TLS-decrypting man-in-the-middle proxy it plugs into is written in Python. As a working proof-of-concept, it’s pretty rad. Of course, it could be rewritten in Rust or Go or anything but single-threaded Python. Still, as an intellectual exercise to defeat ads that are served from the same domain as content, it’s elegant.



YouTube Premium

It’s unknown if CAD $11.99/mo (previously $9.99/mo; $13.43/mo with tax) is even reasonable. Do I personally incur CAD $11.99 of cost to advertisers each month?

How much does YouTube advertising cost?
Source

Since ads are auctioned, the CPV (cost-per-view) varies. Also, many ad campaigns have a capped daily budget, so theoretically there should be fewer ads in the evenings as budgets run out during the day.

Experiment in Ad Viewing

I watched YouTube on and off for a day on a clean notebook computer with private browsing. My history showed that I only “watched” 10 videos:

  • I fast-forwarded through a few of them to get past the “like and subscribe” runtime padding.
  • I jumped to the end of one just to get to the “top three” from a “top twenty” list.
  • Two were low quality so I left early.
  • The rest were music videos.

In all, for watching parts of 10 videos, I was exposed to 8 ads, and only two were skippable (which I skipped).

$0.15 as a Ballpark CPV

Let’s use USD $0.15 as a CPV. In one day, let’s say, I incurred 8 x $0.15, or $1.20 to advertisers. Extrapolated to one month, that is roughly USD $36/mo. Do I really cost advertisers USD $36/mo for very casual YouTube viewing? That sounds terrible for advertisers.

CPV from US Advertising Spend Divided by Total Views

According to Statista, US YouTube advertisers spent $15.1 billion in 2019. Also in 2019, US residents had 916 billion views (ref). That works out to an average of $15.1B / 916B, or USD $0.0165 per view. For my 8 ads that day, that is only about USD 13 cents. Extrapolated to one month, I theoretically cost advertisers only USD $3.96/mo.
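Both estimates can be reproduced in a few lines. This sketch assumes a 30-day month and the 8 ads per day from the experiment, and the constant and function names are illustrative.

```python
ADS_PER_DAY = 8        # from the one-day viewing experiment
DAYS_PER_MONTH = 30    # assumption behind "extrapolated to one month"


def monthly_ad_cost(cpv_usd: float) -> float:
    """Estimated monthly cost to advertisers for a given cost-per-view."""
    return ADS_PER_DAY * cpv_usd * DAYS_PER_MONTH


ballpark = monthly_ad_cost(0.15)            # ballpark CPV
average = monthly_ad_cost(15.1e9 / 916e9)   # 2019 US ad spend / US views
print(f"USD {ballpark:.2f}/mo vs USD {average:.2f}/mo")  # USD 36.00/mo vs USD 3.96/mo
```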

Is YouTube Premium Worth It?

YouTube Premium subscription fee as of Jan, 2022

When I allowed ads for my experiment, I hit the hardware mute button. I also looked away because I have several computers with a lot going on. Ad spend is wasted on me, but I still want to support content creators. For me, CAD $13.43/mo is more than I incur on actual ads and more than I pay for a Netflix subscription. The only way to justify the cost is to have YouTube playing constantly in the background on a TV.

However, I truly enjoy a handful of creators, so I may start watching them in the background on non-stop play. Let’s give the three-month YouTube Premium trial a chance, and I will still be monitoring what they track about me.

YouTube Premium network traffic



DMCA, Sony, Viacom

Recently I learned that, due to abuses of the DMCA of 1998, YouTube content creators who make reaction videos and “easter egg” videos may have their videos claimed by big companies like Sony and Viacom. From the moment a claim is made, all ad revenue goes to those big companies, not to the creators. So, in all likelihood, I may unknowingly not even be supporting my favourite YouTube creators.

Did you know? Many fair-use and video-game-commentary videos may have automated copyright claims against them, meaning that ad revenue goes to big companies with deep legal pockets and your favourite creators may get nothing, so more and more creators leave YouTube for Twitch.



Summary of Accomplishments

I rarely give up, so this is an example of going into an extreme problem-solving mode to solve a fun problem loosely using cryptography and reverse engineering. In the end, a single byte turned it all around, so it was all worth it to come to an elegant and satisfying solution.

Success: We were able to set up a hardware router from scratch, segment LANs into trusted and untrusted zones, set up traditional DNS adblocking, add a transparent MITM proxy, and ultimately block YouTube ads on networked Apple devices.

Note: This was a hard problem – now solved – so I am paying for YouTube Premium to give the CPU a rest.



Notes:

  1. Adios, My Friend