Attacks on privacy
The aim of this lecture is to give an overview of a wide range of privacy attacks. Before moving on, we should first clarify what is meant by privacy in order to understand which behaviours can be seen as attacks. On closer inspection, there is no clear answer, as it depends on how privacy is understood. The meaning of privacy can be linked to the system of government, cultural background, legislation, etc. It may also not be straightforward to decide who plays the role of the attacker. For example, how far can the government of a country go in providing national security when doing so may violate the privacy of its citizens?
In the following, we focus on methods that can be used to violate the privacy of Internet users. In addition, we also look at how people can be tracked in the physical world. The motivation for describing and demonstrating these attacks is to raise awareness, to learn how to avoid becoming a victim, and to build motivation for learning and applying privacy enhancing technologies.
What is shown in the following image?
Violating users' privacy on the web
Tracking cookies
Cookies are small pieces of textual data (up to 4KB) that web sites store on users' computers. Cookies can be seen as the memory of web pages, as they allow web sites to store users' preferences and keep track of users' sessions. A cookie has multiple attributes, but the two most important ones are the id (name) and the value of the cookie. The id may indicate what the cookie is used for, while the value is usually a randomly generated string that is supposed to be unique.
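To make the id/value structure concrete, the sketch below parses a single Set-Cookie header with Python's standard http.cookies module. The header, the id uid, the value and the domain tracker.example are hypothetical; real tracking cookies use their own names and attributes.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header as a third-party tracker might send it.
HEADER = ("uid=3f9a1c7e52b84d10; Domain=.tracker.example; Path=/; "
          "Max-Age=31536000; Secure; HttpOnly")


def parse_set_cookie(header_value):
    """Return (id, value, attributes) for a header that sets exactly one cookie."""
    jar = SimpleCookie()
    jar.load(header_value)
    # SimpleCookie maps each cookie id (name) to a Morsel holding value and attributes.
    (name, morsel), = jar.items()
    # Morsel pre-fills every known attribute with "", so keep only those actually set.
    attributes = {key: morsel[key] for key in morsel.keys() if morsel[key]}
    return name, morsel.value, attributes


name, value, attributes = parse_set_cookie(HEADER)
print(name, value)   # the id hints at the purpose, the value identifies the browser
print(attributes)    # domain, path, max-age (one year here), secure, httponly
```

A long Max-Age combined with a random value is what turns an ordinary cookie into a persistent identifier.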
Most cookies are set when the user first visits the domain, although some cookies, like the session cookie, are updated based on the user's actions (e.g., authentication or personalization of the web site). Cookies are tied to the browser that was used to visit the web site. Once cookies are stored in the browser, each time the user visits the same web site, all cookies associated with the given domain are sent along with the request. Now, think about how modern web sites are built: to display a single web site, resources are fetched from multiple origins. Each of the queried domains can therefore set its own cookies, and once it has done so, those cookies are sent along with every later request to that domain, regardless of which web site embedded the resource. This has been the default method for matching users across web sites. Which services have the ability to track users across web sites?
- Social media companies: like buttons, share buttons, etc
- Advertisement companies: ads loaded from third party servers
- Analytics services: included JavaScript to get statistics about visitors
- Services that share content: images, videos that are embedded into the web site
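The tracking mechanism described above can be simulated in a few lines. In this sketch, all class names, domains and pages are hypothetical: a single third-party server embedded on three unrelated sites hands out a random identifier on first contact and then receives it back with every later request, together with the page that embedded it. It assumes the browser neither blocks nor partitions third-party cookies, and that the tracker learns the embedding page, for example from the Referer header (modern referrer policies often reduce this to the origin only).

```python
import secrets
from collections import defaultdict


class Tracker:
    """Hypothetical third-party server embedded on many sites (like button, ad, analytics)."""

    def __init__(self):
        self.visits = defaultdict(list)  # cookie value -> pages the browser was seen on

    def handle_request(self, cookie, page):
        # First contact: no cookie was sent, so generate a random unique identifier.
        if cookie is None:
            cookie = secrets.token_hex(8)
        # Every request carries the identifier plus the embedding page (e.g. via Referer).
        self.visits[cookie].append(page)
        return cookie  # returned to the browser in a Set-Cookie header


class Browser:
    def __init__(self):
        # Assumption: one jar keyed by domain, i.e. third-party cookies are not partitioned.
        self.cookie_jar = {}

    def visit(self, page, third_party_domain, server):
        # Loading the page also fetches the embedded resource; any cookie already
        # stored for the third-party domain is attached to that request.
        cookie = self.cookie_jar.get(third_party_domain)
        self.cookie_jar[third_party_domain] = server.handle_request(cookie, page)


tracker = Tracker()
alice = Browser()
for page in ["https://news.example/politics",
             "https://shop.example/running-shoes",
             "https://health.example/symptoms"]:
    alice.visit(page, "tracker.example", tracker)

for uid, pages in tracker.visits.items():
    print(uid, pages)  # one identifier linked to all three unrelated sites
```

Blocking or partitioning third-party cookies breaks exactly the shared cookie_jar lookup that makes this work, which is why trackers turned to the techniques discussed below.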
The following video by MediaCrossing gives an overview of how ad bidding works while the user only sees a web site loading. The video is accompanied by a blog post: 200 Milliseconds: The Life of a Programmatic RTB Ad Impression. Apple has also released an overview of how data gets used by advertisers: A Day in the Life of Your Data (2021).
A good overview of third party tracking is given by EFF: Behind the One-Way Mirror: A Deep Dive Into the Technology of Corporate Surveillance (2019)
We may also look into the following topics:
Tracking protections & how to bypass them
EFF has created an extension named Privacy Badger that attempts to detect third party trackers and block them. Similar functionality is also provided by other extensions such as Disconnect and Ghostery.
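EFF has described Privacy Badger's learning roughly as follows: if the same third-party domain is seen tracking the user on three separate sites, it gets blocked. The sketch below implements only that counting rule under simplifying assumptions (plain host names instead of registrable domains, and a boolean sets_identifier standing in for the real tracking heuristics); the class and parameter names are hypothetical and this is not the extension's code.

```python
from collections import defaultdict

# Assumption: the "three separate sites" rule described by EFF; the real extension
# compares registrable domains and uses further heuristics that are omitted here.
BLOCK_THRESHOLD = 3


class TrackerLearner:
    def __init__(self, threshold=BLOCK_THRESHOLD):
        self.threshold = threshold
        self.seen_on = defaultdict(set)  # third-party domain -> first-party sites
        self.blocked = set()

    def observe(self, first_party, third_party, sets_identifier):
        """Record one request; sets_identifier marks a request that looked like tracking."""
        if third_party == first_party or not sets_identifier:
            return
        self.seen_on[third_party].add(first_party)
        if len(self.seen_on[third_party]) >= self.threshold:
            self.blocked.add(third_party)

    def should_block(self, third_party):
        return third_party in self.blocked


learner = TrackerLearner()
for site in ["news.example", "shop.example", "blog.example"]:
    learner.observe(site, "tracker.example", sets_identifier=True)
    learner.observe(site, "fonts.example", sets_identifier=False)
print(learner.should_block("tracker.example"), learner.should_block("fonts.example"))
```

Counting distinct sites rather than requests is the point: a widely embedded resource that never identifies the user is left alone.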
Another quite popular approach is to use Pi-hole, which acts as a local DNS server for the whole network and can therefore also block tracking requests made by IoT devices. Troy Hunt wrote an overview of his experience with it: Mmm... Pi-hole....
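The blocking idea behind a DNS sinkhole such as Pi-hole can be sketched as follows: every device on the network asks the same resolver, and names on a blocklist are answered with an unroutable address instead of being forwarded upstream. This is a simplified illustration, not Pi-hole's code; the blocklist, the upstream table and the rule that an entry also blocks its subdomains are assumptions (in Pi-hole, whether subdomains are covered depends on how an entry is written).

```python
# Hypothetical blocklist and upstream answers (IP addresses from documentation ranges).
BLOCKLIST = {"tracker.example", "ads.example"}
UPSTREAM = {
    "news.example": "192.0.2.10",
    "tracker.example": "192.0.2.66",
    "pixel.tracker.example": "192.0.2.67",
}
SINKHOLE = "0.0.0.0"  # unroutable answer returned for blocked names


def is_blocked(name, blocklist=BLOCKLIST):
    labels = name.lower().rstrip(".").split(".")
    # Assumption: an entry also covers its subdomains, so check a.b.c, b.c and c.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def resolve(name):
    """Answer a query the way a DNS sinkhole would: block first, otherwise forward."""
    if is_blocked(name):
        return SINKHOLE
    return UPSTREAM.get(name.lower().rstrip("."))


for query in ["news.example", "pixel.tracker.example"]:
    print(query, "->", resolve(query))
```

Because the check happens at the DNS level, it also applies to smart TVs and other devices that cannot run a browser extension; the flip side is that it only sees host names, which is exactly what CNAME cloaking exploits.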
CNAME cloaking
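CNAME cloaking is a relatively new method for bypassing ad blockers. Although the method was known earlier, it became widespread in 2019, probably due to the increased blocking of third party cookies. For example, in September 2019 Mozilla's Firefox browser started to block third party tracking cookies by default: Today’s Firefox Blocks Third-Party Tracking Cookies and Cryptomining by Default. The idea is that the tracker is loaded from a subdomain of the visited site, and the DNS record of that subdomain is a CNAME alias pointing to the tracker's own domain. The browser therefore treats the tracker as first party, and blockers that only match host names do not recognise it.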
A good summary of CNAME cloaking is given in: CNAME Cloaking, the dangerous disguise of third-party trackers. A more recent overview of the issue was given in 2021: Large-scale Analysis of DNS-based Tracking Evasion - broad data leaks included?.
Let's see how it works in practice. As an example, take the web site of the French online shop https://mathon.fr. When the page loads, it makes several requests to the subdomain 16ao.mathon.fr. Now, let's check whether a CNAME alias has been set for that subdomain. You can do that with an online CNAME lookup tool or by making the DNS request yourself. On Unix-based systems, this can be done from the command line with:
host -t cname 16ao.mathon.fr
On Windows, you can use:
nslookup -q=CNAME 16ao.mathon.fr
The answer to this query shows that 16ao.mathon.fr is an alias for mathon.eulerian.net.
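The lookup above also explains why CNAME cloaking defeats filters that match host names. In the sketch below, the record table is a stand-in for a live DNS query (only the 16ao.mathon.fr alias comes from the text) and the blocklist containing eulerian.net is an assumption. A filter that checks only the requested name lets the request through, while one that follows the CNAME chain catches it.

```python
# Stand-in for live DNS answers; only this alias is taken from the lookup above.
CNAME_RECORDS = {"16ao.mathon.fr": "mathon.eulerian.net"}
# Assumption: the tracker's own domain is on the blocker's list, the shop's is not.
BLOCKLIST = {"eulerian.net"}


def matches(name, blocklist):
    labels = name.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def cname_chain(name, records, max_depth=10):
    """Follow CNAME aliases from name to the final target (bounded to avoid loops)."""
    chain = [name]
    while chain[-1] in records and len(chain) <= max_depth:
        chain.append(records[chain[-1]])
    return chain


def naive_block(name, blocklist=BLOCKLIST):
    # What a host-name filter sees: a first-party-looking subdomain of mathon.fr.
    return matches(name, blocklist)


def cname_aware_block(name, records=CNAME_RECORDS, blocklist=BLOCKLIST):
    # Uncloaking: block if any name in the alias chain is a known tracker.
    return any(matches(alias, blocklist) for alias in cname_chain(name, records))


print(cname_chain("16ao.mathon.fr", CNAME_RECORDS))
print(naive_block("16ao.mathon.fr"), cname_aware_block("16ao.mathon.fr"))
```

Uncloaking requires the blocker to see DNS answers, which an ordinary browser extension usually cannot do; this is one reason the technique spread.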
Another widely cited example from last year was the French news site liberation.fr, which leaked its session cookies to f7ds.liberation.fr. However, it seems that the subdomain is no longer loaded when visiting that news site. A list of identified CNAME tracking domains can be found at https://raw.githubusercontent.com/AdguardTeam/cname-trackers/master/combined_disguised_trackers_justdomains.txt
Profiling users
It would be naive to think that users are profiled only with the help of cookies. Even when third party cookies are blocked, trackers still have multiple ways of matching users across web sites.
Some examples:
- IP address
- Image cache: Cookieless cookies
- Leaky Images: Targeted Privacy Attacks in the Web
- Social media fingerprinting: Your Social Media Fingerprint, Social Media Login Detection
- Browser fingerprinting: Panopticlick, Am I Unique?
- Detection of proxies and ad blockers
- Behavioral Profiling: The password you can't change.
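Browser fingerprinting, as demonstrated by Panopticlick and Am I Unique?, needs no stored state: attributes the browser reveals anyway are combined into an identifier, and its power depends on how rare the combination is. The sketch below hashes a hypothetical set of attributes and measures rarity as bits of identifying information, -log2 of the share of browsers with the same fingerprint, in the spirit of the figure Panopticlick reported. The attribute names, values and the 1000-browser population are made up.

```python
import hashlib
import math
from collections import Counter


def fingerprint(attributes):
    """Hash browser attributes into an identifier; nothing is stored on the client."""
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def surprisal_bits(fp, population):
    """Identifying information in bits: -log2(share of browsers with this fingerprint).

    Assumes fp occurs in population at least once.
    """
    share = Counter(population)[fp] / len(population)
    return -math.log2(share)


# Hypothetical attribute values; real scripts read many more (fonts, canvas, WebGL, ...).
common = {"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
          "screen": "1920x1080", "timezone": "UTC+2", "language": "en-US"}
rare = dict(common, screen="3440x1440", timezone="UTC+13")

population = [fingerprint(common)] * 999 + [fingerprint(rare)]
print(round(surprisal_bits(fingerprint(rare), population), 2))  # 9.97, i.e. 1 in 1000
```

About 33 bits are enough to single out one person among the world's population, so a handful of moderately rare attributes can suffice.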
Intercepting communication
Topics:
- Man-in-the-middle (MITM) attacks
- Downgrade attacks
- Legal interception
- Backdoors and leakage of keys
- Hacking femtocells:
Other means of tracking & violating the privacy of people
GPS & other location data
Topics that we may cover:
- EXIF data on images
- Location information in Tweets: no longer easily available.
- Location data collected by Google:
- Location data collected by smart watches
- Building the Global Heatmap (2017)
- Fitness app Strava lights up staff at military bases (2018)
- This fitness app lets anyone find names and addresses for thousands of soldiers and secret agents (2018)
- After Strava, Polar is Revealing the Homes of Soldiers and Spies (2018) -- this feature was discontinued in December 2020.
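For the EXIF item above: photos taken with location services enabled typically carry GPS tags (GPSLatitude, GPSLatitudeRef, GPSLongitude, GPSLongitudeRef), where each coordinate is stored as degrees, minutes and seconds as rational numbers. The sketch below converts such values to the decimal degrees used by map services; how the tags are extracted (for example with exiftool or an image library) is left out, and the coordinates are hypothetical.

```python
from fractions import Fraction


def to_fraction(x):
    # EXIF stores each component as a rational; accept (numerator, denominator) pairs too.
    return Fraction(*x) if isinstance(x, tuple) else Fraction(x)


def dms_to_decimal(dms, ref):
    """Convert EXIF GPSLatitude/GPSLongitude plus its N/S/E/W reference to decimal degrees."""
    degrees, minutes, seconds = (to_fraction(x) for x in dms)
    value = degrees + minutes / 60 + seconds / 3600
    return float(-value if ref in ("S", "W") else value)


# Hypothetical values as they might be read from a photo's GPS tags.
lat = dms_to_decimal((58, 22, (4800, 100)), "N")
lon = dms_to_decimal((26, 43, (1200, 100)), "E")
print(lat, lon)  # 58.38 26.72
```

Pasting the result into any map service shows where the photo was taken, which is why many platforms strip EXIF data on upload.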
Cell phone tracking
IMSI catchers
IMSI is an abbreviation for international mobile subscriber identity. It is assigned to a subscriber and stored on the SIM card. An IMSI usually consists of 15 digits: the first three are the mobile country code (MCC), the next two or three digits are the mobile network code (MNC), and the rest of the number identifies the subscriber within the mobile network.
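The structure above is what makes a captured IMSI valuable: the prefix alone reveals the subscriber's home country and operator, and the remainder identifies the subscriber. The sketch below splits an IMSI into its parts; since the MNC length is not encoded in the IMSI itself, it is passed in as a parameter. The IMSI shown is hypothetical (248 is the MCC of Estonia).

```python
from typing import NamedTuple


class IMSI(NamedTuple):
    mcc: str   # mobile country code, 3 digits
    mnc: str   # mobile network code, 2 or 3 digits
    msin: str  # subscriber number within that network


def parse_imsi(imsi, mnc_length=2):
    """Split an IMSI into MCC, MNC and MSIN.

    The MNC length depends on the country and operator and cannot be derived from
    the IMSI alone; here the caller is assumed to know it.
    """
    if not (imsi.isascii() and imsi.isdigit()) or len(imsi) > 15:
        raise ValueError("an IMSI is a string of at most 15 decimal digits")
    if mnc_length not in (2, 3):
        raise ValueError("the MNC has 2 or 3 digits")
    return IMSI(imsi[:3], imsi[3:3 + mnc_length], imsi[3 + mnc_length:])


print(parse_imsi("248011234567890"))  # IMSI(mcc='248', mnc='01', msin='1234567890')
```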
[Figures: Step 1, Step 2]
Link to EFF's report:
Tracking payments
While tracking credit card payments is restricted to law enforcement agencies, it is much easier to track transactions made with cryptocurrencies. For example, all Bitcoin transactions are public. However, the Bitcoin ledger does not contain information about identities, as accounts are identified by cryptographic pseudonyms (addresses). Thus, moving funds inside the Bitcoin system may not lead to the identification of the sender. Still, there are multiple ways to identify who was behind a transaction. First, when a user converts cryptocurrency to other currencies, the exchange probably knows the user's real identity, as otherwise it would not be able to transfer the funds. Second, when a user orders something and pays with Bitcoin, the item may be tracked back to the receiver. Third, when a Bitcoin owner shares their "account number" (address) on social media, etc., all transactions with the same address can be connected with an identity. There are multiple tools for tracking cryptocurrency transactions, but most of them are not free to use. One can use the web site blockchain.com to view the transactions of a given wallet address. The following link shows the transactions of an address claimed to be related to Silk Road that moved over 100 million dollars worth of Bitcoin: https://www.blockchain.com/btc/address/15ihHoGs3onQBNnEH8afDFGvou9nD62Hm7.
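Once a single address is tied to a person, analysts usually try to find the other addresses the same person controls. A widely used technique for this, which is not described above and is swapped in here as an illustration, is the common-input-ownership heuristic: all inputs of one transaction are assumed to belong to the same owner, because spending them requires all their private keys. The sketch below clusters addresses with a union-find structure; the transaction format and addresses are hypothetical.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def cluster_addresses(transactions):
    """Common-input-ownership heuristic: all inputs of one transaction share an owner."""
    uf = UnionFind()
    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs + tx["outputs"]:
            uf.find(addr)  # register every address that appears
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)
    clusters = {}
    for addr in uf.parent:
        clusters.setdefault(uf.find(addr), set()).add(addr)
    return list(clusters.values())


def addresses_linked_to(known_address, transactions):
    """All addresses in the same cluster as an address already tied to an identity."""
    for cluster in cluster_addresses(transactions):
        if known_address in cluster:
            return cluster
    return {known_address}


# Hypothetical public transactions.
TXS = [
    {"inputs": ["addrA", "addrB"], "outputs": ["addrX"]},
    {"inputs": ["addrB", "addrC"], "outputs": ["addrY"]},
    {"inputs": ["addrD"], "outputs": ["addrA"]},
]
print(sorted(addresses_linked_to("addrC", TXS)))  # ['addrA', 'addrB', 'addrC']
```

The heuristic can be wrong: CoinJoin transactions deliberately combine inputs from different owners to break exactly this assumption.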
Relevant links:
- Feds Warrantlessly Tracking Americans’ Credit Cards in Real Time (2010)
- When A Small Leak Sinks A Great Ship: Deanonymizing Tor Hidden Service Users Through Bitcoin Transactions Analysis (2018)
- Tracing Transactions Across Cryptocurrency Ledgers (Usenix, 2019)
Analyzing metadata and datasets
Metadata is data about data. For example, a network level attacker is able to monitor TLS traffic that passes through the point in the network where the attacker is located. Although properly configured TLS protects the confidentiality of the content, the metadata is still visible. An attacker can see which IP address sent the request and which IP address received it. In addition, the time of the connection leaks, along with a close estimate of the amount of data that was sent over the communication channel. Based on such information, it is possible to map the behaviour of users behind a certain IP address. Similar metadata can be collected for other types of communication.
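To see how much such metadata reveals, the sketch below aggregates flow records (source IP, destination IP, start time, bytes) per pair of endpoints, which is roughly what an observer on the path can compute without decrypting anything. The record format and all values are hypothetical; the IP addresses come from documentation ranges.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical flow records: (source IP, destination IP, start time, bytes transferred).
FLOWS = [
    ("198.51.100.7", "203.0.113.5", "2021-03-01T23:14:00", 48_000),
    ("198.51.100.7", "203.0.113.5", "2021-03-02T23:40:00", 51_000),
    ("198.51.100.7", "192.0.2.80", "2021-03-02T08:05:00", 2_400_000),
]


def profile(flows):
    """Per (source, destination): total bytes, number of connections, hours of activity."""
    summary = defaultdict(lambda: {"bytes": 0, "connections": 0, "hours": set()})
    for src, dst, start, size in flows:
        entry = summary[(src, dst)]
        entry["bytes"] += size
        entry["connections"] += 1
        entry["hours"].add(datetime.fromisoformat(start).hour)
    return dict(summary)


for (src, dst), stats in profile(FLOWS).items():
    print(src, "->", dst, stats)
```

Even this tiny profile shows a recurring late-night contact and a large morning download from a specific server; matching destination addresses to known services turns such patterns into statements about a person's habits.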
Related news stories:
- GCHQ taps fibre-optic cables for secret access to world's communications (2013)
- Sweden helps the US spy on the Baltics: report (2013)
Datasets
Pseudonymisation is often not sufficient to protect the identities of the people included in a pseudonymised dataset. The Netflix Prize case is probably one of the best known examples of identifying users by combining a dataset with additional data from other datasets or sources. A more recent example comes from Australia, where in 2016 health records of 2.9 million Australians were published in pseudonymised form. However, it turned out that by using other publicly available information, the pseudonymisation can be reversed in some cases: The simple process of re-identifying patients in public health records.
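The re-identification step in such cases is typically a linkage attack: records are joined on attributes that survive pseudonymisation (quasi-identifiers), and a unique match reveals whose record it is. The sketch below uses hypothetical records and assumes year of birth, sex and postcode as the quasi-identifiers; the Netflix and Australian cases relied on other auxiliary data, such as movie ratings or dates of medical events.

```python
# Hypothetical pseudonymised release: names replaced by random IDs, other fields kept.
RELEASED = [
    {"pid": "a91f", "birth_year": 1984, "sex": "F", "postcode": "2600", "record": "childbirth"},
    {"pid": "c22e", "birth_year": 1984, "sex": "F", "postcode": "3000", "record": "knee surgery"},
    {"pid": "7b10", "birth_year": 1991, "sex": "M", "postcode": "2600", "record": "asthma"},
]
# Hypothetical auxiliary data found elsewhere (news story, social media profile, ...).
AUXILIARY = [
    {"name": "Jane Doe", "birth_year": 1984, "sex": "F", "postcode": "2600"},
]
QUASI_IDENTIFIERS = ("birth_year", "sex", "postcode")


def link(released, auxiliary, keys=QUASI_IDENTIFIERS):
    """Return name -> released record for every person matched to exactly one record."""
    matches = {}
    for person in auxiliary:
        candidates = [r for r in released if all(r[k] == person[k] for k in keys)]
        if len(candidates) == 1:  # unique match: the pseudonym is re-identified
            matches[person["name"]] = candidates[0]
    return matches


print(link(RELEASED, AUXILIARY))
```

With fewer known attributes (for example only the year of birth) Jane Doe matches two records and stays hidden; every extra attribute shrinks the candidate set, which is why rich datasets are so hard to pseudonymise safely.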
Cambridge Analytica's Facebook data
The dataset in question was collected in multiple steps. First, about 32,000 US citizens were paid to fill in a detailed survey about their personality and political preferences. In addition, some of their Facebook data was collected, and the same was done for the people in the Facebook friend lists of the survey participants. The survey results were combined with the Facebook data and then used to create personally targeted ads with the aim of getting people to vote for a specific party or candidate. An overview of the issue can be read in the following Guardian story: Cambridge Analytica: how did it turn clicks into votes? (2018)
Other methods
Smart TVs
- LG smart TVs send data about users' files and viewing habits to the company (2013)
- Your Samsung SmartTV Is Spying on You, Basically (2015/2017)
- Here’s how to use the CIA’s ‘weeping angel’ smart TV hack (2017)
- How To Stop Your Smart TV From Spying on You (2017)
- Watching You Watch: The Tracking Ecosystem of Over-the-Top TV Streaming Devices (2019)
- How to Turn Off Smart TV Snooping Features (2020)
Leakage of data through other IoT devices
- Information Exposure From Consumer IoT Devices: A Multidimensional, Network-Informed Measurement Approach (2019)
- Smart speakers using Alexa, Siri
- Amazon Workers Are Listening to What You Tell Alexa (2019)
- Apple apologizes for Siri audio recordings, announces privacy changes going forward (2019)
- Amazon Echo’s privacy issues go way beyond voice recordings (2020)
- Light Commands: Laser-Based Audio Injection Attacks on Voice-Controllable Systems (2019)
Lab tasks
Lab tasks can be found on the lab page.
Relevant links
- Third party tracking and other relevant papers:
- Exposing the Hidden Web: Third-Party HTTP Requests On One Million Websites (2015)
- I never signed up for this! Privacy implications of email tracking (2018)
- Third Party Tracking in the Mobile Ecosystem (2018)
- Good News for People Who Love Bad News: Centralization, Privacy, and Transparency on US News Sites (2019)
- Privacy Policies over Time: Curation and Analysis of a Million-Document Dataset (2020)
- The CNAME of the Game: Large-scale Analysis of DNS-based Tracking Evasion (2021)
- Profiling users:
- Panopticlick
- Am I Unique?
- Cookieless cookies
- Printer Tracking, List of Printers Which Do or Do Not Display Tracking Dots
- Behavioral Profiling: The password you can't change. (2015)
- De-anonymizing Web Browsing Data with Social Networks (2017)
- (Cross-)Browser Fingerprinting via OS and Hardware Level Features (2017)
- Leaky Images: Targeted Privacy Attacks in the Web (2019)
- Web trackers using CNAME Cloaking to bypass browsers’ ad blockers (2019)
- Comparison of data collected by Google Chrome and DuckDuckGo Privacy Browser (2021)
- Identifying users based on datasets:
- Other means of tracking & violating people's privacy:
- Low-cost IMSI catcher for 4G/LTE networks tracks phones’ precise locations (2015)
- Privacy Threats through Ultrasonic Side Channels on Mobile Devices (2017)
- Shattered Trust: When Replacement Smartphone Components Attack, Project website (2017)
- MAC randomization: A massive failure that leaves iPhones, Android mobes open to tracking (2017)
- Fitness app Strava lights up staff at military bases (2018)
- Guide To Using Reverse Image Search For Investigations (2019)
- Gotta Catch 'Em All: Understanding How IMSI-Catchers Exploit Cell Networks (Probably) (EFF, 2019)
- New developments