Privacy and anonymity on the Internet
Cookies
Cookies are small pieces of text data that websites store on the user's computer. Websites use cookies to manage sessions, track users and personalize the site based on the user's preferences. Cookies can be seen as the memory of websites. For example, with the help of a session cookie, an online store can remember the items that were added to the shopping cart. However, the cart contents themselves are usually not stored in the cookie, as a cookie is limited to about 4 kilobytes; instead, the cookie holds an identifier that points to data kept on the website's server.
Cookies can be described as key-value pairs, where the key (the cookie's name) tells the website which kind of information is stored, and the value contains the corresponding data that is saved on the user's computer. In addition, cookies can have several attributes, such as the domain, the path, the expiry date, and whether the cookie may only be transmitted over a secure connection. This is illustrated in the following screenshot.
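The attributes can also be seen by parsing a Set-Cookie header, which is how a server sends a cookie to the browser. The following is a minimal sketch using Python's standard http.cookies module; the header value (cookie name, value, domain and lifetime) is a made-up example, not taken from a real site.

```python
from http.cookies import SimpleCookie

# A made-up Set-Cookie header value, as a server might send it.
header = "session_id=a3f9c2; Domain=shop.example; Path=/; Max-Age=3600; Secure; HttpOnly"

cookie = SimpleCookie()
cookie.load(header)  # parses the name-value pair and its attributes

morsel = cookie["session_id"]
print("value:   ", morsel.value)        # a3f9c2
print("domain:  ", morsel["domain"])    # shop.example
print("path:    ", morsel["path"])      # /
print("max-age: ", morsel["max-age"])   # 3600 (seconds)
print("secure:  ", morsel["secure"])    # True: only sent over HTTPS
print("httponly:", morsel["httponly"])  # True: hidden from page scripts
```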
So how can a website access the data in cookies that are saved on the user's computer? This is done by the web browser. Each time the user visits a website, the browser sends along with the request all cookies previously set by the same website (more precisely, set by the same domain). Only the key-value pairs are sent back to the website; the attributes are not. Usually, the values stored in cookies are unique identifiers, which the website links to its visitors in its own database. Thus, the identifiers in the cookies can be used to look up user-specific information in the website's database. With this approach, the website can find out whether the user has an open session (is logged in to the website), but it also allows the website to track users. For example, suppose a website sets a cookie that expires in two years. In that case, the website will be able to recognize the user each time the user visits the website during those two years (assuming that the user never deletes cookies and always uses the same computer and browser).
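On the server side, the identifier from the cookie is typically used as a key into the website's own database. Below is a minimal sketch in which a plain dictionary stands in for that database; the cookie name session_id, the stored cart data and the two-year lifetime are hypothetical choices for illustration.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical server-side "database": identifier -> user-specific data.
sessions: dict[str, dict] = {}

def new_visitor() -> str:
    """Create a random identifier, store a record for it, return a Set-Cookie value."""
    sid = secrets.token_hex(16)
    sessions[sid] = {"cart": []}
    cookie = SimpleCookie()
    cookie["session_id"] = sid
    cookie["session_id"]["max-age"] = 2 * 365 * 24 * 3600  # about two years
    return cookie["session_id"].OutputString()

def lookup(cookie_header: str) -> Optional[dict]:
    """Parse a Cookie request header (name=value pairs only) and find the record."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    if morsel is None:
        return None
    return sessions.get(morsel.value)

set_cookie = new_visitor()                       # e.g. "session_id=9f2c...; Max-Age=63072000"
sid = set_cookie.split(";")[0].split("=", 1)[1]  # the value the browser stores
sessions[sid]["cart"].append("book")
# On a later visit, the browser sends back only "session_id=<id>".
print(lookup(f"session_id={sid}"))  # {'cart': ['book']}
```

Note that the cookie itself only carries a random identifier; everything the site "remembers" lives in its database.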
The use of cookies is regulated in the EU by the so-called "cookie law" (the ePrivacy Directive). According to a Guardian article, there have been several examples of companies breaching the EU "cookie law", one of them being Facebook. By 2021, it could be argued that the pop-up notices mandated by the EU "cookie law" have failed to achieve their original goal; see the article "We need to fix GDPR’s biggest failure: broken cookie notices".
Classification of cookies
Session cookie
Websites use session cookies to manage web sessions. Session cookies allow the website to determine whether a request comes from a new user or from a user who has already logged in. The lifetime of a session cookie is usually tied to the lifetime of the session, i.e., when the user logs out of the website, the cookie becomes invalid. Session cookies are usually deleted when the web browser is closed. It is essential that session cookies are transmitted only over a secure channel. Demo: What could an attacker do if they could copy a session cookie?
Session cookies are used by websites that allow the user to log in. Not all such websites use encrypted communication. If the session cookie is transmitted over an insecure channel (HTTP), an attacker who can intercept the traffic can copy the cookie and take over the session. To protect against such attacks, the cookie must be encrypted while it travels between the web browser and the web server, which is what HTTPS provides. Online banking sites transmit session cookies only over an encrypted channel, and therefore an attacker cannot read them from intercepted traffic.
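In HTTP terms, a server asks the browser to send a cookie only over HTTPS by adding the Secure attribute; the HttpOnly attribute additionally hides the cookie from JavaScript running on the page, which makes it harder to steal through injected scripts. A minimal sketch using Python's http.cookies; the cookie name and value are hypothetical.

```python
from http.cookies import SimpleCookie

def session_set_cookie(session_id: str) -> str:
    """Build a Set-Cookie value for a session cookie (hypothetical cookie name)."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    morsel = cookie["session_id"]
    morsel["path"] = "/"
    morsel["secure"] = True     # browser sends it only over HTTPS connections
    morsel["httponly"] = True   # not readable by page scripts (document.cookie)
    # No Expires/Max-Age attribute: the browser treats it as a session cookie.
    return morsel.OutputString()

print(session_set_cookie("a3f9c2"))
# session_id=a3f9c2; HttpOnly; Path=/; Secure
```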
Persistent cookie
Persistent cookies are used to save information for a longer period. The lifetime of a persistent cookie is determined either by its expiry date (the Expires attribute) or by its maximum age in seconds (the Max-Age attribute). Persistent cookies are used to personalize websites and also to track users.
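The two ways of expressing a lifetime can be compared in a small sketch: Expires carries an absolute date, while Max-Age carries a lifetime in seconds (when both are present, browsers give Max-Age precedence). The cookie name, its value and the two-year lifetime are illustrative assumptions; the fixed start date only makes the output reproducible.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from http.cookies import SimpleCookie

TWO_YEARS = 2 * 365 * 24 * 3600  # seconds

def persistent_cookie(name: str, value: str, now: datetime) -> str:
    """Set-Cookie value with both an absolute (Expires) and a relative (Max-Age) lifetime."""
    cookie = SimpleCookie()
    cookie[name] = value
    expires_at = now + timedelta(seconds=TWO_YEARS)
    # HTTP dates use the GMT format, e.g. "Sun, 01 Jan 2023 00:00:00 GMT".
    cookie[name]["expires"] = format_datetime(expires_at, usegmt=True)
    cookie[name]["max-age"] = TWO_YEARS
    return cookie[name].OutputString()

now = datetime(2021, 1, 1, tzinfo=timezone.utc)
print(persistent_cookie("visitor_id", "42", now))
# visitor_id=42; expires=Sun, 01 Jan 2023 00:00:00 GMT; Max-Age=63072000
```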
Third-party cookie
Third-party cookies are cookies that are set by domains other than the website being visited. Websites embed advertisements, photos, videos, social media buttons and other content from other domains, and these domains can also set cookies. So why are third parties interested in setting cookies? Because this makes it possible to track users across websites. For example, online advertising companies usually collect information about users' browsing habits in order to serve personalized ads across different websites, as this increases their advertising revenue. This works when the advertising company serves advertisements on many different websites, since the same third-party cookie is then sent from every site that embeds its content. Therefore, third-party cookies make it possible to build a profile of a user's interests and browsing habits. Some users consider this a threat to privacy and configure their web browser to block third-party cookies.
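The mechanism can be simulated in a few lines. In this hypothetical sketch, one ad domain (ads.example) is embedded in several sites. Because the browser sends the same ads.example cookie with every embedded request, and each request reveals which page embedded it (for example through the Referer header), the tracker can link all visits into one profile. All domain names and classes are made up for illustration.

```python
import secrets
from collections import defaultdict
from typing import Optional

class Tracker:
    """A hypothetical third-party ad domain embedded in many websites."""
    def __init__(self, domain: str) -> None:
        self.domain = domain
        self.profiles: dict[str, list[str]] = defaultdict(list)

    def handle_request(self, cookie: Optional[str], referer: str) -> str:
        # Reuse the visitor ID from the cookie, or assign a new one on first sight.
        visitor_id = cookie if cookie is not None else secrets.token_hex(8)
        self.profiles[visitor_id].append(referer)  # log which page embedded us
        return visitor_id  # sent back to the browser as the cookie value

class Browser:
    """A very simplified browser with one shared cookie jar keyed by domain."""
    def __init__(self) -> None:
        self.jar: dict[str, str] = {}

    def visit(self, site: str, embedded: list[Tracker]) -> None:
        for tracker in embedded:
            # The request for the embedded ad carries the tracker's own cookie
            # and reveals the embedding page.
            cookie = self.jar.get(tracker.domain)
            self.jar[tracker.domain] = tracker.handle_request(cookie, referer=site)

ads = Tracker("ads.example")
browser = Browser()
for site in ["news.example", "shop.example", "health.example"]:
    browser.visit(site, embedded=[ads])
print(dict(ads.profiles))
# one visitor ID -> ['news.example', 'shop.example', 'health.example']
```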
Other types of cookies
Super cookie - a theoretical concept
Supercookies are cookies that are set for a top-level domain (e.g., .com, .eu, .ee) or for a shared suffix under which many unrelated people and organizations register their domains (e.g., co.uk, edu.ee). Web browsers block such cookies, as they are a risk to privacy and security. If browsers did not block supercookies, a single site could set a cookie that would be sent to every other site under the same suffix, allowing it to track users across those sites or to interfere with their cookies.
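Browsers decide which suffixes count as "public" with the help of the Public Suffix List, a community-maintained list of suffixes such as com, ee or co.uk under which anyone can register names. The sketch below hard-codes a tiny hypothetical excerpt instead of the real list and ignores its wildcard and exception rules, so it only illustrates the check itself.

```python
# A tiny excerpt standing in for the Public Suffix List (assumption: the real
# list has thousands of entries plus wildcard and exception rules).
PUBLIC_SUFFIXES = {"com", "eu", "ee", "uk", "co.uk", "edu.ee"}

def may_set_cookie(request_host: str, cookie_domain: str) -> bool:
    """Return True if request_host may set a cookie with the given Domain attribute."""
    domain = cookie_domain.lstrip(".").lower()
    host = request_host.lower()
    # Reject supercookies: cookies scoped to a public suffix.
    if domain in PUBLIC_SUFFIXES:
        return False
    # Otherwise the cookie domain must be the host itself or one of its parents.
    return host == domain or host.endswith("." + domain)

print(may_set_cookie("shop.example.co.uk", ".co.uk"))         # False: supercookie
print(may_set_cookie("shop.example.co.uk", "example.co.uk"))  # True
print(may_set_cookie("shop.example.co.uk", "other.co.uk"))    # False: not a parent domain
```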
Zombie cookie
Cookies that are resistant to deletion are called zombie cookies. They may be stored in a way that makes them difficult to remove, and when deleted, they may be able to restore themselves. Zombie cookies are used to identify and track users. So how could a website create cookies that cannot simply be detected and removed? There are several options, e.g., HTML5 storage, Flash local storage, or specially crafted PNG files. A cache-based variant of this idea is demonstrated by the following website: http://lucb1e.com/rp/cookielesscookies/. More information about the different storage methods can be found in the Wikipedia article on zombie cookies.
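The respawning logic behind zombie cookies can be sketched as follows. The storage names are placeholders for mechanisms such as a regular cookie, HTML5 localStorage or a cached image; each is modeled as a plain dictionary, so this is a hypothetical model of the idea rather than code that touches a real browser.

```python
import secrets

def identify(storages: dict[str, dict[str, str]], key: str = "uid") -> str:
    """Return the visitor ID, restoring it into every storage where it was deleted.

    `storages` maps a storage name (e.g. "cookie", "localStorage", "image_cache")
    to a dict that models that storage's contents.
    """
    # Look for a surviving copy in any storage.
    survivors = [store[key] for store in storages.values() if key in store]
    visitor_id = survivors[0] if survivors else secrets.token_hex(8)
    # Respawn: write the ID back to all storages, including cleared ones.
    for store in storages.values():
        store[key] = visitor_id
    return visitor_id

browser = {"cookie": {}, "localStorage": {}, "image_cache": {}}
first = identify(browser)
browser["cookie"].clear()          # the user deletes cookies only
second = identify(browser)
print(first == second, "uid" in browser["cookie"])  # True True
```

Deleting only one kind of storage is not enough; the identifier survives as long as any copy remains.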
How to manage cookies
Browsers differ in how they manage cookies. Some browsers allow fine-grained handling, while others limit the functionality available to users. Among the mainstream browsers, Safari has stood out by handling cookies strictly to protect the user's privacy. Starting from macOS High Sierra and iOS 11, Apple has built Intelligent Tracking Prevention into its Safari web browser. This technology uses heuristic methods to limit and eventually purge third-party cookies from domains that the user has not explicitly visited (as the first party) within a given period of time. Such third-party cookies are partitioned, i.e., a third party gets a separate set of cookies for each first-party domain that the user visits and that includes resources from that third-party domain. There's a good overview of how Apple's Intelligent Tracking Prevention works in the Security Now podcast episode #629 (starting at 1:45:10).
More recently, Firefox has also introduced a strict cookie policy. At the beginning of 2021, Firefox 86 was released with Total Cookie Protection, which confines cookies to the website where they were created, much like Safari does. This feature was improved in Firefox 91 with Enhanced Cookie Clearing, which allows users to easily delete all cookies and supercookies set by a specific website. Initially, these features were available only when the "Strict" setting of Enhanced Tracking Protection was selected; in 2022, Mozilla enabled Total Cookie Protection by default.
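Partitioning changes only the key under which the browser stores a cookie: instead of the cookie's domain alone, the key also includes the top-level site shown in the address bar. The following hypothetical sketch (a toy cookie jar with made-up domain names, not any browser's actual implementation) shows why this defeats cross-site tracking while still letting a third party recognize the user on the same site.

```python
import secrets
from typing import Optional

class CookieJar:
    """Toy cookie jar; partitioned=True keys cookies by (top-level site, domain)."""
    def __init__(self, partitioned: bool) -> None:
        self.partitioned = partitioned
        self.store: dict[tuple[str, str], str] = {}

    def _key(self, top_level_site: str, cookie_domain: str) -> tuple[str, str]:
        # Without partitioning, every top-level site shares one bucket ("*").
        return (top_level_site if self.partitioned else "*", cookie_domain)

    def get(self, top_level_site: str, cookie_domain: str) -> Optional[str]:
        return self.store.get(self._key(top_level_site, cookie_domain))

    def set(self, top_level_site: str, cookie_domain: str, value: str) -> None:
        self.store[self._key(top_level_site, cookie_domain)] = value

def ids_seen_by_tracker(jar: CookieJar, sites: list[str], tracker: str) -> set[str]:
    """Visit each site; the embedded tracker reuses or sets its ID cookie."""
    seen = set()
    for site in sites:
        uid = jar.get(site, tracker)
        if uid is None:
            uid = secrets.token_hex(8)
            jar.set(site, tracker, uid)
        seen.add(uid)
    return seen

sites = ["news.example", "shop.example", "news.example"]
print(len(ids_seen_by_tracker(CookieJar(partitioned=False), sites, "ads.example")))  # 1
print(len(ids_seen_by_tracker(CookieJar(partitioned=True), sites, "ads.example")))   # 2
```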
Try it yourself: Using Google Chrome / Firefox, try to find out which cookies are set by an online news portal or by a social media website.
- Open a news portal or a social network. Right-click on the page and choose "Inspect" from the context menu (Ctrl+Shift+I).
- In Google Chrome: click on the "Application" tab in the panel that opens.
- In Firefox: click on the "Storage" tab in the panel that opens.
- In the left panel, expand the item "Cookies" and then click on a domain whose cookies you would like to see.
- How many cookies are set by the website? How many different domains set cookies? Try to identify the values of the cookies and their expiry dates. Can you spot any third-party cookies set by advertising providers?
How to view cookies
- Google Chrome: Navigate to "Settings -> Privacy and security -> Site Settings -> Cookies and site data" and click on the button "See all cookies and site data".
- Firefox: The values of site-specific cookies can be viewed through the developer view: Ctrl + Shift + i -> Storage -> Cookies.
- Edge: The values of site-specific cookies can be viewed through the developer view: F12 -> Storage -> Cookies
How to restrict the usage of cookies
- Google Chrome: Navigate to "Settings -> Privacy and security -> Site Settings -> Cookies and site data". There you can set the restrictions for cookies.
- Firefox: Page specific settings can be found from "Preferences -> Privacy & Security -> Cookies & site data -> Manage Permissions". Then it is possible to set rules for specific websites regarding cookies. For configuring global cookie settings navigate to: "Preferences -> Privacy & Security -> Enhanced Tracking Protection -> Custom -> Cookies".
- Edge: Settings -> Privacy & security -> Browsing data -> Cookies.
How to delete cookies
- Google Chrome: Navigate to "Settings -> Privacy and security -> Clear browsing data", choose the time range and mark "Cookies and other site data". To remove the cookies for the selected time range, click on "Clear data".
- Firefox: Navigate to "Preferences -> Privacy & Security -> Cookies and site data". Then either select "Clear data" to remove all cookies or select "Manage Data" to remove the cookies of a specific website.
- Edge: Settings -> Privacy & security -> Browsing data -> Clear browsing data.
Browser extensions
Disconnect
Disconnect is a browser extension that shows the user how web pages track them. The extension can display a graph of the domains that set cookies when a website is visited, and it can block selected domains from setting cookies. The "Disconnect" extension is available for Google Chrome and Firefox, and there is also an application for iOS devices. An application for Android devices was removed from the Play Store by Google, as described in an article in TechTimes.
Task: Try to find out which domains track the user on a popular news site or social media site. First, open the Google Chrome browser and install the "Disconnect" extension, which displays the domains that set cookies. To install the extension, open the browser settings, click on the extensions menu on the left and then click on the "get more extensions" link. "Disconnect" is also available for Firefox, and an alternative extension called "DoNotTrackMe" exists as well.
Privacy Badger
The Privacy Badger extension limits tracking cookies. Similarly to Disconnect, it displays the domains that track users, but it also blocks trackers automatically. Privacy Badger tries to learn which third-party domains are used for tracking, and once they are identified, they are blocked. Therefore, when the extension relies only on what it has learned locally, it will not block third-party cookies right after installation; after you visit a few websites, you will notice that it starts to block them. (Newer versions of Privacy Badger ship with a pre-trained list of trackers, so common trackers are blocked right away.) Sometimes a legitimate cookie might be blocked, which can break a webpage. In such a case, the extension can be disabled for that webpage, or the cookies can be unblocked.
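The EFF has described Privacy Badger's learning rule roughly as follows: a third-party domain that is seen tracking the user on at least three different websites gets blocked. The sketch below implements only that counting rule. How the extension decides that a request "looks like tracking" is far more involved and is simply assumed here as a boolean input; the class and domain names are hypothetical.

```python
from collections import defaultdict

BLOCK_THRESHOLD = 3  # distinct first-party sites (the rule as described by the EFF)

class TrackerLearner:
    def __init__(self) -> None:
        self.seen_on: dict[str, set[str]] = defaultdict(set)
        self.blocked: set[str] = set()

    def observe(self, first_party: str, third_party: str, looks_like_tracking: bool) -> None:
        """Record a third-party request made while visiting first_party."""
        if not looks_like_tracking or third_party == first_party:
            return
        self.seen_on[third_party].add(first_party)
        if len(self.seen_on[third_party]) >= BLOCK_THRESHOLD:
            self.blocked.add(third_party)

    def should_block(self, third_party: str) -> bool:
        return third_party in self.blocked

learner = TrackerLearner()
for site in ["news.example", "news.example", "shop.example"]:
    learner.observe(site, "ads.example", looks_like_tracking=True)
print(learner.should_block("ads.example"))  # False: only two distinct sites so far
learner.observe("blog.example", "ads.example", looks_like_tracking=True)
print(learner.should_block("ads.example"))  # True
```

Counting distinct sites rather than requests is what makes the rule target cross-site tracking: a domain used heavily on a single site is not blocked.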
Privacy Badger is developed by the Electronic Frontier Foundation (EFF), an organization that works to protect users' digital rights. Privacy Badger can be installed for free from the Chrome Web Store and is also available for Firefox.
Private browsing
It is common for browsers to collect information about their users' behaviour. Some of this data collection can be turned off by opting out, but some features are built in and cannot easily be disabled. For example, browsers commonly send what is typed into the address bar either to the search provider or to the browser vendor, in order to show suggestions. Among the mainstream browsers, this is done by Chrome, Firefox, Safari and Edge. In addition, modern browsers try to identify malware and phishing websites to protect the user. Although some of the checks are done locally, browsers also commonly query a remote service about some URLs. For example, Microsoft's SmartScreen does local matching for high-traffic websites and for known phishing sites, but all other URLs are sent to the SmartScreen service to be validated. Thus, a lot of data goes through that service. Google's Safe Browsing service does the filtering locally by default, but users can opt in to enhanced protection, in which case URLs are sent to Google's servers to be checked. More information about browser-related privacy issues can be found in the following paper: Web Browser Privacy: What Do Browsers Say When They Phone Home? (2020). A brief overview of the paper is given in the following news story: Brave deemed most private browser in terms of 'phoning home'.
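Local matching of the kind used by Google Safe Browsing's update protocol works roughly like this: the browser keeps a set of short prefixes of SHA-256 hashes of known-bad URLs and contacts the server only when a prefix matches, so most URLs never leave the computer. The sketch below is a simplified model under stated assumptions: the blocklist is made up, and real clients canonicalize URLs and check several host/path combinations before hashing.

```python
import hashlib

PREFIX_LEN = 4  # bytes of the SHA-256 hash kept locally

def url_hash(expression: str) -> bytes:
    return hashlib.sha256(expression.encode("utf-8")).digest()

# Hypothetical local database built from a made-up blocklist.
BAD_EXPRESSIONS = ["phishing.example/login", "malware.example/"]
LOCAL_PREFIXES = {url_hash(e)[:PREFIX_LEN] for e in BAD_EXPRESSIONS}

def needs_server_check(expression: str) -> bool:
    """True only if the hash prefix matches; otherwise nothing is sent anywhere."""
    return url_hash(expression)[:PREFIX_LEN] in LOCAL_PREFIXES

print(needs_server_check("phishing.example/login"))  # True: ask the server for full hashes
print(needs_server_check("news.example/"))           # False (barring a rare prefix collision)
```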
Almost all modern browsers offer a private browsing mode. Private browsing mode is designed to protect the user's privacy by preventing others from finding out which websites were visited. Its effects can be divided into two parts: protection that prevents the information from being read from the local computer, and protection that prevents websites from identifying and tracking the user.
Private browsing modes usually protect against a local attacker who has access to the computer: they do not save browsing history, cookies, temporary files, etc. Therefore, private browsing should be used on public computers. However, it protects against a local attacker only when the user is careful; e.g., it is not wise to download files or create bookmarks during a private browsing session. There appear to be widespread misconceptions about what private browsing mode provides, as illustrated by the following:
- Few Realize “Private Mode” Is Not Really Private
- Your Secrets Are Safe: How Browsers’ Explanations Impact Misconceptions About Private Browsing Mode (2018)
In theory, private browsing modes should also protect against external threats to privacy, i.e., a website should not be able to identify and track the user in private browsing mode. In practice, this does not work. Usually, the only protection that private browsing modes provide against websites is that cookies are deleted. Some private browsing modes also disable browser extensions during the private browsing session to increase both privacy and security. But it is still possible to track users based on their IP address, zombie cookies, browser configuration (fingerprinting) or behavioral biometric data, or even by intercepting the network traffic.
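Tracking based on browser configuration (fingerprinting) needs no storage at all: the site combines attributes that the browser reveals anyway and hashes them into an identifier. In this sketch, the attribute names and values are made-up examples; real fingerprinting scripts gather many more signals (installed fonts, canvas rendering and so on) through JavaScript.

```python
import hashlib
import json

def fingerprint(attributes: dict[str, str]) -> str:
    """Hash a canonical (key-sorted) JSON encoding of the attributes into a short ID."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Made-up values of the kind a page can read from the browser.
normal_window = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "Europe/Tallinn",
    "language": "et-EE",
}
# A private window starts without cookies, but these attributes usually stay
# the same (here the same values are simply listed in a different order).
private_window = dict(reversed(list(normal_window.items())))

print(fingerprint(normal_window) == fingerprint(private_window))  # True
```

Sorting the keys before hashing makes the identifier independent of the order in which the attributes were collected.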
A good overview of the private browsing modes is provided by the MIT course 6.858 Computer Systems Security. The video lecture is made available through the MIT OpenCourseWare program.
Private browsing in Firefox
Firefox can be used in either the normal mode or in private browsing mode. Private browsing mode does not save:
- browsing history
- search history
- download history
- data entered into forms
- cookies
- temporary files
However, if a bookmark is created in private browsing mode, it is not removed after the private browsing session ends. In addition, files that were downloaded during the private browsing session remain on the drive. Thus, private browsing mode is designed to ensure that traces of browsing are not saved on the computer. A private window can be opened with the keyboard shortcut Ctrl+Shift+P (Cmd+Shift+P on macOS).
Incognito browsing in Google Chrome
The private browsing mode in Google Chrome is called Incognito Browsing, and it works similarly to the private browsing mode in Firefox. Traditionally, the main difference between the privacy modes of Google Chrome and Firefox has been the handling of browser extensions. Google Chrome disables browser extensions during an incognito session by default, as it cannot know whether the extensions respect the user's privacy. In addition, browser extensions may contain bugs that lower the security level, and in some rare cases, an extension might even contain malicious code. (Since Firefox 67, newly installed extensions also do not run in Firefox private windows unless the user allows them.) An incognito window can be opened with the keyboard shortcut Ctrl+Shift+N (Cmd+Shift+N on macOS).
InPrivate Browsing in Internet Explorer
The privacy mode in Internet Explorer works similarly to Incognito Browsing in Google Chrome. In Internet Explorer, the privacy mode is called InPrivate Browsing. In InPrivate Browsing mode, toolbars and browser extensions are disabled. An InPrivate window can be opened with the keyboard shortcut Ctrl+Shift+P.
How to anonymize network traffic
Surely you have noticed that some websites and web services know where you are located even when you have not logged in and have not given out such information. So how can a website know that you are currently located in Tartu? This is possible because of the way IP addresses are distributed. Each device connected to the Internet has an IP address (unique either globally or within the local network), and each request that is sent out carries the source IP address. Thus, a website that you are visiting necessarily knows your IP address (or the public address of the router or network you are behind): the web server needs the source IP address in order to send the contents of the webpage back to the browser that made the request.
Now we know that websites have access to our IP addresses, but how can they find out our location from an IP address? IP addresses are allocated to Internet Service Providers (ISPs) in blocks, and based on the block it is often possible to determine in which city the IP address is used. In addition, there are many GeoIP services that have collected mappings between IP addresses and locations. Some of this information is so precise that an IP address can be mapped to the house where the device is located. Google used to collect location information by scanning Wi-Fi networks while taking photos for Google Maps; more information about the incident can be found in the Guardian article.
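A GeoIP lookup is essentially a longest-prefix match of the address against a table of IP blocks. The table below is hypothetical: it uses the IPv4 ranges reserved for documentation (198.51.100.0/24, 203.0.113.0/24, 192.0.2.0/24) with invented locations, whereas real databases contain millions of entries.

```python
import ipaddress
from typing import Optional

# Hypothetical block -> location table.
GEO_TABLE = {
    ipaddress.ip_network("198.51.100.0/24"): "Tallinn, EE",
    ipaddress.ip_network("203.0.113.0/24"): "Tartu, EE",
    ipaddress.ip_network("203.0.113.128/25"): "Tartu, EE (university campus)",
}

def locate(ip: str) -> Optional[str]:
    """Return the location of the most specific (longest) matching block."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in GEO_TABLE if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return GEO_TABLE[best]

print(locate("203.0.113.7"))    # Tartu, EE
print(locate("203.0.113.200"))  # Tartu, EE (university campus)
print(locate("192.0.2.1"))      # None
```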
In addition, websites may ask for permission to detect the visitor's location. In these cases, the browser shows a dialog asking whether the user would like to share their location with the website. If you prefer not to share such information, you can block these requests in the browser settings:
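- Google Chrome:
  - Settings -> Privacy and security -> Site Settings -> Location
  - choose the option that does not allow sites to see your location (in older versions: replace "Ask before accessing" with "Blocked")
- Firefox:
  - type about:config into the address bar
  - accept the warning
  - find geo.enabled and set it to false
  - optionally, find dom.webnotifications.enabled and set it to false (this blocks website notification requests, a separate kind of permission prompt; it does not affect location)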
Sometimes it may be useful or even necessary to anonymize web traffic in order to hide your real location. This can be done with a proxy, which acts as a middleman between the user and the server: the user sends the request to the proxy, and the proxy relays it to the server. The server then sends the response back to the proxy, which forwards it to the user. In this way, the server only sees the proxy's IP address and does not learn who really made the request.
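The effect of a relay on what the server sees can be demonstrated locally. In this sketch everything runs on 127.0.0.1, so the "identity" the server observes is the source port of the incoming connection. When connecting directly, the server sees the client's own socket; through the relay, it sees the relay's outgoing socket. The toy server, which just reports the address it observed, and the one-response relay are simplifications for illustration.

```python
import socket
import threading
from typing import Callable

def recv_all(sock: socket.socket) -> bytes:
    """Read until the peer closes the connection."""
    chunks = []
    while chunk := sock.recv(1024):
        chunks.append(chunk)
    return b"".join(chunks)

def listen(handler: Callable[[socket.socket, tuple], None]) -> int:
    """Accept connections in a background thread; return the chosen port."""
    listener = socket.create_server(("127.0.0.1", 0))
    def loop() -> None:
        while True:
            conn, peer = listener.accept()
            with conn:
                handler(conn, peer)
    threading.Thread(target=loop, daemon=True).start()
    return listener.getsockname()[1]

def server(conn: socket.socket, peer: tuple) -> None:
    # The server answers with the source address it observed.
    conn.sendall(f"{peer[0]}:{peer[1]}".encode())

server_port = listen(server)

def relay(conn: socket.socket, peer: tuple) -> None:
    # The relay opens its own connection to the server and passes the answer back.
    with socket.create_connection(("127.0.0.1", server_port)) as upstream:
        conn.sendall(recv_all(upstream))

relay_port = listen(relay)

def ask(port: int) -> tuple[str, str]:
    """Return (our own address, the address the server reports having seen)."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        host, local_port = s.getsockname()[:2]
        return f"{host}:{local_port}", recv_all(s).decode()

mine, seen = ask(server_port)
print("direct:", mine == seen)     # True: the server sees the client's socket
mine, seen = ask(relay_port)
print("via relay:", mine == seen)  # False: the server sees the relay's socket
```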
The simplest solution is to use an HTTP proxy, which just relays the requests made by the user. HTTP proxies were originally used mainly to cache websites in order to reduce bandwidth usage. Nowadays, an HTTP proxy can also be used to access resources that are available only from a certain location.
However, an HTTP proxy does not protect against an attacker who is listening to the network traffic, as the proxy does not encrypt the packets. All network nodes that relay HTTP packets can read the data inside them. If the data is sensitive or confidential, it should only be transmitted over HTTPS, which provides encryption between the endpoints, i.e., between the web browser and the web server.
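For a plain-HTTP request, an HTTP proxy (or any other node on the path) receives the complete request in readable form. In the sketch below, the request text is a made-up example in the absolute-URL form that browsers use when talking to an HTTP proxy; parsing it shows that both the full URL and the session cookie are exposed. With HTTPS, the browser instead asks the proxy to open a tunnel (CONNECT host:443), so the proxy learns the host name but not the path or the cookies.

```python
# A made-up plain-HTTP request as a browser would send it to an HTTP proxy.
raw_request = (
    "GET http://shop.example/account?order=1234 HTTP/1.1\r\n"
    "Host: shop.example\r\n"
    "Cookie: session_id=a3f9c2; lang=en\r\n"
    "\r\n"
)

def readable_parts(raw: str) -> dict[str, str]:
    """Everything a relaying node can read from an unencrypted HTTP request."""
    head = raw.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, url, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "url": url, "cookie": headers.get("cookie", "")}

print(readable_parts(raw_request))
# {'method': 'GET', 'url': 'http://shop.example/account?order=1234', 'cookie': 'session_id=a3f9c2; lang=en'}
```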
Some countries do not guarantee freedom of speech to their citizens, and some censor the Internet in order to keep citizens under control. In such countries, human rights activists and free speech activists may need to stay anonymous, and ordinary citizens may want access to uncensored information. So, how can one stay anonymous while browsing the Internet? For human rights activists, it is important that even the state cannot identify them, as otherwise they might be arrested. Therefore, a good solution should provide both secrecy and anonymity. The most common services that hide the user's location while providing anonymity and secrecy are Tor and VPNs.
VPN
A VPN (Virtual Private Network) is a virtual network that can, for example, connect offices over the Internet. The traffic that goes through the VPN is encrypted, so a VPN creates an encrypted channel for securely exchanging data over the Internet. Companies use VPNs to allow employees to connect to the company intranet and to virtually connect two or more branches of the company.
Alternatively, a VPN can be used to anonymize web traffic. The traffic is partially anonymized when it is routed through the VPN server: a website only sees that the request comes from the VPN server and cannot tell who really made it. In this sense, a VPN can be seen as an encrypted proxy service. Even the ISP cannot read VPN traffic, as it is encrypted; the ISP, or any other party able to intercept the traffic, only sees packets going to the VPN server and back to the client. However, it is important to realize that the anonymity provided by a VPN service depends heavily on its logging policy and on the legal system under which the VPN provider operates. The VPN provider almost always knows who is using the service, as the user has to pay for it, which leaves a monetary trace. In addition, the VPN provider knows the user's IP address and probably their contact information. The VPN service may also log the user's requests so that they can be used in a legal dispute. Therefore, it is preferable to choose a VPN service that does not log user activity and that operates under a suitable legal system.
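It is also possible to create a similar encrypted tunnel with SSH; for example, the command ssh -D 1080 user@server (with user and server as placeholders) opens a local SOCKS proxy on port 1080 whose traffic is tunneled through the SSH server.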
Tor
Tor is free software that provides a method for anonymous web browsing and communication. Tor works by relaying traffic through a network of over 6000 relays hosted by volunteers. Therefore, Tor is not centralized, and anyone can run a relay in the Tor network. When Tor is used, the network traffic is routed through several randomly selected relays (usually three) in the Tor network. A separate encryption key is used for each relay. Each relay only knows the IP address of the previous hop and of the next hop. The relays in the middle of the path do not have access to the contents of the traffic and know neither its original source nor its final destination. The first relay knows the user's IP address but not where the traffic is going. Only the exit relay can see where the traffic is directed and which data is sent back into the Tor network.
Because Tor routes traffic through volunteer computers, the latency of a connection depends on the latency between the relays, and one slow relay in the randomly chosen chain can slow down the whole connection significantly. However, speed is not the main goal of the Tor network: it was created for users who want to protect their anonymity, not for watching online videos. More information about Tor can be found on the Wikipedia page or on the Tor Project homepage.
Tor uses onion routing to direct packets through the Tor network. With onion routing, the message is encrypted in several layers, and as the packets are relayed through the network, the layers are peeled off one by one. Only the last relay, i.e., the exit node, can read the contents of the message. The following picture describes onion routing:
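Independently of the picture, the layering can be modeled in a few lines of Python. This is a simplified sketch, not Tor's actual protocol: real Tor builds a circuit hop by hop with a key exchange per relay and uses fixed-size cells and AES, whereas here the per-relay keys are simply assumed to be shared in advance, and the cipher is a toy SHA-256-based keystream that must not be used for real encryption. The relay names and destination are made up.

```python
import hashlib
import os

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256(key || nonce || counter). NOT secure for real use."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # XOR with the keystream: the same call encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

def build_onion(message: bytes, destination: str, path: list[tuple[str, bytes]]) -> bytes:
    """Wrap the message once per relay; the innermost layer is for the exit relay."""
    next_hop, packet = destination, message
    for name, key in reversed(path):
        nonce = os.urandom(16)
        packet = nonce + xor_crypt(key, nonce, next_hop.encode() + b"|" + packet)
        next_hop = name
    return packet  # sent to the first relay

def peel(key: bytes, packet: bytes) -> tuple[str, bytes]:
    """A relay removes exactly one layer and learns only the next hop."""
    nonce, body = packet[:16], packet[16:]
    hop, _, inner = xor_crypt(key, nonce, body).partition(b"|")
    return hop.decode(), inner

path = [("guard", os.urandom(32)), ("middle", os.urandom(32)), ("exit", os.urandom(32))]
packet = build_onion(b"GET / HTTP/1.1", "example.com", path)
hops = []
for name, key in path:
    next_hop, packet = peel(key, packet)
    hops.append(next_hop)
    print(f"{name} forwards to {next_hop}")
print(packet)
# guard forwards to middle
# middle forwards to exit
# exit forwards to example.com
# b'GET / HTTP/1.1'
```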
Tor provides anonymity only when the user knows how Tor works and uses it carefully. Probably one of the most common mistakes is logging in to accounts that contain identifying information, as this can reveal the user's identity. In addition, one should be careful when using Tor to download files that contain active content: https://2019.www.torproject.org/docs/faq.html.en#AmITotallyAnonymous.
Several theoretical attacks against Tor have been found, but most of them are not feasible in practice. Still, well-resourced countries may be able to deanonymize some Tor users.
- Low-Cost Traffic Analysis of Tor (2006)
- Compromising Tor Anonymity Exploiting P2P Information Leakage (2010)
- One Bad Apple Spoils the Bunch: Exploiting P2P Applications to Trace and Profile Tor Users (2011)
- LASTor: A Low-Latency AS-Aware Tor Client (2012)
- The Sniper Attack: Anonymously Deanonymizing and Disabling the Tor Network (2014)
An attack against Tor was discovered in the summer of 2014; its aim was to deanonymize Tor users. An overview of this attack was written by BBC News, and a more technical overview was given on the Tor Project blog. In 2015, a new attack was found that allows deanonymizing hidden services; an overview is given in an Ars Technica article. In addition, Tor users have been identified by means of zombie cookies, a method described in the leaked NSA presentation 'Tor Stinks'.
A good overview of Tor is provided by the MIT course 6.858 Computer Systems Security. The video lecture is made available through the MIT OpenCourseWare program. The lecture is given by Nick Mathewson, who is one of the main developers of Tor. The video helps to clarify why a service like Tor is necessary and how Tor functions.
I2P
The Invisible Internet Project (I2P) is an anonymous network (darknet) similar to Tor that allows communicating securely using pseudonyms. I2P provides layered encryption that does not reveal the identity of the endpoints. In the I2P network, all parties (nodes) are identifiable only by their cryptographic pseudonyms. More information about I2P can be found on the I2P homepage.
Further reading
- GDPR
- HTTP cookie
- Identifying and fingerprinting users
- Tracking, deanonymizing, privacy
  - Exposing the Hidden Web: Third-Party HTTP Requests On One Million Websites (2015)
  - De-anonymizing Web Browsing Data with Social Networks (2017)
  - I never signed up for this! Privacy implications of email tracking (2017)
  - (Cross-)Browser Fingerprinting via OS and Hardware Level Features (2017)
  - How a single line of computer code put thousands of innocent Turks in jail (2018)
  - Fitness app Strava lights up staff at military bases (2018)
  - Third Party Tracking in the Mobile Ecosystem (2018)
  - Good News for People Who Love Bad News: Centralization, Privacy, and Transparency on US News Sites (2019)
  - Privacy Policies over Time: Curation and Analysis of a Million-Document Dataset (2020)
  - A Day in the Life of Your Data (2021)
  - Large-scale Analysis of DNS-based Tracking Evasion - broad data leaks included? (2021)
  - The CNAME of the Game: Large-scale Analysis of DNS-based Tracking Evasion (2021)
  - Bugs in our Pockets: The Risks of Client-Side Scanning (2021)
  - Security and Privacy Risks of Number Recycling at Mobile Carriers in the United States (2021)
  - Are iPhones Really Better for Privacy? A Comparative Study of iOS and Android Apps (2022)
  - Blocking JavaScript without Breaking the Web: An Empirical Investigation (2023)
- VPN
  - Which VPN Services Take Your Anonymity Seriously? (2016)
  - Why You Should Start Using a VPN (and How to Choose the Best One for Your Needs)
  - Majority of Android VPNs can’t be trusted to make users more secure (2017)
  - An Analysis of the Privacy and Security Risks of Android VPN Permission-enabled Apps (2016)
- Tor
  - https://www.torproject.org/
  - https://en.wikipedia.org/wiki/Tor_
  - Low-Cost Traffic Analysis of Tor (2006)
  - Compromising Tor Anonymity Exploiting P2P Information Leakage (2010)
  - One Bad Apple Spoils the Bunch: Exploiting P2P Applications to Trace and Profile Tor Users (2011)
  - LASTor: A Low-Latency AS-Aware Tor Client (2012)
  - The Sniper Attack: Anonymously Deanonymizing and Disabling the Tor Network (2014)
  - Tor attack may have unmasked dark net users (2014)
  - Advanced Tor Browser Fingerprinting (2016)
  - Ultrasound Tracking Could Be Used to Deanonymize Tor Users (2017)
  - When A Small Leak Sinks A Great Ship: Deanonymizing Tor Hidden Service Users Through Bitcoin Transactions Analysis (2018)