Authentication

Authentication is the act of confirming the truth of an attribute of a single piece of data (a datum) claimed true by an entity. (from Wikipedia)

There are three basic methods (factors) for authenticating a person:

Information
- Something only the user knows, e.g. a password
An item
- Something only this user owns, e.g. a smart card
Biometric data
- Something the user is, e.g. fingerprints

Single factor authentication

Usually, only one method from the above list is used to authenticate the user. The easiest is to verify the user's knowledge, e.g. ask for a password, PIN code or a pattern.

Examples:

Using a password to log into webmail or a study information system
Opening Android screen lock with pattern
Logging into an online banking site by using a password and a reusable code card. As this is considered insecure, low (daily) transaction limits are enforced when using single-factor authentication.

Authenticating with passwords

Secure passwords are long and randomly generated. Moreover, a unique password should be used for every service; otherwise, one leaked password may mean losing access to many or all of the used services. However, it is hard for people to memorize long and random passwords.

Source: XKCD

To illustrate the problem, take a look at a website that predicts how long it takes for one PC to crack a given password. Do not enter your real password on that site!
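As a rough illustration of what such sites compute, the following Python sketch estimates the entropy and the worst-case brute-force time of a randomly generated password. The attacker speed of 10^10 guesses per second is a hypothetical figure, not a measurement, and the estimate only holds for random passwords: human-chosen passwords fall much faster to dictionary attacks.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600

def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a password whose characters are chosen uniformly at random."""
    return length * math.log2(alphabet_size)

def worst_case_seconds(length: int, alphabet_size: int, guesses_per_second: float) -> float:
    """Time needed to try every password of the given length and alphabet."""
    return alphabet_size ** length / guesses_per_second

# Hypothetical attacker speed (an assumption, not a benchmark): 10 billion guesses/s.
RATE = 1e10

for length, alphabet, label in [(8, 26, "8 lowercase letters"),
                                (12, 62, "12 letters and digits")]:
    seconds = worst_case_seconds(length, alphabet, RATE)
    print(f"{label}: {entropy_bits(length, alphabet):.1f} bits, "
          f"{seconds:.3g} s = {seconds / SECONDS_PER_YEAR:.3g} years")
```

At that assumed rate, all 26^8 (about 2.1 × 10^11) passwords of 8 lowercase letters are exhausted in about 21 seconds, whereas 12 random letters and digits take on the order of 10,000 years.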

How do websites store and check passwords?

Next, we give a brief overview of how a web service (like many other services) uses passwords for user authentication. The service clearly needs to know something about a user's password to verify it. However, it should not store users' passwords in plain text, as a single break-in would then leak all users' passwords. Instead, the service should store a hash of each password.

A hash function (in Estonian: räsifunktsioon) takes an input of arbitrary size and outputs a bitstring of fixed length. A cryptographic hash function is deterministic (the same input always gives the same output), but it is computationally infeasible to recover the input from the output.

For example, this is the SHA-256 hash value of the text "test":
sha256(“test”) = 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08

Service providers store hashes of passwords instead of the passwords themselves. To authenticate, the user sends their password to the service as usual. The service hashes the received password and compares the result with the hash stored in its database. If they match, the user is successfully authenticated.
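A minimal Python sketch of this store-and-compare flow, using an unsalted SHA-256 hash exactly as described so far (salting is discussed below). The in-memory `database` dictionary and the `register`/`login` helpers are hypothetical; a real service would use a persistent database and, as explained later, a salted and deliberately slow hash.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Unsalted SHA-256 hex digest of the password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical user database: username -> password hash (never the password).
database = {}

def register(username: str, password: str) -> None:
    database[username] = hash_password(password)

def login(username: str, password: str) -> bool:
    stored = database.get(username)
    if stored is None:
        return False
    # compare_digest runs in constant time, avoiding a timing side channel.
    return hmac.compare_digest(stored, hash_password(password))

register("alice", "test")
print(database["alice"])          # the SHA-256 value of "test" shown above
print(login("alice", "test"))     # True
print(login("alice", "wrong"))    # False
```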

However, this creates a new problem: the hash function is deterministic, so instead of trying to invert a hash, an attacker can precompute the hashes of common words. For example, an attacker could hash every word in an English dictionary, together with the most common variations, and store the results as a (hash -> word) lookup table. If a database of password hashes then leaks, the attacker simply looks up each leaked hash in this table. Hence, storing hashes of passwords instead of the passwords themselves does not help if the password is a word or a simple combination of words and numbers.
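The following Python sketch illustrates such a precomputed lookup table. The five-word list, the variations and the leaked hashes are made up for illustration; real attackers use dictionaries with millions of entries and many more variations.

```python
import hashlib

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Tiny stand-in for a dictionary (hypothetical).
WORDS = ["password", "test", "dragon", "monkey", "letmein"]

def build_lookup_table(words):
    """Precompute hash -> candidate for each word and a few simple variations."""
    table = {}
    for word in words:
        for candidate in (word, word.capitalize(), word + "1", word + "123"):
            table[sha256_hex(candidate)] = candidate
    return table

# Built once, before any leak happens.
table = build_lookup_table(WORDS)

# Hypothetical leaked database of unsalted hashes.
leaked = {"alice": sha256_hex("dragon123"),
          "bob": sha256_hex("Tr0ub4dor&3-not-in-table")}

for user, leaked_hash in leaked.items():
    print(user, table.get(leaked_hash, "<not found>"))
```

Alice's password is recovered by a single table lookup, with no hashing work done after the leak; Bob's password is not in the table, so this attack fails for it.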

To solve this problem, service providers should randomize the password hashes. This is done by adding some random data, called a salt, to the password before hashing it. With high probability, all hashes are then unique, even if some users have the same password. To precompute a lookup table, an attacker would now also have to account for this randomness: if, for example, a 64-bit salt is used, there are 2^64 (2 to the power of 64) different salts, and the attacker would have to compute a separate lookup table for each possible salt value. This is practically impossible in terms of both computing power and storage.

An example of adding a salt to the hashing of "test":
sha2(“test+j2Bl”)=4cb0ccd18a4f985823c5640e97103b6c7ee23d175cffc01691baeb006773c365

Where to store the salt?

Even though this might seem insecure, the salt itself can be stored in plain text together with the salted hash of the password. If this database leaks, the attacker learns the salt values and can eventually crack some passwords that are common dictionary words. However, the attacker can only start guessing after obtaining the leaked database, as nothing useful can be precomputed in advance.
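A minimal Python sketch of salted hashing with the salt stored in plain text next to the hash. The `salt$hash` record format, the 64-bit salt and prepending the salt to the password are assumptions for illustration (the example above appends a string instead); plain SHA-256 is used only to keep the idea visible, since fast hashes remain cheap to brute-force.

```python
import hashlib
import hmac
import secrets

SALT_BYTES = 8  # 64-bit salt, matching the example in the text

def make_record(password: str) -> str:
    """Return 'salt$hash' in hex; the salt is stored in plain text."""
    salt = secrets.token_bytes(SALT_BYTES)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt.hex() + "$" + digest

def check(password: str, record: str) -> bool:
    """Recompute the salted hash with the stored salt and compare."""
    salt_hex, stored_digest = record.split("$")
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return hmac.compare_digest(stored_digest, digest)

r1 = make_record("test")
r2 = make_record("test")
print(r1)
print(r2)
print(r1 != r2)   # same password, different records (with overwhelming probability)
print(check("test", r1), check("test", r2), check("tset", r1))
```

Two users with the same password get different records, so one precomputed table no longer matches both. An attacker who obtains a record still sees its salt, but can only start guessing after the leak.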

Example: Let's see how easy it is to guess a simple password by seeing just the hash. For example, if the password is only 4 characters long, it is enough to hash all 4-character passwords. Note that, in reality, the attacker would not know the length of the password based on the hash value.
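A minimal Python sketch of this exhaustive search, assuming the attacker knows the salt from the leaked record and restricts the search to 4 lowercase letters (26^4 = 456,976 candidates, which takes only moments even in pure Python with plain SHA-256). The salt and the password `kiwi` are hypothetical.

```python
import hashlib
import itertools
import string

def crack(target_hex: str, salt: bytes, length: int, alphabet: str):
    """Try every string of the given length over the alphabet; return the match or None."""
    for chars in itertools.product(alphabet, repeat=length):
        guess = "".join(chars)
        if hashlib.sha256(salt + guess.encode("utf-8")).hexdigest() == target_hex:
            return guess
    return None

salt = bytes.fromhex("a1b2c3d4e5f60718")          # hypothetical salt from a leaked record
target = hashlib.sha256(salt + b"kiwi").hexdigest()
print(crack(target, salt, 4, string.ascii_lowercase))   # kiwi
```

Knowing the salt does not slow this search down; the salt only prevents the work from being done before the leak and reused across users.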

Service providers make cracking leaked passwords even harder by using a special password-based key derivation function instead of a plain hash function. The main feature of such functions is that they are deliberately slow: they hash the password not once but, for example, 10,000 times, making every guess correspondingly more expensive for the attacker. Examples of such functions are bcrypt, PBKDF2 and scrypt.
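Python's standard library includes PBKDF2 as `hashlib.pbkdf2_hmac`. The sketch below uses it with the 10,000 iterations mentioned above; that number is illustrative, and current guidance recommends considerably higher iteration counts for PBKDF2-HMAC-SHA256. Storing the iteration count in the record (a hypothetical tuple format here) lets a service raise it later without breaking existing accounts.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 10_000  # illustrative value taken from the text

def derive(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """PBKDF2-HMAC-SHA256: the hash is applied `iterations` times internally."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

def make_record(password: str) -> tuple:
    """Hypothetical record format: (iterations, salt, derived key)."""
    salt = secrets.token_bytes(16)
    return (ITERATIONS, salt, derive(password, salt))

def check(password: str, record: tuple) -> bool:
    iterations, salt, key = record
    return hmac.compare_digest(key, derive(password, salt, iterations))

record = make_record("correct horse battery staple")
print(check("correct horse battery staple", record))  # True
print(check("Correct horse battery staple", record))  # False
```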

Examples of how user passwords must not be stored

Many service providers store users' passwords insecurely, e.g. in plain text or hashed without a salt. Some go even further and encrypt the passwords. Encrypting passwords is a bad idea, as they can be decrypted if the decryption key leaks.

In August 2014, a user database of an Estonian webshop http://seemnemaailm.ee/ was leaked. The database contained the plain text passwords of more than 10,000 users. A local newspaper also covered the story: Postimees: "Veebikeskkond jättis tuhanded kliendid andmelekkest teavitamata" ("The web platform did not inform thousands of customers about the data leak", in Estonian). Interestingly, the newspaper story mentions encrypting the passwords as a possible solution, but this is the wrong way to protect passwords.
In October 2013, the user database of Adobe leaked, containing information about roughly 150 million users. Adobe had chosen to encrypt the users' passwords instead of hashing them. This by itself is a bad idea, as encrypted values can be decrypted, while hashing is a one-way operation. Moreover, they used an insecure encryption mode (ECB), which meant that identical passwords produced identical ciphertexts. On top of that, the database contained unencrypted password hints, turning it into a giant crossword puzzle:

Source: XKCD

Issues with passwords

As a user of a service, you have no control over how the service provider stores your password. The only thing a user can do is to use a unique and secure password for each service.

How to memorize and use a secure password? Secure (long and random) passwords are by definition hard to remember. We will talk about password management in the lab session.

Some ideas:

Memorize the passwords
Write the password down on a paper
Store passwords in a text document
Use a special password management software

Usernames and passwords are easy to copy and distribute. Secure passwords are hard to memorize, and short passwords are easy to crack. Moreover, passwords can easily be stolen with a keylogger, i.e. malware that records keystrokes and sends them to an attacker.

Example attack: A web browser is infected with malware that intercepts keystrokes and sends them to an attacker.

Password managers

Password managers provide users with the means to handle passwords securely. Such software usually allows users to generate unique passwords and store information about their accounts. An average internet user probably has at least tens of different accounts, which raises the problem of managing the usernames and passwords for all of them. As humans are not good at remembering passwords, people who do not manage their passwords are very likely to reuse them. Password managers are therefore essential for avoiding password reuse. Many applications provide password management functionality: dedicated password managers offer many features, but browser-based password managers may be sufficient for some tasks.

Browser-based password managers

All major web browsers offer to save the username and password once they are entered into the login fields. Once they are saved, the browser fills in the fields automatically the next time, so the user does not have to type them again. So, is this feature good for security? It depends on how it is used. One might argue that if the user can save the passwords, the passwords can be longer and more complex. This may be true, but what happens if the user has to switch computers? Do browsers allow secure export and import of the password database? What happens if a third party gets access to the exported password database, i.e., is it properly encrypted? What happens if an attacker gets access to the computer where the passwords are saved? Can the attacker use the saved passwords to access the corresponding accounts, or even read the saved usernames and passwords directly?

You can find out more about Firefox's password manager from its support page: Password Manager - Remember, delete and edit logins and passwords in Firefox. Similar information can be found about Google Chrome's password manager.

Standalone password managers

Using a password manager would be similar to writing the passwords to a text file and then encrypting the file using AES with a strong password. However, special purpose password manager software has some additional benefits:

it is easy to use
browser extensions can automatically fill the login forms
it is possible to generate strong random passwords
it is not limited to one environment (this is the problem of the browser-based solution)
it might be possible to sync the database between different devices
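Generating strong random passwords needs a cryptographically secure random number generator. Below is a minimal Python sketch using the standard `secrets` module (not `random`, whose output is predictable). The 20-character length and the 94-symbol alphabet are assumptions; password managers such as KeePassXC implement their own, more configurable generators.

```python
import math
import secrets
import string

# 52 letters + 10 digits + 32 punctuation characters = 94 symbols
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_password(length: int = 20, alphabet: str = ALPHABET) -> str:
    """Pick each character independently and uniformly using a CSPRNG."""
    return "".join(secrets.choice(alphabet) for _ in range(length))

pw = generate_password()
bits = len(pw) * math.log2(len(ALPHABET))
print(pw, f"({bits:.0f} bits of entropy)")
```

With 94 possible symbols, each character contributes log2(94) ≈ 6.55 bits, so 20 characters give about 131 bits of entropy.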

In the following, we focus on KeePass, as it is one of the most commonly used offline password managers. There are also well-known cloud-based password managers like 1Password and LastPass, but we chose to focus on KeePass mainly because it is open-source and not cloud-based.

There are multiple applications that can use the KeePass file format. We recommend KeePassXC, as it is actively maintained, cross-platform and open source. KeePassXC is also available as a portable version that does not need installation.

EFF has created good instructions on how to use KeePassXC: How to: Use KeePassXC. KeePassXC also has its own documentation, which can be accessed from KeePassXC: Getting Started Guide. Instructions in Estonian can be found at: https://courses.cs.ut.ee/2023/infsec/spring/Main/Keepassxc.

The third homework has a task, which involves creating a KeePassXC database. You can find detailed information and the requirements on the homework page.

Biometric authentication

Biometric authentication methods include authentication using fingerprints, retina scans or voice recognition.

Fingerprint scanner on a laptop.

Using biometric data for authentication by itself is not considered to be multi-factor authentication, and hence it should be used together with other authentication means, e.g. passwords or physical devices.

Two-factor authentication

To make authentication more secure, something more than just the user's knowledge has to be used: either something the user has or something the user is. In other words, an additional authentication factor is required. The additional factor could be a device owned by the user or something directly tied to the user, e.g., biometric properties.

In two-factor authentication, two items from the following list are verified:

user's knowledge
something that the user possesses
biometric data

In this lecture, we give a brief summary of common ways in which two-factor authentication works. You can find out about other methods in the guide by EFF: A Guide to Common Types of Two-Factor Authentication on the Web (2017)

Smart card-based authentication

Using a smart card (e.g. Estonian ID-card) is a two-factor authentication as it combines something physical that the user has (the card) with something that the user knows (PIN code). It is important that the private keys stored on the card cannot be copied, so they are strictly tied to the physical object.

More information about smart card-based authentication can be found in the lecture notes on the ID-card.

One-time passwords

Ordinary passwords may leak, and they can also be easily copied and distributed. Therefore, some systems use one-time passwords, i.e., passwords that are valid for a single use only. They work only if the client and the server are synchronized (we do not cover those algorithms in this course). More information about one-time passwords can be found at: https://en.wikipedia.org/wiki/One-time_password.

Now, one might ask how such passwords are distributed. How could the server secretly share the one-time passwords with the client? There are actually many options:

Delivered on paper - The one-time passwords can be sent to the client by post or by a delivery service. E.g., Nordea bank used to send one-time passwords to its clients by post. A new set of passwords could be used only after it had been activated, which, together with a special envelope, protected the passwords from being read and used by a third party: if the client noticed that the envelope was damaged, they should not activate the new set of one-time passwords.
Sent via SMS - This is a convenient option, as almost everyone has a mobile phone. The problem with this approach is the weak encryption used to protect SMS messages. In addition, when roaming, the client has to trust the mobile service providers involved. This is also why NIST proposed deprecating SMS for two-factor authentication in a draft of its Digital Authentication Guideline:

Note: Out-of-band authentication using the PSTN (SMS or voice) is deprecated and is being considered for removal in future editions of this guideline.

Delivered on a device - A pre-synchronized algorithm can be built into a device. E.g., PIN-calculators are synchronized so that they generate valid one-time passwords. For some services, there are also smartphone apps that generate one-time passwords. The following subsections focus on PIN-calculators and on authentication with mobile devices.

Authentication with the help of a mobile device or some other device

A mobile phone fits well into a two-factor authentication system, as it is not directly connected to the computer and has a separate communication channel (GSM, 3G, 4G). To attack a two-factor authentication system that uses a mobile device as the second factor, an attacker would therefore also have to access or infect the mobile device. Infecting both the computer and the mobile device is much more difficult, and thus much more expensive, for the attacker. It is possible, as mobile devices, especially Android smartphones, have many vulnerabilities; the harder part is delivering the malware. How can the attacker infect the second device? In some cases, the device might be connected to the computer, but this may not happen frequently and may not be enough to infect it. To generalize: it is possible to target and infect the mobile device that a specific individual uses for two-factor authentication, but targeting and infecting a large part of the population is probably too expensive. Therefore, using two-factor authentication should significantly increase your security (unless someone is specifically targeting you).

Examples of the second authentication factor:

PIN-calculators used in banks
Google's two-factor authentication
Facebook's two-factor authentication

PIN-calculator

A PIN-calculator is a device that generates a pseudorandom number that can be used for logging in to the online bank or for confirming online bank transactions. It is important to understand how such devices work, i.e., why they are claimed to be secure. These devices are synchronized with the bank; more specifically, the initial seed used by the pseudorandom number generator inside the PIN-calculator is shared with the bank. Thus, the security of such devices depends on the specific pseudorandom number generator. In addition, the initial seed must be random and must not leak. The device itself has no internet connection, and it is not connected to any other devices.

So, how can the codes be generated? One way is a time-based PIN-calculator, which combines the current time with the shared secret value to generate the codes. Another approach uses a hash function, such that the shared value is hashed in some predetermined way. For example, the bank and the PIN-calculator could both keep a counter in addition to the shared secret value. Each time a code is generated, the device increments its counter, and each time a code is correctly used, the bank increments its own counter. Since the user may generate some codes without ever entering them, the bank should accept a small window of upcoming codes. This lets the user keep using the PIN-calculator in such situations and lets the bank resynchronize its counter when a later code in the window is valid. The window should be kept small so that random guesses do not succeed. A hash function can also be used differently: the shared secret value can be hashed repeatedly to create a hash chain. The PIN-calculator hashes the secret value, then hashes that hash, and so on. The first authentication is then done with the last value in the hash chain, the next one with the previous value, etc.
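The source does not say which algorithm the banks' PIN-calculators use. To illustrate the counter-based and time-based approaches described above, the following Python sketch implements HOTP (RFC 4226) and TOTP (RFC 6238), two standardized one-time password algorithms that also underlie many authenticator apps. The `BankVerifier` class with a look-ahead window of 5 is a hypothetical model of the bank side, not any bank's actual implementation.

```python
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the 8-byte counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                      # low 4 bits of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP whose counter is the number of elapsed time steps."""
    return hotp(secret, unix_time // step, digits)

class BankVerifier:
    """Hypothetical bank side for a counter-based device, with a look-ahead window."""

    def __init__(self, secret: bytes, window: int = 5):
        self.secret = secret
        self.counter = 0        # counter value of the next code the bank expects
        self.window = window    # how many codes ahead the bank will accept

    def verify(self, code: str) -> bool:
        for c in range(self.counter, self.counter + self.window):
            if hmac.compare_digest(hotp(self.secret, c), code):
                self.counter = c + 1   # resynchronize; this and all earlier codes are spent
                return True
        return False

secret = b"12345678901234567890"   # test secret from RFC 4226, never a real key
print(hotp(secret, 0))             # 755224 (RFC 4226 test vector)
print(totp(secret, 59))            # 287082: at t = 59 s the time-step counter is 1

bank = BankVerifier(secret)
codes = [hotp(secret, c) for c in range(3)]   # the user pressed the button three times
print(bank.verify(codes[2]))       # True: inside the window, the bank jumps ahead
print(bank.verify(codes[2]))       # False: the code has already been used
print(bank.verify(codes[0]))       # False: skipped codes are no longer valid
```

Keeping the window small matters: with a window of 5 and 6-digit codes, a single random guess succeeds with probability 5/1,000,000.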

The pseudorandom code generated by the PIN-calculator can be used only once, i.e., it cannot be used to log in to the online bank twice. Therefore, malware cannot really benefit from copying a code that the user has already used. Malware can only use the code if it is faster than the user, i.e., if it logs in with the code before the user does.
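The hash-chain variant also shows why a used code is worthless to malware. Here is a minimal Python sketch in the style of Lamport's one-time password scheme (the basis of S/KEY), assuming SHA-256 as the hash. As a design choice, the server is provisioned with one extra hash of the device's last code, so it stores only the most recently accepted value and never needs the secret itself. The seed and the chain length are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def make_chain(seed: bytes, n: int) -> list:
    """Return [seed, h(seed), h(h(seed)), ...] with n + 1 elements."""
    chain = [seed]
    for _ in range(n):
        chain.append(h(chain[-1]))
    return chain

class ChainVerifier:
    """Server side: remembers only the most recently accepted value."""

    def __init__(self, anchor: bytes):
        self.last = anchor

    def verify(self, code: bytes) -> bool:
        # A valid code is the pre-image of the last accepted value.
        if h(code) == self.last:
            self.last = code
            return True
        return False

N = 100
chain = make_chain(b"hypothetical device seed", N)
server = ChainVerifier(chain[N])    # provisioned once; chain[N] itself is never used as a code
# The device uses its codes from the end of the chain backwards: chain[N-1], chain[N-2], ...
print(server.verify(chain[N - 1]))  # True
print(server.verify(chain[N - 1]))  # False: replaying a used code fails
print(server.verify(chain[N - 2]))  # True
```

An eavesdropper who captures `chain[N - 1]` can compute `chain[N]`, which is already spent, but cannot compute the next code `chain[N - 2]` without inverting the hash function.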

What happens if a stranger finds the PIN-calculator or if it gets stolen? If the device is protected with a strong PIN code, others cannot use it, as it has to be unlocked first, and it gets blocked after three wrong PIN entries. Therefore, using a PIN-calculator is considered a fairly secure two-factor authentication method, and the major banks in Estonia have not set daily transaction limits when the client authenticates with a PIN-calculator.
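The blocking rule can be modeled in a few lines. Below is a hypothetical Python sketch; it assumes that a correct PIN resets the failure counter, which the source does not specify for real devices.

```python
import hmac

class PinProtectedDevice:
    """Hypothetical device that is blocked after three wrong PINs in a row."""

    MAX_FAILURES = 3

    def __init__(self, pin: str):
        self._pin = pin
        self._failures = 0
        self.blocked = False

    def unlock(self, attempt: str) -> bool:
        if self.blocked:
            return False                      # a blocked device rejects even the right PIN
        if hmac.compare_digest(self._pin, attempt):
            self._failures = 0                # assumption: success resets the counter
            return True
        self._failures += 1
        if self._failures >= self.MAX_FAILURES:
            self.blocked = True
        return False

device = PinProtectedDevice("4821")
for guess in ("0000", "1234", "1111"):
    device.unlock(guess)
print(device.blocked)          # True
print(device.unlock("4821"))   # False: correct PIN, but too late
```

With a 4-digit PIN and three attempts, a finder who guesses at random succeeds with probability 3/10,000 = 0.03%.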

Authenticating with a security key

A physical security key allows users to generate private keys and keep them inside a special hardware device. Such devices are small and usually connect over USB or NFC. The main idea is similar to that of ID-cards: the private key can only be used with physical access to the hardware device. Thus, malware cannot copy the credentials or the private key.

You can get a good overview of using such devices from the following guide: Getting started with security keys. Multiple companies sell physical security keys; one of the best known is Yubico, a Swedish company. They claim to produce their devices in Sweden and the USA; if the claim is true, supply chain risks are significantly lowered. If you wish to start using such devices, it is recommended to buy two of them and configure the second one as a backup in case something happens to the main authentication device.

Google two-factor authentication

Google's two-factor authentication asks the user to enter a verification code when logging in from a new web browser (or a new computer). The verification code is either sent to the user's mobile phone via SMS or generated by a special smartphone app, which verifies that the user has access to their phone (or its SIM card). It is also possible to let Google call you and read out the verification code.

When using such a two-factor authentication mechanism, an attacker who has stolen your password (e.g. by using a keylogger) still cannot access your account as he does not have access to your phone. More information about Google's two-factor authentication: http://www.google.com/landing/2step/.

After enabling Google's two-factor authentication, it is important to set up backup authentication methods in case something happens to your phone or its SIM card. For example, you can write down the single-use backup codes that Google generates, which let you access your account if your phone is lost or stolen. These codes must be kept secret.
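The source does not describe how Google generates or stores backup codes. The following is a hypothetical Python sketch of the general idea of single-use backup codes: the service stores only hashes of the unused codes and deletes each one when it is redeemed. The number of codes (10) and their length (8 digits) are assumptions.

```python
import hashlib
import secrets

def generate_backup_codes(n: int = 10, digits: int = 8) -> list:
    """Random numeric codes from a CSPRNG; shown to the user once, e.g. for printing."""
    return [str(secrets.randbelow(10 ** digits)).zfill(digits) for _ in range(n)]

class BackupCodeStore:
    """Keeps only hashes of unused codes; a redeemed code is removed and cannot be reused."""

    def __init__(self, codes):
        # Simplification: a real service would use a slow, salted password hash here,
        # because 8-digit codes are few enough to brute-force from plain SHA-256 hashes.
        self._unused = {self._digest(c) for c in codes}

    @staticmethod
    def _digest(code: str) -> str:
        return hashlib.sha256(code.encode("utf-8")).hexdigest()

    def redeem(self, code: str) -> bool:
        digest = self._digest(code)
        if digest in self._unused:
            self._unused.remove(digest)
            return True
        return False

codes = generate_backup_codes()
store = BackupCodeStore(codes)
print(store.redeem(codes[0]))   # True: first use of the code
print(store.redeem(codes[0]))   # False: each code works only once
```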

One will have to generate application-specific passwords for applications that do not support two-factor authentication but use the Google account. Such applications may include, for example, your desktop mail client. Application-specific passwords can be managed from: "Account -> Security -> 2-step verification -> Manage your application-specific passwords".

Facebook two-factor authentication

Facebook's two-factor authentication works similarly to Google's by requiring the user to enter a verification code sent to the user's phone via SMS. This extra verification step is required only when logging in from a new web browser or computer.

Other websites and services

In addition to Google and Facebook, many other service providers also support two-factor authentication, e.g. Microsoft, Apple, Twitter and WordPress. A list of services and websites that support two-factor authentication is available at https://twofactorauth.org/.

Passkeys

Passkeys aim to replace passwords for authentication and use public-private key pairs instead, while making the key management transparent to the users. With passkeys, the user's device generates a new key pair for each website that supports this technology. The public key is shared with the website, and the private key is securely kept on the user's device and/or backed up and synchronized using the platform provider's cloud solution. Access to the local private key is verified by on-device authentication, usually with biometrics or a PIN code.

The underlying technology is not new (it uses FIDO and WebAuthn), but for passkeys the big platform providers (Apple, Google, Microsoft) have agreed on a common user experience to make it seamless for end users. Support for passkeys started rolling out on major platforms in 2022. See this YouTube video for a demo of using passkeys on different platforms.

Authenticating via the third party

We already mentioned that in addition to using secure, randomly generated passwords, it is important to use a different password for each service. At the same time, many passwords are difficult to remember. Using password managers is one solution, but there are also other options. For example, some web services allow users to authenticate themselves using an account at some other service provider. In this case, users do not have to create another user name and password for this web page.

OpenID

OpenID is one of the first solutions that allow users to authenticate to a web service via a third party. In addition to the user and the web service, there is a third party, the OpenID service provider, that is trusted by the web service requiring authentication. The authentication process is as follows (source):

User goes to the web service that he wants to log into and enters his OpenID handle (usually an e-mail address or a domain name)
The web page redirects the user to the log-in page of the chosen OpenID service provider, where the user has to authenticate himself to this OpenID service provider. If the user is already authenticated at the OpenID service provider, this step is done automatically.
OpenID service provider asks the user if he wants to authenticate to the given web page and if he agrees that the OpenID service provider shares some information about the user with the web page. The information that has to be shared depends on the web page, but usually, it is an e-mail address, full name and/or profile picture, etc.
If the user accepts, then the OpenID service provider contacts the web page directly, shares the requested information and confirms that the user is successfully authenticated at the OpenID service provider.

To use OpenID, the following conditions have to be met:

the web page where the user wants to log into must support at least one OpenID provider;
the user must have an account at one of the supported OpenID service providers.

OpenID service providers have included, among others, Google, Yahoo!, Facebook, AOL and PayPal. Additionally, a personal homepage domain can be used as an OpenID handle (one that redirects to some other supported OpenID service provider). All connections between the parties should be encrypted. Lately, the number of OpenID providers has decreased as service providers switch to the more flexible OAuth/OpenID Connect framework.

In Estonia, there is an OpenID service provider, OpenID.ee, that allows users to authenticate with an ID-card or Mobile-ID. Web pages that trust this OpenID service provider get a stricter (cryptographically secure) guarantee that the provided information (full name and personal identification code) is real.

OAuth 2.0 and OpenID Connect

OAuth is a similar framework that allows a user to authorize a service to act on their behalf at a third party, the OAuth service provider. As such, OAuth is actually not meant for authentication (identification) but for authorization (granting permissions). To also enable authentication, a separate layer, OpenID Connect, is built on top of OAuth 2.0.

As OAuth is meant for authorization, its usage differs from that of OpenID, although it looks much the same to the end user. In the case of OpenID, the user asks the OpenID service provider to confirm their identity to the web page. In OAuth, the user authorizes the web page to request information from the OAuth service provider. The difference is that, in addition to requests about the user's identity, the user may also authorize the web page to access other resources at the OAuth service provider on their behalf. These resources can be contacts, calendars, photos, or whatever else the OAuth service provider stores about the user. The exact list of shared resources is shown during the authorization process, and the user can accept or reject it.
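To make the flow more concrete, here is a hedged Python sketch of the first two steps of the OAuth 2.0 authorization code flow, which OpenID Connect builds on. The parameter names (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`) and the `access_denied` error come from the OAuth 2.0 specification, and `openid` and `email` are standard OpenID Connect scopes; the endpoint URLs, the client ID and the `contacts.read` scope are hypothetical. The requested scopes are what the user sees and accepts or rejects on the provider's consent page.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical values; a real client receives these when registering with the provider.
AUTHORIZATION_ENDPOINT = "https://provider.example/authorize"
CLIENT_ID = "example-client-id"
REDIRECT_URI = "https://webpage.example/callback"

def build_authorization_url(scopes):
    """Step 1: send the user's browser to the provider with the requested scopes."""
    state = secrets.token_urlsafe(16)   # ties the response to this session (CSRF protection)
    params = {
        "response_type": "code",        # authorization code flow
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": " ".join(scopes),      # "openid" requests OpenID Connect authentication
        "state": state,
    }
    return AUTHORIZATION_ENDPOINT + "?" + urlencode(params), state

def handle_callback(callback_url, expected_state):
    """Step 2: the provider redirects back; check state, then read the one-time code."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: possible forged request")
    if "error" in query:                # e.g. access_denied if the user rejected the request
        raise PermissionError(query["error"][0])
    return query["code"][0]             # next: exchange it at the token endpoint (not shown)

url, state = build_authorization_url(["openid", "email", "contacts.read"])
print(url)
```

A production client would also exchange the code for tokens at the provider's token endpoint, verify the returned ID token, and use PKCE and a nonce; these steps are omitted here.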

OAuth/OpenID Connect service providers are services that have other resources in addition to the user's identity to share, e.g. photos, contact list, etc. Some providers are Google, Facebook, Flickr. See Wikipedia for a partial list.

Why use OpenID or OAuth/OpenID Connect?

For a web page, authenticating its users via a third party means that it does not have to store a separate user database. Hence, no password hashes can leak, as the web page simply does not have any. Moreover, OpenID and OAuth service providers are established web services that have more knowledge and resources to protect their users' information than a small company behind a web page may have.

For a user, OpenID and OAuth provide a way to memorize (or manage) fewer passwords and identities. However, it must be noted that both the web page and user have to trust the identity service provider. The web page has to trust that the identity provider does not lie about the users' identities, and the user has to trust the identity provider not to misuse his identity.

Additional materials - MIT lecture about authentication

An overview of user authentication is given by the MIT course 6.858 Computer Systems Security. The following video lecture is made available through the MIT OpenCourseWare program. It is not compulsory to watch the video; it is extra material for the students who would like to get more information about user authentication.

Infoturve (Information Security) 2023/24 autumn