Public-key cryptography
Overview of public-key cryptography
Unlike symmetric key encryption, where a single key is used for both encryption and decryption, public key cryptography (PKC) uses two separate keys - a public key and a private key (the latter is also called a secret key).
A private key is used for decryption and signing, and it must be kept secret. A public key is used for encryption and verifying signatures given with the corresponding private key. Hence, different keys are used for opposite functions: encrypting/decrypting, signing/verifying a signature. Here comes the alternative name of public-key cryptography: asymmetric encryption.
Private and public keys are generated in pairs, and they are linked by the underlying mathematical formula. However, a public key does not reveal information about the corresponding private key, and so it cannot be misused.

Public key cryptography is based on making the public key public. It makes it possible to encrypt messages meant for the key pair owner and verify if a signature is given by the corresponding private key. Hence the name public key cryptography.
The security of public-key cryptosystems is based on mathematical hardness assumptions. For example, the common public key cryptosystem RSA uses the assumption that large numbers are hard to factor (if they have only a few large prime factors). Cryptosystems DSA and ElGamal are based on the fact that discrete logarithms are hard to compute in cyclic groups. Public key systems based on elliptic curves (Estonian: elliptkõverad) are based on the infeasibility of division of elliptic curves over finite fields.
New RSA factoring records were broken in 2019 and 2020. In 2019 a 795-bit RSA number was factored and in the beginning of 2020 the same was done with a 829-bit value (news story). While there is speculation that 1024-bit RSA keys could be factored, there are no such experiments to refer to.
Like in symmetric cryptosystems, the key length in public key cryptosystems is expressed in bits. Key length shows how hard it is for an attacker to crack the cryptography, i.e. "solve" the underlying hard problem that the cryptosystem is based on. As public-key cryptography is based on hardness assumptions, the keys used here are much longer than the ones used in symmetric cryptography.
| Security level in bits | RSA/DLOG | EC | 
|---|---|---|
| 48 | 480 | 96 | 
| 56 | 640 | 112 | 
| 64 | 816 | 128 | 
| 80 | 1248 | 160 | 
| 112 | 2432 | 224 | 
| 128 | 3248 | 256 | 
| 160 | 5312 | 320 | 
| 192 | 7936 | 384 | 
| 256 | 15424 | 512 | 
Table. Key length (in bits) for obtaining a required security level when using RSA, a cryptosystem based on discreet logarithm hardness assumption (DLOG) or elliptic curves (EC). The key length for symmetric cryptosystems is equivalent to the security level. Source: ECRYPT II Yearly Report on Algorithms and Keysizes, http://www.ecrypt.eu.org/
| RSA/DLOG | Security level in bits | 
|---|---|
| 512 | 50 | 
| 768 | 62 | 
| 1024 | 73 | 
| 1536 | 89 | 
| 2048 | 103 | 
Table. The symmetric encryption key length equivalent for most used RSA and discrete logarithm (DLOG) based public key cryptosystems. Source: ECRYPT II Yearly Report on Algorithms and Keysizes, http://www.ecrypt.eu.org/
Public key cryptography is used in:
- Estonian ID-card - contains two key pairs
- HTTPS protocol - used in SSL / TLS e.g., for key agreement
- PGP - provides end-to-end security for emails
- SSH - protocol that allows to securely create a terminal connection
- Bitcoin - virtual currency
- DNSSEC - provides integrity for DNS
Encryption
In public-key cryptography, public-key is used to encrypt messages, and private key is used to decrypt them. This means that encryption is targeted: everybody is able to encrypt a message as the public key is published. However, only the owner of the private key is able to decrypt these messages.
It is important to understand that public-key cryptography is much more resource-intensive (mainly CPU time) than symmetric cryptography. Therefore, for encrypting large messages, a hybrid encryption system is used. First, a random key is generated for the symmetric encryption algorithm (e.g. AES), which is used to encrypt the message. Next, the generated symmetric encryption key is encrypted with the public key and bundled with the encrypted message. To decrypt the message, the private key is first needed to decrypt the symmetric encryption key, and then the symmetric key can be used to decrypt the message.
Digital signature and hash functions
If keys are used in the opposite order: private key for encryption and public key for decryption, then this is called digital signing.
As mentioned before, operations in public-key cryptography, especially operations that use the private key (e.g. signing), are slow. Therefore, instead of signing the (potentially large) file, only its hash (Estonian: räsi) is signed.
Cryptographic hash functions
A hash function (Estonian: räsifunktsioon) is a deterministic algorithm that takes an arbitrary amount of data as input and produces a fixed-length output. Consequently, each possible output of a hash function can be produced by an infinite number of different inputs. It is important that similar (but still different) inputs give non-similar outputs.
Hash functions are widely used in cryptography, but a cryptographic hash function is restricted by the following requirements:
- One-wayness: a hash function is a one-way function, i.e. it is not possible to learn (any possible) input from a given output.
- Second pre-image resistance: Given an input value and its hash, it is hard to find any other input value that gives the same hash value.
- Collision resistance: It is hard to find a pair of different input values that yield the same hash value.
The abovementioned three properties of cryptographic hash functions make it possible to use them for digital signatures, where the hash is used as a reference to the file to be signed. When signing a document, a hash value (also called digest) is first computed from the document. This hash value is then encrypted with the private key of the signer, and the result is bundled with the document. To verify a given signature, the encrypted hash value is decrypted with the public key of the signer. A new hash value (using the same hash function) is then computed from the document and compared with the decrypted hash value. If the hash values match, it means that the document has not been changed after it was signed, and so the signature is valid.
Most used hash functions are MD5, SHA-1, SHA-2 family (SHA-224, SHA-256, SHA-384, SHA-512) and SHA-3 (Keccak). However, MD5 was shown not to be collision resistant in 2004, which means that it should not be used anymore for security critical tasks. SHA-1 should not be used in new applications due to a collision being found in 2017.
It is a non-trivial task to find a collision in a cryptographic hash function due to the collision resistance property. However, when a collision is found, it means that someone has managed to find two different inputs to a hash function that give the same output value. Such a possibility could be used for fraud. For example, it would be possible to ask for a signature to one of the colliding documents (i.e., to the hash value of the document), which would effectively also sign the other colliding document. Thus, after a collision is found, the corresponding cryptographic hash function is deemed insecure.
However, a collision does not invalidate the second pre-image resistance property. While both MD5 and SHA-1 are not collision-resistant, their second pre-image resistance property still holds as no one has been able to find a collision for a fixed input. Thus, in the case of MD5 and SHA-1, it is not possible to create a forged signature given a fixed hash value that is taken from an existing signature. Thereby, currently, the digital signatures issued with SHA-1 are still safe. It is not known whether the second pre-image resistance property of SHA-1 will be broken, but in general, attacks tend to improve over time. Thus, with the current knowledge, it can not be ruled out that sometimes in the future, the second pre-image resistance property could also be broken. For this reason, the digital signatures that were given with SHA-1 should be refreshed (resigned/timestamped) by relying on a more secure hash function.
An example: Hash values for the input sentence "cryptographic hash function is one-way":
| Hash function | Output length | Hash of the sentence | 
|---|---|---|
| MD5 | 128 bits | 217c7df9519a23f0f0a591b582a1f5ef | 
| SHA-1 | 160 bits | e9170bbca1c8085ee5d72852bde474296a506f7c | 
| SHA-256 | 256 bits | ba98a1a3e0c7cca08a12039f97d224c734aa88ba84b11733c83f207927399611 | 
Quantum computers and public-key cryptography
Quantum computers could break most of today's public key cryptography. Quantum computers use quantum effects and therefore are able to solve some computational tasks much more efficiently compared to traditional computers. E.g., integer factorization is a hard problem that is not known to be efficiently solvable on traditional computers. Such factors have to exist as all natural numbers bigger than one can be uniquely represented either as a prime or multiplication of primes. This is a result of the fundamental theorem of arithmetic.
However, quantum computers can solve the integer factorization problem efficiently by using quantum effects. The quantum algorithm for efficiently factoring integers on quantum computers is called Shor's algorithm. It could be used to break the RSA cryptosystem. However, in order to break a 2048-bit RSA key, one would need a quantum computer with more than ten thousand quantum bits. This number might be significantly higher depending on the required error correction codes.
The second important algorithm for quantum computers is Grover's algorithm, which improves the speed of brute force attacks. Trying all possible combinations with this algorithm can reduce the number of attempts by the square root of the number of all possible combinations. For example, in an ideal setting, Grover's algorithm could be used to brute force a 128-bit AES key with {$ 2^{64} $} operations (in practice, it may not be trivial to implement the algorithm efficiently). A 64-bit key is too weak, and thus NSA suggests US government institutions to use AES with 256-bit keys. Even if powerful quantum computers would appear in the near future, then they would not be able to brute force 256-bit keys. By applying Grover's algorithm, the security level would be reduced to 128-bits, which is still unbreakable.
It is estimated that quantum computers will become useful in 10-30 years. Therefore, it is important to start using public key algorithms that are secure even when quantum computers emerge. This problem is discussed in the following article: The Tricky Encryption That Could Stump Quantum Computers.
In recent years companies and universities have invested a lot of resources into developing quantum computers. IBM introduced a new quantum computer in 2017, which had 50 quantum bits. In the beginning of 2018, Google introduced a quantum computer with 72 quantum bits. In 2019 IBM revealed a quantum computer with 53 quantum bits. In September 2020, IBM revealed its roadmap for building a quantum computer having over 1000 quantum bits by 2023.
However, the existing quantum computers are not yet practical due to the amount of errors that come as a side-effect. Thus, error-correcting codes have to be applied in order to get correct computation results. Unfortunately, error correction takes up a lot of quantum bits. Therefore, research is being focused on reducing the amount of errors and fixing the errors with efficient error correction codes.
 
   
 IBM's 50 quantum bit quantum computer (on the left) and Google's 72 quantum bit chip on the right, Sources: IBM's quantum computer ja Google's chip.
 
 Google's estimate for the ratio of quantum bits to effective error connection codes.
Source: https://research.googleblog.com
Public-key infrastructure
Some of the most common questions regarding the usage of public-key cryptography are:
- How do I know that the web page I'm visiting is authentic, i.e. controlled by the entity it is supposed to be controlled by?
- How does a computer know whom the electronic ID (e.g. Estonian ID-card) belong to?
This describes a common problem known as the "key exchange problem". By design, public keys are supposed to be public and available for everyone. However, how can you verify that a public key (that is technically just a bunch of numbers) you found somewhere on the internet really belongs to the person/company you think it belongs to? If an attacker can substitute the real public key with his own, he can decrypt all the messages encrypted with the public key.
To tie a key pair to a specific person (or a company, a website), some textual information is bundled together with the public key. A public key together with the identifying information is called a certificate. The textual information on a certificate can be person/company name, contact information, validity information, usage constraints, etc. Like public keys, certificates are published. The private key is still connected with the public key in the certificate, but it is not part of the certificate and is kept secret.
Now we have a new problem - anybody can take his public key and create a certificate stating that he is the president of the United States. Technically, everybody could do that, but probably nobody would believe such certificates. To make a certificate trustworthy, a trusted third party must approve that certificate. For digital certificates this trusted party is called a certification authority (CA, Estonian: sertifitseerimiskeskus) and it approves a certificate (public key + information) by signing it with its private key.
Example: The general information from the web certificate of Facebook:
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            0d:c1:b3:ae:80:0a:66:a2:c8:9d:b1:3e:a5:2f:cf:80
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=US, O=DigiCert Inc, OU=www.digicert.com, CN=DigiCert SHA2 High Assurance Server CA
        Validity
            Not Before: Aug 24 00:00:00 2019 GMT
            Not After : Oct 19 12:00:00 2019 GMT
        Subject: C=US, ST=CA, L=Menlo Park, O=Facebook, Inc., CN=*.facebook.com
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (256 bit)
                pub:
                    04:86:75:55:00:7c:30:35:f3:66:60:81:8e:bc:1e:
                    fe:e3:6f:40:22:1a:d6:63:38:4b:38:d2:cd:7e:48:
                    e0:46:bd:ee:11:c4:6a:9c:80:bf:7f:c0:43:5f:59:
                    12:15:5c:9e:1f:4c:24:9a:9b:ee:52:c3:7b:7d:41:
                    1f:4e:e8:a4:1a
                ASN1 OID: prime256v1
                NIST CURVE: P-256
This means that the certification authority also has its own private key and a certificate. Moreover, this CA's certificate may be signed by one of the even more prestigious certification authorities. This yields a tree-like trust relationship structure where the root certification authorities are widely trusted parties. Their certificates are signed by themselves (self-signed certificates). This is the public key infrastructure (PKI).

Further reading
- For improvements and specialisations on classical public-key encryption, as described here, we advise to read the Beyond public key encryption by Matthew Green. His post covers identity and attribute-based encryption as well as functional encryption.
- Findining collisions for SHA-1
- Other links


