Table of Contents

## Introduction

RSA is perhaps the most well-known asymmetric cryptographic algorithm. When it was first introduced back in 1977, it revolutionised cryptography. The public-key cryptosystem, developed by creators Ron Rivest, Adi Shamir, and Leonard Adleman, uses prime factorisation to create a trapdoor function that produces both a public and private key known as a key pair. And by creating separate keys for encryption and decryption, this groundbreaking innovation removed the need to secretly and securely share a single secret key to exchange encrypted messages; with wide distribution of the public key, effectively anyone could encrypt messages to send to the owner of the key pair. And provided the private key was kept secure, a high degree of confidence that the message remained obscured to any third parties was virtually guaranteed. This made widespread encryption practicably feasible for everyone. Remarkably, four years earlier the same algorithm had already been developed in 1973 by United Kingdom Government Communication Headquarters (GCHQ) cryptographer Clifford Cocks. Due to the sensitive nature of cryptography for state and military use, however, it was classified. And it was not until 20 years after Rivest, Shamir, and Adleman's namesake was publicly released that the UK government in 1997 declassified the findings, whereupon Cocks revealed its existence along with related contemporaneous research at a conference in England. The rest, as they say, is history.

But how exactly does the RSA encryption algorithm work? That's what we're going to learn in the forthcoming paragraphs through a practical examination of the process, albeit with some artificially weak parameters; the elegance of the algorithm is in its simplicity, which is more easily grasped with small numbers to illustrate the modular exponentiation process.

First, we'll look at how the public and private components are used in the
encryption and decryption operations of a short message. Then, we'll reconstruct
how the key pair was generated, which will highlight the
importance of primes; revealing one of the primary reasons
prime number research is of key importance in public-key cryptography.
(*Pardon the pun*.)

## Encryption

Encryption is performed with the public key, which is actually comprised of two components: (1) the exponent; and (2) the modulus. In this example, these two values are 13 and 12091, respectively. We'll elucidate why and how these numbers are determined after first looking at how they're used.

Let's assume Jim wants to send the message *RSA* to Tom. First, a numerical
representation is needed, which is obtained from the ASCII decimal value
of each character.

```
>>> plaintext = 'RSA'
>>> [c for c in map(ord, plaintext)]
[82, 83, 65]
```

For the sake of this demonstration, however, because the message only contains a subset of the twenty-six uppercase letters of the alphabet, we'll base-26 encode the input string to produce a smaller number; the relevance of which will become clear when generating the key pair.

```
>>> msg = 0
>>> for c in map(lambda x: (ord(x) - 65) % 26, plaintext):
... msg = (msg * 26) + c
...
>>> msg
11960
```

The plaintext message *RSA* in its base-26 equivalent form 11960 is now ready
for encryption, which is done by raising to the public exponent before
reducing by the modulus: *ciphertext* = *plaintext* ^{ key } mod *modulus*.

```
>>> ciphertext = msg ** 13 % 12091
>>> ciphertext
7928
```

The encryption of the base-26 encoded, plaintext message *RSA* produces the
ciphertext 7928, which can now be sent securely with the reassurance that
*even if it's intercepted* it cannot be deciphered by any third-parties; only
someone in possession of the private key can decrypt the ciphertext. As a final
step, for easy transmission across any medium the message will be base-64
encoded, which requires converting 7928 to the concatenated
binary representation of each character's 8-bit ASCII value.

```
>>> bitstring = ''.join([b for b in map(lambda x: bin(ord(x))[2:].zfill(8), str(ciphertext))])
>>> bitstring
'00110111001110010011001000111000'
```

And then forming 6-bit bytes that map to one of 64 characters. If necessary,
either two or four zero bits are appended to the bit string to conform to
multiples of six, with one or two `=` signs, respectively, appended to the
encoded output string to indicate the added padded bits.

```
>>> b = [int(bin(int(bitstring[i:i + 6], 2))[2:].zfill(6), 2) + 65 for i in range(0, len(bitstring), 6)]
>>> msg = ''.join([chr(c) if c < 91 else chr(c + 6) for c in b])
>>> msg += '=' * ((6 - len(bitstring) % 6) // 2)
>>> msg
'NzkyOA=='
```

Armed with his base-26-encoded-plaintext-message *RSA* now encrypted-cum-encoded
in base-64 ciphertext *NzkyOA==*, Jim emails it to Tom.

```
$ mail -s "Classified: TOP SECRET" tomac@rusky.com
Comrade Фома,
NzkyOA==
Regards,
Comrade Джеймс
.
EOT
```

## Decryption

Upon receipt, the message first needs to be base-64 decoded by essentially
reversing the encoding process: (1) enumerate any padded bits indicated by any
`=` signs appended to the input string; (2) convert each character sans any
equal signs to its 6-bit byte binary representation; (3) concatenate the bytes;
(4) then parse the resultant bit string—minus any padded bits noted in step
one—into 8-bit numerical values that map to their corresponding ASCII
characters.

```
>>> b = ''.join([bin(c - 65 if c < 91 else c - 71)[2:].zfill(6) for c in map(ord, msg[:msg.index('=')])])
>>> b
'001101110011100100110010001110000000'
>>> ciphertext = [chr(int(b[i:i + 8], 2)) for i in range(0, len(b[:len(msg) - msg.index('=') * 2]), 8)]
>>> ciphertext = int(''.join(ciphertext))
>>> ciphertext
7928
```

With the message decoded, the ciphertext of 7928 can now be decrypted. And
just as the public key is comprised of both an exponent and a modulus, so too is
the private key. In this case the exponent is 3653; the modulus, however, is
the *same* positive integer used with the public key—12091. Correspondingly, the
same formula is used to decrypt as was used to encrypt the message:
*m* ^{ d } mod *n*.

```
>>> decrypted = ciphertext ** 3653 % 12091
>>> decrypted
11960
```

Now that the message has been decrypted, all that remains is to decode the base-26 encoded 11960 into its original ASCII characters.

```
>>> msg = ''
>>> while decrypted:
... msg = chr((decrypted % 26) + 65) + msg
... decrypted //= 26
...
>>> msg
'RSA'
```

And that concludes the plaintext-encrypt-to-ciphertext-decrypt-back-to-plaintext process. Its simplicity, while impressive, explicates how the innovation contributed to increasing the adoption of cryptography, which plays an integral role in securing the Internet.

## Key Implementation Summary

Having seen the simplicity of the RSA cryptosystem, and the convenience with
which it enables encryption, it's a timely opportunity to acknowledge and
appreciate that it was precisely this innovation that laid the foundation upon
which the entire Internet ecosystem grew into what we take for granted today.
From online banking and the entire e-commerce marketplace to the myriad data
~~ floating up there in the cloud ~~ on someone else's computer in a
~~ Google fortress ~~ data center somewhere in The Dalles, Oregon; it's all
transmitted securely thanks in large part to the TLS hybrid cryptosystem in
which the RSA asymmetric algorithm continues to play an integral part.

Except we don't see the above process because it's largely transparent to the
end user. Instead we have applications such as `gpg` that handle the
underlying mechanisms so that the above steps are distilled into:

- compose the message
- highlight the text
- select
*encrypt*from the context menu, press ⇧⌘E, or at the command line enter:

```
$ echo "RSA" | gpg --armor --sign --encrypt -r tomac -
gpg: using "ABCDEF1234567890ABCDEF1234567890ABCDEF12" as default secret key for signing
gpg: ABCDEF1234567890: There is no assurance this key belongs to the named user
sub rsa4096/ABCDEF1234567890 2018-01-16 tomac <tomac@rusky.com>
Primary key fingerprint: A1B2 C3D4 E5F6 A1B2 C3D4 A1B2 C3D4 E5F6 A1B2 C3D4
Subkey fingerprint: A1B2 C3D4 E5F6 A1B2 C3D4 A1B2 C3D4 E5F6 A1B2 C3D4
It is NOT certain that the key belongs to the person named
in the user ID. If you *really* know what you are doing,
you may answer the next question with yes.
Use this key anyway? (y/N) yes
-----BEGIN PGP MESSAGE-----
hQIMAzD6Ae+cDDnAAQ//VbPNeY6YirBTg8ARwEx1iDFsOYovfRzb1nGEUY0ZHicM
...
Z2RDGCcYcioQ7qWlzA+7ASysBV73VA+WgA==
=4679
-----END PGP MESSAGE-----
```

Or it's designed into software such as your email client or Skype, or messaging apps like Telegram or iMessage, or embedded in protocols such as SSH, HTTPS, and XMPP so the user doesn't get his hands dirty with the nitty-gritty.

Besides which, instead of encrypting the actual message with the RSA asymmetric
algorithm, it would be encrypted with a symmetric-key algorithm such as AES
(Advanced Encryption Standard), the key of which would then be encrypted with
RSA. And together—the AES encrypted message and the RSA encrypted key to decrypt
the AES encrypted message—are then sent. This is because asymmetric encryption
is far more computation-intensive and thus significantly less efficient than
symmetric encryption, which necessitates the use of a hybrid cryptosystem
that, for example, TLS (Transport Layer Security) and PGP
(Pretty Good Privacy) and virtually all multiparty cryptographic protocols use.
The data itself is symmetrically encrypted, but the key to unlock the data is
asymmetrically encrypted. This requires encrypting only 128, 192, or 256 bits
(the key) with the expensive encryption algorithm rather than *n* bytes of
data where *n* is something much greater than the key size.

In summary, encrypting a plaintext message with the RSA asymmetric cryptographic algorithm requires raising the value of the data to the power of the public exponent before reducing the result by the shared modulus value. And to decrypt the ciphertext, the same exact formula and modulus are used except the encrypted value is raised to the power of the private exponent. The elegance of the RSA primitive is evident in its simplicity; and while Computer Science is all about reducing complexity—cryptographers are masters of reduction and complexity.

## Key Generation

So how exactly are the private and public keys determined? Ingeniously, with little more complexity than their use. This is attributable to the easy to compute but computationally difficult to reverse nature of one-way functions such that only those with knowledge of the trapdoor—in this case, the private key—can invert the output back to its original form. At its core, the difficulty stems from prime factorisation; a decision problem that belongs to a certain complexity class in one of the major unsolved problems in computer science—P versus NP. This conjecture states that although a solution to an NP problem can be verified quickly, there is no known way to quickly find a solution. As will be evidenced in the forthcoming key generation process, it's trivial to check the correctness of the prime numbers used to produce the modulus; however, given the modulus and the public key, there is no known polynomial-time algorithm for finding the prime factors. This puts the problem in the NP and not P class. If, however, it is proven that P = NP—it would have a profound impact across computer science and math domains including that of momentous consequence for asymmetric cryptography. That's the theory, let's look at it in practice and step through the key generation algorithm.

The first step is to choose two prime numbers. In reality, these need to be
exceedingly high values; something on the order of several hundred digits long.
But for the purposes of this demonstration, arbitrarily weak values were used to
keep it simple, and thus the chosen primes were 107 and 113. And it is from the
product of these two primes where we get the modulus 12091 that is used with
both the private and public key. These are denoted, respectively, as *p*,
*q*, and *n*:

```
>>> p = 107
>>> q = 113
>>> n = p * q
>>> n
12091
```

Originally, the RSA specification used Euler's totient function φ(*n*) to
find the number of totatives of *n* (i.e., the number of integers lesser than
and relatively prime to *n*) which, due to the mutiplicative nature of phi, can
be computed with the formula (*p* - 1)(*q* - 1) such that φ(*n*) = 11872. More
recently, however, in order to ensure the smallest possible private exponent is
generated, the standard calls for using Carmichael's lambda function λ(*n*)
defined as the smallest positive integer *m* such that
*a* ^{ m } ≡ 1 mod *n* for all *a* ∈ ℤ/*n* ℤ satisfying gcd(*a*, *n*) = 1
(i.e., any integer *a* that is relatively prime to *n*). Which, in this case, is
easily computed by finding the least common multiple of *p* - 1 and *q* - 1 such
that λ(*n*) = φ(*n*) ÷ gcd(*p* - 1, *q* - 1) = 5936. In either case, the result
is used to determine the public key—denoted as *e* for encryption—which needs to
conform to the following two criteria:

**Requirement 1.** Public key is between 1 and φ(*n*) [or λ(*n*)]

1 <e< 11872 [5936]

**Requirement 2.** Public key is coprime to both *n* and φ(*n*) [or λ(*n*)]

eis relatively prime to 12091 and 11872 [5936]

We can use Euclid's algorithm to factor the greatest common denominator of
φ(*n*) or λ(*n*) and *k*, *k* ∈ ℕ for all *k* less than φ(*n*) or λ(*n*) to
populate a list of coprimes that satisfy both requirements.

```
>>> gcd = lambda x, y: gcd(y, x % y) if y else x
>>> [d for d in range(11872) if gcd(d, 12091) == 1 and gcd(d, 11872) == 1]
[1, 3, 5, 9, 11, 13, 15, 17, 19, 23, 25, 27, 29, 31, 33, 37, 39, 41, 43, 45, 47, 51,
...
11845, 11847, 11849, 11853, 11855, 11857, 11859, 11861, 11863, 11867, 11869, 11871]
>>> [d for d in range(5936) if gcd(d, 12091) == 1 and gcd(d, 5936) == 1]
[1, 3, 5, 9, 11, 13, 15, 17, 19, 23, 25, 27, 29, 31, 33, 37, 39, 41, 43, 45, 47, 51,
...
5909, 5911, 5913, 5917, 5919, 5921, 5923, 5925, 5927, 5931, 5933, 5935]
```

For the above demonstration, the public key was 13. But as the list shows, there
were myriad values coprime to φ(*n*) or λ(*n*) that were suitable for selection
as a result of the initial two primes of 107 and 113. And the set of all
possible public exponents only grows as the value of the prime numbers increase.
However, most RSA implementations set the public exponent to 65537, which is
`0b10000000000000001` in binary, because it's a Fermat Prime and all but
two bits are zero, which makes the comparatively cheap squaring operation more
common than the costly multiplication that's required when bits are set and thus
much more efficient. In the event the public exponent *e* is chosen first,
primes *p* and *q* are then selectively chosen so that *n* and φ(*n*) or λ(*n*)
are coprime to *e*; essentially reversing the order of these first few steps.

Before moving onto the private key, let's recap what's heretofore been done in the key generation algorithm.

```
>>> p # first of two secret primes
107
>>> q # second of two secret primes
113
>>> n # modulus that forms part of private and public keys
12091
>>> φn = (p - 1) * (q - 1)
>>> φn # phi of n used to compute private and public keys
11872
>>> lcm = lambda x, y: (x * y) // gcd(x, y) if x and y else 0
>>> λn = lcm(p - 1, q - 1)
>>> λn # Carmichael function of n (RSA successor to phi)
5936
>>> e = 13 # public key exponent
>>> gcd(e, φn) == 1 # 13 is coprime to φn (11872) (requirement 2)
True
>>> gcd(e, λn) == 1 # 13 is coprime to λn (5936) (requirement 2)
True
>>> gcd(e, n) == 1 # 13 is coprime to n (12091) (requirement 2)
True
```

This leaves the private key. But unlike the public key—or perhaps more aptly
because the public key has already been chosen—there can now only be one value
for the private key; that is, the modular multiplicative inverse of *e* with
respect to φ(*n*) or λ(*n*), which is an integer *d* such that
*de* mod φ(*n*) = 1 or *de* mod λ(*n*) = 1. The Extended Euclidean algorithm
is the usual candidate for solving this problem, but with such small values it's
trivial to find *d* with a naive conditional list comprehension.

```
>>> [d for d in range(φn) if d * e % φn == 1]
[3653]
>>> [d for d in range(λn) if d * e % λn == 1]
[3653]
>>> d = 3653
```

And that's how the private exponent of 3653 was determined. This brute force
approach, however, is rather inefficient due to iterating over the set
{1, ..., φ(n)} or {1, ..., λ(n)}, which are 11872 and 5936 iterations,
respectively. And that's with purposely weak parameters chosen specifically for
this demonstration; with a time complexity of O(*n*) this will grow
proportionally to the input. An alternative approach shown by the creators of
the RSA algorithm greatly improves efficiency. Given that *de* ≡ 1 mod φ(*n*),
then it holds that *de* = 1 + *k* φ(*n*) for an integer *k* less than *e*, so
a more efficient list comprehension uses the formula
*d* = (1 + *k* φ(*n*)) ÷ *e*, {*k* ∈ ℤ : *k* < *e*} where the set of integers is
limited to 1 ≤ *k* ≤ *e* (i.e., from 1 to 13); this results in significantly
fewer iterations and a time complexity of O(log *n*).

```
>>> [d // e for d in map(lambda x: 1 + x * φn, range(e)) if d % φn == 1 and d % e == 0]
[3653]
```

And the same formula holds for the new RSA standard using Carmichael's function:
*d* = (1 + *k* λ(*n*)) ÷ *e*, {*k* ∈ ℤ : *k* < *e*}.

```
>>> [d // e for d in map(lambda x: 1 + x * λn, range(e)) if d % λn == 1 and d % e == 0]
[3653]
```

For the sake of completeness, however, let's also step through the extended
Euclidean algorithm to express the greatest common denominator of the integers
φ(*n*) and *e* as the linear combination *c* φ(*n*) + *de* = 1 to find the
modular inverse of *e*. According to Bézout's identity, for non-zero integers
*a* and *b*, if gcd(*a*, *b*) = *r*, then there exist two integers *s* and *t*
so that *sa* + *tb* = *r*. And because *r* = 1, and the modular multiplicative
inverse of an integer *b* is an integer *t* such that *tb* is congruent to 1
with respect to the modulus *a*, we know that *tb* ≡ 1 mod *a*, such that
*t* (reduced mod *a* if *t* is negative) is the modular inverse of *b*.
Therefore, in the linear combination *c* φ(*n*) + *de* = 1, *d* is the inverse
of *e*, and our private key.

```
11872 = 913(13) + 3 # Euclidean algorithm till gcd(a, b) = 1
13 = 4(3) + 1 # then extended Euclidean algorithm by
1 = 13 - 4(3) # back substitution of each remainder
1 = 13 - 4(11872 - 913(13)) # with equivalent expanded terms and
1 = -4(11872) + 3653(13) # simplify till 1 = cφ(n) + de
```

And the updated RSA standard using Carmichael's function λ(*n*) = 5936.

```
5936 = 456(13) + 8
13 = 1(8) + 5
8 = 1(5) + 3
5 = 1(3) + 2
3 = 1(2) + 1
1 = 3 - 1(2)
1 = 3 - 1(5 - 1(3))
1 = 2(3) - 1(5)
1 = 2(8 - 1(5)) - 1(5)
1 = 2(8) - 3(5)
1 = 2(8) - 3(13 - 1(8))
1 = 5(8) - 3(13)
1 = 5(5936 - 456(13)) - 3(13)
1 = 5(5936) - 2283(13)
```

And because -2283 is negative, reduce by mod λ(*n*) to find the inverse, which
is the same value of 3653. Euclid's extended algorithm can also be written in
Python to find the greatest common denominator of *a* and *b* as well as
Bézout's coefficients *s* and *t*, which are efficiently computed with a time
complexity of O(log *n*).

```
>>> ext = lambda a, b: (lambda d, x, y: (d, y, x - (a // b) * y))(*ext(b, a % b)) if b else (a, 1, 0)
>>> ext(φn, e)
(1, -4, 3653)
>>> ext(λn, e)
(1, 5, -2283)
>>> -2283 % λn
3653
```

This can be further simplified to return just the private exponent (i.e., the
modular multiplicative inverse of *e* alone), which is the only Bézout
coefficient of interest in the RSA algorithm.

```
>>> modinv = lambda x, y: 0 if x == 0 else 1 if y % x == 0 else y - modinv(y % x, x) * y // x
>>> modinv(e, φn)
3653
>>> modinv(e, λn)
3653
```

It's worth noting that although in this instance λ(*n*) generates the same
private key as φ(*n*), this isn't always the case; for example, consider the
following scenario with primes 113 and 127, using the public key of 23.

```
>>> p, q = 113, 127
>>> n = p * q
>>> n
14351
>>> φn = (p - 1) * (q - 1)
>>> φn
14112
>>> λn = lcm(p - 1, q - 1)
>>> λn
1008
>>> [d for d in range(φn) if gcd(d, n) == 1 and gcd(d, φn) == 1 and gcd(d, λn) == 1]
[1, 5, 11, 13, 17, 19, 23, 25, 29, 31, 37, 41, 43, 47, 53, 55, 59, 61, 65, 67, 71, 73,
...
14083, 14087, 14089, 14093, 14095, 14099, 14101, 14107, 14111]
>>> e = 23
```

The number 23 is a viable public exponent as it is coprime to both φ(*n*) and
λ(*n*). However, the inverse of 23 mod φ(*n*) is not the same as the inverse of
23 mod λ(*n*):

```
>>> modinv(e, φn)
4295
>>> modinv(e, λn)
263
>>> ext(φn, e)
(1, -7, 4295)
>>> ext(λn, e)
(1, -6, 263)
```

That's two distinct private keys from the same primes *p* and *q*, and the same
public key *e*. And yet both private exponents 4295 and 263 decrypt values that
are encrypted with the same public exponent of 23.

```
>>> 11960 ** e % n
1877
>>> 1877 ** 4295 % n
11960
>>> 1877 ** 263 % n
11960
```

Which raises the question: for what fraction of the set of private keys currently in use are there secondary, albeit unknown, private keys? Practically speaking, it wouldn't change much; without knowing which prime numbers were used to generate the key pair, you're just as likely to crack the unknown (to the owner) private key as you are to crack the key that is known to the owner. But it's an interesting artefact of the RSA primitive nonetheless.

## Key Generation Summary

And that's how all the components that are used in the RSA asymmetric cryptographic algorithm are generated. It's indicative of its ingenuity that the process can be summarised in five simple steps and less than 100 words:

**1.** Select two prime numbers

p= 107

q= 113

**2.** Compute the modulus *n*

n=pq= 12091

**3.** Compute Phi of *n* (or Carmichael's function of *n*)

φ(

n) = (p- 1)(q- 1) = 11872λ(

n) = lcm(p- 1,q- 1) = 5936

**4.** Choose the public key *e* such that:

- 1 <
e< φ(n) [or λ(n)]eis coprime withnand φ(n) [or λ(n)]e = 13

**5.** Compute the private key *d* such that:

1 =

edmod φ(n) [or λ(n)]1 = 13 ⋅ 3653 (mod 11872) [or (mod 5936)]

d = 3653

That concludes this brief exposition of the RSA cryptosystem. It's a brilliantly simple approach that for the last forty years has been influential in Internet security, and continues to contribute to securing the Internet of the future.

## Comments

comments powered by Disqus