This repository contains the course materials from Math 157: Intro to Mathematical Software.
Creative Commons BY-SA 4.0 license.
License: OTHER
Math 157: Intro to Mathematical Software
UC San Diego, winter 2018
February 14, 2018: Number theory and cryptography (part 2 of 3)
Administrivia:
Schedule change for this week: due to a schedule conflict, my office hours will take place Thursday 3-4 instead of 4-5.
Public-key cryptography
For most of human history, cryptography has involved using a secret key to encrypt a message to keep its contents away from eavesdroppers. The sender would use the secret key (much like a physical key) to transform the message into something unintelligible, and the recipient would use the same key (but possibly in another manner, much like turning the key the opposite way in a lock) to un-transform the message back into its original form.
Starting in the 1970s, it was discovered that one could use ideas from number theory to implement public-key cryptography, in which the information required to encrypt and decrypt messages is not the same! This is more like a mailbox (or homework dropbox): the encryption key can be made public while the decryption key remains safely secret. (One can also invert the paradigm to make digital signatures, where the secret key is required to sign, but a public key is required to decrypt.)
Public key cryptography is now used widely as part of Internet security; for instance, it underlies the https protocol that web sites like CoCalc use to deliver content to your browser. The signature paradigm is also closely related to the concept of a blockchain, which underlies cryptocurrencies and other emerging e-banking protocols.
There are three major public key techniques in use today:
RSA (Rivest-Shamir-Adleman), based on the difficulty of factoring large numbers;
DH (Diffie-Hellman), based on the difficulty of computing discrete logarithms in the ring of integers modulo a large prime ;
ECDH (elliptic curve Diffie-Hellman), based on the difficulty of computing discrete logarithms in the group of point on an elliptic curve.
While these can all be used as honest cryptosystems (i.e. protocols for exchanging messages securely), they are all are typically used for secure key exchange. Say Alice and Bob want to communicate with each other. (For some reason, examples of cryptography always involve Alice and Bob.)
First, they use the key-exchange protocol to construct a shared secret key .
Then they use symmetric key cryptography (e.g. AES) to actually communicate. (This typically turns out to be more efficient in practice than directly using a public-key method.)
This talk will focus on constructing (and breaking) Diffie-Hellman key exchange.
Discrete logarithms
Let be a prime. Last time, we defined the notion of a primitive root modulo ; this is an integer such that the multiplicative order of mod is equal to the maximum possible value of . In other words, the quantities exhaust all of the nonzero residue classes modulo .
Suppose that is a primitive root mod . Then for any not divisible by , the discrete logarithm of mod , with respect to the base , is the integer such that .
This looks straightforward enough, but it is actually very difficult to compute discrete logarithms! There is no obvious trick for this.
As a result, the function is often treated as a one-way function: it is easy to compute but hard to invert. (This is a practical definition, not a formal mathematical one, because we (probably) cannot say for sure that it is actually hard to compute, as opposed to our merely being ignorant.)
The difficulty of integer factorization behaves in a similar (but slightly more complicated) way: if are two prime numbers, then it is easy to form the product but much harder to recover and from the product. We'll see next time how the RSA method derives its security from this state of affairs.
Aside: quadratic residues
Although this does not play a direct role in Diffie-Hellman, this topic arises naturally at this point and is used in another way in the homework, so here it is.
For a positive integer, an integer is a quadratic residue mod if there exists a perfect square congruent to modulo .
For example, say . To test whether a perfect square is congruent to modulo , we need only test . Their residues modulo are .
For an odd prime, the list of quadratic residues mod will include 0 plus other values. (If we run over , no residue can occur more than twice: if , then is divisible by , and so one of or must be.)
For an odd prime and not divisible by , is a quadratic residue modulo if and only if its discrete logarithm (with respect to any base) is even.
The Legendre symbol is defined to be if , if is a nonzero quadratic residue modulo , and if is not a quadratic residue mod . This symbol has the multiplicativity property: (The less-than-obvious part is that the product of two quadratic nonresidues is a quadratic residue. This would fail if were not prime.)
The law of quadratic reciprocity (originally proved by Gauss) states that if are distinct odd primes, then That is, the two Legendre symbols agree if either or is congruent to 1 mod 4, and disagree if they are both congruent to 3 mod 4.
If is not prime, you can apply quadratic reciprocity by first factoring . But this is not necessary: if one defines the Kronecker symbol to extend the Legendre symbol via the rule then the formula extends to all odd positive integers .
This still leaves the prime 2, but there is another formula of Gauss to handle this case: That is, if is an odd prime, then 2 is a quadratic residue mod if is congruent to 1 or 7 mod 8, and a quadratic nonresidue mod if is congruent to 3 or 5 mod 8.
Upshot: you can compute Legendre (and Kronecker) symbols quickly using a form of the Euclidean algorithm!
Discrete logarithms are hard!
The main difference between the ordinary logarithm and the discrete logarithm is that, while the former is always easy to calculate, the latter is much more difficult. Let's try doing a discrete logarithm on a much larger field.
From discrete logs to Diffie-Hellman
Next, let's use our machinery to construct a Diffie-Hellman key exchange protocol: here are the steps.
Alice and Bob (publicly) agree on a large prime , modulo which all computations will take place, and a generator of (i.e., a primitive root mod ).
Alice chooses a secret, random integer between and , while Bob chooses a secret random integer in the same range.
Alice (publicly) transmits to Bob, while Bob (publicly) transmits to Alice.
Alice and Bob both compute , and use that value as their shared secret.
Now, if Eve wants to find their shared secret, she has to be able to compute only knowing , , and . This is called the Diffie-Hellman problem, and is believed to be as hard as computing discrete logarithms in general.
You should have all the tools to make this yourself: please fill in the outline below.
And that's it! Alice and Bob have a shared secret, which they can use to send messages.
Attacking Diffie-Hellman: a generic approach
We're going to consider two attacks on Diffie-Hellman key exchange: the first is a brute force attack on the system, while the second is an attack which works if is a weak Diffie-Hellman prime. In both cases, we're actually going to attack the discrete logarithm problem directly. That is, given for some unknown , find .
The simplest brute force attack, of course, is simply exhaustion. That is, just try every value until we find a hit. Let's try this on a particularly small example.
Try this now: try the above with a larger prime, say one with 30 bits. How much harder is it?
There are better brute-force attacks which use memory to reduce the amount of computation required. We will talk about one particular one, called baby-step giant-step, which works as follows:
Compute .
For , compute and store along with the index .
For , compute , and check if it's in the list above. If it is, stop, and output the pair .
If the above produces an output, we know that , i.e. , i.e. . Moreover, since we can always write as for , this algorithm should always output.
Now, let's try it out.
Note that we were able to break a 32-bit problem faster than the exhaustive approach broke a 20-bit problem. Try going a little higher, say a 40-bit prime.
Attacking Diffie-Hellman: weak primes
Finally, let's look at one attack involving a potential weakness of the prime , namely that has smaller factors. Assume for relatively prime factors of size . Since for any , for any . That is, if we only look at elements with are -powers of something, they have order . An analogous thing happens if we switch and .
We use this to recover and , from which we can use the Chinese Remainder Theorem to recover .
Find the discrete logarithm of with respect to using one of the algorithms above, i.e. such that . This implies that , i.e. .
Find the discrete logarithm of with respect to using one of the algorithms above, i.e. such that . This implies that , i.e. .
Compute .
Note that the Chinese remainder theorem works with any number of relatively prime moduli, so we can work with all of the prime factors of independently. In particular, no matter how large is, if can be factored into small values, we can use this technique to quickly attack the discrete logarithm problem modulo .
As a result, when choosing a prime for Diffie-Hellman, it is important to first try to factor to make sure does not have small prime factors.