Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutSign UpSign In
| Download

This repository contains the course materials from Math 157: Intro to Mathematical Software.

Creative Commons BY-SA 4.0 license.

Views: 3037
License: OTHER
Kernel: SageMath 8.1

Math 157: Intro to Mathematical Software

UC San Diego, winter 2018

February 14, 2018: Number theory and cryptography (part 2 of 3)

Administrivia:

  • Schedule change for this week: due to a schedule conflict, my office hours will take place Thursday 3-4 instead of 4-5.

Public-key cryptography

For most of human history, cryptography has involved using a secret key to encrypt a message to keep its contents away from eavesdroppers. The sender would use the secret key (much like a physical key) to transform the message into something unintelligible, and the recipient would use the same key (but possibly in another manner, much like turning the key the opposite way in a lock) to un-transform the message back into its original form.

Starting in the 1970s, it was discovered that one could use ideas from number theory to implement public-key cryptography, in which the information required to encrypt and decrypt messages is not the same! This is more like a mailbox (or homework dropbox): the encryption key can be made public while the decryption key remains safely secret. (One can also invert the paradigm to make digital signatures, where the secret key is required to sign, but a public key is required to decrypt.)

Public key cryptography is now used widely as part of Internet security; for instance, it underlies the https protocol that web sites like CoCalc use to deliver content to your browser. The signature paradigm is also closely related to the concept of a blockchain, which underlies cryptocurrencies and other emerging e-banking protocols.

There are three major public key techniques in use today:

  • RSA (Rivest-Shamir-Adleman), based on the difficulty of factoring large numbers;

  • DH (Diffie-Hellman), based on the difficulty of computing discrete logarithms in the ring of integers modulo a large prime pp;

  • ECDH (elliptic curve Diffie-Hellman), based on the difficulty of computing discrete logarithms in the group of point on an elliptic curve.

While these can all be used as honest cryptosystems (i.e. protocols for exchanging messages securely), they are all are typically used for secure key exchange. Say Alice and Bob want to communicate with each other. (For some reason, examples of cryptography always involve Alice and Bob.)

  • First, they use the key-exchange protocol to construct a shared secret key KK.

  • Then they use symmetric key cryptography (e.g. AES) to actually communicate. (This typically turns out to be more efficient in practice than directly using a public-key method.)

This talk will focus on constructing (and breaking) Diffie-Hellman key exchange.

Discrete logarithms

Let pp be a prime. Last time, we defined the notion of a primitive root modulo pp; this is an integer aa such that the multiplicative order of aa mod pp is equal to the maximum possible value of p1p-1. In other words, the quantities 1,a,,ap2(modp) 1, a, \dots, a^{p-2} \pmod{p} exhaust all of the nonzero residue classes modulo pp.

Suppose that aa is a primitive root mod pp. Then for any mm not divisible by pp, the discrete logarithm of mm mod pp, with respect to the base aa, is the integer g{0,,p2}g \in \{0,\dots,p-2\} such that agm(modp)a^g \equiv m \pmod{p}.

mod(2, 101)^19
98
2^19 %101
98
mod(98,101).log(mod(2,101))
19

This looks straightforward enough, but it is actually very difficult to compute discrete logarithms! There is no obvious trick for this.

As a result, the function gag(modp)g \mapsto a^g \pmod{p} is often treated as a one-way function: it is easy to compute but hard to invert. (This is a practical definition, not a formal mathematical one, because we (probably) cannot say for sure that it is actually hard to compute, as opposed to our merely being ignorant.)

The difficulty of integer factorization behaves in a similar (but slightly more complicated) way: if p,qp,q are two prime numbers, then it is easy to form the product pqpq but much harder to recover pp and qq from the product. We'll see next time how the RSA method derives its security from this state of affairs.

Aside: quadratic residues

Although this does not play a direct role in Diffie-Hellman, this topic arises naturally at this point and is used in another way in the homework, so here it is.

For nn a positive integer, an integer aa is a quadratic residue mod nn if there exists a perfect square congruent to aa modulo nn.

For example, say n=7n = 7. To test whether a perfect square is congruent to aa modulo nn, we need only test 02,,620^2, \dots, 6^2. Their residues modulo 77 are 0,1,4,2,2,4,10, 1, 4, 2, 2, 4, 1.

quadratic_residues(107) # for any p, will include 0 and (p-1)/2 other values.
[0, 1, 3, 4, 9, 10, 11, 12, 13, 14, 16, 19, 23, 25, 27, 29, 30, 33, 34, 35, 36, 37, 39, 40, 41, 42, 44, 47, 48, 49, 52, 53, 56, 57, 61, 62, 64, 69, 75, 76, 79, 81, 83, 85, 86, 87, 89, 90, 92, 99, 100, 101, 102, 105]

For pp an odd prime, the list of quadratic residues mod pp will include 0 plus (p1)/2(p-1)/2 other values. (If we run over {0,,p1}\{0,\dots,p-1\}, no residue can occur more than twice: if x2y2(modp)x^2 \equiv y^2 \pmod{p}, then (x+y)(xy)(x+y)(x-y) is divisible by pp, and so one of x+yx+y or xyx-y must be.)

For pp an odd prime and aa not divisible by pp, aa is a quadratic residue modulo pp if and only if its discrete logarithm (with respect to any base) is even.

The Legendre symbol (ap)\left( \frac{a}{p} \right) is defined to be 00 if a0(modp)a \equiv 0 \pmod{p}, +1+1 if aa is a nonzero quadratic residue modulo pp, and 1-1 if aa is not a quadratic residue mod pp. This symbol has the multiplicativity property: (ap)(bp)=(abp). \left( \frac{a}{p} \right) \left( \frac{b}{p} \right) = \left( \frac{ab}{p} \right). (The less-than-obvious part is that the product of two quadratic nonresidues is a quadratic residue. This would fail if pp were not prime.)

legendre_symbol(5, 7)
-1
p = next_prime(10^70) %time legendre_symbol(3, p) ## How is this feasible?? No way we are exhausting over all residue classes mod p!
CPU times: user 132 ms, sys: 5.27 ms, total: 137 ms Wall time: 155 ms
1
p%4
1
p%3
1

The law of quadratic reciprocity (originally proved by Gauss) states that if p,qp,q are distinct odd primes, then (pq)(qp)=(1)(p1)(q1)/4. \left( \frac{p}{q} \right) \left( \frac{q}{p} \right) = (-1)^{(p-1)(q-1)/4}. That is, the two Legendre symbols agree if either pp or qq is congruent to 1 mod 4, and disagree if they are both congruent to 3 mod 4.

If aa is not prime, you can apply quadratic reciprocity by first factoring aa. But this is not necessary: if one defines the Kronecker symbol to extend the Legendre symbol via the rule (ab1)(ab2)=(ab1b2), \left( \frac{a}{b_1} \right) \left( \frac{a}{b_2} \right) = \left( \frac{a}{b_1 b_2} \right), then the formula (pq)(qp)=(1)(p1)(q1)/4. \left( \frac{p}{q} \right) \left( \frac{q}{p} \right) = (-1)^{(p-1)(q-1)/4}. extends to all odd positive integers p,qp,q.

This still leaves the prime 2, but there is another formula of Gauss to handle this case: (2p)=(1)(p1)/8. \left( \frac{2}{p} \right) = (-1)^{(p-1)/8}. That is, if pp is an odd prime, then 2 is a quadratic residue mod pp if pp is congruent to 1 or 7 mod 8, and a quadratic nonresidue mod pp if pp is congruent to 3 or 5 mod 8.

Upshot: you can compute Legendre (and Kronecker) symbols quickly using a form of the Euclidean algorithm!

a = next_prime(10^31) b = next_prime(10^32) legendre_symbol(a, b) # Look how fast this is!
-1

Discrete logarithms are hard!

The main difference between the ordinary logarithm and the discrete logarithm is that, while the former is always easy to calculate, the latter is much more difficult. Let's try doing a discrete logarithm on a much larger field.

p = next_prime(5 * 10^10); print(p) F = GF(p) e = randint(1, 11) c = F(2)^e %timeit c.log(F(2))
50000000021 100 loops, best of 3: 10.2 ms per loop
p = next_prime(5 * 10^20); print(p) F2 = GF(p) e = randint(1, (p-1)/2) c = F2(2)^e %timeit c.log(F2(2))
500000000000000000003 1 loop, best of 3: 375 ms per loop
p = next_prime(5 * 10^30); print(p) F2 = GF(p) e = randint(1, (p-1)/2) c = F2(2)^e %timeit c.log(F2(2))
5000000000000000000000000000027 1 loop, best of 3: 950 ms per loop
This is not scaling well. Practical implementations of Diffie-Hellman use 1000 bit primes (or larger), for which discrete logarithms are believed to be extremely difficult.

From discrete logs to Diffie-Hellman

Next, let's use our machinery to construct a Diffie-Hellman key exchange protocol: here are the steps.

  • Alice and Bob (publicly) agree on a large prime pp, modulo which all computations will take place, and a generator gg of GF(p)GF(p) (i.e., a primitive root mod pp).

  • Alice chooses a secret, random integer aa between 00 and p1p-1, while Bob chooses a secret random integer bb in the same range.

  • Alice (publicly) transmits A=gamodpA=g^a\mod p to Bob, while Bob (publicly) transmits B=gbmodpB=g^b\mod p to Alice.

  • Alice and Bob both compute gabmodp=Bamodp=Abmodpg^{ab}\mod p = B^a\mod p = A^b\mod p, and use that value as their shared secret.

Now, if Eve wants to find their shared secret, she has to be able to compute gabg^{ab} only knowing gg, gag^a, and gbg^b. This is called the Diffie-Hellman problem, and is believed to be as hard as computing discrete logarithms in general.

You should have all the tools to make this yourself: please fill in the outline below.

# Public stuff p = random_prime(2^64) Fp = GF(p) g = Fp.multiplicative_generator() # Alice's private and public keys a = Fp.random_element() #... A = pow(g, a, p) # Bob's private and public keys b = randint(0, p-2) B = pow(g, b, p) # Alice computes the shared secret K_alice = pow(B, a, p) # Bob computes the shared secret K_bob = pow(A, b, p) # Finally, check that they are the same print(K_alice == K_bob)
True

And that's it! Alice and Bob have a shared secret, which they can use to send messages.

Attacking Diffie-Hellman: a generic approach

We're going to consider two attacks on Diffie-Hellman key exchange: the first is a brute force attack on the system, while the second is an attack which works if pp is a weak Diffie-Hellman prime. In both cases, we're actually going to attack the discrete logarithm problem directly. That is, given gag^a for some unknown aa, find aa.

The simplest brute force attack, of course, is simply exhaustion. That is, just try every value until we find a hit. Let's try this on a particularly small example.

p = previous_prime(2^20) Fp = GF(p) g = Fp.multiplicative_generator() # Alice's public key, based on an "unknown" random number A = g^(randint(2, p-1)) # Now let's find it def exhaust(g, A): for a in range(2, g.multiplicative_order()): if g^a == A: return a %time print(A, g^exhaust(g,A))
(871540, 871540) CPU times: user 659 ms, sys: 29.7 ms, total: 689 ms Wall time: 687 ms

Try this now: try the above with a larger prime, say one with 30 bits. How much harder is it?

n = 30 p = random_prime(2^n) # Fill in the rest here...

There are better brute-force attacks which use memory to reduce the amount of computation required. We will talk about one particular one, called baby-step giant-step, which works as follows:

  • Compute m=p1m=\lfloor\sqrt{p-1}\rfloor.

  • For e=0,,m1e=0,\ldots,m-1, compute and store gemodpg^e\mod p along with the index ee.

  • For f=0,,m1f=0,\ldots,m-1, compute AgfmmodpA*g^{-fm}\mod p, and check if it's in the list above. If it is, stop, and output the pair (e,f)(e,f).

If the above produces an output, we know that gegafmmodpg^e \equiv g^{a-fm}\mod p, i.e. eafmmodp1e \equiv a-fm\mod p-1, i.e. ae+fmmodp1a \equiv e+fm\mod p-1. Moreover, since we can always write a<p1a<p-1 as e+fme+fm for e,f<me,f < m, this algorithm should always output.

Now, let's try it out.

p=4294967291 # a 32-bit prime Fp = GF(p) g = Fp.multiplicative_generator() # Construct random public key as before A = g^randint(0,p-1) def bsgs(g, A): # Step 1 m = floor(sqrt(g.multiplicative_order())) # Step 2 small_g_powers = {g^e:e for e in range(m)} # Step 3 gpow = A mult = g^(-m) for f in range(m): if gpow in small_g_powers: return small_g_powers[gpow]+f*m gpow *= mult %time print(A, g^bsgs(g, A))
(1786304327, 1786304327) CPU times: user 307 ms, sys: 28.6 ms, total: 335 ms Wall time: 357 ms

Note that we were able to break a 32-bit problem faster than the exhaustive approach broke a 20-bit problem. Try going a little higher, say a 40-bit prime.

Attacking Diffie-Hellman: weak primes

Finally, let's look at one attack involving a potential weakness of the prime pp, namely that p1p-1 has smaller factors. Assume p1=q1q2p-1 = q_1*q_2 for q1,q2q_1, q_2 relatively prime factors of size p\approx \sqrt{p}. Since xp1modp=1x^{p-1}\mod p = 1 for any xx, (xq1)q2modp=1(x^{q_1})^{q_2}\mod p = 1 for any xx. That is, if we only look at elements with are q1q_1-powers of something, they have order q2q_2. An analogous thing happens if we switch q1q_1 and q2q_2.

We use this to recover amodq1a\mod q_1 and amodq2a\mod q_2, from which we can use the Chinese Remainder Theorem to recover aa.

  • Find the discrete logarithm of Aq1A^{q_1} with respect to gq1g^{q_1} using one of the algorithms above, i.e. a2a_2 such that ga2q1=Aq1g^{a_2*q_1} = A^{q_1}. This implies that a2q1aq1modq1a_2*q_1 \equiv a*q_1\mod q-1, i.e. a2amodq2a_2\equiv a\mod q_2.

  • Find the discrete logarithm of Aq2A^{q_2} with respect to gq2g^{q_2} using one of the algorithms above, i.e. a1a_1 such that ga1q2=Aq2g^{a_1*q_2} = A^{q_2}. This implies that a1q2aq2modq1a_1*q_2 \equiv a*q_2\mod q-1, i.e. a1amodq1a_1\equiv a\mod q_1.

  • Compute a=CRT(a1,a2,q1,q2)a = CRT(a_1, a_2, q_1, q_2).

# Let's find a particularly nice p of the appropriate form q_1 = previous_prime(2**32) for q_2 in range(2**32-1000, 2**32): p = q_1*q_2+1 if is_prime(p): break print "p = ", p print "p-1 = ", factor(p-1)
p = 18446739937656050359 p-1 = 2 * 3^2 * 13^2 * 1411889 * 4294967291
Fp = GF(p) g = Fp.multiplicative_generator() # Make Alice's public key as usual A = g^(randint(1,p-1)) def crt_attack(g, A, q_1, q_2): a_2 = bsgs(g^q_1, A^q_1) a_1 = bsgs(g^q_2, A^q_2) return crt(a_1, a_2, q_1, q_2) %time print A, g^crt_attack(g, A, q_1, q_2)
CPU times: user 2 µs, sys: 0 ns, total: 2 µs Wall time: 5.01 µs 12352333673032137476 12352333673032137476

Note that the Chinese remainder theorem works with any number of relatively prime moduli, so we can work with all of the prime factors of p1p-1 independently. In particular, no matter how large pp is, if p1p-1 can be factored into small values, we can use this technique to quickly attack the discrete logarithm problem modulo pp.

As a result, when choosing a prime pp for Diffie-Hellman, it is important to first try to factor pp to make sure p1p-1 does not have small prime factors.