Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutSign UpSign In
| Download

This repository contains the course materials from Math 157: Intro to Mathematical Software.

Creative Commons BY-SA 4.0 license.

Views: 3037
License: OTHER
Kernel: R (R-Project)

Math 157: Intro to Mathematical Software

UC San Diego, winter 2018

Homework 6: due February 23, 2018

Please enter all answers within this notebook unless otherwise specified. As usual, don't forget to cite sources and collaborators.

Through this problem set, use the SageMath 8.1 kernel except as specified.

This homework consists of 5 problems, each of equal value.

Problem 1: The birthday paradox (and comedy)

Grading criteria: correctness of code and thoroughness of explanations.

1a. A well-known statement of mathematical folklore asserts that if you choose NN people at random, then the probability that two of them have the same birthday exceeds 50% as soon as N23N \geq 23. This is sometimes called the birthday paradox not because of any logical inconsistency, but simply because people find this counterintuitive; a more accepted terminology is the birthday problem.

Write a function that, given NN, computes this probability. For simplicity, you may assume that every year consists of 365 days (i.e., ignore leap years) and that birthdays are distributed uniformly at random.

def birthday(n): if (n<0): raise ValueError elif n>365: return(1.0) else: prob_diff=factorial(365)/(365^n*factorial(365-n)) return(1.0-prob_diff)

1b. Verify numerically that the probability exceeds 0.5 if and only if N23N \geq 23.

The probability that there is a shared birthday is clearly a monotone increasing function with NN. That is, as we put more and more people into the room, the likelihood of a repeated birthday strictly increases. Thus it suffices to show that the probability with 22 people in the room is less than 50%, and the probability with 23 people in the room is greater than 50%.

print('With 22 people in the room: '+str(birthday(22))) print('With 23 people in the room: '+str(birthday(23)))
With 22 people in the room: 0.475695307662550 With 23 people in the room: 0.507297234323985

1c. If a year were to consist of MM days, how would the crossover value depend (asympotically) on NN? Why?

Here are some heuristics (which could be fleshed out to a full proof) to show that the crossover value will be roughly M\sqrt{M}. For a year with MM days, let p(N)p(N) denote the probability that with NN people in a room there will be a shared birthday. Then we have an exact formula p(N)=1(11/M)(12/M)(1(N1)/M)p(N)=1-(1-1/M)(1-2/M)\dots(1-(N-1)/M) We want p(N)1/2p(N)\approx 1/2, so it suffices to require (11/M)(12/M)(1(N1)/M1/2(1-1/M)(1-2/M)\dots(1-(N-1)/M\approx 1/2. But now if MM is large, and NN is small compared to MM, we will have (11/M)(12/M)(1(N1)/M)(11/d)(11/d)(11/d)=(11/d)n(n1)/2. (1-1/M)(1-2/M)\dots(1-(N-1)/M)\approx (1-1/d)(1-1/d)\dots(1-1/d)=(1-1/d)^{n(n-1)/2}. Recalling our approximation to ee as a limit (https://en.wikipedia.org/wiki/List_of_representations_of_e), this gives (11/M)(12/M)(1(N1)/M)(11/M)N(N1)/2eN(N1)/(2M) (1-1/M)(1-2/M)\dots(1-(N-1)/M)\approx (1-1/M)^{N(N-1)/2}\approx e^{-N(N-1)/(2M)} Recall we want this to be close to 1/21/2, so we need eN(N1)/(2M)1/2, e^{-N(N-1)/(2M)}\approx 1/2, and after taking logarithms this gives N(N1)/2Mln(2) N(N-1)/2M\approx \ln(2) or N(N1)2Mln(2)N(N-1)\approx 2M\ln(2). Since N(N1)N2N(N-1)\approx N^2, we get that the threshold value occurs approximately at N2Mln(2)N\approx \sqrt{2M\ln(2)}.

1d. Explain, in your own words, what the following three things have in common.

All three topics can be viewed in the light of a "collision problem" or "collision algorithm." At the most basic level, the birthday problem states that in a year with MM days, it takes roughly M\sqrt{M} people in a room to expect a "collision," or a pair of people who share a birthday. The baby-step-giant-step attack on discrete logarithms also works in a similar fashion. Namely, in the bsgs algorithm we first create one list consisting of the "baby steps;" (in light of the example in the 02-14 lecture), these are the elements of the form geg^e. We then create another list consisting of the "giant steps;" these are the elements of the form AgkmA\cdot g^{-km}. Then we simply compare the two lists and look for a "collision," or a repeated element. Once we find such a pair, we can crack the discrete log problem easily. Finally, the mathematical theory of late night comedy also works in a very similar fashion. For example, when they spoke of creating a joke about Donald Trump and Disney World, they starting creating lists beneath each name, and then looked for "collisions," or connections, between the two lists. Once they found a connection, such as "Trump likes money" and "Scrooge McDuck," they find their joke by associating the two topics.

Problem 2: Chinks in the armor

Grading criteria: correctness of code and mathematical reasoning.

2a. There exist two primes pp and qq such that N=pqN = pq and φ(N)\varphi(N) have the following values. Find pp and qq.

N = 46708822349259203361344297606349795508624596894610331963475250447878049614343188469958138667702672064437158794030506161936018920993393470466541259138024527655091597122885936534640983967743981020417066596244043571854706943422478922714341547167502363866097396753119154853150662020146962118536726679429506821989 ph = 46708822349259203361344297606349795508624596894610331963475250447878049614343188469958138667702672064437158794030506161936018920993393470466541259138024513978876893776281066371789960571608641635171633767722385559809330963145405657507385159096521621672648416769760458924356578637127097073558518163200222200556
p = var('p') soln = solve([p^2+N+p*(ph-N-1)==0],p) print(soln)
[ p == 6612519666394911182064823512383224653251775061023179506819961214021544640933519912090015249482762731324741709337294603613013844038465174236651822063374327, p == 7063695036951693688098027511012910686133470371805342151192084161958732432331687044298055731259430717655241649358634190470369175826579803971864407221247107 ]
a=6612519666394911182064823512383224653251775061023179506819961214021544640933519912090015249482762731324741709337294603613013844038465174236651822063374327 b=7063695036951693688098027511012910686133470371805342151192084161958732432331687044298055731259430717655241649358634190470369175826579803971864407221247107 a*b==N
True

Thus NN factors as N=abN=a\cdot b, with aa and bb as above, a=66125...a=66125..., b=70636...b=70636...

2b. The following integer NN is a product of two primes of roughly equal size (which are known to me but secret to you). The integer tt given below has the property that M3t(modN)M^3 \equiv t \pmod{N} for some positive integer MM not greater than 1010510^{105}. Find MM.

N=76423640358847412652374294300302041459529359752680987485518411448531451335119684258103920661719583680308606457161396261605016691913851993440982486450916973319097247600145790676526404213081447781619033665382062984788563242010572490285667221071318294609426838188283710453878988160806557791285882029204772981529 t=11807826974296406142491583468052817190645455044270507694732514014928570915157636104659010509935579145791187971616680786969637105700382114579562649014536639676091007952025381665220916825703280184232877376453652231909120222974982306195030361794694545267682687918232040294524659748461552986872564797138673247308
i=1 while True: i+=1 #print(i) if i%10000==0: print(i) S=t+i*N if (S**(1/3))==ceil(S**(1/3)): print(S**(1/3),i) break
10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 110000 120000 (214035639750189648224019619384655479933054780209521387766158744707529839948106277822537351209299245170833, 128301)
mod(214035639750189648224019619384655479933054780209521387766158744707529839948106277822537351209299245170833^3,N)==mod(t,N)
True

So we have M=214035639750189648224019619384655479933054780209521387766158744707529839948106277822537351209299245170833M=214035639750189648224019619384655479933054780209521387766158744707529839948106277822537351209299245170833.

Problem 3: Elliptic curve Diffie-Hellman

Grading criteria: correctness of code.

3a. The following code defines an elliptic curve (whatever that is!) over Fp\mathbb{F}_p for a certain prime pp. This has the structure of an abelian group; compute the order of this group.

p = 6501010882155498913 E = EllipticCurve(GF(p), [0,1,0,1,-1])
print(E.order())
6501010884453029318

3b. Check that the following element is a generator.

g = E(4808979935694435458, 2167227988161099418)
print(g.order()==E.order()) #A basic fact of group theory is that if the order of an element equals the order of the group, then that element must be #a generator for the group.
True

3c. Using EE and gg, perform the analogue of a Diffie-Hellman key exchange. Hint: the analogue of modular multiplication is denoted by addition, and the analogue of exponentiation g^n is denoted n*g.

g+g+g, 3*g
# Your code goes here a = randint(0, p-1) # Alice's secret b = randint(0, p-1) # Bob's secret #The fact that we are working over the elliptic curve E and using the generator g is completely public information: p = 6501010882155498913 E = EllipticCurve(GF(p), [0,1,0,1,-1]) g = E(4808979935694435458, 2167227988161099418) ###### #Alice and Bob both multiply g by their respective secrets. A=a*g B=b*g #Alice and Bob compute their respective shared secretes K_alice=a*B K_bob=b*A #Hopefully they are the same: K_alice==K_bob
True

Problem 4: Attacking elliptic curve Diffie-Hellman

Grading criteria: correctness of code.

Throughout this problem, use the same elliptic curve and generator as in problem 3.

4a. Take the code given in lecture for the baby-step-giant-step method for computing discrete logarithms, and adapt it to this elliptic curve.

#This function solves x*h=A for x; in otherwords, h will be the "base" of the logarithm, and A will be the target multiple of h. def elliptic_bsgs(h,A): # Step 1 m = floor(sqrt(h.order())) # Step 2 small_h_powers = {e*h:e for e in range(m)} # Step 3 hpow = A mult = m*(-h) for f in range(m): if hpow in small_h_powers: return small_h_powers[hpow]+f*m hpow += mult p = 2430985831 E = EllipticCurve(GF(p), [0,1,0,1,-1]) g = E(1142679369, 725589986) #Testing it out: if it works correctly, running elliptic_bsgs(g,a*g) should return a. print(elliptic_bsgs(g,43*g)) print(elliptic_bsgs(g,598*g))
43 598

4b. Take the code given in lecture for the Chinese Remainder Theorem method for computing discrete logarithms, and adapt it to this elliptic curve. (Hint: first factor the order of EE.)

factor(E.order())
2 * 1613 * 753547
#For the crt attack to work, we only need q_1 and q_2 to be coprime. Thus we can simply set q_1=2*1613 and q_2=753547. q_1=2*1613 q_2=753547 def elliptic_crt(h, A, q_1, q_2): a_2 = elliptic_bsgs(q_1*h, q_1*A) a_1 = elliptic_bsgs(q_2*h, q_2*A) return crt(a_1, a_2, q_1, q_2)
#Just some sanity tests, this is all subsumed in 4c anyways, so it's not too important: for i in range(2): k=randint(0, g.order()-1) print(k,elliptic_bsgs(g,k*g),elliptic_crt(g,k*g,q_1,q_2),k*g==elliptic_crt(g,k*g,q_1,q_2)*g)
(872626738, 872626738, 872626738, True) (1015039810, 1015039810, 1015039810, True)
4c. Test your solutions to 4a and 4b using the following code.
p = 2430985831 E = EllipticCurve(GF(p), [0,1,0,1,-1]) g = E(1142679369, 725589986) m = randint(0, p-1) A = m*g print(m) print(g.discrete_log(A)) print(elliptic_bsgs(g,A)) print(elliptic_crt(g,A,q_1,q_2))
2244216661 2244216661 2244216661 2244216661

Problem 5: A preview of R

Grading criteria: correctness and thoroughness of explanations.

For the following exercise, switch your kernel to "R (R-Project)".

In each of the following cells, run the R commands as indicated, then explain in words:

  • what is the data being analyzed;

  • what the code is doing;

  • one conclusion you drew from the data.

You might want to consult the documentation for the R datasets package, from which these examples were taken.

pairs(mtcars)
Image in a Jupyter notebook

The data set examines 32 models of car from the 1970's and makes observations on 11 variables: Miles per gallon, number of engine cylinders, displacement, horsepower, axle ratio, weight, 1/4 mile time, type of engine (V or S(traight)), type of transmission, number of gears, and number of carburetors. For each pair of variables, the code gives a correlation plot; for instance, the (1,4) entry shows the relationship between horsepower and MPG. The code then displays all such graphs in a nice grid layout. Something you could learn from these plots is that higher horsepower is correlated with lower miles per gallon (read the (1,4) entry).

require(stats); require(graphics) coplot(circumference ~ age | Tree, data = Orange, show.given = FALSE)
Image in a Jupyter notebook

The "Orange" data set has data for 5 orange trees. The data set tracks the growth of the tree circumference over time, with seven sample points for each tree over a span of about 4.5 years. The age of the tree is measured in days since 12/31/1968, and the circumference is measured in millimeters (with the measurement being taken at chest height). The coplot function makes a plot for each tree, showing the correlation of height with age. The individual plots are then stacked next to each other.

Something to learn from this data set is that trees get wider as they get older, and that this widening occurs fairly linearly.

x <- apply(HairEyeColor, c(1, 2), sum) x mosaicplot(x)
BrownBlueHazelGreen
Black 6820 15 5
Brown11984 54 29
Red 2617 14 14
Blond 794 10 16
Image in a Jupyter notebook

The HairEyeColor data set is a data set consisting of observations on 592 students' hair color, eye color, and gender. The data is stored in a three dimensional array. It was originally aggregated over gender, but the male/female split was added in post for pedagogical reasons. The code is first re-aggregated the data over gender. Then the code is doing a mosiacplot, which can be read as follows. Let us examine the rectangle in the second column and fourth row. The width of this rectangle tells us that a large portion of the sample had brown hair. The height of this rectangle tells us that, amongst individuals with brown hair, a minority of them had green eyes.

Something that you could learn from this data set is that a majority of the students that were surveyed had brown hair.