This repository contains the course materials from Math 157: Intro to Mathematical Software.
Creative Commons BY-SA 4.0 license.
License: OTHER
Math 157: Intro to Mathematical Software
UC San Diego, winter 2018
Homework 6: due February 23, 2018
Please enter all answers within this notebook unless otherwise specified. As usual, don't forget to cite sources and collaborators.
Through this problem set, use the SageMath 8.1 kernel except as specified.
This homework consists of 5 problems, each of equal value.
Problem 1: The birthday paradox (and comedy)
Grading criteria: correctness of code and thoroughness of explanations.
1a. A well-known statement of mathematical folklore asserts that if you choose people at random, then the probability that two of them have the same birthday exceeds 50% as soon as . This is sometimes called the birthday paradox not because of any logical inconsistency, but simply because people find this counterintuitive; a more accepted terminology is the birthday problem.
Write a function that, given , computes this probability. For simplicity, you may assume that every year consists of 365 days (i.e., ignore leap years) and that birthdays are distributed uniformly at random.
1b. Verify numerically that the probability exceeds 0.5 if and only if .
The probability that there is a shared birthday is clearly a monotone increasing function with . That is, as we put more and more people into the room, the likelihood of a repeated birthday strictly increases. Thus it suffices to show that the probability with 22 people in the room is less than 50%, and the probability with 23 people in the room is greater than 50%.
1c. If a year were to consist of days, how would the crossover value depend (asympotically) on ? Why?
Here are some heuristics (which could be fleshed out to a full proof) to show that the crossover value will be roughly . For a year with days, let denote the probability that with people in a room there will be a shared birthday. Then we have an exact formula We want , so it suffices to require . But now if is large, and is small compared to , we will have Recalling our approximation to as a limit (https://en.wikipedia.org/wiki/List_of_representations_of_e), this gives Recall we want this to be close to , so we need and after taking logarithms this gives or . Since , we get that the threshold value occurs approximately at .
1d. Explain, in your own words, what the following three things have in common.
The birthday problem.
The baby-step-giant-step attack on discrete logarithms.
The mathematical theory of late-night comedy reported by Slate last week. (Not a joke!)
All three topics can be viewed in the light of a "collision problem" or "collision algorithm." At the most basic level, the birthday problem states that in a year with days, it takes roughly people in a room to expect a "collision," or a pair of people who share a birthday. The baby-step-giant-step attack on discrete logarithms also works in a similar fashion. Namely, in the bsgs algorithm we first create one list consisting of the "baby steps;" (in light of the example in the 02-14 lecture), these are the elements of the form . We then create another list consisting of the "giant steps;" these are the elements of the form . Then we simply compare the two lists and look for a "collision," or a repeated element. Once we find such a pair, we can crack the discrete log problem easily. Finally, the mathematical theory of late night comedy also works in a very similar fashion. For example, when they spoke of creating a joke about Donald Trump and Disney World, they starting creating lists beneath each name, and then looked for "collisions," or connections, between the two lists. Once they found a connection, such as "Trump likes money" and "Scrooge McDuck," they find their joke by associating the two topics.
Problem 2: Chinks in the armor
Grading criteria: correctness of code and mathematical reasoning.
2a. There exist two primes and such that and have the following values. Find and .
Thus factors as , with and as above, ,
2b. The following integer is a product of two primes of roughly equal size (which are known to me but secret to you). The integer given below has the property that for some positive integer not greater than . Find .
So we have .
Problem 3: Elliptic curve Diffie-Hellman
Grading criteria: correctness of code.
3a. The following code defines an elliptic curve (whatever that is!) over for a certain prime . This has the structure of an abelian group; compute the order of this group.
3b. Check that the following element is a generator.
3c. Using and , perform the analogue of a Diffie-Hellman key exchange. Hint: the analogue of modular multiplication is denoted by addition, and the analogue of exponentiation g^n
is denoted n*g
.
Problem 4: Attacking elliptic curve Diffie-Hellman
Grading criteria: correctness of code.
Throughout this problem, use the same elliptic curve and generator as in problem 3.
4a. Take the code given in lecture for the baby-step-giant-step method for computing discrete logarithms, and adapt it to this elliptic curve.
4b. Take the code given in lecture for the Chinese Remainder Theorem method for computing discrete logarithms, and adapt it to this elliptic curve. (Hint: first factor the order of .)
4c. Test your solutions to 4a and 4b using the following code.
Problem 5: A preview of R
Grading criteria: correctness and thoroughness of explanations.
For the following exercise, switch your kernel to "R (R-Project)".
In each of the following cells, run the R commands as indicated, then explain in words:
what is the data being analyzed;
what the code is doing;
one conclusion you drew from the data.
You might want to consult the documentation for the R datasets package, from which these examples were taken.
The data set examines 32 models of car from the 1970's and makes observations on 11 variables: Miles per gallon, number of engine cylinders, displacement, horsepower, axle ratio, weight, 1/4 mile time, type of engine (V or S(traight)), type of transmission, number of gears, and number of carburetors. For each pair of variables, the code gives a correlation plot; for instance, the (1,4) entry shows the relationship between horsepower and MPG. The code then displays all such graphs in a nice grid layout. Something you could learn from these plots is that higher horsepower is correlated with lower miles per gallon (read the (1,4) entry).
The "Orange" data set has data for 5 orange trees. The data set tracks the growth of the tree circumference over time, with seven sample points for each tree over a span of about 4.5 years. The age of the tree is measured in days since 12/31/1968, and the circumference is measured in millimeters (with the measurement being taken at chest height). The coplot function makes a plot for each tree, showing the correlation of height with age. The individual plots are then stacked next to each other.
Something to learn from this data set is that trees get wider as they get older, and that this widening occurs fairly linearly.
Brown | Blue | Hazel | Green | |
---|---|---|---|---|
Black | 68 | 20 | 15 | 5 |
Brown | 119 | 84 | 54 | 29 |
Red | 26 | 17 | 14 | 14 |
Blond | 7 | 94 | 10 | 16 |
The HairEyeColor data set is a data set consisting of observations on 592 students' hair color, eye color, and gender. The data is stored in a three dimensional array. It was originally aggregated over gender, but the male/female split was added in post for pedagogical reasons. The code is first re-aggregated the data over gender. Then the code is doing a mosiacplot, which can be read as follows. Let us examine the rectangle in the second column and fourth row. The width of this rectangle tells us that a large portion of the sample had brown hair. The height of this rectangle tells us that, amongst individuals with brown hair, a minority of them had green eyes.
Something that you could learn from this data set is that a majority of the students that were surveyed had brown hair.