
## Permutation Patterns (Combinatorics)

The theory of permutation patterns gives a notion of when a longer permutation "contains" or "avoids" a shorter permutation. Early results in this field go back to Percy MacMahon in the early 1900s, although it was popularized by Donald Knuth when he discovered that permutation patterns are related to "stack-sortable" sequences (a stack is a computer science data structure similar to a list, except that only the most recently added element can be accessed).

Many mathematicians enjoy studying enumerative or computational aspects of permutation patterns and statistics. For example, the number of length $n$ permutations that avoid a single pattern of length 3 is the $n$-th Catalan number, and determining whether one permutation contains another is known to be an NP-complete problem. Students choosing this project can select any of these aspects to focus on for their presentation; for more information, see here: https://en.wikipedia.org/wiki/Permutation_pattern
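The Catalan claim above can be checked by brute force for small $n$. The following is only a minimal sketch in plain Python (it should also run in a Sage cell); the function names `contains`, `count_avoiders`, and `catalan` are made up for this illustration, and the exhaustive search is far too slow for large $n$.

```python
from itertools import combinations, permutations
from math import comb


def contains(perm, pattern):
    """Return True if some subsequence of `perm` has the same relative order as `pattern`."""
    k = len(pattern)
    for positions in combinations(range(len(perm)), k):
        values = [perm[i] for i in positions]
        # Compare every pair of entries: the subsequence matches when each pair
        # is ordered the same way as the corresponding pair in the pattern.
        if all((values[a] < values[b]) == (pattern[a] < pattern[b])
               for a in range(k) for b in range(a + 1, k)):
            return True
    return False


def count_avoiders(n, pattern):
    """Count the permutations of 1..n that avoid `pattern` (exhaustive search)."""
    return sum(1 for p in permutations(range(1, n + 1)) if not contains(p, pattern))


def catalan(n):
    """The n-th Catalan number, binomial(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


for n in range(1, 7):
    print(n, count_avoiders(n, (1, 3, 2)), catalan(n))
```

Swapping `(1, 3, 2)` for any other pattern of length 3 is an easy experiment, and trying patterns of length 4 shows that the counts then depend on the pattern chosen.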

## Pandemic Modelling (Differential Equations)

The SIR model of a pandemic splits a population into three groups: Susceptible, Infected, and Recovered (hence the name of the model). The model then presents a simple system of differential equations for how these populations change over time, as a function of several parameters regarding the disease in question. Models such as this allow statisticians and epidemiologists to make recommendations and predictions regarding the evolution of a pandemic, as well as the effects that certain policies may have on this evolution.
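As a rough illustration, the standard form of the SIR model is $S' = -\beta S I / N$, $I' = \beta S I / N - \gamma I$, $R' = \gamma I$, where $N = S + I + R$ is the total population. The sketch below steps these equations forward with a plain forward-Euler loop so that it has no dependencies. The values of $\beta$ (transmission rate) and $\gamma$ (recovery rate), the population sizes, and the function name `simulate_sir` are assumptions chosen for illustration, not estimates for any real disease.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Forward-Euler simulation of the SIR equations.

    Returns a list of (time, S, I, R) tuples. All parameter values are
    supplied by the caller; nothing here is fitted to real data.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / n   # rate at which S flows into I
        new_recoveries = gamma * i          # rate at which I flows into R
        s -= new_infections * dt
        i += (new_infections - new_recoveries) * dt
        r += new_recoveries * dt
        history.append((step * dt, s, i, r))
    return history


# Hypothetical outbreak: 1000 people, 10 of them initially infected.
history = simulate_sir(s0=990, i0=10, r0=0, beta=0.3, gamma=0.1, days=160)
peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"peak of about {peak_infected:.0f} infected near day {peak_time:.0f}")
```

Forward Euler is the crudest possible method and is used here only to keep the example short; a project would more likely hand the same equations to one of Sage's differential equation solvers and plot the three curves.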

SageMath has numerous methods for solving differential equations, and hence provides an opportunity to do basic pandemic modeling. This project will be an introduction to this idea; students choosing this project should have a basic idea of differential equations. They can use this knowledge to provide analysis (and visuals) of how pandemics spread over time. For more information, see here: https://jaydaigle.net/blog/the-sir-model-of-epidemics/

## Logistic Regression (Statistics/Modelling)

Logistic regression provides a basic way of predicting the probability of a certain event happening. For example, maybe you know an individual's race, age, economic status, and state of residence, and want to predict whether or not they will vote for a certain candidate in an upcoming election. Regressions such as this are very useful in guiding policy decisions, since they allow the discovery of certain correlations between data points. For example, maybe advertising on billboards does not seem to correlate with an increase in a customer's probability of buying my product, but advertising on YouTube does correlate with an increase in that probability; hence I should consider advertising more on YouTube. Logistic regressions can also be used as a basic classifying algorithm when there are two groups to choose from.
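To make the idea concrete, here is a minimal sketch of a one-variable logistic regression fitted by gradient descent in plain Python. The data set (hours studied versus passing an exam) is entirely made up for this example, and the names `fit_logistic` and `predict_probability`, the learning rate, and the number of epochs are likewise illustrative choices; a real project would probably use a statistics library instead.

```python
import math


def sigmoid(z):
    """The logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals (predicted probability minus observed label) drive both gradients.
        residuals = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(res * x for res, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict_probability(x, w, b):
    return sigmoid(w * x + b)


# Made-up data: hours studied, and whether the student passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)
print(f"w = {w:.2f}, b = {b:.2f}")
print(f"P(pass | 1 hour)  = {predict_probability(1.0, w, b):.2f}")
print(f"P(pass | 4 hours) = {predict_probability(4.0, w, b):.2f}")
```

Thresholding the predicted probability at 0.5 turns the fitted model into the two-group classifier mentioned above.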

Students choosing this project could discuss the theory of logistic regressions and then carry out an example of logistic regression on a concrete data set. Students can choose a data set of their own to examine, work with me or a TA to find one, or use a made-up data set regarding grad school admissions which I will provide. For more info on logistic regressions, see here: https://www.youtube.com/watch?v=yIYKR4sgzI8

## K-means Clustering (Machine Learning)

The $K$-means algorithm is an unsupervised machine learning algorithm which takes a data set and assigns each point to one of $K$ clusters. This can be very useful for data cleaning, data analysis, or machine learning. The algorithm works by first selecting $K$ random "center points" in the data and assigning each point to the cluster of its nearest center point. It then iteratively moves each center point to the average of the points assigned to it and repeats this process until stable clusters have formed.
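The loop described above can be sketched in a few lines of plain Python. This is only an illustrative version: the helper names, the fixed random seed, the choice to initialize the centers at randomly chosen data points, and the tiny made-up data set are all assumptions of this sketch, and library implementations use more careful initialization.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iterations=100, seed=0):
    """Return (centers, labels) after running the basic K-means iteration."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the clusters have formed
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave a center in place if its cluster is empty
                centers[c] = mean_point(members)
    return centers, labels


# Made-up data: two visibly separated groups of points in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centers, labels = k_means(points, k=2)
print(labels)
print(centers)
```

One common way to measure how well the clustering worked is the total squared distance from each point to its assigned center, which can be computed with `squared_distance` and compared across different values of $K$.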

Students selecting this project should discuss some of the basic ideas of clustering in machine learning, and then should visualize the $K$-means algorithm and discuss how to measure the "success" of the algorithm. Time permitting, they could then give basic applications, such as to image compression. For more info on this algorithm, see here: https://www.youtube.com/watch?v=4b5d3muPQmA

## Error Correcting Codes (Coding/Information Theory)

Suppose you send a binary message to your friend over a noisy channel. This means that every bit in your message has some probability $p$ of "flipping" (changing from 0 to 1 or from 1 to 0) during transmission. How can you add redundancy into your message so that your friend can recover your message, even if some errors are introduced? For instance, if you want to send the message "1" to your friend, you could simply repeat the message three times: send "111". This allows recovery from a single error using "majority decoding": if you receive "101" you still decode to "1", since there are more 1s in the message than 0s.
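The repetition code and majority decoding described above can be simulated directly. The sketch below is plain Python; the function names, the flip probability `p=0.1`, and the sample message are illustrative assumptions.

```python
import random


def encode(bits, repeats=3):
    """Repetition code: send every bit `repeats` times in a row."""
    return [bit for bit in bits for _ in range(repeats)]


def noisy_channel(bits, p, rng):
    """Flip each bit independently with probability p."""
    return [bit ^ 1 if rng.random() < p else bit for bit in bits]


def decode(received, repeats=3):
    """Majority decoding: each block of `repeats` bits decodes to its most common value."""
    decoded = []
    for start in range(0, len(received), repeats):
        block = received[start:start + repeats]
        decoded.append(1 if sum(block) > len(block) / 2 else 0)
    return decoded


rng = random.Random(2024)  # fixed seed so the run is repeatable
message = [1, 0, 1, 1, 0, 0, 1, 0]
received = noisy_channel(encode(message), p=0.1, rng=rng)
recovered = decode(received)
errors = sum(a != b for a, b in zip(message, recovered))
print("sent:     ", message)
print("recovered:", recovered)
print("bit errors after decoding:", errors)
```

The price of this protection is that the message becomes three times longer; codes such as Hamming codes are interesting largely because they correct errors with much less redundancy.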

The theory of error-correcting codes is a widely studied area of math and computer science. These codes are used in information storage (writing information to DVDs/CDs) and to efficiently communicate with satellites (there is a lot of noise in space). A student choosing this project should give an introduction to the theory of error-correcting codes, and then work with several examples of codes in Sage, such as Hamming codes or Reed-Solomon codes. For more information, this video gives a good introduction: https://www.youtube.com/watch?v=X8jsijhllIA

## Steganography (Cryptography, kinda?)

Steganography is the practice of hiding information in plain sight, such as by hiding a message in the pixel intensities of a photograph. This was somewhat infamously brought up in the news several years ago, as certain terrorist cells were allegedly using this idea to secretly communicate. The ideas here are similar in spirit to cryptography, although steganography doesn't necessarily encode a message; it simply hides it. These ideas can also be

Students selecting this project should discuss the ideas of steganography, and implement one or two basic versions of steganography in SageMath. This could involve importing image files into NumPy arrays, manipulating the arrays, and then reformatting them to produce the desired message. Alternatively, students could decide to examine methods for detecting steganography. This project might be a bit more open-ended and difficult than the previous five, but could be interesting for a motivated student; for more information, see here: https://www.youtube.com/watch?v=TWEXCYQKyDc&t=154s
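One basic version is "least significant bit" hiding: each bit of the message overwrites the lowest bit of one pixel intensity, changing that pixel by at most 1. The sketch below is only an illustration of this one technique. It uses a plain Python list of 8-bit intensities in place of a real image (an actual project would load an image into a NumPy array as described above), and the function names and the synthetic "cover" data are assumptions of this example.

```python
def hide_message(pixels, message):
    """Return a copy of `pixels` with the UTF-8 bits of `message` in the low bits."""
    data = message.encode("utf-8")
    # Unpack each byte into 8 bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message is too long for this image")
    stego = list(pixels)
    for index, bit in enumerate(bits):
        # Clear the lowest bit of the pixel, then set it to the message bit.
        stego[index] = (stego[index] & 0xFE) | bit
    return stego


def reveal_message(pixels, num_bytes):
    """Read `num_bytes` bytes back out of the low bits of `pixels`."""
    bits = [pixel & 1 for pixel in pixels[:8 * num_bytes]]
    data = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return data.decode("utf-8")


# Synthetic stand-in for a grayscale image: 64 intensities between 0 and 255.
cover = [(37 * i) % 256 for i in range(64)]
stego = hide_message(cover, "hi")
print(reveal_message(stego, 2))
print(max(abs(a - b) for a, b in zip(cover, stego)))
```

In this sketch the receiver has to know the message length in advance; storing the length in the first few pixels is one simple way around that.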
