This repository contains the course materials from Math 157: Intro to Mathematical Software.

Creative Commons BY-SA 4.0 license.

Kernel: Python 3 (Ubuntu Linux)

Math 157: Intro to Mathematical Software

UC San Diego, winter 2018

This script creates the final project groups from the Google Form responses. The input was sanitized beforehand to remove duplicate names and to add the students who did not fill out the form.

We set a random seed to make this calculation reproducible.

import itertools
import pandas as pd
import numpy as np
import random
random.seed(2718281828)
import os
import shutil
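As a minimal sketch of why seeding matters here: two generators created with the same seed produce the same sequence of draws, so the group assignment can be reproduced later. The helper `draw` and the use of a private `random.Random` instance are illustrative assumptions; the notebook itself seeds the module-level generator. Note that seeding `random` does not seed NumPy's separate generator, which this notebook does not draw from.

```python
import random


def draw(seed, k=5):
    """Return k pseudo-random integers from a generator seeded with `seed`.

    `draw` is a hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # private generator; global state is untouched
    return [rng.randrange(100) for _ in range(k)]


first = draw(2718281828)
second = draw(2718281828)

# Same seed, same sequence of draws.
print(first == second)  # True
```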

Import a CSV file retrieved from Google Docs.

poll = pd.read_csv("Math 157 final project - Sheet1.csv")
poll.describe()

        Timestamp           Your name (as shown in CoCalc)  First choice of topic                 Second choice of topic
count   129                 149                             129                                   129
unique  129                 149                             6                                     6
top     2/28/2018 14:06:51  Jiawei Zhou                     Model selection for machine learning  Combinatorial designs
freq    1                   1                               35                                    32

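We only need the name and the first choice, so we keep those two columns and rename them to Name and Topic. The name serves as the index later, when the poll is converted to a dictionary.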

poll = poll[['Your name (as shown in CoCalc)', 'First choice of topic']]
poll = poll.rename(columns={'Your name (as shown in CoCalc)': 'Name',
                            'First choice of topic': 'Topic'})
poll.describe()

        Name         Topic
count   149          129
unique  149          6
top     Jiawei Zhou  Model selection for machine learning
freq    1            35

Standardize capitalization and whitespace in the names.

poll['Name'] = poll['Name'].map(lambda x: " ".join(x.split()).title())
poll.head()

   Name              Topic
0  Aliaksandr Sumak  NaN
1  An-Vy Hoang       Numerical solution of ODEs
2  Anastasia Guanio  Fourier transforms
3  Andrew Or         Model selection for machine learning
4  Ansh Sancheti     Numerical solution of ODEs
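To illustrate what the normalization does, here is a small sketch applying the same expression to a few made-up inputs. The function name `normalize_name` and the sample strings are assumptions for illustration. `split()` with no arguments collapses any run of whitespace, and `title()` capitalizes the first letter of each word while lowercasing the rest, which also flattens interior capitals.

```python
def normalize_name(raw):
    """Collapse whitespace and apply title case, as in the notebook's lambda."""
    return " ".join(raw.split()).title()


# Hypothetical raw inputs with stray whitespace and inconsistent capitalization.
examples = ["  an-vy   HOANG ", "andrew\tor", "McDonald"]
cleaned = [normalize_name(s) for s in examples]

for before, after in zip(examples, cleaned):
    print(repr(before), "->", repr(after))
```

Because the same function is applied to both the poll and the roster, a side effect such as "McDonald" becoming "Mcdonald" does not affect the comparison between the two lists.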

To make sure that the input was sanitized correctly, compare with a CSV file of course grades generated by CoCalc.

roster = pd.read_csv("export_math157_master.csv")
roster = roster[['Name', 'Email']]
roster.describe()

        Name         Email
count   149          149
unique  149          149
top     Jiawei Zhou  [email protected]
freq    1            1

Standardize capitalization and whitespace in the names.

roster['Name'] = roster['Name'].map(lambda x: " ".join(x.split()).title())
roster.head()

   Name              Email
0  Aliaksandr Sumak  [email protected]
1  An-Vy Hoang       [email protected]
2  Anastasia Guanio  [email protected]
3  Andrew Or         [email protected]
4  Ansh Sancheti     [email protected]

Check that everything matches up. Make sure this is valid before using the results!

poll_names = poll['Name']
roster_names = roster['Name']
print("Names in final project poll not found in course roster")
print(poll_names[~poll_names.isin(roster_names)])
print("Names in course roster not found in final project poll")
print(roster_names[~roster_names.isin(poll_names)])

Names in final project poll not found in course roster
76    Megan Chang
Name: Name, dtype: object
Names in course roster not found in final project poll
144    [email protected] Chang
Name: Name, dtype: object
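The same two-way check can be sketched without pandas using set differences. This is only an illustration under stated assumptions: the function `compare_names` and the sample names are hypothetical and are not part of the course data.

```python
def compare_names(poll_names, roster_names):
    """Return (names only in the poll, names only in the roster), each sorted."""
    poll_set, roster_set = set(poll_names), set(roster_names)
    return sorted(poll_set - roster_set), sorted(roster_set - poll_set)


# Hypothetical lists in which one name is spelled differently in each source.
poll_only, roster_only = compare_names(
    ["Ada Lovelace", "Alan Turing", "Emmy Noether"],
    ["Ada Lovelace", "Alan Turing", "Emmy Nother"],
)
print("In poll but not roster:", poll_only)
print("In roster but not poll:", roster_only)
```

Two empty lists mean that the poll and the roster agree; a single misspelling shows up once in each list, as in the output above.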

Convert the poll results into a dictionary.

d = poll.set_index('Name').to_dict()['Topic']

Remove predefined groups before processing the rest of the list.

presets = [['Zhanning Gu', 'Conghan Wang', 'Junyi Li']]
d2 = {}
for i in presets:
    for j in i:
        d2[j] = d[j]
        del d[j]

Separately, count the number of instances of each topic, including nan (blank).

topic_counts = {s: list(d.values()).count(s) for s in set(d.values())}
topic_counts

{'Combinatorial designs': 17,
 'Fourier transforms': 12,
 'Linear feedback shift registers': 6,
 'Model selection for machine learning': 35,
 'Numerical solution of ODEs': 29,
 'Permutation groups': 29,
 nan: 18}
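A common alternative for this kind of tally is `collections.Counter`; the sketch below uses made-up answers rather than the course data. One subtlety applies to both approaches: `nan` is not equal to itself, so blanks are grouped under one key only when every blank is the same object. The sketch assumes that by using a single shared `NO_CHOICE` value (a hypothetical name); the single `nan` key in the output above is consistent with the same situation in the notebook.

```python
from collections import Counter

# One shared object standing in for every blank answer (assumption for this sketch).
NO_CHOICE = float("nan")

# Hypothetical answers, not taken from the course data.
answers = [
    "Fourier transforms",
    NO_CHOICE,
    "Permutation groups",
    "Fourier transforms",
    NO_CHOICE,
]

counts = Counter(answers)
for topic, count in counts.most_common():
    print(topic, count)
```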

To make the groups evenly sized, we add a dummy topic whose count pads the number of students up to a multiple of the group size. As with real topics, no two dummies may end up in the same group.

n = len(d.keys())
group_size = 4
num_dummies = (-n) % group_size
topic_counts['Dummy topic'] = num_dummies
n += num_dummies
num_groups = n // group_size
print(n, num_dummies, num_groups)

148 2 37

topics = list(topic_counts.keys())
topics.sort(key = lambda x: -topic_counts[x])
topics.remove(np.nan)
topics

['Model selection for machine learning',
 'Numerical solution of ODEs',
 'Permutation groups',
 'Combinatorial designs',
 'Fourier transforms',
 'Linear feedback shift registers',
 'Dummy topic']
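The padding arithmetic can be checked in isolation. In Python, `(-n) % group_size` is the smallest non-negative number that brings `n` up to a multiple of `group_size`. The sketch below wraps it in a hypothetical helper, `pad_to_groups`, and applies it to the 146 students left after removing the 3 preset names from the 149 on the roster.

```python
def pad_to_groups(n, group_size):
    """Return (number of dummies, padded total, number of groups)."""
    num_dummies = (-n) % group_size  # 0 when n is already a multiple
    total = n + num_dummies
    return num_dummies, total, total // group_size


# 149 students minus the 3 preset names leaves 146 to distribute.
print(pad_to_groups(146, 4))  # (2, 148, 37)
```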

Distribute topics among the groups, from most to least popular (skipping the blanks). No topic may be assigned more than once within a group. To assign a particular topic, we choose a random subset of groups where each group is weighted by the number of empty spaces it has.

This can fail: partway through a topic, there may be no remaining group that has an empty slot and has not already received that topic. In that case, we discard the attempt and try again.

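while True:
    l = [[] for _ in range(num_groups)]
    try:
        for i in topics:
            # m lists each group index once per empty slot, so groups with
            # more free space are more likely to be chosen.
            m = []
            for j in range(num_groups):
                m += [j for _ in range(group_size - len(l[j]))]
            for _ in range(topic_counts[i]):
                k = random.choice(m)  # raises IndexError if m is empty
                l[k].append(i)
                # Remove group k so this topic is not assigned to it twice.
                m = [k1 for k1 in m if k1 != k]
    except IndexError:
        # Ran out of eligible groups: start over with empty groups.
        print("Retrying")
    else:
        # Every topic was placed: stop retrying.
        break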
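The following is a self-contained sketch of the same idea on a toy input, written as a function so that it can be tested. The name `distribute`, the `max_tries` limit, and the explicit `rng` argument are assumptions added for illustration; the notebook uses the module-level generator and retries without a limit.

```python
import random


def distribute(topic_counts, num_groups, group_size, rng, max_tries=1000):
    """Place topics into groups so that no group receives a topic twice.

    Groups are chosen with probability proportional to their empty slots.
    Hypothetical helper for illustration; not the notebook's exact code.
    """
    topics = sorted(topic_counts, key=lambda t: -topic_counts[t])
    for _ in range(max_tries):
        groups = [[] for _ in range(num_groups)]
        try:
            for topic in topics:
                # One "ticket" per empty slot in each group.
                tickets = [
                    g
                    for g in range(num_groups)
                    for _ in range(group_size - len(groups[g]))
                ]
                for _ in range(topic_counts[topic]):
                    g = rng.choice(tickets)  # IndexError if no tickets remain
                    groups[g].append(topic)
                    tickets = [t for t in tickets if t != g]
        except IndexError:
            continue  # discard this attempt and start over
        return groups
    raise RuntimeError("no valid assignment found within max_tries attempts")


toy_counts = {"A": 3, "B": 2, "C": 2}  # made-up topic counts
toy_groups = distribute(toy_counts, num_groups=3, group_size=3, rng=random.Random(0))
print(toy_groups)
```

Rebuilding the ticket list after every draw is quadratic in the worst case, which is harmless for a few dozen groups.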

Pad the remaining groups with blanks.

for g in l:
    g += ['' for _ in range(group_size - len(g))]

Fill students in at random according to their chosen topics.

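for g in l:
    # Iterate backwards so that deleting a dummy entry does not shift
    # the indices that are still to be visited.
    for i in range(len(g)-1, -1, -1):
        if g[i] == "Dummy topic":
            del g[i]
            continue
        if g[i] in topics:
            candidates = [j for j in d if d[j] == g[i]]
        elif g[i] == '':
            candidates = [j for j in d if d[j] is np.nan]
        else:
            continue
        x = random.choice(candidates)
        g[i] = x
        # Move the chosen student from d to d2 so they are not picked again.
        d2[x] = d[x]
        del d[x]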
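As a rough sketch of this filling step on toy data: each slot holds a topic label (or `''` for a blank), and it is replaced by a randomly chosen, not yet used student whose choice matches. The function `fill_groups`, the use of `None` for "no choice", and the sample names are assumptions for illustration only.

```python
import random


def fill_groups(groups, choice, rng):
    """Replace each topic slot with a random unused student who chose it.

    `groups` is a list of lists of topic labels, with '' for blank slots.
    `choice` maps student name to topic, or to None for no choice.
    Hypothetical helper; assumes enough students exist for every slot.
    """
    remaining = dict(choice)  # copy, so the caller's dict is unchanged
    filled = []
    for slots in groups:
        members = []
        for slot in slots:
            wanted = None if slot == "" else slot
            candidates = [name for name, t in remaining.items() if t == wanted]
            name = rng.choice(candidates)
            members.append(name)
            del remaining[name]  # each student is placed exactly once
        filled.append(members)
    return filled


# Made-up students and topics.
toy_choice = {"P": "A", "Q": "A", "R": "B", "S": None, "T": None}
toy_slots = [["A", "B", ""], ["A", ""]]
toy_filled = fill_groups(toy_slots, toy_choice, random.Random(1))
print(toy_filled)
```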

Add back the preassigned groups.

l += presets
d.update(d2)

Within each group, assign every student who did not choose a topic a random topic that no other member of the group already has.

topics.remove("Dummy topic")
for g in l:
    for i in g:
        if d[i] is np.nan:
            # Topics not yet taken by anyone in this group.
            m = [j for j in topics if j not in [d[i1] for i1 in g]]
            d[i] = random.choice(m)
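The step above can be sketched for a single group as follows. The helper `assign_missing`, the use of `None` for "no choice", and the topic letters are illustrative assumptions. With 6 real topics and at most 4 students per group, as in this notebook, at least one unused topic is always available.

```python
import random


def assign_missing(group_topics, all_topics, rng):
    """Give each None entry a random topic not already used in the group.

    Hypothetical helper; `group_topics` holds one entry per student.
    """
    result = list(group_topics)
    for idx, topic in enumerate(result):
        if topic is None:
            free = [t for t in all_topics if t not in result]
            result[idx] = rng.choice(free)
    return result


all_topics = ["A", "B", "C", "D", "E", "F"]  # made-up topic labels
before = ["A", None, "C", None]
after = assign_missing(before, all_topics, random.Random(2))
print(after)
```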

Print the groups to a file.

with open("final_project_groups.md", "w") as f:
    f.write("The final project groups are listed below. In the shared project, you will find "
            "a folder called `final_project_workspaces`, and within that a subfolder called "
            "`group_xx` where `xx` is the number of your group (padded to 2 digits if necessary). "
            "You may use this as a shared workspace to coordinate; in particular, I have created "
            "a chat room for you to use to discuss. Remember, please let me know if you fail to "
            "contact one of your group members.\n\n")
    f.write("For those who answered the Google Form before Saturday 5:30pm, I have assigned "
            "you your first choice. For others, I have assigned you a topic at random.\n\n")
    for i in range(len(l)):
        f.write("Group " + str(i+1) + ":\n")
        for j in l[i]:
            f.write("- " + j + ": " + (d[j] if d[j] is not np.nan else "(no topic chosen)") + "\n")
        f.write("\n")
    f.write("\n")
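To show the shape of the generated listing without touching the file system, here is a sketch that writes to an in-memory buffer. The function `write_groups` and the sample names and topics are hypothetical; any object with a `write` method, including an open file, could be passed as `out`.

```python
import io


def write_groups(out, groups, topic_of):
    """Write a Markdown list of groups, numbered from 1, to `out`."""
    for number, members in enumerate(groups, start=1):
        out.write(f"Group {number}:\n")
        for name in members:
            out.write(f"- {name}: {topic_of[name]}\n")
        out.write("\n")


# Made-up groups and topics.
sample_groups = [["Ada Lovelace", "Alan Turing"], ["Emmy Noether"]]
sample_topics = {
    "Ada Lovelace": "Fourier transforms",
    "Alan Turing": "Permutation groups",
    "Emmy Noether": "Combinatorial designs",
}

buffer = io.StringIO()
write_groups(buffer, sample_groups, sample_topics)
listing = buffer.getvalue()
print(listing)
```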

Create a workspace for each project. Make sure to edit the template chat room as appropriate before doing this.

After running this command, move the folder final_project_workspaces into the shared project.

if not os.path.exists('final_project_workspaces'):
    os.makedirs('final_project_workspaces')
# Use len(l) rather than num_groups: l also contains the preset groups
# appended above, and those are numbered in the listing as well.
for i in range(1, len(l)+1):
    group_id = str(i).zfill(2)  # Pads to 2 digits; assumes at most 99 groups
    s = 'final_project_workspaces/group_' + group_id
    if not os.path.exists(s):
        os.makedirs(s)
    shutil.copyfile('template_chat.sage-chat', s + '/group_' + group_id + '_chat.sage-chat')
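The folder layout can be tried out safely in a temporary directory before running it for real. The sketch below is an illustration under stated assumptions: `make_workspaces` is a hypothetical helper, the template is an empty stand-in file, and `os.makedirs(..., exist_ok=True)` replaces the explicit existence checks.

```python
import os
import shutil
import tempfile


def make_workspaces(root, num_groups, template):
    """Create root/group_XX folders, each with a copy of the chat template."""
    created = []
    for i in range(1, num_groups + 1):
        group_id = str(i).zfill(2)  # 1 -> "01", 12 -> "12"
        folder = os.path.join(root, "group_" + group_id)
        os.makedirs(folder, exist_ok=True)  # also creates root if needed
        target = os.path.join(folder, "group_" + group_id + "_chat.sage-chat")
        shutil.copyfile(template, target)
        created.append(target)
    return created


with tempfile.TemporaryDirectory() as tmp:
    # Empty stand-in for the real chat template.
    template_path = os.path.join(tmp, "template_chat.sage-chat")
    with open(template_path, "w") as handle:
        handle.write("")
    created = make_workspaces(os.path.join(tmp, "final_project_workspaces"), 3, template_path)
    created_names = [os.path.basename(p) for p in created]
    all_exist = all(os.path.exists(p) for p in created)

print(created_names)
```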