CoCalc -- create-project-groups.ipynb

This repository contains the course materials from Math 157: Intro to Mathematical Software.

Creative Commons BY-SA 4.0 license.

Project: Support and Testing

Path: math157 / final_project_calculation / create-project-groups.ipynb

License: OTHER

Kernel: Python 3 (Ubuntu Linux)

Math 157: Intro to Mathematical Software

UC San Diego, winter 2018

This script creates the final project groups from the Google Form responses. The input was sanitized beforehand to remove duplicate names and to add students who did not fill out the form.

We set a random seed to make this calculation reproducible.

In [1]:

import itertools
import pandas as pd
import numpy as np
import random
random.seed(2718281828)
import os
import shutil
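As a small illustration of why the seed matters, the sketch below draws from two generators seeded with the same value and checks that the sequences agree. The helper name `draw` is hypothetical and not part of the notebook; it uses a private `random.Random` instance rather than the module-level generator that the notebook seeds. Note that agreement is only expected within one Python version, since the algorithms behind methods other than `random()` may change between releases.

```python
import random

def draw(seed, k=5):
    """Return k pseudo-random integers from a private generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(100) for _ in range(k)]

# Same seed, same sequence: this is what makes the group assignment repeatable.
first = draw(2718281828)
second = draw(2718281828)
print(first == second)
```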

Import a CSV file retrieved from Google Docs.

In [2]:

poll = pd.read_csv("Math 157 final project - Sheet1.csv")
poll.describe()

	Timestamp	Your name (as shown in CoCalc)	First choice of topic	Second choice of topic
count	129	149	129	129
unique	129	149	6	6
top	2/28/2018 14:06:51	Jiawei Zhou	Model selection for machine learning	Combinatorial designs
freq	1	1	35	32

We only need the name and the first choice, so we keep those two columns and rename them to Name and Topic. The name will serve as the key later on.

In [3]:

poll = poll[['Your name (as shown in CoCalc)', 'First choice of topic']]
poll = poll.rename(columns={'Your name (as shown in CoCalc)': 'Name', 'First choice of topic': 'Topic'})
poll.describe()

	Name	Topic
count	149	129
unique	149	6
top	Jiawei Zhou	Model selection for machine learning
freq	1	35

Standardize capitalization and whitespace in the names.

In [4]:

poll['Name'] = poll['Name'].map(lambda x: " ".join(x.split()).title())
poll.head()

	Name	Topic
0	Aliaksandr Sumak	NaN
1	An-Vy Hoang	Numerical solution of ODEs
2	Anastasia Guanio	Fourier transforms
3	Andrew Or	Model selection for machine learning
4	Ansh Sancheti	Numerical solution of ODEs
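The normalization step can be tried in isolation. The sketch below wraps the same expression in a hypothetical helper, `normalize_name`, and shows one caveat worth knowing: `str.title()` lowercases every letter that does not start a word, so mixed-case surnames lose their internal capitals. This is harmless here as long as the poll and the roster are normalized in the same way.

```python
def normalize_name(raw):
    """Collapse runs of whitespace and title-case each word, as in the cell above."""
    return " ".join(raw.split()).title()

# Extra spaces and inconsistent capitalization are removed.
print(normalize_name("  an-vy   HOANG "))

# Caveat: internal capitals are lost.
print(normalize_name("Ronald McDonald"))
```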

To make sure that the input was sanitized correctly, compare with a CSV file of course grades generated by CoCalc.

In [5]:

roster = pd.read_csv("export_math157_master.csv")
roster = roster[['Name', 'Email']]
roster.describe()

	Name	Email
count	149	149
unique	149	149
top	Jiawei Zhou	[email protected]
freq	1	1

Standardize capitalization and whitespace in the names.

In [6]:

roster['Name'] = roster['Name'].map(lambda x: " ".join(x.split()).title())
roster.head()

	Name	Email
0	Aliaksandr Sumak	[email protected]
1	An-Vy Hoang	[email protected]
2	Anastasia Guanio	[email protected]
3	Andrew Or	[email protected]
4	Ansh Sancheti	[email protected]

Check that the names in the poll and in the roster match. Resolve any mismatch reported here before using the results!

In [7]:

poll_names = poll['Name']
roster_names = roster['Name']

print("Names in final project poll not found in course roster")
print(poll_names[~poll_names.isin(roster_names)])

print("Names in course roster not found in final project poll")
print(roster_names[~roster_names.isin(poll_names)])

Names in final project poll not found in course roster
76    Megan Chang
Name: Name, dtype: object
Names in course roster not found in final project poll
144    [email protected] Chang
Name: Name, dtype: object
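The same consistency check can be phrased with plain Python sets, which makes it easy to turn into an assertion. This is a minimal sketch with made-up names; `compare_names` is a hypothetical helper, not something defined in the notebook.

```python
def compare_names(poll_names, roster_names):
    """Return (names only in the poll, names only in the roster), each sorted."""
    poll_set, roster_set = set(poll_names), set(roster_names)
    return sorted(poll_set - roster_set), sorted(roster_set - poll_set)

# Illustrative data only: one name is spelled differently in the two lists.
poll_demo = ["Ada Lovelace", "Alan Turing", "Grace Hopper"]
roster_demo = ["Ada Lovelace", "Alan Turing", "Grace M. Hopper"]

only_in_poll, only_in_roster = compare_names(poll_demo, roster_demo)
print(only_in_poll)
print(only_in_roster)
```

When both lists come back empty, the poll and the roster agree and it is safe to proceed.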

Convert the poll results into a dictionary.

In [8]:

d = poll.set_index('Name').to_dict()['Topic']

Remove predefined groups before processing the rest of the list.

In [9]:

presets = [['Zhanning Gu', 'Conghan Wang', 'Junyi Li']]
d2 = {}
for i in presets:
    for j in i:
        d2[j] = d[j]
        del d[j]

Separately, count the number of instances of each topic, including nan (blank).

In [10]:

topic_counts = {s:list(d.values()).count(s) for s in set(d.values())}
topic_counts

{'Combinatorial designs': 17,
 'Fourier transforms': 12,
 'Linear feedback shift registers': 6,
 'Model selection for machine learning': 35,
 'Numerical solution of ODEs': 29,
 'Permutation groups': 29,
 nan: 18}

To make the groups evenly sized, we add a dummy topic whose count pads the number of students up to a multiple of the group size. As with real topics, no two dummies may land in the same group.

In [11]:

n = len(d.keys())
group_size = 4
num_dummies = (-n) % group_size
topic_counts['Dummy topic'] = num_dummies
n += num_dummies
num_groups = n // group_size
print(n, num_dummies, num_groups)

148 2 37
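The padding arithmetic relies on Python's modulo operator returning a non-negative result for a positive divisor, so `(-n) % group_size` is the number of seats needed to reach the next multiple of `group_size`. A short sketch using the numbers printed above (146 students left after the preset group is removed, groups of 4); the helper name `padding_needed` is hypothetical:

```python
def padding_needed(n, group_size):
    """Number of dummy entries needed to round n up to a multiple of group_size."""
    return (-n) % group_size

students = 146                       # 148 minus the 2 dummies printed above
dummies = padding_needed(students, 4)
groups = (students + dummies) // 4
print(students + dummies, dummies, groups)
```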

In [12]:

topics = list(topic_counts.keys())
topics.sort(key = lambda x: -topic_counts[x])
topics.remove(np.nan)
topics

['Model selection for machine learning',
 'Numerical solution of ODEs',
 'Permutation groups',
 'Combinatorial designs',
 'Fourier transforms',
 'Linear feedback shift registers',
 'Dummy topic']

Distribute topics among the groups, from most to least popular (skipping the blanks). No topic may be assigned more than once within a group. To assign a particular topic, we repeatedly draw a group at random, weighting each group by the number of empty slots it has, and exclude a group once it has received the topic.

An attempt can fail when no eligible group with an empty slot remains. In that case we discard the attempt and start over.

In [13]:

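while True:
    l = [[] for _ in range(num_groups)]
    try:
        for i in topics:
            # One entry per empty slot, so emptier groups are more likely to be drawn.
            m = []
            for j in range(num_groups):
                m += [j for _ in range(group_size - len(l[j]))]
            for _ in range(topic_counts[i]):
                k = random.choice(m)  # raises IndexError when m is empty
                l[k].append(i)
                # A group may receive each topic at most once.
                m = [k1 for k1 in m if k1 != k]
    except IndexError:
        print("Retrying")
    else:
        break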
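The procedure described above can also be written as a self-contained function, which makes the retry logic explicit and easy to test on small inputs. This is a sketch under stated assumptions, not the notebook's own code: the function name `distribute_topics`, the `rng` argument, and the `max_attempts` limit are all introduced here for illustration.

```python
import random

def distribute_topics(topic_counts, num_groups, group_size, rng, max_attempts=1000):
    """Place each topic into distinct groups, favouring groups with more free slots."""
    # Most popular topics first, as in the notebook.
    topics = sorted(topic_counts, key=lambda t: -topic_counts[t])
    for _attempt in range(max_attempts):
        groups = [[] for _ in range(num_groups)]
        try:
            for topic in topics:
                # One ticket per free slot: emptier groups are drawn more often.
                tickets = [g for g in range(num_groups)
                           for _ in range(group_size - len(groups[g]))]
                for _copy in range(topic_counts[topic]):
                    g = rng.choice(tickets)  # IndexError if no ticket is left
                    groups[g].append(topic)
                    # Remove the chosen group so it cannot get this topic twice.
                    tickets = [t for t in tickets if t != g]
        except IndexError:
            continue  # dead end: discard this attempt and start over
        return groups
    raise RuntimeError("no valid assignment found")

# Toy input: 7 topic slots spread over 3 groups of size 3.
demo_groups = distribute_topics({"A": 3, "B": 2, "C": 2}, 3, 3, random.Random(0))
print(demo_groups)
```

Passing an explicit `random.Random` instance keeps the sketch independent of the module-level generator state that the notebook relies on.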

Pad the remaining groups with blanks.

In [14]:

for g in l:
    g += ['' for _ in range(group_size - len(g))]

Fill students in at random according to their chosen topics.

In [15]:

for g in l:
    for i in range(len(g)-1,-1,-1):
        if g[i] == "Dummy topic":
            del g[i]
            continue
        if g[i] in topics:
            candidates = [j for j in d if d[j] == g[i]]
        elif g[i] == '':
            candidates = [j for j in d if d[j] is np.nan]
        else: continue
        x = random.choice(candidates)
        g[i] = x
        d2[x] = d[x]
        del d[x]

Add back the preassigned groups.

In [16]:

l += presets
d.update(d2)

For each group, assign each student who did not choose a topic to a random choice.

In [17]:

topics.remove("Dummy topic")
for g in l:
    for i in g:
        if d[i] is np.nan:
            m = [j for j in topics if not j in [d[i1] for i1 in g]]
            d[i] = random.choice(m)
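Before publishing the result, it may be worth verifying the two invariants the procedure is meant to guarantee: every student appears in exactly one group, and no group contains the same topic twice. The checker below is a hedged sketch; `check_groups` and the demo data are hypothetical. A preset group is not formed by the random procedure, so it could legitimately be reported here if its members chose the same topic.

```python
from collections import Counter

def check_groups(groups, topic_of):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    # Each student should appear in exactly one group.
    appearances = Counter(name for g in groups for name in g)
    for name, count in appearances.items():
        if count > 1:
            problems.append(f"{name} appears in {count} groups")
    # No topic should repeat within a group.
    for idx, g in enumerate(groups, start=1):
        repeated = [t for t, c in Counter(topic_of[name] for name in g).items() if c > 1]
        if repeated:
            problems.append(f"group {idx} repeats topics: {repeated}")
    return problems

# Illustrative data only: the second group repeats a topic.
groups_demo = [["Ann", "Bob"], ["Cat", "Dan"]]
topic_demo = {"Ann": "Fourier transforms", "Bob": "Permutation groups",
              "Cat": "Fourier transforms", "Dan": "Fourier transforms"}
problems = check_groups(groups_demo, topic_demo)
print(problems)
```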

Print the groups to a file.

In [20]:

# The with-block closes the file even if a write fails.
with open("final_project_groups.md", "w") as f:
    f.write("The final project groups are listed below. In the shared project, you will find a folder called `final_project_workspaces`, and within that a subfolder called `group_xx` where `xx` is the number of your group (padded to 2 digits if necessary). You may use this as a shared workspace to coordinate; in particular, I have created a chat room for you to use to discuss. Remember, please let me know if you fail to contact one of your group members.\n\n")
    f.write("For those who answered the Google Form before Saturday 5:30pm, I have assigned you your first choice. For others, I have assigned you a topic at random.\n\n")
    for i, group in enumerate(l):
        f.write("Group " + str(i+1) + ":\n")
        for j in group:
            f.write("- " + j + ": " + (d[j] if d[j] is not np.nan else "(no topic chosen)") + "\n")
        f.write("\n")
    f.write("\n")

Create a workspace for each project. Make sure to edit the template chat room as appropriate before doing this.

After running this command, move the folder final_project_workspaces into the shared project.

In [19]:

os.makedirs('final_project_workspaces', exist_ok=True)
# Note: num_groups counts only the randomly formed groups; the preset group
# appended to l afterwards is not covered by this loop.
for i in range(1, num_groups+1):
    num = str(i).zfill(2)  # pad to 2 digits; assumes at most 99 groups
    s = 'final_project_workspaces/group_' + num
    os.makedirs(s, exist_ok=True)
    shutil.copyfile('template_chat.sage-chat', s + '/group_' + num + '_chat.sage-chat')
