Analysing the Edinburgh Fringe Festival Jokes
This is the IPython notebook for the blog post: Python, natural language processing and predicting funny.
Here are the libraries we are going to need:
Loading and tidying the data
Index | Author | Rank | Raw_joke | Year |
---|---|---|---|---|
0 | Tim Vine | 1 | I've decided to sell my Hoover... well it was ... | 2014 |
1 | Masai Graham | 2 | I've written a joke about a fat badger but I c... | 2014 |
10 | Rob Auton | 1 | I heard a rumour that Cadbury is bringing out ... | 2013 |
11 | Alex Horne | 2 | I used to work in a shoe-recycling shop. It wa... | 2013 |
12 | Alfie Moore | 3 | I'm in a same-sex marriage... the sex is alway... | 2013 |
Index | Author | Rank | Raw_joke | Year |
---|---|---|---|---|
59 | Simon Brodkin | 10 | I started so many fights at my school - I had ... | 2009 |
6 | Scott Capurro | 7 | Scotland had oil but it's running out thanks t... | 2014 |
7 | Jason Cook | 8 | I've been married for 10 years I haven't made ... | 2014 |
8 | Felicity Ward | 9 | This show is about perception and perspective.... | 2014 |
9 | Masai Graham | 2 | I've written a joke about a fat badger but I c... | 2013 |
Getting rid of the common words and tokenising the jokes
Index | Author | Rank | Raw_joke | Year | Joke |
---|---|---|---|---|---|
0 | Tim Vine | 1 | I've decided to sell my Hoover... well it was ... | 2014 | [DECIDED, SELL, HOOVER, WELL, COLLECTING, DUST] |
1 | Masai Graham | 2 | I've written a joke about a fat badger but I c... | 2014 | [WRITTEN, JOKE, FAT, BADGER, COULDN, FIT, SET] |
10 | Rob Auton | 1 | I heard a rumour that Cadbury is bringing out ... | 2013 | [HEARD, RUMOUR, CADBURY, BRINGING, ORIENTAL, C... |
11 | Alex Horne | 2 | I used to work in a shoe-recycling shop. It wa... | 2013 | [USED, WORK, SHOE, RECYCLING, SHOP, SOLE, DEST... |
12 | Alfie Moore | 3 | I'm in a same-sex marriage... the sex is alway... | 2013 | [SEX, MARRIAGE, SEX, ALWAYS] |
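The cell that produced the `Joke` column is not reproduced here. As a rough sketch only, stop-word removal and tokenising could look like the following. The names `COMMON_WORDS` and `tokenise` are hypothetical, and the stop-word list is a deliberately tiny stand-in for whatever list the notebook actually used.

```python
import re

# Hypothetical, deliberately short stop-word list; a real one would be much longer.
COMMON_WORDS = {"THE", "NO", "A", "I", "IS", "IN", "TO", "MY", "IT"}

def tokenise(raw_joke):
    """Upper-case the joke, split it on anything that is not a letter,
    and drop the common words."""
    words = re.findall(r"[A-Z]+", raw_joke.upper())
    return [word for word in words if word not in COMMON_WORDS]

print(tokenise("The universe implodes. No matter."))
# ['UNIVERSE', 'IMPLODES', 'MATTER']
```

Splitting on non-letters is one way to explain tokens such as `COULDN` in the table above, where an apostrophe has cut a word in two.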
Training our classifier
From here on in we use the jokes up until 2013 as the training set.
We start by getting the entire set of words in all the jokes from the training set.
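A minimal sketch of this step, assuming each training joke is already a list of tokens. The variable `training_jokes` is a hypothetical stand-in for the training rows of the `Joke` column.

```python
# Hypothetical token lists standing in for training rows of the Joke column.
training_jokes = [
    ["UNIVERSE", "IMPLODES", "MATTER"],
    ["KNOW", "FAT", "HUG", "CHILD", "GETS", "LOST"],
    ["SEX", "MARRIAGE", "SEX", "ALWAYS"],
]

# A set keeps each word once, however often it appears across the jokes.
all_words = set()
for joke in training_jokes:
    all_words.update(joke)

print(len(all_words))  # 12: SEX appears twice but is counted once
```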
Creating a function to extract features from a given joke
Index | Author | Rank | Raw_joke | Year | Joke | Features |
---|---|---|---|---|---|---|
0 | Tim Vine | 1 | I've decided to sell my Hoover... well it was ... | 2014 | [DECIDED, SELL, HOOVER, WELL, COLLECTING, DUST] | {u'contains(DUST)': False, u'contains(COLLECTI... |
1 | Masai Graham | 2 | I've written a joke about a fat badger but I c... | 2014 | [WRITTEN, JOKE, FAT, BADGER, COULDN, FIT, SET] | {u'contains(SET)': True, u'contains(WRITTEN)':... |
10 | Rob Auton | 1 | I heard a rumour that Cadbury is bringing out ... | 2013 | [HEARD, RUMOUR, CADBURY, BRINGING, ORIENTAL, C... | {u'contains(ORIENTAL)': True, u'contains(CHOCO... |
11 | Alex Horne | 2 | I used to work in a shoe-recycling shop. It wa... | 2013 | [USED, WORK, SHOE, RECYCLING, SHOP, SOLE, DEST... | {u'contains(DESTROYING)': True, u'contains(SOL... |
12 | Alfie Moore | 3 | I'm in a same-sex marriage... the sex is alway... | 2013 | [SEX, MARRIAGE, SEX, ALWAYS] | {u'contains(MARRIAGE)': True, u'contains(SEX)'... |
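The `Features` column maps keys of the form `contains(WORD)` to booleans. One way such a function could be written is sketched below. The name `extract_features` and the small vocabulary are assumptions for illustration, not the notebook's own code.

```python
def extract_features(joke, all_words):
    """Map every vocabulary word to whether the tokenised joke contains it."""
    joke_words = set(joke)  # set membership tests are cheap
    return {"contains(%s)" % word: (word in joke_words) for word in all_words}

# Hypothetical, tiny vocabulary for illustration.
all_words = {"UNIVERSE", "IMPLODES", "MATTER", "FAT", "CHILD"}
features = extract_features(["KNOW", "FAT", "HUG", "CHILD", "GETS", "LOST"], all_words)

print(features["contains(FAT)"], features["contains(MATTER)"])  # True False
```

Words in the joke that are missing from the vocabulary (here `KNOW`, `HUG`, `GETS`, `LOST`) produce no feature at all, so every joke ends up with the same set of keys.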
Labelling our jokes depending on what will be deemed as funny
Index | Author | Rank | Raw_joke | Year | Joke | Features | Funny |
---|---|---|---|---|---|---|---|
0 | Tim Vine | 1 | I've decided to sell my Hoover... well it was ... | 2014 | [DECIDED, SELL, HOOVER, WELL, COLLECTING, DUST] | {u'contains(DUST)': False, u'contains(COLLECTI... | True |
1 | Masai Graham | 2 | I've written a joke about a fat badger but I c... | 2014 | [WRITTEN, JOKE, FAT, BADGER, COULDN, FIT, SET] | {u'contains(SET)': True, u'contains(WRITTEN)':... | True |
10 | Rob Auton | 1 | I heard a rumour that Cadbury is bringing out ... | 2013 | [HEARD, RUMOUR, CADBURY, BRINGING, ORIENTAL, C... | {u'contains(ORIENTAL)': True, u'contains(CHOCO... | True |
11 | Alex Horne | 2 | I used to work in a shoe-recycling shop. It wa... | 2013 | [USED, WORK, SHOE, RECYCLING, SHOP, SOLE, DEST... | {u'contains(DESTROYING)': True, u'contains(SOL... | True |
12 | Alfie Moore | 3 | I'm in a same-sex marriage... the sex is alway... | 2013 | [SEX, MARRIAGE, SEX, ALWAYS] | {u'contains(MARRIAGE)': True, u'contains(SEX)'... | True |
13 | Tim Vine | 4 | My friend told me he was going to a fancy dres... | 2013 | [FRIEND, TOLD, GOING, FANCY, DRESS, PARTY, ITA... | {u'contains(GOING)': True, u'contains(PARTY)':... | True |
14 | Gary Delaney | 5 | I can give you the cause of anaphylactic shock... | 2013 | [GIVE, CAUSE, ANAPHYLACTIC, SHOCK, NUTSHELL] | {u'contains(ANAPHYLACTIC)': True, u'contains(N... | True |
15 | Phil Wang | 6 | The Pope is a lot like Doctor Who. He never di... | 2013 | [POPE, LOT, LIKE, DOCTOR, NEVER, DIES, KEEPS, ... | {u'contains(REPLACED)': True, u'contains(NEVER... | False |
16 | Marcus Brigstocke | 7 | You know you are fat when you hug a child and ... | 2013 | [KNOW, FAT, HUG, CHILD, GETS, LOST] | {u'contains(LOST)': True, u'contains(CHILD)': ... | False |
17 | Liam Williams | 8 | The universe implodes. No matter. | 2013 | [UNIVERSE, IMPLODES, MATTER] | {u'contains(MATTER)': True, u'contains(IMPLODE... | False |
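In the rows shown, jokes ranked 1 to 5 are labelled `True` and jokes ranked 6 and below are labelled `False`, which suggests a rank cutoff of 5. The sketch below assumes exactly that; `is_funny` and its `threshold` parameter are hypothetical names.

```python
def is_funny(rank, threshold=5):
    """A joke counts as funny if it ranked at or above the threshold
    (rank 1 is the best)."""
    return rank <= threshold

ranks = [1, 2, 3, 4, 5, 6, 7, 8]
print([is_funny(rank) for rank in ranks])
# [True, True, True, True, True, False, False, False]
```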
Creating a labeled feature
Index | Author | Rank | Raw_joke | Year | Joke | Features | Funny | Labeled_Feature |
---|---|---|---|---|---|---|---|---|
0 | Tim Vine | 1 | I've decided to sell my Hoover... well it was ... | 2014 | [DECIDED, SELL, HOOVER, WELL, COLLECTING, DUST] | {u'contains(DUST)': False, u'contains(COLLECTI... | True | ({u'contains(DUST)': False, u'contains(COLLECT... |
1 | Masai Graham | 2 | I've written a joke about a fat badger but I c... | 2014 | [WRITTEN, JOKE, FAT, BADGER, COULDN, FIT, SET] | {u'contains(SET)': True, u'contains(WRITTEN)':... | True | ({u'contains(SET)': True, u'contains(WRITTEN)'... |
10 | Rob Auton | 1 | I heard a rumour that Cadbury is bringing out ... | 2013 | [HEARD, RUMOUR, CADBURY, BRINGING, ORIENTAL, C... | {u'contains(ORIENTAL)': True, u'contains(CHOCO... | True | ({u'contains(ORIENTAL)': True, u'contains(CHOC... |
11 | Alex Horne | 2 | I used to work in a shoe-recycling shop. It wa... | 2013 | [USED, WORK, SHOE, RECYCLING, SHOP, SOLE, DEST... | {u'contains(DESTROYING)': True, u'contains(SOL... | True | ({u'contains(DESTROYING)': True, u'contains(SO... |
12 | Alfie Moore | 3 | I'm in a same-sex marriage... the sex is alway... | 2013 | [SEX, MARRIAGE, SEX, ALWAYS] | {u'contains(MARRIAGE)': True, u'contains(SEX)'... | True | ({u'contains(MARRIAGE)': True, u'contains(SEX)... |
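Judging by the opening parenthesis in the `Labeled_Feature` column, each entry appears to be a tuple that pairs the feature dictionary with the label. A sketch with made-up values:

```python
# Hypothetical feature dictionary and label for a single joke.
features = {"contains(MATTER)": True, "contains(FAT)": False}
funny = False

# The pair (features, label) is the unit a classifier is trained on.
labeled_feature = (features, funny)

print(labeled_feature[1])  # False
```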
Creating our classifier
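The training cell is not reproduced here. The `contains(...)` feature dictionaries follow the convention of NLTK's naive Bayes examples, so the notebook may well have used that library, but this is an assumption. To stay dependency-free, the sketch below hand-rolls a small naive Bayes classifier with add-one smoothing over `(features, label)` pairs. The function names and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_features):
    """Count labels and feature values per label from (features, label) pairs."""
    label_counts = Counter()
    value_counts = defaultdict(Counter)  # (label, feature name) -> Counter of values
    for features, label in labeled_features:
        label_counts[label] += 1
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
    return label_counts, value_counts

def classify(model, features):
    """Return the label with the highest log-posterior under add-one smoothing."""
    label_counts, value_counts = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        for name, value in features.items():
            seen = value_counts[(label, name)][value]
            # Boolean features take two possible values, hence the + 2.
            score += math.log((seen + 1) / (count + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, not taken from the joke set.
training = [
    ({"contains(DUST)": True, "contains(MATTER)": False}, True),
    ({"contains(DUST)": True, "contains(MATTER)": False}, True),
    ({"contains(DUST)": False, "contains(MATTER)": True}, False),
    ({"contains(DUST)": False, "contains(MATTER)": True}, False),
]
model = train_naive_bayes(training)
print(classify(model, {"contains(DUST)": True, "contains(MATTER)": False}))  # True
```

The smoothing matters with so few jokes: without it, a word never seen alongside a label would drive that label's probability to zero.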
The real test comes from applying our classifier to this year's jokes
Index | Raw_joke | Funny | Prediction |
---|---|---|---|
0 | I've decided to sell my Hoover... well it was ... | True | False |
1 | I've written a joke about a fat badger but I c... | True | True |
2 | Always leave them wanting more my uncle used t... | True | True |
3 | I was given some Sudoku toilet paper. It didn'... | True | False |
4 | I wanted to do a show about feminism. But my h... | True | False |
5 | Money can't buy you happiness? Well check this... | False | False |
6 | Scotland had oil but it's running out thanks t... | False | True |
7 | I've been married for 10 years I haven't made ... | False | True |
8 | This show is about perception and perspective.... | False | True |
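As a worked check, the accuracy over the nine rows displayed above can be counted directly. The two lists are copied from the `Funny` and `Prediction` columns. If the test set holds more rows than are shown, the overall figure may differ.

```python
# Funny and Prediction columns for rows 0 to 8 of the table above.
funny = [True, True, True, True, True, False, False, False, False]
prediction = [False, True, True, False, False, False, True, True, True]

# A prediction is correct when it matches the label.
correct = sum(f == p for f, p in zip(funny, prediction))
print(correct, len(funny), round(correct / len(funny), 2))  # 3 9 0.33
```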
Wrapping all of the above in a function to see if we can identify how our classifier performs based on a funniness threshold
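The wrapper itself is not reproduced here. A sketch of its general shape, under stated assumptions: jokes arrive as `(year, rank, features)` tuples, labels come from a rank threshold, and the split is by year. To keep the example short, a majority-class baseline stands in for the real classifier. All names and data are hypothetical.

```python
from collections import Counter

def majority_trainer(labeled_features):
    """Stand-in 'classifier': always predicts the most common training label."""
    most_common = Counter(label for _, label in labeled_features).most_common(1)[0][0]
    return lambda features: most_common

def accuracy_for_threshold(jokes, threshold, test_year, train=majority_trainer):
    """Label jokes by rank threshold, train on earlier years, score on test_year."""
    labeled = [(year, (features, rank <= threshold)) for year, rank, features in jokes]
    training = [pair for year, pair in labeled if year < test_year]
    testing = [pair for year, pair in labeled if year == test_year]
    predict = train(training)
    hits = sum(predict(features) == label for features, label in testing)
    return hits / len(testing)

# Hypothetical data: ranks 1 to 10 in each of three years, with empty feature dicts.
jokes = [(year, rank, {}) for year in (2012, 2013, 2014) for rank in range(1, 11)]
for threshold in (3, 5, 7):
    print(threshold, accuracy_for_threshold(jokes, threshold, test_year=2014))
```

The baseline is worth keeping in mind when reading accuracies: with a threshold of 3, always answering "not funny" already scores 0.7 on this toy data, so a classifier has to beat that to be useful.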
Wrapping everything in another function to see the effect of the testing data set
So far we used previous years to train for this year. Here we instead train on random samples of the data, of various sizes.
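One way to draw such a sample is to shuffle the rows and cut them at a given ratio. In the sketch below, `random_split`, `ratio` and `seed` are hypothetical names, and the integers are stand-ins for joke rows.

```python
import random

def random_split(items, ratio, seed=None):
    """Shuffle a copy of items; the first `ratio` fraction becomes the training set."""
    rng = random.Random(seed)  # a seed makes the split repeatable
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]

# 60 stand-in items, split three quarters training to one quarter testing.
training, testing = random_split(range(60), ratio=0.75, seed=0)
print(len(training), len(testing))  # 45 15
```

Because each split is random, the accuracy for a given ratio varies from run to run, so averaging over several splits gives a steadier picture.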
Here is a plot of the accuracy for varying ratios.
Here are all of the above on a single plot (not terribly helpful).