{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "# Analysing the Edinburgh Fringe Festival Jokes\n", "\n", "**This is the ipython notebook for the blog post: [Python, natural language processing and predicting funny](http://vknight.org/unpeudemath/code/2015/06/14/natural-language-and-predicting-funny/)**.\n", "\n", "Here are the libraries we are going to need:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "import pandas # To handle our data nicely\n", "import nltk # For all the clever stuff" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "## Loading and tidying the data" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "

|  | Author | Rank | Raw_joke | Year |
|---|---|---|---|---|
| 0 | Tim Vine | 1 | I've decided to sell my Hoover... well it was ... | 2014 |
| 1 | Masai Graham | 2 | I've written a joke about a fat badger but I c... | 2014 |
| 10 | Rob Auton | 1 | I heard a rumour that Cadbury is bringing out ... | 2013 |
| 11 | Alex Horne | 2 | I used to work in a shoe-recycling shop. It wa... | 2013 |
| 12 | Alfie Moore | 3 | I'm in a same-sex marriage... the sex is alway... | 2013 |


|  | Author | Rank | Raw_joke | Year |
|---|---|---|---|---|
| 59 | Simon Brodkin | 10 | I started so many fights at my school - I had ... | 2009 |
| 6 | Scott Capurro | 7 | Scotland had oil but it's running out thanks t... | 2014 |
| 7 | Jason Cook | 8 | I've been married for 10 years I haven't made ... | 2014 |
| 8 | Felicity Ward | 9 | This show is about perception and perspective.... | 2014 |
| 9 | Masai Graham | 2 | I've written a joke about a fat badger but I c... | 2013 |


|  | Author | Rank | Raw_joke | Year | Joke |
|---|---|---|---|---|---|
| 0 | Tim Vine | 1 | I've decided to sell my Hoover... well it was ... | 2014 | [DECIDED, SELL, HOOVER, WELL, COLLECTING, DUST] |
| 1 | Masai Graham | 2 | I've written a joke about a fat badger but I c... | 2014 | [WRITTEN, JOKE, FAT, BADGER, COULDN, FIT, SET] |
| 10 | Rob Auton | 1 | I heard a rumour that Cadbury is bringing out ... | 2013 | [HEARD, RUMOUR, CADBURY, BRINGING, ORIENTAL, C... |
| 11 | Alex Horne | 2 | I used to work in a shoe-recycling shop. It wa... | 2013 | [USED, WORK, SHOE, RECYCLING, SHOP, SOLE, DEST... |
| 12 | Alfie Moore | 3 | I'm in a same-sex marriage... the sex is alway... | 2013 | [SEX, MARRIAGE, SEX, ALWAYS] |

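
The `Joke` column above appears to come from a text-cleaning step: each raw joke is split into alphabetic tokens, common stop words are dropped, and the remaining words are upper-cased. The code that produced it is not visible here, so what follows is only a minimal sketch under those assumptions. The `tidy_joke` helper and the tiny `STOP_WORDS` set are hypothetical; a real run would more likely rely on a full stop word list such as the one shipped with `nltk`.

```python
import re

# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = {"the", "no", "i", "a", "to", "my", "it", "was"}


def tidy_joke(raw_joke, stop_words=STOP_WORDS):
    """Split a joke into alphabetic tokens, drop stop words, upper-case the rest."""
    tokens = re.findall(r"[a-z]+", raw_joke.lower())
    return [token.upper() for token in tokens if token not in stop_words]


print(tidy_joke("The universe implodes. No matter."))
```

With this stop word list the call prints `['UNIVERSE', 'IMPLODES', 'MATTER']`, which matches the tokens the notebook reports for that joke.
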

|  | Author | Rank | Raw_joke | Year | Joke | Features |
|---|---|---|---|---|---|---|
| 0 | Tim Vine | 1 | I've decided to sell my Hoover... well it was ... | 2014 | [DECIDED, SELL, HOOVER, WELL, COLLECTING, DUST] | {u'contains(DUST)': False, u'contains(COLLECTI... |
| 1 | Masai Graham | 2 | I've written a joke about a fat badger but I c... | 2014 | [WRITTEN, JOKE, FAT, BADGER, COULDN, FIT, SET] | {u'contains(SET)': True, u'contains(WRITTEN)':... |
| 10 | Rob Auton | 1 | I heard a rumour that Cadbury is bringing out ... | 2013 | [HEARD, RUMOUR, CADBURY, BRINGING, ORIENTAL, C... | {u'contains(ORIENTAL)': True, u'contains(CHOCO... |
| 11 | Alex Horne | 2 | I used to work in a shoe-recycling shop. It wa... | 2013 | [USED, WORK, SHOE, RECYCLING, SHOP, SOLE, DEST... | {u'contains(DESTROYING)': True, u'contains(SOL... |
| 12 | Alfie Moore | 3 | I'm in a same-sex marriage... the sex is alway... | 2013 | [SEX, MARRIAGE, SEX, ALWAYS] | {u'contains(MARRIAGE)': True, u'contains(SEX)'... |

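
Each entry in the `Features` column is a dictionary whose keys have the form `contains(WORD)` and whose values are booleans. One plausible way to build such a dictionary is sketched below. The `extract_features` name and the example `vocabulary` are hypothetical, and the notebook's actual word list is not shown in this output.

```python
def extract_features(joke_tokens, vocabulary):
    """Map each vocabulary word to whether it occurs among the joke's tokens."""
    present = set(joke_tokens)
    return {"contains({})".format(word): word in present for word in vocabulary}


# Hypothetical vocabulary, chosen only to show both True and False values.
vocabulary = ["MATTER", "IMPLODES", "BADGER"]
features = extract_features(["UNIVERSE", "IMPLODES", "MATTER"], vocabulary)
print(features)
```

Words outside the vocabulary (here `UNIVERSE`) simply produce no key, so every joke ends up with a dictionary of the same shape.
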

|  | Author | Rank | Raw_joke | Year | Joke | Features | Funny |
|---|---|---|---|---|---|---|---|
| 0 | Tim Vine | 1 | I've decided to sell my Hoover... well it was ... | 2014 | [DECIDED, SELL, HOOVER, WELL, COLLECTING, DUST] | {u'contains(DUST)': False, u'contains(COLLECTI... | True |
| 1 | Masai Graham | 2 | I've written a joke about a fat badger but I c... | 2014 | [WRITTEN, JOKE, FAT, BADGER, COULDN, FIT, SET] | {u'contains(SET)': True, u'contains(WRITTEN)':... | True |
| 10 | Rob Auton | 1 | I heard a rumour that Cadbury is bringing out ... | 2013 | [HEARD, RUMOUR, CADBURY, BRINGING, ORIENTAL, C... | {u'contains(ORIENTAL)': True, u'contains(CHOCO... | True |
| 11 | Alex Horne | 2 | I used to work in a shoe-recycling shop. It wa... | 2013 | [USED, WORK, SHOE, RECYCLING, SHOP, SOLE, DEST... | {u'contains(DESTROYING)': True, u'contains(SOL... | True |
| 12 | Alfie Moore | 3 | I'm in a same-sex marriage... the sex is alway... | 2013 | [SEX, MARRIAGE, SEX, ALWAYS] | {u'contains(MARRIAGE)': True, u'contains(SEX)'... | True |
| 13 | Tim Vine | 4 | My friend told me he was going to a fancy dres... | 2013 | [FRIEND, TOLD, GOING, FANCY, DRESS, PARTY, ITA... | {u'contains(GOING)': True, u'contains(PARTY)':... | True |
| 14 | Gary Delaney | 5 | I can give you the cause of anaphylactic shock... | 2013 | [GIVE, CAUSE, ANAPHYLACTIC, SHOCK, NUTSHELL] | {u'contains(ANAPHYLACTIC)': True, u'contains(N... | True |
| 15 | Phil Wang | 6 | The Pope is a lot like Doctor Who. He never di... | 2013 | [POPE, LOT, LIKE, DOCTOR, NEVER, DIES, KEEPS, ... | {u'contains(REPLACED)': True, u'contains(NEVER... | False |
| 16 | Marcus Brigstocke | 7 | You know you are fat when you hug a child and ... | 2013 | [KNOW, FAT, HUG, CHILD, GETS, LOST] | {u'contains(LOST)': True, u'contains(CHILD)': ... | False |
| 17 | Liam Williams | 8 | The universe implodes. No matter. | 2013 | [UNIVERSE, IMPLODES, MATTER] | {u'contains(MATTER)': True, u'contains(IMPLODE... | False |

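
In the rows shown, `Funny` is `True` for ranks 1 to 5 and `False` for ranks 6 to 8, which is consistent with a simple cut-off on `Rank`. The sketch below assumes that rule. The threshold of 5 is inferred from the displayed rows rather than stated anywhere in the output, and `is_funny` is a hypothetical helper.

```python
FUNNY_CUTOFF = 5  # Assumed threshold, inferred from the displayed rows


def is_funny(rank, cutoff=FUNNY_CUTOFF):
    """Label a joke as funny when its rank is at or above the cut-off position."""
    return rank <= cutoff


# (Author, Rank) pairs copied from the 2013 rows of the table.
ranks = {"Gary Delaney": 5, "Phil Wang": 6, "Liam Williams": 8}
labels = {author: is_funny(rank) for author, rank in ranks.items()}
print(labels)
```

On a pandas data frame the same rule could be written as a column comparison, for example `df["Rank"] <= FUNNY_CUTOFF`.
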

|  | Author | Rank | Raw_joke | Year | Joke | Features | Funny | Labeled_Feature |
|---|---|---|---|---|---|---|---|---|
| 0 | Tim Vine | 1 | I've decided to sell my Hoover... well it was ... | 2014 | [DECIDED, SELL, HOOVER, WELL, COLLECTING, DUST] | {u'contains(DUST)': False, u'contains(COLLECTI... | True | ({u'contains(DUST)': False, u'contains(COLLECT... |
| 1 | Masai Graham | 2 | I've written a joke about a fat badger but I c... | 2014 | [WRITTEN, JOKE, FAT, BADGER, COULDN, FIT, SET] | {u'contains(SET)': True, u'contains(WRITTEN)':... | True | ({u'contains(SET)': True, u'contains(WRITTEN)'... |
| 10 | Rob Auton | 1 | I heard a rumour that Cadbury is bringing out ... | 2013 | [HEARD, RUMOUR, CADBURY, BRINGING, ORIENTAL, C... | {u'contains(ORIENTAL)': True, u'contains(CHOCO... | True | ({u'contains(ORIENTAL)': True, u'contains(CHOC... |
| 11 | Alex Horne | 2 | I used to work in a shoe-recycling shop. It wa... | 2013 | [USED, WORK, SHOE, RECYCLING, SHOP, SOLE, DEST... | {u'contains(DESTROYING)': True, u'contains(SOL... | True | ({u'contains(DESTROYING)': True, u'contains(SO... |
| 12 | Alfie Moore | 3 | I'm in a same-sex marriage... the sex is alway... | 2013 | [SEX, MARRIAGE, SEX, ALWAYS] | {u'contains(MARRIAGE)': True, u'contains(SEX)'... | True | ({u'contains(MARRIAGE)': True, u'contains(SEX)... |

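
The `Labeled_Feature` entries begin with `(`, which suggests each one is a tuple that pairs a joke's feature dictionary with its `Funny` label. A minimal sketch of that pairing is given below, assuming that reading is correct. The `label_features` helper and the toy feature dictionaries are hypothetical.

```python
def label_features(feature_dicts, labels):
    """Pair each feature dictionary with its label as a (features, label) tuple."""
    return list(zip(feature_dicts, labels))


# Toy inputs for illustration; these are not taken from the notebook's data.
feature_dicts = [{"contains(MATTER)": True}, {"contains(MATTER)": False}]
funny_labels = [False, True]

labeled = label_features(feature_dicts, funny_labels)
print(labeled[0])
```
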

|  | Raw_joke | Funny | Prediction |
|---|---|---|---|
| 0 | I've decided to sell my Hoover... well it was ... | True | False |
| 1 | I've written a joke about a fat badger but I c... | True | True |
| 2 | Always leave them wanting more my uncle used t... | True | True |
| 3 | I was given some Sudoku toilet paper. It didn'... | True | False |
| 4 | I wanted to do a show about feminism. But my h... | True | False |
| 5 | Money can't buy you happiness? Well check this... | False | False |
| 6 | Scotland had oil but it's running out thanks t... | False | True |
| 7 | I've been married for 10 years I haven't made ... | False | True |
| 8 | This show is about perception and perspective.... | False | True |

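
To put a number on the predictions, the `Funny` and `Prediction` columns can be compared row by row. The sketch below does this for the nine rows visible in the table (rows 0 to 8). Any rows that are not displayed are not included, so this is an accuracy figure for the shown rows only.

```python
# (Funny, Prediction) pairs copied from rows 0 to 8 of the table.
results = [
    (True, False),
    (True, True),
    (True, True),
    (True, False),
    (True, False),
    (False, False),
    (False, True),
    (False, True),
    (False, True),
]

# A prediction is correct when it equals the actual label.
correct = sum(actual == predicted for actual, predicted in results)
accuracy = correct / len(results)
print("{} of {} correct ({:.0%})".format(correct, len(results), accuracy))
```

Only rows 1, 2 and 5 agree, so the predictions match the labels for 3 of the 9 rows shown.
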