
Project 1

Kernel: R (R-Project)

Project Title

Author 1 (dc3528), Author 2 (an2312)

This project investigates three questions. First, we compare a male character (George) and a female character (Elaine) in their use of words with a generally negative sentiment, and set this against the overall negative and positive sentiment across all the characters. Second, we look at Tom Cherones, one of the most frequent directors: we list all the episodes he directed and compare how many episodes each director directed, so that anyone who wants to investigate directing styles can filter the episodes by director. Third, we investigate the sentiments of individual characters to see whether there is a trend in how positive or negative they are.

These questions are of interest because they examine the nature of the show and the elements it is composed of. Looking at the characters, at differences between male and female characters, and at patterns tied to a particular director gives a picture of what the show is like and how it may affect its viewers in different contexts.

Codes

# Load the packages and the four data sets
library('dplyr')
library('ggplot2')
episode_info <- read.csv("episode_info.csv")
seinfeld_lines <- read.csv("seinfeld_lines.csv")
seinfeld_words <- read.csv("seinfeld_words.csv")
seinfeld_sentiments <- read.csv("seinfeld_sentiments.csv")
Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:
    filter, lag

The following objects are masked from ‘package:base’:
    intersect, setdiff, setequal, union

QUESTIONS

QUESTION #1:

Using the data frame named seinfeld_words, we will first look at two characters, one male and one female, to compare how they differ in the words they use that have a generally negative sentiment. Then we will create a bar graph that shows the overall amount of negative and positive sentiment in the data frame for all the characters. To do this we will:
a. filter for a male character, George, and see the words associated with negative sentiment for him
b. filter for a female character, Elaine, and see the words associated with negative sentiment for her
c. create a bar graph to see the overall negative and positive sentiment for all of the characters

This data supports a few tentative conclusions. First, we compare one male and one female character, a small sample of the full cast, to see which of the two is more negative overall. Using dim to count the observations in both tables, we see that George has 3579 observations of words associated with negative sentiment and Elaine has 2313. Based on this sample, the male character shows more negative general sentiment, although two characters are too few to generalize to all male and female characters. This comparison can be useful for someone who wants to compare how different genders express emotions. In addition, the bar graph is based on all the characters and shows that, overall, the words they use carry a negative general sentiment more often than a positive one.

A limitation is that the data does not show the context in which these words are used, so a word used in an unusual way may not be categorized accurately.
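Raw row counts also depend on how much each character speaks, and the head() output for this question shows the same word (for example "hate") on several rows, once per sentiment label. The following is a minimal sketch of one way to account for both, assuming a small made-up data frame toy_words (hypothetical, not the real data) that reuses the column names Character, word, and generalSentiment. It compares the share of negative rows per character and counts each negative word only once.

```r
library(dplyr)

# Hypothetical toy data reusing the column names of seinfeld_words
toy_words <- data.frame(
  Character = c("GEORGE", "GEORGE", "GEORGE", "GEORGE", "ELAINE", "ELAINE"),
  word = c("hate", "hate", "tired", "love", "lost", "great"),
  generalSentiment = c("negative", "negative", "negative", "positive",
                       "negative", "positive")
)

# Per character: total rows, negative rows, the negative share, and the
# number of distinct negative words (repeated rows for one word count once)
grouped <- group_by(toy_words, Character)
negative_share <- summarize(
  grouped,
  TotalRows = n(),
  NegativeRows = sum(generalSentiment == "negative"),
  NegativeShare = NegativeRows / TotalRows,
  DistinctNegativeWords = n_distinct(word[generalSentiment == "negative"])
)
negative_share <- as.data.frame(negative_share)
print(negative_share)
```

Applied to the real seinfeld_words data frame, the same summarize() call would put George and Elaine on a common scale; whether the gap between them persists is not something this sketch can say.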

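# Negative-sentiment words spoken by George
data <- filter(seinfeld_words, Character == 'GEORGE', generalSentiment == 'negative')
head(data)
# Negative-sentiment words spoken by Elaine
data2 <- filter(seinfeld_words, Character == 'ELAINE', generalSentiment == 'negative')
head(data2)
# Number of rows and columns in each filtered table
dim(data)
dim(data2)
# Count of negative and positive rows across all characters
ggplot(seinfeld_words, aes(x = generalSentiment)) + geom_bar()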
X.1  X    Character  EpisodeNo  SEID    Season  word     sentiment  generalSentiment  sentimentScore
12   35   GEORGE     1          S01E01  1       hate     anger      negative          -3
13   35   GEORGE     1          S01E01  1       hate     disgust    negative          -3
14   35   GEORGE     1          S01E01  1       hate     fear       negative          -3
15   35   GEORGE     1          S01E01  1       hate     negative   negative          -3
16   35   GEORGE     1          S01E01  1       hate     sadness    negative          -3
30   58   GEORGE     1          S01E01  1       tired    negative   negative          -2

X.1  X    Character  EpisodeNo  SEID    Season  word     sentiment  generalSentiment  sentimentScore
224  233  ELAINE     1          S01E01  1       disgust  anger      negative          -3
225  233  ELAINE     1          S01E01  1       disgust  disgust    negative          -3
226  233  ELAINE     1          S01E01  1       disgust  fear       negative          -3
227  233  ELAINE     1          S01E01  1       disgust  negative   negative          -3
228  233  ELAINE     1          S01E01  1       disgust  sadness    negative          -3
229  237  ELAINE     1          S01E01  1       lost     negative   negative          -3
dim(data): 3579 rows, 10 columns
dim(data2): 2313 rows, 10 columns
Image in a Jupyter notebook

QUESTION #2:

Each episode has a director, and some directors appear more often than others. For this question, we will focus on the director Tom Cherones, who is one of the most frequent. We will:
a. look at all the episodes that were directed by him
b. create a bar graph to compare how many episodes were directed by him and how many were directed by the other directors

This information is useful to anyone who would like to analyze the way Tom Cherones directs and see how the episodes he directed differ from those directed by others. The bar graph shows that most of the episodes were directed by Andy Ackerman and Tom Cherones. This visualization makes it easy to see who the most frequent directors of the show are, and it can serve as a starting point for further analysis of the episodes. The table and bar graph also show that some directors directed only 1 or 2 episodes; it might be interesting to see how these episodes differ and how the director is reflected in each episode.

A limitation, or perhaps a way to narrow the scope of the research, is that we do not look at the nature of each episode or at how Cherones directs his episodes. Is there a pattern? This data, although helpful, is only the beginning of the research that can be done.
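Step b compares Tom Cherones with all the other directors taken together. A minimal sketch of that two-group comparison follows, assuming a made-up data frame toy_episodes (hypothetical rows, not the real episode list) with the same Director column as episode_info. The column DirectorGroup is a name introduced here for illustration.

```r
library(dplyr)

# Hypothetical toy data reusing the SEID and Director columns of episode_info
toy_episodes <- data.frame(
  SEID = c("S01E01", "S01E02", "S06E01", "S06E02", "S02E05"),
  Director = c("Tom Cherones", "Tom Cherones", "Andy Ackerman",
               "Andy Ackerman", "Art Wolff")
)

# Label each episode as directed by Tom Cherones or by someone else
labeled <- mutate(
  toy_episodes,
  DirectorGroup = ifelse(Director == "Tom Cherones",
                         "Tom Cherones", "Other directors")
)

# Count the episodes in each of the two groups
group_counts <- summarize(group_by(labeled, DirectorGroup), Numepisodes = n())
group_counts <- as.data.frame(group_counts)
print(group_counts)
```

Passing labeled to ggplot() with aes(x = DirectorGroup) and geom_bar() would then draw one bar per group instead of one bar per director.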

# Episodes directed by Tom Cherones
episode_by_director <- filter(episode_info, Director == 'Tom Cherones')
head(episode_by_director)
X  Season  EpisodeNo  Title              AirDate           Writers                      Director      SEID
1  1       1          The Stakeout       May 31, 1990      Larry David, Jerry Seinfeld  Tom Cherones  S01E01
2  1       2          The Robbery        June 7, 1990      Matt Goldman                 Tom Cherones  S01E02
3  1       3          Male Unbonding     June 14, 1990     Larry David, Jerry Seinfeld  Tom Cherones  S01E03
4  1       4          The Stock Tip      June 21, 1990     Larry David, Jerry Seinfeld  Tom Cherones  S01E04
5  2       1          The Ex-Girlfriend  January 16, 1991  Larry David, Jerry Seinfeld  Tom Cherones  S02E01
6  2       2          The Pony Remark    January 30, 1991  Larry David, Jerry Seinfeld  Tom Cherones  S02E02
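# Count the number of episodes per director
# (named director_counts to avoid masking base R's table() function)
grouped_by_director <- group_by(episode_info, Director)
director_counts <- summarize(grouped_by_director, Numepisodes = n())
director_counts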
Director            Numepisodes
Andy Ackerman       87
Art Wolff           1
David Owen Trainor  2
David Steinberg     1
David  Steinberg    1
Jason Alexander     1
Joshua White        1
Tom Cherones        80
ggplot(episode_info, aes(x=Director))+geom_bar()
Image in a Jupyter notebook
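The summary table lists David Steinberg twice, apparently because one spelling contains an extra space, so group_by() treats the two spellings as different directors. A minimal sketch of cleaning the names before counting is given below. It assumes the difference really is ordinary whitespace (not confirmed from the raw file) and uses a made-up data frame toy_episodes (hypothetical) rather than episode_info.

```r
library(dplyr)

# Hypothetical toy data: the same name spelled with one space, with two
# spaces, plus a name with a trailing space
toy_episodes <- data.frame(
  Director = c("David Steinberg", "David  Steinberg", "Tom Cherones "),
  stringsAsFactors = FALSE
)

# trimws() drops leading/trailing whitespace; gsub() collapses any run of
# whitespace inside the name into a single space
cleaned <- mutate(toy_episodes, Director = gsub("\\s+", " ", trimws(Director)))

# Count episodes per director using the cleaned names
director_counts <- summarize(group_by(cleaned, Director), Numepisodes = n())
director_counts <- as.data.frame(director_counts)
print(director_counts)
```

If the same cleaning step were applied to episode_info before grouping, the two Steinberg rows would be expected to merge into a single row with 2 episodes, provided the assumption about whitespace holds.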

QUESTION #3:

In this question we investigate whether three of the characters, Jerry, Larry, and Matt, are positive or negative. Using the data frame seinfeld_sentiments, we create for each of them a bar chart whose horizontal axis is Season (treated as a categorical variable), whose vertical axis is the share of the words spoken by that character in each season (the code uses position = "fill", so every bar is scaled to 1), and whose fill (color) shows whether the word is generally positive or negative.

Jerry was more positive than negative, but only slightly. Larry seemed to be split about equally. Matt was more skewed toward negative than the other characters, but not extremely so.

This result, although interesting, is not surprising, because Seinfeld is very sarcastic in its humor and the characters often make negative comments. This visualization compares only male characters; the same method could be used to compare the female characters in the show if someone were interested in gender differences.

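# Keep only JERRY's rows (grouping by Character == "JERRY" would keep every character)
jerry_sentiments <- filter(seinfeld_sentiments, Character == "JERRY")
sentiments_season <- group_by(jerry_sentiments, Season, generalSentiment)
season_counts <- summarize(sentiments_season, Count = n())
ggplot(season_counts, aes(x = factor(Season), y = Count, fill = generalSentiment)) + geom_col(position = "fill") + labs(x = "Season")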
Image in a Jupyter notebook
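# Keep only LARRY's rows (grouping by Character == "LARRY" would keep every character)
larry_sentiments <- filter(seinfeld_sentiments, Character == "LARRY")
sentiments_season <- group_by(larry_sentiments, Season, generalSentiment)
season_counts <- summarize(sentiments_season, Count = n())
ggplot(season_counts, aes(x = factor(Season), y = Count, fill = generalSentiment)) + geom_col(position = "fill") + labs(x = "Season")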
Image in a Jupyter notebook
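# Keep only MATT's rows (grouping by Character == "MATT" would keep every character)
matt_sentiments <- filter(seinfeld_sentiments, Character == "MATT")
sentiments_season <- group_by(matt_sentiments, Season, generalSentiment)
season_counts <- summarize(sentiments_season, Count = n())
ggplot(season_counts, aes(x = factor(Season), y = Count, fill = generalSentiment)) + geom_col(position = "fill") + labs(x = "Season")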
Image in a Jupyter notebook
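The three charts for this question repeat the same steps with a different character name each time. As a hedged alternative, the sketch below draws all three characters in a single faceted chart. It assumes a made-up data frame toy_sentiments (hypothetical rows) with the columns Season, Character, and generalSentiment, and it relies on the ggplot2 function facet_wrap(), which the project itself does not use.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data reusing the column names of seinfeld_sentiments
toy_sentiments <- data.frame(
  Season = c(1, 1, 1, 2, 2, 1, 1, 2, 1),
  Character = c("JERRY", "JERRY", "JERRY", "JERRY", "JERRY",
                "LARRY", "LARRY", "MATT", "KRAMER"),
  generalSentiment = c("positive", "positive", "negative", "negative",
                       "positive", "positive", "negative", "negative",
                       "positive")
)

# Keep only the three characters of interest
selected <- filter(toy_sentiments, Character %in% c("JERRY", "LARRY", "MATT"))

# Count words per character, season, and general sentiment
grouped <- group_by(selected, Character, Season, generalSentiment)
counts <- as.data.frame(summarize(grouped, Count = n()))
print(counts)

# One panel per character; position = "fill" scales each bar to 1
sentiment_plot <- ggplot(counts, aes(x = factor(Season), y = Count,
                                     fill = generalSentiment)) +
  geom_col(position = "fill") +
  facet_wrap(~Character) +
  labs(x = "Season", y = "Share of words")
print(sentiment_plot)
```

Placing the panels side by side makes the characters easier to compare than three separate images, at the cost of smaller bars per panel.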

Appendix

filter( DATAFRAMENAME, CRITERIA ): to produce a new data frame that contains only the rows of the data frame DATAFRAMENAME that satisfy the criteria specified in CRITERIA.

group_by( DATAFRAMENAME, COLUMNNAME ): to group rows of DATAFRAMENAME by their values in the column COLUMNNAME

summarize( GROUPEDDATAFRAMENAME, NEWCOLUMN = FORMULA ): to compute a summary quantity from grouped data GROUPEDDATAFRAMENAME (usually the output of group_by()), where the summary quantity is stored in a new column called NEWCOLUMN

ggplot( DATAFRAMENAME, aes( x = COLUMNNAME1, y = COLUMNNAME2 ) ) + geom_point(): to create a scatterplot with data in DATAFRAMENAME, with COLUMNNAME1 on the x-axis and COLUMNNAME2 on the y-axis

ggplot( DATAFRAMENAME, aes( x = COLUMNNAME1, y = COLUMNNAME2 ) ) + geom_col(): to create a bar chart with data in DATAFRAMENAME, with COLUMNNAME1 on the x-axis and COLUMNNAME2 on the y-axis

ggplot( DATAFRAMENAME, aes( x = COLUMNNAME ) ) + geom_bar(): to create a bar chart with data in DATAFRAMENAME, with COLUMNNAME on the x-axis and the number of observations on the y-axis

ggplot( DATAFRAMENAME, aes( x = COLUMNNAME ) ) + geom_histogram(): to create a histogram of the data in column COLUMNNAME in the data frame DATAFRAMENAME.