W6 - Text Analytics
Loading required package: NLP
'data.frame':	1181 obs. of  2 variables:
 $ Tweet: chr  "I have to say, Apple has by far the best customer care service I have ever received! @Apple @AppStore" "iOS 7 is so fricking smooth & beautiful!! #ThanxApple @Apple" "LOVE U @APPLE" "Thank you @apple, loving my new iPhone 5S!!!!! #apple #iphone5S pic.twitter.com/XmHJCU4pcb" ...
 $ Avg  : num  2 2 1.8 1.8 1.8 1.8 1.8 1.6 1.6 1.6 ...
FALSE  TRUE
  999   182
<<VCorpus>>
Metadata: corpus specific: 0, document level (indexed): 0
Content: documents: 1181
[1] "i have to say, apple has by far the best customer care service i have ever received! @apple @appstore"
[1] "i have to say, apple has by far the best customer care service i have ever received! @apple @appstore"
[1] "i have to say apple has by far the best customer care service i have ever received apple appstore"
[1] "i" "me" "my" "myself" "we" "our" "ours" "ourselves"
[9] "you" "your"
[1] 174
[1] " say far best customer care service ever received appstore"
[1] " say far best custom care servic ever receiv appstor"
<<DocumentTermMatrix (documents: 1181, terms: 3289)>>
Non-/sparse entries: 8980/3875329
Sparsity : 100%
Maximal term length: 115
Weighting : term frequency (tf)
<<DocumentTermMatrix (documents: 6, terms: 11)>>
Non-/sparse entries: 1/65
Sparsity : 98%
Maximal term length: 9
Weighting : term frequency (tf)
              Terms
Docs           cheapen cheaper check cheep cheer cheerio cherylcol chief chiiiiqu child children
  character(0)       0       0     0     0     0       0         0     0        0     0        0
  character(0)       0       0     0     0     0       0         0     0        0     0        0
  character(0)       0       0     0     0     0       0         0     0        0     0        0
  character(0)       0       0     0     0     0       0         0     0        0     0        0
  character(0)       0       0     0     0     0       0         0     0        0     0        0
  character(0)       0       0     0     0     1       0         0     0        0     0        0
[1] "android" "anyon" "app" "appl"
[5] "back" "batteri" "better" "buy"
[9] "can" "cant" "come" "dont"
[13] "fingerprint" "freak" "get" "googl"
[17] "ios7" "ipad" "iphon" "iphone5"
[21] "iphone5c" "ipod" "ipodplayerpromo" "itun"
[25] "just" "like" "lol" "look"
[29] "love" "make" "market" "microsoft"
[33] "need" "new" "now" "one"
[37] "phone" "pleas" "promo" "promoipodplayerpromo"
[41] "realli" "releas" "samsung" "say"
[45] "store" "thank" "think" "time"
[49] "twitter" "updat" "use" "via"
[53] "want" "well" "will" "work"
<<DocumentTermMatrix (documents: 1181, terms: 309)>>
Non-/sparse entries: 4669/360260
Sparsity : 99%
Maximal term length: 20
Weighting : term frequency (tf)
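
The sparsity percentages printed in the two full-size matrix summaries can be checked by hand. The following is a minimal sketch in base R, assuming sparsity means the share of zero cells; the counts are copied from the summaries in this log, and the helper name `sparsity` and the labels `full` and `reduced` are hypothetical, not part of the original session.

```r
# Sparsity of a document-term matrix, taken here to be the share of zero cells.
# Counts are copied from the two DocumentTermMatrix summaries in this log.
sparsity <- function(non_sparse, sparse) sparse / (non_sparse + sparse)

full    <- sparsity(8980, 3875329)  # 1181 documents x 3289 terms
reduced <- sparsity(4669, 360260)   # 1181 documents x 309 terms

# Sanity check: the two counts add up to documents * terms.
stopifnot(8980 + 3875329 == 1181 * 3289,
          4669 + 360260  == 1181 * 309)

# Unrounded percentages; the summaries show them rounded to whole percent.
print(100 * c(full = full, reduced = reduced))
```

Rounded to whole percent, these match the 100% and 99% shown in the summaries, so even the 309-term matrix is still almost entirely zeros.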
[1] "iphon" "itun" "new"
       predictCART
        FALSE TRUE
  FALSE   294    6
  TRUE     37   18
[1] 0.8788732
FALSE  TRUE
  300    55
[1] 0.8450704
randomForest 4.6-10
Type rfNews() to see new features/changes/bug fixes.
       predictRF
        FALSE TRUE
  FALSE   293    7
  TRUE     34   21
[1] 0.884507
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
prediction from a rank-deficient fit may be misleading
        FALSE TRUE
  FALSE   253   47
  TRUE     27   28
[1] 0.7915493
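
The accuracy figures in this log can be reproduced from the confusion matrices alone. The following is a rough sketch in base R, assuming rows are actual classes and columns are predictions (the row sums of 300 and 55 match the test-set class counts printed earlier). The helper names `make_cm` and `accuracy` are hypothetical, and the label `third_model` is a placeholder: the log does not name that model, though the predict.lm warning suggests a linear or generalized linear fit.

```r
# Build a 2x2 confusion matrix from its four cells
# (rows = actual class, columns = predicted class; an assumption, see above).
make_cm <- function(tn, fp, fn, tp) {
  matrix(c(tn, fn, fp, tp), nrow = 2,
         dimnames = list(actual    = c("FALSE", "TRUE"),
                         predicted = c("FALSE", "TRUE")))
}

# Accuracy = correctly classified cases (the diagonal) / all cases.
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

# Cell counts copied from the three confusion matrices in this log.
cms <- list(
  cart          = make_cm(294,  6, 37, 18),
  random_forest = make_cm(293,  7, 34, 21),
  third_model   = make_cm(253, 47, 27, 28)  # model type not stated in the log
)
acc <- sapply(cms, accuracy)

# Baseline: always predict the majority class (FALSE), 300 of 355 test tweets.
baseline <- 300 / 355

print(round(c(baseline = baseline, acc), 7))
```

On these numbers the first two models beat the majority-class baseline by a few percentage points, while the third falls below it.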