SharedCDS-102 / Lab Week 05 - Introduction to Data Analysis using R / CDS-102 Lab Week 05 Workbook.htmlOpen in CoCalc
Authors: bassil alomari , James Glasbrenner, Gideon Gogovi, John Lyver
Views : 2
Description: Jupyter html version of CDS-102/Lab Week 05 - Introduction to Data Analysis using R/CDS-102 Lab Week 05 Workbook.ipynb
CDS-102 Lab Week 05 Workbook

CDS-102: Lab 5 Workbook

Name: Bassil Alomari

March 2, 2017

In [23]:
# Run this code block to load the Tidyverse package
.libPaths(new = "~/Rlibs")
library(tidyverse)
In [17]:
office_data <- read_csv('OfficeSupplies.csv')
Parsed with column specification:
cols(
  OrderDate = col_character(),
  Region = col_character(),
  Rep = col_character(),
  Item = col_character(),
  Units = col_integer(),
  `Unit Price` = col_double()
)
In [18]:
print(office_data)
# A tibble: 43 × 6
     OrderDate  Region     Rep    Item Units `Unit Price`
         <chr>   <chr>   <chr>   <chr> <int>        <dbl>
1   4-Jul-2014    East Richard Pen Set    62         4.99
2  12-Jul-2014    East    Nick  Binder    29         1.99
3  21-Jul-2014 Central  Morgan Pen Set    55        12.49
4  29-Jul-2014    East   Susan  Binder    81        19.99
5   7-Aug-2014 Central Matthew Pen Set    42        23.95
6  15-Aug-2014    East Richard  Pencil    35         4.99
7  24-Aug-2014    West   James    Desk     3       275.00
8   1-Sep-2014 Central   Smith    Desk     2       125.00
9  10-Sep-2014 Central    Bill  Pencil     7         1.29
10 18-Sep-2014    East Richard Pen Set    16        15.99
# ... with 33 more rows
In [19]:
Office_data <- rownames_to_column(as_tibble(office_data),var = "orderdate")
print(office_data)
# A tibble: 43 × 6
     OrderDate  Region     Rep    Item Units `Unit Price`
         <chr>   <chr>   <chr>   <chr> <int>        <dbl>
1   4-Jul-2014    East Richard Pen Set    62         4.99
2  12-Jul-2014    East    Nick  Binder    29         1.99
3  21-Jul-2014 Central  Morgan Pen Set    55        12.49
4  29-Jul-2014    East   Susan  Binder    81        19.99
5   7-Aug-2014 Central Matthew Pen Set    42        23.95
6  15-Aug-2014    East Richard  Pencil    35         4.99
7  24-Aug-2014    West   James    Desk     3       275.00
8   1-Sep-2014 Central   Smith    Desk     2       125.00
9  10-Sep-2014 Central    Bill  Pencil     7         1.29
10 18-Sep-2014    East Richard Pen Set    16        15.99
# ... with 33 more rows
In [42]:
print(select(office_data, Region, Rep, Item))
# A tibble: 43 × 3
    Region     Rep    Item
     <chr>   <chr>   <chr>
1     East Richard Pen Set
2     East    Nick  Binder
3  Central  Morgan Pen Set
4     East   Susan  Binder
5  Central Matthew Pen Set
6     East Richard  Pencil
7     West   James    Desk
8  Central   Smith    Desk
9  Central    Bill  Pencil
10    East Richard Pen Set
# ... with 33 more rows
In [20]:
(select(office_data, Region, Rep, Item))
print(distinct(Region, Rep, Item))
RegionRepItem
East RichardPen Set
East Nick Binder
CentralMorgan Pen Set
East Susan Binder
CentralMatthewPen Set
East RichardPencil
West James Desk
CentralSmith Desk
CentralBill Pencil
East RichardPen Set
West James Pen
CentralMorgan Binder
West Thomas Binder
East RichardPen
CentralRachel Pencil
East Susan Pen
CentralAlex Binder
CentralMatthewPen Set
CentralAlex Binder
CentralSmith Pencil
CentralRachel Binder
East Susan Pen Set
East RichardPencil
CentralBill Binder
CentralMatthewBinder
CentralSmith Binder
CentralAlex Pencil
East RichardBinder
CentralBill Pen
West James Binder
West James Pencil
CentralAlex Pen Set
East RichardBinder
CentralRachel Pencil
CentralRachel Pencil
East Nick Pen
CentralAlex Pencil
CentralBill Pencil
West Thomas Pencil
CentralBill Binder
East RichardBinder
CentralMatthewDesk
CentralMorgan Pencil
Error in distinct_(.data, .dots = lazyeval::lazy_dots(...), .keep_all = .keep_all): object 'Region' not found
Traceback:

1. print(distinct(Region, Rep, Item))
2. distinct(Region, Rep, Item)
3. distinct_(.data, .dots = lazyeval::lazy_dots(...), .keep_all = .keep_all)
In [21]:
office_data.mutated <- mutate(office_data, TotalCost = Units * `Unit Price`)
print(select(office_data.mutated,OrderDate, Units,`Unit Price`, TotalCost))
# A tibble: 43 × 4
     OrderDate Units `Unit Price` TotalCost
         <chr> <int>        <dbl>     <dbl>
1   4-Jul-2014    62         4.99    309.38
2  12-Jul-2014    29         1.99     57.71
3  21-Jul-2014    55        12.49    686.95
4  29-Jul-2014    81        19.99   1619.19
5   7-Aug-2014    42        23.95   1005.90
6  15-Aug-2014    35         4.99    174.65
7  24-Aug-2014     3       275.00    825.00
8   1-Sep-2014     2       125.00    250.00
9  10-Sep-2014     7         1.29      9.03
10 18-Sep-2014    16        15.99    255.84
# ... with 33 more rows
In [22]:
by_Region <- group_by(office_data, Region)
print(summarise(by_Region,sum(Units), sum(`Unit Price`)))
# A tibble: 3 × 3
   Region `sum(Units)` `sum(\\`Unit Price\\`)`
    <chr>        <int>                   <dbl>
1 Central         1199                  432.45
2    East          691                  118.87
3    West          231                  321.95
In [115]:
Office_data.filtered <- filter(by_Region, Region == "Central")
print(Office_data.filtered)
Source: local data frame [24 x 6]
Groups: Region [1]

     OrderDate  Region     Rep    Item Units `Unit Price`
         <chr>   <chr>   <chr>   <chr> <int>        <dbl>
1  21-Jul-2014 Central  Morgan Pen Set    55        12.49
2   7-Aug-2014 Central Matthew Pen Set    42        23.95
3   1-Sep-2014 Central   Smith    Desk     2       125.00
4  10-Sep-2014 Central    Bill  Pencil     7         1.29
5   5-Oct-2014 Central  Morgan  Binder    28         8.99
6  31-Oct-2014 Central  Rachel  Pencil    14         1.29
7  17-Nov-2014 Central    Alex  Binder    11         4.99
8  25-Nov-2014 Central Matthew Pen Set    96         4.99
9   4-Dec-2014 Central    Alex  Binder    94        19.99
10 12-Dec-2014 Central   Smith  Pencil    67         1.29
# ... with 14 more rows
In [127]:
by_Region <- group_by(office_data, Region, Rep, Units, `Unit Price` )
print(summarise(by_Region))
Source: local data frame [43 x 4]
Groups: Region, Rep, Units [?]

    Region   Rep Units `Unit Price`
     <chr> <chr> <int>        <dbl>
1  Central  Alex    11         4.99
2  Central  Alex    36         4.99
3  Central  Alex    50         4.99
4  Central  Alex    90         4.99
5  Central  Alex    94        19.99
6  Central  Bill     7         1.29
7  Central  Bill    27        19.99
8  Central  Bill    46         8.99
9  Central  Bill    53         1.29
10 Central  Bill    80         8.99
# ... with 33 more rows
In [ ]: