Authors: Abigail Buchholz, Savannah McKinney, Morgan Paramore
Description: Honors Course #3

Lab Six

Welcome to DATA 101 lab six, an individual OR partner lab, your choice. Either way, it will be collected on Tuesday, November 26th at 11:59pm. Good luck!

You are what you eat

In the cell below, create a string variable called my_name which has your full name. (If you're doing this lab with a partner, instead create a NumPy array called our_names with two strings in it.)

Also in the cell below, create a Pandas series called "my_food" with three elements: the index of this series should be the values pizza, burgers, and sushi, in that order. The corresponding values of the series should be the approximate number of pieces of pizza, hamburgers, and pieces of sushi you estimate you have eaten in your lifetime. (If you're doing this lab with a partner, instead create a Pandas DataFrame with two columns, one with the first name of partner #1 and the other with the first name of partner #2. The index of the DataFrame should still be as specified above.)

In [ ]:
In [4]:
import pandas as pd
import numpy as np
import scipy.stats

our_names = np.array(['Abby Buchholz', 'Morgan Paramore'])
print(our_names)

my_food = pd.DataFrame(columns=[], index=['pizza', 'burgers', 'sushi'])
my_food['abby'] = np.array([100, 331, 211])
my_food['morgan'] = np.array([142, 277, 21])
print(my_food)

['Abby Buchholz' 'Morgan Paramore']
         abby  morgan
pizza     100     142
burgers   331     277
sushi     211      21
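For the solo variant of this exercise, a Series plays the role that the DataFrame plays for partners. A minimal sketch, in which the name and the counts are made-up placeholders rather than anyone's real data:

```python
import pandas as pd

# Placeholder name and counts, purely for illustration.
my_name = "Pat Example"

# The index carries the food names in the required order;
# the values are the lifetime estimates.
my_food = pd.Series([500, 200, 50], index=['pizza', 'burgers', 'sushi'])

print(my_name)
print(my_food)
```

Individual entries can then be looked up by label, for example `my_food['sushi']`.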
In [36]:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-36-37e6c44faca7> in <module>()
      2 
      3 fam['cornwell']=('mary')
----> 4 fam=fam.add(other='age',fill_value=na)
      5 print(fam)

NameError: name 'na' is not defined

Highly functioning

Write a function called triple_it() that takes a single numeric argument (which you can name whatever you want), and returns triple its value. (For example, calling triple_it(4) should return the number 12.)

Make sure to also write code to test this function (i.e., call the function for a few sample values, and print the answer it returns) to make sure it works.

In [ ]:
def triple_it(score):
    return score * 3

print(triple_it(2))
print(triple_it(10))

Write a function called mpg() that takes three arguments: start_mileage, which is the starting mileage on a car's odometer; end_mileage, which is the ending mileage on a car's odometer; and gallons, which is the number of gallons of gas it took to drive that distance. It should return the miles per gallon for the car.

Needless to say, you should also write code to test this function.

In [ ]:
def mpg(start_mileage, end_mileage, gallons):
    # Distance driven is the ending mileage minus the starting mileage.
    return (end_mileage - start_mileage) / gallons

print(mpg(50, 100, 1))

Write a function called lucky_add() that takes two numeric arguments (which you can name whatever you want), and returns their sum, unless their sum is 13, in which case it returns 0.

Needless to say, you should also write code to test it.

In [ ]:
def lucky_add(num1, num2):
    test = num1 + num2
    if test == 13:
        return 0
    else:
        return num1 + num2

print(lucky_add(1, 3))
print(lucky_add(12, 1))

Write a function called salutation() which takes three arguments: gender (male, female, or other), marital_status (single, married, divorced, or widowed), and degree (None, AA, BA, BS, MA, MS, or PhD). It should return the string "Dr." for Ph.D.'s, "Mr." for all other males, "Mx." for all people with "other" gender, "Miss" for single females, "Mrs." for married females, and "Ms." for other females. (Yes, we did this example in class and in the book. Try to do it yourself again without looking, though, and then look if you need to.)

Needless to say.

In [ ]:
def salutation(gender, marital_status, degree):
    if degree == "PhD":
        return "Dr."
    elif gender == "male":
        return "Mr."
    elif gender == "female":
        if marital_status == "married":
            return "Mrs."
        elif marital_status == "single":
            return "Miss"
        else:
            return "Ms."
    else:
        return "Mx."

print(salutation('female', "married", "PhD"))
print(salutation('female', "widowed", "AA"))
print(salutation('male', "divorced", "PhD"))
print(salutation('male', "single", "BA"))
print(salutation('other', 'single', 'PhD'))
print(salutation('other', 'single', 'AA'))

In the U.S. tax code, a taxpayer's AGI (Adjusted Gross Income) is (basically) the sum of their year's wages and tips, minus any alimony they received and any retirement contributions they made. Write a function called agi() which takes four arguments -- a taxpayer's wages, tips, alimony, and retirement contributions -- and returns their AGI.

Needless to.

In [ ]:
def agi(wages, tips, alimony, retirement_contributions):
    return (wages + tips) - (alimony + retirement_contributions)

print(agi(100, 100, 50, 50))
print(agi(10, 5, 2, 1))

In the U.S. tax code, a taxpayer's taxable income is their AGI minus their deductions. Deductions can be either "standard" or "itemized." For taxpayers who itemize their deductions, their deduction is simply the amount they itemized (duh). In 2019, the standard deduction (for non-itemizers) is either $12,200 (for taxpayers less than 65 years old) or else $14,000 (for those 65 years or older).

Write a function called taxable_income() that takes three numeric arguments: a taxpayer's AGI, itemized deductions, and age. It should return their taxable income. For itemized deductions equal to 0, use the taxpayer's standard deduction in the calculation. Otherwise, use their itemized deductions.


In [ ]:
def taxable_income(AGI, deduction, age):
    if deduction > 0:
        return AGI - deduction
    else:
        if age < 65:
            deduction = 12200
            return AGI - deduction
        else:
            deduction = 14000
            return AGI - deduction

taxable_income(10000, 12, 44)
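One way to check the deduction rules described above is with a few assertions around the age-65 boundary. The sketch below restates the function in a slightly condensed form so that it runs on its own. The AGI and deduction figures are arbitrary test values, and the parameter names are illustrative.

```python
def taxable_income(agi, itemized, age):
    """Return AGI minus itemized deductions, or minus the standard
    deduction quoted in the lab text when nothing was itemized."""
    if itemized > 0:
        return agi - itemized
    standard = 12200 if age < 65 else 14000
    return agi - standard

# Arbitrary test values covering each branch.
assert taxable_income(50000, 0, 64) == 37800      # under 65: standard deduction
assert taxable_income(50000, 0, 65) == 36000      # 65 or older: larger standard deduction
assert taxable_income(50000, 8000, 70) == 42000   # itemized amount is used regardless of age
print("all taxable_income checks passed")
```

Testing ages 64 and 65 side by side is a quick way to catch an off-by-one mistake such as writing `age <= 65`.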

Taxation without representation

The file peeps.csv in your lab6 folder contains fictitious information about 17,185 made-up taxpaying Americans. Read it into a DataFrame called peeps, and spend a minute inspecting its contents.

In [ ]:
In [ ]:
peeps = pd.read_csv('peeps.csv')
data_set = peeps.head(10)
print(data_set)
#for row in data_set.itertuples():
#    print("{} is a {} who is {} years old.".format(row.First,row.Gender,row.Age))
#peeps_crosstab=pd.crosstab(peeps.Gender,peeps.Marital_status)
#scipy.stats.chi2_contingency(peeps_crosstab)
#peeps=peeps[peeps.Wages<]
peeps.boxplot('Wages', by='Gender')

Print a message to the screen giving the lowest, highest, and median age of the taxpayers in this data set.

In [ ]:
peeps_age = peeps['Age']
print('The lowest age of taxpayers is {}, while the highest age is {}, and the median age is {}'.format(
    peeps_age.min(), peeps_age.max(), peeps_age.median()))

Print out a little table that shows how many males, females, and other-gendered people there are.

In [ ]:
peeps_gender = peeps['Gender']
peeps_gender = pd.DataFrame(peeps_gender.value_counts())
print(peeps_gender)

Display a bar chart that shows how many people with each type of college degree there are.

In [ ]:
peeps_degree = peeps['Degree']
peeps_degree = peeps_degree.value_counts()
peeps_degree.plot(kind='bar')

Display a histogram that shows how common various amounts of income are ("income" is wages plus tips).

In [ ]:
peeps_wages = peeps['Wages']
peeps_tips = peeps['Tips']
peeps_income = peeps_wages + peeps_tips
# Drop the extreme outlier so the histogram is readable.
peeps_income = peeps_income[peeps_income <= 300000]
peeps_income.plot(kind='hist', bins=50)
print("There was an outlier in the data: Woody Woodpecker was making 6000000, which is an exorbitant amount.")

Print out a little table that shows how many people of each gender received alimony.

In [ ]:
peeps = pd.read_csv('peeps.csv')
just_gender_alimony = peeps[peeps.Alimony > 0]
print(just_gender_alimony['Gender'].value_counts())

Display a grouped box plot that shows how much alimony -- for those who received alimony -- people of the various genders received.

In [ ]:

How many taxpayers in this data set are named "Stephen Davies?" Print a message to the screen with that information.

In [ ]:
stephen = peeps[(peeps.First == 'Stephen') & (peeps.Last == 'Davies')]
print("There are {} taxpayers named Stephen Davies in this data set.".format(len(stephen)))

Financial advisors say that you really ought to put at least 10% of your income towards retirement if you don't want to be a leech on society when you're older. How many of these taxpayers are actually doing that? Print a message to the screen with information about what percentage (not a raw total) of these taxpayers are socking away at least that much. (To be clear, the message could be something like: "Only 73.9% of these taxpayers are saving the recommended 10%!")

In [ ]:
# Recompute income from the current peeps so the rows line up with peeps.Retirement.
peeps_income = peeps.Wages + peeps.Tips
ideal_retire = peeps_income * .10
enough = peeps[peeps.Retirement >= ideal_retire]
percent = len(enough) / len(peeps) * 100
print("Only {}% of these taxpayers are saving the recommended 10%!".format(round(percent, 1)))

Create a new column in this DataFrame called Honorific that contains each taxpayer's form of address as defined in your salutation() function.

In [ ]:
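# Collect one honorific per row; assigning a single value inside the loop
# would overwrite the whole column with the last row's result.
honorifics = []
for row in peeps.itertuples():
    honorifics.append(salutation(row.Gender, row.Marital_status, row.Degree))
peeps['Honorific'] = honorifics
print(peeps)
print(salutation('female', 'married', 'BS'))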
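An alternative to looping with itertuples() is DataFrame.apply with axis=1, which calls a function once per row and returns one result per row. A minimal sketch on three made-up rows, restating salutation() so that it runs on its own (the `toy` DataFrame is hypothetical; only the column names come from this lab):

```python
import pandas as pd

def salutation(gender, marital_status, degree):
    if degree == "PhD":
        return "Dr."
    elif gender == "male":
        return "Mr."
    elif gender == "female":
        if marital_status == "married":
            return "Mrs."
        elif marital_status == "single":
            return "Miss"
        else:
            return "Ms."
    else:
        return "Mx."

# Made-up rows using the column names from peeps.
toy = pd.DataFrame({
    'Gender': ['female', 'male', 'other'],
    'Marital_status': ['married', 'single', 'widowed'],
    'Degree': ['BS', 'PhD', 'AA'],
})

# axis=1 passes each row to the lambda; the results form the new column.
toy['Honorific'] = toy.apply(
    lambda row: salutation(row.Gender, row.Marital_status, row.Degree), axis=1)
print(toy)
```

Either approach works as long as the new column receives one value per row rather than a single scalar.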
In [ ]:

What percent of these taxpayers go by either "Mr." or "Mrs."? Print a message to the screen with that information.

In [ ]:
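num_mr = peeps['Honorific'].value_counts()
print(num_mr)
pct_mr_mrs = peeps['Honorific'].isin(['Mr.', 'Mrs.']).mean() * 100
print("{}% of these taxpayers go by either Mr. or Mrs.".format(round(pct_mr_mrs, 1)))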

Write a loop that will print a message for each taxpayer with last name "Potter" or "Granger" that says (for instance) "You earned $59182 this year, Mr. Frodo P. Granger!" The number printed should be the taxpayer's income (wages plus tips), and their full name should appear as in that example, with all four components.

In [ ]:
potter_granger = peeps[(peeps.Last == 'Potter') | (peeps.Last == 'Granger')]
print(potter_granger)
for row in potter_granger.itertuples():
    pg_inc = row.Wages + row.Tips
    print("You earned ${} this year, {} {} {}. {}!".format(
        pg_inc, row.Honorific, row.First, row.Middle, row.Last))

Create a new column in this DataFrame called AGI that contains each taxpayer's AGI as defined in your agi() function.

In [ ]:
col_agi = agi(peeps.Wages, peeps.Tips, peeps.Alimony, peeps.Retirement)
peeps['AGI'] = col_agi
print(peeps)

Create a new column in this DataFrame called TaxableIncome that contains each taxpayer's taxable income as defined in your taxable_income() function.

In [ ]:
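# taxable_income() branches on single values, so call it once per row
# and collect the results before assigning the column.
taxable = []
for row in peeps.itertuples():
    taxable.append(taxable_income(row.AGI, row.Deductions, row.Age))
peeps['TaxableIncome'] = taxable
print(peeps)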

Is taxable income significantly associated/correlated with marital status? Perform this analysis and print a message to the screen giving the answer.

In [ ]:
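One common choice for comparing a numeric variable across several categories is a one-way ANOVA, available as scipy.stats.f_oneway. A minimal sketch on made-up numbers, assuming the column names Marital_status and TaxableIncome used elsewhere in this lab and a conventional 0.05 significance threshold (the `toy` DataFrame and its values are hypothetical):

```python
import pandas as pd
import scipy.stats

# Made-up data standing in for peeps; values are illustrative only.
toy = pd.DataFrame({
    'Marital_status': ['single'] * 4 + ['married'] * 4 + ['divorced'] * 4,
    'TaxableIncome':  [31000, 29000, 35000, 30000,
                       52000, 48000, 55000, 50000,
                       40000, 42000, 39000, 41000],
})

# Build one array of taxable incomes per marital status.
groups = [grp.values for _, grp in toy.groupby('Marital_status')['TaxableIncome']]
f_stat, p_value = scipy.stats.f_oneway(*groups)

if p_value < 0.05:
    print("Taxable income differs significantly by marital status (p = {:.4f}).".format(p_value))
else:
    print("No significant association with marital status was found (p = {:.4f}).".format(p_value))
```

On the real data, replacing `toy` with `peeps` should be the only change needed, provided the TaxableIncome column has already been created.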

Is taxable income significantly associated/correlated with age? Perform this analysis and print a message to the screen giving the answer.

In [ ]:
hi_age = med_age = lo_age =
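One alternative to splitting ages into high, medium, and low groups is to treat both variables as numeric and compute a Pearson correlation with scipy.stats.pearsonr. A minimal sketch on made-up numbers, assuming a 0.05 significance threshold (the arrays below are hypothetical stand-ins for peeps.Age and peeps.TaxableIncome):

```python
import numpy as np
import scipy.stats

# Made-up ages and taxable incomes, for illustration only.
ages = np.array([22, 30, 35, 41, 48, 55, 60, 67])
taxable = np.array([21000, 34000, 39000, 47000, 52000, 61000, 58000, 44000])

# pearsonr returns the correlation coefficient and its two-sided p-value.
r, p_value = scipy.stats.pearsonr(ages, taxable)

verdict = "is" if p_value < 0.05 else "is not"
print("Taxable income {} significantly correlated with age (r = {:.2f}, p = {:.4f}).".format(
    verdict, r, p_value))
```

Pearson's r only measures linear association, so a relationship that rises and then falls with age could be understated by this test.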

Is taxable income significantly different between those with Associates Degrees and those with Bachelors Degrees? Perform this analysis and print a message to the screen giving the answer.

In [ ]:
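bachelor = peeps[(peeps.Degree == "BS") | (peeps.Degree == "BA")]
associate = peeps[peeps.Degree == "AA"]
# Compare taxable income (not wages plus tips) and report the p-value that was actually computed.
result = scipy.stats.ttest_ind(associate.TaxableIncome, bachelor.TaxableIncome)
if result.pvalue < 0.05:
    print("The p-value is about {:.3f}, so there is a significant difference in taxable income between those with Associate Degrees and Bachelors Degrees".format(result.pvalue))
else:
    print("The p-value is about {:.3f}, so we cannot conclude with confidence that there is a significant difference in taxable income between those with Associate Degrees and Bachelors Degrees".format(result.pvalue))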
In [ ]: