Lesson 4: In-class exercises


Instructions: For each problem, write code in the provided code block. Don't forget to run your code to make sure it works.


1. Simple list and dictionary practice

Using the data below, write code to accomplish the following tasks.

Name         Favorite Food
Wilfred      Steak
Manfred      Duck
Wadsworth    Spaghetti
Jeeves       Ice cream
Mitsworth    Tuna

(A) Make a list of all the names, then loop through the list and print each name out.

In [1]:
names=["Wilfred","Manfred","Wadsworth","Jeeves","Mitsworth"]
for item in names:
    print item
Wilfred
Manfred
Wadsworth
Jeeves
Mitsworth

(B) Below, some of the names and foods have already been added to a dictionary. Fill in the missing entries using the dict[key] = value syntax. Then loop through the dictionary and print each name and food combination in the format:

<NAME>'s favorite food is <FOOD>
In [29]:
favFoods = {"Wilfred":"Steak", "Manfred":"Duck", "Wadsworth":"Spaghetti"}

# add your code below:
favFoods["Jeeves"]="Ice cream"
favFoods["Mitsworth"]="Tuna"
for item in favFoods:
    print item+"'s favorite food is "+favFoods[item]
Manfred's favorite food is Duck
Wadsworth's favorite food is Spaghetti
Wilfred's favorite food is Steak
Mitsworth's favorite food is Tuna
Jeeves's favorite food is Ice cream

(C) In the dictionary from part (B), change Wilfred's favorite food to pizza.

In [30]:
favFoods["Wilfred"]="pizza"
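
Assigning to a key that already exists overwrites the old value, while assigning to a new key adds an entry. The short sketch below illustrates both cases on a cut-down copy of the dictionary. It uses print() with a single argument so that it runs under both Python 2 and Python 3, and sorted() is an addition here to make the printing order repeatable.

```python
favFoods = {"Wilfred": "Steak", "Manfred": "Duck"}

favFoods["Jeeves"] = "Ice cream"   # new key: adds an entry
favFoods["Wilfred"] = "pizza"      # existing key: replaces "Steak"

# sorted() gives a repeatable order; plain dictionary order
# is not guaranteed in older versions of Python
for name in sorted(favFoods):
    print(name + "'s favorite food is " + favFoods[name])
```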

2. Duplicate removal

Read in the file genes.txt and print only the unique gene IDs (remove the duplicates). Do not assume repeat IDs appear consecutively in the file.

Hint: see the practice exercises from Lesson 4 for an example of how to remove duplicates using a list.

In [48]:
Filename="genes.txt"
inFile=open(Filename,'r')
new_list=[]
for item in inFile:
    item=item.rstrip('\r\n') #remove the line ending so IDs compare equal with or without it
    if item not in new_list:
        new_list.append(item)
inFile.close()
print new_list
['uc007zzs.1', 'uc009akk.2', 'uc009eyb.1', 'uc008vlv.1', 'uc008wzq.1', 'uc007hnl.1', 'uc008tvu.1', 'uc008vlv.3', 'uc007xgk.1', 'uc009qsh.1', 'uc008all.1', 'uc008eda.1', 'uc007shu.4', 'uc009mor.1', 'uc008fux.1', 'uc007ztg.2', 'uc007nkt.1', 'uc008qul.3', 'uc008ktr.2', 'uc008iwn.1', 'uc009fxp.2', 'uc008vsh.1', 'uc008gkj.2', 'uc007piu.2', 'uc008vsk.1', 'uc008vsv.1', 'uc008kjh.1', 'uc009dri.1', 'uc008vlv.2', 'uc009rxy.1', 'uc008fyq.1', 'uc009act.1', 'uc008lub.1', 'uc007ker.1', 'uc008qiz.2', 'uc008bak.1', 'uc008kcg.1', 'uc009cjg.1', 'uc007vlq.1', 'uc007xog.1', 'uc009avv.1', 'uc008kcg.2', 'uc007kmj.1', 'uc008oaj.1', 'uc007cib.1', 'uc007ket.1', 'uc009rpf.1', 'uc008owo.1', 'uc008jaq.1']
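
A minimal sketch of the same idea without the file, assuming the lines have already been read into a list. The two sample IDs are taken from the output above and repeated for illustration; the helper name unique_in_order is hypothetical. It keeps a set next to the list because membership checks on a set stay fast as the number of IDs grows, while the list preserves the first-seen order.

```python
def unique_in_order(items):
    """Return items with duplicates removed, keeping first-seen order."""
    seen = set()      # fast membership checks
    unique = []       # preserves the original order
    for item in items:
        item = item.strip()   # drop the trailing newline, if any
        if item and item not in seen:
            seen.add(item)
            unique.append(item)
    return unique

# illustrative input: repeats are not consecutive,
# and the last line has no trailing newline
sample = ["uc007zzs.1\n", "uc009akk.2\n", "uc007zzs.1\n", "uc009akk.2"]
print(unique_in_order(sample))
```

Stripping each line before comparing matters: without it, an ID on the last line of a file (no newline) would not match the same ID earlier in the file.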

3. Split practice

Read in the file init_sites.txt and compute the average CDS length (i.e. average the values in the 7th column). Your answer should be 236.36.

In [49]:
Filename="init_sites.txt"
inFile=open(Filename,'r')
inFile.readline() #reads past the first line (the header)
line_count=0
total=0
for line in inFile:
    line = line.rstrip('\r\n') #removes the line ending
    data = line.split() #splits the line into columns on whitespace
    total+=int(data[6]) #converts the 7th column from string to integer
    line_count=line_count+1 #counts the data lines
inFile.close()

print "%.2f" % (float(total)/line_count) #float() avoids Python 2 integer division
236.36
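
In Python 2, dividing one integer by another with / discards the fractional part, so an average computed from integer totals needs a float conversion. The sketch below shows the pattern on made-up rows that stand in for init_sites.txt; the values and the helper name average_column are illustrative and not taken from the file.

```python
def average_column(lines, column):
    """Average the integer values found in one whitespace-separated column."""
    total = 0
    count = 0
    for line in lines:
        fields = line.split()
        if len(fields) <= column:
            continue          # skip blank or short lines
        total += int(fields[column])
        count += 1
    # float() forces true division under Python 2 as well as Python 3
    return float(total) / count

# made-up rows; only the 7th column (index 6) is used
rows = [
    "g1 chr1 + 100 400 1 300",
    "g2 chr2 - 200 350 2 151",
]
print("%.2f" % average_column(rows, 6))
```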

4. The "many counters" problem

Write a script that reads a file of sequences and tallies how many sequences there are of each length. Use sequences3.txt as input to test your code. After reading through all the sequences, print the sequence length that was the most common.

Hint: you can use a dictionary to keep track of all the tallies, e.g.:

In [18]:
# HINT CODE

seq = "ATGCTGATCGATATA"
length = len(seq)
tallyDictionary={}                  # must be a dictionary ({}), not a list ([])
if length not in tallyDictionary:
    tallyDictionary[length] = 1     # initialize if first occurrence
else:
    tallyDictionary[length] += 1    # otherwise just increment the count

In [5]:
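filename="sequences3.txt"
infile=open(filename, 'r')
tallies={}
for line in infile:
    line=line.rstrip('\r\n')
    length=len(line)
    if length not in tallies:
        tallies[length]=1
    else:
        tallies[length]+=1
infile.close()

# after reading every sequence, find the length with the highest tally
mostCommon=None
for length in tallies:
    if mostCommon is None or tallies[length]>tallies[mostCommon]:
        mostCommon=length
print mostCommon
51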
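
The standard library's collections.Counter does this kind of tallying in one step and can report the most frequent key directly. The sketch below is an alternative to the hand-written dictionary, not the approach the exercise asks for; the sequences are made up and the helper name most_common_length is hypothetical.

```python
from collections import Counter

def most_common_length(sequences):
    """Return (length, count) for the most frequent sequence length."""
    tallies = Counter(len(seq.strip()) for seq in sequences)
    return tallies.most_common(1)[0]

# made-up sequences: three of length 3, one of length 4, one of length 5
seqs = ["ATG", "ATGC", "GGC", "TTA", "ATGCA"]
length, count = most_common_length(seqs)
print(length)
print(count)
```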

Homework exercise (10 Points)


Codon table

For this question, use codon_table.txt, which contains a list of all possible codons and their corresponding amino acids. We will be using this info to translate a nucleotide sequence into amino acids. Each part of this question builds off the previous parts.

(A) Thinkin' question (short answer, not code): If we want to create a codon dictionary and use it to translate nucleotide sequences, would it be better to use the codons or amino acids as keys? (2 Points)

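Since multiple triplet codons code for the same amino acid, I would use the codons as the keys. Dictionary keys must be unique, and each codon maps to exactly one amino acid, whereas an amino acid used as a key would need to map to several codons.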
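
The point can be checked with a few real codon assignments. In the sketch below, three leucine codons and the methionine codon are stored with codons as keys, then the dictionary is flipped so that amino acids become the keys. The single-letter amino acid codes are an assumption; codon_table.txt may use a different notation.

```python
# three of the six leucine codons, plus the methionine codon
codon_to_aa = {"CTT": "L", "CTC": "L", "CTA": "L", "ATG": "M"}

# flipping the dictionary: each later codon overwrites the earlier one
aa_to_codon = {}
for codon in codon_to_aa:
    aa_to_codon[codon_to_aa[codon]] = codon

print(len(codon_to_aa))   # one entry per codon
print(len(aa_to_codon))   # one entry per amino acid, so codons were lost
```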

(B) Read in codon_table.txt (note that it has a header line) and use it to create a codon dictionary. Then use raw_input() to prompt the user to enter a single codon (e.g. ATG) and print the amino acid corresponding to that codon to the screen. If the nucleotide combination is not a valid codon, print a warning message. (4 Points)

In [20]:
codon="codon_table.txt"
infile=open(codon, 'r')
infile.readline() # skip the header line
translation={}
for line in infile:
    line=line.rstrip('\r\n')
    data=line.split()
    seq=data[0] # codon (first column)
    aa=data[1] # amino acid (second column)
    translation[seq]=aa # assign, not +=: the key does not exist yet
infile.close()

# assumes the table lists codons in upper case, as in the ATG example
request=raw_input("Enter a codon: ").upper()
if request in translation:
    print translation[request]
else:
    print "Warning: "+request+" is not a valid codon"

(C) Now we will adapt the code in (B) to translate a longer sequence. Instead of prompting the user for a single codon, allow them to enter a longer sequence. First, check that the length of the sequence they entered is a multiple of 3 (Hint: use the mod operator, %), and print an error message if it is not. If it is valid, then go on to translate every three nucleotides to an amino acid. Print the final amino acid sequence to the screen. We have included some code to help you out. You can either program this function from scratch, or add to the given code. (4 Points)

In [ ]:
#Prompt the user for a sequence


#Check that their sequence is a multiple of 3



#Loop through the sequence, in groups of 3, translating each one as you go
protSeq = "" #Add each amino acid to this string as you loop through the codons. 
for i in range(0,len(request),3): #request is the sequence given by the user
    codon = request[i:i+3] #gets the current codon under consideration
    
    #Use your dictionary to find what AA this codon corresponds to, as in part b. Print an error if it is invalid.
    
    
print "Your protein sequence is: " + protSeq
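
One way the steps in the template might fit together is sketched below, under stated assumptions: the function name translate and the four-entry demo_table are hypothetical stand-ins for the user prompt and for the dictionary built from codon_table.txt, and single-letter amino acid codes are assumed. It uses print() with a single argument so that it runs under both Python 2 and Python 3.

```python
def translate(seq, table):
    """Translate a nucleotide string codon by codon.

    Returns the protein string, or None if the sequence is invalid.
    """
    # the mod operator gives the remainder after dividing by 3
    if len(seq) % 3 != 0:
        print("Error: sequence length is not a multiple of 3")
        return None
    protein = ""
    for i in range(0, len(seq), 3):
        codon = seq[i:i+3]            # current group of three nucleotides
        if codon not in table:
            print("Error: " + codon + " is not a valid codon")
            return None
        protein += table[codon]
    return protein

# tiny stand-in for the dictionary built from codon_table.txt
demo_table = {"ATG": "M", "GCT": "A", "TGG": "W", "TAA": "*"}
print(translate("ATGGCTTGG", demo_table))
```

Returning None on invalid input keeps the error handling in one place, so the caller can decide whether to print the protein sequence.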