Instructions: For each problem, write code in the provided code block. Don't forget to run your code to make sure it works.
1. Simple list and dictionary practice
Using the data below, write code to accomplish the following tasks.
Name | Favorite Food |
---|---|
Wilfred | Steak |
Manfred | Duck |
Wadsworth | Spaghetti |
Jeeves | Ice cream |
Mitsworth | Tuna |
(A) Make a list of all the names, then loop through the list and print each name out.
names=["Wilfred","Manfred","Wadsworth","Jeeves","Mitsworth"]
for item in names:
    print item
(B) Below, some of the names and foods have already been added to a dictionary. Fill in the missing entries using the dict[key] = value syntax. Then loop through the dictionary and print each name and food combination in the format:
<NAME>'s favorite food is <FOOD>
favFoods = {"Wilfred":"Steak", "Manfred":"Duck", "Wadsworth":"Spaghetti"}
# add your code below:
favFoods["Jeeves"]="Ice cream"
favFoods["Mitsworth"]="Tuna"
for item in favFoods:
    print item+"'s favorite food is "+favFoods[item]
(C) In the dictionary from part (B), change Wilfred's favorite food to pizza.
favFoods["Wilfred"]="pizza"
2. Duplicate removal
Read in the file genes.txt and print only the unique gene IDs (remove the duplicates). Do not assume repeated IDs appear consecutively in the file.
Hint: see the practice exercises from Lesson 4 for an example of how to remove duplicates using a list.
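Filename="genes.txt"
inFile=open(Filename,'r')
new_list=[]
for item in inFile:
    item=item.rstrip('\r\n') #strip the line ending so identical IDs compare equal
    if item not in new_list:
        new_list.append(item)
inFile.close()
for gene in new_list: #print one ID per line rather than the raw list
    print gene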
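As an alternative to the list-only approach above, a set can record which IDs have already been seen while a list keeps them in first-appearance order. This is only a minimal sketch, not the course's reference solution: the function name `unique_in_order` and the sample IDs are hypothetical, and the parenthesised `print(...)` form is used so the code runs under both Python 2 and Python 3.

```python
def unique_in_order(lines):
    """Return the distinct entries of `lines` in first-appearance order."""
    seen = set()      # fast membership test
    unique = []       # preserves the order in which IDs first appear
    for line in lines:
        gene_id = line.strip()   # drop line endings and surrounding whitespace
        if gene_id and gene_id not in seen:   # skip blank lines and repeats
            seen.add(gene_id)
            unique.append(gene_id)
    return unique


# Hypothetical sample data standing in for the contents of genes.txt
sample = ["geneA\n", "geneB\n", "geneA\n", "geneC\n", "geneB"]
for gene_id in unique_in_order(sample):
    print(gene_id)
```

To run it on the real file, an open file object can be passed in place of the sample list, since the function only needs something it can loop over line by line.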
3. Split practice
Read in the file init_sites.txt and compute the average CDS length (i.e. average the values in the 7th column). Your answer should be 236.36.
Filename="init_sites.txt"
inFile=open(Filename,'r')
inFile.readline() #skips the header line
line_count=0
total=0
for line in inFile:
    line = line.rstrip('\r\n') #strips the trailing line ending
    data = line.split() #splits the line on whitespace
    total+=int(data[6]) #converts the 7th column from string to integer
    line_count=line_count+1 #counts the data lines
inFile.close()
print round(float(total)/line_count, 2) #float() avoids Python 2 integer division
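The averaging step can also be sketched as a small reusable function. The point worth illustrating is the division: in Python 2, dividing two integers truncates the result, so the total is converted with `float()` first. The function name `average_column` and the sample rows are made up for illustration, and the sketch assumes whitespace-separated columns with a single header line.

```python
def average_column(lines, column_index, skip_header=True):
    """Average the integer values found in one whitespace-separated column."""
    total = 0
    count = 0
    for line_number, line in enumerate(lines):
        if skip_header and line_number == 0:
            continue                      # ignore the header row
        fields = line.split()
        if len(fields) <= column_index:
            continue                      # skip blank or short lines
        total += int(fields[column_index])
        count += 1
    if count == 0:
        return None                       # nothing to average
    return float(total) / count           # float() avoids Python 2 integer division


# Made-up rows (not taken from init_sites.txt): a header plus three data lines
sample = [
    "a b c d e f cds_len",
    "x x x x x x 100",
    "x x x x x x 200",
    "x x x x x x 301",
]
print(round(average_column(sample, 6), 2))
```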
4. The "many counters" problem
Write a script that reads a file of sequences and tallies how many sequences there are of each length. Use sequences3.txt as input to test your code. After reading through all the sequences, print the sequence length that was the most common.
Hint: you can use a dictionary to keep track of all the tallies, e.g.:
# HINT CODE
seq = "ATGCTGATCGATATA"
length = len(seq)
tallyDictionary = {} # must be a dictionary ({}), not a list ([])
if length not in tallyDictionary:
    tallyDictionary[length] = 1 # initialize if first occurrence
else:
    tallyDictionary[length] += 1 # otherwise just increment the count
filename="sequences3.txt"
infile=open(filename, 'r')
mylist={}
for line in infile:
    line=line.rstrip('\r\n')
    length=len(line)
    if length not in mylist:
        mylist[length]=1
    else:
        mylist[length]+=1
infile.close()
#find the length with the highest tally instead of hard-coding a count
mostCommon=max(mylist, key=mylist.get)
print mostCommon
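The tally-then-pick-the-maximum idea can be sketched as a function that returns both the most common length and its count. This is a hedged illustration rather than the expected answer: `most_common_length` and the sample sequences are hypothetical, and the tie-breaking rule (smallest length wins) is an assumption, since the problem does not say what to do with ties.

```python
def most_common_length(sequences):
    """Tally sequence lengths and return (length, count) for the most common one."""
    tally = {}
    for seq in sequences:
        seq = seq.strip()
        if not seq:
            continue                               # ignore blank lines
        length = len(seq)
        tally[length] = tally.get(length, 0) + 1   # initialize-or-increment in one step
    if not tally:
        return None
    # max() keeps the first maximal item, so sorting the keys first
    # makes ties deterministic: the smallest length wins
    best = max(sorted(tally), key=tally.get)
    return best, tally[best]


# Made-up sequences (not taken from sequences3.txt)
sample = ["ATG\n", "ATGC\n", "GGA\n", "TTTTT\n", "CCA\n"]
length, count = most_common_length(sample)
print("Most common length: " + str(length) + " (seen " + str(count) + " times)")
```

`tally.get(length, 0) + 1` is a compact equivalent of the if/else in the hint code.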
5. Codon table
For this question, use codon_table.txt, which contains a list of all possible codons and their corresponding amino acids. We will be using this info to translate a nucleotide sequence into amino acids. Each part of this question builds off the previous parts.
(A) Thinkin' question (short answer, not code): If we want to create a codon dictionary and use it to translate nucleotide sequences, would it be better to use the codons or amino acids as keys? (2 Points)
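One way to explore this question is to build both kinds of dictionary from a handful of codons and compare them. The four codon/amino-acid pairs below come from the standard genetic code, written with one-letter amino acid codes; whether codon_table.txt uses the same notation is an assumption, and the variable names are only illustrative.

```python
# A few entries from the standard genetic code (one-letter amino acid codes)
pairs = [("ATG", "M"), ("TTT", "F"), ("TTC", "F"), ("GGG", "G")]

# Codons as keys: every codon is unique, so no entry is lost
by_codon = {}
for codon, aa in pairs:
    by_codon[codon] = aa

# Amino acids as keys: several codons can share one amino acid,
# so a later entry overwrites an earlier one
by_aa = {}
for codon, aa in pairs:
    by_aa[aa] = codon

print(len(by_codon))     # number of entries kept with codons as keys
print(len(by_aa))        # number of entries kept with amino acids as keys
print(by_codon["TTT"])   # direct lookup from codon to amino acid
```

Because translation looks up an amino acid given a codon, the first dictionary also matches the direction of the lookup.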
(B) Read in codon_table.txt (note that it has a header line) and use it to create a codon dictionary. Then use raw_input() to prompt the user to enter a single codon (e.g. ATG) and print the amino acid corresponding to that codon to the screen. If the nucleotide combination is not a valid codon, print a warning message. (4 Points)
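codon="codon_table.txt"
infile=open(codon, 'r')
infile.readline() #skip the header line
translation={}
for line in infile:
    line=line.rstrip('\r\n')
    data=line.split()
    seq=data[0] #codon (assumed to be the first column)
    aa=data[1] #amino acid (assumed to be the second column)
    translation[seq]=aa #assign with =, since += fails on a key that does not exist yet
infile.close()
#prompt for a single codon and look it up
request=raw_input("Enter a codon: ")
if request in translation:
    print translation[request]
else:
    print "Warning: "+request+" is not a valid codon"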
(C) Now we will adapt the code in (B) to translate a longer sequence. Instead of prompting the user for a single codon, allow them to enter a longer sequence. First, check that the sequence they entered has a length that is a multiple of 3 (Hint: use the mod operator, %), and print an error message if it is not. If it is valid, then go on to translate every three nucleotides to an amino acid. Print the final amino acid sequence to the screen. We have included some code to help you out. You can either program this function from scratch, or add to the given code. (4 Points)
#Prompt the user for a sequence
#Check that their sequence is a multiple of 3
#Loop through the sequence, in groups of 3, translating each one as you go
protSeq = "" #Add each amino acid to this string as you loop through the codons.
for i in range(0,len(request),3): #request is the sequence given by the user
    codon = request[i:i+3] #gets the current codon under consideration
    #Use your dictionary to find what AA this codon corresponds to, as in part b. Print an error if it is invalid.
print "Your protein sequence is: " + protSeq
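The steps described in part (C) can be sketched as a single function. This is one possible shape, not the official solution: the name `translate` is illustrative, the small dictionary is a hypothetical stand-in for the one built from codon_table.txt (its four entries are from the standard genetic code), and the function raises `ValueError` instead of printing so that it can be tried without user input. It runs under both Python 2 and Python 3.

```python
def translate(sequence, codon_table):
    """Translate a nucleotide string into amino acids, three bases at a time."""
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    protein = ""
    for i in range(0, len(sequence), 3):
        codon = sequence[i:i + 3]          # current group of three nucleotides
        if codon not in codon_table:
            raise ValueError("invalid codon: " + codon)
        protein += codon_table[codon]      # append the matching amino acid
    return protein


# Hypothetical stand-in for the dictionary built from codon_table.txt
small_table = {"ATG": "M", "TTT": "F", "GGG": "G", "TAA": "*"}

print("Your protein sequence is: " + translate("ATGTTTGGGTAA", small_table))
```

In the interactive version, the string returned by raw_input() would be passed as `sequence`, with the call wrapped in try/except to print the error message.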