Jupyter notebook 26_Python-IV/Lesson4_exercises.ipynb
Lesson 4: In-class exercises
Instructions: For each problem, write code in the provided code block. Don't forget to run your code to make sure it works.
1. Simple list and dictionary practice
Using the data below, write code to accomplish the following tasks.
Name | Favorite Food |
---|---|
Wilfred | Steak |
Manfred | Duck |
Wadsworth | Spaghetti |
Jeeves | Ice cream |
Mitsworth | Tuna |
(A) Make a list of all the names, then loop through the list and print each name out.
(B) Below, some of the names and foods have already been added to a dictionary. Fill in the missing entries using the dict[key] = value syntax. Then loop through the dictionary and print each name and food combination in the format:
(C) In the dictionary from part (B), change Wilfred's favorite food to pizza.
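As a rough illustration of the list and dictionary syntax used in parts (A) through (C), here is a minimal sketch. The variable names (`names`, `fav_foods`) and the starting contents of the dictionary are assumptions, since the notebook's own partially filled dictionary is not shown here, and the print format in the loop is only a placeholder, not the required one.

```python
# Part (A): a list of names, printed one per line.
names = ["Wilfred", "Manfred", "Wadsworth", "Jeeves", "Mitsworth"]
for name in names:
    print(name)

# Part (B): a hypothetical starting dictionary (the notebook's version
# may already contain different entries).
fav_foods = {"Wilfred": "Steak", "Manfred": "Duck"}

# Add the missing entries with the dict[key] = value syntax.
fav_foods["Wadsworth"] = "Spaghetti"
fav_foods["Jeeves"] = "Ice cream"
fav_foods["Mitsworth"] = "Tuna"

# Looping over a dictionary visits its keys.
for name in fav_foods:
    # Placeholder output format (an assumption, not the required one).
    print(name + ": " + fav_foods[name])

# Part (C): assigning to an existing key overwrites its value.
fav_foods["Wilfred"] = "Pizza"
```

Note that the same dict[key] = value syntax both adds a new entry and replaces an existing one, depending on whether the key is already present.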
2. Duplicate removal
Read in the file genes.txt and print only the unique gene IDs (remove the duplicates). Do not assume repeat IDs appear consecutively in the file.
Hint: see the practice exercises from Lesson 4 for an example of how to remove duplicates using a list.
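One way the list-based approach from the hint might look is sketched below. The gene IDs are made up and held in a list rather than read from genes.txt, so the file-reading step is left out; `gene_ids` and `unique_ids` are illustrative names.

```python
# Hypothetical gene IDs standing in for the lines of genes.txt.
gene_ids = ["geneA", "geneB", "geneA", "geneC", "geneB"]

unique_ids = []
for gene in gene_ids:
    # Keep an ID only the first time it is seen, so repeats are skipped
    # even when they are not next to each other.
    if gene not in unique_ids:
        unique_ids.append(gene)

for gene in unique_ids:
    print(gene)
```

Because each ID is checked against everything collected so far, this works regardless of where the repeats appear in the input.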
3. Split practice
Read in the file init_sites.txt and compute the average CDS length (i.e. average the values in the 7th column). Your answer should be 236.36.
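The split-and-average pattern can be sketched on a few made-up rows. The layout of init_sites.txt is not shown here, so this assumes whitespace-separated columns; the rows, column contents, and variable names are all hypothetical, and the result is unrelated to the 236.36 expected from the real file.

```python
# Made-up rows standing in for the lines of init_sites.txt.
# Whitespace-separated columns are an assumption about the file layout.
rows = [
    "g1 chr1 + 100 400 x 300 y",
    "g2 chr1 - 500 700 x 200 y",
    "g3 chr2 + 50 450 x 400 y",
]

total = 0.0
count = 0
for row in rows:
    fields = row.strip().split()
    # The 7th column is at index 6 because list indices start at 0.
    total += float(fields[6])
    count += 1

average = total / count
print(round(average, 2))
```

If the real file is tab-delimited with empty fields, split("\t") may be the safer choice, since split() with no argument collapses runs of whitespace.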
4. The "many counters" problem
Write a script that reads a file of sequences and tallies how many sequences there are of each length. Use sequences3.txt as input to test your code. After reading through all the sequences, print the sequence length that was the most common.
Hint: you can use a dictionary to keep track of all the tallies, e.g.:
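The notebook's own hint code is not reproduced here. One possible sketch of a dictionary tally, using made-up sequences in a list instead of sequences3.txt and illustrative variable names, is:

```python
# Made-up sequences standing in for the lines of sequences3.txt.
sequences = ["ATG", "ATGC", "GGC", "AT", "TTA"]

# Keys are sequence lengths, values are how many times each was seen.
length_counts = {}
for seq in sequences:
    n = len(seq)
    if n in length_counts:
        length_counts[n] += 1
    else:
        length_counts[n] = 1

# Find the length with the highest tally.
most_common_length = None
highest_count = 0
for n in length_counts:
    if length_counts[n] > highest_count:
        highest_count = length_counts[n]
        most_common_length = n

print(most_common_length)
```

The dictionary removes the need to know in advance which lengths will occur: a new counter is created the first time a length is seen. If two lengths tie for the highest tally, this sketch reports only one of them.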
Homework exercise (10 Points)
Codon table
For this question, use codon_table.txt, which contains a list of all possible codons and their corresponding amino acids. We will be using this info to translate a nucleotide sequence into amino acids. Each part of this question builds off the previous parts.
(A) Thinkin' question (short answer, not code): If we want to create a codon dictionary and use it to translate nucleotide sequences, would it be better to use the codons or amino acids as keys? (2 Points)
(B) Read in codon_table.txt (note that it has a header line) and use it to create a codon dictionary. Then use raw_input() to prompt the user to enter a single codon (e.g. ATG) and print the amino acid corresponding to that codon to the screen. If the nucleotide combination is not a valid codon, print a warning message. (4 Points)
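The header-skipping and lookup steps in part (B) might be sketched as follows. The layout of codon_table.txt is not shown here, so this assumes a header line followed by two whitespace-separated columns; the four-row table, the `lookup` helper, and the message wording are all hypothetical. The user prompt is replaced by a fixed list of test codons so the sketch runs without input (raw_input() is the Python 2 name; Python 3 calls it input()).

```python
# Stand-in for the lines of codon_table.txt (assumed layout: one header
# line, then a codon and its one-letter amino acid per line).
table_lines = [
    "Codon AminoAcid",
    "ATG M",
    "TGG W",
    "GCT A",
    "TAA *",
]

codon_dict = {}
for line in table_lines[1:]:  # skip the header line
    fields = line.strip().split()
    codon_dict[fields[0]] = fields[1]


def lookup(codon):
    """Return the table entry for codon, or None if it is not in the table."""
    if codon in codon_dict:
        return codon_dict[codon]
    return None


# Fixed test codons in place of a user prompt.
for query in ["ATG", "XYZ"]:
    aa = lookup(query)
    if aa is None:
        print("Warning: " + query + " is not a valid codon")
    else:
        print(query + " -> " + aa)
```

Checking membership with `in` before indexing avoids a KeyError when the entered text is not in the table.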
(C) Now we will adapt the code in (B) to translate a longer sequence. Instead of prompting the user for a single codon, allow them to enter a longer sequence. First, check that the sequence they entered has a length that is a multiple of 3 (Hint: use the mod operator, %), and print an error message if it is not. If it is valid, then go on to translate every three nucleotides to an amino acid. Print the final amino acid sequence to the screen. We have included some code to help you out. You can either program this function from scratch, or add to the given code. (4 Points)
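The length check and codon-by-codon loop can be sketched as below. This is not the provided helper code, which is not shown here; the `translate` function, the tiny four-entry dictionary, and the choice to mark unknown codons with "?" are all assumptions made for illustration.

```python
# A tiny hypothetical codon dictionary; the real one would be built
# from codon_table.txt.
codon_dict = {"ATG": "M", "GCT": "A", "TGG": "W", "TAA": "*"}


def translate(seq, table):
    """Translate seq three nucleotides at a time.

    Returns None if len(seq) is not a multiple of 3. Codons missing
    from the table are shown as "?" (an arbitrary choice for this sketch).
    """
    if len(seq) % 3 != 0:
        return None
    protein = ""
    # Step through the sequence in jumps of 3: 0, 3, 6, ...
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        protein += table.get(codon, "?")
    return protein


for seq in ["ATGGCTTGG", "ATGGC"]:
    result = translate(seq, codon_dict)
    if result is None:
        print("Error: sequence length is not a multiple of 3")
    else:
        print(result)
```

Here len(seq) % 3 gives the remainder after dividing by 3, so any nonzero result means the sequence cannot be split evenly into codons.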