Workshop 3
Task 3.1
We've seen that the string method find() returns the index of the first occurrence of a character within a string. Often, however, we want to find all the indices of a character in a string.
Write some code that loops through the following DNA sequence and outputs the indices of all occurrences of the base "T".
Hint: You should be able to modify your code from Exercise 10.3 for this task.
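Here is a minimal sketch of one approach. The sequence is not reproduced in this sheet, so the short string assigned to the hypothetical variable `dna` is made up. The loop uses `enumerate()` to get each position together with its base, and it records the index whenever the base is "T".

```python
# Hypothetical example sequence; substitute the workshop's DNA string here
dna = "ATGCTTAGT"

t_indices = []  # collects every index where the base "T" occurs
for index, base in enumerate(dna):
    if base == "T":
        t_indices.append(index)

print(t_indices)
```

If you would rather build on find(), note that it also accepts a start position. A while loop that calls `dna.find("T", index + 1)` until it returns -1 does the same job.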
Task 3.2
Using code, find and print the minimum and maximum values in the following list.
Hint: Use indexing on the sorted list.
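As a sketch of the indexing hint, assuming a made-up list in place of the one in the workshop: `sorted()` returns a new list in ascending order, so the first element is the minimum and the last element is the maximum.

```python
# Hypothetical values standing in for the workshop's list
values = [7, 2, 9, 4, 1]

sorted_values = sorted(values)  # new list, lowest to highest
minimum = sorted_values[0]      # first element is the smallest
maximum = sorted_values[-1]     # last element is the largest

print("Minimum:", minimum)
print("Maximum:", maximum)
```

The built-in min() and max() functions return the same values without sorting the list first.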
Task 3.3
Find and print the middle (also called the median) value of the sorted list eggs_laid. Do not find the middle value by hand: find it using code.
Hint 1: Use the list's length. For example, if the length is 5, the middle value is at the 3rd position or index 2.
Hint 2: If you get the error "list indices must be integers or slices, not float", remember that the result of the division operator / is a float even when the dividend and divisor are integers. That means you need integer division, //.
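The two hints combine like this in a minimal sketch. It assumes an odd-length list of made-up counts in place of the real eggs_laid data. For a length of 5, `5 // 2` is 2, which is the middle index from Hint 1. Writing `5 / 2` would give 2.5 and trigger the error from Hint 2.

```python
# Hypothetical counts standing in for the workshop's eggs_laid list
eggs_laid = [3, 8, 5, 1, 6]

sorted_eggs = sorted(eggs_laid)
middle_index = len(sorted_eggs) // 2  # integer division gives an int index
middle_value = sorted_eggs[middle_index]

print(middle_value)
```

For an even-length list, `len(...) // 2` picks the upper of the two middle values. Task 3.4 deals with that case.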
Task 3.4
There are three steps in calculating the median of a list of numbers:
1. Sort the values from lowest to highest.
2. Find the number of values.
3. Find the middle value:
   - If the number of values is odd, the median is the middle value.
   - If the number of values is even, the median is the average of the two middle values.
E.g., the median of the values [0.5, 0.6, 0.9] is 0.6. The median of the values [0.5, 0.6, 0.9, 1.1] is (0.6+0.9)/2 = 0.75.
Write code to calculate the median of a list of numbers of any length.
Apply your code to finding the median of the following two lists.
Hint 1: You will need to sort the list, find its length, and decide which value(s) are in the middle.
Hint 2: For a list with an even number of values, be careful to select the two middle elements using the correct indices.
The median of haemoglobin_levels is 12.6.
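The three steps can be sketched as a small function. The name `median` is my own choice, and the function is checked only against the two example lists given in the text, because the workshop's lists are not reproduced here. For an even length n, the two middle values sit at indices n // 2 - 1 and n // 2.

```python
def median(numbers):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(numbers)  # step 1: sort from lowest to highest
    n = len(ordered)           # step 2: count the values
    mid = n // 2
    if n % 2 == 1:
        # odd count: the single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([0.5, 0.6, 0.9]))
print(median([0.5, 0.6, 0.9, 1.1]))
```

Calling `median(haemoglobin_levels)` lets you compare against the value stated above. Python's standard library also provides `statistics.median()`, which is handy for checking your own version.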
Task 3.5
In Task 2.12 you wrote some code to test if a word is a palindrome.
Now modify your code so that it tests and prints whether each word in the following list is a palindrome.
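Your Task 2.12 code is not shown here, so this sketch uses one common palindrome test: compare a word with its reverse, `word[::-1]`. The word list is made up. Converting to lower case first, so that "Noon" counts as a palindrome, is an assumption, and you may not want it.

```python
# Hypothetical word list standing in for the workshop's list
words = ["level", "python", "racecar", "Noon"]

results = {}  # maps each word to True/False for later checking
for word in words:
    normalised = word.lower()  # assumption: ignore upper/lower case
    is_palindrome = normalised == normalised[::-1]
    results[word] = is_palindrome
    if is_palindrome:
        print(word, "is a palindrome")
    else:
        print(word, "is not a palindrome")
```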
Task 3.6
In Exercise 4.4 you calculated the cumulative number of emperor penguins that joined an Antarctic breeding colony on the first three days of the season. The first three weeks' values are given below.
Print out the cumulative number of penguins on each day of the first three weeks.
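This sketch keeps a running total. It uses one week of made-up daily arrivals, and the variable name `daily_arrivals` is hypothetical. Each day's arrivals are added to the total, and the total is printed straight away. `enumerate(..., start=1)` numbers the days from 1.

```python
# Hypothetical daily arrivals for one week; the workshop lists three weeks
daily_arrivals = [120, 340, 560, 410, 890, 1020, 760]

cumulative = 0
cumulative_totals = []  # kept so the running totals can be reused later
for day, arrivals in enumerate(daily_arrivals, start=1):
    cumulative += arrivals  # total up to and including this day
    cumulative_totals.append(cumulative)
    print(f"Day {day}: {cumulative}")
```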
Task 3.7
How many days does it take the colony to just pass 10,000 penguins?
Hint: Modify the code in Task 3.6 to break out of the loop when the total passes 10,000.
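Here is one way to add the break, sketched with made-up arrival numbers. A separate `days` counter tracks how many days have elapsed, and Task 3.8 removes the need for it. "Just pass" is interpreted here as strictly greater than 10,000. If the total never passes 10,000, the loop simply finishes without breaking.

```python
# Hypothetical daily arrivals; substitute the workshop's three weeks of data
daily_arrivals = [1200, 2500, 3100, 2800, 1900, 2200]

total = 0
days = 0  # separate counter for the number of days elapsed
for arrivals in daily_arrivals:
    total += arrivals
    days += 1
    if total > 10000:  # stop as soon as the total passes 10,000
        break

print(f"The colony passes 10,000 penguins on day {days}, with {total} penguins.")
```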
Task 3.8
Modify your code from Task 3.7 to use range() so that you do not need a separate variable to count the number of days until the colony passes 10,000 penguins.
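A possible range()-based version, using the same made-up data as in the Task 3.7 sketch, is shown below. Looping over `range(1, len(daily_arrivals) + 1)` makes the loop variable the day number itself, so the list index is `day - 1`. In Python the loop variable keeps its value after `break`, so `day` already holds the answer.

```python
# Hypothetical daily arrivals; substitute the workshop's three weeks of data
daily_arrivals = [1200, 2500, 3100, 2800, 1900, 2200]

total = 0
for day in range(1, len(daily_arrivals) + 1):  # day counts 1, 2, 3, ...
    total += daily_arrivals[day - 1]           # list indices start at 0
    if total > 10000:
        break  # day still holds the day on which the total passed 10,000

print(f"The colony passes 10,000 penguins on day {day}, with {total} penguins.")
```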
Task 3.9
A single nighttime survey of bats in the Forest of Dean produced a list of the species of all individual bats caught, measured and released.
Create a new list of unique bat species caught.
Print the list of unique species and the number of unique species.
Hint: Create an empty list and, as you loop through bat_list, append a bat species to this list only if it is not already in it.
Plecotus austriacus, Pipistrellus nathusii, Myotis daubentonii, Nyctalus noctula, Pipistrellus pipistrellus, Eptesicus serotinus, Myotis bechsteinii, Pipistrellus pygmaeus, Myotis brandtii, Myotis mystacinus, Rhinolophus hipposideros, Nyctalus leisleri, Myotis nattereri, Plecotus auritus, Barbastella barbastellus
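The hint translates into the sketch below. It uses a short, made-up bat_list, since the full survey list is not reproduced here. The `not in` test ensures that each species is appended only the first time it appears, so the unique species keep the order in which they were first caught.

```python
# Hypothetical, shortened survey list; the workshop's bat_list is longer
bat_list = ["Pipistrellus pipistrellus", "Myotis daubentonii",
            "Pipistrellus pipistrellus", "Plecotus auritus",
            "Myotis daubentonii", "Pipistrellus pygmaeus"]

unique_species = []
for species in bat_list:
    if species not in unique_species:  # keep only the first occurrence
        unique_species.append(species)

print(unique_species)
print("Number of unique species:", len(unique_species))
```

`set(bat_list)` would also remove duplicates, but a set does not preserve the order of first appearance.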
Task 3.10
In the bat survey, each bat's wingspan was measured. These measurements, in centimetres, are given for each bat in the list wingspans below.
Using the bat_list and wingspans lists, print the average wingspan of Pipistrellus nathusii to two decimal places.
Hint 1: Loop through the two lists simultaneously. Each time you encounter Pipistrellus nathusii in bat_list, append its wingspan to another list. Once you have finished looping through bat_list, calculate the average wingspan using the list of wingspans you have constructed.
Hint 2: Rather than summing the values in the list by looping over it as we did in Notebook 12, you might want to use the built-in sum() function. Google it to find out how to use it.
The average wingspan of Pipistrellus nathusii is 20.27 cm.
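The sketch below follows both hints, using short, made-up parallel lists in which `wingspans[i]` belongs to the bat in `bat_list[i]`. Because of the made-up data, it does not reproduce the 20.27 cm answer. The built-in `zip()` walks the two lists together, `sum()` replaces the manual summing loop, and the format specifier `:.2f` rounds the printed value to two decimal places.

```python
# Hypothetical parallel lists: wingspans[i] (cm) belongs to bat_list[i]
bat_list = ["Pipistrellus nathusii", "Myotis daubentonii",
            "Pipistrellus nathusii", "Plecotus auritus",
            "Pipistrellus nathusii"]
wingspans = [20.1, 25.3, 20.5, 26.0, 20.0]

nathusii_wingspans = []
for species, wingspan in zip(bat_list, wingspans):  # walk both lists together
    if species == "Pipistrellus nathusii":
        nathusii_wingspans.append(wingspan)

average = sum(nathusii_wingspans) / len(nathusii_wingspans)
print(f"The average wingspan of Pipistrellus nathusii is {average:.2f} cm.")
```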
Task 3.11
Repeated short sequences are of interest to geneticists, as they suggest the existence of transposable elements within genomic DNA.
Search for and print the first sequence in the following list that starts with "TATA" and has a second "TATA" repeat later in the sequence.
If no such sequence is found, print a message saying that none was found.
Hint: Loop through the DNA sequences. If a sequence starts with "TATA", test whether it contains a second "TATA" substring. If it does, break out of the loop; otherwise, move on to the next sequence.
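A minimal sketch of the search is given below. It uses made-up sequences, and the variable name `found` is hypothetical. One assumption is worth flagging: the second "TATA" is taken to mean a non-overlapping repeat, so only the part of the sequence after the first four bases, `seq[4:]`, is searched. Searching `seq[1:]` instead would also count overlapping matches such as "TATATA".

```python
# Hypothetical sequences standing in for the workshop's list
sequences = ["GCTATAGC", "TATAGGCC", "TATACGTATAGG", "TATATATA"]

found = None  # will hold the first qualifying sequence, if any
for seq in sequences:
    # starts with TATA and a second, non-overlapping TATA appears later
    if seq.startswith("TATA") and "TATA" in seq[4:]:
        found = seq
        break  # stop at the first qualifying sequence

if found is not None:
    print("First sequence with a repeated TATA:", found)
else:
    print("No sequence starting with TATA and containing a second TATA was found.")
```

Python's `for ... else` construct, whose `else` block runs only when the loop finishes without a `break`, is another way to print the "none found" message.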