
Workshop 3

Task 3.1

We've seen that the string method find() returns the index of the first occurrence of a character within a string. Often we want to find the indices of all occurrences of a character in a string.

Write some code that loops through the following DNA sequence and outputs the indices of all occurrences of the base "T".

  • Hint: You should be able to modify your code from Exercise 10.3 for this task.

The answer is 0, 1, 2, 4, 6, 8, 11, 12.
dna_seq = 'TTTATGTATCCTTA'
counter = 0
idx = []
for x in dna_seq:
    if x == "T":
        idx.append(counter)
    counter += 1
print(idx)
[0, 1, 2, 4, 6, 8, 11, 12]
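
Since find() only reports the first occurrence, one possible alternative to a manual counter is to call find() repeatedly with its optional start argument. The following is a minimal sketch of that idea; the variable names base, indices and start are illustrative and do not come from the exercise.

```python
dna_seq = 'TTTATGTATCCTTA'
base = 'T'

indices = []
start = 0
while True:
    pos = dna_seq.find(base, start)  # find() returns -1 when nothing more is found
    if pos == -1:
        break
    indices.append(pos)
    start = pos + 1                  # resume the search just after this hit

print(indices)
```

For this sequence the sketch should print the same list as the counter-based loop above.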

Task 3.2

Using code, find and print the minimum and maximum values in the following list.

  • Hint: Use indexing on the sorted list.

Minimum is 10 eggs, maximum is 16 eggs
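eggs_laid = [10,11,14,16,13,14,12,10,16,11,12,13,16,11,10,14,16,16,15,12,11,13,14,15,12,14,12,15,13,13,11]
eggs_laid.sort()
# Avoid calling variables min or max: that would shadow Python's built-in functions
min_eggs = eggs_laid[0]    # first item of the sorted list
max_eggs = eggs_laid[-1]   # last item of the sorted list
print(f'Maximum is {max_eggs} eggs, minimum is {min_eggs} eggs')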
Maximum is 16 eggs, minimum is 10 eggs

Task 3.3

Find and print the middle (also called the median) value of the sorted list of eggs_laid. Do not find the middle value by hand: find it using code.

  • Hint 1: Use the list's length. For example, if the length is 5, the middle value is at the 3rd position or index 2.

  • Hint 2: If you get the error "list indices must be integers or slices, not float", remember that the result of the division operator / is a float even when the divisor and dividend are integers. That means you need integer division //.

The answer is 13 eggs.
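eggs_laid = [10,11,14,16,13,14,12,10,16,11,12,13,16,11,10,14,16,16,15,12,11,13,14,15,12,14,12,15,13,13,11]
eggs_laid.sort()
length = len(eggs_laid)
middle = length // 2        # integer division gives a valid list index
print(middle)
print(eggs_laid[middle])    # use the computed index rather than a hard-coded 15
15
13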
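
The difference between / and // described in Hint 2 can be checked directly. This is a small sketch using the five-item example from Hint 1; the list values and the variable message are made up for illustration.

```python
length = 31                  # number of values in eggs_laid

print(length / 2)            # true division always returns a float
print(length // 2)           # floor division of two integers returns an integer

values = ['a', 'b', 'c', 'd', 'e']
message = ''
try:
    values[len(values) / 2]  # 2.5 is a float, so it cannot be used as an index
except TypeError as err:
    message = str(err)
    print(message)

print(values[len(values) // 2])  # index 2, the middle of five items
```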

Task 3.4

There are three steps in calculating a median of a list of numbers:

  1. Sort the values from lowest to highest.

  2. Find the number of values.

  3. Find the middle value.

    • If the number of values is odd, the median is the middle value.

    • If the number of values is even, the median is the average of the middle two values.

    • E.g., the median of the values [0.5, 0.6, 0.9] is 0.6. The median of the values [0.5, 0.6, 0.9, 1.1] is (0.6+0.9)/2 = 0.75.

  4. Write code to calculate the median of a list of numbers of any length.

  5. Apply your code to find the median of each of the following two lists.

  • Hint 1: You will need to sort the list, find its length, and decide which value(s) are in the middle.

  • Hint 2: For a list with an even number of values, be careful to select the middle elements with the correct indices.

The median of wing_lengths is 2.985.

The median of haemoglobin_levels is 12.6.

wing_lengths = [2.9, 2.93, 2.87, 3.22, 3.1, 2.75, 2.81, 3.15, 3.21, 2.98, 2.99, 2.89, 2.78, 3.15, 2.92, 2.9, 3.12, 2.96, 3.22, 3.26, 2.79, 2.85, 2.94, 2.86, 3.21, 3.21, 2.73, 3.0, 2.94, 2.57, 3.06, 2.95, 3.33, 3.1, 3.19, 2.93, 2.89, 2.81, 3.04, 2.89, 2.81, 3.27, 2.58, 3.3, 3.1, 3.08, 2.89, 3.09, 2.91, 2.75, 3.13, 2.94, 3.35, 2.56, 3.46, 2.93, 2.81, 3.09, 3.25, 2.84, 2.62, 2.89, 3.22, 3.17, 3.13, 3.42, 2.69, 3.11, 3.44, 2.88, 2.46, 3.21, 3.03, 2.88, 2.82, 3.18, 3.11, 2.66, 2.97, 3.1, 2.94, 2.84, 2.7, 3.02, 2.76, 2.91, 3.26, 3.02, 2.91, 3.13, 3.15, 3.23, 2.62, 3.11, 3.19, 3.07, 2.87, 3.3, 3.04, 3.03, 3.04, 2.67]
haemoglobin_levels = [12.5, 15.1, 12.6, 10.4, 15.7, 9.2, 17.6, 12.9, 10.6, 12.3, 17.9, 14.0, 15.5, 12.5, 10.6]

x = wing_lengths            # set x = haemoglobin_levels to find the second median
x.sort()
n = len(x) // 2
upper_middle = x[n]         # the middle value itself when the length is odd
lower_middle = x[n - 1]     # only needed when the length is even
if len(x) % 2 == 0:
    print((upper_middle + lower_middle) / 2)
else:
    print(upper_middle)
2.9850000000000003
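
The three steps can also be wrapped in a function so that the same code applies to any list. The sketch below is one way to do this; the function name median is an illustrative choice, not part of the original solution. It is checked against the two small examples from the task and the haemoglobin_levels list.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)   # sorted() leaves the caller's list unchanged
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:             # odd length: a single middle value
        return ordered[middle]
    # even length: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

haemoglobin_levels = [12.5, 15.1, 12.6, 10.4, 15.7, 9.2, 17.6, 12.9, 10.6, 12.3, 17.9, 14.0, 15.5, 12.5, 10.6]

print(median([0.5, 0.6, 0.9]))
print(median([0.5, 0.6, 0.9, 1.1]))
print(median(haemoglobin_levels))
```

Python's standard library also provides statistics.median, which can serve as a cross-check for hand-written code like this.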

Task 3.5

In Task 2.12 you wrote some code to test if a word is a palindrome.

Now modify your code so that it tests and prints whether each word in the following list is a palindrome.

words = ['golf', 'level', 'spoon', 'reverser', 'noon', 'racecar', 'cell', 'rotator', 'tape', 'stats', 'bridge', 'lagoon', 'tenet']
for c in words:
    rev_c = c[::-1]   # slicing with a step of -1 reverses the string
    if rev_c == c:
        print('the word is a palindrome')
    else:
        print('the word is not a palindrome')
the word is not a palindrome
the word is a palindrome
the word is not a palindrome
the word is not a palindrome
the word is a palindrome
the word is a palindrome
the word is not a palindrome
the word is a palindrome
the word is not a palindrome
the word is a palindrome
the word is not a palindrome
the word is not a palindrome
the word is a palindrome

Task 3.6

In Exercise 4.4 you calculated the cumulative number of emperor penguins that joined an Antarctic breeding colony on the first three days of the season. The values for the first three weeks are given below.

Print out the cumulative number of penguins on each day of the first three weeks.

The first few lines of your output should look like this:
Day	Total
0	10
1	166
2	239
arriving_penguins = [10, 156, 73, 376, 786, 432, 1035, 901, 1102, 2567, 1571, 916, 1560, 632, 943, 246, 654, 1456, 504, 632, 185]
count = 0
day = 0
print("Day\tTotal")
for c in arriving_penguins:
    count += c                  # running total of penguins
    print(f'{day}\t{count}')
    day += 1
Day	Total
0	10
1	166
2	239
3	615
4	1401
5	1833
6	2868
7	3769
8	4871
9	7438
10	9009
11	9925
12	11485
13	12117
14	13060
15	13306
16	13960
17	15416
18	15920
19	16552
20	16737

Task 3.7

How many days does it take the colony to just pass 10,000 penguins?

  • Hint: Modify the code in Task 3.6 to break out of the loop when the total passes 10,000.

On day 12 the colony passes 10,000 penguins.
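arriving_penguins = [10, 156, 73, 376, 786, 432, 1035, 901, 1102, 2567, 1571, 916, 1560, 632, 943, 246, 654, 1456, 504, 632, 185]
count = 0
day = 0
found = False
for c in arriving_penguins:
    count += c
    if count > 10000:   # "passes" means greater than, not equal to, 10,000
        found = True
        break
    day += 1            # only move to the next day if the total has not yet passed 10,000
if found:
    print(f'On day {day} the colony passes 10,000 penguins.')
else:
    print('The colony never passes 10,000 penguins.')
On day 12 the colony passes 10,000 penguins.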

Task 3.8

Modify your code from Task 3.7 to use range() so that you do not need a separate variable to count the number of days until the colony passes 10,000 penguins.
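
One possible approach, sketched below, is to loop over range(len(arriving_penguins)) so that the loop variable itself is the day number, counting from day 0 as in Task 3.6. The names total and passed_on_day are illustrative assumptions.

```python
arriving_penguins = [10, 156, 73, 376, 786, 432, 1035, 901, 1102, 2567, 1571,
                     916, 1560, 632, 943, 246, 654, 1456, 504, 632, 185]

total = 0
passed_on_day = None   # stays None if the total never passes 10,000
for day in range(len(arriving_penguins)):
    total += arriving_penguins[day]   # index the list with the day number
    if total > 10000:
        passed_on_day = day
        break

if passed_on_day is None:
    print('The colony never passes 10,000 penguins.')
else:
    print(f'On day {passed_on_day} the colony passes 10,000 penguins.')
```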

Task 3.9

A single nighttime survey of bats in the Forest of Dean produced a list of the species of all individual bats caught, measured and released.

  1. Create a new list of unique bat species caught.

  2. Print the list of unique species and the number of unique species.

  • Hint: Create an empty list and only append a bat species to this list if it is not already in the list as you loop through bat_list.

There are 15 unique species, which are:

Plecotus austriacus, Pipistrellus nathusii, Myotis daubentonii, Nyctalus noctula, Pipistrellus pipistrellus, Eptesicus serotinus, Myotis bechsteinii, Pipistrellus pygmaeus, Myotis brandtii, Myotis mystacinus, Rhinolophus hipposideros, Nyctalus leisleri, Myotis nattereri, Plecotus auritus, Barbastella barbastellus

bat_list = ['Plecotus austriacus', 'Pipistrellus nathusii', 'Myotis daubentonii', 'Nyctalus noctula', 'Pipistrellus pipistrellus', 'Pipistrellus pipistrellus', 'Pipistrellus nathusii', 'Pipistrellus nathusii', 'Eptesicus serotinus', 'Myotis bechsteinii', 'Pipistrellus nathusii', 'Pipistrellus pygmaeus', 'Pipistrellus pipistrellus', 'Plecotus austriacus', 'Myotis daubentonii', 'Nyctalus noctula', 'Myotis brandtii', 'Myotis mystacinus', 'Pipistrellus nathusii', 'Pipistrellus pygmaeus', 'Pipistrellus nathusii', 'Rhinolophus hipposideros', 'Nyctalus leisleri', 'Pipistrellus pipistrellus', 'Pipistrellus nathusii', 'Nyctalus noctula', 'Plecotus austriacus', 'Pipistrellus nathusii', 'Myotis nattereri', 'Pipistrellus pipistrellus', 'Pipistrellus nathusii', 'Plecotus auritus', 'Barbastella barbastellus', 'Pipistrellus nathusii', 'Myotis brandtii', 'Pipistrellus pipistrellus', 'Myotis nattereri']
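
The approach in the hint might be sketched as follows. To keep the example short it runs on the first eight entries of bat_list only; the function name unique_in_order and the variable sample are illustrative and not part of the exercise.

```python
def unique_in_order(items):
    """Return the distinct items in order of first appearance."""
    unique = []
    for item in items:
        if item not in unique:   # only append a species not seen before
            unique.append(item)
    return unique

# First eight entries of bat_list, used here as a short sample
sample = ['Plecotus austriacus', 'Pipistrellus nathusii', 'Myotis daubentonii',
          'Nyctalus noctula', 'Pipistrellus pipistrellus', 'Pipistrellus pipistrellus',
          'Pipistrellus nathusii', 'Pipistrellus nathusii']

unique_species = unique_in_order(sample)
print(f'There are {len(unique_species)} unique species, which are:')
print(', '.join(unique_species))
```

Passing the full bat_list instead of sample should give the 15 species listed above, in the same order.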

Task 3.10

In the bat survey, each bat's wingspan was measured. These are given in centimetres, for each bat, in the list wingspans below.

Using the bat_list and wingspans lists, print the average wingspan of Pipistrellus nathusii to 2 decimal places.

  • Hint 1: Loop through the two lists simultaneously. Each time you encounter Pipistrellus nathusii in bat_list, append its wingspan to another list. Once you have finished looping through bat_list, calculate the average wingspan using the list of wingspans you have constructed.

  • Hint 2: Rather than summing the values in the list by looping over the list as we did in Notebook 12, you might want to use the inbuilt sum() function. Google it to find out how to use it.

The average wingspan of Pipistrellus nathusii is 20.27 cm.

wingspans = [20.4, 21.1, 17.1, 16.7, 24.4, 17.8, 20.1, 21.2, 20.8, 18.4, 20.0, 20.8, 19.4, 16.9, 18.0, 18.5, 18.0, 17.5, 18.7, 21.6, 21.6, 20.1, 20.6, 22.0, 20.0, 16.8, 24.2, 15.4, 21.2, 22.2, 26.1, 21.5, 18.9, 18.5, 19.9, 20.9, 18.4]
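
A minimal sketch of the two hints, assuming the two lists are the same length and in the same order: zip() pairs each species with its wingspan, and sum() totals the collected values. The example uses only the first eight entries of each list, and the names mean_wingspan, sample_bats and sample_wingspans are illustrative.

```python
def mean_wingspan(species_list, wingspan_list, target):
    """Average the wingspans whose species matches target."""
    matching = []
    for species, wingspan in zip(species_list, wingspan_list):
        if species == target:
            matching.append(wingspan)
    # Note: dividing by len(matching) fails if the target species never appears
    return sum(matching) / len(matching)

# First eight entries of bat_list and wingspans, used here as a short sample
sample_bats = ['Plecotus austriacus', 'Pipistrellus nathusii', 'Myotis daubentonii',
               'Nyctalus noctula', 'Pipistrellus pipistrellus', 'Pipistrellus pipistrellus',
               'Pipistrellus nathusii', 'Pipistrellus nathusii']
sample_wingspans = [20.4, 21.1, 17.1, 16.7, 24.4, 17.8, 20.1, 21.2]

average = mean_wingspan(sample_bats, sample_wingspans, 'Pipistrellus nathusii')
print(f'The average wingspan of Pipistrellus nathusii is {average:.2f} cm.')
```

The :.2f format specifier rounds the printed value to 2 decimal places. Calling the function with the full bat_list and wingspans lists should reproduce the 20.27 cm quoted above.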

Task 3.11

Repeated, short sequences are of interest to geneticists as they suggest the existence of transposable elements within genomic DNA.

  1. Search for and print the first sequence in the following list that starts with "TATA" and has a second "TATA" repeat later in the sequence.

  2. If no sequences are found then print that none was found.

  • Hint: Loop through the DNA sequences. If a sequence starts with "TATA", test whether the sequence contains a second "TATA" substring. If it does, break out of the loop; otherwise, move on to the next sequence.

dna_sequences = ['TATAGGTATTACGA', 'GATTAGGATGAA', 'TAGCCGGGTATA', 'TATAGGTAGGATATA', 'TATAGGGTTGAAGT']
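
The search described in the hint could be sketched as below. It assumes the second "TATA" must not overlap the leading one, so the search starts after the first four bases; the task does not say whether overlapping repeats should count. The names motif and found are illustrative.

```python
dna_sequences = ['TATAGGTATTACGA', 'GATTAGGATGAA', 'TAGCCGGGTATA', 'TATAGGTAGGATATA', 'TATAGGGTTGAAGT']
motif = 'TATA'

found = None   # stays None if no sequence qualifies
for seq in dna_sequences:
    # Look for the repeat only in the part after the leading motif
    # (non-overlapping repeats are an assumption, see the note above)
    if seq.startswith(motif) and motif in seq[len(motif):]:
        found = seq
        break  # stop at the first qualifying sequence

if found is None:
    print('No sequence with a repeated TATA was found.')
else:
    print(found)
```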