Kernel: Python 2 (SageMath)

Lists

  • Let's make a list:

  • Execute the code by pressing Ctrl + Enter

    • Name: my_list

    • Value: []

# Write here your code to make an empty list
my_list = []
print "'"
print '"'

'
"

Write your code to make lists:

  • empty list which can hold 4 items

  • list called molecules containing elements: DNA, RNA, Protein and Lipid

  • containing 4 sequences

  • containing at least 1 string, 1 integer, and 1 boolean

# Write here your code to make lists
my_list = [None]*4
molecules = ["DNA","RNA","Protein","Lipid","DNA"]
seqs = ["ATCG"]*4
mixed_list = ["ABC",3,True]
# Print the results
print(my_list)
print(molecules)
print(seqs)
print(mixed_list)

[None, None, None, None]
['DNA', 'RNA', 'Protein', 'Lipid', 'DNA']
['ATCG', 'ATCG', 'ATCG', 'ATCG']
['ABC', 3, True]
0
print molecules[1:].index("DNA")
print molecules[1:2].index("DNA")

3
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-20b0201bec85> in <module>()
      1 print molecules[1:].index("DNA")
----> 2 print molecules[1:2].index("DNA")

ValueError: 'DNA' is not in list
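The ValueError above arises because index searches only the slice it is called on, and the position it returns is relative to that slice. Below is a minimal sketch of ways to locate a later occurrence safely, using the molecules list from this worksheet. The helper name find_all is hypothetical, and the single-argument print(...) form is used so that the code runs under both Python 2 and Python 3.

```python
molecules = ["DNA", "RNA", "Protein", "Lipid", "DNA"]

# index(value, start) searches from position `start` onwards but
# still reports the position in the full list.
second_dna = molecules.index("DNA", 1)
print(second_dna)

# Searching a slice gives a position relative to the slice,
# so the slice offset has to be added back.
offset = 1
relative = molecules[offset:].index("DNA")
print(relative + offset)

# A membership test avoids the ValueError when the item may be absent.
window = molecules[1:2]
if "DNA" in window:
    print(window.index("DNA"))
else:
    print("DNA is not in " + str(window))


def find_all(items, target):
    """Return every index at which target occurs (hypothetical helper)."""
    return [i for i, item in enumerate(items) if item == target]


all_dna = find_all(molecules, "DNA")
print(all_dna)
```

Both lookups report position 4 in the full list, which is the second "DNA" entry.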

Accessing Lists

Using the lists that you have created, check what is stored at index 1 of the list 'molecules'

# molecules[1]?
print molecules[1]
# What is the index for 'DNA' in the list molecules?
print molecules[0]
# Try command list.index(object)
print molecules.index("DNA")

RNA
DNA
0

Lists – Exercise

  1. Create the following 3 string variables with their restriction sites:

    • eco_r1 : gaattc

    • bam_h1 : ggatcc

    • hind3 : aagctt

  2. Make a list called restriction_enzymes containing the 3 restriction sites.

  3. Using a single command print the restriction sites in the list on 3 different lines.

  4. Print the first letter of eco_r1 using the list.

# Q1, hint: eco_r1 = ?
# bam_h1 = ?
# hind3 = ?
eco_r1 = "gaattc"
bam_h1 = "ggatcc"
hind3 = "aagctt"
# Q2, hint: restriction_enzymes =
restriction_enzymes = [eco_r1,bam_h1,hind3]

# Q3, hint: print "EcoRI: ", restriction_enzymes[?], ...
print "EcoRI:",restriction_enzymes[0],"\n","BamH1:",restriction_enzymes[1],"\n","Hind3:",restriction_enzymes[2],"\n"
# OR use a for loop
for restrict in restriction_enzymes:
    print restrict
print " "
# OR be fancy and use join
print '\n'.join(restriction_enzymes)
print " "

EcoRI: gaattc
BamH1: ggatcc
Hind3: aagctt

gaattc
ggatcc
aagctt

gaattc
ggatcc
aagctt
# Q4, hint: restriction_enzymes??
print restriction_enzymes
print restriction_enzymes[0][0]

restriction_enzymes = [eco_r1,bam_h1,hind3]
# Bonus question: Change first letter of eco_r1 restriction site from g to G using list 'restriction_enzymes'
restriction_enzymes[0] = restriction_enzymes[0][0].upper()+restriction_enzymes[0][1:]
print restriction_enzymes
#print restriction_enzymes[0][0]
restriction_enzymes[0] = restriction_enzymes[0].upper()
print restriction_enzymes
#restriction_enzymes[0][0] = restriction_enzymes[0][0].upper()
# Doesn't work (raises the TypeError shown below): strings are immutable, so you can't edit them like this

['Gaattc', 'ggatcc', 'aagctt']
['GAATTC', 'ggatcc', 'aagctt']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-59eeb8ac5c35> in <module>()
      9 print restriction_enzymes
     10 
---> 11 restriction_enzymes[0][0] = restriction_enzymes[0][0].upper()

TypeError: 'str' object does not support item assignment

Basic List operations

Adding (concatenating) two lists: L1 + L2

Repeating items in a list: L1 * n

Compare two lists: cmp(L1, L2) (Python 2 only)

Length of a list: len(L1)

Maximum value: max(L1)

Minimum value: min(L1)

# Try some list operations
my_list = [1, 2, 5] + [3, 4, 6]
my_list2 = [1, 2, 5] + [2, 4, "ZZZ"]
print my_list
print [3]*5
print " "
print cmp(my_list, my_list)
print cmp(my_list[1:], my_list)
print cmp(my_list, my_list[2:])
print " "
print cmp(my_list, my_list)
print " "
print len(my_list)
print max(my_list)
print min(my_list)
print len(my_list2)
print max(my_list2)
print min(my_list2)

[1, 2, 5, 3, 4, 6]
[3, 3, 3, 3, 3]

0
1
-1

0

6
6
1
6
ZZZ
1
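cmp exists only in Python 2; it was removed in Python 3. The following is a small sketch of an equivalent built from the ordinary comparison operators, which compare lists item by item in both versions. The helper name compare is an assumption made for this example.

```python
def compare(a, b):
    """Return -1, 0 or 1, like Python 2's cmp (hypothetical helper)."""
    return (a > b) - (a < b)


my_list = [1, 2, 5] + [3, 4, 6]

# Lists are compared item by item, starting from the first item.
same = compare(my_list, my_list)            # identical lists
tail_first = compare(my_list[1:], my_list)  # 2 > 1 at the first item
head_first = compare(my_list, my_list[2:])  # 1 < 5 at the first item
print(same)
print(tail_first)
print(head_first)
```

The mixed list my_list2 behaves differently between versions: Python 2 orders integers before strings, which is why max(my_list2) gives ZZZ above, whereas Python 3 raises a TypeError when asked to compare an integer with a string.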

Basic List operations - Exercise

  1. Remember that we made a list of restriction enzymes as: restriction_enzymes = [eco_r1, bam_h1, hind3]. Create a new variable my_enzyme = 'catgac'

  2. Make a new list with my_enzyme added to the existing list of restriction enzymes.

  3. Make a new list containing 3 repetitions of eco_r1, 4 repetitions of bam_h1, and 2 repetitions of hind3.

  4. What is the length of the lists you just created?

  5. What is the maximum value of the list?

  6. What is the minimum value of the list?

restriction_enzymes = [eco_r1,bam_h1,hind3]
# Create variable my_enzyme and concatenate it with restriction_enzyme list
my_enzyme = "actgcac"
print restriction_enzymes
restriction_enzymes.append(my_enzyme)
print restriction_enzymes
restriction_enzymes += [my_enzyme]
print restriction_enzymes
restriction_enzymes += my_enzyme
print restriction_enzymes
restriction_enzymes = restriction_enzymes + my_enzyme
# Create list with repeated entries and find its length

['gaattc', 'ggatcc', 'aagctt']
['gaattc', 'ggatcc', 'aagctt', 'actgcac']
['gaattc', 'ggatcc', 'aagctt', 'actgcac', 'actgcac']
['gaattc', 'ggatcc', 'aagctt', 'actgcac', 'actgcac', 'a', 'c', 't', 'g', 'c', 'a', 'c']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-573ede14846e> in <module>()
     15 print restriction_enzymes
     16 
---> 17 restriction_enzymes = restriction_enzymes + my_enzyme
     18 # Create list with repeated entries and find its length

TypeError: can only concatenate list (not "str") to list
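The output above shows that += with a bare string extends the list one character at a time, because a string is itself a sequence, while + refuses to mix a list with a string. Here is a sketch of the remaining exercise steps under those rules. The names with_mine and repeated are illustrative and do not come from the worksheet.

```python
eco_r1 = "gaattc"
bam_h1 = "ggatcc"
hind3 = "aagctt"
restriction_enzymes = [eco_r1, bam_h1, hind3]
my_enzyme = "actgcac"

# Wrap the string in a list so it is added as one item;
# + builds a new list and leaves restriction_enzymes unchanged.
with_mine = restriction_enzymes + [my_enzyme]
print(with_mine)

# Repetition: 3 x eco_r1, 4 x bam_h1, 2 x hind3.
repeated = [eco_r1] * 3 + [bam_h1] * 4 + [hind3] * 2
print(len(with_mine))
print(len(repeated))

# Strings compare alphabetically, so max and min are the last and
# first entries in dictionary order.
print(max(repeated))
print(min(repeated))
```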

Looping Lists - Exercise

  1. Calculate the number of nucleotides (A, T, G, and C) used by each restriction enzyme.

  2. Calculate the total number of nucleotides used by all restriction enzymes taken together.

  3. Use the list seqs given in the worksheet. How many DNA and RNA sequences are given in seqs? (Hint: try a for loop, if/elif conditions, and the string find command.)

  4. Remember that EcoRI cuts at the restriction site 'gaattc'. Can you check how many sequences in the list seqs contain the EcoRI motif?

    Bonus Question: Make a list containing ['A', 'AA', 'AAA', 'AAAA', ...], ending with a string of 20 A's.

eco_r1 = "gaattc"
bam_h1 = "ggatcc"
hind3 = "aagctt"
restriction_enzymes = [eco_r1.upper(), bam_h1.upper(), hind3.upper()]
# First try
for item in restriction_enzymes:
    print item
# range is a function that returns a list containing arithmetic progressions
range(10)
# If we need only the index
for index in range(len(restriction_enzymes)):
    print index
# If we need both the index and the item
for index, item in enumerate(restriction_enzymes):
    print index, item
# Q1
# Calculate number of nucleotides using for loop
total_count = 0
for enzyme in restriction_enzymes:
    print enzyme
    count = enzyme.count("A") + enzyme.count("T") + enzyme.count("C") + enzyme.count("G")
    count = len(enzyme)
    print count
    total_count += count

GAATTC
GGATCC
AAGCTT
0
1
2
0 GAATTC
1 GGATCC
2 AAGCTT
GAATTC
6
GGATCC
6
AAGCTT
6
0 20 20
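For Q2, the per-enzyme counts can be added up once they are known. The sketch below is one possible way to do it with sum instead of a running total. The name per_enzyme is made up for this example.

```python
restriction_enzymes = ["GAATTC", "GGATCC", "AAGCTT"]

# Count only the four nucleotide letters in each restriction site.
per_enzyme = [sum(site.count(base) for base in "ATGC")
              for site in restriction_enzymes]
print(per_enzyme)

# Total over all enzymes taken together.
total_count = sum(per_enzyme)
print(total_count)
```

Counting the four letters explicitly gives the same result as len(site) here because the sites contain nothing but A, T, G and C; the two would differ for a sequence containing other characters such as N.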
# Q3
# Hint: try for loop, if elif conditions and string find command
seqs = ['actgactgactgaattcgactg',
        'caucgaucgcgauacacgaucagcuacg',
        'augcagacgacguacgu',
        'atcgatcgatcgatcacgt',
        'atcgtagctactagctagc',
        'acgatcgtagctacgta',
        'cgaucagucgaucgauccagcga',
        'cguacguagcacaugcagucaguauacguacggacgacgac',
        'catgactgactgatcgatgctgactgactg',
        'atcggatctgaactgactg',
        'actgactgactgactg',
        'caucgaucgcgauacacgaucagcuacg',
        'augcagacgacguacgu',
        'atcgatcgaattcgatcgatcacgt',
        'atcgtagctactagctagc',
        'acgatcgaattcgtagctacgta',
        'cgaucagucgaucgauccagcga',
        'cguacguagcacaugcagucaguauacguacggacgacgac',
        'catgactgactgatcgatgaattcgctgactgactg',
        'aucggauccgaaccgacag']
for index,item in enumerate(seqs):
    seqs[index] = item.upper()
dna_count = 0
rna_count = 0
for seq in seqs:
    print(seq)
    print(seq.find('U'))
    if seq.find('U') != -1:
        rna_count += 1
    elif seq.find('G') != -1:
        dna_count += 1
print "-----------"
print dna_count, rna_count
print "-----------"
# Q4
# Hint: try for loop and append list
cut_seqs=0
cut_sequences = []
for seq in seqs:
    if seq.find(eco_r1.upper()) != -1:
        cut_seqs += 1
        #print seq
        cut_sequences.append(seq)
print(cut_sequences)
print(cut_seqs)
print(len(cut_sequences))

ACTGACTGACTGAATTCGACTG
-1
CAUCGAUCGCGAUACACGAUCAGCUACG
2
AUGCAGACGACGUACGU
1
ATCGATCGATCGATCACGT
-1
ATCGTAGCTACTAGCTAGC
-1
ACGATCGTAGCTACGTA
-1
CGAUCAGUCGAUCGAUCCAGCGA
3
CGUACGUAGCACAUGCAGUCAGUAUACGUACGGACGACGAC
2
CATGACTGACTGATCGATGCTGACTGACTG
-1
ATCGGATCTGAACTGACTG
-1
ACTGACTGACTGACTG
-1
CAUCGAUCGCGAUACACGAUCAGCUACG
2
AUGCAGACGACGUACGU
1
ATCGATCGAATTCGATCGATCACGT
-1
ATCGTAGCTACTAGCTAGC
-1
ACGATCGAATTCGTAGCTACGTA
-1
CGAUCAGUCGAUCGAUCCAGCGA
3
CGUACGUAGCACAUGCAGUCAGUAUACGUACGGACGACGAC
2
CATGACTGACTGATCGATGAATTCGCTGACTGACTG
-1
AUCGGAUCCGAACCGACAG
1
-----------
11 9
-----------
['ACTGACTGACTGAATTCGACTG', 'ATCGATCGAATTCGATCGATCACGT', 'ACGATCGAATTCGTAGCTACGTA', 'CATGACTGACTGATCGATGAATTCGCTGACTGACTG']
4
4
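For the bonus question, one possible approach (a sketch, not necessarily the intended solution) uses string repetition inside a list comprehension. The name poly_a is made up here.

```python
# 'A' * n repeats the letter n times; n runs from 1 to 20 inclusive.
poly_a = ["A" * n for n in range(1, 21)]
print(poly_a[:4])
print(len(poly_a))
print(len(poly_a[-1]))
```

The same list could be built with a for loop and append, which is closer to the hint given for Q4.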

Modifying Lists

  1. Take list seqs and add 3 random DNA/RNA sequences at the end of the list

  2. Take list seqs and add 'atcg' as the third sequence of the list

  3. Create a new sorted list seq_sort from seqs

# Try modifying lists
# Hint: molecules[?] = ?
print molecules
# Use list functions
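list1 = [2,1,3,4]
list2 = list1   # list2 is a second name for the same list, not a copy
print list1
print list2
list1.sort()    # sorts in place, so the change is visible through list2 too
print list1
print list2

[2, 1, 3, 4]
[2, 1, 3, 4]
[1, 2, 3, 4]
[1, 2, 3, 4]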
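Assigning list2 = list1 does not copy the list: both names refer to one object, so an in-place sort() is visible through either name. The sketch below contrasts an alias with a copy, and sort() with sorted(). The names alias and snapshot are illustrative.

```python
list1 = [2, 1, 3, 4]
alias = list1            # second name for the same list object
snapshot = list(list1)   # independent shallow copy (list1[:] also works)

list1.sort()             # sorts in place and returns None
print(list1)
print(alias)
print(snapshot)
print(alias is list1)

# sorted() returns a new list and leaves its argument untouched.
seq_sort = sorted(snapshot)
print(seq_sort)
print(snapshot)
```

For the exercise, sorted(seqs) is therefore a convenient way to create seq_sort while keeping seqs in its original order.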

Modifying Lists

  1. Remove the 'atcg' sequence from the list seqs

  2. Print the 5th to 10th sequence entries from the lists seqs and seq_sort

# Use list functions
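A sketch of the list methods these exercises call for, shown on a short stand-in list rather than the full seqs list. The three added sequences are arbitrary examples, not data from the worksheet.

```python
seqs = ["ACTG", "CAUC", "AUGC", "ATGG"]   # short stand-in list

# Add several sequences at the end in one call.
seqs.extend(["GGCC", "UUAA", "TTAG"])

# insert(index, item): index 2 makes 'atcg' the third entry.
seqs.insert(2, "atcg")
print(seqs)

# sorted() returns a new list; seqs keeps its order.
seq_sort = sorted(seqs)
print(seq_sort)

# Remove the first entry equal to 'atcg' (ValueError if absent).
seqs.remove("atcg")
print(seqs)

# Slices are zero-based and exclude the end index, so the
# 3rd to 5th entries are seqs[2:5].
third_to_fifth = seqs[2:5]
print(third_to_fifth)
```

By the same rule, the 5th to 10th entries of the full list are seqs[4:10]. Note that uppercase letters sort before lowercase ones, so 'atcg' ends up last in seq_sort.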

Converting strings to Lists

Two essential commands for list conversion are:

  • Create a list from a string: StringName.split()

  • Create a string from lists: ' '.join(ListName)

  1. Split the 'atgcatgcatgc' sequence using the split() command.

  2. Create a list by splitting the 'atgcatgcatgc' sequence using 't' as the separator.

# Try split and join command
amino = 'Ala Cys Phe Val Glu'.split(" ")
a = ''.join(amino)
print a
print list(a)
# Try split command

AlaCysPheValGlu
['A', 'l', 'a', 'C', 'y', 's', 'P', 'h', 'e', 'V', 'a', 'l', 'G', 'l', 'u']
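A short sketch of how split behaves on the exercise sequence, with and without a separator. The variable names are illustrative.

```python
seq = "atgcatgcatgc"

# split() with no argument splits on whitespace; there is none
# here, so the whole sequence comes back as a single item.
whole = seq.split()
print(whole)

# split('t') cuts at every 't' and drops the separator.
pieces = seq.split("t")
print(pieces)

# join is the inverse: put the separator back between the pieces.
rebuilt = "t".join(pieces)
print(rebuilt == seq)

# list() breaks a string into single characters.
letters = list(seq)
print(letters[:4])
```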

Tuples

Tuples are Python sequences, but unlike lists they are immutable. Tuples use parentheses, whereas lists use square brackets.

tup1 = ('physics', 'chemistry', 1997, 2000)

tup2 = (1, 2, 3, 4, 5 )

tup3 = ('a', 'b', 'c', 'd')

Accessing tuples:

print "tup1[0]: ", tup1[0]

print "tup2[1:5]: ", tup2[1:5]

Try to assign to tuples (item assignment fails because tuples are immutable; concatenation creates a new tuple):

tup1[2] = 2015

tup4 = tup2 + tup3

# Make tuples with different items
# Access different elements from tuples
# Assign different values to tuples
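The statements above can be tried together as follows. This is a sketch that catches the expected error so the remaining lines still run; the names assigned and single are additions for illustration.

```python
tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5)
tup3 = ('a', 'b', 'c', 'd')

# Indexing and slicing work exactly as for lists.
print(tup1[0])
print(tup2[1:5])

# Item assignment is rejected because tuples are immutable.
try:
    tup1[2] = 2015
    assigned = True
except TypeError as err:
    assigned = False
    print("TypeError: " + str(err))

# Concatenation is allowed: it builds a new tuple.
tup4 = tup2 + tup3
print(tup4)

# A one-item tuple needs a trailing comma.
single = (1997,)
print(len(single))
```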

Tuples Exercise

Suppose you want to find the distance between carbon atoms in a protein.

Assign 4 tuples each containing the 3D co-ordinates:

Atom    X       Y       Z
C29     0.55    1.256   -1.153
C103    1.28    -1.72   1.24
C115    1.59    1.73    1.04
C118    1.36    -1.86   -1.74

Calculate the squared distance between each pair of carbon atoms. Hint: use (x1-x2)^2 + (y1-y2)^2 + (z1-z2)^2 (in Python, powers are written with **, not ^).

# Hint: make a list
# Each element of the list can contain a tuple with coordinates of a carbon atom
c29=(0.55,1.256,-1.153)
c103=(1.28,-1.72,1.24)
c115=(1.59,1.73,1.04)
c118=(1.36,-1.86,-1.74)
c_atoms = [c29,c103,c115,c118]
# use two for loops to calculate distance between their X, Y and Z
for i in range(len(c_atoms)):
    for j in range(i,len(c_atoms)):
        # Option 1, prior knowledge
        x1,y1,z1=c_atoms[i]
        x2,y2,z2=c_atoms[j]
        distance = (x1-x2)**2+(y1-y2)**2+(z1-z2)**2
        print i,j, distance
        # Option 2, fewer assumptions
        distance = 0
        #for x in range(len(c_atoms[i])):
        #    distance += (c_atoms[i][x]-c_atoms[j][x])**2
        #print i,j, distance

0 0 0.0
0 1 15.115925
0 2 6.115525
0 3 10.710125
1 1 0.0
1 2 12.0386
1 3 8.9064
2 2 0.0
2 3 20.6694
3 3 0.0
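The same calculation can be wrapped in a function that works for any number of dimensions. This is a sketch in the spirit of Option 2. The function name squared_distance, the dictionary layout, and the use of itertools.combinations to skip the zero self-distances are choices made here, not part of the worksheet.

```python
from itertools import combinations

c_atoms = {
    "C29": (0.55, 1.256, -1.153),
    "C103": (1.28, -1.72, 1.24),
    "C115": (1.59, 1.73, 1.04),
    "C118": (1.36, -1.86, -1.74),
}


def squared_distance(p, q):
    """Sum of squared coordinate differences between points p and q."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


# combinations(..., 2) yields each unordered pair exactly once.
names = ["C29", "C103", "C115", "C118"]
for first, second in combinations(names, 2):
    d2 = squared_distance(c_atoms[first], c_atoms[second])
    print(first + " " + second + " " + str(round(d2, 6)))
```

Taking the square root of each result (for example with math.sqrt) would give the actual distance in the same units as the coordinates.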