Kernel: Python 2 (SageMath)

Dictionary

A dictionary can be created by listing key-value pairs inside curly braces.

We can access the value associated with a particular key.

Dictionaries are mutable.

# Try to create a dictionary
virus = {'name':'HIV', 'type':'RNA', 'cell':'CD4'}
virus['name']

# Change values of a dictionary
virus['name'] = 'HCV'
print virus
{'cell': 'CD4', 'type': 'RNA', 'name': 'HCV'}
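
The three statements above (create, access, mutate) can be checked in one short cell. This is a minimal sketch that uses print(...) with a single argument so that it runs under both Python 2 and Python 3. The 'host' key is a made-up example and is not part of the original notebook.

```python
# A dictionary built from key-value pairs inside curly braces
virus = {'name': 'HIV', 'type': 'RNA', 'cell': 'CD4'}

# Access the value stored under a key
original_name = virus['name']
print(original_name)

# Dictionaries are mutable: assigning to an existing key replaces its value
virus['name'] = 'HCV'
print(virus['name'])

# 'in' tests for a key without raising KeyError
# ('host' is a hypothetical key used only for illustration)
has_host = 'host' in virus
print(has_host)
```

Looking up a key that is absent with virus['host'] would raise a KeyError, which is why the membership test is useful before accessing user-supplied keys.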

Dictionary – Exercise

  1. Make a dictionary nucleotides and assign A, T, G and C as the keys and corresponding nucleotide names as values.

  2. Print the value assigned to key 'G' in the dictionary nucleotides.

  3. Print the first value stored in nucleotides. (Hint: DictName.values())

  4. Print all keys and values stored in nucleotides.

  5. Ask the user to input a sequence and print the nucleotide names using nucleotides.

# Make a dictionary
nucleotides = {'A':'Adenine', 'T':'Thymine', 'C':'Cytosine', 'G': 'Guanine'}

# Print value stored with key 'G'
print nucleotides["G"]

# Print the first value in the dictionary. Hint: DictName.values()
print nucleotides.values()[0]
vals = nucleotides.values()
print(vals)
print(vals[0])

# Print all keys
print nucleotides.keys()

# Print all values
print nucleotides.values()

# Print all keys and their corresponding values
for key in nucleotides:
    print key, nucleotides[key]

# Use nucleotides dictionary created earlier
# Ask user input
# Check for each nucl
# print validity using dictionary
user_input = raw_input("Please enter a sequence: ").upper()
for base in user_input:
    if base in nucleotides:
        print "Full name for " + base + " is " + nucleotides[base]
    else:
        print base + " is not an existing nucleotide!"
        break
Guanine
Adenine
['Adenine', 'Cytosine', 'Thymine', 'Guanine']
Adenine
['A', 'C', 'T', 'G']
['Adenine', 'Cytosine', 'Thymine', 'Guanine']
A Adenine
C Cytosine
T Thymine
G Guanine
Please enter a sequence:
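
Because the last part of the exercise waits for keyboard input, it is awkward to rerun. The same lookup logic can be sketched as a function that takes the sequence as an argument. The function name describe_sequence is an assumption introduced here for illustration; the messages and the stop-at-first-invalid-base behaviour follow the exercise solution.

```python
nucleotides = {'A': 'Adenine', 'T': 'Thymine', 'C': 'Cytosine', 'G': 'Guanine'}

def describe_sequence(sequence, table):
    """Return one message per base, stopping at the first unknown base."""
    messages = []
    for base in sequence.upper():
        if base in table:
            messages.append("Full name for " + base + " is " + table[base])
        else:
            messages.append(base + " is not an existing nucleotide!")
            break
    return messages

# Example call with a hard-coded sequence instead of user input
for line in describe_sequence("gaX", nucleotides):
    print(line)
```

Returning a list instead of printing inside the loop keeps the function easy to test and lets the caller decide how to display the result.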

Adding to a Dictionary

  1. You can add entries to Python dictionaries. Ask the user to enter a new key and value for your virus dictionary.

  2. Create two dictionaries containing information about viruses. Can you create a database (using lists) containing the 2 dictionaries? Print names of all viruses stored in our database.

# Add entries to a dictionary
virus = {'name':'HIV', 'type':'RNA', 'cell':'CD4'}
virus['load'] = 10000
user_key = raw_input("Please enter a key: ")
user_val = raw_input("Please enter a value: ")

# Add user defined entry to the virus dictionary
virus[user_key] = user_val
print virus

# Create two dictionaries containing information about viruses
virus1 = {'name':'HIV', 'type':'RNA', 'cell':'CD4', 'load':10000}
virus2 = {'name':'HCV', 'type':'RNA', 'cell':'CD4', 'load':3000}

# Create a database containing 2 virus dictionaries. Print all viruses in the dictionary
db = [virus1, virus2]
for vir in db:
    print vir
Please enter a key:
Please enter a value:
{'cell': 'CD4', 'load': 10000, 'type': 'RNA', 'name': 'HIV', 'killrate': '10'}
{'cell': 'CD4', 'load': 10000, 'type': 'RNA', 'name': 'HIV'}
{'cell': 'CD4', 'load': 3000, 'type': 'RNA', 'name': 'HCV'}
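
The second exercise asks for the names of the viruses, while the loop above prints each whole dictionary. A minimal sketch of printing only the names, reusing the same two dictionaries (the variable names is introduced here for illustration):

```python
virus1 = {'name': 'HIV', 'type': 'RNA', 'cell': 'CD4', 'load': 10000}
virus2 = {'name': 'HCV', 'type': 'RNA', 'cell': 'CD4', 'load': 3000}

# The "database" is a list whose items are dictionaries
db = [virus1, virus2]

# Pull the value stored under 'name' out of every dictionary in the list
names = [vir['name'] for vir in db]
for name in names:
    print(name)
```

A list keeps the entries in insertion order, so the names come out in the order the dictionaries were added to db.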

Modifying a Dictionary

You can change the value of a key, e.g.: virus1['load'] = 2000

We can delete an entry with the del statement, just as with lists, e.g.: del virus1['load']

virus1['load'] = 10000
print virus1
virus1['load'] = virus1['load'] + 5000
print virus1
virus1['load'] += 5000
print virus1

# Delete an entry
del virus1['cell']
print virus1
{'cell': 'CD4', 'load': 10000, 'type': 'RNA', 'name': 'HIV'}
{'cell': 'CD4', 'load': 15000, 'type': 'RNA', 'name': 'HIV'}
{'cell': 'CD4', 'load': 20000, 'type': 'RNA', 'name': 'HIV'}
{'load': 20000, 'type': 'RNA', 'name': 'HIV'}
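
One detail worth knowing when deleting entries: del raises a KeyError if the key is no longer present. The sketch below shows that behaviour and, as an alternative that goes slightly beyond the notebook, the built-in dict.pop method with a default value. The variable names deleted_twice, removed and missing are introduced here for illustration.

```python
virus1 = {'name': 'HIV', 'type': 'RNA', 'cell': 'CD4', 'load': 10000}

# The first del removes the entry; a second del on the same key fails
del virus1['cell']
try:
    del virus1['cell']
    deleted_twice = True
except KeyError:
    deleted_twice = False
print(deleted_twice)

# pop removes the entry and returns its value;
# the second argument is returned instead when the key is absent
removed = virus1.pop('load', None)
missing = virus1.pop('load', None)
print(removed)
print(missing)
```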

Looping over a Dictionary

You can loop through a dictionary using its keys.

Create a dictionary that holds data for multiple patients.

Calculate the average virus load in HIV and HCV patients.

# Create dictionary that contains virus loads for three different patients
viral_load = {'Pat1':18000, 'Pat2':13000, 'Pat3':2200}
total_load = 0  # Assign a variable to store total virus load

# Loop through the dictionary to calculate sum of all virus loads
for name in viral_load:
    total_load += viral_load[name]

# or use one-liners to calculate sum of all virus loads
total_load = sum(viral_load.values())

# Create a dictionary of dictionaries to store patient specific data
# First initialize the dictionary patients
patients = {}
patients['Pat1'] = {'name':'HIV', 'type':'RNA', 'cell':'CD4', 'load':18000}
patients['Pat2'] = {'name':'HIV', 'type':'RNA', 'cell':'CD4', 'load':13000}
patients['Pat3'] = {'name':'HIV', 'type':'RNA', 'cell':'CD4', 'load':19000}
patients['Pat4'] = {'name':'HCV', 'type':'RNA', 'cell':'Hepa', 'load':2200}
patients['Pat5'] = {'name':'HCV', 'type':'RNA', 'cell':'Hepa', 'load':8200}

HIV_load = []
HCV_load = []
for pat in patients:
    # Compare strings with ==; 'is' tests object identity, not equality
    if patients[pat]["name"] == "HIV":
        HIV_load.append(patients[pat]["load"])
    elif patients[pat]["name"] == "HCV":
        HCV_load.append(patients[pat]["load"])

#print "Average HIV load = ", float(sum(HIV_load))/len(HIV_load)
#print "Average HCV load = ", float(sum(HCV_load))/len(HCV_load)
# Integer division in Python 2: the averages are truncated
print(sum(HIV_load)/len(HIV_load))
print(sum(HCV_load)/len(HCV_load))
16666
5200
loads = {}
for pat in patients:
    if patients[pat]["name"] not in loads:
        loads[patients[pat]["name"]] = []
    loads[patients[pat]["name"]].append(patients[pat]["load"])
print loads
for virus in loads:
    print "%s %d"%(virus, sum(loads[virus])/len(loads[virus]))
{'HCV': [8200, 2200], 'HIV': [18000, 19000, 13000]}
HCV 5200
HIV 16666
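
The averages printed above are truncated because dividing two integers in Python 2 discards the fractional part. A minimal sketch of the same grouping with true division, using the patient data from this section. The function name average_load_by_virus is an assumption introduced for illustration, and setdefault is used here as a shorter form of the "if key not in loads" check.

```python
patients = {
    'Pat1': {'name': 'HIV', 'type': 'RNA', 'cell': 'CD4', 'load': 18000},
    'Pat2': {'name': 'HIV', 'type': 'RNA', 'cell': 'CD4', 'load': 13000},
    'Pat3': {'name': 'HIV', 'type': 'RNA', 'cell': 'CD4', 'load': 19000},
    'Pat4': {'name': 'HCV', 'type': 'RNA', 'cell': 'Hepa', 'load': 2200},
    'Pat5': {'name': 'HCV', 'type': 'RNA', 'cell': 'Hepa', 'load': 8200},
}

def average_load_by_virus(records):
    """Group loads by virus name and return the mean load per virus as a float."""
    loads = {}
    for record in records.values():
        # setdefault creates the empty list the first time a virus name is seen
        loads.setdefault(record['name'], []).append(record['load'])
    averages = {}
    for virus, values in loads.items():
        # float() forces true division, also under Python 2
        averages[virus] = float(sum(values)) / len(values)
    return averages

averages = average_load_by_virus(patients)
for virus in sorted(averages):
    print("%s %.2f" % (virus, averages[virus]))
```

Sorting the keys before printing makes the output order predictable, since dictionary order is not guaranteed in Python 2.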

Challenge: DNA to Protein sequence

  1. Translate the valid DNA sequences from seqs given in code below to protein sequences using the dictionary codon_table

  2. Did you take the reading frames into account? Translate the codons for each reading frame.

Start with the following bit of code:

bases = ['T', 'C', 'A', 'G']
codons = [a+b+c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))
# Confused by the code?? Ask us what it means!!

# Loop over the seqs
seqs = ['actgactgactgaattcgactg','caucgaucgcgauacacgaucagcuacg','augcagacgacguacgu','atcgatcgatcgatcacgt','atcgtagctactagctagc','acgatcgtagctacgta','cgaucagucgaucgauccagcga','cguacguagcacaugcagucaguauacguacggacgacgac','catgactgactgatcgatgctgactgactg','atcggatctgaactgactg','actgactgactgactg','caucgaucgcgauacacgaucagcuacg','augcagacgacguacgu','atcgatcgaattcgatcgatcacgt','atcgtagctactagctagc','acgatcgaattcgtagctacgta','cgaucagucgaucgauccagcga','cguacguagcacaugcagucaguauacguacggacgacgac','catgactgactgatcgatgaattcgctgactgactg','aucggauccgaaccgacag']

# Write your code here to translate sequences
# First check what is stored in codons, amino_acids and codon_table variables

# Make uppercase and chuck U's
for index,item in enumerate(seqs):
    seqs[index] = item.upper().replace('U','T')
  File "<ipython-input-25-38f8df452d0d>", line 14
    seqs[index] = item.upper().replace('U','T') for index,item in enumerate(seqs)
                                                  ^
SyntaxError: invalid syntax
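
To see what the starter code builds, it helps to inspect the three variables directly. The sketch below repeats the table construction from the starter code and prints a few entries. The nested list comprehension generates all 64 codons in the order TTT, TTC, TTA, TTG, ..., and zip pairs each codon with the character at the same position in amino_acids. The variable stop_codons is introduced here for illustration.

```python
bases = ['T', 'C', 'A', 'G']
# Three nested loops in one comprehension: 4 * 4 * 4 = 64 codons
codons = [a + b + c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
# zip pairs codons[i] with amino_acids[i]; dict turns the pairs into a lookup table
codon_table = dict(zip(codons, amino_acids))

print(len(codons))
print(codons[:4])
print(codon_table['ATG'])   # start codon, methionine
print(codon_table['TGA'])   # '*' marks a stop codon

# Collect every codon that maps to the stop symbol
stop_codons = sorted(c for c in codon_table if codon_table[c] == '*')
print(stop_codons)
```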
proteins = []
for sequence in seqs:
    protein = [""]*3
    for frame in range(3):
        for position in range(frame, len(sequence), 3):
            triplet = sequence[position:position+3]
            if triplet not in codon_table:
                break
            amino_acid = codon_table[triplet]
            protein[frame] += amino_acid
    # Append once per sequence, after all three frames are translated
    proteins.append(protein)
print proteins
[['TD*LNST', 'LTD*IRL', '*LTEFD'], ['HRSRYTISY', 'IDRDTRSAT', 'SIAIHDQL'], ['MQTTY', 'CRRRT', 'ADDVR'], ['IDRSIT', 'SIDRSR', 'RSIDH'], ['IVATS*', 'S*LLAS', 'RSY*L'], ['TIVAT', 'RS*LR', 'DRSYV'], ['RSVDRSS', 'DQSIDPA', 'ISRSIQR'], ['RT*HMQSVYVRTT', 'VRSTCSQYTYGRR', 'YVAHAVSIRTDDD'], ['HD*LIDAD*L', 'MTD*SMLTD', '*LTDRC*LT'], ['IGSELT', 'SDLN*L', 'RI*TD'], ['TD*LT', 'LTD*L', '*LTD'], ['HRSRYTISY', 'IDRDTRSAT', 'SIAIHDQL'], ['MQTTY', 'CRRRT', 'ADDVR'], ['IDRIRSIT', 'SIEFDRSR', 'RSNSIDH'], ['IVATS*', 'S*LLAS', 'RSY*L'], ['TIEFVAT', 'RSNS*LR', 'DRIRSYV'], ['RSVDRSS', 'DQSIDPA', 'ISRSIQR'], ['RT*HMQSVYVRTT', 'VRSTCSQYTYGRR', 'YVAHAVSIRTDDD'], ['HD*LIDEFAD*L', 'MTD*SMNSLTD', '*LTDR*IR*LT'], ['IGSEPT', 'SDPNRQ', 'RIRTD']]
start_codon = "ATG"
for sequence in seqs:
    start_pos = [n for n in xrange(len(sequence)) if sequence.find(start_codon, n) == n]
    for i in start_pos:
        triplets = [sequence[n:n+3] for n in range(i,len(sequence),3)]
        protein = ''.join([codon_table.get(triplet,"") for triplet in triplets])
        stop_pos = protein.find("*")
        if stop_pos != -1:
            print sequence
            print triplets
            print protein
            print protein[:stop_pos]+"\n"
CATGACTGACTGATCGATGCTGACTGACTG
['ATG', 'ACT', 'GAC', 'TGA', 'TCG', 'ATG', 'CTG', 'ACT', 'GAC', 'TG']
MTD*SMLTD
MTD

CATGACTGACTGATCGATGAATTCGCTGACTGACTG
['ATG', 'ACT', 'GAC', 'TGA', 'TCG', 'ATG', 'AAT', 'TCG', 'CTG', 'ACT', 'GAC', 'TG']
MTD*SMNSLTD
MTD
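
The start-to-stop search can also be wrapped in a reusable function. This is a sketch under two assumptions: the input is already uppercase DNA containing only A, C, G and T, and only peptides that end at an in-frame stop codon are reported, as in the cell above. The function name find_orfs is introduced here for illustration and runs under both Python 2 and Python 3.

```python
bases = ['T', 'C', 'A', 'G']
codons = [a + b + c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))

def find_orfs(sequence, table, start_codon="ATG"):
    """Return the peptide for every start codon followed by an in-frame stop codon."""
    peptides = []
    start = sequence.find(start_codon)
    while start != -1:
        peptide = ""
        # Stop at len - 2 so that only complete triplets are read
        for position in range(start, len(sequence) - 2, 3):
            amino_acid = table[sequence[position:position + 3]]
            if amino_acid == "*":
                # Stop codon reached: keep the peptide translated so far
                peptides.append(peptide)
                break
            peptide += amino_acid
        # Look for the next start codon further along the sequence
        start = sequence.find(start_codon, start + 1)
    return peptides

print(find_orfs('CATGACTGACTGATCGATGCTGACTGACTG', codon_table))
print(find_orfs('ATGCAGACGACGTACGT', codon_table))
```

The first call finds the peptide MTD, matching the output above. The second sequence starts with ATG but has no in-frame stop codon, so the function returns an empty list.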