Kernel: Python 2

File handling – The Hidden Message

  • The file "genome.fa" is a 1 million bp piece of a bacterial genome

  • Find all open reading frames (ORFs) of at least 450 nucleotides / 150 AA

    • Remember an ORF can also be on the complementary strand!

    • An ORF starts with "ATG"

    • An ORF stops with "TAA", "TAG" or "TGA"

  • Translate each ORF into a single-letter amino acid sequence

    • ATG --> M

  • Sort the ORFs on length (large to small)

  • From the sorted ORFs, take the 25th AA of each, in order

  • What is the hidden message?

# Obtain the AA translation code
bases = ["T","C","A","G"]
codons = [a+b+c for a in bases for b in bases for c in bases]
amino_acids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_table = dict(zip(codons, amino_acids))
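A minimal sketch of how the codon table could be used to translate a sequence. The helper name `translate` and the choice of "X" for codons that are not in the table (for example, ones containing "N") are assumptions for illustration, not part of the exercise.

```python
# Rebuild the codon table from the cell above so this sketch runs on its own.
bases = ["T", "C", "A", "G"]
codons = [a + b + c for a in bases for b in bases for c in bases]
amino_acids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_table = dict(zip(codons, amino_acids))


def translate(dna):
    """Translate a DNA string codon by codon, starting at its first base.

    Hypothetical helper: a trailing partial codon is ignored, and any codon
    missing from the table is written as "X" (an assumption of this sketch).
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(codon_table.get(dna[i:i + 3], "X"))
    return "".join(protein)
```

For example, `translate("ATGGCCTAA")` walks over the codons ATG, GCC and TAA; stop codons appear as "*" in the table.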
# Open the genome file, read the first line, and concatenate the sequence
# Make the sequence reverse complement and merge at the end
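One possible way to fill in this step, as a sketch. It assumes "genome.fa" is a single-record FASTA file whose header line starts with ">"; the function names `read_fasta` and `reverse_complement` are hypothetical, and mapping unknown bases to "N" is an assumption.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_fasta(path):
    """Return the sequence of a single-record FASTA file as one uppercase string.

    Assumption: header lines start with ">" and are skipped; all other
    non-empty lines are sequence and are concatenated in order.
    """
    parts = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith(">"):
                parts.append(line.upper())
    return "".join(parts)


def reverse_complement(seq):
    """Return the reverse complement; bases other than A/C/G/T become "N"."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))
```

With these helpers, `seq = read_fasta("genome.fa")` followed by `both = seq + reverse_complement(seq)` gives the merged sequence the comment describes. Note that a plain merge could in principle produce an ORF that spans the junction between the two strands; searching each strand separately avoids that.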
# Find all start codons
# Find all stop codons
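A sketch of one way to collect codon positions, using a zero-width lookahead so that every occurrence is reported regardless of reading frame. The helper name `codon_positions` is hypothetical.

```python
import re


def codon_positions(seq, codons):
    """Return the 0-based start index of every occurrence of the given codons."""
    # The lookahead (?=...) matches without consuming characters, so the
    # search advances one base at a time and covers all three frames.
    pattern = "(?=(?:%s))" % "|".join(codons)
    return [m.start() for m in re.finditer(pattern, seq)]


example = "CCATGAAATAGATGTGA"
starts = codon_positions(example, ["ATG"])
stops = codon_positions(example, ["TAA", "TAG", "TGA"])
```

A start at index `s` and a stop at index `t` are in the same reading frame when `(t - s) % 3 == 0`.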
# Find the first stop codon in frame after every start and check if length >= 450
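A minimal sketch of this step, scanning codon by codon from each "ATG" until the first in-frame stop. It rests on two assumptions that the exercise does not spell out: the ORF length is counted from the start codon through the stop codon inclusive, and every "ATG" gives its own ORF even when several share the same stop. The name `find_orfs` and the `min_len` parameter are hypothetical.

```python
STOPS = ("TAA", "TAG", "TGA")


def find_orfs(seq, min_len=450):
    """Return ORF sequences (start through stop codon) of at least min_len bases."""
    orfs = []
    start = seq.find("ATG")
    while start != -1:
        pos = start + 3
        # Step through the sequence in the reading frame set by this start.
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOPS:
                orf = seq[start:pos + 3]
                if len(orf) >= min_len:
                    orfs.append(orf)
                break
            pos += 3
        start = seq.find("ATG", start + 1)
    return orfs
```

Starts with no in-frame stop before the end of the sequence are dropped. Running the function on the forward strand and on its reverse complement covers ORFs on both strands.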
# Get all lengths of the ORFs to sort on later
# On the sorted ORFs translate the 25th AA
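The last step could look like the sketch below. It assumes `orfs` is a list of ORF strings that each begin at their start codon; the function name `hidden_message` and the `position` parameter are hypothetical. The 25th amino acid corresponds to the codon at nucleotides 72 to 74 (0-based) of each ORF.

```python
# Rebuild the codon table so this sketch runs on its own.
bases = ["T", "C", "A", "G"]
codons = [a + b + c for a in bases for b in bases for c in bases]
amino_acids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_table = dict(zip(codons, amino_acids))


def hidden_message(orfs, position=25):
    """Sort ORFs from long to short and join the AA at `position` (1-based)."""
    ranked = sorted(orfs, key=len, reverse=True)
    first = 3 * (position - 1)  # index of the first base of the wanted codon
    letters = []
    for orf in ranked:
        codon = orf[first:first + 3]
        if len(codon) == 3:  # skip ORFs too short to have this codon
            letters.append(codon_table.get(codon, "X"))
    return "".join(letters)
```

Because Python's sort is stable, ORFs of equal length keep their original relative order; the exercise does not say how ties should be broken, so the result may depend on the order in which the ORFs were found.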