Project: CSCI 195
#Frequent Words Problem (1A in book, pg. 7)
#Find the most frequent words in a string.
#Input: A string TEXT and an integer k, supplied as a text file with TEXT on the first line and k on the second line.
#Output: All most frequent k-mers in TEXT.
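As a reference point for checking a step-by-step solution, here is a compact sketch of the whole problem. It swaps in `collections.Counter` for the hand-built dictionary the outline describes, and the function name `frequent_words` is a hypothetical choice, not something the assignment specifies.

```python
from collections import Counter


def frequent_words(text, k):
    """Return every k-mer that occurs the maximum number of times in text.

    Sketch only: uses collections.Counter instead of a hand-built dict.
    """
    # Windows start at 0, 1, ..., len(text) - k, hence the + 1.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:  # k is longer than text: there are no k-mers at all
        return []
    highest = max(counts.values())
    return [kmer for kmer, times_seen in counts.items() if times_seen == highest]


print(frequent_words("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))
```

For the short sample string above, the two 4-mers `GCAT` and `CATG` each occur three times, so both are returned.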
##Use argparse to read a file specified on the command line
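A minimal sketch of the argparse step. The positional argument name `filename` and the helper `parse_args` are hypothetical; accepting an optional `argv` list makes the parser easy to try without a real command line.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means 'use sys.argv[1:]'."""
    parser = argparse.ArgumentParser(
        description="Find the most frequent k-mers in a text file."
    )
    # One required positional argument: the path to the input file.
    parser.add_argument(
        "filename",
        help="text file with TEXT on the first line and k on the second",
    )
    return parser.parse_args(argv)


args = parse_args(["input.txt"])  # hypothetical file name, for illustration
print(args.filename)
```

In the actual script you would call `parse_args()` with no arguments and run it as, for example, `python frequent_words.py input.txt` (script and file names hypothetical).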
#Open the file and associate it with a variable (a file object) so its contents can be read
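One common way to open the file is a `with` statement, which closes it automatically when the block ends. The sketch below writes a throwaway sample file to a temporary directory only so it can run on its own; in the real script the path would come from argparse (for example `args.filename`, assuming the argument is named `filename`).

```python
import os
import tempfile

# Demo only: create a small sample input file in a temporary directory.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "input.txt")  # hypothetical file name
    with open(path, "w") as out:
        out.write("ACGTTGCATGTCGCATGATGCATGAGAGCT\n4\n")

    # The step itself: open the file for reading and keep the file object
    # in a variable; `with` closes the file when the block ends.
    with open(path) as infile:
        lines = infile.readlines()

print(lines)
```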
##Read in information from the text file
#Get the text string from the first line of the file into variable text, stripping the newline
#Get the value for k from the second line of the file into variable num, strip the newline, convert it to an integer with int(), and assign it to k
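A sketch of reading the two lines, assuming `infile` is an already-open file object. Here `io.StringIO` stands in for a real file, and the helper name `read_text_and_k` is hypothetical. Note that `readline()` returns a string, so k must be converted with `int()` before it can be used as a length.

```python
import io


def read_text_and_k(infile):
    """Read TEXT from the first line and k from the second line of infile."""
    # readline() keeps the trailing "\n"; strip() removes it (and any spaces).
    text = infile.readline().strip()
    # The second line is still a string such as "4"; int() converts it.
    num = infile.readline().strip()
    k = int(num)
    return text, k


# io.StringIO stands in for a real file opened with open().
sample = io.StringIO("ACGTTGCATGTCGCATGATGCATGAGAGCT\n4\n")
text, k = read_text_and_k(sample)
print(text, k)
```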
##Find all kmers and build a dictionary mapping each kmer to the number of times it appears
#Move through text one position at a time, from 0 through len(text) - k inclusive (k is the kmer length), e.g., for i in range(len(text) - k + 1); range(len(text) - k) would miss the last kmer
#Define each kmer as the substring of text from i to i+k (text[i:i+k]), then add it to the dictionary with a count of 1 or increment its existing count
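A minimal sketch of the counting loop using a plain dictionary; the function name `count_kmers` is hypothetical. `dict.get(kmer, 0)` handles both the first sighting of a kmer and later increments in one line.

```python
def count_kmers(text, k):
    """Return a dict mapping each k-mer in text to its number of occurrences."""
    counts = {}
    # The last k-mer starts at index len(text) - k, so the range stops at
    # len(text) - k + 1 (range excludes its end point).
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        # dict.get returns 0 for a k-mer not seen yet.
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


counts = count_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4)
print(counts["GCAT"], counts["AGCT"])
```

With `range(len(text) - k)` instead, the final 4-mer of this sample, `AGCT`, would never be counted.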
##Identify which kmers occur most often in text by looking at the number of times each kmer was seen (its value in the dictionary)
#initialize highest total seen to 0
##Loop through the kmer dictionary to find the largest number of times any kmer was seen
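A sketch of the loop approach, starting from a highest total of 0. The helper name and the small dictionary are made up for illustration. Starting at 0 is safe because every kmer stored in the dictionary has been seen at least once.

```python
def highest_count_by_loop(counts):
    """Return the largest value in counts (0 for an empty dict)."""
    # Every k-mer that appears has a count of at least 1, so 0 is a safe start.
    highest = 0
    for kmer in counts:
        if counts[kmer] > highest:
            highest = counts[kmer]
    return highest


# A small hand-made dictionary for illustration.
counts = {"ACGT": 1, "GCAT": 3, "CATG": 3, "TGCA": 2}
highest = highest_count_by_loop(counts)
print(highest)
```

For a non-empty dictionary, the built-in `max(counts.values())` gives the same result in one call.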
#OR
##Get all values of the kmer dictionary, sort the list of values, and take the highest total (the last element)
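The sorting alternative can be sketched as follows (helper name hypothetical). `sorted()` returns a new ascending list, so the highest total is its last element.

```python
def highest_count_by_sort(counts):
    """Return the largest value in counts by sorting the values."""
    # sorted() returns a new list in ascending order, so the last item is largest.
    values = sorted(counts.values())
    return values[-1]  # raises IndexError if counts is empty


# A small hand-made dictionary for illustration.
counts = {"ACGT": 1, "GCAT": 3, "CATG": 3, "TGCA": 2}
highest = highest_count_by_sort(counts)
print(highest)
```

Sorting does more work than needed (O(n log n) versus a single O(n) pass), so the loop or `max()` is usually preferred, but both give the same answer.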
##With the most frequent total identified, go back through the dictionary to get the kmers whose total matches the highest total observed
#Define the result variable as an empty list
#Loop through each kmer in the dictionary, obtain its value (times seen), and compare it to the highest total; if they match, append the kmer to the result
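A sketch of collecting every kmer that ties for the highest total (helper name and sample dictionary hypothetical). The result starts as an empty list rather than `None`, because `None` has no `append()` method.

```python
def kmers_with_count(counts, highest):
    """Return the k-mers whose count equals highest."""
    result = []  # empty list, not None, so append() works
    for kmer, times_seen in counts.items():
        if times_seen == highest:
            result.append(kmer)
    return result


# A small hand-made dictionary for illustration.
counts = {"ACGT": 1, "GCAT": 3, "CATG": 3, "TGCA": 2}
result = kmers_with_count(counts, 3)
print(result)
```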
##Print list of most frequently occurring kmers
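The outline does not fix an output format, so this sketch assumes the kmers are printed on one line separated by spaces; the helper `format_result` is hypothetical, and printing one kmer per line is an equally simple alternative.

```python
def format_result(result):
    """Join the k-mers into one space-separated line (assumed output format)."""
    return " ".join(result)


result = ["CATG", "GCAT"]
print(format_result(result))

# Alternative: one k-mer per line.
for kmer in result:
    print(kmer)
```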