Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupport News AboutSign UpSign In
| Download
Views: 1428
Kernel: SageMath (stable)
%html <h2>Strings and the Burrows-Wheeler Transform</h2> <p>Sage/Python includes a builtin datastructure from strings.</p> <p>There are several ways to input strings. You can input a string using single quotes (') or double quotes ("):</p>
s = "This is a string!" s
t = 'So is this!' print t
%html <p>&nbsp;</p> <p>You can also input a string using three quotes (""" or '''). This is useful if you want to use both " and ' in your string, or you want your string to span multiple lines:</p>
s = """ This is a multi-line string that includes 'single quotes' and "double quotes". """ print s
%html <p><strong>Exercise:</strong> Create and print the following string.</p> <pre style="font-family: Consolas,monospace;"> \ | ( | ) / / _________________ | | | | __ | I <3 Coffee! |/ \ | | | \ / \__/ \___________/ </pre>
%html <p>&nbsp;</p> <p><strong>Exercise:</strong> Without using cut-and-paste(!) <em>replace</em> the substring&nbsp;<strong>I &lt;3 Coffee!</strong> with the substring <strong>I &lt;3 Tea!</strong>.</p>
%html <p>&nbsp;</p> <p><strong>Exercise:</strong> Print a copy of your string with all the letters capitalized (upercase).</p>
%html <p>&nbsp;</p> <p>Strings behave very much like lists. For example,</p> <ul> </ul> <p> <table style="text-align: center;" border="0" cellspacing="2" cellpadding="10" align="center"> <tbody> <tr> <td style="text-align: center;"><em><strong>Operation</strong></em></td> <td><em><strong>Syntax for strings</strong></em><strong><br /></strong></td> <td><em><strong>Syntax for lists</strong></em></td> </tr> <tr> <td>Accessing a letter<br /></td> <td><strong>string[3]</strong></td> <td><strong>list[3]</strong></td> </tr> <tr> <td>&nbsp;Slicing</td> <td>&nbsp;<strong>string[3:17:2]</strong></td> <td><strong>list[3:17:2]</strong></td> </tr> <tr> <td>&nbsp;Concatenation</td> <td><strong>string1 + sting2</strong> <br /></td> <td><strong>list1 + list2</strong></td> </tr> <tr> <td>A copy <br /></td> <td><strong>string[:]</strong> <br /></td> <td><strong>list[:]</strong></td> </tr> <tr> <td>&nbsp;A reversed copy<br /></td> <td>&nbsp;<strong>string[::-1]</strong></td> <td><strong>list[::-1]</strong></td> </tr> <tr> <td>Length</td> <td><strong>len(string)</strong></td> <td><strong>len(list)</strong></td> </tr> </tbody> </table> </p> <p>&nbsp;</p>
%html <p>&nbsp;</p> <p><strong>Exercise:</strong> The factors of length 2 of 'rhubarb' are</p> <pre class="shrunk" style="padding-left: 90px;">rh<br />hu<br />ub<br />ba<br />ar<br />rb</pre> <p>Write a function called <strong>factors </strong>that returns a list of the factors of length <strong>l</strong> of <strong>s</strong>, and list all the factors of length 3 of 'rhubarb'.</p>
%html <p>&nbsp;</p> <p><strong>Exercise:</strong> What happens if you apply your function <strong>factors </strong>to the list <strong>[0,1,1,0,1,0,0,1]</strong>? If it doesn't work for both lists and strings, go back and modify your function so that it does work for both.</p>
%html <h2>Run-length encoding<br /></h2> <p>The string$$WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW$$begins with $12$ copies of $W$, one copy of $B$, then $12$ copies of $W$, then $3$ copies of $B$, then $24$ copies of $W$, then 1 copy of $B$ and $14$ copies of $W$. Thus, it can be encoded as $$(W, 12), (B, 1), (W, 12), (B, 3), (W, 24), (B, 1), (W, 14).$$ This is called the <em>run-length encoding</em> of the string.<br /><br /><strong></strong></p>
%html <p><strong>Exercise:</strong> <ul> <li>Write a function that returns the run-length encoding of a string.</li> <li>Does your function work for lists as well as strings? If not, then can you make it so that it works for both strings and lists?</li> <li> Use your function to compute the run-length encoding of the following list.</li> $$[0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0]$$ </ul>
%html <h2>Rotations</h2> <p><br />The <em>rotations</em> of the string 'bananas' are:</p> <pre class="shrunk" style="padding-left: 60px;">bananas<br />ananasb<br />nanasba<br />anasban<br />nasbana<br />asbanan<br />sbanana</pre> <p>and if we sort these alphabetically, then we get:</p> <pre class="shrunk" style="padding-left: 60px;">ananasb<br />anasban<br />asbanan<br />bananas<br />nanasba<br />nasbana<br />sbanana</pre> <p>&nbsp;</p> <p><strong>Exercise:</strong> Define a function <strong>print_sorted_rotations</strong> that sorts all the rotations of a string and prints them in an array as above. Print the sorted rotations of the strings 'ananas'&nbsp; and 'cocomero'.</p>
%html <h2><br /></h2> <h2>The Burrows-Wheeler Transform</h2> <p><br />The <em>Burrows-Wheeler Transform</em> (BWT) of a string <strong>s</strong> sorts all the rotations of <strong>s</strong> and then returns the last column.<br /><br />For example, if we sort the rotations of 'bananas':<br /><br />&nbsp;&nbsp;&nbsp; ananasb<br />&nbsp;&nbsp;&nbsp; anasban<br />&nbsp;&nbsp;&nbsp; asbanan<br />&nbsp;&nbsp;&nbsp; bananas<br />&nbsp;&nbsp;&nbsp; nanasba<br />&nbsp;&nbsp;&nbsp; nasbana<br />&nbsp;&nbsp;&nbsp; sbanana<br /><br />then the last column is <em>bnnsaaa</em>, so the BWT of <em>bananas</em><strong> </strong>is <em>bnnsaaa</em>.</p> <p><br /><strong>Exercise:</strong> Write a function that returns the BWT of a string. Compute the BWT of <em>bananas</em>, <em>ananas</em> and <em>cocomero</em>. (<em>Hint:</em> You can return you answer as a list, but if you want to return a string, then you might want to use the <strong>join</strong> method for strings.)</p>
%html <h2><br /></h2> <p><strong>Exercise: </strong>Combine the functions you defined above to create an <strong>@interact</strong> object that takes a string <strong>s</strong> and prints:</p> <ul> <li> the sorted rotations of <strong>s</strong></li> <li>the run-length encoding of <strong>s</strong></li> <li>the BWT of <strong>s</strong></li> <li>the run-length encoding of the BWT of <strong>s</strong></li> </ul> <p>(<em>Hint:&nbsp;</em> String formatting can be done using the <strong>%</strong> operator. Here are some examples:</p> <p style="padding-left: 30px;">print 'The sum of %s and %s is %s.' % (3,2,3+2)</p> <p>If you are familiar with <em>C</em> then you will notice that string formating is very similar to <em>C</em>'s <em>sprintf</em> statement.)</p> <p>Use your interact object to explore this transformation, and then to answer the following questions below.</p>
%html <p><strong>Exercise:</strong> What is the BWT of the following?</p> <ul> <li>xxyxyxyxyxyxyxyxyxxyxyxyxyxyxyxyxyxy</li> <li> <p class="shrunk">01101001100101101001011001101001100101100110100101</p> </li> <li> <p class="shrunk">cdccdcdccdccdcdccdcdccdccdcdccdccdcdccdcdccdccdcdc</p> </li> </ul> <p>Do you notice any patterns in the BWT of a string?</p> <p>Can you think of an application for this transformation?</p> <p>Find 3 other strings that have a 'nice' image under the BWT.</p>
%html <p>&nbsp;</p> <p><strong>Exercise:</strong> Is the Burrows-Wheeler transformation invertible? (That is, can you find two strings that have the same BWT?)</p>
%html <p>&nbsp;</p> <p><strong>Exercise:</strong> By comparing the BWT of a string with the first column of the array of sorted rotations of a string <strong>s</strong>, devise and implement an algorithm that reconstructs the first column of the array from the BWT of <strong>s</strong>.</p>
%html <p>&nbsp;</p> <p><strong>Exercise:</strong> By examining the first <em>two</em> columns of the array, devise and implement an algorithm that reconstructs the first <em>two</em> columns of the array from the BWT of a string. (<em>Hint:</em> compare the last and first column with the first two columns.)</p>
%html <p>&nbsp;</p> <p><strong>Exercise:</strong> By examining the first <em>three</em> columns of the array, devise and implement an algorithm that reconstructs the first <em>three</em> columns of the array from the BWT of a string.</p>
%html <p>&nbsp;</p> <p><strong>Exercise:</strong> Write a function that reconstructs the entire array of sorted rotations of a string from the BWT of the string.</p>
%html <p>&nbsp;</p> <p><strong>Exercise:</strong> A <em>Lyndon word</em> is a word w that comes first in alphabetical order among all its rotations. Is the BWT invertible on Lyndon words?</p>
%html <p><strong>Exercise: </strong>Explain how one can modify the BWT to make it invertible on arbitrary words. Implement your modified transformation and the inverse transformation.</p>