By now you should have completed the fourth waypoint (this was in chapter 10, from week 1), and have the corresponding numerical value answer. This should be given here to confirm you have successfully completed it.
Answer that will be automatically graded below, ID: cell-55ff9879af308205
In [ ]:
Test your code from above here(2 points), ID: cell-86a036301ddef2e9
In [ ]:
The ultimate purpose of this assignment is that you are provided with two documents, encrypt.txt and reference_reduced.txt. These are stored on a webserver, and will be download when required. reference_reduced.txt is a generic piece of text in some European (non-English) language. You are going to use this to work out the typical frequencies of different characters in that language. encrypt.txt is another piece of text in that language, but it has been encrypted by a Caesar cipher of unknown offset. You are going to analyse the frequency of the letters in that text, and work out what the unknown offset is.
The Caesar cipher encrypts letters by shifting them by a fixed offset in the alphabet. For example, an offset of 2 performs the encoding
If there is any punctuation (including spaces), these do not get encoded, and are just left as they are. There are different conventions for what you could choose to do with capital letters.
To check your understanding, manually calculate the following (there is no need to write code):
Apply the Caesar cipher of offset 4 (so A↦E) to the string "Python is great". All letters should be in capitals. Set the variable encoded to the encoded string.
The string "P DPSS HWWSF MVY HU PUALYUZOPW" has been encoded with a Caesar cipher of offset 7. Decode it, and set the output to the variable decoded.
What is the most commonly used letter in the English language? Give your single character answer as a string named common
In a long piece of English text which you know is encrypted using a Caesar cipher, if the most common letter is Z, what is the most likely offset of the encoding? Give the integer answer as variable offset
So, let's say we think the answer to question 4 is 0. You'd write in the box below offset=0. Similarly, you'd write decoded="SOME TEXT".
Answer that will be automatically graded below, ID: cell-fa4bbdcd7e48caaf
In [ ]:
# YOUR CODE HEREencoded='TCXLSR MW KVIEX'decoded='PYTHON IS GREAT'encoded='P DPSS HWWSF MVY HU PUALYUZOPW'decoded='I WILL APPLY FOR AN INTERNSHIP'common='E'offset=22decoded='OKIA PATO'
Test your code from above here(1 point), ID: cell-9f20aa9e4bd4e071
In [ ]:
#test to check that the encoded variable is correct
Test your code from above here(1 point), ID: cell-82cb8a7f9a451a7e
In [ ]:
#test to check that the decoded variable is correct
Test your code from above here(1 point), ID: cell-8faf43ef0e7bc281
In [ ]:
#test to check that the common variable is correct
Test your code from above here(1 point), ID: cell-6225659c6e606834
In [ ]:
#test to check that the offset variable is correct
Clearly, we want to be able to identify the most commonly occuring letters if we want to decode an encrypted message.
You are provided with a function that will read text from a given file on a website. Complete the function get_freqs so that it accepts one argument, a string of text. Return a list with 26 elements in it. The first is the number of "a"s in the text, the second is the number of "b"s and so on.
Punctuation has not been encrypted
Hint: don't forget about capital letters! For frequency counts, "a" and "A" are equivalent.
Hint: what useful "string methods" does python provide that might speed up your count?
Your code most return the answer rather than print it (you can print it as well, but the return is crucial), and it must use the same function name that you've been given.
Look at the tests to understand what the input and output should look like.
Answer that will be automatically graded below, ID: cell-c09143c67a934fc8
In [ ]:
importrequests# def load_file(fName):# """Input: filename (string)# Output: file contents as a string"""# assert isinstance(fName, str),"Input must be a string"# f = open(fName,'r')# return f.read()defload_file(fName):url="https://www.ma.rhul.ac.uk/akay/teaching/python/"+fNameresponse=requests.get(url)#check that it opened correctlyassertresponse.status_code==requests.codes.okreturnresponse.textdefget_freqs_from_file(fName):"""Input: filename (string) to open) Output: list of 26 elements The first element of the output is the number of 'a's in the given text, the second element is the number of 'b's"""assertisinstance(fName,str),"Input must be a string"returnget_freqs(load_file(fName))defget_freqs(data):"""Input: string Output: list of 26 elements The first element of the output is the number of 'a's in the given string, the second element is the number of 'b's"""# YOUR CODE HEREraiseNotImplementedError()s=input()str1=''alphabet=['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']foriinrange(len(s)):ifs[i]inalphabet:temp=ord(s[i])num=(temp-97+3)%26str1=chr(num+97)print(str1,end="")else:print(" ",end="")
Test your code from above here(1 point), ID: cell-39fa53789544edfd
In [ ]:
#first basic tests#see how, if you don't understand what the question wants, you can probably use these examples to figure it out?assertget_freqs("abcdde")==[1,1,1,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]assertget_freqs("ab? cdde.!")==[1,1,1,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
Test your code from above here(1 point), ID: cell-9c853ed8d5c06ed7
In [ ]:
#how do you handle mixed cases?assertget_freqs("Hello World")==[0,0,0,1,1,0,0,1,0,0,0,3,0,0,2,0,0,1,0,0,0,0,1,0,0,0]
Test your code from above here(1 point), ID: cell-437de983f2f88b80
In [ ]:
#a more extensive test
Test your code from above here(1 point), ID: cell-9c33d89abf53c180
In [ ]:
#test the the frequencies from the two given files are correct.
Vector Inner Product
Recall that if you have two vectors v1 and v2, you can calculate their inner product,
where θ is the angle between the two vectors, measuring how similar the directions of the two vectors are.
Complete the function inner_product that accepts v1 and v2 as lists, and returns their inner product.
Don't forget pre-conditions!
Answer that will be automatically graded below, ID: cell-c9b55875f78ab93e
definner_product(list1,list2):'''input: two lists of real numbers of the same length output: the inner product of the two lists'''# YOUR CODE HEREassertall(isinstance(n,int)forninlist1)assertall(isinstance(n,int)forninlist2)calculation=sum([m*mforminzip(list1,list2)])#Dot product calculationreturn(calculation)print(inner_product([1,2],[1,5]))print(inner_product([1,0,1],[-2,5,2]))
Test your code from above here(2 points), ID: cell-bfb30a03c1900850
we'll get a measure of how similar the character frequency distributions are between the encrypted text and the reference text (it would make more sense if we rescaled based on the lengths of the two individual vectors, but that's unnecessary for what's coming). Of course, we don't expect them to be very similar. But, if we tried every possible decryption of the encrypted text, then probably the correctly decrypted version would have the largest inner product. Of course, we don't have to try all possible decryptions of the encrypted text, we just use the frequency analysis that we've already done.
The cycle_list operation is defined as taking a list (first argument), removing one element from the front, and putting it back on the end. This can be repeated k times (optional second argument). Complete the cycle_list function.
Answer that will be automatically graded below, ID: cell-6a8f42d8dce8e051
defcycle_list(myList,k=1):""" Input: a list and a cycle length Output: the list cycled k times (defaults to 1)"""# YOUR CODE HEREassertisinstance(k,int)assertk>0newList=#a new list created to enter the shifted elementsforxinrange(len(myList)):shift=myList[(x+1)%len(myList)]newList.append(shift)output=newListifk==1:#order 1returnoutputelse:#not order 1returncycle_list(output,k-1)print(cycle_list([1,2,3,4]))print(cycle_list([1,2,3,4],k=3))print(cycle_list([1,2,"a",4,"b"]))
[2, 3, 4, 1]
[4, 1, 2, 3]
[2, 'a', 4, 'b', 1]
Test your code from above here(1 point), ID: cell-a991ec688e93a51a
In [ ]:
#simple tests only this timeassertcycle_list([1,2,3,4])==[2,3,4,1]assertcycle_list([1,2,3,4],3)==[4,1,2,3]assertcycle_list([1,2,"a",4,"b"])==[2,"a",4,"b",1]
Test your code from above here(1 point), ID: cell-5726340677847deb
In [ ]:
cycling_list=[1,2,3,4,5]#this catches a surprising issue that can arise.assertcycle_list(cycling_list)==[2,3,4,5,1]assertcycle_list(cycling_list)==[2,3,4,5,1]
Bringing it Together
For each of the 26 different possible cycle lengths, we want to apply that cycle to the frequency information of the encrypted text, and calculate the inner product with the reference frequency information. Then we need to determine which was the cycle with the largest inner product. Complete the all_cycles function.
Answer that will be automatically graded below, ID: cell-64f2bf0e4f2afa1e
defget_best_cycle_length():''' Inputs: none Outputs: length of cycle that gives largest inner product. '''#test all the possible Caesar cipher offsetsans=all_cycles(encrypt_freqs,ref_freqs)returnans.index(max(ans))#find the largest entry in the list, and return its position in the list. that's the desired offsetdefall_cycles(list1,list2):''' Input: two lists of equal length containing frequency data Output: list that is the same length as the input. Element n (0-indexed) is inner product between n-cycle of first list, and second list'''# YOUR CODE HEREassertlen(list1)==len(list2)longList=forxinrange(1,len(list1)+1):#We start at 1 because at minimum k=1brandNew=cycle_list(list1,k=x)calculation=inner_product(brandNew,list2)#The inner product calculation happens here between the brandNew list and list 2longList.append(calculation)bigShift=cycle_list(longList,k=(-1)%len(list1))#As we can not start from k=0, we must shift the longListassertlen(bigShift)==len(list1)returnbigShiftprint(all_cycles([1,0,0,0,0],[1,2,3,7,9]))print(all_cycles([0,0,1,1,0],[1,2,3,7,9]))
[1, 9, 7, 3, 2]
[10, 5, 3, 10, 16]
Test your code from above here(1 point), ID: cell-ba28bca0d8c3fb4c
Explain (in your own words) how the for loop works within decrypt: what do the chr and ord functions do? Each member of the pair programming pair should write their own explanation, independently, in a separate copy of the file (which will contain the same solutions to the programming questions).