Due date: Monday, October 13th, 8:25AM.

Write here the names of the collaborators (just in case):

If you want to know the image of, say, 10000, under this permutation, just look at the corresponding entry:

28390

And the other way around:

10000

Use this permutation to construct a Pearson hash function turning arbitrarily long byte strings into 2-character hashes (use some padding scheme when the original string has odd length).

Make sure that you get the following values:

$m = \tt{hello!}$, $h = \tt{1b3f}$

$m = \tt{HELLO?}$, $h = \tt{fea5}$

$m = \verb{A longer one{$,$h = \tt{e1ba}$

Tests:

True
True
True

From now on, you play the attacker and try to break the above (insecure) hash function, to which you have unlimited access (*i.e.*, you can compute as many hashes as you want with it).

How many hashes of randomly generated strings do you expect it would take before a collision is found for this hash function?

The number of unique 2-byte hashes produced by this function is $\color{blue} 256^2 = 2^{16} = 65536$. By the general theory, we expect a collision after having generated roughly the square root of this number of hashes of random strings, or $\color{blue} 2^8 = 256$.

Check how lucky you are today by performing a birthday attack on the Pearson hash function.

Collision found after 281 trials:
m1 = 0/#i_kKD<+i3H#{]6^3Mwl]3]rJIqtJH^HD8|cuvaA^d3>QL_z3KS(PY~Y<$fBSQ
h1 = 8c47
m2 = xC/-,DhhR^v8Pomx>X-6P3.q'Ue)JJmxPoIM<-rPXR%uOrnw"5Yf.Qp96*R<J?pwef+4yb
h2 = 8c47
CPU time: 0.13 s, Wall time: 0.14 s

The situation is much worse than that, since the Pearson hash function is not designed to be cryptographically secure. Convince yourself that, given two strings, such as
$a = \tt{target}$ and $b = \verb{beginning of the end{$,
it is always possible to append two characters at the end of$b$to get a new string$b'$such that$H(b') = H(a)$. Be fair game by using only information known to the attacker (*i.e.*, it is forbidden to look at $\tt{perm}$ but you can perform as many hash evaluations as you like).

Since there are only $\color{blue} 2^{16}$ 2-character strings to choose from, we can reasonably consider brute-forcing a collision by trying them all:

Collision found after 48258 trials
m1 = target
h1 = 9902
m2 = beginning of the end + bc81
h2 = 9902
CPU time: 5.95 s, Wall time: 5.97 s

But actually... If you think about the computations that are being made by the Pearson hash function to compute the hash of $\color{blue} \mathtt{b + suffix}$, you realize that only the very last step changes: the hash of $\color{blue} \mathtt{b}$ is xored with the 2-character suffix, then permuted. In equations:
$\color{blue} h( \mathtt{b + suffix} ) = \sigma( h(\mathtt{b}) \oplus \mathtt{suffix} ).$
To make this equal to a given value, say $\color{blue} h(\mathtt{a})$, we can solve
$\color{blue} \sigma( h(\mathtt{b}) \oplus \mathtt{suffix} ) = h(\mathtt{a})$
to get the (unique) solution:
$\color{blue} \mathtt{suffix} = \sigma^{-1}( h( \mathtt{a})) \oplus h(\mathtt{b}).$
Thus we could make this attack much more general by using brute force once and for all *to recover the permutation* -- and after that we can easily perform the above extension attack for any pair of strings $\color{blue} \mathtt{a}$ and $\color{blue} \mathtt{b}$.

CPU time: 2.42 s, Wall time: 2.42 s

We can check that the Pearson hash function was successfully reverse-engineered:

True

Now collisions can be generated in the blink of an eye:

Collision found:
m1 = This is the string with the desired hash
h1 = 64ba
m2 = We want to append two characters to this one and reach the target hash + 7e7a
h2 = 64ba

Now a "real-world" example. Some hackers have leaked a large database of SHA1 password hashes from a well-known online retailer, where we learn that Alice's password hashes to eb0176f4d146dd4118ec6ef04552ee92febc3550.

You think Alice might be the kind of person to use her birthdate as password, and you know she was born after 2000. Can you recover her password? (the website administrators carelessly didn't use any salt when hashing the passwords).

Brute-forcing a such birthdate is absolutely not a daunting task, as there are right now less than:

5475

days to choose from after January 1st, 2000. Now of course we don't know precisely the format in which the birthdate was written: but that shouldn't stop an attacker from trying a couple of popular formats! (this only adds a few bits to the search space...)

Found password: 25-03-2006 with SHA1 hash: eb0176f4d146dd4118ec6ef04552ee92febc3550

(Did you see how that was fast?!)

'eb0176f4d146dd4118ec6ef04552ee92febc3550'