CoCalc Public Files3 - Hashes to ashes (prof) / Hashes to ashes (prof).sagews
Author: Gabriel Chênevert
Views : 347
Compute Environment: Ubuntu 18.04 (Deprecated)

Due date: Monday, October 13th, 8:25AM.

Write here the names of the collaborators (just in case):

## A) Pearson hashing

You are given a permutation of the integers 0 to 2^16 - 1, as a list:
perm = load("perm")

If you want to know the image of, say, 10000, under this permutation, just look at the corresponding entry:
perm[10000]

28390
And the other way around:
perm.index(28390)

10000

Use this permutation to construct a Pearson hash function turning arbitrarily long byte strings into 2-character hashes (use some padding scheme when the original string has odd length).

Make sure that you get the following values:

$m = \tt{hello!}$, $h = \tt{1b3f}$

$m = \tt{HELLO?}$, $h = \tt{fea5}$

$m = \verb{A longer one{$,$h = \tt{e1ba}$

# Utility function for XORing strings together

def xor(a,b):

return ''.join([ chr( ord(x)^^ord(y) ) for (x,y) in zip(a,b) ])

# Pearson hash function itself

def Pearson(m):

if len(m)%2 == 1:
print "Error: Pearson hash for odd-length strings not implemented yet."
print "A padding scheme that avoids trivial collisions should be used."
return

h = 2*chr(0)

for i in range(len(m)/2):

h = xor(h,m[2*i:2*i+2])
h = 256*ord(h[0]) + ord(h[1])
h = perm[h]
h = chr(h // 256) + chr(h % 256)

return h

# Allard-Bouyé very fast implementation

def pearsonHash(m, perm):
h = 0
for i in range(0, len(m), 2):
index = h ^^ int(m[i:i+2].encode('hex'), 16)
h = perm[index]
return hex(h)


Tests:

Pearson("hello!").encode("hex") == "1b3f"
Pearson("HELLO?").encode("hex") == "fea5"
Pearson("A longer one").encode("hex") == "e1ba"

True True True

## B) Birthday attack

From now on, you play the attacker and try to break the above (insecure) hash function, to which you have unlimited access (i.e., you can compute as many hashes as you want with it).

How many hashes of randomly generated strings do you expect it would take before a collision is found for this hash function?

The number of unique 2-byte hashes produced by this function is $\color{blue} 256^2 = 2^{16} = 65536$. By the general theory, we expect a collision after having generated roughly the square root of this number of hashes of random strings, or $\color{blue} 2^8 = 256$.

Check how lucky you are today by performing a birthday attack on the Pearson hash function.
def random_string():

# Let's use only printable characters for user-friendliness

length = 2*ZZ.random_element(10,50)

return "".join( [ chr(ZZ.random_element(32,127)) for i in [1..length]] )

%time

N = 0

found = false

strings = []
hashes  = []

m = random_string()
h = Pearson(m)

if h in hashes:
found = true
mm = strings[ hashes.index(h) ]
else:
strings += [ m ]
hashes  += [ h ]

N += 1

print "Collision found after %s trials:" % N
print
print "m1 = %s" % m
print "h1 = %s" % Pearson(m).encode("hex")
print
print "m2 = %s" % mm
print "h2 = %s" % Pearson(mm).encode("hex")
print

Collision found after 281 trials: m1 = 0/#i_kKD<+i3H#{]6^3Mwl]3]rJIqtJH^HD8|cuvaA^d3>QL_z3KS(PY~Y<$fBSQ h1 = 8c47 m2 = xC/-,DhhR^v8Pomx>X-6P3.q'Ue)JJmxPoIM<-rPXR%uOrnw"5Yf.Qp96*R<J?pwef+4yb h2 = 8c47 CPU time: 0.13 s, Wall time: 0.14 s ## C) Extension attack The situation is much worse than that, since the Pearson hash function is not designed to be cryptographically secure. Convince yourself that, given two strings, such as $a = \tt{target}$ and$b = \verb{beginning of the end{$, it is always possible to append two characters at the end of$b$to get a new string$b'$such that$H(b') = H(a)\$. Be fair game by using only information known to the attacker (i.e., it is forbidden to look at $\tt{perm}$ but you can perform as many hash evaluations as you like).

Since there are only $\color{blue} 2^{16}$ 2-character strings to choose from, we can reasonably consider brute-forcing a collision by trying them all:

%time

a = "target"

h = Pearson(a)

b = "beginning of the end"

found = false

i = 0

suffix = chr( i // 256 ) + chr( i % 256 )

if Pearson(b + suffix) == h:
found = true

i += 1

print "Collision found after %s trials" % i
print
print "m1 = %s" % a
print "h1 = %s" % h.encode("hex")
print
print "m2 = %s + %s" % (b, suffix.encode("hex"))     # non-printable characters don't print nicely...
print "h2 = %s" % Pearson(b + suffix).encode("hex")
print

Collision found after 48258 trials m1 = target h1 = 9902 m2 = beginning of the end + bc81 h2 = 9902 CPU time: 5.95 s, Wall time: 5.97 s

But actually... If you think about the computations that are being made by the Pearson hash function to compute the hash of $\color{blue} \mathtt{b + suffix}$, you realize that only the very last step changes: the hash of $\color{blue} \mathtt{b}$ is xored with the 2-character suffix, then permuted. In equations: $\color{blue} h( \mathtt{b + suffix} ) = \sigma( h(\mathtt{b}) \oplus \mathtt{suffix} ).$ To make this equal to a given value, say $\color{blue} h(\mathtt{a})$, we can solve $\color{blue} \sigma( h(\mathtt{b}) \oplus \mathtt{suffix} ) = h(\mathtt{a})$ to get the (unique) solution: $\color{blue} \mathtt{suffix} = \sigma^{-1}( h( \mathtt{a})) \oplus h(\mathtt{b}).$ Thus we could make this attack much more general by using brute force once and for all to recover the permutation -- and after that we can easily perform the above extension attack for any pair of strings $\color{blue} \mathtt{a}$ and $\color{blue} \mathtt{b}$.

%time

# First, recover the permutation by recording the hashes of all 2-character strings

my_perm = []

for i in range(2^16):

m = chr(i // 256) + chr(i % 256)
h = Pearson(m)
my_perm += [ 256*ord(h[0]) + ord(h[1]) ]

CPU time: 2.42 s, Wall time: 2.42 s

We can check that the Pearson hash function was successfully reverse-engineered:

my_perm == perm

True

Now collisions can be generated in the blink of an eye:

a = "This is the string with the desired hash"
b = "We want to append two characters to this one and reach the target hash"

ha = Pearson(a)
hb = Pearson(b)

ha_num = 256 * ord(ha[0]) + ord(ha[1])

tmp = my_perm.index(ha_num)

suffix = xor( chr(tmp // 256) + chr(tmp % 256), hb )

print "Collision found:"
print
print "m1 = %s" % a
print "h1 = %s" % ha.encode("hex")
print
print "m2 = %s + %s" % (b, suffix.encode("hex"))     # non-printable characters don't print nicely...
print "h2 = %s" % Pearson(b + suffix).encode("hex")

Collision found: m1 = This is the string with the desired hash h1 = 64ba m2 = We want to append two characters to this one and reach the target hash + 7e7a h2 = 64ba

## D) Dictionary attack

Now a "real-world" example. Some hackers have leaked a large database of SHA1 password hashes from a well-known online retailer, where we learn that Alice's password hashes to eb0176f4d146dd4118ec6ef04552ee92febc3550.

You think Alice might be the kind of person to use her birthdate as password, and you know she was born after 2000. Can you recover her password? (the website administrators carelessly didn't use any salt when hashing the passwords).

Brute-forcing a such birthdate is absolutely not a daunting task, as there are right now less than:

15 * 365

5475

days to choose from after January 1st, 2000. Now of course we don't know precisely the format in which the birthdate was written: but that shouldn't stop an attacker from trying a couple of popular formats! (this only adds a few bits to the search space...)



from Crypto.Hash import SHA

target = "eb0176f4d146dd4118ec6ef04552ee92febc3550"

for year in [0..14]:
for month in [1..12]:
for day in [1..31]:  # being lazy here

yy = "%02d" % year
mm = "%02d" % month
dd = "%02d" % day

yyyy = "20" + yy

formats = [
yy + "/" + mm + "/" + dd,
yyyy + "/" + mm + "/" + dd,
dd + "/" + mm + "/" + yy,
dd + "/" + mm + "/" + yyyy,
yy + "-" + mm + "-" + dd,
yyyy + "-" + mm + "-" + dd,
dd + "-" + mm + "-" + yy,
dd + "-" + mm + "-" + yyyy,
yy + mm + dd,
yyyy + mm + dd,
dd + mm + yy,
dd + mm + yyyy
]

for date in formats:
if SHA.new(date).hexdigest() == target:
print "Found password:  %s  with SHA1 hash:  %s" % (date, SHA.new(date).hexdigest())
break

Found password: 25-03-2006 with SHA1 hash: eb0176f4d146dd4118ec6ef04552ee92febc3550

(Did you see how that was fast?!)

SHA.new("25-03-2006").hexdigest()

'eb0176f4d146dd4118ec6ef04552ee92febc3550'