Birthday attack




A birthday attack is a type of cryptographic attack that exploits the mathematics behind the birthday problem in probability theory. This attack can be used to abuse communication between two or more parties. The attack depends on the higher likelihood of collisions found between random attack attempts and a fixed degree of permutations (pigeonholes). With a birthday attack, it is possible to find a collision of a hash function in about $\sqrt{2^n} = 2^{n/2}$ evaluations, with $2^n$ being the classical preimage resistance security. There is a general (though disputed[1]) result that quantum computers can perform birthday attacks, thus breaking collision resistance, in about $\sqrt[3]{2^n} = 2^{n/3}$ steps.[2]




Contents






  • 1 Understanding the problem


  • 2 Mathematics


    • 2.1 Source code example


    • 2.2 Simple approximation




  • 3 Digital signature susceptibility


  • 4 See also


  • 5 Notes


  • 6 References


  • 7 External links





Understanding the problem



As an example, consider the scenario in which a teacher with a class of 30 students (n = 30) asks for everybody's birthday (for simplicity, ignore leap years) to determine whether any two students have the same birthday (corresponding to a hash collision as described below). Intuitively, this chance may seem small. If the teacher picked a specific day (say, 16 September), then the chance that at least one student was born on that specific day is $1 - (364/365)^{30}$, about 7.9%. However, counter-intuitively, the probability that at least one student has the same birthday as any other student on any day is around 70% (for n = 30), from the formula $1 - 365!/((365-n)! \cdot 365^n)$.[3]
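The two probabilities in the classroom example can be checked numerically. The following is a minimal sketch; the function names `p_specific_day` and `p_shared_birthday` are illustrative and not part of any library, and the product form is used only to avoid evaluating large factorials.

```python
from math import prod

def p_specific_day(n, days=365):
    """Chance that at least one of n people was born on one fixed day."""
    return 1 - ((days - 1) / days) ** n

def p_shared_birthday(n, days=365):
    """Chance that at least two of n people share a birthday."""
    # Product form of days! / ((days - n)! * days**n), term by term.
    p_all_distinct = prod((days - k) / days for k in range(n))
    return 1 - p_all_distinct

print(f"{p_specific_day(30):.3f}")     # about 0.079
print(f"{p_shared_birthday(30):.3f}")  # about 0.706
```

With 365 days, the shared-birthday probability first exceeds one half at n = 23.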



Mathematics


Given a function $f$, the goal of the attack is to find two different inputs $x_1, x_2$ such that $f(x_1) = f(x_2)$. Such a pair $x_1, x_2$ is called a collision. The method used to find a collision is simply to evaluate the function $f$ for different input values that may be chosen randomly or pseudorandomly until the same result is found more than once. Because of the birthday problem, this method can be rather efficient. Specifically, if a function $f(x)$ yields any of $H$ different outputs with equal probability and $H$ is sufficiently large, then we expect to obtain a pair of different arguments $x_1$ and $x_2$ with $f(x_1) = f(x_2)$ after evaluating the function for about $1.25\sqrt{H}$ different arguments on average.
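The search described above can be sketched in a few lines. This is only an illustration under stated assumptions: `truncated_hash` is a hypothetical stand-in for $f$ (SHA-256 cut down to a small number of bits so that a collision is cheap to find), and the inputs are simply a counter rather than random values, relying on the hash to make the outputs look random.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-256, standing in for a small hash function f."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int):
    """Evaluate f on distinct inputs until some output value repeats."""
    seen = {}  # output value -> input that produced it
    for i in count():
        x = str(i).encode()
        h = truncated_hash(x, bits)
        if h in seen:
            # Two different inputs with the same output, plus evaluations used.
            return seen[h], x, i + 1
        seen[h] = x

x1, x2, evaluations = find_collision(24)
print(x1, x2, evaluations)
```

For a 24-bit output ($H = 2^{24}$) the estimate $1.25\sqrt{H}$ is roughly 5,000 evaluations, far fewer than the $2^{24}$ possible outputs. The dictionary stores every output seen so far, so memory use grows with the number of evaluations.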


We consider the following experiment. From a set of $H$ values we choose $n$ values uniformly at random, thereby allowing repetitions. Let $p(n; H)$ be the probability that during this experiment at least one value is chosen more than once. This probability can be approximated as

$$p(n; H) \approx 1 - e^{-n(n-1)/(2H)} \approx 1 - e^{-n^2/(2H)}$$

Let $n(p; H)$ be the smallest number of values we have to choose, such that the probability for finding a collision is at least $p$. By inverting the expression above, we find the following approximation

$$n(p; H) \approx \sqrt{2H \ln\frac{1}{1-p}}$$

and assigning a 0.5 probability of collision we arrive at

$$n(0.5; H) \approx 1.1774\sqrt{H}$$

Let $Q(H)$ be the expected number of values we have to choose before finding the first collision. This number can be approximated by

$$Q(H) \approx \sqrt{\frac{\pi}{2} H}$$
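These approximations can be compared against an exact calculation and a small simulation. The sketch below is illustrative only: the helper names, the choice of $H = 2^{16}$, the fixed seed and the number of trials are all assumptions made for the example.

```python
import random
from math import exp, log, pi, sqrt

def p_exact(n, H):
    """Exact probability of at least one repeat among n uniform draws from H values."""
    q = 1.0
    for k in range(n):
        q *= (H - k) / H
    return 1 - q

def p_approx(n, H):
    """First approximation from the text: 1 - exp(-n(n-1)/(2H))."""
    return 1 - exp(-n * (n - 1) / (2 * H))

def n_for_probability(p, H):
    """Approximate number of draws for collision probability p."""
    return sqrt(2 * H * log(1 / (1 - p)))

def draws_until_repeat(H, rng):
    """Number of draws made when the first repeated value appears."""
    seen = set()
    while True:
        v = rng.randrange(H)
        if v in seen:
            return len(seen) + 1
        seen.add(v)

H = 2**16
rng = random.Random(1)  # fixed seed so the run is repeatable
trials = 2000
mean_draws = sum(draws_until_repeat(H, rng) for _ in range(trials)) / trials

print(p_exact(23, 365), p_approx(23, 365))          # exact vs approximate p(n; H)
print(n_for_probability(0.5, H), 1.1774 * sqrt(H))  # n(0.5; H)
print(mean_draws, sqrt(pi * H / 2))                 # simulated vs approximate Q(H)
```

For the classic case $H = 365$, $n = 23$, the exact value is about 0.507 and the approximation about 0.500, which gives a feel for the size of the error at small $H$.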

As an example, if a 64-bit hash is used, there are approximately $1.8 \times 10^{19}$ different outputs. If these are all equally probable (the best case), then it would take 'only' approximately 5 billion attempts ($5.38 \times 10^{9}$) to generate a collision using brute force. This value is called the birthday bound[4] and for n-bit codes it can be computed as $2^{n/2}$.[5] Other examples are as follows:

Columns give the desired probability of random collision (p); entries are rounded to 2 s.f.

| Bits | Possible outputs (H) | p = 10^-18 | p = 10^-15 | p = 10^-12 | p = 10^-9 | p = 10^-6 | p = 0.1% | p = 1% | p = 25% | p = 50% | p = 75% |
|------|----------------------|------------|------------|------------|-----------|-----------|----------|--------|---------|---------|---------|
| 16 | 2^16 (~6.5 × 10^4) | <2 | <2 | <2 | <2 | <2 | 11 | 36 | 190 | 300 | 430 |
| 32 | 2^32 (~4.3 × 10^9) | <2 | <2 | <2 | 3 | 93 | 2900 | 9300 | 50,000 | 77,000 | 110,000 |
| 64 | 2^64 (~1.8 × 10^19) | 6 | 190 | 6100 | 190,000 | 6,100,000 | 1.9 × 10^8 | 6.1 × 10^8 | 3.3 × 10^9 | 5.1 × 10^9 | 7.2 × 10^9 |
| 128 | 2^128 (~3.4 × 10^38) | 2.6 × 10^10 | 8.2 × 10^11 | 2.6 × 10^13 | 8.2 × 10^14 | 2.6 × 10^16 | 8.3 × 10^17 | 2.6 × 10^18 | 1.4 × 10^19 | 2.2 × 10^19 | 3.1 × 10^19 |
| 256 | 2^256 (~1.2 × 10^77) | 4.8 × 10^29 | 1.5 × 10^31 | 4.8 × 10^32 | 1.5 × 10^34 | 4.8 × 10^35 | 1.5 × 10^37 | 4.8 × 10^37 | 2.6 × 10^38 | 4.0 × 10^38 | 5.7 × 10^38 |
| 384 | 2^384 (~3.9 × 10^115) | 8.9 × 10^48 | 2.8 × 10^50 | 8.9 × 10^51 | 2.8 × 10^53 | 8.9 × 10^54 | 2.8 × 10^56 | 8.9 × 10^56 | 4.8 × 10^57 | 7.4 × 10^57 | 1.0 × 10^58 |
| 512 | 2^512 (~1.3 × 10^154) | 1.6 × 10^68 | 5.2 × 10^69 | 1.6 × 10^71 | 5.2 × 10^72 | 1.6 × 10^74 | 5.2 × 10^75 | 1.6 × 10^76 | 8.8 × 10^76 | 1.4 × 10^77 | 1.9 × 10^77 |


The table shows the number of hashes $n(p)$ needed to achieve the given probability of success, assuming all hashes are equally likely. For comparison, $10^{-18}$ to $10^{-15}$ is the uncorrectable bit error rate of a typical hard disk.[6] In theory, MD5 hashes or UUIDs, being 128 bits, should stay within that range until about 820 billion documents, even though their possible outputs are many more.

It is easy to see that if the outputs of the function are distributed unevenly, then a collision could be found even faster. The notion of 'balance' of a hash function quantifies the resistance of the function to birthday attacks (exploiting uneven key distribution). However, determining the balance of a hash function will typically require all possible inputs to be calculated and thus is infeasible for popular hash functions such as the MD and SHA families.[7]
The subexpression $\ln\frac{1}{1-p}$ in the equation for $n(p; H)$ is not computed accurately for small $p$ when directly translated into common programming languages as log(1/(1-p)), due to loss of significance. When log1p is available (as it is in C99, for example), the equivalent expression -log1p(-p) should be used instead.[8] If this is not done, the first column of the above table is computed as zero, and several items in the second column do not have even one correct significant digit.
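The loss of significance is easy to reproduce. The following minimal sketch, assuming IEEE 754 double-precision floats (as used by Python), evaluates both forms at $p = 10^{-18}$, the probability of the table's first column.

```python
from math import log, log1p

p = 1e-18
naive = log(1 / (1 - p))  # 1 - p rounds to exactly 1.0, so this is log(1.0)
accurate = -log1p(-p)     # same quantity, computed without forming 1 - p

print(naive)     # 0.0
print(accurate)  # very close to 1e-18
```

The naive form loses the whole value because $1 - p$ is not representable as anything other than 1.0 at this magnitude.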



Source code example


Here is a Python function that can accurately generate most of the above table:


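from math import log1p, sqrt

def birthday(probability_exponent, bits):
    """Number of hashes needed for a collision probability of 10**probability_exponent."""
    probability = 10.0**probability_exponent
    outputs = 2.0**bits  # H, the number of possible hash values
    # -log1p(-p) equals ln(1/(1-p)) but stays accurate for very small p
    return sqrt(2.0*outputs*-log1p(-probability))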

If the code is saved in a file named birthday.py, it can be run interactively as in the following example:


$ python -i birthday.py
>>> birthday(-15, 128)
824963474247.1193
>>> birthday(-6, 32)
92.68192319417072


Simple approximation


A good rule of thumb which can be used for mental calculation is the relation

$$p(n) \approx \frac{n^2}{2H}$$

which can also be written as

$$H \approx \frac{n^2}{2p(n)}$$

or

$$n \approx \sqrt{2H \times p(n)}$$

This works well for probabilities less than or equal to 0.5.

This approximation scheme is especially easy to use when working with exponents. For instance, suppose you are building 32-bit hashes ($H = 2^{32}$) and want the chance of a collision to be at most one in a million ($p \approx 2^{-20}$). How many documents could you have at most?

$$n \approx \sqrt{2 \times 2^{32} \times 2^{-20}} = \sqrt{2^{1+32-20}} = \sqrt{2^{13}} = 2^{6.5} \approx 90.5$$

which is close to the correct answer of 93.
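The worked example can be reproduced directly. In this sketch the function names are illustrative; `n_rule_of_thumb` is the simple relation above and `n_full` is the logarithmic expression for $n(p; H)$ given earlier in the article.

```python
from math import log1p, sqrt

def n_rule_of_thumb(p, H):
    """Simple approximation: n is about sqrt(2 * H * p)."""
    return sqrt(2 * H * p)

def n_full(p, H):
    """Logarithmic form: n is about sqrt(2 * H * ln(1 / (1 - p)))."""
    return sqrt(2 * H * -log1p(-p))

H = 2**32
print(n_rule_of_thumb(2**-20, H))  # 2**6.5, about 90.5
print(n_full(1e-6, H))             # about 92.7
```

Most of the gap between 90.5 and 93 comes from rounding one in a million to $2^{-20}$ rather than from the rule of thumb itself, since $\ln\frac{1}{1-p} \approx p$ when $p$ is this small.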



Digital signature susceptibility


Digital signatures can be susceptible to a birthday attack. A message $m$ is typically signed by first computing $f(m)$, where $f$ is a cryptographic hash function, and then using some secret key to sign $f(m)$. Suppose Mallory wants to trick Bob into signing a fraudulent contract. Mallory prepares a fair contract $m$ and a fraudulent one $m'$. She then finds a number of positions where $m$ can be changed without changing the meaning, such as inserting commas, empty lines, one versus two spaces after a sentence, replacing synonyms, etc. By combining these changes, she can create a huge number of variations on $m$ which are all fair contracts.

In a similar manner, Mallory also creates a huge number of variations on the fraudulent contract $m'$. She then applies the hash function to all these variations until she finds a version of the fair contract and a version of the fraudulent contract which have the same hash value, $f(m) = f(m')$. She presents the fair version to Bob for signing. After Bob has signed, Mallory takes the signature and attaches it to the fraudulent contract. This signature then "proves" that Bob signed the fraudulent contract.

The probabilities differ slightly from the original birthday problem, as Mallory gains nothing by finding two fair or two fraudulent contracts with the same hash. Mallory's strategy is to generate pairs of one fair and one fraudulent contract. The birthday problem equations apply where $n$ is the number of pairs. The number of hashes Mallory actually generates is $2n$.
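Mallory's search can be sketched with a toy hash. Everything in this example is hypothetical: `toy_hash` truncates SHA-256 to 16 bits so that the search finishes instantly, the contract texts are invented, and the only variation used is one versus two spaces after each sentence. A real attack on a full-length hash would need vastly more variants.

```python
import hashlib
from itertools import product

def toy_hash(message: str, bits: int = 16) -> int:
    """Top `bits` bits of SHA-256; deliberately far too short to be secure."""
    digest = hashlib.sha256(message.encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def variations(sentences):
    """Yield every text formed by putting one or two spaces after each sentence."""
    for gaps in product((" ", "  "), repeat=len(sentences)):
        yield "".join(s + g for s, g in zip(sentences, gaps))

# Invented contracts: 12 sentences each, giving 2**12 variants per contract.
clauses = [f"Clause {i} applies as written." for i in range(11)]
fair = ["Bob agrees to pay Mallory 10 dollars."] + clauses
fraudulent = ["Bob agrees to pay Mallory 10000 dollars."] + clauses

# Hash every fair variant, then scan the fraudulent variants for a match.
fair_by_hash = {toy_hash(m): m for m in variations(fair)}

match = None
for m_fraud in variations(fraudulent):
    h = toy_hash(m_fraud)
    if h in fair_by_hash:
        match = (fair_by_hash[h], m_fraud)
        break

if match:
    print(hex(toy_hash(match[0])), hex(toy_hash(match[1])))
```

Only collisions between a fair and a fraudulent variant count here; two fair variants that collide with each other are simply overwritten in the dictionary, which mirrors the observation that such collisions gain Mallory nothing.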


To avoid this attack, the output length of the hash function used for a signature scheme can be chosen large enough so that the birthday attack becomes computationally infeasible, i.e. about twice as many bits as are needed to prevent an ordinary brute-force attack.


Besides using a larger bit length, the signer (Bob) can protect himself by making some random, inoffensive changes to the document before signing it, and by keeping a copy of the contract he signed in his own possession, so that he can at least demonstrate in court that his signature matches that contract, not just the fraudulent one.


Pollard's rho algorithm for logarithms is an example of an algorithm that uses a birthday attack to compute discrete logarithms.



See also



  • Collision attack

  • Meet-in-the-middle attack



Notes





  1. ^ Daniel J. Bernstein. "Cost analysis of hash collisions: Will quantum computers make SHARCS obsolete?" (PDF). Cr.yp.to. Retrieved 29 October 2017.


  2. ^ Brassard, Gilles; Høyer, Peter; Tapp, Alain (20 April 1998). LATIN'98: Theoretical Informatics. Lecture Notes in Computer Science. 1380. Springer, Berlin, Heidelberg. pp. 163–169. arXiv:quant-ph/9705002. doi:10.1007/BFb0054319. ISBN 978-3-540-64275-6.


  3. ^ "Math Forum: Ask Dr. Math FAQ: The Birthday Problem". Mathforum.org. Retrieved 29 October 2017.


  4. ^ See upper and lower bounds.


  5. ^ Jacques Patarin, Audrey Montreuil (2005). "Benes and Butterfly schemes revisited" (PostScript, PDF). Université de Versailles. Retrieved 2007-03-15.


  6. ^ Gray, Jim; van Ingen, Catharine (25 January 2007). "Empirical Measurements of Disk Failure Rates and Error Rates". arXiv:cs/0701166.


  7. ^ "CiteSeerX". Archived from the original on 2008-02-23. Retrieved 2006-05-02.


  8. ^ "Compute log(1+x) accurately for small values of x". Mathworks.com. Retrieved 29 October 2017.




References




  • Mihir Bellare, Tadayoshi Kohno: Hash Function Balance and Its Impact on Birthday Attacks. EUROCRYPT 2004: pp401–418


  • Applied Cryptography, 2nd ed. by Bruce Schneier



External links




  • "What is a digital signature and what is authentication?" from RSA Security's crypto FAQ.


  • "Birthday Attack" X5 Networks Crypto FAQs








