Alot of the crypotgraphic software we use relies on hashing and it's uniqueness guarantees to provide core functionality, such asauthentication, message integrity, message fingerprinting, data corruption detection, etc... Guaranteeing this uniqueness however is fundamentally impossible following from the Pigeonhole Principle if your input data size is going to be larger than your maximum size of your key space.

At some point you're guaranteed to have two different sets of data that will map to the same key. So just how frequently will these types of collisions occur?

```
Given n randomly generated values, where each is a k-bit integer, what is the probability that any two of them are equal?
```

To calculate this probability, it might be simpler to first look at the probability of no collisions happening, since:

$$P(Any~Collision) = 1 - P(No~Collisions)$$

If we have a k-bit integer, that means it's maximum value is `N = 2^k`

. The probability of one number not colliding with another would be:

$$N-1/N$$

since we have `N-1`

remaining that are unique. If we generate another value, the probability then becomes:

$$N-1/N \times N-2/N$$

by the multiplicative rule of probability. Extending this to `n`

different values, we get:

$$N-1/N \times N-2/N \times ... \times N-n/N$$

Factoring out the N to simplify:

$$e^{1-1/N} \times e^{1-2/N} ... \times e^{1-n/N} $$

If `N >> n^2`

, we can apply the approximation that `1-x < e^-x`

:

$$e^{-1/N} \times e^{-2/N} \times ... \times e^{-n/N} $$

Simplifying the products we get:

$$e^{-1/N\sum_{j=0}^{n-1}j}$$

And reducing the sum we get:

$$e^{\frac{-n(n-1)}{2N}}$$

So now that we have a usuable approximation for the probability of a hash collision not happening, to the probability of a collision does happen becomes:

$$P(Any~Collision) = 1 - e^{\frac{-n(n-1)}{2N}}$$

With this we can get a general sense of just how often we can expect a collision given a certain value size.

Bits | Output Size | # of hashes to get for 50% chance of collsion |
---|---|---|

16 | ~6.5 x 104 | 300 |

32 | 4.3 × 10^9 | 77,000 |

64 | ~1.8 × 10^19 | 5.1 × 10^9 |

128 | ~3.4 × 10^38 | 2.2 × 10^19 |

256 | ~1.2 × 10^77 | 4.0 × 10^38 |

512 | ~1.3 × 10^154 | 1.4 × 10^77 |