CNS - 05 - Password Security


Lecture Info

  • Date: [2020-09-30 Wed 14:00]

  • Slides: CNS 05 - Passwords Security

  • Introduction: In this lecture we define what user authentication means and what the problems of passwords are, introducing some technical tools to measure the hardness/robustness of passwords.

1 Passwords and Secrets

In the previous lecture we defined authentication as a process in which the user has to prove knowledge of some secret. This secret is typically a password. Conceptually, then, a password and a secret are the same thing.

In practice, however, there is a huge difference between a password and a secret. In cryptography a secret key is a random string of \(N\) bits, while a password is a low-entropy string. Later we will define the concept of entropy more precisely, but for now it suffices to say that the probability of guessing the secret key is exactly \(1/2^N\), while the probability of guessing the password is much higher than \(1/2^N\).

2 The Problems of Passwords

Passwords have four major problems:

  1. Password overload: users very often reuse passwords across different sites. This means that if one site gets hacked and its passwords are retrieved, the user also loses protection on all the other sites.

  2. Restricted charset: since passwords are typed with a keyboard, which has \(< 100\) characters, not all of the \(256\) possible values of a byte are used.

  3. Low entropy: since passwords are meant to be remembered, they are nowhere near random.

  4. Predictability (dictionary attacks): bad habits in choosing passwords, and the usage of human-friendly rules to "generate" passwords.


2.1 Password Overload

In a recent work (https://people.cs.vt.edu/gangwang/pass), researchers in the US carried out a large-scale analysis in 2017 and computed the following statistics:

  • The average number of different digital accounts per user is 130.

  • Since it is very hard to remember 130 distinct passwords without any external help, 38% of users reuse identical passwords across sites.

  • Even for the 21% of users who only modified an existing password, the "rules" used were very predictable: 30% of the modified passwords were cracked in 10 or fewer attempts, and 46.5% in 100 or fewer attempts.

In general, password overload leads to so-called cross-site breaches, in which a breach of one site leads to breaches on other sites because the user reuses the same passwords.


2.2 Restricted Charset

Assume we have a secret key of \(8\) bytes. Since \(1\) byte has \(256\) possible values, the probability of guessing a single byte is \(1/256\), and so the probability of guessing the entire secret key is the following

\[\frac{1}{256} \cdot \frac{1}{256} \cdots \frac{1}{256} = \Big(\frac{1}{256}\Big)^8 = \frac{1}{2^{64}} = \frac{1}{18\,446\,744\,073\,709\,551\,616}\]

To understand how big this number is, let us assume we can perform \(66\) million guesses per second. Then the average guess time is the following

\[\frac{\text{number of attempts}}{\text{number of attempts per second}} = \frac{\frac{1.8 \times 10^{19}}{2}}{6.6 \times 10^7} \approx 4431 \text{ years}\]


Now take a password of \(8\) bytes, and suppose the user only uses digits and lowercase letters. This means that for each byte of the password there are only \(36\) possible values. Thus, the probability of guessing the entire password is

\[\frac{1}{36} \cdot \frac{1}{36} \cdots \frac{1}{36} = \Big(\frac{1}{36}\Big)^8 = \frac{1}{2\,821\,109\,907\,456}\]

but this time, using the same assumption as before (\(66\) million guesses/s), we get an average guess time of

\[\frac{\text{number of attempts}}{\text{number of attempts per second}} = \frac{\frac{2.8 \times 10^{12}}{2}}{6.6 \times 10^7} \approx 5.9 \text{ hours}\]
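Both computations can be reproduced with a few lines of Python (a minimal sketch; the \(66\) million guesses/s rate is the assumption used above):

    GUESSES_PER_SECOND = 66e6
    SECONDS_PER_YEAR = 3600 * 24 * 365

    def avg_guess_seconds(keyspace):
        # On average the attacker succeeds after trying half the keyspace.
        return (keyspace / 2) / GUESSES_PER_SECOND

    print(avg_guess_seconds(256 ** 8) / SECONDS_PER_YEAR)  # ~4431 years
    print(avg_guess_seconds(36 ** 8) / 3600)               # ~5.9 hours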

Observation: Even the previous average guess time of roughly \(4431\) years is not considered adequate by today's standards in modern security.

Observation: These considerations assume that we can perform a so-called offline attack, in which an attacker downloads a set of hashed passwords and can then brute-force all the passwords offline, meaning that the number of guesses is limited only by the machine's computational power.


2.3 Low Entropy

Excluding very few and rare exceptions, the password strings that humans generate do not look random at all. To actually measure the randomness of a password we now borrow some tools from information theory. To quantify randomness we use the concept of (Shannon) entropy.


2.3.1 Defining Entropy

Def: Let \(X\) be a discrete random variable, that is, a random quantity with a finite range of outcomes \(x_1, x_2, \ldots, x_n\), each with an associated probability \(p_1, p_2, \ldots, p_n\) such that \(\sum_i p_i = 1\). We then define entropy as follows

\[H(X) := - \sum_i p_i \cdot \log_2 p_i\]

The entropy \(H(X)\) is measured in bits, and can be thought of as a measure of the information carried by \(X\).
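The definition translates directly into a few lines of Python; a minimal sketch:

    from math import log2

    def entropy(probs):
        """Shannon entropy, in bits, of a discrete distribution given as a
        list of probabilities summing to 1."""
        # Events with p = 0 contribute nothing, since p * log2(p) -> 0.
        return -sum(p * log2(p) for p in probs if p > 0)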



2.3.2 Examples

Example 1: Consider a coin flip with equal probability of tails \(x_1\) or heads \(x_2\), that is, \(p_1 = p_2 = 1/2\). The entropy of this coin flip is

\[H(X) = -2 \cdot \Big(\frac12 \cdot \log_2{\frac12} \Big) = -\log_2{\frac12} = \log_2{2} = 1\]

thus we say that the information carried by this coin flip is exactly \(1\) bit.

Example 2: The entropy of a roll of a fair \(6\)-faced die is

\[H(X) = -6 \cdot \Big(\frac16 \cdot \log_2{\frac16} \Big) = \log_2{6} \approx 2.58\]

thus we say that the information carried by this die roll is approximately \(2.58\) bits.

Example 3: The entropy of generating a random byte, with \(256\) equiprobable outcomes, is

\[H(X) = -256 \Big(\frac{1}{256} \cdot \log_2{\frac{1}{256}} \Big) = \log_2{256} = 8\]

thus we say that the information carried by a random byte is exactly \(8\) bits.
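All three examples can be checked as a quick usage of the entropy function sketched earlier:

    print(entropy([1/2] * 2))      # coin flip   -> 1.0
    print(entropy([1/6] * 6))      # die roll    -> 2.584...
    print(entropy([1/256] * 256))  # random byte -> 8.0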



2.3.3 Understanding Entropy

Before actually using entropy for our purposes, let us understand why entropy is defined this way. The definition comes from the work of Claude Shannon in 1948.

Shannon reasoned along the following lines: the amount of information you get out of an event \(x_i\) depends on how unexpected the event \(x_i\) is. In particular, the lower the probability \(p_i\) of the event, the more surprising the event. This is why the information content of an event is defined as a function of the inverse of its probability, \(1/p_i\).

A convenient function to use for this is \(-\log_2(\cdot)\) applied to the probability. Indeed, suppose an event happens with probability \(1/2^b\). Then on average it happens once every \(2^b\) trials, and its information content is defined as

\[-\log_2{\frac{1}{2^b}} = \log_2{2^b} = b\]

Thus we define the information content of the value \(x_i\) as

\[\text{IC}(x_i) := \log_2{\frac{1}{p_i}} = -\log_2{p_i}\]

Then the entropy, which is a function of the random variable (and not of a particular event), is defined as the average of the information content over all the possible events of the random variable. Formally,

\[H(X) := \mathbb{E}[\text{IC}(X)] = \sum_i p_i \text{IC}(x_i) = -\sum_i p_i \cdot \log_2{p_i}\]
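The identity \(H(X) = \mathbb{E}[\text{IC}(X)]\) can be checked numerically; a small self-contained sketch with an arbitrary example distribution:

    from math import log2

    def ic(p):
        """Information content of an event with probability p."""
        return -log2(p)

    probs = [1/2, 1/4, 1/4]            # an arbitrary example distribution
    h = sum(p * ic(p) for p in probs)  # H(X) = E[IC(X)]
    print(h)                           # 1.5 bits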



2.3.4 Using Entropy

Entropy is a quantitative measure of how predictable a random event is. Given a random variable with \(N = 2^b\) possible outcomes, depending on the value of its entropy we can deduce how random the variable actually is. In particular,

  • If \(H(X) = b = \log_2 N\), then we say that the variable is perfectly random, that is, every outcome is equiprobable.

  • If \(H(X) = 0\), then the random variable is actually a deterministic event.

  • If \(0 < H(X) < b\), then some events are more likely to happen than some others.

Observation: There are a number of interesting consequences of entropy, all of which are heavily used in telecommunications. The general idea is that if every bit carries \(0.81\) bits of information, then we can compress the stream of bits by no more than \(19\%\). The takeaway is that information is not really the number of bits you transmit, and with entropy you can measure how compressible the data you want to transfer is.

Example: Consider a fair bit flip \(x_k \in \{0, 1\}\), with each value having probability \(1/2\). A single flip has entropy of \(1\) bit. A sequence of three independent flips \(x_1, x_2, x_3\) has entropy of \(3\) bits. But if the second and third flips \(x_2, x_3\) always have the same value as the first flip \(x_1\), then the total entropy is \(1\) bit, even though \(3\) bits are generated.
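The example can be verified by viewing the whole 3-bit sequence as a single random variable and reusing the entropy function from the sketch above:

    # Three independent fair flips: 8 equiprobable sequences of 3 bits.
    print(entropy([1/8] * 8))   # 3.0 bits

    # x2 and x3 copy x1: only the sequences 000 and 111 can occur.
    print(entropy([1/2, 1/2]))  # 1.0 bit, even though 3 bits are emitted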



2.3.5 Entropy of English

In 1950 Shannon tried to estimate how much information printed English text carries, in a paper called Prediction and Entropy of Printed English.

If the letters were equiprobable, the entropy of a single letter would be

\[-26 \cdot \frac{1}{26} \cdot \log_2\Big(\frac{1}{26}\Big) = \log_2{26} \approx 4.70\]

This, however, is actually not the case: in natural languages letters are not equiprobable. In particular, Shannon computed upper and lower bounds on the per-letter entropy for increasingly long sequences of letters.

This tells us that in the best case a letter carries \(1.3\) bits of information, while in the worst case it carries only \(0.6\) bits.
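Shannon's bounds account for correlations between letters. A much cruder, zeroth-order estimate can be computed from single-letter frequencies alone; a minimal self-contained sketch (ignoring correlations overestimates the true per-letter entropy of English):

    from collections import Counter
    from math import log2

    def letter_entropy(text):
        """Entropy of the single-letter frequency distribution of text."""
        letters = [c for c in text.lower() if c.isalpha()]
        counts = Counter(letters)
        n = len(letters)
        return -sum((k / n) * log2(k / n) for k in counts.values())

    print(letter_entropy("the quick brown fox jumps over the lazy dog"))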


2.3.6 Rule of Thumb

Extrapolating from Shannon's work we can roughly state the following.

Rule of Thumb: roughly speaking, every letter contributes about \(2\) bits to the total entropy, which means that every letter of your password has an entropy of about \(2\) bits.

Now, suppose we want to break a password generated with \(10\) random letters.

If we assume that the letters are uniformly distributed, then we would have to make roughly \(2^{4.7 \times 10} = 2^{47}\) attempts, since

  • We have \(26\) possible characters to use.

  • The number of possible passwords of length 10 is \(26^{10}\).

  • The number \(26^{10}\) can also be expressed as a power of \(2\) in the following way

    \[26^{10} = (2^{\log_2 26})^{10} = 2^{\log_2 26 \cdot 10} \approx 2^{4.7 \cdot 10}\]

If instead the letters are not equiprobable, and each letter has an entropy of \(2\) bits, then the same computation gives a total number of attempts of

\[2^{2 \cdot 10} = 2^{20}\]

which is a factor of \(2^{27} \approx 100\) million times smaller than the computer-generated case. Thus, a human-generated password is dramatically weaker than a computer-generated one.
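The attempt counts and their ratio are straightforward to verify; a minimal sketch:

    from math import log2

    random_attempts = 26 ** 10      # equiprobable letters: ~2^47
    human_attempts = 2 ** (2 * 10)  # ~2 bits of entropy per letter: 2^20

    print(log2(random_attempts))              # ~47.0
    print(random_attempts // human_attempts)  # ~1.3e8, roughly 2^27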

Observation: Roughly speaking, every \(2^{10}\) is a factor of \(1000\); thus \(2^{20}\) is a factor of \(1\) million. Since \(2^7\) is roughly a factor of \(100\), we get that \(2^{27}\) is a factor of \(100\) million. Cryptographers need to understand the differences between these big powers of \(2\). For example, \(2^{47}\) is \(4\) factors of \(1000\) and \(1\) factor of \(100\). When we move from \(2^{52}\) to \(2^{46}\) we have reduced the security by a factor of \(2^6 = 64\), that is, roughly \(100\).


2.3.7 The Big Takeaway

From all of this entropy discussion, the big takeaway is that you should not generate your passwords yourself. The best thing you can do is to generate random passwords and store them in a password manager.
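For instance, a minimal sketch of a random password generator using Python's secrets module (length \(16\) and the \(62\)-character alphabet are arbitrary choices):

    import secrets
    import string

    ALPHABET = string.ascii_letters + string.digits  # 62 characters

    def random_password(length=16):
        """Each character is drawn uniformly, so it carries log2(62) ~ 5.95
        bits of entropy: ~95 bits in total for length 16."""
        return "".join(secrets.choice(ALPHABET) for _ in range(length))

    print(random_password())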


2.4 Dictionary Attack

Most of the time, in reality, you don't even need to do a brute-force attack. You can instead use heuristics which facilitate the cracking of passwords. These types of attacks are called dictionary attacks, and they can make use of:

  • A set of "common" words (dictionary).

  • Many breaches have left password databases in the clear. You can first try the passwords that other people use: this works because people typically reuse the same passwords (password overload attacks).

  • Dictionaries customized to the target victim.

There are two types of dictionary attacks:

  • Online: easy to defend against, since the server can limit the number of attempts.

  • Offline: there is no defense except choosing strong passwords (a sketch of this attack follows the list).
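As an illustration of the offline case, here is a minimal sketch assuming the attacker has obtained unsalted SHA-256 password hashes (a deliberately simplified setup; real systems should use salted, slow hash functions precisely to make this attack expensive):

    import hashlib

    def sha256_hex(password):
        return hashlib.sha256(password.encode()).hexdigest()

    def dictionary_attack(target_hash, wordlist):
        """Hash each candidate and compare it against the leaked hash."""
        for candidate in wordlist:
            if sha256_hex(candidate) == target_hash:
                return candidate
        return None

    # Hypothetical leaked hash; the wordlist comes from previous breaches.
    leaked = sha256_hex("iloveyou")
    print(dictionary_attack(leaked, ["123456", "12345", "password", "iloveyou"]))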


2.4.1 Some Statistics

In a paper called Investigating the Distribution of Password Choices (David Malone), a team of researchers used a recent breach to check whether those passwords were used on other sites as well. The most common passwords were the following:

123456
12345
123456789
password
iloveyou

Instead of carrying out an attack in which one user is cracked first and then another, one can do a password spraying attack, in which the same password is tried horizontally on many different users (see the sketch below). Since many people use the same password, this type of attack is extremely effective.
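A minimal sketch of the idea, where check_login is a hypothetical stand-in for the target's authentication endpoint:

    def password_spray(users, common_password, check_login):
        """Try one common password horizontally across many accounts,
        instead of many passwords against one account."""
        return [user for user in users if check_login(user, common_password)]

    # Example: compromised = password_spray(all_users, "123456", check_login)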

Some interesting statistics:

  • \(16 \%\) of users used their first name as their password.

  • \(4 \%\) of users used a variant of the word "password".


2.4.2 Software

Software to crack passwords:

  • John the Ripper

  • Hashcat

They both work with two main components:

  • A baseline database

  • Password generation rules (see the sketch below)
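A minimal sketch of what password generation rules look like (these rules are illustrative examples, not the actual rule syntax of John the Ripper or Hashcat):

    def apply_rules(word):
        """Expand a dictionary word into common human-style variants."""
        yield word
        yield word.capitalize()
        yield word + "1"
        yield word + "123"
        yield word.replace("a", "@").replace("o", "0")  # leetspeak

    # Feed every variant of every baseline word to the hash-and-compare loop.
    candidates = [c for word in ["password", "monkey"] for c in apply_rules(word)]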

3 Notes on Generating Randomness

Bianchi's team tried to use the heartbeat to generate random bits: a repetitive pattern with random variations between patterns (noise).

Randomness is not algorithmic. The new edition of the book has a whole chapter that insists on generating truly random values, which relies on having a source of entropy. The number of random bits per second depends on the type of source.

Entropy source in Linux: /dev/urandom .
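For example, a minimal sketch of drawing random bytes from this source in Python:

    # Read 16 random bytes from the kernel's entropy-backed generator;
    # os.urandom reads from the same underlying source as /dev/urandom.
    with open("/dev/urandom", "rb") as source:
        key = source.read(16)
    print(key.hex())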