Introduction
To determine if a password is strong enough, two things need to be
known:
- the amount of randomness of the password
- the amount of randomness that is needed
Randomness of the password:
Human-written text is only slightly random. For text typed in lowercase
ASCII characters, the randomness is estimated at 2-3 bits per
character (see RFC 1750). The 26 lowercase ASCII characters could carry
4.7 bits each ( log(26)/log(2) = 4.7 ). So just because a human has
chosen these characters, roughly 40% to 60% of the randomness is lost.
As a rule of thumb, the amount of randomness of a human-chosen password
can therefore be estimated at half the theoretical maximum:
b = log(c)/(log(2)*2)
where:
c is the size of the character set chosen from
and
b is the amount of random bits per character
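The estimate above can be written as a small function. This is only a sketch of the rule of thumb; the name bits_per_char is made up for illustration and is not taken from any existing script.

```python
import math


def bits_per_char(charset_size: int) -> float:
    """Estimated random bits per character of a human-chosen password.

    This is half of the theoretical log2(charset_size), following the
    rule of thumb b = log(c)/(log(2)*2).
    """
    return math.log2(charset_size) / 2


# Lowercase letters only: half of the theoretical 4.7 bits.
print(round(bits_per_char(26), 2))
# Digits only.
print(round(bits_per_char(10), 2))
```

For lowercase letters this gives about 2.35 bits per character, which falls inside the 2-3 bits range quoted above.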
Randomness needed:
RFC 1750 gives a nice example of calculating the randomness needed.
It makes the following assumptions:
- One guess of the password takes 6 seconds (for example because of
a delay in the login system).
- An attempt to brute-force the password will be detected within a
month.
- A chance of guessing the password of 1 in 1000 per attempt is
acceptable.
This translates to about 500,000 tries before the attack is detected. Given
the acceptable chance of 1 in 1000, the password needs to be randomly
chosen out of 500,000,000 possibilities, which is equivalent to 29 bits of
random data.
More generally, the equation is:
b = log((d*86400)/(s*c))/log(2)
where:
b is the number of random bits needed
d is the number of days before an attack is detected
s is the delay in seconds between login attempts
c is the acceptable chance of a successful attack (as a fraction,
e.g. 0.001 for 1 in 1,000)
Some examples of this calculation:
| Detection (days) | Delay (sec) | Chance (1 in) | Randomness needed (bits) |
| 1 | 60 | 1,000 | 21 |
| 7 | 10 | 1,000 | 26 |
| 7 | 30 | 10,000 | 28 |
| 31 | 5 | 1,000 | 29 |
| 7 | 60 | 1,000,000 | 34 |
| 14 | 10 | 100,000 | 34 |
| 21 | 20 | 1,000,000 | 37 |
| 31 | 1 | 1,000,000 | 42 |
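The example values can be checked with a few lines of Python. This is a sketch under two assumptions: the function name bits_needed is invented here, and the result is rounded up to whole bits, which is what the example values appear to do. The chance is passed as "1 in N", so dividing by the fraction c becomes multiplying by N.

```python
import math


def bits_needed(days: float, delay_s: float, one_in: float) -> int:
    """Bits of randomness needed, rounded up to a whole bit.

    days    -- days before an attack is detected (d)
    delay_s -- delay in seconds between login attempts (s)
    one_in  -- acceptable chance of success as "1 in N" (c = 1/N)
    """
    guesses = days * 86400 / delay_s  # attempts possible before detection
    return math.ceil(math.log2(guesses * one_in))


for d, s, n in [(1, 60, 1_000), (31, 5, 1_000), (31, 1, 1_000_000)]:
    print(d, s, n, bits_needed(d, s, n))
```

Rounding up is the cautious choice: a password with slightly fewer bits than the exact result would not meet the stated chance.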
Two scenarios seem realistic to me:
- low-security system
A system like this is not very intensively monitored and you don't
want to set a long delay in case of a wrong password. This results
in a detection time of 31 days, a delay of 5 seconds and an
acceptable chance of 1 in 1,000. The amount of randomness needed is
29 bits for such a system.
- high-security system
A system like this is much more intensively monitored. Detection
within 7 days is likely. It is also acceptable to increase the
login delay to 30 seconds or more in case of many failing attempts.
The acceptable chance, on the other hand, is much lower:
1 in 1,000,000. Under these conditions the amount of randomness
needed is 35 bits.
So how long should the password be?
Please do this calculation for yourself! Take a look at your local
circumstances, acceptable risks etc. Choosing the length of a
password also means finding a balance between security and annoying
your users. Having said that, I can answer this question for the two
systems described above:
low-security server:
| When using: | characters needed: |
| numeric only | 18 |
| lower only | 13 |
| upper only | 13 |
| symbols only | 12 |
| lower + numeric | 12 |
| upper + numeric | 12 |
| upper + lower | 11 |
| lower + symbols | 10 |
| upper + symbols | 10 |
| lower + upper + numeric | 10 |
| lower + upper + symbols | 10 |
| lower + upper + numeric + symbols | 9 |
high-security server:
| When using: | characters needed: |
| numeric only | 22 |
| lower only | 15 |
| upper only | 15 |
| symbols only | 14 |
| lower + numeric | 14 |
| upper + numeric | 14 |
| upper + lower | 13 |
| lower + symbols | 12 |
| upper + symbols | 12 |
| lower + upper + numeric | 12 |
| lower + upper + symbols | 11 |
| lower + upper + numeric + symbols | 11 |
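The lengths in these tables follow from dividing the bits needed by the estimated bits per character and rounding up. The sketch below assumes the character set sizes used further on in this text (10 digits, 26 lowercase, 26 uppercase, 32 symbols); the names SET_SIZES and chars_needed are made up for this example.

```python
import math

# Character set sizes as used in the script section of this text.
SET_SIZES = {"numeric": 10, "lower": 26, "upper": 26, "symbols": 32}


def chars_needed(bits_required: float, *sets: str) -> int:
    """Minimum password length for the given combination of sets."""
    size = sum(SET_SIZES[name] for name in sets)
    per_char = math.log2(size) / 2  # human-chosen: half the maximum
    return math.ceil(bits_required / per_char)


print(chars_needed(29, "numeric"))                             # low-security
print(chars_needed(35, "lower", "upper"))                      # high-security
print(chars_needed(29, "lower", "upper", "numeric", "symbols"))
```

Calling it with your own bit requirement gives the lengths for your local circumstances.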
Dictionaries
This is a nice story, but anybody who has ever played with a password
cracker like John the Ripper knows that there are good lists of
very commonly used passwords. Using these dictionaries
increases the chance of success dramatically. According to the
calculations above, the password "August2008" has a randomness of
approximately 30 bits. That is equivalent to a chance of roughly
1 in 1,000,000,000 of guessing the password in one guess. When using
a dictionary, a chance of 1 in 1000 would be more realistic. So when
combinations that are likely to be part of a dictionary attack
are part of the password, its estimated randomness should be reduced.
However, this is not hard science, just as composing the
dictionaries themselves is a combination of statistics, experience
and intuition. It is also language dependent: commonly used names
or the names of the months differ from language to language.
Wrapping it up in a script
Note: the script described here is part of HelpIM
First of all we check the password against the blacklist of
'forbidden' combinations. Any part of the password that matches one
of these regexes is discarded for the further calculation of the
strength.
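As an illustration of this first step, here is a minimal sketch in Python. The blacklist below is hypothetical and far too short for real use, and the name strip_blacklisted is invented for this example; the actual HelpIM script may work differently.

```python
import re

# Hypothetical blacklist: a real one would be much larger and would
# depend on the language of the users.
BLACKLIST = [
    re.compile(r"(19|20)\d\d"),  # years such as 1999 or 2008
    re.compile(
        r"january|february|march|april|may|june|july|august|"
        r"september|october|november|december",
        re.IGNORECASE,
    ),
    re.compile(r"password|qwerty|letmein", re.IGNORECASE),
]


def strip_blacklisted(password: str) -> str:
    """Remove every part of the password that matches the blacklist."""
    for pattern in BLACKLIST:
        password = pattern.sub("", password)
    return password


print(repr(strip_blacklisted("August2008")))        # nothing is left
print(repr(strip_blacklisted("xK7#August2008mQ")))  # 'xK7#mQ'
```

With this blacklist nothing of "August2008" survives, so it would contribute no randomness at all to the score.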
The amount of randomness of what is left of the password can now be
calculated by multiplying the length by a factor depending on the
characters used. To calculate this factor we first need to know the
size of the character set from which the password is chosen:
| when using: | adds to character set size: |
| space | 1 |
| numeric | 10 |
| lower | 26 |
| upper | 26 |
| symbols | 32 |
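Determining the character set size could look like the sketch below. It assumes that "symbols" means the 32 ASCII punctuation characters in Python's string.punctuation; the function name charset_size is made up for this example.

```python
import string


def charset_size(password: str) -> int:
    """Size of the character set the password appears to be chosen from."""
    size = 0
    if " " in password:
        size += 1
    if any(ch in string.digits for ch in password):
        size += 10
    if any(ch in string.ascii_lowercase for ch in password):
        size += 26
    if any(ch in string.ascii_uppercase for ch in password):
        size += 26
    if any(ch in string.punctuation for ch in password):
        size += 32  # string.punctuation holds exactly 32 characters
    return size


print(charset_size("August2008"))  # lower + upper + numeric
```

Characters outside these classes, such as diacritics, add nothing to the size in this sketch.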
Now the amount of random bits per character can be calculated by
the formula mentioned earlier:
b = log(c)/(log(2)*2)
The randomness of the password can now be calculated by multiplying
the length of (what is left of) the password by the randomness per
character. This score is compared with the minimum amount of
randomness required for the site. Based on this, a percentage and a
color are calculated for a nice strength bar.
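The last two steps can be sketched as follows. The color thresholds (below 50% red, below 100% orange, otherwise green) are purely an assumption for this example, as are the function names; the text above does not say how the real script picks its colors.

```python
import math


def estimated_bits(length: int, charset_size: int) -> float:
    """Length of the remaining password times the bits per character."""
    if length == 0 or charset_size < 2:
        return 0.0
    return length * math.log2(charset_size) / 2


def strength_bar(bits: float, required_bits: float) -> tuple:
    """Return (percentage, color) for a strength bar.

    The thresholds are hypothetical, chosen only for illustration.
    """
    percent = min(100, round(100 * bits / required_bits))
    if percent < 50:
        color = "red"
    elif percent < 100:
        color = "orange"
    else:
        color = "green"
    return percent, color


# A 10-character remainder chosen from lower + upper + numeric (62),
# scored against the 35 bits of the high-security scenario.
print(strength_bar(estimated_bits(10, 62), 35))
```

Capping the percentage at 100 keeps the bar from overflowing for passwords that are stronger than required.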
Problems with this way of calculating:
- This way of calculating is very dependent on the quality of the
dictionaries used. Choose them with care!
- Diacritical and other non-ASCII characters are not accounted for,
although these make the password much stronger (but are quite clumsy
to use in a password, unless you use a localized keyboard containing
them).