From Yahoo, MySpace and TalkTalk to Ashley Madison and Adult Friend Finder, personal information has been stolen by hackers from around the world.
But with each hack there’s the big question of how well the site protected its users’ data. Was it open and freely available, or was it hashed, secured and practically unbreakable?
From cleartext to hashed, salted, peppered and bcrypted, here’s what the impenetrable jargon of password security really means.
The terminology
Plain text
When something is described as being stored as “cleartext” or “plain text”, it means that thing is in the open as simple text, with no security beyond a simple access control to the database which contains it.
If you have access to the database containing the passwords you can read them just as you can read the text on this page.
Hashing
When a password has been “hashed” it means it has been turned into a scrambled representation of itself. A user’s password is taken and run through a set algorithm, the hash function, which derives a fixed-length hash value from it; some schemes also combine the password with a key known to the site.
To verify that a user’s password is correct, it is hashed each time they log in and the value compared with that stored on record.
You cannot directly turn a hashed value back into the password, but you can work out what the password is by continually generating hashes from candidate passwords until you find one that matches, a so-called brute-force attack, or by similar methods.
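As a rough illustration of the three ideas above (hashing, verifying a login and brute-forcing), here is a minimal Python sketch. It assumes plain, unsalted SHA-256 purely for demonstration, and the function names and sample passwords are made up for this example rather than taken from any real site.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Return the SHA-256 hash of the password as a hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def verify(attempt: str, stored_hash: str) -> bool:
    """Hash the login attempt and compare it with the value on record."""
    return hmac.compare_digest(hash_password(attempt), stored_hash)

def brute_force(stored_hash: str, candidates):
    """Hash each guess in turn until one matches; return it, or None."""
    for guess in candidates:
        if hash_password(guess) == stored_hash:
            return guess
    return None

stored = hash_password("letmein")   # what the site keeps on record
print(verify("letmein", stored))    # True
print(verify("letmeout", stored))   # False
print(brute_force(stored, ["123456", "password", "letmein"]))  # letmein
```

The attacker never reverses the hash. They simply guess, hash and compare, which is why the speed of the hashing algorithm matters so much.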
Salting
Passwords are often described as “hashed and salted”. Salting is simply the addition of a unique, random string of characters, held by the site, to each password before it is hashed. Typically this “salt” is placed in front of each password.
The salt value needs to be stored by the site, which means sometimes sites use the same salt for every password. This is less effective than using an individual salt for each password.
The use of unique salts means that common passwords shared by multiple users, such as “123456” or “password”, aren’t immediately revealed when one such hashed password is identified, because despite the passwords being the same the salted and hashed values are not.
Large salts also protect against certain methods of attack on hashes, including rainbow tables or logs of hashed passwords previously broken.
Both hashing and salting can be repeated more than once to increase the difficulty in breaking the security.
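To see why unique salts hide shared passwords, consider this small Python sketch. It assumes SHA-256 and a 16-byte random salt placed in front of the password; the salt length and the function names are illustrative choices, not a recommendation.

```python
import hashlib
import os

def hash_with_salt(password: str, salt: bytes) -> str:
    """Place the salt in front of the password, then hash the result."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

def new_record(password: str):
    """Generate a fresh random salt and return (salt, hash) for storage."""
    salt = os.urandom(16)  # assumed length: 16 random bytes
    return salt, hash_with_salt(password, salt)

def check(password: str, salt: bytes, stored_hash: str) -> bool:
    """Re-hash the attempt with the stored salt and compare."""
    return hash_with_salt(password, salt) == stored_hash

# Two users pick the same weak password but get different salts.
salt_a, hash_a = new_record("123456")
salt_b, hash_b = new_record("123456")
print(hash_a == hash_b)                 # False: the stored values differ
print(check("123456", salt_a, hash_a))  # True: login still works
```

Because the salt is stored next to the hash, the site can always repeat the calculation, while an attacker has to attack each record separately.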
Peppering
Cryptographers like their seasonings. A “pepper” is similar to a salt, a value added to the password before being hashed, but typically placed at the end of the password.
There are broadly two versions of pepper. The first is simply a known secret value added to each password, which is only beneficial if it is not known by the attacker.
The second is a value that’s randomly generated but never stored. That means every time a user attempts to log into the site, it has to try multiple combinations of the pepper and hashing algorithm to find the right pepper value and match the hash value.
Even with a small range in the unknown pepper value, trying all the values can take minutes per login attempt, so this approach is rarely used.
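The second kind of pepper, generated at random and never stored, can be sketched as follows in Python. This assumes a single-byte pepper (256 possible values) appended to the end of the password and hashed with SHA-256; both choices, and the function names, are hypothetical and chosen only to keep the example short.

```python
import hashlib
import secrets

PEPPER_RANGE = 256  # assumption: the unstored pepper is a single byte

def hash_with_pepper(password: str, pepper: int) -> str:
    """Append the pepper byte to the end of the password, then hash."""
    data = password.encode("utf-8") + bytes([pepper])
    return hashlib.sha256(data).hexdigest()

def register(password: str) -> str:
    """Pick a random pepper, use it once and keep only the hash."""
    pepper = secrets.randbelow(PEPPER_RANGE)
    return hash_with_pepper(password, pepper)

def login(password: str, stored_hash: str) -> bool:
    """Try every possible pepper until one reproduces the stored hash."""
    return any(
        hash_with_pepper(password, pepper) == stored_hash
        for pepper in range(PEPPER_RANGE)
    )

stored = register("correct horse")
print(login("correct horse", stored))  # True
print(login("wrong horse", stored))    # False
```

With a fast hash such as SHA-256 the 256 attempts finish almost instantly. Paired with a deliberately slow algorithm, every login would cost up to 256 times as much, which is where the delay described above comes from.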
Encryption
Encryption, like hashing, is a function of cryptography, but the main difference is that encryption is something you can undo, while hashing is not. If you need to access the source text to change it or read it, encryption allows you to secure it but still read it after decrypting it. Hashing cannot be reversed, which means you can only know what the hash represents by matching it with another hash of what you think is the same information.
If a site such as a bank asks you to verify particular characters of your password, rather than enter the whole thing, it is encrypting your password rather than hashing it, because it must decrypt it and verify individual characters rather than simply match the whole password to a stored hash.
Encrypted passwords are typically used for second-factor verification, rather than as the primary login factor.
Hexadecimal
A hexadecimal number, also simply known as “hex” or “base 16”, is a way of representing values using 16 separate symbols. The digits 0-9 represent values zero to nine, with a, b, c, d, e and f representing 10 to 15.
They are widely used in computing as a human-friendly way of representing binary numbers. Each hexadecimal digit represents four bits or half a byte.
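A few lines of Python show the relationship between binary, hexadecimal and bytes. The sample values are arbitrary.

```python
value = 0b10101111           # one byte written in binary: 1010 1111
print(hex(value))            # 0xaf  (1010 is a, 1111 is f)
print(format(value, "02x"))  # af
print(int("af", 16))         # 175
print(int("f", 16))          # 15, the largest single hex digit

data = bytes([0, 15, 16, 255])
print(data.hex())            # 000f10ff
# Each byte is eight bits, so it always takes two hex digits.
print(len(data.hex()) == 2 * len(data))  # True
```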
The algorithms
MD5
Originally designed as a cryptographic hashing algorithm, first published in 1992, MD5 has been shown to have extensive weaknesses, which make it relatively easy to break.
Its 128-bit hash values, which are quite easy to produce, are more commonly used for file verification to make sure that a downloaded file has not been tampered with. It should not be used to secure passwords.
SHA-1
Secure Hash Algorithm 1 (SHA-1) is a cryptographic hashing algorithm originally designed by the US National Security Agency in 1993 and published in 1995.
It generates a 160-bit hash value that is typically rendered as a 40-digit hexadecimal number. As of 2005, SHA-1 was deemed no longer secure, as the exponential increase in computing power and more sophisticated methods meant that it was possible to perform a so-called collision attack on the hash, finding two different inputs that produce the same hash value, without spending millions on computing resources and time.
SHA-2
The successor to SHA-1, Secure Hash Algorithm 2 (SHA-2) is a family of hash functions that produce longer hash values with 224, 256, 384 or 512 bits, written as SHA-224, SHA-256, SHA-384 or SHA-512.
It was first published in 2001, designed again by the NSA, and an effective attack has yet to be demonstrated against it. That means SHA-2 is generally recommended for secure hashing.
SHA-3, while not a replacement for SHA-2, was developed not by the NSA but by Guido Bertoni, Joan Daemen, Michaël Peeters and Gilles Van Assche from STMicroelectronics and Radboud University in Nijmegen, Netherlands. It was standardised in 2015.
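The hash lengths quoted for MD5, SHA-1 and the SHA-2 family can be checked with Python's standard `hashlib` module. This sketch hashes an arbitrary sample message and reports each output size in bits and in hexadecimal digits; `sha3_256` is included only for comparison, and some locked-down Python builds may refuse to provide MD5.

```python
import hashlib

message = b"password"  # arbitrary sample input
algorithms = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512", "sha3_256")

sizes = {}
for name in algorithms:
    digest = hashlib.new(name, message).digest()
    # Eight bits per byte, two hex digits per byte.
    sizes[name] = (len(digest) * 8, len(digest.hex()))

for name, (bits, hex_digits) in sizes.items():
    print(f"{name:9} {bits:4} bits  {hex_digits:3} hex digits")
```

The output size says nothing about how quickly the algorithm runs. All of these are designed to be fast, which is convenient for checking files but also convenient for an attacker guessing passwords.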
Bcrypt
As computational power has increased, the number of brute-force guesses a hacker can make against an efficient hashing algorithm has increased exponentially.
Bcrypt, which is based on the Blowfish cipher and includes a salt, is designed to protect against brute-force attacks by intentionally being slower to operate. It has a so-called work factor that effectively puts your password through a definable number of rounds of extension before being hashed.
By increasing the work factor it takes longer to brute-force the password and match the hash. The theory is that the site owner sets a sufficiently high work factor to reduce the number of guesses today’s computers can make at the password and extend the time from days or weeks to months or years, making it prohibitively time-consuming and expensive.
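The effect of the work factor can be put into rough numbers. In Bcrypt the work factor is an exponent: a factor of 12 means the expensive setup step is repeated 2 to the power of 12 times, so adding one to the factor doubles the cost of every guess. The Python sketch below only does that arithmetic. The attacker's speed and the number of candidate passwords are invented figures, not measurements, and it does not run Bcrypt itself.

```python
# Both figures are assumptions made up for illustration.
BASE_GUESSES_PER_SECOND = 1_000_000_000  # attacker speed at work factor 0
CANDIDATES = 10 ** 12                    # passwords the attacker wants to try

def days_to_try_all(cost: int) -> float:
    """Each guess costs 2**cost times as much, so the guess rate drops by 2**cost."""
    guesses_per_second = BASE_GUESSES_PER_SECOND / (2 ** cost)
    seconds = CANDIDATES / guesses_per_second
    return seconds / 86_400  # seconds in a day

for cost in (4, 8, 12, 16):
    print(f"work factor {cost:2}: about {days_to_try_all(cost):,.1f} days")
```

Whatever the real figures are, the shape is the same: each step up in the work factor doubles the attacker's bill, while a legitimate login still only pays for one hash.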
PBKDF2
Password-Based Key Derivation Function 2 (PBKDF2), developed by RSA Laboratories, is another algorithm for key extension (also called key stretching) that makes hashes more difficult to brute force. It is considered slightly easier to brute force than Bcrypt at a comparable setting because it requires less computer memory to run the algorithm.
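Python's standard library includes PBKDF2 as `hashlib.pbkdf2_hmac`, which makes it easy to watch the cost rise with the iteration count. The sketch below is a demonstration under assumed settings: the sample password, the 16-byte salt and the iteration counts are illustrative, not recommended values.

```python
import hashlib
import hmac
import os
import time

def derive(password: str, salt: bytes, iterations: int) -> bytes:
    """Stretch the password with PBKDF2 using HMAC-SHA-256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

salt = os.urandom(16)
for iterations in (1_000, 100_000):
    start = time.perf_counter()
    key = derive("hunter2", salt, iterations)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{iterations:>7} iterations: {elapsed_ms:.1f} ms, {len(key)}-byte result")

# Verification repeats the derivation with the stored salt and count.
stored = derive("hunter2", salt, 100_000)
print(hmac.compare_digest(stored, derive("hunter2", salt, 100_000)))  # True
```

The timings will vary by machine, but the second line should take roughly a hundred times longer than the first.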
Scrypt
Scrypt, like Bcrypt and PBKDF2, is an algorithm that extends keys and makes it harder to brute-force attack a hash. Unlike PBKDF2, however, scrypt is designed to use either a large amount of computer memory or force many more calculations as it runs.
For legitimate users, who only have to hash one password to check whether it matches a stored value, the cost is negligible. But for someone attempting to try hundreds of thousands of passwords, it makes the cost of doing so much higher, or makes the attack take prohibitively long.
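Scrypt is also available in Python's standard library as `hashlib.scrypt`, provided Python was built against a recent enough OpenSSL. The following sketch uses commonly seen parameter values as an assumption (n of 2 to the power of 14, r of 8 and p of 1, which needs roughly 16MB of memory per hash); the function names and sample password are illustrative.

```python
import hashlib
import hmac
import os

def scrypt_hash(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte value with scrypt; memory use is about 128 * n * r bytes."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2 ** 14,                # CPU and memory cost, must be a power of two
        r=8,                      # block size
        p=1,                      # parallelisation factor
        maxmem=64 * 1024 * 1024,  # allow up to 64MB for this call
        dklen=32,
    )

def scrypt_verify(password: str, salt: bytes, stored: bytes) -> bool:
    """Repeat the derivation with the stored salt and compare."""
    return hmac.compare_digest(scrypt_hash(password, salt), stored)

salt = os.urandom(16)
stored = scrypt_hash("hunter2", salt)
print(scrypt_verify("hunter2", salt, stored))  # True
print(scrypt_verify("hunter3", salt, stored))  # False
```

One such hash is barely noticeable at login. An attacker running many guesses in parallel, however, needs that much memory for every guess in flight.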
But what about the passwords?
If a password is properly hashed using SHA-2 or newer, and is salted, then breaking it requires a brute-force attack.
The longer the password, the longer the brute-force attack is going to last. And the longer the brute-force attack required, the more time-consuming and expensive it is to match the hash and discover the password.
Which means the longer the password the better, but the configuration of the password also makes a difference. A truly random eight-character password will be more secure than an eight-letter dictionary word, because brute-force attacks use dictionaries, names and other lists of words as fodder.
However, if the site stores your password as plain text it doesn’t matter how long your password is if the database is stolen.
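The difference between length, randomness and dictionary words comes down to how many possibilities an attacker must try. This Python sketch counts them. The dictionary size is an assumed round number for illustration, and the "printable" alphabet is taken to be the 94 letters, digits and punctuation marks on a standard keyboard.

```python
import string

def search_space(alphabet_size: int, length: int) -> int:
    """Number of possible passwords of a given length over an alphabet."""
    return alphabet_size ** length

lowercase = len(string.ascii_lowercase)  # 26
printable = len(string.ascii_letters + string.digits + string.punctuation)  # 94

ASSUMED_DICTIONARY_WORDS = 100_000  # hypothetical word-list size

print(search_space(lowercase, 8))  # 208827064576
print(search_space(printable, 8))  # 6095689385410816
# Four extra random characters multiply the work by 94**4.
print(search_space(printable, 12) // search_space(printable, 8))  # 78074896
# A random 8-character password versus a word from the assumed list.
print(search_space(printable, 8) // ASSUMED_DICTIONARY_WORDS)
```

A dictionary word, however long, is only one entry in a list that can be tried in moments, while every extra random character multiplies the attacker's work.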