# Digest algorithms (hash algorithms)

## A brief introduction to digest algorithms

A digest algorithm, also called a hash algorithm, uses a function to convert data of any length into a fixed-length digest (usually represented as a hexadecimal string).
Python's hashlib module provides common digest algorithms such as MD5, SHA1, and SHA512.

Tips:
Note that a digest algorithm is not an encryption algorithm. It cannot be used for encryption, because the plaintext cannot be recovered from the digest; it is used to detect tampering. Its one-way nature also means that a user's password can be verified without storing the plaintext password.

Suppose the original content is the string 'I am Jason', for which MD5 gives the digest '2d3ec0dd5d4b99a2c5f1eb47656637e0'. If someone tampers with your article and publishes it as 'I am Ross', the modified string produces a digest ('6845af67ef35bfe261f6fed5a66ff3ab') that differs from the original one, so you know the content has been altered.

A digest algorithm applies a digest function f() to data of any length and produces a fixed-length digest. It can reveal tampering because the digest function is one-way: computing f(data) is easy, but deducing data from the digest is very difficult. Moreover, changing even a single bit of the original data results in a completely different digest.
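The two properties described above (fixed length, and a completely different digest after a tiny change) can be checked with a short sketch. The helper names `md5_hex` and `differing_bits` are made up for this illustration; the second input changes the final 'n' to 'o', which differ by exactly one bit in ASCII.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text (UTF-8 encoded) as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


short_digest = md5_hex("I am Jason")
near_digest = md5_hex("I am Jasoo")         # 'n' -> 'o': a one-bit change
long_digest = md5_hex("I am Jason" * 1000)  # a much longer input

# The digest length does not depend on the input length.
print(len(short_digest), len(long_digest))  # 32 32

# A one-bit change in the input changes many bits of the digest.
print(short_digest == near_digest)          # False
print(differing_bits(short_digest, near_digest), "of 128 bits differ")
```

The exact number of differing bits depends on the inputs, but for a good digest function it is typically close to half of the digest length.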

## MD5 digest algorithm example

MD5 is one of the most common digest algorithms, and it is fast. The result is a fixed 128 bits (16 bytes), usually represented as a 32-character hexadecimal string.

When the amount of data is small, the digest can be computed in one call:

``````
import hashlib

md5 = hashlib.md5()

# update() only accepts bytes, so the string is encoded first.
md5.update(bytes('I am Jason', encoding='utf-8'))

# digest() returns the raw digest as a bytes object
data1 = md5.digest()

# hexdigest() returns the digest as a hexadecimal string
data2 = md5.hexdigest()

print(data1)  # b'->\xc0\xdd]K\x99\xa2\xc5\xf1\xebGef7\xe0'
print(data2)  # 2d3ec0dd5d4b99a2c5f1eb47656637e0
``````

When there is a large amount of data, you can call update() multiple times, one chunk at a time. The final result is the same:

``````
import hashlib

md5 = hashlib.md5()
md5.update(bytes('I am ', encoding='utf-8'))
md5.update(bytes('Jason', encoding='utf-8'))

data = md5.hexdigest()
# Same hexadecimal digest as hashing the whole string at once
print(data)  # 2d3ec0dd5d4b99a2c5f1eb47656637e0
``````
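A typical use of chunked updates is hashing a file that is too large to read into memory at once. The following is a minimal sketch, not part of the original example: the function name `md5_of_file`, the 64 KiB chunk size, and the temporary sample file are all assumptions chosen for illustration.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hash a file chunk by chunk so it is never fully loaded into memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


# Demonstration on a temporary file filled with arbitrary sample data.
payload = b"I am Jason\n" * 100_000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

try:
    chunked = md5_of_file(tmp_path)
finally:
    os.remove(tmp_path)

one_shot = hashlib.md5(payload).hexdigest()
print(chunked == one_shot)  # True
```

The chunk size only affects memory use and speed; the resulting digest is the same for any chunk size.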

## SHA1 digest algorithm example

Another common digest algorithm is SHA1. Calling SHA1 works exactly like calling MD5. The result of SHA1 is 160 bits (20 bytes), usually represented as a 40-character hexadecimal string.

SHA1 can also compute the digest from data supplied in several chunks:

``````
import hashlib

sha1 = hashlib.sha1()
# update() only accepts bytes, so each chunk is encoded first.
sha1.update(bytes('I am ', encoding='utf-8'))
sha1.update(bytes('Jason', encoding='utf-8'))

data = sha1.hexdigest()
# Returns the hexadecimal string representation
print(data)  # c0f965c41ef25423c2fbbcd05dfc767b04c9ba7f
``````

SHA256 and SHA512 are more secure than SHA1. As a general rule, a more secure algorithm produces a longer digest and tends to be slower to compute.
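The digest lengths mentioned so far can be compared directly. This small sketch uses `hashlib.new()` and the `digest_size` attribute (the digest length in bytes) from the standard library; the dictionary name `bit_lengths` is just an illustrative choice.

```python
import hashlib

data = b"I am Jason"

# Algorithm name -> digest length in bits
bit_lengths = {}
for name in ("md5", "sha1", "sha256", "sha512"):
    h = hashlib.new(name, data)
    bit_lengths[name] = h.digest_size * 8
    print(f"{name:>6}: {bit_lengths[name]:>3} bits, "
          f"{len(h.hexdigest()):>3} hex characters")

# Output:
#    md5: 128 bits,  32 hex characters
#   sha1: 160 bits,  40 hex characters
# sha256: 256 bits,  64 hex characters
# sha512: 512 bits, 128 hex characters
```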

## Collision

Can two different pieces of data produce the same digest under a given digest algorithm? Yes: any digest algorithm maps an infinite set of possible inputs onto a finite set of digests, so some inputs must share a digest. This is called a collision.

For example, the MD5 digest of the string 'do you have the same hash value with me?' is b9455cde6391d136c33541b7293ec394, and in all likelihood some other string has the same MD5 digest.
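Finding a collision for a full 128-bit MD5 digest by simple search is not practical, but the pigeonhole argument is easy to see on a deliberately weakened hash. The sketch below keeps only the first 4 hex characters (16 bits) of the MD5 digest, so there are only 65536 possible values and a collision turns up quickly. The truncation, the `message-N` inputs, and the function names are assumptions made purely for this demonstration.

```python
import hashlib
from itertools import count


def short_digest(text: str, hex_chars: int = 4) -> str:
    """First hex_chars characters of the MD5 digest: a deliberately tiny hash."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:hex_chars]


def find_collision(hex_chars: int = 4):
    """Return two different strings whose truncated digests are equal."""
    seen = {}  # truncated digest -> first input that produced it
    for i in count():
        text = f"message-{i}"
        key = short_digest(text, hex_chars)
        if key in seen:
            return seen[key], text, key
        seen[key] = text


first, second, shared = find_collision()
print(f"{first!r} and {second!r} both map to {shared!r}")
```

With 16 bits there are only 65536 possible values, so the loop is guaranteed to stop after at most 65537 inputs, and it usually stops after a few hundred.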

## Applications of digest algorithms

A common use of digest algorithms is storing user passwords in a database.

If user passwords are stored in plaintext and the database is leaked, the passwords of all users fall into the hands of hackers. In addition, the website's operations staff have access to the database, so they too can read every user's password.

The correct approach is not to store the user's plaintext password, but to store a digest of the password, such as its MD5:

user03 99b1c2188db85afee403b1536010c2c9

When the user logs in, first compute the MD5 of the plaintext password the user entered, then compare it with the MD5 stored in the database. If they match, the password was entered correctly; if they differ, the password is definitely wrong.

Here is a simple code demonstration:

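``````
import hashlib

# Login name -> MD5 of the password (plaintext shown in the comments)
db = {
    'user01': '202cb962ac59075b964b07152d234b70',  # 123
    'user02': '289dff07669d7a23de0ef88d2f7129e7',  # 234
    'user03': 'd81f9c1be2e08964bf9f24b15f0e4900',  # 345
}

def check():
    while 1:
        # Read the login name and password (the prompt text is illustrative)
        user = input('username: ')
        passwd = input('password: ')
        md5 = hashlib.md5()
        md5.update(passwd.encode(encoding='utf-8'))
        pwd_md5 = md5.hexdigest()
        # Compare the computed MD5 with the stored one
        if db.get(user) == pwd_md5:
            print('OK')
            break
        else:
            print('Error')

check()
``````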

## MD5 security

Storing passwords as MD5 is not necessarily safe. If a hacker obtains the database of MD5 values, he could try to recover the plaintext passwords by brute force, but in practice there is a cheaper way. Hackers can compute the MD5 values of common passwords in advance to build a reverse lookup table (a rainbow table), and then compare the stored values against it one by one (not every password will be found).

202cb962ac59075b964b07152d234b70 123
289dff07669d7a23de0ef88d2f7129e7 234
d81f9c1be2e08964bf9f24b15f0e4900 345

With such a table there is nothing to crack: the hacker only has to compare it against the MD5 values in the database to find the accounts that use common passwords.
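The lookup described above amounts to a dictionary keyed by digest. The sketch below is a hypothetical illustration: the leaked table, the list of common passwords, and the uncommon password for user03 are all made up, and the leaked digests are computed in the script rather than copied from anywhere.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Unsalted MD5 of a password, as a hex string."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Hypothetical leaked table: login name -> unsalted MD5 of the password.
leaked_db = {
    "user01": md5_hex("123"),
    "user02": md5_hex("234"),
    "user03": md5_hex("x9$Lq!7vT2#r"),  # an uncommon password
}

# The attacker hashes a list of common passwords once, ahead of time.
common_passwords = ["123", "234", "123456", "password", "qwerty"]
lookup = {md5_hex(p): p for p in common_passwords}

# Recovering a password is then a single dictionary lookup per account.
recovered = {
    user: lookup[digest]
    for user, digest in leaked_db.items()
    if digest in lookup
}
print(recovered)  # {'user01': '123', 'user02': '234'}
```

Only the accounts whose passwords appear in the precomputed list are recovered; user03 is not found, which matches the remark that not every lookup succeeds.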

Users are advised not to choose passwords that are too simple. In addition, the program itself can strengthen the protection of simple passwords, using the salting technique described below.

## Salt

Because the MD5 values of common passwords are easy to look up, the value stored for a user should no longer be the plain MD5 of the password, which may already have been precomputed. Instead, we obfuscate the original password by combining it with a complex string before hashing, commonly known as "adding salt".

With a salted MD5, as long as the salt is not known to hackers, it is difficult to recover the plaintext password from the MD5 even if the user chose a simple password.

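There are two ways to add salt:

Method 1

``````
import hashlib

salt = b'xxx'
# Data passed to the constructor is hashed first,
# so this hashes salt + password (b'123' is a sample password).
md5 = hashlib.md5(salt)
md5.update(b'123')
print(md5.hexdigest())
``````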

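Method 2

``````
import hashlib

salt = b'xxx'
md5 = hashlib.md5()
# Append the salt to the password before hashing,
# so this hashes password + salt (b'123' is a sample password).
md5.update(b'123' + salt)
print(md5.hexdigest())
``````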
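
The login check with a fixed salt then looks like this:

``````
import hashlib

# Login name -> MD5 of (password + salt), with salt = b'xxx'
db = {
    'user01': 'cb2921a386719d7467412b5573973529',  # 123 salt=b'xxx';
    'user02': '8d36de04ecc00c605caf4b2798328a59',  # 234 salt=b'xxx';
    'user03': '658d38b0b92c8e7e5eab2bef72b539c7',  # 345 salt=b'xxx';
}

def check():
    while 1:
        # Read the login name and password (the prompt text is illustrative)
        user = input('username: ')
        passwd = input('password: ')
        md5 = hashlib.md5()
        # Append the salt before hashing
        md5.update(passwd.encode(encoding='utf-8') + b'xxx')
        pswdmd5 = md5.hexdigest()
        if db.get(user) == pswdmd5:
            print('OK')
            break
        else:
            print('Error')

check()
``````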

## On the MD5 value of the same password

However, if two users choose the same simple password, such as 123456, two identical MD5 values are stored in the database, which reveals that the two users have the same password.

To make users with the same password store different MD5 values, and assuming users cannot change their login names, we can compute the MD5 using the login name as part of the salt.

``````
import hashlib

salt = 'xxx'