# Hash table and its common algorithms (code examples)

# < properties of Hash Table >

A hash table can insert, delete and search data in O(1) time on average, but it does not guarantee any ordering of the data it stores. Finding the maximum or minimum element therefore takes O(N) time, because every element has to be examined.

# < addressing and hash Function >

In the ideal state, the hash table is large enough for every data item to have its own storage unit, so any item can be reached directly for insertion, deletion and search. In reality, however, the hash table cannot be infinitely large, while in theory there is no limit on the number of data items to be saved, so the number of possible data items is far greater than the number of storage units in the hash table.

To insert, delete and search data in O(1), each data item must be mapped to a fixed position in the hash table. This mapping is the hash function: it computes a location address in the hash table from the data.

Figure 1.1 ideal hash table

Choose a good hash function and a suitable number of storage units so that the data stored in the hash table is evenly distributed. The ideal state is unlikely to be achieved: since the number of possible data items is much larger than the number of storage units, even the best hash function may map different data to the same position, which leads to a conflict (collision). A good hash function keeps such conflicts to a minimum.
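As a minimal sketch of this mapping, assuming integer keys and a simple remainder-based hash function, the following shows how two different keys can end up at the same position. The names `hash_index` and `collides` are illustrative only and do not come from the original text.

```c
#include <assert.h>

/* Hypothetical helper: maps an integer key to a position in [0, table_size). */
int hash_index(int key, int table_size) {
    assert(table_size > 0);
    int h = key % table_size;
    return h < 0 ? h + table_size : h;   /* keep the position non-negative */
}

/* Returns 1 if two different keys are mapped to the same position (a conflict). */
int collides(int a, int b, int table_size) {
    return a != b && hash_index(a, table_size) == hash_index(b, table_size);
}
```

With a table of 7 storage units, the keys 10 and 24 are both mapped to position 3, so `collides(10, 24, 7)` returns 1.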

# < separate chaining >

The first way to resolve conflicts is to use linked lists (separate chaining): the data is actually stored in a linked list attached to a hash table storage unit, not in the storage unit itself.

Figure 2.1 separate linked list

When a conflict occurs, the conflicting data items are linked together in the list attached to the same storage unit. When that list holds several items, searching, adding or deleting an item further down the list is not O(1) in the strict sense. A good hash function keeps the lists very short. In the worst case, when all data ends up in the list of a single storage unit, the hash table degenerates into a linked list.

# < open addressing >

With open addressing, the data is still stored in the storage units of the hash table itself, but when a conflict occurs a new address has to be calculated.

The most commonly used open addressing method is linear probing: when inserting, deleting or searching for a data item, if the location calculated by the hash function does not hold the item being looked for (or, for insertion, is already occupied), the next storage unit is checked, and so on until the item or a free unit is found.

Besides linear probing, there are quadratic probing and double hashing (rehashing). These methods differ in how the next location is determined when the calculated location does not hold the data being looked for.
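Linear probing can be sketched as follows. This is only an illustration under stated assumptions: keys are non-negative integers, `-1` marks a free storage unit, the table has a fixed size of 11, and the names `ProbeTable`, `probe_init`, `probe_insert` and `probe_find` are made up for this example.

```c
#include <assert.h>

#define TABLE_SIZE 11   /* illustrative size, not from the original text */
#define EMPTY (-1)      /* assumption: keys are non-negative, so -1 marks a free unit */

typedef struct {
    int slots[TABLE_SIZE];
} ProbeTable;

void probe_init(ProbeTable *t) {
    for (int i = 0; i < TABLE_SIZE; i++)
        t->slots[i] = EMPTY;
}

/* Inserts key and returns the unit used, or -1 if the table is full. */
int probe_insert(ProbeTable *t, int key) {
    assert(key >= 0);
    int start = key % TABLE_SIZE;
    for (int step = 0; step < TABLE_SIZE; step++) {
        int i = (start + step) % TABLE_SIZE;   /* next unit, wrapping around */
        if (t->slots[i] == EMPTY || t->slots[i] == key) {
            t->slots[i] = key;
            return i;
        }
    }
    return -1;
}

/* Returns the unit holding key, or -1 if key is not in the table. */
int probe_find(const ProbeTable *t, int key) {
    assert(key >= 0);
    int start = key % TABLE_SIZE;
    for (int step = 0; step < TABLE_SIZE; step++) {
        int i = (start + step) % TABLE_SIZE;
        if (t->slots[i] == EMPTY)
            return -1;          /* a free unit ends the probe sequence */
        if (t->slots[i] == key)
            return i;
    }
    return -1;
}
```

Inserting 3, 14 and 25 (all of which hash to unit 3) places them in units 3, 4 and 5. Deletion is left out of the sketch because simply clearing a unit would cut the probe sequence for keys stored after it; a common approach is to mark the unit as deleted instead.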

# < perfect hash table >

When conflicts are resolved by separate chaining, the data items mapped to the same address form a linked list, and operating on one of them means working through that list. If the hash function is well chosen, the lists are very short and operations are approximately O(1), but not exactly O(1): the cost depends on the length of the list. A hash table whose worst-case access time is also O(1) is called a perfect (complete) hash table.

Such a hash table is a two-level hash table. The first level is the same as in separate chaining, except that each storage unit points to another hash table instead of a linked list.

Figure 4.1 complete hash table

The first-level and second-level hash functions must be chosen carefully so that there are guaranteed to be no conflicts in the second-level hash tables.
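One common way to build such a two-level table is sketched below, under several assumptions that are not in the original text: the set of keys is fixed and known in advance, keys are distinct integers in the range 0 to 10006, a unit holding n keys gets a second-level table of n * n slots, and the second-level hash function is chosen by simply trying multipliers until no conflict remains. All names (`PerfectTable`, `perfect_build`, `perfect_contains`, `perfect_free`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define P 10007      /* assumed prime; every key must satisfy 0 <= key < P */
#define EMPTY (-1)   /* marks a free second-level slot */

typedef struct {
    int size;        /* n * n slots for a unit that holds n keys */
    int a;           /* multiplier of this unit's second-level hash function */
    int *slots;
} Level2;

typedef struct {
    int m;           /* number of first-level storage units */
    Level2 *units;
} PerfectTable;

static int h1(int key, int m) { return key % m; }
static int h2(int key, int a, int size) { return (int)((long)a * key % P % size); }

void perfect_free(PerfectTable *t) {
    if (t == NULL) return;
    for (int u = 0; u < t->m; u++) free(t->units[u].slots);
    free(t->units);
    free(t);
}

/* Tries multipliers a = 1, 2, ... until the n keys of one unit do not conflict. */
static int build_level2(Level2 *u, const int *keys, int n) {
    u->size = n * n;
    if (n == 0) return 1;
    u->slots = malloc(sizeof(int) * (size_t)u->size);
    if (u->slots == NULL) return 0;
    for (int a = 1; a < P; a++) {
        int ok = 1;
        for (int i = 0; i < u->size; i++) u->slots[i] = EMPTY;
        for (int i = 0; i < n && ok; i++) {
            int s = h2(keys[i], a, u->size);
            if (u->slots[s] != EMPTY) ok = 0;   /* conflict: try the next a */
            else u->slots[s] = keys[i];
        }
        if (ok) { u->a = a; return 1; }
    }
    return 0;
}

/* Builds the table from n distinct keys known in advance; NULL on failure. */
PerfectTable *perfect_build(const int *keys, int n, int m) {
    assert(n >= 0 && m > 0);
    PerfectTable *t = malloc(sizeof *t);
    int *group = malloc(sizeof(int) * (size_t)(n > 0 ? n : 1));
    Level2 *units = malloc(sizeof(Level2) * (size_t)m);
    if (t == NULL || group == NULL || units == NULL) {
        free(t); free(group); free(units);
        return NULL;
    }
    t->m = m;
    t->units = units;
    for (int u = 0; u < m; u++) {          /* start with empty units */
        units[u].size = 0; units[u].a = 0; units[u].slots = NULL;
    }
    for (int u = 0; u < m; u++) {
        int cnt = 0;                       /* collect the keys of unit u */
        for (int i = 0; i < n; i++) {
            assert(keys[i] >= 0 && keys[i] < P);
            if (h1(keys[i], m) == u) group[cnt++] = keys[i];
        }
        if (!build_level2(&units[u], group, cnt)) {
            free(group);
            perfect_free(t);
            return NULL;
        }
    }
    free(group);
    return t;
}

/* Lookup costs two hash computations and one comparison, even in the worst case. */
int perfect_contains(const PerfectTable *t, int key) {
    if (key < 0 || key >= P) return 0;
    const Level2 *u = &t->units[h1(key, t->m)];
    return u->size > 0 && u->slots[h2(key, u->a, u->size)] == key;
}
```

The price of the guaranteed O(1) lookup in this sketch is extra memory for the second-level tables and a table that cannot easily accept new keys after it has been built.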

➤ common hash function algorithms

➣ direct addressing method: address set and keyword set have the same size

➣ digit analysis method: choose the hash function according to the characteristics of the keywords to be hashed, looking for the digits in which the keywords differ from one another

➣ mid-square method: take the middle digits of the square of the keyword as the hash address. The middle digits of a squared number depend on every digit of the original number, and the number of digits taken is determined by the table length. For example, if the table length is 512 = 2^9, the middle 9 binary digits of the square can be taken as the hash address.

➣ folding method: when the keyword has many digits and the digits in each position are roughly evenly distributed, the keyword can be split into parts that are added together to obtain the hash address.

➣ division (remainder) method: the keyword is divided by a number P and the remainder is taken as the hash address. P can be chosen as a prime number, or as a composite number with no prime factor less than 20.

➣ random number method: this method is more appropriate for constructing the hash function when the keywords have different lengths.
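Three of the methods in the list above can be sketched in a few lines each. The function names and the concrete choices (a 16-bit keyword for the mid-square method, groups of three decimal digits for the folding method) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Mid-square: square a 16-bit keyword (32-bit result) and keep `bits` middle bits. */
unsigned mid_square(uint16_t key, unsigned bits) {
    assert(bits >= 1 && bits <= 31);
    uint32_t sq = (uint32_t)key * key;
    unsigned shift = (32 - bits) / 2;          /* skip the low-order bits */
    return (unsigned)((sq >> shift) & ((1u << bits) - 1u));
}

/* Folding: split the decimal keyword into 3-digit groups, add them up,
   then reduce the sum to the table size. */
unsigned fold(unsigned long key, unsigned table_size) {
    assert(table_size > 0);
    unsigned long sum = 0;
    while (key > 0) {
        sum += key % 1000;
        key /= 1000;
    }
    return (unsigned)(sum % table_size);
}

/* Remainder method: the remainder of key divided by p is the hash address. */
unsigned division(unsigned key, unsigned p) {
    assert(p > 0);
    return key % p;
}
```

For a table of length 512, `mid_square(key, 9)` yields an address between 0 and 511. `fold(123456789, 1000)` adds 123 + 456 + 789 = 1368 and returns 368.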

➤ in actual work, different hash functions need to be used according to different situations

➣ considerations: time required to calculate hash function, hardware instructions, etc.

➣ keyword length

➣ hash table size

➣ keyword distribution

➣ the frequency with which records are searched (Huffman tree)

The complete code is as follows:

    #include <stdio.h>
    #include <stdlib.h>

    struct HashTable;
    struct ListNote;
    typedef struct HashTable *HashTbl;
    typedef struct ListNote *Position;
    typedef Position List;

    int Hash(int key, int tablesize);
    int NextPrime(int x);
    HashTbl InitalizeTable(int TableSize);
    void DestroyTable(HashTbl H);
    Position Find(int key, HashTbl H);
    void Insert(int key, HashTbl H);
    void Delete(int key, HashTbl H);

    struct HashTable {
        int TableSize;
        Position *TheList;   /* TheList[i] is the header node of list i */
    };

    struct ListNote {
        int element;
        Position next;
    };

    /* Remainder method; the result is kept non-negative for negative keys. */
    int Hash(int key, int tablesize) {
        int h = key % tablesize;
        return h < 0 ? h + tablesize : h;
    }

    /* Returns the smallest prime number >= x. */
    int NextPrime(int x) {
        if (x < 2)
            x = 2;
        while (1) {
            int i, flag = 0;
            for (i = 2; i * i <= x; i++) {
                if (x % i == 0) {
                    flag = 1;
                    break;
                }
            }
            if (flag == 0)
                return x;
            x++;
        }
    }

    HashTbl InitalizeTable(int TableSize) {
        int i;
        HashTbl table;
        if (TableSize <= 0) {
            printf("There is a problem with the hash size\n");
            return NULL;
        }
        table = (HashTbl)malloc(sizeof(struct HashTable));
        if (table == NULL) {
            printf("allocation failed\n");
            return NULL;
        }
        table->TableSize = NextPrime(TableSize);
        table->TheList = (Position *)malloc(sizeof(List) * table->TableSize);
        if (table->TheList == NULL) {
            printf("allocation failed\n");
            free(table);
            return NULL;
        }
        /* All header nodes are allocated as one contiguous array. */
        table->TheList[0] = (Position)malloc(table->TableSize * sizeof(struct ListNote));
        if (table->TheList[0] == NULL) {
            printf("allocation failed\n");
            free(table->TheList);
            free(table);
            return NULL;
        }
        for (i = 0; i < table->TableSize; i++) {
            table->TheList[i] = table->TheList[0] + i;
            table->TheList[i]->next = NULL;
        }
        return table;
    }

    /* Frees every data node, then the header array and the table itself. */
    void DestroyTable(HashTbl H) {
        int i;
        if (H == NULL)
            return;
        for (i = 0; i < H->TableSize; i++) {
            Position p = H->TheList[i]->next;
            while (p != NULL) {
                Position tmp = p->next;
                free(p);
                p = tmp;
            }
        }
        free(H->TheList[0]);
        free(H->TheList);
        free(H);
    }

    /* Returns the node holding key, or NULL if key is not in the table. */
    Position Find(int key, HashTbl H) {
        Position p = H->TheList[Hash(key, H->TableSize)]->next;
        while (p != NULL && p->element != key)
            p = p->next;
        return p;
    }

    /* Inserts key at the front of its list unless it is already present. */
    void Insert(int key, HashTbl H) {
        Position head, NewCell;
        if (Find(key, H) != NULL) {
            printf("The value already exists\n");
            return;
        }
        NewCell = (Position)malloc(sizeof(struct ListNote));
        if (NewCell == NULL) {
            printf("allocation failed\n");
            return;
        }
        head = H->TheList[Hash(key, H->TableSize)];
        NewCell->element = key;
        NewCell->next = head->next;
        head->next = NewCell;
    }

    void Delete(int key, HashTbl H) {
        Position prev, p = Find(key, H);
        if (p == NULL) {
            printf("There is no such value\n");
            return;
        }
        /* Walk from the header to the node just before p, then unlink p. */
        prev = H->TheList[Hash(key, H->TableSize)];
        while (prev->next != p)
            prev = prev->next;
        prev->next = p->next;
        free(p);
    }

    int main(void) {
        HashTbl table = InitalizeTable(10);
        Position p;
        if (table == NULL)
            return 1;
        p = Find(10, table);
        if (p == NULL)
            printf("10 not found\n");
        Insert(55, table);
        Insert(90, table);
        Insert(35, table);
        Insert(33, table);
        p = Find(55, table);
        if (p != NULL)
            printf("%d\n", p->element);
        p = Find(33, table);
        if (p != NULL)
            printf("%d\n", p->element);
        Delete(33, table);
        Delete(44, table);
        DestroyTable(table);
        return 0;
    }