The difference between hash_map and map in C++


For details, see "The difference between hash_map and map in C++" in Yusi Guyuan's column on the CSDN blog.

1) Why hash_map?
Have you used map? map provides a very common facility: key-value storage and lookup. For example, suppose I want to record people's names together with a description of each, add new entries at any time, and find and modify them quickly:

Yue Buqun: leader of the Huashan sect, known as the Gentleman Sword
Zhang Sanfeng: leader of Wudang and founder of Taijiquan
Invincible Eastern: the number one master, Sunflower Classic
...
    Storing this information is not complicated, but finding it is more troublesome. For example, to find the entry for "Zhang Sanfeng", the most naive way is to go through all the records and compare names one by one. To be fast, you need to keep the records sorted alphabetically and then use binary search. However, the records must stay in order as new ones are added, so each insertion has to go into its sorted position. For efficiency, this calls for a binary tree. With the STL map container you get this functionality easily, without worrying about the details. For more on the data structure behind map, interested readers can refer to "Learning STL map and STL set: data structure basics". Here is how it looks with map:

#include <map>
#include <string>
using namespace std;
...
map<string, string> namemap;
// Add entries
namemap["yue buqun"]="The leader of Huashan sect is called Gentleman sword";
namemap["Zhang Sanfeng"]="Leader of Wudang, founder of Taijiquan";
namemap["invincible eastern"]="The first master, sunflower classic";
...

// Look up an entry
if(namemap.find("yue buqun") != namemap.end()){
        ...
}

  Isn't that easy to use? It is also efficient: among 1 million records, you can find the one you want with at most about 20 comparisons, and with 2 million records it takes only about 21.
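The figures 20 and 21 are just the base-2 logarithm of the record count, rounded up. A minimal sketch that checks the arithmetic, assuming an ideally balanced search that halves the candidates with each comparison (`max_comparisons` and `check_quoted_figures` are names made up for this illustration):

```cpp
#include <cassert>
#include <cstddef>

// Worst-case number of halving steps needed to narrow n sorted records
// down to a single candidate, i.e. ceil(log2(n)).
std::size_t max_comparisons(std::size_t n) {
    std::size_t steps = 0;
    std::size_t span = 1;   // how many records `steps` halvings can cover
    while (span < n) {
        span *= 2;
        ++steps;
    }
    return steps;
}

// The two figures quoted in the text.
void check_quoted_figures() {
    assert(max_comparisons(1000000) == 20);   // 2^20 = 1,048,576
    assert(max_comparisons(2000000) == 21);   // 2^21 = 2,097,152
}
```

Doubling the data adds only one comparison, which is why a tree-based map scales well. A real red-black tree is not perfectly balanced, so its worst case can be somewhat higher than this ideal figure.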

    But speed never quite keeps up with real-world demands. If there are 1 million records and I need to search them very frequently, even 20 comparisons can become a bottleneck. Could that be reduced to one or two comparisons? And still be one or two comparisons when the number of records reaches 2 million? It also needs to be as easy to use as map.

    The answer is yes. This is where hash_map comes in. Although hash_map is not part of the C++ standard template library, almost every STL implementation provides one, and it is widely used. Before using hash_map in earnest, let's look at how it works.

2) Data structure: hash_map principle
    This section is for readers who want to understand hash_map in depth. If you only want a quick introduction to hash_map and are not interested in how it works, you can skip it, but I suggest you read it anyway; it never hurts to know more.

hash_map is based on a hash table. The biggest advantage of a hash table is that storing and looking up data take far less time, close to constant time; the cost is higher memory consumption. With more and more memory available, however, trading space for time is usually worthwhile. Being easy to code is another of its strengths.

    The basic principle is to use an array with a large index range to store the elements. You design a function (the hash function) that maps each element's key to a function value (the array index, or hash value), and the array slot at that index stores the element. You can also think of it as "classifying" each element by its key and storing it in the place that corresponds to its "class", which is called a bucket.

    However, there is no guarantee that keys and function values correspond one to one, so different elements may well produce the same function value. This is a "collision": different elements are assigned to the same "class". In short, "direct addressing" and "collision resolution" are the two defining features of a hash table.

    hash_map first allocates a large block of memory to form many buckets, then uses the hash function to map keys to different areas (buckets) for storage. The insertion process is:

  • Get the key.  
  • Compute the hash value with the hash function.  
  • Get the bucket number (usually the hash value modulo the number of buckets).  
  • Store the key and value in that bucket.  

The lookup process is:

  • Get the key.  
  • Compute the hash value with the hash function.  
  • Get the bucket number (usually the hash value modulo the number of buckets).  
  • Compare the key with the elements in the bucket. If none is equal, the key is not found.  
  • Return the value of the matching record.  

    In hash_map, the direct address is produced by the hash function, and collisions are resolved by the comparison function. You can see that if each bucket holds only one element, a lookup needs only one comparison. When many buckets are empty, many queries are faster still, since a lookup that lands in an empty bucket needs no comparison at all.

    So, of everything involved in implementing a hash table, the parts that concern the user are the hash function and the comparison function. These two are exactly the parameters that have to be specified when using hash_map.
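The insertion and lookup steps above can be sketched as a toy chained hash table. This is only an illustration of the principle, not how any real hash_map is implemented: `ToyHashMap`, `put`, `get`, and `toy_example` are hypothetical names, `std::hash<std::string>` from C++11 is assumed as the hash function, and the bucket count is assumed to be greater than zero and never grows.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// A teaching sketch: a fixed number of buckets, each a list of key/value pairs.
class ToyHashMap {
public:
    explicit ToyHashMap(std::size_t bucket_count) : buckets_(bucket_count) {}

    // Insert: key -> hash value -> bucket number -> store in that bucket.
    void put(const std::string& key, const std::string& value) {
        Bucket& bucket = buckets_[bucket_index(key)];
        for (auto& entry : bucket) {
            if (entry.first == key) {   // key already present: overwrite
                entry.second = value;
                return;
            }
        }
        bucket.emplace_back(key, value);
    }

    // Lookup: key -> hash value -> bucket number -> compare keys in the bucket.
    // Returns nullptr when no element in the bucket equals the key.
    const std::string* get(const std::string& key) const {
        const Bucket& bucket = buckets_[bucket_index(key)];
        for (const auto& entry : bucket) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

private:
    using Bucket = std::list<std::pair<std::string, std::string>>;

    // Hash value modulo the number of buckets gives the bucket number.
    std::size_t bucket_index(const std::string& key) const {
        return std::hash<std::string>()(key) % buckets_.size();
    }

    std::vector<Bucket> buckets_;
};

void toy_example() {
    ToyHashMap names(8);
    names.put("Zhang Sanfeng", "Leader of Wudang, founder of Taijiquan");
    assert(names.get("Zhang Sanfeng") != nullptr);
    assert(names.get("nobody") == nullptr);
}
```

The hash function decides which bucket to look in, and the equality comparison (`==` here) picks the right element inside it, which is why those two pieces are the ones a user has to supply.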

3) hash_map usage
     A simple example
Before worrying about how to store "Yue Buqun" in a hash_map, let's look at a simple example: you are given ID numbers at random along with the information for each ID, where IDs range from 1 to 2^31. How can we store and look them up quickly?

#include <hash_map>
#include <string>
using namespace std;

int main(){
    hash_map<int, string> mymap;
    mymap[9527]="Tang Bohu points Qiuxiang";
    mymap[1000000]="The life of a millionaire";
    mymap[10000]="White collar wage bottom line";
    ...
    if(mymap.find(10000) != mymap.end()){
        ...
    }
    return 0;
}

       That's simple enough, just like map. You might then ask: what about the hash function and the comparison function? Don't they have to be specified? You're right, but when you don't specify them, defaults are used. Looking at the declaration of hash_map makes this clearer. The following is the declaration from SGI STL:

 
template <class _Key, class _Tp, class _HashFcn = hash<_Key>,
          class _EqualKey = equal_to<_Key>,
          class _Alloc = __STL_DEFAULT_ALLOCATOR(_Tp) >
class hash_map
{
    ...
};

That is, in the example above the following two declarations are equivalent:

...
hash_map<int, string> mymap;
// Equivalent to:
hash_map<int, string, hash<int>, equal_to<int> > mymap;

We won't pay much attention to Alloc here (readers who want to know more about it can refer to "Standard library STL: what can Alloc do").
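The same equivalence can be checked at compile time. Since hash_map itself is not portable, this sketch assumes a C++11 compiler and uses `std::unordered_map`, whose default template arguments play the same role; `ShortForm`, `LongForm`, and `equivalence_example` are names invented for the illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <type_traits>
#include <unordered_map>

using ShortForm = std::unordered_map<int, std::string>;
using LongForm  = std::unordered_map<int, std::string,
                                     std::hash<int>, std::equal_to<int>>;

// Compilation fails here if the two spellings named different types.
static_assert(std::is_same<ShortForm, LongForm>::value,
              "default hash and equality arguments are filled in automatically");

void equivalence_example() {
    ShortForm a;
    a[9527] = "Tang Bohu points Qiuxiang";
    LongForm& b = a;   // binds directly: same type, no conversion involved
    assert(b.at(9527) == "Tang Bohu points Qiuxiang");
}
```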

The hash function of hash_map

What exactly does hash<int> look like? Look at the source code:

struct hash<int> {
    size_t operator()(int __x) const { return __x; }
};

It turns out to be a function object. SGI STL provides the following hash functions:
  
 
struct hash<char*>
struct hash<const char*>
struct hash<char>
struct hash<unsigned char>
struct hash<signed char>
struct hash<short>
struct hash<unsigned short>
struct hash<int>
struct hash<unsigned int>
struct hash<long>
struct hash<unsigned long>

 

That is, if your key is one of the types above, you can use the default hash function. You can of course define your own hash function, and for user-defined types that is the only option. For string, for example, you must supply your own hash function:

 
struct str_hash{
    size_t operator()(const string& str) const
    {
        unsigned long __h = 0;
        for (size_t i = 0 ; i < str.size() ; i ++)
            __h = 5*__h + str[i];
        return size_t(__h);
    }
};

// If you want to use the system-defined string hash function, you can write it like this:

struct str_hash{
    size_t operator()(const string& str) const
    {
        return __stl_hash_string(str.c_str());
    }
};
When declaring your own hash function, you should pay attention to the following points:

 

Use a struct and overload operator().
The return type is size_t.
The parameter is the type of the key you want to hash.
The function is const.
If these are hard to remember, the easiest way is to copy an existing hash function and adapt it.
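As a compilable illustration of those four points, here is a sketch that assumes a C++11 compiler and uses `std::unordered_map` in place of hash_map. `StrHash`, `NameMap`, and `str_hash_example` are illustrative names; the multiply-by-5 scheme is the one from str_hash above, except that characters are read as unsigned char here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// A struct that overloads operator(), returns size_t, takes the key type
// as its parameter, and is const.
struct StrHash {
    std::size_t operator()(const std::string& str) const {
        std::size_t h = 0;
        for (unsigned char c : str) {
            h = 5 * h + c;   // multiply-and-add over the characters
        }
        return h;
    }
};

// The hash functor is passed as the third template argument.
using NameMap = std::unordered_map<std::string, std::string, StrHash>;

void str_hash_example() {
    NameMap names;
    names["yue buqun"] = "The leader of Huashan sect is called Gentleman sword";
    assert(names.count("yue buqun") == 1);
    assert(StrHash()("ab") == 583u);   // 5 * (5 * 0 + 97) + 98
}
```

`std::unordered_map` already ships with a hash for `std::string`, so the custom functor is needed here only to show the shape such a functor takes.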

Now we can hash "yue buqun". Simply make the following replacement:

map<string, string> namemap;
// Replace with:
hash_map<string, string, str_hash> namemap;
Nothing else needs to change. Of course, don't forget to add the declaration of str_hash and to change the header file to hash_map.

You might ask: what about the comparison function? Don't worry, we turn to the comparison function in hash_map next.

   The comparison function of hash_map
 In map, the comparison function has to be a less-than function; if none is provided, the default is less<Key>. In hash_map, the elements in a bucket are tested for equality with the key, so what is needed is an equality function: equal_to<Key>. First look at the source code of equal_to:

 

// This code can be found in the SGI STL sources.
// First look at the declaration of binary_function; it really just defines some types.
template <class _Arg1, class _Arg2, class _Result>
struct binary_function {
    typedef _Arg1 first_argument_type;
    typedef _Arg2 second_argument_type;
    typedef _Result result_type;
};

// Now look at the definition of equal_to:
template <class _Tp>
struct equal_to : public binary_function<_Tp,_Tp,bool>
{
    bool operator()(const _Tp& __x, const _Tp& __y) const { return __x == __y; }
};

If you use a custom data type, such as a struct mystruct or a const char* string, how do you supply the comparison function? There are two ways. The first is to overload the == operator and use equal_to, as in the following example:

struct mystruct{
    int iID;
    int len;
    bool operator==(const mystruct & my) const{
        return (iID==my.iID) && (len==my.len) ;
    }
};
    

This way, equal_to<mystruct> can be used as the comparison function. The other way is to use a function object, by defining a comparison functor yourself:

 
// strcmp is declared in <cstring>
struct compare_str{
    bool operator()(const char* p1, const char*p2) const{
        return strcmp(p1,p2)==0;
    }
};

With compare_str, you can now use hash_map.

typedef hash_map<const char*, string, hash<const char*>, compare_str> StrIntMap;
StrIntMap namemap;
namemap["yue buqun"]="The leader of Huashan sect is called Gentleman sword";
namemap["Zhang Sanfeng"]="Leader of Wudang, founder of Taijiquan";
namemap["invincible eastern"]="The first master, sunflower classic";
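For readers without a hash_map implementation at hand, the same idea can be tried with C++11's `std::unordered_map`. This is a sketch under that assumption, not the SGI code: `CStrHash`, `CStrEqual`, `CStrMap`, and `cstr_example` are names chosen for illustration. A custom hash is needed as well, because the standard `std::hash<const char*>` hashes the pointer value rather than the characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>

// Hash the characters a C string points to, not the pointer itself.
struct CStrHash {
    std::size_t operator()(const char* s) const {
        std::size_t h = 0;
        for (; *s != '\0'; ++s) {
            h = 5 * h + static_cast<unsigned char>(*s);
        }
        return h;
    }
};

// Two keys are equal when their characters are equal.
struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

using CStrMap = std::unordered_map<const char*, std::string, CStrHash, CStrEqual>;

void cstr_example() {
    CStrMap names;
    names["Zhang Sanfeng"] = "Leader of Wudang, founder of Taijiquan";

    // A different buffer holding the same characters still finds the entry,
    // because hashing and equality look at the characters, not the address.
    char query[] = "Zhang Sanfeng";
    assert(names.find(query) != names.end());
    assert(names.find("nobody") == names.end());
}
```

The container stores only the pointers, so the strings they point to must outlive the map; string literals, as used here, satisfy that.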

hash_map functions
The member functions of hash_map are similar to those of map. For the full list of parameters and explanations, refer to the hash_map page of the STL programming manual. Here we mainly introduce a few common functions.

hash_map(size_type n). If efficiency matters, set this parameter. n sets the number of hash buckets in the hash_map container. The more buckets there are, the lower the probability of hash collisions and the less often memory has to be reallocated. A larger n therefore means faster lookups but greater memory consumption.  
const_iterator find(const key_type& k) const. Lookup: takes a key and returns an iterator.  
data_type& operator[](const key_type& k). This is one of the functions I use most. It is especially convenient because the container can be used like an array. Note, however, that if the container has no element with the given key, using the [key] operator automatically adds one. So when you only want to know whether a key is present, use find. If you want to insert the element, you can use the [] operator directly.  
insert. When the container does not contain the key, insert behaves much like the [] operator. As more elements are added, the number of elements per bucket grows; to keep lookups efficient, hash_map automatically requests more memory to create more buckets. Iterators obtained earlier may therefore be invalid after an insert.  
erase. During insertion, hash_map may automatically expand the container's memory when buckets hold too many elements. In SGI STL, however, erase does not automatically reclaim memory, so after you call erase, iterators to the other elements remain valid.  
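The difference between find, operator[], and insert is easy to check in code. The sketch below assumes C++11 and uses `std::unordered_map` as a stand-in for hash_map (its constructor also accepts a minimum bucket count); `member_function_example` is just an illustrative name.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

void member_function_example() {
    // Ask for at least 1024 buckets up front; the argument is a minimum.
    std::unordered_map<int, std::string> m(1024);
    assert(m.bucket_count() >= 1024);

    // find() only looks; a miss leaves the container unchanged.
    assert(m.find(10000) == m.end());
    assert(m.size() == 0);

    // operator[] on a missing key inserts a default-constructed value.
    std::string& slot = m[10000];
    assert(slot.empty());
    assert(m.size() == 1);

    // insert() does not overwrite an existing key...
    auto result = m.insert(std::make_pair(10000, std::string("ignored")));
    assert(!result.second);
    assert(m[10000].empty());

    // ...whereas assignment through operator[] does.
    m[10000] = "White collar wage bottom line";
    assert(m.find(10000)->second == "White collar wage bottom line");
}
```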
3) Related hash containers
Besides hash_map, the hash containers include hash_set, hash_multimap, and hash_multiset. They differ from set, multimap, and multiset in the same way that hash_map differs from map, so I won't go through them one by one.

4) Other
Here are some common questions; understanding them should help you use hash_map.

     What is the difference between hash_map and map?
Constructor. hash_map needs a hash function and an equality function; map only needs a comparison function (less than).  
Storage structure. hash_map is stored in a hash table, while map is generally implemented as a red-black tree (RB tree). Their in-memory data structures are therefore different.  
     When should I use hash_map, and when should I use map?
In general, hash_map lookups are faster than map lookups: hash_map lookup time is essentially constant regardless of the amount of data, while map lookup time is on the order of log(n). However, that constant is not necessarily smaller than log(n), because computing the hash function also takes time. So if efficiency is the concern, especially once the number of elements reaches a certain order of magnitude, consider hash_map. But if you are very strict about memory and want the program to consume as little as possible, be careful: hash_map may let you down, especially when you have many hash_map objects, where memory use becomes even harder to control. hash_map construction is also relatively slow.

Do you know how to choose now? Weigh three factors: lookup speed, data volume, and memory usage.

Here is a short story about hash_map and map: http://dev.csdn.net/Develop/article/14/14019.shtm

4.3 How do I add a user-defined type to hash_map?
You only need to do two things: define the hash function and define the comparison function. The following code is an example:

-bash-2.05b$ cat my.cpp
#include <hash_map>
#include <string>
#include <iostream>

using namespace std;

// define the class
class ClassA{
public:
    ClassA(int a):c_a(a){}
    int getvalue()const { return c_a;}
    void setvalue(int a){ c_a = a; }
private:
    int c_a;
};

// 1 define the hash function
struct hash_A{
    size_t operator()(const class ClassA & A)const{
        // return hash<int>()(A.getvalue());
        return A.getvalue();
    }
};

// 2 define the equal function
struct equal_A{
    bool operator()(const class ClassA & a1, const class ClassA & a2)const{
        return a1.getvalue() == a2.getvalue();
    }
};

int main()
{
    hash_map<ClassA, string, hash_A, equal_A> hmap;
    ClassA a1(12);
    hmap[a1]="I am 12";
    ClassA a2(198877);
    hmap[a2]="I am 198877";

    cout<<hmap[a1]<<endl;
    cout<<hmap[a2]<<endl;
    return 0;
}

-bash-2.05b$ make my
c++ -O -pipe -march=pentiumpro my.cpp -o my
-bash-2.05b$ ./my
I am 12
I am 198877
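On a compiler without hash_map, the same two steps (a hash functor and an equality functor) can be tried with C++11's `std::unordered_map`. This is a sketch under that assumption; `Account`, `AccountHash`, `AccountEqual`, `AccountMap`, and `describe` are hypothetical names made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// A user-defined key type (hypothetical).
class Account {
public:
    explicit Account(int id) : id_(id) {}
    int id() const { return id_; }
private:
    int id_;
};

// 1. The hash functor: delegate to the standard hash of the wrapped int.
struct AccountHash {
    std::size_t operator()(const Account& a) const {
        // std::hash<int>() builds the functor; the second () calls it.
        return std::hash<int>()(a.id());
    }
};

// 2. The equality functor.
struct AccountEqual {
    bool operator()(const Account& x, const Account& y) const {
        return x.id() == y.id();
    }
};

using AccountMap =
    std::unordered_map<Account, std::string, AccountHash, AccountEqual>;

// Builds a small map and looks up one id in it.
std::string describe(int id) {
    AccountMap m;
    m[Account(12)] = "I am 12";
    m[Account(198877)] = "I am 198877";
    auto it = m.find(Account(id));
    return it == m.end() ? "not found" : it->second;
}
```

Keys that compare equal must also hash to the same value; here both functors look only at `id()`, so that holds.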

How do I replace an existing map container in a program with hash_map?
This is easy, provided you have good programming habits. It is recommended that you define your container types with typedef wherever possible:

typedef map<Key, Value> KeyMap;
When you want to switch to hash_map, you only need to change this to:

typedef hash_map<Key, Value> KeyMap;
Everything else stays basically unchanged. Of course, you need to check that a hash function and a comparison function exist for the Key type.
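A small sketch of the typedef approach, assuming C++11 and `std::unordered_map` as the hashed container. `USE_HASHED_CONTAINER` is a hypothetical build flag and `total` a made-up helper; neither comes from any library.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Switch the container type in exactly one place.
#ifdef USE_HASHED_CONTAINER
typedef std::unordered_map<std::string, int> KeyMap;
#else
typedef std::map<std::string, int> KeyMap;
#endif

// Code written against KeyMap compiles unchanged with either container,
// as long as it does not rely on the order in which elements are visited.
int total(const KeyMap& m) {
    int sum = 0;
    for (KeyMap::const_iterator it = m.begin(); it != m.end(); ++it) {
        sum += it->second;
    }
    return sum;
}

void typedef_example() {
    KeyMap m;
    m["a"] = 1;
    m["b"] = 2;
    assert(total(m) == 3);
}
```

One thing that does change is iteration order: map visits keys in sorted order, while a hashed container visits them in no particular order, so code that depends on sorted traversal needs a second look.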

     Why is hash_map not standard?
I don't know exactly why. One explanation is that when the STL was added to standard C++, the hash_map family was not yet fully implemented, and that it should become standard later. If anyone knows a better explanation, I hope you will tell me. What I do want to point out is that, precisely because hash_map is not standard, having the g++ compiler installed on a platform does not guarantee that a hash_map implementation is available; I have run into exactly this. So when using non-standard libraries like these, be sure to test in advance, and if you care about portability across platforms, it is better to use them sparingly.

A simple example of using hash_map

#if defined(__GNUC__)
#if __GNUC__ < 3 && __GNUC__ >= 2 && __GNUC_MINOR__ >= 95
#include <hash_map>
#elif __GNUC__ >= 3
#include <ext/hash_map>
using namespace __gnu_cxx;
#else
#include <hash_map.h>
#endif
#elif defined(__MSVC_VER__)
#if __MSVC_VER__ >= 7
#include <hash_map>
#else
#error "std::hash_map is not available with this compiler"
#endif
#elif defined(__sgi__)
#include <hash_map>
#else
#error "std::hash_map is not available with this compiler"
#endif

#include <string>
#include <iostream>
#include <algorithm>
using namespace std;

struct str_hash{
    size_t operator()(const string& str) const
    {
        return __stl_hash_string(str.c_str());
    }
};

struct str_equal{
    bool operator()(const string& s1,const string& s2) const {
        return s1==s2;
    }
};

int main(int argc, char *argv[])
{
    hash_map<string,string,str_hash,str_equal> mymap;
    mymap.insert(pair<string,string>("hcq","20"));
    mymap["sgx"]="24";
    mymap["sb"]="23";
    cout<<mymap["sb"]<<endl;
    if(mymap.find("hcq")!=mymap.end())
        cout<<mymap["hcq"]<<endl;
    return 0;
}
 

 

Tags: C++ data structure

Posted on Wed, 06 Oct 2021 18:39:02 -0400 by if