hashCode and the relationship between hashCode() and equals()

1, What is hashCode:

hashCode is the hash code of an object: an int value derived from some of the object's information. The default implementation in Object is identity-based; it is often described as the object's storage address, although the actual value depends on the JVM. Hash codes improve retrieval efficiency. They are mainly used to quickly locate where an object is stored in hash-based structures such as Hashtable and HashMap.

Why can hashCode improve retrieval efficiency? Consider the simplest way to check whether a collection contains an object: take each element in turn and compare it with the target using equals(). If a comparison returns true, stop and return true; if none does, return false. With many elements, say 10000, and no match, the program has to fetch and compare all 10000 elements before reaching a conclusion, which is very inefficient. A hash algorithm avoids this by dividing the collection into several storage areas (buckets). Each object computes a hash code, the hash codes are grouped, and each group corresponds to one storage area. The hash code of an object therefore determines which area it belongs to, which greatly reduces the number of elements that have to be compared during a lookup.

For example, HashSet stores and looks up objects using a hash algorithm. Internally it groups objects by hash code, conceptually by taking the remainder of the hash code divided by some number n, and each group gets its own storage area. When looking up an object in a HashSet, Java first calls the object's hashCode() method to obtain its hash code, then locates the corresponding storage area from that hash code, and finally compares the object with each element in that area using equals(). The answer is obtained without traversing every element in the collection.
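The grouping described above can be sketched as a toy bucketed set. This is only an illustration of the idea, not how HashSet is actually implemented; the class name BucketSketch, its methods, and the bucket count are all assumptions made for the example, and null elements are not supported.

```java
import java.util.ArrayList;
import java.util.List;

// Toy bucketed set (hypothetical class, for illustration only).
class BucketSketch {
    private final List<List<Object>> buckets = new ArrayList<>();

    BucketSketch(int n) {
        for (int i = 0; i < n; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Map a hash code to a bucket index; floorMod keeps negative hash codes in range.
    int indexFor(Object o) {
        return Math.floorMod(o.hashCode(), buckets.size());
    }

    // Returns false if an equal element is already stored.
    boolean add(Object o) {
        List<Object> bucket = buckets.get(indexFor(o));
        if (bucket.contains(o)) {   // contains() calls equals(), but only inside this bucket
            return false;
        }
        bucket.add(o);
        return true;
    }

    // Only the elements of one bucket are compared with equals().
    boolean contains(Object o) {
        return buckets.get(indexFor(o)).contains(o);
    }

    public static void main(String[] args) {
        BucketSketch set = new BucketSketch(16);
        System.out.println(set.add("ok"));              // true
        System.out.println(set.add(new String("ok")));  // false: equal element already present
        System.out.println(set.contains("ok"));         // true
        System.out.println(set.indexFor("ok"));         // 12, because 3548 % 16 == 12
    }
}
```

A lookup touches a single bucket, so the number of equals() calls depends on how evenly the hash codes spread the elements over the buckets.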

Next, let's compute a few hash codes, using String and StringBuilder:

public class HashCodeTest {
	public static void main(String[] args) {
		String s = "ok";
		StringBuilder sb = new StringBuilder(s);

		// String hashes its content; StringBuilder inherits Object.hashCode()
		System.out.println(s.hashCode() + "  " + sb.hashCode());

		String t = new String("ok");
		StringBuilder tb = new StringBuilder(t);
		System.out.println(t.hashCode() + "  " + tb.hashCode());
	}
}

Output (the StringBuilder values vary from run to run):
3548  1829164700
3548  2018699554

We can see that the strings s and t have the same hash code, because a string's hash code is derived from its content. The string builders sb and tb have different hash codes because StringBuilder does not override hashCode(): its hash code comes from the default Object.hashCode(), which is identity-based, so two distinct objects will almost always get different values. Writing a good hashCode() method is not difficult. As long as we combine the hash codes of an object's fields sensibly, different objects will produce well-distributed hash codes. For example:

public class Model {
	private String name;
	private double salary;
	private int sex;
	
	@Override
	public int hashCode() {
		// Note: name.hashCode() throws NullPointerException if name is null
		return name.hashCode() + Double.valueOf(salary).hashCode() + Integer.valueOf(sex).hashCode();
	}
}

The code above combines the hash codes of the individual fields to produce a reasonably uniform hash code. It is only a reference example; other implementations work too, as long as the resulting hash codes are well distributed (uniform here means that different objects should collide as rarely as possible). Java 7 added two helpers that improve on this. First, the null-safe method Objects.hashCode (note: java.util.Objects, not Object) is the safer way to obtain a field's hash code. It returns 0 if the argument is null and otherwise returns the result of calling hashCode() on the argument. The source code of Objects.hashCode is as follows:

public static int hashCode(Object o) {
        return o != null ? o.hashCode() : 0;
    }

Therefore, our modified code is as follows:

import java.util.Objects;
public  class Model {
	private   String name;
	private double salary;
	private int sex;
	@Override
	public int hashCode() {
		return Objects.hashCode(name) + Double.valueOf(salary).hashCode() + Integer.valueOf(sex).hashCode();
	}
}

Java 7 also provides java.util.Objects.hash(Object... values), which can be called when we need to combine several hash values. It simplifies the code above further:

import java.util.Objects;
public  class Model {
	private   String name;
	private double salary;
	private int sex;
	
	@Override
	public int hashCode() {
		return Objects.hash(name,salary,sex);
	}
}
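As a small aside, the sketch below compares plain addition of field hash codes with Objects.hash. It illustrates one practical difference: addition ignores the order of the fields, while Objects.hash (which follows the Arrays.hashCode recipe, result = 31 * result + elementHash starting from 1) does not. The class and method names are hypothetical and exist only for this example.

```java
import java.util.Objects;

// Hypothetical helper class comparing two ways of combining field hash codes.
class HashCombineSketch {
    // Plain addition, as in the first Model version: swapping the fields gives the same result.
    static int sumHash(int a, int b) {
        return Integer.hashCode(a) + Integer.hashCode(b);
    }

    // Objects.hash is order-sensitive: result = 31 * result + elementHash, starting from 1.
    static int objectsHash(int a, int b) {
        return Objects.hash(a, b);
    }

    public static void main(String[] args) {
        System.out.println(sumHash(1, 2) == sumHash(2, 1)); // true: a guaranteed collision
        System.out.println(objectsHash(1, 2));              // 994
        System.out.println(objectsHash(2, 1));              // 1024
        System.out.println(Objects.hashCode(null));         // 0: the null-safe variant
    }
}
```

Note that Objects.hash boxes its arguments and allocates an array on each call, so hand-written arithmetic may still be preferable in very hot code paths.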

That concludes the introduction to hashCode(). One more point: for an array-typed field, we can call Arrays.hashCode() to compute its hash code, which is built from the hash codes of the array's elements.
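A minimal sketch of the array case follows. Arrays themselves do not override hashCode(), so calling hashCode() directly on an array gives an identity-based value; Arrays.hashCode() hashes the contents instead. The class name ArrayHashSketch and the helper contentHash are made up for this example.

```java
import java.util.Arrays;

// Hypothetical demo class for content-based array hashing.
class ArrayHashSketch {
    // Content-based hash: equal element sequences give equal results.
    static int contentHash(int[] values) {
        return Arrays.hashCode(values);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        System.out.println(contentHash(a) == contentHash(b)); // true
        System.out.println(contentHash(a));                   // 30817
        System.out.println(contentHash(null));                // 0: a null array hashes to 0
    }
}
```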

2, Relationship between equals() and hashCode():

Object, the superclass of every Java class, defines both equals() and hashCode(). In Object, equals() checks whether two references point to the same object, and hashCode() returns an identity-based value (commonly described as derived from the object's memory address). hashCode() is therefore mainly used for lookup, while equals() is used to decide whether two objects are equal. Sometimes we need to override both methods to suit specific requirements. When doing so, keep the following rules in mind:

(1) If equals() returns true for two objects, their hashCode() values must be the same;

(2) If two objects have the same hashCode() value, equals() does not have to return true for them. It only means that the two objects fall into the same storage area of a hash structure;

(3) If a class overrides equals(), it must also override hashCode().
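Rule (2) can be seen with two ordinary strings. The sketch below uses "Aa" and "BB", which happen to share a hash code but are not equal; the class name CollisionSketch and the helper distinctCount are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo: equal hash codes do not imply equals().
class CollisionSketch {
    // Number of distinct elements a HashSet keeps for the two strings.
    static int distinctCount(String a, String b) {
        Set<String> set = new HashSet<>();
        set.add(a);
        set.add(b);
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());            // 2112
        System.out.println("BB".hashCode());            // 2112
        System.out.println("Aa".equals("BB"));          // false
        System.out.println(distinctCount("Aa", "BB"));  // 2: same bucket, but both are kept
    }
}
```

The set keeps both strings because equals() makes the final decision once the hash codes match.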

3, Why override the hashCode() method when overriding equals():

Before answering this question, let's first look at how an element is put into a hash-based collection:

When an object is put into the collection, the collection first checks whether the object's hashCode value equals the hashCode value of any element in the corresponding storage area. If not, the object is stored directly. If the hashCode values are equal, the collection then uses equals() to compare the object with the elements in that storage area. If no equals() call returns true, the object is stored; otherwise it is not added.

Similarly, when get() is used to look up an element, the collection calls key.hashCode() to compute the array index and then checks the result of equals(). If equals() returns true, the element is found; otherwise it is not.

Suppose we override equals() but not hashCode(). The hashCode() inherited from Object is identity-based, so two distinct objects will almost always have different hash codes. Overriding equals() then has no practical effect in hash-based collections, because we cannot guarantee that two objects for which equals() returns true are hashed to the same storage area. In other words, obj1.equals(obj2) is true, but obj1.hashCode() == obj2.hashCode() is not guaranteed to be true. The data is then no longer unique: when the hash codes differ, equals() is never called for the comparison, so the overridden equals() is meaningless.

Take HashSet as an example. If two instances of a class compare as equal with equals(), they should not both be stored in the same set. But if the class's hashCode() does not follow the rules above, that is exactly what can happen: because the class inherits hashCode() from Object, the two instances return different hash codes. The second object is placed according to its own hash code, possibly in a different storage area from the first, so it is never compared with the first object using equals() and may be stored in the HashSet as well. The hashCode() method in Object therefore cannot meet the requirements of storing such objects in a HashSet: its return value is identity-based, and although the same object returns the same hash value throughout a program run, two different instances will generally return different values from the default hashCode() even when equals() considers them equal.

Next, let's give a few small examples to test:

3.1 test 1: overriding equals() but not hashCode() leads to non-unique data.

import java.util.Collection;
import java.util.HashSet;

public class HashCodeTest {
    public static void main(String[] args) {
        Collection<Point> set = new HashSet<>();
        Point p1 = new Point(1, 1);
        Point p2 = new Point(1, 1);

        System.out.println(p1.equals(p2));
        set.add(p1);   //(1)
        set.add(p2);   //(2)
        set.add(p1);   //(3)

        // Print every element that actually ended up in the set
        for (Point point : set) {
            System.out.println(point);
        }
    }
}
  
class Point {  
    private int x;  
    private int y;  
  
    public Point(int x, int y) {  
        super();  
        this.x = x;  
        this.y = y;  
    }  
  
    @Override  
    public boolean equals(Object obj) {  
        if (this == obj)  
            return true;  
        if (obj == null)  
            return false;  
        if (getClass() != obj.getClass())  
            return false;  
        Point other = (Point) obj;  
        if (x != other.x)  
            return false;  
        if (y != other.y)  
            return false;  
        return true;  
    }  
  
    @Override  
    public String toString() {  
        return "x:" + x + ",y:" + y;  
    }  
}  
Output results:
true
x:1,y:1  
x:1,y:1 

Cause analysis:

When set.add(p1) is executed (1), the set is empty, so p1 is stored directly.
When set.add(p2) is executed (2), the set first checks whether the storage area for p2's hashCode value already holds an element with the same hashCode. Because hashCode() is not overridden, the default Object.hashCode() is used, which returns an identity-based integer. Different objects get different values here, so no element has the same hashCode as p2, and p2 is stored directly.
When set.add(p1) is executed (3), p1 is already in the set. The same object returns the same hashCode value, so the set goes on to check equals(), which returns true because it is the same object. The set concludes that the object is already present and discards the new entry.

3.2 test 2: overriding hashCode() but not equals() still leads to non-unique data.

Modify the Point class:

class Point {  
    private int x;  
    private int y;  
  
    public Point(int x, int y) {  
        super();  
        this.x = x;  
        this.y = y;  
    }  
  
    @Override  
    public int hashCode() {  
        final int prime = 31;  
        int result = 1;  
        result = prime * result + x;  
        result = prime * result + y;  
        return result;  
    }  
  
    @Override  
    public String toString() {  
        return "x:" + x + ",y:" + y;  
    }  
  
}  
Output results:
false
x:1,y:1  
x:1,y:1 

Cause analysis:
When set.add(p1) is executed (1), the set is empty, so p1 is stored directly.
When set.add(p2) is executed (2), the set first checks whether the storage area for p2's hashCode value already holds an element with the same hashCode. hashCode() is overridden here, so the hashCodes of p1 and p2 are equal and the set goes on to check equals(). Because equals() is not overridden, the default implementation uses "==" to compare the memory addresses of the two objects, so equals() returns false. The set treats them as different objects and stores p2 as well.
When set.add(p1) is executed (3), p1 is already in the set. The same object returns the same hashCode value and equals() returns true, so the set concludes that the object is already present and discards the new entry.

Combining the two tests above: to guarantee the uniqueness of elements, you must override both hashCode() and equals().
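For completeness, here is a sketch of the same experiment with both methods overridden. FixedPoint and FixedPointDemo are hypothetical names chosen so the example does not clash with the Point class above; the hashCode() formula is the one from test 2 written as a single expression.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical variant of Point that overrides both equals() and hashCode().
class FixedPoint {
    private final int x;
    private final int y;

    FixedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        FixedPoint other = (FixedPoint) obj;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return 31 * (31 + x) + y;   // same arithmetic as test 2
    }

    @Override
    public String toString() {
        return "x:" + x + ",y:" + y;
    }
}

class FixedPointDemo {
    // Repeats the three add() calls from the tests and reports the set size.
    static int sizeAfterAdds() {
        Set<FixedPoint> set = new HashSet<>();
        FixedPoint p1 = new FixedPoint(1, 1);
        FixedPoint p2 = new FixedPoint(1, 1);
        set.add(p1);
        set.add(p2);   // rejected: same hash code and equals() returns true
        set.add(p1);
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAdds());   // 1
    }
}
```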

(Note: when an element that is already present (same hashCode and equals() returns true) is inserted into a HashSet, the newly added element is discarded. When the same key is inserted into a HashMap with a different value, the old value is replaced by the new one.)
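The difference in the note can be checked directly through the return values of add() and put(). The class and method names below are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo contrasting HashSet.add() with HashMap.put().
class AddVersusPutSketch {
    // add() returns false when an equal element is already present.
    static boolean secondAddAccepted() {
        Set<String> set = new HashSet<>();
        set.add("key");
        return set.add(new String("key"));
    }

    // put() with an equal key replaces the value and returns the old one.
    static int valueAfterSecondPut() {
        Map<String, Integer> map = new HashMap<>();
        map.put("key", 1);
        map.put(new String("key"), 2);
        return map.get("key");
    }

    public static void main(String[] args) {
        System.out.println(secondAddAccepted());    // false
        System.out.println(valueAfterSecondPut());  // 2
    }
}
```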

4, Memory leak caused by hashCode():

public class RectObject {
	public int x;
	public int y;
	public RectObject(int x,int y){
		this.x = x;
		this.y = y;
	}
	@Override
	public int hashCode(){
		final int prime = 31;
		int result = 1;
		result = prime * result + x;
		result = prime * result + y;
		return result;
	}
	@Override
	public boolean equals(Object obj){
		if(this == obj)
			return true;
		if(obj == null)
			return false;
		if(getClass() != obj.getClass())
			return false;
		final RectObject other = (RectObject)obj;
		if(x != other.x){
			return false;
		}
		if(y != other.y){
			return false;
		}
		return true;
	}
}

We have overridden the hashCode() and equals() methods inherited from Object. As the code shows, if the x and y values of two RectObject objects are equal, their hashCode values are equal and equals() returns true.

import java.util.HashSet;
public class Demo {
	public static void main(String[] args){
		HashSet<RectObject> set = new HashSet<RectObject>();
		RectObject r1 = new RectObject(3,3);
		RectObject r2 = new RectObject(5,5);
		RectObject r3 = new RectObject(3,5);
		set.add(r1);
		set.add(r2);
		set.add(r3);
		r3.y = 7;
		System.out.println("Size before deletion size:"+set.size());//3
		set.remove(r3);
		System.out.println("Size after deletion size:"+set.size());//3
	}
}
Output:
Size before deletion size:3
Size after deletion size:3

Here we find a problem. After calling remove() on r3 we assume it has been deleted, but in fact it is still in the set. This is a memory leak: an object that is no longer used stays in memory because the set still references it and can no longer locate it by lookup. If this happens many times, memory usage keeps growing. Take a look at the source code of HashSet.remove() (the listings below come from an older JDK; newer versions differ in detail):

    public boolean remove(Object o) {
        return map.remove(o)==PRESENT;
    }

Then take a look at the source code of the remove method of map:

   public V remove(Object key) {
       Entry<K,V> e = removeEntryForKey(key);
       return (e == null ? null : e.value);
   }

Take another look at the source code of the removeEntryForKey method:

/**
     * Removes and returns the entry associated with the specified key
     * in the HashMap.  Returns null if the HashMap contains no mapping
     * for this key.
     */
    final Entry<K,V> removeEntryForKey(Object key) {
        int hash = (key == null) ? 0 : hash(key);
        int i = indexFor(hash, table.length);
        Entry<K,V> prev = table[i];
        Entry<K,V> e = prev;
 
        while (e != null) {
            Entry<K,V> next = e.next;
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k)))) {
                modCount++;
                size--;
                if (prev == e)
                    table[i] = next;
                else
                    prev.next = next;
                e.recordRemoval(this);
                return e;
            }
            prev = e;
            e = next;
        }
 
        return e;
    }

We can see that remove() first uses the object's hashCode value to locate the object and then deletes it. The problem arises because we modified the y field of r3, and y takes part in the hashCode() calculation of RectObject. The hashCode of r3 therefore changed, so remove() could not find r3 and the deletion failed. In other words, the hashCode of r3 changed but its storage location was not updated; it still sits in its original location, so looking it up with the new hashCode finds nothing.

The lesson from this memory leak: if a field takes part in the object's hashCode() calculation, do not modify that field while the object is stored in a hash-based collection. Otherwise the object can no longer be found or removed, which leads to a memory leak.
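One way to avoid the problem, sketched below under the assumption that the field really has to change, is to remove the object first, modify it, and then add it back. MutableRect mirrors RectObject but is a hypothetical class so that the example is self-contained; updateY is likewise an illustrative helper, not an existing API. Making the hashed fields final is the other common option.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for RectObject: y takes part in hashCode().
class MutableRect {
    int x;
    int y;

    MutableRect(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public int hashCode() {
        return 31 * (31 + x) + y;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        MutableRect other = (MutableRect) obj;
        return x == other.x && y == other.y;
    }
}

class SafeUpdateSketch {
    // Remove while the hash code still matches the bucket, then mutate and re-add.
    static void updateY(Set<MutableRect> set, MutableRect r, int newY) {
        boolean wasPresent = set.remove(r);
        r.y = newY;
        if (wasPresent) {
            set.add(r);
        }
    }

    public static void main(String[] args) {
        Set<MutableRect> set = new HashSet<>();
        MutableRect r = new MutableRect(3, 5);
        set.add(r);

        updateY(set, r, 7);                 // safe update
        System.out.println(set.remove(r));  // true: the element is found under its new hash code
        System.out.println(set.size());     // 0
    }
}
```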

5, hashCode() method and equals() method of the primitive wrapper classes and String:

(1) hashCode(): the hashCode() methods of the eight primitive wrapper classes are simple. Byte, Short, Integer, and Character return the numeric value itself; Long and Double fold their 64 bits into an int; Float returns its bit pattern; Boolean returns one of two fixed constants. String uses a more involved calculation over its characters, which guarantees that strings with equal content have equal hash codes.

(2) equals(): the equals() methods of the eight wrapper classes compare the wrapped values directly. The equals() method of String compares the contents of the strings.
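The following sketch prints a few wrapper hash codes and re-implements the documented String formula, s[0]*31^(n-1) + ... + s[n-1], to show that it matches String.hashCode(). The class name WrapperHashSketch and the helper manualStringHash are hypothetical.

```java
// Hypothetical demo of wrapper and String hash codes.
class WrapperHashSketch {
    // Re-implements the documented String hash: h = 31 * h + nextChar, starting from 0.
    static int manualStringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(Integer.valueOf(42).hashCode());     // 42: the value itself
        System.out.println(Character.valueOf('a').hashCode());  // 97: the char code
        System.out.println(Long.valueOf(1L << 32).hashCode());  // 1: high and low halves are XORed
        System.out.println(Double.valueOf(1.0).hashCode());     // 1072693248: derived from the bit pattern
        System.out.println(Boolean.TRUE.hashCode());            // 1231
        System.out.println(Boolean.FALSE.hashCode());           // 1237
        System.out.println(manualStringHash("ok"));             // 3548, same as "ok".hashCode()
    }
}
```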

6, Does the hashcode value change before and after the JVM GC?

This answers a question from a reader in the comment section: an object's storage location changes after GC, so does its hashcode change too? If a user thread obtains the object's hashcode before a GC, will the object no longer be found by that hashcode afterwards? The answer is no.

As mentioned earlier, when hashCode() is not overridden, the hashcode is commonly described as being derived from the object's memory address. Moreover, the hashCode() method of java.lang.Object has three conventions:

First, as long as the fields used by an object's equals() method remain unchanged, repeated calls to hashCode() must return the same value.
Second, if two objects are equal according to equals(Object o), their hashCode() values must be equal.
Third, if two objects are not equal according to equals(Object o), their hashCode() values are not required to differ, but producing different hashcodes in that case improves performance.

We know that when the JVM performs GC, the memory address of an object changes under both the mark-copy algorithm and the mark-compact algorithm, yet the hashcode has to stay the same. How does the JVM achieve this?

Before hashCode() is called, the bits in the object header reserved for the hashcode are 0. When hashCode() is called for the first time, the hashcode value is computed and stored in the object header. On later calls, the stored hashcode is read directly from the object header.

This ensures that the hashcode value is unaffected even if a GC moves the object to a different address. If hashCode() was called before the GC, the value is already stored, so a change of address does not matter. If hashCode() is first called after the GC, the value is simply computed and stored at that point.
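The stability of the identity hash code can also be checked without any extra library, using System.identityHashCode(). The sketch below is an illustration only: System.gc() is merely a request, so the object may or may not actually move, and the class names IdentityHashSketch and Key are hypothetical.

```java
// Hypothetical demo: the identity hash code stays the same across a GC request.
class IdentityHashSketch {
    // Key overrides hashCode(), so its identity hash is only reachable via System.identityHashCode().
    static final class Key {
        final int id;

        Key(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return id;
        }
    }

    static boolean stableAcrossGc(Object o) {
        int before = System.identityHashCode(o);
        System.gc();   // a hint only; the JVM decides whether and how to collect
        return before == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        Object plain = new Object();
        // Without an override, hashCode() and identityHashCode() agree.
        System.out.println(plain.hashCode() == System.identityHashCode(plain)); // true
        System.out.println(stableAcrossGc(plain));                              // true

        Key key = new Key(7);
        System.out.println(key.hashCode());       // 7: the overridden value
        System.out.println(stableAcrossGc(key));  // true
    }
}
```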

(1) Code validation:

The following simple code checks the memory address and hashcode value before and after GC. First, add the JOL dependency to the project:

<dependency>
    <groupId>org.openjdk.jol</groupId>
    <artifactId>jol-core</artifactId>
    <version>0.10</version>
</dependency>

The verification code is as follows:

import org.openjdk.jol.vm.VM;

public class GcHashCodeTest {
    public static void main(String[] args) {
        Object obj = new Object();
        long address = VM.current().addressOf(obj);
        long hashCode = obj.hashCode();
        System.out.println("Before GC - memory address: " + address);
        System.out.println("Pre GC hashcode value: " + hashCode);

        // Allocate some garbage, then request a GC (System.gc() is only a hint)
        new Object();
        new Object();
        new Object();
        System.gc();

        long afterAddress = VM.current().addressOf(obj);
        long afterHashCode = obj.hashCode();
        System.out.println("After GC - memory address: " + afterAddress);
        System.out.println("Post GC hashcode value: " + afterHashCode);
        System.out.println("---------------------");

        System.out.println("Memory address = " + (address == afterAddress));
        System.out.println("hashcode = " + (hashCode == afterHashCode));
    }
}

Output results:

Before GC - memory address: 31883104632
Pre GC hashcode value: 331844619
After GC - memory address: 29035177568
Post GC hashcode value: 331844619

Memory address = false
hashcode = true

The way the hashcode is stored was described above. Let's verify it by observing how the object header changes:

import org.openjdk.jol.info.ClassLayout;

public class ObjectHeaderTest {
    public static void main(String[] args) {
        // Create an object and print its layout in the JVM
        Object person = new Object();
        System.out.println(ClassLayout.parseInstance(person).toPrintable());

        // Call hashCode(). If the class overrides hashCode(), call System.identityHashCode() instead
        System.out.println(person.hashCode());
        // System.out.println(System.identityHashCode(person));

        // Print the object's layout again
        System.out.println(ClassLayout.parseInstance(person).toPrintable());
    }
}

Execution results:

java.lang.Object object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           e5 01 00 f8 (11100101 00000001 00000000 11111000) (-134217243)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
 
863831416
java.lang.Object object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           01 78 05 7d (00000001 01111000 00000101 01111101) (2097510401)
      4     4        (object header)                           33 00 00 00 (00110011 00000000 00000000 00000000) (51)
      8     4        (object header)                           e5 01 00 f8 (11100101 00000001 00000000 11111000) (-134217243)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

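Comparing the two dumps, the value in the row with OFFSET 0 changes from 1 to 2097510401 after hashCode() is called, and the row with OFFSET 4 changes from 0 to 51. The header now contains the hashcode: the bytes 78 05 7d 33 at offsets 1 to 4, read in little-endian order, are 0x337d0578, which is 863831416, the value printed between the two dumps. If hashCode() is never called, nothing is stored there.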
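The byte arithmetic can be double-checked with a few lines of Java. The sketch assumes what the dump suggests, namely a little-endian machine where the hashcode occupies header bytes 1 to 4; other JVMs or platforms may lay the header out differently. The class name HeaderBytesSketch and the helper decodeLittleEndian are hypothetical.

```java
// Hypothetical helper that reassembles an int from four bytes in memory order.
class HeaderBytesSketch {
    // b0 is the byte at the lowest address (least significant on little-endian machines).
    static int decodeLittleEndian(int b0, int b1, int b2, int b3) {
        return (b0 & 0xFF)
                | (b1 & 0xFF) << 8
                | (b2 & 0xFF) << 16
                | (b3 & 0xFF) << 24;
    }

    public static void main(String[] args) {
        // Header bytes at offsets 1..4 from the second dump: 78 05 7d 33
        int decoded = decodeLittleEndian(0x78, 0x05, 0x7d, 0x33);
        System.out.println(decoded);                       // 863831416
        System.out.println(Integer.toHexString(decoded));  // 337d0578
    }
}
```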

--------
Copyright notice: This is the original article of CSDN blogger "Zhang weipeng", which follows the CC 4.0 BY-SA copyright agreement. Please attach the original source link and this notice for reprint.
Original link: https://blog.csdn.net/a745233700/article/details/83186808

Tags: Java Algorithm

Posted on Sat, 02 Oct 2021 21:28:05 -0400 by JC99