[code without size] implement a HashMap

LeetCode problem

This problem is LeetCode's "Design HashMap". If you think it through, it is very simple: just write a basic HashMap class that follows the design principles of a hash table.

A simple HashMap needs to support three methods:

void put(K key, V value)
V get(K key)
void remove(K key)

To implement the map as an array of linked lists of hash nodes, you also need to choose the capacity of the array and design a node class, HashNode. Here is my solution to this problem (very simple):

```java
class MyHashMap {
    int capacity;
    HashNode[] hashList;

    public MyHashMap() {
        this.capacity = 1000;
        this.hashList = new HashNode[this.capacity];
    }

    public void put(int key, int value) {
        int i = key % this.capacity; // keys are non-negative in this problem, so i is a valid index
        HashNode node = this.hashList[i];
        HashNode pre = null;
        // Walk the bucket's list until the key is found or the list ends
        while (node != null && node.key != key) {
            pre = node;
            node = node.next;
        }
        if (node == null) {
            // Key absent: insert as the bucket head or append after the last node
            if (pre == null) {
                this.hashList[i] = new HashNode(key, value);
            } else {
                pre.next = new HashNode(key, value);
            }
        } else {
            node.value = value; // key present: update in place
        }
    }

    public int get(int key) {
        int i = key % this.capacity;
        HashNode node = this.hashList[i];
        while (node != null && node.key != key) {
            node = node.next;
        }
        if (node == null) {
            return -1; // the problem specifies -1 for a missing key
        } else {
            return node.value;
        }
    }

    public void remove(int key) {
        int i = key % this.capacity;
        HashNode node = this.hashList[i];
        HashNode pre = null;
        while (node != null && node.key != key) {
            pre = node;
            node = node.next;
        }
        if (node != null && pre != null) {
            pre.next = node.next; // unlink a middle or tail node
        } else if (node != null) {
            this.hashList[i] = node.next; // unlink the bucket head
        }
    }
}

class HashNode {
    int key;
    int value;
    HashNode next;

    public HashNode(int key, int value) {
        this.key = key;
        this.value = value;
    }
}

/**
 * Your MyHashMap object will be instantiated and called as such:
 * MyHashMap obj = new MyHashMap();
 * obj.put(key,value);
 * int param_2 = obj.get(key);
 * obj.remove(key);
 */
```

Java source code (JDK14)

Implementing a simple hash table is that easy, but it is worth studying how the Java source code does it. Let's analyze the official code (see the official source for the complete code; only the important parts are excerpted here):

HashMap class Javadoc

HashMap is a hash table based implementation of the Map interface. It provides all of the optional map operations and permits null values and the null key.
HashMap is roughly equivalent to Hashtable, except that HashMap is unsynchronized and permits null keys and values.
HashMap makes no guarantees about iteration order; in particular, the order may change over time.
get and put run in constant time, assuming the hash function disperses the elements properly among the buckets.
Iteration time is proportional to the capacity plus the number of entries, so if iteration performance matters, do not set the initial capacity too high or the load factor too low.
When the number of entries exceeds load factor * current capacity, the table is rehashed and expanded to twice its size.
The default load factor is 0.75, a tradeoff between time and space costs. Higher values reduce the space overhead but increase the lookup cost.
Remember: HashMap is not synchronized. If multiple threads access a HashMap concurrently and at least one of them modifies it structurally, it must be synchronized externally. A structural modification is one that adds or deletes a mapping; merely changing the value associated with a key already in the map is not structural. In practice, it is best to wrap the HashMap in a thread-safe object; Collections.synchronizedMap is a good tool for this.
Do not structurally modify (add or delete mappings in) a HashMap while iterating over it, except through the iterator's own remove method; the iterators are fail-fast and throw ConcurrentModificationException on a best-effort basis.
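To see the structural vs. non-structural distinction and the iterator rule in action, here is a small single-threaded sketch using only standard java.util APIs. The class and method names (FailFastDemo and its helpers) are made up for illustration. Fail-fast detection is best-effort in general, but in this deterministic single-threaded case, adding a key during a for-each loop does trigger ConcurrentModificationException, while overwriting an existing key's value does not.

```java
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    // Adding a new key while iterating is a structural modification.
    static boolean structuralChangeDuringIterationFails() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.put(key + "!", 0); // inserts a mapping: modCount changes
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the iterator noticed modCount != expectedModCount
        }
    }

    // Replacing the value of an existing key is not structural.
    static boolean valueUpdateDuringIterationSucceeds() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        for (String key : map.keySet()) {
            map.put(key, 42); // same key, new value: modCount unchanged
        }
        return map.get("a") == 42 && map.get("b") == 42;
    }

    // Deleting through the iterator itself is the supported way to remove while iterating.
    static int removeEvenValuesWithIterator() {
        Map<String, Integer> map = new HashMap<>(Map.of("a", 1, "b", 2, "c", 3, "d", 4));
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 0) {
                it.remove();
            }
        }
        return map.size();
    }

    // Individual calls on the wrapper are synchronized, but iteration must hold its lock.
    static int sumWithSynchronizedMap() {
        Map<String, Integer> sync = Collections.synchronizedMap(new HashMap<>());
        sync.put("x", 1);
        sync.put("y", 2);
        int sum = 0;
        synchronized (sync) {
            for (int v : sync.values()) {
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(structuralChangeDuringIterationFails()); // true
        System.out.println(valueUpdateDuringIterationSucceeds());   // true
        System.out.println(removeEvenValuesWithIterator());         // 2
        System.out.println(sumWithSynchronizedMap());               // 3
    }
}
```

Note that Collections.synchronizedMap only makes each individual call atomic; compound actions such as iteration still need the explicit `synchronized` block shown in the last helper.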

Class declaration

```java
public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable
```

Implementation notes

HashMap normally works as a bucketed (binned) hash table, but when a bucket grows too large it is converted into a bucket of TreeNodes, each structured similarly to those in java.util.TreeMap. When such a tree becomes too small, it is converted back into a plain linked-list bucket.
Ideally, with well-distributed hash codes, the number of nodes in a bucket follows a Poisson distribution with a parameter of about 0.5 under the default load factor of 0.75. Ignoring variance, the probability that a bucket holds 8 or more nodes is below 1 in 1,000,000 (the JDK comment lists about 0.00000006 for exactly 8). This is why 8 is used as the treeify threshold: such long buckets are expected only with poor hashCode() implementations, and the analysis holds as long as the load factor stays near its default of 0.75.
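The probabilities behind this argument can be reproduced with a few lines of arithmetic. This is a sketch assuming the idealized model above, where the number of nodes in a bucket is X ~ Poisson(λ = 0.5); PoissonBins is a hypothetical helper name, not part of the JDK.

```java
class PoissonBins {
    // P(X = k) = e^(-lambda) * lambda^k / k!, built up term by term
    static double probability(double lambda, int k) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) {
            p *= lambda / i;
        }
        return p;
    }

    // P(X >= k), summed directly instead of 1 - P(X < k) to avoid cancellation;
    // terms beyond k + 50 are negligible for small lambda.
    static double tailProbability(double lambda, int k) {
        double sum = 0.0;
        for (int i = k; i <= k + 50; i++) {
            sum += probability(lambda, i);
        }
        return sum;
    }

    public static void main(String[] args) {
        double lambda = 0.5; // average bucket size assumed in the JDK comment
        for (int k = 0; k <= 8; k++) {
            System.out.printf("P(X = %d) = %.8f%n", k, probability(lambda, k));
        }
        System.out.printf("P(X >= 8) = %.2e%n", tailProbability(lambda, 8));
    }
}
```

The first line printed is about 0.60653066, and the tail for 8 or more nodes comes out around 6e-8, consistent with the "less than 1 in 1,000,000" bound above.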

Important constants

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; The default initial capacity (16); must be a power of two.
static final int MAXIMUM_CAPACITY = 1 << 30; The maximum capacity; also a power of two.
static final float DEFAULT_LOAD_FACTOR = 0.75f; The default load factor.
static final int TREEIFY_THRESHOLD = 8; The bucket size threshold for treeification: a list bucket is converted to a tree when a node is added to a bucket that already has at least this many nodes.
static final int UNTREEIFY_THRESHOLD = 6; The threshold for converting a tree back into a list. It is used only during resize: if rehashing leaves a tree bucket with this many nodes or fewer, it degenerates back into a linked list.
static final int MIN_TREEIFY_CAPACITY = 64; The minimum table capacity at which buckets may be treeified; below it, an overfull bucket causes a resize instead. It should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts between the resizing and treeification thresholds.
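How these constants interact can be condensed into a small sketch. TreeifyRules and onAppend are hypothetical names; the sketch only mirrors the check in putVal (appending to a list bucket that already holds TREEIFY_THRESHOLD nodes calls treeifyBin) and the check in treeifyBin (a table shorter than MIN_TREEIFY_CAPACITY is resized instead), and ignores everything else HashMap does.

```java
class TreeifyRules {
    // Values copied from the constants listed above.
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    // Resize threshold: capacity * load factor, truncated to an int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Hypothetical summary of what happens when a node is appended to a list bucket
    // that already holds existingNodes nodes, in a table of tableLength buckets.
    static String onAppend(int existingNodes, int tableLength) {
        if (existingNodes < TREEIFY_THRESHOLD) {
            return "keep list";
        }
        // treeifyBin prefers growing a small table over building a tree
        return tableLength < MIN_TREEIFY_CAPACITY ? "resize" : "treeify";
    }

    public static void main(String[] args) {
        System.out.println(threshold(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR)); // 12
        System.out.println(onAppend(7, 64)); // keep list
        System.out.println(onAppend(8, 16)); // resize
        System.out.println(onAppend(8, 64)); // treeify
    }
}
```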

Bucket node class

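It is very simple: besides the constructor, it only has the Map.Entry accessors (getKey, getValue, setValue) and overrides of the Object methods toString, hashCode and equals.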

```java
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
```

hash function

This function is used to calculate the hash value of the key in HashMap:

```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

As you can see, instead of using key.hashCode() directly as the key's hash, it XORs the high 16 bits of hashCode() into the low 16 bits (h ^ (h >>> 16)); the high 16 bits stay unchanged, and a null key always hashes to 0. This operation is commonly called a perturbation function.

The reason for perturbing the hash is that Object.hashCode() returns a 32-bit int that may be well spread over the whole int range but not once it is reduced to a small table (the capacity is limited, e.g. the default of 16). When indexing into a table of capacity 16, only the lowest 4 bits of the hash are used, so hash codes that differ only in their high bits would all collide. Folding the high 16 bits into the low 16 bits with XOR lets all 32 bits influence the index, which reduces the risk of collisions.
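A concrete case makes the collision problem visible. The sketch below assumes a table of capacity 16 and two made-up hash codes that differ only in their upper 16 bits; spread() uses the same expression as HashMap.hash() for a non-null key, and HashSpread is a hypothetical class name.

```java
class HashSpread {
    // Same expression as HashMap.hash() applied to a non-null key's hashCode.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a power-of-two table length n.
    static int index(int hash, int n) {
        return hash & (n - 1);
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x00010000; // differs from h2 only in the upper 16 bits
        int h2 = 0x00020000;
        System.out.println(index(h1, n) + " " + index(h2, n));                 // 0 0: collision
        System.out.println(index(spread(h1), n) + " " + index(spread(h2), n)); // 1 2
    }
}
```

Without spreading, both hashes land in bucket 0 because their low 4 bits are identical; after spreading, the upper bits leak into the index and the keys separate.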

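Member variables

transient Node<K,V>[] table; The bucket array, allocated on first use; its length is always a power of two.
transient Set<Map.Entry<K,V>> entrySet; Cached entrySet() view.
transient int size; The number of key-value mappings.
transient int modCount; The (structural) modification counter, used to make iterators fail-fast.
int threshold; The resize threshold (capacity * load factor).
final float loadFactor; The load factor.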

constructor

```java
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
```

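Here, tableSizeFor computes the initial threshold. It uses bit operations and is very efficient: given a number cap, it returns the smallest power of two that is not less than cap. Note that the constructor only stores this value in threshold temporarily; the table is allocated lazily, and the first resize() uses it as the initial capacity.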

```java
static final int tableSizeFor(int cap) {
    int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```
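To check what tableSizeFor returns, the method can be copied into a standalone class. TableSize is a hypothetical wrapper; the method body is the JDK 14 code shown above, with comments added. The n < 0 branch handles cap <= 1: for cap == 1, numberOfLeadingZeros(0) is 32, and Java masks an int shift distance to its low 5 bits, so -1 >>> 32 is still -1.

```java
class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // JDK 14 tableSizeFor body: smallest power of two >= cap, clamped to [1, MAXIMUM_CAPACITY].
    static int tableSizeFor(int cap) {
        // For cap >= 2, -1 >>> numberOfLeadingZeros(cap - 1) is an all-ones mask covering
        // every bit of cap - 1; adding 1 yields the next power of two.
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] caps = {0, 1, 10, 16, 17, 1000};
        for (int cap : caps) {
            // prints 1, 1, 16, 16, 32, 1024 for these inputs
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

Subtracting 1 before computing the mask is what keeps exact powers of two (such as 16) from being rounded up to the next one.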

Size() and isEmpty()

Returns the number of key-value mappings. There's nothing more to say.

```java
public int size() {
    return size;
}
```

Checks whether the map is empty:

```java
public boolean isEmpty() {
    return size == 0;
}
```

get method

```java
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
```

The get method first calls hash() to compute the key's hash value, then looks up the corresponding node through getNode(). If a node is found, its value is returned; otherwise null is returned. hash() applies the perturbation described above, XORing the upper 16 bits of key.hashCode() into the lower 16 bits (the exact scheme differs between JDK versions).

Looking more closely at getNode, several details stand out:

The entry condition: the table is non-null, its length is greater than 0, and the first node of the bucket for this hash is non-null.
The bucket index is hash modulo n, computed with the bit operation hash & (n - 1) for speed. This works because n is always a power of two: for non-negative hashes it equals hash % n, and for negative hashes it still yields a valid index, which % would not.
hash & (n - 1) locates the bucket; the full hash (together with the key) is then used to check whether two keys are the same.
The first node of each bucket is always checked first, and returned immediately if it holds the requested key.
A node e is the target when e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))). Both the hash and the key must match: the hash comes from hashCode() (after perturbation), and keys are equal if they are the same reference or equals() returns true. Therefore, to use instances of a user-defined class as HashMap keys, the class must override hashCode() and equals() consistently.
If the first node is a TreeNode, the tree lookup getTreeNode() is used instead of walking the list.
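The point about hashCode() and equals() is easy to get wrong. The sketch below contrasts a key class that overrides neither method with one that overrides both; BadPoint, GoodPoint and KeyDemo are made-up names. With BadPoint, a second instance with the same coordinates is a different key (equality falls back to identity), so the lookup returns null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Overrides neither method: equals() and hashCode() fall back to object identity.
class BadPoint {
    final int x, y;
    BadPoint(int x, int y) { this.x = x; this.y = y; }
}

// Overrides both consistently: equal points always have equal hash codes.
class GoodPoint {
    final int x, y;
    GoodPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GoodPoint)) return false;
        GoodPoint p = (GoodPoint) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class KeyDemo {
    static String lookupBad() {
        Map<BadPoint, String> map = new HashMap<>();
        map.put(new BadPoint(1, 2), "found");
        return map.get(new BadPoint(1, 2)); // a different instance is a different key
    }

    static String lookupGood() {
        Map<GoodPoint, String> map = new HashMap<>();
        map.put(new GoodPoint(1, 2), "found");
        return map.get(new GoodPoint(1, 2)); // same hash, and equals() returns true
    }

    public static void main(String[] args) {
        System.out.println(lookupBad());  // null
        System.out.println(lookupGood()); // found
    }
}
```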

put method

The put method delegates to putVal to insert or update. Points to note:

1. **The put method has a return value.** If the key was absent (an insertion), it returns null; if the key was present (an update), it returns the old value, which may itself be null.
2. The resize() method initializes the table or expands an existing one; it is discussed below.
3. If the table is null or has length 0, resize() is called to initialize it.
4. If the bucket at (n - 1) & hash is empty, a new node is created there.
5. If neither 3 nor 4 applies, the bucket is searched, again matching on equal hash and equal key. If the first node matches, it is recorded. Otherwise, if the first node is a TreeNode, putTreeVal() handles the tree insertion. Otherwise the linked list is traversed: a matching node is recorded; if none matches, a new node is appended, and if the bucket already held TREEIFY_THRESHOLD nodes, treeifyBin() is called.
6. If a matching node was found, the update logic runs: the value is replaced (unless onlyIfAbsent is true and the old value is non-null) and the old value is returned. Otherwise, modCount and size are incremented, and resize() is called if size now exceeds threshold.

```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```
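The return-value rules and the onlyIfAbsent flag can be observed through the public API: put passes onlyIfAbsent = false, while HashMap.putIfAbsent calls putVal with onlyIfAbsent = true. PutReturnValues is a hypothetical class name; the expected values follow from the putVal code above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutReturnValues {
    static List<Integer> trace() {
        Map<String, Integer> map = new HashMap<>();
        List<Integer> out = new ArrayList<>();
        out.add(map.put("a", 1));         // null: "a" was absent, so this is an insert
        out.add(map.put("a", 2));         // 1: update, old value returned
        out.add(map.putIfAbsent("a", 3)); // 2: onlyIfAbsent keeps the existing non-null value
        out.add(map.get("a"));            // 2
        map.put("b", null);
        out.add(map.putIfAbsent("b", 4)); // null: a null old value is still overwritten
        out.add(map.get("b"));            // 4
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // [null, 1, 2, 2, null, 4]
    }
}
```

The last two lines show the `oldValue == null` clause in putVal: a key mapped to null counts as "absent" for putIfAbsent.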

As for the resize() method, it creates a table twice as large as the original and redistributes the existing bucket lists into it, re-indexing with hash & (newCap - 1) instead of hash & (oldCap - 1). Because newCap is exactly 2 * oldCap, each node in bucket j either stays at index j or moves to j + oldCap, depending on whether the bit hash & oldCap is set; this is why the code splits each list into a "lo" list and a "hi" list instead of recomputing every index. Tree buckets are split with TreeNode.split(). The code is a bit long, so I won't analyze it line by line.

```java
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```
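The lo/hi split relies on the fact that doubling a power-of-two capacity adds exactly one bit (oldCap) to the index mask. The sketch below checks that the index chosen by the split equals the index computed directly with the new mask; ResizeSplit and its methods are hypothetical names, and the pseudo-random hashes are only test inputs.

```java
import java.util.Random;

class ResizeSplit {
    // Index in the doubled table computed directly with the new mask.
    static int directIndex(int hash, int oldCap) {
        return hash & ((oldCap << 1) - 1);
    }

    // Index chosen by the lo/hi split: stay at j, or move to j + oldCap.
    static int splitIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1); // old bucket index
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    // Compares both computations on pseudo-random hashes (fixed seed for repeatability).
    static boolean agree(int oldCap, int samples) {
        Random rnd = new Random(42);
        for (int i = 0; i < samples; i++) {
            int h = rnd.nextInt();
            if (directIndex(h, oldCap) != splitIndex(h, oldCap)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(splitIndex(5, 16));  // 5: bit 16 clear, stays in the lo list
        System.out.println(splitIndex(21, 16)); // 21: bit 16 set, moves to 5 + 16
        System.out.println(agree(16, 100_000)); // true
    }
}
```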

remove method

The remove method is similar to put: put locates a node to update (or inserts one), while remove locates a node and unlinks it. removeNode also takes a matchValue flag, which remove(key, value) uses to delete the mapping only if the value matches as well.

```java
public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
```
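Both entry points into removeNode can be exercised through the public API: remove(key) passes matchValue = false, while remove(key, value) passes matchValue = true and removes the mapping only if the value also matches. RemoveDemo is a hypothetical class name used for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RemoveDemo {
    static List<Object> trace() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        List<Object> out = new ArrayList<>();
        out.add(map.remove("missing")); // null: no node found
        out.add(map.remove("a", 2));    // false: key found but value does not match
        out.add(map.remove("a", 1));    // true: key and value both match
        out.add(map.containsKey("a"));  // false
        out.add(map.remove("b"));       // 2: the removed node's value
        out.add(map.size());            // 0
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // [null, false, true, false, 2, 0]
    }
}
```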

clear()

clear() is simple and blunt: it increments modCount and sets every bucket to null. The table array itself is kept, so the capacity does not shrink.

```java
public void clear() {
    Node<K,V>[] tab;
    modCount++;
    if ((tab = table) != null && size > 0) {
        size = 0;
        for (int i = 0; i < tab.length; ++i)
            tab[i] = null;
    }
}
```

containsKey method

```java
public boolean containsKey(Object key) {
    return getNode(hash(key), key) != null;
}
```

It calls getNode to find the node for the key; if the node is non-null, the key exists.
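One reason to prefer containsKey over get(key) != null: since HashMap permits null values, get returns null both for a missing key and for a key mapped to null. containsKey checks whether the node exists, so it tells the two cases apart. ContainsKeyDemo is a hypothetical class name used for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class ContainsKeyDemo {
    static boolean[] check() {
        Map<String, String> map = new HashMap<>();
        map.put("k", null); // a null value is allowed
        map.put(null, "v"); // so is a single null key (hash(null) == 0)
        return new boolean[] {
            map.get("k") == null, // true: looks the same as a missing key
            map.get("x") == null, // true
            map.containsKey("k"), // true: the node exists
            map.containsKey("x"), // false
            map.containsKey(null) // true
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(check())); // [true, true, true, false, true]
    }
}
```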

containsValue()

If you picture the hash table as a two-dimensional node matrix, where the table stores the head of each bucket's linked list, then checking whether a value exists is a nested traversal with no hashing shortcut: in the worst case every bucket and every node is visited. The cost is O(capacity + size), linear in the table length plus the number of entries, rather than the constant time of containsKey.

```java
public boolean containsValue(Object value) {
    Node<K,V>[] tab; V v;
    if ((tab = table) != null && size > 0) {
        for (Node<K,V> e : tab) {
            for (; e != null; e = e.next) {
                if ((v = e.value) == value ||
                    (value != null && value.equals(v)))
                    return true;
            }
        }
    }
    return false;
}
```