Data structure -- sequential structure of binary tree -- heap

Preface

We have already covered the basic concepts and structure of trees and binary trees. We have also covered the two ways to store a binary tree: the sequential structure and the chain (linked) structure. If you have not read that yet, see the previous article on the basic structure and concepts of trees and binary trees.
This time, let's learn about the sequential structure of a binary tree, the heap, together with heap sort and the TOP-K problem.

1, Sequential structure of binary tree

Ordinary binary trees are not well suited to array storage, because the slots reserved for missing nodes can waste a lot of space. A complete binary tree has no such gaps, so it fits sequential (array) storage well.
In practice, a heap (which is a complete binary tree) is usually stored in an array. Note that this heap is unrelated to the heap region in a process's virtual address space. The first is a data structure; the second is an area of memory managed by the operating system.
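Array storage works for a complete binary tree because parent and child positions can be computed from the index alone. Assume 0-based indexing, as in all the code in this article. Then the children of index i sit at 2i+1 and 2i+2, and the parent of any index i > 0 sits at (i-1)/2. A minimal sketch follows. The function names are hypothetical helpers, not part of the original code.

```c
#include <stdio.h>

/* Index arithmetic for a complete binary tree stored in an array (0-based).
   Hypothetical helper names, for illustration only. */
int ParentIndex(int child) { return (child - 1) / 2; }
int LeftChildIndex(int parent) { return parent * 2 + 1; }
int RightChildIndex(int parent) { return parent * 2 + 2; }

/* Print each node together with the children that exist in an n-element array */
void PrintTreeLinks(const int* a, int n)
{
	for (int i = 0; i < n; i++)
	{
		int left = LeftChildIndex(i);
		int right = RightChildIndex(i);
		printf("a[%d]=%d", i, a[i]);
		if (left < n)
			printf("  left a[%d]=%d", left, a[left]);
		if (right < n)
			printf("  right a[%d]=%d", right, a[right]);
		printf("\n");
	}
}
```

Because every level except the last is full, no index in 0..n-1 is ever left empty. That is exactly why complete binary trees do not waste array space.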

2, Concept and structure of heap

Suppose a key set $K = \{k_0, k_1, k_2, \ldots, k_{n-1}\}$ has all its elements stored in a one-dimensional array in the order of a complete binary tree. If $k_i \le k_{2i+1}$ and $k_i \le k_{2i+2}$ (respectively $k_i \ge k_{2i+1}$ and $k_i \ge k_{2i+2}$) for $i = 0, 1, 2, \ldots$, wherever those children exist, the set is called a small heap (respectively a large heap). A heap whose root is the largest element is called a maximum heap or large root heap. A heap whose root is the smallest element is called a minimum heap or small root heap. A heap therefore has two properties:

  • A node's value is always no greater than its parent's (large heap), or always no less than its parent's (small heap);
  • Heap is always a complete binary tree.
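The definition above can be checked directly. Walk over every parent index i and compare a[i] with a[2i+1] and a[2i+2] where they exist. This is a minimal sketch; `IsSmallHeap` and `IsLargeHeap` are hypothetical helper names, not from the original article.

```c
#include <stdbool.h>

/* True if a[0..n-1] is a small heap: every parent is <= each of its children */
bool IsSmallHeap(const int* a, int n)
{
	for (int parent = 0; parent * 2 + 1 < n; parent++)
	{
		int left = parent * 2 + 1;
		int right = left + 1;
		if (a[parent] > a[left])
			return false;
		if (right < n && a[parent] > a[right])
			return false;
	}
	return true;
}

/* True if a[0..n-1] is a large heap: every parent is >= each of its children */
bool IsLargeHeap(const int* a, int n)
{
	for (int parent = 0; parent * 2 + 1 < n; parent++)
	{
		int left = parent * 2 + 1;
		int right = left + 1;
		if (a[parent] < a[left])
			return false;
		if (right < n && a[parent] < a[right])
			return false;
	}
	return true;
}
```

Arrays with zero or one element pass both checks, which matches the idea that a single node is both a small and a large heap.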

3, Implementation of heap

1. Downward adjustment algorithm

Now take an array and view it logically as a complete binary tree. Starting from the root node, the downward adjustment algorithm can turn it into a small heap. The algorithm has one premise: the left and right subtrees of the node being adjusted must already be heaps of the same kind.
Suppose we have the array int array[] = {27,15,19,18,28,34,65,49,25,37} and want to turn it into a small heap. The left and right subtrees of 27 (array[0]) are already small heaps, so only 27 needs to be adjusted downward.

The code is as follows:

//Downward adjustment
void AdjustDown(HPDataType* a, int n, int parent)
{
	int child = parent * 2 + 1;//Start with the left child and assume it is the smaller one
	while (child < n)
	{
		if (child + 1 < n && a[child + 1] < a[child])//If the right child exists and is smaller, use it instead
		{
			child++;
		}
		if (a[child] < a[parent])
		{
			swap(&a[parent], &a[child]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}


In short, downward adjustment compares a parent with the smaller (larger) of its two children. If the parent is larger (smaller) than that child, they swap, and the parent continues from the child's position. The adjustment ends once the parent is no larger (no smaller) than both children, or has no children. At that point, the tree rooted at the starting node is a small (large) heap.
The child's subscript is computed from the parent's: child = parent * 2 + 1 is the left child. We first assume the left child is the smaller (larger) one. If a right child exists and is smaller (larger), child is moved to it. (The physical structure is an array, but the logical structure is a complete binary tree.)
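Here is a self-contained way to run the adjustment on the example array. It assumes `HPDataType` is `int`, as the article's `%d` printing suggests, and repeats `swap` and `AdjustDown` so the snippet compiles on its own. `PrintArray` is a hypothetical helper.

```c
#include <stdio.h>

typedef int HPDataType;  /* assumption: the heap stores int */

void swap(HPDataType* x, HPDataType* y)
{
	HPDataType tmp = *x;
	*x = *y;
	*y = tmp;
}

/* Small-heap downward adjustment, same logic as the code above */
void AdjustDown(HPDataType* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && a[child + 1] < a[child])
			child++;  /* pick the smaller child */
		if (a[child] < a[parent])
		{
			swap(&a[parent], &a[child]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

void PrintArray(const HPDataType* a, int n)
{
	for (int i = 0; i < n; i++)
		printf("%d ", a[i]);
	printf("\n");
}
```

Calling `AdjustDown(array, 10, 0)` on {27,15,19,18,28,34,65,49,25,37} moves 27 down three levels (swapping with 15, then 18, then 25). `PrintArray` then prints `15 18 19 25 28 34 65 49 27 37`.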

2. Creation of heap

As mentioned above, building a large (or small) heap with downward adjustment requires that the left and right subtrees of each node being adjusted are already heaps of that kind. Now suppose we are given an arbitrary array, for example int a[] = {1,5,3,8,7,6}, and need to build it into a large heap.

Drawing the tree shows that the subtrees rooted at 1, 5 and 3 are not large heaps, so we cannot simply adjust the root. How should we proceed? First, a tree with only one node can be treated as either a large heap or a small heap, because its left and right subtrees are empty. So the leaves 8, 7 and 6 are already large heaps, while the subtrees rooted at 3, 5 and 1 are not. We therefore adjust the subtree 3-6 into a large heap first, then the subtree 5-8-7. Finally, both subtrees of 1 are large heaps, so we only need to adjust 1 itself. In general, we run downward adjustment on every non-leaf node, from the last one back to the root. Note that the AdjustDown shown earlier is the small-heap version. To build a large heap, its two < comparisons between elements become >.

void heap(int* a, int n)
{
	for (int i = (n - 2) / 2; i >= 0; i--)
	{
		AdjustDown(a, n, i);//Start from the last non-leaf node (the parent of a[n-1]) and work back to the root
	}
}

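To check the walkthrough for {1,5,3,8,7,6}, here is a sketch that builds a large heap bottom-up. It uses a large-heap variant of downward adjustment, with the element comparisons flipped. `AdjustDownLarge` and `BuildLargeHeap` are hypothetical names.

```c
void Swap(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}

/* Large-heap downward adjustment: the parent sinks while its larger child beats it */
void AdjustDownLarge(int* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && a[child + 1] > a[child])
			child++;  /* pick the larger child */
		if (a[child] > a[parent])
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

/* Adjust every non-leaf node, from the last one, (n - 2) / 2, back to the root */
void BuildLargeHeap(int* a, int n)
{
	for (int i = (n - 2) / 2; i >= 0; i--)
		AdjustDownLarge(a, n, i);
}
```

For {1,5,3,8,7,6} the steps are as follows. Index 2 swaps 3 with 6, giving {1,5,6,8,7,3}. Index 1 swaps 5 with 8, giving {1,8,6,5,7,3}. Index 0 sinks 1 past 8 and then 7. The final large heap is {8,7,6,5,1,3}.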

4, Overall code of the heap

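#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef int HPDataType;//The heap stores int, matching the %d used in HeapPrint

typedef struct Heap
{
	HPDataType* a;//Array that holds the heap
	int size;//Number of elements currently stored
	int capacity;//Number of elements the array can hold
}Heap;

bool HeapEmpty(Heap* hp);//Declared early because HeapPop and HeapTop use it

//Print data in the heap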
void HeapPrint(Heap* hp)
{
	for (int i = 0; i < hp->size; i++)
	{
		printf("%d ", hp->a[i]);
	}
	printf("\n");
}
//Swap two data in the heap
void swap(HPDataType* a, HPDataType* b)
{
	HPDataType tmp = *a;
	*a = *b;
	*b = tmp;
}

//Upward adjustment
void AdjustUp(HPDataType* a, int child)
{
	assert(a);
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		if (a[child] < a[parent])
		{
			swap(&a[child], &a[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			break;
		}
	}
}

//Downward adjustment
void AdjustDown(HPDataType* a, int n, int parent)
{
	int child = parent * 2 + 1;//Start with the left child and assume it is the smaller one
	while (child < n)
	{
		if (child + 1 < n && a[child + 1] < a[child])//If the right child exists and is smaller, use it instead
		{
			child++;
		}
		if (a[child] < a[parent])
		{
			swap(&a[parent], &a[child]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}
// Heap initialization
void HeapInit(Heap* hp)
{
	assert(hp);
	hp->a = NULL;
	hp->capacity = hp->size = 0;
}
// Heap destruction
void HeapDestory(Heap* hp)
{
	assert(hp);
	free(hp->a);
	hp->a = NULL;
	hp->capacity = hp->size = 0;
}
// Heap insertion
void HeapPush(Heap* hp, HPDataType x)
{
	assert(hp);
	if (hp->capacity == hp->size)
	{
		int newcapacity = hp->capacity == 0 ? 4 : hp->capacity * 2;
		HPDataType* tmp = (HPDataType*)realloc(hp->a, sizeof(HPDataType)*newcapacity);
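		if (tmp == NULL)
		{
			printf("realloc fail\n");
			exit(-1);//Stop here: continuing would set hp->a to NULL (leaking the old block) and then write through it
		}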
		hp->a = tmp;
		hp->capacity = newcapacity;
	}
	hp->a[hp->size] = x;
	hp->size++;
	AdjustUp(hp->a, hp->size - 1);

}
// Deletion of heap top data
void HeapPop(Heap* hp)
{
	assert(hp);
	assert(!HeapEmpty(hp));
	swap(&hp->a[0], &hp->a[hp->size - 1]);
	hp->size--;
	AdjustDown(hp->a, hp->size, 0);
}
// Take the data from the top of the heap
HPDataType HeapTop(Heap* hp)
{
	assert(hp);
	assert(!HeapEmpty(hp));
	return hp->a[0];
}
// Number of data in the heap
int HeapSize(Heap* hp)
{
	assert(hp);
	return hp->size;
}
// Empty judgment of heap
bool HeapEmpty(Heap* hp)
{
	assert(hp);
	return hp->size == 0;
}

Note that insertion uses upward adjustment, not downward adjustment. The new element is appended at the end of the array, and the existing elements already form a small (large) heap. So the heap property can only break on the path from the new element to the root (subscript 0). We compare the new element with its parent and swap while it is smaller (larger), moving up until it is in order or reaches the root.
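Here is an upward-adjustment trace you can run. It starts from the small heap produced in the downward-adjustment example and appends 10 at index 10, which is what HeapPush does before it calls AdjustUp. The snippet assumes `HPDataType` is `int` and repeats `swap` and `AdjustUp` so it compiles on its own.

```c
typedef int HPDataType;  /* assumption: the heap stores int */

void swap(HPDataType* x, HPDataType* y)
{
	HPDataType tmp = *x;
	*x = *y;
	*y = tmp;
}

/* Small-heap upward adjustment, same logic as the article's AdjustUp */
void AdjustUp(HPDataType* a, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		if (a[child] < a[parent])
		{
			swap(&a[child], &a[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			break;
		}
	}
}
```

Starting from {15,18,19,25,28,34,65,49,27,37,10}, the call `AdjustUp(a, 10)` swaps 10 with 28, then with 18, then with 15. The result is {10,15,19,25,18,34,65,49,27,37,28}. Only the nodes on that one path are touched, so insertion costs O(log N).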


Deletion of heap
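Deleting from a heap means removing the data at the top. Swap the top with the last element of the array, remove the last element (size--), and then adjust downward from the root. Swapping keeps the root's left and right subtrees intact as heaps. That is exactly the premise of downward adjustment, so deletion costs only O(log N).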
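To see insertion and deletion working together, here is a compact small-heap sketch. It uses a fixed-capacity array, an assumption made only to keep it short; the article's HeapPush grows the array with realloc instead. All `Mini*` names are hypothetical. Pushing values and then repeatedly reading and popping the top returns them in ascending order.

```c
#include <assert.h>
#include <stdbool.h>

#define MINI_CAPACITY 16  /* hypothetical fixed capacity, for brevity only */

typedef struct
{
	int a[MINI_CAPACITY];
	int size;
} MiniHeap;

void MiniInit(MiniHeap* hp) { hp->size = 0; }
bool MiniEmpty(const MiniHeap* hp) { return hp->size == 0; }

void MiniSwap(int* x, int* y) { int t = *x; *x = *y; *y = t; }

/* Upward adjustment: move a[child] toward the root while it is smaller than its parent */
void MiniAdjustUp(int* a, int child)
{
	while (child > 0)
	{
		int parent = (child - 1) / 2;
		if (a[child] >= a[parent])
			break;
		MiniSwap(&a[child], &a[parent]);
		child = parent;
	}
}

/* Downward adjustment for a small heap of n elements */
void MiniAdjustDown(int* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && a[child + 1] < a[child])
			child++;
		if (a[child] >= a[parent])
			break;
		MiniSwap(&a[child], &a[parent]);
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Insert: append at the end, then adjust upward */
void MiniPush(MiniHeap* hp, int x)
{
	assert(hp->size < MINI_CAPACITY);
	hp->a[hp->size] = x;
	MiniAdjustUp(hp->a, hp->size);
	hp->size++;
}

int MiniTop(const MiniHeap* hp)
{
	assert(!MiniEmpty(hp));
	return hp->a[0];
}

/* Delete the top: swap it with the last element, shrink, then adjust downward */
void MiniPop(MiniHeap* hp)
{
	assert(!MiniEmpty(hp));
	MiniSwap(&hp->a[0], &hp->a[hp->size - 1]);
	hp->size--;
	MiniAdjustDown(hp->a, hp->size, 0);
}
```

Pushing 5, 3, 8, 1, 9, 2 and then taking the top and popping six times yields 1, 2, 3, 5, 8, 9. This is the same idea that heap sort builds on.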

5, Time complexity of heap building

A heap is a complete binary tree, and a full binary tree is also a complete binary tree. To simplify, we analyze a full binary tree. Time complexity is an asymptotic estimate, so a few more or fewer nodes do not change the result:
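One standard way to write out this estimate follows. Let the full binary tree have height h, so n = 2^h - 1. Level i (i = 1, ..., h) holds 2^{i-1} nodes, and a node on level i can move down at most h - i levels during downward adjustment. Leaves on level h are never adjusted. Summing the worst-case moves gives:

$$T(n) = \sum_{i=1}^{h-1} 2^{i-1}(h-i) = 2^0(h-1) + 2^1(h-2) + \cdots + 2^{h-2}\cdot 1$$

$$2T(n) = 2^1(h-1) + 2^2(h-2) + \cdots + 2^{h-1}\cdot 1$$

Subtracting the first line from the second (each power 2^k with 1 ≤ k ≤ h-2 keeps a coefficient of 1):

$$T(n) = -(h-1) + 2^1 + 2^2 + \cdots + 2^{h-1} = 2^h - 1 - h = n - \log_2(n+1)$$

So building a heap with downward adjustment is O(N). By comparison, building it by inserting elements one at a time with upward adjustment is O(N log N), because most nodes sit near the bottom and may have to climb almost the full height.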

6, Heap sorting and top-k problem

1. Heap sorting

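Heap sort uses a heap to sort. It has two steps:
1. Build the heap

  • Ascending order: build a large heap
  • Descending order: build a small heap
Why build a large heap for ascending order and a small heap for descending order? In a large heap, the top a[0] is the maximum. Swapping it with the last element puts the maximum in its final position. Adjusting down over the remaining elements then brings the next largest value to the top, and so on, so the array fills from the back in ascending order. A small heap works the same way but collects the smallest values at the back, which gives descending order. A small heap for ascending order would not work this way: once its minimum is taken from the front, the remaining elements no longer form a heap and would have to be rebuilt every time.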


2. Use the idea of heap deletion to sort
Downward adjustment is used in both heap creation and heap deletion, so once you master downward adjustment, you can complete heap sorting.

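//Swap two integers
void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

//Large-heap downward adjustment: build a large heap for ascending order (a small heap for descending order). Building the heap this way takes O(N)
void AdjustDown(int* a, int n, int root)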
{
	int parent = root;
	int child = parent * 2 + 1;//Start with the left child; switch to the right child if it is larger
	while (child < n)
	{
		if (child + 1 < n && a[child] < a[child + 1])
		{
			child += 1;
		}
		if (a[child] > a[parent])
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else{
			break;
		}
	}
}


void heapSort(int* a, int n)
{
	for (int i = (n - 2) / 2; i >= 0; i--)//Step 1: build a large heap, starting from the last non-leaf node
	{
		AdjustDown(a, n, i);
	}

	int end = n - 1;//Step 2: move the maximum a[0] to a[end], then re-adjust a[0..end-1]
	while (end > 0)
	{
		Swap(&a[0], &a[end]);
		AdjustDown(a, end, 0);
		end--;
	}
}

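Here is a self-contained version of heap sort you can run. The ascending version follows the code above. The descending version is an assumed mirror image, built from a small heap as the list above describes; the article does not show it. The names `HeapSortAsc`, `HeapSortDesc`, `AdjustDownLarge` and `AdjustDownSmall` are hypothetical.

```c
void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

/* Large heap: the parent sinks below its larger child */
void AdjustDownLarge(int* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && a[child] < a[child + 1])
			child++;
		if (a[child] <= a[parent])
			break;
		Swap(&a[child], &a[parent]);
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Small heap: the parent sinks below its smaller child */
void AdjustDownSmall(int* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && a[child + 1] < a[child])
			child++;
		if (a[child] >= a[parent])
			break;
		Swap(&a[child], &a[parent]);
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Ascending: large heap, then repeatedly move the maximum to the back */
void HeapSortAsc(int* a, int n)
{
	for (int i = (n - 2) / 2; i >= 0; i--)
		AdjustDownLarge(a, n, i);
	for (int end = n - 1; end > 0; end--)
	{
		Swap(&a[0], &a[end]);
		AdjustDownLarge(a, end, 0);
	}
}

/* Descending: small heap, then repeatedly move the minimum to the back */
void HeapSortDesc(int* a, int n)
{
	for (int i = (n - 2) / 2; i >= 0; i--)
		AdjustDownSmall(a, n, i);
	for (int end = n - 1; end > 0; end--)
	{
		Swap(&a[0], &a[end]);
		AdjustDownSmall(a, end, 0);
	}
}
```

For example, {5,2,9,1,5,6,0,3} becomes {0,1,2,3,5,5,6,9} with HeapSortAsc and {9,6,5,5,3,2,1,0} with HeapSortDesc. Both take O(N log N) time and sort in place.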

2. TOP-K problem

TOP-K problem: find the K largest or K smallest elements in a data set. Generally, the amount of data is large.
Examples include the top 10 professional players, the world's top 500 companies, a rich list, and the 100 most active players in a game. The simplest and most direct approach to the Top-K problem is sorting. However, when the amount of data is very large, sorting is impractical, since the data may not even fit in memory at once. A heap is a better fit. The basic idea is as follows:
1. Use the first K elements in the data set to build the heap

  • To find the K largest elements, build a small heap of K elements
  • To find the K smallest elements, build a large heap of K elements
2. Compare each of the remaining N-K elements with the heap top in turn. If an element belongs in the result (larger than the top of the small heap, or smaller than the top of the large heap), replace the top with it and adjust downward. After all N-K elements have been compared, the K elements left in the heap are the K largest or K smallest elements.
Why build a small heap of K elements to find the K largest?
The top of the small heap is the smallest of the K values kept so far, so it is the one to evict. If a remaining number is larger than the top, it replaces the top and we adjust downward, which makes the new top the smallest kept value again. Once every number has been compared, the heap holds exactly the K largest numbers. Symmetrically, to find the K smallest, build a large heap of K elements. Either way this takes O(N log K) time and only O(K) extra memory, because only the heap has to stay in memory.
void PrintTopK(int* a, int n, int k)
{
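	//AdjustUp and AdjustDown here are the small-heap versions from section 4
	int* tmp = (int*)malloc(sizeof(int)* k);
	if (tmp == NULL)
	{
		printf("malloc fail\n");
		return;
	}
	//Build a small heap from the first k elements
	for (int i = 0; i < k; i++)
	{
		tmp[i] = a[i];
		AdjustUp(tmp, i);
	}
	//Compare the remaining n - k elements (starting at k, so the first k are not counted twice)
	for (int i = k; i < n; i++)
	{
		if (a[i] > tmp[0])
		{
			tmp[0] = a[i];
			AdjustDown(tmp, k, 0);
		}
	}
	for (int i = 0; i < k; i++)
	{
		printf("%d ", tmp[i]);
	}
	printf("\n");
	free(tmp);
}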


void TestTopk()
{
	int* a = (int*)malloc(sizeof(int)* 10000);
	for (int i = 0; i < 10000; i++)
	{
		a[i] = rand() % 10000;
	}
	a[10] = 155555;
	a[45] = 155556;
	a[695] = 155557;
	a[157] = 155558;
	a[4598] = 155559;
	PrintTopK(a, 10000, 5);
	free(a);
}

Here we generate 10000 random numbers, each no greater than 9999 (rand() % 10000). We then overwrite 5 of them with numbers greater than 9999: 155555, 155556, 155557, 155558 and 155559.
PrintTopK should therefore print exactly these five modified numbers, in heap order rather than sorted order.
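The text describes the symmetric case, finding the K smallest with a large heap of K elements, but only shows code for the K largest. Here is a sketch of that mirror case. It writes its result into an output array instead of printing, which makes it easy to check. `SmallestK` and `AdjustDownLarge` are hypothetical names, and the sketch assumes 0 < k <= n.

```c
void Swap(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}

/* Large-heap downward adjustment */
void AdjustDownLarge(int* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && a[child + 1] > a[child])
			child++;
		if (a[child] <= a[parent])
			break;
		Swap(&a[child], &a[parent]);
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Write the k smallest values of a[0..n-1] into out[0..k-1] (heap order, not sorted) */
void SmallestK(const int* a, int n, int k, int* out)
{
	for (int i = 0; i < k; i++)
		out[i] = a[i];
	for (int i = (k - 2) / 2; i >= 0; i--)  /* build a large heap of k elements */
		AdjustDownLarge(out, k, i);
	for (int i = k; i < n; i++)
	{
		if (a[i] < out[0])  /* smaller than the largest kept value: evict the top */
		{
			out[0] = a[i];
			AdjustDownLarge(out, k, 0);
		}
	}
}
```

For {9,4,7,1,8,2,6,3,5,0} with k = 3, the call leaves {2,0,1} in `out`. These are the three smallest values, and the largest of them sits at the top.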

Summary

This time, we learned about the heap, the sequential structure of a binary tree. Its physical structure is an array, and its logical structure is a complete binary tree. We also learned two applications of the heap, heap sort and the TOP-K problem. We saw which kind of heap to build for ascending or descending order, and how to find the K smallest (largest) numbers in a data set. Later, we will learn about the chain structure of binary trees and other data structures.

Tags: data structure

Posted on Fri, 26 Nov 2021 12:49:42 -0500 by cravin4candy