Tsinghua Deng Junhui data structures learning notes: introduction, vector, list

Chapter 1 Introduction

(a) Computation

A good algorithm is: correct (handles trivial, large-scale, general, degenerate, and all legal inputs), robust, readable, and efficient (fast, with a small storage footprint)
Computational cost T(n): the number of basic operations required to solve an instance of size n; among all instances of size n, only the worst (highest-cost) case is of concern

(b) Model of computation

Turing machine model (TM): a transition (q, c; d, L/R, p) means: in state q, reading character c, write d into the current cell, move to the left/right adjacent cell, and switch to state p; once the state becomes 'h', halt
RAM (Random Access Machine): registers are numbered sequentially, and each basic operation takes constant time

(c) Big O notation

Denotes an upper bound: whenever T(n) ≤ c·f(n) for some constant c and all sufficiently large n, f(n) may be used in place of T(n); constant coefficients and lower-order terms can be ignored
O(1) constant

//Constant expressions
2013 * 2013
//Loops that still take O(1) time
for(i=0; i<n; i+= n/2013 + 1); //At most 2013 iterations
for(i=1; i<n; i=1 << i); //i grows as 1, 2, 4, 16, 65536, ...: effectively O(1) iterations
//Branches that are never taken
if((n+m)*(n+m) < 4*n*m) goto UNREACHABLE; //By AM-GM, (n+m)^2 >= 4nm always
//Recursions that are never entered
if(2==(n*n)%5) O1(n); //Squares mod 5 are only 0, 1, or 4

O(logn) logarithmic: such algorithms are very efficient, with complexity arbitrarily close to constant; the (constant) base and any constant power of the logarithm can be ignored
O(n) linear and polynomial: the range from n to n^2 covers the bulk of programming problems
O(2^n) exponential: the watershed between polynomial-time algorithms, regarded as effective, and ineffective ones

(d) Algorithm analysis

Two tasks: correctness (invariance, monotonicity) + complexity
Complexity analysis: guess + verify; iteration (summing series); recursion (recursion trace + recurrence equations)
Arithmetic series: same order as the square of the last term
Power series: one order higher than the power
Geometric series (a > 1): same order as the last term
Convergent series: O(1)
Harmonic series: 1 + 1/2 + 1/3 + ... + 1/n = O(logn)
Logarithmic series: log1 + log2 + ... + logn = O(nlogn)
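The growth orders above can be spot-checked numerically; a minimal sketch (the function names are illustrative, not from the notes):

```cpp
#include <cassert>
#include <cmath>

// Arithmetic series 1 + 2 + ... + n: same order as the square of the last term,
// since the sum is exactly n(n+1)/2 = Theta(n^2).
long long arithmeticSum(int n) {
    long long s = 0;
    for (int k = 1; k <= n; k++) s += k;
    return s;
}

// Harmonic series 1 + 1/2 + ... + 1/n: Theta(log n), always within 1 of ln n.
double harmonicSum(int n) {
    double s = 0;
    for (int k = 1; k <= n; k++) s += 1.0 / k;
    return s;
}
```

For n = 1000, arithmeticSum gives 500500 = 1000·1001/2, while harmonicSum stays between ln 1000 ≈ 6.91 and ln 1000 + 1.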

eg: bubble sort

void bubblesort(int A[], int n){
	for (bool sorted = false; sorted = !sorted; n--)
		for (int i=1; i<n; i++)
			if (A[i-1] > A[i]){
				swap(A[i-1], A[i]);
				sorted = false;
			}
}

Invariance: after k rounds of scan-and-swap, the largest k elements are in place
Monotonicity: after k rounds of scan-and-swap, the problem size shrinks to n - k
Correctness: after at most n scans, the algorithm terminates and gives the correct answer

(e) Iteration and recursion

Decrease and conquer: split into two subproblems, one trivial and the other of reduced size

//Array summation
int sum(int A[], int n){
	return (n < 1) ?
		0 : sum(A, n-1) + A[n-1];
}

Solve sum(A, n)
Recursion base: sum(A, 0)
Recurrence: T(n) = T(n-1) + O(1), T(0) = O(1), hence T(n) = O(n)

//Array inversion
while(lo < hi) swap(A[lo++], A[hi--]);

Divide and conquer: split into two subproblems of roughly equal size

//Array summation
int sum(int A[], int lo, int hi){
	if (lo == hi) return A[lo];
	int mi = (lo + hi) >> 1;
	return sum(A, lo, mi) + sum(A, mi + 1, hi);
}

Solve sum(A, lo, hi)
Recursion base: sum(A, lo, lo)
Recurrence: T(n) = 2·T(n/2) + O(1), T(1) = O(1)
Complexity: T(n) = O(n)

eg: find the two largest integers from array interval A[lo, hi)

//Max2: iteration 1
void max2(int A[], int lo, int hi, int & x1, int & x2){
	x1 = lo; //First pass: champion x1 of A[lo, hi)
	for (int i = lo + 1; i < hi; i++)
		if (A[x1] < A[i]) x1 = i;
	x2 = (x1 == lo) ? lo + 1 : lo; //Runner-up candidate, never x1 itself
	for (int i = lo + 1; i < x1; i++) //Second pass: scan A[lo, x1)
		if (A[x2] < A[i]) x2 = i;
	for (int i = x1 + 1; i < hi; i++) //Then scan A(x1, hi)
		if (A[x2] < A[i]) x2 = i;
}

Comparisons: 2n - 3 in every case

//Max2: iteration 2
void max2(int A[], int lo, int hi, int & x1, int & x2){
	if (A[x1 = lo] < A[x2 = lo + 1]) swap(x1, x2);
	for(int i = lo + 2; i < hi; i++)
		if (A[x2] < A[i])
			if(A[x1] < A[x2 = i])
				swap(x1, x2);
}

Best case: n - 1
Worst case: 2n - 3

//Max2: recursion + divide and conquer
void max2(int A[], int lo, int hi, int & x1, int & x2){
	if (lo + 2 == hi){/*...*/ return;} //Recursion base: two elements
	if (lo + 3 == hi){/*...*/ return;} //Recursion base: three elements
	int mi = (lo + hi)/2; //Split at the midpoint
	int x1L, x2L; max2(A, lo, mi, x1L, x2L);
	int x1R, x2R; max2(A, mi, hi, x1R, x2R);
	if(A[x1L] > A[x1R]){
		x1 = x1L; x2 = (A[x2L] > A[x1R]) ? x2L : x1R;
	}else{
		x1 = x1R; x2 = (A[x1L] > A[x2R]) ? x1L : x2R;
	}
}

Comparison times: 5n/3 - 2

(f) Dynamic programming

eg1: Fibonacci sequence: fib(n) = fib(n-1) + fib(n-2)

int fib(int n){ return (2 > n) ? n : fib(n-1) + fib(n-2); }

Inefficient: O(2^n) time, because each recursive instance is recomputed many times
Solution A: tabulate (memoize) the results of computed instances for later lookup
Solution B: dynamic programming; turn top-down recursion into bottom-up iteration

f = 0; g = 1;
while (0 < n--){
	g = g + f;
	f = g - f;
}
return g;

T(n) = O(n), only O(1) space is needed
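Solution A (tabulating computed instances) can be sketched as follows; the memo table and helper name are illustrative, not from the notes:

```cpp
#include <cassert>
#include <vector>

// Solution A sketch: cache each computed instance so every fib(k) is solved once.
long long fibMemo(int n, std::vector<long long>& memo) {
    if (n < 2) return n;               // recursion base: fib(0)=0, fib(1)=1
    if (memo[n] >= 0) return memo[n];  // already tabulated: O(1) lookup
    return memo[n] = fibMemo(n - 1, memo) + fibMemo(n - 2, memo);
}

long long fib(int n) {
    std::vector<long long> memo(n + 1, -1); // -1 marks "not yet computed"
    return fibMemo(n, memo);                // O(n) instances, O(1) work each
}
```

Unlike solution B, this keeps the top-down recursion but pays O(n) extra space for the table.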
eg2: longest common subsequence (may have multiple; may have ambiguity)

For sequences A[0, n] and B[0, m], LCS falls into three cases:
(1) n = -1 or m = -1: the empty sequence ("")
(2) A[n] = 'X' = B[m]: take LCS(A[0, n), B[0, m)) + 'X' (decrease and conquer)
(3) A[n] ≠ B[m]: take the longer of LCS(A[0, n], B[0, m)) and LCS(A[0, n), B[0, m]) (divide and conquer)

Best case: O(n + m)
Worst case: O(2^n) (when n = m)

As with the Fibonacci sequence, there are many repeated recursive instances. With dynamic programming, all subproblems can be computed in O(nm) time; it suffices to (1) list all subproblems in a table, and (2) reverse the direction of computation, filling in every entry in turn starting from LCS(A[0], B[0]).
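The O(nm) tabulation can be sketched as follows (a minimal illustration; the function name is an assumption, not from the notes):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// L[i][j] = length of the LCS of the prefixes A[0, i) and B[0, j).
int lcsLength(const std::string& A, const std::string& B) {
    int n = A.size(), m = B.size();
    // Row 0 / column 0 stand for the empty prefix: LCS length 0 (case 1).
    std::vector<std::vector<int>> L(n + 1, std::vector<int>(m + 1, 0));
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            L[i][j] = (A[i-1] == B[j-1])
                ? L[i-1][j-1] + 1                 // case (2): last characters match
                : std::max(L[i-1][j], L[i][j-1]); // case (3): take the longer branch
    return L[n][m]; // each of the (n+1)(m+1) entries is computed once: O(nm)
}
```

Recovering one actual LCS takes an extra backward walk through the table; the length alone suffices here.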

Chapter 2 Vector

(a) Interface and Implementation

size() - report the current size of the vector (total number of elements)
get(r) - get the element with rank r
put(r, e) - replace the element whose rank is r with e
insert(r, e) - insert e as the element of rank r; the original element there and its successors shift backward in turn
remove(r) - delete the element with rank r and return the object it stored
disordered() - report whether all elements are already arranged in non-descending order
sort() - adjust the positions of the elements so that they are arranged in non-descending order
find(e) - find the target element e
search(e) - find the target element e and return the element of highest rank not greater than e
deduplicate() - eliminate duplicate elements
uniquify() - eliminate duplicate elements (ordered vectors)
traverse() - traverse the vector and process all elements uniformly; the processing is specified by a function object

vector template class

typedef int Rank;
#define DEFAULT_CAPACITY 3
template <typename T> class Vector{
	private:Rank _size; int _capacity; T* _elem;
	protected:
		/*...Internal function*/
	public:
		/*...Constructor*/
		/*...Destructor*/
		/*...Read only interface*/
		/*...Writable interface*/
		/*...Traversal interface*/
};

Construction and destruction

Vector(int c = DEFAULT_CAPACITY)
	{ _elem = new T[_capacity = c]; _size = 0;} //default
Vector(T const *A, Rank lo, Rank hi) //Array interval copy
	{ copyFrom(A, lo, hi); }
Vector(Vector<T> const& V, Rank lo, Rank hi)
	{ copyFrom(V._elem, lo, hi); } //Vector interval replication
Vector(Vector<T> const& V)
	{ copyFrom(V._elem, 0, V._size); }
~Vector() { delete [] _elem; } //Free up internal space

Replication based construction

template <typename T> //T is a basic type, or has overloaded the assignment operator '='
void Vector<T>::copyFrom(T const* A, Rank lo, Rank hi){
	_elem = new T[_capacity = 2*(hi - lo)]; //Allocate space
	_size = 0; //Clear the size
	while (lo < hi) //Elements in A[lo, hi), one by one
		_elem[_size++] = A[lo++]; //Copy into _elem[0, hi - lo)
}

(b) Extendable vector

Static space management: an internal array occupying physically contiguous addresses. With the total capacity _capacity fixed, there are obvious drawbacks:
(1) Overflow: _elem[] cannot hold all elements, even though the system may still have plenty of free space
(2) Underflow: very few elements in _elem[], with a load factor below 50%
Dynamic space management: expand the capacity of the internal array upon overflow
Implementation of capacity expansion algorithm

template <typename T>
void Vector<T>::expand(){
	if (_size < _capacity) return;
	_capacity = max(_capacity, DEFAULT_CAPACITY);
	T* oldElem = _elem; _elem = new T[_capacity <<= 1]; //Capacity doubling
	for(int i=0; i<_size;i++) //Copy original vector content
		_elem[i] = oldElem[i];
	delete [] oldElem;
}

Capacity increasing strategy

T* oldElem = _elem; _elem = new T[_capacity += INCREMENT]; //Append fixed size capacity

Worst case: starting from an empty vector with initial capacity 0, insert n = m·I >> 2 elements in succession; expansion is needed at insertions 1, I+1, 2I+1, 3I+1, ... The copying cost of each expansion is 0, I, 2I, ..., (m-1)I respectively, for a total of O(n^2); amortized over the insertions, O(n) each

Capacity doubling strategy

T* oldElem = _elem; _elem = new T[_capacity <<= 1];

Worst case: starting from a full vector with initial capacity 1, insert n = 2^m >> 2 elements in succession; expansion is needed at the 1st, 2nd, 4th, 8th, 16th, ... insertions. The cost of copying the old vector at each expansion is 1, 2, 4, 8, ..., 2^m = n respectively, for a total of O(n); amortized cost per insertion, O(1)
Average complexity: weight the cost of each possible operation by its probability of occurrence and take the mean; this treats the operations as independent events, severing the correlation and coherence between successive operations, and therefore cannot accurately assess the real performance of a data structure and its algorithms
Amortized complexity: perform a sufficiently long sequence of operations on the data structure and apportion the total cost over the individual operations; this treats the sequence of operations as a whole
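The amortized O(1) bound for doubling can be verified by simply counting copy operations; a small simulation sketch (the counting function is illustrative):

```cpp
#include <cassert>

// Simulate n consecutive insertions into a vector that doubles its capacity on
// overflow, counting how many element copies the expansions cost in total.
long long copiesWithDoubling(int n) {
    long long copies = 0;
    int size = 0, capacity = 1;
    for (int i = 0; i < n; i++) {
        if (size == capacity) { // full: expand before inserting
            copies += size;     // copying the old vector costs `size` moves
            capacity <<= 1;     // capacity doubles: 1, 2, 4, 8, ...
        }
        size++;                 // the insertion itself
    }
    return copies;              // 1 + 2 + 4 + ... < 2n, i.e. O(1) amortized
}
```

The same simulation with a fixed increment I instead yields roughly n^2/(2I) copies, matching the O(n^2) total derived above.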

(c) Disordered vector

Element access
Reading and writing vector elements through the V.get(r) and V.put(r, e) interfaces is less convenient and efficient than the array form A[r], so the subscript operator [] is overloaded

template <typename T>
T & Vector<T>::operator[](Rank r) const{return _elem[r];}

Afterwards the external V[r] corresponds to the internal V._elem[r]
As an rvalue: T x = V[r] + U[s] * W[t];
As an lvalue: V[r] = (T)(2*x + 3);
insert

template <typename T> //Insert e as rank r element
Rank Vector<T>::insert(Rank r, T const & e){
	expand(); //Expand if necessary
	for (int i=_size; i>r; i--)
		_elem[i] = _elem[i-1]; //Subsequent elements move one unit backward
	_elem[r] = e; _size++;
	return r; //Return rank
}

Interval deletion

template <typename T>
int Vector<T>::remove(Rank lo, Rank hi){
	if(lo == hi) return 0; //In consideration of efficiency, degradation is treated separately
	while(hi<_size) _elem[lo++] = _elem[hi++]; //Elements in [hi, _size) shift forward by hi - lo positions, in order
	_size = lo; shrink(); //Scale up, shrink if necessary
	return hi-lo; //Returns the number of deleted elements
}

lookup
Unordered vector: T is a basic type supporting equality tests, or has overloaded "==" or "!="
Ordered vector: T is a basic type supporting comparison, or has overloaded "<" or ">"

template <typename T>
Rank Vector<T>::find(T const & e, Rank lo, Rank hi) const{
	while((lo < hi--) && (e!=_elem[hi]));
	return hi;
}

Complexity is input-sensitive: best O(1), worst O(n)
Single element deletion
It can be regarded as a special case of interval deletion: [r] = [r, r+1)

template <typename T> //Delete the element with rank r in the vector
T Vector<T>::remove(Rank r){
	T e = _elem[r]; //Backup deleted elements
	remove(r,r+1);
	return e;
}

Q: can remove(lo, hi) be implemented by calling remove(r) repeatedly?
No: each call moves the suffix behind the deleted interval, in time proportional to its length, n - hi = O(n), and the number of calls equals the interval width, hi - lo = O(n), leading to O(n^2) overall.

(d1) ordered vector: uniquification

Application example: de-duplicating the partial results of a web search to form the final report
Inefficient version

template <typename T>
int Vector<T>::deduplicate(){
	int oldSize = _size; //Record the original size
	Rank i = 1; //Start from _elem[1]
	while(i < _size) //Examine the elements one by one
		(find(_elem[i], 0, i) < 0) ? //Look for a duplicate in the prefix
			i++ : remove(i); //None: move on; found: delete the duplicate
	return oldSize - _size;
}

Correctness:
1. Invariance: in the prefix V[0, i) of the current element V[i], all elements are mutually distinct; for the initial i = 1 this holds trivially.
2. Monotonicity: as the while loop iterates, the prefix grows and the suffix shrinks, so the algorithm terminates after at most O(n) rounds.

Time complexity:
find() and remove() consume linear time in each iteration, so the total is O(n^2)

Further optimization:
1. Following the efficient version of uniquify(), the number of element moves can be reduced to O(n), but the number of comparisons remains O(n^2);
2. First mark the elements to be deleted, then delete them all at once: stability is preserved, but the search length grows, costing more comparisons;
3. V.sort().uniquify(): a concise realization of the optimal O(nlogn)
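Route 3 can be mimicked with the standard library on a plain std::vector (a sketch of the idea, not the textbook Vector class):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Deduplicate in O(nlogn): sort first, so duplicates become adjacent, then
// remove them in a single O(n) pass -- the same idea as V.sort() + uniquify().
void dedup(std::vector<int>& v) {
    std::sort(v.begin(), v.end());                     // O(nlogn)
    v.erase(std::unique(v.begin(), v.end()), v.end()); // adjacent scan, O(n)
}
```

Like uniquify(), std::unique only removes adjacent duplicates, which is exactly why sorting first is required.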

Traversal
Traverse the vector, applying a uniform visit operation to every element
Via a function pointer: read-only or locally modifying

template <typename T>
void Vector<T>::traverse(void (*visit)(T&)) //Function pointer
	{for (int i = 0; i < _size; i++) visit(_elem[i]);}

Via a function object: globally modifying

template <typename T> template <typename VST>
void Vector<T>::traverse(VST& visit) //Function object
	{for (int i = 0; i < _size; i++) visit(_elem[i]);}

Example: add one to all elements in the vector
First, implement a class that adds one to a single T-type element

template <typename T> //Assume T can be incremented directly, or has overloaded the ++ operator
struct Increase{ //Function object: realized by overloading operator()
	virtual void operator()(T & e){e++;} //Increment by one
};

Thereafter

template <typename T> void increase(Vector<T> & V){
	Increase<T> inc; //A named object: a temporary cannot bind to traverse's non-const reference
	V.traverse(inc); //Traverse the vector with increment as the basic operation
}

Orderedness and its test
The number of adjacent inversion pairs measures how far the vector is from sorted order

template <typename T>
int Vector<T>::disordered() const{
	int n = 0; //Counter
	for (int i = 1; i < _size; i++) //Check each pair of adjacent elements one by one
		n += (_elem[i-1] > _elem[i]); //Count the inversions
	return n; //The vector is sorted if and only if n = 0
} //If we only need to know whether the vector is sorted, we can stop at the first inversion

Uniqueness
Inefficient algorithm
Observation: in an ordered vector, duplicate elements must be adjacent to one another, forming runs; each run only needs to retain a single element

template <typename T> int Vector<T>::uniquify(){
	int oldSize = _size; int i = 0;
	while (i < _size - 1) //Scan adjacent element pairs one by one
		(_elem[i] == _elem[i+1]) ? remove(i+1) : i++; //Delete the duplicate, or move on
	return oldSize - _size; //Total number of deleted elements
}

Complexity: the running time is dominated by the while loop, which runs n - 1 times; in the worst case remove() is called on every iteration, each costing from O(n) down to O(1), accumulating O(n^2). Although find() is saved, this is no better than deduplicate() on an unordered vector.
Reflection: the root of the inefficiency is that the same element, as the successor of deleted elements, may be moved forward many times. Deleting duplicates in batches, one run at a time, improves performance.
Efficient algorithm

template <typename T> int Vector<T>::uniquify(){
	Rank i = 0, j = 0; //Rank of each pair of mutually different adjacent elements
	while (++j < _size) //Scan one by one until the last element
		//Skip duplicates; when a distinct element is found, move it forward next to the previous distinct one
		if (_elem[i]!=_elem[j]) _elem[++i]=_elem[j];
	_size = ++i; shrink(); //Directly cut off the redundant elements in the tail
	return j-i; //Total number of deleted elements
}

Complexity: n-1 iterations in total, each constant time, cumulative O(n) time

(d2) ordered vector: binary search

Unified interface

template <typename T> //Search algorithm unified interface
Rank Vector<T>::search(T const & e, Rank lo, Rank hi) const{
	return (rand() % 2)? //Randomly selected according to 50% probability
		binSearch(_elem, e, lo, hi) //Binary search or
	: 	fibSearch(_elem, e, lo, hi); //Fibonacci search method
}

Question: how should special cases be handled, e.g. when the target element does not exist, or multiple target elements exist?
Semantic convention: at minimum, search should make it convenient to maintain the ordered vector itself, as in V.insert(1 + V.search(e), e); even on failure it should indicate the proper insertion position for the new element, and if duplicates are allowed, each group should additionally keep its insertion order.
Convention: in the ordered vector interval V[lo, hi), determine the rank of the last element not greater than e:
If -∞ < e < V[lo], return lo - 1 (the left sentinel)
If V[hi - 1] < e < +∞, return hi - 1 (the last element, i.e. the left neighbor of the right sentinel)
Version A:
Decrease and conquer: any element x = S[mi] splits the search interval into three parts, S[lo, mi) ≤ S[mi] ≤ S(mi, hi), with S[mi] called the pivot. After at most two comparisons, we either hit the target or reduce the problem to half the size.

template <typename T> 
static Rank binSearch(T* A, T const& e, Rank lo, Rank hi){
	while(lo < hi){
		Rank mi = (lo + hi) >> 1; //Center point as pivot point
		if (e < A[mi]) hi = mi;
		else if (A[mi] < e) lo = mi + 1;
		else return mi; //Hit at mi
	}
	return -1; //Search failed
}

Linear recursion: T(n) = T(n/2) + O(1) = O(logn), far better than sequential search; the recursion depth is O(logn), and each recursive instance takes O(1) time.
Search length: to evaluate a search algorithm more precisely, count the number of key comparisons, i.e. the search length.

(d3) ordered vector: Fibonacci search

Idea: the numbers of key comparisons made before branching left versus right are unequal (1 versus 2), yet the recursion depths are the same. If the imbalance in branching cost can be compensated by an imbalance in recursion depth, the average search length should shorten further.
For example, with n = fib(k) - 1, take mi = fib(k-1) - 1, so that the lengths of the front and rear subvectors are fib(k-1) - 1 and fib(k-2) - 1 respectively

template <typename T>
static Rank fibSearch(T* A, T const & e, Rank lo, Rank hi){
	Fib fib(hi - lo); //Create Fib series
	while (lo < hi){
		while (hi - lo < fib.get()) fib.prev(); //How many iterations at most?
			//Determine the pivot point of Fib(k) - 1 by looking forward
		Rank mi = lo + fib.get() - 1; 
		if (e < A[mi]) hi = mi;
		else if (A[mi] < e) lo = mi + 1;
		else return mi;
	}
	return -1;
}

(d4) ordered vector: binary search (improved)

Improvement idea: each iteration makes only one key comparison, so every branch point has exactly two directions

template <typename T> static Rank binSearch(T* A, T const & e, Rank lo, Rank hi){
	while(1 < hi - lo){
		Rank mi = (lo + hi) >> 1;
		(e < A[mi]) ? hi = mi : lo = mi; //[lo,mi) or [mi,hi)
	}
	return (e == A[lo]) ? lo : -1;
}

(d5) ordered vector: binary search (version C)

Semantic convention: the search() interface promises to return the last element not greater than e; only under this convention can algorithms such as V.insert(1 + V.search(e), e) be supported effectively:
(1) When multiple elements match, the last one (of maximal rank) must be returned;
(2) On failure, the largest element less than e (possibly the sentinel [lo - 1]) should be returned

template <typename T> static Rank binSearch(T* A, T const& e, Rank lo, Rank hi){
	while (lo < hi){
		Rank mi = (lo + hi) >> 1;
		(e < A[mi])? hi = mi : lo = mi + 1;
	} //At exit, A[lo = hi] is the smallest element greater than e
	return --lo; //lo - 1 is the maximum rank of the element not greater than e
}

The search interval now shrinks to width 0 rather than 1 before the algorithm ends; when moving into the right subvector, the left boundary becomes mi + 1 rather than mi; and whether the search succeeds or fails, the returned rank strictly satisfies the interface's semantic convention.
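The convention can be checked on a small instance; a standalone sketch of this version (the name binSearchC is illustrative):

```cpp
#include <cassert>

typedef int Rank;

// Version C: in the sorted interval A[lo, hi), return the rank of the last
// element not greater than e (lo - 1, the left sentinel, if all exceed e).
template <typename T>
static Rank binSearchC(T* A, T const& e, Rank lo, Rank hi) {
    while (lo < hi) {                        // width shrinks to 0, not 1
        Rank mi = (lo + hi) >> 1;
        (e < A[mi]) ? hi = mi : lo = mi + 1; // [lo, mi) or (mi, hi)
    }
    return --lo; // at exit A[lo = hi] is the smallest element greater than e
}
```

With these semantics, insert(1 + binSearchC(...), e) keeps the vector sorted and preserves insertion order among equal elements.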

(e) Bubble sort

Sorter: unified entry

template <typename T>
void Vector<T>::sort(Rank lo, Rank hi){
	switch (rand() % 5){
		case 1 : bubbleSort(lo, hi); break; //bubble sort
		case 2 : selectionSort(lo, hi); break; //Selection sort
		case 3 : mergeSort(lo, hi); break; //Merge sort
		case 4 : heapSort(lo, hi);break; //Heap sorting (Chapter 10)
		default : quickSort(lo, hi);break; //Quick sort (Chapter 12)
	}
}

bubble sort

template <typename T> void Vector<T>::bubbleSort(Rank lo, Rank hi)
{ while (!bubble(lo,hi--)); } //Scan-and-swap round by round, until the whole sequence is sorted

template <typename T> bool Vector<T>::bubble(Rank lo, Rank hi){
	bool sorted = true; //Overall orderly sign
	while (++lo < hi) //Check each pair of adjacent elements from left to right
		if (_elem[lo - 1] > _elem[lo]){ //If in reverse order
			sorted = false;
			swap(_elem[lo - 1], _elem[lo]); //exchange
		}
	return sorted; //Return to ordered flag
}//Even when the out-of-order pairs are confined to the prefix [0, √n), this version still needs O(n^(3/2)) time

Improvement: replace the logical flag sorted of the previous version with a rank, last

template <typename T> void Vector<T>::bubbleSort(Rank lo, Rank hi)
{ while (lo < (hi = bubble(lo,hi)));}

template <typename T> Rank Vector<T>::bubble(Rank lo, Rank hi){
	Rank last = lo; //The rightmost reverse pair is initialized to [lo - 1, lo]
	while (++lo < hi) //Check each pair of adjacent elements from left to right
		if (_elem[lo - 1] > _elem[lo]){//If in reverse order
			last = lo; //Update the position of the rightmost reverse pair
			swap(_elem[lo - 1], _elem[lo]); 
		}
	return last;
}

Overall evaluation:
(1) Efficiency: the same as the integer-array version of chapter one; best O(n), worst O(n^2);
(2) Stability: when the input contains duplicate elements, their relative order in the input is preserved in the output;
(3) The relative position of elements a and b could change in only one way: each drifts, by swapping with other elements, until the two become adjacent, and the next round of scan-and-swap exchanges them for being out of order; since equal elements are never swapped (the test is a strict >), this never happens to duplicates, hence stability.

(f) Merge sort

Principle: divide-and-conquer strategy (applies to both vectors and lists), running time O(nlogn)
Split the sequence at the midpoint //O(1)
Sort the two subsequences recursively //2 × T(n/2)
Merge the sorted subsequences //O(n)

template <typename T>
void Vector<T>::mergeSort(Rank lo, Rank hi){
	if (hi - lo < 2) return; //Natural order of single element interval
	int mi = (lo + hi) >> 1; //Split at the midpoint
	mergeSort(lo, mi); //Sort first half
	mergeSort(mi, hi); //Sort the second half
	merge(lo, mi, hi); //Merge
}

Two-way merge principle: combine two sorted sequences into one sorted sequence: S[lo, hi) = S[lo, mi) + S[mi, hi)

template <typename T> void Vector<T>::merge(Rank lo, Rank mi, Rank hi){
	T* A = _elem + lo; //Merged vector A[0, hi - lo) = _elem[lo, hi)
	int lb = mi - lo; T* B = new T[lb]; //Front subvector B[0, lb) = _elem[lo, mi)
	for (Rank i = 0; i < lb; B[i] = A[i++]); //Copy the front subvector into B
	int lc = hi - mi; T* C = _elem + mi; //Rear subvector C[0, lc) = _elem[mi, hi)
	for (Rank i = 0, j = 0, k = 0; (j < lb) || (k < lc); ){ //The smaller of B[j] and C[k] goes to the end of A
		if ((j < lb) && (lc <= k || (B[j] <= C[k]))) A[i++] = B[j++]; //C exhausted, or C[k] not smaller
		if ((k < lc) && (lb <= j || (C[k] < B[j]))) A[i++] = C[k++]; //B exhausted, or B[j] larger
	} //The loop is compact, but slightly less efficient than handling the cases separately
	delete [] B; //Release the temporary space B
}

Complexity analysis: the running time is dominated by the for loop, which is controlled by the two variables j and k; initially j = 0, k = 0, and finally j = lb, k = lc, i.e. j + k = lb + lc = hi - lo = n.
Observe that each iteration advances at least one of j and k (j + k grows by at least 1), so merge() iterates at most O(n) times, accumulating linear time. This does not contradict the Ω(nlogn) lower bound for comparison sorting, because B and C are already sorted.
Note: the subsequences to be merged need not have equal lengths; lb ≠ lc and mi ≠ (lo + hi)/2 are allowed. The algorithm and the conclusion also apply to the other kind of sequence, the list.

Chapter 3 List

(a) Interface and Implementation

According to whether they modify the data structure, operations fall roughly into two kinds: (1) static: read-only; the content and composition of the structure generally stay unchanged (get, search); (2) dynamic: writing; part or all of the structure changes (insert, remove).
Storage and organization of data elements: (1) static: the physical storage order of the elements strictly matches their logical order, which efficiently supports static operations, e.g. vectors; (2) dynamic: physical space is dynamically allocated and reclaimed for each element; logically adjacent elements record each other's physical addresses and thereby form a logical whole, which efficiently supports dynamic operations, e.g. lists.
List elements are called nodes; adjacent nodes are each other's predecessor or successor, and are unique whenever they exist. The node without a predecessor / successor is the unique first / last node.
Vector: rank based access, O(1) time to determine physical address
List: location-based access, using mutual references between nodes
ADT interface (ListNode)

pred() //Location of the predecessor node of the current node
succ() //Location of the current node's successor node
data() //Data object saved by current node
insertAsPred(e) //Insert a predecessor node storing the referenced object e; return the new node's position
insertAsSucc(e) //Insert a successor node storing the referenced object e; return the new node's position

List node: ListNode template class

#define Posi(T) ListNode<T>* //List node location
template <typename T> //For simplicity, it's completely open and not over packaged
struct ListNode{//List node template class (in the form of double linked list)
	T data; //numerical value
	Posi(T) pred; //precursor
	Posi(T) succ; //Successor
	ListNode(){} //Construction for header and tracer
	ListNode(T e, Posi(T) p = NULL, Posi(T) s = NULL):data(e),pred(p),succ(s){}//default constructor 
	Posi(T) insertAsPred(T const& e);//Insert before
	Posi(T) insertAsSucc(T const& e);//Post insertion
};

Other ADT interfaces

size() //Current size of report list (total number of nodes)
first(),last() //Return the position of the first and last nodes
insertAsFirst(e),insertAsLast(e) //Insert e as the first and last nodes
insertBefore(p,e),insertAfter(p,e) //Taking e as the direct precursor and subsequent insertion of node p
remove(p) //Delete the node at position p and return its value
disordered() //Determine whether all nodes have been arranged in non descending order
sort() //Adjust the position of each node in non descending order
find(e) //Find target element e, return NULL on failure
search(e) //Find e and return the position of the last node not greater than e (ordered list)
deduplicate(),uniquify() //Eliminate duplicate nodes
traverse() //Traversal list

Lists: List template classes

#include "listNode.h" //Import list node class
template <typename T> class List{//List template class
private: int _size; Posi(T) header; Posi(T) trailer;//Head and tail sentry
protected: /*...Internal function */
public: /*...Constructor, destructor, read-only interface, writable interface, traversal interface*/
};

structure

template <typename T> void List<T>::init(){
	header = new ListNode<T>; //Create a sentinel node
	trailer = new ListNode<T>; //Create tail sentry node
	header->succ = trailer; header->pred = NULL; //interconnection
	trailer->pred = header; trailer->succ = NULL;//interconnection
	_size = 0;//Record size
}

(b) Unordered list

From rank to position, imitate vector access by rank, overload subscript operator

template <typename T>
T List<T>::operator[](Rank r) const{
	Posi(T) p = first(); //Starting from the head node
	while(0 < r--) p = p->succ; //The r-th node in sequence
	return p->data;//Target node
}//The rank of any node, that is, the total number of its precursors

Find: among the n (true) predecessors of node p (p may be the trailer), locate the last one equal to e

template <typename T> //When called externally, 0 <= n <= rank(p) < _size
Posi(T) List<T>::find(T const & e, int n, Posi(T) p) const{
	while(0 < n--) //From right to left, compare the precursor of p with e one by one
		if( e == ( p = p->pred )->data) return p; //Until hit or out of range
	return NULL; //If it goes beyond the left boundary, it means the search fails
}//The existence of header makes the processing more concise

insert

template <typename T> Posi(T) List<T>::insertBefore(Posi(T) p, T const& e)
{ _size++; return p->insertAsPred(e);} //e becomes the predecessor of p
template <typename T> //Pre-insertion algorithm (symmetric to the post-insertion one)
Posi(T) ListNode<T>::insertAsPred(T const& e){
	Posi(T) x = new ListNode(e, pred, this); //Create the new node (new costs roughly 100 basic operations)
	pred->succ = x; pred = x; return x; //Link it in and return the new node's position
}

Replication based construction

template <typename T>//Basic interface
void List<T>::copyNodes(Posi(T) p, int n){ //O(n)
	init();//Create and initialize the head and tail sentinel nodes
	while (n--) //Insert the n-term from p as the end node in turn
		{insertAsLast(p->data); p = p->succ;}
}

delete

template <typename T> //Delete the node at legal position p and return its value
T List<T>::remove(Posi(T) p){  //O(1)
	T e = p->data; //Back up the value of the node to be deleted (assume T can be assigned directly)
	p->pred->succ = p->succ;
	p->succ->pred = p->pred;
	delete p; _size--; return e; //Return backup value
}

Destruction

template <typename T> List<T>::~List() //List destructor
{clear(); delete header; delete trailer;} //Clear the list, then release the head and tail sentinel nodes
template <typename T> int List<T>::clear(){//clear list
	int oldSize = _size;
	while(0 < _size) //Repeatedly eliminate the first node until the list is empty
		remove(header->succ);
	return oldSize;
} //O(n), linearly proportional to list size

Uniqueness

template <typename T> int List<T>::deduplicate(){//Eliminate duplicate nodes in an unordered list
	if (_size < 2) return 0; //Trivial lists have no duplicates
	int oldSize = _size; //Record the original size
	Posi(T) p = first(); Rank r = 1; //p starts from the first node
	while (trailer != (p = p->succ)){//All the way to the last node
		Posi(T) q = find(p->data, r, p); //Among the r (true) predecessors of p, look for a duplicate
		q ? remove(q) : r++; //If one exists, delete it; otherwise increment the rank
	}
	return oldSize - _size; //The change in list size, i.e. the number of deleted elements
}//Correctness and efficiency analysis are the same as for Vector::deduplicate()

(c) Ordered list

Uniqueness

template <typename T> int List<T>::uniquify(){//Eliminate duplicates in batches
	if( _size < 2) return 0; //Trivial lists have no duplicates
	int oldSize = _size; //Record the original size
	Posi(T) p = first(); Posi(T) q; //p is the start of each run, q its successor
	while ( trailer != ( q = p->succ)) //Repeatedly examine the adjacent node pair (p, q)
		if (p->data != q->data) p = q; //If they differ, move to the next run;
		else remove(q); //Otherwise delete the latter
	return oldSize - _size; //The change in size, i.e. the number of deleted elements
} //O(n)

lookup

template <typename T> //In a sorted list, among the n (true) predecessors of node p, find the last one not greater than e
Posi(T) List<T>::search(T const & e, int n, Posi(T) p) const{
	while (0 <= n--) //For the nearest n predecessors of p, from right to left
		if(((p = p->pred ) -> data) <= e) break; //Compare one by one
	return p; //Return the position where the search stopped: a hit, or on failure possibly the left sentinel (header)
}//Best O(1), worst O(n); average O(n) under equal probability, proportional to the interval width

(d) Select sort

Implementation of selectionSort: select and sort n consecutive elements in the list starting from position p

template <typename T> void List<T>::selectionSort(Posi(T) p, int n){
	Posi(T) head = p->pred; Posi(T) tail = p; //Head, tail
	for(int i = 0; i < n; i++) tail = tail->succ;//head/tail could be a head/tail sentry
	while (1 < n) {//Repeatedly find out the largest from the non trivial to be sorted interval and move to the front of the ordered interval
		insertBefore(tail, remove(selectMax(head->succ,n)));
		tail = tail->pred; n--; //The range of the interval to be sorted and the ordered interval are updated synchronously
	}
}

selectMax implementation: select the largest one from the n elements starting from position p, 1 < n

template <typename T> 
Posi(T) List<T>::selectMax(Posi(T) p, int n){ //O(n)
	Posi(T) max = p; //The maximum is tentatively p
	for (Posi(T) cur = p; 1 < n; n--) //Subsequent nodes are compared with max one by one
		if (!lt((cur = cur->succ)->data, max->data)) //If not less than max (lt: the textbook's "less than" comparator)
			max = cur; //Update maximum element location record
	return max; //Return to maximum node location
}

Performance: n iterations in total. In the k-th iteration, selectMax() costs O(n - k), while remove() and insertBefore() are both O(1), so the overall complexity is O(n^2). Element moves are far fewer than in bubble sort; the O(n^2) stems mainly from element comparisons.

(e) Insert sort

Consider the sequence as two parts, sorted + unsorted: L[0, r) + L[r, n)
Initialization: |S| = r = 0 //An empty sequence is trivially sorted
Iteration: examine e = L[r], determine its proper position within S and insert it, yielding a sorted L[0, r] //insertion into a sorted sequence
Invariance: as r grows, L[0, r) stays sorted throughout, until r = n and L is sorted as a whole
insertionSort implementation: sort the n consecutive elements starting at position p of the list; precondition: valid(p) && rank(p) + n <= _size

template <typename T> void List<T>::insertionSort(Posi(T) p, int n){
	for (int r = 0; r < n;r++){//Introduce each node one by one, and get Sr+1 from SR
		insertAfter(search( p->data, r, p),p->data); //Find + insert
		p = p->succ; remove( p->pred); //Move to next node
	}//n iterations, each O(r+1)
}//Only O(1) auxiliary space is used, which belongs to local algorithm

Average performance

  • Assuming the element values are uniformly and independently distributed, how many element comparisons are needed on average?
  • Consider the moment just after L[r] has been inserted: which element of the sorted prefix L[0, r] is it?
  • All r + 1 positions are equally likely, each with probability 1/(r+1). Hence the expected cost of the iteration that introduces L[r] is [r + (r-1) + ... + 3 + 2 + 1 + 0]/(r+1) + 1 = r/2 + 1, and the overall expectation is [0 + 1 + ... + (n-1)]/2 + n = O(n^2)
Posted on Sat, 11 Jan 2020 04:54:33 -0500 by stanleybb