# Chapter 1 introduction

### (a) Computation

Good algorithm: correct (handles trivial, large-scale, general, degenerate, and any legal input), robust, readable, efficient (fast, low storage)

Computational cost T(n): the number of basic operations required to solve a problem of scale n. Among all inputs of scale n, only the worst case (the highest cost) is considered

### (b) Computational model

Turing machine (TM) model, transition (q, c; d, L/R, p): in state q reading character c, rewrite the character to d, move to the left/right adjacent cell, and switch to state p; once the state becomes 'h', halt

RAM (Random Access Machine): registers are numbered sequentially; each basic operation takes only constant time

### (c) Big-O notation

Denotes an upper bound: if T(n) ≤ c·f(n) for some constant c, then T(n) = O(f(n)), and f(n) stands in for T(n); constant coefficients and lower-order terms can be ignored

O(1) constant

    // constant: 2013 * 2013
    // with loops:
    for (i = 0; i < n; i += n/2013 + 1);
    for (i = 1; i < n; i = 1 << i);
    // with a branch:
    if ((n + m) * (n + m) < 4 * n * m) goto UNREACHABLE;
    // with recursion:
    if (2 == (n * n) % 5) O1(n);

O(logn) logarithmic: such algorithms are extremely efficient, with complexity approaching constant; constant bases and constant exponents can be ignored

O(n) linear and polynomial: the range from n to n^2 covers the bulk of practical programming problems

O(2^n) exponential: the divide between polynomial and exponential is generally regarded as the divide between effective and ineffective algorithms

### (d) Algorithm analysis

Two tasks: correctness (invariance, monotonicity) + complexity

Complexity analysis: guess + verify; for iteration, sum the series; for recursion, use recursion traces + recurrence equations

Arithmetic series: the same order as the square of the last term

Power series: one order higher than power

Geometric series (a > 1): the same order as the last term

Convergence series: O(1)

Harmonic series: 1 + 1/2 + 1/3 + ... + 1/n = O(logn)

Log series: log1 + log2 + ... + logn = O(nlogn)
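These growth orders can be checked numerically; a minimal sketch (the function names are illustrative, not from the text):

```cpp
#include <cassert>
#include <cmath>

// partial sum of the harmonic series: 1 + 1/2 + ... + 1/n = Theta(log n)
double harmonic(int n) {
    double s = 0;
    for (int k = 1; k <= n; k++) s += 1.0 / k;
    return s;
}

// partial sum of the log series: log 1 + log 2 + ... + log n = Theta(n log n)
double logSeries(int n) {
    double s = 0;
    for (int k = 1; k <= n; k++) s += std::log((double)k);
    return s;
}
```

For n = 1000, harmonic(n) stays within 1 of ln n, while logSeries(n) is within a constant factor of n·ln n.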

eg: bubble sorting

    void bubblesort(int A[], int n) {
        for (bool sorted = false; sorted = !sorted; n--)  // assignment intended: reset the flag each pass
            for (int i = 1; i < n; i++)
                if (A[i-1] > A[i]) {
                    swap(A[i-1], A[i]);  // swap the adjacent inversion
                    sorted = false;
                }
    }

Invariance: after k rounds of scan-and-swap, the largest k elements are in place

Monotonicity: after k rounds of scan-and-swap, the problem scale shrinks to n-k

Correctness: after n scans at most, the algorithm will terminate and give the correct solution
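A quick self-test of the scan-and-swap version above (the wrapper demoSorted is illustrative):

```cpp
#include <cassert>
#include <utility>

// bubble sort as in the text: passes of scan-and-swap until a clean pass
void bubblesort(int A[], int n) {
    for (bool sorted = false; sorted = !sorted; n--)  // assignment intended
        for (int i = 1; i < n; i++)
            if (A[i-1] > A[i]) { std::swap(A[i-1], A[i]); sorted = false; }
}

// sort a sample array and report whether the result is non-descending
bool demoSorted() {
    int A[] = {5, 2, 4, 1, 3, 3};
    bubblesort(A, 6);
    for (int i = 1; i < 6; i++)
        if (A[i-1] > A[i]) return false;
    return true;
}
```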

### (e) Iteration and recursion

Decrease and conquer: split into two subproblems, one trivial, the other of reduced scale

    // array summation
    int sum(int A[], int n) { return (n < 1) ? 0 : sum(A, n-1) + A[n-1]; }

Solve sum(A, n)

Recursive base: sum(A, 0)

Recurrence: T(n) = T(n-1) + O(1), T(0) = O(1)

    // array reversal
    while (lo < hi) swap(A[lo++], A[hi--]);

Divide and conquer: split into two subproblems of roughly equal scale

    // array summation
    int sum(int A[], int lo, int hi) {
        if (lo == hi) return A[lo];
        int mi = (lo + hi) >> 1;
        return sum(A, lo, mi) + sum(A, mi + 1, hi);
    }

Solve sum(A, lo, hi)

Recursive base: sum(A, lo, lo)

Recurrence: T(n) = 2*T(n/2) + O(1), T(1) = O(1)

Complexity: T(n) = O(n)
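Both summation strategies can be run side by side; a minimal sketch (demoSums is an illustrative wrapper):

```cpp
#include <cassert>

// decrease-and-conquer: T(n) = T(n-1) + O(1), recursion depth O(n)
int sumDec(int A[], int n) { return (n < 1) ? 0 : sumDec(A, n-1) + A[n-1]; }

// divide-and-conquer: T(n) = 2*T(n/2) + O(1), recursion depth O(log n)
int sumDC(int A[], int lo, int hi) {
    if (lo == hi) return A[lo];
    int mi = (lo + hi) >> 1;
    return sumDC(A, lo, mi) + sumDC(A, mi + 1, hi);
}

// run both versions over the same array; encode the two results into one int
int demoSums() {
    int A[] = {1, 2, 3, 4, 5};
    return sumDec(A, 5) * 100 + sumDC(A, 0, 4);
}
```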

eg: find the two largest integers from array interval A[lo, hi)

    // max2: iteration, version 1
    void max2(int A[], int lo, int hi, int& x1, int& x2) {
        x1 = lo;
        for (int i = lo + 1; i < hi; i++)     // locate the maximum x1
            if (A[x1] < A[i]) x1 = i;
        x2 = (lo == x1) ? lo + 1 : lo;        // initial runner-up, distinct from x1
        for (int i = lo + 1; i < x1; i++)     // scan the prefix of x1
            if (A[x2] < A[i]) x2 = i;
        for (int i = x1 + 1; i < hi; i++)     // scan the suffix of x1
            if (A[x2] < A[i]) x2 = i;
    }

Number of comparisons: 2n - 3

    // max2: iteration, version 2
    void max2(int A[], int lo, int hi, int& x1, int& x2) {
        if (A[x1 = lo] < A[x2 = lo + 1]) swap(x1, x2);
        for (int i = lo + 2; i < hi; i++)
            if (A[x2] < A[i])
                if (A[x1] < A[x2 = i]) swap(x1, x2);
    }

Best case: n - 1

Worst case: 2n - 3

    // max2: recursion + divide and conquer
    void max2(int A[], int lo, int hi, int& x1, int& x2) {
        if (lo + 2 == hi) { /*...*/ return; }       // interval of 2 elements
        if (lo + 3 == hi) { /*...*/ return; }       // interval of 3 elements
        int mi = (lo + hi) / 2;
        int x1L, x2L; max2(A, lo, mi, x1L, x2L);    // recurse on the left half
        int x1R, x2R; max2(A, mi, hi, x1R, x2R);    // recurse on the right half
        if (A[x1L] > A[x1R]) {
            x1 = x1L; x2 = (A[x2L] > A[x1R]) ? x2L : x1R;
        } else {
            x1 = x1R; x2 = (A[x1L] > A[x2R]) ? x1L : x2R;
        }
    }

Comparison times: 5n/3 - 2
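The scan version with 2n - 3 worst-case comparisons can be exercised directly (demoMax2 is an illustrative wrapper):

```cpp
#include <cassert>
#include <utility>

// max2, scan version: x1 = rank of the largest, x2 = rank of the runner-up
void max2(int A[], int lo, int hi, int& x1, int& x2) {
    if (A[x1 = lo] < A[x2 = lo + 1]) std::swap(x1, x2);
    for (int i = lo + 2; i < hi; i++)
        if (A[x2] < A[i])
            if (A[x1] < A[x2 = i]) std::swap(x1, x2);
}

// pack the two largest values of a sample array into one int
int demoMax2() {
    int A[] = {3, 9, 1, 7, 5};
    int x1, x2;
    max2(A, 0, 5, x1, x2);
    return A[x1] * 10 + A[x2];  // largest first, runner-up second
}
```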

### (f) Dynamic programming

eg1: Fibonacci sequence: fib(n) = fib(n-1) + fib(n-2)

    int fib(int n) { return (2 > n) ? n : fib(n-1) + fib(n-2); }

Inefficient: recursive instances are computed over and over again, so the total is O(2^n)

Solution A: tabulate the results of calculated instances for future reference
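Solution A can be sketched as a memo table consulted before recursing (the names fibMemo and memo are illustrative):

```cpp
#include <cassert>
#include <vector>

// each instance is computed once and tabulated, so the total drops to O(n)
long long fibMemo(int n, std::vector<long long>& memo) {
    if (n < 2) return n;
    if (memo[n] >= 0) return memo[n];  // already tabulated: just look it up
    return memo[n] = fibMemo(n-1, memo) + fibMemo(n-2, memo);
}

long long fibA(int n) {
    std::vector<long long> memo(n + 1, -1);  // -1 marks "not yet computed"
    return fibMemo(n, memo);
}
```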

Solution B: dynamic programming, turning top-down recursion into bottom-up iteration

    int fib(int n) {
        int f = 1, g = 0;  // g = fib(0) = 0, f = fib(-1) = 1 (with f = 0, g = 1 the result would be off by one term)
        while (0 < n--) { g = g + f; f = g - f; }
        return g;
    }

T(n) = O(n), only O(1) space is needed

eg2: longest common subsequence (may have multiple; may have ambiguity)

For sequences A[0, n] and B[0, m], LCS(A, B) falls into three cases:

(1) n = -1 or m = -1: the empty sequence ("")

(2) A[n] = 'X' = B[m]: take LCS(A[0, n), B[0, m)) + 'X' (decrease and conquer)

(3) A[n] ≠ B[m]: take the longer of LCS(A[0, n], B[0, m)) and LCS(A[0, n), B[0, m]) (divide and conquer)

Best case: O(n + m)

Worst case: O(2^n) (when n = m)

As with the Fibonacci sequence, there are many repeated recursive instances. With dynamic programming, all subproblems can be computed in O(nm) time: (1) list all subproblems in a table; (2) reverse the direction of computation, filling in each entry in turn starting from LCS(A[0], B[0]).
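The table-filling order described above can be sketched as follows (lcsLen and the table L are illustrative names; L[i][j] holds the LCS length of the prefixes A[0, i) and B[0, j)):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// LCS length by dynamic programming: every subproblem computed once, O(n*m) total
int lcsLen(const std::string& A, const std::string& B) {
    int n = A.size(), m = B.size();
    std::vector<std::vector<int>> L(n + 1, std::vector<int>(m + 1, 0));  // case (1): empty prefixes
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            L[i][j] = (A[i-1] == B[j-1])
                    ? L[i-1][j-1] + 1                  // case (2): matching tails, decrease and conquer
                    : std::max(L[i-1][j], L[i][j-1]);  // case (3): take the longer branch
    return L[n][m];
}
```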

# Chapter 2 vector

### (a) Interface and Implementation

size() - report the current size of the vector (total number of elements)

get(r) - get the element of rank r

put(r, e) - replace the element of rank r with e

insert(r, e) - insert e as the element of rank r; the original element and its successors each move back one position

remove(r) - delete the element of rank r and return the object it stored

disordered() - report whether all elements are already in non-descending order

sort() - rearrange the elements into non-descending order

find(e) - find the target element e

search(e) - find the target element e; return the element of largest rank no greater than e

deduplicate() - remove duplicate elements (unordered vector)

uniquify() - remove duplicate elements (ordered vector)

traverse() - traverse the vector, processing all elements uniformly; the processing is specified by a function object

vector template class

    typedef int Rank;
    #define DEFAULT_CAPACITY 3
    template <typename T> class Vector {
    private:
        Rank _size; int _capacity; T* _elem;  // scale, capacity, data area
    protected:
        /* ... internal functions ... */
    public:
        /* ... constructors, destructor ... */
        /* ... read-only interface, writable interface, traversal interface ... */
    };

Construction and destruction

    Vector(int c = DEFAULT_CAPACITY)               // default
    { _elem = new T[_capacity = c]; _size = 0; }
    Vector(T const* A, Rank lo, Rank hi)           // copy from an array interval
    { copyFrom(A, lo, hi); }
    Vector(Vector<T> const& V, Rank lo, Rank hi)   // copy from a vector interval
    { copyFrom(V._elem, lo, hi); }
    Vector(Vector<T> const& V)                     // copy a whole vector
    { copyFrom(V._elem, 0, V._size); }
    ~Vector() { delete [] _elem; }                 // release the internal space

Copy-based construction

    template <typename T>  // T is a basic type, or overloads the assignment operator '='
    void Vector<T>::copyFrom(T* const A, Rank lo, Rank hi) {
        _elem = new T[_capacity = 2 * (hi - lo)];  // allocate space
        _size = 0;                                 // clear the scale
        while (lo < hi)                            // copy the elements of A[lo, hi)
            _elem[_size++] = A[lo++];              // one by one into _elem[0, hi - lo)
    }

### (b) Extendable vector

Static space management: an internal array occupying a physically contiguous address space. With the total capacity fixed, there are obvious shortcomings:

(1) Overflow: _elem[] cannot hold all elements, even though the system may still have plenty of free space

(2) Underflow: very few elements in _elem[], with a load factor below 50%

Dynamic space management: expand the capacity of the internal array upon overflow

Implementation of capacity expansion algorithm

    template <typename T> void Vector<T>::expand() {
        if (_size < _capacity) return;               // no expansion needed yet
        _capacity = max(_capacity, DEFAULT_CAPACITY);
        T* oldElem = _elem;
        _elem = new T[_capacity <<= 1];              // double the capacity
        for (int i = 0; i < _size; i++)
            _elem[i] = oldElem[i];                   // copy the original contents
        delete [] oldElem;                           // release the original space
    }

Capacity increasing strategy

    T* oldElem = _elem; _elem = new T[_capacity += INCREMENT];  // append a fixed-size increment

Worst case: starting from an empty vector of initial capacity 0, insert n = m·I >> 2 elements in succession. Expansion occurs at insertions 1, I+1, 2I+1, 3I+1, ...; the copying cost of each expansion is 0, I, 2I, ..., (m-1)I, for a total of O(n^2) time and an amortized cost of O(n) per insertion

Capacity doubling strategy

T* oldElem = _elem; _elem = new T[_capacity <<= 1];

Worst case: starting from a full vector of initial capacity 1, insert n = 2^m >> 2 elements in succession. The 1st, 2nd, 4th, 8th, 16th, ... insertions trigger expansion; the copying cost of each expansion is 1, 2, 4, 8, ..., 2^m = n, for a total of O(n) time and an amortized cost of O(1) per insertion
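The two strategies can be contrasted by counting how many element copies the expansions perform in total; a counting sketch (function names and constants are illustrative):

```cpp
#include <cassert>

// total copies while inserting n elements, growing by a fixed increment I
long long copiesIncrement(long long n, long long I) {
    long long cap = 0, size = 0, copies = 0;
    while (size < n) {
        if (size == cap) { copies += size; cap += I; }  // copy everything, then grow by I
        size++;
    }
    return copies;  // ~ n^2 / (2I), i.e. O(n) amortized per insertion
}

// total copies while inserting n elements, doubling on overflow
long long copiesDoubling(long long n) {
    long long cap = 1, size = 0, copies = 0;
    while (size < n) {
        if (size == cap) { copies += size; cap <<= 1; }  // copy everything, then double
        size++;
    }
    return copies;  // < 2n, i.e. O(1) amortized per insertion
}
```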

Average complexity: weight the costs of the various operations by their probability of occurrence and take the average; this treats the possible operations as independent events, severing the correlation and coherence between successive operations, and therefore cannot accurately assess the real performance of a data structure and algorithm

Amortized complexity: perform a sufficiently long sequence of operations on the data structure and apportion the total cost over the individual operations; this treats the sequence of operations as a whole

### (c) Unordered vector

Element access

Reading and writing vector elements through the V.get(r) and V.put(r, e) interfaces is neither as convenient nor as familiar as array notation A[r], so we overload the subscript operator []

    template <typename T> T& Vector<T>::operator[](Rank r) const { return _elem[r]; }

After that, the external V[r] corresponds to the internal V._elem[r]

Right value: T x = V[r] + U[s] * W[t];

Left value: V[r] = (T)(2*x + 3);

insert

    template <typename T>  // insert e as the element of rank r
    Rank Vector<T>::insert(Rank r, T const& e) {
        expand();                      // expand if necessary
        for (int i = _size; i > r; i--)
            _elem[i] = _elem[i-1];     // successors each move back one position
        _elem[r] = e; _size++;
        return r;                      // return the rank
    }

Interval deletion

    template <typename T> int Vector<T>::remove(Rank lo, Rank hi) {
        if (lo == hi) return 0;        // degenerate case handled separately, for efficiency
        while (hi < _size)
            _elem[lo++] = _elem[hi++]; // [hi, _size) moves forward hi - lo positions
        _size = lo; shrink();          // update the scale; shrink if necessary
        return hi - lo;                // return the number of deleted elements
    }

lookup

Unordered vector: T is a basic type supporting equality tests, or overloads the operators "==" or "!="

Ordered vector: T is a basic type supporting comparison, or overloads the operators "<" or ">"

    template <typename T>
    Rank Vector<T>::find(T const& e, Rank lo, Rank hi) const {
        while ((lo < hi--) && (e != _elem[hi]));  // search backward from the rear
        return hi;                                // hi < lo means failure; otherwise hi is the hit rank
    }

Complexity is input-sensitive: best O(1), worst O(n)

Single element deletion

It can be regarded as a special case of interval deletion: [r] = [r, r+1)

    template <typename T>  // delete the element of rank r
    T Vector<T>::remove(Rank r) {
        T e = _elem[r];     // back up the deleted element
        remove(r, r+1);
        return e;
    }

Q: can remove(lo, hi) be implemented by calling remove(r) repeatedly?

Each call takes time proportional to the length of the suffix after the deleted interval, n - hi = O(n), and the number of calls equals the interval width, hi - lo = O(n), leading to O(n^2) complexity overall.
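The gap can be made concrete by counting element moves in both approaches (a counting sketch; names are illustrative):

```cpp
#include <cassert>

// moves performed when deleting [lo, hi) by repeated remove(lo):
// each call shifts the whole remaining suffix, O(n^2) in total
long long movesOneByOne(long long n, long long lo, long long hi) {
    long long moves = 0;
    for (long long k = hi; lo < k; k--)  // hi - lo single-element deletions
        moves += n-- - (lo + 1);         // suffix length after position lo
    return moves;
}

// moves performed by a single interval remove(lo, hi): just the suffix length
long long movesInterval(long long n, long long lo, long long hi) {
    return n - hi;
}
```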

### (d1) ordered vector: uniqueness

Application example: deduplicating the partial results of a web search to form the final report

Inefficient version

    template <typename T> int Vector<T>::deduplicate() {
        int oldSize = _size;                // record the original scale
        Rank i = 1;
        while (i < _size)                   // examine _elem[i], one by one
            if (find(_elem[i], 0, i) < 0)   // look for a duplicate in the prefix
                i++;                        // none: move on
            else remove(i);                 // found: delete the current element
        return oldSize - _size;             // scale change = number of deleted elements
    }

Correctness:

1. Invariance: within the prefix V[0, i) of the current element V[i], all elements are mutually distinct; initially i = 1, so this holds trivially.

2. Monotonicity: as the while loop iterates, the prefix grows and the suffix shrinks, so the algorithm iterates at most O(n) rounds.

Time complexity:

Each iteration calls find() and remove(), which take linear time, so the total is O(n^2)

Further optimization:

1. Following the efficient version of uniquify(), reduce the number of element moves to O(n) - but the number of comparisons is still O(n^2);

2. First mark the elements to be deleted, then delete them all at once - stability is preserved, but the search intervals grow longer, costing more comparisons;

3. V.sort().uniquify(): a concise realization of the optimal O(nlogn)

Traversal

Traverse the vector, applying a visit operation uniformly to every element

Using function pointer mechanism, read-only or local modification

    template <typename T>
    void Vector<T>::traverse(void (*visit)(T&))  // by function pointer
    { for (int i = 0; i < _size; i++) visit(_elem[i]); }

Global modification using function object mechanism

    template <typename T> template <typename VST>
    void Vector<T>::traverse(VST& visit)         // by function object
    { for (int i = 0; i < _size; i++) visit(_elem[i]); }

Example: add one to all elements in the vector

First, implement a class that increments a single element of type T

    template <typename T>  // assume T can be incremented directly, or overloads the operator
    struct Increase {      // function object: realized by overloading operator ()
        virtual void operator()(T& e) { e++; }  // increment by one
    };

Thereafter

    template <typename T> void increase(Vector<T>& V) {
        V.traverse(Increase<T>());  // traverse the vector, incrementing each element
    }

Orderedness and its detection

The number of adjacent inversion pairs measures how far the vector is from sorted

    template <typename T> int Vector<T>::disordered() const {
        int n = 0;                         // counter
        for (int i = 1; i < _size; i++)    // check each pair of adjacent elements
            n += (_elem[i-1] > _elem[i]);  // count the inversions
        return n;                          // the vector is sorted iff n = 0
    }  // to merely test sortedness, we could terminate upon meeting the first inversion

Uniqueness

Inefficient algorithm

Observation: in an ordered vector, equal elements must be adjacent, forming intervals; each interval need retain only a single element

    template <typename T> int Vector<T>::uniquify() {
        int oldSize = _size; int i = 0;
        while (i < _size - 1)                         // examine adjacent pairs _elem[i], _elem[i+1]
            if (_elem[i] == _elem[i+1]) remove(i+1);  // equal: delete the latter
            else i++;                                 // distinct: move on
        return oldSize - _size;                       // total number of deleted elements
    }

Complexity: the running time is dominated by the while loop, which runs n - 1 times; in the worst case remove() is called every time, costing from O(n-1) down to O(1), accumulating to O(n^2). Although find() is saved, this is no better than deduplicate() on an unordered vector.

Introspection: the root of the inefficiency is that each successor of a deleted element may be moved forward many times. Deleting equal elements in batches, one repeat-interval at a time, improves performance.

Efficient algorithm

    template <typename T> int Vector<T>::uniquify() {
        Rank i = 0, j = 0;              // ranks of a pair of distinct adjacent elements
        while (++j < _size)             // scan one by one, until the last element
            if (_elem[i] != _elem[j])   // skip duplicates; upon a distinct element,
                _elem[++i] = _elem[j];  // move it forward, next to the previous one
        _size = ++i; shrink();          // truncate the redundant tail directly
        return j - i;                   // total number of deleted elements
    }

Complexity: n-1 iterations in total, each constant time, cumulative O(n) time
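The two-pointer idea works the same on a plain sorted array; a minimal sketch (this uniquify is a free function, and demoUniquify is an illustrative wrapper):

```cpp
#include <cassert>

// keep one element per run of equals; O(n) with a single scan
int uniquify(int A[], int n) {
    if (n == 0) return 0;
    int i = 0, j = 0;
    while (++j < n)
        if (A[i] != A[j]) A[++i] = A[j];  // move the next distinct element forward
    return i + 1;                         // the new size
}

// deduplicate a sample sorted array and return the resulting size
int demoUniquify() {
    int A[] = {1, 1, 2, 2, 2, 3, 5, 5};
    return uniquify(A, 8);                // distinct values: 1, 2, 3, 5
}
```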

### (d2) ordered vector: binary search

Unified interface

    template <typename T>  // unified interface of the search algorithms
    Rank Vector<T>::search(T const& e, Rank lo, Rank hi) const {
        return (rand() % 2) ?              // pick one at 50% probability:
            binSearch(_elem, e, lo, hi)    // binary search, or
          : fibSearch(_elem, e, lo, hi);   // Fibonacci search
    }

Question: how should special situations be handled, e.g. when the target element is absent, or occurs multiple times?

Semantic convention: at the least, it should be convenient for maintaining the ordered vector itself, as in V.insert(1 + V.search(e), e). Even on failure, the appropriate insertion position for the new element should be indicated. If duplicate elements are allowed, each group should keep its insertion order.

Convention: in the ordered vector interval V[lo, hi), determine the rank of the last element not greater than e

If -∞ < e < V[lo], return lo - 1 (the left sentinel)

If V[hi-1] < e < +∞, return hi - 1 (the last element, just left of the right sentinel)

Version A:

Decrease and conquer: any element x = S[mi] can serve as a pivot, splitting the search interval into three parts: S[lo, mi) <= S[mi] <= S(mi, hi). After at most two comparisons, we either hit at mi or reduce the problem to half the size

    template <typename T>
    static Rank binSearch(T* A, T const& e, Rank lo, Rank hi) {
        while (lo < hi) {
            Rank mi = (lo + hi) >> 1;        // midpoint as the pivot
            if (e < A[mi]) hi = mi;          // continue in the front half [lo, mi)
            else if (A[mi] < e) lo = mi + 1; // continue in the rear half (mi, hi)
            else return mi;                  // hit at mi
        }
        return -1;                           // search failed
    }

Linear recursion: T(n) = T(n/2) + O(1) = O(logn), far better than sequential search. The recursion depth is O(logn), with each recursive instance taking O(1).

Search length: to evaluate search algorithms more precisely, count the number of key comparisons, i.e. the search length

### (d3) ordered vector: Fibonacci search

Idea: the numbers of key comparisons made before branching left versus right are unequal, yet the recursion depths are the same. If the imbalance in branching cost can be compensated by an imbalance in recursion depth, the average search length can be shortened further.

Specifically, for n = fib(k) - 1, take mi = fib(k-1) - 1; the lengths of the front and rear subvectors are then fib(k-1) - 1 and fib(k-2) - 1 respectively

    template <typename T>
    static Rank fibSearch(T* A, T const& e, Rank lo, Rank hi) {
        Fib fib(hi - lo);                            // create a Fibonacci sequence
        while (lo < hi) {
            while (hi - lo < fib.get()) fib.prev();  // how many iterations at most?
            Rank mi = lo + fib.get() - 1;            // pivot at fib(k-1) - 1, found by stepping back
            if (e < A[mi]) hi = mi;
            else if (A[mi] < e) lo = mi + 1;
            else return mi;
        }
        return -1;
    }

### (d4) ordered vector: binary search (improved)

Improvement idea: make only one key comparison per iteration, so every branch has exactly two directions

    template <typename T>
    static Rank binSearch(T* A, T const& e, Rank lo, Rank hi) {
        while (1 < hi - lo) {
            Rank mi = (lo + hi) >> 1;
            (e < A[mi]) ? hi = mi : lo = mi;  // [lo, mi) or [mi, hi)
        }
        return (e == A[lo]) ? lo : -1;
    }

### (d5) ordered vector: interpolation search

Semantic convention: the search() interface returns the last element not greater than e; only with this convention can algorithms such as V.insert(1 + V.search(e), e) be effectively supported

(1) When multiple elements hit, the last one (of maximal rank) must be returned;

(2) On failure, the largest element less than e (possibly the sentinel [lo - 1]) should be returned

    template <typename T>
    static Rank binSearch(T* A, T const& e, Rank lo, Rank hi) {
        while (lo < hi) {
            Rank mi = (lo + hi) >> 1;
            (e < A[mi]) ? hi = mi : lo = mi + 1;  // [lo, mi) or (mi, hi)
        }   // at exit, A[lo = hi] is the smallest element greater than e
        return --lo;  // lo - 1 is the largest rank of an element not greater than e
    }

The algorithm now ends when the search interval shrinks to width 0 rather than 1; when branching into the right subvector, the left boundary becomes mi + 1 rather than mi; whether the search succeeds or not, the returned rank strictly satisfies the interface's semantic convention.
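The semantics can be checked on a small array (a standalone sketch over int; demoSearch is an illustrative wrapper packing three probes into one value):

```cpp
#include <cassert>

// version C binary search: rank of the last element <= e in A[lo, hi)
int searchC(int A[], int e, int lo, int hi) {
    while (lo < hi) {
        int mi = (lo + hi) >> 1;
        (e < A[mi]) ? hi = mi : lo = mi + 1;  // [lo, mi) or (mi, hi)
    }
    return --lo;  // possibly lo - 1, the left "sentinel"
}

// three probes against {2, 4, 4, 7, 9}: a duplicate hit, a miss in the middle, a miss at the front
int demoSearch() {
    int A[] = {2, 4, 4, 7, 9};
    int hitLast = searchC(A, 4, 0, 5);  // last of the duplicates: rank 2
    int missMid = searchC(A, 5, 0, 5);  // last element <= 5: rank 2
    int missLow = searchC(A, 1, 0, 5);  // everything > 1: rank -1 (sentinel)
    return hitLast * 100 + missMid * 10 + (missLow + 1);
}
```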

### (e) Bubble sort

Sorter: unified entry

    template <typename T> void Vector<T>::sort(Rank lo, Rank hi) {
        switch (rand() % 5) {                      // pick at random
            case 1: bubbleSort(lo, hi); break;     // bubble sort
            case 2: selectionSort(lo, hi); break;  // selection sort
            case 3: mergeSort(lo, hi); break;      // merge sort
            case 4: heapSort(lo, hi); break;       // heap sort (Chapter 10)
            default: quickSort(lo, hi); break;     // quick sort (Chapter 12)
        }
    }

bubble sort

    template <typename T> void Vector<T>::bubbleSort(Rank lo, Rank hi)
    { while (!bubble(lo, hi--)); }              // scan and swap, pass by pass, until sorted

    template <typename T> bool Vector<T>::bubble(Rank lo, Rank hi) {
        bool sorted = true;                     // flag: sorted as a whole
        while (++lo < hi)                       // check adjacent pairs, left to right
            if (_elem[lo-1] > _elem[lo]) {      // upon an inversion
                sorted = false;
                swap(_elem[lo-1], _elem[lo]);   // swap
            }
        return sorted;                          // return the sortedness flag
    }   // even if the disorder is confined to [0, sqrt(n)), this still takes O(n^(3/2)) time

Improvement: replace the logical flag sorted of the previous version with a rank last

    template <typename T> void Vector<T>::bubbleSort(Rank lo, Rank hi)
    { while (lo < (hi = bubble(lo, hi))); }

    template <typename T> Rank Vector<T>::bubble(Rank lo, Rank hi) {
        Rank last = lo;                        // the rightmost inversion, initialized to [lo-1, lo]
        while (++lo < hi)                      // check adjacent pairs, left to right
            if (_elem[lo-1] > _elem[lo]) {     // upon an inversion
                last = lo;                     // update the position of the rightmost inversion
                swap(_elem[lo-1], _elem[lo]);
            }
        return last;
    }

Comprehensive evaluation:

(1) Efficiency matches the Chapter 1 version for integer arrays: best O(n), worst O(n^2);

(2) The algorithm is stable: with duplicate elements in the input, their relative order is preserved in the output;

(3) For the relative positions of elements a and b to change during bubble sort, there is only one possibility: after swapping with other elements, they become adjacent, and in the next scan-and-swap pass they are swapped because of an inversion - which never happens to equal elements, hence stability.

### (f) Merge sort

Principle: divide-and-conquer strategy (works for both vectors and lists), running time O(nlogn)

Bisect the sequence // O(1)

Recursively sort the subsequences // 2 × T(n/2)

Merge the sorted subsequences // O(n)

    template <typename T> void Vector<T>::mergeSort(Rank lo, Rank hi) {
        if (hi - lo < 2) return;   // single-element intervals are naturally sorted
        int mi = (lo + hi) >> 1;   // split at the midpoint
        mergeSort(lo, mi);         // sort the front half
        mergeSort(mi, hi);         // sort the rear half
        merge(lo, mi, hi);         // merge
    }

Two-way merge principle: merge two sorted sequences into one: S[lo, hi) = S[lo, mi) + S[mi, hi)

    template <typename T> void Vector<T>::merge(Rank lo, Rank mi, Rank hi) {
        T* A = _elem + lo;                   // merged vector A[0, hi - lo) = _elem[lo, hi)
        int lb = mi - lo; T* B = new T[lb];  // front subvector B[0, lb) = _elem[lo, mi)
        for (Rank i = 0; i < lb; i++)
            B[i] = A[i];                     // copy out the front subvector
        int lc = hi - mi; T* C = _elem + mi; // rear subvector C[0, lc) = _elem[mi, hi)
        for (Rank i = 0, j = 0, k = 0; (j < lb) || (k < lc);) {  // the smaller of B[j] and C[k] goes to the end of A
            if ((j < lb) && (lc <= k || (B[j] <= C[k]))) A[i++] = B[j++];  // C exhausted, or C[k] not smaller
            if ((k < lc) && (lb <= j || (C[k] <  B[j]))) A[i++] = C[k++];  // B exhausted, or B[j] larger
        }   // the loop is compact, though slightly less efficient than splitting the cases
        delete [] B;                         // release the temporary space B
    }

Complexity analysis: the running time is dominated by the for loop, with two control variables j and k: initially j = 0, k = 0; finally j = lb, k = lc; i.e. j + k = lb + lc = hi - lo = n.

Observe that after each iteration at least one of j and k increases (j + k grows by at least 1), so merge() iterates at most O(n) times, accumulating linear time. This conclusion does not contradict the Ω(nlogn) lower bound for comparison-based sorting, because B and C are already sorted.

Note: the subsequences to be merged need not have equal lengths: lb ≠ lc and mi ≠ (lo + hi)/2 are allowed. The algorithm and the conclusion also apply to another kind of sequence, the list.
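The compact merge loop can be tried on a concrete array (a standalone sketch over int; demoMerge encodes the merged array as digits):

```cpp
#include <cassert>

// two-way merge of the sorted halves A[lo, mi) and A[mi, hi), as in the text
void mergeInts(int A[], int lo, int mi, int hi) {
    int lb = mi - lo;
    int* B = new int[lb];                      // copy out the front half
    for (int t = 0; t < lb; t++) B[t] = A[lo + t];
    int i = lo, j = 0, k = mi;
    while (j < lb || k < hi) {                 // the smaller head goes to the end of A
        if (j < lb && (hi <= k || B[j] <= A[k])) A[i++] = B[j++];
        if (k < hi && (lb <= j || A[k] <  B[j])) A[i++] = A[k++];
    }
    delete [] B;
}

int demoMerge() {
    int A[] = {1, 4, 6, 2, 3, 5};
    mergeInts(A, 0, 3, 6);
    int code = 0;
    for (int r = 0; r < 6; r++) code = code * 10 + A[r];
    return code;  // digits of the merged array
}
```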

# Chapter 3 list

### (a) Interface and Implementation

Depending on whether the data structure is modified, operations fall roughly into: (1) static: read-only; the content and composition of the structure generally do not change, e.g. get and search; (2) dynamic: writing; part or all of the structure changes, e.g. insert and remove.

Storage and organization of data elements: (1) static: the physical storage order strictly matches the logical order, efficiently supporting static operations, e.g. vectors; (2) dynamic: physical space is dynamically allocated and reclaimed for each element; logically adjacent elements record each other's physical addresses, forming a logical whole that efficiently supports dynamic operations, e.g. lists.

The elements of a list are called nodes; adjacent nodes are each other's predecessor or successor, which are unique whenever they exist. The node without a predecessor/successor is the first/last node.

Vector: rank based access, O(1) time to determine physical address

List: location-based access, using mutual references between nodes

ADT interface (ListNode)

pred() // position of the current node's predecessor

succ() // position of the current node's successor

data() // the data object stored in the current node

insertAsPred(e) // insert a predecessor node storing e; return the position of the new node

insertAsSucc(e) // insert a successor node storing e; return the position of the new node

List node: ListNode template class

    #define Posi(T) ListNode<T>*  // position of a list node
    template <typename T>         // kept simple: fully public, no encapsulation
    struct ListNode {             // list node template class (doubly linked)
        T data;                   // value
        Posi(T) pred;             // predecessor
        Posi(T) succ;             // successor
        ListNode() {}             // constructor for the header and trailer
        ListNode(T e, Posi(T) p = NULL, Posi(T) s = NULL)
            : data(e), pred(p), succ(s) {}  // default constructor
        Posi(T) insertAsPred(T const& e);   // insert before
        Posi(T) insertAsSucc(T const& e);   // insert after
    };

Other ADT interfaces

size() // report the current size of the list (total number of nodes)

first(), last() // return the positions of the first and last nodes

insertAsFirst(e), insertAsLast(e) // insert e as the first/last node

insertBefore(p, e), insertAfter(p, e) // insert e as the immediate predecessor/successor of node p

remove(p) // delete the node at position p and return its content

disordered() // report whether all nodes are already in non-descending order

sort() // rearrange the nodes into non-descending order

find(e) // find the target element e; return NULL on failure

search(e) // find e; return the node of largest rank no greater than e (ordered list)

deduplicate(), uniquify() // remove duplicate nodes

traverse() // traverse the list

Lists: List template classes

    #include "listNode.h"  // import the list node class
    template <typename T> class List {  // list template class
    private:
        int _size; Posi(T) header; Posi(T) trailer;  // head and tail sentinels
    protected:
        /* ... internal functions ... */
    public:
        /* ... constructors, destructor, read-only interface, writable interface, traversal interface ... */
    };

Construction

    template <typename T> void List<T>::init() {
        header = new ListNode<T>;                     // create the head sentinel
        trailer = new ListNode<T>;                    // create the tail sentinel
        header->succ = trailer; header->pred = NULL;  // link up
        trailer->pred = header; trailer->succ = NULL; // link up
        _size = 0;                                    // record the size
    }

### (b) Unordered list

From rank to position: imitating the vector's access-by-rank interface, overload the subscript operator

    template <typename T> T List<T>::operator[](Rank r) const {
        Posi(T) p = first();          // start from the first node
        while (0 < r--) p = p->succ;  // step forward to the r-th node
        return p->data;               // the target node
    }   // the rank of any node equals the number of its predecessors

Find: among the n (true) predecessors of node p (p may be the trailer), locate the last one equal to e

    template <typename T>  // for external calls, 0 <= n <= rank(p) < _size
    Posi(T) List<T>::find(T const& e, int n, Posi(T) p) const {
        while (0 < n--)                    // from right to left, compare the
            if (e == (p = p->pred)->data)  // predecessors of p with e, one by one
                return p;                  // until a hit, or running out of range
        return NULL;                       // beyond the left boundary: the search failed
    }   // the header sentinel makes the handling concise

insert

    template <typename T> Posi(T) List<T>::insertBefore(Posi(T) p, T const& e)
    { _size++; return p->insertAsPred(e); }  // e becomes the predecessor of p

    template <typename T>  // predecessor-insertion algorithm (symmetric to successor insertion)
    Posi(T) ListNode<T>::insertAsPred(T const& e) {
        Posi(T) x = new ListNode(e, pred, this);  // create the new node
        pred->succ = x; pred = x;                 // hook up the links
        return x;                                 // return the position of the new node
    }

Copy-based construction

    template <typename T>  // basic interface
    void List<T>::copyNodes(Posi(T) p, int n) {  // O(n)
        init();                                  // create the head and tail sentinels
        while (n--)                              // starting from p, insert the n items
        { insertAsLast(p->data); p = p->succ; }  // as last nodes, one by one
    }

delete

    template <typename T>  // delete the node at the legal position p; return its value
    T List<T>::remove(Posi(T) p) {  // O(1)
        T e = p->data;              // back up the value (assume T can be assigned directly)
        p->pred->succ = p->succ; p->succ->pred = p->pred;
        delete p; _size--;
        return e;                   // return the backup value
    }

Destruction

    template <typename T> List<T>::~List()       // list destructor
    { clear(); delete header; delete trailer; }  // clear the list, release the sentinels

    template <typename T> int List<T>::clear() { // clear the list
        int oldSize = _size;
        while (0 < _size)                        // repeatedly delete the first node
            remove(header->succ);                // until the list is empty
        return oldSize;
    }   // O(n), proportional to the list size

Uniqueness

    template <typename T> int List<T>::deduplicate() {  // remove duplicates in an unordered list
        if (_size < 2) return 0;          // trivial lists have no duplicates
        int oldSize = _size;              // record the original size
        Posi(T) p = first(); Rank r = 1;  // p starts from the first node
        while (trailer != (p = p->succ)) {    // onward, up to the last node
            Posi(T) q = find(p->data, r, p);  // among the r (true) predecessors of p, look for a duplicate q
            if (q) remove(q);             // if one exists, delete it;
            else r++;                     // otherwise increment the rank
        }
        return oldSize - _size;           // size change = total number of deleted elements
    }   // correctness and efficiency analysis are the same as for Vector::deduplicate()

### (c) Ordered list

Uniqueness

    template <typename T> int List<T>::uniquify() {  // remove duplicates in batches
        if (_size < 2) return 0;            // trivial lists have no duplicates
        int oldSize = _size;                // record the original size
        Posi(T) p = first(); Posi(T) q;     // p is the start of each run, q its successor
        while (trailer != (q = p->succ))    // repeatedly examine the adjacent pair (p, q)
            if (p->data != q->data) p = q;  // if distinct, move on to the next run;
            else remove(q);                 // otherwise delete the latter
        return oldSize - _size;             // size change = total number of deleted elements
    }   // O(n)

lookup

    template <typename T>  // among the n (true) predecessors of node p in a sorted list,
                           // find the last one not greater than e
    Posi(T) List<T>::search(T const& e, int n, Posi(T) p) const {
        while (0 <= n--)                     // for the nearest n predecessors of p,
            if (((p = p->pred)->data) <= e)  // compare with e from right to left,
                break;                       // until a hit, or running out of range
        return p;                            // return the position where the search stopped
    }   // best O(1), worst O(n); average O(n) under equal probability, proportional to the interval width

### (d) Selection sort

Implementation of selectionSort: select and sort n consecutive elements in the list starting from position p

    template <typename T> void List<T>::selectionSort(Posi(T) p, int n) {
        Posi(T) head = p->pred; Posi(T) tail = p;
        for (int i = 0; i < n; i++) tail = tail->succ;  // head/tail may be the sentinels
        while (1 < n) {  // repeatedly find the maximum of the unsorted interval and move it before the sorted interval
            insertBefore(tail, remove(selectMax(head->succ, n)));
            tail = tail->pred; n--;  // the unsorted and sorted intervals are updated in step
        }
    }

selectMax implementation: select the largest one from the n elements starting from position p, 1 < n

    template <typename T> Posi(T) List<T>::selectMax(Posi(T) p, int n) {  // O(n)
        Posi(T) max = p;                                  // the maximum is tentatively p
        for (Posi(T) cur = p; 1 < n; n--)                 // compare the successors with max, one by one
            if (!lt((cur = cur->succ)->data, max->data))  // if >= max
                max = cur;                                // update the position of the maximum
        return max;                                       // return the position of the maximum node
    }

Performance: n iterations in total; in the k-th iteration, selectMax() takes O(n - k) while remove() and insertBefore() take O(1), so the overall complexity is O(n^2). Element moves are far fewer than in bubble sort; the O(n^2) comes mainly from element comparisons.
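The same move-the-maximum strategy can be sketched on std::list, which stands in here for the text's List (a splice-by-copy sketch; names are illustrative):

```cpp
#include <cassert>
#include <list>

// selection sort on a doubly linked list: repeatedly move the maximum of the
// unsorted prefix to the front of the sorted suffix
void selectionSortList(std::list<int>& L) {
    auto tail = L.end();                    // the sorted suffix starts out empty
    for (std::size_t n = L.size(); 1 < n; n--) {
        auto mx = L.begin();                // locate the maximum of the first n nodes
        auto it = L.begin();
        for (std::size_t k = 0; k < n; k++, ++it)
            if (*mx <= *it) mx = it;        // <= picks the last maximum, keeping equals stable
        tail = L.insert(tail, *mx);         // insert it just before the sorted suffix
        L.erase(mx);                        // and remove the original node
    }
}

int demoSelectionSort() {
    std::list<int> L = {5, 3, 8, 1, 8, 2};
    selectionSortList(L);
    int code = 0;
    for (int x : L) code = code * 10 + x;   // digits of the sorted list
    return code;
}
```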

### (e) Insert sort

Consider the sequence as two parts: Sorted + Unsorted, i.e. L[0, r) + L[r, n)

Initialization: |S| = r = 0 // an empty sequence is trivially sorted

Iteration: take e = L[r], determine its proper position in S and insert it, yielding an ordered L[0, r] // ordered-sequence insertion

Invariance: with the increase of R, L[0,r) is always ordered until r=n, L is the overall order

insertionSort implementation: sort the n consecutive elements starting at position p in the list; valid(p) && rank(p) + n <= size

    template <typename T> void List<T>::insertionSort(Posi(T) p, int n) {
        for (int r = 0; r < n; r++) {  // introduce the nodes one by one, turning S_r into S_(r+1)
            insertAfter(search(p->data, r, p), p->data);  // search + insert
            p = p->succ; remove(p->pred);                 // move on to the next node
        }  // n iterations, each O(r + 1)
    }      // only O(1) auxiliary space: an in-place algorithm

Average performance

- Assume the element values are uniformly and independently distributed; how many element comparisons are made on average?
- Consider the moment when L[r] has just been inserted: which of the r+1 elements of the ordered prefix L[0, r] is it?
- Each of the r+1 positions is equally likely, with probability 1/(r+1). So in the iteration just completed, the expected cost of introducing L[r] is [r + (r-1) + ... + 2 + 1 + 0]/(r+1) + 1 = r/2 + 1; summing over r, the overall expectation is [0 + 1 + ... + (n-1)]/2 + n = O(n^2)