Quick sort with MPI + OpenMP

subject

Using an example from high-performance parallel computing, write an MPI + OpenMP parallel program and a report that tests its parallel efficiency.

method

  1. Define the quick-sort function quickSort and the total number of elements to sort, NUM;
  2. Generate a two-dimensional array in descending order with dimensions size * calculateSize, where size is obtained with MPI_Comm_size and size * calculateSize = NUM;
  3. Obtain the process rank with MPI_Comm_rank; use MPI_Scatter to send one sub-array to each process, let every process sort its sub-array with quickSort, and split the two recursive calls across threads with OpenMP's sections construct;
  4. Collect the sorted sub-arrays back to process 0 with MPI_Gather, then perform a size-way merge in process 0 to obtain the fully sorted array;
  5. Time the run with MPI_Wtime() (the communication flow of these steps is condensed in the sketch below; the complete program is in the code section).
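
The communication flow of steps 2-4 can be condensed into the short sketch below; std::sort stands in for the report's quickSort, the size-way merge is omitted, and num = 128 is chosen only for illustration.

#include "mpi.h"
#include <algorithm>
#include <vector>

int main(int argc, char *argv[])
{
	MPI_Init(&argc, &argv);
	int size, rank;
	MPI_Comm_size(MPI_COMM_WORLD, &size);
	MPI_Comm_rank(MPI_COMM_WORLD, &rank);

	const int num = 128;                     //Illustrative total; must be divisible by size
	int chunk = num / size;                  //Elements per process
	std::vector<int> full(num), local(chunk);
	for (int i = 0; i < num; i++)
		full[i] = num - i;                   //Descending initial data

	//Step 3: scatter one block to every process, then sort it locally
	MPI_Scatter(full.data(), chunk, MPI_INT, local.data(), chunk, MPI_INT, 0, MPI_COMM_WORLD);
	std::sort(local.begin(), local.end());
	//Step 4: gather the sorted blocks back to process 0, which would then merge them
	MPI_Gather(local.data(), chunk, MPI_INT, full.data(), chunk, MPI_INT, 0, MPI_COMM_WORLD);

	MPI_Finalize();
	return 0;
}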

code

#include "mpi.h"
#include "omp.h"
#include <iostream>
#define NUM 12800
using namespace std;

void swap(int &a, int &b)  //Exchange function
{
	int temp;
	temp = a;
	a = b;
    b = temp;
}

void printArray(int *array, int len)  //Output array
{
    for (int i = 0; i < len; i++)
        cout << array[i] << " ";
    cout << endl;
}

void quickSort(int *array, int l, int r)  //Quick sort
{
    int i, m;
    if (l >= r) return;
    m = l;
    for (i = l + 1; i <= r; i++)
        if (array[i] < array[l])
            swap(array[++m], array[i]);
		
	swap(array[l], array[m]);
	#pragma omp parallel sections  //sections creates two parallel blocks, one per thread (two threads here)
    {
		#pragma omp section
        quickSort(array, l, m - 1);
		#pragma omp section
        quickSort(array, m + 1, r);
    }
}

int main(int argc, char *argv[])
{
	MPI_Init(&argc, &argv);	 //MPI initialization
	double time = MPI_Wtime();  //Start timing (MPI_Wtime is called after MPI_Init)

	int size;  //Number of processes in the communicator
	MPI_Comm_size(MPI_COMM_WORLD, &size);  //Total number of MPI processes in MPI_COMM_WORLD
	int calculateSize = NUM / size;  //Number of elements assigned to each process
	int processId;  //Rank of this process in MPI_COMM_WORLD
	MPI_Comm_rank(MPI_COMM_WORLD, &processId);  //Get the rank of this MPI process
    
    //Note: variable-length arrays (sizes known only at run time) are a GCC/Clang extension, not standard C++
    int arr[size][calculateSize];  //Initial array in descending order, one row per process
    int arrMerge[size][calculateSize];  //Sorted sub-arrays gathered back on process 0
    int arrEach[calculateSize];  //This process's share: 1/size of the initial array
    int figure = size * calculateSize;  //Counter used to fill the array in descending order
    int arrFinal[size * calculateSize];  //Fully sorted final array

    for (int i = 0; i < size; i++)  //Generate array
	{
        for (int j = 0; j < calculateSize; j++) {
            arr[i][j] = figure--;
        }
    }
    
	//The array is divided into size blocks, and each process sorts one block
	MPI_Scatter(arr, calculateSize, MPI_INT, arrEach, calculateSize, MPI_INT, 0, MPI_COMM_WORLD);	
	//Sorting in each process
	quickSort(arrEach, 0, calculateSize - 1);
	//Merge the ordered arrays of child processes into one
	MPI_Gather(arrEach, calculateSize, MPI_INT, arrMerge, calculateSize, MPI_INT, 0, MPI_COMM_WORLD);
	
	cout << "The current array of " << processId << ":" << endl;  //Output the sorting result of the current process
    printArray(arrEach, calculateSize);
		
	if (processId == 0)  //Only process 0 holds the gathered data, so only it performs the merge
	{
		//numTimes[j] records how many elements have already been taken from sub-array j
		int numTimes[size];
		for (int j = 0; j < size; j++)
			numTimes[j] = 0;
		//Holds the current largest untaken element of each of the size sub-arrays
		int arrNumMax[size];

		for (int i = calculateSize * size - 1; i >= 0; i--)  //Merge into one ascending array
		{
			for (int j = 0; j < size; j++)  //Collect the largest remaining number of each sub-array
			{
				if (numTimes[j] >= calculateSize)
					arrNumMax[j] = 0;  //Sub-array exhausted; 0 is safe since all values are >= 1
				else
					arrNumMax[j] = arrMerge[j][calculateSize - numTimes[j] - 1];
			}

			int maxNum = arrNumMax[0];  //Pick the largest of the size candidates, i.e. a size-way merge
			for (int k = 1; k < size; k++)
			{
				if (arrNumMax[k] > maxNum)
					maxNum = arrNumMax[k];
			}

			for (int n = 0; n < size; n++)  //Mark the sub-array that supplied the largest number
			{
				if (maxNum == arrNumMax[n])
				{
					numTimes[n] = numTimes[n] + 1;
					break;
				}
			}
			arrFinal[i] = maxNum;  //Store it at the current position
		}
	}
	
	if (!processId)  
	{
		time = MPI_Wtime() - time;  //End timing
		cout << "The final array :" << endl;  //Output array
		printArray(arrFinal, size * calculateSize);
		cout << "NUM = "<< NUM << "\t" << "size = " << size << "\t" << "time = " << time * 1000 << " ms" << endl;
	}
	
    MPI_Finalize();  //Terminate MPI
    return 0;
} 

Result analysis

1. Simple output test: 100 elements in total are sorted using 10 processes; the OpenMP thread count is set with export OMP_NUM_THREADS=2.

$ mpicc -o quicksort -fopenmp quicksort.cpp -lstdc++
$ time mpirun -np 10 ./quicksort
The current array of 2:
71 72 73 74 75 76 77 78 79 80 
The current array of 5:
41 42 43 44 45 46 47 48 49 50 
The current array of 6:
31 32 33 34 35 36 37 38 39 40 
The current array of 7:
21 22 23 24 25 26 27 28 29 30 
The current array of 9:
1 2 3 4 5 6 7 8 9 10 
The current array of 1:
81 82 83 84 85 86 87 88 89 90 
The current array of 4:
51 52 53 54 55 56 57 58 59 60 
The current array of 0:
91 92 93 94 95 96 97 98 99 100 
The final array :
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 
Parallel running time = 54.8629 ms
The current array of 3:
61 62 63 64 65 66 67 68 69 70 
The current array of 8:
11 12 13 14 15 16 17 18 19 20

Result analysis: the program distributes the sorting work evenly according to the number of processes given on the command line, and the two-dimensional array makes the data easy to scatter and gather. The initial array is in descending order; after each process sorts its block, the blocks are gathered and merged into a single ascending array, as the short sketch below illustrates.
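
Because the rows of a C/C++ two-dimensional array are contiguous in memory, MPI_Scatter hands row i directly to rank i. A minimal sketch of this mapping (the row length of 3 and the cap of 16 processes are chosen only for illustration):

#include "mpi.h"
#include <cstdio>

int main(int argc, char *argv[])
{
	MPI_Init(&argc, &argv);
	int size, rank;
	MPI_Comm_size(MPI_COMM_WORLD, &size);
	MPI_Comm_rank(MPI_COMM_WORLD, &rank);

	const int chunk = 3;              //Illustrative row length
	int rows[16][chunk];              //Fixed maximum of 16 processes for this sketch
	int mine[chunk];
	for (int i = 0; i < size && i < 16; i++)
		for (int j = 0; j < chunk; j++)
			rows[i][j] = i * 10 + j;  //Row i holds 10*i, 10*i+1, 10*i+2

	//Rows are contiguous, so row i of the 2D array lands on rank i
	MPI_Scatter(rows, chunk, MPI_INT, mine, chunk, MPI_INT, 0, MPI_COMM_WORLD);
	printf("rank %d received %d %d %d\n", rank, mine[0], mine[1], mine[2]);

	MPI_Finalize();
	return 0;
}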

2. Efficiency test: with the total number of elements fixed at NUM = 10000, measure how the number of processes affects the running time.


Result analysis: in this experiment the thread count is set to 2. Because of the MPI_Scatter restriction, the number of processes must divide 10000 evenly. The measurements show that for 10000 elements, 5 processes is fastest and gives good efficiency; as the number of processes keeps growing (for example 50, where each process sorts only 200 elements), the communication cost exceeds the sorting cost and the running time goes back up.
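
The efficiency referred to here can be computed from the measured times. A small hedged helper (t1 and tp are placeholders for the single-process and p-process timings from the measurements, not values taken from this report):

#include <cstdio>
#include <cstdlib>

//Speedup S = t1 / tp and parallel efficiency E = S / p for p processes
void reportEfficiency(double t1, double tp, int p)
{
	double speedup = t1 / tp;
	double efficiency = speedup / p;
	printf("p = %d  speedup = %.2f  efficiency = %.2f\n", p, speedup, efficiency);
}

int main(int argc, char *argv[])
{
	//Usage: ./efficiency <t1_ms> <tp_ms> <p>
	if (argc == 4)
		reportEfficiency(atof(argv[1]), atof(argv[2]), atoi(argv[3]));
	return 0;
}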

summary

For large-scale parallelism across nodes, the amount of inter-node communication grows quadratically, so bandwidth quickly becomes the bottleneck. The parallel part is therefore written as an MPI + OpenMP hybrid: each MPI process runs several OpenMP threads. Because OpenMP threads exchange data through shared memory rather than inter-process messages, this noticeably reduces the communication the program needs.
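
As a small illustration of this hybrid model (a separate sketch, not part of the program above), each MPI rank can report the size of its own OpenMP thread team, which is what OMP_NUM_THREADS controls:

#include "mpi.h"
#include "omp.h"
#include <cstdio>

int main(int argc, char *argv[])
{
	MPI_Init(&argc, &argv);
	int rank;
	MPI_Comm_rank(MPI_COMM_WORLD, &rank);

	//Each MPI process runs its own OpenMP team; the threads share that process's
	//memory, so they exchange data without any extra MPI messages
	#pragma omp parallel
	{
		#pragma omp single
		printf("rank %d uses %d OpenMP threads\n", rank, omp_get_num_threads());
	}

	MPI_Finalize();
	return 0;
}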
