OpenMP-based Parallel Computing
All of my blog posts are for personal knowledge accumulation only. Reproduction without attribution, whether for commercial or non-commercial purposes, is prohibited. If there are any errors, criticism and corrections are welcome; let's discuss and grow together.
Preface: This term I chose the course GPU Parallel Computing to learn the hardware basis of parallel computing, parallel program design ideas and methods, and parallel algorithm design and analysis, in order to improve my coding ability and my ability to connect low-level hardware with upper-level software. Let's start!
A parallel program based on OpenMP (Open Multi-Processing) is built by adding compiler directives (similar to the "#" statements in C/C++) to the original serial program. The programming effort is therefore small and easy to get started with. The parallel behavior relies mainly on the underlying runtime support library; programmers do not need to worry about it and only need to consider the division of parallel regions and the design of parallel algorithms.
First, when using OpenMP (hereinafter referred to as "omp"), the header files you need to include are shown in the code block below. The first header file provides omp's directives and runtime functions, and the second provides the clock_t type used to time the parallel computation.
#include <omp.h>
#include <time.h>
Parallel omp-based statements start with the following key phrase:
#pragma omp
The following statement indicates that the code snippet has entered parallel mode:
#pragma omp parallel
Using the main function below, you can test how many threads are created to execute the parallel snippet when the number of threads is not specified. The parallel block is executed once by each thread, so every statement in it runs multiple times.
#include <iostream>
#include <omp.h>
using namespace std;

int main() {
    #pragma omp parallel
    cout << "hello world" << endl;
    return 0;
}
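If you also want the program itself to report the team size, the OpenMP runtime provides query functions for this. Below is a minimal sketch using omp_get_num_threads() and omp_get_thread_num(); the critical directive just keeps the output lines from interleaving:

#include <iostream>
#include <omp.h>
using namespace std;

int main() {
    #pragma omp parallel
    {
        // omp_get_num_threads() returns the size of the current team;
        // omp_get_thread_num() returns this thread's id within it.
        #pragma omp critical
        cout << "thread " << omp_get_thread_num()
             << " of " << omp_get_num_threads() << endl;
    }
    return 0;
}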
Here are a few simple and common parallel test programs that illustrate omp-based parallel programming.
To run a loop on multiple threads, use the following directive; the two forms below are equivalent.
#pragma omp parallel for
#pragma omp parallel
{
    #pragma omp for
    for (...) {
        ...
    }
    ...
}
Notably,
1. In the second form, #pragma omp for alone achieves no parallelism; it is effective only in combination with #pragma omp parallel;
2. The scope of a #pragma omp for directive is only the loop that immediately follows it; it does not apply to other loops in the parallel region. If every for loop in the parallel block should be computed in parallel, a #pragma omp for directive must be added before each loop (see the sketch after this list);
3. The loop iterations distributed to the threads must be independent: no iteration may depend on the results of earlier iterations;
4. This kind of parallelism is data-level parallelism.
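As referenced in point 2 above, here is a minimal sketch (the loop bodies are illustrative assumptions of mine) showing that each loop in a parallel region needs its own #pragma omp for, and that the iterations of each loop are independent of one another:

#include <omp.h>

int main() {
    int a[1000], b[1000];
    #pragma omp parallel
    {
        // First loop: divided across the team by its own omp for.
        #pragma omp for
        for (int i = 0; i < 1000; i++)
            a[i] = i;

        // Without its own omp for, this second loop would run in full
        // on every thread instead of being divided among them.
        #pragma omp for
        for (int i = 0; i < 1000; i++)
            b[i] = 2 * a[i];  // each iteration independent of the others
    }
    return 0;
}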
To specify the number of threads as n, write:
#pragma omp parallel num_threads(n)
{
    #pragma omp for
    for (...) {
        ...
    }
    ...
}
Test Program 1: Parallel Output Test
#include <iostream>
#include <Windows.h>
#include <omp.h>
using namespace std;

int main() {
    #pragma omp parallel
    {
        #pragma omp for
        for (int j = 0; j < 1000; j++) {
            Sleep(5000);  // Windows API: pause this thread for 5000 ms
            cout << "j=" << j << ", Thread_id = " << omp_get_thread_num() << endl;
        }
    }
    return 0;
}
Test Program 2: Time comparison between serial and parallel accumulations
#include <iostream>
#include <time.h>
#include <omp.h>
using namespace std;

// A fixed amount of work: increment a local counter ten million times.
void SumNumber() {
    int sum = 0;
    for (int i = 0; i < 10000000; i++)
        sum++;
}

void SerialTest() {
    clock_t t1 = clock();
    for (int i = 0; i < 100; i++)
        SumNumber();
    clock_t t2 = clock();
    cout << "Serial Time: " << t2 - t1 << endl;
}

void ParallelTest() {
    clock_t t1 = clock();
    #pragma omp parallel for
    for (int i = 0; i < 100; i++)
        SumNumber();
    clock_t t2 = clock();
    cout << "Parallel Time: " << t2 - t1 << endl;
}

int main() {
    SerialTest();
    ParallelTest();
    return 0;
}
If a fixed number of threads is desired, ParallelTest can be rewritten as follows (here with five threads):

void ParallelTest() {
    clock_t t1 = clock();
    #pragma omp parallel num_threads(5)
    {
        #pragma omp for
        for (int i = 0; i < 100; i++)
            SumNumber();
    }
    clock_t t2 = clock();
    cout << "Parallel Time: " << t2 - t1 << endl;
}
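A caveat worth adding (my own note, not from the original test): on some platforms clock() accumulates CPU time across all threads, which can make the parallel version look slower than it really is. OpenMP's own omp_get_wtime() returns wall-clock time in seconds and avoids this. A sketch of the same test using it (ParallelTestWtime is a hypothetical name):

void ParallelTestWtime() {
    double t1 = omp_get_wtime();  // wall-clock time in seconds
    #pragma omp parallel for
    for (int i = 0; i < 100; i++)
        SumNumber();
    double t2 = omp_get_wtime();
    cout << "Parallel Time: " << t2 - t1 << " s" << endl;
}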
The sections construct is a common task-partitioning statement in omp. It divides the code of a parallel region into discrete code blocks, each of which is executed by one thread; this is task-level parallelism.
#pragma omp parallel sections
{
    #pragma omp section
    {
        ...
    }
    #pragma omp section
    {
        ...
    }
    ...
}
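For concreteness, here is a minimal runnable sketch of the sections construct (the printed messages are illustrative assumptions); each section body is executed exactly once, by whichever thread picks it up:

#include <iostream>
#include <omp.h>
using namespace std;

int main() {
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            cout << "section 1, thread " << omp_get_thread_num() << endl;
        }
        #pragma omp section
        {
            cout << "section 2, thread " << omp_get_thread_num() << endl;
        }
    }
    return 0;
}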
Test Program 3: Addition of Vectors
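A minimal sketch of what such a vector-addition test might look like (the vector length and initialization are my own assumptions); since each element-wise addition c[i] = a[i] + b[i] is independent of the others, #pragma omp parallel for applies directly:

#include <iostream>
#include <omp.h>
using namespace std;

const int N = 1000000;  // assumed vector length
double a[N], b[N], c[N];

int main() {
    // Serial initialization of the input vectors (assumed values).
    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2 * i;
    }
    // Each c[i] depends only on a[i] and b[i], so the iterations
    // are independent and can be divided among the threads.
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
    cout << "c[N-1] = " << c[N - 1] << endl;
    return 0;
}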