Introduction to OpenMP
OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran. It consists of a set of compiler directives, library routines, and environment variables that influence the code's run-time behavior.
OpenMP uses a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications for platforms ranging from the standard desktop computer to the supercomputer. An application built with the hybrid model of parallel programming can run on a computer cluster using both OpenMP and Message Passing Interface (MPI), such that OpenMP is used for parallelism within a (multi-core) node while MPI is used for parallelism between nodes.
History
OpenMP is managed by the nonprofit technology consortium OpenMP Architecture Review Board (OpenMP ARB), jointly defined by a group of major computer hardware and software vendors, which published the first API specification in October 1997. The latest specification (4.5), released in 2015, adds or improves support for accelerators, atomics, error handling, thread affinity, tasking extensions, user-defined reductions, SIMD, and more.
Recent GNU compilers (GCC 4.9 and above) ship with OpenMP support built in, enabled with the -fopenmp flag, so there is no need to install anything separately.
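For example, a C source file that uses OpenMP (here assumed to be named hello.c) can be compiled with GCC as follows:
gcc -fopenmp hello.c -o hello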
How does it work?
OpenMP programs accomplish parallelism exclusively through the use of threads. A thread of execution is the smallest unit of processing that can be scheduled by an operating system. Threads exist within the resources of a single process; without the process, they cease to exist. Typically, the number of threads matches the number of machine processors/cores. However, the actual use of threads is up to the application.
OpenMP is an explicit (not automatic) programming model, offering the programmer full control over parallelization. It uses the fork-join model of parallel execution, driven by compiler directives. All OpenMP programs begin as a single process (the master thread). The master thread executes sequentially until the first parallel region construct is encountered; it then creates (forks) a team of parallel threads. When the team threads complete the statements in the parallel region construct, they synchronize (join) and terminate, leaving only the master thread.
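A minimal sketch of the fork-join model in C (the printed messages are illustrative):
#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("Master thread: sequential part\n");

    /* Fork: the master thread creates a team of threads */
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();       /* this thread's id */
        int nthreads = omp_get_num_threads(); /* size of the team */
        printf("Hello from thread %d of %d\n", tid, nthreads);
    } /* Join: implicit barrier; only the master thread continues */

    printf("Master thread: sequential part again\n");
    return 0;
}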
OpenMP components
The three primary OpenMP components are:
- Compiler directives
- Runtime library routines
- Environment variables
The OpenMP directives appear as comments (Fortran) or pragmas (C/C++) in your source code and are ignored by compilers unless you tell them otherwise. Directives have the following syntax: sentinel directive-name [clause, ...]. See the example below.
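In C/C++ the sentinel is #pragma omp, so a parallel directive with two clauses looks like this:
#pragma omp parallel default(shared) private(beta,pi)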
OpenMP provides several environment variables for controlling the execution of parallel code at run-time. For example, to run your code with 8 threads, you would set:
export OMP_NUM_THREADS=8
OpenMP clauses
The table below summarizes which clauses are accepted by which OpenMP directives.
Clause | Description |
copyin | Allows threads to access the master thread's value, for a threadprivate variable. |
copyprivate | Broadcasts the value of one or more private variables from one thread to the corresponding variables of the other threads in the team (used with the single directive). |
default | Specifies the behavior of unscoped variables in a parallel region. |
firstprivate | Specifies that each thread should have its own instance of a variable, initialized with the value the variable has before the parallel construct. |
if | Specifies, via a run-time condition, whether a region should be executed in parallel or serially. |
lastprivate | Specifies that the enclosing context's version of the variable is set equal to the private version of whichever thread executes the final iteration (for-loop construct) or last section (#pragma sections). |
nowait | Overrides the barrier implicit in a directive. |
num_threads | Sets the number of threads in a thread team. |
ordered | Required on a parallel for-loop statement if an ordered directive is to be used in the loop. |
private | Specifies that each thread should have its own instance of a variable. |
reduction | Specifies that one or more variables that are private to each thread are the subject of a reduction operation at the end of the parallel region. |
schedule | Controls how loop iterations are divided among threads; applies to the for directive. |
shared | Specifies that one or more variables should be shared among all threads. |
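As an illustrative sketch (the array size and values are made up), the following loop combines several of these clauses: schedule divides the iterations among threads, reduction combines each thread's private partial sum at the end, and num_threads sets the team size.
#include <stdio.h>

int main(void) {
    const int n = 1000;
    double sum = 0.0;

    /* Each thread sums its share of iterations into a private copy of
       sum; the reduction clause combines the copies at the end */
    #pragma omp parallel for schedule(static) reduction(+:sum) num_threads(4)
    for (int i = 0; i < n; i++) {
        sum += i * 0.5;
    }

    printf("sum = %f\n", sum);
    return 0;
}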
OpenMP worksharing
A work-sharing construct divides the execution of the enclosed code region among the members of the team that encounter it, but it does not launch new threads. There is no implied barrier upon entry to a work-sharing construct; however, there is an implied barrier at the end of one. In Fortran, for instance, the syntax of the workshare construct is as follows:
!$omp workshare
structured-block
!$omp end workshare [nowait]
There are three types of work-sharing constructs:
Do/for-loop
Shares iterations of a loop across the team. Represents a type of "data parallelism".
#pragma omp for [clause ...] newline
schedule (type [,chunk])
ordered
private (list)
firstprivate (list)
lastprivate (list)
shared (list)
reduction (operator: list)
collapse (n)
nowait
for_loop
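A minimal sketch of the for construct in C (the array names and size are illustrative): the parallel region forks a team, and the for directive splits the loop's iterations among it.
#include <stdio.h>

#define N 8

int main(void) {
    int i;
    double a[N], b[N], c[N];

    for (i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    #pragma omp parallel shared(a, b, c) private(i)
    {
        /* Iterations are divided among the team; nowait removes the
           implied barrier at the end of the loop (the parallel region
           itself still ends with a barrier) */
        #pragma omp for schedule(static) nowait
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }

    for (i = 0; i < N; i++) printf("c[%d] = %g\n", i, c[i]);
    return 0;
}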
Sections
Breaks work into separate, discrete sections. Each section is executed by a thread. Can be used to implement a type of "functional parallelism".
#pragma omp sections [clause ...] newline
private (list)
firstprivate (list)
lastprivate (list)
reduction (operator: list)
nowait
{
#pragma omp section newline
structured_block
#pragma omp section newline
structured_block
}
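A minimal sketch (the two tasks are placeholders): each section is handed to one thread of the team, giving a form of functional parallelism.
#include <stdio.h>
#include <omp.h>

int main(void) {
    /* The combined parallel sections construct forks a team and
       assigns each section to one thread */
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("Task A on thread %d\n", omp_get_thread_num());

        #pragma omp section
        printf("Task B on thread %d\n", omp_get_thread_num());
    }
    return 0;
}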
Single
Serializes a section of code so that it is executed by only one thread in the team (useful for code that is not thread-safe, such as I/O).
#pragma omp single [clause ...] newline
private (list)
firstprivate (list)
nowait
structured_block
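A minimal sketch: only one thread executes the single block, while the others wait at its implicit barrier unless nowait is given.
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        /* Executed by exactly one thread; the rest of the team waits
           at the implicit barrier at the end of single */
        #pragma omp single
        printf("Initialization done by thread %d\n", omp_get_thread_num());

        printf("Thread %d continues after the single block\n",
               omp_get_thread_num());
    }
    return 0;
}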
Restrictions
The following restrictions apply to work-sharing constructs:
- Each work-sharing region must be encountered by all threads in a team or by none at all, unless cancellation has been requested for the innermost enclosing parallel region.
- The sequence of work-sharing regions and barrier regions encountered must be the same for every thread in a team.