OpenMP


Introduction to OpenMP

OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran. It consists of a set of compiler directives, library routines, and environment variables that influence the code's run-time behavior.

OpenMP uses a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications for platforms ranging from the standard desktop computer to the supercomputer. An application built with the hybrid model of parallel programming can run on a computer cluster using both OpenMP and Message Passing Interface (MPI), such that OpenMP is used for parallelism within a (multi-core) node while MPI is used for parallelism between nodes.

History

OpenMP is managed by the nonprofit technology consortium OpenMP Architecture Review Board (OpenMP ARB), jointly defined by a group of major computer hardware and software vendors, which published the first API specification in October 1997. The latest specification (version 4.5), released in November 2015, adds or improves support for accelerators, atomics, error handling, thread affinity, tasking extensions, user-defined reductions, SIMD, and more. OpenMP support is integrated into recent GNU compilers (GCC 4.9 and above), so there is no need to install anything separately!

How does it work?

OpenMP programs accomplish parallelism exclusively through the use of threads. A thread of execution is the smallest unit of processing that can be scheduled by an operating system. Threads exist within the resources of a single process; without the process, they cease to exist. Typically, the number of threads matches the number of machine processors/cores, but the actual use of threads is up to the application.

Figure: OpenMP workflow

OpenMP is an explicit (not automatic) programming model, offering the programmer full control over parallelization. It uses the fork-join model of parallel execution, driven by compiler directives in the source code. All OpenMP programs begin as a single process: the master thread. The master thread executes sequentially until the first parallel region construct is encountered, at which point it creates (forks) a team of parallel threads. When the team threads complete the statements in the parallel region construct, they synchronize (join) and terminate, leaving only the master thread.
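As a concrete illustration, the minimal C program below shows the fork-join pattern: the master thread runs alone, forks a team at the parallel construct, and runs alone again after the join. The file name hello_omp.c is just an assumption; compile with GCC's -fopenmp flag, e.g. gcc -fopenmp hello_omp.c -o hello_omp.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("Master thread runs alone\n");       /* sequential region */

    #pragma omp parallel                        /* fork: a team of threads starts here */
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(),            /* this thread's id within the team */
               omp_get_num_threads());          /* size of the team */
    }                                           /* join: implicit barrier, team ends */

    printf("Master thread runs alone again\n"); /* back to sequential execution */
    return 0;
}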

OpenMP components

The three primary OpenMP components are:

  • Compiler directives
  • Runtime library routines
  • Environment variables

The OpenMP directives appear as comments (in Fortran) or pragmas (in C/C++) in your source code and are ignored by compilers unless you tell them otherwise. Directives have the following syntax: sentinel directive-name [clause, ...]. See an example below:

#pragma omp parallel default(shared) private(beta,pi)

OpenMP provides several environment variables for controlling the execution of parallel code at run time. For example, to run your code with 8 threads, you would do the following:

export OMP_NUM_THREADS=8
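To confirm that the setting took effect, a tiny sketch like the one below can query the runtime (the file name check_threads.c is hypothetical; compile with gcc -fopenmp check_threads.c -o check_threads):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* omp_get_max_threads() returns the number of threads a
       subsequent parallel region would use; with the export
       above it should report 8 */
    printf("Max threads: %d\n", omp_get_max_threads());
    return 0;
}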

OpenMP clauses

The table below summarizes the most common OpenMP clauses and what each one does.

Clause         Description
copyin         Allows threads to access the master thread's value for a threadprivate variable.
copyprivate    Broadcasts the value of a private variable from one thread to all other threads in the team (used with the single construct).
default        Specifies the behavior of unscoped variables in a parallel region.
firstprivate   Specifies that each thread should have its own instance of a variable, initialized with the value the variable has before the parallel construct.
if             Specifies whether a loop should be executed in parallel or in serial.
lastprivate    Specifies that the enclosing context's version of the variable is set equal to the private version of whichever thread executes the final iteration (for-loop construct) or last section (#pragma sections).
nowait         Overrides the barrier implicit in a directive.
num_threads    Sets the number of threads in a thread team.
ordered        Required on a parallel for-loop statement if an ordered directive is to be used in the loop.
private        Specifies that each thread should have its own instance of a variable.
reduction      Specifies that one or more variables that are private to each thread are the subject of a reduction operation at the end of the parallel region.
schedule       Applies to the for directive; controls how loop iterations are divided among threads.
shared         Specifies that one or more variables should be shared among all threads.
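A short sketch tying a few of these clauses together (the variable names are purely illustrative): default(none) forces every variable to be scoped explicitly, n is shared, the loop index is automatically private, and reduction(+:sum) gives each thread a private accumulator that is combined at the end.

#include <stdio.h>

int main(void)
{
    int n = 100;
    double sum = 0.0;

    /* default(none):    every variable used inside must be scoped explicitly
       shared(n):        all threads read the same n
       reduction(+:sum): per-thread private sums, added together at the end */
    #pragma omp parallel for default(none) shared(n) reduction(+:sum)
    for (int i = 1; i <= n; i++)   /* the loop index i is private automatically */
        sum += i;

    printf("Sum of 1..%d = %.0f\n", n, sum);   /* prints 5050 */
    return 0;
}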

OpenMP worksharing

A work-sharing construct divides the execution of the enclosed code region among the members of the team that encounter it, but it does not launch new threads. There is no implied barrier upon entry to a work-sharing construct; however, there is an implied barrier at the end of one. The syntax of the (Fortran-only) workshare construct, for example, is as follows:

!$omp workshare 
    structured-block 
!$omp end workshare [nowait]

There are three types of worksharing constructs:

Do/for-loop

Shares iterations of a loop across the team. Represents a type of "data parallelism".

#pragma omp for [clause ...]  newline 
                schedule (type [,chunk]) 
                ordered
                private (list) 
                firstprivate (list) 
                lastprivate (list) 
                shared (list) 
                reduction (operator: list) 
                collapse (n) 
                nowait 

   for_loop
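For instance, the sketch below (the array size and chunk size are arbitrary choices) divides the iterations of a vector-add loop among the team; schedule(static, 4) deals out chunks of four iterations in round-robin fashion.

#include <stdio.h>

#define N 16

int main(void)
{
    int a[N], b[N], c[N];
    for (int i = 0; i < N; i++) {   /* sequential setup */
        a[i] = i;
        b[i] = 2 * i;
    }

    #pragma omp parallel
    {
        /* loop iterations are shared across the team in chunks of 4 */
        #pragma omp for schedule(static, 4)
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }   /* implied barrier at the end of the for construct */

    for (int i = 0; i < N; i++)
        printf("c[%2d] = %2d\n", i, c[i]);
    return 0;
}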

Sections

Breaks work into separate, discrete sections. Each section is executed by a thread. Can be used to implement a type of "functional parallelism".

#pragma omp sections [clause ...]  newline 
                     private (list) 
                     firstprivate (list) 
                     lastprivate (list) 
                     reduction (operator: list) 
                     nowait
  {

  #pragma omp section   newline 

     structured_block

  #pragma omp section   newline 

     structured_block

  }
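A runnable sketch of this kind of functional parallelism, using the combined parallel sections form for brevity: each section is executed exactly once, by one thread of the team.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* combined construct: forks a team and splits the work into sections */
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("Section A executed by thread %d\n", omp_get_thread_num());

        #pragma omp section
        printf("Section B executed by thread %d\n", omp_get_thread_num());
    }   /* implied barrier: both sections have finished here */
    return 0;
}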

Single

Serializes a section of code.

#pragma omp single [clause ...]  newline 
                   private (list) 
                   firstprivate (list) 
                   nowait

     structured_block
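The sketch below shows the typical use: every thread enters the parallel region, but only one performs the serialized work (initialization or I/O, say), while the rest wait at the implicit barrier at the end of the single construct.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        /* exactly one thread of the team executes this block */
        #pragma omp single
        printf("Thread %d does the one-off work\n", omp_get_thread_num());
        /* the others wait at the single construct's implicit barrier
           unless nowait is given */

        printf("Thread %d proceeds past the single region\n",
               omp_get_thread_num());
    }
    return 0;
}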

Restrictions

The following restrictions apply to worksharing constructs:

  • Each worksharing region must be encountered by all threads in a team or by none at all, unless cancellation has been requested for the innermost enclosing parallel region.
  • The sequence of worksharing regions and barrier regions encountered must be the same for every thread in a team.

