OpenMP is discussed in the lecture slides, starting with Lecture 13.
There are a few sample codes in the $CLASSHG/codes/openmp directory. See the README.txt file for instructions on compiling and executing.
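As a rough guide, the compile-and-run cycle looks something like the sketch below (the README.txt is authoritative; the exact flags and file names may differ). Note that without the -fopenmp flag, gfortran treats lines beginning with !$ as ordinary comments, so the same source compiles to a serial code:

    $ gfortran -fopenmp yeval.f90 -o yeval.exe
    $ ./yeval.exe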
Here is a very simple code that evaluates a costly function at many points:
! $CLASSHG/codes/openmp/yeval.f90

program yeval

    use omp_lib
    implicit none
    integer, parameter :: n = 100000000
    integer :: i, nthreads
    real(kind=8), dimension(n) :: y
    real(kind=8) :: dx, x

    ! Specify number of threads to use:
    !$ print *, "How many threads to use? "
    !$ read *, nthreads
    !$ call omp_set_num_threads(nthreads)
    !$ print "('Using OpenMP with ',i3,' threads')", nthreads

    dx = 1.d0 / (n+1.d0)

    !$omp parallel do private(x)
    do i=1,n
        x = i*dx
        y(i) = exp(x)*cos(x)*sin(x)*sqrt(5*x+6.d0)
    enddo

    print *, "Filled vector y of length", n

end program yeval
Note the following:

- Lines beginning with the !$ sentinel are compiled only when OpenMP is enabled (e.g. with gfortran's -fopenmp flag); otherwise they are ordinary comments and the program runs serially.
- In the !$omp parallel do loop, the loop index i is private to each thread by default, but x must be declared private explicitly: each thread needs its own copy, since otherwise one thread could overwrite x between another thread's two uses of it.
- The loop iterations are independent of one another (y(i) depends only on i), which is what makes it safe to distribute them across threads in any order.
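To measure the speedup from using more threads, one could time the loop with omp_get_wtime (a function provided by omp_lib that returns the wall-clock time as a real(kind=8) value). Here is a sketch, not part of yeval.f90, of how the loop might be instrumented, assuming t1 and t2 are declared as real(kind=8):

    !$ t1 = omp_get_wtime()    ! wall-clock time before the loop

    !$omp parallel do private(x)
    do i=1,n
        x = i*dx
        y(i) = exp(x)*cos(x)*sin(x)*sqrt(5*x+6.d0)
    enddo

    !$ t2 = omp_get_wtime()    ! wall-clock time after the loop
    !$ print "('Elapsed time: ',e13.6,' seconds')", t2 - t1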
Consider the problem of normalizing a vector by dividing each element by the 1-norm of the vector, defined by \|x\|_1 = \sum_{i=1}^n |x_i|.
We must first loop over all points to compute the norm. Then we must loop over all points and set y_i = x_i / \|x\|_1. Note that we cannot combine these two loops into a single loop: the full norm must be known before any element can be divided by it.
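For reference, a minimal serial sketch of this two-pass structure (using the same names x, y, norm as the codes below) would look like:

    ! Serial two-pass normalization (illustrative sketch):
    norm = 0.d0
    do i=1,n
        norm = norm + abs(x(i))   ! first pass: accumulate the 1-norm
    enddo

    do i=1,n
        y(i) = x(i) / norm        ! second pass: normalize each element
    enddo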
Here is an example with fine-grain parallelism, where we use the OpenMP omp parallel do directive, or equivalently the omp do directive within an omp parallel block.
! $CLASSHG/codes/openmp/normalize1.f90

! Example of normalizing a vector using fine-grain parallelism.

program main

    use omp_lib
    implicit none
    integer :: i, thread_num
    integer, parameter :: n = 1000
    real(kind=8), dimension(n) :: x, y
    real(kind=8) :: norm, ynorm
    integer :: nthreads

    ! Specify number of threads to use:
    nthreads = 1       ! need this value in serial mode
    !$ nthreads = 4
    !$ call omp_set_num_threads(nthreads)
    !$ print "('Using OpenMP with ',i3,' threads')", nthreads

    ! initialize x:
    !$omp parallel do
    do i=1,n
        x(i) = dble(i)   ! convert to double float
    enddo

    norm = 0.d0
    ynorm = 0.d0

    !$omp parallel private(i)

    !$omp do reduction(+ : norm)
    do i=1,n
        norm = norm + abs(x(i))
    enddo

    !$omp barrier    ! not needed (implicit barrier at end of omp do)

    !$omp do reduction(+ : ynorm)
    do i=1,n
        y(i) = x(i) / norm
        ynorm = ynorm + abs(y(i))
    enddo

    !$omp end parallel

    print *, "norm of x = ", norm, "  n(n+1)/2 = ", n*(n+1)/2
    print *, 'ynorm should be 1.0:   ynorm = ', ynorm

end program main
Note the following:

- The reduction(+ : norm) clause gives each thread its own private copy of norm, initialized to zero; the partial sums are combined into the shared norm at the end of the loop. Without this clause, the update norm = norm + abs(x(i)) would be a race condition, with threads overwriting one another's updates.
- There is an implicit barrier at the end of each omp do loop, so the explicit !$omp barrier is not needed; it is included only to stress that norm must be fully computed before any thread uses it in the second loop.
- The two loops share a single parallel region, so the threads are created only once rather than once per loop.
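If the two loops did not need to share one parallel region, each could equally well be written with the combined directive mentioned above. For example, the first loop in isolation is equivalent to this minimal sketch:

    norm = 0.d0
    !$omp parallel do reduction(+ : norm)
    do i=1,n
        norm = norm + abs(x(i))
    enddo

The separate omp parallel block in normalize1.f90 is used instead so that both loops run within a single parallel region.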
Next is a version with coarse-grain parallelism, where we decide how to split up the array between threads and then execute the same code on each thread; each thread computes its own version of istart and iend for its portion of the array. (With n = 1000 and 4 threads, for example, thread 0 takes i = 1 through 250, thread 1 takes i = 251 through 500, and so on.) With this code we are guaranteed that thread 0 always handles x(1), for example, so in principle the data could be distributed. When using OpenMP on a shared memory computer this doesn't matter, but this version is more easily generalized to MPI; see the sketch following the notes below.
! $CLASSHG/codes/openmp/normalize2.f90

! Example of normalizing a vector using coarse-grain parallelism.

program main

    use omp_lib
    implicit none
    integer, parameter :: n = 1000
    real(kind=8), dimension(n) :: x, y
    real(kind=8) :: norm, norm_thread, ynorm, ynorm_thread
    integer :: nthreads, points_per_thread, thread_num
    integer :: i, istart, iend

    ! Specify number of threads to use:
    nthreads = 1       ! need this value in serial mode
    !$ nthreads = 4
    !$ call omp_set_num_threads(nthreads)
    !$ print "('Using OpenMP with ',i3,' threads')", nthreads

    ! Determine how many points to handle with each thread.
    ! Note that dividing two integers and assigning to an integer will
    ! round down if the result is not an integer.
    ! This, together with the min(...) in the definition of iend below,
    ! ensures that all points will get distributed to some thread.
    points_per_thread = (n + nthreads - 1) / nthreads
    print *, "points_per_thread = ", points_per_thread

    ! initialize x:
    do i=1,n
        x(i) = dble(i)   ! convert to double float
    enddo

    norm = 0.d0
    ynorm = 0.d0

    !$omp parallel private(i,norm_thread, &
    !$omp          istart,iend,thread_num,ynorm_thread)

    thread_num = 0     ! needed in serial mode
    !$ thread_num = omp_get_thread_num()    ! unique for each thread

    ! Determine start and end index for the set of points to be
    ! handled by this thread:
    istart = thread_num * points_per_thread + 1
    iend = min((thread_num+1) * points_per_thread, n)

    !$omp critical
    print 201, thread_num, istart, iend
    !$omp end critical
201 format("Thread ",i2," will take i = ",i6," through i = ",i6)

    norm_thread = 0.d0
    do i=istart,iend
        norm_thread = norm_thread + abs(x(i))
    enddo

    ! update global norm with value from each thread:
    !$omp critical
    norm = norm + norm_thread
    print *, "norm updated to: ", norm
    !$omp end critical

    ! make sure all have updated norm before proceeding:
    !$omp barrier

    ynorm_thread = 0.d0
    do i=istart,iend
        y(i) = x(i) / norm
        ynorm_thread = ynorm_thread + abs(y(i))
    enddo

    ! update global ynorm with value from each thread:
    !$omp critical
    ynorm = ynorm + ynorm_thread
    print *, "ynorm updated to: ", ynorm
    !$omp end critical
    !$omp barrier

    !$omp end parallel

    print *, "norm of x = ", norm, "  n(n+1)/2 = ", n*(n+1)/2
    print *, 'ynorm should be 1.0:   ynorm = ', ynorm

end program main
Note the following:

- Each thread accumulates the sum for its own portion of x in the private variable norm_thread; only the short update of the shared norm is placed in a critical block, which threads execute one at a time. This avoids the race condition without serializing the main loop.
- The reduction clause is not available here, since the work is divided by hand among ordinary do loops rather than by an omp do directive; the private partial sums plus critical updates implement the reduction manually.
- The first !$omp barrier is essential: every thread must see the final value of norm before any thread starts computing y(i) = x(i) / norm in the second loop.
- The print 201 statement is placed in a critical block only so that output from different threads does not interleave.
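To illustrate the claim above that this coarse-grain division generalizes to distributed memory, here is a hypothetical MPI-flavored sketch of the norm computation. This is not one of the class codes; the program name, the variable names xlocal and norm_proc, and the assumption of 4 processes with nlocal = 250 points each are all invented for illustration, and only the overall pattern matters:

    ! Hypothetical sketch (not a class code): the coarse-grain norm
    ! computation rewritten in the MPI style.  Each process owns only
    ! its own portion of x, and the partial sums are combined with
    ! MPI_ALLREDUCE instead of critical sections and barriers.

    program normalize_mpi_sketch

        use mpi
        implicit none
        integer, parameter :: nlocal = 250   ! points per process (n=1000, 4 processes)
        real(kind=8), dimension(nlocal) :: xlocal, ylocal
        real(kind=8) :: norm_proc, norm
        integer :: i, proc_num, ierr

        call MPI_INIT(ierr)
        call MPI_COMM_RANK(MPI_COMM_WORLD, proc_num, ierr)

        ! initialize this process's portion of x (as in normalize2,
        ! process 0 owns x(1:250), process 1 owns x(251:500), etc.):
        do i=1,nlocal
            xlocal(i) = dble(proc_num*nlocal + i)
        enddo

        ! local partial sum, playing the role of norm_thread:
        norm_proc = 0.d0
        do i=1,nlocal
            norm_proc = norm_proc + abs(xlocal(i))
        enddo

        ! combine the partial sums; afterwards every process has the
        ! full norm (this replaces the critical section and barrier):
        call MPI_ALLREDUCE(norm_proc, norm, 1, MPI_DOUBLE_PRECISION, &
                           MPI_SUM, MPI_COMM_WORLD, ierr)

        do i=1,nlocal
            ylocal(i) = xlocal(i) / norm
        enddo

        call MPI_FINALIZE(ierr)

    end program normalize_mpi_sketch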
For a comparison of fine-grain and coarse-grain parallelism on Jacobi iteration, see