.. _homework4: ========================================== Homework 4 ========================================== .. comment :: Not yet assigned and may change slightly. Due Thursday, May 5, 2011, by 11:00pm PDT, by pushing to your bitbucket repository. The goal of this homework is to gain some more practice with Fortran 90, modules, and Makefiles, and to start using OpenMP. Before tackling this homework, you should read some of the class notes and links they point to. In particular, the following sections are relevant: * :ref:`openmp` Make sure you update your clone of the class repository before doing this assignment:: $ cd $CLASSHG $ hg pull -u # pulls and updates You should also create a directory `$MYHG/homeworks/homework4` to work in since this is where your modified versions of the files will eventually need to be. Within in this directory you should have two subdirectories `parta` and `partb` for Part A and B below. Numerical Quadrature ==================== The Fortran files in `$CLASSHG/codes/fortran/quadrature_hw4` implement the *Trapezoidal rule* for computing an approximation to the definite integral :math:`\int_a^b f(x)\, dx` To do tests of the accuracy we will use functions for which we can compute the exact integral for comparison, but of course the point of numerical quadrature is to approximate integrals that cannot be computed exactly. The Trapezoidal rule is a simple *quadrature rule* that is based on evaluating the function :math:`f(x)` at :math:`n+1` points in the interval, approximating the function by a *piecewise linear* function connecting these function values, and computing the exact integral of this piecewise linear function. This is easy to do since on each of the :math:`n` intervals between function values we only need to compute the area of a trapezoid. The figures below show examples where 10 and 20 intervals are used to approximate the interal :math:`\int_{-10}^{3} \exp(x/10) + \cos(x)\,dx`. .. image:: images/plotquad_k1_n10.png :width: 10cm .. image:: images/plotquad_k1_n20.png :width: 10cm Let :math:`\Delta x = (b-a)/n` and :math:`x_i=a + (i-1)\Delta x` for :math:`i=1, 2, \ldots, n+1`. On interval number :math:`i` from :math:`x_i` to :math:`x_{i+1}` the area is :math:`\frac {\Delta x} 2 (f(x_{i})+ f(x_{i+1}))`. Summing this up from :math:`i=1` to `n` gives :math:`\Delta x \left(\frac 1 2 (f(a) + f(b)) + \sum_{i=2}^n f(x_i) \right)`. The trapezoidal rule is *second order accurate*, which means that if the function :math:`f(x)` is sufficiently smooth then for small :math:`\Delta x` (a large number of intervals), the error is proportional to :math:`(\Delta x)^2`. So, for example, running the code in the directory `$CLASSHG/codes/fortran/quadrature_hw4` gives:: $ make test ./testquad.exe How many intervals n? 20 Trapezoidal: n = 20 approx = 0.9434636084761E+01 true = 0.9416892561216E+01 error = 0.177E-01 g was evaluated 21 times $ make test ./testquad.exe How many intervals n? 200 Trapezoidal: n = 200 approx = 0.9417068999802E+01 true = 0.9416892561216E+01 error = 0.176E-03 g was evaluated 201 times Note that with :math:`n=200` the error is about :math:`100 = 10^2` times smaller than with :math:`n=20`. Assignment: =========== Part A ------ #. There are much better methods than the Trapezoidal rule that are not much harder to implement but get much smaller errors with the same number of function evaluations. One such method is *Simpson's rule*, which approximates the integral over a single interval from :math:`x_i` to :math:`x_{i+1}` by :math:`\int_{x_i}^{x_{i+1}} f(x)\, dx \approx \frac {\Delta x} 6 (f(x_i) + 4f(x_{i+1/2}) + f(x_{i+1}))`, where :math:`x_{i+1/2} = \frac 1 2 (x_i + x_{i+1}) = x_i + \Delta x/2`. Add a subroutine `simpson` to the module `quadrature_mod.f90` that implements Simpson's rule. Note that when you sum the above expression over all :math:`i`, you can combine some terms as was done in simplifying the Trapezoidal rule, so the result looks like :math:`\frac{\Delta x}{6}[f(x_1) + 4f(x_{3/2}) + 2f(x_2) + 4f(x_{5/2}) + 2f(x_3) + \cdots + 2f(x_n) + 4f(x_{n+1/2}) + f(x_{n+1})]`. Figure out a good way to evaluate this. Note that your program should evaluate the function only :math:`2n+1` times for Simpson's method. The code contains a counter that helps you check this. This method is 4th order accurate, which means that on fine enough grids the error is proportional to :math:`\Delta x^4`. Hence increasing :math:`n` by a factor of 10 should decrease the error by a factor of :math:`10^4 = 10000.` Check that your function works properly by experimenting with this on the function provided. *Note:* In finite precision arithmetic this asymptotic behavior is not seen if :math:`\Delta x` is too small. Once :math:`\Delta x^4` is smaller than machine epsilon (around `2.d-16`) you will not see any improvement by taking :math:`n` larger. Have your main program print out results for Simpson's rule in the same format as for Trapezoidal. #. Modify your `functions_mod.f90` so that the function being integrated is :math:`g(x) = \exp(x/10) + \cos(kx)` where the *wave number* :math:`k` is a module variable that is read in from the user by the main program (*before* the number of intervals) and then used in the definition of the function `g(x)`. So running your program should give results like:: $ make test ./testquad.exe Wavenumber k? 8. How many intervals n? 20 Trapezoidal: n = 20 approx = 0.1084939817273E+02 true = 0.9582360287054E+01 error = 0.127E+01 g was evaluated 21 times Simpson: n = 20 approx = 0.9363491553954E+01 true = 0.9582360287054E+01 error = -0.219E+00 g was evaluated 41 times If :math:`k=1` this is the same function as before. For large :math:`k` the function is highly oscillatory, making it difficult to integrate unless a sufficiently fine grid is used. For example, with :math:`k=8` as above the error is large when :math:`n=20`, not surprising if you look at the plots below. .. image:: images/plotquad_k8_n20.png :width: 10cm .. image:: images/plotquad_k8_n100.png :width: 10cm Part B ------ Now suppose we want to compute the integral with wave number :math:`k=10000`. In order to get an accurate result we will have to take :math:`n` very large. For this we might want an OpenMP version of the code. #. Modify your code so that it uses OpenMP. Have `nthreads` be a variable of the module `quadrature_mod.f90` that is set in the main program and used in the module in order to replace the `do` loops in `trapezoid` and `simpson` by `omp parallel do` loops. These loops must properly use reductions and private variables. Remove the call to `evalfunc` from the main program so you don't have to worry about parallelizing this subroutine, which is just used to plot the function for plotting purposes. You might want to test your code on small values of :math:`k` and :math:`n` first. #. Eventually test it with :math:`k=10000` and :math:`n = 50000000`. Change the `Makefile` so that the `time` command is used when executing the code to determine how much time it takes. Hard wire these values of :math:`k` and :math:`n` into the main program rather than prompting the user, so the time to respond to the prompts doesn't influence your results. So running the code should give something like:: $ make test gfortran -c -fopenmp evalfunc_mod.f90 gfortran -c -fopenmp functions_mod.f90 gfortran -c -fopenmp quadrature_mod.f90 gfortran -c -fopenmp testquad.f90 gfortran -fopenmp evalfunc_mod.o functions_mod.o quadrature_mod.o testquad.o -o testquad.exe time ./testquad.exe Compiled with OpenMP Using 2 threads Trapezoidal: n = 50000000 approx = 0.9819716972423E+01 true = 0.9819716972381E+01 error = 0.415E-10 g was evaluated 49444572 times Simpson: n = 50000000 approx = 0.9819716972380E+01 true = 0.9819716972381E+01 error = -0.927E-12 g was evaluated 147106801 times 10.27 real 19.56 user 0.03 sys Note from the last line that 19.56 seconds of CPU time was used but the code ran in 10.27 seconds, which indicates that both cores were being used. #. If you look at the number of times g was evaluated in the results above you will see that this is not what's expected. There was a bug in my code that you might have missed too: The variable `gcounter` is not *thread safe*: both threads call `g` and update this counter independently and the result is somewhat random. Sometimes the value updated by one thread was in local cache and not seen by the other thread. Modify the code to keep a separate counter for each thread, or for more generality an array of length 8, by defining:: integer :: gcounter_thread(0:7) which would allow nthreads up to 8. In the function `g`, use `omp_get_thread_num` to determine the thread number that made the call and update the corresponding element of the array. Output both values of the counter, so you will get output something like:: $ make test time ./testquad.exe Compiled with OpenMP Using 2 threads Trapezoidal: n = 50000000 approx = 0.9819716972423E+01 true = 0.9819716972381E+01 error = 0.415E-10 g was evaluated 25000002 times by thread 0 g was evaluated 24999999 times by thread 1 Simpson: n = 50000000 approx = 0.9819716972380E+01 true = 0.9819716972381E+01 error = -0.927E-12 g was evaluated 50000002 times by thread 0 g was evaluated 49999999 times by thread 1 10.27 real 19.56 user 0.03 sys #. Your module `testquad` should also work if we provide a different main program and/or `functions` module for grading purposes. #. Make sure your directory $MYHG/homeworks/homework4 contains the following subdirectories and files: * parta * testquad.f90 * quadrature_mod.f90 * functions_mod.f90 * evalfunc_mod.f90 (unchanged from the original) * Makefile (unchanged from the original) * makeplot.py (unchanged from the original) * partb * testquad.f90 * quadrature_mod.f90 * functions_mod.f90 * Makefile When you are done, don't forget to use the `hg add`, `hg commit`, and `hg push` commands to push these to your bitbucket repository for grading.