.. _project: ========================================== Final Project [2014] ========================================== Due Wednesday, June 11, 2014, by 11:00pm PDT. .. warning:: A survey link has been added at the end of this page. **Some resources and hints:** See :ref:`project_hints`. **Part 1** The notebook `$UWHPSC/project/BVP.ipynb` illustrates the solution of the two-point boundary value problem :math:`u''(x) = -f(x)` by setting up and solving a tridiagonal system. This was discussed in :ref:`lab18`. (`nbviewer `_, `video `_) The goal of Part 1 is to create a Fortran 90 module that contains subroutines `solve_bvp_direct` and `solve_bvp_split` that mimic the Python codes, and then add OpenMP to the latter in order to solve the four boundary value problems simultaneously with 4 threads. You do **not** need to convert the recursive code to Fortran, although if you want more of a challenge you might try to do this with OpenMP. The directory `$UWHPSC/project/part1` contains the following files: * `main1.f90` a driver program for the first part below. * `problem.f90`, a module containing the problem description (right hand side function `f` and true solution function `u_true`). * `bvp_solvers.f90`, the template for a module that should be filled in properly with three subroutines following the directions below. * `Makefile1` for the first part below. Note that the problem specified in `problem.f90` is the same one used in the IPython notebook and in Lecture 23. You might want to test your routines with a different problem, e.g. choose a different true solution and differentiate twice to get the corresponding right hand side (and remember to change the boundary conditions specified in the main program to be suitable). We might do the same to test your routines. #. Finish writing the subroutine `solve_BVP_direct` in the module `bvp_solvers.f90`. This routine should set up a tridiagonal system and solve it using the LAPACK subroutine `DGTSV`. As input it takes a 1-dimensional array `x`, and two real numbers `u_left` and `u_right` and the output should be an array `u` containing the approximate solution at the points specified in `x`. It can be assumed that the points in `x` are equally spaced. **Notes:** * For consistency with the Python code, the arrays `x` and `u` have been declared to start with index 0 rather than 1. Note how this is indicated in the declaration of these arrays in the subroutine `solve_bvp_direct`. * Instead of `DGTSV `_, you could use `DPTSV `_, which solves a tridiagonal `symmetric positive definite `_ system. The matrix `A` for this boundary value problem is symmetric but is negative definite rather than positive definite, so if you want to do this you would have to negate the system and solve :math:`-u''(x) = f(x)`. The `DPTSV` routine is slightly more efficient, but it's fine to use `DGTSV`. **NOTE: The output below was created using `DPTSV` and if you use `DGTSV` slightly different results are obtained.** Add a print statement to your subroutine so that running the code produces output like this:: $ make test -f Makefile1 Wrote data to input_data.txt ./test.exe n = 20 Solving tridiagonal system with n = 20 error_max = 3.99812551471256938E-003 $ make test -f Makefile1 n=10000 Wrote data to input_data.txt ./test.exe n = 10000 Solving tridiagonal system with n = 10000 error_max = 1.69490235180091986E-008 #. Add another subroutine `solve_bvp_split` to the `bvp_solvers` module that has the same calling sequence and produces the same result, but that splits the interval roughly in half and makes four calls to `solve_bvp_direct`, following the same idea as in the IPython notebook. Provide a second main program `main2.f90` and makefile `Makefile2` to test this routine. Add print statements to the subroutine so that it gives results similar to:: $ make test -f Makefile2 Wrote data to input_data.txt ./test.exe n = 20 Solving tridiagonal system with n = 10 Solving tridiagonal system with n = 9 Solving tridiagonal system with n = 10 Solving tridiagonal system with n = 9 Computed G0 = 0.1186D+02 G1 = 0.1167D+02 z = -0.6111D+02 error_max = 3.99812551484757250E-003 $ make test -f Makefile2 n=10000 Wrote data to input_data.txt ./test.exe n = 10000 Solving tridiagonal system with n = 5000 Solving tridiagonal system with n = 4999 Solving tridiagonal system with n = 5000 Solving tridiagonal system with n = 4999 Computed G0 = 0.2442D-01 G1 = 0.2402D-01 z = -0.6004D+02 error_max = 1.73004366388340713E-008 Printing out `G0, G1`, and `z` might be useful for debugging to compare with what the Python code produces. **NOTE:** The split version gets slightly different results from the direct version, particularly for `n = 10000`. This tridiagonal matrix has a condition number that grows like :math:`{\cal O}(n^2)`, and so for `n = 1e5`, perturbations in the way you do the linear algebra can make be expected to make relative errors on the order of `1e10` times the machine roundoff, which is comparable to the error in the BVP solution that is being computed. (This also suggests that making `n` even larger may not give you a better approximation of the true solution, when computing in finite precision arithmetic.) #. Add another subroutine `solve_bvp_split_omp` to the `bvp_solvers` module that has the same calling sequence and produces the same result, but that uses OpenMP in such a way that four different threads make the four calls to `solve_bvp_direct`. Do this by using `omp parallel sections `_, see for example `$UWHPSC/codes/openmp/demo2.f90` or `$UWHPSC/codes/adaptive_quadrature/openmp2/adapquad_mod.f90`. This will take a bit of thought about what variables should be private to each thread and perhaps some rearrangement of the code to make sure each thread is solving the desired problem and all four results can be combined as needed. To help debug, you might want to print out various things from the serial version of the code and compare to the parallel version, and try running with small values of `n`. You can call `omp_set_num_threads(4)` in the subroutine and do not need to test with a different number of threads. **Note:** This is not a great problem for OpenMP since solving a tridiagonal system is so quick, and the overhead of forking threads will probably make the OpenMP version run slower than the serial version unless `n` were very large, but the point is to understand and debug the code. Provide a new main program `main3.f90` and `Makefile3` that compiles with OpenMP and links with OpenMP and the LAPACK libraries, e.g. set:: LFLAGS = -lblas -llapack -fopenmp FFLAGS = -fopenmp Add print statements to your subroutine so that it gives output such as:: $ make test -f Makefile3 test.exe Wrote data to input_data.txt ./test.exe n = 20 nthreads = 4 Thread 0 taking from 0.000 to 0.524 with u_mid = 0.000 Solving tridiagonal system with n = 10 Thread 1 taking from 0.524 to 1.000 with u_mid = 0.000 Solving tridiagonal system with n = 9 Thread 2 taking from 0.000 to 0.524 with u_mid = 1.000 Solving tridiagonal system with n = 10 Thread 3 taking from 0.524 to 1.000 with u_mid = 1.000 Solving tridiagonal system with n = 9 Computed G0 = 0.1186D+02 G1 = 0.1167D+02 z = -0.6111D+02 error_max = 3.99812551484757250E-003 $ make test -f Makefile3 n=10000 Wrote data to input_data.txt ./test.exe n = 10000 nthreads = 4 Thread 1 taking from 0.000 to 0.500 with u_mid = 0.000 Solving tridiagonal system with n = 5000 Thread 0 taking from 0.500 to 1.000 with u_mid = 0.000 Solving tridiagonal system with n = 4999 Thread 2 taking from 0.000 to 0.500 with u_mid = 1.000 Solving tridiagonal system with n = 5000 Thread 3 taking from 0.500 to 1.000 with u_mid = 1.000 Solving tridiagonal system with n = 4999 Computed G0 = 0.2442D-01 G1 = 0.2402D-01 z = -0.6004D+02 error_max = 1.73004366388340713E-008 **Part 2** In :ref:`lab19` the heat equation will be discussed along with an IPython notebook illustrating how solutions behave and two numerical methods for approximating the solution. For simplicity, we are only considering a special case of the one-dimensional heat equation :math:`u_t(x,t) = u_{xx}(x,t)` in which the problem is solved on the interval :math:`0 < x < \pi`, the boundary conditions are :math:`u(0,t) = u(\pi,t) = 0` for all :math:`t`, and the initial data is a sine wave of the form `u_0(x) = \sin(kx)` for some integer `k`. You might want to experiment with initial data that is a linear combination of different "Fourier modes", as illustrated in the notebook. The directory `$UWHPSC/homeworks/project/part2` contains some files that implement the explicit method discussed in class. You can do, for example:: $ make test -f Makefile1 and you can vary `n, k, tfinal,` and `nsteps` by specifying at the command line, e.g. :: $ make test -f Makefile1 n=100 k=5 nsteps=500 The main program prints out the max-norm error at the final time and also produces a file `solution.txt` that contains the approximate and true solution at the final time. #. Add a second subroutine to the file `heat_solvers.f90` that implements the implicit Crank-Nicolson method that will be discussed in :ref:`lab19`. Name this subroutine `solve_heat_implicit` and it should have the same calling sequence as the `solve_heat_explicit`. Add a parameter `method` to `main1.f90` so that if `method==1` then the explicit method is used and if `method==2` then the implicit method is used. Add this also to `Makefile1` so that a value is written to `input_data.txt` and then read by the main program, similar to the other parameters. (You can give it the default value 1). Note that the two methods do not give the same approximate solution (or error), but test that both give results that agree with the IPython notebook. To implement this method, you will have to solve a tridiagonal system of equations every time step. You can use the LAPACK routine `dgtsv` (or `dptsv` if you prefer). Note that either of these routines overwrites the input arrays that describe the matrix with the LU factorization, so be careful if you are using this in a loop where you have more than one system to solve! #. Create a new main program `main2.f90` based on your modified `main1.f90` that outputs the solution at every time step to a file `frames.txt`. Use the same format as currently used to write to `solution.txt`, but add to the file every time step, and also write the initial data before starting to solve the problem. So after running the code the file `frames.txt` should have `(nsteps+1)*(n+2)` lines (since each `u` solution vector has `n+2` elements). Do not modify the subroutines in `heat_solvers.f90` to do this. Instead, have a loop in the main program that calls `solve_heat_explicit` or `solve_heat_implicit` repeatedly, `nsteps` times, taking a single time step with each call and then writing the solution before the next call. #. Write a Python script `animate.py` that reads `n` and `nsteps` from `input_data.txt` and reads all the solutions from `frames.txt` and produces an animation in a file `heat.html`. Use `JSAnimation` and the `JSAnimation_frametools.py` module from :ref:`lab15`. In :ref:`lab20` we will look at an example of doing this for a different problem, so if you're not sure how to do it, take a look at that Lab. Create a `Makefile2` with a phony target `movie` so that you can do, for example, $ make movie -f Makefile2 k=4 n=50 nsteps=40 method=1 $ make movie -f Makefile2 k=4 n=50 nsteps=40 method=2 and create the animations shown at * ``_ * ``_ illustrating that the implicit method is more stable. To submit --------- * At the end, you should have committed the following files to your repository: **Part 1** * `$MYHPSC/project/part1/Makefile1` (unchanged from original) * `$MYHPSC/project/part1/main1.f90` (unchanged from original) * `$MYHPSC/project/part1/problem.f90` (unchanged from original) * `$MYHPSC/project/part1/bvp_solvers.f90` (with 3 subroutines) * `$MYHPSC/project/part1/Makefile2` * `$MYHPSC/project/part1/main2.f90` * `$MYHPSC/project/part1/Makefile3` * `$MYHPSC/project/part1/main3.f90` **Part 2** * `$MYHPSC/project/part2/problem.f90` (unchanged from original) * `$MYHPSC/project/part2/Makefile1` * `$MYHPSC/project/part2/main1.f90` * `$MYHPSC/project/part2/heat_solvers.f90` * `$MYHPSC/project/part2/Makefile2` * `$MYHPSC/project/part2/main2.f90` * `$MYHPSC/project/part2/animate.py` (Script to create the animation) * `$MYHPSC/project/part2/JSAnimation_frametools.py` (optional, unchanged from original) * **Please be sure you have the specified directory and file names.** It is hard to grade otherwise, and points will be deducted. * Make sure you push to bitbucket after committing. * Submit the commit number that you want graded by following the link provided on the `Canvas page for the project `_. * Also take this `survey `_ worth 10 points. Before doing this survey, please first do the course evaluation. This is being done on-line this year at the link ``_. Anonymous results will be available to the instructor and TAs well after the quarter has ended. * On-campus students: UW Libraries and UW Information Technology want to know what students think about the Active Learning Classrooms in Odegaard Library. Your feedback will shape recommendations for new classrooms. Take their brief survey here: ``_.