UW AMath High Performance Scientific Computing
 
AMath 483/583 Class Notes
 
Spring Quarter, 2013


Final Project [2014]

Due Wednesday, June 11, 2014, by 11:00pm PDT.

Warning

A survey link has been added at the end of this page.

Some resources and hints: See Project Hints.

Part 1

The notebook $UWHPSC/project/BVP.ipynb illustrates the solution of the two-point boundary value problem \(u''(x) = -f(x)\) by setting up and solving a tridiagonal system. This was discussed in Lab 18: Thursday May 29, 2014. (nbviewer, video)

The goal of Part 1 is to create a Fortran 90 module that contains subroutines solve_bvp_direct and solve_bvp_split that mimic the Python codes, and then add OpenMP to the latter in order to solve the four boundary value problems simultaneously with 4 threads.

You do not need to convert the recursive code to Fortran, although if you want more of a challenge you might try to do this with OpenMP.

The directory $UWHPSC/project/part1 contains the following files:

  • main1.f90, a driver program for the first part below.
  • problem.f90, a module containing the problem description (right hand side function f and true solution function u_true).
  • bvp_solvers.f90, the template for a module that should be filled in properly with three subroutines following the directions below.
  • Makefile1 for the first part below.

Note that the problem specified in problem.f90 is the same one used in the IPython notebook and in Lecture 23. You might want to test your routines with a different problem, e.g. choose a different true solution and differentiate twice to get the corresponding right hand side (and remember to change the boundary conditions specified in the main program to be suitable). We might do the same to test your routines.
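
For example, if you wanted to test with the quadratic true solution u(x) = x(1-x), then \(u''(x) = -2\), so the corresponding right hand side is f(x) = 2 and the boundary values are u_left = u_right = 0. A replacement problem.f90 might look like the sketch below; this is only an illustration and assumes the module is named problem, provides real(kind=8) functions f(x) and u_true(x), and that the problem is posed on \(0 \le x \le 1\) (check the original problem.f90 for the exact interface). Since the centered difference approximation is exact for quadratics, error_max should then be at the level of rounding error.

    ! Hypothetical alternative problem module for testing (not used for grading).
    module problem

        implicit none

    contains

        real(kind=8) function f(x)
            ! right hand side, chosen so that u''(x) = -f(x) with u = x*(1-x)
            implicit none
            real(kind=8), intent(in) :: x
            f = 2.d0
        end function f

        real(kind=8) function u_true(x)
            ! exact solution used to compute error_max;
            ! u_true(0) = u_true(1) = 0, so use u_left = u_right = 0 in main1.f90
            implicit none
            real(kind=8), intent(in) :: x
            u_true = x*(1.d0 - x)
        end function u_true

    end module problem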

  1. Finish writing the subroutine solve_bvp_direct in the module bvp_solvers.f90. This routine should set up a tridiagonal system and solve it using the LAPACK subroutine DGTSV. As input it takes a one-dimensional array x and two real numbers u_left and u_right; the output should be an array u containing the approximate solution at the points specified in x. It can be assumed that the points in x are equally spaced. (A small self-contained example of calling DGTSV is given after this list.)

    Notes:

    • For consistency with the Python code, the arrays x and u have been declared to start with index 0 rather than 1. Note how this is indicated in the declaration of these arrays in the subroutine solve_bvp_direct.
    • Instead of DGTSV, you could use DPTSV, which solves a tridiagonal symmetric positive definite system. The matrix A for this boundary value problem is symmetric but is negative definite rather than positive definite, so if you want to do this you would have to negate the system and solve \(-u''(x) = f(x)\). The DPTSV routine is slightly more efficient, but it’s fine to use DGTSV.

    Note: The output below was created using DPTSV; if you use DGTSV instead, slightly different results are obtained.

    Add a print statement to your subroutine so that running the code produces output like this:

    $ make test -f Makefile1
    Wrote data to input_data.txt
    ./test.exe
     n =           20
    Solving tridiagonal system with n =     20
     error_max =   3.99812551471256938E-003
    
    
    $ make test -f Makefile1 n=10000
    Wrote data to input_data.txt
    ./test.exe
     n =        10000
    Solving tridiagonal system with n =  10000
     error_max =   1.69490235180091986E-008
    
  2. Add another subroutine solve_bvp_split to the bvp_solvers module that has the same calling sequence and produces the same result, but that splits the interval roughly in half and makes four calls to solve_bvp_direct, following the same idea as in the IPython notebook.

    Provide a second main program main2.f90 and makefile Makefile2 to test this routine.

    Add print statements to the subroutine so that it gives results similar to:

    $ make test -f Makefile2
    Wrote data to input_data.txt
    ./test.exe
     n =           20
    Solving tridiagonal system with n =     10
    Solving tridiagonal system with n =      9
    Solving tridiagonal system with n =     10
    Solving tridiagonal system with n =      9
    Computed G0 =   0.1186D+02  G1 =   0.1167D+02  z =  -0.6111D+02
     error_max =   3.99812551484757250E-003
    
    
    $ make test -f Makefile2 n=10000
    Wrote data to input_data.txt
    ./test.exe
     n =        10000
    Solving tridiagonal system with n =   5000
    Solving tridiagonal system with n =   4999
    Solving tridiagonal system with n =   5000
    Solving tridiagonal system with n =   4999
    Computed G0 =   0.2442D-01  G1 =   0.2402D-01  z =  -0.6004D+02
     error_max =   1.73004366388340713E-008
    

    Printing out G0, G1, and z might be useful for debugging to compare with what the Python code produces.

    Note: The split version gets slightly different results from the direct version, particularly for n = 10000. This tridiagonal matrix has a condition number that grows like \({\cal O}(n^2)\), so for n = 10000 perturbations in the way you do the linear algebra can be expected to cause relative errors on the order of 1e8 times the machine roundoff, which is comparable to the error in the BVP solution that is being computed. (This also suggests that making n even larger may not give a better approximation of the true solution when computing in finite-precision arithmetic.)

  3. Add another subroutine solve_bvp_split_omp to the bvp_solvers module that has the same calling sequence and produces the same result, but that uses OpenMP in such a way that four different threads make the four calls to solve_bvp_direct.

    Do this using omp parallel sections; see, for example, $UWHPSC/codes/openmp/demo2.f90 or $UWHPSC/codes/adaptive_quadrature/openmp2/adapquad_mod.f90. (A generic sketch of the parallel sections pattern is also given after this list.)

    This will take a bit of thought about what variables should be private to each thread and perhaps some rearrangement of the code to make sure each thread is solving the desired problem and all four results can be combined as needed. To help debug, you might want to print out various things from the serial version of the code and compare to the parallel version, and try running with small values of n.

    You can call omp_set_num_threads(4) in the subroutine and do not need to test with a different number of threads.

    Note: This is not a great problem for OpenMP, since solving a tridiagonal system is so quick that the overhead of forking threads will probably make the OpenMP version run slower than the serial version unless n is very large; the point is to understand and debug the code.

    Provide a new main program main3.f90 and a Makefile3 that compiles with OpenMP and links against both OpenMP and the LAPACK libraries, e.g. set:

    LFLAGS = -lblas -llapack -fopenmp
    FFLAGS = -fopenmp
    

    Add print statements to your subroutine so that it gives output such as:

    $ make test -f Makefile3
    test.exe
    Wrote data to input_data.txt
    ./test.exe
     n =           20
     nthreads =            4
    Thread   0 taking from   0.000 to   0.524 with u_mid =   0.000
    Solving tridiagonal system with n =     10
    Thread   1 taking from   0.524 to   1.000 with u_mid =   0.000
    Solving tridiagonal system with n =      9
    Thread   2 taking from   0.000 to   0.524 with u_mid =   1.000
    Solving tridiagonal system with n =     10
    Thread   3 taking from   0.524 to   1.000 with u_mid =   1.000
    Solving tridiagonal system with n =      9
    Computed G0 =   0.1186D+02  G1 =   0.1167D+02  z =  -0.6111D+02
     error_max =   3.99812551484757250E-003
    
    $ make test -f Makefile3 n=10000
    Wrote data to input_data.txt
    ./test.exe
     n =        10000
     nthreads =            4
    Thread   1 taking from   0.000 to   0.500 with u_mid =   0.000
    Solving tridiagonal system with n =   5000
    Thread   0 taking from   0.500 to   1.000 with u_mid =   0.000
    Solving tridiagonal system with n =   4999
    Thread   2 taking from   0.000 to   0.500 with u_mid =   1.000
    Solving tridiagonal system with n =   5000
    Thread   3 taking from   0.500 to   1.000 with u_mid =   1.000
    Solving tridiagonal system with n =   4999
    Computed G0 =   0.2442D-01  G1 =   0.2402D-01  z =  -0.6004D+02
     error_max =   1.73004366388340713E-008
    
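For step 1, the heart of solve_bvp_direct is the call to the LAPACK routine DGTSV. The program below is a small self-contained sketch (separate from the project files; the name dgtsv_demo is made up) showing how the tridiagonal system for \(u''(x) = -f(x)\) can be set up and solved with DGTSV, here for the test problem \(f(x) = \pi^2 \sin(\pi x)\) with true solution \(\sin(\pi x)\) and zero boundary values. In solve_bvp_direct you would instead take x, u_left, and u_right as arguments and fill u(0:n+1); the index conventions below are assumptions to be checked against the notebook.

    ! dgtsv_demo.f90 -- standalone sketch, not part of the project code.
    ! Compile and link with, e.g.:  gfortran dgtsv_demo.f90 -llapack -lblas

    program dgtsv_demo
        implicit none
        integer, parameter :: n = 20                  ! number of interior points
        real(kind=8), parameter :: pi = 3.141592653589793d0
        integer :: i, info
        real(kind=8) :: h, error_max
        real(kind=8) :: x(0:n+1), u(0:n+1)
        real(kind=8) :: d(n), dl(n-1), du(n-1), rhs(n)

        h = 1.d0 / (n+1)
        do i = 0, n+1
            x(i) = i*h
        enddo

        ! centered differences:  (u(i-1) - 2*u(i) + u(i+1)) / h**2 = -f(x(i))
        d  = -2.d0 / h**2                             ! main diagonal
        dl =  1.d0 / h**2                             ! sub-diagonal
        du =  1.d0 / h**2                             ! super-diagonal
        do i = 1, n
            rhs(i) = -pi**2 * sin(pi*x(i))            ! -f(x(i))
        enddo
        ! boundary values are zero here; for general boundary values subtract
        ! u_left/h**2 from rhs(1) and u_right/h**2 from rhs(n).

        print '("Solving tridiagonal system with n = ",i6)', n

        ! dgtsv overwrites dl, d, du and returns the solution in rhs
        call dgtsv(n, 1, dl, d, du, rhs, n, info)
        if (info /= 0) print *, "*** dgtsv returned info = ", info

        u(0) = 0.d0
        u(1:n) = rhs
        u(n+1) = 0.d0

        error_max = maxval(abs(u - sin(pi*x)))
        print '("error_max = ",es13.6)', error_max

    end program dgtsv_demo

If you use DPTSV instead, negate the system (solve \(-u''(x) = f(x)\)) so that the matrix is positive definite, as noted above.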
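
For step 3, the general shape of an omp parallel sections region is illustrated by the toy program below (this is only a generic sketch, not the project solution; sections_demo and some_work are made-up names). Each section is executed by one thread; variables written by every thread, such as thread_num here, must be declared private, while each section writes to its own shared result variable and the results are combined after the parallel region ends.

    ! sections_demo.f90 -- generic illustration of omp parallel sections.
    ! Compile with:  gfortran -fopenmp sections_demo.f90

    program sections_demo
        use omp_lib
        implicit none
        real(kind=8) :: result1, result2, result3, result4
        integer :: thread_num

        call omp_set_num_threads(4)

        !$omp parallel private(thread_num)
        !$omp sections

        !$omp section
        thread_num = omp_get_thread_num()
        print '("Section 1 running on thread ",i2)', thread_num
        result1 = some_work(1)

        !$omp section
        thread_num = omp_get_thread_num()
        print '("Section 2 running on thread ",i2)', thread_num
        result2 = some_work(2)

        !$omp section
        thread_num = omp_get_thread_num()
        print '("Section 3 running on thread ",i2)', thread_num
        result3 = some_work(3)

        !$omp section
        thread_num = omp_get_thread_num()
        print '("Section 4 running on thread ",i2)', thread_num
        result4 = some_work(4)

        !$omp end sections
        !$omp end parallel

        ! implicit barrier above: all four results are available here
        print *, "combined result: ", result1 + result2 + result3 + result4

    contains

        real(kind=8) function some_work(k)
            ! stand-in for a call to solve_bvp_direct on one piece of the interval
            integer, intent(in) :: k
            some_work = dble(k)**2
        end function some_work

    end program sections_demo

In solve_bvp_split_omp each of the four sections would call solve_bvp_direct on the appropriate half of the interval with the appropriate trial midpoint value, and the combination step (computing G0, G1, and z) might naturally be done after the parallel region ends.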

Part 2

In Lab 19: Tuesday June 3, 2014 the heat equation will be discussed along with an IPython notebook illustrating how solutions behave and two numerical methods for approximating the solution.

For simplicity, we are only considering a special case of the one-dimensional heat equation \(u_t(x,t) = u_{xx}(x,t)\) in which the problem is solved on the interval \(0 < x < \pi\), the boundary conditions are \(u(0,t) = u(\pi,t) = 0\) for all \(t\), and the initial data is a sine wave of the form \(u_0(x) = \sin(kx)\) for some integer k. You might want to experiment with initial data that is a linear combination of different “Fourier modes”, as illustrated in the notebook.
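
For reference, with this initial data and these boundary conditions the exact solution is \(u(x,t) = e^{-k^2 t} \sin(kx)\): each Fourier mode simply decays in time, and larger wave numbers k decay faster. Presumably this is the true solution that the main program compares against when it reports the max-norm error.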

The directory $UWHPSC/project/part2 contains some files that implement the explicit method discussed in class. You can do, for example:

$ make test -f Makefile1

and you can vary n, k, tfinal, and nsteps by specifying values at the command line, e.g.

$ make test -f Makefile1 n=100 k=5 nsteps=500

The main program prints out the max-norm error at the final time and also produces a file solution.txt that contains the approximate and true solution at the final time.

  1. Add a second subroutine to the file heat_solvers.f90 that implements the implicit Crank-Nicolson method that will be discussed in Lab 19: Tuesday June 3, 2014. Name this subroutine solve_heat_implicit; it should have the same calling sequence as solve_heat_explicit. (A reminder of the standard Crank-Nicolson formulas is given after this list.)

    Add a parameter method to main1.f90 so that if method==1 then the explicit method is used and if method==2 then the implicit method is used. Add this also to Makefile1 so that a value is written to input_data.txt and then read by the main program, similar to the other parameters. (You can give it the default value 1). Note that the two methods do not give the same approximate solution (or error), but test that both give results that agree with the IPython notebook.

    To implement this method, you will have to solve a tridiagonal system of equations every time step. You can use the LAPACK routine dgtsv (or dptsv if you prefer). Note that either of these routines overwrites the input arrays that describe the matrix with its LU factorization, so be careful if you are calling it in a loop where you have more than one system to solve! (See the sketch of the time-stepping loop after this list.)

  2. Create a new main program main2.f90 based on your modified main1.f90 that outputs the solution at every time step to a file frames.txt. Use the same format as currently used to write to solution.txt, but add to the file every time step, and also write the initial data before starting to solve the problem. So after running the code the file frames.txt should have (nsteps+1)*(n+2) lines (since each u solution vector has n+2 elements).

    Do not modify the subroutines in heat_solvers.f90 to do this. Instead, have a loop in the main program that calls solve_heat_explicit or solve_heat_implicit repeatedly, nsteps times, taking a single time step with each call and then writing the solution before the next call.

  3. Write a Python script animate.py that reads n and nsteps from input_data.txt and reads all the solutions from frames.txt and produces an animation in a file heat.html. Use JSAnimation and the JSAnimation_frametools.py module from Lab 15: Tuesday May 20, 2014.

    In Lab 20: Thursday June 5, 2014 we will look at an example of doing this for a different problem, so if you’re not sure how to do it, take a look at that Lab.

    Create a Makefile2 with a phony target movie so that you can do, for example,

    $ make movie -f Makefile2 k=4 n=50 nsteps=40 method=1

    $ make movie -f Makefile2 k=4 n=50 nsteps=40 method=2

    and create animations illustrating that the implicit method is more stable.
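
For reference, the standard Crank-Nicolson discretization of \(u_t = u_{xx}\) on a grid with spacing h and time step \(\Delta t\) averages the centered second difference at the old and new time levels, where \(U_i^m \approx u(x_i, t_m)\):

    \(\frac{U_i^{m+1} - U_i^m}{\Delta t} = \frac{1}{2h^2}\left[(U_{i-1}^m - 2U_i^m + U_{i+1}^m) + (U_{i-1}^{m+1} - 2U_i^{m+1} + U_{i+1}^{m+1})\right]\)

Collecting the unknowns at level m+1 on the left, with \(r = \Delta t/(2h^2)\), gives

    \(-r\,U_{i-1}^{m+1} + (1+2r)\,U_i^{m+1} - r\,U_{i+1}^{m+1} = U_i^m + r\,(U_{i-1}^m - 2U_i^m + U_{i+1}^m),\)

which is a tridiagonal system to be solved each time step. Check this against the form derived in Lab 19 before implementing solve_heat_implicit.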
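
Because dgtsv (and dptsv) overwrite the arrays that describe the matrix, those arrays must be refilled inside the time-stepping loop of solve_heat_implicit before every solve. The fragment below is only a sketch of that pattern, not the project code; it assumes interior unknowns u(1:n) with u(0) = u(n+1) = 0, work arrays d(n), dl(n-1), du(n-1), rhs(n), and r = dt/(2*h**2) as in the formula above. Check the coefficients against the form derived in Lab 19.

    do nstep = 1, nsteps

        ! dgtsv destroys dl, d, du, and rhs, so they must be rebuilt every step
        d  = 1.d0 + 2.d0*r        ! main diagonal of the implicit-side matrix
        dl = -r                   ! sub-diagonal
        du = -r                   ! super-diagonal

        do i = 1, n
            ! explicit half of Crank-Nicolson, evaluated at the old time level
            rhs(i) = u(i) + r*(u(i-1) - 2.d0*u(i) + u(i+1))
        enddo
        ! nonzero boundary values would also contribute to rhs(1) and rhs(n)

        call dgtsv(n, 1, dl, d, du, rhs, n, info)

        u(1:n) = rhs              ! interior solution at the new time level
        ! u(0) and u(n+1) keep the boundary values (zero for this problem)

    enddo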

To submit

  • At the end, you should have committed the following files to your repository:

    Part 1

    • $MYHPSC/project/part1/Makefile1 (unchanged from original)
    • $MYHPSC/project/part1/main1.f90 (unchanged from original)
    • $MYHPSC/project/part1/problem.f90 (unchanged from original)
    • $MYHPSC/project/part1/bvp_solvers.f90 (with 3 subroutines)
    • $MYHPSC/project/part1/Makefile2
    • $MYHPSC/project/part1/main2.f90
    • $MYHPSC/project/part1/Makefile3
    • $MYHPSC/project/part1/main3.f90

    Part 2

    • $MYHPSC/project/part2/problem.f90 (unchanged from original)
    • $MYHPSC/project/part2/Makefile1
    • $MYHPSC/project/part2/main1.f90
    • $MYHPSC/project/part2/heat_solvers.f90
    • $MYHPSC/project/part2/Makefile2
    • $MYHPSC/project/part2/main2.f90
    • $MYHPSC/project/part2/animate.py (Script to create the animation)
    • $MYHPSC/project/part2/JSAnimation_frametools.py (optional, unchanged from original)
  • Please be sure you have the specified directory and file names. It is hard to grade otherwise, and points will be deducted.

  • Make sure you push to bitbucket after committing.

  • Submit the commit number that you want graded by following the link provided on the Canvas page for the project.

  • Also take this survey worth 10 points.

    Before doing this survey, please first do the course evaluation. This is being done on-line this year at the link https://uw.iasystem.org/survey/128096. Anonymous results will be available to the instructor and TAs well after the quarter has ended.

  • On-campus students: UW Libraries and UW Information Technology want to know what students think about the Active Learning Classrooms in Odegaard Library. Your feedback will shape recommendations for new classrooms. Take their brief survey here: https://catalyst.uw.edu/webq/survey/fournier/239147.