.. _rotating_fortran:

=============================================================
Rotating particles and Fortran efficiency
=============================================================

These examples solve the simple problem described in Section
:ref:`particle_methods`, and follow a nearly identical pattern to the
sequence of Python programs discussed in :ref:`rotating`. Comparing these
examples to the ones on that page may be useful in understanding both
languages, as well as some of the efficiency issues.

For a general introduction to Fortran, see :ref:`fortran`.

For more about the timings reported here and the computer used, see
:ref:`timings`.

This is a simple example of a particle code (see :ref:`particle_methods`)
in which a set of particles in 2 space dimensions (the *x-y* plane) are
initialized and then moved around. Here the motion is simply taken to be
rotation about the origin. For more about this example see
:ref:`rotating`.

The results below show that in Fortran, loops are not necessarily bad. In
fact, the fastest version on this page is `rotate3.f90`, which has nested
loops. In Python, explicit loops are slow because Python is interpreted,
so each iteration is executed by the interpreter. The vectorized version
gives a considerable speedup because it hands the loops to a compiled
routine. Fortran is itself a compiled language, so writing the loops
directly can be just as fast, or faster.

A poorly written code
---------------------

Here we start with the analog of `rotate2.py`:

.. literalinclude:: ../codes/particles/rotate2.f90
   :language: fortran
   :linenos:

The initial timing is::

    $ gfortran rotate2.f90 -o rotate2.exe
    $ ./rotate2.exe
    Rotating 150000 particles through 40 theta values
    CPU time (sec): 1.03054905

This is much faster than the corresponding Python code (note that we are
using 150000 particles rather than 150 here!).

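
To make the loop structure concrete, here is a minimal sketch; it is not
the contents of `rotate2.f90`. For illustration it assumes that each
particle is rotated successively through every angle in an array `theta`,
using ``x' = cos(theta)*x - sin(theta)*y`` and
``y' = sin(theta)*x + cos(theta)*y``. The module `rotate_loops_sketch` and
its routines `rotate_naive` and `rotate_hoisted` are hypothetical names.
The naive routine recomputes `cos` and `sin` for every particle even though
they depend only on the angle; hoisting them out of the inner loop is one
example of a redundant computation that can be taken out of a loop.

.. code-block:: fortran

    ! Hypothetical sketch (not rotate2.f90 or rotate3.f90): rotate each
    ! particle (x(i), y(i)) about the origin successively by every angle
    ! in theta(:), using explicit nested loops.
    module rotate_loops_sketch
        implicit none
        integer, parameter :: dp = kind(1.d0)
    contains

        subroutine rotate_naive(x, y, theta)
            real(dp), intent(inout) :: x(:), y(:)
            real(dp), intent(in) :: theta(:)
            real(dp) :: xnew
            integer :: i, k

            do k = 1, size(theta)
                do i = 1, size(x)
                    ! Redundant: cos(theta(k)) and sin(theta(k)) do not
                    ! depend on i, yet they are evaluated for every particle
                    xnew = cos(theta(k))*x(i) - sin(theta(k))*y(i)
                    y(i) = sin(theta(k))*x(i) + cos(theta(k))*y(i)
                    x(i) = xnew
                end do
            end do
        end subroutine rotate_naive

        subroutine rotate_hoisted(x, y, theta)
            real(dp), intent(inout) :: x(:), y(:)
            real(dp), intent(in) :: theta(:)
            real(dp) :: c, s, xnew
            integer :: i, k

            do k = 1, size(theta)
                ! Compute the rotation coefficients once per angle
                c = cos(theta(k))
                s = sin(theta(k))
                do i = 1, size(x)
                    ! xnew holds the new x so y(i) is updated from the old x(i)
                    xnew = c*x(i) - s*y(i)
                    y(i) = s*x(i) + c*y(i)
                    x(i) = xnew
                end do
            end do
        end subroutine rotate_hoisted

    end module rotate_loops_sketch

Both routines are called the same way, for example
``call rotate_hoisted(x, y, theta)`` with `x` and `y` holding the particle
coordinates and `theta` the list of angles.
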
We can speed it up a bit more by turning optimization on::

    $ gfortran -O3 rotate2.f90 -o rotate2.exe
    $ ./rotate2.exe
    Rotating 150000 particles through 40 theta values
    CPU time (sec): 0.70011801

Eliminating repeated computations
---------------------------------

Next is the analog of `rotate3.py`, in which redundant computations have
been moved out of the loops. Here is the Fortran version:

.. literalinclude:: ../codes/particles/rotate3.f90
   :language: fortran
   :linenos:

Without optimization::

    $ gfortran rotate3.f90 -o rotate3.exe
    $ ./rotate3.exe
    Rotating 150000 particles through 40 theta values
    CPU time (sec): 0.07432000

With optimization::

    $ gfortran -O3 rotate3.f90 -o rotate3.exe
    $ ./rotate3.exe
    Rotating 150000 particles through 40 theta values
    CPU time (sec): 0.03912600

This is much faster than the Python version, and more than ten times
faster than `rotate2.f90` compiled with the same flags.

Array operations
----------------

Version 4 uses array operations rather than loops:

.. literalinclude:: ../codes/particles/rotate4.f90
   :language: fortran
   :linenos:

Recall that in Python, vectorizing the loops made a huge difference. Here
it actually slows the code down slightly relative to the loops.

Without optimization::

    $ gfortran rotate4.f90 -o rotate4.exe
    $ ./rotate4.exe
    Rotating 150000 particles through 40 theta values
    CPU time (sec): 0.08140400

With optimization::

    $ gfortran -O3 rotate4.f90 -o rotate4.exe
    $ ./rotate4.exe
    Rotating 150000 particles through 40 theta values
    CPU time (sec): 0.04200900

Linear algebra version
----------------------

Finally, Version 5 uses the Fortran 90 matrix multiplication intrinsic
`matmul`:

.. literalinclude:: ../codes/particles/rotate5.f90
   :language: fortran
   :linenos:

This slows the code down even further. Moreover, optimization does not
help at all: with `-O3` this version runs about six times slower than
without it!

Without optimization::

    $ gfortran rotate5.f90 -o rotate5.exe
    $ ./rotate5.exe
    Rotating 150000 particles through 40 theta values
    CPU time (sec): 0.34880498

With optimization::

    $ gfortran -O3 rotate5.f90 -o rotate5.exe
    $ ./rotate5.exe
    Rotating 150000 particles through 40 theta values
    CPU time (sec): 2.14158487

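
For comparison, here is a minimal sketch of the two non-loop formulations
timed above. It is not the contents of `rotate4.f90` or `rotate5.f90`, and
for illustration it assumes that each particle is rotated successively
through every angle in an array `theta`. The module `rotate_array_sketch`
and its routines are hypothetical names. `rotate_array` replaces the loop
over particles with whole-array expressions, while `rotate_matmul` stores
the positions as a 2-by-n array and applies the intrinsic `matmul` with a
2-by-2 rotation matrix for each angle.

.. code-block:: fortran

    ! Hypothetical sketch (not rotate4.f90 or rotate5.f90): the same
    ! successive rotations written with whole-array operations and with
    ! the matmul intrinsic.
    module rotate_array_sketch
        implicit none
        integer, parameter :: dp = kind(1.d0)
    contains

        subroutine rotate_array(x, y, theta)
            real(dp), intent(inout) :: x(:), y(:)
            real(dp), intent(in) :: theta(:)
            real(dp) :: xnew(size(x))
            real(dp) :: c, s
            integer :: k

            do k = 1, size(theta)
                c = cos(theta(k))
                s = sin(theta(k))
                ! Whole-array expressions replace the loop over particles;
                ! the temporary xnew keeps the old x available for updating y
                xnew = c*x - s*y
                y = s*x + c*y
                x = xnew
            end do
        end subroutine rotate_array

        subroutine rotate_matmul(xy, theta)
            ! xy(1,:) holds the x coordinates, xy(2,:) the y coordinates
            real(dp), intent(inout) :: xy(:,:)
            real(dp), intent(in) :: theta(:)
            real(dp) :: r(2,2)
            integer :: k

            do k = 1, size(theta)
                ! Rotation matrix [cos -sin; sin cos], filled column by column
                r = reshape([cos(theta(k)), sin(theta(k)), &
                             -sin(theta(k)), cos(theta(k))], [2, 2])
                ! The right-hand side is evaluated fully before assignment
                xy = matmul(r, xy)
            end do
        end subroutine rotate_matmul

    end module rotate_array_sketch

Because Fortran stores arrays in column-major order, `reshape` fills the
rotation matrix one column at a time, which is why `-sin` appears third in
the array constructor. To compare the variants on your own machine, each
call can be bracketed by two calls to the intrinsic subroutine `cpu_time`.
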