.. _rotating_fortran:

=============================================================
Rotating particles and Fortran efficiency
=============================================================

These examples solve the simple problem described in Section
:ref:`particle_methods`, and
follow a nearly identical pattern to the sequence of Python
programs discussed in :ref:`rotating`.  Comparing these examples to the ones on
that page may be useful in understanding the languages as well as some of
the efficiency issues.

For a general introduction to Fortran, see :ref:`fortran`.

For more about the timings reported here and the computer used, see
:ref:`timings`.  

This is a simple example of a particle code (see :ref:`particle_methods`) in
which a set of particles in two space dimensions (the *x-y* plane) is
initialized and then moved around.  Here the motion is simply taken to be
rotation about the origin.  For more about this example see :ref:`rotating`.

The results below show that in Fortran loops are not necessarily bad.  In
fact the fastest version on this page is `rotate3.f90`, which has nested
loops.  In Python, loops are slow because the language is interpreted, so
using the vectorized version gives considerable speedup: it calls a
compiled routine to do the loops.  Fortran is a compiled language, so writing
the loops directly can be just as fast or faster.

A poorly written code
---------------------

Here we start with the analog of `rotate2.py`: 

.. literalinclude:: ../codes/particles/rotate2.f90
   :language: fortran
   :linenos:

The initial timing is::

    $ gfortran rotate2.f90 -o rotate2.exe
    $ ./rotate2.exe 
     Rotating       150000  particles through           40  theta values
     CPU time (sec):  1.03054905

This is much faster than the corresponding Python code (note that we are
using 150000 particles rather than 150 here!).

We can speed it up a bit more by turning optimization on::

    $ gfortran -O3 rotate2.f90 -o rotate2.exe
    $ ./rotate2.exe
     Rotating       150000  particles through           40  theta values
     CPU time (sec):  0.70011801
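
A `CPU time` line like the one above can be produced with the Fortran
intrinsic subroutine `cpu_time`.  The following is only a minimal sketch of
that timing pattern; the program name, the variables `t1` and `t2`, and the
dummy loop being timed are hypothetical and are not taken from `rotate2.f90`:

.. code-block:: fortran

    ! Sketch only: all names here are illustrative assumptions.
    program timing_sketch
        implicit none
        integer, parameter :: n = 150000
        real(kind=8) :: x(n), t1, t2
        integer :: i

        call cpu_time(t1)      ! processor time in seconds before the work
        do i = 1, n
            x(i) = sqrt(real(i, kind=8))
        end do
        call cpu_time(t2)      ! processor time in seconds after the work

        ! Print a computed value so the compiler cannot discard the loop.
        print *, "x(n) = ", x(n)
        print *, "CPU time (sec): ", t2 - t1
    end program timing_sketch

Compiling this sketch with and without `-O3` gives a quick way to see the
effect of the optimization flag on a small piece of code.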


Eliminating repeated computations
---------------------------------

Next is the analog of `rotate3.py`, in which redundant
computations have been taken out of the loops.  Here is the Fortran version:

.. literalinclude:: ../codes/particles/rotate3.f90
   :language: fortran
   :linenos:

Without optimization::

    $ gfortran rotate3.f90 -o rotate3.exe
    $ ./rotate3.exe
     Rotating       150000  particles through           40  theta values
     CPU time (sec):  0.07432000

With optimization::

    $ gfortran -O3 rotate3.f90 -o rotate3.exe
    $ ./rotate3.exe
     Rotating       150000  particles through           40  theta values
     CPU time (sec):  0.03912600

This is much faster than the Python version.
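
The idea of taking redundant computations out of the loops can be sketched as
follows.  This is not the code in `rotate3.f90`; it is a small illustration
that assumes each particle is rotated from its initial position by the angle
`theta`, and all names (`x0`, `y0`, `c`, `s`, `dtheta`) are hypothetical.
The point is that `cos(theta)` and `sin(theta)` do not depend on the particle
index, so they are evaluated once per angle rather than once per particle:

.. code-block:: fortran

    ! Sketch only: illustrative names and sizes, not taken from rotate3.f90.
    program hoist_sketch
        implicit none
        integer, parameter :: n = 4, ntheta = 40
        real(kind=8), parameter :: pi = 3.141592653589793d0
        real(kind=8) :: x0(n), y0(n), x(n), y(n)
        real(kind=8) :: theta, dtheta, c, s
        integer :: i, k

        ! Initial positions along the positive x-axis.
        do i = 1, n
            x0(i) = real(i, kind=8)
            y0(i) = 0.d0
        end do

        dtheta = 2.d0 * pi / ntheta
        do k = 1, ntheta
            theta = k * dtheta
            ! Computed once per angle, outside the loop over particles.
            c = cos(theta)
            s = sin(theta)
            do i = 1, n
                x(i) = c*x0(i) - s*y0(i)
                y(i) = s*x0(i) + c*y0(i)
            end do
        end do

        ! The last angle is 2*pi, so the particles should be back (up to
        ! rounding error) at their initial positions.
        print *, "max deviation in x: ", maxval(abs(x - x0))
        print *, "max deviation in y: ", maxval(abs(y - y0))
    end program hoist_sketch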

Array operations
----------------

Version 4 uses array operations rather than loops.  

.. literalinclude:: ../codes/particles/rotate4.f90
   :language: fortran
   :linenos:

Recall that in Python
vectorizing the loops made a huge difference.  Here the array version is
actually a bit slower than the version with explicit loops.

Without optimization::

    $ gfortran rotate4.f90 -o rotate4.exe
    $ ./rotate4.exe
     Rotating       150000  particles through           40  theta values
     CPU time (sec):  0.08140400

With optimization::

    $ gfortran -O3 rotate4.f90 -o rotate4.exe
    $ ./rotate4.exe
     Rotating       150000  particles through           40  theta values
     CPU time (sec):  0.04200900
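
For comparison, here is a minimal sketch of what replacing the loop over
particles by Fortran 90 array operations can look like.  It is not the code
in `rotate4.f90`; the names (`x0`, `y0`, `c`, `s`) and the tiny problem size
are assumptions chosen for illustration:

.. code-block:: fortran

    ! Sketch only: illustrative names and sizes, not taken from rotate4.f90.
    program array_ops_sketch
        implicit none
        integer, parameter :: n = 4, ntheta = 40
        real(kind=8), parameter :: pi = 3.141592653589793d0
        real(kind=8) :: x0(n), y0(n), x(n), y(n)
        real(kind=8) :: theta, dtheta, c, s
        integer :: k

        x0 = [1.d0, 2.d0, 3.d0, 4.d0]
        y0 = 0.d0                  ! scalar assigned to every element

        dtheta = 2.d0 * pi / ntheta
        do k = 1, ntheta
            theta = k * dtheta
            c = cos(theta)
            s = sin(theta)
            ! Whole-array expressions: no explicit loop over particles.
            x = c*x0 - s*y0
            y = s*x0 + c*y0
        end do

        ! The last angle is 2*pi, so x and y should nearly equal x0 and y0.
        print *, "max deviation in x: ", maxval(abs(x - x0))
        print *, "max deviation in y: ", maxval(abs(y - y0))
    end program array_ops_sketch

The compiler still has to generate loops for the array expressions, which is
why this form is not expected to be faster than well-written explicit loops.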

Linear algebra version
----------------------

Finally, Version 5 uses the Fortran 90 intrinsic matrix multiplication
function, `matmul`.

.. literalinclude:: ../codes/particles/rotate5.f90
   :language: fortran
   :linenos:


This slows the code down even further. Moreover, optimization doesn't help
at all: in the timings below, the `-O3` build is about six times slower than
the unoptimized one.

Without optimization::

    $ gfortran rotate5.f90 -o rotate5.exe
    $ ./rotate5.exe
     Rotating       150000  particles through           40  theta values
     CPU time (sec):  0.34880498

With optimization::

    $ gfortran -O3 rotate5.f90 -o rotate5.exe
    $ ./rotate5.exe
     Rotating       150000  particles through           40  theta values
     CPU time (sec):  2.14158487
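
A rotation written as a matrix product can be sketched as below.  This is not
the code in `rotate5.f90`; it assumes the particle coordinates are stored as
the columns of a 2-by-n array (a hypothetical layout), and the names `a`,
`xy0` and `xy` are illustrative.  Only the intrinsic functions `matmul` and
`reshape` are standard Fortran:

.. code-block:: fortran

    ! Sketch only: illustrative names and layout, not taken from rotate5.f90.
    program matmul_sketch
        implicit none
        integer, parameter :: n = 4, ntheta = 40
        real(kind=8), parameter :: pi = 3.141592653589793d0
        real(kind=8) :: a(2,2), xy0(2,n), xy(2,n)
        real(kind=8) :: theta, dtheta, c, s
        integer :: k

        xy0(1,:) = [1.d0, 2.d0, 3.d0, 4.d0]   ! x coordinates
        xy0(2,:) = 0.d0                       ! y coordinates

        dtheta = 2.d0 * pi / ntheta
        do k = 1, ntheta
            theta = k * dtheta
            c = cos(theta)
            s = sin(theta)
            ! reshape fills column by column, so this gives the rotation
            ! matrix with rows (c, -s) and (s, c).
            a = reshape([c, s, -s, c], [2, 2])
            xy = matmul(a, xy0)
        end do

        ! The last angle is 2*pi, so xy should nearly equal xy0.
        print *, "max deviation: ", maxval(abs(xy - xy0))
    end program matmul_sketch

A general-purpose matrix product does more bookkeeping than the two
multiply-add expressions per particle needed here, which is one plausible
reason this formulation is slower for a 2-by-2 matrix.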