These examples solve the simple problem described in Section
*Particle methods*, and
follow a nearly identical pattern to the sequence of Python
programs discussed in *Rotating particles and Python efficiency*. Comparing these examples to the ones on
that page may be useful in understanding the languages as well as some of
the efficiency issues.

For a general introduction to Fortran, see *Fortran*.

For more about the timings reported here and the computer used, see
*Timing and profiling computer codes*.

This is a simple example of a particle code (see *Particle methods*) in
which a set of particles in 2 space dimensions (the *x-y* plane) are
initialized and then moved around. Here the motion is simply taken to be
rotation about the origin. For more about this example see *Rotating particles and Python efficiency*.

The results below show that in Fortran loops are not necessarily bad. In fact the fastest version on this page is rotate3.f90 which has nested loops. In Python loops are bad because it’s an interpreted language. Using the vectorized version gives considerable speedup because it calls a compiled routine to do the loops. Fortran is a compiled language so doing the loops directly can be faster.

Here we start with the analog of rotate2.py:

```
# $CLASSHG/codes/particles/rotate2.f90
program rotate2
implicit none
integer, parameter :: n = 150000 ! number of particles
integer :: nsteps = 40 ! number of rotation steps
integer :: i, j
real :: t1, t2, pi, dtheta, theta, r, alpha, x1, y1
real, dimension(n) :: x, y ! arrays of length n
call cpu_time(t1) ! start the CPU timer
pi = 4. * atan(1.) ! computes pi to full precision
dtheta = 2. * pi / nsteps ! change in theta between steps
print *, "Rotating ", n, " particles through ", nsteps, " theta values"
do i=1,nsteps
theta = i*dtheta
do j=1,n
alpha = 2. * pi*j/n
r = 1. + 0.5*cos(10.*alpha)
x1 = 2. + r*cos(alpha)
y1 = r*sin(alpha)
x(j) = cos(theta)*x1 - sin(theta)*y1
y(j) = sin(theta)*x1 + cos(theta)*y1
enddo
enddo
call cpu_time(t2)
print '(" CPU time (sec):", f12.8 )', t2-t1
end program rotate2
```

The initial timing is:

```
$ gfortran rotate2.f90 -o rotate2.exe
$ ./rotate2.exe
Rotating 150000 particles through 40 theta values
CPU time (sec): 1.03054905
```

This is much faster than the corresponding Python code (note that we are using 150000 particles rather than 150 here!).

We can speed it up a bit more by turning optimization on:

```
$ gfortran -O3 rotate2.f90 -o rotate2.exe
$ ./rotate2.exe
Rotating 150000 particles through 40 theta values
CPU time (sec): 0.70011801
```

Next is the analog of rotate3.py, in which redudant computations have been taken out of the loops. Here is the Fortran version:

```
# $CLASSHG/codes/particles/rotate3.f90
program rotate3
implicit none
integer, parameter :: n = 150000 ! number of particles
integer :: nsteps = 40 ! number of rotation steps
integer :: i, j
real :: t1, t2, pi, dtheta, theta, sinth, costh
real, dimension(n) :: x, y, alpha ! arrays of length n
real, dimension(n) :: x1, y1, r ! arrays of length n
call cpu_time(t1) ! start the CPU timer
pi = 4. * atan(1.) ! computes pi to full precision
dtheta = 2. * pi / nsteps ! change in theta between steps
print *, "Rotating ", n, " particles through ", nsteps, " theta values"
do j=1,n
alpha(j) = (j-1)*2.*pi / n
r(j) = 1. + 0.5*cos(10*alpha(j)) ! radius for each alpha
x1(j) = 2. + r(j)*cos(alpha(j)) ! x value before rotation
y1(j) = r(j)*sin(alpha(j)) ! y value before rotation
enddo
do i=1,nsteps
theta = i*dtheta
costh = cos(theta)
sinth = sin(theta)
do j=1,n
x(j) = costh*x1(j) - sinth*y1(j)
y(j) = sinth*x1(j) + costh*y1(j)
enddo
enddo
call cpu_time(t2)
print '(" CPU time (sec):", f12.8 )', t2-t1
end program rotate3
```

Without optimization:

```
$ gfortran rotate3.f90 -o rotate3.exe
$ ./rotate3.exe
Rotating 150000 particles through 40 theta values
CPU time (sec): 0.07432000
```

With optimization:

```
$ gfortran -O3 rotate3.f90 -o rotate3.exe
$ ./rotate3.exe
Rotating 150000 particles through 40 theta values
CPU time (sec): 0.03912600
```

This is much faster than the Python version.

Version 4 uses array operations rather than loops.

```
# $CLASSHG/codes/particles/rotate4.f90
program rotate4
implicit none
integer, parameter :: n = 150000 ! number of particles
integer :: nsteps = 40 ! number of rotation steps
integer :: i, j
real :: t1, t2, pi, dtheta, theta, sinth, costh
real, dimension(n) :: x, y, alpha ! arrays of length n
real, dimension(n) :: x1, y1, r ! arrays of length n
call cpu_time(t1) ! start the CPU timer
pi = 4. * atan(1.) ! computes pi to full precision
dtheta = 2. * pi / nsteps ! change in theta between steps
print *, "Rotating ", n, " particles through ", nsteps, " theta values"
do j=1,n
alpha(j) = (j-1)*2.*pi / n
enddo
r = 1. + 0.5*cos(10*alpha) ! vector with radius for each alpha
x1 = 2. + r*cos(alpha) ! vector of x before rotation
y1 = r*sin(alpha) ! vector of y before rotation
do i=1,nsteps
theta = i*dtheta
costh = cos(theta)
sinth = sin(theta)
x = cos(theta)*x1 - sin(theta)*y1
y = sin(theta)*x1 + cos(theta)*y1
enddo
call cpu_time(t2)
print '(" CPU time (sec):", f12.8 )', t2-t1
end program rotate4
```

Recall that in Python vectorizing the loops made a huge difference. Here it actually slows down a bit relative to the loops.

Without optimization:

```
$ gfortran rotate4.f90 -o rotate4.exe
./rotate4.exe
Rotating 150000 particles through 40 theta values
CPU time (sec): 0.08140400
```

With optimization:

```
$ gfortran -O3 rotate4.f90 -o rotate4.exe
./rotate4.exe
Rotating 150000 particles through 40 theta values
CPU time (sec): 0.04200900
```

Finally, Version 5 uses the Fortran 90 matrix multiplication routine.

```
# $CLASSHG/codes/particles/rotate5.f90
program rotate5
implicit none
integer, parameter :: n = 150000 ! number of particles
integer :: nsteps = 40 ! number of rotation steps
integer :: i, j
real :: t1, t2, pi, dtheta, theta, sdt, cdt
real, dimension(n) :: x, y, alpha ! arrays of length n
real, dimension(n) :: x1, y1, r ! arrays of length n
real, dimension(2,n) :: xy ! 2 by n matrix for xy values
real, dimension(2,2) :: Rot ! 2 by 2 rotation matrix
call cpu_time(t1) ! start the CPU timer
pi = 4. * atan(1.) ! computes pi to full precision
dtheta = 2. * pi / nsteps ! change in theta between steps
print *, "Rotating ", n, " particles through ", nsteps, " theta values"
do j=1,n
alpha(j) = (j-1)*2.*pi / n
enddo
r = 1. + 0.5*cos(10*alpha) ! vector with radius for each alpha
xy(1,:) = 2. + r*cos(alpha) ! vector of x before rotation
xy(2,:) = r*sin(alpha) ! vector of y before rotation
Rot(1,1) = cdt
Rot(1,2) = sdt
Rot(2,1) = -sdt
Rot(2,2) = cdt
do i=1,nsteps
xy = matmul(Rot, xy) ! multiply by rotation matrix
x = xy(1,:) ! extract 1st row
y = xy(2,:) ! extract 2nd row
enddo
call cpu_time(t2)
print '(" CPU time (sec):", f12.8 )', t2-t1
end program rotate5
```

This slows the code down even further. Moreover, optimization doesn’t help at all!

Without optimization:

```
$ gfortran rotate5.f90 -o rotate5.exe
./rotate5.exe
Rotating 150000 particles through 40 theta values
CPU time (sec): 0.34880498
```

With optimization:

```
$ gfortran -O3 rotate5.f90 -o rotate5.exe
./rotate5.exe
Rotating 150000 particles through 40 theta values
CPU time (sec): 2.14158487
```