Table Of Contents

Previous topic

Profiling

This Page

Profiling mmf.math.multigrid

This is a log of the profiling process for mmf.math.multigrid version 2216. The file multigrid_profile.py was put in a directory multigrid/profile located with the multigrid module. The optimized version was committed as svn version 2220, then both of these were moved to the documentation directory for archival. The profile script was modified slightly along the way, so the output from the current version is not exactly the same as the recorded output. The detailed profiling code is shown at the bottom of this file.

First we must define a problem to profile. The main application is a three-dimensional problem, so we start with that.

def test_1(n=4):
    mg = mmf.math.multigrid.MultiGrid(n_0=2, n=n, d=3, L=2*np.pi,
                                      boundary_conditions='periodic')
    x,y,z = mg.x()
    b = np.sin(x)*np.sin(y)*np.sin(z)
    exact = b/mg.d
    sol = mg.solve(b=b)
    assert np.allclose(sol, exact, atol=0.0011)

Our first run gives the following output:

:math:` python multigrid_profile.py test_1

Profiling function test_1...
Writing output to _profs/multigrid_test_1.prof
Done.
         249197 function calls (233037 primitive calls) in 10.630 CPU seconds

   Ordered by: internal time, call count
   List reduced from 71 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1536    4.198    0.003    7.630    0.005 multigrid_.py:646(T)
    46080    2.887    0.000    3.076    0.000 numeric.py:879(roll)
 9184/224    1.052    0.000    1.388    0.006 multigrid_.py:864(I)
 6624/224    0.700    0.000    0.829    0.004 multigrid_.py:748(R)
     1984    0.324    0.000    0.403    0.000 multigrid_.py:1001(augment)
    20736    0.273    0.000    0.287    0.000 index_tricks.py:487(__getslice__)
    46080    0.189    0.000    0.189    0.000 numeric.py:232(asanyarray)
      640    0.185    0.000    5.190    0.008 multigrid_.py:503(S)
      723    0.152    0.000    0.168    0.000 fromnumeric.py:32(_wrapit)
     9259    0.126    0.000    0.126    0.000 numeric.py:180(asarray)
     1536    0.108    0.000    7.738    0.005 multigrid_.py:600(A)
        1    0.106    0.106   10.606   10.606 multigrid_.py:1344(solve)
    89478    0.066    0.000    0.066    0.000 index_tricks.py:478(__getitem__)
   240/80    0.061    0.000    9.292    0.116 multigrid_.py:1221(v_cycle)
     5952    0.040    0.000    0.040    0.000 numeric.py:943(rollaxis)
     2176    0.033    0.000    0.039    0.000 multigrid_.py:417(dx_inv)
       80    0.019    0.000    0.024    0.000 linalg.py:1185(lstsq)
       80    0.014    0.000    1.016    0.013 multigrid_.py:1203(to_mat)
      882    0.012    0.000    0.180    0.000 fromnumeric.py:1626(prod)
      640    0.010    0.000    0.012    0.000 fromnumeric.py:833(trace)

 Line-data for file .../utils/mmf/math/multigrid/multigrid_.py
   ...
   646: ------ ------     def T(self, x, dx_inv=None):
   715:  0.01%   1536         ndim = self.d
   716:  0.04%   1536         inner = np.s_[...,] + _inner*ndim
   717: ------ ------
   718:  0.02%   1536         v_shape = x.shape[:-ndim]   # Shape of vector part
   719:  0.01%   1536         nv = len(v_shape)           # Number of vector dimensions
   720:  0.02%   1536         shape = np.asarray(x.shape[-ndim:]) # Shape of grid part
   721: ------ ------
   722: ------ ------
   723:  0.48%   1536         ddx = np.zeros(x.shape, dtype=x.dtype)
   724: ------ ------
   725:  0.01%   1536         if dx_inv is None:
   726:  0.03%   1536            dx_inv = self.dx_inv(shape)
   727: ------ ------
   728:  0.03%   1536         x = self.augment(x)
   729: ------ ------
   730:  0.16%   1536         dx_inv_2 = np.dot(dx_inv, dx_inv.T)
   731:  0.05%   6144         for a in xrange(ndim):
   732:  0.19%  18432             for b in xrange(ndim):
   733:  0.10%  13824                 if a == b:
   734: ------ ------                     xab = (- 2*x
   735: ------ ------                            + np.roll(x, -1, axis=nv + a)
   736:  2.62%   4608                            + np.roll(x, 1, axis=nv + a))
   737: ------ ------                 else:
   738:  0.26%   9216                    xa = (np.roll(x, 1, axis=nv + a)
   739:  0.21%   9216                          - np.roll(x, -1, axis=nv + a))
   740:  0.26%   9216                    xab = (np.roll(xa, 1, axis=nv + b)
   741:  0.19%   9216                           - np.roll(xa, -1, axis=nv + b))/4
   742: ------ ------
   743: ------ ------                 # Transform to get Laplacian
   744: 10.90%  13824                 ddx += xab[inner]*dx_inv_2[a,b]
   745: ------ ------
   746:  0.01%   1536         return ddx

 Line-data for file /Library/.../numpy/core/numeric.py
      ...
   879: ------ ------ def roll(a, shift, axis=None):
      ...
   928:  0.43%  46080     a = asanyarray(a)
   929:  0.26%  46080     if axis is None:
   930: ------ ------         n = a.size
   931: ------ ------         reshape = True
   932: ------ ------     else:
   933:  0.48%  46080         n = a.shape[axis]
   934:  0.22%  46080         reshape = False
   935:  0.29%  46080     shift %= n
   936:  3.69%  46080     indexes = concatenate((arange(n-shift,n),arange(n-shift)))
   937: 20.85%  46080     res = a.take(indexes, axis)
   938:  0.24%  46080     if reshape:
   939: ------ ------         return res.reshape(a.shape)
   940: ------ ------     else:
   941:  0.18%  46080         return res

We see that most of the time is spent in T and roll. Using the lineevent data we can see the hotspots. Running once more with n=5 gives demonstrates the same proportions:

` python multigrid_profile.py test_1 5
Profiling function test_1...
Writing output to _profs/multigrid_test_1.prof
Done.
         372066 function calls (347195 primitive calls) in 57.850 CPU seconds

   Ordered by: internal time, call count
   List reduced from 72 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2210   28.526    0.013   48.318    0.022 multigrid_.py:646(T)
    66300   18.209    0.000   18.531    0.000 numeric.py:879(roll)
13940/340    3.203    0.000    3.407    0.010 multigrid_.py:864(I)
10540/340    1.866    0.000    2.338    0.007 multigrid_.py:748(R)
     2890    1.507    0.001    1.629    0.001 multigrid_.py:1001(augment)
     1020    1.306    0.001   34.933    0.034 multigrid_.py:503(S)
        1    0.897    0.897   57.689   57.689 multigrid_.py:1344(solve)
     2210    0.740    0.000   49.058    0.022 multigrid_.py:600(A)
  357/102    0.338    0.001   49.085    0.481 multigrid_.py:1221(v_cycle)
    66300    0.323    0.000    0.323    0.000 numeric.py:232(asanyarray)
    13747    0.198    0.000    0.198    0.000 numeric.py:180(asarray)
   138046    0.104    0.000    0.104    0.000 index_tricks.py:478(__getitem__)
    32130    0.078    0.000    0.101    0.000 index_tricks.py:487(__getslice__)
        1    0.068    0.068   57.847   57.847 multigrid_profile.py:28(test_1)
     8670    0.062    0.000    0.062    0.000 numeric.py:943(rollaxis)
     3230    0.054    0.000    0.065    0.000 multigrid_.py:417(dx_inv)
        2    0.053    0.027    0.054    0.027 multigrid_.py:434(x)
       85    0.048    0.001    0.054    0.001 multigrid_.py:1412(_remove_constant)
       17    0.034    0.002   50.641    2.979 multigrid_.py:1263(full_multigrid)
       32    0.033    0.001    0.033    0.001 fromnumeric.py:1492(amax)

We start with line 744 in multigrid_.py:

743: ------ ------                 # Transform to get Laplacian
744: 10.90%  13824                 ddx += xab[inner]*dx_inv_2[a,b]

Perhaps the problem is that the operations of take and the multiplication etc. do not happen in place and force copies to be made. Noting that the array xab is never used again, we change this to the following using in-place operations and reprofile:

743: ------ ------                 # Transform to get Laplacian
744:  6.80%  13824                 xab[inner] *= dx_inv_2[a,b]
745:  3.65%  13824                 ddx += xab[inner]

6.80% + 3.65% = 10.45%: Not much improvement. This surprised me and demonstrates the important point that you must profile to determine performance issues! Perhaps it is the indexing that is taking a long time. We try a few other things for comparison. It looks like in-place operations on arrays with complex indexing is somewhat costly:

743: ------ ------                 # Transform to get Laplacian
744:  0.78%  13824                 xab[inner]
745:  2.70%  13824                 xab *= dx_inv_2[a,b]
746:  6.67%  13824                 xab[inner] *= 1.0
747:  3.82%  13824                 ddx += xab[inner]

Here is the best I can do with these lines:

743: ------ ------                 # Transform to get Laplacian
744:  2.95%  13824                 xab *= dx_inv_2[a,b]
745:  4.37%  13824                 ddx += xab[inner]

2.95% + 4.37% = 7.32%: Not great but better than nothing.

Note

Note that the timings are not completely accurate as they depend on CPU load etc. and vary from run to run. One can either average several runs, or just treat the results cautiously, acknowledging a fair amount of error. Make another profiling run if things look suspicious. We have done that here and included the most “consistent” runs.

Another problem is that the indexing is done each step in the loop, which occurs 9 times. Let’s try making ddx bigger and just extracting the result at the end. Here is the present set of timings:

:math:` python multigrid_profile.py test_1
Profiling function test_1...
Writing output to _profs/multigrid_test_1.prof
Done.
         249197 function calls (233037 primitive calls) in 10.354 CPU seconds

   Ordered by: internal time, call count
   List reduced from 71 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1536    3.898    0.003    7.428    0.005 multigrid_.py:646(T)
    46080    2.909    0.000    3.111    0.000 numeric.py:879(roll)
 9184/224    1.104    0.000    1.327    0.006 multigrid_.py:865(I)
 6624/224    0.733    0.000    0.898    0.004 multigrid_.py:749(R)
     1984    0.365    0.000    0.475    0.000 multigrid_.py:1002(augment)

  723:  0.51%   1536         ddx = np.zeros(x.shape, dtype=x.dtype)
     ...
  728:  0.03%   1536         x = self.augment(x)
  729: ------ ------
  730:  0.16%   1536         dx_inv_2 = np.dot(dx_inv, dx_inv.T)
  731:  0.05%   6144         for a in xrange(ndim):
  732:  0.18%  18432             for b in xrange(ndim):
  733:  0.10%  13824                 if a == b:
  734: ------ ------                     xab = (- 2*x
  735: ------ ------                            + np.roll(x, -1, axis=nv + a)
  736:  2.70%   4608                            + np.roll(x, 1, axis=nv + a))
  737: ------ ------                 else:
  738:  0.26%   9216                    xa = (np.roll(x, 1, axis=nv + a)
  739:  0.21%   9216                          - np.roll(x, -1, axis=nv + a))
  740:  0.27%   9216                    xab = (np.roll(xa, 1, axis=nv + b)
  741:  0.20%   9216                           - np.roll(xa, -1, axis=nv + b))/4
  742: ------ ------
  743: ------ ------                 # Transform to get Laplacian
  744:  2.97%  13824                 xab *= dx_inv_2[a,b]
  745:  4.44%  13824                 ddx += xab[inner]
  746: ------ ------
  747:  0.01%   1536         return ddx

followed by the new timings with a larger ddx:

` python multigrid_profile.py test_1
   ...
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1536    3.735    0.002    7.263    0.005 multigrid_.py:646(T)
    46080    2.915    0.000    3.112    0.000 numeric.py:879(roll)
 9184/224    1.205    0.000    1.359    0.006 multigrid_.py:865(I)
 6624/224    0.734    0.000    0.891    0.004 multigrid_.py:749(R)

    ...
 726:  0.03%   1536         x = self.augment(x)
 727: ------ ------
 728:  0.57%   1536         ddx = np.zeros(x.shape, dtype=x.dtype)
 729: ------ ------
 730:  0.17%   1536         dx_inv_2 = np.dot(dx_inv, dx_inv.T)
 731:  0.05%   6144         for a in xrange(ndim):
 732:  0.19%  18432             for b in xrange(ndim):
 733:  0.11%  13824                 if a == b:
 734: ------ ------                     xab = (- 2*x
 735: ------ ------                            + np.roll(x, -1, axis=nv + a)
 736:  2.71%   4608                            + np.roll(x, 1, axis=nv + a))
 737: ------ ------                 else:
 738:  0.28%   9216                    xa = (np.roll(x, 1, axis=nv + a)
 739:  0.22%   9216                          - np.roll(x, -1, axis=nv + a))
 740:  0.28%   9216                    xab = (np.roll(xa, 1, axis=nv + b)
 741:  0.20%   9216                           - np.roll(xa, -1, axis=nv + b))/4
 742: ------ ------
 743: ------ ------                 # Transform to get Laplacian
 744:  2.98%  13824                 xab *= dx_inv_2[a,b]
 745:  2.72%  13824                 ddx += xab
 746: ------ ------
 747:  0.10%   1536         return ddx[inner]

This is better now and we are approaching the point where the roll operations are significant, so let’s try looking at these. Note that in order to implement the boundary conditions we have augmented the array and extract the inner entries upon return. Thus, we can afford to have errors in the outer dimensions. As such, we don’t really need a full roll, we just need to shift the array. Let’s try doing this with indexing (noting that we may have to undo the previous optimiziations!).

Note

Note, when making a major change like this one must run one’s unit tests. This was a tricky change and it took a few tries to get it right. I found some other subtle potential bugs in the process.

Here are the results:

:math:` python multigrid_profile.py test_1
Profiling function test_1...
Writing output to _profs/multigrid_test_1.prof
Done.
         282989 function calls (266829 primitive calls) in 7.208 CPU seconds

   Ordered by: internal time, call count
   List reduced from 69 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1536    3.708    0.002    4.356    0.003 multigrid_.py:646(T)
 9184/224    1.111    0.000    1.274    0.006 multigrid_.py:916(I)
 6624/224    0.739    0.000    0.903    0.004 multigrid_.py:800(R)
     1984    0.365    0.000    0.477    0.000 multigrid_.py:1053(augment)
   173958    0.231    0.000    0.231    0.000 index_tricks.py:478(__getitem__)
      640    0.190    0.000    2.941    0.005 multigrid_.py:503(S)

  646: ------ ------     def T(self, x, dx_inv=None):
     ...
  715:  0.02%   1536         ndim = self.d
  716:  0.03%   1536         v_shape = x.shape[:-ndim]   # Shape of vector part
  717:  0.02%   1536         nv = len(v_shape)           # Number of vector dimensions
  718:  0.04%   1536         shape = np.asarray(x.shape[-ndim:]) # Shape of grid part
  719: ------ ------
  720:  0.06%   1536         inner = np.s_[:,]*nv + np.s_[1:-1,]*ndim
  721: ------ ------
  722:  0.01%   1536         if dx_inv is None:
  723:  0.03%   1536            dx_inv = self.dx_inv(shape)
  724: ------ ------
  725:  0.72%   1536         ddx = np.zeros(x.shape, dtype=x.dtype)
  726: ------ ------
  727:  0.05%   1536         x = self.augment(x)
  728: ------ ------
  729:  0.24%   1536         dx_inv_2 = np.dot(dx_inv, dx_inv.T)
  730:  0.07%   6144         for a in xrange(ndim):
  731:  0.24%  18432             for b in xrange(ndim):
  732:  0.15%  13824                 if a == b:
  733: ------ ------                     # Index shifts
  734:  0.23%   4608                     a_left, a_right = list(inner), list(inner)
  735:  0.20%   4608                     a_left[nv + a] = np.s_[:-2,][0]
  736:  0.13%   4608                     a_right[nv + a] = np.s_[2:]
  737: ------ ------
  738: 12.03%   4608                     xab = x[a_left]  + x[a_right] - 2*x[inner]
  739: ------ ------                 else:
  740: ------ ------                     # Index shifts.  The b_shifts should leave the
  741: ------ ------                     # other indices alone as the a-shifts will extract
  742: ------ ------                     # these.  Likewise, the a-shifts should leave the
  743: ------ ------                     # index b alone.
  744:  0.49%   9216                     a_left, a_right = list(inner), list(inner)
  745:  0.40%   9216                     a_left[nv + a] = np.s_[:-2,][0]
  746:  0.24%   9216                     a_right[nv + a] = np.s_[2:]
  747:  1.04%   9216                     a_left[nv + b] = np.s_[:]
  748:  0.20%   9216                     a_right[nv + b] = np.s_[:]
  749: ------ ------
  750:  0.26%   9216                     b_left = list(len(a_left)*np.s_[:,])
  751:  0.25%   9216                     b_right = list(len(a_left)*np.s_[:,])
  752:  0.19%   9216                     b_left[nv + b] = np.s_[:-2,][0]
  753:  0.21%   9216                     b_right[nv + b] = np.s_[2:]
  754: ------ ------
  755: 10.21%   9216                     xa = (x[b_right] - x[b_left])
  756: 15.12%   9216                     xab = (xa[a_right] - xa[a_left])/4
  757: ------ ------
  758: ------ ------                 # Transform to get Laplacian
  759:  3.64%  13824                 xab *= dx_inv_2[a,b]
  760:  3.26%  13824                 ddx += xab
  761: ------ ------
  762:  0.01%   1536         return ddx

In order to compare various ways of doing the axpy operation B += a*A we added some special tests to the profiling code. In principle using the BLAS axpy operation should work well, but the slow part of the code is

def test_axpy0(A,B,inds,tmp,axpy):
    B = axpy(A[inds].ravel(), B, a=1.5)
    return B
def test_axpy1(A,B,inds,tmp,axpy):
    tmp[::] = A[inds]
    B = axpy(tmp.ravel(), B, a=1.5)
    return B
def test_axpy2(A,B,inds,tmp,axpy):
    tmp[::] = A[inds]
    tmp *= 1.5
    B += tmp
    return B
def test_axpy3(A,B,inds,tmp,axpy):
    B += 1.5*A[inds]
    return B
    

with the following results:

` python multigrid_profile.py None test_axpy
Profiling test_axpy() with profiler None...
Writing output to _profs/multigrid_test_axpy().prof
test_axpy0(A,B,inds,tmp,axpy) took 2.45949s
test_axpy1(A,B,inds,tmp,axpy) took 1.94218s
test_axpy2(A,B,inds,tmp,axpy) took 2.57161s
test_axpy3(A,B,inds,tmp,axpy) took 3.35713s
Done

We have a clear winner with the axpy operations after a copy.

Note

It is a bit tricky to use the axpy operation: you must make sure that everything is ravelled, and that the arrays are contiguous etc. It will work without, but the performance will be worse than the old code. One useful check is that the output array has the same base B:

C = axpy(A,B,...)
assert C.base is B

Here is the optimized code. We have almost increased the performance by a factor of 2:

$ python multigrid_profile.py test_1
Profiling test_1() with profiler mmf...
Writing output to _profs/multigrid_test_1().prof
Done
         289134 function calls (272974 primitive calls) in 5.983 CPU seconds

   Ordered by: internal time, call count
   List reduced from 71 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1536    2.396    0.002    3.170    0.002 multigrid_.py:648(T)
 9184/224    1.106    0.000    1.263    0.006 multigrid_.py:937(I)
 6624/224    0.737    0.000    0.906    0.004 multigrid_.py:821(R)
     1984    0.370    0.000    0.482    0.000 multigrid_.py:1074(augment)
   173958    0.229    0.000    0.229    0.000 index_tricks.py:478(__getitem__)
    62208    0.183    0.000    0.262    0.000 index_tricks.py:487(__getslice__)
      640    0.179    0.000    2.040    0.003 multigrid_.py:505(S)
    10795    0.160    0.000    0.160    0.000 numeric.py:180(asarray)
     1536    0.107    0.000    3.277    0.002 multigrid_.py:602(A)
        1    0.104    0.104    5.960    5.960 multigrid_.py:1417(solve)
     5952    0.069    0.000    0.069    0.000 numeric.py:943(rollaxis)
   240/80    0.061    0.000    5.227    0.065 multigrid_.py:1294(v_cycle)
     1536    0.050    0.000    0.050    0.000 __init__.py:22(get_blas_funcs)
     2259    0.042    0.000    0.086    0.000 fromnumeric.py:32(_wrapit)
     2176    0.038    0.000    0.045    0.000 multigrid_.py:419(dx_inv)
     2418    0.029    0.000    0.115    0.000 fromnumeric.py:1626(prod)
       80    0.020    0.000    0.027    0.000 linalg.py:1185(lstsq)
       80    0.015    0.000    0.959    0.012 multigrid_.py:1276(to_mat)
      640    0.010    0.000    0.012    0.000 fromnumeric.py:833(trace)
       16    0.009    0.001    5.687    0.355 multigrid_.py:1336(full_multigrid)


  717:  0.03%   1536         ndim = self.d
  718:  0.02%   1536         x_shape = x.shape
  719:  0.02%   1536         v_shape = x_shape[:-ndim]   # Shape of vector part
  720:  0.02%   1536         nv = len(v_shape)           # Number of vector dimensions
  721:  0.04%   1536         shape = np.asarray(x_shape[-ndim:]) # Shape of grid part
  722: ------ ------
  723:  0.07%   1536         inner = np.s_[:,]*nv + np.s_[1:-1,]*ndim
  724: ------ ------
  725:  0.01%   1536         if dx_inv is None:
  726:  0.04%   1536            dx_inv = self.dx_inv(shape)
  727: ------ ------
  728:  0.04%   1536         ddx = np.zeros(np.prod(x_shape), dtype=x.dtype)
  729:  0.18%   1536         tmp = np.empty(x_shape, dtype=float)
  730: ------ ------
  731:  0.05%   1536         axpy, = get_blas_funcs(['axpy'], [ddx, ddx])
  732: ------ ------         # axpy(A,B, A.size, a) computes B -> a*A + B
  733: ------ ------
  734:  0.11%   1536         x = self.augment(x)
  735: ------ ------
  736:  0.29%   1536         dx_inv_2 = np.dot(dx_inv, dx_inv.T)
  737: ------ ------
  738:  0.09%   6144         for a in xrange(ndim):
  739:  0.26%  18432             for b in xrange(ndim):
  740:  0.18%  13824                 if a == b:
  741: ------ ------                     # Index shifts
  742:  0.22%   4608                     al, ar = list(inner), list(inner)
  743:  0.19%   4608                     al[nv + a] = np.s_[:-2,][0]
  744:  0.14%   4608                     ar[nv + a] = np.s_[2:]
  745: ------ ------
  746:  2.59%   4608                     tmp[::] = x[al]
  747:  0.97%   4608                     ddx = axpy(tmp.ravel(), ddx, a=dx_inv_2[a,b])
  748:  1.92%   4608                     tmp[::] = x[ar]
  749:  0.84%   4608                     ddx = axpy(tmp.ravel(), ddx, a=dx_inv_2[a,b])
  750:  1.90%   4608                     tmp[::] = x[inner]
  751:  1.00%   4608                     ddx = axpy(tmp.ravel(), ddx, a=-2*dx_inv_2[a,b])
  752: ------ ------
  753: ------ ------                 else:
  754: ------ ------                     # Index shifts.  The b_shifts should leave the
  755: ------ ------                     # other indices alone as the a-shifts will extract
  756: ------ ------                     # these.  Likewise, the a-shifts should leave the
  757: ------ ------                     # index b alone.
  758:  0.29%   9216                     al_bl = list(inner)
  759:  0.37%   9216                     al_bl[nv + a] = np.s_[:-2,][0] # Bug in s_
  760:  0.21%   9216                     al_bl[nv + b] = np.s_[:-2,][0] # Bug in s_
  761: ------ ------
  762:  0.25%   9216                     al_br = list(inner)
  763:  0.23%   9216                     al_br[nv + a] = np.s_[:-2,][0] # Bug in s_
  764:  1.43%   9216                     al_br[nv + b] = np.s_[2:]
  765: ------ ------
  766:  0.25%   9216                     ar_bl = list(inner)
  767:  0.27%   9216                     ar_bl[nv + a] = np.s_[2:]
  768:  2.06%   9216                     ar_bl[nv + b] = np.s_[:-2,][0] # Bug in s_
  769: ------ ------
  770:  0.26%   9216                     ar_br = list(inner)
  771:  0.26%   9216                     ar_br[nv + a] = np.s_[2:]
  772:  0.24%   9216                     ar_br[nv + b] = np.s_[2:]
  773: ------ ------
  774:  3.92%   9216                     tmp[::] = x[ar_br]
  775:  1.22%   9216                     ddx = axpy(tmp.ravel(), ddx, a=dx_inv_2[a,b]/4)
  776:  3.85%   9216                     tmp[::] = x[al_bl]
  777:  1.11%   9216                     ddx = axpy(tmp.ravel(), ddx, a=dx_inv_2[a,b]/4)
  778:  3.81%   9216                     tmp[::] = x[ar_bl]
  779:  1.16%   9216                     ddx = axpy(tmp.ravel(), ddx, a=-dx_inv_2[a,b]/4)
  780:  3.80%   9216                     tmp[::] = x[al_br]
  781:  1.13%   9216                     ddx = axpy(tmp.ravel(), ddx, a=-dx_inv_2[a,b]/4)
  782: ------ ------
  783:  0.08%   1536         return ddx.reshape(x_shape)

The slow part is still the indexing to perform the shifts. It seems we could probably gain about a factor of 3 or so if we coded this directly in C++.

I considered cleaning up I and R, but they are kind of tricky and the current code is about as clean an symmetric as it can be, so I think I will leave it for now. Again, this could be pretty easily done in C++, so let’s wait for that.

Details

We profile with the following class that allows us to choose between the hotshot profiler, the cProfile profiler, and our own mmf.utils.mmf_profile profiler based on hotshot but including analysis of the line-event data.

The following class does the dirty work:

class Profile(object):
    def __init__(self, cmd):
        self.profiler = _profiler
        print("Profiling %s with profiler %s..." % (cmd, self.profiler))
        self.filename = "_profs/multigrid_%s.prof" % (cmd,)
        print("Writing output to %s" % (self.filename,))
        
        if 'cProfile' == self.profiler:
            self.prof = cProfile.runctx(cmd, globals(), locals(),
                                        self.filename)
        elif 'None' == self.profiler:
            exec(cmd, globals(), locals())
            self.prof = None
        else:
            if 'hotshot' == self.profiler:
                self.prof = hotshot.Profile(self.filename)
            elif 'mmf' == self.profiler:
                self.prof = mmf_profile.Profile(self.filename)

            self.prof.runctx(cmd, globals(), locals())
            self.prof.close()

        print("Done")

    def analyze(self):
        if 'hotshot' == self.profiler:
            self.stats = hotshot.stats.load(self.filename)
            self.data = None
        elif 'mmf' == self.profiler:
            self.stats, self.data = mmf_profile.load(self.filename)
        elif 'None' == self.profiler:
            return 
        else:
            self.stats = pstats.Stats(self.filename)
            self.data = None

        self.stats.strip_dirs()
        self.stats.sort_stats('time', 'calls')
        self.stats.print_stats(20)

        if self.data:
            files = self.data.annotate_files()
            for f in files:
                print("Line-data for file %s"%(f,))
                print(files[f])