This is a log of the profiling process for mmf.math.multigrid version 2216. The file multigrid_profile.py was put in a directory multigrid/profile located with the multigrid module. The optimized version was committed as svn version 2220, then both of these were moved to the documentation directory for archival. The profile script was modified slightly along the way, so the output from the current version is not exactly the same as the recorded output. The detailed profiling code is shown at the bottom of this file.
First we must define a problem to profile. The main application is a three-dimensional problem, so we start with that.
def test_1(n=4):
mg = mmf.math.multigrid.MultiGrid(n_0=2, n=n, d=3, L=2*np.pi,
boundary_conditions='periodic')
x,y,z = mg.x()
b = np.sin(x)*np.sin(y)*np.sin(z)
exact = b/mg.d
sol = mg.solve(b=b)
assert np.allclose(sol, exact, atol=0.0011)
Our first run gives the following output:
:math:` python multigrid_profile.py test_1
Profiling function test_1...
Writing output to _profs/multigrid_test_1.prof
Done.
249197 function calls (233037 primitive calls) in 10.630 CPU seconds
Ordered by: internal time, call count
List reduced from 71 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
1536 4.198 0.003 7.630 0.005 multigrid_.py:646(T)
46080 2.887 0.000 3.076 0.000 numeric.py:879(roll)
9184/224 1.052 0.000 1.388 0.006 multigrid_.py:864(I)
6624/224 0.700 0.000 0.829 0.004 multigrid_.py:748(R)
1984 0.324 0.000 0.403 0.000 multigrid_.py:1001(augment)
20736 0.273 0.000 0.287 0.000 index_tricks.py:487(__getslice__)
46080 0.189 0.000 0.189 0.000 numeric.py:232(asanyarray)
640 0.185 0.000 5.190 0.008 multigrid_.py:503(S)
723 0.152 0.000 0.168 0.000 fromnumeric.py:32(_wrapit)
9259 0.126 0.000 0.126 0.000 numeric.py:180(asarray)
1536 0.108 0.000 7.738 0.005 multigrid_.py:600(A)
1 0.106 0.106 10.606 10.606 multigrid_.py:1344(solve)
89478 0.066 0.000 0.066 0.000 index_tricks.py:478(__getitem__)
240/80 0.061 0.000 9.292 0.116 multigrid_.py:1221(v_cycle)
5952 0.040 0.000 0.040 0.000 numeric.py:943(rollaxis)
2176 0.033 0.000 0.039 0.000 multigrid_.py:417(dx_inv)
80 0.019 0.000 0.024 0.000 linalg.py:1185(lstsq)
80 0.014 0.000 1.016 0.013 multigrid_.py:1203(to_mat)
882 0.012 0.000 0.180 0.000 fromnumeric.py:1626(prod)
640 0.010 0.000 0.012 0.000 fromnumeric.py:833(trace)
Line-data for file .../utils/mmf/math/multigrid/multigrid_.py
...
646: ------ ------ def T(self, x, dx_inv=None):
715: 0.01% 1536 ndim = self.d
716: 0.04% 1536 inner = np.s_[...,] + _inner*ndim
717: ------ ------
718: 0.02% 1536 v_shape = x.shape[:-ndim] # Shape of vector part
719: 0.01% 1536 nv = len(v_shape) # Number of vector dimensions
720: 0.02% 1536 shape = np.asarray(x.shape[-ndim:]) # Shape of grid part
721: ------ ------
722: ------ ------
723: 0.48% 1536 ddx = np.zeros(x.shape, dtype=x.dtype)
724: ------ ------
725: 0.01% 1536 if dx_inv is None:
726: 0.03% 1536 dx_inv = self.dx_inv(shape)
727: ------ ------
728: 0.03% 1536 x = self.augment(x)
729: ------ ------
730: 0.16% 1536 dx_inv_2 = np.dot(dx_inv, dx_inv.T)
731: 0.05% 6144 for a in xrange(ndim):
732: 0.19% 18432 for b in xrange(ndim):
733: 0.10% 13824 if a == b:
734: ------ ------ xab = (- 2*x
735: ------ ------ + np.roll(x, -1, axis=nv + a)
736: 2.62% 4608 + np.roll(x, 1, axis=nv + a))
737: ------ ------ else:
738: 0.26% 9216 xa = (np.roll(x, 1, axis=nv + a)
739: 0.21% 9216 - np.roll(x, -1, axis=nv + a))
740: 0.26% 9216 xab = (np.roll(xa, 1, axis=nv + b)
741: 0.19% 9216 - np.roll(xa, -1, axis=nv + b))/4
742: ------ ------
743: ------ ------ # Transform to get Laplacian
744: 10.90% 13824 ddx += xab[inner]*dx_inv_2[a,b]
745: ------ ------
746: 0.01% 1536 return ddx
Line-data for file /Library/.../numpy/core/numeric.py
...
879: ------ ------ def roll(a, shift, axis=None):
...
928: 0.43% 46080 a = asanyarray(a)
929: 0.26% 46080 if axis is None:
930: ------ ------ n = a.size
931: ------ ------ reshape = True
932: ------ ------ else:
933: 0.48% 46080 n = a.shape[axis]
934: 0.22% 46080 reshape = False
935: 0.29% 46080 shift %= n
936: 3.69% 46080 indexes = concatenate((arange(n-shift,n),arange(n-shift)))
937: 20.85% 46080 res = a.take(indexes, axis)
938: 0.24% 46080 if reshape:
939: ------ ------ return res.reshape(a.shape)
940: ------ ------ else:
941: 0.18% 46080 return res
We see that most of the time is spent in T and roll. Using the lineevent data we can see the hotspots. Running once more with n=5 gives demonstrates the same proportions:
` python multigrid_profile.py test_1 5
Profiling function test_1...
Writing output to _profs/multigrid_test_1.prof
Done.
372066 function calls (347195 primitive calls) in 57.850 CPU seconds
Ordered by: internal time, call count
List reduced from 72 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
2210 28.526 0.013 48.318 0.022 multigrid_.py:646(T)
66300 18.209 0.000 18.531 0.000 numeric.py:879(roll)
13940/340 3.203 0.000 3.407 0.010 multigrid_.py:864(I)
10540/340 1.866 0.000 2.338 0.007 multigrid_.py:748(R)
2890 1.507 0.001 1.629 0.001 multigrid_.py:1001(augment)
1020 1.306 0.001 34.933 0.034 multigrid_.py:503(S)
1 0.897 0.897 57.689 57.689 multigrid_.py:1344(solve)
2210 0.740 0.000 49.058 0.022 multigrid_.py:600(A)
357/102 0.338 0.001 49.085 0.481 multigrid_.py:1221(v_cycle)
66300 0.323 0.000 0.323 0.000 numeric.py:232(asanyarray)
13747 0.198 0.000 0.198 0.000 numeric.py:180(asarray)
138046 0.104 0.000 0.104 0.000 index_tricks.py:478(__getitem__)
32130 0.078 0.000 0.101 0.000 index_tricks.py:487(__getslice__)
1 0.068 0.068 57.847 57.847 multigrid_profile.py:28(test_1)
8670 0.062 0.000 0.062 0.000 numeric.py:943(rollaxis)
3230 0.054 0.000 0.065 0.000 multigrid_.py:417(dx_inv)
2 0.053 0.027 0.054 0.027 multigrid_.py:434(x)
85 0.048 0.001 0.054 0.001 multigrid_.py:1412(_remove_constant)
17 0.034 0.002 50.641 2.979 multigrid_.py:1263(full_multigrid)
32 0.033 0.001 0.033 0.001 fromnumeric.py:1492(amax)
We start with line 744 in multigrid_.py:
743: ------ ------ # Transform to get Laplacian
744: 10.90% 13824 ddx += xab[inner]*dx_inv_2[a,b]
Perhaps the problem is that the operations of take and the multiplication etc. do not happen in place and force copies to be made. Noting that the array xab is never used again, we change this to the following using in-place operations and reprofile:
743: ------ ------ # Transform to get Laplacian
744: 6.80% 13824 xab[inner] *= dx_inv_2[a,b]
745: 3.65% 13824 ddx += xab[inner]
6.80% + 3.65% = 10.45%: Not much improvement. This surprised me and demonstrates the important point that you must profile to determine performance issues! Perhaps it is the indexing that is taking a long time. We try a few other things for comparison. It looks like in-place operations on arrays with complex indexing is somewhat costly:
743: ------ ------ # Transform to get Laplacian
744: 0.78% 13824 xab[inner]
745: 2.70% 13824 xab *= dx_inv_2[a,b]
746: 6.67% 13824 xab[inner] *= 1.0
747: 3.82% 13824 ddx += xab[inner]
Here is the best I can do with these lines:
743: ------ ------ # Transform to get Laplacian
744: 2.95% 13824 xab *= dx_inv_2[a,b]
745: 4.37% 13824 ddx += xab[inner]
2.95% + 4.37% = 7.32%: Not great but better than nothing.
Note
Note that the timings are not completely accurate as they depend on CPU load etc. and vary from run to run. One can either average several runs, or just treat the results cautiously, acknowledging a fair amount of error. Make another profiling run if things look suspicious. We have done that here and included the most “consistent” runs.
Another problem is that the indexing is done each step in the loop, which occurs 9 times. Let’s try making ddx bigger and just extracting the result at the end. Here is the present set of timings:
:math:` python multigrid_profile.py test_1
Profiling function test_1...
Writing output to _profs/multigrid_test_1.prof
Done.
249197 function calls (233037 primitive calls) in 10.354 CPU seconds
Ordered by: internal time, call count
List reduced from 71 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
1536 3.898 0.003 7.428 0.005 multigrid_.py:646(T)
46080 2.909 0.000 3.111 0.000 numeric.py:879(roll)
9184/224 1.104 0.000 1.327 0.006 multigrid_.py:865(I)
6624/224 0.733 0.000 0.898 0.004 multigrid_.py:749(R)
1984 0.365 0.000 0.475 0.000 multigrid_.py:1002(augment)
723: 0.51% 1536 ddx = np.zeros(x.shape, dtype=x.dtype)
...
728: 0.03% 1536 x = self.augment(x)
729: ------ ------
730: 0.16% 1536 dx_inv_2 = np.dot(dx_inv, dx_inv.T)
731: 0.05% 6144 for a in xrange(ndim):
732: 0.18% 18432 for b in xrange(ndim):
733: 0.10% 13824 if a == b:
734: ------ ------ xab = (- 2*x
735: ------ ------ + np.roll(x, -1, axis=nv + a)
736: 2.70% 4608 + np.roll(x, 1, axis=nv + a))
737: ------ ------ else:
738: 0.26% 9216 xa = (np.roll(x, 1, axis=nv + a)
739: 0.21% 9216 - np.roll(x, -1, axis=nv + a))
740: 0.27% 9216 xab = (np.roll(xa, 1, axis=nv + b)
741: 0.20% 9216 - np.roll(xa, -1, axis=nv + b))/4
742: ------ ------
743: ------ ------ # Transform to get Laplacian
744: 2.97% 13824 xab *= dx_inv_2[a,b]
745: 4.44% 13824 ddx += xab[inner]
746: ------ ------
747: 0.01% 1536 return ddx
followed by the new timings with a larger ddx:
` python multigrid_profile.py test_1
...
ncalls tottime percall cumtime percall filename:lineno(function)
1536 3.735 0.002 7.263 0.005 multigrid_.py:646(T)
46080 2.915 0.000 3.112 0.000 numeric.py:879(roll)
9184/224 1.205 0.000 1.359 0.006 multigrid_.py:865(I)
6624/224 0.734 0.000 0.891 0.004 multigrid_.py:749(R)
...
726: 0.03% 1536 x = self.augment(x)
727: ------ ------
728: 0.57% 1536 ddx = np.zeros(x.shape, dtype=x.dtype)
729: ------ ------
730: 0.17% 1536 dx_inv_2 = np.dot(dx_inv, dx_inv.T)
731: 0.05% 6144 for a in xrange(ndim):
732: 0.19% 18432 for b in xrange(ndim):
733: 0.11% 13824 if a == b:
734: ------ ------ xab = (- 2*x
735: ------ ------ + np.roll(x, -1, axis=nv + a)
736: 2.71% 4608 + np.roll(x, 1, axis=nv + a))
737: ------ ------ else:
738: 0.28% 9216 xa = (np.roll(x, 1, axis=nv + a)
739: 0.22% 9216 - np.roll(x, -1, axis=nv + a))
740: 0.28% 9216 xab = (np.roll(xa, 1, axis=nv + b)
741: 0.20% 9216 - np.roll(xa, -1, axis=nv + b))/4
742: ------ ------
743: ------ ------ # Transform to get Laplacian
744: 2.98% 13824 xab *= dx_inv_2[a,b]
745: 2.72% 13824 ddx += xab
746: ------ ------
747: 0.10% 1536 return ddx[inner]
This is better now and we are approaching the point where the roll operations are significant, so let’s try looking at these. Note that in order to implement the boundary conditions we have augmented the array and extract the inner entries upon return. Thus, we can afford to have errors in the outer dimensions. As such, we don’t really need a full roll, we just need to shift the array. Let’s try doing this with indexing (noting that we may have to undo the previous optimiziations!).
Note
Note, when making a major change like this one must run one’s unit tests. This was a tricky change and it took a few tries to get it right. I found some other subtle potential bugs in the process.
Here are the results:
:math:` python multigrid_profile.py test_1
Profiling function test_1...
Writing output to _profs/multigrid_test_1.prof
Done.
282989 function calls (266829 primitive calls) in 7.208 CPU seconds
Ordered by: internal time, call count
List reduced from 69 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
1536 3.708 0.002 4.356 0.003 multigrid_.py:646(T)
9184/224 1.111 0.000 1.274 0.006 multigrid_.py:916(I)
6624/224 0.739 0.000 0.903 0.004 multigrid_.py:800(R)
1984 0.365 0.000 0.477 0.000 multigrid_.py:1053(augment)
173958 0.231 0.000 0.231 0.000 index_tricks.py:478(__getitem__)
640 0.190 0.000 2.941 0.005 multigrid_.py:503(S)
646: ------ ------ def T(self, x, dx_inv=None):
...
715: 0.02% 1536 ndim = self.d
716: 0.03% 1536 v_shape = x.shape[:-ndim] # Shape of vector part
717: 0.02% 1536 nv = len(v_shape) # Number of vector dimensions
718: 0.04% 1536 shape = np.asarray(x.shape[-ndim:]) # Shape of grid part
719: ------ ------
720: 0.06% 1536 inner = np.s_[:,]*nv + np.s_[1:-1,]*ndim
721: ------ ------
722: 0.01% 1536 if dx_inv is None:
723: 0.03% 1536 dx_inv = self.dx_inv(shape)
724: ------ ------
725: 0.72% 1536 ddx = np.zeros(x.shape, dtype=x.dtype)
726: ------ ------
727: 0.05% 1536 x = self.augment(x)
728: ------ ------
729: 0.24% 1536 dx_inv_2 = np.dot(dx_inv, dx_inv.T)
730: 0.07% 6144 for a in xrange(ndim):
731: 0.24% 18432 for b in xrange(ndim):
732: 0.15% 13824 if a == b:
733: ------ ------ # Index shifts
734: 0.23% 4608 a_left, a_right = list(inner), list(inner)
735: 0.20% 4608 a_left[nv + a] = np.s_[:-2,][0]
736: 0.13% 4608 a_right[nv + a] = np.s_[2:]
737: ------ ------
738: 12.03% 4608 xab = x[a_left] + x[a_right] - 2*x[inner]
739: ------ ------ else:
740: ------ ------ # Index shifts. The b_shifts should leave the
741: ------ ------ # other indices alone as the a-shifts will extract
742: ------ ------ # these. Likewise, the a-shifts should leave the
743: ------ ------ # index b alone.
744: 0.49% 9216 a_left, a_right = list(inner), list(inner)
745: 0.40% 9216 a_left[nv + a] = np.s_[:-2,][0]
746: 0.24% 9216 a_right[nv + a] = np.s_[2:]
747: 1.04% 9216 a_left[nv + b] = np.s_[:]
748: 0.20% 9216 a_right[nv + b] = np.s_[:]
749: ------ ------
750: 0.26% 9216 b_left = list(len(a_left)*np.s_[:,])
751: 0.25% 9216 b_right = list(len(a_left)*np.s_[:,])
752: 0.19% 9216 b_left[nv + b] = np.s_[:-2,][0]
753: 0.21% 9216 b_right[nv + b] = np.s_[2:]
754: ------ ------
755: 10.21% 9216 xa = (x[b_right] - x[b_left])
756: 15.12% 9216 xab = (xa[a_right] - xa[a_left])/4
757: ------ ------
758: ------ ------ # Transform to get Laplacian
759: 3.64% 13824 xab *= dx_inv_2[a,b]
760: 3.26% 13824 ddx += xab
761: ------ ------
762: 0.01% 1536 return ddx
In order to compare various ways of doing the axpy operation B += a*A we added some special tests to the profiling code. In principle using the BLAS axpy operation should work well, but the slow part of the code is
def test_axpy0(A,B,inds,tmp,axpy):
B = axpy(A[inds].ravel(), B, a=1.5)
return B
def test_axpy1(A,B,inds,tmp,axpy):
tmp[::] = A[inds]
B = axpy(tmp.ravel(), B, a=1.5)
return B
def test_axpy2(A,B,inds,tmp,axpy):
tmp[::] = A[inds]
tmp *= 1.5
B += tmp
return B
def test_axpy3(A,B,inds,tmp,axpy):
B += 1.5*A[inds]
return B
with the following results:
` python multigrid_profile.py None test_axpy
Profiling test_axpy() with profiler None...
Writing output to _profs/multigrid_test_axpy().prof
test_axpy0(A,B,inds,tmp,axpy) took 2.45949s
test_axpy1(A,B,inds,tmp,axpy) took 1.94218s
test_axpy2(A,B,inds,tmp,axpy) took 2.57161s
test_axpy3(A,B,inds,tmp,axpy) took 3.35713s
Done
We have a clear winner with the axpy operations after a copy.
Note
It is a bit tricky to use the axpy operation: you must make sure that everything is ravelled, and that the arrays are contiguous etc. It will work without, but the performance will be worse than the old code. One useful check is that the output array has the same base B:
C = axpy(A,B,...)
assert C.base is B
Here is the optimized code. We have almost increased the performance by a factor of 2:
$ python multigrid_profile.py test_1
Profiling test_1() with profiler mmf...
Writing output to _profs/multigrid_test_1().prof
Done
289134 function calls (272974 primitive calls) in 5.983 CPU seconds
Ordered by: internal time, call count
List reduced from 71 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
1536 2.396 0.002 3.170 0.002 multigrid_.py:648(T)
9184/224 1.106 0.000 1.263 0.006 multigrid_.py:937(I)
6624/224 0.737 0.000 0.906 0.004 multigrid_.py:821(R)
1984 0.370 0.000 0.482 0.000 multigrid_.py:1074(augment)
173958 0.229 0.000 0.229 0.000 index_tricks.py:478(__getitem__)
62208 0.183 0.000 0.262 0.000 index_tricks.py:487(__getslice__)
640 0.179 0.000 2.040 0.003 multigrid_.py:505(S)
10795 0.160 0.000 0.160 0.000 numeric.py:180(asarray)
1536 0.107 0.000 3.277 0.002 multigrid_.py:602(A)
1 0.104 0.104 5.960 5.960 multigrid_.py:1417(solve)
5952 0.069 0.000 0.069 0.000 numeric.py:943(rollaxis)
240/80 0.061 0.000 5.227 0.065 multigrid_.py:1294(v_cycle)
1536 0.050 0.000 0.050 0.000 __init__.py:22(get_blas_funcs)
2259 0.042 0.000 0.086 0.000 fromnumeric.py:32(_wrapit)
2176 0.038 0.000 0.045 0.000 multigrid_.py:419(dx_inv)
2418 0.029 0.000 0.115 0.000 fromnumeric.py:1626(prod)
80 0.020 0.000 0.027 0.000 linalg.py:1185(lstsq)
80 0.015 0.000 0.959 0.012 multigrid_.py:1276(to_mat)
640 0.010 0.000 0.012 0.000 fromnumeric.py:833(trace)
16 0.009 0.001 5.687 0.355 multigrid_.py:1336(full_multigrid)
717: 0.03% 1536 ndim = self.d
718: 0.02% 1536 x_shape = x.shape
719: 0.02% 1536 v_shape = x_shape[:-ndim] # Shape of vector part
720: 0.02% 1536 nv = len(v_shape) # Number of vector dimensions
721: 0.04% 1536 shape = np.asarray(x_shape[-ndim:]) # Shape of grid part
722: ------ ------
723: 0.07% 1536 inner = np.s_[:,]*nv + np.s_[1:-1,]*ndim
724: ------ ------
725: 0.01% 1536 if dx_inv is None:
726: 0.04% 1536 dx_inv = self.dx_inv(shape)
727: ------ ------
728: 0.04% 1536 ddx = np.zeros(np.prod(x_shape), dtype=x.dtype)
729: 0.18% 1536 tmp = np.empty(x_shape, dtype=float)
730: ------ ------
731: 0.05% 1536 axpy, = get_blas_funcs(['axpy'], [ddx, ddx])
732: ------ ------ # axpy(A,B, A.size, a) computes B -> a*A + B
733: ------ ------
734: 0.11% 1536 x = self.augment(x)
735: ------ ------
736: 0.29% 1536 dx_inv_2 = np.dot(dx_inv, dx_inv.T)
737: ------ ------
738: 0.09% 6144 for a in xrange(ndim):
739: 0.26% 18432 for b in xrange(ndim):
740: 0.18% 13824 if a == b:
741: ------ ------ # Index shifts
742: 0.22% 4608 al, ar = list(inner), list(inner)
743: 0.19% 4608 al[nv + a] = np.s_[:-2,][0]
744: 0.14% 4608 ar[nv + a] = np.s_[2:]
745: ------ ------
746: 2.59% 4608 tmp[::] = x[al]
747: 0.97% 4608 ddx = axpy(tmp.ravel(), ddx, a=dx_inv_2[a,b])
748: 1.92% 4608 tmp[::] = x[ar]
749: 0.84% 4608 ddx = axpy(tmp.ravel(), ddx, a=dx_inv_2[a,b])
750: 1.90% 4608 tmp[::] = x[inner]
751: 1.00% 4608 ddx = axpy(tmp.ravel(), ddx, a=-2*dx_inv_2[a,b])
752: ------ ------
753: ------ ------ else:
754: ------ ------ # Index shifts. The b_shifts should leave the
755: ------ ------ # other indices alone as the a-shifts will extract
756: ------ ------ # these. Likewise, the a-shifts should leave the
757: ------ ------ # index b alone.
758: 0.29% 9216 al_bl = list(inner)
759: 0.37% 9216 al_bl[nv + a] = np.s_[:-2,][0] # Bug in s_
760: 0.21% 9216 al_bl[nv + b] = np.s_[:-2,][0] # Bug in s_
761: ------ ------
762: 0.25% 9216 al_br = list(inner)
763: 0.23% 9216 al_br[nv + a] = np.s_[:-2,][0] # Bug in s_
764: 1.43% 9216 al_br[nv + b] = np.s_[2:]
765: ------ ------
766: 0.25% 9216 ar_bl = list(inner)
767: 0.27% 9216 ar_bl[nv + a] = np.s_[2:]
768: 2.06% 9216 ar_bl[nv + b] = np.s_[:-2,][0] # Bug in s_
769: ------ ------
770: 0.26% 9216 ar_br = list(inner)
771: 0.26% 9216 ar_br[nv + a] = np.s_[2:]
772: 0.24% 9216 ar_br[nv + b] = np.s_[2:]
773: ------ ------
774: 3.92% 9216 tmp[::] = x[ar_br]
775: 1.22% 9216 ddx = axpy(tmp.ravel(), ddx, a=dx_inv_2[a,b]/4)
776: 3.85% 9216 tmp[::] = x[al_bl]
777: 1.11% 9216 ddx = axpy(tmp.ravel(), ddx, a=dx_inv_2[a,b]/4)
778: 3.81% 9216 tmp[::] = x[ar_bl]
779: 1.16% 9216 ddx = axpy(tmp.ravel(), ddx, a=-dx_inv_2[a,b]/4)
780: 3.80% 9216 tmp[::] = x[al_br]
781: 1.13% 9216 ddx = axpy(tmp.ravel(), ddx, a=-dx_inv_2[a,b]/4)
782: ------ ------
783: 0.08% 1536 return ddx.reshape(x_shape)
The slow part is still the indexing to perform the shifts. It seems we could probably gain about a factor of 3 or so if we coded this directly in C++.
I considered cleaning up I and R, but they are kind of tricky and the current code is about as clean an symmetric as it can be, so I think I will leave it for now. Again, this could be pretty easily done in C++, so let’s wait for that.
We profile with the following class that allows us to choose between the hotshot profiler, the cProfile profiler, and our own mmf.utils.mmf_profile profiler based on hotshot but including analysis of the line-event data.
The following class does the dirty work:
class Profile(object):
def __init__(self, cmd):
self.profiler = _profiler
print("Profiling %s with profiler %s..." % (cmd, self.profiler))
self.filename = "_profs/multigrid_%s.prof" % (cmd,)
print("Writing output to %s" % (self.filename,))
if 'cProfile' == self.profiler:
self.prof = cProfile.runctx(cmd, globals(), locals(),
self.filename)
elif 'None' == self.profiler:
exec(cmd, globals(), locals())
self.prof = None
else:
if 'hotshot' == self.profiler:
self.prof = hotshot.Profile(self.filename)
elif 'mmf' == self.profiler:
self.prof = mmf_profile.Profile(self.filename)
self.prof.runctx(cmd, globals(), locals())
self.prof.close()
print("Done")
def analyze(self):
if 'hotshot' == self.profiler:
self.stats = hotshot.stats.load(self.filename)
self.data = None
elif 'mmf' == self.profiler:
self.stats, self.data = mmf_profile.load(self.filename)
elif 'None' == self.profiler:
return
else:
self.stats = pstats.Stats(self.filename)
self.data = None
self.stats.strip_dirs()
self.stats.sort_stats('time', 'calls')
self.stats.print_stats(20)
if self.data:
files = self.data.annotate_files()
for f in files:
print("Line-data for file %s"%(f,))
print(files[f])