===================================== Profiling :mod:`mmf.math.multigrid` ===================================== This is a log of the profiling process for :mod:`mmf.math.multigrid` version 2216. The file :file:`multigrid_profile.py` was put in a directory :file:`multigrid/profile` located with the :mod:`multigrid` module. The optimized version was committed as svn version 2220, then both of these were moved to the documentation directory for archival. The profile script was modified slightly along the way, so the output from the current version is not exactly the same as the recorded output. The detailed profiling code is shown at the bottom of this file. First we must define a problem to profile. The main application is a three-dimensional problem, so we start with that. .. literalinclude:: multigrid_profile.py :pyobject: test_1 Our first run gives the following output:: $ python multigrid_profile.py test_1 Profiling function test_1... Writing output to _profs/multigrid_test_1.prof Done. 249197 function calls (233037 primitive calls) in 10.630 CPU seconds Ordered by: internal time, call count List reduced from 71 to 20 due to restriction <20> ncalls tottime percall cumtime percall filename:lineno(function) 1536 4.198 0.003 7.630 0.005 multigrid_.py:646(T) 46080 2.887 0.000 3.076 0.000 numeric.py:879(roll) 9184/224 1.052 0.000 1.388 0.006 multigrid_.py:864(I) 6624/224 0.700 0.000 0.829 0.004 multigrid_.py:748(R) 1984 0.324 0.000 0.403 0.000 multigrid_.py:1001(augment) 20736 0.273 0.000 0.287 0.000 index_tricks.py:487(__getslice__) 46080 0.189 0.000 0.189 0.000 numeric.py:232(asanyarray) 640 0.185 0.000 5.190 0.008 multigrid_.py:503(S) 723 0.152 0.000 0.168 0.000 fromnumeric.py:32(_wrapit) 9259 0.126 0.000 0.126 0.000 numeric.py:180(asarray) 1536 0.108 0.000 7.738 0.005 multigrid_.py:600(A) 1 0.106 0.106 10.606 10.606 multigrid_.py:1344(solve) 89478 0.066 0.000 0.066 0.000 index_tricks.py:478(__getitem__) 240/80 0.061 0.000 9.292 0.116 multigrid_.py:1221(v_cycle) 5952 0.040 0.000 0.040 0.000 numeric.py:943(rollaxis) 2176 0.033 0.000 0.039 0.000 multigrid_.py:417(dx_inv) 80 0.019 0.000 0.024 0.000 linalg.py:1185(lstsq) 80 0.014 0.000 1.016 0.013 multigrid_.py:1203(to_mat) 882 0.012 0.000 0.180 0.000 fromnumeric.py:1626(prod) 640 0.010 0.000 0.012 0.000 fromnumeric.py:833(trace) Line-data for file .../utils/mmf/math/multigrid/multigrid_.py ... 646: ------ ------ def T(self, x, dx_inv=None): 715: 0.01% 1536 ndim = self.d 716: 0.04% 1536 inner = np.s_[...,] + _inner*ndim 717: ------ ------ 718: 0.02% 1536 v_shape = x.shape[:-ndim] # Shape of vector part 719: 0.01% 1536 nv = len(v_shape) # Number of vector dimensions 720: 0.02% 1536 shape = np.asarray(x.shape[-ndim:]) # Shape of grid part 721: ------ ------ 722: ------ ------ 723: 0.48% 1536 ddx = np.zeros(x.shape, dtype=x.dtype) 724: ------ ------ 725: 0.01% 1536 if dx_inv is None: 726: 0.03% 1536 dx_inv = self.dx_inv(shape) 727: ------ ------ 728: 0.03% 1536 x = self.augment(x) 729: ------ ------ 730: 0.16% 1536 dx_inv_2 = np.dot(dx_inv, dx_inv.T) 731: 0.05% 6144 for a in xrange(ndim): 732: 0.19% 18432 for b in xrange(ndim): 733: 0.10% 13824 if a == b: 734: ------ ------ xab = (- 2*x 735: ------ ------ + np.roll(x, -1, axis=nv + a) 736: 2.62% 4608 + np.roll(x, 1, axis=nv + a)) 737: ------ ------ else: 738: 0.26% 9216 xa = (np.roll(x, 1, axis=nv + a) 739: 0.21% 9216 - np.roll(x, -1, axis=nv + a)) 740: 0.26% 9216 xab = (np.roll(xa, 1, axis=nv + b) 741: 0.19% 9216 - np.roll(xa, -1, axis=nv + b))/4 742: ------ ------ 743: ------ ------ # Transform to get Laplacian 744: 10.90% 13824 ddx += xab[inner]*dx_inv_2[a,b] 745: ------ ------ 746: 0.01% 1536 return ddx Line-data for file /Library/.../numpy/core/numeric.py ... 879: ------ ------ def roll(a, shift, axis=None): ... 928: 0.43% 46080 a = asanyarray(a) 929: 0.26% 46080 if axis is None: 930: ------ ------ n = a.size 931: ------ ------ reshape = True 932: ------ ------ else: 933: 0.48% 46080 n = a.shape[axis] 934: 0.22% 46080 reshape = False 935: 0.29% 46080 shift %= n 936: 3.69% 46080 indexes = concatenate((arange(n-shift,n),arange(n-shift))) 937: 20.85% 46080 res = a.take(indexes, axis) 938: 0.24% 46080 if reshape: 939: ------ ------ return res.reshape(a.shape) 940: ------ ------ else: 941: 0.18% 46080 return res We see that most of the time is spent in `T` and `roll`. Using the lineevent data we can see the hotspots. Running once more with `n=5` gives demonstrates the same proportions:: $ python multigrid_profile.py test_1 5 Profiling function test_1... Writing output to _profs/multigrid_test_1.prof Done. 372066 function calls (347195 primitive calls) in 57.850 CPU seconds Ordered by: internal time, call count List reduced from 72 to 20 due to restriction <20> ncalls tottime percall cumtime percall filename:lineno(function) 2210 28.526 0.013 48.318 0.022 multigrid_.py:646(T) 66300 18.209 0.000 18.531 0.000 numeric.py:879(roll) 13940/340 3.203 0.000 3.407 0.010 multigrid_.py:864(I) 10540/340 1.866 0.000 2.338 0.007 multigrid_.py:748(R) 2890 1.507 0.001 1.629 0.001 multigrid_.py:1001(augment) 1020 1.306 0.001 34.933 0.034 multigrid_.py:503(S) 1 0.897 0.897 57.689 57.689 multigrid_.py:1344(solve) 2210 0.740 0.000 49.058 0.022 multigrid_.py:600(A) 357/102 0.338 0.001 49.085 0.481 multigrid_.py:1221(v_cycle) 66300 0.323 0.000 0.323 0.000 numeric.py:232(asanyarray) 13747 0.198 0.000 0.198 0.000 numeric.py:180(asarray) 138046 0.104 0.000 0.104 0.000 index_tricks.py:478(__getitem__) 32130 0.078 0.000 0.101 0.000 index_tricks.py:487(__getslice__) 1 0.068 0.068 57.847 57.847 multigrid_profile.py:28(test_1) 8670 0.062 0.000 0.062 0.000 numeric.py:943(rollaxis) 3230 0.054 0.000 0.065 0.000 multigrid_.py:417(dx_inv) 2 0.053 0.027 0.054 0.027 multigrid_.py:434(x) 85 0.048 0.001 0.054 0.001 multigrid_.py:1412(_remove_constant) 17 0.034 0.002 50.641 2.979 multigrid_.py:1263(full_multigrid) 32 0.033 0.001 0.033 0.001 fromnumeric.py:1492(amax) We start with line 744 in :file:`multigrid_.py`:: 743: ------ ------ # Transform to get Laplacian 744: 10.90% 13824 ddx += xab[inner]*dx_inv_2[a,b] Perhaps the problem is that the operations of take and the multiplication etc. do not happen in place and force copies to be made. Noting that the array `xab` is never used again, we change this to the following using in-place operations and reprofile:: 743: ------ ------ # Transform to get Laplacian 744: 6.80% 13824 xab[inner] *= dx_inv_2[a,b] 745: 3.65% 13824 ddx += xab[inner] 6.80% + 3.65% = 10.45%: Not much improvement. This surprised me and demonstrates the important point that you must profile to determine performance issues! Perhaps it is the indexing that is taking a long time. We try a few other things for comparison. It looks like in-place operations on arrays with complex indexing is somewhat costly:: 743: ------ ------ # Transform to get Laplacian 744: 0.78% 13824 xab[inner] 745: 2.70% 13824 xab *= dx_inv_2[a,b] 746: 6.67% 13824 xab[inner] *= 1.0 747: 3.82% 13824 ddx += xab[inner] Here is the best I can do with these lines:: 743: ------ ------ # Transform to get Laplacian 744: 2.95% 13824 xab *= dx_inv_2[a,b] 745: 4.37% 13824 ddx += xab[inner] 2.95% + 4.37% = 7.32%: Not great but better than nothing. .. note:: Note that the timings are not completely accurate as they depend on CPU load etc. and vary from run to run. One can either average several runs, or just treat the results cautiously, acknowledging a fair amount of error. Make another profiling run if things look suspicious. We have done that here and included the most "consistent" runs. Another problem is that the indexing is done each step in the loop, which occurs 9 times. Let's try making `ddx` bigger and just extracting the result at the end. Here is the present set of timings:: $ python multigrid_profile.py test_1 Profiling function test_1... Writing output to _profs/multigrid_test_1.prof Done. 249197 function calls (233037 primitive calls) in 10.354 CPU seconds Ordered by: internal time, call count List reduced from 71 to 20 due to restriction <20> ncalls tottime percall cumtime percall filename:lineno(function) 1536 3.898 0.003 7.428 0.005 multigrid_.py:646(T) 46080 2.909 0.000 3.111 0.000 numeric.py:879(roll) 9184/224 1.104 0.000 1.327 0.006 multigrid_.py:865(I) 6624/224 0.733 0.000 0.898 0.004 multigrid_.py:749(R) 1984 0.365 0.000 0.475 0.000 multigrid_.py:1002(augment) 723: 0.51% 1536 ddx = np.zeros(x.shape, dtype=x.dtype) ... 728: 0.03% 1536 x = self.augment(x) 729: ------ ------ 730: 0.16% 1536 dx_inv_2 = np.dot(dx_inv, dx_inv.T) 731: 0.05% 6144 for a in xrange(ndim): 732: 0.18% 18432 for b in xrange(ndim): 733: 0.10% 13824 if a == b: 734: ------ ------ xab = (- 2*x 735: ------ ------ + np.roll(x, -1, axis=nv + a) 736: 2.70% 4608 + np.roll(x, 1, axis=nv + a)) 737: ------ ------ else: 738: 0.26% 9216 xa = (np.roll(x, 1, axis=nv + a) 739: 0.21% 9216 - np.roll(x, -1, axis=nv + a)) 740: 0.27% 9216 xab = (np.roll(xa, 1, axis=nv + b) 741: 0.20% 9216 - np.roll(xa, -1, axis=nv + b))/4 742: ------ ------ 743: ------ ------ # Transform to get Laplacian 744: 2.97% 13824 xab *= dx_inv_2[a,b] 745: 4.44% 13824 ddx += xab[inner] 746: ------ ------ 747: 0.01% 1536 return ddx followed by the new timings with a larger `ddx`:: $ python multigrid_profile.py test_1 ... ncalls tottime percall cumtime percall filename:lineno(function) 1536 3.735 0.002 7.263 0.005 multigrid_.py:646(T) 46080 2.915 0.000 3.112 0.000 numeric.py:879(roll) 9184/224 1.205 0.000 1.359 0.006 multigrid_.py:865(I) 6624/224 0.734 0.000 0.891 0.004 multigrid_.py:749(R) ... 726: 0.03% 1536 x = self.augment(x) 727: ------ ------ 728: 0.57% 1536 ddx = np.zeros(x.shape, dtype=x.dtype) 729: ------ ------ 730: 0.17% 1536 dx_inv_2 = np.dot(dx_inv, dx_inv.T) 731: 0.05% 6144 for a in xrange(ndim): 732: 0.19% 18432 for b in xrange(ndim): 733: 0.11% 13824 if a == b: 734: ------ ------ xab = (- 2*x 735: ------ ------ + np.roll(x, -1, axis=nv + a) 736: 2.71% 4608 + np.roll(x, 1, axis=nv + a)) 737: ------ ------ else: 738: 0.28% 9216 xa = (np.roll(x, 1, axis=nv + a) 739: 0.22% 9216 - np.roll(x, -1, axis=nv + a)) 740: 0.28% 9216 xab = (np.roll(xa, 1, axis=nv + b) 741: 0.20% 9216 - np.roll(xa, -1, axis=nv + b))/4 742: ------ ------ 743: ------ ------ # Transform to get Laplacian 744: 2.98% 13824 xab *= dx_inv_2[a,b] 745: 2.72% 13824 ddx += xab 746: ------ ------ 747: 0.10% 1536 return ddx[inner] This is better now and we are approaching the point where the roll operations are significant, so let's try looking at these. Note that in order to implement the boundary conditions we have augmented the array and extract the inner entries upon return. Thus, we can afford to have errors in the outer dimensions. As such, we don't really need a full roll, we just need to shift the array. Let's try doing this with indexing (noting that we may have to undo the previous optimiziations!). .. note:: Note, when making a major change like this one must run one's unit tests. This was a tricky change and it took a few tries to get it right. I found some other subtle potential bugs in the process. Here are the results:: $ python multigrid_profile.py test_1 Profiling function test_1... Writing output to _profs/multigrid_test_1.prof Done. 282989 function calls (266829 primitive calls) in 7.208 CPU seconds Ordered by: internal time, call count List reduced from 69 to 20 due to restriction <20> ncalls tottime percall cumtime percall filename:lineno(function) 1536 3.708 0.002 4.356 0.003 multigrid_.py:646(T) 9184/224 1.111 0.000 1.274 0.006 multigrid_.py:916(I) 6624/224 0.739 0.000 0.903 0.004 multigrid_.py:800(R) 1984 0.365 0.000 0.477 0.000 multigrid_.py:1053(augment) 173958 0.231 0.000 0.231 0.000 index_tricks.py:478(__getitem__) 640 0.190 0.000 2.941 0.005 multigrid_.py:503(S) 646: ------ ------ def T(self, x, dx_inv=None): ... 715: 0.02% 1536 ndim = self.d 716: 0.03% 1536 v_shape = x.shape[:-ndim] # Shape of vector part 717: 0.02% 1536 nv = len(v_shape) # Number of vector dimensions 718: 0.04% 1536 shape = np.asarray(x.shape[-ndim:]) # Shape of grid part 719: ------ ------ 720: 0.06% 1536 inner = np.s_[:,]*nv + np.s_[1:-1,]*ndim 721: ------ ------ 722: 0.01% 1536 if dx_inv is None: 723: 0.03% 1536 dx_inv = self.dx_inv(shape) 724: ------ ------ 725: 0.72% 1536 ddx = np.zeros(x.shape, dtype=x.dtype) 726: ------ ------ 727: 0.05% 1536 x = self.augment(x) 728: ------ ------ 729: 0.24% 1536 dx_inv_2 = np.dot(dx_inv, dx_inv.T) 730: 0.07% 6144 for a in xrange(ndim): 731: 0.24% 18432 for b in xrange(ndim): 732: 0.15% 13824 if a == b: 733: ------ ------ # Index shifts 734: 0.23% 4608 a_left, a_right = list(inner), list(inner) 735: 0.20% 4608 a_left[nv + a] = np.s_[:-2,][0] 736: 0.13% 4608 a_right[nv + a] = np.s_[2:] 737: ------ ------ 738: 12.03% 4608 xab = x[a_left] + x[a_right] - 2*x[inner] 739: ------ ------ else: 740: ------ ------ # Index shifts. The b_shifts should leave the 741: ------ ------ # other indices alone as the a-shifts will extract 742: ------ ------ # these. Likewise, the a-shifts should leave the 743: ------ ------ # index b alone. 744: 0.49% 9216 a_left, a_right = list(inner), list(inner) 745: 0.40% 9216 a_left[nv + a] = np.s_[:-2,][0] 746: 0.24% 9216 a_right[nv + a] = np.s_[2:] 747: 1.04% 9216 a_left[nv + b] = np.s_[:] 748: 0.20% 9216 a_right[nv + b] = np.s_[:] 749: ------ ------ 750: 0.26% 9216 b_left = list(len(a_left)*np.s_[:,]) 751: 0.25% 9216 b_right = list(len(a_left)*np.s_[:,]) 752: 0.19% 9216 b_left[nv + b] = np.s_[:-2,][0] 753: 0.21% 9216 b_right[nv + b] = np.s_[2:] 754: ------ ------ 755: 10.21% 9216 xa = (x[b_right] - x[b_left]) 756: 15.12% 9216 xab = (xa[a_right] - xa[a_left])/4 757: ------ ------ 758: ------ ------ # Transform to get Laplacian 759: 3.64% 13824 xab *= dx_inv_2[a,b] 760: 3.26% 13824 ddx += xab 761: ------ ------ 762: 0.01% 1536 return ddx In order to compare various ways of doing the axpy operation `B += a*A` we added some special tests to the profiling code. In principle using the BLAS axpy operation should work well, but the slow part of the code is .. literalinclude:: multigrid_profile.py :pyobject: test_axpy0 .. literalinclude:: multigrid_profile.py :pyobject: test_axpy1 .. literalinclude:: multigrid_profile.py :pyobject: test_axpy2 .. literalinclude:: multigrid_profile.py :pyobject: test_axpy3 with the following results:: $ python multigrid_profile.py None test_axpy Profiling test_axpy() with profiler None... Writing output to _profs/multigrid_test_axpy().prof test_axpy0(A,B,inds,tmp,axpy) took 2.45949s test_axpy1(A,B,inds,tmp,axpy) took 1.94218s test_axpy2(A,B,inds,tmp,axpy) took 2.57161s test_axpy3(A,B,inds,tmp,axpy) took 3.35713s Done We have a clear winner with the axpy operations after a copy. .. note:: It is a bit tricky to use the axpy operation: you must make sure that everything is ravelled, and that the arrays are contiguous etc. It will work without, but the performance will be worse than the old code. One useful check is that the output array has the same base B:: C = axpy(A,B,...) assert C.base is B Here is the optimized code. We have almost increased the performance by a factor of 2:: $ python multigrid_profile.py test_1 Profiling test_1() with profiler mmf... Writing output to _profs/multigrid_test_1().prof Done 289134 function calls (272974 primitive calls) in 5.983 CPU seconds Ordered by: internal time, call count List reduced from 71 to 20 due to restriction <20> ncalls tottime percall cumtime percall filename:lineno(function) 1536 2.396 0.002 3.170 0.002 multigrid_.py:648(T) 9184/224 1.106 0.000 1.263 0.006 multigrid_.py:937(I) 6624/224 0.737 0.000 0.906 0.004 multigrid_.py:821(R) 1984 0.370 0.000 0.482 0.000 multigrid_.py:1074(augment) 173958 0.229 0.000 0.229 0.000 index_tricks.py:478(__getitem__) 62208 0.183 0.000 0.262 0.000 index_tricks.py:487(__getslice__) 640 0.179 0.000 2.040 0.003 multigrid_.py:505(S) 10795 0.160 0.000 0.160 0.000 numeric.py:180(asarray) 1536 0.107 0.000 3.277 0.002 multigrid_.py:602(A) 1 0.104 0.104 5.960 5.960 multigrid_.py:1417(solve) 5952 0.069 0.000 0.069 0.000 numeric.py:943(rollaxis) 240/80 0.061 0.000 5.227 0.065 multigrid_.py:1294(v_cycle) 1536 0.050 0.000 0.050 0.000 __init__.py:22(get_blas_funcs) 2259 0.042 0.000 0.086 0.000 fromnumeric.py:32(_wrapit) 2176 0.038 0.000 0.045 0.000 multigrid_.py:419(dx_inv) 2418 0.029 0.000 0.115 0.000 fromnumeric.py:1626(prod) 80 0.020 0.000 0.027 0.000 linalg.py:1185(lstsq) 80 0.015 0.000 0.959 0.012 multigrid_.py:1276(to_mat) 640 0.010 0.000 0.012 0.000 fromnumeric.py:833(trace) 16 0.009 0.001 5.687 0.355 multigrid_.py:1336(full_multigrid) 717: 0.03% 1536 ndim = self.d 718: 0.02% 1536 x_shape = x.shape 719: 0.02% 1536 v_shape = x_shape[:-ndim] # Shape of vector part 720: 0.02% 1536 nv = len(v_shape) # Number of vector dimensions 721: 0.04% 1536 shape = np.asarray(x_shape[-ndim:]) # Shape of grid part 722: ------ ------ 723: 0.07% 1536 inner = np.s_[:,]*nv + np.s_[1:-1,]*ndim 724: ------ ------ 725: 0.01% 1536 if dx_inv is None: 726: 0.04% 1536 dx_inv = self.dx_inv(shape) 727: ------ ------ 728: 0.04% 1536 ddx = np.zeros(np.prod(x_shape), dtype=x.dtype) 729: 0.18% 1536 tmp = np.empty(x_shape, dtype=float) 730: ------ ------ 731: 0.05% 1536 axpy, = get_blas_funcs(['axpy'], [ddx, ddx]) 732: ------ ------ # axpy(A,B, A.size, a) computes B -> a*A + B 733: ------ ------ 734: 0.11% 1536 x = self.augment(x) 735: ------ ------ 736: 0.29% 1536 dx_inv_2 = np.dot(dx_inv, dx_inv.T) 737: ------ ------ 738: 0.09% 6144 for a in xrange(ndim): 739: 0.26% 18432 for b in xrange(ndim): 740: 0.18% 13824 if a == b: 741: ------ ------ # Index shifts 742: 0.22% 4608 al, ar = list(inner), list(inner) 743: 0.19% 4608 al[nv + a] = np.s_[:-2,][0] 744: 0.14% 4608 ar[nv + a] = np.s_[2:] 745: ------ ------ 746: 2.59% 4608 tmp[::] = x[al] 747: 0.97% 4608 ddx = axpy(tmp.ravel(), ddx, a=dx_inv_2[a,b]) 748: 1.92% 4608 tmp[::] = x[ar] 749: 0.84% 4608 ddx = axpy(tmp.ravel(), ddx, a=dx_inv_2[a,b]) 750: 1.90% 4608 tmp[::] = x[inner] 751: 1.00% 4608 ddx = axpy(tmp.ravel(), ddx, a=-2*dx_inv_2[a,b]) 752: ------ ------ 753: ------ ------ else: 754: ------ ------ # Index shifts. The b_shifts should leave the 755: ------ ------ # other indices alone as the a-shifts will extract 756: ------ ------ # these. Likewise, the a-shifts should leave the 757: ------ ------ # index b alone. 758: 0.29% 9216 al_bl = list(inner) 759: 0.37% 9216 al_bl[nv + a] = np.s_[:-2,][0] # Bug in s_ 760: 0.21% 9216 al_bl[nv + b] = np.s_[:-2,][0] # Bug in s_ 761: ------ ------ 762: 0.25% 9216 al_br = list(inner) 763: 0.23% 9216 al_br[nv + a] = np.s_[:-2,][0] # Bug in s_ 764: 1.43% 9216 al_br[nv + b] = np.s_[2:] 765: ------ ------ 766: 0.25% 9216 ar_bl = list(inner) 767: 0.27% 9216 ar_bl[nv + a] = np.s_[2:] 768: 2.06% 9216 ar_bl[nv + b] = np.s_[:-2,][0] # Bug in s_ 769: ------ ------ 770: 0.26% 9216 ar_br = list(inner) 771: 0.26% 9216 ar_br[nv + a] = np.s_[2:] 772: 0.24% 9216 ar_br[nv + b] = np.s_[2:] 773: ------ ------ 774: 3.92% 9216 tmp[::] = x[ar_br] 775: 1.22% 9216 ddx = axpy(tmp.ravel(), ddx, a=dx_inv_2[a,b]/4) 776: 3.85% 9216 tmp[::] = x[al_bl] 777: 1.11% 9216 ddx = axpy(tmp.ravel(), ddx, a=dx_inv_2[a,b]/4) 778: 3.81% 9216 tmp[::] = x[ar_bl] 779: 1.16% 9216 ddx = axpy(tmp.ravel(), ddx, a=-dx_inv_2[a,b]/4) 780: 3.80% 9216 tmp[::] = x[al_br] 781: 1.13% 9216 ddx = axpy(tmp.ravel(), ddx, a=-dx_inv_2[a,b]/4) 782: ------ ------ 783: 0.08% 1536 return ddx.reshape(x_shape) The slow part is still the indexing to perform the shifts. It seems we could probably gain about a factor of 3 or so if we coded this directly in C++. I considered cleaning up I and R, but they are kind of tricky and the current code is about as clean an symmetric as it can be, so I think I will leave it for now. Again, this could be pretty easily done in C++, so let's wait for that. Details ======= We profile with the following class that allows us to choose between the :mod:`hotshot` profiler, the :mod:`cProfile` profiler, and our own :mod:`mmf.utils.mmf_profile` profiler based on :mod:`hotshot` but including analysis of the line-event data. The following class does the dirty work: .. literalinclude:: multigrid_profile.py :pyobject: Profile