Background: I think I might want to port some code that calculates matrix exponential-vector products using a Krylov subspace method from MATLAB to Python. (Specifically, Jitse Niesen's expmvp function, which uses an algorithm described in this paper.) However, I know that if I don't make heavy use of functions from modules backed by compiled libraries (i.e., if I only use raw Python and few built-in functions), the code could be quite slow.

Question: What tools or approaches are available to help me speed up the Python code I write? In particular, I'm interested in tools that automate the process as much as possible, though general approaches are also welcome.

Note: I have an older version of Jitse's algorithm, and haven't used it in a while. It could be very easy to make this code fast, but I felt like it would make a good concrete example, and it is related to my own research. Debating my approach for implementing this particular algorithm in Python is another question entirely.

Geoff Oxberry

4 Answers

I'm going to break up my answer into three parts: profiling, speeding up Python code via C, and speeding up Python via Python. In my view, Python has some of the best tools for measuring your code's performance and then drilling down to the actual bottlenecks. Speeding up code without profiling is about like trying to kill a deer with an Uzi.

If you are really only interested in mat-vec products, I would recommend scipy.sparse.
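
A minimal sketch of the scipy.sparse route, assuming a reasonably recent SciPy. The 1-D Laplacian test matrix and the function names are illustrative choices. Storing the matrix in CSR format keeps `A @ v` in compiled code, and `scipy.sparse.linalg.expm_multiply` computes exp(tA)v directly. Note that `expm_multiply` uses a truncated Taylor series method, not the Krylov method from the question, so it is better viewed as a reference to check a port against than as a drop-in replacement.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import expm
from scipy.sparse.linalg import expm_multiply


def laplacian_1d(n):
    # Tridiagonal second-difference matrix in CSR (compressed sparse row) format
    main = -2.0 * np.ones(n)
    off = np.ones(n - 1)
    return sp.diags([off, main, off], [-1, 0, 1], format="csr")


def sparse_products(n=100, t=0.1):
    A = laplacian_1d(n)
    v = np.linspace(0.0, 1.0, n)
    y = A @ v                     # sparse mat-vec: work proportional to nnz, in compiled code
    w = expm_multiply(t * A, v)   # exp(t*A) @ v without ever forming exp(t*A)
    return A, v, y, w


if __name__ == "__main__":
    A, v, y, w = sparse_products()
    # Cross-check against the dense matrix exponential (fine for small n only)
    print(np.allclose(w, expm(0.1 * A.toarray()) @ v))
```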

Python tools for profiling

profile and cProfile modules: these modules give you standard run-time analysis and the function call stack. It is convenient to save their statistics and then use the pstats module to look at the data in a number of ways.
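
A sketch of that workflow using only the standard library: profile a call, save the statistics to a file, then reload them with pstats and sort by cumulative time. The pure-Python mat-vec being profiled (`slow_dot`, `matvec`) is a hypothetical workload chosen for illustration.

```python
import cProfile
import io
import os
import pstats
import tempfile


def slow_dot(a, b):
    # Pure-Python dot product: the deliberately slow inner kernel
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def matvec(rows, v):
    # Dense mat-vec built from one slow_dot call per row
    return [slow_dot(row, v) for row in rows]


def profile_matvec(n=200):
    rows = [[float(i + j) for j in range(n)] for i in range(n)]
    v = [1.0] * n

    profiler = cProfile.Profile()
    profiler.enable()
    result = matvec(rows, v)
    profiler.disable()

    stream = io.StringIO()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "matvec.prof")
        profiler.dump_stats(path)                  # save raw statistics to disk
        stats = pstats.Stats(path, stream=stream)  # reload them later for analysis
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
    return result, stats, stream.getvalue()


if __name__ == "__main__":
    _, _, report = profile_matvec()
    print(report)
```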

kernprof: this tool (part of the line_profiler package) bundles routines for tasks like line-by-line code timing.
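
A sketch of the usual kernprof workflow, assuming line_profiler is installed: decorate the functions you care about with `@profile` and run the script with `kernprof -l -v`. kernprof injects `profile` into the builtins, so the small fallback below only exists to keep the script runnable under plain `python` (without line timings). The `row_sums` function is a hypothetical workload.

```python
# Usage (assumes line_profiler is installed):  kernprof -l -v this_script.py
try:
    profile  # injected into builtins by kernprof -l
except NameError:
    def profile(func):
        # No-op fallback so the script also runs under plain `python`
        return func


@profile
def row_sums(n=300):
    # Hypothetical workload; kernprof reports time spent on each line below
    total = 0.0
    for i in range(n):
        row = [float(i * j % 7) for j in range(n)]
        total += sum(row)
    return total


if __name__ == "__main__":
    print(row_sums())
```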

memory_profiler: this tool reports the line-by-line memory footprint of your code.
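
A similar sketch for memory_profiler, assuming the package is installed: its `profile` decorator prints a per-line memory report when the decorated function runs. The import fallback only exists so the example still runs without the package. The function is a hypothetical workload contrasting a materialized list with a generator, which should show up in the report as a jump in memory versus a roughly flat line.

```python
# Usage (assumes memory_profiler is installed):  python -m memory_profiler this_script.py
try:
    from memory_profiler import profile
except ImportError:
    def profile(func):
        # No-op fallback: same results, but no memory report is printed
        return func


@profile
def sum_of_squares(n=200_000):
    squares = [k * k for k in range(n)]        # materializes all n values at once
    total = sum(squares)
    del squares                                # release the list before the second pass
    streamed = sum(k * k for k in range(n))    # generator: one value alive at a time
    return total, streamed


if __name__ == "__main__":
    print(sum_of_squares())
```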

IPython timers: the %timeit magic is quite nice for comparing the speed of functions quickly and interactively.
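
Outside IPython, the standard-library timeit module does the same job (IPython's %timeit is built on it). A small sketch comparing an explicit Python loop with the built-in `sum`; taking the minimum over several repeats, as the timeit documentation suggests, reduces noise from other processes. The function names are illustrative.

```python
import timeit


def loop_sum(values):
    # Accumulate with an explicit, interpreted Python loop
    total = 0
    for v in values:
        total += v
    return total


def compare(n=10_000, number=100, repeat=5):
    values = list(range(n))
    # Roughly what IPython's  %timeit loop_sum(values)  and  %timeit sum(values)  measure
    t_loop = min(timeit.repeat(lambda: loop_sum(values), number=number, repeat=repeat))
    t_builtin = min(timeit.repeat(lambda: sum(values), number=number, repeat=repeat))
    return t_loop / number, t_builtin / number   # seconds per call


if __name__ == "__main__":
    loop_t, builtin_t = compare()
    print(f"loop: {loop_t * 1e6:.1f} us per call, sum(): {builtin_t * 1e6:.1f} us per call")
```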

Speeding up Python

Cython: Cython is the quickest way to take a few Python functions and make them faster. You annotate the functions with Cython's typed variant of Python, and Cython generates C code from them. The result is very maintainable, and it can also link to hand-written C, C++, or Fortran code quite easily. It is by far the preferred tool today.

ctypes: ctypes lets you write your functions in C, compile them into a shared library, and call them from Python after declaring their argument and return types. It handles converting Python objects to C values and releases the GIL while the C function runs.
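
A sketch of the ctypes pattern, assuming a POSIX system (Linux or macOS). To stay self-contained it calls `cos` and `pow` from the system C math library instead of your own code; for your own C you would compile a shared library (for example `cc -O2 -shared -fPIC mycode.c -o libmycode.so`, where `mycode.c` is hypothetical) and load it the same way. Declaring `argtypes` and `restype` is what lets ctypes convert between Python floats and C doubles.

```python
import ctypes
import ctypes.util
import math


def load_libm():
    # Locate the C math library; if find_library fails, fall back to the
    # symbols already loaded into the running process (POSIX only)
    name = ctypes.util.find_library("m")
    return ctypes.CDLL(name) if name else ctypes.CDLL(None)


libm = load_libm()

# Without restype, ctypes would assume the C function returns an int
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double
libm.pow.argtypes = [ctypes.c_double, ctypes.c_double]
libm.pow.restype = ctypes.c_double


def c_cos(x):
    return libm.cos(x)


def c_pow(x, y):
    return libm.pow(x, y)


if __name__ == "__main__":
    print(c_cos(0.5), math.cos(0.5))
    print(c_pow(2.0, 10.0))
```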

Other approaches exist for writing your code in C, but they are mostly geared toward wrapping an existing C/C++ library in Python.

Python-only approaches

If you want to stay mostly inside Python, my advice is to figure out what data you are using and pick the right data types for implementing your algorithms. In my experience, you will usually get much farther by optimizing your data structures than with any low-level C hack. For example:

numpy: contiguous arrays, very fast for strided array operations.
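
A short illustration, assuming numpy is available: the same mat-vec written as a pure-Python loop over nested lists and as a single numpy expression, plus a strided slice that is a view rather than a copy. The function names are illustrative.

```python
import numpy as np


def matvec_loops(A, x):
    # Pure-Python loops over nested lists: one interpreted operation per entry
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def matvec_numpy(A, x):
    # One call into compiled code operating on a contiguous float64 buffer
    return np.asarray(A, dtype=float) @ np.asarray(x, dtype=float)


def every_other(v):
    # Strided slicing returns a view: same memory, just a larger stride
    return v[::2]


if __name__ == "__main__":
    n = 200
    A = [[float(i - j) for j in range(n)] for i in range(n)]
    x = [1.0] * n
    print(np.allclose(matvec_loops(A, x), matvec_numpy(A, x)))
    v = np.arange(10.0)
    w = every_other(v)
    print(np.shares_memory(v, w), v.strides, w.strides)
```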

numexpr: a numpy array expression optimizer. It can evaluate numpy array expressions with multiple threads, and it also avoids the numerous temporary arrays numpy creates because the Python interpreter evaluates an expression one operation at a time.
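
A minimal sketch of the numexpr idea, written so it falls back to plain numpy when numexpr is not installed. `numexpr.evaluate` takes the whole expression as a string, which is what lets it skip the full-size intermediate arrays numpy allocates for `2.0 * a`, `b**2`, and `3.0 * b**2`. The function names are illustrative.

```python
import numpy as np

try:
    import numexpr as ne
except ImportError:
    ne = None   # numexpr is optional in this sketch


def poly_numpy(a, b):
    # Each operator allocates a full-size intermediate array
    return 2.0 * a + 3.0 * b**2


def poly_numexpr(a, b):
    if ne is None:
        return poly_numpy(a, b)
    # The expression is compiled once and evaluated in blocks,
    # possibly on several threads, without full-size temporaries
    return ne.evaluate("2.0 * a + 3.0 * b**2", local_dict={"a": a, "b": b})


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.random(1_000_000)
    b = rng.random(1_000_000)
    print(np.allclose(poly_numpy(a, b), poly_numexpr(a, b)))
```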

blist: a B-tree implementation of a list, very fast for inserting, indexing, and moving the internal nodes of a list.

pandas: data frames (tables) with very fast analytics on the underlying arrays.

pytables: fast, structured, hierarchical tables built on HDF5, especially good for out-of-core calculations and queries on large data.

Glorfindel
aterrel
  • You can use ctypes to call Fortran routines too. – Matthew Emmett Jun 12 '12 at 13:48
  • Yup, http://www.sagemath.org/doc/numerical_sage/ctypes.html – aterrel Jun 12 '12 at 13:51
  • Talking about wrapping code, what about f2py? – astrojuanlu Jun 12 '12 at 14:38
  • f2py is a great tool and used by many in the community. fwrap is a more recent replacement, as f2py shows its age, but it's not really complete. – aterrel Jun 12 '12 at 18:27
  • Thanks! These are the types of resources I was looking for. I was only aware of some of them, and only in passing (or from looking at them on the Internet). Aron keeps mentioning numexpr. How does that work? Would that apply? – Geoff Oxberry Jun 12 '12 at 23:43
  • I added a blurb on numexpr. It's a nice library, though I haven't used it extensively. – aterrel Jun 13 '12 at 05:09
  • It seems the last commit to fwrap was made in late 2010. f2py seems more alive from this point of view. – AlexE Jun 13 '12 at 12:18
  • I'm not really a power Fortran user, so maybe. I know Kurt was working on FWrap more but his time got sucked away. It is also my understanding that f2py has only had cosmetic changes [0] since moving under NumPy. Really discussing the state of wrapping fortran is a different question.

    [0] https://github.com/numpy/numpy/blame/master/numpy/f2py/src/fortranobject.c

    – aterrel Jun 13 '12 at 12:29
  • I see that you haven't mentioned numba. Is this intentional? (not a rhetorical question; I am not a Python guy so it is an honest doubt). – Federico Poloni Dec 27 '14 at 22:22
  • At the time this was written, Numba was a very new product without many features. Now it is a much richer ecosystem and I would recommend it (also, since writing this I've become a full-time employee at Continuum Analytics). – aterrel Dec 29 '14 at 04:37

First of all, if there is a C or Fortran implementation available (MATLAB MEX function?), why don't you write a Python wrapper?

If you want your own implementation and not only a wrapper, I would strongly suggest using the numpy module for linear algebra. Make sure it is linked to an optimized BLAS (like ATLAS, GotoBLAS, Intel MKL, ...), and use Cython or weave. Read this Performance Python article for a good introduction and benchmark. The different implementations in this article are available for download here, courtesy of Travis Oliphant (NumPy guru).
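
A quick way to see which BLAS your numpy uses, sketched here assuming a recent numpy: `np.show_config()` prints the build configuration, and a dense float64 mat-vec is typically dispatched to the BLAS routine it names. The `gemv` helper is a hypothetical name for this example.

```python
import numpy as np


def gemv(n=500, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    x = rng.standard_normal(n)
    # For float64 arrays, A @ x is typically handed to the linked BLAS,
    # so an optimized BLAS speeds this line up without code changes
    return A, x, A @ x


if __name__ == "__main__":
    np.show_config()   # lists the BLAS/LAPACK libraries this numpy build uses
    A, x, y = gemv()
    print(y[:3])
```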

Good luck.

GertVdE
  • That Performance Python article seems a bit dated, it doesn't mention some of the newer tools available like numexpr. – Aron Ahmadia Jun 12 '12 at 07:28
  • I indeed overlooked numexpr. It would be nice to run the same laplace benchmark with numexpr... – GertVdE Jun 12 '12 at 08:46
  • Is scipy.weave still used and developed? It seems that the Performance Python article shows that it could be fast to use and gives a pretty good improvement on speed but I have rarely seen it mentioned outside of that article. – Ken Jun 12 '12 at 16:05
  • @Ken: scipy.weave is, as far as I know, no longer under active development. It is kept for backward compatibility but new projects are encouraged to use Cython. – GertVdE Jun 12 '12 at 18:13
  • For GotoBLAS and NumPy/SciPy, see http://www.der-schnorz.de/2012/06/optimized-linear-algebra-and-numpyscipy/ – AlexE Jun 13 '12 at 12:19
  • @GertVdE: There isn't a C or Fortran implementation available; it is not a MATLAB MEX function (as you can see by clicking the link). I have written a Python wrapper in the past, to call Python from Fortran (through C). It was a slow process, and memory profiling with Valgrind indicated that the Python interpreter did some strange things with memory. I would prefer not to repeat that experience until it becomes clear that the time investment would be worth it. – Geoff Oxberry Jun 13 '12 at 13:08

Basically I agree with the other answers. The best options for speedy numerical Python code are:

  • Use specialized libraries like numpy
  • Wrap your existing code so that your Python program can call it directly

But if you want to program the whole algorithm from scratch (I quote: "I only use raw Python"), then you might want to consider PyPy (http://pypy.org/), a JIT (just-in-time) implementation of Python. I haven't been able to use it for my project (because my project relies on numpy, and the PyPy developers are currently working on supporting it), but the benchmarks are quite impressive (http://speed.pypy.org/).
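
For a sense of the kind of code PyPy targets, here is a sketch in plain Python using only lists. Note that this is not the Krylov algorithm from expmvp: it approximates exp(tA)v with a truncated Taylor series, chosen only because it is short, and it has no scaling step, so it is only accurate when the norm of tA is modest. Running the same file under `python` and `pypy` is a simple way to compare the two interpreters.

```python
import math


def matvec(A, v):
    # Dense mat-vec on lists of lists: the kind of tight loop a JIT speeds up
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def expmv_taylor(A, v, t=1.0, tol=1e-12, max_terms=200):
    """Approximate exp(t*A) @ v by summing the Taylor series term by term.

    Illustrative only: no scaling and squaring, so use it for modest ||t*A||.
    """
    term = list(v)      # holds (t*A)^k v / k! for the current k
    result = list(v)
    for k in range(1, max_terms + 1):
        term = [t * x / k for x in matvec(A, term)]
        result = [r + x for r, x in zip(result, term)]
        # Stop once the newest term is negligible relative to the running sum
        if max(abs(x) for x in term) <= tol * max(abs(r) for r in result):
            break
    return result


if __name__ == "__main__":
    # Diagonal test case with a known answer: [exp(-1) * 1, exp(0.5) * 2]
    A = [[-1.0, 0.0], [0.0, 0.5]]
    v = [1.0, 2.0]
    print(expmv_taylor(A, v), [math.exp(-1.0), 2.0 * math.exp(0.5)])
```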

bgschaid

Some of the links above are outdated, so look here instead:

http://wiki.scipy.org/PerformanceTips

http://wiki.scipy.org/PerformancePython

Some ideas:

NumPy, Numba, Cython, Numexpr, Theano, TensorFlow, f2py, CPython C API, PyPy, cffi, Pythran, Nuitka, SWIG, Boost.Python
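
As one example from this list, Numba compiles numerical loops to machine code on first call via its `njit` decorator. The sketch below applies it to a CSR sparse mat-vec (`csr_matvec` is a hypothetical helper, not a Numba or SciPy API); the import fallback keeps it runnable, just uncompiled, when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    def njit(func):
        # Fallback so the sketch runs without Numba (same result, no compilation)
        return func


@njit
def csr_matvec(data, indices, indptr, x):
    # y = A @ x for A stored in CSR form; under Numba this loop becomes machine code
    n_rows = indptr.shape[0] - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y


if __name__ == "__main__":
    # CSR form of [[1, 0, 2], [0, 3, 0]]
    data = np.array([1.0, 2.0, 3.0])
    indices = np.array([0, 2, 1])
    indptr = np.array([0, 2, 3])
    print(csr_matvec(data, indices, indptr, np.array([1.0, 2.0, 3.0])))  # [7. 6.]
```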

den.run.ai