[ACCEPTED]-Speeding Up Python-performance

Accepted answer
Score: 42

Regarding "Secondly: When writing a program from scratch in python, what are some good ways to greatly improve performance?"

Remember the Jackson rules of optimization:

  • Rule 1: Don't do it.
  • Rule 2 (for experts only): Don't do it yet.

And the Knuth rule:

  • "Premature optimization is the root of all evil."

The more useful rules are in the General Rules for Optimization.

  1. Don't optimize as you go. First get it right. Then get it fast. Optimizing a wrong program is still wrong.

  2. Remember the 80/20 rule.

  3. Always run "before" and "after" benchmarks. Otherwise, you won't know if you've found the 80%.

  4. Use the right algorithms and data structures. This rule should be first. Nothing matters as much as algorithm and data structure.
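Rule 3 above is worth making concrete. A minimal sketch of a "before"/"after" benchmark using the standard `timeit` module (the two functions here are illustrative stand-ins, not from the original answer):

```python
import timeit

# "Before" version: repeated string concatenation in a loop.
def before():
    s = ""
    for i in range(1000):
        s += str(i)
    return s

# "After" version: str.join over a generator expression.
def after():
    return "".join(str(i) for i in range(1000))

# The optimization must not change the result.
assert before() == after()

# Measure both so the comparison is based on numbers, not intuition.
t_before = timeit.timeit(before, number=200)
t_after = timeit.timeit(after, number=200)
print("before: %.4fs  after: %.4fs" % (t_before, t_after))
```

Whichever version wins on your workload, now you know rather than guess.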

Bottom Line

You can't prevent or avoid the "optimize this program" effort. It's part of the job. You have to plan for it and do it carefully, just like the design, code and test activities.

Score: 28

Rather than just punting to C, I'd suggest:

Make your code count. Do more with fewer executions of lines:

  • Change the algorithm to a faster one. It doesn't need to be fancy to be faster in many cases.
  • Use Python primitives that happen to be written in C. Some operations force an interpreter dispatch while others don't; the latter are preferable.
  • Beware of code that first constructs a big data structure and then consumes it. Think of the difference between range and xrange. In general, it is often worth thinking about the memory usage of the program. Using generators can sometimes bring O(n) memory use down to O(1).
  • Python is generally non-optimizing. Hoist invariant code out of loops, eliminate common subexpressions where possible in tight loops.
  • If something is expensive, then precompute or memoize it. Regular expressions can be compiled for instance.
  • Need to crunch numbers? You might want to check numpy out.
  • Many python programs are slow because they are bound by disk I/O or database access. Make sure you have something worthwhile to do while you wait on the data to arrive rather than just blocking. A weapon could be something like the Twisted framework.
  • Note that many crucial data-processing libraries have C-versions, be it XML, JSON or whatnot. They are often considerably faster than the Python interpreter.
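The generator point in the list above can be sketched in a few lines. In modern Python 3 (where `range` is already lazy), the same contrast shows up between a list comprehension and a generator expression; the numbers and names here are illustrative:

```python
import sys

n = 100_000

# Building the full result list up front costs O(n) memory.
squares_list = [i * i for i in range(n)]

# A generator expression produces values on demand: O(1) memory.
squares_gen = (i * i for i in range(n))

print(sys.getsizeof(squares_list))  # grows with n
print(sys.getsizeof(squares_gen))   # a small constant

# Both produce the same answer when consumed.
assert sum(squares_gen) == sum(squares_list)
```

The generator only ever holds one value at a time, which is why memory use stays flat no matter how large `n` gets.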

If all of the above fails for profiled and measured code, then begin thinking about the C-rewrite path.

Score: 24

The usual suspects -- profile it, find the most expensive line, figure out what it's doing, fix it. If you haven't done much profiling before, there could be some big fat quadratic loops or string duplication hiding behind otherwise innocuous-looking expressions.

In Python, two of the most common causes I've found for non-obvious slowdown are string concatenation and generators. Since Python's strings are immutable, doing something like this:

result = u""
for item in my_list:
    result += unicode(item)

will copy the entire string twice per iteration. This has been well-covered, and the solution is to use "".join:

result = "".join(unicode(item) for item in my_list)

Generators are another culprit. They're very easy to use and can simplify some tasks enormously, but a poorly-applied generator will be much slower than simply appending items to a list and returning the list.
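Whether a generator helps or hurts depends on the workload, so it is worth measuring rather than assuming. A small sketch (names and sizes are illustrative) comparing a generator-based build against a plain list append: each `next()` on a generator resumes a suspended frame, which has per-item overhead.

```python
import timeit

def gen_version(n):
    # Generator: every item costs a frame resume via next().
    def g():
        for i in range(n):
            yield i * 2
    return list(g())

def list_version(n):
    # Plain list build: no frame suspensions.
    out = []
    for i in range(n):
        out.append(i * 2)
    return out

assert gen_version(1000) == list_version(1000)
print(timeit.timeit(lambda: gen_version(1000), number=500))
print(timeit.timeit(lambda: list_version(1000), number=500))
```

If the full result is needed as a list anyway, the generator buys nothing and pays the overhead; generators shine when results are consumed lazily.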

Finally, don't be afraid to rewrite bits in C! Python, as a dynamic high-level language, is simply not capable of matching C's speed. If there's one function that you can't optimize any more in Python, consider extracting it to an extension module.

My favorite technique for this is to maintain both Python and C versions of a module. The Python version is written to be as clear and obvious as possible -- any bugs should be easy to diagnose and fix. Write your tests against this module. Then write the C version, and test it. Its behavior should in all cases equal that of the Python implementation -- if they differ, it should be very easy to figure out which is wrong and correct the problem.
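The cross-checking pattern described above can be sketched in pure Python. Here `dot_fast` is a hypothetical stand-in for the C extension (a tuned Python version is used just to show the shape of the test harness):

```python
def dot_reference(xs, ys):
    # Reference version: written for clarity, obviously correct.
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total

def dot_fast(xs, ys):
    # Stand-in for the fast (e.g. C extension) version.
    return sum(x * y for x, y in zip(xs, ys))

# Cross-check: any divergence points at a bug in one of the two.
cases = [
    ((1, 2, 3), (4, 5, 6)),
    ((), ()),
    ((0.5,), (2.0,)),
]
for xs, ys in cases:
    assert dot_reference(xs, ys) == dot_fast(xs, ys)
```

The same tests run against both implementations, so the clear version serves as an executable specification for the fast one.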

Score: 17

First thing that comes to mind: psyco. It runs only on x86, for the time being.

Then, constant binding. That is, make all global references (and global.attr, global.attr.attr…) local names inside of functions and methods. This isn't always successful, but in general it works. It can be done by hand, but it is obviously tedious.

You said apart from in-code optimization, so I won't delve into this, but keep your mind open for typical mistakes (for i in range(10000000) comes to mind) that people make.

Score: 9

Cython and Pyrex can be used to generate C code using a Python-like syntax. Psyco is also fantastic for appropriate projects (sometimes you'll not notice much speed boost, sometimes it'll be as much as 50x as fast). I still reckon the best way is to profile your code (cProfile, etc.) and then just code the bottlenecks as C functions for Python.

Score: 7

I'm surprised no one mentioned ShedSkin: http://code.google.com/p/shedskin/, it automagically converts your Python program to C++ and in some benchmarks yields better speed improvements than psyco.

Plus anecdotal stories on the simplicity: http://pyinsci.blogspot.com/2006/12/trying-out-latest-release-of-shedskin.html

There are limitations though, please see: http://tinyurl.com/shedskin-limitations

Score: 5

I hope you've read: http://wiki.python.org/moin/PythonSpeed/PerformanceTips

Summing up what's already there, there are usually 3 principles:

  • write code that gets transformed into better bytecode: use locals, avoid unnecessary lookups/calls, and use idiomatic constructs (if there's natural syntax for what you want, use it; it's usually faster. E.g. don't do "for key in some_dict.keys()", do "for key in some_dict")
  • whatever is written in C is considerably faster, abuse whatever C functions/modules you have available
  • when in doubt, import timeit, profile
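The dict-iteration idiom from the first principle, measured with `timeit` as the third principle suggests. Note this mattered most in Python 2, where `.keys()` built a full list; in Python 3 it returns a cheap view, so the gap is smaller, which is itself a reason to measure (the dict here is illustrative):

```python
import timeit

d = {i: str(i) for i in range(10000)}

def with_keys():
    return [k for k in d.keys()]   # explicit .keys() call

def idiomatic():
    return [k for k in d]          # iterate the dict directly

assert with_keys() == idiomatic()
print(timeit.timeit(with_keys, number=200))
print(timeit.timeit(idiomatic, number=200))
```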

Score: 4

This won't necessarily speed up any of your code, but is critical knowledge when programming in Python if you want to avoid slowing your code down. The "Global Interpreter Lock" (GIL) has the potential to drastically reduce the speed of your multi-threaded program if its behavior is not understood (yes, this bit me ... I had a nice 4 processor machine that wouldn't use more than 1.2 processors at a time). There's an introductory article with some links to get you started at SmoothSpan.

Score: 4

Run your app through the Python profiler. Find a serious bottleneck. Rewrite that bottleneck in C. Repeat.
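The "find a serious bottleneck" step can be done with the standard `cProfile` and `pstats` modules. A minimal sketch (the deliberately slow `hot` function is invented for illustration):

```python
import cProfile
import io
import pstats

def hot(n):
    # Deliberately slow: quadratic membership tests on a list.
    seen = []
    for i in range(n):
        if i not in seen:
            seen.append(i)
    return seen

pr = cProfile.Profile()
pr.enable()
hot(2000)
pr.disable()

# Print the 5 most expensive entries by cumulative time.
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

The hottest entries in the report are the candidates for a rewrite (first in better Python, then in C if that's not enough).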

Score: 4

People have given some good advice, but you have to be aware that when high performance is needed, the Python model is: punt to C. Efforts like psyco may in the future help a bit, but Python just isn't a fast language, and it isn't designed to be. Very few languages have the ability to do the dynamic stuff really well and still generate very fast code; at least for the foreseeable future (and some of the design works against fast compilation) that will be the case.

So, if you really find yourself in this bind, your best bet will be to isolate the parts of your system that are unacceptably slow in (good) Python, and design around the idea that you'll rewrite those bits in C. Sorry. Good design can help make this less painful. Prototype it in Python first, though; then you've easily got a sanity check on your C as well.

This works well enough for things like numpy, after all. I can't emphasize enough how much good design will help you, though. If you just iteratively poke at your Python bits and replace the slowest ones with C, you may end up with a big mess. Think about exactly where the C bits are needed, and how they can be minimized and encapsulated sensibly.

Score: 4

It's often possible to achieve near-C speeds (close enough for any project using Python in the first place!) by replacing explicit algorithms written out longhand in Python with an implicit algorithm using a built-in Python call. This works because most Python built-ins are written in C anyway. Well, in CPython of course ;-) https://www.python.org/doc/essays/list2str/
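A tiny sketch of that substitution: a longhand summation loop versus the built-in `sum`, which in CPython runs its loop in C (the data and function names here are illustrative).

```python
import timeit

data = list(range(100000))

def explicit_sum(xs):
    # Longhand version: the loop runs in interpreted bytecode.
    total = 0
    for x in xs:
        total += x
    return total

# The built-in does the same work, but the loop runs in C (in CPython).
assert explicit_sum(data) == sum(data)

print(timeit.timeit(lambda: explicit_sum(data), number=50))
print(timeit.timeit(lambda: sum(data), number=50))
```

The same idea applies to `min`, `max`, `str.join`, `sorted`, and friends: let the built-in do the looping.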

Score: 3

Just a note on using psyco: in some cases it can actually produce slower run-times, especially when trying to use psyco with code that was written in C. I can't remember the article where I read this, but the map() and reduce() functions were mentioned specifically. Luckily you can tell psyco not to handle specified functions and/or modules.

Score: 3

This is the procedure that I try to follow:

  • import psyco; psyco.full()
  • If it's not fast enough, run the code through a profiler, see where the bottlenecks are. (DISABLE psyco for this step!)
  • Try to do things such as other people have mentioned to get the code at those bottlenecks as fast as possible.
    • Stuff like [str(x) for x in l] or [x.strip() for x in l] is much, much slower than map(str, l) or map(str.strip, l).
  • After this, if I still need more speed, it's actually really easy to get PyRex up and running. I first copy a section of python code, put it directly in the pyrex code, and see what happens. Then I twiddle with it until it gets faster and faster.
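A caveat on the map-versus-comprehension claim above: it dates from Python 2, where `map` returned a list; in Python 3 `map` is lazy and the gap is often small or reversed, so it's worth measuring on your own interpreter. A sketch (data is illustrative):

```python
import timeit

l = ["  alpha  ", "  beta  "] * 500

def comp():
    return [s.strip() for s in l]

def mapped():
    return list(map(str.strip, l))

assert comp() == mapped()
print(timeit.timeit(comp, number=200))
print(timeit.timeit(mapped, number=200))
```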

Score: 2

The canonical reference on how to improve Python code is here: PerformanceTips. I'd recommend against optimizing in C unless you really need to, though. For most applications, you can get the performance you need by following the rules posted in that link.

Score: 1

If using psyco, I'd recommend psyco.profile() instead of psyco.full(). For a larger project it will be smarter about the functions that get optimized and use a ton less memory.

I would also recommend looking at iterators and generators. If your application is using large data sets this will save you many copies of containers.
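The "many copies of containers" point can be illustrated with a generator pipeline: each stage yields items one at a time, so no intermediate list of the whole data set is ever materialized (the stages here are invented for illustration):

```python
def read_records(lines):
    # Each stage is lazy; items flow through one at a time.
    stripped = (ln.strip() for ln in lines)
    nonempty = (ln for ln in stripped if ln)
    return (ln.upper() for ln in nonempty)

lines = ["alpha\n", "\n", "beta\n", "gamma\n"]
result = list(read_records(lines))
assert result == ["ALPHA", "BETA", "GAMMA"]
```

With list comprehensions instead, each of the three stages would build a full copy of the data before the next stage could start.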

Score: 1

Besides the (great) psyco and the (nice) shedskin, I'd recommend trying Cython, a great fork of Pyrex.

Or, if you are not in a hurry, I recommend just waiting. Newer Python virtual machines are coming, and unladen-swallow will find its way into the mainstream.

Score: 0

A couple of ways to speed up Python code were introduced after this question was asked:

  • Pypy has a JIT-compiler, which makes it a lot faster for CPU-bound code.
  • Pypy is written in RPython, a subset of Python whose tool-chain compiles it to native code.

Score: 0

For an established project I feel the main performance gain will be from making use of Python's internal libraries as much as possible.

Some tips are here: http://blog.hackerearth.com/faster-python-code

Score: 0

There is also a Python → 11l → C++ transpiler, which can be downloaded from here.
