Why is Python slow? A lot of blame lies with the interpreter's ponderous data representation. Whenever you create anything in Python, be it a lowly number, a dictionary, or some user-defined object (e.g. a CatSeekingMissileFactoryFactory), under the hood Python represents it as a big fat structure.
Why can't an integer just be represented as, well...an integer? Being a dynamic language, Python lacks any external information about the types of the objects it encounters: how is it to know whether one blob of 64 bits represents an integer, a float, or a heap-allocated object? At the very minimum, a type tag must be tacked onto each value. Furthermore, since the Python interpreter uses reference counting for garbage collection, each object is tasked with remembering how many other objects refer to it. Throw in a few other bookkeeping exigencies and you end up with objects that are significantly larger than an equivalent piece of data from a compiled language. To actually get useful work done, the Python interpreter has to perform a delicate dance of wasteful indirection: checking tags here, calling some unknown function pointer there, and (finally!) pulling data from the heap into registers.
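For the curious, here is a simplified C sketch of what that structure looks like for a plain int. The field names follow CPython's headers, but this is an approximation; the exact layout varies by version and build.

```c
#include <stdint.h>

/* A simplified sketch of how CPython lays out an int on the heap.
   Field names follow CPython's headers; details vary by version. */

typedef struct _typeobject PyTypeObject;  /* the type tag points here */

typedef struct {
    intptr_t      ob_refcnt;   /* reference count, for garbage collection */
    PyTypeObject *ob_type;     /* type tag: "I am an int" */
} PyObject;

typedef struct {
    PyObject ob_base;          /* every object drags this header around */
    intptr_t ob_size;          /* number of digits in use */
    uint32_t ob_digit[1];      /* the actual integer payload, at last */
} PyLongObject;
```

On a 64-bit machine the header alone is 16 bytes before a single digit of your number shows up, and touching any of it means chasing a pointer into the heap.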
The problem is not that Guido van Rossum doesn't care about performance or that Python is badly written! The internals of the interpreter are thoughtfully designed and optimized to eke out speed gains wherever the opportunity presents itself. However, Python's unholy marriage to its object representation seems to lock in a significant and enduring performance gap between Python and many other languages.
Didn't PyPy already solve this problem?
So, why not cast off the yoke of PyObjects? PyPy has already shown that, if you give a JIT compiler the freedom to lay things out however it wants, Python can be made a lot faster. However, all that speed comes at a terrible cost: the loss of compatibility with extension modules that rely on the old Python C API. PyPy is allowed to think of an int as just some bits in a register, but your library still expects a big struct, with a type tag and refcount and all. Despite many efforts by the PyPy team to help the rest of us slowpokes transition to their otherwise very impressive system, all the libraries that expect PyObjects are too important to be abandoned (and often too large to be rewritten).
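To see why the struct is load-bearing, consider a minimal extension function (the function and variable names here are invented for illustration, but the pattern appears throughout real extension code). Py_INCREF has traditionally compiled down to a direct increment of the ob_refcnt field, so the PyObject layout gets frozen into every compiled module:

```c
#include <Python.h>

/* Hypothetical extension function, for illustration only. */
static PyObject *cached = NULL;

static PyObject *
keep_reference(PyObject *self, PyObject *arg)
{
    /* Py_INCREF is effectively arg->ob_refcnt++ on classic CPython,
       so the struct layout is baked into this compiled code. */
    Py_INCREF(arg);
    Py_XDECREF(cached);
    cached = arg;   /* stash the object somewhere */
    Py_RETURN_NONE;
}
```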
Can we do anything to make Python faster without having to give up libraries like SciPy?
How about a faster interpreter?
Earlier this year, my officemate Russell and I got it into our heads that CPython hadn't reached quite far enough into the bag of interpreter implementation tricks.
- Why does Python use a stack-based virtual machine when a register-based machine might have lower dispatch overhead? (A toy dispatch loop contrasting the two follows this list.)
- Why does CPython only perform peephole optimizations? Why not use simple dataflow analyses?
- Attribute lookups are very repetitive; would some sort of runtime hints/feedback be useful for cutting down on hash function evaluations? (See the inline-cache sketch below.)
- Why not use bit-tagging to store integers directly inside PyObject pointers? It's a common technique used in the implementation of other high-level languages. Perhaps the Python developers shouldn't have rejected it? (See the tagging sketch below.)
- Call frames in Python seem a bit bloated and take a long time to set up; can we make function calls cheaper?
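On the first question: here is a toy register-machine dispatch loop (illustrative only, not Falcon's actual bytecode format). Computing c = a + b costs CPython's stack machine four instruction dispatches (LOAD_FAST a; LOAD_FAST b; BINARY_ADD; STORE_FAST c), while a register machine names its operands directly and pays the dispatch overhead once:

```c
#include <stdio.h>
#include <stdint.h>

/* Toy register machine: each instruction carries its operand registers. */
enum { OP_ADD_RR, OP_HALT };

typedef struct { uint8_t op, dst, src1, src2; } Instr;

int64_t run(const Instr *code, int64_t *regs) {
    for (;;) {
        Instr i = *code++;               /* one fetch/decode/dispatch... */
        switch (i.op) {
        case OP_ADD_RR:                  /* ...covers the whole c = a + b */
            regs[i.dst] = regs[i.src1] + regs[i.src2];
            break;
        case OP_HALT:
            return regs[0];
        }
    }
}

int main(void) {
    int64_t regs[4] = {0, 40, 2, 0};
    Instr code[] = { {OP_ADD_RR, 0, 1, 2}, {OP_HALT, 0, 0, 0} };
    printf("%lld\n", (long long)run(code, regs));  /* prints 42 */
    return 0;
}
```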
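On attribute lookups: below is a toy inline cache, a generic technique (not necessarily Falcon's exact mechanism, and all names here are made up). Each access site remembers the type and slot offset it saw last time; on a hit, a hash-table probe is replaced by one comparison and one load:

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

typedef struct { int type_id; double x, y; } Point;  /* stand-in object */

/* One cache per attribute-access site in the bytecode. */
typedef struct { int cached_type_id; size_t cached_offset; } AttrCache;

/* Slow path, standing in for the usual dict/hash lookup. */
static size_t slow_lookup(int type_id, const char *name) {
    (void)type_id;
    return strcmp(name, "x") == 0 ? offsetof(Point, x) : offsetof(Point, y);
}

static double get_attr(Point *obj, const char *name, AttrCache *cache) {
    if (cache->cached_type_id != obj->type_id) {       /* miss: fill cache */
        cache->cached_offset  = slow_lookup(obj->type_id, name);
        cache->cached_type_id = obj->type_id;
    }
    return *(double *)((char *)obj + cache->cached_offset);  /* hit: one load */
}

int main(void) {
    Point p = {1, 3.0, 4.0};
    AttrCache site_cache = {0, 0};
    printf("%g\n", get_attr(&p, "x", &site_cache));  /* miss, fills the cache */
    printf("%g\n", get_attr(&p, "x", &site_cache));  /* hit: no hash lookup */
    return 0;
}
```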
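And on bit-tagging: here is a minimal sketch of one common scheme (generic, not CPython's or Falcon's exact design). Heap pointers are word-aligned, so their low bit is always zero; setting it to one lets a small integer ride along directly in the "pointer", with no heap object at all:

```c
#include <stdio.h>
#include <stdint.h>

typedef uintptr_t Value;  /* either a real pointer or a tagged integer */

static Value    box_int(intptr_t i) { return ((uintptr_t)i << 1) | 1u; }
static intptr_t unbox_int(Value v)  { return (intptr_t)v >> 1; }  /* assumes arithmetic shift */
static int      is_int(Value v)     { return (v & 1u) != 0; }

int main(void) {
    Value a = box_int(40), b = box_int(2);
    /* One bit test replaces loading ob_type from the heap... */
    if (is_int(a) && is_int(b)) {
        /* ...and the add itself never touches memory at all. */
        printf("%ld\n", (long)(unbox_int(a) + unbox_int(b)));  /* prints 42 */
    }
    return 0;
}
```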
If you are interested in trying Falcon, the experimental interpreter we built to explore these questions, you can get it on GitHub. Let me know how it works for you!