Thursday, January 24, 2013

Just-in-time compilers for number crunching in Python

Python is an extremely popular language for number crunching and data analysis. For someone who isn't familiar with the scientific Python ecosystem this might be surprising, since Python is actually orders of magnitude slower at simple numerical operations than most lower-level languages. If you need to do some repetitive arithmetic on a large collection of numbers, then ideally those numbers would be stored contiguously in memory, loaded into registers in small groups, and acted upon by a small set of machine instructions. The Python "interpreter" (actually a stack-based virtual machine), however, uses a very bulky object representation. Furthermore, Python's dynamism introduces a lot of indirection around simple operations like getting the value of a field or multiplying two numbers. Every time you run some innocuous-looking code such as x[i] = math.sqrt(y[i] * z.imag), a shocking host of dictionary look-ups, allocations, and all-around wasteful computations kick into gear.
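To make that overhead a little more concrete, here is a minimal sketch using only the standard library. It counts the bytecode instructions the virtual machine steps through for the expression above, and compares a boxed float with a raw 8-byte double in a contiguous buffer. The variable names are mine, and the exact numbers vary between CPython versions and platforms.

```python
import dis
import sys
from array import array

# Compile the one-liner from the text and list the bytecode the VM will interpret.
code = compile("x[i] = math.sqrt(y[i] * z.imag)", "<example>", "exec")
instructions = list(dis.get_instructions(code))
print(len(instructions), "bytecode instructions for one assignment")

# Boxed representation: a list holds pointers to separately allocated float objects.
boxed = [float(i) for i in range(1000)]
# Unboxed representation: one contiguous buffer of raw C doubles.
unboxed = array("d", boxed)

bytes_per_boxed = sys.getsizeof(boxed[0])   # object header + payload
bytes_per_unboxed = unboxed.itemsize        # payload only
print(bytes_per_boxed, "bytes per boxed float vs", bytes_per_unboxed, "per raw double")
```

Each of those instructions may itself trigger attribute lookups and allocations, which is where the time goes.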

The trick, then, to getting good numerical performance from Python is to avoid really doing your work in Python. Instead, you should use Python's remarkable capacity as a glue language to coordinate calls between highly optimized lower-level numerical libraries. This is why a certain handsome astrophysicist called Python "the engine of modern science". NumPy plays an extremely important role in enabling this sticky style of programming by providing a high-level Pythonic interface to an unboxed array that can be passed easily into precompiled C and Fortran libraries.

In order to benefit from NumPy and its vast ecosystem, your algorithm must spend most of its time performing some common operation for which someone has already written an efficient library. Want to multiply some matrices? Great news, calling BLAS from Python isn't really much slower than doing it from C. Need to perform a large convolution? No problem, just hop on over to the frequency domain with a call to the always zippy FFTW.
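As a rough sketch of what "let a library do the heavy lifting" looks like in practice, the snippet below multiplies two matrices and performs a convolution in the frequency domain. It assumes a reasonably recent NumPy, and it uses NumPy's bundled `numpy.fft` rather than FFTW, so treat it as an illustration of the idea and not of any particular FFTW binding. The helper name `fft_convolve` is made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Matrix multiply: NumPy dispatches this to the BLAS it was built against.
a = rng.standard_normal((64, 32))
b = rng.standard_normal((32, 16))
c = a @ b

def fft_convolve(x, k):
    """Full linear convolution of two 1-D real arrays via the FFT."""
    n = len(x) + len(k) - 1                 # length of the full convolution
    spectrum = np.fft.rfft(x, n) * np.fft.rfft(k, n)
    return np.fft.irfft(spectrum, n)        # back to the time domain

signal = rng.standard_normal(1000)
kernel = rng.standard_normal(50)

fast = fft_convolve(signal, kernel)
direct = np.convolve(signal, kernel)        # reference result for comparison
print(np.allclose(fast, direct))
```

The Python code here only describes what to compute; all of the arithmetic happens inside compiled routines.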

But disaster and tribulation: What if no one has yet written a library that does the heavy lifting that I need? The standard solutions all boil down to "implement the bottleneck in C" (or if you're feeling enlightened, Cython).

Is a different way possible? Must we sacrifice all our abstractions to get performance? Even if we give up all the niceties of Python, we'll still probably churn out some fairly naive native code that woefully underutilizes our computers' capabilities. Think of all those pitifully empty vector registers, despondently idle extra cores, and a swarm of GPU shader units which haven't seen a general purpose computation in weeks. Harnessing all that parallelism from a low-level language requires, for most tasks, a heroic effort.

A worthy challenge is then issued in two parts:

  • Find a way to accelerate a meaningfully expressive subset of Python, such that it's possible to still use convenient abstractions without a large runtime cost. This generally implies a just-in-time compiler of some sort (though a few notable exceptions do compile Python statically). 
  • As long as we're dynamically translating high level abstractions into low-level executables, is there any chance that the "high level"-ness could be useful for parallelization?  It sure would be nice to use those other cores...

To be clear, I am not talking about speeding up all of Python, though some very smart and praiseworthy folks have been working on that for a while. Rather, the great thing about many numerically intensive algorithms is that they are remarkably simple. You might get away with using some subset of Python for implementing the core of your computation, and still feel like you are coding at a high level of abstraction (so long as the boundary between the numerical language subset and the rest of Python is mostly seamless).

A surprisingly large number of projects have already risen to meet this challenge. They (roughly) fall onto a spectrum which trades off between the freedom of the compiler to dramatically rewrite/optimize your code and the expressiveness of the sub-language that is exposed to the programmer.

  • NumPyPy - an attempt to reimplement all of NumPy in Python and then let PyPy do its meta-tracing magic. It seems to me that the all-or-nothing nature of PyPy's uncooperativeness with existing NumPy libraries makes this a Utopian misadventure in code duplication, requiring the reimplementation of a huge scientific computing code base with faint hope that a largely opaque general-purpose JIT can play the role of an optimizing compiler for scientific programs. Hopefully fijal will prove us detractors wrong.
  • Numba - one of the several cool projects Travis Oliphant has been cooking up since he started Continuum Analytics. For the most part, Numba's main purpose is to unbox numeric values and make looping fast in Python. It's still a work in progress and seems to be going in multiple directions at once. They're adding support for general-purpose Python constructs, but relying on the traditional Python runtime to implement anything non-numeric, which sequentializes their runtime due to the Global Interpreter Lock. To enable parallelism you can disavow using any constructs that rely on things Numba doesn't compile directly...but that requires that you know what those constructs are. Like I said, it's still evolving. The commercial version of Numba even touts some capacity for targeting GPUs, but I haven't used it and don't know what can actually get parallelized.
  • Blaze - another Travis Oliphant creation, though this one is even more ambitious than Numba. Whereas NumPy is a good abstraction for dense in-memory arrays with varying layouts, Blaze is intended to work with more complex data types and "is designed to handle out-of-core computations on large datasets that exceed the system memory capacity, as well as on distributed and streaming data". Travis is billing Blaze as the successor to NumPy. The underlying abstractions are to a large degree inspired by the Haskell library Repa 3, which is very cool and worth reading about. One key difference between Blaze and NumPy (aside from the much richer array type) is that Blaze delays array computations and then compiles them on-demand. I get the sense that Blaze is pretty far off from being ready for the masses, but I'm sure it will be Awesome Upon Arrival.  
  • Copperhead - Copperhead takes the direct route to parallelism by forcing you to write your code using data parallel operators which have clear compilation schemes onto multicore and GPU targets. To further simplify the compiler's job, Copperhead forces your code to be purely functional, which goes far against the grain of idiomatic Python. In exchange for these semantic handcuffs, you get some pretty speedy parallel programs. Unfortunately, the author Bryan Catanzaro has disappeared from GitHub, so I'm not sure if Copperhead is still being developed.
  • Theano - Theano is both more cumbersome and more honest than projects like Numba or Copperhead, which take code that looks like Python but then execute it under different assumptions/semantics. With Theano, on the other hand, you have to explicitly piece together symbolic expressions representing your computation. You're always aware that you're constructing Theano syntax explicitly. In exchange for your effort though, Theano can work small feats of magic. For example, Theano can group and reorganize matrix multiplications, reorder floating point operations for stability, and compute gradients using automatic differentiation. Their backend has some preliminary support for CUDA and should eventually add in multi-core and SIMD code generation. 
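To give a flavor of what "explicitly piecing together symbolic expressions" means, here is a toy sketch in plain Python. It is emphatically not Theano's API: the classes `Expr`, `Var`, `Const`, `Add`, and `Mul` are invented for illustration. The point is only that arithmetic on these objects builds a graph instead of computing a number, and that a graph can later be differentiated or evaluated.

```python
class Expr:
    """Base class for nodes of a tiny symbolic expression graph."""
    def __add__(self, other):
        return Add(self, as_expr(other))
    def __mul__(self, other):
        return Mul(self, as_expr(other))
    __radd__ = __add__   # addition and multiplication commute here
    __rmul__ = __mul__

class Const(Expr):
    def __init__(self, value):
        self.value = value
    def eval(self, env):
        return self.value
    def grad(self, var):
        return Const(0.0)

class Var(Expr):
    def __init__(self, name):
        self.name = name
    def eval(self, env):
        return env[self.name]
    def grad(self, var):
        return Const(1.0 if var is self else 0.0)

class Add(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b
    def eval(self, env):
        return self.a.eval(env) + self.b.eval(env)
    def grad(self, var):
        return self.a.grad(var) + self.b.grad(var)

class Mul(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b
    def eval(self, env):
        return self.a.eval(env) * self.b.eval(env)
    def grad(self, var):
        # Product rule, itself expressed as a new graph.
        return self.a.grad(var) * self.b + self.a * self.b.grad(var)

def as_expr(v):
    """Wrap plain numbers so they can appear in a graph."""
    return v if isinstance(v, Expr) else Const(float(v))

x, y = Var("x"), Var("y")
f = x * x * y + 3 * x            # builds a graph; nothing is computed yet
df_dx = f.grad(x)                # another graph, representing 2*x*y + 3

value = f.eval({"x": 2.0, "y": 5.0})
slope = df_dx.eval({"x": 2.0, "y": 5.0})
print(value, slope)              # 26.0 23.0
```

A real system would simplify and compile the graph before running it, which is where the reordering and grouping tricks mentioned above come in.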

Foot-in-mouth edit: I put PyPy all the way on the hopelessly sequential left of the diagram, just as they announced a new position to parallelize and vectorize their JIT. Also, fijal justifiably took offense at my description of NumPyPy. I was wrong in saying that NumPyPy is a whole-ecosystem rewrite; they're only going to rewrite the core and are still figuring out the right way to interact with native libraries.


To add another compiler-critter into the fray, I've written Parakeet, a just-in-time compiler for numerical Python which specializes functions for given input types. Parakeet makes extensive use of data parallel operators such as map, reduce, and (prefix) scan. It's not essential to use these operators when programming with Parakeet, but they do enable parallelism and more aggressive optimizations. Luckily, it's quite easy to end up using these operators by accident, since our library functions are implemented on top of them.
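For readers who haven't met these operators, here is a small sketch of map, reduce, and scan using only Python's standard library. This is not Parakeet's API, and nothing here runs in parallel. The hypothetical helper `chunked_reduce` just shows why an associative reduction is easy for a compiler to split across cores: each chunk can be reduced independently and the partial results combined afterwards.

```python
from functools import reduce
from itertools import accumulate
import operator

xs = [1, 2, 3, 4, 5]

# map: apply a function to every element independently.
squares = list(map(lambda x: x * x, xs))

# reduce: combine all elements with an associative operator.
total = reduce(operator.add, squares, 0)

# (prefix) scan: like reduce, but keep every intermediate result.
running = list(accumulate(squares, operator.add))

def chunked_reduce(op, values, identity, chunks=2):
    """Reduce each chunk separately, then combine the partial results.

    Correct for any associative op with the given identity; the per-chunk
    reductions are independent and so could run on different cores.
    """
    size = max(1, -(-len(values) // chunks))   # ceiling division
    partials = [reduce(op, values[i:i + size], identity)
                for i in range(0, len(values), size)]
    return reduce(op, partials, identity)

print(squares)   # [1, 4, 9, 16, 25]
print(total)     # 55
print(running)   # [1, 5, 14, 30, 55]
print(chunked_reduce(operator.add, squares, 0))   # 55
```

The same independence argument applies to map (every element is its own task) and, with a bit more bookkeeping, to scan.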

(edited to present Parakeet less sheepishly)
On the spectrum described above, Parakeet sits somewhere between Numba and Copperhead. Like Copperhead, Parakeet's subset of Python is limited to using a small set of data types, library functions and data parallel operators. On the other hand, unlike Copperhead, you don't have to program in a purely functional style: if you write loop-heavy numerical code you'll miss out on parallelization but will still see good single-core performance. The main difference from Numba is the absence of any sort of "object layer" which uses the Python C API. Parakeet will (in the long run) support a smaller, more numerically focused subset of Python for the purpose of giving the programmer a clear sense of what will run fast (and if a feature is slow, then Parakeet simply doesn't support it). Additionally, Parakeet's implementation of Python and NumPy library functions leans heavily on data parallel operators, which gives me hope for making pervasive use of GPUs and multi-core hardware.

If you want to learn more about Parakeet check out some of the following presentations:

  • HotPar 2012: We submitted a paper describing an old version of our compiler (written in OCaml with a fragile GPU backend).
  • SciPy 2013 Lightning Talk: A 5-minute overview of the rewritten Parakeet with an LLVM backend.
  • PyData Boston 2013: A longer presentation with more extensive comparison to Numba.

If you want to try using Parakeet, you can either install it via pip (pip install parakeet) or just clone the GitHub repo. Let me know how it goes!
