Accelerating Scientific Code with Numba

Graham Markall

Software Engineer, Continuum Analytics

graham.markall@continuum.io

@gmarkall

1

Tutorial setup

$ conda env create -f=environment.yml
$ source activate pyconuk-numba # Linux, OS X
$ activate pyconuk-numba        # Windows
$ ipython notebook
2

Hello! (about me)

3

Overview

  1. Overview of Numba / Numba basics
  2. Understanding Numba / Numba internals
  3. Using Numba in your code

Format:

  • Presentation
  • Tutorial exercises (exercises folder)
4

What is Numba? (1)

A tool that makes Python code go faster by specialising and compiling it.

5

What is Numba? (2)

6

Implementation overview

7

Compiler Overview

_images/compiler.png
8

Numba Overview

_images/numbacompiler.png
9

Mandelbrot, 20 iterations

CPython 1x
Numpy array-wide operations 13x
Numba (CPU) 120x
Numba (NVidia Tesla K20c) 2100x
_images/mandel.png
10

Other examples

See the example_codes directory, times in msec:

Example CPython Numba Speedup
Black-Scholes 969 433 2.2x
Check Neighbours 550 28 19.9x
IS Distance 372 70 5.4x
Pairwise 62 12 5.1x
11

Mandelbrot function

from numba import jit

@jit
def mandel(x, y, max_iters):
    c = complex(x,y)
    z = 0j
    for i in range(max_iters):
        z = z*z + c
        if z.real * z.real + z.imag * z.imag >= 4:
            return 255 * i // max_iters

    return 255
12

Supported Python Syntax

Inside functions decorated with @jit:

13

Unsupported Python Syntax

Also inside functions decorated with @jit:

Classes cannot be decorated with @jit.

14

Supported Python Features

15

Supported Python modules

Comprehensive list: http://numba.pydata.org/numba-doc/0.21.0/reference/pysupported.html

16

Supported Numpy features

17

Tutorial exercise 1.1

The jit decorator

$ conda env create -f=environment.yml
$ source activate pyconuk-numba # Linux, OS X
$ activate pyconuk-numba        # Windows
$ ipython notebook
18

Writing Ufuncs

@vectorize
def rel_diff(x, y):
    return 2 * (x - y) / (x + y)

Call:

a = np.arange(1000, dtype = float32)
b = a * 2 + 1
rel_diff(a, b)
19

Tutorial exercise 1.2

The vectorize decorator

20

Generalized Ufuncs

@guvectorize([(int64[:], int64[:], int64[:])], '(n),()->(n)')
def g(x, y, res):
    for i in range(x.shape[0]):
        res[i] = x[i] + y[0]
21

Layout examples

Matrix-vector products:

@guvectorize([(float64[:, :], float64[:], float64[:])],
              '(m,n),(n)->(m)')
def batch_matmul(M, v, y):
    pass # ...

Fixed outputs (e.g. max and min):

@guvectorize([(float64[:], float64[:], float64[:])],
              '(n)->(),()')
def max_min(arr, largest, smallest):
    pass # ...
22

Tutorial exercise 1.3

The guvectorize decorator

23

Understanding Numba / Numba Internals

24

Dispatch overhead

@jit
def add(a, b):
    return a + b

def add_python(a, b):
    return a + b
>>> %timeit add(1, 2)
10000000 loops, best of 3: 163 ns per loop

>>> %timeit add_python(1, 2)
10000000 loops, best of 3: 85.3 ns per loop
25

Dispatch process

Calling a @jit function:

  1. Lookup types of arguments
  2. Do any compiled versions match the types of these arguments?
  1. Yes: retrieve the compiled code from the cache
  2. No: compile a new specialisation
  1. Marshal arguments to native values
  2. Call the native code function
  3. Marshal the native return value to a Python value
26

Compilation pipeline

_images/archi2.png
27

Type Inference

def f(a, b):   # a:= float32, b:= float32
    c = a + b  # c:= float32
    return c   # return := float32
28

Type Unification

Example typing 1:

def select(a, b, c):  # a := float32, b := float32, c := bool
    if c:
        ret = a       # ret := float32
    else:
        ret = b       # ret := float32
    return ret       # return := {float32, float32}
                      #           => float32
29

Type Unification

Example typing 2:

def select(a, b, c):  # a := tuple(int32, int32), b := float32,
                      # c := bool
    if c:
        ret = a       # ret := tuple(int32, int32)
    else:
        ret = b       # ret := float32
    return ret       # return := {tuple(int32, int32), float32}
                      #           => XXX
30

Unification error

numba.typeinfer.TypingError: Failed at nopython (nopython frontend)
Var 'q1mq0t' unified to object:
    q1mq0t := {array(float64, 1d, C), float64}
if cond:
    q1mq0t = 6.0
else:
    q1mq0t = np.zeros(10)
31

Interpreting Type Errors

numba.typeinfer.TypingError: Failed at nopython (nopython frontend)
Undeclared getitem(float64, int64)
a = 10.0
a[0] = 2.0
32

Interpreting lowering errors

numba.lowering.LoweringError: Failed at nopython (nopython mode backend)
Internal error:
ValueError: '$0.22' is not a valid parameter name
File "blackscholes.py", line 34

Try commenting out code until the error goes away to figure out the source.

33

Broadcasting/slicing error

Possibly due to an operation on two different sliced/broadcasted arrays:

raise LoweringError(msg, inst.loc)
numba.lowering.LoweringError: Failed at nopython (nopython mode backend)
Internal error:
NotImplementedError: Don't know how to allocate array with layout 'A'.
File "is_distance_solution.py", line 34
numba.typeinfer.TypingError: Failed at nopython (nopython frontend)
Internal error at <numba.typeinfer.CallConstrain object at 0x7f1b3d9762e8>:
Don't know how to create implicit output array with 'A' layout.
File "pairwise.py", line 22
34

Treating array like a scalar

Another one, this time trying to check truth of an array:

Internal error:
NotImplementedError: ('is_true', <llvmlite.ir.instructions.LoadInstr object at 0x7f2c311ff860>, array(bool, 1d, C))
File "blackscholes_tutorial.py", line 26
File "blackscholes_tutorial.py", line 45
35

Inspecting pipeline stage output

36

Tutorial Exercise 2.1

Inspection

37

Modes of compilation

38

Tutorial exercise 2.2

Compilation modes

39

Loop lifting

@jit
def sum_strings(arr):
    intarr = np.empty(len(arr), dtype=np.int32)
    for i in range(len(arr)):
        intarr[i] = int(arr[i])
    sum = 0

    # Lifted loop
    for i in range(len(intarr)):
        sum += intarr[i]

     return sum
40

Tutorial Exercise 2.3

Loop Lifting

41

Using Numba "At Large"

42

Tips 0 - Profiling

_images/gprof2dot.png
43

Tips 1 - General Approach

44

Tips 2 - Don't Specify Types

@jit(float64(float64, float64))
def add(a, b):
    return a + b
45

Tips 3 - Optimisations

for i in range(len(X)):
    Y[i] = sin(X[i])
for i in range(len(Y)):
    Z[i] = Y[i] * Y[i]
  1. Loop fusion:
for i in range(len(X)):
    Y[i] = sin(X[i])
    Z[i] = Y[i] * Y[i]
  1. Array contraction:
for i in range(len(X)):
    Y = sin(X[i])
    Z[i] = Y * Y
46

Tips 4 - Debugging

47

Tips 5 - Releasing the GIL

@numba.jit(nogil=True)
def my_function(x, y, z):
    ...
48

Example codes

49

Example Optimisation Time

50

Future of Numba

Short- to medium-term roadmap:

51

Blog posts

52

More info / contributing

Repos, documentation, mailing list:

Commercial support: sales@continuum.io

53