Accelerating Scientific Code with Numba
Graham Markall
Software Engineer, Continuum Analytics
@gmarkall
$ conda env create -f=environment.yml
$ source activate pyconuk-numba # Linux, OS X
$ activate pyconuk-numba # Windows
$ ipython notebook
Format:
- Presentation
- Tutorial exercises (`exercises` folder)
A tool that makes Python code go faster by specialising and compiling it.
Implementation | Speedup |
---|---|
CPython | 1x |
Numpy array-wide operations | 13x |
Numba (CPU) | 120x |
Numba (NVidia Tesla K20c) | 2100x |

See the `example_codes` directory, times in msec:
Example | CPython | Numba | Speedup |
---|---|---|---|
Black-Scholes | 969 | 433 | 2.2x |
Check Neighbours | 550 | 28 | 19.9x |
IS Distance | 372 | 70 | 5.4x |
Pairwise | 62 | 12 | 5.1x |
```python
from numba import jit

@jit
def mandel(x, y, max_iters):
    c = complex(x, y)
    z = 0j
    for i in range(max_iters):
        z = z*z + c
        if z.real * z.real + z.imag * z.imag >= 4:
            return 255 * i // max_iters
    return 255
```
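As a usage sketch, the kernel can be mapped over a grid of points (this driver is an illustration, not from the talk; it falls back to uncompiled Python if Numba is absent):

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    jit = lambda f: f    # fallback: run uncompiled

@jit
def mandel(x, y, max_iters):
    c = complex(x, y)
    z = 0j
    for i in range(max_iters):
        z = z*z + c
        if z.real * z.real + z.imag * z.imag >= 4:
            return 255 * i // max_iters
    return 255

# Render a small grid: one escape-iteration value per pixel
image = np.empty((64, 64), dtype=np.uint8)
xs = np.linspace(-2.0, 1.0, 64)
ys = np.linspace(-1.5, 1.5, 64)
for j, y in enumerate(ys):
    for i, x in enumerate(xs):
        image[j, i] = mandel(x, y, 20)
```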
Supported inside functions decorated with @jit (note: classes themselves cannot be decorated with @jit):
Types:
- int, bool, float, complex
- tuple, list, None
- bytes, bytearray, memoryview (and other buffer-like objects)
Built-in functions:
- abs, enumerate, len, min, max, print, range, round, zip
Standard library:
- cmath, math, random, ctypes...
Third-party:
- cffi, numpy
Comprehensive list: http://numba.pydata.org/numba-doc/0.21.0/reference/pysupported.html
Also inside functions decorated with @jit, NumPy support includes:
- All kinds of arrays: scalar and structured types
  - except when containing Python objects
- Allocation, iterating, indexing, slicing
- Reductions: argmax(), max(), prod(), etc.
- Scalar types and values (including datetime64 and timedelta64)
- Array expressions, but no broadcasting

See the reference manual: http://numba.pydata.org/numba-doc/0.21.0/reference/numpysupported.html
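A minimal sketch exercising several of the supported features inside one jitted function (allocation, iteration, indexing, and accumulation); the fallback keeps it runnable without Numba:

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    jit = lambda f: f    # fallback: run uncompiled

@jit
def moving_sum(a):
    # np.empty allocation, range iteration and indexing are
    # all supported inside @jit
    out = np.empty(a.shape[0], dtype=np.float64)
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i]
        out[i] = total
    return out

result = moving_sum(np.array([1.0, 2.0, 3.0]))
```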
The jit decorator
```python
from numba import vectorize

@vectorize
def rel_diff(x, y):
    return 2 * (x - y) / (x + y)
```

Call:

```python
a = np.arange(1000, dtype=np.float32)
b = a * 2 + 1
rel_diff(a, b)
```
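The decorated function behaves like a NumPy ufunc, applying elementwise across arrays. A runnable sketch (the fallback is an assumption for environments without Numba; plain NumPy arithmetic is already elementwise, so the identity decorator preserves behaviour):

```python
import numpy as np

try:
    from numba import vectorize
except ImportError:
    vectorize = lambda f: f   # fallback: NumPy ops are elementwise anyway

@vectorize
def rel_diff(x, y):
    return 2 * (x - y) / (x + y)

a = np.arange(1.0, 5.0, dtype=np.float32)   # avoid 0/0 at x == y == 0
b = a * 2 + 1
r = rel_diff(a, b)
```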
The vectorize decorator
```python
from numba import guvectorize, int64

@guvectorize([(int64[:], int64[:], int64[:])], '(n),()->(n)')
def g(x, y, res):
    for i in range(x.shape[0]):
        res[i] = x[i] + y[0]
```

In the layout `(n),()->(n)`, the symbols before `->` are inputs (not allocated); those after it are outputs (allocated by Numba).

Matrix-vector products:
```python
@guvectorize([(float64[:, :], float64[:], float64[:])],
             '(m,n),(n)->(m)')
def batch_matmul(M, v, y):
    pass # ...
```

Fixed outputs (e.g. max and min):

```python
@guvectorize([(float64[:], float64[:], float64[:])],
             '(n)->(),()')
def max_min(arr, largest, smallest):
    pass # ...
```
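One possible completion of the matrix-vector kernel above (a sketch: the core is shown as a plain function so it runs without Numba; wrapping it with @guvectorize as in the stub adds broadcasting over leading batch dimensions):

```python
import numpy as np

def batch_matmul_kernel(M, v, y):
    # y[m] = sum over n of M[m, n] * v[n]; the output array y is
    # preallocated by the caller (or by guvectorize)
    for m in range(M.shape[0]):
        acc = 0.0
        for n in range(M.shape[1]):
            acc += M[m, n] * v[n]
        y[m] = acc

M = np.arange(6.0).reshape(2, 3)
v = np.array([1.0, 0.0, 2.0])
y = np.empty(2)
batch_matmul_kernel(M, v, y)
```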
The guvectorize decorator
```python
@jit
def add(a, b):
    return a + b

def add_python(a, b):
    return a + b
```

```
>>> %timeit add(1, 2)
10000000 loops, best of 3: 163 ns per loop
>>> %timeit add_python(1, 2)
10000000 loops, best of 3: 85.3 ns per loop
```

For trivial functions like this, per-call dispatch overhead dominates, so the compiled version is slower than plain Python.
Calling a @jit function, Numba checks whether a specialisation already exists for the argument types:
- Yes: retrieve the compiled code from the cache
- No: compile a new specialisation
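The cache-by-argument-types idea can be illustrated with a tiny pure-Python dispatcher (an illustration of the concept only, not Numba's implementation):

```python
def specialising(func):
    cache = {}    # maps argument-type tuples to "specialisations"

    def wrapper(*args):
        key = tuple(type(a) for a in args)
        if key not in cache:
            # Stand-in for compilation: record a new specialisation
            cache[key] = func
        return cache[key](*args)

    wrapper.cache = cache
    return wrapper

@specialising
def add(a, b):
    return a + b

add(1, 2)        # creates an (int, int) specialisation
add(1.0, 2.0)    # creates a second, (float, float) specialisation
add(3, 4)        # cache hit: reuses the (int, int) specialisation
```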
```python
def f(a, b):      # a := float32, b := float32
    c = a + b     # c := float32
    return c      # return := float32
```
Example typing 1:
```python
def select(a, b, c):  # a := float32, b := float32, c := bool
    if c:
        ret = a       # ret := float32
    else:
        ret = b       # ret := float32
    return ret        # return := {float32, float32}
                      #        => float32
```
Example typing 2:
```python
def select(a, b, c):  # a := tuple(int32, int32), b := float32,
                      # c := bool
    if c:
        ret = a       # ret := tuple(int32, int32)
    else:
        ret = b       # ret := float32
    return ret        # return := {tuple(int32, int32), float32}
                      #        => XXX
```
```
numba.typeinfer.TypingError: Failed at nopython (nopython frontend)
Var 'q1mq0t' unified to object:
q1mq0t := {array(float64, 1d, C), float64}
```

Caused by:

```python
if cond:
    q1mq0t = 6.0
else:
    q1mq0t = np.zeros(10)
```
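The fix is to give the variable one consistent type on both branches, for example by making both assign an array (a sketch):

```python
import numpy as np

def f(cond):
    # Both branches now assign a float64 array,
    # so the types unify cleanly
    if cond:
        q1mq0t = np.full(10, 6.0)
    else:
        q1mq0t = np.zeros(10)
    return q1mq0t
```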
```
numba.typeinfer.TypingError: Failed at nopython (nopython frontend)
Undeclared getitem(float64, int64)
```

Caused by indexing a scalar:

```python
a = 10.0
a[0] = 2.0
```
```
numba.lowering.LoweringError: Failed at nopython (nopython mode backend)
Internal error:
ValueError: '$0.22' is not a valid parameter name
File "blackscholes.py", line 34
```

Try commenting out code until the error goes away to figure out the source.

Possibly due to an operation on two different sliced/broadcasted arrays:

```
raise LoweringError(msg, inst.loc)
numba.lowering.LoweringError: Failed at nopython (nopython mode backend)
Internal error:
NotImplementedError: Don't know how to allocate array with layout 'A'.
File "is_distance_solution.py", line 34
```
```
numba.typeinfer.TypingError: Failed at nopython (nopython frontend)
Internal error at <numba.typeinfer.CallConstrain object at 0x7f1b3d9762e8>:
Don't know how to create implicit output array with 'A' layout.
File "pairwise.py", line 22
```

Another one, this time trying to check the truth of an array:

```
Internal error:
NotImplementedError: ('is_true', <llvmlite.ir.instructions.LoadInstr object at 0x7f2c311ff860>, array(bool, 1d, C))
File "blackscholes_tutorial.py", line 26
File "blackscholes_tutorial.py", line 45
```
Set NUMBA_DEBUG=1 in the environment for verbose output from the compilation pipeline.
Inspection
Compilation modes
```python
@jit
def sum_strings(arr):
    intarr = np.empty(len(arr), dtype=np.int32)
    for i in range(len(arr)):
        intarr[i] = int(arr[i])

    sum = 0
    # Lifted loop
    for i in range(len(intarr)):
        sum += intarr[i]

    return sum
```
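The same idea applied manually: keep the object-mode work (string conversion) in plain Python and compile only the numeric loop. A sketch that falls back to uncompiled Python if Numba is unavailable:

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    jit = lambda f: f    # fallback: run uncompiled

@jit
def sum_ints(intarr):
    # Pure numeric loop: compilable in nopython mode
    total = 0
    for i in range(len(intarr)):
        total += intarr[i]
    return total

def sum_strings(arr):
    # String handling stays in regular Python
    intarr = np.array([int(s) for s in arr], dtype=np.int32)
    return sum_ints(intarr)

result = sum_strings(["1", "2", "3"])
```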
Loop Lifting
- Start off with just jitting it and see if it runs
- Use numba --annotate-html to see what Numba sees
- Start adding nopython=True to your innermost functions
- Try to fix each function and then move on
  - Need to make sure all inputs and outputs are Numba-compatible types
  - No lists, dicts, etc.
- Don't forget to assess performance at each stage
```python
@jit(float64(float64, float64))
def add(a, b):
    return a + b
```

Explicit signatures like float64(float64, float64) are probably unnecessary!

Fusing loops avoids writing the intermediate array twice:

```python
for i in range(len(X)):
    Y[i] = sin(X[i])
for i in range(len(Y)):
    Z[i] = Y[i] * Y[i]
```

```python
for i in range(len(X)):
    Y[i] = sin(X[i])
    Z[i] = Y[i] * Y[i]
```

Using a scalar temporary removes the intermediate array entirely:

```python
for i in range(len(X)):
    Y = sin(X[i])
    Z[i] = Y * Y
```
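All the variants compute the same Z; a runnable comparison of the first and last (a sketch using NumPy and math.sin):

```python
import math
import numpy as np

X = np.linspace(0.0, 1.0, 5)

# Variant 1: two loops with an intermediate array Y
Y = np.empty_like(X)
Z1 = np.empty_like(X)
for i in range(len(X)):
    Y[i] = math.sin(X[i])
for i in range(len(Y)):
    Z1[i] = Y[i] * Y[i]

# Fused variant: scalar temporary, no intermediate array at all
Z3 = np.empty_like(X)
for i in range(len(X)):
    y = math.sin(X[i])
    Z3[i] = y * y
```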
Set NUMBA_DISABLE_JIT=1 to disable compilation.

Releasing the GIL:

```python
@numba.jit(nogil=True)
def my_function(x, y, z):
    ...
```
See examples/nogil.py in the Numba distribution.

The tutorial exercises all have timing and testing:
- Set up so you can modify one of the implementations to try Numba and go fast
- Some taken from examples, some found on the internet (see references in source)
- Example solutions in the same folder
Short- to medium-term roadmap:
- The @ (matrix multiplication) operator
- @vectorize(target='cuda')
Repos, documentation, mailing list:
Commercial support: sales@continuum.io