Supported Python features in CUDA Python

This page lists the Python features supported in CUDA Python. This includes all kernel and device functions compiled with @cuda.jit and other higher-level Numba decorators that target the CUDA GPU.

Language

Execution Model

CUDA Python maps directly to the single-instruction multiple-thread (SIMT) execution model of CUDA. Each instruction is implicitly executed by multiple threads in parallel. With this execution model, array expressions are less useful because we don't want multiple threads to perform the same task. Instead, we want threads to cooperate, with each one handling its own share of the work.

For details please consult the CUDA Programming Guide.

Constructs

The following Python constructs are not supported:

  • Exception handling (try .. except, try .. finally)

  • Context management (the with statement)

  • Comprehensions (list, dict, set, or generator comprehensions)

  • Generators (any yield statement)

The raise statement is supported.

The assert statement is supported, but only has an effect when debug=True is passed to the numba.cuda.jit() decorator. This is similar to the behavior of the assert keyword in CUDA C/C++, which is ignored unless compiling with device debug turned on.

Printing of strings, integers, and floats is supported, but printing is an asynchronous operation: to ensure that all output is printed after a kernel launch, it is necessary to call numba.cuda.synchronize(). Eliding the call to synchronize is acceptable, but output from a kernel may then appear during later driver operations (e.g. subsequent kernel launches or memory transfers), or may fail to appear before program execution completes.

Built-in types

Support for the following built-in types is inherited from CPU nopython mode.

  • int

  • float

  • complex

  • bool

  • None

  • tuple

See nopython built-in types.

Built-in functions

The following built-in functions are supported:

Standard library modules

NumPy support

Due to the CUDA programming model, dynamic memory allocation inside a kernel is inefficient and is often not needed. Numba therefore disallows all memory-allocating features, which disables a large number of NumPy APIs. For best performance, users should write code in which each thread deals with a single element at a time.

Supported NumPy features:

  • accessing ndarray attributes .shape, .strides, .ndim, .size, etc.

  • scalar ufuncs that have equivalents in the math module, e.g. np.sin(x[0]), where x is a 1D array.

  • indexing and slicing.

Unsupported NumPy features:

  • array creation APIs.

  • array methods.

  • functions that return a new array.