Memory management
Data transfer
Even though Numba can automatically transfer NumPy arrays to the device, it can only do so conservatively, by always transferring device memory back to the host when a kernel finishes. To avoid unnecessary transfers of read-only arrays, you can use the following APIs to control transfers manually:
- numba.roc.device_array(shape, dtype=np.float, strides=None, order='C')
  Allocate an empty device ndarray. Similar to numpy.empty().
- numba.roc.device_array_like(ary)
  Call roc.device_array() with the shape, dtype and strides of ary.
- numba.roc.to_device(obj, context, copy=True, to=None)
  Allocate and transfer a NumPy ndarray or structured scalar to the device.
  To copy a NumPy array from host to device:
      ary = numpy.arange(10)
      d_ary = roc.to_device(ary)
  The resulting d_ary is a DeviceNDArray.
  To copy from device to host:
      hary = d_ary.copy_to_host()
  To copy from device to host into an existing array:
      ary = numpy.empty(shape=d_ary.shape, dtype=d_ary.dtype)
      d_ary.copy_to_host(ary)
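To make the cost argument above concrete, here is a simplified back-of-the-envelope model, not a measurement. It assumes that automatic transfer copies every array to the device before each launch and back afterwards, while the manual approach uploads read-only inputs once with to_device, allocates the output with device_array (no upload), and downloads the result once with copy_to_host. The helper functions are hypothetical and only count bytes; no GPU is involved.

```python
import numpy as np

def bytes_moved_automatic(arrays, n_launches):
    # Simplified model: each launch copies every array host->device and device->host.
    return n_launches * sum(2 * a.nbytes for a in arrays)

def bytes_moved_manual(inputs, outputs):
    # Simplified model: inputs uploaded once (to_device); outputs allocated on the
    # device (device_array) and downloaded once (copy_to_host).
    return sum(a.nbytes for a in inputs) + sum(a.nbytes for a in outputs)

a = np.zeros(1_000_000)    # float64: 8,000,000 bytes
b = np.zeros(1_000_000)
out = np.zeros(1_000_000)

auto = bytes_moved_automatic([a, b, out], n_launches=10)
manual = bytes_moved_manual([a, b], [out])
print(auto, manual)  # 480000000 24000000
```

Under these assumptions the saving grows linearly with the number of launches, because the manual scheme's cost does not depend on it.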
Device arrays
Device array references have the following methods. These methods are to be called in host code, not within ROC-jitted functions.
- class numba.roc.hsadrv.devicearray.DeviceNDArray(shape, strides, dtype, dgpu_data=None)
  An on-dGPU array type.
- copy_to_host(ary=None, stream=None)
  Copy self to ary, or create a new NumPy ndarray if ary is None.
  The transfer is synchronous: the function returns after the copy is finished.
  Always returns the host array.
  Example:
      import numpy as np
      from numba import roc

      arr = np.arange(1000)
      d_arr = roc.to_device(arr)
      # my_kernel is assumed to be a kernel compiled with @roc.jit (not shown)
      my_kernel[100, 100](d_arr)
      result_array = d_arr.copy_to_host()
- is_c_contiguous()
  Return True if the array is C-contiguous.
- is_f_contiguous()
  Return True if the array is Fortran-contiguous.
- ravel(order='C')
  Flatten the array without changing its contents, similar to numpy.ndarray.ravel().
- reshape(*newshape, **kws)
  Reshape the array without changing its contents, similar to numpy.ndarray.reshape(). Example:
      d_arr = d_arr.reshape(20, 50, order='F')
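The ravel, reshape and contiguity methods above are documented as similar to their NumPy counterparts. Because running ROC code requires an HSA-capable AMD GPU, the sketch below illustrates the semantics on a host NumPy array instead, assuming that a DeviceNDArray follows the same rules and that is_c_contiguous()/is_f_contiguous() correspond to NumPy's C_CONTIGUOUS/F_CONTIGUOUS flags.

```python
import numpy as np

arr = np.arange(1000)

# Same call as the reshape example above, applied to a host array.
f_arr = arr.reshape(20, 50, order='F')

# Fortran order: f_arr[i, j] == arr[i + 20 * j]
print(f_arr[1, 0], f_arr[0, 1])  # 1 20

# Host counterparts of is_c_contiguous() / is_f_contiguous()
print(f_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])  # False True

# Flattening in the matching order recovers the original data
print(np.array_equal(f_arr.ravel(order='F'), arr))  # True
```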
Data Registration
The CPU and GPU do not share the same main memory; however, it is recommended to register memory allocations with the HSA runtime as a performance optimisation hint.
- roc.register(*arrays)
  Register every given array. The function can be used in a with-context for automatic deregistration:
      array_a = numpy.arange(10)
      array_b = numpy.arange(10)
      with roc.register(array_a, array_b):
          some_hsa_code(array_a, array_b)
- roc.deregister(*arrays)
  Deregister every given array.
Streams
- numba.roc.stream()
ROC streams have the following methods:
- class numba.roc.hsadrv.driver.Stream
  An asynchronous stream for the async API.
- auto_synchronize()
  A context manager that waits for all commands in this stream to execute and commits any pending memory transfers upon exiting the context.
- synchronize()
  Synchronize the stream.
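The difference between synchronize() and auto_synchronize() can be illustrated with a toy, CPU-only model of an in-order stream. ToyStream and its enqueue() method are hypothetical and do not touch the GPU: commands run asynchronously on a single worker thread, synchronize() blocks until all of them have finished, and auto_synchronize() calls synchronize() when the with-block exits. With a real ROC stream, the enqueued work would be the asynchronous operations issued on that stream.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from contextlib import contextmanager

class ToyStream:
    """Hypothetical in-order command queue modelling a stream on the CPU."""

    def __init__(self):
        # A single worker thread runs commands one at a time, in submission order.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def enqueue(self, fn, *args):
        # Returns immediately; fn(*args) runs later on the worker thread.
        self._pending.append(self._executor.submit(fn, *args))

    def synchronize(self):
        # Block until every enqueued command has finished.
        pending, self._pending = self._pending, []
        wait(pending)
        for future in pending:
            future.result()  # re-raise an exception raised by a command, if any

    @contextmanager
    def auto_synchronize(self):
        # Wait for all enqueued commands when the with-block exits.
        try:
            yield self
        finally:
            self.synchronize()

results = []
stream = ToyStream()
with stream.auto_synchronize():
    for i in range(5):
        stream.enqueue(results.append, i)
print(results)  # [0, 1, 2, 3, 4]
```

Inside the with-block, results may still be incomplete; only after the block exits is every command guaranteed to have run.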