Release Notes

Version 0.52.0 (14 October, 2020)

This release focuses on performance improvements, but also adds some new features and contains numerous bug fixes and stability improvements.

Highlights of core performance improvements include:

Intel kindly sponsored research and development into producing a new reference count pruning pass. This pass operates at the LLVM level and can prune a number of common reference counting patterns. This will improve performance for two primary reasons:
- There will be less pressure on the atomic locks used to do the reference counting.
- Removal of reference counting operations permits more inlining and the optimisation passes can in general do more with what is present.
(Siu Kwan Lam).
Intel also sponsored work to improve the performance of the numba.typed.List container, particularly in the case of __getitem__ and iteration (Stuart Archibald).
Superword-level parallelism vectorization is now switched on and the optimisation pipeline has been lightly analysed and tuned so as to be able to vectorize more and more often (Stuart Archibald).
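As a rough illustration of the typed-list operations mentioned above (iteration and __getitem__), the sketch below sums a numba.typed.List inside a jitted function. The function and variable names are made up for this example. The try/except fallback to a plain Python list is an assumption added only so the snippet still runs where Numba is not installed; it demonstrates the usage pattern, not the size of any speed-up.

```python
try:
    from numba import njit
    from numba.typed import List
except ImportError:
    # Assumption for portability: without Numba, use plain Python stand-ins.
    List = list

    def njit(func):
        return func


@njit
def sum_items(items):
    """Add up all items, then add the first item once more."""
    total = 0
    for value in items:          # iteration over the typed list
        total += value
    return total + items[0]      # __getitem__ on the typed list


numbers = List()
for i in range(1, 11):           # 1, 2, ..., 10
    numbers.append(i)

result = sum_items(numbers)
print(result)
```

The first call includes compilation time, so any timing comparison should be made on a second call.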

Highlights of core feature changes include:

The inspect_cfg method on the JIT dispatcher object has been significantly enhanced and now includes highlighted output and interleaved line markers and Python source (Stuart Archibald).
The BSD operating system is now unofficially supported (Stuart Archibald).
Numerous features/functionality improvements to NumPy support, including support for:
- np.asfarray (Guilherme Leobas).
- “subtyping” in record arrays (Lucio Fernandez-Arjona).
- np.split and np.array_split (Isaac Virshup).
- operator.contains with ndarray (@mugoh).
- np.asarray_chkfinite (Rishabh Varshney).
- NumPy 1.19 (Stuart Archibald).
- the ndarray allocators, empty, ones and zeros, accepting a dtype specified as a string literal (Stuart Archibald).
Booleans are now supported as literal types (Alexey Kozlov).
On the CUDA target:
- CUDA 9.0 is now the minimum supported version (Graham Markall).
- Support for Unified Memory has been added (Max Katz).
- Kernel launch overhead is reduced (Graham Markall).
- Cudasim support for mapped array, memcopies and memset has been added (Mike Williams).
- Access has been wired in to all libdevice functions (Graham Markall).
- Additional CUDA atomic operations have been added (Michael Collison).
- Additional math library functions (frexp, ldexp, isfinite) have been added (Zhihao Yuan).
- Support for power on complex numbers has been added (Graham Markall).
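A few of the NumPy additions listed above can be tried together in a short sketch: np.array_split, the `in` operator on an ndarray, and a dtype given as a string literal. The helper names (chunk_sizes, contains) are hypothetical, and the fallback decorator is an assumption so the snippet also runs as plain NumPy code when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for portability: run as ordinary Python if Numba is absent.
    def njit(func):
        return func


@njit
def chunk_sizes(values, parts):
    """Split `values` into `parts` chunks and return the size of each chunk."""
    chunks = np.array_split(values, parts)
    sizes = np.zeros(parts, dtype="int64")   # dtype given as a string literal
    for i in range(parts):
        sizes[i] = chunks[i].size
    return sizes


@njit
def contains(values, needle):
    """Membership test on an ndarray (operator.contains)."""
    return needle in values


data = np.arange(10)
sizes = chunk_sizes(data, 3)
found = contains(data, 7)
missing = contains(data, 42)
print(sizes, found, missing)
```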

Deprecations to note:

There are no new deprecations. However, note that “compatibility” mode, which was added some 40 releases ago to help transition from 0.11 to 0.12+, has been removed! Also, the shim to permit the import of jitclass from Numba’s top level namespace has now been removed as per the deprecation schedule.
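For code that still imports jitclass from the top-level namespace, the sketch below imports it from numba.experimental instead, and also uses the annotation-based spec inference listed under PR #5609. The Counter class is a made-up example, and the fallback decorator is an assumption so the snippet runs as a plain Python class when Numba is not installed.

```python
try:
    # jitclass lives in numba.experimental; the top-level shim is gone.
    from numba.experimental import jitclass
except ImportError:
    # Assumption for portability: leave the class undecorated without Numba.
    def jitclass(cls):
        return cls


@jitclass
class Counter:
    # The field type is taken from the annotation instead of an explicit spec.
    count: int

    def __init__(self, start):
        self.count = start

    def increment(self):
        self.count += 1
        return self.count


counter = Counter(3)
first = counter.increment()
second = counter.increment()
print(first, second)
```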

General Enhancements:

PR #5418: Add np.asfarray impl (Guilherme Leobas)
PR #5560: Record subtyping (Lucio Fernandez-Arjona)
PR #5609: Jitclass Infer Spec from Type Annotations (Ethan Pronovost)
PR #5699: Implement np.split and np.array_split (Isaac Virshup)
PR #6015: Adding BooleanLiteral type (Alexey Kozlov)
PR #6027: Support operators inlining in InlineOverloads (Alexey Kozlov)
PR #6038: Closes #6037, fixing FreeBSD compilation (László Károlyi)
PR #6086: Add more accessible version information (Stuart Archibald)
PR #6157: Add pipeline_class argument to @cfunc as supported by @jit. (Arthur Peters)
PR #6262: Support dtype from str literal. (Stuart Archibald)
PR #6271: Support ndarray contains (@mugoh)
PR #6295: Enhance inspect_cfg (Stuart Archibald)
PR #6304: Support NumPy 1.19 (Stuart Archibald)
PR #6309: Add suitable file search path for BSDs. (Stuart Archibald)
PR #6341: Re roll 6279 (Rishabh Varshney and Valentin Haenel)

Performance Enhancements:

PR #6145: Patch to fingerprint namedtuples. (Stuart Archibald)
PR #6202: Speed up str(int) (Stuart Archibald)
PR #6261: Add np.ndarray.ptp() support. (Stuart Archibald)
PR #6266: Use custom LLVM refcount pruning pass (Siu Kwan Lam)
PR #6275: Switch on SLP vectorize. (Stuart Archibald)
PR #6278: Improve typed list performance. (Stuart Archibald)
PR #6335: Split optimisation passes. (Stuart Archibald)
PR #6455: Fix refprune on obfuscated refs and stabilize optimisation WRT wrappers. (Stuart Archibald)

Fixes:

PR #5639: Make UnicodeType inherit from Hashable (Stuart Archibald)
PR #6006: Resolves incorrectly hoisted list in parfor. (Todd A. Anderson)
PR #6126: fix version_info if version can not be determined (Valentin Haenel)
PR #6137: Remove references to Python 2’s long (Eric Wieser)
PR #6139: Use direct syntax instead of the add_metaclass decorator (Eric Wieser)
PR #6140: Replace calls to utils.iteritems(d) with d.items() (Eric Wieser)
PR #6141: Fix #6130 objmode cache segfault (Siu Kwan Lam)
PR #6156: Remove callers of reraise in favor of using with_traceback directly (Eric Wieser)
PR #6162: Move charseq support out of init (Stuart Archibald)
PR #6165: #5425 continued (Amos Bird and Stuart Archibald)
PR #6166: Remove Python 2 compatibility from numba.core.utils (Eric Wieser)
PR #6185: Better error message on NotDefinedError (Luiz Almeida)
PR #6194: Remove recursion from traverse_types (Radu Popovici)
PR #6200: Workaround #5973 (Stuart Archibald)
PR #6203: Make find_callname only lookup functions that are likely part of NumPy. (Stuart Archibald)
PR #6204: Fix unicode kind selection for getitem. (Stuart Archibald)
PR #6206: Build all extension modules with -g -Wall -Werror on Linux x86, provide -O0 flag option (Graham Markall)
PR #6212: Fix for objmode recompilation issue (Alexey Kozlov)
PR #6213: Fix #6177. Remove AOT dependency on the Numba package (Siu Kwan Lam)
PR #6224: Add support for tuple concatenation to array analysis. (#5396 continued) (Todd A. Anderson)
PR #6231: Remove compatibility mode (Graham Markall)
PR #6254: Fix win-32 hashing bug (from Stuart Archibald) (Ray Donnelly)
PR #6265: Fix #6260 (Stuart Archibald)
PR #6267: speed up a couple of really slow unittests (Stuart Archibald)
PR #6281: Remove numba.jitclass shim as per deprecation schedule. (Stuart Archibald)
PR #6294: Make return type propagate to all return variables (Andreas Sodeur)
PR #6300: Un-skip tests that were skipped because of #4026. (Owen Anderson)
PR #6307: Remove restrictions on SVML version due to bug in LLVM SVML CC (Stuart Archibald)
PR #6316: Make IR inliner tests not self mutating. (Stuart Archibald)
PR #6318: PR #5892 continued (Todd A. Anderson, via Stuart Archibald)
PR #6319: Permit switching off boundschecking when debug is on. (Stuart Archibald)
PR #6324: PR 6208 continued (Ivan Butygin and Stuart Archibald)
PR #6337: Implements key on types.TypeRef (Andreas Sodeur)
PR #6354: Bump llvmlite to 0.35. series. (Stuart Archibald)
PR #6357: Fix enumerate invalid decref (Siu Kwan Lam)
PR #6359: Fixes typed list indexing on 32bit (Stuart Archibald)
PR #6378: Fix incorrect CPU override in vectorization test. (Stuart Archibald)
PR #6379: Use O0 to enable inline and not affect loop-vectorization by later O3… (Siu Kwan Lam)
PR #6384: Fix failing tests to match on platform invariant int spelling. (Stuart Archibald)
PR #6390: Updates inspect_cfg (Stuart Archibald)
PR #6396: Remove hard dependency on tbb package. (Stuart Archibald)
PR #6408: Don’t do array analysis for tuples that contain arrays. (Todd A. Anderson)
PR #6441: Fix ASCII flag in Unicode slicing (0.52.0rc2 regression) (Ehsan Totoni)
PR #6442: Fix array analysis regression in 0.52 RC2 for tuple of 1D arrays (Ehsan Totoni)
PR #6446: Fix #6444: pruner issues with reference stealing functions (Siu Kwan Lam)
PR #6450: Fix asfarray kwarg default handling. (Stuart Archibald)
PR #6486: fix abstract base class import (Valentin Haenel)
PR #6487: Restrict maximum version of python (Siu Kwan Lam)

CUDA Enhancements/Fixes:

PR #5465: Remove macro expansion and replace uses with FE typing + BE lowering (Graham Markall)
PR #5741: CUDA: Add two-argument implementation of round() (Graham Markall)
PR #5900: Enable CUDA Unified Memory (Max Katz)
PR #6042: CUDA: Lower launch overhead by launching kernel directly (Graham Markall)
PR #6064: Lower math.frexp and math.ldexp in numba.cuda (Zhihao Yuan)
PR #6066: Lower math.isfinite in numba.cuda (Zhihao Yuan)
PR #6092: CUDA: Add mapped_array_like and pinned_array_like (Graham Markall)
PR #6127: Fix race in reduction kernels on Volta, require CUDA 9, add syncwarp with default mask (Graham Markall)
PR #6129: Extend Cudasim to support most of the memory functionality. (Mike Williams)
PR #6150: CUDA: Turn on flake8 for cudadrv and fix errors (Graham Markall)
PR #6152: CUDA: Provide wrappers for all libdevice functions, and fix typing of math function (#4618) (Graham Markall)
PR #6227: Raise exception when no supported architectures are found (Jacob Tomlinson)
PR #6244: CUDA Docs: Make workflow using simulator more explicit (Graham Markall)
PR #6248: Add support for CUDA atomic subtract operations (Michael Collison)
PR #6289: Refactor atomic test cases to reduce code duplication (Michael Collison)
PR #6290: CUDA: Add support for complex power (Graham Markall)
PR #6296: Fix flake8 violations in numba.cuda module (Graham Markall)
PR #6297: Fix flake8 violations in numba.cuda.tests.cudapy module (Graham Markall)
PR #6298: Fix flake8 violations in numba.cuda.tests.cudadrv (Graham Markall)
PR #6299: Fix flake8 violations in numba.cuda.simulator (Graham Markall)
PR #6306: Fix flake8 in cuda atomic test from merge. (Stuart Archibald)
PR #6325: Refactor code for atomic operations (Michael Collison)
PR #6329: Flake8 fix for a CUDA test (Stuart Archibald)
PR #6331: Explicitly state that NUMBA_ENABLE_CUDASIM needs to be set before import (Graham Markall)
PR #6340: CUDA: Fix #6339, performance regression launching specialized kernels (Graham Markall)
PR #6380: Only test managed allocations on Linux (Graham Markall)

Documentation Updates:

PR #6090: doc: Add doc on direct creation of Numba typed-list (@rht)
PR #6110: Update CONTRIBUTING.md (Stuart Archibald)
PR #6128: CUDA Docs: Restore Dispatcher.forall() docs (Graham Markall)
PR #6277: fix: cross2d wrong doc. reference (issue #6276) (@jeertmans)
PR #6282: Remove docs on Python 2(.7) EOL. (Stuart Archibald)
PR #6283: Add note on how public CI is impl and what users can do to help. (Stuart Archibald)
PR #6292: Document support for structured array attribute access (Graham Markall)
PR #6310: Declare unofficial *BSD support (Stuart Archibald)
PR #6342: Fix docs on literally usage. (Stuart Archibald)
PR #6348: doc: fix typo in jitclass.rst (“initilising” -> “initialising”) (@muxator)
PR #6362: Move llvmlite support in README to 0.35 (Stuart Archibald)
PR #6363: Note that reference counted types are not permitted in set(). (Stuart Archibald)
PR #6364: Move deprecation schedules for 0.52 (Stuart Archibald)

CI/Infrastructure Updates:

PR #6252: Show channel URLs (Siu Kwan Lam)
PR #6338: Direct user questions to Discourse instead of the Google Group. (Stan Seibert)
PR #6474: Add skip on PPC64LE for tests causing SIGABRT in LLVM. (Stuart Archibald)

Authors:

Alexey Kozlov
Amos Bird
Andreas Sodeur
Arthur Peters
Ehsan Totoni (core dev)
Eric Wieser
Ethan Pronovost
Graham Markall
Guilherme Leobas
Isaac Virshup
Ivan Butygin
Jacob Tomlinson
Luiz Almeida
László Károlyi
Lucio Fernandez-Arjona
Max Katz
Michael Collison
Mike Williams
Owen Anderson
Radu Popovici
Ray Donnelly
Rishabh Varshney
Siu Kwan Lam (core dev)
Stan Seibert (core dev)
Stuart Archibald (core dev)
Todd A. Anderson (core dev)
Valentin Haenel (core dev)
Zhihao Yuan
@jeertmans
@mugoh
@muxator
@rht

Version 0.51.2 (September 2, 2020)

This is a bugfix release for 0.51.1. It fixes a critical performance bug in the CFG back edge computation algorithm that leads to exponential time complexity arising in compilation for use cases with certain pathological properties.

PR #6195: PR 6187 Continue. Don’t visit already checked successors

Authors:

Graham Markall
Siu Kwan Lam (core dev)

Version 0.51.1 (August 26, 2020)

This is a bugfix release for 0.51.0. It fixes a critical bug in caching and another critical bug in the CUDA target initialisation sequence, and also fixes some compile time performance regressions:

PR #6141: Fix #6130 objmode cache segfault
PR #6146: Fix compilation slowdown due to controlflow analysis
PR #6147: CUDA: Don’t make a runtime call on import
PR #6153: Fix for #6151. Make UnicodeCharSeq into str for comparison.
PR #6168: Fix Issue #6167: Failure in test_cuda_submodules

Authors:

Graham Markall
Siu Kwan Lam (core dev)
Stuart Archibald (core dev)

Version 0.51.0 (August 12, 2020)

This release continues to add new features to Numba and also contains a significant number of bug fixes and stability improvements.

Highlights of core feature changes include:

The compilation chain is now based on LLVM 10 (Valentin Haenel).
Numba has internally switched to prefer non-literal types over literal ones so as to reduce function over-specialisation, with a view to speeding up compile times (Siu Kwan Lam).
On the CUDA target (mostly Graham Markall; callbacks on streams by Peter Würtz):
- Support for CUDA Toolkit 11, Ampere, and Compute Capability 8.0.
- Printing of SASS code for kernels.
- Callbacks to Python functions can be inserted into CUDA streams, and streams are async awaitable.
- Atomic nanmin and nanmax functions are added.
- Fixes for various miscompilations and segfaults.

Intel also kindly sponsored research and development that led to some exciting new features:

Support for heterogeneous immutable lists and heterogeneous immutable string key dictionaries. Also optional initial/construction value capturing for all lists and dictionaries containing literal values (Stuart Archibald).
A new pass-by-reference mutable structure extension type StructRef (Siu Kwan Lam).
Object mode blocks are now cacheable, with the side effect of numerous bug fixes and performance improvements in caching. This also permits caching of functions defined in closures (Siu Kwan Lam).

Deprecations to note:

To align with other targets, the argtypes and restypes kwargs to @cuda.jit are now deprecated, as is the bind kwarg. Further, the target kwarg to the numba.jit decorator family is deprecated.

General Enhancements:

PR #5463: Add str(int) impl
PR #5526: Impl. np.asarray(literal)
PR #5619: Add support for multi-output ufuncs
PR #5711: Division with timedelta input
PR #5763: Support minlength argument to np.bincount
PR #5779: Return zero array from np.dot when the arguments are empty.
PR #5796: Add implementation for np.positive
PR #5849: Setitem for records when index is StringLiteral, including literal unroll
PR #5856: Add support for conversion of inplace_binop to parfor.
PR #5893: Allocate 1D iteration space one at a time for more even distribution.
PR #5922: Reduce objmode and unpickling overhead
PR #5944: re-enable OpenMP in wheels
PR #5946: Implement literal dictionaries and lists.
PR #5956: Update numba_sysinfo.py
PR #5978: Add structref as a mutable struct that is pass-by-ref
PR #5980: Deprecate target kwarg for numba.jit.
PR #6058: Add prefer_literal option to overload API
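As a small illustration of one entry in this list, the str(int) implementation from PR #5463 lets jitted code build strings from integers. The function name below is hypothetical, and the fallback decorator is an assumption so the sketch also runs as plain Python when Numba is not installed.

```python
try:
    from numba import njit
except ImportError:
    # Assumption for portability: run as ordinary Python if Numba is absent.
    def njit(func):
        return func


@njit
def make_label(index):
    """Build a text label from an integer inside compiled code."""
    return "row-" + str(index)


label = make_label(42)
print(label)
```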

Fixes:

PR #5674: Fix #3955. Allow with objmode to be cached
PR #5724: Initialize process lock lazily to prevent multiprocessing issue
PR #5783: Make np.divide and np.remainder code more similar
PR #5808: Fix 5665 Block jit(nopython=True, forceobj=True) and suppress njit(forceobj=True)
PR #5834: Fix the is operator on Ellipsis
PR #5838: Ensure Dispatcher.__eq__ always returns a bool
PR #5841: cleanup: Use PythonAPI.bool_from_bool in more places
PR #5862: Do not leak loop iteration variables into the numba.np.npyimpl namespace
PR #5869: Update repomap
PR #5879: Fix erroneous input mutation in linalg routines
PR #5882: Type check function in jit decorator
PR #5925: Use np.inf and -np.inf for max and min float values respectively.
PR #5935: Fix default arguments with multiprocessing
PR #5952: Fix “Internal error … local variable ‘errstr’ referenced before assignment during BoundFunction(…)”
PR #5962: Fix SVML tests with LLVM 10 and AVX512
PR #5972: fix flake8 for numba/runtests.py
PR #5995: Update setup.py with new llvmlite versions
PR #5996: Set lower bound for llvmlite to 0.33
PR #6004: Fix problem in branch pruning with LiteralStrKeyDict
PR #6017: Fixing up numba_do_raise
PR #6028: Fix #6023
PR #6031: Continue 5821
PR #6035: Fix overspecialize of literal
PR #6046: Fixes statement reordering bug in maximize fusion step.
PR #6056: Fix issue on invalid inlining of non-empty build_list by inline_arraycall
PR #6057: fix aarch64/python_3.8 failure on master
PR #6070: Fix overspecialized containers
PR #6071: Remove f-strings in setup.py
PR #6072: Fix for #6005
PR #6073: Fixes invalid C prototype in helper function.
PR #6078: Duplicate NumPy’s PyArray_DescrCheck macro
PR #6081: Fix issue with cross drive use and relpath.
PR #6083: Fix bug in initial value unify.
PR #6087: remove invalid sanity check from randrange tests
PR #6089: Fix invalid reference to TypingError
PR #6097: Add function code and closure bytes into cache key
PR #6099: Restrict upper limit of TBB version due to ABI changes.
PR #6101: Restrict lower limit of icc_rt version due to assumed SVML bug.
PR #6107: Fix and test #6095
PR #6109: Fixes an issue reported in #6094
PR #6111: Decouple LiteralList and LiteralStrKeyDict from tuple
PR #6116: Fix #6102. Problem with non-unique label.

CUDA Enhancements/Fixes:

PR #5359: Remove special-casing of 0d arrays
PR #5709: CUDA: Refactoring of cuda.jit and kernel / dispatcher abstractions
PR #5732: CUDA Docs: document forall method of kernels
PR #5745: CUDA stream callbacks and async awaitable streams
PR #5761: Add implementation for int types for isnan and isinf for CUDA
PR #5819: Add support for CUDA 11 and Ampere / CC 8.0
PR #5826: CUDA: Add function to get SASS for kernels
PR #5846: CUDA: Allow disabling NVVM optimizations, and fix debug issues
PR #5851: CUDA EMM enhancements - add default get_ipc_handle implementation, skip a test conditionally
PR #5852: CUDA: Fix cuda.test()
PR #5857: CUDA docs: Add notes on resetting the EMM plugin
PR #5859: CUDA: Fix reduce docs and style improvements
PR #6016: Fixes change of list spelling in a cuda test.
PR #6020: CUDA: Fix #5820, adding atomic nanmin / nanmax
PR #6030: CUDA: Don’t optimize IR before sending it to NVVM
PR #6052: Fix dtype for atomic_add_double testsuite
PR #6080: CUDA: Prevent auto-upgrade of atomic intrinsics
PR #6123: Fix #6121

Documentation Updates:

PR #5782: Host docs on Read the Docs
PR #5830: doc: Mention that caching uses pickle
PR #5963: Fix broken link to numpy ufunc signature docs
PR #5975: restructure communication section
PR #5981: Document bounds-checking behavior in python deviations page
PR #5993: Docs for structref
PR #6008: Small fix so bullet points are rendered by sphinx
PR #6013: emphasize cuda kernel functions are asynchronous
PR #6036: Update deprecation doc from numba.errors to numba.core.errors
PR #6062: Change references to numba.pydata.org to https

CI updates:

PR #5850: Updates the “New Issue” behaviour to better redirect users.
PR #5940: Add discourse badge
PR #5960: Setting mypy on CI

Enhancements from user contributed PRs (with thanks!):

Aisha Tammy added the ability to switch off TBB support at compile time in #5821 (continued in #6031 by Stuart Archibald).
Alexander Stiebing fixed a reference before assignment bug in #5952.
Alexey Kozlov fixed a bug in tuple getitem for literals in #6028.
Andrew Eckart updated the repomap in #5869, added support for Read the Docs in #5782, fixed a bug in the np.dot implementation to correctly handle empty arrays in #5779 and added support for minlength to np.bincount in #5763.
@bitsisbits updated numba_sysinfo.py to handle HSA agents correctly in #5956.
Daichi Suzuo fixed a bug in the threading backend initialisation sequence such that it is now correctly a lazy lock in #5724.
Eric Wieser contributed a number of patches, particularly in enhancing and improving the ufunc capabilities:
- #5359: Remove special-casing of 0d arrays
- #5834: Fix the is operator on Ellipsis
- #5619: Add support for multi-output ufuncs
- #5841: cleanup: Use PythonAPI.bool_from_bool in more places
- #5862: Do not leak loop iteration variables into the numba.np.npyimpl namespace
- #5838: Ensure Dispatcher.__eq__ always returns a bool
- #5830: doc: Mention that caching uses pickle
- #5783: Make np.divide and np.remainder code more similar
Ethan Pronovost added a guard to prevent the common mistake of applying a jit decorator to the same function twice in #5881.
Graham Markall contributed many patches to the CUDA target, as follows:
- #6052: Fix dtype for atomic_add_double tests
- #6030: CUDA: Don’t optimize IR before sending it to NVVM
- #5846: CUDA: Allow disabling NVVM optimizations, and fix debug issues
- #5826: CUDA: Add function to get SASS for kernels
- #5851: CUDA EMM enhancements - add default get_ipc_handle implementation, skip a test conditionally
- #5709: CUDA: Refactoring of cuda.jit and kernel / dispatcher abstractions
- #5819: Add support for CUDA 11 and Ampere / CC 8.0
- #6020: CUDA: Fix #5820, adding atomic nanmin / nanmax
- #5857: CUDA docs: Add notes on resetting the EMM plugin
- #5859: CUDA: Fix reduce docs and style improvements
- #5852: CUDA: Fix cuda.test()
- #5732: CUDA Docs: document forall method of kernels
Guilherme Leobas added support for str(int) in #5463 and np.asarray(literal value) in #5526.
Hameer Abbasi deprecated the target kwarg for numba.jit in #5980.
Hannes Pahl added a badge to the Numba GitHub page linking to the new Discourse forum in #5940 and also fixed a bug that permitted illegal combinations of flags to be passed into @jit in #5808.
Kayran Schmidt emphasized that CUDA kernel functions are asynchronous in the documentation in #6013.
Leonardo Uieda fixed a broken link to the NumPy ufunc signature docs in #5963.
Lucio Fernandez-Arjona added mypy to CI and started adding type annotations to the code base in #5960, also fixed a (de)serialization problem on the dispatcher in #5935, improved the undefined variable error message in #5876, added support for division with timedelta input in #5711 and implemented setitem for records when the index is a StringLiteral in #5849.
Ludovic Tiako documented Numba’s bounds-checking behavior in the python deviations page in #5981.
Matt Roeschke changed all http references to https in #6062.
@niteya-shah implemented isnan and isinf for integer types on the CUDA target in #5761 and implemented np.positive in #5796.
Peter Würtz added CUDA stream callbacks and async awaitable streams in #5745.
@rht fixed an invalid import referred to in the deprecation documentation in #6036.
Sergey Pokhodenko updated the SVML tests for LLVM 10 in #5962.
Shyam Saladi fixed a Sphinx rendering bug in #6008.

Authors:

Aisha Tammy
Alexander Stiebing
Alexey Kozlov
Andrew Eckart
@bitsisbits
Daichi Suzuo
Eric Wieser
Ethan Pronovost
Graham Markall
Guilherme Leobas
Hameer Abbasi
Hannes Pahl
Kayran Schmidt
Leonardo Uieda
Lucio Fernandez-Arjona
Ludovic Tiako
Matt Roeschke
@niteya-shah
Peter Würtz
Sergey Pokhodenko
Shyam Saladi
@rht
Siu Kwan Lam (core dev)
Stuart Archibald (core dev)
Todd A. Anderson (core dev)
Valentin Haenel (core dev)

Version 0.50.1 (Jun 24, 2020)

This is a bugfix release for 0.50.0. It fixes a critical bug in error reporting and a number of other smaller issues:

PR #5861: Added except for possible Windows get_terminal_size exception
PR #5876: Improve undefined variable error message
PR #5884: Update the deprecation notices for 0.50.1
PR #5889: Fixes literally not forcing re-dispatch for inline=’always’
PR #5912: Fix bad attr access on certain typing templates breaking exceptions.
PR #5918: Fix cuda test due to #5876

Authors:

@pepping_dore
Lucio Fernandez-Arjona
Siu Kwan Lam (core dev)
Stuart Archibald (core dev)

Version 0.50.0 (Jun 10, 2020)

This is a more usual release in comparison to the others that have been made in the last six months. It comprises the result of a number of maintenance tasks along with some new features and a lot of bug fixes.

Highlights of core feature changes include:

The compilation chain is now based on LLVM 9.
The error handling and reporting system has been improved to reduce the size of error messages, and also improve quality and specificity.
The CUDA target has more stream constructors available and a new function for compiling to PTX without linking and loading the code to a device. Further, the macro-based system for describing CUDA threads and blocks has been replaced with standard typing and lowering implementations, for improved debugging and extensibility.

IMPORTANT: The backwards compatibility shim that was present in 0.49.x to accommodate the refactoring of Numba’s internals has been removed. If a module is imported from a moved location, an ImportError will occur.

General Enhancements:

PR #5060: Enables np.sum for timedelta64
PR #5225: Adjust interpreter to make conditionals predicates via bool() call.
PR #5506: Jitclass static methods
PR #5580: Revert shim
PR #5591: Fix #5525 Add figure for total memory to numba -s output.
PR #5616: Simplify the ufunc kernel registration
PR #5617: Remove /examples from the Numba repo.
PR #5673: Fix inliners to run all passes on IR and clean up correctly.
PR #5700: Make it easier to understand type inference: add SSA dump, use for DEBUG_TYPEINFER
PR #5702: Fixes for LLVM 9
PR #5722: Improve error messages.
PR #5758: Support NumPy 1.18

Fixes:

PR #5390: add error handling for lookup_module
PR #5464: Jitclass drops annotations to avoid error
PR #5478: Fix #5471. Issue with omitted type not recognized as literal value.
PR #5517: Fix numba.typed.List extend for singleton and empty iterable
PR #5549: Check type getitem
PR #5568: Add skip to entrypoint test on windows
PR #5581: Revert #5568
PR #5602: Fix segfault caused by pop from numba.typed.List
PR #5645: Fix SSA redundant CFG computation
PR #5686: Fix issue with SSA not minimal
PR #5689: Fix bug in unified_function_type (issue 5685)
PR #5694: Skip part of slice array analysis if any part is not analyzable.
PR #5697: Fix usedef issue with parfor loopnest variables.
PR #5705: A fix for cases where SSA looks like a reduction variable.
PR #5714: Fix bug in test
PR #5717: Initialise Numba extensions ahead of any compilation starting.
PR #5721: Fix array iterator layout.
PR #5738: Unbreak master on buildfarm
PR #5757: Force LLVM to use ZMM registers for vectorization.
PR #5764: fix flake8 errors
PR #5768: Interval example: fix import
PR #5781: Moving record array examples to a test module
PR #5791: Fix up no cgroups problem
PR #5795: Restore refct removal pass and make it strict
PR #5807: Skip failing test on POWER8 due to PPC CTR Loop problem.
PR #5812: Fix side issue from #5792, @overload inliner cached IR being mutated.
PR #5815: Pin llvmlite to 0.33
PR #5833: Fixes the source location appearing incorrectly in error messages.

CUDA Enhancements/Fixes:

PR #5347: CUDA: Provide more stream constructors
PR #5388: CUDA: Fix OOB write in test_round{f4,f8}
PR #5437: Fix #5429: Exception using .get_ipc_handle(...) on array from as_cuda_array(...)
PR #5481: CUDA: Replace macros with typing and lowering implementations
PR #5556: CUDA: Make atomic semantics match Python / NumPy, and fix #5458
PR #5558: CUDA: Only release primary ctx if retained
PR #5561: CUDA: Add function for compiling to PTX (+ other small fixes)
PR #5573: CUDA: Skip tests under cuda-memcheck that hang it
PR #5578: Implement math.modf for CUDA target
PR #5704: CUDA Eager compilation: Fix max_registers kwarg
PR #5718: CUDA lib path tests: unset CUDA_PATH when CUDA_HOME unset
PR #5800: Fix LLVM 9 IR for NVVM
PR #5803: CUDA Update expected error messages to fix #5797

Documentation Updates:

PR #5546: DOC: Add documentation about cost model to inlining notes.
PR #5653: Update doc with respect to try-finally case

Enhancements from user contributed PRs (with thanks!):

Elias Kuthe fixed an issue with imports in the Interval example in #5768.
Eric Wieser simplified the ufunc kernel registration mechanism in #5616.
Ethan Pronovost patched a problem with __annotations__ in jitclass in #5464, fixed a bug that led to infinite loops in Numba’s Type.__getitem__ in #5549, fixed a bug in np.arange testing in #5714 and added support for @staticmethod to jitclass in #5506.
Gabriele Gemmi implemented math.modf for the CUDA target in #5578.
Graham Markall contributed many patches, largely to the CUDA target, as follows:
- #5347: CUDA: Provide more stream constructors
- #5388: CUDA: Fix OOB write in test_round{f4,f8}
- #5437: Fix #5429: Exception using .get_ipc_handle(...) on array from as_cuda_array(...)
- #5481: CUDA: Replace macros with typing and lowering implementations
- #5556: CUDA: Make atomic semantics match Python / NumPy, and fix #5458
- #5558: CUDA: Only release primary ctx if retained
- #5561: CUDA: Add function for compiling to PTX (+ other small fixes)
- #5573: CUDA: Skip tests under cuda-memcheck that hang it
- #5648: Unset the memory manager after EMM Plugin tests
- #5700: Make it easier to understand type inference: add SSA dump, use for DEBUG_TYPEINFER
- #5704: CUDA Eager compilation: Fix max_registers kwarg
- #5718: CUDA lib path tests: unset CUDA_PATH when CUDA_HOME unset
- #5800: Fix LLVM 9 IR for NVVM
- #5803: CUDA Update expected error messages to fix #5797
Guilherme Leobas updated the documentation surrounding try-finally in #5653.
Hameer Abbasi added documentation about the cost model to the notes on inlining in #5546.
Jacques Gaudin rewrote numba -s to produce and consume a dictionary of output about the current system in #5591.
James Bourbeau updated min/argmin and max/argmax to handle non-leading nans (via #5758).
Lucio Fernandez-Arjona moved the record array examples to a test module in #5781 and added np.timedelta64 handling to np.sum in #5060.
Pearu Peterson fixed a bug in unified_function_type in #5689.
Sergey Pokhodenko fixed an issue impacting LLVM 10 regarding vectorization widths on Intel SkyLake processors in #5757.
Shan Sikdar added error handling for lookup_module in #5390.
@toddrme2178 added CI testing for NumPy 1.18 (via #5758).

Authors:

Elias Kuthe
Eric Wieser
Ethan Pronovost
Gabriele Gemmi
Graham Markall
Guilherme Leobas
Hameer Abbasi
Jacques Gaudin
James Bourbeau
Lucio Fernandez-Arjona
Pearu Peterson
Sergey Pokhodenko
Shan Sikdar
Siu Kwan Lam (core dev)
Stuart Archibald (core dev)
Todd A. Anderson (core dev)
@toddrme2178
Valentin Haenel (core dev)

Version 0.49.1 (May 7, 2020)

This is a bugfix release for 0.49.0. It fixes some residual issues with SSA form, a critical bug in the branch pruning logic and a number of other smaller issues:

PR #5587: Fixed #5586 Threading Implementation Typos
PR #5592: Fixes #5583 Remove references to cffi_support from docs and examples
PR #5614: Fix invalid type in resolve for comparison expr in parfors.
PR #5624: Fix erroneous rewrite of predicate to bit const on prune.
PR #5627: Fixes #5623, SSA local def scan based on invalid equality assumption.
PR #5629: Fixes naming error in array_exprs
PR #5630: Fix #5570. Incorrect race variable detection due to SSA naming.
PR #5638: Make literal_unroll function work as a freevar.
PR #5648: Unset the memory manager after EMM Plugin tests
PR #5651: Fix some SSA issues
PR #5652: Pin to sphinx=2.4.4 to avoid problem with C declaration
PR #5658: Fix unifying undefined first class function types issue
PR #5669: Update example in 5m guide WRT SSA type stability.
PR #5676: Restore numba.types as public API

Authors:

Graham Markall
Juan Manuel Cruz Martinez
Pearu Peterson
Sean Law
Stuart Archibald (core dev)
Siu Kwan Lam (core dev)

Version 0.49.0 (Apr 16, 2020)

This release is very large in terms of code changes. Large scale removal of unsupported Python and NumPy versions has taken place along with a significant amount of refactoring to simplify the Numba code base to make it easier for contributors. Numba’s intermediate representation has also undergone some important changes to solve a number of long standing issues. In addition some new features have been added and a large number of bugs have been fixed!

IMPORTANT: In this release Numba’s internals have moved about a lot. A backwards compatibility “shim” is provided for this release so as to not immediately break projects using Numba’s internals. If a module is imported from a moved location the shim will issue a deprecation warning and suggest how to update the import statement for the new location. The shim will be removed in 0.50.0!
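As a rough illustration of the idea, a compatibility shim of this kind can be sketched with the standard library alone. This is not Numba's actual implementation: the `MovedModule` class and the module names `oldpkg_ir` and `newpkg.core.ir` are made up for the example.

```python
import sys
import types
import warnings


class MovedModule(types.ModuleType):
    """Hypothetical shim left at an old import path; forwards to the new module."""

    def __init__(self, old_name, new_name, target):
        super().__init__(old_name)
        self._new_name = new_name
        self._target = target

    def __getattr__(self, attr):
        # Reached only when normal lookup fails, i.e. for the moved contents.
        if attr.startswith("__"):
            raise AttributeError(attr)
        warnings.warn(
            f"{self.__name__} has moved; import from {self._new_name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(self._target, attr)


# Made-up module names, purely for illustration.
new_module = types.ModuleType("newpkg.core.ir")
new_module.answer = 42
sys.modules["oldpkg_ir"] = MovedModule("oldpkg_ir", "newpkg.core.ir", new_module)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    import oldpkg_ir  # resolved from sys.modules, so the shim is returned

    value = oldpkg_ir.answer  # forwarded to the new module, with a warning

messages = [str(w.message) for w in caught if w.category is DeprecationWarning]
```

Old imports keep working for one release while the warning text tells the user which import statement to change.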

Highlights of core feature changes include:

Removal of all Python 2 related code and also updating the minimum supported Python version to 3.6, the minimum supported NumPy version to 1.15 and the minimum supported SciPy version to 1.0. (Stuart Archibald).
Refactoring of the Numba code base. The code is now organised into submodules by functionality. This cleans up Numba’s top level namespace. (Stuart Archibald).
Introduction of an ir.Del free static single assignment form for Numba’s intermediate representation (Siu Kwan Lam and Stuart Archibald).
An OpenMP-like thread masking API has been added for use with code using the parallel CPU backends (Aaron Meurer and Stuart Archibald).
For the CUDA target, all kernel launches now require a configuration, which prevents accidental launches of kernels with the old default of a single thread in a single block. The hard-coded autotuner has also been removed; such tuning is deferred to CUDA API calls that provide the same functionality (Graham Markall).
The CUDA target also gained an External Memory Management plugin interface to allow Numba to use another CUDA-aware library for all memory allocations and deallocations (Graham Markall).
The Numba Typed List container gained support for construction from iterables (Valentin Haenel).
Experimental support was added for first-class function types (Pearu Peterson).

Enhancements from user contributed PRs (with thanks!):

Aaron Meurer added support for thread masking at runtime in #4615.
Andreas Sodeur fixed a long standing bug that was preventing cProfile from working with Numba JIT compiled functions in #4476.
Arik Funke fixed error messages in test_array_reductions (#5278), fixed an issue with test discovery (#5239), made it so the documentation would build again on Windows (#5453) and fixed a nested list problem in the docs in #5489.
Antonio Russo fixed a SyntaxWarning in #5252.
Eric Wieser added support for inferring the types of object arrays (#5348) and iterating over 2D arrays (#5115), and also fixed some compiler warnings due to missing (void) in #5222. He also helped improve the “shim” and associated warnings in #5485, #5488, #5498 and partly #5532.
Ethan Pronovost fixed a problem with the shim erroneously warning for jitclass use in #5454 and also prevented illegal return values in jitclass __init__ in #5505.
Gabriel Majeri added SciPy 2019 talks to the docs in #5106.
Graham Markall changed the Numba HTML documentation theme to resolve a number of long standing issues in #5346. Also contributed were a large number of CUDA enhancements and fixes, namely:
- #5519: CUDA: Silence the test suite - Fix #4809, remove autojit, delete prints
- #5443: Fix #5196: Docs: assert in CUDA only enabled for debug
- #5436: Fix #5408: test_set_registers_57 fails on Maxwell
- #5423: Fix #5421: Add notes on printing in CUDA kernels
- #5400: Fix #4954, and some other small CUDA testsuite fixes
- #5328: NBEP 7: External Memory Management Plugin Interface
- #5144: Fix #4875: Make #2655 test with debug expect to pass
- #5323: Document lifetime semantics of CUDA Array Interface
- #5061: Prevent kernel launch with no configuration, remove autotuner
- #5099: Fix #5073: Slices of dynamic shared memory all alias
- #5136: CUDA: Enable asynchronous operations on the default stream
- #5085: Support other itemsizes with view
- #5059: Docs: Explain how to use Memcheck with Numba, fixups in CUDA documentation
- #4957: Add notes on overwriting gufunc inputs to docs
Greg Jennings fixed an issue with np.random.choice not acknowledging the RNG seed correctly in #3897/#5310.
Guilherme Leobas added support for np.isnat in #5293.
Henry Schreiner made the llvmlite requirements more explicit in requirements.txt in #5150.
Ivan Butygin helped fix an issue with parfors sequential lowering in #5114/#5250.
Jacques Gaudin fixed a bug for Python >= 3.8 in numba -s in #5548.
Jim Pivarski added some hints for debugging entry points in #5280.
John Kirkham added numpy.dtype coercion for the dtype argument to CUDA device arrays in #5253.
Leo Fang added a list of libraries that support __cuda_array_interface__ in #5104.
Lucio Fernandez-Arjona added getitem for the NumPy record type when the index is a StringLiteral type in #5182 and improved the documentation rendering via additions to the TOC and removal of numbering in #5450.
Mads R. B. Kristensen fixed an issue with __cuda_array_interface__ not requiring the context in #5189.
Marcin Tolysz added support for nested modules in AOT compilation in #5174.
Mike Williams fixed some issues with NumPy records and getitem in the CUDA simulator in #5343.
Pearu Peterson added experimental support for first-class function types in #5287 (and fixes in #5459, #5473/#5429, and #5557).
Ravi Teja Gutta added support for np.flip in #4376/#5313.
Rohit Sanjay fixed an issue with type refinement for unicode input supplied to typed-list extend() (#5295) and fixed unicode .strip() to strip all whitespace characters in #5213.
Vladimir Lukyanov fixed an awkward bug in typed.dict in #5361, added a fix to ensure the LLVM and assembly dumps are highlighted correctly in #5357 and implemented a Numba IR Lexer and added highlighting to Numba IR dumps in #5333.
hdf fixed an issue with the boundscheck flag in the CUDA jit target in #5257.

General Enhancements:

PR #4615: Allow masking threads out at runtime
PR #4798: Add branch pruning based on raw predicates.
PR #5115: Add support for iterating over 2D arrays
PR #5117: Implement ord()/chr()
PR #5122: Remove Python 2.
PR #5127: Calling convention adaptor for boxer/unboxer to call jitcode
PR #5151: implement None-typed typed-list
PR #5174: Nested modules https://github.com/numba/numba/issues/4739
PR #5182: Add getitem for Record type when index is StringLiteral
PR #5185: extract code-gen utilities from closures
PR #5197: Refactor Numba, part I
PR #5210: Remove more unsupported Python versions from build tooling.
PR #5212: Adds support for viewing the CFG of the ELF disassembly.
PR #5227: Immutable typed-list
PR #5231: Added support for np.asarray to be used with numba.typed.List
PR #5235: Added property dtype to numba.typed.List
PR #5272: Refactor parfor: split up ParforPass
PR #5281: Make IR ir.Del free until legalized.
PR #5287: First-class function type
PR #5293: np.isnat
PR #5294: Create typed-list from iterable
PR #5295: refine typed-list on unicode input to extend
PR #5296: Refactor parfor: better exception from passes
PR #5308: Provide numba.extending.is_jitted
PR #5320: refactor array_analysis
PR #5325: Let literal_unroll accept types.Named*Tuple
PR #5330: refactor common operation in parfor lowering into a new util
PR #5333: Add: highlight Numba IR dump
PR #5342: Support for tuples passed to parfors.
PR #5348: Add support for inferring the types of object arrays
PR #5351: SSA again
PR #5352: Add shim to accommodate refactoring.
PR #5356: implement allocated parameter in njit
PR #5369: Make test ordering more consistent across feature availability
PR #5428: Wip/deprecate jitclass location
PR #5441: Additional changes to first class function
PR #5455: Move to llvmlite 0.32.*
PR #5457: implement repr for untyped lists
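A few of the entries above (ord()/chr(), typed-list construction from an iterable, np.asarray on a typed list) can be sketched as follows. The function names are illustrative, and the fallback branch is an assumption added only so the sketch also runs as plain Python where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
    from numba.typed import List
except ImportError:  # fall back to plain Python so the sketch still runs
    List = list

    def njit(*args, **kwargs):
        return args[0] if args and callable(args[0]) else (lambda f: f)


@njit
def shift_char(c, k):
    # ord()/chr() inside jitted code (PR #5117).
    return chr(ord(c) + k)


@njit
def to_array(lst):
    # np.asarray applied to a typed list (PR #5231).
    return np.asarray(lst)


numbers = List([3, 1, 2])  # built directly from an iterable (PR #5294)
arr = to_array(numbers)
shifted = shift_char("a", 1)
```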

Fixes:

PR #4476: Another attempt at fixing frame injection in the dispatcher tracing path
PR #4942: Prevent some parfor aliasing. Rename copied function var to prevent recursive type locking.
PR #5092: Fix 5087
PR #5150: More explicit llvmlite requirement in requirements.txt
PR #5172: fix version spec for llvmlite
PR #5176: Normalize kws going into fold_arguments.
PR #5183: pass ‘inline’ explicitly to overload
PR #5193: Fix CI failure due to missing files when installed
PR #5213: Fix .strip() to strip all whitespace characters
PR #5216: Fix namedtuple mistreated by dispatcher as simple tuple
PR #5222: Fix compiler warnings due to missing (void)
PR #5232: Fixes a bad import that breaks master
PR #5239: fix test discovery for unittest
PR #5247: Continue PR #5126
PR #5250: Part fix/5098
PR #5252: Trivially fix SyntaxWarning
PR #5276: Add prange variant to has_no_side_effect.
PR #5278: fix error messages in test_array_reductions
PR #5310: PR #3897 continued
PR #5313: Continues PR #4376
PR #5318: Remove AUTHORS file reference from MANIFEST.in
PR #5327: Add warning if FNV hashing is found as the default for CPython.
PR #5338: Remove refcount pruning pass
PR #5345: Disable test failing due to removed pass.
PR #5357: Small fix to have llvm and asm highlighted properly
PR #5361: 5081 typed.dict
PR #5431: Add tolerance to numba extension module entrypoints.
PR #5432: Fix code causing compiler warnings.
PR #5445: Remove undefined variable
PR #5454: Don’t warn for numba.experimental.jitclass
PR #5459: Fixes issue 5448
PR #5480: Fix for #5477, literal_unroll KeyError searching for getitems
PR #5485: Show the offending module in “no direct replacement” error message
PR #5488: Add missing numba.config shim
PR #5495: Fix missing null initializer for variable after phi strip
PR #5498: Make the shim deprecation warnings work on python 3.6 too
PR #5505: Better error message if __init__ returns value
PR #5527: Attempt to fix #5518
PR #5529: PR #5473 continued
PR #5532: Make numba.<mod> available without an import
PR #5542: Fixes RC2 module shim bug
PR #5548: Fix #5537 Removed reference to platform.linux_distribution
PR #5555: Fix #5515 by reverting changes to ArrayAnalysis
PR #5557: First-class function call cannot use keyword arguments
PR #5569: Fix RewriteConstGetitems not registering calltype for new expr
PR #5571: Pin down llvmlite requirement

CUDA Enhancements/Fixes:

PR #5061: Prevent kernel launch with no configuration, remove autotuner
PR #5085: Support other itemsizes with view
PR #5099: Fix #5073: Slices of dynamic shared memory all alias
PR #5104: Add a list of libraries that support __cuda_array_interface__
PR #5136: CUDA: Enable asynchronous operations on the default stream
PR #5144: Fix #4875: Make #2655 test with debug expect to pass
PR #5189: __cuda_array_interface__ not requiring context
PR #5253: Coerce dtype to numpy.dtype
PR #5257: boundscheck fix
PR #5319: Make user facing error string use abs path not rel.
PR #5323: Document lifetime semantics of CUDA Array Interface
PR #5328: NBEP 7: External Memory Management Plugin Interface
PR #5343: Fix cuda spoof
PR #5400: Fix #4954, and some other small CUDA testsuite fixes
PR #5436: Fix #5408: test_set_registers_57 fails on Maxwell
PR #5519: CUDA: Silence the test suite - Fix #4809, remove autojit, delete prints

Documentation Updates:

PR #4957: Add notes on overwriting gufunc inputs to docs
PR #5059: Docs: Explain how to use Memcheck with Numba, fixups in CUDA documentation
PR #5106: Add SciPy 2019 talks to docs
PR #5147: Update master for 0.48.0 updates
PR #5155: Explain what inlining at Numba IR level will do
PR #5161: Fix README.rst formatting
PR #5207: Remove AUTHORS list
PR #5249: fix target path for See also
PR #5262: fix typo in inlining docs
PR #5270: fix ‘see also’ in typeddict docs
PR #5280: Added some hints for debugging entry points.
PR #5297: Update docs with intro to {g,}ufuncs.
PR #5326: Update installation docs with OpenMP requirements.
PR #5346: Docs: use sphinx_rtd_theme
PR #5366: Remove reference to Python 2.7 in install check output
PR #5423: Fix #5421: Add notes on printing in CUDA kernels
PR #5438: Update package deps for doc building.
PR #5440: Bump deprecation notices.
PR #5443: Fix #5196: Docs: assert in CUDA only enabled for debug
PR #5450: Docs: remove numbers and add titles to TOC
PR #5453: fix building docs on windows
PR #5489: docs: fix rendering of nested bulleted list

CI updates:

PR #5314: Update the image used in Azure CI for OSX.
PR #5360: Remove Travis CI badge.

Authors:

Aaron Meurer
Andreas Sodeur
Antonio Russo
Arik Funke
Eric Wieser
Ethan Pronovost
Gabriel Majeri
Graham Markall
Greg Jennings
Guilherme Leobas
hdf
Henry Schreiner
Ivan Butygin
Jacques Gaudin
Jim Pivarski
John Kirkham
Leo Fang
Lucio Fernandez-Arjona
Mads R. B. Kristensen
Marcin Tolysz
Mike Williams
Pearu Peterson
Ravi Teja Gutta
Rohit Sanjay
Siu Kwan Lam (core dev)
Stan Seibert (core dev)
Stuart Archibald (core dev)
Todd A. Anderson (core dev)
Valentin Haenel (core dev)
Vladimir Lukyanov

Version 0.48.0 (Jan 27, 2020)

This release is particularly small as its purpose was to catch anything that missed the 0.47.0 deadline (the deadline deliberately coincided with the end of support for Python 2.7). The next release will be considerably larger.

The core changes in this release are dominated by the start of the clean up needed for the end of Python 2.7 support, improvements to the CUDA target and support for numerous additional unicode string methods.

Enhancements from user contributed PRs (with thanks!):

Brian Wignall fixed more spelling typos in #4998.
Denis Smirnov added support for string methods capitalize (#4823), casefold (#4824), swapcase (#4825), rsplit (#4834), partition (#4845) and splitlines (#4849).
Elena Totmenina extended support for string methods startswith (#4867) and added endswith (#4868).
Eric Wieser made type_callable return the decorated function itself in #4760
Ethan Pronovost added support for np.argwhere in #4617
Graham Markall contributed a large number of CUDA enhancements and fixes, namely:
- #5068: Remove Python 3.4 backports from utils
- #4975: Make device_array_like create contiguous arrays (Fixes #4832)
- #5023: Don’t launch ForAll kernels with 0 elements (Fixes #5017)
- #5016: Fix various issues in CUDA library search (Fixes #4979)
- #5014: Enable use of records and bools for shared memory, remove ddt, add additional transpose tests
- #4964: Fix #4628: Add more appropriate typing for CUDA device arrays
- #5007: test_consuming_strides: Keep dev array alive
- #4997: State that CUDA Toolkit 8.0 required in docs
James Bourbeau added the Python 3.8 classifier to setup.py in #5027.
John Kirkham added a clarification to the __cuda_array_interface__ documentation in #5049.
Leo Fang fixed an indexing problem in dummyarray in #5012.
Marcel Bargull fixed a build and test issue for Python 3.8 in #5029.
Maria Rubtsov added support for string methods isdecimal (#4842), isdigit (#4843), isnumeric (#4844) and replace (#4865).
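As a small illustration of the string methods listed above, the sketch below uses several of them inside jitted functions. The function names and sample strings are made up, and the fallback decorator is an assumption added only so the sketch also runs as plain Python without Numba.

```python
try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(*args, **kwargs):
        return args[0] if args and callable(args[0]) else (lambda f: f)


@njit
def parse_entry(text):
    # partition, capitalize, swapcase, isdigit and replace in nopython mode.
    key, sep, value = text.partition(":")
    return key.capitalize(), value.swapcase(), value.isdigit(), text.replace(":", "=")


@njit
def has_suffix(name, suffix):
    return name.endswith(suffix)


parsed = parse_entry("key:Value")
is_csv = has_suffix("data.csv", ".csv")
```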

General Enhancements:

PR #4760: Make type_callable return the decorated function
PR #5010: merge string prs

This merge PR included the following:
- PR #4823: Implement str.capitalize() based on CPython
- PR #4824: Implement str.casefold() based on CPython
- PR #4825: Implement str.swapcase() based on CPython
- PR #4834: Implement str.rsplit() based on CPython
- PR #4842: Implement str.isdecimal
- PR #4843: Implement str.isdigit
- PR #4844: Implement str.isnumeric
- PR #4845: Implement str.partition() based on CPython
- PR #4849: Implement str.splitlines() based on CPython
- PR #4865: Implement str.replace
- PR #4867: Functionality extension str.startswith() based on CPython
- PR #4868: Add functionality for str.endswith()
PR #5039: Disable help messages.
PR #4617: Add coverage for np.argwhere

Fixes:

PR #4724: Only use lives (and not aliases) to create post parfor live set.
PR #4998: Fix more spelling typos
PR #5024: Propagate semantic constants ahead of static rewrites.
PR #5027: Add Python 3.8 classifier to setup.py
PR #5046: Update setup.py and buildscripts for dependency requirements
PR #5053: Convert from arrays to names in define() and don’t invalidate for multiple consistent defines.
PR #5058: Permit mixed int types in wrap_index
PR #5078: Catch the use of global typed-list in JITed functions
PR #5092: Fix #5087, bug in bytecode analysis.

CUDA Enhancements/Fixes:

PR #4964: Fix #4628: Add more appropriate typing for CUDA device arrays
PR #4975: Make device_array_like create contiguous arrays (Fixes #4832)
PR #4997: State that CUDA Toolkit 8.0 required in docs
PR #5007: test_consuming_strides: Keep dev array alive
PR #5012: Fix IndexError when accessing the “-1” element of dummyarray
PR #5014: Enable use of records and bools for shared memory, remove ddt, add additional transpose tests
PR #5016: Fix various issues in CUDA library search (Fixes #4979)
PR #5023: Don’t launch ForAll kernels with 0 elements (Fixes #5017)
PR #5068: Remove Python 3.4 backports from utils

Documentation Updates:

PR #5049: Clarify what dictionary means
PR #5062: Update docs for updated version requirements
PR #5090: Update deprecation notices for 0.48.0

CI updates:

PR #5029: Install optional dependencies for Python 3.8 tests
PR #5040: Drop Py2.7 and Py3.5 from public CI
PR #5048: Fix CI py38

Authors:

Brian Wignall
Denis Smirnov
Elena Totmenina
Eric Wieser
Ethan Pronovost
Graham Markall
James Bourbeau
John Kirkham
Leo Fang
Marcel Bargull
Maria Rubtsov
Siu Kwan Lam (core dev)
Stan Seibert (core dev)
Stuart Archibald (core dev)
Todd A. Anderson (core dev)
Valentin Haenel (core dev)

Version 0.47.0 (Jan 2, 2020)

This release expands the capability of Numba in a number of important areas and is also significant as it is the last major point release with support for Python 2 and Python 3.5 included. The next release (0.48.0) will be for Python 3.6+ only! (This follows NumPy’s deprecation schedule as specified in NEP 29.)

Highlights of core feature changes include:

Full support for Python 3.8 (Siu Kwan Lam)
Opt-in bounds checking (Aaron Meurer)
Support for map, filter and reduce (Stuart Archibald)
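The opt-in bounds checking and the map/reduce support can be sketched roughly as below. The function names are illustrative, and the fallback decorator is an assumption added only so the sketch also runs as plain Python without Numba.

```python
from functools import reduce

import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(*args, **kwargs):
        return args[0] if args and callable(args[0]) else (lambda f: f)


@njit(boundscheck=True)
def read(arr, i):
    # With bounds checking switched on, an out-of-range index raises IndexError.
    return arr[i]


@njit
def sum_of_squares(n):
    total = 0
    for sq in map(lambda x: x * x, range(n)):
        total += sq
    return total


@njit
def factorial(n):
    # An explicit initial value is passed to reduce().
    return reduce(lambda acc, x: acc * x, range(1, n + 1), 1)


try:
    read(np.arange(3), 5)
    out_of_bounds_caught = False
except IndexError:
    out_of_bounds_caught = True
```

Bounds checking is opt-in because the extra index checks cost time in tight loops, so it is typically enabled while debugging rather than in production runs.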

Intel also kindly sponsored research and development that led to some exciting new features:

Initial support for basic try/except use (Siu Kwan Lam)
The ability to pass functions created from closures/lambdas as arguments (Stuart Archibald)
sorted and list.sort() now accept the key argument (Stuart Archibald and Siu Kwan Lam)
A new compiler pass triggered through the use of the function numba.literal_unroll which permits iteration over heterogeneous tuples and constant lists of constants. (Stuart Archibald)
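The features in this list can be sketched together as follows. The function names are illustrative, and the fallback definitions are assumptions added only so the sketch also runs as plain Python without Numba.

```python
try:
    from numba import literal_unroll, njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(*args, **kwargs):
        return args[0] if args and callable(args[0]) else (lambda f: f)

    def literal_unroll(container):
        return container


@njit
def by_magnitude(x):
    return abs(x)


@njit
def sort_by_magnitude(values):
    # sorted() with a key; here the key is itself a jitted function.
    return sorted(values, key=by_magnitude)


@njit
def safe_reciprocal(x):
    # Basic try/except: the exception is raised and caught in jitted code.
    result = 0.0
    try:
        if x == 0:
            raise ValueError("zero has no reciprocal")
        result = 1.0 / x
    except Exception:
        result = -1.0
    return result


@njit
def total(items):
    # literal_unroll permits looping over a tuple of mixed types.
    acc = 0.0
    for item in literal_unroll(items):
        acc += item
    return acc


ordered = sort_by_magnitude((3, -1, -2))
mixed_sum = total((1, 2.5, 3))
```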

Enhancements from user contributed PRs (with thanks!):

Ankit Mahato added a reference to a new talk on Numba at PyCon India 2019 in #4862
Brian Wignall kindly fixed some spelling mistakes and typos in #4909
Denis Smirnov wrote numerous methods to considerably enhance string support, including:
- str.rindex() in #4861
- str.isprintable() in #4836
- str.index() in #4860
- start/end parameters for str.find() in #4866
- str.isspace() in #4835
- str.isidentifier() #4837
- str.rpartition() in #4841
- str.lower() and str.islower() in #4651
Elena Totmenina implemented str.isalnum(), str.isalpha() and str.isascii() in #4839, #4840 and #4847 respectively.
Eric Larson fixed a bug in literal comparison in #4710
Ethan Pronovost updated the np.arange implementation in #4770 to allow the use of the dtype key word argument and also added bool implementations for several types in #4715.
Graham Markall fixed some issues with the CUDA target, namely:
- #4931: Added physical limits for CC 7.0 / 7.5 to CUDA autotune
- #4934: Fixed bugs in TestCudaWarpOperations
- #4938: Improved errors / warnings for the CUDA vectorize decorator
Guilherme Leobas fixed a typo in the urem implementation in #4667
Isaac Virshup contributed a number of patches that fixed bugs, added support for more NumPy functions and enhanced Python feature support. These contributions included:
- #4729: Allow array construction with mixed type shape tuples
- #4904: Implementing np.lcm
- #4780: Implement np.gcd and math.gcd
- #4779: Make slice constructor more similar to python.
- #4707: Added support for slice.indices
- #4578: Clarify numba ufunc supported features
James Bourbeau fixed some issues with tooling: #4794 added setuptools as a dependency and #4501 added pre-commit hooks for flake8 compliance.
Leo Fang made numba.dummyarray.Array iterable in #4629
Marc Garcia fixed the numba.jit parameter name signature_or_function in #4703
Marcelo Duarte Trevisani patched the llvmlite requirement to >=0.30.0 in #4725
Matt Cooper fixed a long standing CI problem in #4737 by removing maxParallel from Azure Pipelines.
Matti Picus fixed an issue with collections.abc in #4734.
Rob Ennis patched a bug in np.interp float32 handling in #4911
VDimir fixed a bug in array transposition layouts in #4777 and re-enabled and fixed some idle tests in #4776.
Vyacheslav Smirnov enabled support for str.istitle() in #4645.
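Some of the numerical additions credited above (math.gcd, np.lcm and the dtype keyword for np.arange) can be sketched as below. The function names are illustrative, and the fallback decorator is an assumption added only so the sketch also runs as plain Python without Numba.

```python
import math

import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(*args, **kwargs):
        return args[0] if args and callable(args[0]) else (lambda f: f)


@njit
def reduce_fraction(num, den):
    # math.gcd inside jitted code (PR #4780).
    g = math.gcd(num, den)
    return num // g, den // g


@njit
def common_period(a, b):
    # np.lcm inside jitted code (PR #4904).
    return np.lcm(a, b)


@njit
def ramp(n):
    # np.arange with the dtype keyword argument (PR #4770).
    return np.arange(n, dtype=np.float32)


fraction = reduce_fraction(12, 18)
period = common_period(4, 6)
values = ramp(3)
```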

General Enhancements:

PR #4432: Bounds checking
PR #4501: Add pre-commit hooks
PR #4536: Handle kw args in inliner when callee is a function
PR #4599: Permits closures to become functions, enables map(), filter()
PR #4611: Implement method title() for unicode based on Cpython
PR #4645: Enable support for istitle() method for unicode string
PR #4651: Implement str.lower() and str.islower()
PR #4652: Implement str.rfind()
PR #4695: Refactor overload* and support jit_options and inline
PR #4707: Added support for slice.indices
PR #4715: Add bool overload for several types
PR #4729: Allow array construction with mixed type shape tuples
PR #4755: Python3.8 support
PR #4756: Add parfor support for ndarray.fill.
PR #4768: Update typeconv error message to ask for sys.executable.
PR #4770: Update np.arange implementation with @overload
PR #4779: Make slice constructor more similar to python.
PR #4780: Implement np.gcd and math.gcd
PR #4794: Add setuptools as a dependency
PR #4802: put git hash into build string
PR #4803: Better compiler error messages for improperly used reduction variables.
PR #4817: Typed list implement and expose allocation
PR #4818: Typed list faster copy
PR #4835: Implement str.isspace() based on CPython
PR #4836: Implement str.isprintable() based on CPython
PR #4837: Implement str.isidentifier() based on CPython
PR #4839: Implement str.isalnum() based on CPython
PR #4840: Implement str.isalpha() based on CPython
PR #4841: Implement str.rpartition() based on CPython
PR #4847: Implement str.isascii() based on CPython
PR #4851: Add graphviz output for FunctionIR
PR #4854: Python3.8 looplifting
PR #4858: Implement str.expandtabs() based on CPython
PR #4860: Implement str.index() based on CPython
PR #4861: Implement str.rindex() based on CPython
PR #4866: Support params start/end for str.find()
PR #4874: Bump to llvmlite 0.31
PR #4896: Specialise arange dtype on arch + python version.
PR #4902: basic support for try except
PR #4904: Implement np.lcm
PR #4910: loop canonicalisation and type aware tuple unroller/loop body versioning passes
PR #4961: Update hash(tuple) for Python 3.8.
PR #4977: Implement sort/sorted with key.
PR #4987: Add is_internal property to all Type classes.

Fixes:

PR #4090: Update to LLVM8 memset/memcpy intrinsic
PR #4582: Convert sub to add and div to mul when doing the reduction across the per-thread reduction array.
PR #4648: Handle 0 correctly as slice parameter.
PR #4660: Remove multiply defined variables from all blocks’ equivalence sets.
PR #4672: Fix pickling of dufunc
PR #4710: BUG: Comparison for literal
PR #4718: Change get_call_table to support intermediate Vars.
PR #4725: Requires llvmlite >=0.30.0
PR #4734: prefer to import from collections.abc
PR #4736: fix flake8 errors
PR #4776: Fix and enable idle tests from test_array_manipulation
PR #4777: Fix transpose output array layout
PR #4782: Fix issue with SVML (and knock-on function resolution effects).
PR #4785: Treat 0d arrays like scalars.
PR #4787: fix missing incref on flags
PR #4789: fix typos in numba/targets/base.py
PR #4791: fix typos
PR #4811: fix spelling in now-failing tests
PR #4852: windowing test should check equality only up to double precision errors
PR #4881: fix refining list by using extend on an iterator
PR #4882: Fix return type in arange and zero step size handling.
PR #4885: suppress spurious RuntimeWarning about ufunc sizes
PR #4891: skip the xfail test for now. Py3.8 CFG refactor seems to have changed the test case
PR #4892: regex needs to accept singular form of “argument”
PR #4901: fix typed list equals
PR #4909: Fix some spelling typos
PR #4911: np.interp bugfix for float32 handling
PR #4920: fix creating list with JIT disabled
PR #4921: fix creating dict with JIT disabled
PR #4935: Better handling of prange with multiple reductions on the same variable.
PR #4946: Improve the error message for raise <string>.
PR #4955: Move overload of literal_unroll to avoid circular dependency that breaks Python 2.7
PR #4962: Fix test error on windows
PR #4973: Fixes a bug in the relabelling logic in literal_unroll.
PR #4978: Fix overload_method problem with stararg
PR #4981: Add ind_to_const to enable fewer equivalence classes.
PR #4991: Continuation of #4588 (Let dead code removal handle removing more of the unneeded code after prange conversion to parfor)
PR #4994: Remove xfail for test which has since had underlying issue fixed.
PR #5018: Fix #5011.
PR #5019: skip pycc test on Python 3.8 + macOS because of distutils issue

CUDA Enhancements/Fixes:

PR #4629: Make numba.dummyarray.Array iterable
PR #4675: Bump cuda array interface to version 2
PR #4741: Update choosing the “CUDA_PATH” for windows
PR #4838: Permit ravel(‘A’) for contig device arrays in CUDA target
PR #4931: Add physical limits for CC 7.0 / 7.5 to autotune
PR #4934: Fix fails in TestCudaWarpOperations
PR #4938: Improve errors / warnings for cuda vectorize decorator

Documentation Updates:

PR #4418: Directed graph task roadmap
PR #4578: Clarify numba ufunc supported features
PR #4655: fix sphinx build warning
PR #4667: Fix typo on urem implementation
PR #4669: Add link to ParallelAccelerator paper.
PR #4703: Fix numba.jit parameter name signature_or_function
PR #4862: Addition of PyCon India 2019 talk on Numba
PR #4947: Document jitclass with numba.typed use.
PR #4958: Add docs for try..except
PR #4993: Update deprecations for 0.47

CI Updates:

PR #4737: remove maxParallel from Azure Pipelines
PR #4767: pin to 2.7.16 for py27 on osx
PR #4781: WIP/runtest cf pytest

Authors:

Aaron Meurer
Ankit Mahato
Brian Wignall
Denis Smirnov
Ehsan Totoni (core dev)
Elena Totmenina
Eric Larson
Ethan Pronovost
Giovanni Cavallin
Graham Markall
Guilherme Leobas
Isaac Virshup
James Bourbeau
Leo Fang
Marc Garcia
Marcelo Duarte Trevisani
Matt Cooper
Matti Picus
Rob Ennis
Rujal Desai
Siu Kwan Lam (core dev)
Stan Seibert (core dev)
Stuart Archibald (core dev)
Todd A. Anderson (core dev)
VDimir
Valentin Haenel (core dev)
Vyacheslav Smirnov

Version 0.46.0

This release significantly reworked one of the main parts of Numba, the compiler pipeline, to make it more extensible and easier to use. The purpose of this was to continue enhancing Numba’s ability for use as a compiler toolkit. In a similar vein, Numba now has an extension registration mechanism to allow other Numba-using projects to automatically have their Numba JIT compilable functions discovered. There were also a number of other related compiler toolkit enhancements added, along with some more NumPy features and a lot of bug fixes.

This release has updated the CUDA Array Interface specification to version 2, which clarifies the strides attribute for C-contiguous arrays and specifies the treatment for zero-size arrays. The implementation in Numba has been changed and may affect downstream packages relying on the old behavior (see issue #4661).

Enhancements from user contributed PRs (with thanks!):

Aaron Meurer fixed some Python issues in the code base in #4345 and #4341.
Ashwin Srinath fixed a CUDA performance bug via #4576.
Ethan Pronovost added support for triangular indices functions in #4601 (the NumPy functions tril_indices, tril_indices_from, triu_indices, and triu_indices_from).
Gerald Dalley fixed a tear down race occurring in Python 2.
Gregory R. Lee fixed the use of deprecated inspect.getargspec.
Guilherme Leobas contributed five PRs, adding support for np.append and np.count_nonzero in #4518 and #4386. The typed List was fixed to accept unsigned integers in #4510. #4463 made a fix to NamedTuple internals and #4397 updated the docs for np.sum.
James Bourbeau added a new feature to permit the automatic application of the jit decorator to a whole module in #4331. Also some small fixes to the docs and the code base were made in #4447 and #4433, and a fix to inplace array operation in #4228.
Jim Crist fixed a bug in the rendering of patched errors in #4464.
Leo Fang updated the CUDA Array Interface contract in #4609.
Pearu Peterson added support for Unicode based NumPy arrays in #4425.
Peter Andreas Entschev fixed a CUDA concurrency bug in #4581.
Lucio Fernandez-Arjona extended Numba’s np.sum support to now accept the dtype kwarg in #4472.
Pedro A. Morales Maries added support for np.cross in #4128 and also added the necessary extension numba.numpy_extensions.cross2d in #4595.
David Hoese, Eric Firing, Joshua Adelman, and Juan Nunez-Iglesias all made documentation fixes in #4565, #4482, #4455, #4375 respectively.
Vyacheslav Smirnov and Rujal Desai enabled support for count() on unicode strings in #4606.
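A few of the NumPy functions mentioned above (np.tril_indices, np.count_nonzero and np.append) can be sketched in jitted code as follows. The function names and sample data are illustrative, and the fallback decorator is an assumption added only so the sketch also runs as plain Python without Numba.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(*args, **kwargs):
        return args[0] if args and callable(args[0]) else (lambda f: f)


@njit
def lower_triangle_sum(a):
    # np.tril_indices gives the row and column indices of the lower triangle.
    rows, cols = np.tril_indices(a.shape[0])
    total = 0.0
    for k in range(rows.size):
        total += a[rows[k], cols[k]]
    return total


@njit
def nonzero_and_joined_size(a, extra):
    # Without an axis argument, np.append flattens both inputs before joining.
    joined = np.append(a, extra)
    return np.count_nonzero(a), joined.size


square = np.arange(9.0).reshape(3, 3)
tri_sum = lower_triangle_sum(square)
counts = nonzero_and_joined_size(square, np.array([1.0, 2.0]))
```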

General Enhancements:

PR #4113: Add rewrite for semantic constants.
PR #4128: Add np.cross support
PR #4162: Make IR comparable and legalize it.
PR #4208: R&D inlining, jitted and overloaded.
PR #4331: Automatic JIT of called functions
PR #4353: Inspection tool to check what numba supports
PR #4386: Implement np.count_nonzero
PR #4425: Unicode array support
PR #4427: Entrypoints for numba extensions
PR #4467: Literal dispatch
PR #4472: Allow dtype input argument in np.sum
PR #4513: New compiler.
PR #4518: add support for np.append
PR #4554: Refactor NRT C-API
PR #4556: 0.46 scheduled deprecations
PR #4567: Add env var to disable performance warnings.
PR #4568: add np.array_equal support
PR #4595: Implement numba.cross2d
PR #4601: Add triangular indices functions
PR #4606: Enable support for count() method for unicode string

Fixes:

PR #4228: Fix inplace operator error for arrays
PR #4282: Detect and raise unsupported on generator expressions
PR #4305: Don’t allow the allocation of mutable objects written into a container to be hoisted.
PR #4311: Avoid deprecated use of inspect.getargspec
PR #4328: Replace GC macro with function call
PR #4330: Loosen up typed container casting checks
PR #4341: Fix some coding lines at the top of some files (utf8 -> utf-8)
PR #4345: Replace “import *” with explicit imports in numba/types
PR #4346: Fix incorrect alg in isupper for ascii strings.
PR #4349: test using jitclass in typed-list
PR #4361: Add allocation hoisting info to LICM section at diagnostic L4
PR #4366: Offset search box to avoid wrapping on some pages with Safari. Fixes #4365.
PR #4372: Replace all “except BaseException” with “except Exception”.
PR #4407: Restore the “free” conda channel for NumPy 1.10 support.
PR #4408: Add lowering for constant bytes.
PR #4409: Add exception chaining for better error context
PR #4411: Name of type should not contain user facing description for debug.
PR #4412: Fix #4387. Limit the number of return types for recursive functions
PR #4426: Fixed two module teardown races in py2.
PR #4431: Fix and test numpy.random.random_sample(n) for np117
PR #4463: NamedTuple - Raises an error on non-iterable elements
PR #4464: Add a newline in patched errors
PR #4474: Fix liveness for remove dead of parfors (and other IR extensions)
PR #4510: Make List.__getitem__ accept unsigned parameters
PR #4512: Raise specific error at typing time for iteration on >1D array.
PR #4532: Fix static_getitem with Literal type as index
PR #4547: Update to inliner cost model information.
PR #4557: Use specific random number seed when generating arbitrary test data
PR #4559: Adjust test timeouts
PR #4564: Skip unicode array tests on ppc64le that trigger an LLVM bug
PR #4621: Fix packaging issue due to missing numba/cext
PR #4623: Fix issue 4520 due to storage model mismatch
PR #4644: Updates for llvmlite 0.30.0

CUDA Enhancements/Fixes:

PR #4410: Fix #4111. cudasim mishandling recarray
PR #4576: Replace use of np.prod with functools.reduce for computing size from shape
PR #4581: Prevent taking the GIL in ForAll
PR #4592: Fix #4589. Just pass NULL for b2d_func for constant dynamic sharedmem
PR #4609: Update CUDA Array Interface & Enforce Numba compliance
PR #4619: Implement math.{degrees, radians} for the CUDA target.
PR #4675: Bump cuda array interface to version 2

Documentation Updates:

PR #4317: Add docs for ARMv8/AArch64
PR #4318: Add supported platforms to the docs. Closes #4316
PR #4375: Add docstrings to inspect methods
PR #4388: Update Python 2.7 EOL statement
PR #4397: Add note about np.sum
PR #4447: Minor parallel performance tips edits
PR #4455: Clarify docs for typed dict with regard to arrays
PR #4482: Fix example in guvectorize docstring.
PR #4541: fix two typos in architecture.rst
PR #4548: Document numba.extending.intrinsic and inlining.
PR #4565: Fix typo in jit-compilation docs
PR #4607: add dependency list to docs
PR #4614: Add documentation for implementing new compiler passes.

CI Updates:

PR #4415: Make 32bit incremental builds on linux not use free channel
PR #4433: Removes stale azure comment
PR #4493: Fix Overload Inliner wrt CUDA Intrinsics
PR #4593: Enable Azure CI batching

Contributors:

Aaron Meurer
Ashwin Srinath
David Hoese
Ehsan Totoni (core dev)
Eric Firing
Ethan Pronovost
Gerald Dalley
Gregory R. Lee
Guilherme Leobas
James Bourbeau
Jim Crist
Joshua Adelman
Juan Nunez-Iglesias
Leo Fang
Lucio Fernandez-Arjona
Pearu Peterson
Pedro A. Morales Marie
Peter Andreas Entschev
Rujal Desai
Siu Kwan Lam (core dev)
Stan Seibert (core dev)
Stuart Archibald (core dev)
Todd A. Anderson (core dev)
Valentin Haenel (core dev)
Vyacheslav Smirnov

Version 0.45.1¶

This patch release addresses some regressions reported in the 0.45.0 release and adds support for NumPy 1.17:

PR #4325: accept scalar/0d-arrays
PR #4338: Fix #4299. Parfors reduction vars not deleted.
PR #4350: Use process level locks for fork() only.
PR #4354: Try to fix #4352.
PR #4357: Fix np1.17 isnan, isinf, isfinite ufuncs
PR #4363: Fix np.interp for np1.17 nan handling
PR #4371: Fix np1.17 random function non-aliasing

Contributors:

Siu Kwan Lam (core dev)
Stuart Archibald (core dev)
Valentin Haenel (core dev)

Version 0.45.0¶

In this release, Numba gained an experimental numba.typed.List container as a future replacement of the reflected list. In addition, functions decorated with parallel=True can now be cached to reduce compilation overhead associated with the auto-parallelization.
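As a rough illustration of the typed list mentioned above, the sketch below fills a numba.typed.List from the interpreter and iterates over it in a jitted function. The names running_total and samples are made up for this example, and the try/except fallback to plain Python is an assumption added so that the snippet still runs where Numba is not installed.

```python
try:
    from numba import njit
    from numba.typed import List
except ImportError:
    # Assumption for illustration: run as plain Python when Numba is absent.
    def njit(func):
        return func
    List = list


@njit
def running_total(values):
    # Sum the items of the list passed in from the interpreter.
    total = 0
    for v in values:
        total += v
    return total


# Build the container on the Python side; the item type is taken
# from the first value appended.
samples = List()
for i in range(5):
    samples.append(i)

result = running_total(samples)  # sum of 0..4
```

Caching of a parallel function would presumably be requested with @njit(parallel=True, cache=True); it is left out of the sketch because the on-disk cache generally needs the function to be defined in a regular source file.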

Enhancements from user contributed PRs (with thanks!):

James Bourbeau added the Numba version to reportable error messages in #4227, added the signature parameter to inspect_types in #4200, improved the docstring of normalize_signature in #4205, and fixed #3658 by adding reference counting to register_dispatcher in #4254.
Guilherme Leobas implemented the dominator tree and dominance frontier algorithms in #4216 and #4149, respectively.
Nick White fixed the issue with round in the CUDA target in #4137.
Joshua Adelman added support for determining if a value is in a range (i.e. x in range(...)) in #4129, and added windowing functions (np.bartlett, np.hamming, np.blackman, np.hanning, np.kaiser) from NumPy in #4076.
Lucio Fernandez-Arjona added support for np.select in #4077.
Rob Ennis added support for np.flatnonzero in #4157.
Keith Kraus extended the __cuda_array_interface__ with an optional mask attribute in #4199.
Gregory R. Lee replaced deprecated use of inspect.getargspec in #4311.
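The range membership test from #4129 can be sketched as follows. The function name count_members and its arguments are hypothetical, and the fallback decorator is an assumption so that the snippet also runs without Numba.

```python
try:
    from numba import njit
except ImportError:
    # Assumption for illustration: run as plain Python when Numba is absent.
    def njit(func):
        return func


@njit
def count_members(values, start, stop, step):
    # "v in window" is the membership test on a range object.
    window = range(start, stop, step)
    hits = 0
    for v in values:
        if v in window:
            hits += 1
    return hits


# 0, 2 and 4 are members of range(0, 10, 2); 10 is excluded because
# the stop value is not part of the range.
n_hits = count_members((0, 1, 2, 3, 4, 10), 0, 10, 2)
```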

General Enhancements:

PR #4328: Replace GC macro with function call
PR #4311: Avoid deprecated use of inspect.getargspec
PR #4296: Slacken window function testing tol on ppc64le
PR #4254: Add reference counting to register_dispatcher
PR #4239: Support len() of multi-dim arrays in array analysis
PR #4234: Raise informative error for np.kron array order
PR #4232: Add unicodetype db, low level str functions and examples.
PR #4229: Make hashing cacheable
PR #4227: Include numba version in reportable error message
PR #4216: Add dominator tree
PR #4200: Add signature parameter to inspect_types
PR #4196: Catch missing imports of internal functions.
PR #4180: Update use of unlowerable global message.
PR #4166: Add tests for PR #4149
PR #4157: Support for np.flatnonzero
PR #4149: Implement dominance frontier for SSA for the Numba IR
PR #4148: Call branch pruning in inline_closure_call()
PR #4132: Reduce usage of inttoptr
PR #4129: Support contains for range
PR #4112: better error messages for np.transpose and tuples
PR #4110: Add range attrs, start, stop, step
PR #4077: Add np select
PR #4076: Add numpy windowing functions support (np.bartlett, np.hamming, np.blackman, np.hanning, np.kaiser)
PR #4095: Support ir.Global/FreeVar in find_const()
PR #3691: Make TypingError abort compiling earlier
PR #3646: Log internal errors encountered in typeinfer

Fixes:

PR #4303: Work around scipy bug 10206
PR #4302: Fix flake8 issue on master
PR #4301: Fix integer literal bug in np.select impl
PR #4291: Fix pickling of jitclass type
PR #4262: Resolves #4251 - Fix bug in reshape analysis.
PR #4233: Fixes issue revealed by #4215
PR #4224: Fix #4223. Looplifting error due to StaticSetItem in objectmode
PR #4222: Fix bad python path.
PR #4178: Fix unary operator overload, check with unicode impl
PR #4173: Fix return type in np.bincount with weights
PR #4153: Fix slice shape assignment in array analysis
PR #4152: fix status check in dict lookup
PR #4145: Use callable instead of checking __module__
PR #4118: Fix inline assembly support on CPU.
PR #4088: Resolves #4075 - parfors array_analysis bug.
PR #4085: Resolves #3314 - parfors array_analysis bug with reshape.

CUDA Enhancements/Fixes:

PR #4199: Extend __cuda_array_interface__ with optional mask attribute, bump version to 1
PR #4137: CUDA - Fix round Builtin
PR #4114: Support 3rd party activated CUDA context

Documentation Updates:

PR #4317: Add docs for ARMv8/AArch64
PR #4318: Add supported platforms to the docs. Closes #4316
PR #4295: Alter deprecation schedules
PR #4253: fix typo in pysupported docs
PR #4252: fix typo on repomap
PR #4241: remove unused import
PR #4240: fix typo in jitclass docs
PR #4205: Update return value order in normalize_signature docstring
PR #4237: Update doc links to point to latest not dev docs.
PR #4197: hyperlink repomap
PR #4170: Clarify docs on accumulating into arrays in prange
PR #4147: fix docstring for DictType iterables
PR #3951: A guide to overloading

CI Updates:

PR #4300: AArch64 has no faulthandler package
PR #4273: pin to MKL BLAS for testing to get consistent results
PR #4209: Revert previous network tol patch and try with conda config
PR #4138: Remove tbb before Azure test only on Python 3, since it was already removed for Python 2

Contributors:

Ehsan Totoni (core dev)
Gregory R. Lee
Guilherme Leobas
James Bourbeau
Joshua L. Adelman
Keith Kraus
Lucio Fernandez-Arjona
Nick White
Rob Ennis
Siu Kwan Lam (core dev)
Stan Seibert (core dev)
Stuart Archibald (core dev)
Todd A. Anderson (core dev)
Valentin Haenel (core dev)

Version 0.44.1¶

This patch release addresses some regressions reported in the 0.44.0 release:

PR #4165: Fix #4164 issue with NUMBAPRO_NVVM.
PR #4172: Abandon branch pruning if an arg name is redefined. (Fixes #4163)
PR #4183: Fix #4156. Problem with defining in-loop variables.

Version 0.44.0¶

IMPORTANT: In this release a few significant deprecations (and some less significant ones) are being made; users are encouraged to read the related documentation.

General enhancements in this release include:

Numba is backed by LLVM 8 on all platforms apart from ppc64le, which, due to bugs, remains on the LLVM 7.x series.
Numba’s dictionary support now includes type inference for keys and values.
The .view() method now works for NumPy scalar types.
Newly supported NumPy functions added: np.delete, np.nanquantile, np.quantile, np.repeat, np.shape.

In addition, considerable effort has been made to fix some long-standing bugs and a large number of other bugs, so the “Fixes” section is very large this time!
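A minimal sketch of the type-inferred dictionary support listed among the general enhancements: the dictionary is created empty inside the jitted function and its key and value types are left to inference. The function name histogram is illustrative only, and the fallback decorator is an assumption so that the snippet also runs without Numba.

```python
try:
    from numba import njit
except ImportError:
    # Assumption for illustration: run as plain Python when Numba is absent.
    def njit(func):
        return func


@njit
def histogram(values):
    # No key or value types are declared; they are inferred from the
    # first assignment into the dictionary below.
    counts = dict()
    for v in values:
        if v in counts:
            counts[v] += 1
        else:
            counts[v] = 1
    return counts


counts = histogram((1, 2, 2, 3, 3, 3))
```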

Enhancements from user contributed PRs (with thanks!):

Max Bolingbroke added support for the selective use of fastmath flags in #3847.
Rob Ennis made min() and max() work on iterables in #3820 and added np.quantile and np.nanquantile in #3899.
Sergey Shalnov added numerous unicode string related features, zfill in #3978, ljust in #4001, rjust and center in #4044 and strip, lstrip and rstrip in #4048.
Guilherme Leobas added support for np.delete in #3890.
Christoph Deil exposed the Numba CLI via python -m numba in #4066 and made numerous documentation fixes.
Leo Schwarz wrote the bulk of the code for jitclass default constructor arguments in #3852.
Nick White enhanced the CUDA backend to use min/max PTX instructions where possible in #4054.
Lucio Fernandez-Arjona implemented the unicode string __mul__ function in #3952.
Dimitri Vorona wrote the bulk of the code to implement getitem and setitem for jitclass in #3861.
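The unicode string additions credited above (strip, ljust and zfill) might be combined as in the sketch below. The function format_row, the dot fill character and the column widths are invented for the example, and the fallback decorator is an assumption so that the snippet also runs without Numba.

```python
try:
    from numba import njit
except ImportError:
    # Assumption for illustration: run as plain Python when Numba is absent.
    def njit(func):
        return func


@njit
def format_row(label, raw_id):
    # Trim surrounding whitespace, pad the label on the right with dots
    # to 8 characters, and left-pad the id with zeros to 5 characters.
    name = label.strip().ljust(8, ".")
    ident = raw_id.strip().zfill(5)
    return name + ident


row = format_row("  cpu ", " 42")
```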

General Enhancements:

PR #3820: Min max on iterables
PR #3842: Unicode type iteration
PR #3847: Allow fine-grained control of fastmath flags to partially address #2923
PR #3852: Continuation of PR #2894
PR #3861: Continuation of PR #3730
PR #3890: Add support for np.delete
PR #3899: Support for np.quantile and np.nanquantile
PR #3900: Fix 3457 :: Implements np.repeat
PR #3928: Add .view() method for NumPy scalars
PR #3939: Update icc_rt clone recipe.
PR #3952: __mul__ for strings, initial implementation and tests
PR #3956: Type-inferred dictionary
PR #3959: Create a view for string slicing to avoid extra allocations
PR #3978: zfill operation implementation
PR #4001: ljust operation implementation
PR #4010: Support dict() and {}
PR #4022: Support for llvm 8
PR #4034: Make type.Optional str more representative
PR #4041: Deprecation warnings
PR #4044: rjust and center operations implementation
PR #4048: strip, lstrip and rstrip operations implementation
PR #4066: Expose numba CLI via python -m numba
PR #4081: Impl np.shape and support function for asarray.
PR #4091: Deprecate the use of iternext_impl without RefType

CUDA Enhancements/Fixes:

PR #3933: Adds .nbytes property to CUDA device array objects.
PR #4011: Add .inspect_ptx() to cuda device function
PR #4054: CUDA: Use min/max PTX Instructions
PR #4096: Update env-vars for CUDA libraries lookup

Documentation Updates:

PR #3867: Code repository map
PR #3918: adding Joris’ Fosdem 2019 presentation
PR #3926: order talks on applications of Numba by date
PR #3943: fix two small typos in vectorize docs
PR #3944: Fixup jitclass docs
PR #3990: mention preprint repo in FAQ. Fixes #3981
PR #4012: Correct runtests command in contributing.rst
PR #4043: fix typo
PR #4047: Ambiguous Documentation fix for guvectorize.
PR #4060: Remove remaining mentions of autojit in docs
PR #4063: Fix annotate example in docstring
PR #4065: Add FAQ entry explaining Numba project name
PR #4079: Add Documentation for atomicity of typed.Dict
PR #4105: Remove info about CUDA ENVVAR potential replacement

Fixes:

PR #3719: Resolves issue #3528. Adds support for slices when not using parallel=True.
PR #3727: Remove dels for known dead vars.
PR #3845: Fix mutable flag transmission in .astype
PR #3853: Fix some minor issues in the C source.
PR #3862: Correct boolean reinterpretation of data
PR #3863: Comments out the appveyor badge
PR #3869: fixes flake8 after merge
PR #3871: Add assert to ir.py to help enforce correct structuring
PR #3881: fix preparfor dtype transform for datetime64
PR #3884: Prevent mutation of objmode fallback IR.
PR #3885: Updates for llvmlite 0.29
PR #3886: Use safe_load from pyyaml.
PR #3887: Add tolerance to network errors by permitting conda to retry
PR #3893: Fix casting in namedtuple ctor.
PR #3894: Fix array inliner for multiple array definition.
PR #3905: Cherrypick #3903 to main
PR #3920: Raise better error if unsupported jump opcode found.
PR #3927: Apply flake8 to the numpy related files
PR #3935: Silence DeprecationWarning
PR #3938: Better error message for unknown opcode
PR #3941: Fix typing of ufuncs in parfor conversion
PR #3946: Return variable renaming dict from inline_closurecall
PR #3962: Fix bug in alignment computation of Record.make_c_struct
PR #3967: Fix error with pickling unicode
PR #3964: Unicode split algo versioning
PR #3975: Add handler for unknown locale to numba -s
PR #3991: Permit Optionals in ufunc machinery
PR #3995: Remove assert in type inference causing poor error message.
PR #3996: add is_ascii flag to UnicodeType
PR #4009: Prevent zero division error in np.linalg.cond
PR #4014: Resolves #4007.
PR #4021: Add a more specific error message for invalid write to a global.
PR #4023: Fix handling of titles in record dtype
PR #4024: Do a check if a call is const before saying that an object is multiply defined.
PR #4027: Fix issue #4020. Turn off no_cpython_wrapper flag when compiling for…
PR #4033: [WIP] Fixing wrong dtype of array inside reflected list #4028
PR #4061: Change IPython cache dir name to numba_cache
PR #4067: Delete examples/notebooks/LinearRegr.py
PR #4070: Catch writes to global typed.Dict and raise.
PR #4078: Check tuple length
PR #4084: Fix missing incref on optional return None
PR #4089: Make the warnings fixer flush work for warning comparing on type.
PR #4094: Fix function definition finding logic for commented def
PR #4100: Fix alignment check on 32-bit.
PR #4104: Use PEP 508 compliant env markers for install deps

Contributors:

Benjamin Zaitlen
Christoph Deil
David Hirschfeld
Dimitri Vorona
Ehsan Totoni (core dev)
Guilherme Leobas
Leo Schwarz
Lucio Fernandez-Arjona
Max Bolingbroke
NanduTej
Nick White
Ravi Teja Gutta
Rob Ennis
Sergey Shalnov
Siu Kwan Lam (core dev)
Stan Seibert (core dev)
Stuart Archibald (core dev)
Todd A. Anderson (core dev)
Valentin Haenel (core dev)

Version 0.43.1¶

This is a bugfix release that fixes a bug in branch pruning and bugs in np.interp, and fully accommodates the NumPy 1.16 release series.

PR #3826: NumPy 1.16 support
PR #3850: Refactor np.interp
PR #3883: Rewrite pruned conditionals as their evaluated constants.

Contributors:

Rob Ennis
Siu Kwan Lam (core dev)
Stuart Archibald (core dev)

Version 0.43.0¶

In this release, the major new features are:

Initial support for statically typed dictionaries
Improvements to hash() to match Python 3 behavior
Support for the heapq module
Ability to pass C structs to Numba
More NumPy functions: asarray, trapz, roll, ptp, extract
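As a sketch of the heapq support listed above, the function below builds a list inside a jitted function, heapifies it and pops the smallest items. The name k_smallest is hypothetical, and the fallback decorator is an assumption so that the snippet also runs without Numba.

```python
import heapq

try:
    from numba import njit
except ImportError:
    # Assumption for illustration: run as plain Python when Numba is absent.
    def njit(func):
        return func


@njit
def k_smallest(values, k):
    # Copy the input into a list so it can be turned into a heap in place.
    heap = []
    for v in values:
        heap.append(v)
    heapq.heapify(heap)

    # Popping k times yields the k smallest items in ascending order.
    out = []
    for _ in range(k):
        out.append(heapq.heappop(heap))
    return out


smallest = k_smallest((5, 1, 4, 2, 3), 3)
```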

NOTE:

The vast majority of NumPy 1.16 behaviour is supported; however, datetime and timedelta use involving NaT matches the behaviour present in earlier releases. The ufunc suite has not been extended to accommodate the two new time computation related additions present in NumPy 1.16. In addition, the functions ediff1d and interp have known minor issues in replicating outputs exactly when NaNs occur in certain input patterns.

General Enhancements:

PR #3563: Support for np.roll
PR #3572: Support for np.ptp
PR #3592: Add dead branch prune before type inference.
PR #3598: Implement np.asarray()
PR #3604: Support for np.interp
PR #3607: Some simplification to lowering
PR #3612: Exact match flag in dispatcher
PR #3627: Support for np.trapz
PR #3630: np.where with broadcasting
PR #3633: Support for np.extract
PR #3657: np.max, np.min, np.nanmax, np.nanmin - support for complex dtypes
PR #3661: Access C Struct as Numpy Structured Array
PR #3678: Support for str.split and str.join
PR #3684: Support C array in C struct
PR #3696: Add intrinsic to help debug refcount
PR #3703: Implementations of type hashing.
PR #3715: Port CPython3.7 dictionary for numba internal use
PR #3716: Support inplace concat of strings
PR #3718: Add location to ConstantInferenceError exceptions.
PR #3720: improve error msg about invalid signature
PR #3731: Support for heapq
PR #3754: Updates for llvmlite 0.28
PR #3760: Overloadable operator.setitem
PR #3775: Support overloading operator.delitem
PR #3777: Implement compiler support for dictionary
PR #3791: Implement interpreter-side interface for numba dict
PR #3799: Support refcount’ed types in numba dict

CUDA Enhancements/Fixes:

PR #3713: Fix the NvvmSupportError message when CC too low
PR #3722: Fix #3705: slicing error with negative strides
PR #3755: Make cuda.to_device accept readonly host array
PR #3773: Adapt library search to accommodate multiple locations

Documentation Updates:

PR #3651: fix link to berryconda in docs
PR #3668: Add Azure Pipelines build badge
PR #3749: DOC: Clarify when prange is different from range
PR #3771: fix a few typos
PR #3785: Clarify use of range as function only.
PR #3829: Add docs for typed-dict

Fixes:

PR #3614: Resolve #3586
PR #3618: Skip gdb tests on ARM.
PR #3643: Remove support_literals usage
PR #3645: Enforce and fix that AbstractTemplate.generic must be returning a Signature
PR #3648: Fail on @overload signature mismatch.
PR #3660: Added Ignore message to test numba.tests.test_lists.TestLists.test_mul_error
PR #3662: Replace six with numba.six
PR #3663: Removes coverage computation from travisci builds
PR #3672: Avoid leaking memory when iterating over uniform tuple
PR #3676: Fixes constant string lowering inside tuples
PR #3677: Ensure all referenced compiled functions are linked properly
PR #3692: Fix test failure due to overly strict test on floating point values.
PR #3693: Intercept failed import to help users.
PR #3694: Fix memory leak in enumerate iterator
PR #3695: Convert return of None from intrinsic implementation to dummy value
PR #3697: Fix for issue #3687
PR #3701: Fix array.T analysis (fixes #3700)
PR #3704: Fixes for overload_method
PR #3706: Don’t push call vars recursively into nested parfors. Resolves #3686.
PR #3710: Set as non-hoistable if a mutable variable is passed to a function in a loop. Resolves #3699.
PR #3712: parallel=True to use better builtin mechanism to resolve call types. Resolves issue #3671
PR #3725: Fix invalid removal of dead empty list
PR #3740: add uintp as a valid type to the tuple operator.getitem
PR #3758: Fix target definition update in inlining
PR #3782: Raise typing error on yield optional.
PR #3792: Fix non-module object used as the module of a function.
PR #3800: Bugfix for np.interp
PR #3808: Bump macro to include VS2014 to fix py3.5 build
PR #3809: Add debug guard to debug only C function.
PR #3816: Fix array.sum(axis) 1d input return type.
PR #3821: Replace PySys_WriteStdout with PySys_FormatStdout to ensure no truncation.
PR #3830: Getitem should not return optional type
PR #3832: Handle single string as path in find_file()

Contributors:

Ehsan Totoni
Gryllos Prokopis
Jonathan J. Helmus
Kayla Ngan
lalitparate
luk-f-a
Matyt
Max Bolingbroke
Michael Seifert
Rob Ennis
Siu Kwan Lam
Stan Seibert
Stuart Archibald
Todd A. Anderson
Tao He
Valentin Haenel

Version 0.42.1¶

Bugfix release to fix the incorrect hash in OSX wheel packages. No change in source code.

Version 0.42.0¶

In this release the major features are:

The capability to launch and attach the GDB debugger from within a jitted function.
The upgrading of LLVM to version 7.0.0.

We added a draft of the project roadmap to the developer manual. The roadmap is for informational purposes only as priorities and resources may change.

Here are some enhancements from contributed PRs:

#3532. Daniel Wennberg improved the cuda.{pinned, mapped} API so that the associated memory is released immediately at the exit of the context manager.
#3531. Dimitri Vorona enabled the inlining of jitclass methods.
#3516. Simon Perkins added support for passing NumPy dtypes (e.g. np.dtype("int32")) and their type constructors (e.g. np.int32) into a jitted function.
#3509. Rob Ennis added support for np.corrcoef.

A regression issue (#3554, #3461) relating to making an empty slice in parallel mode is resolved by #3558.

General Enhancements:

PR #3392: Launch and attach gdb directly from Numba.
PR #3437: Changes to accommodate LLVM 7.0.x
PR #3509: Support for np.corrcoef
PR #3516: Typeof dtype values
PR #3520: Fix @stencil ignoring cval if out kwarg supplied.
PR #3531: Fix jitclass method inlining and avoid unnecessary increfs
PR #3538: Avoid future C-level assertion error due to invalid visibility
PR #3543: Avoid implementation error being hidden by the try-except
PR #3544: Add long_running test flag and feature to exclude tests.
PR #3549: ParallelAccelerator caching improvements
PR #3558: Fixes array analysis for inplace binary operators.
PR #3566: Skip alignment tests on armv7l.
PR #3567: Fix unifying literal types in namedtuple
PR #3576: Add special copy routine for NumPy out arrays
PR #3577: Fix example and docs typos for objmode context manager. reorder statements.
PR #3580: Use alias information when determining whether it is safe to
PR #3583: Use ir.unknown_loc for unknown Loc, as #3390 with tests
PR #3587: Fix llvm.memset usage changes in llvm7
PR #3596: Fix Array Analysis for Global Namedtuples
PR #3597: Warn users if threading backend init unsafe.
PR #3605: Add guard for writing to read only arrays from ufunc calls
PR #3606: Improve the accuracy of error message wording for undefined type.
PR #3611: gdb test guard needs to ack ptrace permissions
PR #3616: Skip gdb tests on ARM.

CUDA Enhancements:

PR #3532: Unregister temporarily pinned host arrays at once
PR #3552: Handle broadcast arrays correctly in host->device transfer.
PR #3578: Align cuda and cuda simulator kwarg names.

Documentation Updates:

PR #3545: Fix @njit description in 5 min guide
PR #3570: Minor documentation fixes for numba.cuda
PR #3581: Fixing minor typo in reference/types.rst
PR #3594: Changing @stencil docs to correctly reflect func_or_mode param
PR #3617: Draft roadmap as of Dec 2018

Contributors:

Aaron Critchley
Daniel Wennberg
Dimitri Vorona
Dominik Stańczak
Ehsan Totoni (core dev)
Iskander Sharipov
Rob Ennis
Simon Muller
Simon Perkins
Siu Kwan Lam (core dev)
Stan Seibert (core dev)
Stuart Archibald (core dev)
Todd A. Anderson (core dev)

Version 0.41.0¶

This release adds the following major features:

Diagnostics showing the optimizations done by ParallelAccelerator
Support for profiling Numba-compiled functions in Intel VTune
Additional NumPy functions: partition, nancumsum, nancumprod, ediff1d, cov, conj, conjugate, tri, tril, triu
Initial support for Python 3 Unicode strings

General Enhancements:

PR #1968: armv7 support
PR #2983: invert mapping b/w binop operators and the operator module #2297
PR #3160: First attempt at parallel diagnostics
PR #3307: Adding NUMBA_ENABLE_PROFILING envvar, enabling jit event
PR #3320: Support for np.partition
PR #3324: Support for np.nancumsum and np.nancumprod
PR #3325: Add location information to exceptions.
PR #3337: Support for np.ediff1d
PR #3345: Support for np.cov
PR #3348: Support user pipeline class in with lifting
PR #3363: string support
PR #3373: Improve error message for empty imprecise lists.
PR #3375: Enable overload(operator.getitem)
PR #3402: Support negative indexing in tuple.
PR #3414: Refactor Const type
PR #3416: Optimized usage of alloca out of the loop
PR #3424: Updates for llvmlite 0.26
PR #3462: Add support for np.conj/np.conjugate.
PR #3480: np.tri, np.tril, np.triu - default optional args
PR #3481: Permit dtype argument as sole kwarg in np.eye

CUDA Enhancements:

PR #3399: Add max_registers Option to cuda.jit

Continuous Integration / Testing:

PR #3303: CI with Azure Pipelines
PR #3309: Workaround race condition with apt
PR #3371: Fix issues with Azure Pipelines
PR #3362: Fix #3360: RuntimeWarning: ‘numba.runtests’ found in sys.modules
PR #3374: Disable openmp in wheel building
PR #3404: Azure Pipelines templates
PR #3419: Fix cuda tests and error reporting in test discovery
PR #3491: Prevent faulthandler installation on armv7l
PR #3493: Fix CUDA test that used negative indexing behaviour that’s fixed.
PR #3495: Start Flake8 checking of Numba source

Fixes:

PR #2950: Fix dispatcher to only consider contiguous-ness.
PR #3124: Fix 3119, raise for 0d arrays in reductions
PR #3228: Reduce redundant module linking
PR #3329: Fix AOT on windows.
PR #3335: Fix memory management of __cuda_array_interface__ views.
PR #3340: Fix typo in error name.
PR #3365: Fix the default unboxing logic
PR #3367: Allow non-global reference to objmode() context-manager
PR #3381: Fix global reference in objmode for dynamically created function
PR #3382: CUDA_ERROR_MISALIGNED_ADDRESS Using Multiple Const Arrays
PR #3384: Correctly handle very old versions of colorama
PR #3394: Add 32bit package guard for non-32bit installs
PR #3397: Fix with-objmode warning
PR #3403: Fix label offset in call inline after parfor pass
PR #3429: Fixes raising of user defined exceptions for exec(<string>).
PR #3432: Fix error due to function naming in CI in py2.7
PR #3444: Fixed TBB’s single thread execution and test added for #3440
PR #3449: Allow matching non-array objects in find_callname()
PR #3455: Change getiter and iternext to not be pure. Resolves #3425
PR #3467: Make ir.UndefinedType singleton class.
PR #3478: Fix np.random.shuffle sideeffect
PR #3487: Raise unsupported for kwargs given to print()
PR #3488: Remove dead script.
PR #3498: Fix stencil support for boolean as return type
PR #3511: Fix handling make_function literals (regression of #3414)
PR #3514: Add missing unicode != unicode
PR #3527: Fix complex math sqrt implementation for large -ve values
PR #3530: This adds an arg check for the pattern supplied to Parfors.
PR #3536: Sets list dtor linkage to linkonce_odr to fix visibility in AOT

Documentation Updates:

PR #3316: Update 0.40 changelog with additional PRs
PR #3318: Tweak spacing to avoid search box wrapping onto second line
PR #3321: Add note about memory leaks with exceptions to docs. Fixes #3263
PR #3322: Add FAQ on CUDA + fork issue. Fixes #3315.
PR #3343: Update docs for argsort, kind kwarg partially supported.
PR #3357: Added mention of njit in 5minguide.rst
PR #3434: Fix parallel reduction example in docs.
PR #3452: Fix broken link and mark up problem.
PR #3484: Size Numba logo in docs in em units. Fixes #3313
PR #3502: just two typos
PR #3506: Document string support
PR #3513: Documentation for parallel diagnostics.
PR #3526: Fix 5 min guide with respect to @njit decl

Contributors:

Alex Ford
Andreas Sodeur
Anton Malakhov
Daniel Stender
Ehsan Totoni (core dev)
Henry Schreiner
Marcel Bargull
Matt Cooper
Nick White
Nicolas Hug
rjenc29
Siu Kwan Lam (core dev)
Stan Seibert (core dev)
Stuart Archibald (core dev)
Todd A. Anderson (core dev)

Version 0.40.1¶

This is a PyPI-only patch release to ensure that PyPI wheels can enable the TBB threading backend, and to disable the OpenMP backend in the wheels. Limitations of manylinux1 and variation in user environments can cause segfaults when OpenMP is enabled on wheel builds. Note that this release has no functional changes for users who obtained Numba 0.40.0 via conda.

Patches:

PR #3338: Accidentally left Anton off contributor list for 0.40.0
PR #3374: Disable OpenMP in wheel building
PR #3376: Update 0.40.1 changelog and docs on OpenMP backend

Version 0.40.0¶

This release adds a number of major features:

A new GPU backend: kernels for AMD GPUs can now be compiled using the ROCm driver on Linux.
The thread pool implementation used by Numba for automatic multithreading is configurable to use TBB, OpenMP, or the old “workqueue” implementation. (TBB is likely to become the preferred default in a future release.)
New documentation on thread and fork-safety with Numba, along with overall improvements in thread-safety.
Experimental support for executing a block of code inside a nopython mode function in object mode.
Parallel loops now allow arrays as reduction variables
CUDA improvements: FMA, faster float64 atomics on supporting hardware, records in const memory, and improved datetime dtype support
More NumPy functions: vander, tri, triu, tril, fill_diagonal
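The experimental object mode block mentioned in the list above can be sketched as follows: a nopython function hands one statement to the interpreter and declares the type of the value coming back. The names describe and digits_with_separators are invented for the example, str.format is assumed here to be outside what nopython mode compiles, and the fallback definitions are an assumption so that the snippet also runs without Numba.

```python
from contextlib import contextmanager

try:
    from numba import njit, objmode
except ImportError:
    # Assumption for illustration: run as plain Python when Numba is absent.
    def njit(func):
        return func

    @contextmanager
    def objmode(**_types):
        yield


def describe(n):
    # Ordinary interpreter code: format with thousands separators
    # and report the length of the resulting text.
    return len("{:,}".format(n))


@njit
def digits_with_separators(n):
    # The block runs in object mode; the keyword argument declares the
    # type of the variable that is passed back to nopython mode.
    with objmode(width="int64"):
        width = describe(n)
    return width


width = digits_with_separators(1234567)  # "1,234,567" has 9 characters
```

The work inside the block is kept to a single call, on the assumption that crossing between the two modes costs more than staying in compiled code.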

General Enhancements:

PR #3017: Add facility to support with-contexts
PR #3033: Add support for multidimensional CFFI arrays
PR #3122: Add inliner to object mode pipeline
PR #3127: Support for reductions on arrays.
PR #3145: Support for np.fill_diagonal
PR #3151: Keep a queue of references to last N deserialized functions. Fixes #3026
PR #3154: Support use of list() if typeable.
PR #3166: Objmode with-block
PR #3179: Updates for llvmlite 0.25
PR #3181: Support function extension in alias analysis
PR #3189: Support literal constants in typing of object methods
PR #3190: Support passing closures as literal values in typing
PR #3199: Support inferring stencil index as constant in simple unary expressions
PR #3202: Threading layer backend refactor/rewrite/reinvention!
PR #3209: Support for np.tri, np.tril and np.triu
PR #3211: Handle unpacking in building tuple (BUILD_TUPLE_UNPACK opcode)
PR #3212: Support for np.vander
PR #3227: Add NumPy 1.15 support
PR #3272: Add MemInfo_data to runtime._nrt_python.c_helpers
PR #3273: Refactor. Removing thread-local-storage based context nesting.
PR #3278: compiler threadsafety lockdown
PR #3291: Add CPU count and CFS restrictions info to numba -s.

CUDA Enhancements:

PR #3152: Use cuda driver api to get best blocksize for best occupancy
PR #3165: Add FMA intrinsic support
PR #3172: Use float64 add Atomics, Where Available
PR #3186: Support Records in CUDA Const Memory
PR #3191: CUDA: fix log size
PR #3198: Fix GPU datetime timedelta types usage
PR #3221: Support datetime/timedelta scalar argument to a CUDA kernel.
PR #3259: Add DeviceNDArray.view method to reinterpret data as a different type.
PR #3310: Fix IPC handling of sliced cuda array.

ROCm Enhancements:

PR #3023: Support for AMDGCN/ROCm.
PR #3108: Add ROC info to numba -s output.
PR #3176: Move ROC vectorize init to npyufunc
PR #3177: Add auto_synchronize support to ROC stream
PR #3178: Update ROC target documentation.
PR #3294: Add compiler lock to ROC compilation path.
PR #3280: Add wavebits property to the HSA Agent.
PR #3281: Fix ds_permute types and add tests

Continuous Integration / Testing:

PR #3091: Remove old recipes, switch to test config based on env var.
PR #3094: Add higher ULP tolerance for products in complex space.
PR #3096: Set exit on error in incremental scripts
PR #3109: Add skip to test needing jinja2 if no jinja2.
PR #3125: Skip cudasim only tests
PR #3126: add slack, drop flowdock
PR #3147: Improve error message for arg type unsupported during typing.
PR #3128: Fix recipe/build for jetson tx2/ARM
PR #3167: In build script activate env before installing.
PR #3180: Add skip to broken test.
PR #3216: Fix libcuda.so loading in some container setup
PR #3224: Switch to new Gitter notification webhook URL and encrypt it
PR #3235: Add 32bit Travis CI jobs
PR #3257: This adds scipy/ipython back into windows conda test phase.

Fixes:

PR #3038: Fix random integer generation to match results from NumPy.
PR #3045: Fix #3027 - Numba reassigns sys.stdout
PR #3059: Handler for known LoweringErrors.
PR #3060: Adjust attribute error for NumPy functions.
PR #3067: Abort simulator threads on exception in thread block.
PR #3079: Implement +/-(types.boolean) Fix #2624
PR #3080: Compute np.var and np.std correctly for complex types.
PR #3088: Fix #3066 (array.dtype.type in prange)
PR #3089: Fix invalid ParallelAccelerator hoisting issue.
PR #3136: Fix #3135 (lowering error)
PR #3137: Fix for issue3103 (race condition detection)
PR #3142: Fix Issue #3139 (parfors reuse of reduction variable across prange blocks)
PR #3148: Remove dead array equal @infer code
PR #3153: Fix canonicalize_array_math typing for calls with kw args
PR #3156: Fixes issue with missing pygments in testing and adds guards.
PR #3168: Py37 bytes output fix.
PR #3171: Fix #3146. Fix CFUNCTYPE void* return-type handling
PR #3193: Fix setitem/getitem resolvers
PR #3222: Fix #3214. Mishandling of POP_BLOCK in while True loop.
PR #3230: Fixes liveness analysis issue in looplifting
PR #3233: Fix return type difference for 32bit ctypes.c_void_p
PR #3234: Fix types and layout for np.where.
PR #3237: Fix DeprecationWarning about imp module
PR #3241: Fix #3225. Normalize 0nd array to scalar in typing of indexing code.
PR #3256: Fix #3251: Move imports of ABCs to collections.abc for Python >= 3.3
PR #3292: Fix issue3279.
PR #3302: Fix error due to mismatching dtype

Documentation Updates:

PR #3104: Workaround for #3098 (test_optional_unpack Heisenbug)
PR #3132: Adds an ~5 minute guide to Numba.
PR #3194: Fix docs RE: np.random generator fork/thread safety
PR #3242: Page with Numba talks and tutorial links
PR #3258: Allow users to choose the type of issue they are reporting.
PR #3260: Fixed broken link
PR #3266: Fix cuda pointer ownership problem with user/externally allocated pointer
PR #3269: Tweak typography with CSS
PR #3270: Update FAQ for functions passed as arguments
PR #3274: Update installation instructions
PR #3275: Note pyobject and voidptr are types in docs
PR #3288: Do not need to call parallel optimizations “experimental” anymore
PR #3318: Tweak spacing to avoid search box wrapping onto second line

Contributors:

Anton Malakhov
Alex Ford
Anthony Bisulco
Ehsan Totoni (core dev)
Leonard Lausen
Matthew Petroff
Nick White
Ray Donnelly
rjenc29
Siu Kwan Lam (core dev)
Stan Seibert (core dev)
Stuart Archibald (core dev)
Stuart Reynolds
Todd A. Anderson (core dev)

Version 0.39.0¶

Here are the highlights for the Numba 0.39.0 release.

This is the first version that supports Python 3.7.
With help from Intel, we have fixed the issues with SVML support (related issues #2938, #2998, #3006).
Lists can now contain reference-counted types such as NumPy arrays and other lists. Note that a list still cannot hold heterogeneous types.
We have made a significant change to the internal calling convention, which should be transparent to most users, to allow for a future feature that will permit jumping back into python-mode from a nopython-mode function. This also fixes a limitation of print that disabled its use from nopython functions that were deep in the call stack.
For CUDA GPU support, we added a __cuda_array_interface__ following the NumPy array interface specification to allow Numba to consume externally defined device arrays. We have opened a corresponding pull request to CuPy to test out the concept and be able to use a CuPy GPU array.
The Numba dispatcher inspect_types() method now supports the kwarg pretty which, if set to True, produces ANSI or HTML output showing the annotated types when invoked from IPython or Jupyter Notebook respectively.
The NumPy functions ndarray.dot, np.percentile and np.nanpercentile, and np.unique are now supported.
Numba now supports the use of a per-project configuration file to permanently set behaviours typically set via NUMBA_* family environment variables.
Support for the ppc64le architecture has been added.
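
As an illustration of two of the highlights above, the following minimal sketch builds a list of NumPy arrays inside a nopython function and calls np.unique. It assumes a current Numba release and the njit shorthand; the function names and sample data are made up for this example, and the try/except fallback is only there so the snippet still runs as plain Python where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to a no-op decorator if Numba is unavailable.
    def njit(func):
        return func


@njit
def sum_of_rows(matrix):
    # Build a list whose elements are reference-counted array views.
    # All elements share one type, so the list stays homogeneous.
    rows = []
    for i in range(matrix.shape[0]):
        rows.append(matrix[i, :])
    total = 0.0
    for row in rows:
        total += row.sum()
    return total


@njit
def distinct(values):
    # np.unique returns the sorted unique elements of the input.
    return np.unique(values)


row_total = sum_of_rows(np.arange(6.0).reshape(2, 3))
unique_values = distinct(np.array([3, 1, 3, 2, 1]))
print(row_total, unique_values)
```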

Enhancements:

PR #2793: Simplify and remove javascript from html_annotate templates.
PR #2840: Support list of refcounted types
PR #2902: Support for np.unique
PR #2926: Enable fence for all architecture and add developer notes
PR #2928: Making error about untyped list more informative.
PR #2930: Add configuration file and color schemes.
PR #2932: Fix encoding to ‘UTF-8’ in check_output decode.
PR #2938: Python 3.7 compat: _Py_Finalizing becomes _Py_IsFinalizing()
PR #2939: Comprehensive SVML unit test
PR #2946: Add support for ndarray.dot method and tests.
PR #2953: percentile and nanpercentile
PR #2957: Add new 3.7 opcode support.
PR #2963: Improve alias analysis to be more comprehensive
PR #2984: Support for namedtuples in array analysis
PR #2986: Fix environment propagation
PR #2990: Improve function call matching for intrinsics
PR #3002: Second pass at error rewrites (interpreter errors).
PR #3004: Add numpy.empty to the list of pure functions.
PR #3008: Augment SVML detection with llvmlite SVML patch detection.
PR #3012: Make use of the common spelling of heterogeneous/homogeneous.
PR #3032: Fix pycc ctypes test due to mismatch in calling-convention
PR #3039: Add SVML detection to Numba environment diagnostic tool.
PR #3041: This adds @needs_blas to tests that use BLAS
PR #3056: Require llvmlite>=0.24.0

CUDA Enhancements:

PR #2860: __cuda_array_interface__
PR #2910: More CUDA intrinsics
PR #2929: Add Flag To Prevent Unnecessary D->H Copies
PR #3037: Add CUDA IPC support on non-peer-accessible devices

CI Enhancements:

PR #3021: Update appveyor config.
PR #3040: Add fault handler to all builds
PR #3042: Add catchsegv
PR #3077: Adds optional number of processes for -m in testing

Fixes:

PR #2897: Fix line position of delete statement in numba ir
PR #2905: Fix for #2862
PR #3009: Fix optional type returning in recursive call
PR #3019: workaround and unittest for issue #3016
PR #3035: [TESTING] Attempt delayed removal of Env
PR #3048: [WIP] Fix cuda tests failure on buildfarm
PR #3054: Make test work on 32-bit
PR #3062: Fix cuda.In freeing devary before the kernel launch
PR #3073: Workaround #3072
PR #3076: Avoid ignored exception due to missing globals at interpreter teardown

Documentation Updates:

PR #2966: Fix syntax in env var docs.
PR #2967: Fix typo in CUDA kernel layout example.
PR #2970: Fix docstring copy paste error.

Contributors:

The following people contributed to this release.

Anton Malakhov
Ehsan Totoni (core dev)
Julia Tatz
Matthias Bussonnier
Nick White
Ray Donnelly
Siu Kwan Lam (core dev)
Stan Seibert (core dev)
Stuart Archibald (core dev)
Todd A. Anderson (core dev)
Rik-de-Kort
rjenc29

Version 0.38.1¶

This is a critical bug fix release addressing: https://github.com/numba/numba/issues/3006

The bug does not impact users using conda packages from Anaconda or Intel Python Distribution (but it does impact conda-forge). It does not impact users of pip using wheels from PyPI.

This only impacts a small number of users where:

The ICC runtime (specifically libsvml) is present in the user’s environment.

The user is using an llvmlite statically linked against a version of LLVM that has not been patched with SVML support.

The platform is 64-bit.

The release fixes a code generation path that could lead to the production of incorrect results under the above situation.

Fixes:

PR #3007: Augment SVML detection with llvmlite SVML patch detection.

Contributors:

The following people contributed to this release.

Stuart Archibald (core dev)

Version 0.38.0¶

Following on from the bug fix focus of the last release, this release swings back towards the addition of new features and usability improvements based on community feedback. This release is comparatively large! Three key features/changes to note are:

Numba (via llvmlite) is now backed by LLVM 6.0, and general vectorization is improved as a result. A significant long-standing LLVM bug that was causing corruption was also found and fixed.

Further considerable improvements in vectorization are made available as Numba now supports Intel’s short vector math library (SVML). Try it out with conda install -c numba icc_rt.

CUDA 8.0 is now the minimum supported CUDA version.

Other highlights include:

Bug fixes to parallel=True have enabled more vectorization opportunities when using the ParallelAccelerator technology.

Much effort has gone into improving error reporting and the general usability of Numba. This includes highlighted error messages and performance tips documentation. Try it out with conda install colorama.

A number of new NumPy functions are supported: np.convolve, np.correlate, np.reshape, np.transpose, np.random.permutation, np.real and np.imag. In addition, np.searchsorted now supports the side kwarg. Further, np.argsort now supports the kind kwarg with quicksort and mergesort available.

The Numba extension API has gained the ability to operate more easily with functions from Cython modules through the use of numba.extending.get_cython_function_address to obtain function addresses for direct use in ctypes.CFUNCTYPE.

Numba now allows the passing of jitted functions (and containers of jitted functions) as arguments to other jitted functions.

The CUDA functionality has gained support for a larger selection of bit manipulation intrinsics, also SELP, and has had a number of bugs fixed.

Initial work to support the PPC64LE platform has been added. Full support is, however, waiting on the LLVM 6.0.1 release, as it contains critical patches not present in 6.0.0. It is hoped that any remaining issues will be fixed in the next release.

Advanced users/compiler engineers can now define their own compilation pipelines.
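
Three of the highlights above can be sketched briefly: passing a jitted function as an argument, the side kwarg of np.searchsorted, and the kind kwarg of np.argsort. This is an illustrative sketch assuming a current Numba release; all function names and inputs are hypothetical, and the fallback decorator only exists so the snippet runs as plain Python without Numba.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to a no-op decorator if Numba is unavailable.
    def njit(func):
        return func


@njit
def square(x):
    return x * x


@njit
def apply_twice(func, x):
    # `func` is itself a jitted function passed in as an argument.
    return func(func(x))


@njit
def right_insertion_points(sorted_values, queries):
    # side='right' returns the index after any equal entries.
    return np.searchsorted(sorted_values, queries, side='right')


@njit
def stable_order(values):
    # kind='mergesort' selects the stable sort.
    return np.argsort(values, kind='mergesort')


applied = apply_twice(square, 3)
points = right_insertion_points(np.array([1, 2, 2, 3]), np.array([2]))
order = stable_order(np.array([2, 1, 2, 1]))
print(applied, points, order)
```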

Enhancements:

PR #2660: Support bools from cffi in nopython.
PR #2741: Enhance error message for undefined variables.
PR #2744: Add diagnostic error message to test suite discovery failure.
PR #2748: Added Intel SVML optimizations as opt-out choice working by default
PR #2762: Support transpose with axes arguments.
PR #2777: Add support for np.correlate and np.convolve
PR #2779: Implement np.random.permutation
PR #2801: Passing jitted functions as args
PR #2802: Support np.real() and np.imag()
PR #2807: Expose import_cython_function
PR #2821: Add kwarg ‘side’ to np.searchsorted
PR #2822: Adds stable argsort
PR #2832: Fixups for llvmlite 0.23/llvm 6
PR #2836: Support index method on tuples
PR #2839: Support for np.transpose and np.reshape.
PR #2843: Custom pipeline
PR #2847: Replace signed array access indices in unsigned prange loop body
PR #2859: Add support for improved error reporting.
PR #2880: This adds a github issue template.
PR #2881: Build recipe to clone Intel ICC runtime.
PR #2882: Update TravisCI to test SVML
PR #2893: Add reference to the data buffer in array.ctypes object
PR #2895: Move to CUDA 8.0

Fixes:

PR #2737: Fix #2007 (part 1). Empty array handling in np.linalg.
PR #2738: Fix install_requires to allow pip getting pre-release version
PR #2740: Fix 2208. Generate better error message.
PR #2765: Fix Bit-ness
PR #2780: PowerPC reference counting memory fences
PR #2805: Fix six imports.
PR #2813: Fix #2812: gufunc scalar output bug.
PR #2814: Fix the build post #2727
PR #2831: Attempt to fix #2473
PR #2842: Fix issue with test discovery and broken CUDA drivers.
PR #2850: Add rtsys init guard and test.
PR #2852: Skip vectorization test with targets that are not x86
PR #2856: Prevent printing to stdout in test_extending.py
PR #2864: Correct C code to prevent compiler warnings.
PR #2889: Attempt to fix #2386.
PR #2891: Removed test skipping for inspect_cfg
PR #2898: Add guard to parallel test on unsupported platforms
PR #2907: Update change log for PPC64LE LLVM dependency.
PR #2911: Move build requirement to llvmlite>=0.23.0dev0
PR #2912: Fix random permutation test.
PR #2914: Fix MD list syntax in issue template.

Documentation Updates:

PR #2739: Explicitly state default value of error_model in docstring
PR #2803: DOC: parallel vectorize requires signatures
PR #2829: Add Python 2.7 EOL plan to docs
PR #2838: Use automatic numbering syntax in list.
PR #2877: Add performance tips documentation.
PR #2883: Fix #2872: update rng doc about thread/fork-safety
PR #2908: Add missing link and ref to docs.
PR #2909: Tiny typo correction

ParallelAccelerator enhancements/fixes:

PR #2727: Changes to enable vectorization in ParallelAccelerator.
PR #2816: Array analysis for transpose with arbitrary arguments
PR #2874: Fix dead code eliminator not to remove a call with side-effect
PR #2886: Fix ParallelAccelerator arrayexpr repr

CUDA enhancements:

PR #2734: More Constants From cuda.h
PR #2767: Add len(..) Support to DeviceNDArray
PR #2778: Add More Device Array API Functions to CUDA Simulator
PR #2824: Add CUDA Primitives for Population Count
PR #2835: Emit selp Instructions to Avoid Branching
PR #2867: Full support for CUDA device attributes

CUDA fixes:

PR #2768: Don’t Compile Code on Every Assignment
PR #2878: Fixes a Win64 issue with the test in Pr/2865

Contributors:

The following people contributed to this release.

Abutalib Aghayev
Alex Olivas
Anton Malakhov
Dong-hee Na
Ehsan Totoni (core dev)
John Zwinck
Josh Wilson
Kelsey Jordahl
Nick White
Olexa Bilaniuk
Rik-de-Kort
Siu Kwan Lam (core dev)
Stan Seibert (core dev)
Stuart Archibald (core dev)
Thomas Arildsen
Todd A. Anderson (core dev)

Version 0.37.0¶

This release focuses on bug fixing and stability but also adds a few new features including support for Numpy 1.14. The key change for Numba core was the long awaited addition of the final tranche of thread safety improvements that allow Numba to be run concurrently on multiple threads without hitting known thread safety issues inside LLVM itself. Further, a number of fixes and enhancements went into the CUDA implementation and ParallelAccelerator gained some new features and underwent some internal refactoring.

Misc enhancements:

PR #2627: Remove hacks to make llvmlite threadsafe
PR #2672: Add ascontiguousarray
PR #2678: Add Gitter badge
PR #2691: Fix #2690: add intrinsic to convert array to tuple
PR #2703: Test runner feature: failed-first and last-failed
PR #2708: Patch for issue #1907
PR #2732: Add support for array.fill

Misc Fixes:

PR #2610: Fix #2606 lowering of optional.setattr
PR #2650: Remove skip for win32 cosine test
PR #2668: Fix empty_like from readonly arrays.
PR #2682: Fixes 2210, remove _DisableJitWrapper
PR #2684: Fix #2340, generator error yielding bool
PR #2693: Add travis-ci testing of NumPy 1.14, and also check on Python 2.7
PR #2694: Avoid type inference failure due to a typing template rejection
PR #2695: Update llvmlite version dependency.
PR #2696: Fix tuple indexing codegeneration for empty tuple
PR #2698: Fix #2697 by deferring deletion in the simplify_CFG loop.
PR #2701: Small fix to avoid tempfiles being created in the current directory
PR #2725: Fix 2481, LLVM IR parsing error due to mutated IR
PR #2726: Fix #2673: incorrect fork error msg.
PR #2728: Alternative to #2620. Remove dead code ByteCodeInst.get.
PR #2730: Add guard for test needing SciPy/BLAS

Documentation updates:

PR #2670: Update communication channels
PR #2671: Add docs about diagnosing loop vectorizer
PR #2683: Add docs on const arg requirements and on const mem alloc
PR #2722: Add docs on numpy support in cuda
PR #2724: Update doc: warning about unsupported arguments

ParallelAccelerator enhancements/fixes:

Parallel support for np.arange, np.linspace, np.mean, np.std and np.var is added. This was performed as part of a general refactor and cleanup of the core ParallelAccelerator code.

PR #2674: Core pa
PR #2704: Generate Dels after parfor sequential lowering
PR #2716: Handle matching directly supported functions
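
A small sketch of the kind of code the parallel support described above applies to, assuming a current Numba release with parallel=True available on the platform. The function name is hypothetical, and the fallback decorator is only there so the snippet runs as plain Python where Numba is missing.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: no-op stand-in for the njit(...) decorator factory.
    def njit(*args, **kwargs):
        return lambda func: func


@njit(parallel=True)
def mean_of_grid(n):
    # Both the np.linspace call and the mean reduction are
    # candidates for conversion to parallel loops.
    grid = np.linspace(0.0, 1.0, n)
    return grid.mean()


grid_mean = mean_of_grid(5)
print(grid_mean)
```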

CUDA enhancements:

PR #2665: CUDA DeviceNDArray: Support numpy transpose API
PR #2681: Allow Assigning to DeviceNDArrays
PR #2702: Make DummyArray do High Dimensional Reshapes
PR #2714: Use CFFI to Reuse Code

CUDA fixes:

PR #2667: Fix CUDA DeviceNDArray slicing
PR #2686: Fix #2663: incorrect offset when indexing cuda array.
PR #2687: Ensure Constructed Stream Bound
PR #2706: Workaround for unexpected warp divergence due to exception raising code
PR #2707: Fix regression: cuda test submodules not loading properly in runtests
PR #2731: Use more challenging values in slice tests.
PR #2720: A quick testsuite fix to not run the new cuda testcase in the multiprocess pool

Contributors:

The following people contributed to this release.

Coutinho Menezes Nilo
Daniel
Ehsan Totoni
Nick White
Paul H. Liu
Siu Kwan Lam
Stan Seibert
Stuart Archibald
Todd A. Anderson

Version 0.36.2¶

This is a bugfix release that provides minor changes to address:

PR #2645: Avoid CPython bug with exec in older 2.7.x.
PR #2652: Add support for CUDA 9.

Version 0.36.1¶

This release continues to add new features to the work undertaken in partnership with Intel on ParallelAccelerator technology. Other changes of note include the compilation chain being updated to use LLVM 5.0 and the production of conda packages using conda-build 3 and the new compilers that ship with it.

NOTE: A version 0.36.0 was tagged for internal use but not released.

ParallelAccelerator:

NOTE: The ParallelAccelerator technology is under active development and should be considered experimental.

New features relating to ParallelAccelerator, from work undertaken with Intel, include the addition of the @stencil decorator for ease of implementation of stencil-like computations, support for general reductions, and slice and range fusion for parallel slice/bit-array assignments. Documentation on both the use and implementation of the above has been added. Further, a new debug environment variable NUMBA_DEBUG_ARRAY_OPT_STATS is made available to give information about which operators/calls are converted to parallel for-loops.

ParallelAccelerator features:

PR #2457: Stencil Computations in ParallelAccelerator
PR #2548: Slice and range fusion, parallelizing bitarray and slice assignment
PR #2516: Support general reductions in ParallelAccelerator

ParallelAccelerator fixes:

PR #2540: Fix bug #2537
PR #2566: Fix issue #2564.
PR #2599: Fix nested multi-dimensional parfor type inference issue
PR #2604: Fixes for stencil tests and cmath sin().
PR #2605: Fixes issue #2603.

Additional features of note:

This release of Numba (and llvmlite) is updated to use LLVM version 5.0 as the compiler back end. The main change to Numba to support this was the addition of a custom symbol tracker to avoid calls to LLVM’s ExecutionEngine, which was crashing when asked for non-existent symbol addresses. Further, the conda packages for this release of Numba are built using conda build version 3 and the new compilers/recipe grammar that are present in that release.

PR #2568: Update for LLVM 5
PR #2607: Fixes abort when getting address to “nrt_unresolved_abort”
PR #2615: Working towards conda build 3

Thanks to community feedback and bug reports, the following fixes were also made.

Misc fixes/enhancements:

PR #2534: Add tuple support to np.take.
PR #2551: Rebranding fix
PR #2552: relative doc links
PR #2570: Fix issue #2561, handle missing successor on loop exit
PR #2588: Fix #2555. Disable libpython.so linking on linux
PR #2601: Update llvmlite version dependency.
PR #2608: Fix potential cache file collision
PR #2612: Fix NRT test failure due to increased overhead when running in coverage
PR #2619: Fix dubious pthread_cond_signal not in lock
PR #2622: Fix np.nanmedian for all NaN case.
PR #2633: Fix markdown in CONTRIBUTING.md
PR #2635: Make the dependency on compilers for AOT optional.

CUDA support fixes:

PR #2523: Fix invalid cuda context in memory transfer calls in another thread
PR #2575: Use CPU to initialize xoroshiro states for GPU RNG. Fixes #2573
PR #2581: Fix cuda gufunc mishandling of scalar arg as array and out argument

Version 0.35.0¶

This release includes some exciting new features as part of the work performed in partnership with Intel on ParallelAccelerator technology. There are also some additions made to Numpy support and small but significant fixes made as a result of considerable effort spent chasing bugs and implementing stability improvements.

ParallelAccelerator:

NOTE: The ParallelAccelerator technology is under active development and should be considered experimental.

New features relating to ParallelAccelerator, from work undertaken with Intel, include support for a larger range of np.random functions in parallel mode, printing Numpy arrays in nopython mode, the capacity to initialize Numpy arrays directly from list comprehensions, and the axis argument to .sum(). Documentation on the ParallelAccelerator technology implementation has also been added. Further, a large amount of work on equivalence relations was undertaken to enable runtime checks of broadcasting behaviours in parallel mode.

ParallelAccelerator features:

PR #2400: Array comprehension
PR #2405: Support printing Numpy arrays
PR #2438: Support more np.random functions in ParallelAccelerator
PR #2482: Support for sum with axis in nopython mode.
PR #2487: Adding developer documentation for ParallelAccelerator technology.
PR #2492: Core PA refactor adds assertions for broadcast semantics
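
Two of the features listed above, initializing an array from a list comprehension and the axis argument to .sum(), might look as follows in user code. This is a minimal sketch assuming a current Numba release; the function names are hypothetical, and the fallback decorator only keeps the snippet runnable as plain Python without Numba.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to a no-op decorator if Numba is unavailable.
    def njit(func):
        return func


@njit
def squares(n):
    # Array initialized directly from a list comprehension.
    return np.array([i * i for i in range(n)])


@njit
def column_totals(matrix):
    # Sum along axis 0, giving one total per column.
    return matrix.sum(axis=0)


squares_array = squares(4)
totals = column_totals(np.arange(6.0).reshape(2, 3))
print(squares_array, totals)
```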

ParallelAccelerator fixes:

PR #2478: Rename cfg before parfor translation (#2477)
PR #2479: Fix broken array comprehension tests on unsupported platforms
PR #2484: Fix array comprehension test on win64
PR #2506: Fix for 32-bit machines.

Additional features of note:

Support for np.take, np.finfo, np.iinfo and np.MachAr in nopython mode is added. Further, three new environment variables are added: two for overriding the CPU target/features and another to warn if parallel=True was set but no such transform was possible.

PR #2490: Implement np.take and ndarray.take
PR #2493: Display a warning if parallel=True is set but not possible.
PR #2513: Add np.MachAr, np.finfo, np.iinfo
PR #2515: Allow environ overriding of cpu target and cpu features.
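
As a brief illustration of the np.take support mentioned above, here is a sketch assuming a current Numba release. The function name and data are invented for the example, and the fallback decorator only keeps the snippet runnable where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to a no-op decorator if Numba is unavailable.
    def njit(func):
        return func


@njit
def pick(values, indices):
    # Gather the elements of `values` at the given positions.
    return np.take(values, indices)


picked = pick(np.array([10, 20, 30, 40]), np.array([3, 0]))
print(picked)
```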

Due to expansion of the test farm and a focus on fixing bugs, the following fixes were also made.

Misc fixes/enhancements:

PR #2455: add contextual information to runtime errors
PR #2470: Fixes #2458, poor performance in np.median
PR #2471: Ensure LLVM threadsafety in {g,}ufunc building.
PR #2494: Update doc theme
PR #2503: Remove hacky code added in 2482 and feature enhancement
PR #2505: Serialise env mutation tests during multithreaded testing.
PR #2520: Fix failing cpu-target override tests

CUDA support fixes:

PR #2504: Enable CUDA toolkit version testing
PR #2509: Disable tests generating code unavailable in lower CC versions.
PR #2511: Fix Windows 64 bit CUDA tests.

Version 0.34.0¶

This release adds a significant set of new features arising from combined work with Intel on ParallelAccelerator technology. It also adds list comprehension and closure support, support for Numpy 1.13 and a new, faster, CUDA reduction algorithm. For Linux users this release is the first to be built on CentOS 6, which will be the new base platform for future releases. Finally, a number of thread-safety, type inference and other smaller enhancements have been made and bugs fixed.

ParallelAccelerator features:

NOTE: The ParallelAccelerator technology is under active development and should be considered experimental.

The ParallelAccelerator technology is accessed via a new “nopython” mode option “parallel”. The ParallelAccelerator technology attempts to identify operations which have parallel semantics (for instance adding a scalar to a vector), fuse together adjacent such operations, and then parallelize their execution across a number of CPU cores. This is essentially auto-parallelization.

In addition to the auto-parallelization feature, explicit loop based parallelism is made available through the use of prange in place of range as a loop iterator.

More information and examples on both auto-parallelization and prange are available in the documentation and examples directory respectively.
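
The two styles of parallelism described above can be sketched as follows. The example uses the modern njit(parallel=True) spelling from a current Numba release, which is an assumption rather than the exact form documented for this version. Function names are hypothetical, and the fallback definitions only keep the snippet runnable as plain Python without Numba.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption: serial stand-ins so the sketch runs without Numba.
    prange = range

    def njit(*args, **kwargs):
        return lambda func: func


@njit(parallel=True)
def add_scalar(values, offset):
    # An array expression with parallel semantics: a candidate
    # for auto-parallelization.
    return values + offset


@njit(parallel=True)
def sum_of_squares(values):
    total = 0.0
    # prange in place of range marks the loop as explicitly parallel;
    # `total` is handled as a reduction variable.
    for i in prange(values.shape[0]):
        total += values[i] * values[i]
    return total


shifted = add_scalar(np.arange(3.0), 1.0)
squares_total = sum_of_squares(np.arange(5.0))
print(shifted, squares_total)
```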

As part of the necessary work for ParallelAccelerator, support for closures and list comprehensions is added:

PR #2318: Transfer ParallelAccelerator technology to Numba
PR #2379: ParallelAccelerator Core Improvements
PR #2367: Add support for len(range(…))
PR #2369: List comprehension
PR #2391: Explicit Parallel Loop Support (prange)

The ParallelAccelerator features are available on all supported platforms and Python versions, with the following exceptions (which we aim to support in a future release):

The combination of Windows operating systems with Python 2.7.
Systems running 32 bit Python.

CUDA support enhancements:

PR #2377: New GPU reduction algorithm

CUDA support fixes:

PR #2397: Fix #2393, always set alignment of cuda static memory regions

Misc Fixes:

PR #2373, Issue #2372: 32-bit compatibility fix for parfor related code
PR #2376: Fix #2375 missing stdint.h for py2.7 vc9
PR #2378: Fix deadlock in parallel gufunc when kernel acquires the GIL.
PR #2382: Forbid unsafe casting in bitwise operation
PR #2385: docs: fix Sphinx errors
PR #2396: Use 64-bit RHS operand for shift
PR #2404: Fix threadsafety logic issue in ufunc compilation cache.
PR #2424: Ensure consistent iteration order of blocks for type inference.
PR #2425: Guard code to prevent the use of ‘parallel’ on win32 + py27
PR #2426: Basic test for Enum member type recovery.
PR #2433: Fix up the parfors tests with respect to windows py2.7
PR #2442: Skip tests that need BLAS/LAPACK if scipy is not available.
PR #2444: Add test for invalid array setitem
PR #2449: Make the runtime initialiser threadsafe
PR #2452: Skip CFG test on 64bit windows

Misc Enhancements:

PR #2366: Improvements to IR utils
PR #2388: Update README.rst to indicate the proper version of LLVM
PR #2394: Upgrade to llvmlite 0.19.*
PR #2395: Update llvmlite version to 0.19
PR #2406: Expose environment object to ufuncs
PR #2407: Expose environment object to target-context inside lowerer
PR #2413: Add flags to pass through to conda build for buildbot
PR #2414: Add cross compile flags to local recipe
PR #2415: A few cleanups for rewrites
PR #2418: Add getitem support for Enum classes
PR #2419: Add support for returning enums in vectorize
PR #2421: Add copyright notice for Intel contributed files.
PR #2422: Patch code base to work with np 1.13 release
PR #2448: Adds in warning message when using ‘parallel’ if cache=True
PR #2450: Add test for keyword arg on .sum-like and .cumsum-like array methods

Version 0.33.0¶

This release resolved several performance issues caused by atomic reference counting operations inside loop bodies. New optimization passes have been added to reduce the impact of these operations. We observe speed improvements of between 2x and 10x in affected programs due to the removal of unnecessary reference counting operations.

There are also several enhancements to the CUDA GPU support:

A GPU random number generator based on the xoroshiro128+ algorithm is added. See details and examples in the documentation.
@cuda.jit CUDA kernels can now call @jit and @njit CPU functions and they will automatically be compiled as CUDA device functions.
CUDA IPC memory API is exposed for sharing memory between processes. See usage details in the documentation.

Reference counting enhancements:

PR #2346, Issue #2345, #2248: Add extra refcount pruning after inlining
PR #2349: Fix refct pruning not removing refct op with tail call.
PR #2352, Issue #2350: Add refcount pruning pass for function that does not need refcount

CUDA support enhancements:

PR #2023: Supports CUDA IPC for device array
PR #2343, Issue #2335: Allow CPU jit decorated function to be used as cuda device function
PR #2347: Add random number generator support for CUDA device code
PR #2361: Update autotune table for CC: 5.3, 6.0, 6.1, 6.2

Misc fixes:

PR #2362: Avoid test failure due to typing to int32 on 32-bit platforms
PR #2359: Fixed nogil example that threw a TypeError when executed.
PR #2357, Issue #2356: Fix fragile test that depends on how the script is executed.
PR #2355: Fix cpu dispatcher referenced as attribute of another module
PR #2354: Fixes an issue with caching when function needs NRT and refcount pruning
PR #2342, Issue #2339: Add warnings to inspection when it is used on unserialized cached code
PR #2329, Issue #2250: Better handling of missing op codes

Misc enhancements:

PR #2360: Adds missing values in error message interp.
PR #2353: Handle when get_host_cpu_features() raises RuntimeError
PR #2351: Enable SVML for erf/erfc/gamma/lgamma/log2
PR #2344: Expose error_model setting in jit decorator
PR #2337: Align blocking terminate support for fork() with new TBB version
PR #2336: Bump llvmlite version to 0.18
PR #2330: Core changes in PR #2318

Version 0.32.0¶

In this release, we are upgrading to LLVM 4.0. A lot of work has been done to fix many race-condition issues inside LLVM when the compiler is used concurrently, which is likely when Numba is used with Dask.

Improvements:

PR #2322: Suppress test error due to unknown but consistent error with tgamma
PR #2320: Update llvmlite dependency to 0.17
PR #2308: Add details to error message on why cuda support is disabled.
PR #2302: Add os x to travis
PR #2294: Disable remove_module on MCJIT due to memory leak inside LLVM
PR #2291: Split parallel tests and recycle workers to tame memory usage
PR #2253: Remove the pointer-stuffing hack for storing meminfos in lists

Fixes:

PR #2331: Fix a bug in the GPU array indexing
PR #2326: Fix #2321 docs referring to non-existing function.
PR #2316: Fixing more race-condition problems
PR #2315: Fix #2314. Relax strict type check to allow optional type.
PR #2310: Fix race condition due to concurrent compilation and cache loading
PR #2304: Fix intrinsic 1st arg not a typing.Context as stated by the docs.
PR #2287: Fix int64 atomic min-max
PR #2286: Fix #2285 @overload_method not linking dependent libs
PR #2303: Missing import statements to interval-example.rst

Version 0.31.0¶

In this release, we added preliminary support for debugging with GDB version >= 7.0. The feature is enabled by setting the debug=True compiler option, which causes GDB compatible debug info to be generated. The CUDA backend also gained limited debugging support so that source locations are shown in memory-checking and profiling tools. For details, see Troubleshooting and tips.

Also, we added the fastmath=True compiler option to enable unsafe floating-point transformations, which allows LLVM to auto-vectorize more code.
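
A minimal sketch of opting in to the fastmath option, assuming a current Numba release and the njit shorthand. The function name is hypothetical, and the fallback decorator only keeps the snippet runnable as plain Python without Numba. Because fastmath permits reassociation of floating-point operations, results for general inputs may differ slightly from strict IEEE evaluation.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: no-op stand-in for the njit(...) decorator factory.
    def njit(*args, **kwargs):
        return lambda func: func


@njit(fastmath=True)
def dot(a, b):
    # With fastmath, LLVM may reorder this reduction, which can
    # allow the loop to be vectorized.
    acc = 0.0
    for i in range(a.shape[0]):
        acc += a[i] * b[i]
    return acc


dot_result = dot(np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0]))
print(dot_result)
```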

Other important changes include upgrading to LLVM 3.9.1 and adding support for Numpy 1.12.

Improvements:

PR #2281: Update for numpy1.12
PR #2278: Add CUDA atomic.{max, min, compare_and_swap}
PR #2277: Add about section to conda recipes to identify license and other metadata in Anaconda Cloud
PR #2271: Adopt itanium C++-style mangling for CPU and CUDA targets
PR #2267: Add fastmath flags
PR #2261: Support dtype.type
PR #2249: Changes for llvm3.9
PR #2234: Bump llvmlite requirement to 0.16 and add install_name_tool_fixer to mviewbuf for OS X
PR #2230: Add python3.6 to TravisCi
PR #2227: Enable caching for gufunc wrapper
PR #2170: Add debugging support
PR #2037: inspect_cfg() for easier visualization of the function operation

Fixes:

PR #2274: Fix nvvm ir patch in mishandling “load”
PR #2272: Fix breakage to cuda7.5
PR #2269: Fix caching of copy_strides kernel in cuda.reduce
PR #2265: Fix #2263: error when linking two modules with dynamic globals
PR #2252: Fix path separator in test
PR #2246: Fix overuse of memory in some system with fork
PR #2241: Fix #2240: __module__ in dynamically created function not a str
PR #2239: Fix fingerprint computation failure preventing fallback

Version 0.30.1¶

This is a bug-fix release to enable Python 3.6 support. In addition, there is now early Intel TBB support for parallel ufuncs when building from source with TBBROOT defined. The TBB feature is not enabled in our official builds.

Fixes:

PR #2232: Fix name clashes with _Py_hashtable_xxx in Python 3.6.

Improvements:

PR #2217: Add Intel TBB threadpool implementation for parallel ufunc.

Version 0.30.0¶

This release adds preliminary support for Python 3.6, but no official build is available yet. A new system reporting tool (numba --sysinfo) is added to provide system information to help core developers in replication and debugging. See below for other improvements and bug fixes.

Improvements:

PR #2209: Support Python 3.6.
PR #2175: Support np.trace(), np.outer() and np.kron().
PR #2197: Support np.nanprod().
PR #2190: Support caching for ufunc.
PR #2186: Add system reporting tool.

Fixes:

PR #2214, Issue #2212: Fix memory error with ndenumerate and flat iterators.
PR #2206, Issue #2163: Fix zip() consuming extra elements in early exhaustion.
PR #2185, Issue #2159, #2169: Fix rewrite pass affecting objmode fallback.
PR #2204, Issue #2178: Fix annotation for liftedloop.
PR #2203: Fix Appveyor segfault with Python 3.5.
PR #2202, Issue #2198: Fix target context not initialized when loading from ufunc cache.
PR #2172, Issue #2171: Fix optional type unpacking.
PR #2189, Issue #2188: Disable freezing of big (>1MB) global arrays.
PR #2180, Issue #2179: Fix invalid variable version in looplifting.
PR #2156, Issue #2155: Fix divmod, floordiv segfault on CUDA.

Version 0.29.0¶

This release extends the support of recursive functions to include direct and indirect recursion without explicit function type annotations. See new example in examples/mergesort.py. Newly supported numpy features include array stacking functions, np.linalg.eig* functions, np.linalg.matrix_power, np.roots and array to array broadcasting in assignments.
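A minimal sketch of the type-inferred recursion described above, assuming a reasonably recent Numba install. The function name fib is illustrative and is not taken from examples/mergesort.py. If Numba is not installed, the sketch falls back to a no-op decorator (an assumption made only so that it still runs as plain Python).

```python
try:
    from numba import jit
except ImportError:
    # Numba is not installed: substitute a no-op decorator so the sketch still runs.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True)
def fib(n):
    # No signature is given. The non-recursive branch below gives type
    # inference a return type to start from.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(10))
```

The recursive function needs at least one control-flow path that returns without recursing, which is why the base case comes first.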

This release depends on llvmlite 0.14.0. It supports CUDA 8, but CUDA 8 is not required.

Improvements:

PR #2130, #2137: Add type-inferred recursion with docs and examples.
PR #2134: Add np.linalg.matrix_power.
PR #2125: Add np.roots.
PR #2129: Add np.linalg.{eigvals,eigh,eigvalsh}.
PR #2126: Add array-to-array broadcasting.
PR #2069: Add hstack and related functions.
PR #2128: Allow for vectorizing a jitted function. (thanks to @dhirschfeld)
PR #2117: Update examples and make them test-able.
PR #2127: Refactor interpreter class and its results.

Fixes:

PR #2149: Workaround MSVC9.0 SP1 fmod bug kb982107.
PR #2145, Issue #2009: Fixes kwargs for jitclass __init__ method.
PR #2150: Fix slowdown in objmode fallback.
PR #2050, Issue #1259: Fix liveness problem with some generator loops.
PR #2072, Issue #1995: Right shift of unsigned LHS should be logical.
PR #2115, Issue #1466: Fix inspect_types() error due to mangled variable name.
PR #2119, Issue #2118: Fix array type created from record-dtype.
PR #2122, Issue #1808: Fix returning a generator due to datamodel error.
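The distinction behind PR #2072 can be illustrated in plain Python (this is not Numba code, and the helper names are hypothetical). A logical right shift fills the vacated high bits with zeros, as expected for unsigned values, while an arithmetic right shift replicates the sign bit.

```python
MASK64 = (1 << 64) - 1


def logical_shift_right(bits, n):
    """Shift a 64-bit pattern right, filling with zeros (unsigned semantics)."""
    return (bits & MASK64) >> n


def arithmetic_shift_right(bits, n):
    """Shift a 64-bit pattern right, replicating the sign bit (signed semantics)."""
    value = bits & MASK64
    if value >> 63:
        # Reinterpret the pattern as a negative two's-complement integer.
        value -= 1 << 64
    return (value >> n) & MASK64


pattern = 1 << 63  # only the top bit is set
print(hex(logical_shift_right(pattern, 4)))
print(hex(arithmetic_shift_right(pattern, 4)))
```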

Version 0.28.1¶

This is a bug-fix release to resolve packaging issues with setuptools dependency.

Version 0.28.0¶

Amongst other improvements, this version further improves the level of support for linear algebra functions from the numpy.linalg module. Also, our random generator is now guaranteed to be thread-safe and fork-safe.

Improvements:

PR #2019: Add the @intrinsic decorator to define low-level subroutines callable from JIT functions (this is considered a private API for now).
PR #2059: Implement np.concatenate and np.stack.
PR #2048: Make random generation fork-safe and thread-safe, producing independent streams of random numbers for each thread or process.
PR #2031: Add documentation of floating-point pitfalls.
Issue #2053: Avoid polling in parallel CPU target (fixes severe performance regression on Windows).
Issue #2029: Make default arguments fast.
PR #2052: Add logging to the CUDA driver.
PR #2049: Implement the built-in divmod() function.
PR #2036: Implement the argsort() method on arrays.
PR #2046: Improving CUDA memory management by deferring deallocations until certain thresholds are reached, so as to avoid breaking asynchronous execution.
PR #2040: Switch the CUDA driver implementation to use CUDA’s “primary context” API.
PR #2017: Allow min(tuple) and max(tuple).
PR #2039: Reduce fork() detection overhead in CUDA.
PR #2021: Handle structured dtypes with titles.
PR #1996: Rewrite looplifting as a transformation on Numba IR.
PR #2014: Implement np.linalg.matrix_rank.
PR #2012: Implement np.linalg.cond.
PR #1985: Rewrite even trivial array expressions, which opens the door for other optimizations (for example, array ** 2 can be converted into array * array).
PR #1950: Have typeof() always raise ValueError on failure. Previously, it would either raise or return None, depending on the input.
PR #1994: Implement np.linalg.norm.
PR #1987: Implement np.linalg.det and np.linalg.slogdet.
Issue #1979: Document integer width inference and how to work around it.
PR #1938: Numba is now compatible with LLVM 3.8.
PR #1967: Restrict np.linalg functions to homogeneous dtypes. Users wanting to pass mixed-typed inputs have to convert explicitly, which makes the performance implications more obvious.

Fixes:

PR #2006: array(float32) ** int should return array(float32).
PR #2044: Allow reshaping empty arrays.
Issue #2051: Fix refcounting issue when concatenating tuples.
Issue #2000: Make Numpy optional for setup.py, to allow pip install to work without Numpy pre-installed.
PR #1989: Fix assertion in Dispatcher.disable_compile().
Issue #2028: Ignore filesystem errors when caching from multiple processes.
Issue #2003: Allow unicode variable and function names (on Python 3).
Issue #1998: Fix deadlock in parallel ufuncs that reacquire the GIL.
PR #1997: Fix random crashes when AOT compiling on certain Windows platforms.
Issue #1988: Propagate jitclass docstring.
Issue #1933: Ensure array constants are emitted with the right alignment.

Version 0.27.0¶

Improvements:

Issue #1976: improve error message when non-integral dimensions are given to a CUDA kernel.
PR #1970: Optimize the power operator with a static exponent.
PR #1710: Improve contextual information for compiler errors.
PR #1961: Support printing constant strings.
PR #1959: Support more types in the print() function.
PR #1823: Support compute_50 in CUDA backend.
PR #1955: Support np.linalg.pinv.
PR #1896: Improve the SmartArray API.
PR #1947: Support np.linalg.solve.
Issue #1943: Improve error message when an argument fails typing.
PR #1927: Support np.linalg.lstsq.
PR #1934: Use system functions for hypot() where possible, instead of our own implementation.
PR #1929: Add cffi support to @cfunc objects.
PR #1932: Add user-controllable thread pool limits for parallel CPU target.
PR #1928: Support self-recursion when the signature is explicit.
PR #1890: List all lowering implementations in the developer docs.
Issue #1884: Support np.lib.stride_tricks.as_strided().

Fixes:

Issue #1960: Fix sliced assignment when source and destination areas are overlapping.
PR #1963: Make CUDA print() atomic.
PR #1956: Allow 0d array constants.
Issue #1945: Allow using Numpy ufuncs in AOT compiled code.
Issue #1916: Fix documentation example for @generated_jit.
Issue #1926: Fix regression when caching functions in an IPython session.
Issue #1923: Allow non-intp integer arguments to carray() and farray().
Issue #1908: Accept non-ASCII unicode docstrings on Python 2.
Issue #1874: Allow del container[key] in object mode.
Issue #1913: Fix set insertion bug when the lookup chain contains deleted entries.
Issue #1911: Allow function annotations on jitclass methods.

Version 0.26.0¶

This release adds support for the cfunc decorator, which exports Numba-jitted functions to third-party APIs that take C callbacks. Most of the overhead of using jitclasses inside the interpreter is eliminated. Support for decompositions in numpy.linalg is added. Finally, Numpy 1.11 is supported.
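A minimal sketch of the cfunc decorator, assuming a reasonably recent Numba install. The function name add and the signature string are illustrative. If Numba is not installed, the sketch calls the plain Python function instead (an assumption made only so that it still runs).

```python
def add(x, y):
    return x + y


try:
    from numba import cfunc
except ImportError:
    # Numba is not installed: call the Python function directly.
    result = add(4.0, 5.0)
else:
    # Compile add() into a C callback with an explicit signature.
    c_add = cfunc("float64(float64, float64)")(add)
    # The .ctypes attribute exposes the callback as a ctypes function object,
    # which can be called from Python or handed to a C API expecting a callback.
    result = c_add.ctypes(4.0, 5.0)

print(result)
```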

Improvements:

PR #1889: Export BLAS and LAPACK wrappers for pycc.
PR #1888: Faster array power.
Issue #1867: Allow “out” keyword arg for dufuncs.
PR #1871: carray() and farray() for creating arrays from pointers.
PR #1855: @cfunc decorator for exporting as ctypes function.
PR #1862: Add support for numpy.linalg.qr.
PR #1851: jitclass support for ‘_’ and ‘__’ prefixed attributes.
PR #1842: Optimize jitclass in Python interpreter.
Issue #1837: Fix CUDA simulator issues with device function.
PR #1839: Add support for decompositions from numpy.linalg.
PR #1829: Support Python enums.
PR #1828: Add support for numpy.random.rand() and numpy.random.randn()
Issue #1825: Use of 0-darray in place of scalar index.
Issue #1824: Scalar arguments to object mode gufuncs.
Issue #1813: Let bitwise bool operators return booleans, not integers.
Issue #1760: Optional arguments in generators.
PR #1780: Numpy 1.11 support.

Version 0.25.0¶

This release adds support for set objects in nopython mode. It also adds support for many missing Numpy features and functions. It improves Numba’s compatibility and performance when using a distributed execution framework such as dask, distributed or Spark. Finally, it removes compatibility with Python 2.6, Python 3.3 and Numpy 1.6.

Improvements:

Issue #1800: Add erf(), erfc(), gamma() and lgamma() to CUDA targets.
PR #1793: Implement more Numpy functions: np.bincount(), np.diff(), np.digitize(), np.histogram(), np.searchsorted() as well as NaN-aware reduction functions (np.nansum(), np.nanmedian(), etc.)
PR #1789: Optimize some reduction functions such as np.sum(), np.prod(), np.median(), etc.
PR #1752: Make CUDA features work in dask, distributed and Spark.
PR #1787: Support np.nditer() for fast multi-array indexing with broadcasting.
PR #1799: Report JIT-compiled functions as regular Python functions when profiling (making visible the filename and line number where a function is defined).
PR #1782: Support np.any() and np.all().
Issue #1788: Support the iter() and next() built-in functions.
PR #1778: Support array.astype().
Issue #1775: Allow the user to set the target CPU model for AOT compilation.
PR #1758: Support creating random arrays using the size parameter to the np.random APIs.
PR #1757: Support len() on array.flat objects.
PR #1749: Remove Numpy 1.6 compatibility.
PR #1748: Remove Python 2.6 and 3.3 compatibility.
PR #1735: Support the not in operator as well as operator.contains().
PR #1724: Support homogeneous sets in nopython mode.
Issue #875: make compilation of array constants faster.

Fixes:

PR #1795: Fix a massive performance issue when calling Numba functions with distributed, Spark or a similar mechanism using serialization.
Issue #1784: Make jitclasses usable with NUMBA_DISABLE_JIT=1.
Issue #1786: Allow using linear algebra functions when profiling.
Issue #1796: Fix np.dot() memory leak on non-contiguous inputs.
PR #1792: Fix static negative indexing of tuples.
Issue #1771: Use fallback cache directory when __pycache__ isn’t writable, such as when user code is installed in a system location.
Issue #1223: Use Numpy error model in array expressions (e.g. division by zero returns inf or nan instead of raising an error).
Issue #1640: Fix np.random.binomial() for large n values.
Issue #1643: Improve error reporting when passing an invalid spec to @jitclass.
PR #1756: Fix slicing with a negative step and an omitted start.

Version 0.24.0¶

This release introduces several major changes, including the @generated_jit decorator for flexible specializations, as with Julia’s “@generated” macro, and the SmartArray array wrapper type that allows seamless transfer of array data between the CPU and the GPU.

This will be the last version to support Python 2.6, Python 3.3 and Numpy 1.6.

Improvements:

PR #1723: Improve compatibility of JIT functions with the Python profiler.
PR #1509: Support array.ravel() and array.flatten().
PR #1676: Add SmartArray type to support transparent data management in multiple address spaces (host & GPU).
PR #1689: Reduce startup overhead of importing Numba.
PR #1705: Support registration of CFFI types as corresponding to known Numba types.
PR #1686: Document the extension API.
PR #1698: Improve warnings raised during type inference.
PR #1697: Support np.dot() and friends on non-contiguous arrays.
PR #1692: cffi.from_buffer() improvements (allow more pointer types, allow non-Numpy buffer objects).
PR #1648: Add the @generated_jit decorator.
PR #1651: Implementation of np.linalg.inv using LAPACK. Thanks to Matthieu Dartiailh.
PR #1674: Support np.diag().
PR #1673: Improve error message when looking up an attribute on an unknown global.
Issue #1569: Implement runtime check for the LLVM locale bug.
PR #1612: Switch to LLVM 3.7 in sync with llvmlite.
PR #1624: Allow slice assignment of sequence to array.
PR #1622: Support slicing tuples with a constant slice.

Fixes:

Issue #1722: Fix returning an optional boolean (bool or None).
Issue #1734: NRT decref bug when variable is del’ed before being defined, leading to a possible memory leak.
PR #1732: Fix tuple getitem regression for CUDA target.
PR #1718: Mishandling of optional to optional casting.
PR #1714: Fix .compile() on a JIT function not respecting ._can_compile.
Issue #1667: Fix np.angle() on arrays.
Issue #1690: Fix slicing with an omitted stop and a negative step value.
PR #1693: Fix gufunc bug in handling scalar formal arg with non-scalar input value.
PR #1683: Fix parallel testing under Windows.
Issue #1616: Use system-provided versions of C99 math where possible.
Issue #1652: Reductions of bool arrays (e.g. sum() or mean()) should return integers or floats, not bools.
Issue #1664: Fix regression when indexing a record array with a constant index.
PR #1661: Disable AVX on old Linux kernels.
Issue #1636: Allow raising an exception looked up on a module.

Version 0.23.1¶

This is a bug-fix release to address several regressions introduced in the 0.23.0 release, and a couple other issues.

Fixes:

Issue #1645: CUDA ufuncs were broken in 0.23.0.
Issue #1638: Check tuple sizes when passing a list of tuples.
Issue #1630: Parallel ufunc would keep eating CPU even after finishing under Windows.
Issue #1628: Fix ctypes and cffi tests under Windows with Python 3.5.
Issue #1627: Fix xrange() support.
PR #1611: Rewrite variable liveness analysis.
Issue #1610: Allow nested calls between explicitly-typed ufuncs.
Issue #1593: Fix *args in object mode.

Version 0.23.0¶

This release introduces JIT classes using the new @jitclass decorator, allowing user-defined structures for nopython mode. Other improvements and bug fixes are listed below.

Improvements:

PR #1609: Speed up some simple math functions by inlining them in their caller
PR #1571: Implement JIT classes
PR #1584: Improve typing of array indexing
PR #1583: Allow printing booleans
PR #1542: Allow negative values in np.reshape()
PR #1560: Support vector and matrix dot product, including np.dot() and the @ operator in Python 3.5
PR #1546: Support field lookup on record arrays and scalars (i.e. array['field'] in addition to array.field)
PR #1440: Support the HSA wavebarrier() and activelanepermute_wavewidth() intrinsics
PR #1540: Support np.angle()
PR #1543: Implement CPU multithreaded gufuncs (target="parallel")
PR #1551: Allow scalar arguments in np.where(), np.empty_like().
PR #1516: Add some more examples from NumbaPro
PR #1517: Support np.sinc()

Fixes:

Issue #1603: Fix calling a non-cached function from a cached function
Issue #1594: Ensure a list is homogeneous when unboxing
Issue #1595: Replace deprecated use of get_pointer_to_function()
Issue #1586: Allow tests to be run by different users on the same machine
Issue #1587: Make CudaAPIError picklable
Issue #1568: Fix using Numba from inside Visual Studio 2015
Issue #1559: Fix serializing a jit function referring to a renamed module
PR #1508: Let reshape() accept integer argument(s), not just a tuple
Issue #1545: Improve error checking when unboxing list objects
Issue #1538: Fix array broadcasting in CUDA gufuncs
Issue #1526: Fix a reference count handling bug

Version 0.22.1¶

This is a bug-fix release to resolve some packaging issues and other problems found in the 0.22.0 release.

Fixes:

PR #1515: Include MANIFEST.in in MANIFEST.in so that sdist still works from source tar files.
PR #1518: Fix reference counting bug caused by hidden alias
PR #1519: Fix erroneous assert when passing nopython=True to guvectorize.
PR #1521: Fix cuda.test()

Version 0.22.0¶

This release features several highlights: Python 3.5 support, Numpy 1.10 support, Ahead-of-Time compilation of extension modules, additional vectorization features that were previously only available with the proprietary extension NumbaPro, improvements in array indexing.

Improvements:

PR #1497: Allow scalar input type instead of size-1 array to @guvectorize
PR #1480: Add distutils support for AOT compilation
PR #1460: Create a new API for Ahead-of-Time (AOT) compilation
PR #1451: Allow passing Python lists to JIT-compiled functions, and reflect mutations on function return
PR #1387: Numpy 1.10 support
PR #1464: Support cffi.FFI.from_buffer()
PR #1437: Propagate errors raised from Numba-compiled ufuncs; also, let “division by zero” and other math errors produce a warning instead of exiting the function early
PR #1445: Support a subset of fancy indexing
PR #1454: Support “out-of-line” CFFI modules
PR #1442: Improve array indexing to support more kinds of basic slicing
PR #1409: Support explicit CUDA memory fences
PR #1435: Add support for vectorize() and guvectorize() with HSA
PR #1432: Implement numpy.nonzero() and numpy.where()
PR #1416: Add support for vectorize() and guvectorize() with CUDA, as originally provided in NumbaPro
PR #1424: Support in-place array operators
PR #1414: Python 3.5 support
PR #1404: Add the parallel ufunc functionality originally provided in NumbaPro
PR #1393: Implement sorting on arrays and lists
PR #1415: Add functions to estimate the occupancy of a CUDA kernel
PR #1360: The JIT cache now stores the compiled object code, yielding even larger speedups.
PR #1402: Fixes for the ARMv7 (armv7l) architecture under Linux
PR #1400: Add the cuda.reduce() decorator originally provided in NumbaPro

Fixes:

PR #1483: Allow np.empty_like() and friends on non-contiguous arrays
Issue #1471: Allow caching JIT functions defined in IPython
PR #1457: Fix flat indexing of boolean arrays
PR #1421: Allow calling Numpy ufuncs, without an explicit output, on non-contiguous arrays
Issue #1411: Fix crash when unpacking a tuple containing a Numba-allocated array
Issue #1394: Allow unifying range_state32 and range_state64
Issue #1373: Fix code generation error on lists of bools

Version 0.21.0¶

This release introduces support for AMD’s Heterogeneous System Architecture, which allows memory to be shared directly between the CPU and the GPU. Other major enhancements are support for lists and the introduction of an opt-in compilation cache.

Improvements:

PR #1391: Implement print() for CUDA code
PR #1366: Implement integer typing enhancement proposal (NBEP 1)
PR #1380: Support the one-argument type() builtin
PR #1375: Allow boolean evaluation of lists and tuples
PR #1371: Support array.view() in CUDA mode
PR #1369: Support named tuples in nopython mode
PR #1250: Implement numpy.median().
PR #1289: Make dispatching faster when calling a JIT-compiled function from regular Python
Issue #1226: Improve performance of integer power
PR #1321: Document features supported with CUDA
PR #1345: HSA support
PR #1343: Support lists in nopython mode
PR #1356: Make Numba-allocated memory visible to tracemalloc
PR #1363: Add an environment variable NUMBA_DEBUG_TYPEINFER
PR #1051: Add an opt-in, per-function compilation cache

Fixes:

Issue #1372: Some array expressions would fail rewriting when they involved the same variable more than once, or a unary operator
Issue #1385: Allow CUDA local arrays to be declared anywhere in a function
Issue #1285: Support datetime64 and timedelta64 in Numpy reduction functions
Issue #1332: Handle the EXTENDED_ARG opcode.
PR #1329: Handle the in operator in object mode
Issue #1322: Fix augmented slice assignment on Python 2
PR #1357: Fix slicing with some negative bounds or step values.

Version 0.20.0¶

This release updates Numba to use LLVM 3.6 and CUDA 7 for CUDA support. Following the platform deprecation in CUDA 7, Numba’s CUDA feature is no longer supported on 32-bit platforms. The oldest supported version of Windows is Windows 7.

Improvements:

Issue #1203: Support indexing ndarray.flat
PR #1200: Migrate cgutils to llvmlite
PR #1190: Support more array methods: .transpose(), .T, .copy(), .reshape(), .view()
PR #1214: Simplify setup.py and avoid manual maintenance
PR #1217: Support datetime64 and timedelta64 constants
PR #1236: Reload environment variables when compiling
PR #1225: Various speed improvements in generated code
PR #1252: Support cmath module in CUDA
PR #1238: Use 32-byte aligned allocator to optimize for AVX
PR #1258: Support numpy.frombuffer()
PR #1274: Use TravisCI container infrastructure for lower wait time
PR #1279: Micro-optimize overload resolution in call dispatch
Issue #1248: Improve error message when return type unification fails

Fixes:

Issue #1131: Handling of negative zeros in np.conjugate() and np.arccos()
Issue #1188: Fix slow array return
Issue #1164: Avoid warnings from CUDA context at shutdown
Issue #1229: Respect the writeable flag in arrays
Issue #1244: Fix bug in refcount pruning pass
Issue #1251: Fix partial left-indexing of Fortran contiguous array
Issue #1264: Fix compilation error in array expression
Issue #1254: Fix error when yielding array objects
Issue #1276: Fix nested generator use

Version 0.19.2¶

This release fixes the source distribution on pypi. The only change is in the setup.py file. We do not plan to provide a conda package as this release is essentially the same as 0.19.1 for conda users.

Version 0.19.1¶

Issue #1196:
- fix double-free segfault due to redundant variable deletion in the Numba IR (#1195)
- fix use-after-delete in array expression rewrite pass

Version 0.19.0¶

This version introduces memory management in the Numba runtime, allowing new arrays to be allocated inside Numba-compiled functions. There is also a rework of the ufunc infrastructure, and an optimization pass to collapse cascading array operations into a single efficient loop.

Warning

Support for Windows XP and Vista with all compiler targets and support for 32-bit platforms (Win/Mac/Linux) with the CUDA compiler target are deprecated. In the next release of Numba, the oldest version of Windows supported will be Windows 7. CPU compilation will remain supported on 32-bit Linux and Windows platforms.

Known issues:

There are some performance regressions in very short running nopython functions due to the additional overhead incurred by memory management. We will work to reduce this overhead in future releases.

Features:

Issue #1181: Add a Frequently Asked Questions section to the documentation.
Issue #1162: Support the cumsum() and cumprod() methods on Numpy arrays.
Issue #1152: Support the *args argument-passing style.
Issue #1147: Allow passing character sequences as arguments to JIT-compiled functions.
Issue #1110: Shortcut deforestation and loop fusion for array expressions.
Issue #1136: Support various Numpy array constructors, for example numpy.zeros() and numpy.zeros_like().
Issue #1127: Add a CUDA simulator running on the CPU, enabled with the NUMBA_ENABLE_CUDASIM environment variable.
Issue #1086: Allow calling standard Numpy ufuncs without an explicit output array from nopython functions.
Issue #1113: Support keyword arguments when calling numpy.empty() and related functions.
Issue #1108: Support the ctypes.data attribute of Numpy arrays.
Issue #1077: Memory management for array allocations in nopython mode.
Issue #1105: Support calling a ctypes function that takes ctypes.py_object parameters.
Issue #1084: Environment variable NUMBA_DISABLE_JIT disables compilation of @jit functions, instead calling into the Python interpreter when called. This allows easier debugging of multiple jitted functions.
Issue #927: Allow gufuncs with no output array.
Issue #1097: Support comparisons between tuples.
Issue #1075: Numba-generated ufuncs can now be called from nopython functions.
Issue #1062: @vectorize now allows omitting the signatures, and will compile the required specializations on the fly (like @jit does).
Issue #1027: Support numpy.round().
Issue #1085: Allow returning a character sequence (as fetched from a structured array) from a JIT-compiled function.
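The NUMBA_DISABLE_JIT variable from Issue #1084 can be sketched as follows. The function name clamp is illustrative, and setting the variable before importing Numba is an assumption about when the setting is read. If Numba is not installed, a no-op decorator stands in so that the sketch still runs.

```python
import os

# Assumption: the variable must be set before Numba is imported to take effect.
os.environ["NUMBA_DISABLE_JIT"] = "1"

try:
    from numba import jit
except ImportError:
    # Numba is not installed: substitute a no-op decorator.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True)
def clamp(x, lo, hi):
    # With compilation disabled, this body runs in the Python interpreter,
    # so a debugger or print() can be used inside it.
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


print(clamp(5, 0, 3))
```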

Fixes:

Issue #1170: Ensure ndindex(), ndenumerate() and ndarray.flat work properly inside generators.
Issue #1151: Disallow unpacking of tuples with the wrong size.
Issue #1141: Specify install dependencies in setup.py.
Issue #1106: Loop-lifting would fail when the lifted loop does not produce any output values for the function tail.
Issue #1103: Fix mishandling of some inputs when a JIT-compiled function is called with multiple array layouts.
Issue #1089: Fix range() with large unsigned integers.
Issue #1088: Install entry-point scripts (numba, pycc) from the conda build recipe.
Issue #1081: Constant structured scalars now work properly.
Issue #1080: Fix automatic promotion of booleans to integers.

Version 0.18.2¶

Bug fixes:

Issue #1073: Fixes missing template file for HTML annotation
Issue #1074: Fixes CUDA support on Windows machine due to NVVM API mismatch

Version 0.18.1¶

Version 0.18.0 is not officially released.

This version removes the old deprecated and undocumented argtypes and restype arguments to the @jit decorator. Function signatures should always be passed as the first argument to @jit.

Features:

Issue #960: Add inspect_llvm() and inspect_asm() methods to JIT-compiled functions: they output the LLVM IR and the native assembler source of the compiled function, respectively.
Issue #990: Allow passing tuples as arguments to JIT-compiled functions in nopython mode.
Issue #774: Support two-argument round() in nopython mode.
Issue #987: Support missing functions from the math module in nopython mode: frexp(), ldexp(), gamma(), lgamma(), erf(), erfc().
Issue #995: Improve code generation for round() on Python 3.
Issue #981: Support functions from the random and numpy.random modules in nopython mode.
Issue #979: Add cuda.atomic.max().
Issue #1006: Improve exception raising and reporting. It is now allowed to raise an exception with an error message in nopython mode.
Issue #821: Allow ctypes- and cffi-defined functions as arguments to nopython functions.
Issue #901: Allow multiple explicit signatures with @jit. The signatures must be passed in a list, as with @vectorize.
Issue #884: Better error message when a JIT-compiled function is called with the wrong types.
Issue #1010: Simpler and faster CUDA argument marshalling thanks to a refactoring of the data model.
Issue #1018: Support arrays of scalars inside Numpy structured types.
Issue #808: Reduce Numba import time by half.
Issue #1021: Support the buffer protocol in nopython mode. Buffer-providing objects, such as bytearray, array.array or memoryview support array-like operations such as indexing and iterating. Furthermore, some standard attributes on the memoryview object are supported.
Issue #1030: Support nested arrays in Numpy structured arrays.
Issue #1033: Implement the inspect_types(), inspect_llvm() and inspect_asm() methods for CUDA kernels.
Issue #1029: Support Numpy structured arrays with CUDA as well.
Issue #1034: Support for generators in nopython and object mode.
Issue #1044: Support default argument values when calling Numba-compiled functions.
Issue #1048: Allow calling Numpy scalar constructors from CUDA functions.
Issue #1047: Allow indexing a multi-dimensional array with a single integer, to take a view.
Issue #1050: Support len() on tuples.
Issue #1011: Revive HTML annotation.
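A minimal sketch of two of the features above, multiple explicit signatures passed as a list (Issue #901) and inspect_llvm() (Issue #960), assuming a reasonably recent Numba install. The function name add is illustrative. In current Numba versions inspect_llvm() called without arguments returns a dictionary keyed by signature, and the behaviour in this release may differ. If Numba is not installed, the sketch falls back to a no-op decorator and skips the inspection step.

```python
try:
    from numba import jit
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False

    # Numba is not installed: substitute a no-op decorator.
    def jit(*args, **kwargs):
        return lambda func: func


# The signatures are passed in a list as the first argument.
@jit(["int64(int64, int64)", "float64(float64, float64)"], nopython=True)
def add(x, y):
    return x + y


print(add(2, 3))
print(add(2.5, 0.5))

if HAVE_NUMBA:
    # One LLVM IR listing is available per compiled signature.
    for signature, llvm_ir in add.inspect_llvm().items():
        print(signature, len(llvm_ir), "characters of LLVM IR")
```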

Fixes:

Issue #977: Assignment optimization was too aggressive.
Issue #561: One-argument round() now returns an int on Python 3.
Issue #1001: Fix an unlikely bug where two closures with the same name and id() would compile to the same LLVM function name, despite different closure values.
Issue #1006: Fix reference leak when a JIT-compiled function is disposed of.
Issue #1017: Update instructions for CUDA in the README.
Issue #1008: Generate shorter LLVM type names to avoid segfaults with CUDA.
Issue #1005: Properly clean up references when raising an exception from object mode.
Issue #1041: Fix incompatibility between Numba and the third-party library “future”.
Issue #1053: Fix the size attribute of CUDA shared arrays.

Version 0.17.0¶

The major focus in this release has been a rewrite of the documentation. The new documentation is better structured and has more detailed coverage of Numba features and APIs. It can be found online at https://numba.pydata.org/numba-doc/dev/index.html

Features:

Issue #895: LLVM can now inline nested function calls in nopython mode.
Issue #863: CUDA kernels can now infer the types of their arguments (“autojit”-like).
Issue #833: Support numpy.{min,max,argmin,argmax,sum,mean,var,std} in nopython mode.
Issue #905: Add a nogil argument to the @jit decorator, to release the GIL in nopython mode.
Issue #829: Add an identity argument to @vectorize and @guvectorize, to set the identity value of the ufunc.
Issue #843: Allow indexing 0-d arrays with the empty tuple.
Issue #933: Allow named arguments, not only positional arguments, when calling a Numba-compiled function.
Issue #902: Support numpy.ndenumerate() in nopython mode.
Issue #950: AVX is now enabled by default except on Sandy Bridge and Ivy Bridge CPUs, where it can produce slower code than SSE.
Issue #956: Support constant arrays of structured type.
Issue #959: Indexing arrays with floating-point numbers isn’t allowed anymore.
Issue #955: Add support for 3D CUDA grids and thread blocks.
Issue #902: Support numpy.ndindex() in nopython mode.
Issue #951: Numpy number types (numpy.int8, etc.) can be used as constructors for type conversion in nopython mode.
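The nogil argument from Issue #905 can be sketched as follows, assuming a reasonably recent Numba install. The names sum_squares and worker are illustrative. If Numba is not installed, a no-op decorator stands in, and the threads then run under the GIL as ordinary Python.

```python
import threading

try:
    from numba import jit
except ImportError:
    # Numba is not installed: substitute a no-op decorator.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True, nogil=True)
def sum_squares(n):
    # Runs without holding the GIL once compiled in nopython mode.
    total = 0
    for i in range(n):
        total += i * i
    return total


sum_squares(10)  # trigger compilation once, before starting the threads

results = [0] * 4


def worker(index):
    results[index] = sum_squares(1000 + index)


threads = [threading.Thread(target=worker, args=(k,)) for k in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(results)
```

Releasing the GIL only helps when the function does not touch Python objects, which is why nogil is combined with nopython mode here.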

Fixes:

Issue #889: Fix NUMBA_DUMP_ASSEMBLY for the CUDA backend.
Issue #903: Fix calling of stdcall functions with ctypes under Windows.
Issue #908: Allow lazy-compiling from several threads at once.
Issue #868: Wrong error message when multiplying a scalar by a non-scalar.
Issue #917: Allow vectorizing with datetime64 and timedelta64 in the signature (only with unit-less values, though, because of a Numpy limitation).
Issue #431: Allow overloading of cuda device function.
Issue #917: Print out errors that occurred in object mode ufuncs.
Issue #923: Numba-compiled ufuncs now inherit the name and doc of the original Python function.
Issue #928: Fix boolean return value in nested calls.
Issue #915: @jit called with an explicit signature with a mismatching type of arguments now raises an error.
Issue #784: Fix the truth value of NaNs.
Issue #953: Fix using shared memory in more than one function (kernel or device).
Issue #970: Fix an uncommon double to uint64 conversion bug on CentOS5 32-bit (C compiler issue).

Version 0.16.0¶

This release contains a major refactor to switch from llvmpy to llvmlite as our code generation backend. The switch is necessary to reconcile different compiler requirements for LLVM 3.5 (needs C++11) and Python extensions (need specific compiler versions on Windows). As a bonus, we have found the use of llvmlite speeds up compilation by a factor of 2!

Other Major Changes:

Faster dispatch for numpy structured arrays
Optimized array.flat()
Improved CPU feature selection
Fix constant tuple regression in macro expansion code

Known Issues:

AVX code generation is still disabled by default due to performance regressions when operating on misaligned NumPy arrays. We hope to have a workaround in the future.
In extremely rare circumstances, a known issue with LLVM 3.5 code generation can cause an ELF relocation error on 64-bit Linux systems.

Version 0.15.1¶

(This was a bug-fix release that superseded version 0.15 before it was announced.)

Fixes:

Workaround for missing __ftol2 on Windows XP.
Do not lift loops for compilation that contain break statements.
Fix a bug in loop-lifting when multiple values need to be returned to the enclosing scope.
Handle the loop-lifting case where an accumulator needs to be updated when the loop count is zero.

Version 0.15¶

Features:

Support for the Python cmath module. (NumPy complex functions were already supported.)
Support for .real, .imag, and .conjugate() on non-complex numbers.
Add support for math.isfinite() and math.copysign().
Compatibility mode: If enabled (off by default), a failure to compile in object mode will fall back to using the pure Python implementation of the function.
Experimental support for serializing JIT functions with cloudpickle.
Loop-jitting in object mode now works with loops that modify scalars that are accessed after the loop, such as accumulators.
@vectorize functions can be compiled in object mode.
Numba can now be built using the Visual C++ Compiler for Python 2.7 on Windows platforms.
CUDA JIT functions can be returned by factory functions with variables in the closure frozen as constants.
Support for “optional” types in nopython mode, which allow None to be a valid value.
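
As a rough illustration of a few of the features above, the sketch below uses cmath, math.isfinite(), math.copysign() and an optional (None-or-value) return type in nopython mode. It is written against the present-day numba.njit decorator rather than the 0.15-era API. The function names are hypothetical, and the fallback to a no-op decorator is an assumption added only so the sketch still runs (uncompiled) where Numba is not installed.

```python
import cmath
import math

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, run the functions uncompiled.
    def njit(func):
        return func


@njit
def root_magnitude(x):
    # cmath.sqrt handles negative real parts and returns a complex result.
    return abs(cmath.sqrt(complex(x, 0.0)))


@njit
def signed_if_finite(magnitude, sign_source):
    # Returning either None or a float gives this function an optional type.
    if not math.isfinite(magnitude):
        return None
    return math.copysign(magnitude, sign_source)


print(root_magnitude(-4.0))
print(signed_if_finite(3.0, -0.0))
print(signed_if_finite(math.inf, 1.0))
```

The second function is the kind of code that optional types are meant to allow: the caller receives either a float or None, and can test the result with "is None" as in ordinary Python.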

Fixes:

If nopython mode compilation fails for any reason, automatically fall back to object mode (unless nopython=True is passed to @jit) rather than raise an exception.
Allow function objects to be returned from a function compiled in object mode.
Fix a linking problem that caused slower platform math functions (such as exp()) to be used on Windows, leading to performance regressions against NumPy.
min() and max() no longer accept scalar arguments in nopython mode.
Fix handling of ambiguous type promotion among several compiled versions of a JIT function. The dispatcher will now compile a new version to resolve the problem. (issue #776)
Fix float32 to uint64 casting bug on 32-bit Linux.
Fix type inference to allow forced casting of return types.
Allow the shape of a 1D cuda.shared.array and cuda.local.array to be a one-element tuple.
More correct handling of signed zeros.
Add custom implementation of atan2() on Windows to handle special cases properly.
Eliminated race condition in the handling of the pagelocked staging area used when transferring CUDA arrays.
Fix non-deterministic type unification leading to varying performance. (issue #797)

Version 0.14¶

Features:

Support for nearly all the NumPy math functions (including comparison, logical, bitwise and some previously missing float functions) in nopython mode.
The NumPy datetime64 and timedelta64 dtypes are supported in nopython mode with NumPy 1.7 and later.
Support for NumPy math functions on complex numbers in nopython mode.
ndarray.sum() is supported in nopython mode.
Better error messages when unsupported types are used in NumPy math functions.
Set NUMBA_WARNINGS=1 in the environment to see which functions are compiled in object mode vs. nopython mode.
Add support for the two-argument pow() builtin function in nopython mode.
New developer documentation describing how Numba works, and how to add new types.
Support for NumPy record arrays on the GPU. (Note: Improper alignment of dtype fields will cause an exception to be raised.)
Slices on GPU device arrays.
GPU objects can be used as Python context managers to select the active device in a block.
GPU device arrays can be bound to a CUDA stream. All subsequent operations (such as memory copies) will be queued on that stream instead of the default. This can prevent unnecessary synchronization with other streams.

Fixes:

Generation of AVX instructions has been disabled to avoid performance bugs when calling external math functions that may use SSE instructions, especially on OS X.
JIT functions can be removed by the garbage collector when they are no longer accessible.
Various other reference counting fixes to prevent memory leaks.
Fixed handling of exception when input argument is out of range.
Prevent autojit functions from making unsafe numeric conversions when called with different numeric types.
Fix a compilation error when an unhashable global value is accessed.
Gracefully handle failure to enable faulthandler in the IPython Notebook.
Fix a bug that caused loop lifting to fail if the loop was inside an else block.
Fixed a problem with selecting CUDA devices in multithreaded programs on Linux.
The pow() function (and ** operation) applied to two integers now returns an integer rather than a float.
NumPy arrays using the object dtype no longer cause an exception in the autojit.
Attempts to write to a global array will cause compilation to fall back to object mode, rather than attempt and fail at nopython mode.
range() now works when all of its arguments are negative (e.g. range(-10, -12, -1)).
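
Two of the fixes above, integer pow() and range() with negative arguments, can be sketched as follows. This uses the present-day numba.njit decorator, not the 0.14-era API. The function names are hypothetical, and the no-op decorator fallback is an assumption so the sketch runs without Numba installed.

```python
try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, run the functions uncompiled.
    def njit(func):
        return func


@njit
def int_power(base, exponent):
    # Both operands are integers, so the result is an integer, not a float.
    return base ** exponent


@njit
def count_down(start, stop):
    # Negative bounds with a negative step, as in range(-10, -12, -1).
    total = 0
    for i in range(start, stop, -1):
        total += i
    return total


print(int_power(2, 10))      # 1024
print(count_down(-10, -12))  # -21
```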

Version 0.13.4¶

Features:

Setting and deleting attributes in object mode
Added documentation of supported and currently unsupported numpy ufuncs
Assignment to 1-D numpy array slices
Closure variables and functions can be used in object mode
All numeric global values in modules can be used as constants in JIT compiled code
Support for the start argument in enumerate()
Inplace arithmetic operations (+=, -=, etc.)
Direct iteration over a 1D numpy array (e.g. “for x in array: …”) in nopython mode
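
A minimal sketch of the start argument to enumerate() combined with in-place arithmetic is shown below. It uses the present-day numba.njit decorator and iterates over a tuple rather than a NumPy array to keep the example dependency-free. The function name is hypothetical, and the no-op decorator fallback is an assumption so the sketch runs without Numba installed.

```python
try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, run the function uncompiled.
    def njit(func):
        return func


@njit
def weighted_total(values):
    total = 0.0
    # enumerate() with a start argument: the weights are 1, 2, 3, ...
    for weight, value in enumerate(values, 1):
        total += weight * value  # in-place arithmetic
    return total


print(weighted_total((1.0, 2.0, 3.0)))  # 14.0
```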

Fixes:

Support for NVIDIA compute capability 5.0 devices (such as the GTX 750)
Vectorize no longer crashes/gives an error when bool_ is used as return type
Return the correct dictionary when globals() is used in JIT functions
Fix crash bug when creating dictionary literals in object mode
Report more informative error message on import if llvmpy is too old
Temporarily disable pycc --header, which generates incorrect function signatures.

Version 0.13.3¶

Features:

Support for enumerate() and zip() in nopython mode
Increased LLVM optimization of JIT functions to -O1, enabling automatic vectorization of compiled code in some cases
Iteration over tuples and unpacking of tuples in nopython mode
Support for dict and set (Python >= 2.7) literals in object mode
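
The zip() and tuple-unpacking support listed above might be used along these lines. The sketch relies on the present-day numba.njit decorator rather than the 0.13.3-era API. The function names are hypothetical, and the no-op decorator fallback is an assumption so the sketch runs without Numba installed.

```python
try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, run the functions uncompiled.
    def njit(func):
        return func


@njit
def dot(xs, ys):
    total = 0.0
    # zip() pairs up the elements of the two tuples.
    for x, y in zip(xs, ys):
        total += x * y
    return total


@njit
def span(bounds):
    low, high = bounds  # tuple unpacking
    return high - low


print(dot((1.0, 2.0, 3.0), (4.0, 5.0, 6.0)))  # 32.0
print(span((2, 10)))                          # 8
```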

Fixes:

JIT functions have the same __name__ and __doc__ as the original function.
Numerous improvements to better match the data types and behavior of Python math functions in JIT compiled code on different platforms.
Importing Numba will no longer throw an exception if the CUDA driver is present, but cannot be initialized.
guvectorize now properly supports functions with scalar arguments.
CUDA driver is lazily initialized

Version 0.13.2¶

Features:

@vectorize ufuncs can now generate a SIMD fast path for unit-strided arrays
Added cuda.gridsize
Added preliminary exception handling (raise exception class)

Fixes:

UNARY_POSITIVE
Handling of closures and dynamically generated functions
Global None value

Version 0.13.1¶

Features:

Initial support for CUDA array slicing

Fixes:

Indirectly fixes numbapro when the system has an incompatible CUDA driver
Fix numba.cuda.detect
Export numba.intp and numba.intc

Version 0.13¶

Features:

Open sourced NumbaPro CUDA Python support in numba.cuda
Add support for ufunc array broadcasting
Add support for mixed input types for ufuncs
Add support for returning tuple from jitted function
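
Returning a tuple from a jitted function could look like the sketch below. It uses the present-day numba.njit decorator rather than the 0.13-era API. The function name is hypothetical, and the no-op decorator fallback is an assumption so the sketch runs without Numba installed.

```python
try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, run the function uncompiled.
    def njit(func):
        return func


@njit
def quotient_and_remainder(a, b):
    # Both results are returned together as a tuple.
    return a // b, a % b


q, r = quotient_and_remainder(17, 5)
print(q, r)  # 3 2
```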

Fixes:

Fix store slice bytecode handling for Python2
Fix inplace subtract
Fix pycc so that correct header is emitted
Allow vectorize to work on functions with jit decorator

Version 0.12.2¶

Fixes:

Improved NumPy ufunc support in nopython mode
Misc bug fixes

Version 0.12.1¶

This version fixes many regressions reported by users for the 0.12 release. It also contains a new loop-lifting mechanism that specializes certain loop patterns for nopython mode compilation. This avoids the need for direct support of heap allocation and other very dynamic operations in nopython mode.

Improvements:

Add loop-lifting: JIT-compiling loops in nopython mode inside object mode code. This allows functions to allocate NumPy arrays and use Python objects, while the tight loops in the function can still be compiled in nopython mode. Any arrays that the tight loop uses should be created before the loop is entered.
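
The shape of a function that can benefit from loop-lifting is sketched below: a Python-object operation (string formatting) that needs object mode, followed by a tight scalar loop that is a candidate for nopython compilation. The sketch assumes the present-day spelling jit(forceobj=True) to request object mode, which is not the 0.12.1-era API, and whether the loop is actually lifted depends on the Numba version. The function name is hypothetical, and the no-op decorator fallback is an assumption so the sketch runs without Numba installed.

```python
try:
    from numba import jit
except ImportError:
    # Assumption for this sketch: without Numba, run the function uncompiled.
    def jit(**options):
        return lambda func: func


@jit(forceobj=True)
def summarize(n):
    # String formatting works on Python objects, so it runs in object mode.
    label = "sum of squares below {}".format(n)
    total = 0
    # This loop touches only scalars, so it is a candidate for loop-lifting.
    for i in range(n):
        total += i * i
    return label, total


print(summarize(4))
```

Everything the loop needs (here just the scalar total) is set up before the loop is entered, which mirrors the advice above about creating arrays ahead of the tight loop.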

Fixes:

Add support for majority of “math” module functions
Fix for…else handling
Add support for builtin round()
Fix ternary if…else support
Revive “numba” script
Fix problems with some boolean expressions
Add support for more NumPy ufuncs

Version 0.12¶

Version 0.12 contains a big refactor of the compiler. The main objective for this refactor was to simplify the code base to create a better foundation for further work. A secondary objective was to improve the worst case performance to ensure that compiled functions in object mode never run slower than pure Python code (this was a problem in several cases with the old code base). This refactor is still a work in progress and further testing is needed.

Main improvements:

Major refactor of compiler for performance and maintenance reasons
Better fallback to object mode when native mode fails
Improved worst case performance in object mode

The public interface of numba has been slightly changed. The idea is to make it cleaner and more rational:

The jit decorator has been modified so that it can be called without a signature. When called without a signature, it behaves like the old autojit. autojit has been deprecated in favour of this approach.
Jitted functions can now be overloaded.
Added a “njit” decorator that behaves like “jit” decorator with nopython=True.
The numba.vectorize namespace is gone. The vectorize decorator will be in the main numba namespace.
Added a guvectorize decorator in the main numba namespace. It is similar to numba.vectorize, but takes a dimension signature. It generates gufuncs. This is a replacement for the GUVectorize gufunc factory which has been deprecated.
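
A minimal sketch of the signature-free jit and the njit decorator described above follows. It is written against present-day Numba, where these decorators still exist but some defaults differ from 0.12. The function names are hypothetical, and the no-op decorator fallback is an assumption so the sketch runs without Numba installed.

```python
try:
    from numba import jit, njit
except ImportError:
    # Assumption for this sketch: without Numba, run the functions uncompiled.
    def jit(func):
        return func
    njit = jit


@jit  # no signature: argument types are taken from each call
def add(a, b):
    return a + b


@njit  # like jit with nopython=True
def scale(x, factor):
    return x * factor


print(add(1, 2))        # integer arguments
print(add(1.5, 2.5))    # float arguments trigger a second specialization
print(scale(3.0, 2.0))
```

Calling add with integers and then with floats is what "overloaded" refers to here: one decorated function holds a separate compiled version per combination of argument types.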

Main regressions (will be fixed in a future release):

Creating new NumPy arrays is not supported in nopython mode
Returning NumPy arrays is not supported in nopython mode
NumPy array slicing is not supported in nopython mode
lists and tuples are not supported in nopython mode
string, datetime, cdecimal, and struct types are not implemented yet
Extension types (classes) are not supported in nopython mode
Closures are not supported
Raise keyword is not supported
Recursion is not supported in nopython mode

Version 0.11¶

Experimental support for NumPy datetime type

Version 0.10¶

Annotation tool (./bin/numba --annotate --fancy) (thanks to Jay Bourque)
Open sourced prange
Support for raise statement
Pluggable array representation
Support for enumerate and zip (thanks to Eugene Toder)
Better string formatting support (thanks to Eugene Toder)
Builtins min(), max() and bool() (thanks to Eugene Toder)
Fix some code reloading issues (thanks to Björn Linse)
Recognize NumPy scalar objects (thanks to Björn Linse)

Version 0.9¶

Improved math support
Open sourced generalized ufuncs
Improved array expressions

Version 0.8¶

Support for autojit classes
- Inheritance not yet supported
Python 3 support for pycc
Allow retrieval of ctypes function wrapper
- And hence support retrieval of a pointer to the function
Fixed a memory leak of array slicing views

Version 0.7.2¶

Official Python 3 support (python 3.2 and 3.3)
Support for intrinsics and instructions
Various bug fixes (see https://github.com/numba/numba/issues?milestone=7&state=closed)

Version 0.7.1¶

Various bug fixes

Version 0.7¶

Open sourced single-threaded ufunc vectorizer
Open sourced NumPy array expression compilation
Open sourced fast NumPy array slicing
Experimental Python 3 support
Support for typed containers
- typed lists and tuples
Support for iteration over objects
Support object comparisons
Preliminary CFFI support
- Jit calls to CFFI functions (passed into autojit functions)
- TODO: Recognize ffi_lib.my_func attributes
Improved support for ctypes
Allow declaring extension attribute types through class attributes
Support for type casting in Python
- Get the same semantics with or without numba compilation
Support for recursion
- For jit methods and extension classes
Allow jit functions as C callbacks
Friendlier error reporting
Internal improvements
A variety of bug fixes

Version 0.6.1¶

Support for bitwise operations

Version 0.6¶

Python 2.6 support
Programmable typing
- Allow users to add type inference for external code
Better NumPy type inference
- outer, inner, dot, vdot, tensordot, nonzero, where, binary ufuncs + methods (reduce, accumulate, reduceat, outer)
Type based alias analysis
- Support for strict aliasing
Much faster autojit dispatch when calling from Python
Faster numerical loops through data and stride pre-loading
Integral overflow and underflow checking for conversions from objects
Make Meta dependency optional

Version 0.5¶

SSA-based type inference
- Allows variable reuse
- Allow referring to variables before lexical definition
Support multiple comparisons
Support for template types
List comprehensions
Support for pointers
Many bug fixes
Added user documentation

Version 0.4¶

Version 0.3.2¶

Add support for object arithmetic (issue 56).
Bug fixes (issue 55).

Version 0.3¶

Changed default compilation approach to ast
Added support for cross-module linking
Added support for closures (can jit inner functions and return them) (see examples/closure.py)
Added support for dtype structures (can access elements of structure with attribute access) (see examples/structures.py)
Added support for extension types (numba classes) (see examples/numbaclasses.py)
Added support for general Python code (use nopython to raise an error if the Python C-API is used, which avoids unexpected slowness when a missing implementation silently defaults to generic Python)
Fixed many bugs
Added support to detect math operations.
Added with python and with nopython contexts
Added more examples

Many features need to be documented still. Look at examples and tests for more information.

Version 0.2¶

Added an ast approach to compilation
Removed d, f, i, b from numba namespace (use f8, f4, i4, b1)
Changed function to autojit2
Added an autojit function that decorates a function and uses the types of the arguments at call time to create compiled versions.
Changed the keyword arguments of the jit and autojit functions to restype and argtypes, for consistency with the ctypes module.
Added pycc – a python to shared library compiler