Releases: IntelPython/dpctl
v0.18.1
v0.18.0
This release reaches an important milestone: offloading is now fully asynchronous.
Calls to `dpctl.tensor` functions submit tasks for execution to the DPC++ runtime and return without waiting for these tasks to finish.
The sequential semantics users expect from executing a Python script are nevertheless preserved.
The full list of changes that went into this release is:
Added
- Implement `tensor.take_along_axis` per Python Array API specification gh-1778
- Implement `tensor.put_along_axis` to complement `tensor.take_along_axis` gh-1798
- Support for `device=tensor.kDLCPU` in `tensor.from_dlpack` function and `tensor.usm_ndarray.__dlpack__` method gh-1781
- Support DLPack on Windows gh-1746
- Implement `tensor.nextafter` function per Python Array API specification gh-1730
- Implement `tensor.count_nonzero` and `tensor.diff` functions from Python array API specification gh-1732, gh-1780
- Add support for `order="K"` to `*_like` array creation functions, and change default `order` keyword value from `'C'` to `'K'` gh-1808
- Support for 'max dimensions' in Array API capabilities info data gh-1774
- Add support for device aspect 'emulated' gh-1691
- `dpctl::tensor::usm_memory` class defined in `dpctl4pybind11.hpp` adds constructor to create Python USM memory objects viewing into existing USM allocations, which can be made by an external library gh-1782
- Add support for COVERAGE build type in project's CMake script gh-1692
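Because these new functions follow the Python Array API, code that uses them can be written against an array namespace `xp`. The helpers below are hypothetical illustrations, not part of dpctl. They are exercised here with NumPy, which provides functions of the same names; passing `dpctl.tensor` (with `usm_ndarray` inputs on a SYCL device) is assumed, not verified here, to behave the same way.

```python
import numpy as np


def sort_rows_by_key(xp, keys, values):
    """Hypothetical helper: reorder each row of `values` by ascending `keys`."""
    idx = xp.argsort(keys, axis=1)                  # per-row permutation indices
    return xp.take_along_axis(values, idx, axis=1)  # gather along each row


def zero_row_maxima(xp, x):
    """Hypothetical helper: set the largest element of each row of `x` to zero, in place."""
    idx = xp.argmax(x, axis=1, keepdims=True)       # shape (n_rows, 1)
    zeros = xp.zeros_like(idx, dtype=x.dtype)       # replacement values, same shape as idx
    xp.put_along_axis(x, idx, zeros, axis=1)        # scatter along each row
    return x


def count_sign_changes(xp, x):
    """Hypothetical helper: count sign changes between consecutive elements (last axis)."""
    steps = xp.diff(xp.sign(x), axis=-1)            # nonzero where the sign differs
    return xp.count_nonzero(steps, axis=-1)
```

For example, with `keys = [[3, 1, 2], [9, 7, 8]]` and `values = [[30, 10, 20], [90, 70, 80]]`, `sort_rows_by_key(np, keys, values)` returns `[[10, 20, 30], [70, 80, 90]]`.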
Changed
- Change how `dpctl.memory` objects own USM allocations, and make execution of `dpctl.tensor` operations asynchronous gh-1705
- Add support for Python scalars in the `tensor.where` function gh-1719
- Optimize division by Python scalar in statistical functions `tensor.mean`, `tensor.std`, `tensor.var` gh-1820
- Use transcendental functions from `sycl` namespace instead of `std` namespace gh-1707
- Changes for compatibility with recent NumPy in runtime environment gh-1735, gh-1772, gh-1804
- Array creation function `tensor.zeros` now uses asynchronous `memset` operation gh-1806
- The setter of `tensor.usm_ndarray.shape` property now supports Python scalar value gh-1786
- Use 'pyproject.toml' instead of 'setup.py', aligning with current packaging best practices gh-1660
- No longer set SOVERSION property in DPCTLSyclInterface library on Linux gh-1773
- Update version of 'pybind11' used gh-1758, gh-1812
- Handle possible exceptions raised by `usm_host_allocator` used with `std::vector` gh-1791
- Use `dpctl::tensor::offset_utils::sycl_free_noexcept` instead of `sycl::free` in `host_task` tasks associated with life-time management of temporary USM allocations gh-1797
- Add `"same_kind"`-style casting for in-place mathematical operators of `tensor.usm_ndarray` gh-1827, gh-1830
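A `"same_kind"` cast permits conversions within a kind (for example float64 to float32) but not across kinds (for example float to integer). Assuming the new rule is that `x op= y` is accepted only when the result type of `x op y` can be cast back to `x.dtype` under `"same_kind"`, the check can be modeled with NumPy's `result_type` and `can_cast`. This is an approximation: `inplace_allowed` is a hypothetical name, and dpctl's own `result_type` is device-aware, so it may promote differently (for instance on devices without float64 support).

```python
import numpy as np


def inplace_allowed(lhs_dtype, rhs_dtype):
    """Approximate check for whether `lhs op= rhs` passes a "same_kind" cast.

    Models the rule with NumPy type promotion; dpctl's device-aware
    result_type may differ.
    """
    res = np.result_type(lhs_dtype, rhs_dtype)  # dtype of `lhs op rhs`
    # The out-of-place result must be castable back into the left operand.
    return bool(np.can_cast(res, lhs_dtype, casting="same_kind"))
```

Under this model, `inplace_allowed(np.float32, np.float64)` is `True` (float64 narrows to float32 within the floating kind), while `inplace_allowed(np.int32, np.float32)` is `False` because the floating-point result cannot be cast back to an integer type.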
Fixed
- Fix setting of release variable in Sphinx config file gh-1685
- Handle possible NULL return value from device aspect queries `DPCTLDevice_GetMaxWorkGroupSize1d` and `DPCTLDevice_GetMaxWorkGroupSize2d` gh-1690
- Add license header to conda script files gh-1695
- Fix `tensor.round` behavior on CUDA devices gh-1700
- Add missing `#include <sstream>` gh-1701
- Fix for issue 1724 gh-1728
- Correct USM type for return array of `tensor.extract` function gh-1727
- Fix for `tensor.unique_all` and `tensor.unique_inverse` to always return index arrays with default indexing data type gh-1741
- Propagate read-only flag from `__sycl_usm_array_interface__` in `tensor.asarray` function gh-1756
- Fix `tensor.clip` to handle Python scalars that are out of bounds for the data type of an integral array gh-1759
- Avoid deadlocks by releasing GIL around blocking operations in libtensor gh-1753
- Element-wise `tensor.divide` and comparison operations allow a greater range of Python integer and integer array combinations gh-1771
- Fix for unexpected behavior when using floating point types for array indexing gh-1792
- Enable `pytest --pyargs dpctl.tests` gh-1833
Maintenance
- Improve performance of `test_sort_complex_fp_nan` gh-1704
- Improve wording of exception raised by `tensor.broadcast_arrays()` gh-1720
- Remove `template` keyword in method call of `sycl::kernel_bundle` gh-1726
- Backport changelog edits from maintenance/0.17.x gh-1736
- Replace uses of 'intel' channels in docs and readme file gh-1737
- Update references to deprecated environment variable `SYCL_DEVICE_FILTER` gh-1740
- Correction for installation instruction steps gh-1754
- Fix for crash during testing with open source SYCL bundle by updating CPU RT library used gh-1762
- Add missing include to fix build break with newer LLVM gh-1776
- Add `#include <utility>` for definition of `std::move` used gh-1787
- Change to CMake script to accommodate DPC++ transition from PI to UR architecture gh-1788
- Document `tensor._flags.Flags` class gh-1794
- Fix for unreferenced unreleased bug in copy-and-cast code logic gh-1799
- Explicitly include headers used in C++ translation units implementing reduction operations gh-1802
- Clean-up uses of `Strided1DIndexer` class gh-1805
- Tweak to readability of C++ code implementing matrix-matrix multiplication gh-1810
- Do not add `sycl::event` associated with compute task to vector of events representing execution of `host_task` gh-1807
- Remove 'level-zero' conda package from run-time dependencies of 'dpctl' since Intel GPU driver stack now explicitly depends on `libze1` package which provides Level-Zero loader library gh-1801, gh-1840
- Use dedicated type-support matrices for in-place element-wise binary operations gh-1816
- Remove recommendation to install wheels from Anaconda PyPI index gh-1819
- Removed use of post-link and pre-unlink conda scripts in `dpctl` gh-1821
- Pin compiler used to build 0.18.0 version to 2025.0.0 gh-1822
- A variety of changes to continuous integration/delivery (CI/CD) supporting scripts to keep CI running smoothly:
gh-1686, gh-1688, gh-1697, gh-1698, gh-1703, gh-1702, gh-1709, gh-1712, gh-1713, gh-1722, gh-1725, gh-1729, gh-1733, [gh-1721](https...
0.17.0
This release features an updated documentation web page https://intelpython.github.io/dpctl/latest/index.html, adds cumulative reductions,
and complies with revision 2023.12 of the Python Array API specification.
Added
- Added pybind11 caster for `sycl::half`, mapping to/from Python `float`, to the `"dpctl4pybind11.hpp"` header: gh-1655
- Added support for DLPack data interchange per Python Array API 2023.12 specification: gh-1667
- Implemented `tensor.cumulative_sum`, `tensor.cumulative_prod` and `tensor.cumulative_logsumexp`: gh-1602
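`tensor.cumulative_sum`, `tensor.cumulative_prod` and `tensor.cumulative_logsumexp` compute inclusive scans: element `i` of a 1-D result combines input elements `0` through `i` with `+`, `*` or `logaddexp`, respectively. The sketch below is a pure-Python reference model of those values for 1-D inputs, written for illustration only; the `*_ref` names are hypothetical, and it says nothing about how dpctl's parallel scan kernels are implemented.

```python
import math
import operator
from itertools import accumulate


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b)) for finite or -inf floats."""
    hi, lo = max(a, b), min(a, b)
    if lo == -math.inf:
        return hi  # exp(-inf) contributes nothing to the sum
    return hi + math.log1p(math.exp(lo - hi))


def cumulative_sum_ref(xs):
    """Inclusive scan with +: out[i] = xs[0] + ... + xs[i]."""
    return list(accumulate(xs, operator.add))


def cumulative_prod_ref(xs):
    """Inclusive scan with *: out[i] = xs[0] * ... * xs[i]."""
    return list(accumulate(xs, operator.mul))


def cumulative_logsumexp_ref(xs):
    """Inclusive scan with logaddexp: out[i] = log(exp(xs[0]) + ... + exp(xs[i]))."""
    return list(accumulate(xs, logaddexp))
```

For example, `cumulative_logsumexp_ref([1000.0, 1000.0])` yields `1000 + log(2)` as its second element, whereas evaluating `log(exp(1000) + exp(1000))` directly would overflow.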
Changed
- Expanded documentation for `dpctl`: gh-1619
- Expanded `utils.intel_device_info` functionality: gh-1656
- Improved performance of elementwise operations: gh-1651
- Efficiency improvement by avoiding unnecessary copying of `sycl::queue`: gh-1645
- `dpctl` uses pybind11 2.12.0: gh-1640
- Improved performance of `tensor.reshape` operation with `order="F"` when copying is needed, or requested: gh-1677
Fixed
- Fixed initialization of byte type constants in `dpctl_capi` Python/C API loader class in `"dpctl4pybind11.hpp"`: gh-1665
- Fixed crash in `tensor.sort` reported for a CPU device and a CUDA device: gh-1676
- Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: gh-1624
- Fixed comparison operators for mixed signed and unsigned integral types: gh-1650
- Support use of index arrays of different integral types in indexing operations: gh-47
- Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: gh-1630
- Corrected `tensor.tile` for scalar inputs and empty repetitions: gh-1628
- Fixed support for `out` keyword in `tensor.matmul`: gh-1610
- Fixed bug in basic slicing of empty arrays: gh-1680
- Fixed bug in `tensor.bitwise_invert` for boolean input array: gh-1681
- Fixed bug in `tensor.repeat` on zero-size input arrays: gh-1682
New Contributors
- @bdmoore1 made their first contribution in #1659
- @ekomarova made their first contribution in #1666
Full Changelog: https://github.com/IntelPython/dpctl/blob/master/CHANGELOG.md
v0.16.1
This release includes bug fixes and a change needed by the `numba_dpex` project to support dispatching kernels that consume instances of the `sycl::local_accessor` template type.
Changed
- Changed behavior of `dpctl.tensor.usm_ndarray.__dlpack_device__` method to return the device id of the parent unpartitioned device, instead of raising an exception, if the array is allocated on a sub-device: #1604
- Array creation functions and the `usm_ndarray` constructor in `dpctl.tensor` submodule now use cached default-selected device to improve performance: #1606
- Changed treatment of `axis` keyword for `dpctl.tensor.tensordot` and `dpctl.tensor.vecdot` to align with Python Array API 2023.12 specification: #1608
- Changed implementation of `DPCTLQueue_SubmitRange` and `DPCTLQueue_SubmitNDRange` in DPCTLSyclInterface library to support `sycl::local_accessor` arguments needed by `numba_dpex`, and changed the enum `DPCTLKernelArgType` so that its values correspond to disjoint C++ types: #1609, #1611, #1612
Fixed
- Fixed a crash on Windows platform during execution of getter of `dpctl.SyclPlatform.default_context` property: #1604
- Fixed kernel submission error on NVidia CUDA GPUs during `dpctl.tensor.matmul` operation: #1605
- Fixed corruption of context cache table entries: #1607
- Fixed incorrect result from `dpctl.tensor.tensordot` reported in issue #1570: #1608
- Fixed output of `python -m dpctl --library` so that it reports the correct library name: #1615
v0.16.0
In terms of features, this release is virtually identical to 0.15.1.
It is meant to be built with DPC++ 2024.1.0, which no longer supports older integrated Gen9 Intel GPUs, such as those that came with 10th-generation and older Intel Core processors.
v0.15.1
Summary
This release reaches the milestone of 100% compliance of `dpctl.tensor` functions with the Python Array API 2022.12 standard for the main namespace.
Added
- Added reduction functions `dpctl.tensor.min`, `dpctl.tensor.max`, `dpctl.tensor.argmin`, `dpctl.tensor.argmax`, and `dpctl.tensor.prod` per Python Array API specifications: #1399
- Added dedicated in-place operations for binary elementwise operations and deployed them in Python operators of `dpctl.tensor.usm_ndarray` type: #1431, #1447
- Added new elementwise functions `dpctl.tensor.cbrt`, `dpctl.tensor.rsqrt`, `dpctl.tensor.exp2`, `dpctl.tensor.copysign`, `dpctl.tensor.angle`, and `dpctl.tensor.reciprocal`: #1443, #1474
- Added statistical functions `dpctl.tensor.mean`, `dpctl.tensor.std`, `dpctl.tensor.var` per Python Array API specifications: #1465
- Added sorting functions `dpctl.tensor.sort` and `dpctl.tensor.argsort`, and set functions `dpctl.tensor.unique_values`, `dpctl.tensor.unique_counts`, `dpctl.tensor.unique_inverse`, `dpctl.tensor.unique_all`: #1483
- Added linear algebra functions from the Array API namespace `dpctl.tensor.matrix_transpose`, `dpctl.tensor.matmul`, `dpctl.tensor.vecdot`, and `dpctl.tensor.tensordot`: #1490, #1525, #1541
- Added `dpctl.tensor.clip` function: #1444, #1505
- Added custom reduction functions `dpt.logsumexp` (reduction using binary function `dpctl.tensor.logaddexp`) and `dpt.reduce_hypot` (reduction using binary function `dpctl.tensor.hypot`): #1446
- Added inspection API to query capabilities of Python Array API specification implementation: #1469
- Support for compilation for NVIDIA(R) sycl target with use of Codeplay oneAPI plug-in: #1411, #1124
- Added `dpctl.utils.intel_device_info` function to query additional information about Intel(R) GPU devices: gh-1428 and gh-1445
- Added support for two new device descriptors, `dpctl.SyclDevice.max_mem_alloc_size` and `dpctl.SyclDevice.max_clock_frequency`: #1530
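`dpt.logsumexp` and `dpt.reduce_hypot` are reductions built from the binary functions `logaddexp` and `hypot`. The pure-Python sketch below folds each binary function over a 1-D sequence, starting from its identity (`-inf` for `logaddexp`, `0.0` for `hypot`). It is a hypothetical reference model for illustration, not dpctl's implementation, which reduces in parallel on the device. Because neither binary function computes `exp(x)` or `x * x` directly, both folds stay finite on inputs where the naive formulas overflow.

```python
import math
from functools import reduce


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b)) for finite or -inf floats."""
    hi, lo = max(a, b), min(a, b)
    if lo == -math.inf:
        return hi  # exp(-inf) contributes nothing to the sum
    return hi + math.log1p(math.exp(lo - hi))


def logsumexp_ref(xs):
    """Fold logaddexp over xs; returns the identity -inf for empty input."""
    return reduce(logaddexp, xs, -math.inf)


def reduce_hypot_ref(xs):
    """Fold math.hypot over xs (the Euclidean norm); returns 0.0 for empty input."""
    return reduce(math.hypot, xs, 0.0)
```

For example, `reduce_hypot_ref([3.0, 4.0])` evaluates `hypot(hypot(0.0, 3.0), 4.0)`, which is `5.0`.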
Changed
- Functions `dpctl.tensor.result_type` and `dpctl.tensor.can_cast` became device-aware: #1488, #1473
- Implementation of method `dpctl.SyclEvent.wait_for` changed to use `sycl::event::wait` instead of `sycl::event::wait_and_throw`: gh-1436
- `dpctl.tensor.astype` was changed to support `device` keyword as per Python Array API specification: #1511
- C++ header files in `libtensor/include/kernels` containing implementations of SYCL kernels no longer depend on "pybind11.h": #1516
Fixed
v0.15.0
Summary
Release 0.15.0 represents a milestone in which the `dpctl.tensor.usm_ndarray` object implements all special Python operators, except `__matmul__` and `__rmatmul__`.
`dpctl.tensor` increases its pass rate on the array-API conformance test suite to 81.8% (passed: 916, failed: 84, skipped: 119).
Details
Added
- Added `dpctl.tensor.floor`, `dpctl.tensor.ceil`, `dpctl.tensor.trunc` elementwise functions.
- Added `dpctl.tensor.hypot`, `dpctl.tensor.logaddexp` elementwise functions.
- Added trigonometric (`dpctl.tensor.sin`, `dpctl.tensor.cos`, `dpctl.tensor.tan`) and hyperbolic (`dpctl.tensor.sinh`, `dpctl.tensor.cosh`, `dpctl.tensor.tanh`) elementwise functions and their inverses (`dpctl.tensor.asin`, `dpctl.tensor.asinh`, `dpctl.tensor.acos`, `dpctl.tensor.acosh`, `dpctl.tensor.atan`, `dpctl.tensor.atanh`).
- Added `dpctl.tensor.round` function.
- Added `dpctl.tensor.sign` and `dpctl.tensor.remainder` elementwise functions.
- Added bitwise elementwise functions `dpctl.tensor.bitwise_and`, `dpctl.tensor.bitwise_xor`, `dpctl.tensor.bitwise_or`, `dpctl.tensor.bitwise_invert`.
- Added bitwise shift functions `dpctl.tensor.bitwise_left_shift` and `dpctl.tensor.bitwise_right_shift`.
- Added `dpctl.tensor.atan2` and `dpctl.tensor.signbit` elementwise functions.
- Added `dpctl.tensor.minimum` and `dpctl.tensor.maximum` binary elementwise functions.
- Supported equality checking and hashing for `dpctl.SyclPlatform`.
- Implemented `types` property for all unary and binary elementwise functions #1361
- Added `dpctl.tensor.repeat` and `dpctl.tensor.tile` functions.
- Added `dpctl.tensor.matrix_transpose` function.
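The rounding functions added in this release differ in how they treat negative and halfway values. Assuming `dpctl.tensor` follows the Python Array API semantics for them (`floor` rounds toward negative infinity, `ceil` toward positive infinity, `trunc` toward zero, and `round` to the nearest integer with ties to even), Python's own `math.floor`, `math.ceil`, `math.trunc` and `round` produce the same values on scalars, as the hypothetical `rounding_table` helper below tabulates. One difference: the `dpctl.tensor` functions return arrays of the input's floating-point dtype, whereas the Python functions return `int`.

```python
import math

# Scalar stand-ins for the elementwise functions, keyed by dpctl.tensor name.
ROUNDERS = {
    "floor": math.floor,  # toward -inf
    "ceil": math.ceil,    # toward +inf
    "trunc": math.trunc,  # toward zero
    "round": round,       # nearest integer, ties to even
}


def rounding_table(values):
    """Map each rounding function name to its results on `values`."""
    return {name: [fn(v) for v in values] for name, fn in ROUNDERS.items()}


if __name__ == "__main__":
    for name, row in rounding_table([-2.5, -1.5, 0.5, 2.5, 2.7]).items():
        print(f"{name:>5}: {row}")
```

The halfway inputs show where the functions diverge: `round(-2.5)` and `round(-1.5)` both give `-2`, while `floor(-2.5)` gives `-3` and `trunc(-2.5)` gives `-2`.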
Changed
- Enabled support for Python arithmetic, in-place arithmetic, reflected arithmetic, comparison, and bitwise operators for `dpctl.tensor.usm_ndarray` type #1324.
- Removed obsolete `dpctl.tensor.numpy_usm_shared` class and associated tests, which were being skipped #1310
- Transitioned `dpctl` codebase to Cython 3.
- Improved performance of boolean reduction functions `dpctl.tensor.all` and `dpctl.tensor.any`.
- Improved performance of summation function `dpctl.tensor.sum`.
- Improved in-place arithmetic operations for addition, subtraction and multiplication.
- Updated codebase per SYCL-2020 intel/llvm compiler deprecation warnings.
- Improved performance of advanced boolean indexing for arrays whose size fits in 32-bit signed integer type.
- Removed deprecated `DPCTLDevice_GetMaxWorkItemSizes` function from the SyclInterface library.
- Improved performance of `dpctl.tensor.reshape` in the case when a copy is being made.
- Improved performance of `dpctl.tensor.roll` function.
Fixed
v0.14.5
This release builds on the 0.14.3 and 0.14.4 releases, addresses some performance gaps, and implements several new elementwise functions.
Added
- Added `dpctl.tensor.log2` and `dpctl.tensor.log10`: #1267
- Added `dpctl.tensor.negative`, `dpctl.tensor.positive`, `dpctl.tensor.square` #1268
- Added `dpctl.tensor.logical_not`, `dpctl.tensor.logical_and`, `dpctl.tensor.logical_or`, `dpctl.tensor.logical_xor` #1270
Changed
- Changed behavior of `dpctl.tensor.astype` for `newdtype=None` #1261
- Changed default value of `dtype` keyword argument of `dpctl.tensor.usm_ndarray` constructor to `None`: #1265
- Support for `out` arguments that overlap with inputs for unary elementwise functions #1281
- Copying from one array to another is a no-op if both arrays view into the same memory #1284
v0.14.4
This is a hot-fix for the 0.14.3 release.
Added
- Added `dpctl.tensor.less_equal`, `dpctl.tensor.greater`, `dpctl.tensor.greater_equal`: #1239
Changed
- Optimized in-place arithmetic operations for updating matrix with rows/columns via broadcasting: #1244
Fixed
- Fixed handling of 0d arrays in `dpctl.tensor.sum`: #1238
v0.14.3
Added
- Added support of `axis=None` in `dpctl.tensor.concat` #1125
- Added caching for `dpctl.SyclDevice.filter_string` property #1127
- Added `dpctl.tensor.isdtype` from array API #1133
- Added `dpctl.tensor.unstack`, `dpctl.tensor.moveaxis`, `dpctl.tensor.swapaxes` #1137, #1174
- Allow for mutation of `dpctl.tensor.usm_ndarray.flags.writable` #1141
- Added `dpctl.tensor.where` from array API #1147
- Include libtensor headers in `dpctl` installation layout #1185
- Added new properties of `dpctl.tensor.usm_ndarray` object #1199
- Added a list of unary and binary elementwise functions from array API:
  - #1203: `dpctl.tensor.add`, `dpctl.tensor.divide`, `dpctl.tensor.isnan`, `dpctl.tensor.isinf`, `dpctl.tensor.isfinite`, `dpctl.tensor.cos`, `dpctl.tensor.abs`, `dpctl.tensor.equal`
  - #1205: `dpctl.tensor.sqrt`
  - #1209: implements `out` keyword argument
  - #1211: `dpctl.tensor.multiply`, `dpctl.tensor.subtract`
  - #1214: `dpctl.tensor.not_equal`
  - #1216: `dpctl.tensor.exp`, `dpctl.tensor.sin`
  - #1217: `dpctl.tensor.real`, `dpctl.tensor.imag`, `dpctl.tensor.proj`
  - #1218: `dpctl.tensor.log`, `dpctl.tensor.log1p`, `dpctl.tensor.expm1`
  - #1221: `dpctl.tensor.floor_divide`
  - #1235: `dpctl.tensor.less`
  - #1237: in-place support for addition, multiplication and subtraction
- Added `dpctl.tensor.all` and `dpctl.tensor.any` #1204
- Added `dpctl.tensor.sum` #1210
Changed
- Updated examples of native Python extensions built using `dpctl` #1108
- Used security flags to compile and link native extensions of `dpctl` #1109
- Changed types of `dpctl.tensor.finfo` and `dpctl.tensor.iinfo` output structure per array API spec #1110
- Consolidated multiple `host_task`s managing the lifetime of USM temporaries to improve test suite stability #1111
- MAINT: Improved cmake target dependency tracking #1112
- MAINT: Improved docstrings for existing `dpctl.tensor` functions #1123
- Changed default value of `mode` keyword in `dpctl.tensor.take` and `dpctl.tensor.put` from `clip` to `wrap` #1132
- Added support for (nested) sequence of `dpctl.tensor.usm_ndarray` objects in `dpctl.tensor.asarray` #1139
- Improved exception handling in `dpctl.tensor.usm_ndarray.__setitem__` special method #1146
- Simplified implementation of copy-and-cast kernels and removed special casing for 2D arrays to conserve binary size #1165
- Improved speed of `dpctl.tensor.usm_ndarray` printing functionality #1187
- Require DPC++ RT 2023.1 to build and run `dpctl` #1195
- Compile offloading native extensions with `-fno-sycl-id-queries-fit-in-int`, fixing gh-1184, #1200
- Transition to conda-forge ecosystem #1213
Fixed
- Fix to add empty values check for `dpctl.tensor.place` #1105, #1106
- Fixed gh-1089 by improving `dpctl.tensor.asarray` handling of NumPy arrays viewing into host-accessible USM allocation objects.
- MAINT: Fixed build break with newer GCC and SYCLOS #1118
- Fixed a bug in basic indexing of `dpctl.tensor.usm_ndarray` #1136