
Releases: IntelPython/dpctl

v0.18.1

14 Oct 11:56
5e5513f

This is an incremental release: relative to 0.18.0, only the installation instructions in the README were updated, to reflect the new location of the index hosting Python packages built by Intel(R).

v0.18.0

30 Sep 10:42
786365e

This release reaches an important milestone: offloading is now fully asynchronous.

Calls to dpctl.tensor functions submit tasks to the DPC++ runtime for execution and return without waiting for these tasks to finish.
The sequential semantics a user expects from the execution of a Python script are nevertheless preserved.
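The following is a rough analogy only. It is not dpctl code and not how the DPC++ runtime is implemented. A single-worker executor from Python's standard library behaves like an in-order queue: each `submit` call returns a future immediately, tasks run strictly in submission order, and the host blocks only when it actually needs a result. The function name `run_pipeline` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(values):
    """Submit three dependent steps without waiting; return the final result.

    A single worker thread executes tasks in submission order, loosely
    mimicking an in-order queue: later tasks observe the effects of
    earlier ones even though submission itself never blocks.
    """
    state = {}
    with ThreadPoolExecutor(max_workers=1) as queue:
        # Each submit() returns a Future immediately (asynchronous submission).
        queue.submit(state.__setitem__, "x", list(values))
        queue.submit(lambda: state.__setitem__("y", [v * 2 for v in state["x"]]))
        total = queue.submit(lambda: sum(state["y"]))
        # The host blocks only here, when it needs the computed value.
        return total.result()


print(run_pipeline([1, 2, 3]))  # 12
```

Ordering comes from the queue, not from waiting after every call. This is the property that lets asynchronous submission keep sequential semantics.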

The full list of changes that went into this release is:

Added

  • Implement tensor.take_along_axis per Python Array API specification gh-1778
  • Implement tensor.put_along_axis to complement tensor.take_along_axis gh-1798
  • Support for 'device=tensor.kDLCPU' in tensor.from_dlpack function and tensor.usm_ndarray.__dlpack__ method gh-1781
  • Support DLPack on Windows gh-1746
  • Implement tensor.nextafter function per Python Array API specification gh-1730
  • Implement tensor.count_nonzero and tensor.diff functions from Python array API specification gh-1732, gh-1780
  • Add support for order="K" to *_like array creation functions, and change default order keyword value from 'C' to 'K' gh-1808
  • Support for 'max dimensions' in Array API capabilities info data gh-1774
  • Add support for device aspect 'emulated' gh-1691
  • The dpctl::tensor::usm_memory class defined in dpctl4pybind11.hpp gains a constructor that creates Python USM memory objects viewing into existing USM allocations, including allocations made by an external library gh-1782
  • Add support for COVERAGE build type in project's CMake script gh-1692
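The semantics of tensor.take_along_axis and its complement tensor.put_along_axis can be shown without a SYCL device. The sketch below is a pure-Python model on nested lists. It is restricted to 2-D inputs and the last axis, and it ignores the broadcasting, the axis keyword and the out-of-bounds index handling of the real functions. The helper names are hypothetical.

```python
def take_along_last_axis(x, indices):
    """Model of take_along_axis(x, indices, axis=-1) for 2-D nested lists."""
    return [[row[i] for i in idx_row] for row, idx_row in zip(x, indices)]


def put_along_last_axis(x, indices, values):
    """In-place model of put_along_axis(x, indices, values, axis=-1) for 2-D nested lists."""
    for row, idx_row, val_row in zip(x, indices, values):
        for i, v in zip(idx_row, val_row):
            row[i] = v


x = [[30, 10, 20], [5, 15, 25]]
# Position of each row's smallest element, kept 2-D with one column per row.
idx = [[min(range(len(r)), key=r.__getitem__)] for r in x]
print(idx)                           # [[1], [0]]
print(take_along_last_axis(x, idx))  # [[10], [5]]
put_along_last_axis(x, idx, [[0], [0]])
print(x)                             # [[30, 0, 20], [0, 15, 25]]
```

The pairing is the point: indices produced along an axis (for example by an arg-reduction or argsort) select from, or write back into, the same positions of the original array.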

Changed

  • Change ownership of USM allocations held by dpctl.memory objects, and make execution of dpctl.tensor operations asynchronous gh-1705
  • Add support for Python scalars in the tensor.where function gh-1719
  • Optimize division by Python scalar in statistical functions tensor.mean, tensor.std, tensor.var gh-1820
  • Use transcendental functions from sycl namespace instead of std namespace gh-1707
  • Changes for compatibility with recent NumPy in runtime environment gh-1735, gh-1772, gh-1804
  • The array creation function tensor.zeros now uses an asynchronous memset operation gh-1806
  • The setter of tensor.usm_ndarray.shape property now supports Python scalar value gh-1786
  • Use 'pyproject.toml' instead of 'setup.py' aligning with current packaging best practices gh-1660
  • No longer set SOVERSION property in DPCTLSyclInterface library on Linux gh-1773
  • Update version of 'pybind11' used gh-1758, gh-1812
  • Handle possible exceptions thrown by usm_host_allocator when used with std::vector gh-1791
  • Use dpctl::tensor::offset_utils::sycl_free_noexcept instead of sycl::free in host_task tasks associated with lifetime management of temporary USM allocations gh-1797
  • Add "same_kind"-style casting for in-place mathematical operators of tensor.usm_ndarray gh-1827, gh-1830

Fixed

  • Fix setting of the release variable in the Sphinx config file gh-1685
  • Handle possible NULL return value from device aspect queries DPCTLDevice_GetMaxWorkGroupSize1d and DPCTLDevice_GetMaxWorkGroupSize2d gh-1690
  • Add license header to conda script files gh-1695
  • Fix tensor.round behavior on CUDA devices gh-1700
  • Add missing #include <sstream> gh-1701
  • Fix for issue 1724 gh-1728
  • Correct USM type for return array of tensor.extract function gh-1727
  • Fix for tensor.unique_all and tensor.unique_inverse to always return index arrays with default indexing data type gh-1741
  • Propagate read-only flag from __sycl_usm_array_interface__ in tensor.asarray function gh-1756
  • tensor.clip now handles Python scalars that are out of bounds for the data type of an integral array gh-1759
  • Avoid dead-locking by releasing GIL around blocking operations in libtensor gh-1753
  • Element-wise tensor.divide and comparison operations allow greater range of Python integer and integer array combinations gh-1771
  • Fix for unexpected behavior when using floating point types for array indexing gh-1792
  • Enable pytest --pyargs dpctl.tests gh-1833

Maintenance

  • Improve performance of test_sort_complex_fp_nan gh-1704
  • Improve exception wording raised by tensor.broadcast_arrays() gh-1720
  • Remove template keyword in method call of sycl::kernel_bundle gh-1726
  • Backport changelog edits from maintenance/0.17.x gh-1736
  • Replace uses of 'intel' channels in docs and readme file gh-1737
  • Update references to deprecated environment variable SYCL_DEVICE_FILTER gh-1740
  • Correction for installation instruction steps gh-1754
  • Fix for crash during testing with open source SYCL bundle by updating CPU RT library used gh-1762
  • Add missing include to fix build break with newer LLVM gh-1776
  • Add #include <utility> for definition of std::move used gh-1787
  • Change to CMake script to accommodate DPC++ transition from PI to UR architecture gh-1788
  • Document tensor._flags.Flags class gh-1794
  • Fix for a bug in copy-and-cast code logic that was not tracked by an issue and not present in any release gh-1799
  • Explicitly include headers used in C++ translation units implementing reduction operations gh-1802
  • Clean-up uses of Strided1DIndexer class gh-1805
  • Tweak to readability of C++ code implementing matrix-matrix multiplication gh-1810
  • Do not add sycl::event associated with compute task to vector of events representing execution of host_task gh-1807
  • Remove 'level-zero' conda package from run-time dependencies of 'dpctl' since Intel GPU driver stack now explicitly depends on libze1 package which provides Level-Zero loader library gh-1801, gh-1840
  • Use dedicated type-support matrices for in-place element-wise binary operations gh-1816
  • Remove recommendation to install wheels from Anaconda PyPI index gh-1819
  • Remove use of post-link and pre-unlink conda scripts in dpctl gh-1821
  • Pin compiler used to build 0.18.0 version to 2025.0.0 gh-1822
  • A variety of changes to continuous integration/delivery (CI/CD) supporting scripts to keep CI running smoothly:
    gh-1686, gh-1688, gh-1697, gh-1698, gh-1703, gh-1702, gh-1709, gh-1712, gh-1713, gh-1722, gh-1725, gh-1729, gh-1733, [gh-1721](https...

0.17.0

14 Jul 13:51

This release features an updated documentation web page https://intelpython.github.io/dpctl/latest/index.html, adds cumulative reductions,
and complies with revision 2023.12 of the Python Array API specification.

Added

  • Added a pybind11 type caster for sycl::half, mapping to/from Python float, to the "dpctl4pybind11.hpp" header: gh-1655
  • Added support for DLPack data interchange per Python Array API 2023.12 specification: gh-1667
  • Implemented tensor.cumulative_sum, tensor.cumulative_prod and tensor.cumulative_logsumexp: gh-1602
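A cumulative reduction returns the running result of a binary operation along an axis. For a 1-D sequence, the intended results of tensor.cumulative_sum, tensor.cumulative_prod and tensor.cumulative_logsumexp can be modeled on the host with Python's itertools.accumulate. This is an illustration, not dpctl code, and it ignores keywords such as axis and dtype. The helper `logaddexp` here is a hypothetical pure-Python stand-in for the elementwise function.

```python
import math
import operator
from itertools import accumulate


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    hi, lo = max(a, b), min(a, b)
    if hi == -math.inf:  # both inputs are -inf; avoid (-inf) - (-inf) = nan
        return hi
    return hi + math.log1p(math.exp(lo - hi))


x = [1.0, 2.0, 3.0, 4.0]
print(list(accumulate(x)))                # cumulative_sum:  [1.0, 3.0, 6.0, 10.0]
print(list(accumulate(x, operator.mul)))  # cumulative_prod: [1.0, 2.0, 6.0, 24.0]
# cumulative_logsumexp: element k is log(exp(x[0]) + ... + exp(x[k]))
print(list(accumulate(x, logaddexp)))
```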

Changed

  • Expanded documentation for dpctl: gh-1619
  • Expanded utils.intel_device_info functionality: gh-1656
  • Improved performance of elementwise operations: gh-1651
  • Efficiency improvement by avoiding unnecessary copying of sycl::queue: gh-1645
  • dpctl uses pybind11 2.12.0: gh-1640
  • Improved performance of tensor.reshape operation with order="F" when copying is needed, or requested: gh-1677

Fixed

  • Fixed initialization of byte type constants in dpctl_capi Python/C API loader class in "dpctl4pybind11.hpp": gh-1665
  • Fixed crash in tensor.sort reported for a CPU device and a CUDA device: gh-1676
  • Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: gh-1624
  • Fixed comparison operators for mixed signed and unsigned integral types: gh-1650
  • Support use of index arrays of different integral types in indexing operations: gh-47
  • Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: gh-1630
  • Corrected tensor.tile for scalar inputs and empty repetitions: gh-1628
  • Fixed support for out keyword in tensor.matmul: gh-1610
  • Fixed bug in basic slicing of empty arrays: gh-1680
  • Fixed bug in tensor.bitwise_invert for boolean input array: gh-1681
  • Fixed bug in tensor.repeat on zero-size input arrays: gh-1682
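The general pitfall behind mixed signed/unsigned comparison bugs can be shown in pure Python. Under the usual C/C++ arithmetic conversions, comparing a signed 64-bit integer with an unsigned 64-bit integer converts the signed operand to unsigned, so -1 compares greater than 1. The sketch below emulates that conversion with ctypes and contrasts it with a comparison that checks the sign first. It illustrates the class of problem only and is not the code changed in gh-1650. The function names are hypothetical.

```python
import ctypes


def naive_less(a_int64, b_uint64):
    """C-style comparison: the signed operand is converted to uint64 first."""
    return ctypes.c_uint64(a_int64).value < b_uint64


def correct_less(a_int64, b_uint64):
    """Mathematically correct comparison: a negative signed value is always smaller."""
    if a_int64 < 0:
        return True
    return a_int64 < b_uint64


print(naive_less(-1, 1))    # False: -1 wraps around to 2**64 - 1
print(correct_less(-1, 1))  # True
```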

New Contributors

Full Changelog: https://github.com/IntelPython/dpctl/blob/master/CHANGELOG.md

v0.16.1

11 Apr 01:25
1f13ce8

This release includes bug fixes and a change needed by the numba_dpex project to support dispatching kernels
that take instances of the sycl::local_accessor template type as arguments.

Changed

  • Changed behavior of dpctl.tensor.usm_ndarray.__dlpack_device__ method to return device id of the parent unpartitioned device if array is allocated on a sub-device instead of raising an exception: #1604
  • Array creation functions and the usm_ndarray constructor in dpctl.tensor submodule now use cached default-selected device to improve performance: #1606
  • Changed treatment of axis keyword for dpctl.tensor.tensordot and dpctl.tensor.vecdot to align with Python Array API 2023.12 specification: #1608
  • Changed implementation of DPCTLQueue_SubmitRange and DPCTLQueue_SubmitNDRange in the DPCTLSyclInterface library to support sycl::local_accessor arguments needed by numba_dpex, and changed the enum DPCTLKernelArgType so that its members correspond to disjoint C++ types: #1609, #1611, #1612

Fixed

  • Fixed a crash on Windows platform during execution of the getter of the dpctl.SyclPlatform.default_context property: #1604
  • Fixed kernel submission error on NVidia CUDA GPUs during dpctl.tensor.matmul operation: #1605
  • Fixed corruption of context cache table entries: #1607
  • Fixed incorrect result from dpctl.tensor.tensordot reported in issue #1570: #1608
  • Fixed output of python -m dpctl --library so that it specifies the correct library name: #1615

v0.16.0

28 Mar 02:59

This release is virtually identical to 0.15.1 as far as features are concerned.

This release is meant to be built with DPC++ 2024.1.0, which no longer supports older integrated Gen9 Intel GPUs, such as those that shipped with 10th-generation and older Intel Core processors.

v0.15.1

10 Feb 21:51

Summary

This release reaches the milestone of 100% compliance of dpctl.tensor functions with the Python Array API 2022.12 standard for the main namespace.

Added

  • Added reduction functions dpctl.tensor.min, dpctl.tensor.max, dpctl.tensor.argmin, dpctl.tensor.argmax, and dpctl.tensor.prod per Python Array API specifications: #1399
  • Added dedicated in-place operations for binary elementwise operations and deployed them in Python operators of dpctl.tensor.usm_ndarray type: #1431, #1447
  • Added new elementwise functions dpctl.tensor.cbrt, dpctl.tensor.rsqrt, dpctl.tensor.exp2, dpctl.tensor.copysign, dpctl.tensor.angle, and dpctl.tensor.reciprocal: #1443, #1474
  • Added statistical functions dpctl.tensor.mean, dpctl.tensor.std, dpctl.tensor.var per Python Array API specifications: #1465
  • Added sorting functions dpctl.tensor.sort and dpctl.tensor.argsort, and set functions dpctl.tensor.unique_values, dpctl.tensor.unique_counts, dpctl.tensor.unique_inverse, dpctl.tensor.unique_all: #1483
  • Added linear algebra functions from the Array API namespace dpctl.tensor.matrix_transpose, dpctl.tensor.matmul, dpctl.tensor.vecdot, and dpctl.tensor.tensordot: #1490, #1525, #1541
  • Added dpctl.tensor.clip function: #1444, #1505
  • Added custom reduction functions dpctl.tensor.logsumexp (reduction using binary function dpctl.tensor.logaddexp) and dpctl.tensor.reduce_hypot (reduction using binary function dpctl.tensor.hypot): #1446
  • Added inspection API to query capabilities of Python Array API specification implementation: #1469
  • Support for compilation for the NVIDIA(R) SYCL target using the Codeplay oneAPI plug-in: #1411, #1124
  • Added dpctl.utils.intel_device_info function to query additional information about Intel(R) GPU devices: gh-1428 and gh-1445
  • Added support for two new device descriptors, dpctl.SyclDevice.max_mem_alloc_size and dpctl.SyclDevice.max_clock_frequency: #1530
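Both custom reductions, logsumexp and reduce_hypot, fold a binary elementwise function over the reduced elements. The sketch below is a 1-D, pure-Python model built on functools.reduce. It shows why logsumexp is reduced with logaddexp instead of being computed as log(sum(exp(x))): the pairwise form never evaluates exp(1000.0), which overflows. This is an illustration under those assumptions, not dpctl's kernel implementation. The helper names are hypothetical.

```python
import math
from functools import reduce


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    hi, lo = max(a, b), min(a, b)
    if hi == -math.inf:  # both inputs are -inf; avoid (-inf) - (-inf) = nan
        return hi
    return hi + math.log1p(math.exp(lo - hi))


def logsumexp(xs):
    """Reduction of xs with the binary function logaddexp (identity: -inf)."""
    return reduce(logaddexp, xs, -math.inf)


def reduce_hypot(xs):
    """Reduction of xs with the binary function math.hypot (identity: 0.0)."""
    return reduce(math.hypot, xs, 0.0)


big = [1000.0, 1000.0]
print(logsumexp(big))            # about 1000.693; math.exp(1000.0) would raise OverflowError
print(reduce_hypot([3.0, 4.0]))  # 5.0
```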

Changed

  • Functions dpctl.tensor.result_type and dpctl.tensor.can_cast became device-aware: #1488, #1473
  • Implementation of method dpctl.SyclEvent.wait_for changed to use sycl::event::wait instead of sycl::event::wait_and_throw: gh-1436
  • dpctl.tensor.astype was changed to support device keyword as per Python Array API specification: #1511
  • C++ header files in libtensor/include/kernels containing implementations of SYCL kernels no longer depend on "pybind11.h": #1516

Fixed

v0.15.0

29 Sep 16:06
5bd924e

Summary

Release 0.15.0 represents a milestone: the dpctl.tensor.usm_ndarray object now implements all special Python operators except __matmul__ and __rmatmul__.

The pass rate of dpctl.tensor on the array API conformance test suite increases to 81.8% (passed: 916, failed: 84, skipped: 119).

Details

Added

  • Added dpctl.tensor.floor, dpctl.tensor.ceil, dpctl.tensor.trunc elementwise functions.
  • Added dpctl.tensor.hypot, dpctl.tensor.logaddexp elementwise functions.
  • Added trigonometric (dpctl.tensor.sin, dpctl.tensor.cos, dpctl.tensor.tan) and hyperbolic (dpctl.tensor.sinh, dpctl.tensor.cosh, dpctl.tensor.tanh) elementwise functions and their inverses (dpctl.tensor.asin, dpctl.tensor.asinh, dpctl.tensor.acos, dpctl.tensor.acosh, dpctl.tensor.atan, dpctl.tensor.atanh).
  • Added dpctl.tensor.round function.
  • Added dpctl.tensor.sign and dpctl.tensor.remainder elementwise functions.
  • Added bitwise elementwise functions dpctl.tensor.bitwise_and, dpctl.tensor.bitwise_xor, dpctl.tensor.bitwise_or, dpctl.tensor.bitwise_invert.
  • Added bitwise shift functions dpctl.tensor.bitwise_left_shift and dpctl.tensor.bitwise_right_shift.
  • Added dpctl.tensor.atan2 and dpctl.tensor.signbit elementwise functions.
  • Added dpctl.tensor.minimum and dpctl.tensor.maximum binary elementwise functions.
  • Added support for equality checking and hashing of dpctl.SyclPlatform.
  • Implemented types property for all unary and binary elementwise functions #1361
  • Added dpctl.tensor.repeat and dpctl.tensor.tile functions.
  • Added dpctl.tensor.matrix_transpose function.

Changed

  • Enabled support for Python arithmetic, in-place arithmetic, reflexive arithmetic, comparison, and bitwise operators for dpctl.tensor.usm_ndarray type #1324.
  • Removed dpctl.tensor.numpy_usm_shared obsolete class and associated tests which were being skipped #1310
  • Transitioned dpctl codebase to Cython 3.
  • Improved performance of boolean reduction functions dpctl.tensor.all and dpctl.tensor.any.
  • Improved performance of summation function dpctl.tensor.sum.
  • Improved in-place arithmetic operations for addition, subtraction and multiplication.
  • Updated the codebase to address SYCL 2020 deprecation warnings from the intel/llvm compiler.
  • Improved performance of advanced boolean indexing for arrays whose size fits in 32-bit signed integer type.
  • Removed deprecated DPCTLDevice_GetMaxWorkItemSizes function from the SyclInterface library.
  • Improved performance of dpctl.tensor.reshape in the case when a copy is being made.
  • Improved performance of dpctl.tensor.roll function.

Fixed

v0.14.5

19 Jul 12:34
f52182d

This release builds on the 0.14.3 and 0.14.4 releases, addresses some performance gaps, and implements several new elementwise functions.

Added

  • Added dpctl.tensor.log2 and dpctl.tensor.log10: #1267
  • Added dpctl.tensor.negative, dpctl.tensor.positive, dpctl.tensor.square #1268
  • Added dpctl.tensor.logical_not, dpctl.tensor.logical_and, dpctl.tensor.logical_or, dpctl.tensor.logical_xor #1270

Changed

  • Changed dpctl.tensor.astype behavior for newdtype=None #1261
  • dpctl.tensor.usm_ndarray constructor default value of dtype keyword argument changed to None: #1265
  • Support for out arguments that overlap with inputs for unary elementwise functions #1281
  • Copying from one array to another is now a no-op if both arrays view into the same memory #1284

v0.14.4

19 Jul 12:32
3794cbc

This is a hot-fix for the 0.14.3 release.

Added

  • Added dpctl.tensor.less_equal, dpctl.tensor.greater, dpctl.tensor.greater_equal: #1239

Changed

  • Optimized in-place arithmetic operations for updating matrix with rows/columns via broadcasting: #1244

Fixed

  • Fixed handling of 0d arrays in dpctl.tensor.sum: #1238

v0.14.3

19 Jul 12:31
81553f8

Added

  • Added support for axis=None in dpctl.tensor.concat #1125
  • Added caching for dpctl.SyclDevice.filter_string property #1127
  • Added dpctl.tensor.isdtype from array API #1133
  • Added dpctl.tensor.unstack, dpctl.tensor.moveaxis, dpctl.tensor.swapaxes #1137, #1174
  • Allow for mutation of dpctl.tensor.usm_ndarray.flags.writable #1141
  • Added dpctl.tensor.where from array API #1147
  • Include libtensor headers in dpctl installation layout #1185
  • Added new properties of dpctl.tensor.usm_ndarray object #1199
  • Added a list of unary and binary elementwise functions from array API:
    • #1203: dpctl.tensor.add, dpctl.tensor.divide, dpctl.tensor.isnan, dpctl.tensor.isinf, dpctl.tensor.isfinite, dpctl.tensor.cos, dpctl.tensor.abs, dpctl.tensor.equal
    • #1205: dpctl.tensor.sqrt
    • #1209: implements out keyword argument
    • #1211: dpctl.tensor.multiply, dpctl.tensor.subtract
    • #1214: dpctl.tensor.not_equal
    • #1216: dpctl.tensor.exp, dpctl.tensor.sin
    • #1217: dpctl.tensor.real, dpctl.tensor.imag, dpctl.tensor.proj
    • #1218: dpctl.tensor.log, dpctl.tensor.log1p, dpctl.tensor.expm1
    • #1221: dpctl.tensor.floor_divide
    • #1235: dpctl.tensor.less
    • #1237: in-place support for addition, multiplication and subtraction
  • Added dpctl.tensor.all and dpctl.tensor.any #1204
  • Added dpctl.tensor.sum #1210

Changed

  • Updated examples of native Python extensions built using dpctl #1108
  • Used security flags to compile and link native extensions of dpctl #1109
  • Changed types of dpctl.tensor.finfo and dpctl.tensor.iinfo output structure per array API spec #1110
  • Consolidated multiple host_tasks managing the lifetime of USM temporaries to improve test suite stability #1111
  • MAINT: Improved cmake target dependency tracking #1112
  • MAINT: Improved docstrings for existing dpctl.tensor functions #1123
  • Changed default value of mode keyword in dpctl.tensor.take and dpctl.tensor.put from clip to wrap #1132
  • Added support for (nested) sequence of dpctl.tensor.usm_ndarray objects in dpctl.tensor.asarray #1139
  • Improved exception handling in dpctl.tensor.usm_ndarray.__setitem__ special method #1146
  • Simplified implementation of copy-and-cast kernels and removed special casing for 2D arrays to conserve binary size #1165
  • Improved speed of dpctl.tensor.usm_ndarray printing functionality #1187
  • Require DPC++ RT 2023.1 to build and run dpctl #1195
  • Compile offloading native extensions with -fno-sycl-id-queries-fit-in-int fixing gh-1184, #1200
  • Transition to conda-forge ecosystem #1213

Fixed

  • Fixed dpctl.tensor.place to check for empty values #1105, #1106
  • Fixed gh-1089 by improving dpctl.tensor.asarray handling of NumPy arrays viewing into host-accessible USM allocation objects.
  • MAINT: Fixed build break with newer GCC and SYCLOS #1118
  • Fixed a bug in basic indexing of dpctl.tensor.usm_ndarray #1136