title |
---|
Utilisation cheat sheet for Kokkos |
Only for Kokkos 4.2 and more, for older verison look at the doc.
- title: Utilisation cheat sheet for Kokkos
- Header
- Initialization
- Kokkos concepts
- Memory management
- Parallelism patterns
- Execution policy
- Scratch memory
- Atomics
- Mathematics
- Utilities
- Macros
#include <Kokkos_Core.hpp>
int main(int argc, char* argv[]) {
Kokkos::initialize(argc, argv);
{ /* ... */ }
Kokkos::finalize();
return 0;
}
int main(int argc, char* argv[]) {
Kokkos::ScopeGuard kokkos(argc, argv);
/* ... */
return 0;
}
Execution space | Device backend | Host backend |
---|---|---|
Kokkos::DefaultExecutionSpace |
On device | On host |
Kokkos::DefaultHostExecutionSpace |
On host | On host |
https://kokkos.org/kokkos-core-wiki/API/core/execution_spaces.html
https://kokkos.org/kokkos-core-wiki/API/core/memory_spaces.html
Memory space | Device backend | Host backend |
---|---|---|
Kokkos::DefaultExecutionSpace::memory_space |
On device | On host |
Kokkos::DefaultHostExecutionSpace::memory_space |
On host | On host |
Memory space | Description |
---|---|
Kokkos::HostSpace |
Accessible by the host but not directly by the device |
Kokkos::SharedSpace |
Accessible by the host and the device; copy managed by the driver |
Kokkos::ScratchMemorySpace |
Accessible by the team or the thread that created it and nothing else |
Examples
// Host space
Kokkos::View<double*, Kokkos::HostSpace> hostView("hostView", numberOfElements);
// Shared space
Kokkos::View<double*, Kokkos::SharedSpace> sharedView("sharedView", numberOfElements);
// Scratch memory space
Kokkos::parallel_for(
Kokkos::TeamPolicy<>(leagueSize, teamSize),
KOKKOS_LAMBDA (const Kokkos::TeamPolicy<>::member_type& team) {
// Allocate scratch memory for each team
Kokkos::View<double*, Kokkos::ScratchMemorySpace> scratchView(team.team_scratch(1), scratchSize);
}
);
https://kokkos.org/kokkos-core-wiki/API/core/view/view.html
Kokkos::View<DataType, LayoutType, MemorySpace, MemoryTraits> view("label", numberOfElementsAtRuntimeI, numberOfElementsAtRuntimeJ);
Template argument | Description |
---|---|
DataType |
ScalarType for the data type, followed by a * for each runtime dimension, then by a [numberOfElements] for each compile time dimension |
LayoutType |
See memory layouts |
MemorySpace |
See memory spaces |
MemoryTraits |
See memory traits |
https://kokkos.org/kokkos-core-wiki/API/core/view/view.html#constructors
Examples
// A 1D view of doubles
Kokkos::View<double*> view1D("view1D", 100);
// Const view of doubles
Kokkos::View<const double*> constView1D("constView1D", 100) = view1D;
// A 2D view of integers
Kokkos::View<int**> view2D("view2D", 50, 50);
// 3D view with 2 runtime dimensions and 1 compile time dimension
Kokkos::View<double**[25]> view3D("view3D", 50, 42, 25);
Method | Description |
---|---|
(i, j...) |
Returns and sets the value at index i , j , etc. |
size() |
Returns the total number of elements in the view |
rank() |
Returns the number of dimensions |
layout() |
Returns the layout of the view |
extent(dim) |
Returns the number of elements in the requested dimension |
data() |
Returns a pointer to the underlying data |
https://kokkos.org/kokkos-core-wiki/API/core/view/view.html#data-access-functions
https://kokkos.org/kokkos-core-wiki/API/core/view/view.html#data-layout-dimensions-strides
Kokkos::resize(view, newNumberOfElementsI, newNumberOfElementsJ...);
https://kokkos.org/kokkos-core-wiki/API/core/view/resize.html
Kokkos::realloc(view, newNumberOfElementsI, newNumberOfElementsJ...);
https://kokkos.org/kokkos-core-wiki/API/core/view/realloc.html
Layout | Description | Default |
---|---|---|
Kokkos::LayoutRight |
Strides increase from the right most to the left most dimension, also known as row-major or C-like | CPU |
Kokkos::LayoutLeft |
Strides increase from the left most to the right most dimension, also known as column-major or Fortran-like | GPU |
Kokkos::LayoutStride |
Strides can be arbitrary for each dimension |
By default, a layout suited for loops on the high frequency index is used.
https://kokkos.org/kokkos-core-wiki/API/core/view/view.html#data-layout-dimensions-strides
Example
// 2D view with LayoutRight
Kokkos::View<double**, Kokkos::LayoutRight> view2D("view2D", 50, 50);
Memory traits are indicated with Kokkos::MemoryTraits<>
and are combined with the |
(pipe) operator.
Memory trait | Description |
---|---|
Kokkos::Unmanaged |
The allocation has to be managed manually |
Kokkos::Atomic |
All accesses to the view are atomic |
Kokkos::RandomAccess |
Hint that the view is used in a random access manner; if the view is also const this may trigger more efficient load operations on GPUs |
Kokkos::Restrict |
There is no aliasing of the view by other data structures in the current scope |
https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/View.html#access-traits
Examples
// Unmanaged view on CPU
double* data = new double[numberOfElements];
Kokkos::View<double*, Kokkos::MemoryTraits<Kokkos::Unmanaged>> unmanagedView(data, numberOfElements);
// Unmanaged view on GPU using CUDA
double* data;
cudaMalloc(&data, numberOfElements * sizeof(double));
Kokkos::View<double*, Kokkos::MemoryTraits<Kokkos::Unmanaged>, Kokkos::CudaSpace> unmanagedView(data, numberOfElements);
// Atomic view
Kokkos::View<double*, Kokkos::MemoryTraits<Kokkos::Atomic>> atomicView("atomicView", numberOfElements);
// Random access with constant data
// first, allocate non constant view
Kokkos::View<int*> nonConstView ("data", numberOfElements);
// then, make it constant
Kokkos::View<const int*, Kokkos::MemoryTraits<Kokkos::RandomAccess>> randomAccessView = nonConstView;
// Unmanaged, atomic, random access view on GPU using CUDA
double* data;
cudaMalloc(&data, numberOfElements* sizeof(double));
Kokkos::View<const double*, Kokkos::CudaSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged | Kokkos::Atomic | Kokkos::RandomAccess>> unmanagedView(data, numberOfElements);
Copying or assigning a view does a shallow copy, data are not synchronized in this case.
Kokkos::deep_copy(dest, src);
The views must have the same dimensions, data type, and reside in the same memory space (mirror views can be deep copied on different memory spaces).
https://kokkos.org/kokkos-core-wiki/API/core/view/deep_copy.html
- Kokkos example - simple memoryspace
- Kokkos example - overlapping deepcopy
- Kokkos Tutorials - Exercise 3
Example
Kokkos::View<double*> view1("view1", numberOfElements);
Kokkos::View<double*> view2("view2", numberOfElements);
// Deep copy of view1 to view2
Kokkos::deep_copy(view2, view1);
https://kokkos.org/kokkos-core-wiki/API/core/view/create_mirror.html
auto mirrorView = Kokkos::create_mirror(view);
auto mirrorView = Kokkos::create_mirror_view(view);
auto mirrorView = Kokkos::create_mirror_view_and_copy(ExecutionSpace(), view);
A subview has the same reference count as its parent view, so the parent view won't be deallocated before all subviews go away.
https://kokkos.org/kokkos-core-wiki/API/core/view/subview.html
https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/Subviews.html
auto subview = Kokkos::subview(view, Kokkos::ALL, Kokkos::pair(rangeFirst, rangeLast), value);
Subset selection | Description |
---|---|
Kokkos::ALL |
All elements in this dimension |
Kokkos::pair |
Range of elements in this dimension |
value |
Specific element in this dimension |
https://kokkos.org/kokkos-core-wiki/API/containers/ScatterView.html
https://github.com/kokkos/kokkos-tutorials/blob/main/LectureSeries/KokkosTutorial_03_MDRangeMoreViews.pdf
#include <Kokkos_ScatterView.hpp>
ScatterView<DataType, Operation, ExecutionSpace, Layout, Contribution> scatter(targetView);
Template argument | Description |
---|---|
DataType |
Scalar type of the view and its dimensionality |
Operation |
See scatter operation; defaults to Kokkos::Experimental::ScatterSum |
ExecutionSpace |
See execution spaces; defaults to Kokkos::DefaultExecutionSpace |
Layout |
See layouts |
Duplication |
Whether to duplicate the grid or not; defaults to Kokkos::Experimental::ScatterDuplicated , other option is Kokkos::Experimental::ScatterNonDuplicated |
Contribution |
Whether to contribute to use atomics; defaults to Kokkos::Experimental::ScatterAtomic , other option is Kokkos::Experimental::ScatterNonAtomic |
Operation | Description |
---|---|
Kokkos::Experimental::ScatterSum |
Sum |
Kokkos::Experimental::ScatterProd |
Product |
Kokkos::Experimental::ScatterMin |
Minimum value |
Kokkos::Experimental::ScatterMax |
Maximum value |
auto access = scatter.access();
access(index) += value;
Kokkos::Experimental::contribute(targetView, scatter);
Full example
#include<Kokkos_ScatterView.hpp>
// Compute histogram of values in view1D
KOKKOS_INLINE_FUNCTION int getIndex(double pos) { /* ... */ }
KOKKOS_INLINE_FUNCTION double compute(double weight) { /* ... */ }
// List of elements to process
Kokkos::View<double*> positions("positions", 100);
Kokkos::View<double*> weight("weight", 100);
// Historgram of N bins
Kokkos::View<double*> histogram("bar", N);
Kokkos::Experimental::ScatterView<double*> scatter(histogram);
Kokkos::parallel_for(
100,
KOKKOS_LAMBDA(const int i) {
// scatter
auto access = scatter.access();
// compute
auto index = getIndex(positions(i);
auto contribution = compute(weight(i);
access(index) += contribution;
}
);
// gather
Kokkos::Experimental::contribute(histogram, scatter);
Kokkos::parallel_for(
"label",
ExecutionPolicy</* ... */>(/* ... */),
KOKKOS_LAMBDA (/* ... */) { /* ... */ }
);
ScalarType result;
Kokkos::parallel_reduce(
"label",
ExecutionPolicy</* ... */>(/* ... */),
KOKKOS_LAMBDA (/* ... */, ScalarType& resultLocal) { /* ... */ },
Kokkos::ReducerConcept<ScalarType>(result)
);
With Kokkos::ReducerConcept
being one of the following:
Reducer | Operation | Description |
---|---|---|
Kokkos::BAnd |
& |
Binary and |
Kokkos::BOr |
| |
Binary or |
Kokkos::LAnd |
&& |
Logical and |
Kokkos::LOr |
|| |
Logical or |
Kokkos::Max |
std::max |
Maximum |
Kokkos::MaxLoc |
std::max_element |
Maximum and associated index |
Kokkos::Min |
std::min |
Minimum |
Kokkos::MinLoc |
std::min_element |
Minimum and associated index |
Kokkos::MinMax |
std::minmax |
Minimun and maximum |
Kokkos::MinMaxLoc |
std::minmax_element |
Minimun and maximun and associated indices |
Kokkos::Prod |
* |
Product |
Kokkos::Sum |
+ |
Sum |
The reducer class can be omitted for Kokkos::Sum
.
https://kokkos.org/kokkos-core-wiki/API/core/parallel-dispatch/parallel_reduce.html
https://kokkos.org/kokkos-core-wiki/API/core/builtin_reducers.html
Kokkos::fence();
https://kokkos.org/kokkos-core-wiki/API/core/parallel-dispatch/fence.html
ExecutionSpace().fence();
Kokkos::TeamPolicy<>::member_type().team_barrier();
https://kokkos.org/kokkos-core-wiki/API/core/execution_spaces.html#functionality
ExecutionPolicy<ExecutionSpace, Schedule, IndexType, LaunchBounds, WorkTag> policy(/* ... */);
Template argument | Description |
---|---|
ExecutionSpace |
See execution spaces; defaults to Kokkos::DefaultExecutionSpace |
Schedule |
How to schedule work items; defaults to machine and backend specifics |
IndexType |
Integer type to be used for the index; defaults to int64_t |
LaunchBounds |
Hints for CUDA and HIP launch bounds |
WorkTag |
Empty tag class to call the functor |
https://kokkos.org/kokkos-core-wiki/API/core/Execution-Policies.html
Kokkos::RangePolicy<ExecutionSpace, Schedule, IndexType LaunchBounds, WorkTag> policy(first, last);
If the range starts at 0 and uses default parameters, can be replaced by just the number of elements.
https://kokkos.org/kokkos-core-wiki/API/core/policies/RangePolicy.html
Kokkos::MDRangePolicy<ExecutionSpace, Schedule, IndexType, LaunchBounds, WorkTag, Kokkos::Rank<2>> policy({firstI, firstJ}, {lastI, lastJ});
https://kokkos.org/kokkos-core-wiki/API/core/policies/MDRangePolicy.html
https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/HierarchicalParallelism.html
Kokkos::TeamPolicy<ExecutionSpace, Schedule, IndexType, LaunchBounds, WorkTag> policy(leagueSize, teamSize);
Usually, teamSize
is replaced by Kokkos::AUTO
to let Kokkos determine it.
A kernel running in a team policy has a Kokkos::TeamPolicy<>::member_type
argument:
Method | Description |
---|---|
league_size() |
Number of teams in the league |
league_rank() |
Index of the team withing the league |
team_size() |
Number of threads in the team |
team_rank() |
Index of the thread within the team |
https://kokkos.org/kokkos-core-wiki/API/core/policies/TeamPolicy.html
Kokkos::parallel_for(
"label",
Kokkos::TeamPolicy<>(numberOfElementsI, Kokkos::AUTO),
KOKKOS_LAMBDA (const Kokkos::TeamPolicy<>::member_type& teamMember) {
const int i = teamMember.team_rank();
Kokkos::parallel_for(
Kokkos::TeamVectorRange(teamMember, firstJ, lastJ),
[=] (const int j) { /* ... */ }
);
}
);
Kokkos::TeamVectorRange range(teamMember, firstJ, lastJ);
https://kokkos.org/kokkos-core-wiki/API/core/policies/TeamVectorRange.html
Kokkos::TeamVectorMDRange<Kokkos::Rank<2>, Kokkos::TeamPolicy<>::member_type> range(teamMember, numberOfElementsJ, numberOfElementsK);
https://kokkos.org/kokkos-core-wiki/API/core/policies/TeamVectorMDRange.html
Kokkos::parallel_for(
"label",
Kokkos::TeamPolicy<>(numberOfElementsI, Kokkos::AUTO),
KOKKOS_LAMBDA (const Kokkos::TeamPolicy<>::member_type& teamMember) {
const int i = teamMember.team_rank();
Kokkos::parallel_for(
Kokkos::TeamThreadRange(teamMember, firstJ, lastJ),
[=] (const int j) {
Kokkos::parallel_for(
Kokkos::ThreadVectorRange(teamMember, firstK, lastK),
[=] (const int k) { /* ... */ }
);
}
);
}
);
Kokkos::TeamThreadRange range(teamMember, firstJ, lastJ);
Kokkos::ThreadVectorRange range(teamMember, firstK, lastK);
https://kokkos.org/kokkos-core-wiki/API/core/policies/TeamThreadRange.html
https://kokkos.org/kokkos-core-wiki/API/core/policies/ThreadVectorRange.html
Kokkos::TeamThreadMDRange<Kokkos::Rank<2>, Kokkos::TeamPolicy<>::member_type> range(teamMember, numberOfElementsJ, numberOfElementsK);
Kokkos::ThreadVectorMDRange<Kokkos::Rank<2>, Kokkos::TeamPolicy<>::member_type> range(teamMember, numberOfElementsL, numberOfElementsM);
https://kokkos.org/kokkos-core-wiki/API/core/policies/TeamThreadMDRange.html
https://kokkos.org/kokkos-core-wiki/API/core/policies/ThreadVectorMDRange.html
Each team has access to a scratch memory pad, which has the team's lifetime, and is only accessible by the team's threads.
https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/HierarchicalParallelism.html#team-scratch-pad-memory
Level | Memory size | Access speed |
---|---|---|
0 | Limited (tens of kilobytes) | Fast |
1 | Larger (few gigabytes) | Medium |
// Define a scratch memory view type
using ScratchPadView = View<double*, ExecutionSpace::scratch_memory_space, MemoryUnmanaged>;
// Compute how much scratch memory (in bytes) is needed
size_t bytes = ScratchPadView::shmem_size(vectorSize);
Kokkos::parallel_for(
Kokkos::TeamPolicy<ExecutionSpace>(leagueSize, teamSize).set_scratch_size(spaceLevel, Kokkos::PerTeam(bytes)),
KOKKOS_LAMBDA (const Kokkos::TeamPolicy<>::member_type& teamMember) {
const int i = teamMember.team_rank();
// Create a view for the scratch pad
ScratchPadView scratch(teamMember.team_scratch(spaceLevel), vectorSize);
// Initialize it
Kokkoss::parallel_for(
Kokkos::ThreadVectorRange(teamMember, vectorSize),
[=] (const int j) { scratch(j) = view(i, j); }
);
// Synchronize
teamMember.team_barrier();
}
);
https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/Atomic-Operations.html
https://kokkos.org/kokkos-core-wiki/API/core/atomics.html
Operation | Replaces |
---|---|
Kokkos::atomic_add |
+= |
Kokkos::atomic_and |
&= |
Kokkos::atomic_assign |
= |
Kokkos::atomic_decrement |
-- |
Kokkos::atomic_increment |
++ |
Kokkos::atomic_max |
std::max on previous and new value |
Kokkos::atomic_min |
std::min on previous and new value |
Kokkos::atomic_or |
|= |
Kokkos::atomic_sub |
-= |
Example
Kokkos::parallel_for(
numberOfElements,
KOKKOS_LAMBDA (const int i) {
const int value = /* ... */;
const int bucketIndex = computeBucketIndex (value);
Kokkos::atomic_increment(&histogram(bucketIndex));
}
);
Operation | Description |
---|---|
Kokkos::atomic_exchange |
Assign destination to new value and return old value |
Kokkos::atomic_compare_exchange_strong |
Assign destination to new value if old value equals a comparison value |
Example
// Assign destination to new value and return old value
int new = 20;
int destination = 10;
int old = atomic_exchange(&destination, new);
// Assign destination to new value if old value equals a comparison value
int new = 20;
int destination = 10;
int comparison = 10;
bool success = atomic_compare_exchange_strong(&destination, comparison, new);
Function type | List of functions (prefixed by Kokkos:: ) |
---|---|
Basic operations | abs , fabs , fmod , remainder , fma , fmax , fmin , fdim , nan |
Exponential | exp , exp2 , expm1 , log , log2 , log10 , log1p |
Power | pow , sqrt , cbrt , hypot |
Trigonometric | sin , cos , tan , asin , acos , atan , atan2 |
Hyperbolic | sinh , cosh , tanh , asinh , acosh , atanh |
Error and gamma | erf , erfc , tgamma , lgamma |
Nearest | ceil , floor , trunc , round , nearbyint |
Floating point | logb , nextafter , copysign |
Comparisons | isfinite , isinf , isnan , signbit |
Note that not all C++ standard math functions are available.
https://kokkos.org/kokkos-core-wiki/API/core/numerics/mathematical-functions.html?highlight=math
Kokkos::complex<double> complex(realPart, imagPart);
Method | Description |
---|---|
real() |
Returns or sets the real part |
imag() |
Returns or sets the imaginary part |
https://kokkos.org/kokkos-core-wiki/API/core/utilities/complex.html
https://kokkos.org/kokkos-core-wiki/API/core/Utilities.html
Kokkos::abort("message");
Kokkos::printf("format string", arg1, arg2);
Similar to std::printf
.
Kokkos::Timer timer;
Method | Description |
---|---|
seconds() |
Returns the time in seconds since construction or last reset |
reset() |
Resets the timer to zero |
Function | Description |
---|---|
Kokkos::device_id() |
Returns the device ID of the current device |
Kokkos::num_devices() |
Returns the number of devices available to the current execution space |
https://kokkos.org/kokkos-core-wiki/API/core/Macros.html
Macro | Description |
---|---|
KOKKOS_LAMBDA |
Replaces capture argument for lambdas |
KOKKOS_INLINE_FUNCTION |
Inlined functor attribute |
KOKKOS_FUNCTION |
Functor attribute |
Macro | Description |
---|---|
KOKKOS_VERSION |
Kokkos full version |
KOKKOS_VERSION_MAJOR |
Kokkos major version |
KOKKOS_VERSION_MINOR |
Kokkos minor version |
KOKKOS_VERSION_PATCH |
Kokkos patch level |
KOKKOS_ENABLE_* |
Any equivalent CMake option passed when building Kokkos, see installation cheat sheet |
KOKKOS_ARCH_* |
Any equivalent CMake option passed when building Kokkos, see installation cheat sheet |