venovako/VecKog

VecKog

The vectorized (AVX-512) batched singular value decomposition algorithm for matrices of order two.

This software is supplementary material for the paper doi:10.1142/S0129626420500152 (arXiv:2005.07403 [cs.MS]).

Building

Prerequisites

A recent Intel C compiler on a 64-bit Linux (e.g., CentOS 7.8) is required. The Intel MKL (Math Kernel Library) is recommended, but another LAPACK library could work with some tweaking.

Make options

Run make in the src subdirectory as follows:

make [COMPILER=x64x|x200|x64] [MARCH=...] [NDEBUG=optimization_level] [TEST=0..15] [all|clean|help]

where COMPILER should be set to x64x for Xeons, or to x200 for Xeon Phi KNLs. NDEBUG should be set to the desired optimization level (3 is a sensible choice); if it is unset, the predefined debug-mode build options are used.

For testing, TEST=0 builds the vectorized code, and TEST=4 builds the pointwise code. Adding two to TEST enables the optional backscaling, while adding one enables the step-by-step printouts. Adding eight to TEST turns on tracking of IA32_MPERF and IA32_APERF MSRs (requires running the executables as root). For example, make COMPILER=x200 NDEBUG=3 clean all will trigger a full, release-mode rebuild for the KNLs of the vectorized code only (equivalent to TEST=0).
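
The TEST value described above behaves like a small bit mask: 1 for printouts, 2 for backscaling, 4 for the pointwise code, and 8 for MSR tracking. A minimal sketch of decoding it follows; the type `test_flags` and the function `decode_test` are hypothetical names introduced only for illustration and are not part of the repository.

```c
#include <stdbool.h>

/* Hypothetical illustration: the field names are assumptions, while the
   bit meanings follow the description of the TEST option above. */
typedef struct {
  bool printouts;    /* adding 1: step-by-step printouts */
  bool backscaling;  /* adding 2: optional backscaling */
  bool pointwise;    /* adding 4: pointwise instead of vectorized code */
  bool msr_tracking; /* adding 8: IA32_MPERF/IA32_APERF tracking (root) */
} test_flags;

/* Split a TEST value in the range 0..15 into its four components. */
test_flags decode_test(unsigned test)
{
  test_flags f;
  f.printouts    = (test & 1u) != 0u;
  f.backscaling  = (test & 2u) != 0u;
  f.pointwise    = (test & 4u) != 0u;
  f.msr_tracking = (test & 8u) != 0u;
  return f;
}
```

Under this reading, TEST=6 would select the pointwise code with backscaling, and TEST=9 the vectorized code with printouts and MSR tracking.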

Running

The test data generator

To write N finite pseudorandom doubles into the file FileName, run:

./src/rndgen.exe N FileName
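
To illustrate what "finite pseudorandom doubles" can mean, here is a minimal sketch that draws random 64-bit patterns and rejects those that decode to an infinity or a NaN. It is not the method used by rndgen.exe, whose generator and output file format are not described here; the xorshift64 generator and the name `fill_finite_doubles` are assumptions made for this example.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One step of Marsaglia's xorshift64 generator; the state must be nonzero.
   This generator is an assumption, chosen only for brevity. */
uint64_t xorshift64(uint64_t *state)
{
  uint64_t x = *state;
  x ^= x << 13;
  x ^= x >> 7;
  x ^= x << 17;
  return (*state = x);
}

/* Fill out[0..n-1] with finite doubles obtained by reinterpreting random
   bit patterns, skipping any pattern that is an infinity or a NaN. */
void fill_finite_doubles(double *out, size_t n, uint64_t seed)
{
  /* A zero seed would lock xorshift64 at zero, so substitute a constant. */
  uint64_t state = (seed ? seed : UINT64_C(0x9E3779B97F4A7C15));
  size_t i = 0;
  while (i < n) {
    const uint64_t bits = xorshift64(&state);
    double d;
    memcpy(&d, &bits, sizeof d); /* well-defined type punning */
    if (isfinite(d))
      out[i++] = d;
  }
}
```

Reinterpreting raw bits spreads the values over the whole exponent range, including subnormals, which is one way to stress a numerical routine; a different distribution may be more appropriate for other purposes.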

A single-vector algorithm test

To test the real algorithm T (first line) or the complex algorithm T (second line), where T=TEST, on N vectors from FileName, run:

./src/d8svd2tT.exe N FileName
./src/z8svd2tT.exe N FileName

The multi-batch test

To test the real algorithm T (first line) or the complex algorithm T (second line), where T=TEST, on #batches batches, each with n matrices read from infile, run:

./src/dbatchT.exe n #batches infile
./src/zbatchT.exe n #batches infile

For now, n has to be a power of two (not a constraint on the algorithm itself, but only on the error testing procedure).
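
Since n must currently be a power of two, a calling script or wrapper may want to check the value before launching a batch test. A minimal sketch of such a check follows; the function name `is_power_of_two` is hypothetical and is not taken from the repository.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if and only if n is a power of two (1, 2, 4, 8, ...).
   A power of two has exactly one bit set, so clearing the lowest set bit
   with n & (n - 1) leaves zero; n == 0 is excluded explicitly. */
bool is_power_of_two(size_t n)
{
  return (n != 0) && ((n & (n - 1)) == 0);
}
```

For example, 1024 passes this check, while 1000 does not.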

This work has been supported in part by the Croatian Science Foundation under the project IP-2014-09-3670 (MFBDA).
