The book High Performance Parallelism Pearls by Reinders and Jeffers (Intel) has been published by Elsevier. It contains many examples of programming techniques that have proven successful on the Intel Xeon Phi. From our lab, Albert-Jan Yzelman, Dirk Roose, and Karl Meerbergen contributed a chapter on “Sparse matrix-vector multiplication: parallelization and vectorization”. A review is found on

The chapter authors (Albert-Jan N. Yzelman, Dirk Roose, and Karl Meerbergen) note that “Current hardware trends lead to an increasing width of vector units as well as to decreasing effective bandwidth-per-core. For sparse computations these two trends conflict.” For this reason they designed a usable and efficient data structure for vectorized sparse computations on multi-core architectures with vector processing capabilities, such as the Intel Xeon Phi. This data structure helps to overcome the difficulties in achieving high performance for sparse matrix–vector (SpMV) multiplication, which are caused by a low flop-to-byte ratio and inefficient cache use.
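To make the low flop-to-byte ratio concrete, here is a minimal sketch of SpMV in the textbook Compressed Row Storage (CRS) format. This is a baseline for illustration only: it is not the data structure described in the chapter, and the names `CrsMatrix`, `spmv`, and `flops_per_matrix_byte` are hypothetical and not part of the Sparse Library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Textbook CRS storage (illustrative baseline, not the chapter's format).
struct CrsMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> row_start;  // rows + 1 offsets into the arrays below
    std::vector<std::size_t> col_index;  // column of each nonzero
    std::vector<double> values;          // value of each nonzero
};

// Computes y = A x, one row at a time.
std::vector<double> spmv(const CrsMatrix& A, const std::vector<double>& x) {
    assert(x.size() == A.cols);
    assert(A.row_start.size() == A.rows + 1);
    std::vector<double> y(A.rows, 0.0);
    for (std::size_t i = 0; i < A.rows; ++i) {
        double sum = 0.0;
        for (std::size_t p = A.row_start[i]; p < A.row_start[i + 1]; ++p) {
            // Two flops per nonzero, but x is read at an irregular position.
            sum += A.values[p] * x[A.col_index[p]];
        }
        y[i] = sum;
    }
    return y;
}

// Rough estimate that counts only the matrix data streamed per nonzero
// (one value plus one column index) and ignores traffic for x and y.
double flops_per_matrix_byte() {
    const double flops = 2.0;  // one multiply and one add
    const double bytes = sizeof(double) + sizeof(std::size_t);
    return flops / bytes;
}
```

With 8-byte values and 8-byte indices, this estimate comes to 2 flops per 16 bytes of matrix data, so the kernel is limited by memory bandwidth rather than by arithmetic. The indirect access `x[A.col_index[p]]` is what makes cache use inefficient and vectorization awkward.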

The corresponding software has been released in version 1.6 of the Sparse Library. The most important new features in this version are:

  • public release of the vectorised BICRS sparse matrix format, which allows for state-of-the-art high performance sparse matrix operations on the Intel Xeon Phi;
  • added support for sparse matrix–vector multiplication with multiple right-hand sides, i.e., Z=AX or Z=XA, with Z and X tall-and-skinny matrices of width k (for small k).
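The benefit of handling multiple right-hand sides at once can be sketched as follows. The example computes Z=AX with plain CRS storage and row-major X and Z. It does not use the vectorised BICRS format and is not the library's interface; `SparseCrs` and `spmm_tall_skinny` are hypothetical names, and the Z=XA variant is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain CRS storage, used here only for illustration.
struct SparseCrs {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> row_start;  // rows + 1 offsets
    std::vector<std::size_t> col_index;  // column of each nonzero
    std::vector<double> values;          // value of each nonzero
};

// Computes Z = A X, where X is cols-by-k and Z is rows-by-k,
// both stored row-major so that the k entries of a row are contiguous.
std::vector<double> spmm_tall_skinny(const SparseCrs& A,
                                     const std::vector<double>& X,
                                     std::size_t k) {
    assert(k > 0);
    assert(X.size() == A.cols * k);
    assert(A.row_start.size() == A.rows + 1);
    std::vector<double> Z(A.rows * k, 0.0);
    for (std::size_t i = 0; i < A.rows; ++i) {
        double* z_row = Z.data() + i * k;
        for (std::size_t p = A.row_start[i]; p < A.row_start[i + 1]; ++p) {
            const double a = A.values[p];  // loaded once, reused k times
            const double* x_row = X.data() + A.col_index[p] * k;
            for (std::size_t j = 0; j < k; ++j) {
                z_row[j] += a * x_row[j];  // contiguous inner loop over width k
            }
        }
    }
    return Z;
}
```

Each nonzero and its column index are read once and then used for 2k flops, so the flop-to-byte ratio with respect to the matrix data grows with k. The inner loop also runs over contiguous memory, which suits wide vector units.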

The book is available from Amazon or the Elsevier store.

A detailed list of changes in the Sparse Library software is found on
The software can be downloaded from

