Frequently Asked Questions Using UBLAS

BOOST WIKI | RecentChanges | Preferences | Page List | Links List

Frequently Asked Questions Using UBLAS

Why are uBLAS vectors slower than std::valarray

Example:

    std::valarray<RealType> vax(size);
    std::valarray<RealType> vay(size);

    vax += vay;

    boost::numeric::ublas::vector<RealType> ux(size);
    boost::numeric::ublas::vector<RealType> uy(size);

    ux += uy;

The first assigned seems to be at least twice as fast as the second assignment, although the same algorithm is used. Why? UBLAS always expects that the right hand side of an expression can have common storage with the left hand side, i.e. that there is aliasing. Thus uBLAS first evaluates the right hand side into a temporary vector and then applies the changes to the left hand side. (Update: uBLAS automatically assumes the absence of aliasing when lhs and rhs are simple containers. uBLAS can not automatically decide when the rhs is a general vector or matrix expression.)

If you know for sure that there is no aliasing, you have to explicitly state it, either by using noalias() or the corresponding *_assign() member function.

    ux .plus_assign( uy );
    noalias( ux ) += uy;

Why is uBLAS so much slower than (atlas-)BLAS

$ g++-3.4 -o sample83 sample83.cpp -I $HOME/include/ -DNDEBUG -O2 -g \ 
  -L /usr/lib/3dnow -lcblas -latlas -march=athlon -funroll-loops
$ ./sample83 500 2>/dev/null
size of double matrices - 500x500*500x500 (2005-06-11)

       prod    axpy    opb    block   goto    atlas    rkc     rck     krc     kcr     crk     ckr
RRR    2.77    1.38    1.73    0.48    0.46    0.12    1.69    3.76    2.06    7.09    2.96    7.11
RRC    1.56    1.73    1.74    0.47    0.47    0.12    3.51    1.83    2.84    7.14     1.9    7.09
RCR    6.04    1.42    1.71    0.49    0.45    0.13    1.76    6.61    2.03       4    6.69    3.08
RCC    2.63    2.77    1.73    0.48    0.45    0.12    3.53    2.96    2.81    4.03     3.9    3.03
CRR    2.61    2.63    1.73    0.48    0.46    0.12    3.04    3.78    4.05    2.81    2.91    3.44
CRC    1.13    1.69    1.76    0.48    0.46    0.13    6.98     1.9    7.01    2.86    1.85    3.45
CCR    6.05    1.62    1.73    0.49    0.47    0.11    3.07    6.66       4    1.99    6.66    1.77
CCC    2.75    1.59    1.72    0.49    0.46    0.13    6.99    3.02    6.98    2.04    3.86    1.72

(first column gives storage orientation of X, A and B, other columns present times on an Athlon XP (1466MHz) for X += A*B using different products) source: sample83.cpp [1] (this link is temporarily down)

Users on the mailing list replied:

: An important design goal of ublas is to be as general as possible. Otherwise blas are very special. So AFAIR we decided not to invest to much time into connecting ublas and blas, but to concentrate on correctness of the algorithms. I think the consense was first to provide an easy to use interface and (long long time later ;-) maximize performance.

: Performance is important, and without high performance I would not be allowed to use the super-computers I have access to, but correctness is more important. And please be aware, that there are vendor supplied BLAS implementations which are just very lazy with accuracy, especially on the Itanium 2, so a reliable implementation is crucial for me to check the correctness of the code. And yes, I do get different results, depending on the BLAS implementation I use in the case I work with not-so well conditioned matrices, and it is _not_ a bug in my code.

Results for other data types

size of float matrices - 500x500*500x500

       prod    axpy    opb    block   goto    atlas    rkc     rck     krc     kcr     crk     ckr
RRR    1.14    0.86    0.99    0.46    0.45    0.11    1.02    2.65    1.29    4.56    1.66    4.39
RRC    0.86    0.96    1.01    0.45    0.43     0.1    2.24    1.18     1.4    4.58    1.25    4.42
RCR     3.5    0.91    0.98    0.47    0.44    0.11    1.06    4.36    1.25     2.6    4.46    1.54
RCC    1.01    1.28       1    0.45    0.44     0.1    2.28    1.48    1.34    2.64    2.62     1.5
CRR    1.03     1.1    1.01    0.45    0.43     0.1    1.49     2.7    2.66    1.35    1.61    2.19
CRC    0.88    0.94    1.02    0.44    0.43    0.11    4.36    1.27     4.5    1.37    1.19    2.17
CCR     3.5    0.93    0.98    0.47    0.44     0.1    1.54    4.38    2.62    1.21    4.41    1.06
CCC    1.05    0.88    0.99    0.45    0.43    0.11     4.4     1.6    4.51    1.24    2.55    1.04

Strange compile error using zero_vector

Q: Why can't I initialize a vector from zero_vector?

  typedef ublas::zero_vector<double> zero_vec;
  typedef ublas::vector<double>      Vector;
  std::size_t N = 5;
  Vector vec(zero_vec(N));  // compile error

A: This syntax declares a function taking one argument, a zero_vec named N, and returning Vector. You have to make it not look like a function:

   Vector vec((zero_vec(N)));  // extra brackets
   Vector vec = zero_vec(N);

How do you convert between uBLAS and STL types

Q: how can I get a row out as a STL vector if I have data in a STL vector, how do I get it into a ublas vector/matrix?

A: You can always use std::copy(ublas_container.begin(), ublas_container.end(), stl_container.begin()); and std::copy(stl_container.begin(), stl_container.end(), ublas_container.begin()); if they have the same size.

Why are small vectors/matrices so slow?

Q: Why is ublas so slow when I use 3x3 matrices and vectors of size 3?

A: Because the main development was on (large) dynamic sized matrices and sparse or structured matrices. For small fixed size linear algebra you should have a look at tvmet[2].

Why does ublas::symmetric_matrix not work with BLAS, LAPACK for symmetric matrices?

Q: I get compile errors with potrs, symm, symv, syrk, ...

A: The ublas::symmetric_matrix uses packed storage but the above functions require a dense matrix. Try to use the TP, SP ord HP versions of your BLAS-lib. LAPACK usually operates on dense matrices.

Disclaimer: This site not officially maintained by Boost Developers

Frequently Asked Questions Using UBLAS

Frequently Asked Questions Using UBLAS

Other Collections of Answers

Why are uBLAS vectors slower than std::valarray

Why is uBLAS so much slower than (atlas-)BLAS

Strange compile error using zero_vector

How do you convert between uBLAS and STL types

Why are small vectors/matrices so slow?

Why does ublas::symmetric_matrix not work with BLAS, LAPACK for symmetric matrices?