I ended up doing this in order to try and speed up a code I was working on, but I thought the results were interesting and figured a few people on here might think the same.

Basically, I took a matrix(X)-vector(y) product and solved it 3 ways:

- Method 1: BLAS (standard matrix multiply)
- Method 2: bsxfun
- Method 3: Loop + vector-vector products

I also tried having both X and y as dense matrices, both as sparse, and one sparse / one dense, and I ran 51 trials with sparsity ranging from ~3% to ~97%.
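For anyone who wants to try this at home, here is a minimal sketch of the three methods (illustrative only: the size and variable names are made up, and this is not the exact benchmark code):

```matlab
n = 4000;                    % illustrative problem size
X = rand(n);                 % the matrix
y = rand(n, 1);              % the column vector

% Method 1: BLAS, via MATLAB's built-in matrix multiply (mtimes)
z1 = X * y;

% Method 2: bsxfun: scale each row of X by the entries of y, then sum each row
z2 = sum(bsxfun(@times, X, y.'), 2);

% Method 3: loop of vector-vector (dot) products, one per output element
z3 = zeros(n, 1);
for i = 1:n
    z3(i) = X(i, :) * y;     % (1 x n) times (n x 1) gives a scalar
end
```

All three compute the same vector (z2 and z3 should match z1 to within roundoff); only the execution path differs.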
[Figure: execution time vs. sparsity for each method and input combination. Note the y-axis is log scale in both subplots.]

BLAS is best in literally every scenario, usually by a huge margin. This is no surprise, but I admit I didn't expect bsxfun to lag behind BLAS quite as much as it did (particularly in the sparse cases). The loop method was the worst in all cases (also not a surprise).

For (dense X * dense y) inputs, it doesn't matter how sparse the matrix actually is (i.e. what fraction of its entries are zero), as expected. For (dense X * dense y) inputs, bsxfun is ~3-4x slower than BLAS, and the loop-based vector-vector product method is 5-8x slower than bsxfun.

For bsxfun and loops, it only matters whether or not X is sparse; y being sparse (or not) has almost no effect. For both bsxfun and the loop method, using a sparse X made things ~3x slower in the low-sparsity cases. In the bsxfun method, having X be sparse is only useful at ~70% sparsity and up, and it is never useful for the loop-based method. NOTE: this is probably because of how much Matlab dislikes out-of-order memory access with sparse arrays; the situation might not be as bad for y' * X'.

For BLAS, using sparse matrices was unilaterally helpful (in terms of execution time) for sparsity larger than 20% or so. Making either X or y sparse produced a similar gain. Note that X and y have the same overall sparsity in each test; making y sparse was a little better than making X sparse, which makes sense (you drop the same number of computations, but the "downsides" of using a sparse vector are always smaller than those of a sparse matrix). However, making both X and y sparse improved things by another ~1.5x or so (except at very low sparsity).

Side note: this was run on an Ivy Bridge 4c/8t CPU. I'd expect these results might change (not so much in overall ordering, but in magnitude of the difference) on other CPU architectures.

I tweaked it to make it very user friendly: tell it the matrix size to use, the number of trials / unique sparsity levels to use, and whether to use real- or complex-valued arrays, and it does everything for you. Results get saved in a folder that is named based on the 3 aforementioned input variables, and the 3 inputs default to the values used to generate the original image I posted if they are left blank. I re-ran this for both real and complex arrays of various sizes, and things seem very consistent once you get above a certain size.

> Why would you expect any Matlab function to be as fast as directly accessing a C library?

As /u/silverbluephoenix said, I was referring to the built-in matrix multiplication (the function that actually implements this is called 'mtimes'). Note: I am using 'built-in' as it is defined by the MATLAB documentation; not all functions that ship with Matlab are built-ins. A built-in calls a pre-compiled library (probably in C, though it could be another non-Matlab language), and is actually probably faster than trying to manually call the same library from within Matlab, since calling a specific library natively in the binary is almost certainly faster than calling a function designed to handle general calls to external libraries. Though, this wouldn't matter when comparing to something like bsxfun, since bsxfun is a built-in.

This is why I refer to it as BLAS, since Matlab is really just acting as a wrapper for BLAS (in particular, Intel's MKL). Fun fact: you can get the BLAS library name and version using an undocumented option for the 'version' command, version -blas; 2017b on Windows uses MKL v11.3.1 / LAPACK v3.5.0 / FFTW v3.3.3. This works for a few of the external libraries called by Matlab (see the snippet below).

I'd be interested to see an actual comparison, but I can make an educated guess: I bet if you compared a direct BLAS call with the built-in Matlab multiplication operator, they'd be pretty close. If you call the same BLAS library that Matlab natively uses, Matlab's built-in 'mtimes' will be slightly faster, since it is a pre-compiled built-in BLAS wrapper which almost certainly has less overhead than anything you could set up from within Matlab. If you use a different/newer BLAS, it will depend on how well that BLAS package runs on your hardware; the difference is probably between 'very small' and 'negligible' though. The cases that immediately come to mind where this might actually make a big difference are: a) switching to OpenBLAS/BLIS/ACML for AMD CPUs, and b) switching to a newer version of MKL for AVX-512 enabled CPUs.

The operation I was actually interested in can be done by: [equation image not preserved]. X and Yt are both matrices.
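For reference, the version query mentioned above looks like this (a minimal sketch; '-blas' is the undocumented option, and I'm assuming '-lapack' and '-fftw' behave the same way based on the versions it reports, so verify on your own release):

```matlab
% Query the external math libraries MATLAB links against.
% NOTE: '-blas' is an undocumented option of the version command;
% '-lapack' and '-fftw' are assumed to work similarly (verify on your release).
blasInfo   = version('-blas')      % e.g. an Intel MKL version string
lapackInfo = version('-lapack')
fftwInfo   = version('-fftw')
```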
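And since the script description above is abstract, here is a minimal sketch of a harness along those lines. The function name, defaults, and folder-naming scheme are my illustrative guesses rather than the actual script, and it only times the all-dense variants; the sparse-storage cases would be timed the same way after wrapping the inputs in sparse():

```matlab
function sparsity_benchmark(matSize, numLevels, useComplex)
% Hypothetical harness: times the three methods across a range of
% sparsity levels and saves the results. Names and defaults are guesses.
if nargin < 1 || isempty(matSize),    matSize    = 2000;  end
if nargin < 2 || isempty(numLevels),  numLevels  = 51;    end
if nargin < 3 || isempty(useComplex), useComplex = false; end

% Results folder named from the three inputs, as described above.
kind = 'real'; if useComplex, kind = 'complex'; end
outDir = sprintf('results_n%d_levels%d_%s', matSize, numLevels, kind);
if ~exist(outDir, 'dir'), mkdir(outDir); end

sparsities = linspace(0.03, 0.97, numLevels);
t = zeros(numLevels, 3);                  % columns: BLAS, bsxfun, loop
for k = 1:numLevels
    % Dense arrays with a controlled fraction of zero entries.
    X = rand(matSize) .* (rand(matSize) > sparsities(k));
    y = rand(matSize, 1) .* (rand(matSize, 1) > sparsities(k));
    if useComplex
        X = complex(X, X);
        y = complex(y, y);
    end
    t(k, 1) = timeit(@() X * y);
    t(k, 2) = timeit(@() sum(bsxfun(@times, X, y.'), 2));
    t(k, 3) = timeit(@() loop_product(X, y));
end
save(fullfile(outDir, 'timings.mat'), 'sparsities', 't');
end

function z = loop_product(X, y)
% Method 3: one row-times-vector product per output element.
n = size(X, 1);
z = zeros(n, 1, 'like', y);
for i = 1:n
    z(i) = X(i, :) * y;
end
end
```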