
SVD cleanup #46

Draft: mgates3 wants to merge 11 commits into icl-utk-edu:main from mgates3:svd

mgates3 (Collaborator) commented Mar 19, 2025

Clean up the SVD code, applying changes similar to those made to the eigenvalue code (#40).

Note that the 2-stage reduction goes to upper or lower triangular band (tb) form, not general band (gb) form. The LAPACK routine gbbrd takes a general band matrix that can have non-zero upper (ku > 0) and lower (kl > 0) bandwidths. Hence, PLASMA's routines are renamed to tbbrd.
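To make the tb/gb distinction concrete, here is a minimal C sketch of LAPACK-style band storage (0-based indices, column-major), where A(i,j) is stored at AB[(ku + i - j) + j*ldab]. It shows that a triangular band matrix with bandwidth kd is the special case of general band storage with kl = 0 (upper) or ku = 0 (lower). The helper names are hypothetical and not part of PLASMA's API, and PLASMA's internal band layout (for example, the `Aband + nb` offset used in the test) may differ from this plain LAPACK layout.

```c
#include <assert.h>

/* LAPACK-style general band storage, 0-based and column-major:
   A(i,j) lives at AB[(ku + i - j) + j*ldab], with ldab >= kl + ku + 1. */
static int band_index(int i, int j, int ku, int ldab)
{
    return (ku + i - j) + j * ldab;
}

/* Nonzero if A(i,j) lies within kl subdiagonals and ku superdiagonals. */
static int in_band(int i, int j, int kl, int ku)
{
    return i - j <= kl && j - i <= ku;
}

/* Copy the band of an n-by-n column-major dense matrix A into AB.
   Entries of AB outside the band are left untouched. */
static void pack_band(int n, const double *A, int lda,
                      int kl, int ku, double *AB, int ldab)
{
    assert(ldab >= kl + ku + 1);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            if (in_band(i, j, kl, ku))
                AB[band_index(i, j, ku, ldab)] = A[i + j * lda];
}

/* Triangular band (tb) storage is general band (gb) storage with one
   bandwidth zero: uplo 'U' -> (kl, ku) = (0, kd); 'L' -> (kd, 0). */
static void pack_tb(char uplo, int n, int kd, const double *A, int lda,
                    double *AB, int ldab)
{
    if (uplo == 'U' || uplo == 'u')
        pack_band(n, A, lda, 0, kd, AB, ldab);
    else
        pack_band(n, A, lda, kd, 0, AB, ldab);
}
```

For an upper bidiagonal 3-by-3 matrix (kd = 1, ldab = 2), the second row of AB holds the diagonal and the first row holds the superdiagonal, exactly as gbbrd would expect with kl = 0. A routine that only ever receives such input is more accurately named tbbrd.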

Also add diagrams to the eigenvalue bulge chasing kernels.

Since both the SVD and eig code use proper atomic operations, remove the volatile variables. (See the discussion of atomics vs. volatile in the PLASMA style guide and in Scott Meyers's book, Effective Modern C++.)
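The PR text does not show the synchronization code itself, so the following is only a minimal C11 sketch of the general pattern, under stated assumptions: a hypothetical per-task progress array shared by one producer and one consumer thread, using POSIX threads (compile with `-pthread`). `volatile` keeps the compiler from caching the flag in a register, but it provides no inter-thread ordering, and unsynchronized concurrent access to a `volatile int` is still a data race. A release store paired with an acquire load guarantees that the ordinary write to the payload is visible once the flag is seen. PLASMA's actual implementation may use OpenMP atomics or a different structure.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

enum { NTASKS = 8 };

/* Hypothetical progress table: progress[k] becomes 1 when task k is done.
   Previously this kind of table might have been declared `volatile int`. */
static atomic_int progress[NTASKS];
static int data[NTASKS];  /* ordinary (non-atomic) payload */

static void *producer(void *arg)
{
    (void) arg;
    for (int k = 0; k < NTASKS; ++k) {
        data[k] = 10 * k;  /* plain write, published below */
        /* Release: everything written before this store is visible to a
           thread that reads progress[k] == 1 with an acquire load. */
        atomic_store_explicit(&progress[k], 1, memory_order_release);
    }
    return NULL;
}

static void *consumer(void *arg)
{
    int *sum = arg;
    for (int k = 0; k < NTASKS; ++k) {
        /* Acquire: spin until task k is published; then data[k] is safe. */
        while (atomic_load_explicit(&progress[k], memory_order_acquire) == 0)
            ;  /* busy-wait */
        *sum += data[k];
    }
    return NULL;
}

/* Runs one producer and one consumer; returns the consumer's sum,
   or -1 if a thread could not be created. */
static int run_progress_demo(void)
{
    /* Reset before any thread starts; thread creation orders this. */
    for (int k = 0; k < NTASKS; ++k)
        atomic_store_explicit(&progress[k], 0, memory_order_relaxed);

    int sum = 0;
    pthread_t p, c;
    if (pthread_create(&p, NULL, producer, NULL) != 0)
        return -1;
    if (pthread_create(&c, NULL, consumer, &sum) != 0) {
        pthread_join(p, NULL);
        return -1;
    }
    pthread_join(p, NULL);
    pthread_join(c, NULL);  /* join makes the consumer's writes visible */
    return sum;
}
```

With a `volatile` flag instead of the atomic, the consumer could in principle observe the flag before the corresponding `data[k]` write, because neither the compiler nor the hardware is required to preserve that ordering across threads.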

Comment on test/test_ztbbrd.c (outdated)
'S', seed,
'N', Sigma_ref, mode, rcond,
dmax, kl, ku,
pack, Aband + nb, ldab, work);

mgates3 (Collaborator, Author) commented Mar 21, 2025

This works with OpenBLAS 0.3.27 (using Netlib LAPACK) but fails with Intel MKL 2024.0.2. I can disable the checks in that case.

################################################################################
# Test with OpenBLAS, using Netlib LAPACK.

sh methane build-openblas> ./run_tests.py tbbrd
Fri Mar 21 18:15:55 2025
./plasmatest stbbrd  --nb=64 --dim=100:500:100 --uplo=l,u
% PLASMA 24.8.7, OpenMP num threads 10,  OpenBLAS 0.3.27 , 2025-03-21 18:15:55
% input: ./plasmatest stbbrd --nb=64 --dim=100:500:100 --uplo=l,u

  Status      Error       Time    Gflop/s    uplo       n    nb
    pass   5.52e-09     0.0008     0.0000       l     100    64
    pass   2.95e-08     0.0028     0.0000       l     200    64
    pass   3.28e-08     0.0054     0.0000       l     300    64
    pass   3.99e-08     0.0081     0.0000       l     400    64
    pass   4.28e-08     0.0115     0.0000       l     500    64
    pass   5.00e-09     0.0008     0.0000       u     100    64
    pass   2.90e-08     0.0036     0.0000       u     200    64
    pass   3.22e-08     0.0080     0.0000       u     300    64
    pass   4.30e-08     0.0137     0.0000       u     400    64
    pass   4.23e-08     0.0184     0.0000       u     500    64

% All tests passed
pass
./plasmatest dtbbrd  --nb=64 --dim=100:500:100 --uplo=l,u
% PLASMA 24.8.7, OpenMP num threads 10,  OpenBLAS 0.3.27 , 2025-03-21 18:15:55
% input: ./plasmatest dtbbrd --nb=64 --dim=100:500:100 --uplo=l,u

  Status      Error       Time    Gflop/s    uplo       n    nb
    pass   1.05e-17     0.0011     0.0000       l     100    64
    pass   4.82e-17     0.0035     0.0000       l     200    64
    pass   6.41e-17     0.0070     0.0000       l     300    64
    pass   8.26e-17     0.0104     0.0000       l     400    64
    pass   8.59e-17     0.0150     0.0000       l     500    64
    pass   1.08e-17     0.0010     0.0000       u     100    64
    pass   4.91e-17     0.0046     0.0000       u     200    64
    pass   6.49e-17     0.0099     0.0000       u     300    64
    pass   8.08e-17     0.0155     0.0000       u     400    64
    pass   8.40e-17     0.0219     0.0000       u     500    64

% All tests passed
pass
./plasmatest ctbbrd  --nb=64 --dim=100:500:100 --uplo=l,u
% PLASMA 24.8.7, OpenMP num threads 10,  OpenBLAS 0.3.27 , 2025-03-21 18:15:56
% input: ./plasmatest ctbbrd --nb=64 --dim=100:500:100 --uplo=l,u

  Status      Error       Time    Gflop/s    uplo       n    nb
    pass   5.32e-09     0.0016     0.0000       l     100    64
    pass   3.11e-08     0.0044     0.0000       l     200    64
    pass   4.55e-08     0.0084     0.0000       l     300    64
    pass   5.56e-08     0.0128     0.0000       l     400    64
    pass   6.48e-08     0.0181     0.0000       l     500    64
    pass   5.56e-09     0.0012     0.0000       u     100    64
    pass   3.15e-08     0.0053     0.0000       u     200    64
    pass   4.85e-08     0.0112     0.0000       u     300    64
    pass   5.79e-08     0.0182     0.0000       u     400    64
    pass   6.24e-08     0.0264     0.0000       u     500    64

% All tests passed
pass
./plasmatest ztbbrd  --nb=64 --dim=100:500:100 --uplo=l,u
% PLASMA 24.8.7, OpenMP num threads 10,  OpenBLAS 0.3.27 , 2025-03-21 18:15:57
% input: ./plasmatest ztbbrd --nb=64 --dim=100:500:100 --uplo=l,u

  Status      Error       Time    Gflop/s    uplo       n    nb
    pass   1.24e-17     0.0016     0.0000       l     100    64
    pass   6.00e-17     0.0058     0.0000       l     200    64
    pass   7.32e-17     0.0117     0.0000       l     300    64
    pass   8.68e-17     0.0181     0.0000       l     400    64
    pass   9.71e-17     0.0250     0.0000       l     500    64
    pass   1.00e-17     0.0015     0.0000       u     100    64
    pass   5.95e-17     0.0065     0.0000       u     200    64
    pass   7.58e-17     0.0136     0.0000       u     300    64
    pass   8.89e-17     0.0224     0.0000       u     400    64
    pass   9.06e-17     0.0313     0.0000       u     500    64

% All tests passed
pass
--------------------------------------------------------------------------------

All routines passed.
Elapsed 3.35 sec
Fri Mar 21 18:15:58 2025


################################################################################
# Test with Intel MKL.

sh methane build> ./run_tests.py tbbrd
Fri Mar 21 18:16:07 2025
./plasmatest stbbrd  --nb=64 --dim=100:500:100 --uplo=l,u
% PLASMA 24.8.7, OpenMP num threads 10, Intel MKL 2024.0.2, 2025-03-21 18:16:07
% input: ./plasmatest stbbrd --nb=64 --dim=100:500:100 --uplo=l,u

  Status      Error       Time    Gflop/s    uplo       n    nb
    pass   5.54e-09     0.0011     0.0000       l     100    64
    pass   2.80e-08     0.0028     0.0000       l     200    64
    pass   3.58e-08     0.0055     0.0000       l     300    64
    pass   4.58e-08     0.0085     0.0000       l     400    64
    pass   4.70e-08     0.0117     0.0000       l     500    64
    pass   6.42e-09     0.0008     0.0000       u     100    64

Intel oneMKL ERROR: Parameter 4 was incorrect on entry to SLAROT.
#### cut 161545 duplicate lines ####
Intel oneMKL ERROR: Parameter 4 was incorrect on entry to SLAROT.
  FAILED   7.01e-03     0.0163     0.0000       u     500    64

% 4 tests failed
FAILED: exit code 4
./plasmatest dtbbrd  --nb=64 --dim=100:500:100 --uplo=l,u
% PLASMA 24.8.7, OpenMP num threads 10, Intel MKL 2024.0.2, 2025-03-21 18:16:08
% input: ./plasmatest dtbbrd --nb=64 --dim=100:500:100 --uplo=l,u

  Status      Error       Time    Gflop/s    uplo       n    nb
    pass   9.60e-18     0.0011     0.0000       l     100    64
    pass   5.29e-17     0.0031     0.0000       l     200    64
    pass   6.15e-17     0.0062     0.0000       l     300    64
    pass   8.12e-17     0.0097     0.0000       l     400    64
    pass   8.45e-17     0.0135     0.0000       l     500    64
    pass   1.04e-17     0.0009     0.0000       u     100    64
    pass   5.35e-17     0.0043     0.0000       u     200    64
    pass   6.30e-17     0.0094     0.0000       u     300    64
    pass   8.36e-17     0.0151     0.0000       u     400    64
    pass   8.80e-17     0.0209     0.0000       u     500    64

% All tests passed
pass
./plasmatest ctbbrd  --nb=64 --dim=100:500:100 --uplo=l,u
% PLASMA 24.8.7, OpenMP num threads 10, Intel MKL 2024.0.2, 2025-03-21 18:16:08
% input: ./plasmatest ctbbrd --nb=64 --dim=100:500:100 --uplo=l,u

  Status      Error       Time    Gflop/s    uplo       n    nb
    pass   6.03e-09     0.0014     0.0000       l     100    64
    pass   3.48e-08     0.0039     0.0000       l     200    64
    pass   5.00e-08     0.0073     0.0000       l     300    64
    pass   6.17e-08     0.0113     0.0000       l     400    64
    pass   7.33e-08     0.0154     0.0000       l     500    64
    pass   4.30e-09     0.0011     0.0000       u     100    64
    pass   3.43e-08     0.0051     0.0000       u     200    64
    pass   5.48e-08     0.0110     0.0000       u     300    64
    pass   5.91e-08     0.0171     0.0000       u     400    64
    pass   7.58e-08     0.0304     0.0000       u     500    64

% All tests passed
pass
./plasmatest ztbbrd  --nb=64 --dim=100:500:100 --uplo=l,u
% PLASMA 24.8.7, OpenMP num threads 10, Intel MKL 2024.0.2, 2025-03-21 18:16:09
% input: ./plasmatest ztbbrd --nb=64 --dim=100:500:100 --uplo=l,u

  Status      Error       Time    Gflop/s    uplo       n    nb
    pass   1.29e-17     0.0015     0.0000       l     100    64
    pass   5.38e-17     0.0053     0.0000       l     200    64
    pass   7.83e-17     0.0101     0.0000       l     300    64
    pass   8.72e-17     0.0164     0.0000       l     400    64
    pass   9.68e-17     0.0210     0.0000       l     500    64
    pass   1.51e-17     0.0014     0.0000       u     100    64
    pass   5.60e-17     0.0059     0.0000       u     200    64
    pass   8.63e-17     0.0128     0.0000       u     300    64
    pass   8.76e-17     0.0212     0.0000       u     400    64
    pass   9.63e-17     0.0272     0.0000       u     500    64

% All tests passed
pass
--------------------------------------------------------------------------------

1 routines FAILED: tbbrd
Elapsed 3.67 sec
Fri Mar 21 18:16:10 2025
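In these logs, passing single-precision rows report errors around 1e-8 and passing double-precision rows around 1e-17 to 1e-16, while the failing stbbrd row reports 7.01e-03, many orders of magnitude above single-precision machine epsilon (FLT_EPSILON, about 1.19e-7). Since the six visible passing rows are uplo = l for n = 100 to 500 plus uplo = u for n = 100, the four stbbrd failures must be the uplo = u rows for n = 200 to 500. The sketch below shows one way a test could compare the error against tol * eps and skip only that configuration when built with MKL. The macro `HYPOTHETICAL_WITH_MKL`, the tolerance value, and every function name are assumptions for illustration, not PLASMA's test harness.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Illustrative macro name only; PLASMA's real build flag may differ. */
#ifdef HYPOTHETICAL_WITH_MKL
static const bool built_with_mkl = true;
#else
static const bool built_with_mkl = false;
#endif

typedef enum { TEST_PASS, TEST_FAILED, TEST_SKIPPED } test_status;

/* Machine epsilon for the precision letters s, d, c, z. */
static double precision_eps(char precision)
{
    return (precision == 's' || precision == 'c') ? FLT_EPSILON
                                                  : DBL_EPSILON;
}

/* Skip only the configuration observed to fail with MKL:
   single real precision with uplo = upper. */
static bool tbbrd_check_enabled(char precision, char uplo)
{
    bool upper = (uplo == 'u' || uplo == 'U');
    return !(built_with_mkl && precision == 's' && upper);
}

/* Classify one test row: the error must be below tol * eps to pass. */
static test_status tbbrd_status(char precision, char uplo,
                                double error, double tol)
{
    assert(tol > 0.0);
    if (!tbbrd_check_enabled(precision, uplo))
        return TEST_SKIPPED;
    return error < tol * precision_eps(precision) ? TEST_PASS : TEST_FAILED;
}
```

Compiling with `-DHYPOTHETICAL_WITH_MKL` would report those rows as skipped instead of failed. Skipping only hides the symptom; the `Parameter 4 was incorrect on entry to SLAROT` errors reported by MKL would still need to be investigated.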

Contributor

It is concerning that the tests do not pass with every optimized BLAS. Do you think it is related to threading or to the OpenMP tasking runtime?

mgates3 (Collaborator, Author)

Don't know; never finished investigating.

mgates3 marked this pull request as draft April 7, 2026 21:46