Venado optimizations #755

Merged
merged 23 commits into from
Feb 5, 2025

Collaborator

@mewall mewall commented Jan 14, 2025

  • Modify bml_transpose() fortran API to match the C API
    o Add bml_transpose_new()
    o Change the tests to use the new API
  • Add methods to get the pointer for MAGMA arrays, for use in Fortran OpenACC and OpenMP offload
    o Write fortran wrapper for existing bml_get_data_ptr_dense()
    o Add new bml_get_ld_dense()
  • Add bml_set_N_dense() to change the size of a bml array that's already been allocated
    o This avoids unnecessary allocations and leads to substantial speedups
    o Unsafe method that's exposed in fortran for dense matrices only
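To illustrate why offload code needs both the data pointer and the leading dimension, here is a minimal sketch. The `demo_` struct and functions are hypothetical stand-ins, not BML's actual types or signatures, and the row-major indexing is an assumption made only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dense matrix; not BML's actual struct. */
typedef struct {
    int N;        /* logical matrix dimension */
    int ld;       /* leading dimension: stride between consecutive rows */
    double *data; /* raw storage holding N * ld elements */
} demo_dense_t;

/* Hypothetical accessors in the spirit of a data-pointer getter and a
 * leading-dimension getter. */
double *demo_get_data_ptr(demo_dense_t *A) { return A->data; }
int demo_get_ld(const demo_dense_t *A) { return A->ld; }

/* Allocate a zeroed N x N matrix padded to leading dimension ld >= N. */
demo_dense_t *demo_new(int N, int ld)
{
    assert(N > 0 && ld >= N);
    demo_dense_t *A = malloc(sizeof *A);
    if (A == NULL)
        return NULL;
    A->data = calloc((size_t) N * (size_t) ld, sizeof *A->data);
    if (A->data == NULL) {
        free(A);
        return NULL;
    }
    A->N = N;
    A->ld = ld;
    return A;
}

void demo_free(demo_dense_t *A)
{
    if (A != NULL) {
        free(A->data);
        free(A);
    }
}

/* Sum the diagonal through the raw pointer, the way a kernel that only
 * receives (pointer, N, ld) would have to address elements. */
double demo_trace(demo_dense_t *A)
{
    double *p = demo_get_data_ptr(A);
    int ld = demo_get_ld(A);
    double t = 0.0;
    for (int i = 0; i < A->N; i++)
        t += p[(size_t) i * (size_t) ld + (size_t) i];
    return t;
}
```

Element (i, j) lives at offset `i * ld + j` in this sketch, so a kernel that assumed `ld == N` would read the wrong entries whenever the storage is padded.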

Collaborator

@nicolasbock nicolasbock left a comment

LGTM

@nicolasbock nicolasbock enabled auto-merge January 14, 2025 18:45
scripts/build_bml_cray.sh (outdated, resolved)
src/C-interface/dense/bml_setters_dense.c (outdated, resolved)
@mewall
Collaborator Author

mewall commented Jan 28, 2025 via email

@jeanlucf22
Collaborator

Yes, I think such a function could be good. The dense case is essentially done, even for magma build, we’d just need to add the allocated size to the struct and add a check to make it work. Need to figure out the other matrix types, if they’ll be supported. Meanwhile, what do you think about merging the current function?

From: Jean-Luc Fattebert
Sent: Monday, January 27, 2025 4:21:48 PM
To: lanl/bml
Subject: Re: [lanl/bml] Venado optimizations (PR #755)

@jeanlucf22 commented on this pull request, in src/C-interface/dense/bml_setters_dense.c:

@@ -3,6 +3,22 @@
#include "bml_setters_dense.h"
#include "bml_types_dense.h"
+#ifdef BML_USE_MAGMA
+#include "magma_v2.h"
+#endif
+
+void bml_set_N_dense(

I understand now. I was not suggesting to use domain and domain2. I was just saying their allocation may be the main culprit when it comes to allocation time for a dense matrix. Maybe an issue to deal with another time. Another suggestion: having a function resizeNoAlloc(int n) that would just change N if n<=N, otherwise would change N and reallocate memory? Having an extra struct member keeping track of allocated memory size would be good in that case.

I don't think it is a good idea to merge as is. That's what branches are for if you need it as is right away.

@jmohdyusof
Collaborator

FWIW, for the ellpack format, you can probably make this work simply by making nnz(i) = 0 for i > n. Obviously M is only an upper bound already, and the loops would continue to run over all N rows, unless you decided to introduce another variable to truncate them.
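The ELLPACK idea above can be sketched as follows. The struct layout and the `demo_` names are illustrative assumptions, not BML's actual ELLPACK implementation, and the sketch uses 0-based row indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ELLPACK-style storage; field names are illustrative only. */
typedef struct {
    int N;         /* number of rows allocated */
    int M;         /* maximum number of nonzeros stored per row */
    int *nnz;      /* nnz[i]: nonzeros actually used in row i */
    int *index;    /* index[i * M + k]: column of the k-th nonzero in row i */
    double *value; /* value[i * M + k]: value of that nonzero */
} demo_ellpack_t;

/* Logically shrink the matrix to n rows by emptying rows n..N-1.
 * No memory is freed or reallocated. */
void demo_ellpack_truncate(demo_ellpack_t *A, int n)
{
    assert(n >= 0 && n <= A->N);
    for (int i = n; i < A->N; i++)
        A->nnz[i] = 0;
}

/* Count stored nonzeros. The loop still visits all N rows; the emptied
 * rows simply contribute nothing. */
int demo_ellpack_count(const demo_ellpack_t *A)
{
    int total = 0;
    for (int i = 0; i < A->N; i++)
        total += A->nnz[i];
    return total;
}
```

The sketch does not handle entries in the retained rows whose column index is n or larger; a real implementation would have to decide what to do with those.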

@mewall
Collaborator Author

mewall commented Jan 29, 2025 via email

@mewall
Collaborator Author

mewall commented Jan 29, 2025 via email

@jeanlucf22
Collaborator

Can you please provide a list of requirements for merging this into master?


Have the function keep the matrix in a consistent state: N*ld <= memory allocated

@mewall
Collaborator Author

mewall commented Jan 30, 2025

Have the function keep the matrix in a consistent state: N*ld <= memory allocated

OK, here's what I propose:

Add a new variable "num_elems_allocated", holding the total number of elements in the dense matrix, to the following struct:

struct bml_matrix_dense_t

If you can foresee any consequences for this modification elsewhere in the code, please let me know

Leave bml_set_N_dense() as a function to change N.

Fail using LOG_ERROR()

I will be unable to merge our PROGRESS optimizations into master without merging this into master, so please, if there is anything else that will hold this up, let me know now.

@jmohdyusof
Collaborator

I generally agree with the set/get duality for naming.
So in the case of reusing an allocation, you just set N_new < N_old but retain the same pointer to allocated memory, and keep track of N_original somewhere to make sure N_new <= N_original, otherwise reallocate?

@mewall
Collaborator Author

mewall commented Jan 30, 2025

I generally agree with the set/get duality for naming. So in the case of reusing an allocation, you just set N_new < N_old but retain the same pointer to allocated memory, and keep track of N_original somewhere to make sure N_new <= N_original, otherwise reallocate?

That's right. The burden is currently on the programmer to ensure the method isn't called when N_new > N_original. Nothing in BML helps with that; AFAIK it would require keeping track of N_original in the struct (the call can happen many times with different values of N_new, so just comparing to the current N doesn't work).
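A small sketch of why the current N is not a usable bound once the size has been changed more than once. The struct and both predicates are hypothetical and exist only to contrast the two checks.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one dense matrix. */
typedef struct {
    int N;          /* current logical size, changes on every resize */
    int N_original; /* size the storage was allocated for, never shrinks */
} demo_sizes_t;

/* Check against the current size: rejects growth back up to a size the
 * storage could still hold. */
bool demo_fits_vs_current(const demo_sizes_t *s, int N_new)
{
    return N_new <= s->N;
}

/* Check against the allocated size: accepts any size the storage can hold. */
bool demo_fits_vs_original(const demo_sizes_t *s, int N_new)
{
    return N_new <= s->N_original;
}
```

For a matrix allocated with N = 100 and then shrunk to N = 10, a request for N_new = 50 fails the first check even though the storage is large enough, while the second check accepts it and still rejects N_new = 200.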

@jmohdyusof
Collaborator

In that case I agree that adding N_original (or N_allocated if you prefer) to the struct is the correct way to go.

@mewall
Collaborator Author

mewall commented Jan 30, 2025

In that case I agree that adding N_original (or N_allocated if you prefer) to the struct is the correct way to go.

I'm OK with using N_original (linear size) instead of num_elems_allocated (total size of the array). That should work for either the CPU or MAGMA code path.

@jeanlucf22?

@jmohdyusof
Collaborator

I definitely prefer N specification over num_elements.

@jeanlucf22
Collaborator

In that case I agree that adding N_original (or N_allocated if you prefer) to the struct is the correct way to go.

I'm OK with using N_original (linear size) instead of num_elems_allocated (total size of the array). That should work for either the CPU or MAGMA code path.

@jeanlucf22?

Fine with N_allocated.

@mewall
Collaborator Author

mewall commented Jan 30, 2025

In that case I agree that adding N_original (or N_allocated if you prefer) to the struct is the correct way to go.

I'm OK with using N_original (linear size) instead of num_elems_allocated (total size of the array). That should work for either the CPU or MAGMA code path.
@jeanlucf22?

Fine with N_allocated.

OK, it sounds like we have a consensus around N_allocated. I'll make the change and push a revision.

# Make sure all the paths are correct

rm -r build
#rm -r install_magma_2.7.2
Collaborator

want to clean up this file or remove it?

@mewall
Collaborator Author

mewall commented Feb 3, 2025

In that case I agree that adding N_original (or N_allocated if you prefer) to the struct is the correct way to go.

I'm OK with using N_original (linear size) instead of num_elems_allocated (total size of the array). That should work for either the CPU or MAGMA code path.
@jeanlucf22?

Fine with N_allocated.

OK, it sounds like we have a consensus around N_allocated. I'll make the change and push a revision.

@jeanlucf22 @jmohdyusof Please check the latest commit. N_allocated has been added to the struct. The function now checks before setting N: if the new N <= N_allocated, it just sets N; otherwise it reallocates the matrix. I'll clean up the lint if it looks OK.
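The behavior described in this comment can be sketched roughly as below. This is not the code from the commit: the struct, the `demo_` function names, the CPU-only `calloc` storage, and the choice to discard the old contents on reallocation are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical dense matrix with an allocated-size field. */
typedef struct {
    int N;           /* current logical dimension */
    int N_allocated; /* dimension the storage was allocated for */
    int ld;          /* leading dimension of the storage */
    double *data;    /* N_allocated * ld elements */
} demo_matrix_t;

/* Allocate zeroed storage for an N x N matrix. Returns 0 on success. */
int demo_init(demo_matrix_t *A, int N)
{
    assert(N > 0);
    A->data = calloc((size_t) N * (size_t) N, sizeof *A->data);
    if (A->data == NULL)
        return -1;
    A->N = A->N_allocated = A->ld = N;
    return 0;
}

/* Set the logical size. If the request fits in the existing storage,
 * only N changes and ld stays fixed, so existing element offsets remain
 * valid. Otherwise fresh zeroed storage is allocated and the old
 * contents are dropped. Returns 0 on success. */
int demo_set_N(demo_matrix_t *A, int N_new)
{
    assert(N_new > 0);
    if (N_new <= A->N_allocated) {
        A->N = N_new; /* no allocation on this path */
        return 0;
    }
    double *fresh = calloc((size_t) N_new * (size_t) N_new, sizeof *fresh);
    if (fresh == NULL)
        return -1; /* matrix is left unchanged on failure */
    free(A->data);
    A->data = fresh;
    A->N = A->N_allocated = A->ld = N_new;
    return 0;
}
```

Because the bound is N_allocated rather than the current N, a matrix can be shrunk and grown repeatedly within its original storage without touching the allocator.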

@jeanlucf22
Collaborator

@mewall If you rebase this branch, it should now pass the tests

@nicolasbock nicolasbock added this pull request to the merge queue Feb 4, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Feb 4, 2025
@jeanlucf22 jeanlucf22 added this pull request to the merge queue Feb 5, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Feb 5, 2025
@nicolasbock nicolasbock added this pull request to the merge queue Feb 5, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Feb 5, 2025
@nicolasbock nicolasbock merged commit f386348 into master Feb 5, 2025
31 checks passed
@nicolasbock nicolasbock deleted the hackathon branch February 5, 2025 18:37
5 participants