Venado optimizations #755
o Back out bml_get_ptr_dense() from bml_getters
o Write Fortran wrapper for existing bml_get_data_ptr_dense() method
o Write new bml_get_ld_dense() to enable magma matrix pointer use
LGTM
Yes, I think such a function could be good. The dense case is essentially done, even for magma build, we’d just need to add the allocated size to the struct and add a check to make it work. Need to figure out the other matrix types, if they’ll be supported.
Meanwhile, what do you think about merging the current function?
In reply to Jean-Luc Fattebert (@jeanlucf22), Monday, January 27, 2025 4:21:48 PM, commenting on this pull request in src/C-interface/dense/bml_setters_dense.c:
@@ -3,6 +3,22 @@
#include "bml_setters_dense.h"
#include "bml_types_dense.h"
+#ifdef BML_USE_MAGMA
+#include "magma_v2.h"
+#endif
+
+void bml_set_N_dense(
I understand now.
I was not suggesting to use domain and domain2. I was just saying their allocation may be the main culprit when it comes to allocation time for a dense matrix. Maybe an issue to deal with another time.
Another suggestion: having a function resizeNoAlloc(int n) that would just change N if n<=N, otherwise would change N and reallocate memory? Having an extra struct member keeping track of allocated memory size would be good in that case.
I don't think it is a good idea to merge as is. That's what branches are for if you need it as is right away.
FWIW, for the ellpack format, you can probably make this work simply by making nnz(i) = 0 for i > n. Obviously M is only an upper bound already, and the loops would continue to run over all N rows, unless you decided to introduce another variable to truncate them.
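A minimal sketch of the nnz-based truncation suggested above, under stated assumptions: the struct and function names (`ellpack_sketch_t`, `ellpack_truncate`, `ellpack_matvec`) are hypothetical and are not BML API, the layout is a simplified ELLPACK with at most M entries per row, and rows are numbered from 0, so "i > n" in the comment becomes rows n..N-1 here.

```c
#include <stddef.h>

/* Hypothetical minimal ELLPACK layout: N rows, at most M entries per row. */
typedef struct
{
    int N;          /* number of allocated rows */
    int M;          /* upper bound on stored entries per row */
    int *nnz;       /* nnz[i]: number of entries stored in row i */
    int *index;     /* index[i * M + j]: column of the j-th entry in row i */
    double *value;  /* value[i * M + j]: value of that entry */
} ellpack_sketch_t;

/* Logically shrink the matrix to n rows by emptying rows n..N-1.
 * Nothing is reallocated and no values are touched. */
void ellpack_truncate(ellpack_sketch_t *A, int n)
{
    for (int i = n; i < A->N; i++)
        A->nnz[i] = 0;
}

/* y = A * x. The outer loop still visits all N rows, but rows with
 * nnz[i] == 0 do no work in the inner loop. */
void ellpack_matvec(const ellpack_sketch_t *A, const double *x, double *y)
{
    for (int i = 0; i < A->N; i++)
    {
        double sum = 0.0;
        for (int j = 0; j < A->nnz[i]; j++)
            sum += A->value[(size_t) i * A->M + j]
                 * x[A->index[(size_t) i * A->M + j]];
        y[i] = sum;
    }
}
```

Note that this only empties the trailing rows. Entries in the kept rows whose column index is n or larger are left in place, so a caller would still have to overwrite or ignore them.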
Can you please provide a list of requirements for merging this into master?
In reply to Jean-Luc Fattebert, Wednesday, January 29, 2025 12:25 PM.
I'll take a look. I think for ellpack it's possible that the bml_set_N() method could work as well, for resizing N in an existing matrix buffer. There could also be a bml_set_M() method for resizing M. I think the expectation should be that the data are lost upon resizing, and that there's no zeroing of elements, as the purpose is to make it fast by avoiding memory access.
In reply to Jamal Mohd-Yusof, Wednesday, January 29, 2025 1:07 PM.
Have the function keep the matrix in a consistent state: memory allocated <= N*ld
OK, here's what I propose:
o Add a new variable "num_elems_allocated" to the dense matrix struct that holds the total number of elements in the dense matrix. If you can foresee any consequences for this modification elsewhere in the code, please let me know.
o Leave bml_set_N_dense() as a function to change N. Fail using LOG_ERROR()
I will be unable to merge our PROGRESS optimizations into master without merging this into master, so please, if there is anything else that will hold this up, let me know now.
I generally agree with the set/get duality for naming.
That's right. The burden is currently on the programmer to ensure the method isn't called when N_new > N_original. Nothing in BML helps with that; AFAIK it would require keeping track of N_original in the struct (the call can happen many times with different values of N_new, so just comparing to the current N doesn't work).
In that case I agree that adding N_original (or N_allocated if you prefer) to the struct is the correct way to go.
I'm OK with using N_original (linear size) instead of num_elems_allocated (total size of the array). That should work for either the CPU or MAGMA code path.
I definitely prefer N specification over num_elements.
Fine with N_allocated.
OK, it sounds like we have a consensus around N_allocated. I'll make the change and push a revision.
scripts/build_venado_hackathon.sh
# Make sure all the paths are correct

rm -r build
#rm -r install_magma_2.7.2
Want to clean up this file or remove it?
@jeanlucf22 @jmohdyusof Please check latest commit. N_allocated added to the struct. Check before setting N, do it if new N <= N_allocated, otherwise reallocate the matrix. I'll clean up the lint if it looks OK.
@mewall If you rebase this branch, it should now pass the tests
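For readers following the thread, the check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the commit: `dense_sketch_t`, `sketch_new`, `sketch_set_N` and `sketch_free` are hypothetical names, the struct is a simplified stand-in for the dense matrix struct, only plain CPU memory is handled (no MAGMA path), and the leading dimension is simply set equal to N.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the dense matrix struct. */
typedef struct
{
    int N;            /* current linear size */
    int N_allocated;  /* linear size the buffer was allocated for */
    int ld;           /* leading dimension (assumed equal to N here) */
    double *matrix;   /* buffer of N_allocated * N_allocated elements */
} dense_sketch_t;

/* Allocate an n x n matrix; returns NULL on failure. */
dense_sketch_t *sketch_new(int n)
{
    dense_sketch_t *A = malloc(sizeof(*A));
    if (A == NULL)
        return NULL;
    A->matrix = malloc((size_t) n * n * sizeof(double));
    if (A->matrix == NULL)
    {
        free(A);
        return NULL;
    }
    A->N = A->N_allocated = A->ld = n;
    return A;
}

/* Change N. The buffer is reused when n <= N_allocated and replaced
 * otherwise. Contents are neither preserved nor zeroed.
 * Returns 0 on success, -1 on failure. */
int sketch_set_N(dense_sketch_t *A, int n)
{
    if (n <= 0)
        return -1;
    if (n > A->N_allocated)
    {
        double *p = malloc((size_t) n * n * sizeof(double));
        if (p == NULL)
            return -1;
        free(A->matrix);
        A->matrix = p;
        A->N_allocated = n;
    }
    A->N = n;
    A->ld = n;
    return 0;
}

/* Release the matrix and its buffer. */
void sketch_free(dense_sketch_t *A)
{
    if (A != NULL)
    {
        free(A->matrix);
        free(A);
    }
}
```

With a matrix created at size 100, calling `sketch_set_N` with 50 and then 80 reuses the same buffer both times, even though 80 is larger than the then-current N of 50. That is the case raised earlier in the thread where comparing against the current N alone would not be enough.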
Signed-off-by: Nicolas Bock <[email protected]>
Helpful when editing workflow files. Signed-off-by: Nicolas Bock <[email protected]>
o Add bml_transpose_new()
o Change the tests to use the new API
o Write Fortran wrapper for existing bml_get_data_ptr_dense()
o Add new bml_get_ld_dense()
o This avoids unnecessary allocations and leads to substantial speedups
o Unsafe method that's exposed in Fortran for dense matrices only
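As a rough illustration of why a leading-dimension getter goes together with a raw data pointer: once the buffer may be padded, the row stride is ld rather than N. The sketch below does not call the BML API. `dense_at` and `trace_with_ld` are hypothetical helpers, and row-major storage with element (i, j) at data[i * ld + j] is an assumption; the layout BML actually uses may differ, particularly on the MAGMA path.

```c
#include <stddef.h>

/* Element (i, j) of a matrix stored with leading dimension ld.
 * Assumes row-major storage, which is an assumption of this sketch. */
double dense_at(const double *data, int ld, int i, int j)
{
    return data[(size_t) i * ld + j];
}

/* Trace of an n x n matrix held in a buffer with leading dimension
 * ld >= n. Indexing with n instead of ld would read the wrong elements
 * whenever the buffer is padded. */
double trace_with_ld(const double *data, int n, int ld)
{
    double t = 0.0;
    for (int i = 0; i < n; i++)
        t += dense_at(data, ld, i, i);
    return t;
}
```

The pointer carries no size information of its own, which is one reason the commit message calls the method unsafe: the caller is responsible for passing a matching n and ld.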