Optimized multi resolution LBM #44

Merged
merged 117 commits into from
Apr 10, 2024

Conversation

Ahdhn
Collaborator

@Ahdhn Ahdhn commented Oct 10, 2023

  • Added Polyscope for visualization
  • Added libigl for voxelization
  • Implemented KBC
  • Fused streaming and collision on the finest level
  • Improved the CLI
  • Added a host container for mGrid
  • Added slicing for VTK output
  • Added optional culling of overlapping levels in mGrid

@Ahdhn Ahdhn marked this pull request as ready for review October 12, 2023 18:01
@Ahdhn
Collaborator Author

Ahdhn commented Oct 12, 2023

From Windows

Test project E:/Github/Neon/temp/Neon/build
      Start  1: coreUt_chrono
 1/40 Test  #1: coreUt_chrono ..........................   Passed   36.31 sec
      Start  2: coreUt_cli
 2/40 Test  #2: coreUt_cli .............................   Passed    0.22 sec
      Start  3: coreUt_digraph
 3/40 Test  #3: coreUt_digraph .........................   Passed    0.16 sec
      Start  4: coreUt_exceptions
 4/40 Test  #4: coreUt_exceptions ......................   Passed    0.28 sec
      Start  5: coreUt_io
 5/40 Test  #5: coreUt_io ..............................   Passed   12.00 sec
      Start  6: coreUt_logging
 6/40 Test  #6: coreUt_logging .........................   Passed    0.16 sec
      Start  7: coreUt_tools
 7/40 Test  #7: coreUt_tools ...........................   Passed    0.16 sec
      Start  8: coreUt_tuple3d
 8/40 Test  #8: coreUt_tuple3d .........................   Passed    0.28 sec
      Start  9: sysUt_devCpu
 9/40 Test  #9: sysUt_devCpu ...........................   Passed    0.26 sec
      Start 10: sysUt_devGpu
10/40 Test #10: sysUt_devGpu ...........................   Passed    0.55 sec
      Start 11: sysUt_devGpuNvcc
11/40 Test #11: sysUt_devGpuNvcc .......................   Passed    0.47 sec
      Start 12: sysUt_mem
12/40 Test #12: sysUt_mem ..............................   Passed   20.21 sec
      Start 13: sysUt_patterns
13/40 Test #13: sysUt_patterns .........................   Passed    3.18 sec
      Start 14: sysUt_report
14/40 Test #14: sysUt_report ...........................   Passed    0.55 sec
      Start 15: setUt_gpuSet
15/40 Test #15: setUt_gpuSet ...........................   Passed    0.47 sec
      Start 16: setUt_gpuSetNvcc
16/40 Test #16: setUt_gpuSetNvcc .......................   Passed    0.32 sec
      Start 17: setUt_memMirrorSet
17/40 Test #17: setUt_memMirrorSet .....................   Passed    0.73 sec
      Start 18: setUt_patterns
18/40 Test #18: setUt_patterns .........................   Passed    3.24 sec
      Start 19: setUt_multiDeviceObject
19/40 Test #19: setUt_multiDeviceObject ................   Passed    0.51 sec
      Start 20: setUt_containerGraph
20/40 Test #20: setUt_containerGraph ...................   Passed    4.69 sec
      Start 21: domain-globalIdx
21/40 Test #21: domain-globalIdx .......................   Passed    3.30 sec
      Start 22: domain-host-containers
22/40 Test #22: domain-host-containers .................   Passed    9.93 sec
      Start 23: domain-map
23/40 Test #23: domain-map .............................   Passed    2.24 sec
      Start 24: domain-neighbour-globalIdx
24/40 Test #24: domain-neighbour-globalIdx .............   Passed    4.97 sec
      Start 25: domain-halos
25/40 Test #25: domain-halos ...........................   Passed   11.65 sec
      Start 26: domain-stencil
26/40 Test #26: domain-stencil .........................   Passed    5.19 sec
      Start 27: domain-bGrid-tray
27/40 Test #27: domain-bGrid-tray ......................   Passed    0.25 sec
      Start 28: domainUt_sGrid
28/40 Test #28: domainUt_sGrid .........................   Passed    0.39 sec
      Start 29: domain-unit-test-eGrid
29/40 Test #29: domain-unit-test-eGrid .................   Passed    0.56 sec
      Start 30: domain-unit-test-gridInterface
30/40 Test #30: domain-unit-test-gridInterface .........   Passed    1.92 sec
      Start 31: domain-unit-test-patterns-containers
31/40 Test #31: domain-unit-test-patterns-containers ...   Passed    0.65 sec
      Start 32: domainUt_swap
32/40 Test #32: domainUt_swap ..........................   Passed   10.16 sec
      Start 33: gUt_tools
33/40 Test #33: gUt_tools ..............................   Passed    0.36 sec
      Start 34: gUt_vtk
34/40 Test #34: gUt_vtk ................................   Passed    0.67 sec
      Start 35: gUt_mGrid
35/40 Test #35: gUt_mGrid ..............................   Passed    0.61 sec
      Start 36: skeleton-map
36/40 Test #36: skeleton-map ...........................   Passed    3.25 sec
      Start 37: skeleton-stencil
37/40 Test #37: skeleton-stencil .......................   Passed    1.17 sec
      Start 38: sUt_skeletonOnStreams
38/40 Test #38: sUt_skeletonOnStreams ..................   Passed   14.86 sec
      Start 39: sUt_userInterface
39/40 Test #39: sUt_userInterface ......................   Passed  123.14 sec
      Start 40: sUt_multiRes
40/40 Test #40: sUt_multiRes ...........................   Passed    9.81 sec

100% tests passed, 0 tests failed out of 40

Total Test time (real) = 289.99 sec
"******************************"
"Test final report location: E:\Github\Neon\temp\Neon\build\CTestNeonWindowsReport.log"
"******************************"

@Ahdhn
Collaborator Author

Ahdhn commented Oct 12, 2023

From Linux (cuda:12.2.0.ubuntu.22.04)

Test project /home/ahmed/Neon/temp/Neon/build
      Start  1: coreUt_chrono
 1/40 Test  #1: coreUt_chrono ..........................   Passed   36.00 sec
      Start  2: coreUt_cli
 2/40 Test  #2: coreUt_cli .............................   Passed    0.00 sec
      Start  3: coreUt_digraph
 3/40 Test  #3: coreUt_digraph .........................   Passed    0.00 sec
      Start  4: coreUt_exceptions
 4/40 Test  #4: coreUt_exceptions ......................   Passed    0.00 sec
      Start  5: coreUt_io
 5/40 Test  #5: coreUt_io ..............................   Passed    1.73 sec
      Start  6: coreUt_logging
 6/40 Test  #6: coreUt_logging .........................   Passed    0.00 sec
      Start  7: coreUt_tools
 7/40 Test  #7: coreUt_tools ...........................   Passed    0.00 sec
      Start  8: coreUt_tuple3d
 8/40 Test  #8: coreUt_tuple3d .........................   Passed    0.00 sec
      Start  9: sysUt_devCpu
 9/40 Test  #9: sysUt_devCpu ...........................   Passed    0.84 sec
      Start 10: sysUt_devGpu
10/40 Test #10: sysUt_devGpu ...........................   Passed    0.94 sec
      Start 11: sysUt_devGpuNvcc
11/40 Test #11: sysUt_devGpuNvcc .......................   Passed    0.91 sec
      Start 12: sysUt_mem
12/40 Test #12: sysUt_mem ..............................   Passed    0.93 sec
      Start 13: sysUt_patterns
13/40 Test #13: sysUt_patterns .........................   Passed    1.08 sec
      Start 14: sysUt_report
14/40 Test #14: sysUt_report ...........................   Passed    2.17 sec
      Start 15: setUt_gpuSet
15/40 Test #15: setUt_gpuSet ...........................   Passed    1.93 sec
      Start 16: setUt_gpuSetNvcc
16/40 Test #16: setUt_gpuSetNvcc .......................   Passed    0.76 sec
      Start 17: setUt_memMirrorSet
17/40 Test #17: setUt_memMirrorSet .....................   Passed    1.98 sec
      Start 18: setUt_patterns
18/40 Test #18: setUt_patterns .........................   Passed    1.02 sec
      Start 19: setUt_multiDeviceObject
19/40 Test #19: setUt_multiDeviceObject ................   Passed    1.02 sec
      Start 20: setUt_containerGraph
20/40 Test #20: setUt_containerGraph ...................   Passed    3.84 sec
      Start 21: domain-globalIdx
21/40 Test #21: domain-globalIdx .......................   Passed   17.94 sec
      Start 22: domain-host-containers
22/40 Test #22: domain-host-containers .................   Passed   41.22 sec
      Start 23: domain-map
23/40 Test #23: domain-map .............................   Passed   15.09 sec
      Start 24: domain-neighbour-globalIdx
24/40 Test #24: domain-neighbour-globalIdx .............   Passed   28.07 sec
      Start 25: domain-halos
25/40 Test #25: domain-halos ...........................   Passed  117.22 sec
      Start 26: domain-stencil
26/40 Test #26: domain-stencil .........................   Passed   31.78 sec
      Start 27: domain-bGrid-tray
27/40 Test #27: domain-bGrid-tray ......................   Passed    0.74 sec
      Start 28: domainUt_sGrid
28/40 Test #28: domainUt_sGrid .........................   Passed    0.76 sec
      Start 29: domain-unit-test-eGrid
29/40 Test #29: domain-unit-test-eGrid .................   Passed    2.64 sec
      Start 30: domain-unit-test-gridInterface
30/40 Test #30: domain-unit-test-gridInterface .........   Passed    8.63 sec
      Start 31: domain-unit-test-patterns-containers
31/40 Test #31: domain-unit-test-patterns-containers ...   Passed    0.75 sec
      Start 32: domainUt_swap
32/40 Test #32: domainUt_swap ..........................   Passed   36.61 sec
      Start 33: gUt_tools
33/40 Test #33: gUt_tools ..............................   Passed    1.17 sec
      Start 34: gUt_vtk
34/40 Test #34: gUt_vtk ................................   Passed    1.20 sec
      Start 35: gUt_mGrid
35/40 Test #35: gUt_mGrid ..............................   Passed    1.06 sec
      Start 36: skeleton-map
36/40 Test #36: skeleton-map ...........................   Passed   41.58 sec
      Start 37: skeleton-stencil
37/40 Test #37: skeleton-stencil .......................   Passed    3.25 sec
      Start 38: sUt_skeletonOnStreams
38/40 Test #38: sUt_skeletonOnStreams ..................   Passed   47.18 sec
      Start 39: sUt_userInterface
39/40 Test #39: sUt_userInterface ......................   Passed   87.54 sec
      Start 40: sUt_multiRes
40/40 Test #40: sUt_multiRes ...........................   Passed    4.97 sec

100% tests passed, 0 tests failed out of 40

Total Test time (real) = 544.58 sec
******************************
Test final report location: /home/ahmed/Neon/temp/Neon/build/CTestNeonUnixReport.log
******************************

@Ahdhn Ahdhn requested a review from massimim October 12, 2023 18:06
Collaborator

@massimim massimim left a comment

Looks great!
I added only one comment, on the use of the compile-time for-loop for LBM.
This is an example of how it can be used:

Neon::ConstexprFor<0, Lattice::Q, 1>([&](auto q) {
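For readers unfamiliar with the pattern, a compile-time for-loop can be sketched roughly as below. This is a minimal C++17 illustration and not Neon's actual implementation: `constexprFor` and `sumDirections` are hypothetical names. The only thing assumed from the snippet above is that the callback receives the index as a value whose type encodes it.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a compile-time loop (NOT Neon::ConstexprFor itself).
// Calls f once per index in [Start, End) with the index encoded in the argument type.
template <int Start, int End, int Step, typename F>
constexpr void constexprFor(F&& f)
{
    static_assert(Step > 0, "this sketch only supports a positive step");
    if constexpr (Start < End) {
        f(std::integral_constant<int, Start>{});
        constexprFor<Start + Step, End, Step>(f);
    }
}

// Illustrative use: inside the lambda, the index is a compile-time constant,
// so it can be used as a template argument or in `if constexpr`.
template <int Q>
int sumDirections()
{
    int sum = 0;
    constexprFor<0, Q, 1>([&](auto q) {
        constexpr int qi = decltype(q)::value;
        sum += qi;
    });
    return sum;
}
```

Because every iteration is a separate instantiation, the loop is fully unrolled and each body sees its own constant `qi`.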

Comment on lines +142 to +144
if (offset.x == 0 && offset.y == 0 && offset.z == 0) {
return idx;
}

A zero offset should not be a common case that needs special treatment. The rest of the method should already cover it.

@@ -20,34 +20,30 @@ inline Neon::set::Container explosion(Neon::domain::mGrid& grid,
return [=] NEON_CUDA_HOST_DEVICE(const typename Neon::domain::mGrid::Idx& cell) mutable {
//If this cell has children i.e., it is been refined, then we should not work on it
//because this cell is only there to allow query and not to operate on
-if (!pin.hasChildren(cell)) {
+if (!pin.hasChildren(cell) && pin.hasParent(cell)) {
for (int8_t q = 0; q < Q; ++q) {

This could be automatically unrolled via templates.

Comment on lines 26 to 28
if (dir.x == 0 && dir.y == 0 && dir.z == 0) {
continue;
}

The check can be removed by looping only over the directions that are not the center.
STLBM, for example, leverages the structure of the stencil to do that. More simply, if the previous loop is based on a template for-loop, we can turn the if statement into a constexpr if.
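A minimal sketch of the constexpr-if idea, assuming the direction index is available at compile time. The 7-direction stencil, `kDirs`, `isCenter`, and `forEachNonCenterDir` are all hypothetical and only serve the illustration; they are not the lattice or helpers used in this PR.

```cpp
#include <cassert>
#include <utility>

struct Dir { int x, y, z; };

// Hypothetical stencil for illustration: the center plus the six axis neighbors.
constexpr Dir kDirs[7] = {{0, 0, 0}, {1, 0, 0}, {-1, 0, 0}, {0, 1, 0},
                          {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};

template <int q>
constexpr bool isCenter()
{
    return kDirs[q].x == 0 && kDirs[q].y == 0 && kDirs[q].z == 0;
}

// The center test is evaluated at compile time: for the center direction
// the body is discarded, so no runtime branch remains.
template <int q, typename F>
void visitIfNotCenter(F& f)
{
    if constexpr (!isCenter<q>()) {
        f(kDirs[q]);
    }
}

// Expands to one visitIfNotCenter<q> call per direction index.
template <typename F, int... Qs>
void forEachNonCenterDir(F f, std::integer_sequence<int, Qs...>)
{
    (visitIfNotCenter<Qs>(f), ...);
}

// Counts how many directions the callback actually sees.
inline int countVisitedDirs()
{
    int visited = 0;
    forEachNonCenterDir([&](const Dir&) { ++visited; },
                        std::make_integer_sequence<int, 7>{});
    return visited;
}
```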

//if the neighbor cell has children, then this 'cell' is interfacing with L-1 (fine) along q direction
//we want to only work on cells that interface with L+1 (coarse) cell along q
if (!pin.hasChildren(cell, dir)) {

We could have a version of hasChildren where dir is a template parameter.

//since we are on the finest level, we only need to do streaming and explosion (no coalescence)
//streaming is done as push

const Neon::int8_3d dir = getDir(q);

This could be passed as a template parameter.

if (nghType == CellType::bulk) {
out(nghCell, q) = cellVal;
} else {
const int8_t opposte_q = latticeOppositeID[q];

With q as a runtime parameter, latticeOppositeID has to be loaded from memory.
With q as a template parameter, the lookup would be resolved at compile time.
However, because of the type, this is a place where using a compile-time q would not impact performance.
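The difference between the two lookups can be sketched as follows. The table `kOpposite` and its 7-direction ordering are hypothetical and unrelated to the actual contents of latticeOppositeID.

```cpp
#include <array>
#include <cassert>

// Hypothetical ordering: 0 is the center; (1,2), (3,4), (5,6) are opposite pairs.
constexpr std::array<int, 7> kOpposite = {0, 2, 1, 4, 3, 6, 5};

// Runtime index: in general the table entry has to be read at run time.
inline int oppositeRuntime(int q)
{
    assert(q >= 0 && q < 7);
    return kOpposite[q];
}

// Compile-time index: the result is a constant expression,
// so the compiler can fold it and no table read is needed.
template <int q>
constexpr int oppositeStatic()
{
    static_assert(q >= 0 && q < 7, "direction index out of range");
    return kOpposite[q];
}

static_assert(oppositeStatic<3>() == 4, "folded at compile time");
```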

const Neon::int8_3d dir = getDir(q);

//if the neighbor cell has children, then this 'cell' is interfacing with L-1 (fine) along q direction
const auto nghCell = out.helpGetNghIdx(cell, dir);

bGrid has a version of helpGetNghIdx where dir is a template parameter. That version has considerably fewer runtime checks.
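To illustrate the general mechanism only: when the direction is known at compile time, checks on components that are zero can be discarded entirely. The block size `kBlock`, the `Idx` struct, and both functions below are hypothetical. This sketch makes no claim about which checks the templated helpGetNghIdx in bGrid actually removes.

```cpp
#include <cassert>

constexpr int kBlock = 4;  // hypothetical block edge length

struct Idx { int x, y, z; };

// Runtime direction: all three components are range-checked on every call.
inline bool neighborInBlockRuntime(const Idx& c, int dx, int dy, int dz)
{
    return (c.x + dx >= 0 && c.x + dx < kBlock) &&
           (c.y + dy >= 0 && c.y + dy < kBlock) &&
           (c.z + dz >= 0 && c.z + dz < kBlock);
}

// Compile-time direction: a component that is zero cannot leave the block,
// so its check is dropped by `if constexpr`.
template <int dx, int dy, int dz>
bool neighborInBlock(const Idx& c)
{
    bool inside = true;
    if constexpr (dx != 0) { inside = inside && (c.x + dx >= 0 && c.x + dx < kBlock); }
    if constexpr (dy != 0) { inside = inside && (c.y + dy >= 0 && c.y + dy < kBlock); }
    if constexpr (dz != 0) { inside = inside && (c.z + dz >= 0 && c.z + dz < kBlock); }
    return inside;
}
```

Both versions assume the cell itself lies inside the block; under that assumption they return the same answer.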

Comment on lines +276 to +306
auto fdecompose_shear = [&](const int q) -> T {
const T Nxz = Pi[0] - Pi[5];
const T Nyz = Pi[3] - Pi[5];
if (q == 9) {
return (2.0 * Nxz - Nyz) / 6.0;
} else if (q == 18) {
return (2.0 * Nxz - Nyz) / 6.0;
} else if (q == 3) {
return (-Nxz + 2.0 * Nyz) / 6.0;
} else if (q == 6) {
return (-Nxz + 2.0 * Nyz) / 6.0;
} else if (q == 1) {
return (-Nxz - Nyz) / 6.0;
} else if (q == 2) {
return (-Nxz - Nyz) / 6.0;
} else if (q == 12 || q == 24) {
return Pi[1] / 4.0;
} else if (q == 21 || q == 15) {
return -Pi[1] / 4.0;
} else if (q == 10 || q == 20) {
return Pi[2] / 4.0;
} else if (q == 19 || q == 11) {
return -Pi[2] / 4.0;
} else if (q == 8 || q == 4) {
return Pi[4] / 4.0;
} else if (q == 7 || q == 5) {
return -Pi[4] / 4.0;
} else {
return T(0);
}
};

With a compile-time q parameter, this could be transformed into a constexpr function, saving quite a lot of checks.
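One possible shape for such a function, keeping the branch values of the lambda above unchanged. The name `decomposeShear`, passing Pi as a six-element array, and the sample values in `kSamplePi` are assumptions made for this sketch.

```cpp
#include <cassert>

// q is a template parameter, so exactly one branch survives per instantiation.
template <int q, typename T>
constexpr T decomposeShear(const T (&Pi)[6])
{
    [[maybe_unused]] const T Nxz = Pi[0] - Pi[5];
    [[maybe_unused]] const T Nyz = Pi[3] - Pi[5];
    if constexpr (q == 9 || q == 18) {
        return (T(2) * Nxz - Nyz) / T(6);
    } else if constexpr (q == 3 || q == 6) {
        return (-Nxz + T(2) * Nyz) / T(6);
    } else if constexpr (q == 1 || q == 2) {
        return (-Nxz - Nyz) / T(6);
    } else if constexpr (q == 12 || q == 24) {
        return Pi[1] / T(4);
    } else if constexpr (q == 21 || q == 15) {
        return -Pi[1] / T(4);
    } else if constexpr (q == 10 || q == 20) {
        return Pi[2] / T(4);
    } else if constexpr (q == 19 || q == 11) {
        return -Pi[2] / T(4);
    } else if constexpr (q == 8 || q == 4) {
        return Pi[4] / T(4);
    } else if constexpr (q == 7 || q == 5) {
        return -Pi[4] / T(4);
    } else {
        return T(0);
    }
}

// Made-up sample values, used only to exercise the sketch.
constexpr double kSamplePi[6] = {6.0, 4.0, 8.0, 3.0, 2.0, 0.0};
static_assert(decomposeShear<9>(kSamplePi) == 1.5, "evaluated at compile time");
```

Called from inside a compile-time loop over q, each direction would instantiate only its own branch instead of walking the chain of comparisons at run time.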

@@ -22,7 +22,8 @@ inline Neon::set::Container coalescence(Neon::domain::mGrid& g
return [=] NEON_CUDA_HOST_DEVICE(const typename Neon::domain::mGrid::Idx& cell) mutable {
//If this cell has children i.e., it is been refined, than we should not work on it
//because this cell is only there to allow query and not to operate on
-const int refFactor = pout.getRefFactor(level);
+//const int refFactor = pout.getRefFactor(level);
+constexpr T repRefFactor = 0.5;
if (!pin.hasChildren(cell)) {

for (int q = 0; q < Q; ++q) {

  • Consider a template for-loop.


For the sphere, can we use an analytic SDF instead of the obj file?
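For reference, the analytic signed distance function of a sphere is the distance to the center minus the radius, which is negative inside and positive outside. A minimal sketch follows; `Vec3`, `sphereSdf`, and `insideSphere` are hypothetical names and not part of Neon.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Signed distance from point p to a sphere: |p - center| - radius.
inline double sphereSdf(const Vec3& p, const Vec3& center, double radius)
{
    assert(radius > 0.0);
    const double dx = p.x - center.x;
    const double dy = p.y - center.y;
    const double dz = p.z - center.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz) - radius;
}

// A voxel center could be classified as solid when the distance is not positive.
inline bool insideSphere(const Vec3& p, const Vec3& center, double radius)
{
    return sphereSdf(p, center, radius) <= 0.0;
}
```

Evaluating this per voxel would avoid loading and voxelizing a mesh for this particular geometry.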

@Ahdhn Ahdhn merged commit 82e4c75 into develop Apr 10, 2024
12 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Apr 10, 2024
2 participants