Optimized multi resolution LBM #44
This reverts commit ead3a64.
From Windows
From Linux (cuda:12.2.0.ubuntu.22.04)
Looks great!
I added only one comment, on the use of a compile-time for-loop for LBM.
This is an example of how it can be used:
Neon/benchmarks/lbm/src/DeviceD3QXX.h
Line 30 in 1d48965
Neon::ConstexprFor<0, Lattice::Q, 1>([&](auto q) {
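For context, here is a minimal sketch of how a compile-time for-loop of this kind can work, assuming C++17. The names constexprFor and ToyLattice are hypothetical and made up for illustration; this is not Neon's ConstexprFor implementation, only one possible way to write such a helper.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper (not Neon's implementation): calls f once for each
// I = Start, Start + Step, ... while I < End, passing I as a
// std::integral_constant so the callee sees it as a compile-time value.
template <int Start, int End, int Step = 1, typename F>
constexpr void constexprFor(F&& f)
{
    if constexpr (Start < End) {
        f(std::integral_constant<int, Start>{});
        constexprFor<Start + Step, End, Step>(f);
    }
}

// Toy stand-in for a lattice type; only Q is used here.
struct ToyLattice
{
    static constexpr int Q = 5;
};

// Sums 0 + 1 + ... + (Q - 1). Inside the lambda, q is known at compile time.
constexpr int sumOfDirections()
{
    int sum = 0;
    constexprFor<0, ToyLattice::Q, 1>([&](auto q) {
        constexpr int qi = decltype(q)::value;
        static_assert(qi >= 0 && qi < ToyLattice::Q, "q is a constant expression");
        sum += qi;
    });
    return sum;
}

static_assert(sumOfDirections() == 10, "0 + 1 + 2 + 3 + 4");
```

Because each iteration is a separate instantiation, the loop is fully unrolled and q can be used as a template argument or in a constexpr if.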
if (offset.x == 0 && offset.y == 0 && offset.z == 0) {
return idx;
}
A zero offset should not be a common case that needs special treatment. The rest of the method should already cover it.
@@ -20,34 +20,30 @@ inline Neon::set::Container explosion(Neon::domain::mGrid& grid,
  return [=] NEON_CUDA_HOST_DEVICE(const typename Neon::domain::mGrid::Idx& cell) mutable {
  //If this cell has children i.e., it is been refined, then we should not work on it
  //because this cell is only there to allow query and not to operate on
- if (!pin.hasChildren(cell)) {
+ if (!pin.hasChildren(cell) && pin.hasParent(cell)) {
  for (int8_t q = 0; q < Q; ++q) {
This could be automatically unrolled via templates.
if (dir.x == 0 && dir.y == 0 && dir.z == 0) {
continue;
}
The check can be removed by looping only over the directions that are not the center.
For example, STLBM leverages the structure of the stencil to do that. More simply, if the previous loop is based on a template for, we can turn the if statement into a constexpr if.
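A minimal sketch of the constexpr if idea, assuming C++17. ToyStencil, pushDirection, pushAll and countPushedDirections are hypothetical names, and the 5-direction stencil is a toy, not the lattice used in this PR.

```cpp
#include <cassert>
#include <utility>

// Hypothetical toy stencil with 5 directions; direction 0 is the center.
struct ToyStencil
{
    static constexpr int Q = 5;
    static constexpr int dx[Q] = {0, 1, -1, 0, 0};
    static constexpr int dy[Q] = {0, 0, 0, 1, -1};
};

// Handles one direction. q is a template parameter, so the center test is
// resolved at compile time and no runtime branch is emitted for it.
template <int q>
constexpr int pushDirection()
{
    if constexpr (ToyStencil::dx[q] == 0 && ToyStencil::dy[q] == 0) {
        return 0;  // center: nothing to stream
    } else {
        return 1;  // stand-in for the real per-direction work
    }
}

// Expands to pushDirection<0>() + ... + pushDirection<Q - 1>() + 0.
template <int... qs>
constexpr int pushAll(std::integer_sequence<int, qs...>)
{
    return (pushDirection<qs>() + ... + 0);
}

// Number of directions that did real work.
constexpr int countPushedDirections()
{
    return pushAll(std::make_integer_sequence<int, ToyStencil::Q>{});
}

static_assert(countPushedDirections() == 4, "all directions except the center");
```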
//if the neighbor cell has children, then this 'cell' is interfacing with L-1 (fine) along q direction
//we want to only work on cells that interface with L+1 (coarse) cell along q
if (!pin.hasChildren(cell, dir)) {
We could add a version of hasChildren where dir is a template parameter.
//since we are on the finest level, we only need to do streaming and explosion (no coalescence)
//streaming is done as push

const Neon::int8_3d dir = getDir(q);
This could be passed as a template parameter.
if (nghType == CellType::bulk) {
out(nghCell, q) = cellVal;
} else {
const int8_t opposte_q = latticeOppositeID[q];
With q being a runtime parameter, the latticeOppositeID lookup is a load from memory.
With a template parameter, q would be resolved at compile time.
However, because of the type, this is one place where using a compile-time q would not impact performance.
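To illustrate the difference between the two lookups, here is a small sketch. The table toyOpposite and both functions are hypothetical and use a 5-direction toy stencil, not the lattice or the latticeOppositeID table from this PR.

```cpp
#include <cassert>

// Hypothetical toy table: toyOpposite[q] is the direction pointing the other way.
constexpr int toyOpposite[5] = {0, 2, 1, 4, 3};

// Runtime q: the table is indexed at run time, which in general is a memory load.
inline int oppositeRuntime(int q)
{
    assert(q >= 0 && q < 5);
    return toyOpposite[q];
}

// Compile-time q: the result is a constant that the compiler can fold into the code.
template <int q>
constexpr int oppositeCompileTime()
{
    static_assert(q >= 0 && q < 5, "q out of range");
    return toyOpposite[q];
}

static_assert(oppositeCompileTime<3>() == 4, "direction 3 is opposite to direction 4");
```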
const Neon::int8_3d dir = getDir(q);

//if the neighbor cell has children, then this 'cell' is interfacing with L-1 (fine) along q direction
const auto nghCell = out.helpGetNghIdx(cell, dir);
bGrid has a version of helpGetNghIdx where dir is a template parameter. That version has considerably fewer runtime checks.
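As a rough illustration of why a compile-time direction can save runtime checks, consider the sketch below. ToyGrid, neighbourRuntime and neighbourCompileTime are hypothetical names on a dense 2D toy grid; this is not how bGrid or helpGetNghIdx is implemented. It assumes the starting cell (x, y) is inside the grid.

```cpp
#include <cassert>

// Hypothetical dense 2D grid of nx * ny cells, indexed as y * nx + x.
struct ToyGrid
{
    int nx;
    int ny;
};

// Runtime direction: both sides of both axes are tested on every call.
inline int neighbourRuntime(const ToyGrid& g, int x, int y, int dx, int dy)
{
    assert(x >= 0 && x < g.nx && y >= 0 && y < g.ny);
    const int nghX = x + dx;
    const int nghY = y + dy;
    if (nghX < 0 || nghX >= g.nx) return -1;
    if (nghY < 0 || nghY >= g.ny) return -1;
    return nghY * g.nx + nghX;
}

// Compile-time direction: tests for axes with a zero offset are discarded,
// and only the boundary that the offset can actually cross is tested.
template <int dx, int dy>
int neighbourCompileTime(const ToyGrid& g, int x, int y)
{
    assert(x >= 0 && x < g.nx && y >= 0 && y < g.ny);
    if constexpr (dx > 0) { if (x + dx >= g.nx) return -1; }
    if constexpr (dx < 0) { if (x + dx < 0) return -1; }
    if constexpr (dy > 0) { if (y + dy >= g.ny) return -1; }
    if constexpr (dy < 0) { if (y + dy < 0) return -1; }
    return (y + dy) * g.nx + (x + dx);
}
```

For an axis-aligned direction such as <1, 0>, the instantiated function contains a single comparison instead of four.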
auto fdecompose_shear = [&](const int q) -> T {
    const T Nxz = Pi[0] - Pi[5];
    const T Nyz = Pi[3] - Pi[5];
    if (q == 9) {
        return (2.0 * Nxz - Nyz) / 6.0;
    } else if (q == 18) {
        return (2.0 * Nxz - Nyz) / 6.0;
    } else if (q == 3) {
        return (-Nxz + 2.0 * Nyz) / 6.0;
    } else if (q == 6) {
        return (-Nxz + 2.0 * Nyz) / 6.0;
    } else if (q == 1) {
        return (-Nxz - Nyz) / 6.0;
    } else if (q == 2) {
        return (-Nxz - Nyz) / 6.0;
    } else if (q == 12 || q == 24) {
        return Pi[1] / 4.0;
    } else if (q == 21 || q == 15) {
        return -Pi[1] / 4.0;
    } else if (q == 10 || q == 20) {
        return Pi[2] / 4.0;
    } else if (q == 19 || q == 11) {
        return -Pi[2] / 4.0;
    } else if (q == 8 || q == 4) {
        return Pi[4] / 4.0;
    } else if (q == 7 || q == 5) {
        return -Pi[4] / 4.0;
    } else {
        return T(0);
    }
};
With a compile-time q parameter, this could be transformed into a constexpr function, saving quite a lot of checks.
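A possible shape for such a function is sketched below, assuming C++17 and that Pi holds six components as in the lambda above. The name decomposeShear, the use of std::array and the samplePi values are assumptions for illustration; the branch conditions and formulas are copied from the quoted lambda.

```cpp
#include <array>
#include <cassert>

// Hypothetical constexpr version: q is a template parameter, so exactly one
// branch survives in each instantiation and no comparison on q runs at run time.
template <int q, typename T>
constexpr T decomposeShear(const std::array<T, 6>& Pi)
{
    [[maybe_unused]] const T Nxz = Pi[0] - Pi[5];
    [[maybe_unused]] const T Nyz = Pi[3] - Pi[5];
    if constexpr (q == 9 || q == 18) {
        return (T(2) * Nxz - Nyz) / T(6);
    } else if constexpr (q == 3 || q == 6) {
        return (-Nxz + T(2) * Nyz) / T(6);
    } else if constexpr (q == 1 || q == 2) {
        return (-Nxz - Nyz) / T(6);
    } else if constexpr (q == 12 || q == 24) {
        return Pi[1] / T(4);
    } else if constexpr (q == 21 || q == 15) {
        return -Pi[1] / T(4);
    } else if constexpr (q == 10 || q == 20) {
        return Pi[2] / T(4);
    } else if constexpr (q == 19 || q == 11) {
        return -Pi[2] / T(4);
    } else if constexpr (q == 8 || q == 4) {
        return Pi[4] / T(4);
    } else if constexpr (q == 7 || q == 5) {
        return -Pi[4] / T(4);
    } else {
        return T(0);
    }
}

// Made-up sample values, chosen so that every result is exact in floating point.
constexpr std::array<double, 6> samplePi{6.0, 4.0, 8.0, 3.0, 12.0, 0.0};

static_assert(decomposeShear<9>(samplePi) == 1.5, "(2 * 6 - 3) / 6");
static_assert(decomposeShear<0>(samplePi) == 0.0, "directions not listed contribute zero");
```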
@@ -22,7 +22,8 @@ inline Neon::set::Container coalescence(Neon::domain::mGrid& g
  return [=] NEON_CUDA_HOST_DEVICE(const typename Neon::domain::mGrid::Idx& cell) mutable {
  //If this cell has children i.e., it is been refined, than we should not work on it
  //because this cell is only there to allow query and not to operate on
- const int refFactor = pout.getRefFactor(level);
+ //const int refFactor = pout.getRefFactor(level);
+ constexpr T repRefFactor = 0.5;
  if (!pin.hasChildren(cell)) {

  for (int q = 0; q < Q; ++q) {
- Consider a template for-loop.
apps/lbmMultiRes/sphere3.obj
Outdated
For the sphere, can we use an analytic SDF instead of the obj file?
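For reference, the signed distance function of a sphere is just the distance to the center minus the radius. A minimal sketch follows; sphereSdf and isInsideSphere are hypothetical names and are not part of the app in this PR.

```cpp
#include <cassert>
#include <cmath>

// Signed distance from the point (x, y, z) to a sphere with center (cx, cy, cz):
// negative inside, zero on the surface, positive outside.
inline double sphereSdf(double x, double y, double z,
                        double cx, double cy, double cz,
                        double radius)
{
    assert(radius > 0.0);
    const double dx = x - cx;
    const double dy = y - cy;
    const double dz = z - cz;
    return std::sqrt(dx * dx + dy * dy + dz * dz) - radius;
}

// A point counts as part of the sphere when it lies inside or on the surface.
inline bool isInsideSphere(double x, double y, double z,
                           double cx, double cy, double cz,
                           double radius)
{
    return sphereSdf(x, y, z, cx, cy, cz, radius) <= 0.0;
}
```

Evaluating this at each cell center would avoid shipping and parsing a mesh for a shape that has a closed form.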
mGird
mGrid