Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Short range #1197

Merged
merged 37 commits into from
Oct 23, 2017
Merged

Short range #1197

merged 37 commits into from
Oct 23, 2017

Conversation

fweik
Copy link
Contributor

@fweik fweik commented Jun 12, 2017

Description of changes:

  • Separated kernels from traversal algorithms for the short range loops
  • Store neighbors and verlet lists in cells instead of separate structure
  • verlet lists for all cell systems (needs interface)
  • test for cell systems
  • Function to access pairs from python for testing

Performance has slightly improved:
For a 1/1 wca system with 2362 parts, it improved from 600 ms / 1000 steps to
500 ms / 1000 steps.

Encapsulation is not perfect, but this is only a transient state, will improve further
with cell system abstraction.

PR Checklist

  • Tests?
    • Interface
    • Core
  • Docs?

fweik added 5 commits June 1, 2017 18:20
independent of the cell system. New function short_range_loop
that ties together Espresso globals (skin...) and cell system
specific distance calculation, and runs caller-supplied kernels
on the pairs. Addapted icc, energy, force and pressure calculation to
use these new functions. Tests for link_cell and verler_ia.
Sorry for the monolithic commit.
@fweik fweik requested a review from KaiSzuttor July 25, 2017 09:17
@RudolfWeeber RudolfWeeber added this to the Espresso 4.0 milestone Sep 6, 2017
@hirschsn
Copy link
Contributor

hirschsn commented Oct 9, 2017

Since I plan to use this, two quick questions:

  1. When is this PR going to be merged into the official python branch
  2. What cell system abstractions do you plan? Is this already WIP?

@fweik
Copy link
Contributor Author

fweik commented Oct 9, 2017

This is currently under review and is supposed to be part of the upcoming release. I haven't started any further cell system abstraction, but I have a plan on how to do it. Some parts are obvious, those currently solved by function pointers and the switch statements in cells.cpp can be easily solved by dynamic polymorphism. I think we can stay with the free short-range algorithm as in this pull request, and can than use some form of tag dispatch to select the distance function, I think we can not use virtual functions for that for performance reasons, but I think the 3 cases implemented in this PR should be sufficient, so we can largely keep that generic.
There are still some remaining issues, e.g. some algorithms need complete neighbor search (not only half-sided like the short-range IAs).

@hirschsn
Copy link
Contributor

A comment on the distance functors: At some point in time we talked about not folding the particles during ghost exchange anymore and fulfilling the minimum image convention of particle pairs where a ghost cells is involved by explicitly applying it in this case. With the proposed changes, this is not possible anymore as the distance functors only have the particles as parameters. The only option is to test p1.l.ghost || p2.l.ghost. However, this is a check in every distance calculation.

@fweik
Copy link
Contributor Author

fweik commented Oct 10, 2017

I'm not sure that I follow. If the ghosts are not folded but just have their "real" position, MI can just be applied always.

@hirschsn
Copy link
Contributor

But you can save the MI overhead in the interior as non-ghost particles are folded into their cells in the exchange. To do this you need to discriminate between local-local and local-ghost (cell) pairs in the distance function.

@RudolfWeeber
Copy link
Contributor

Check && in args in algorithm/.hpp
*Cache more than skin in VerletCriterion SQR(coulomb_cutoff +skin), etc.

@RudolfWeeber
Copy link
Contributor

get_pairs() in parallel

@hirschsn
Copy link
Contributor

Let me quickly do a follow up on my last comments:
I tried the workaound querying p2.l.ghost and it does not even work if we make the ghost layer as minimal as possible. :)

My aim was to create as few ghost cells as possible and stop shifting the particles as this is quite complicated for non-regular domain decomposition.

If we do not require ghost particles to be shifted during ghost communication, we need to use get_mi_vector on particle pairs which include ghost particles. For local-ghost cell pairs the workaround with p2.l.ghost works. However, if we do not shift ghost particles, there is no need for ghost cells on subdomain boundaries where a process neighbors itself (as the process has these cells already as local cells). If we simply go ahead and add local cells as ghost neighbors into m_neighbors, this workaround won't help because the ghost neighbors are actually local cells and therefore the particles do not have the ghost flag set, but still get_mi_vector needs to be used.

The only option to support this is to have a distance functor also taking the cells/neighbors and then marking them as "needs get_mi_vector".

Any comments or suggestions on this problem?

@fweik
Copy link
Contributor Author

fweik commented Oct 13, 2017

We should check if something else is affected if cells are not there own neighbors anymore, this would save another irregular branch in the hot path.

@RudolfWeeber
Copy link
Contributor

RudolfWeeber commented Oct 13, 2017

@KaiSzuttor and I reviewed the code as compared to the common ancestor.

Generally, this pr brings much more clarity and a ton of (nearly) duplicate code is removed.

The following issues arose:

  • Can you please put docs in the files in src/core/algorithm as they are of rather general use
  • There is inconsistency in the argument types for the various kernels being passed to the algorithms. Some are passed by value, some by ref, some by rvalue ref.
  • In the cell class, could you please put a comment above the GetReference struct to explain its purpose.
  • In the verlet criterion, the other globals containing the cutoffs (electrostatics, dipolar,...) should be passed to the constructor and the expressions derived from them cashed. E.g., (coolumb_cutoff +skin)^2.
  • In the layerd toplogy initialization:
    local_cells.cell[c] = &cells[c + 1];
    cells[c + 1].m_neighbors.push_back(std::ref(cells[c + 2]));
    are the +1 and +2 intentioal? If so, why ?
  • For the local stress tensor, could you please move out the single partilce and pair kernels into separate functions. We got lost in the long lambdas.
  • statistics.cpp has domain decomposition included twice
  • Can you please insert a comment for the layerd cell system distance funciton? Why is that extra subtraciton in z needed?

@fweik
Copy link
Contributor Author

fweik commented Oct 14, 2017

*Cache more than skin in VerletCriterion SQR(coulomb_cutoff +skin), etc.

Done. The effective pair max_cut could also be precalculated, but this has to
be done elsewhere.

There is inconsistency in the argument types for the various kernels being passed to the algorithms. Some are passed by value, some by ref, some by rvalue ref.

There kernels nor all are forwarding refs, the Ranges are passed by value because they
are supposed to be cheap to copy (they are just 2 or 4 pointers).

Check && in args in algorithm/.hpp

Done.

get_pairs() in parallel

Done.

In the layerd toplogy initialization:
local_cells.cell[c] = &cells[c + 1];
cells[c + 1].m_neighbors.push_back(std::ref(cells[c + 2]));
are the +1 and +2 intentioal? If so, why ?

This is not mine, it was like this in the original implementation. It's shifted because cells[0] is the lower ghost cell. So yes, this is intentional and correct.

Can you please insert a comment for the layerd cell system distance funciton? Why is that extra subtraciton in z needed?

Also from the original. Beats me, I'm not sure that this is actually correct, that seems to imply that layered has never pbc in the 2-direction. (Original is in layered.cpp:88). This is because the ghosts
in z direction are shifted (but not those in x and y dir).

We should check if something else is affected if cells are not there own neighbors anymore, this would save another irregular branch in the hot path.

Cells were only their own neighbors for dd, I dropped that which seems not to break anything and removed the check in the pair algos.

@codecov
Copy link

codecov bot commented Oct 16, 2017

Codecov Report

Merging #1197 into python will increase coverage by 0.49%.
The diff coverage is 73.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           python    #1197      +/-   ##
==========================================
+ Coverage   43.19%   43.69%   +0.49%     
==========================================
  Files         371      377       +6     
  Lines       29593    29481     -112     
  Branches     5613     5515      -98     
==========================================
+ Hits        12784    12881      +97     
+ Misses      16785    16576     -209     
  Partials       24       24
Impacted Files Coverage Δ
src/core/unit_tests/mock/Particle.hpp 100% <ø> (ø) ⬆️
src/core/grid.cpp 54.8% <ø> (ø) ⬆️
src/core/virtual_sites_com.cpp 2.43% <ø> (ø) ⬆️
src/core/grid.hpp 73.91% <ø> (ø) ⬆️
src/core/utils.hpp 33.19% <ø> (ø) ⬆️
src/core/unit_tests/ParticleIterator_test.cpp 100% <ø> (ø) ⬆️
src/core/unit_tests/mock/Cell.hpp 100% <ø> (ø) ⬆️
src/core/lbgpu.cpp 66.36% <ø> (ø) ⬆️
src/core/rotation.cpp 84.46% <ø> (-0.19%) ⬇️
src/core/integrate.cpp 41.89% <ø> (+0.48%) ⬆️
... and 68 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb42bfe...30d29ea. Read the comment docs.

@fweik
Copy link
Contributor Author

fweik commented Oct 16, 2017

This makes the code considerably slower in debug mode, it's getting close to the 10 minute limit on travis (e.g. langevin_thermostat).

@fweik
Copy link
Contributor Author

fweik commented Oct 18, 2017

@RudolfWeeber ready to be reviewed again. Docs could be more, but it is not clear to me which part of the algorithms would not be selfexplainatory. If you want a more detailed description please let me know what needs more explaining and I will extend the docs. Also this now contains a bugfix for the layered cell system, which fixes an issue that showed up with the new tests.

@RudolfWeeber
Copy link
Contributor

Can be merged.

@fweik fweik merged commit 24e7da6 into espressomd:python Oct 23, 2017
@fweik fweik deleted the short_range branch October 23, 2017 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants