Return correct shapes from Correlator #3848

jngrad · 2020-08-03T19:17:30Z

Description of changes:

- reshape Correlator output arrays based on the dimensions of the input observables
- return only the correlated values from the Correlator observable
- get time lags and sample sizes from dedicated class methods
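As a rough illustration of the first point, the following NumPy sketch reshapes a flat buffer of correlated values using the shape of the input observable. It is not the ESPResSo implementation; the names `n_tau`, `obs_shape` and `flat_result` and their values are assumptions made for the example.

```python
import numpy as np

n_tau = 33          # number of lag times (hypothetical value)
obs_shape = (2, 3)  # e.g. an observable tracking 2 particles with 3 coordinates

# Flat buffer as a core routine might produce it (hypothetical content).
flat_result = np.arange(n_tau * int(np.prod(obs_shape)), dtype=float)

# Prepend the lag-time axis to the observable shape.
result = flat_result.reshape((n_tau, *obs_shape))
print(result.shape)  # (33, 2, 3)
```

With this layout, `result[k]` holds the correlated values at the k-th lag time in the same shape as the observable.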

jngrad · 2020-08-03T21:17:34Z

@KaiSzuttor while working on this PR, I realized the output of the `Correlator` class could be ambiguous. From the user's perspective, correlation happens on observables that have shapes, but internally we correlate flattened arrays, which can lead to different results. For example, if we correlate the positions of N particles with `square_distance_componentwise`, we get the shape [X, N, 3], whereas with `scalar_product` we get [X, 1] (instead of [X, N, 1]). Below is an MWE on a modified version of this PR, where the two exceptions `scalar_product requires vectors, but observable ... is a matrix` in `src/core/accumulators/Correlator.cpp` were removed:

import espressomd
from espressomd.accumulators import Correlator
from espressomd.observables import ParticlePositions
import numpy as np

system = espressomd.System(box_l=[1.0, 1.0, 1.0])
system.time_step = 0.05
system.cell_system.skin = 0.4
system.part.add(pos=[(0, 0, 0), (0.5, .5, 0)])
system.thermostat.set_langevin(kT=1.37, gamma=2.4, seed=42)

pos_obs = ParticlePositions(ids=(0,1))

c_pos1 = Correlator(obs1=pos_obs, tau_lin=16, tau_max=20., delta_N=10,
                   corr_operation="square_distance_componentwise",
                   compress1="discard1")
c_pos2 = Correlator(obs1=pos_obs, tau_lin=16, tau_max=20., delta_N=10,
                   corr_operation="scalar_product", compress1="discard1")
system.auto_update_accumulators.add(c_pos1)
system.auto_update_accumulators.add(c_pos2)

system.integrator.run(10000)

c_pos1.finalize()
c_pos2.finalize()
print(c_pos1.result().shape)  # output: (33, 2, 3)
print(c_pos2.result().shape)  # output: (33, 1)

For `tensor_product` and `scalar_product`, I've added extra checks to prevent passing two matrices as input, since we don't actually compute Kronecker products or per-axis scalar products, respectively. Do you think this is the correct approach? Do we have a use for implementing the Kronecker product and the per-axis scalar product (which is a simpler form of `fcs_acf`)?
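To make the distinction concrete, here is a small NumPy sketch that compares operations on flattened arrays with their shape-aware counterparts. It only uses NumPy and does not call the Correlator; treating `tensor_product` as an outer product of the flattened inputs is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random((2, 3))  # observable at time t: 2 particles, 3 coordinates
b = rng.random((2, 3))  # same observable at time t + tau

# Scalar product of the flattened arrays: a single number.
flat_dot = np.dot(a.ravel(), b.ravel())

# Per-axis (per-particle) scalar product: one number per particle.
per_particle_dot = np.einsum("ij,ij->i", a, b)

# Outer product of the flattened arrays versus a true Kronecker product.
flat_outer = np.outer(a.ravel(), b.ravel())
kron = np.kron(a, b)

print(np.shape(flat_dot), per_particle_dot.shape)  # () (2,)
print(flat_outer.shape, kron.shape)                # (6, 6) (4, 9)
```

The flattened scalar product is the sum of the per-particle scalar products, so the per-particle information cannot be recovered from it.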

KaiSzuttor · 2020-08-04T08:16:09Z

so you mean the shape is a function not only of the observable type but also of the operation? Isn't that expected?

jngrad · 2020-08-04T08:37:17Z

sure, but the scalar_product currently ignores the observable shape:

espresso/src/core/accumulators/Correlator.cpp, lines 234 to 235 in f22be4c:

    } else if (corr_operation_name == "scalar_product") {
      m_dim_corr = 1;

this is surprising given that we already have the code logic to take the observable shape into account

KaiSzuttor · 2020-08-04T08:41:29Z

So, the correlator has a bug which needs a fix before we can go on with returning the correct shape, right?

jngrad · 2020-08-04T08:55:19Z

I just tried to implement a `scalar_product_last_axis` that would allow us to take e.g. a `ParticlePositions(ids=(0,1))` and return the scalar product of the individual particles, i.e. a correlation shape of [N_tau, 2, 1], but it can't be done with our current infrastructure. The correlation functions take two flat arrays as input, so we cannot know whether we have two particles with 3 coordinates or 3 particles with 2 coordinates. The `fcs_acf` function "trusts" that the last dimension has size 3. It also checks that the array dimension is a multiple of 3, but that check only prevents out-of-bounds accesses; it doesn't guarantee the user chose the correct observable.
We also cannot provide information about the flat array dimensions as additional function arguments, because we would have to provide the same arguments to all other functions (they need to have the same signature because we store a function pointer).
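The ambiguity can be reproduced with a short NumPy sketch. The helper `scalar_product_last_axis` below is hypothetical and takes the shape as an explicit argument, which is exactly the piece of information the flat-array interface does not carry.

```python
import numpy as np

flat = np.arange(6.0)  # what a correlation function receives: no shape attached

def scalar_product_last_axis(a, b, shape):
    """Hypothetical helper: row-wise scalar product after restoring `shape`."""
    return np.einsum("ij,ij->i", a.reshape(shape), b.reshape(shape))

as_2x3 = scalar_product_last_axis(flat, flat, (2, 3))  # 2 particles, 3 coordinates
as_3x2 = scalar_product_last_axis(flat, flat, (3, 2))  # 3 particles, 2 coordinates
print(as_2x3.tolist())  # [5.0, 50.0]
print(as_3x2.tolist())  # [1.0, 13.0, 41.0]
```

Both interpretations are valid for a flat array of length 6, yet they give different results, so the shape has to come from somewhere other than the data.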

This gives us 2 options: throw an error when doing a scalar product of matrices (that's currently implemented in acfa0b4) which is an API breaking change, or not throw an error and trust the user can figure out why the correlation shape is 1. Same thing for tensor product.

KaiSzuttor · 2020-08-04T10:05:15Z

so our correlation function behaves like numpy correlate and assumes 1d data... I think there is no generic solution to correlating multidimensional data because the algorithms are not implemented in a generic fashion.
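For comparison, this is roughly what the 1D assumption looks like with `numpy.correlate`, which is documented for 1-D sequences. The sketch uses made-up data and only shows the two workarounds available for a 2D signal: flattening it, or correlating one component at a time.

```python
import numpy as np

data = np.arange(6.0).reshape(3, 2)  # 3 time samples of a 2-component signal

# Flattening mixes the two components into one 1D sequence.
flat_corr = np.correlate(data.ravel(), data.ravel(), mode="full")

# Correlating one component at a time keeps them separate.
per_component = np.stack(
    [np.correlate(data[:, k], data[:, k], mode="full")
     for k in range(data.shape[1])],
    axis=1,
)
print(flat_corr.shape)      # (11,)
print(per_component.shape)  # (5, 2)
```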

jngrad · 2020-08-04T11:03:24Z

What should we do then? Right now I throw an error if the user accidentally passes a matrix instead of a vector to the scalar product, because it was not originally designed to handle this case. There was no mechanism to prevent it because it didn't really make sense back then to calculate the scalar product of a ParticlePositions(ids=(0,1)), as the user guide made it clear these positions were flattened. But now, the user guide shows these positions are stored in 2D matrices (but only at the script interface level!), which can give the impression to users that they should be able to "scalar product" these matrices.

I cannot think of an application where one would want to do a scalar product on flattened matrices, but maybe there is one, in which case this would no longer be possible because we do not offer the possibility to reshape an observable before passing it to the correlator.

We already have an `m_correlation_args` designed to help with this kind of situation; unfortunately, it's constrained to a `Vector3d`:

- using a `boost::variant` instead of `Vector3d` (aa0b4f8) won't work because the script interface would need a visitor pattern
- storing the dimensionality of the data in e.g. the first index of `m_correlation_args` would work, although it would then be visible to the user and mutable (which is not good design)
- replacing the function pointer by a functor means the correlation functions have a state that has to be serialized
- adding extra parameters to all correlation functions doesn't scale; the core of the problem lies in the tight coupling between the `Correlator` class and its operators; right now they are implemented as free functions to be completely decoupled, which prevents us from generalizing the infrastructure
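The trade-off between a free function and a functor can be sketched in Python as follows. This is only an analogy to the C++ design discussed above: the class `ScalarProductLastAxis` and the dictionary-based "checkpoint" are hypothetical and do not exist in ESPResSo.

```python
import numpy as np

def scalar_product(a, b):
    """Stateless operation on flat arrays, in the spirit of a free function."""
    return np.array([np.dot(a, b)])

class ScalarProductLastAxis:
    """Hypothetical functor that remembers the observable shape."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def __call__(self, a, b):
        a = np.reshape(a, self.shape)
        b = np.reshape(b, self.shape)
        return np.einsum("ij,ij->i", a, b)

flat = np.arange(6.0)
op = ScalarProductLastAxis((2, 3))

# The extra state has to survive checkpointing; a dict stands in for it here.
state = {"shape": op.shape}
restored = ScalarProductLastAxis(**state)

print(scalar_product(flat, flat).tolist())  # [55.0]
print(restored(flat, flat).tolist())        # [5.0, 50.0]
```

The functor keeps a uniform call signature, but its `shape` attribute becomes part of what must be saved and restored, which is the serialization cost mentioned in the list.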

KaiSzuttor · 2020-08-05T09:42:48Z

i'm not so sure if this is a cat 1 issue to be resolved next... let's wait for @RudolfWeeber

jngrad · 2020-08-07T09:50:02Z

> i'm not so sure if this is a cat 1 issue to be resolved next...

Agreed, refactoring the Correlator framework is out of scope for this PR. I've removed the API-breaking matrix assertions in 56764cb; we can always get back to that idea in the future. Let's move forward with this PR.

src/python/espressomd/accumulators.py

KaiSzuttor · 2020-08-10T08:01:56Z

doc/tutorials/04-lattice_boltzmann/04-lattice_boltzmann_part2.ipynb

@@ -138,7 +139,9 @@
    "    for i in range(LOOPS):\n",
    "        system.integrator.run(STEPS)\n",
    "    correlator.finalize()\n",
-    "    msd_results.append(correlator.result())\n",
+    "    msd_results.append(np.column_stack((correlator.lag_times(),\n",


https://numpy.org/doc/stable/reference/generated/numpy.c_.html

KaiSzuttor · 2020-08-10T08:02:38Z

doc/tutorials/06-active_matter/06-active_matter.ipynb

@@ -385,7 +385,10 @@
    "# Finalize the correlator and write to disk\n",
    "system.auto_update_accumulators.remove(msd)\n",
    "msd.finalize()\n",
-    "numpy.savetxt(\"output.dat\", msd.result())\n",
+    "numpy.savetxt(\"output.dat\",\n",
+    "              numpy.column_stack((msd.lag_times(),\n",


https://numpy.org/doc/stable/reference/generated/numpy.c_.html
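The two review comments above point to `numpy.c_` as an alternative to `numpy.column_stack`. A minimal sketch with made-up arrays (the names `lag_times` and `values` are placeholders, not Correlator output) shows that both produce the same table:

```python
import numpy as np

lag_times = np.array([0.0, 0.5, 1.0])                     # hypothetical lag times
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # hypothetical correlation values

stacked = np.column_stack((lag_times, values))
shorthand = np.c_[lag_times, values]

print(stacked.shape)                       # (3, 3)
print(np.array_equal(stacked, shorthand))  # True
```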

KaiSzuttor

LGTM

src/core/accumulators/MeanVarianceCalculator.hpp

src/core/accumulators/Correlator.hpp

jngrad added 2 commits August 3, 2020 15:09

accumulators: Use size_t to avoid casts

93fe7d8

accumulators: Provide shape() method

45e5a4b

jngrad force-pushed the fix-3577 branch 2 times, most recently from 3760cda to a1c5795, August 3, 2020 21:01

accumulators: Calculate the correlated shape

acfa0b4

Use the shape(s) of the correlated observable(s) to deduce the shape of the correlation output. Extract time lags and sample size from the correlation matrix and provide methods `correlation_lags()` and `correlation_sizes()` instead.

jngrad force-pushed the fix-3577 branch from a1c5795 to acfa0b4, August 3, 2020 21:03

testsuite: Clear accumulators after each test

40806cd

Used to throw cryptic runtime_errors when one of the tests used an observable tracking two particles (e.g. `No data can be added after finalize() was called.` or `Particle node for id 1 not found!`).

jngrad added 2 commits August 7, 2020 11:10

accumulators: Remove matrix assertions

56764cb

docs: Document Correlator methods

f374377

jngrad marked this pull request as ready for review August 7, 2020 09:50

jngrad added this to the Espresso 4.2 milestone Aug 7, 2020

jngrad added the ApiChange, Core and Improvement labels Aug 7, 2020

KaiSzuttor reviewed Aug 7, 2020

src/python/espressomd/accumulators.py (outdated, resolved)

accumulators: Rename Correlator methods

baede48

KaiSzuttor reviewed Aug 10, 2020

KaiSzuttor previously approved these changes Aug 10, 2020

src/core/accumulators/MeanVarianceCalculator.hpp (resolved)

src/core/accumulators/Correlator.hpp (outdated, resolved)

accumulators: Rename Correlator member

0cb372c

jngrad added 2 commits August 10, 2020 16:52

accumulators: Replace Correlator member by method

59ec974

testsuite: Test Correlator checkpointing

7f1e6b6

jngrad dismissed KaiSzuttor’s stale review via 7f1e6b6 August 10, 2020 15:12

jngrad added the Testcase label Aug 10, 2020

KaiSzuttor approved these changes Aug 11, 2020

KaiSzuttor added the automerge Merge with kodiak label Aug 11, 2020

Merge branch 'python' into fix-3577

09b6f2c

kodiakhq bot merged commit 403df2d into espressomd:python Aug 11, 2020

jngrad deleted the fix-3577 branch January 18, 2022 12:13
