Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I'd like to help speed up computation #1

Open
dmiraenko opened this issue Nov 30, 2023 · 23 comments
Open

I'd like to help speed up computation #1

dmiraenko opened this issue Nov 30, 2023 · 23 comments

Comments

@dmiraenko
Copy link

dmiraenko commented Nov 30, 2023

Hi!

I really liked your video on information speed in a medium and, as it's something I've been interested in for a while, I'd like to help with improving your pulse-medium simulation. I'm a quantum chemistry PhD student, and pretty good at writing parallel HPC programs in Fortran. (Can't do GPU parallelization yet, though, sorry)

If you pointed me to some resources with theoretical considerations, I could start working on it in my free time. The most helpful resource would be your code for pulse-medium interaction, but I'm having trouble finding it, or recognizing it when I found it. Is it simplified_wall_model/main.py? If not, could you please link to it?

Thanks for reigniting this question in me and hoping for your reply,
Dmitrii

@stopdesign
Copy link

We can parallelize the *_electric_field calculation in Python and get 10x performance. It doesn't look difficult.

On the other hand, time complexity of the algorithm is O(N^4) or something like that, so more optimization is needed to increase the number of particles.

What performance would be enough to make a difference?

@mithuna-y
Copy link
Owner

mithuna-y commented Dec 1, 2023 via email

@dmiraenko
Copy link
Author

I'm confused. Is the pulse-electron simulation actually in simplified_wall_model/main.py? Maybe my first post got lost in an email?

I can try to figure out GPU parallelization in Fortran this week, it's time I've learned it anyway, but I can't start without understanding what I'm trying to achieve.

Please tell if you're interested in this, Mithuna.

@stopdesign
Copy link

Here's my progress so far.

By the way, I'm working with the multiple_layers/main.py script. Don't know if it is the right one.

Profiling with cProfile showed that the electron_field_contribution calculation is indeed the main contributor to the execution time.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  5575838    7.080    0.000    7.080    0.000 main.py:44(electron_field_contribution)
     7189    2.843    0.000    9.923    0.001 main.py:138(electrons_electric_field)
     7189    0.011    0.000    0.011    0.000 main.py:34(gaussian)
        1    0.007    0.007    9.944    9.944 main.py:200(main)
     7189    0.002    0.000    0.013    0.000 main.py:29(original_electric_field)
    21417    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 main.py:113(<listcomp>)

I tried running electron_field_contribution in parallel on the CPU. Unfortunately, the actual performance gain was negligible. I suspect that there is too much overhead for running the method in another process. In any case, the algorithm has such time complexity that increasing performance by 10x will not improve the situation much.

So I implemented the electron_field_contribution method in Rust and imported it as a python module. It was surprisingly easy and gave a nice productivity boost.

Here's a performance comparison for the constants that were originally in the script:

$ python multiple_layers/main_original.py
DONE in 293.89 sec
$ python multiple_layers/main_optimized.py
step number:     0, step_time: 0.01 sec, total_time: 0.08 sec
step number:    10, step_time: 0.22 sec, total_time: 1.88 sec
step number:    20, step_time: 0.41 sec, total_time: 5.65 sec
step number:    30, step_time: 0.59 sec, total_time: 11.27 sec
step number:    40, step_time: 0.77 sec, total_time: 18.69 sec
step number:    50, step_time: 0.96 sec, total_time: 27.92 sec
step number:    60, step_time: 1.15 sec, total_time: 38.99 sec
step number:    70, step_time: 1.33 sec, total_time: 51.93 sec
step number:    80, step_time: 1.49 sec, total_time: 66.61 sec
step number:    90, step_time: 1.68 sec, total_time: 83.09 sec
DONE in 99.60 sec
$ python multiple_layers/main_with_rust.py
step number:     0, step_time: 0.02 sec, total_time: 0.08 sec
step number:    10, step_time: 0.03 sec, total_time: 0.73 sec
step number:    20, step_time: 0.04 sec, total_time: 1.44 sec
step number:    30, step_time: 0.05 sec, total_time: 2.23 sec
step number:    40, step_time: 0.06 sec, total_time: 3.13 sec
step number:    50, step_time: 0.07 sec, total_time: 4.14 sec
step number:    60, step_time: 0.08 sec, total_time: 5.23 sec
step number:    70, step_time: 0.09 sec, total_time: 6.41 sec
step number:    80, step_time: 0.10 sec, total_time: 7.71 sec
step number:    90, step_time: 0.11 sec, total_time: 9.10 sec
DONE in 10.45 sec

Now it is possible to set number_of_evaluation_points = 1000:

$ python multiple_layers/main_with_rust.py
step number:     0, step_time: 0.13 sec, total_time: 0.20 sec
step number:    10, step_time: 0.23 sec, total_time: 2.62 sec
step number:    20, step_time: 0.31 sec, total_time: 5.88 sec
step number:    30, step_time: 0.40 sec, total_time: 9.98 sec
step number:    40, step_time: 0.48 sec, total_time: 14.96 sec
step number:    50, step_time: 0.57 sec, total_time: 20.81 sec
step number:    60, step_time: 0.65 sec, total_time: 27.44 sec
step number:    70, step_time: 0.73 sec, total_time: 34.92 sec
step number:    80, step_time: 0.82 sec, total_time: 43.23 sec
step number:    90, step_time: 0.90 sec, total_time: 52.42 sec
DONE in 61.42 sec

But the result in my opinion is not much different:

In addition, when adding evaluation points, the “singularity” points (where our “photon” comes close to the electron) became more visible. I changed the number_of_electrons_wide to an even number so that the x-z position of any electron is not 0. Though I don't know if this is allowed according to the model.

@stopdesign
Copy link

@dmiraenko, sorry for hijacking your question/suggestion.

Low-invasive optimization using Python has almost reached its limit. I have no experience with GPUs and Fortran, so I don't have any intuition about potential performance. It seems we wouldn't get any different results by simply increasing performance. Maybe we have to review the model. I'll try to read some Feynman.

@mithuna-y, I don't know if my code worth publishing at this point.
There is some Rust code but it is easy to build and easy to run from Python:

import rust_helper

def electrons_electric_field(time_step, y):
    ...
    return rust_helper.electrons_electric_field(electron_positions, accel_history, time_step, y)

I have long wanted to try Rust+Python in this exact role: calculations where Python cannot handle the performance. And I think it's worth it.

@stopdesign
Copy link

More particles, more space for waves to develop (the light comes from right to left):
Screen Shot 2023-12-02 at 13 57 59
Screen Shot 2023-12-02 at 13 50 59
Screen Shot 2023-12-02 at 13 51 57
Screen Shot 2023-12-02 at 13 54 50
Screen Shot 2023-12-02 at 13 56 54

@mithuna-y
Copy link
Owner

@stopdesign, thank you, these results look fantastic! Especially the plane wave ones. It makes me feel more confident that this is the real behaviour (or there's a flaw in the way I made the simulation!). Is there any chance you could export it as an animation and link it here? If not, feel free to send it to my email: [email protected].

@mithuna-y
Copy link
Owner

@dmiraenko I'm sorry, yes, I didn't see the email about your first post. So sorry to be replying so late!

The file that contains the 3D animation is called "multiple_layers.py"
I'm writing up a doc with all theoretical considerations now. When I'm done I'll link it here!

But if it's ok, I think I'd like to keep this simulation in python (and on CPU) so others can run it and understand it more easily. I would still love if you made a version in Fortran though, so if you do then please tag me in it and could you send me the results? Thank you so much!

@stopdesign
Copy link

stopdesign commented Dec 2, 2023

@mithuna-y

Ok, I'll try to get animation out of matplotlib.

I had to change this: angular_frequency = resonant_angular_frequency * 0.7. Is it ok?
With initial coefficient of 0.99 the secondary plane wave was huge. Actually, damping_constant is also very interesting. As I understod, the secondary wave should be only a fraction of the initial wave's amplitude...

And this is the pulse with number_of_evaluation_points = 2000:
Screen Shot 2023-12-02 at 15 01 04
Screen Shot 2023-12-02 at 15 06 42
Screen Shot 2023-12-02 at 15 10 13

Also I'd like to try longer pulse. Like 5-10 wavelengths.

@stopdesign
Copy link

Some test animations:

plain_wave_9.mp4
plain_wave_10.mp4

@aistrych
Copy link

aistrych commented Dec 3, 2023

@stopdesign As I suggested in PR #2 you can speed up original code simply by using numba and jiting all functions and extracting main loop to separate function (also jited) precomputing all data before plotting. It gave me 20x speedup.
EDIT: doing this you have to be sure that all non constant data are passed as functions parameters and not taken from global space

@stopdesign
Copy link

@Astrych, yes, this is another simple way to boost performance.

Our approaches are similar. I optimized it with a Rust precompiled module. You did it with JIT-compiling heavy parts. But time complexity is still very bad. We still have a lot of function calls and a lot data moving around. I think vectorization is a great improvement because it reduces time complexity of the non-parallel code.

Vectorization would give much better execution time if we increase a number of particles or a number of evaluation points. For number_of_evaluation_points = 1000 the vectorized code 8 times faster than jited or rusted.

@artli
Copy link

artli commented Dec 3, 2023

If I understand the logic of electron_field_contribution correctly, you can get rid of the for loop over past time steps and instead directly compute the time step to look up based on the distance r and the current time t_p. Not sure how much it's going to matter for the vectorized version (I reckon you can get of one dimension there as well), but for the original version something like this makes sense to me:

-    # Ensure that the point is at a later time than the electron
-    assert t_p >= t_e, f't_p should be less than t_e, but got t_p {t_p} and t_e {t_e}'
-
-    # If (t_p - t_e)  is approximately r / c, this electron is currently contributing to the electric field of this point
-    if -DT < (t_p - t_e) - r / c < DT:
-        index = int(t_e / DT)
-        accel_scalar = accel_hist[index]
-        # The size of the contribution is given in the Feynman lectures, vol 1, eq 29-1
-        # https://www.feynmanlectures.caltech.edu/I_29.html
-        contrib = - np.sqrt((x_e - x_p) ** 2 + (y_e - y_p) ** 2) / r * accel_scalar / r
-        return charge_electron * contrib
-    return 0
+    # This point will feel the effect of the electron's acceleration as of r / c ago.
+    accel_time = t_p - r / c
+    if accel_time < 0:
+        return 0
+    index = int(accel_time / DT)
+    accel_scalar = accel_hist[index]
+
+    # The size of the contribution is given in the Feynman lectures, vol 1, eq 29-1
+    # https://www.feynmanlectures.caltech.edu/I_29.html
+    contrib = - np.sqrt((x_e - x_p) ** 2 + (y_e - y_p) ** 2) / r * accel_scalar / r
+    return charge_electron * contrib

and

-        for t_e_step in range(step):
-            electron_layer = np.where(electron_y_positions == y_electron)[0][0]
-            t_electron = t_e_step * DT
-            contribution = 0
-            contribution += electron_field_contribution(x_electron, y_electron, z_electron, t_electron,
-                                                        0, y, 0, t, accel_history[electron_layer])
-            ef_due_to_electrons += contribution
+        electron_layer = np.where(electron_y_positions == y_electron)[0][0]
+        ef_due_to_electrons += electron_field_total_contribution(
+            x_electron, y_electron, z_electron,
+            0, y, 0, t,
+            accel_history[electron_layer])

It also subtly changes the logic of the computation in a way that feels more correct to me (there's a potential for double-counting in the original -DT < (t_p - t_e) - r / c < DT check) but I could be missing some physics reasoning here.

@tedtoal
Copy link

tedtoal commented Dec 5, 2023

@mithuna-y, thanks for your code and video, I'm just starting to look at it and see if I can play around with it to answer some questions of my own. I have a request. Could you add a one-sentence comment to the start of each main.py file saying what the program does? And for multi-file programs, a one-sentence comment saying what the FILE does? Thanks!

@ericnutsch
Copy link

Please disregard if this is already being done in the code. Just having watched the videos, I was thinking that you could speed up computation on the 2D model by only evaluating positive y volume electrons then doubling the effect; doubling the computing speed. And for the 3D by only evaluating positive y and positive z then quadrupling the effect. To keep the visualization looking nice just mirror those values to their respective points.

@mithuna-y
Copy link
Owner

@dmiraenko thank you for the animations! It's great you can run it for so long and the results look so smooth! How many layers of electrons did you use?

One thing I find troubling about the result though is that the new plane waves aren't going slower than the original plane wave (it seems). Instead they seem to settle into a phase shift, but aren't actually travelling at a different speed. I don't think this is a bug in your code, I think it's in mine. When I run my original I seem to get the same sort of thing (but I wasn't sure until I saw it so clearly in yours). I'm going to go back through the logic in my original code and see if I can find the issue. It may be something wrong in the physics. I will report back if I see the issue, or if you have an idea let me know, but don't let it block you.

There's already a PR ( #2 ) that vectorizes the code in numpy. I haven't accepted it yet because I'm not sure it works, but I think we could have both options simultaneously (your Rust implementation and the vectorized one in Python), controlled by a flag. Generally I am more than happy to have the Rust code in there but I want to keep the code accessible and not everyone will have Rust installed, so would you be able to create a PR that adds your Rust implementation, but add it under a feature flag (just a constant at the top of the file is fine for now)? Thanks!


@artli thank you for this- you're very right about how to get rid of the loop over time. I couldn't figure out how to implement it your way correctly, which is why I went with the loop. Is this something you could integrate into #2 by any chance?


@tedtoal Absolutely! I am currently writing up a doc that explains everything I have in there in more detail (including explaining the physics). It's not done yet but I'm aiming to have it done tomorrow: https://colab.research.google.com/drive/1L9X_tq-Kjt-foEhcnSXpvNujbbJEedBz?usp=sharing


@ericnutsch I'm not sure if I understand what you mean? Could you clarify? Thanks!

@ericnutsch
Copy link

@mithuna-y, In CFD modeling we always take advantage of symmetry where possible to minimize the number of cells in the computation, allowing for increased resolution for the same computational power. I think you could do the same in your model by only looking at the upper half of the electrons or positive quadrant in the 3D model. If your computations are quantitative you will have to double the effect of the electrons on the light (4x in the 3D model).

With the example from your video the top three electrons would be part of the calculation and effect from the bottom two could be a duplicate of the top two (Clearly using an even number of electrons makes life easier). Unfortunately I did not have time to read through the sets of code so forgive me if I am way off in my understanding!
Screenshot 2023-12-17 203903

@oliver-thiel
Copy link

Hi, @dmiraenko and @mithuna-y

I have made a version with parallel computations instead of for-loops. It is still python, but on a GPU, using PyTorch:
https://github.com/oliver-thiel/light_pulse_through_medium/blob/5e946ccc7c0b14859bf6c7feae286ee1fb4a823a/main.py

The video shows what it looks like with over 700 evaluation points.
https://github.com/mithuna-y/speed_of_light_in_a_medium/assets/125210039/0379bef7-661d-4f16-acd0-349479b708fa

@2252
Copy link

2252 commented Jan 3, 2024

This is an exciting project

@mithuna-y
Copy link
Owner

@ericnutsch thanks for explaining! You’re absolutely right that there’s symmetry to exploit in the calculation.


@oliver-thiel thank you so much for sending those videos. I’ve been thinking about it for a while. They clearly show that there is no slowdown in this simulation, which means something is wrong with my model. I haven’t figured out what yet though. If anyone has any ideas I’d love to hear them. Meanwhile, I think it may not be worth optimising this model further, since it seems to a fundamental problem in it.

@oliver-thiel
Copy link

Yes, @mithuna-y, I hoped to see that the main peek was cancelled out when it went deeper into the material, but it did not happen. One problem may be the proportions of the model. The distance between molecules in liquid water is about one hundred picometers, but the wavelength of visible light is several hundred nanometers - about 5000 times larger.

@mithuna-y
Copy link
Owner

That’s a good point! It may just be quite difficult to make the simulation efficient enough to run it with those sorts of parameters

@artli
Copy link

artli commented Jan 8, 2024

Is this something you could integrate into #2 by any chance?

Since I'm not the author of that PR, that's going to be a bit trickier, but here's what I think is the required patch for that:

diff --git a/multiple_layers/main.py b/multiple_layers/main.py
index 4ff1c9d..c6f2195 100644
--- a/multiple_layers/main.py
+++ b/multiple_layers/main.py
@@ -37,45 +37,43 @@ def gaussian(t, y):
 def plane_wave(t, y, angular_frequency):
     return 2 * np.cos((angular_frequency * (t + y / c)))
 
+
 # The electric field at a point p due to an electron at position x_e, y_e, z_e, t_e.
-def electron_field_contribution(x_e, y_e, z_e, t_e, x_p, y_p, z_p, t_p):
+def electron_field_contribution(x_e, y_e, z_e, x_p, y_p, z_p, t_p):
     # axes: all electrons X all positions
     x_square_diff = (x_e[:, np.newaxis] - x_p[np.newaxis, :]) ** 2
     y_square_diff = (y_e[:, np.newaxis] - y_p[np.newaxis, :]) ** 2
     z_square_diff = (z_e[:, np.newaxis] - z_p[np.newaxis, :]) ** 2
     r = np.sqrt(x_square_diff + y_square_diff + z_square_diff)
-    # expanding to: all electrons X all positions X all times (1 -> broadcasted)
-    x_square_diff = x_square_diff[:,:, np.newaxis]
-    y_square_diff = y_square_diff[:, :, np.newaxis]
-    z_square_diff = z_square_diff[:, :, np.newaxis]
-    r = r[:,:,np.newaxis]
 
     electron_layer_idx = np.argmax(y_e[:, np.newaxis] == np.array(electron_y_positions)[np.newaxis, :],1)
 
-    # Ensure that the point is at a later time than the electron
-    assert np.all(t_p >= t_e), f't_p should be less than t_e, but got t_p {t_p} and t_e {t_e}'
+    # A point at a distance r from the electron will feel the effect
+    # of the electron's acceleration as of r / c ago.
+    accel_time = t_p - r / c
+    # Then for each electron-position pair we know where to look
+    # inside accel_history (being careful not to look before
+    # the start of the simulation)
+    accel_hist_t_idx = np.int32(accel_time / DT)
+    accel_hist_t_idx_nonnegative = np.maximum(accel_hist_t_idx, 0)
+
+    # Now we have electron_layer_idx, which tells us for each electron
+    # which row of accel_history to look at, and we have accel_hist_t_idx_nonnegative,
+    # which tells us for each electron-position pair which column of accel_history
+    # to look at. Combining them, we index into accel_history and get the relevant
+    # historical acceleration for each electron X position.
+    accel = accel_history[electron_layer_idx[:, np.newaxis], accel_hist_t_idx_nonnegative]
+    # Zero out contributions from earlier than t=0.
+    accel[accel_hist_t_idx < 0] = 0
 
-    # If (t_p - t_e)  is approximately r / c,
-    # this electron is currently contributing to the electric field of this point
-    accel_hist_t_idx = np.uint32(t_e/DT)  # axes: all times
     # The size of the contribution is given in the Feynman lectures, vol 1, eq 29-1
     # https://www.feynmanlectures.caltech.edu/I_29.html
-    # axes: all electrons X all positions X all earlier times
-    contribution_if_contributing = (
-       - np.sqrt(x_square_diff + y_square_diff) / r
-       * accel_history[electron_layer_idx,:][:,accel_hist_t_idx][:, np.newaxis,:] / r
-    )
+    # axes: all electrons X all positions
+    contributions = - np.sqrt(x_square_diff + y_square_diff) / r * accel / r
     # we want to return 0 when r=0 to prevent electrons affecting themselves
-    contribution_if_contributing[np.broadcast_to(r, contribution_if_contributing.shape) == 0] = 0
-    # They contribute if the time from the time step to the position is roughly in accordance to c
-    time_vs_distance_diff = (t_p - t_e[np.newaxis, np.newaxis, :]) - r / c
-    contributing = (-DT < time_vs_distance_diff) & (time_vs_distance_diff < DT)
-    # mask to zero those values under which the contribution is not valid
-    contribution_if_contributing[~contributing] = 0
-
-    contribution_if_contributing *= charge_electron
+    contributions[r == 0] = 0
 
-    return contribution_if_contributing
+    return contributions * charge_electron
 
 
 def force(total_electric_field, z, z_velocity):
@@ -115,22 +113,18 @@ def electrons_electric_field(time_step, y):
     t = time_step * DT
 
     electron_positions_np = np.array(electron_positions)
-    electron_y_positions_np = np.array(electron_y_positions)
-    # Calculate the electric field due to all electrons (at all possible t_electron) on the point (0, y_point, 0)
-
-    t_electron = np.arange(time_step) * DT
 
+    # Calculate the electric field due to all electrons (at all possible t_electron) on the point (0, y_point, 0)
     contributions = electron_field_contribution(
         electron_positions_np[:,0],
         electron_positions_np[:,1],
         electron_positions_np[:,2],
-        t_electron,
         np.array([0]),
         np.array(y),
         np.array([0]),
         t,
     )
-    return contributions.sum(0).sum(1)  # sum across all electrons and times
+    return contributions.sum(0)  # sum across all electrons
 
 
 if __name__ == '__main__':

IIUC, you should be able to add that to the PR yourself without issue as the owner of this repo if you're happy with the patch (and you can just use git patch on top of the PR branch to do that).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants