I'd like to help speed up computation #1
We can parallelize the *_electric_field calculation in Python and get roughly 10x performance. It doesn't look difficult. On the other hand, the time complexity of the algorithm is O(N^4) or something like that, so more optimization is needed to increase the number of particles. What performance would be enough to make a difference?
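A minimal sketch of what parallelising the field evaluation could look like. The function names and the 1D pairwise-sum physics here are illustrative assumptions, not the repo's actual API; the idea is just to fan the per-evaluation-point work out across workers.

```python
import math
from concurrent.futures import ThreadPoolExecutor

K = 8.9875517923e9  # Coulomb constant, N*m^2/C^2

def field_at(point, charges):
    """1D field at `point` from (charge, position) pairs (illustrative)."""
    total = 0.0
    for q, x in charges:
        r = point - x
        if r != 0.0:
            total += K * q / (r * r) * math.copysign(1.0, r)
    return total

def field_parallel(points, charges, workers=4):
    # Threads keep this sketch self-contained; for a real speedup on
    # CPU-bound pure-Python code, swap in ProcessPoolExecutor and run
    # the module as a script.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(lambda p: field_at(p, charges), points))
```

Since each evaluation point is independent, the split is embarrassingly parallel and the results match the serial loop exactly.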
@stopdesign Thank you so so much for looking into this!!
I’m not sure how much of an efficiency gain is needed but it sounds like a
great idea to parallelise the electric field calculation. If you think you
could do it without it being terribly difficult, that would be great!
I'm confused. Is the pulse-electron simulation actually in that file? I can try to figure out GPU parallelization in Fortran this week; it's time I learned it anyway, but I can't start without understanding what I'm trying to achieve. Please tell me if you're interested in this, Mithuna.
@dmiraenko, sorry for hijacking your question/suggestion. Low-invasive optimization using Python has almost reached its limit. I have no experience with GPUs or Fortran, so I don't have any intuition about the potential performance. It seems we wouldn't get any different results by simply increasing performance; maybe we have to review the model. I'll try to read some Feynman. @mithuna-y, I don't know if my code is worth publishing at this point.
I have long wanted to try Rust+Python in exactly this role: calculations where Python can't handle the performance. And I think it's worth it.
@stopdesign, thank you, these results look fantastic! Especially the plane wave ones. It makes me feel more confident that this is the real behaviour (or there's a flaw in the way I made the simulation!). Is there any chance you could export it as an animation and link it here? If not, feel free to send it to my email: [email protected].
@dmiraenko I'm sorry, yes, I didn't see the email about your first post. So sorry to be replying so late! The file that contains the 3D animation is called "multiple_layers.py". But if it's ok, I think I'd like to keep this simulation in Python (and on CPU) so others can run it and understand it more easily. I would still love it if you made a version in Fortran though, so if you do, please tag me in it, and could you send me the results? Thank you so much!
Some test animations: plain_wave_9.mp4, plain_wave_10.mp4
@stopdesign As I suggested in PR #2, you can speed up the original code simply by using numba: JIT all the functions, extract the main loop into a separate (also JIT-ed) function, and precompute all the data before plotting. It gave me a 20x speedup.
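A minimal sketch of the numba approach described above. The function and array names are illustrative, not the repo's actual API, and the sketch falls back to plain Python if numba isn't installed so it stays runnable:

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without numba (no speedup, same result).
    def njit(f):
        return f

@njit
def total_field(points, charges, positions):
    # The main loop lives in its own function so the whole hot path
    # compiles to machine code; no Python objects inside the loops.
    out = np.zeros(points.shape[0])
    for i in range(points.shape[0]):
        acc = 0.0
        for j in range(charges.shape[0]):
            r = points[i] - positions[j]
            if r != 0.0:
                acc += charges[j] / (r * r) * np.sign(r)
        out[i] = acc
    return out
```

The first call pays a one-time compilation cost; precomputing all frames before plotting amortizes it, which is why extracting the loop from the plotting code matters.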
@Astrych, yes, this is another simple way to boost performance. Our approaches are similar: I optimized it with a precompiled Rust module, you did it by JIT-compiling the heavy parts. But the time complexity is still very bad, and we still have a lot of function calls and a lot of data moving around. I think vectorization is a great improvement because it reduces the time complexity of the non-parallel code. Vectorization would give much better execution times if we increased the number of particles or the number of evaluation points. For number_of_evaluation_points = 1000, the vectorized code is 8 times faster than the jitted or Rust versions.
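For concreteness, here is a hedged sketch of what the vectorized version could look like (again with illustrative names, not the repo's API): broadcasting builds the full (points x charges) matrix of pairwise separations, so the double Python loop disappears.

```python
import numpy as np

def total_field_vec(points, charges, positions):
    # Pairwise separations via broadcasting: shape (n_points, n_charges).
    r = points[:, None] - positions[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        contrib = charges[None, :] / (r * r) * np.sign(r)
    # A charge sitting exactly on an evaluation point contributes nothing.
    contrib = np.where(r == 0.0, 0.0, contrib)
    return contrib.sum(axis=1)
```

All the per-element work now happens in compiled NumPy kernels, which is where the scaling advantage over per-call JIT or FFI approaches comes from as the arrays grow.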
If I understand the logic of the two snippets correctly, this also subtly changes the computation in a way that feels more correct to me (there's a potential for double-counting in the original).
@mithuna-y, thanks for your code and video. I'm just starting to look at it and see if I can play around with it to answer some questions of my own. I have a request. Could you add a one-sentence comment to the start of each main.py file saying what the program does? And for multi-file programs, a one-sentence comment saying what the FILE does? Thanks!
Please disregard if this is already being done in the code. Having just watched the videos, I was thinking that you could speed up computation in the 2D model by only evaluating the positive-y electrons and then doubling the effect, doubling the computing speed. And for the 3D model, by only evaluating positive y and positive z and then quadrupling the effect. To keep the visualization looking nice, just mirror those values to their respective points.
@dmiraenko thank you for the animations! It's great you can run it for so long, and the results look so smooth! How many layers of electrons did you use? One thing I find troubling about the result, though, is that the new plane waves aren't going slower than the original plane wave (it seems). Instead they seem to settle into a phase shift, but aren't actually travelling at a different speed. I don't think this is a bug in your code; I think it's in mine. When I run my original I seem to get the same sort of thing (but I wasn't sure until I saw it so clearly in yours). I'm going to go back through the logic in my original code and see if I can find the issue. It may be something wrong in the physics. I will report back if I see the issue, or if you have an idea let me know, but don't let it block you.

There's already a PR ( #2 ) that vectorizes the code in numpy. I haven't accepted it yet because I'm not sure it works, but I think we could have both options simultaneously (your Rust implementation and the vectorized one in Python), controlled by a flag. Generally I am more than happy to have the Rust code in there, but I want to keep the code accessible and not everyone will have Rust installed. So would you be able to create a PR that adds your Rust implementation, but under a feature flag (just a constant at the top of the file is fine for now)? Thanks!

@artli thank you for this. You're very right about how to get rid of the loop over time. I couldn't figure out how to implement it your way correctly, which is why I went with the loop. Is this something you could integrate into #2 by any chance?

@tedtoal Absolutely! I am currently writing up a doc that explains everything I have in there in more detail (including explaining the physics). It's not done yet but I'm aiming to have it done tomorrow: https://colab.research.google.com/drive/1L9X_tq-Kjt-foEhcnSXpvNujbbJEedBz?usp=sharing

@ericnutsch I'm not sure if I understand what you mean? Could you clarify? Thanks!
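A minimal sketch of the "constant at the top of the file" feature flag idea. The module name `field_rs` and the function names are placeholders I've invented for illustration, not the repo's actual code:

```python
# Hypothetical feature flag: contributors without a Rust toolchain
# leave this False and get the pure-Python path.
USE_RUST = False

def _electric_field_python(points, charges, positions):
    # Plain-Python 1D pairwise sum (illustrative physics, not the repo's).
    return [
        sum(q / ((p - x) ** 2) * (1 if p > x else -1)
            for q, x in zip(charges, positions) if p != x)
        for p in points
    ]

def electric_field(points, charges, positions):
    if USE_RUST:
        import field_rs  # placeholder name for the compiled Rust module
        return field_rs.electric_field(points, charges, positions)
    return _electric_field_python(points, charges, positions)
```

Keeping the import inside the `if` means the module only needs to exist for users who flip the flag on.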
@mithuna-y, in CFD modeling we always take advantage of symmetry where possible to minimize the number of cells in the computation, allowing increased resolution for the same computational power. I think you could do the same in your model by only looking at the upper half of the electrons, or the positive quadrant in the 3D model. If your computations are quantitative you will have to double the effect of the electrons on the light (4x in the 3D model). With the example from your video, the top three electrons would be part of the calculation, and the effect from the bottom two could be a duplicate of the top two (clearly, using an even number of electrons makes life easier). Unfortunately I did not have time to read through the code, so forgive me if I am way off in my understanding!
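A hedged sketch of the symmetry idea for evaluation points that lie on the mirror axis (the geometry and names here are illustrative, not taken from the repo). For an observation point at y = 0, each electron at +y and its mirror at -y contribute equal x-components and cancelling y-components, so summing only the upper half and doubling E_x is exact:

```python
def field_on_axis(obs_x, upper_electrons, charge=1.0):
    """2D field x-component at (obs_x, 0) from electrons at y > 0 plus
    their implied mirrors at y < 0 (mirror handled analytically)."""
    ex = 0.0
    for (x, y) in upper_electrons:
        dx, dy = obs_x - x, -y
        r3 = (dx * dx + dy * dy) ** 1.5
        ex += charge * dx / r3
    # The mirrored half doubles E_x; the E_y contributions sum to zero.
    return 2.0 * ex
```

Off-axis evaluation points don't enjoy this cancellation, so a real implementation would still need the full field there, or the mirroring of computed values to their reflected points as suggested above.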
Hi @dmiraenko and @mithuna-y, I have made a version with parallel computations instead of for-loops. It is still Python, but on a GPU, using PyTorch. The video shows what it looks like with over 700 evaluation points.
This is an exciting project |
@ericnutsch thanks for explaining! You’re absolutely right that there’s symmetry to exploit in the calculation. @oliver-thiel thank you so much for sending those videos. I’ve been thinking about it for a while. They clearly show that there is no slowdown in this simulation, which means something is wrong with my model. I haven’t figured out what yet though. If anyone has any ideas I’d love to hear them. Meanwhile, I think it may not be worth optimising this model further, since there seems to be a fundamental problem in it.
Yes, @mithuna-y, I hoped to see that the main peak was cancelled out when it went deeper into the material, but it did not happen. One problem may be the proportions of the model. The distance between molecules in liquid water is about one hundred picometers, but the wavelength of visible light is several hundred nanometers, about 5000 times larger.
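A quick back-of-the-envelope check of that ratio, using representative values rather than anything measured from the simulation:

```python
# Representative scales (assumed values, not simulation parameters).
molecule_spacing = 100e-12  # ~100 pm between molecules in liquid water
wavelength = 500e-9         # visible light, a few hundred nm

ratio = wavelength / molecule_spacing
print(f"wavelength / spacing = {ratio:.0f}")
```

So a faithful simulation would need thousands of molecular layers per wavelength, which is where the O(N^4) cost discussed earlier really bites.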
That’s a good point! It may just be quite difficult to make the simulation efficient enough to run it with those sorts of parameters |
Since I'm not the author of that PR, that's going to be a bit trickier, but here's what I think is the required patch for that:
IIUC, you should be able to add that to the PR yourself without issue as the owner of this repo if you're happy with the patch (and you can just use |
Hi!
I really liked your video on information speed in a medium and, as it's something I've been interested in for a while, I'd like to help improve your pulse-medium simulation. I'm a quantum chemistry PhD student, and pretty good at writing parallel HPC programs in Fortran. (Can't do GPU parallelization yet, though, sorry.)
If you pointed me to some resources with theoretical considerations, I could start working on it in my free time. The most helpful resource would be your code for the pulse-medium interaction, but I'm having trouble finding it, or recognizing it when I find it. Is it simplified_wall_model/main.py? If not, could you please link to it?
Thanks for reigniting this question in me, and hoping for your reply,
Dmitrii