I'd like to help speed up computation #1
We can parallelize the *_electric_field calculation in Python and get roughly 10x performance. It doesn't look difficult. On the other hand, the time complexity of the algorithm is O(N^4) or something like that, so more optimization is needed to increase the number of particles. What performance would be enough to make a difference?
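A minimal sketch of what parallelising the field evaluation could look like. The function names and the 1D pairwise-sum physics here are illustrative assumptions, not the repo's actual API; the idea is just to fan the per-evaluation-point work out across workers.

```python
import math
from concurrent.futures import ThreadPoolExecutor

K = 8.9875517923e9  # Coulomb constant, N*m^2/C^2

def field_at(point, charges):
    """1D field at `point` from (charge, position) pairs (illustrative)."""
    total = 0.0
    for q, x in charges:
        r = point - x
        if r != 0.0:
            total += K * q / (r * r) * math.copysign(1.0, r)
    return total

def field_parallel(points, charges, workers=4):
    # Threads keep this sketch self-contained; for a real speedup on
    # CPU-bound pure-Python code, swap in ProcessPoolExecutor and run
    # the module as a script.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(lambda p: field_at(p, charges), points))
```

Since each evaluation point is independent, the split is embarrassingly parallel and the results match the serial loop exactly.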
@stopdesign Thank you so so much for looking into this!!
I’m not sure how much of an efficiency gain is needed but it sounds like a
great idea to parallelise the electric field calculation. If you think you
could do it without it being terribly difficult, that would be great!
I'm confused. Is the pulse-electron simulation actually in that file? I can try to figure out GPU parallelization in Fortran this week; it's time I learned it anyway, but I can't start without understanding what I'm trying to achieve. Please tell me if you're interested in this, Mithuna.
@dmiraenko, sorry for hijacking your question/suggestion. Low-invasive optimization using Python has almost reached its limit. I have no experience with GPUs or Fortran, so I don't have any intuition about the potential performance. It seems we wouldn't get any different results by simply increasing performance; maybe we have to review the model. I'll try to read some Feynman. @mithuna-y, I don't know if my code is worth publishing at this point.
I have long wanted to try Rust+Python in exactly this role: calculations where Python can't handle the performance. And I think it's worth it.
@stopdesign, thank you, these results look fantastic! Especially the plane wave ones. It makes me feel more confident that this is the real behaviour (or there's a flaw in the way I made the simulation!). Is there any chance you could export it as an animation and link it here? If not, feel free to send it to my email: [email protected].
@dmiraenko I'm sorry, yes, I didn't see the email about your first post. So sorry to be replying so late! The file that contains the 3D animation is called "multiple_layers.py". But if it's ok, I think I'd like to keep this simulation in Python (and on CPU) so others can run it and understand it more easily. I would still love it if you made a version in Fortran though, so if you do, please tag me in it, and could you send me the results? Thank you so much!
Some test animations: plain_wave_9.mp4, plain_wave_10.mp4
@stopdesign As I suggested in PR #2, you can speed up the original code simply by using numba: JIT all the functions, extract the main loop into a separate (also JIT-ed) function, and precompute all the data before plotting. It gave me a 20x speedup.
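A minimal sketch of the numba approach described above. The function and array names are illustrative, not the repo's actual API, and the sketch falls back to plain Python if numba isn't installed so it stays runnable:

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without numba (no speedup, same result).
    def njit(f):
        return f

@njit
def total_field(points, charges, positions):
    # The main loop lives in its own function so the whole hot path
    # compiles to machine code; no Python objects inside the loops.
    out = np.zeros(points.shape[0])
    for i in range(points.shape[0]):
        acc = 0.0
        for j in range(charges.shape[0]):
            r = points[i] - positions[j]
            if r != 0.0:
                acc += charges[j] / (r * r) * np.sign(r)
        out[i] = acc
    return out
```

The first call pays a one-time compilation cost; precomputing all frames before plotting amortizes it, which is why extracting the loop from the plotting code matters.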
@Astrych, yes, this is another simple way to boost performance. Our approaches are similar: I optimized it with a precompiled Rust module, you did it by JIT-compiling the heavy parts. But the time complexity is still very bad, and we still have a lot of function calls and a lot of data moving around. I think vectorization is a great improvement because it reduces the time complexity of the non-parallel code. Vectorization would give much better execution times if we increased the number of particles or the number of evaluation points. For number_of_evaluation_points = 1000, the vectorized code is 8 times faster than the jitted or Rust versions.
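For concreteness, here is a hedged sketch of what the vectorized version could look like (again with illustrative names, not the repo's API): broadcasting builds the full (points x charges) matrix of pairwise separations, so the double Python loop disappears.

```python
import numpy as np

def total_field_vec(points, charges, positions):
    # Pairwise separations via broadcasting: shape (n_points, n_charges).
    r = points[:, None] - positions[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        contrib = charges[None, :] / (r * r) * np.sign(r)
    # A charge sitting exactly on an evaluation point contributes nothing.
    contrib = np.where(r == 0.0, 0.0, contrib)
    return contrib.sum(axis=1)
```

All the per-element work now happens in compiled NumPy kernels, which is where the scaling advantage over per-call JIT or FFI approaches comes from as the arrays grow.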
If I understand the logic of the two snippets correctly, this also subtly changes the computation in a way that feels more correct to me (there's a potential for double-counting in the original).
@mithuna-y, thanks for your code and video. I'm just starting to look at it and see if I can play around with it to answer some questions of my own. I have a request. Could you add a one-sentence comment to the start of each main.py file saying what the program does? And for multi-file programs, a one-sentence comment saying what the FILE does? Thanks!
Please disregard if this is already being done in the code. Having just watched the videos, I was thinking that you could speed up computation in the 2D model by only evaluating the positive-y electrons and then doubling the effect, doubling the computing speed. And for the 3D model, by only evaluating positive y and positive z and then quadrupling the effect. To keep the visualization looking nice, just mirror those values to their respective points.
@dmiraenko thank you for the animations! It's great you can run it for so long, and the results look so smooth! How many layers of electrons did you use? One thing I find troubling about the result, though, is that the new plane waves aren't going slower than the original plane wave (it seems). Instead they seem to settle into a phase shift, but aren't actually travelling at a different speed. I don't think this is a bug in your code; I think it's in mine. When I run my original I seem to get the same sort of thing (but I wasn't sure until I saw it so clearly in yours). I'm going to go back through the logic in my original code and see if I can find the issue. It may be something wrong in the physics. I will report back if I see the issue, or if you have an idea let me know, but don't let it block you.

There's already a PR ( #2 ) that vectorizes the code in numpy. I haven't accepted it yet because I'm not sure it works, but I think we could have both options simultaneously (your Rust implementation and the vectorized one in Python), controlled by a flag. Generally I am more than happy to have the Rust code in there, but I want to keep the code accessible and not everyone will have Rust installed. So would you be able to create a PR that adds your Rust implementation, but under a feature flag (just a constant at the top of the file is fine for now)? Thanks!

@artli thank you for this. You're very right about how to get rid of the loop over time. I couldn't figure out how to implement it your way correctly, which is why I went with the loop. Is this something you could integrate into #2 by any chance?

@tedtoal Absolutely! I am currently writing up a doc that explains everything I have in there in more detail (including explaining the physics). It's not done yet but I'm aiming to have it done tomorrow: https://colab.research.google.com/drive/1L9X_tq-Kjt-foEhcnSXpvNujbbJEedBz?usp=sharing

@ericnutsch I'm not sure if I understand what you mean? Could you clarify? Thanks!
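A minimal sketch of the "constant at the top of the file" feature flag idea. The module name `field_rs` and the function names are placeholders I've invented for illustration, not the repo's actual code:

```python
# Hypothetical feature flag: contributors without a Rust toolchain
# leave this False and get the pure-Python path.
USE_RUST = False

def _electric_field_python(points, charges, positions):
    # Plain-Python 1D pairwise sum (illustrative physics, not the repo's).
    return [
        sum(q / ((p - x) ** 2) * (1 if p > x else -1)
            for q, x in zip(charges, positions) if p != x)
        for p in points
    ]

def electric_field(points, charges, positions):
    if USE_RUST:
        import field_rs  # placeholder name for the compiled Rust module
        return field_rs.electric_field(points, charges, positions)
    return _electric_field_python(points, charges, positions)
```

Keeping the import inside the `if` means the module only needs to exist for users who flip the flag on.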
@mithuna-y, in CFD modeling we always take advantage of symmetry where possible to minimize the number of cells in the computation, allowing increased resolution for the same computational power. I think you could do the same in your model by only looking at the upper half of the electrons, or the positive quadrant in the 3D model. If your computations are quantitative you will have to double the effect of the electrons on the light (4x in the 3D model). With the example from your video, the top three electrons would be part of the calculation, and the effect from the bottom two could be a duplicate of the top two (clearly, using an even number of electrons makes life easier). Unfortunately I did not have time to read through the code, so forgive me if I am way off in my understanding!
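A hedged sketch of the symmetry idea for evaluation points that lie on the mirror axis (the geometry and names here are illustrative, not taken from the repo). For an observation point at y = 0, each electron at +y and its mirror at -y contribute equal x-components and cancelling y-components, so summing only the upper half and doubling E_x is exact:

```python
def field_on_axis(obs_x, upper_electrons, charge=1.0):
    """2D field x-component at (obs_x, 0) from electrons at y > 0 plus
    their implied mirrors at y < 0 (mirror handled analytically)."""
    ex = 0.0
    for (x, y) in upper_electrons:
        dx, dy = obs_x - x, -y
        r3 = (dx * dx + dy * dy) ** 1.5
        ex += charge * dx / r3
    # The mirrored half doubles E_x; the E_y contributions sum to zero.
    return 2.0 * ex
```

Off-axis evaluation points don't enjoy this cancellation, so a real implementation would still need the full field there, or the mirroring of computed values to their reflected points as suggested above.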
Hi @dmiraenko and @mithuna-y, I have made a version with parallel computations instead of for-loops. It is still Python, but on a GPU, using PyTorch. The video shows what it looks like with over 700 evaluation points.
This is an exciting project |
@ericnutsch thanks for explaining! You’re absolutely right that there’s symmetry to exploit in the calculation. @oliver-thiel thank you so much for sending those videos. I’ve been thinking about it for a while. They clearly show that there is no slowdown in this simulation, which means something is wrong with my model. I haven’t figured out what yet though. If anyone has any ideas I’d love to hear them. Meanwhile, I think it may not be worth optimising this model further, since there seems to be a fundamental problem in it.
Yes, @mithuna-y, I hoped to see that the main peak was cancelled out when it went deeper into the material, but it did not happen. One problem may be the proportions of the model. The distance between molecules in liquid water is about one hundred picometers, but the wavelength of visible light is several hundred nanometers, about 5000 times larger.
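A quick back-of-the-envelope check of that ratio, using representative values rather than anything measured from the simulation:

```python
# Representative scales (assumed values, not simulation parameters).
molecule_spacing = 100e-12  # ~100 pm between molecules in liquid water
wavelength = 500e-9         # visible light, a few hundred nm

ratio = wavelength / molecule_spacing
print(f"wavelength / spacing = {ratio:.0f}")
```

So a faithful simulation would need thousands of molecular layers per wavelength, which is where the O(N^4) cost discussed earlier really bites.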
That’s a good point! It may just be quite difficult to make the simulation efficient enough to run it with those sorts of parameters |
Since I'm not the author of that PR, that's going to be a bit trickier, but here's what I think is the required patch for that:
IIUC, you should be able to add that to the PR yourself without issue as the owner of this repo if you're happy with the patch (and you can just use |
Hi!
I really liked your video on information speed in a medium and, as it's something I've been interested in for a while, I'd like to help improve your pulse-medium simulation. I'm a quantum chemistry PhD student, and pretty good at writing parallel HPC programs in Fortran. (Can't do GPU parallelization yet, though, sorry.)
If you pointed me to some resources with theoretical considerations, I could start working on it in my free time. The most helpful resource would be your code for the pulse-medium interaction, but I'm having trouble finding it, or recognizing it when I find it. Is it simplified_wall_model/main.py? If not, could you please link to it?
Thanks for reigniting this question in me, and hoping for your reply,
Dmitrii