-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add modern poisson2d solver & rename optimized.f90 #23
base: main
Are you sure you want to change the base?
Conversation
Refactorings ------------ 1. Use BLOCK for tighter variable scoping (reduces some clutter). 2. Use array statements with vector subscripts to write compact initializations. 3. Use DO CONCURRENT to express fine-grained parallelism and expose opportunities for optimizations such as vectorization, multithreading, or GPU offloading. 4. Replace some magic values, e.g., initial delta. 5. Eliminate some unnecessary computation by initializing and updating only values that actually need initializing and updating. 6. Use more compact conditional logic. 7. Use a real kind specificaiton based on precision requirements rather than the ambiguously defined double-precision representation of any one given compiler on any one given platform. 8. Eliminate unnecessary precision specifiers ("_dp") when an expression is simply a constant that is exactly representable in any reasonable floating-point representation. 9. Separate the procedure interface into a module and the procedure definition into a submodule.** 10. Use a more descriptive variable names in a few places. 11. Add comments! (includeing some in FORD style) ** In larger projects, item 9 makes code much more user-friendly by isolation the application programmer interface (API) and reduces compile times by eliminating unnecessary compilation cascades.
This code is old-fashioned, verbose, and most importantly, less clealry expressive, particularly in terms of exposing optimization opportunties, which makes the file name a misnomer. By contrast, commit a7ccf54 introduces a modern.f90 refactored to expose concurrency more explicitly through the use of `do concurrent` and array statements, both of which express sets of operations that can be safely executed in any order, opening up opportunities for vectorization, multithreading, and GPU offloading. In timings on a 2.3 GHz 8-Core Intel Core i9, modern.f90 executes in roughly the same time -- possibly 5-10% longer based on informal ad hoc testing -- but is considerably more compact in many places, is arguably more clear and expressive.
Thank you for this! Looks nice.
I see --- I think you are taking advantage of this here, so you wouldn't need my suggested changes above? But is that really true? I know that |
Either is fine. I'm a minimalist so I use the first one because the `_dp` isn't required when the real literal is exactly representable in the chosen floating-post representation, which presumably is the case for `1.` in any floating-point scheme. Co-authored-by: Ondřej Čertík <[email protected]>
I concur that `_dp` is required in this one. Co-authored-by: Ondřej Čertík <[email protected]>
I see. I am a minimalist too, that's why I just use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is +1 from me. @milancurcic would you mind also please reviewing this?
@certik good points. I think I just accepted your changes, but I'm not sure. If so, feel free to merge this or let me know if you want me to merge it. If I didn't accept your changes correctly, let me know and I'll see if I can fix it. |
Co-authored-by: Ondřej Čertík <[email protected]>
interface | ||
pure real(dp) module function rho(x,y) | ||
!! Poisson equation inhomogeneous term. | ||
implicit none |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question for my knowledge: why is there an implicit none
here? Is the implicit none
in the module not covering it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is a common mistake that I also learned the hard way --- implicit none
in a module propagates and applies to everything except interface
, where you have to repeat it.... :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @certik for your answer. I will have to check all my codes ;)
I don't know how to suggest changes to files that weren't changed in the pull request, so I'll just write here. Since you renamed one of the Fortran files, can you also rename the corresponding row in the table in README.md? Also, you could add a row to the table, with the following numbers: M=100: 0.43 M=200: 8.24 M=300: 38.39 which are the numbers I got on my machine with the same settings as the other files (-Ofast, WSL2, same CPU, etc.) |
Ah, I didn't realize the file got renamed from (To rephrase what I am trying to achieve: I absolutely want to have such a discussion about how modern Fortran codes should be written and arrive at more opinionated community agreed upon style(s). But in order to have a good discussion about this, I am hoping to have a few dozen examples, each with different styles. And only then have that discussion. I am afraid the discussion would not be productive if we do it now.) |
@certik I like your suggestion to rename |
poisson2d/modern.f90
Outdated
phi_prime(i,j) = (phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4._dp & | ||
+ (dx/2._dp)*(dy/2._dp)/epsilon0*rho_sampled(i,j) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _dp
suffix for a few literal constants here is unnecessary (let the compiler coerce to correct precision). And, if you wanted to switch to single precision, you'd only need to change it in the declaration of phi
and other arrays, and not fiddle with literal constants' precisions here.
phi_prime(i,j) = (phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4._dp & | |
+ (dx/2._dp)*(dy/2._dp)/epsilon0*rho_sampled(i,j) | |
phi_prime(i,j) = (phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4 & | |
+ (dx/2)*(dy/2)/epsilon0*rho_sampled(i,j) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the rules on expression evaluation well. I was guessing that the _dp
is required to avoid getting a default-precision result for the division. I'm fine with changing it. but GitHub is marking this review comment as outdated so I think I'll have to make the change locally and push it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a fan of concise code, so I agree with Milan. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the above cases where 2
is an integer it doesn't matter. However when you are dealing with 0.213535
then things gets interesting since it first gets interpreted as a real(single)
, then gets cast to a real(double)
which is not what you want. :)
But here 2 is sufficient int -> double
is perfect representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I recommend to always write floating point numbers as 0.5_dp
, whether or not they can be represented exactly. That way one cannot make a mistake and it's a simple rule to follow. Floating point divided by an integer is guaranteed to be a floating point, so there I prefer to just use an integer 2
instead of 2.0_dp
.
If you like the whole-array form over the |
I don't know why yet, but |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, the code fails to compile with two of four compilers I tested (gfortran 11.1 and nvfortran 21.5 are okay, while nagfor 7.0 and ifort 2021.2 fail). At least the build with ifort seems mendable by putting the submodule in a separate file.
I seem to be able to converge it for M=100, 200, 300 with -Ofast, or at least it outputs a number of iterations in line with the C code (and Cython/python where available) using gfortran-9.3.0 |
I apologize, I gave incomplete information. I don't converge for M=50 (I made it small for shorter tests). Do the method and the problem require a minimum value of M for the method to converge? |
I get the same problem for M=50. I'm not sure if there's any general proof for what kind of grid is too small, but the other Fortran codes seem to work. So clearly the modern code is somehow generating a different code. The basic logic of it seems to be the same, though, and works for bigger M, so I wonder if the compiler isn't doing some wonky optimization because of the modern features that is simply buggy? |
Co-authored-by: Milan Curcic <[email protected]>
Co-authored-by: Ondřej Čertík <[email protected]>
Co-authored-by: Milan Curcic <[email protected]>
When compiled with I also tried the NAG compiler version 7.0, Build 7044, and found a compiler bug that I just reported to NAG. I'm pasting the bug report below. I'm not sure what to do from here. Before refactoring code, it's always safest to have autoamated tests in place. I hope that any code contributed to this repository will include tests, which will probably need to be internal to the code, given the difficulty of setting up a multi-language test harness that covers so many languages. Bug Report
|
I've managed to isolate the problem. In a puzzling turn of events, it appears gfortran doesn't like the rho function and the submodule for some reason. Removing both the rewritten if-else as well as the submodule seems to fix this problem (but neither one on their own fixes it). That is, just use the old rho function in a module instead of the submodule, and for some reason the code works again. Edit: but also, simply putting rho in a submodule and then calling it from a test program still doesn't cause any issues, and the code executes correctly. The loop seems to be required for the convergence issue to arise. |
@arunningcroc could you either insert code edits on GitHub or submit a new pull request with working code? Most importantly, if you were the original author (or simply if you have thoughts on how to do so), could you include some sort of results-verification code? Possibly the best thing might be for each of the Fortran versions to be refactored into being functions instead of main programs. Assuming someone has verified that some version produces the desire solution, that version could be the reference version of the code. Then there can be just one Fortran main program that calls each version with @certik if you don't mind introducing object-oriented programming, @everythingfunctional and I have developed a design pattern that captures everything I just wrote and specifies requirements in the form of deferred bindings on an abstract type we call the Oracle pattern, inspired by the common term "test oracle." It's actually a special case of a Template Method pattern. In addition to saving time explaining the same concepts in English, requiring contributors extend an oracle abstract derived type enforces the project requirements formally and operational and ensures uniformity across the project. |
Absolutely. Please submit another PR with the OO code! |
Yes, I'll try to get this done this afternoon. Also, I think I should probably write some sort of problem description pdf, explaining the theory and the numerical method in detail, so that anyone may easily implement it in their language of choice. For the test cases, I know that the Python code should produce correct results, seeing as it is lifted almost wholesale from a computational physics textbook. However, there are also known analytical solutions to 2D Poisson and Laplace equations, and these might be valuable for testing as well. Presumably finding the correct solution to a few such cases will be a good guarantee that the code works as expected. If I separate the Fortran code in to modules/functions and call them all from a single main program, should I do the same for the Python, C and Cython codes as well, so that each language is called from some file like "main.f90", "main.py", etc? Then anyone who wants to see all the timings of a particular language could always just execute the main file for that language, and one could imagine adding some bash/powershell script to run all of them consequentially. |
@arunningcroc that all sounds great. The C code could be called by the Fortran main program via C bindings. I've heard that other languages also support C bindings so it might also be possible to call other languages from one Fortran main program. |
I hope the benefit of this pull request extends beyond this specific refactoring and serves as a guide for other codes contributed to this project. The aim is to demonstrate what modern Fortran can do to improve the expressiveness of code while maintaining and potentially improving performance.
In my local tests compiling with
gfortran -Ofast
, what was previously namedoptimized.f90
only performs about 5% faster thanmodern.f90
on average. In 2021, that hardly justifies restricting oneself to a small subset of Fortran 95 that is very nearly Fortran 77 + free format even in the name of performance. The modern version is more compact in some places, adds safety in some places (e.g., tighter variable scoping means fewer opportunities for mistakenly changing variables), and adds additional optimization opportunities now that compilers such as NVIDIA's compiler can offloaddo concurrent
to a GPU. This means there's even more that can be explored with this new code than I've been able to investigate so far.Refactorings
block
for tighter variable scoping (reduces some clutter).do concurrent
to express fine-grained parallelism and expose opportunities for optimizations such as vectorization, multithreading, or GPU offloading.real
kind specification based on precision requirements rather than the ambiguously defined double-precision representation of any one given compiler on any one given platform._dp
) when an expression is simply a constant that is exactly representable in any reasonable floating-point representation.module
and the procedure definition into asubmodule
.**associate
to ensure immutable state for what is essentially a runtime constant.** In larger projects, item 9 makes code much more user-friendly by isolation the application programmer interface (API) and reduces compile times by eliminating unnecessary compilation cascades.
[magic numbers]: https://en.wikipedia.org/wiki/Magic_number_(programming)#Unnamed_numerical_constants