Output seconds/point in get_farfields_array progress #865
@@ -323,11 +324,14 @@ realnum *dft_near2far::get_farfields_array(const volume &where, int &rank, size_
 x.set_direction(dirs[1], where.in_direction_min(dirs[1]) + i1 * dx[1]);
 for (size_t i2 = 0; i2 < dims[2]; ++i2) {
 x.set_direction(dirs[2], where.in_direction_min(dirs[2]) + i2 * dx[2]);
-if (!quiet && wall_time() > start + MEEP_MIN_OUTPUT_TIME) {
+double t;
+if (!quiet && (t = wall_time()) > start + MEEP_MIN_OUTPUT_TIME) {
If we want this to work in the parallel case then we may need to add:
if (!quiet) all_wait();
to make the timing accurate on the master process.
The question is, what is the performance impact of putting synchronization in the loop like this? I'm not sure.
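For illustration, the pattern under discussion (read the clock once, report at most every so many seconds, and print seconds per point) can be sketched as a standalone program. This is a minimal sketch under stated assumptions, not the code in this PR: `wall_time()` and `all_wait()` here are local stand-ins for the functions named in the diff, `evaluate_points`, `seconds_per_point`, and `should_report` are hypothetical helpers, the message format is invented, and placing the barrier once per point is only one reading of the suggestion above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>

// Mirrors the `quiet` flag in the diff; a plain global in this sketch.
bool quiet = false;

// Stand-in for wall_time(): seconds from a monotonic clock (assumption).
double wall_time() {
  using clock = std::chrono::steady_clock;
  return std::chrono::duration<double>(clock::now().time_since_epoch()).count();
}

// Stand-in for all_wait(): a no-op here. In a parallel run this would be
// a barrier, so the reporting process measures the slowest process.
void all_wait() {}

// Hypothetical helper: average seconds per point over a reporting interval.
double seconds_per_point(double elapsed, std::size_t points_done) {
  assert(points_done > 0);
  return elapsed / static_cast<double>(points_done);
}

// Hypothetical helper: true once min_interval seconds have passed.
bool should_report(double now, double last_report, double min_interval) {
  return now > last_report + min_interval;
}

// Loops over `total` points and returns how many progress lines were printed.
std::size_t evaluate_points(std::size_t total, double min_interval) {
  double start = wall_time();
  std::size_t last_point = 0;
  std::size_t reports = 0;
  for (std::size_t i = 0; i < total; ++i) {
    // ... the far-field evaluation for point i would go here ...
    if (!quiet) all_wait(); // assumed placement: once per point
    double t;               // read the clock only once per iteration
    if (!quiet && should_report(t = wall_time(), start, min_interval)) {
      const std::size_t this_point = i + 1;
      std::printf("working on point %zu of %zu, %g s/point\n", this_point, total,
                  seconds_per_point(t - start, this_point - last_point));
      start = t;
      last_point = this_point;
      ++reports;
    }
  }
  return reports;
}
```

Calling `evaluate_points(1000, 4.0)` prints at most one line every four seconds. With `quiet` set, both the barrier and the clock read are skipped, so the loop carries no extra cost in that mode.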
The synchronization doesn't seem to affect performance at all. Also calling …
You tried it with a big parallel calculation?
I tried it with 4 processors on the binary_grating_n2f example which spent around 350 seconds in …
I was more worried about the scaling for a large number of processors (~100).
Can I just crank the resolution in binary_grating_n2f.py, or does it need to be modified in some other way to make a realistic example for 100 procs?
To test for a large number of processors, you can increase three parameters simultaneously: …
…g output accuracy for speed in #865
* Output seconds/point in get_farfields_array progress
* Only evaluate wall_time() once
* Add all_wait for consistent output, and flush python stdout from C
…g output accuracy for speed in NanoComp#865
@stevengj @oskooi