
Output seconds/point in get_farfields_array progress #865

Merged: 3 commits merged into NanoComp:master from chogan/n2f_s/point on May 14, 2019
Conversation

@ChristopherHogan (Contributor, author)

src/near2far.cpp
@@ -323,11 +324,14 @@ realnum *dft_near2far::get_farfields_array(const volume &where, int &rank, size_
       x.set_direction(dirs[1], where.in_direction_min(dirs[1]) + i1 * dx[1]);
       for (size_t i2 = 0; i2 < dims[2]; ++i2) {
         x.set_direction(dirs[2], where.in_direction_min(dirs[2]) + i2 * dx[2]);
-        if (!quiet && wall_time() > start + MEEP_MIN_OUTPUT_TIME) {
+        double t;
+        if (!quiet && (t = wall_time()) > start + MEEP_MIN_OUTPUT_TIME) {
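The diff's pattern, in a self-contained sketch: call wall_time() once per check, then reuse that value both for the throttling test and for the seconds/point report. Here wall_time, MIN_OUTPUT_TIME, and the loop body are stand-ins for Meep's internals, not its actual API:

```cpp
#include <cstddef>
#include <cstdio>
#include <ctime>

// Stand-in for Meep's wall_time(); only the reporting pattern below is
// taken from the patch.
static double wall_time() { return (double)std::clock() / CLOCKS_PER_SEC; }
static const double MIN_OUTPUT_TIME = 4.0; // stand-in for MEEP_MIN_OUTPUT_TIME

// Average cost so far, where i is the 0-based index of the point just done.
static double seconds_per_point(double now, double start, size_t i) {
  return (now - start) / (i + 1);
}

void farfield_loop(size_t npoints) {
  double start = wall_time(), last = start;
  for (size_t i = 0; i < npoints; ++i) {
    // ... compute the far field at point i ...
    double t; // evaluate wall_time() once, reuse for test and report
    if ((t = wall_time()) > last + MIN_OUTPUT_TIME) {
      std::printf("farfield %zu/%zu, %g s/point\n", i + 1, npoints,
                  seconds_per_point(t, start, i));
      last = t; // throttle: report at most once per MIN_OUTPUT_TIME seconds
    }
  }
}
```

Evaluating the clock once per iteration avoids a second system call and guarantees the printed rate uses the same timestamp that triggered the report.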
@stevengj (Collaborator)
If we want this to work in the parallel case then we may need to add:

if (!quiet) all_wait();

to make the timing accurate on the master process.

The question is: what is the performance impact of putting synchronization in the loop like this? I'm not sure.
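The concern: all_wait() is a barrier across all MPI ranks, so placing it in the loop makes every rank wait for the slowest one before the master reads the clock. A toy model of why that changes the reported timing (the per-rank elapsed times are made up, and neither all_wait nor MPI is actually called here):

```cpp
#include <algorithm>
#include <cstddef>

// Toy model: without a barrier, the master (rank 0) reports only its own
// elapsed time, which may badly understate the job's real cost.
double master_elapsed_no_barrier(const double *elapsed, std::size_t n) {
  (void)n;
  return elapsed[0]; // rank 0 hasn't waited for anyone
}

// With a barrier, the master's clock reading is bounded below by the
// slowest rank's elapsed time, so the report reflects the whole job.
double master_elapsed_with_barrier(const double *elapsed, std::size_t n) {
  return *std::max_element(elapsed, elapsed + n);
}
```

The performance question in the thread is exactly the cost of that max: each synchronized report stalls every rank until the slowest one catches up.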

@ChristopherHogan (Contributor, author)

The synchronization doesn't seem to affect performance at all. I'm also calling stdout.flush from C instead of Python, so output is flushed on every call to master_printf instead of only during time stepping.

@stevengj (Collaborator) commented May 9, 2019

You tried it with a big parallel calculation?

@ChristopherHogan (Contributor, author)

I tried it with 4 processors on the binary_grating_n2f example which spent around 350 seconds in get_farfields_array, both with and without the all_wait().

@stevengj (Collaborator)

I was more worried about the scaling for a large number of processors (~100).

@ChristopherHogan (Contributor, author)

Can I just crank the resolution in binary_grating_n2f.py, or does it need to be modified in some other way to make a realistic example for 100 procs?

@oskooi (Collaborator) commented May 10, 2019

To test for a large number of processors, you can increase three parameters simultaneously: resolution, nfreq, and nperiods. (Also, you'll probably just want to run the last of the three runs involving the supercell.)

@stevengj stevengj merged commit 3478320 into NanoComp:master May 14, 2019
@ChristopherHogan ChristopherHogan deleted the chogan/n2f_s/point branch May 14, 2019 19:19
stevengj added a commit that referenced this pull request Dec 27, 2019
bencbartlett pushed a commit to bencbartlett/meep that referenced this pull request Sep 9, 2021
* Output seconds/point in get_farfields_array progress

* Only evaluate wall_time() once

* Add all_wait for consistent output, and flush python stdout from C
3 participants