Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance #514

Merged
merged 4 commits into from
Mar 31, 2016
Merged

Performance #514

merged 4 commits into from
Mar 31, 2016

Conversation

yeganer
Copy link
Contributor

@yeganer yeganer commented Mar 8, 2016

This is a small collection of performance boosts and improvement of code readability.

Included changes:

  • remove confusion from line_scatter by setting the next line only when it is known
  • change memory layout of line_list_tau_sobolevs and transition_probabilities so the montecarlo routine can interprete it as no_of_shells x no_of_lines
  • Fix for Optimization: cmontecarlo.c:compute_distance2boundary restructure logic #497: distance2boundary is more structured now and has no overhead anymore
  • tardis_example.yml
  • tardis_w7.yml ~20-25%
  • abn_tom_test.yml ~31%

yeganer added 3 commits March 8, 2016 14:58
Previously rpacket_set_next_line_id() was called at the beginning of the
function and later get_next_line_id() - 1 was used to get the current
id. This was changed and set_next_line_id() is now called at the right
position.
The option '--gdb' will print the pid and pause before running the simulation.
This allows attaching gdb to the running process to faciliate debugging.
Previously accessing two back-to-back elements of a two-dimensional
array resulted in two completely different memmory locations. This resulted in
memory bandwith being the bottleneck for these calculations.

This change converts the arrays to a layout beneficial for our calculations
before passing them to the C extension. The indexing of those arrays is changed
appropriately.
The result of the j_blue_estimator has to be converted back to the format
python expects.
Additionally the j_blue_estimators will only be calculated if the corresponding
array exists. This allows setting line_lists_j_blues to point to Null in order
to skip this part if it is not needed.
@yeganer
Copy link
Contributor Author

yeganer commented Mar 9, 2016

@wkerzendorf @unoebauer I tested this setup with tardis_example.yml and tardis_w7.yml. In with both setups I got identical results compared to the current master (diff ref.txt performance.txt reported nothing at least) and the tests ran fine.

The speedup I measured was ~20-25%.
I think restructuring the way the distances are calculated could yield another 25% or even more but that change would be less straightforward.

Can you please point me to other setups I can run to ensure I didn't implement any nasty bugs?

If I'm not mistaken all I have to do is ensure, that both macroatom and downbranch return the same results, right? Other parameters have no influence on the MC simulation.

@unoebauer
Copy link
Contributor

Hi @yeganer - having a quick at the changes, it looks to me as if the sequence of RNG calls is not altered. To double check that the optimization tweaks do not interfere with the implemented physics, I'd suggest that you do a bit comparison of the spectra before and after the optimization changes for

  • tardis_example
  • tardis_w7
  • abn_tom_test

@wkerzendorf
Copy link
Member

@unoebauer - are they in the TEP?

@unoebauer
Copy link
Contributor

@wkerzendorf yes (apart from tardis_example)

The calculation that needs to be performed is a simple trigonomic
calculation with two possible results. For performance reason one can
introduce a previous check. For a detailed discussion see tardis-sn#497

This results in the following sheme:
 - For mu > 0 (outward propagation) the next boundary ist the outer
   boundary (case 1)
 - For mu < 0 we have to decide whether we hit the inner boundary or
   not and set the distance accordingly (case 2&3)

Additionally branch prediction optimization was performed in
distance2line. The compiler always assumes if statements to be true,
which means the default branch should always the one the if statement
checks for. This slightly reduces instruction misses.

Resolves: tardis-sn#497
@yeganer
Copy link
Contributor Author

yeganer commented Mar 10, 2016

@unoebauer @wkerzendorf @orbitfold I updated this PR and I'm not seeing any difference between spectra. np.all(spec_ref == spec_pr) : True
Can you have a look please and merge once you are fine with the changes.

@@ -36,6 +36,8 @@ parser.add_argument('--profile', action='store_true', help=
parser.add_argument('--profiler_log_file', default='profiler.log', help=
'name of the profiler output file')

parser.add_argument('--gdb', action='store_true', help='print pid and pause')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need this - or is this a debug statement from you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a debug statement I added specifically to enable debugging. If we don't want that capanility I can remove that commit but I think it's a handy option.

I wasn't able to start tardis with gdb to debug the C code, so I added it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's fine - just wanted to know.

@yeganer
Copy link
Contributor Author

yeganer commented Mar 11, 2016

@unoebauer @wkerzendorf @chvogl @orbitfold I rerun my comparison with this commit and the previous one(with recently_crossed_boundary missing) and both gave me spectra which were bit equal for all three configurations.

With that Information I think it is save to assume that I didn't change any logic. As for the Issue with packets potentially having a discriminant < 0 which would result in the packet missing the outer boundary. A quick calculation showed that mu < sqrt( 2 e_m + e_m**2) where e_m is the machine precision. This results in a probability of roughly 1e-9. In addition to the exact angle the packet needs a position of r=r_outer + e_m. We can safely assume this because the maximum error done during move_packet is the machine precision.

@wkerzendorf
Copy link
Member

@tardis-sn/tardis-core are we happy to merge this?

@yeganer
Copy link
Contributor Author

yeganer commented Mar 30, 2016

@wkerzendorf I'm happy to merge. My local branch has recently_crossed_boundary removed and all tests are passing. Should I update the PR again?

@wkerzendorf
Copy link
Member

@yeganer I'm happy to merge this! I need one more confirmation from @tardis-sn/tardis-core

@orbitfold
Copy link
Contributor

I'm fine as well
On 31 Mar 2016 11:36, "Wolfgang Kerzendorf" [email protected]
wrote:

@yeganer https://github.com/yeganer I'm happy to merge this! I need one
more confirmation from @tardis-sn/tardis-core
https://github.com/orgs/tardis-sn/teams/tardis-core


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#514 (comment)

@wkerzendorf wkerzendorf merged commit d5cfae2 into tardis-sn:master Mar 31, 2016
kdexd pushed a commit to kdexd/tardis that referenced this pull request Mar 31, 2016
kdexd pushed a commit to kdexd/tardis that referenced this pull request Apr 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants