Performance improvements for chirho.robust #459

Conversation
Nice detective work!

I can see why 2 and 3 would cause slowdowns, but 1 is still a bit mysterious to me even after spending some time yesterday playing around with the code and tests. There are a couple of places where a naive implementation of `vmap` could introduce unnecessarily repeated computation, but I made some smaller changes that I thought would work around those, and they didn't seem to make a difference.

At any rate, whatever is really going on, you've convinced me that it would be a good idea to replace all uses of `torch.func.vmap` in `chirho.robust` for batching over datapoints with a manually vectorized version of `NMCLogPredictiveLikelihood` that generalizes your `PointLogPredictiveLikelihood`. I'll create a separate issue for this and assign it to myself.

In the meantime, can we break this into smaller PRs that address the issues independently, starting with 2 and 3? I have some comments on API changes here, but I think it would be easiest to address them in more isolated contexts.
```python
log_prob_params = {"loc": loc}
num_monte_carlo = 10000
start_time = time.time()
data = Predictive(GaussianModel(cov_mat), num_samples=num_monte_carlo)(loc)
```
This data generation step is slow because it's done sequentially. Passing `parallel=True` to `Predictive` (which we do in `linearize`) speeds it up.
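For example, here is a minimal self-contained sketch of the two modes (a toy model, not the test's `GaussianModel`; note that `parallel=True` requires the model's batch dimensions to be declared so an extra sample dimension broadcasts correctly):

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import Predictive

def model(loc):
    # to_event(1) declares the dimensions of `loc` as event dimensions,
    # so an extra leftmost sample dimension broadcasts cleanly.
    return pyro.sample("x", dist.Normal(loc, 1.0).to_event(1))

loc = torch.zeros(2)

# Sequential (default): runs the model once per sample in a Python loop.
slow = Predictive(model, num_samples=1000)(loc)

# Vectorized: a single model run with a broadcasted sample dimension.
fast = Predictive(model, num_samples=1000, parallel=True)(loc)
```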
Got it, thanks!
```python
guide()

start_time = time.time()
data = Predictive(
```
Ditto: this is slow because it's not being vectorized. Passing `parallel=True` should speed it up.
Makes sense, thanks!

Yup, I'll do that now!

This PR will be broken up into 3 separate issues, so I will close it.
There are three main performance improvements:

**1. Optimization when the guide is a point estimate**

If the guide is only a point estimate, using `NMCLogPredictiveLikelihood`, which relies on `torch.func.vmap` to batch over multiple datapoints, can be orders of magnitude slower than manually batching the likelihood. In `test_empirical_fisher_vp_performance_with_likelihood` in `test_performance.py`, we see that `NMCLogPredictiveLikelihood` can be 200x slower than manually vectorizing the likelihood using the new class `PointLogPredictiveLikelihood`. A sketch of the two batching strategies is shown below.
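The following toy comparison is illustrative only (a unit Normal likelihood with made-up shapes, not the PR's actual models); it shows the two strategies computing the same per-datapoint log-likelihoods, with the manual broadcast being what `PointLogPredictiveLikelihood` does in spirit:

```python
import torch
import torch.distributions as dist

loc = torch.randn(3)           # point estimate of the parameters
data = torch.randn(10000, 3)   # N datapoints to score

# vmap-based batching: the likelihood is evaluated once per datapoint.
vmapped = torch.func.vmap(
    lambda x: dist.Normal(loc, 1.0).log_prob(x).sum()
)(data)

# Manual batching: a single broadcasted log_prob call over all datapoints.
manual = dist.Normal(loc, 1.0).log_prob(data).sum(-1)

assert torch.allclose(vmapped, manual)
```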
**2. Manually batching in `_flat_conjugate_gradient_solve`**

One big performance hit was using `torch.func.vmap` to batch over multiple conjugate gradient solves. Since `torch.func.vmap` does not allow conditional if/else statements, the conjugate gradient solver cannot terminate early when the error tolerance is met. In this PR, we batch the conjugate gradient solver without using `torch.func.vmap` so that we can use conditional statements. In one experiment, I was able to run only 23 conjugate gradient steps instead of a few hundred to reach the error tolerance. This behavior is also exploited here: https://arxiv.org/abs/1903.08114. A sketch of the early-exit pattern is shown below.
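Here is a minimal sketch of the idea (not the PR's actual `_flat_conjugate_gradient_solve`; shapes and tolerances are illustrative). Because the batch dimension is handled by ordinary broadcasting rather than `torch.func.vmap`, the Python loop can break as soon as every solve in the batch meets the tolerance:

```python
import torch

def batched_cg(matvec, b, tol=1e-6, max_steps=1000):
    """Solve A x = b for a batch of right-hand sides b of shape (batch, d).

    `matvec` maps (batch, d) -> (batch, d) via broadcasting.
    """
    x = torch.zeros_like(b)
    r = b - matvec(x)            # residuals, shape (batch, d)
    p = r.clone()
    rs = (r * r).sum(dim=-1)     # squared residual norms, shape (batch,)
    for _ in range(max_steps):
        Ap = matvec(p)
        alpha = rs / (p * Ap).sum(dim=-1)
        x = x + alpha.unsqueeze(-1) * p
        r = r - alpha.unsqueeze(-1) * Ap
        rs_new = (r * r).sum(dim=-1)
        # Data-dependent early exit -- exactly what torch.func.vmap forbids.
        if rs_new.max().sqrt() < tol:
            break
        p = r + (rs_new / rs).unsqueeze(-1) * p
        rs = rs_new
    return x

# Example: a shared SPD matrix with a batch of 5 right-hand sides.
A = torch.eye(4) + 0.1 * torch.ones(4, 4)
b = torch.randn(5, 4)
x = batched_cg(lambda v: v @ A.T, b)
assert torch.allclose(x @ A.T, b, atol=1e-4)
```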
**3. Batching over multiple points in `linearize` and `influence_fn`**

As shown in `test_performance.py`, simulating synthetic data from the model can be the most time-consuming step. In the previous implementation, data was simulated each time the influence function was evaluated at a single point. In this PR, we evaluate the influence function over multiple points so that we only need to simulate data once (a toy illustration of this simulate-once pattern appears after this description).

Closes #451
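To illustrate the simulate-once pattern with a self-contained toy, here `simulate` and `statistic` are hypothetical stand-ins for drawing Monte Carlo samples from the model and evaluating the influence function at a batch of points; neither is chirho.robust's actual API:

```python
import torch

def simulate(num_samples=10000, dim=3):
    # Stand-in for the expensive step: simulating data from the model.
    return torch.randn(num_samples, dim)

def statistic(points, samples):
    # Stand-in for a batched influence-function evaluation:
    # points (P, d), samples (S, d) -> (P,) via broadcasting.
    return ((points.unsqueeze(1) - samples) ** 2).mean(dim=(1, 2))

points = torch.randn(50, 3)

# Previous pattern: one fresh simulation per evaluation point (50x the cost).
slow = torch.stack([statistic(p.unsqueeze(0), simulate())[0] for p in points])

# This PR's pattern: simulate once, evaluate all points in one batched call.
samples = simulate()
fast = statistic(points, samples)
```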