Batch `apply_lse_kernel` for `online=True` #23

michalk8 · 2022-03-02T14:31:04Z

As discussed in #20, this PR fixes `online` by batching `apply_lse_kernel` and running the fully vectorized computation on a batch of shape `n * batch_size` or `m * batch_size` (depending on the axis) instead of `n * m`. A minor inefficiency is that this approach computes the kernel application for an extra `{n,m} % batch_size` points, because the result of each `jax.lax.scan` iteration must have the same shape.
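The batching idea above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function names `lse_rows` and `lse_rows_batched`, the squared-Euclidean cost, and the zero-padding strategy are all assumptions made for the example. It pads the rows up to a multiple of `batch_size` so that every `jax.lax.scan` iteration returns the same shape, then drops the padded rows.

```python
import math

import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp


def lse_rows(x, f, y, g, eps):
    """Dense reference: materializes the full (len(x), len(y)) cost matrix."""
    # Squared Euclidean cost is an assumption made for this sketch.
    cost = jnp.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    z = (f[:, None] + g[None, :] - cost) / eps
    return eps * logsumexp(z, axis=1)


def lse_rows_batched(x, f, y, g, eps, batch_size):
    """Same result, but only a (batch_size, m) block exists at any time."""
    n = x.shape[0]
    n_batches = math.ceil(n / batch_size)  # static Python int, so jit-friendly
    pad = n_batches * batch_size - n       # extra rows, computed then discarded

    # Pad so every scan iteration sees the same shape, then split into batches.
    x_b = jnp.pad(x, ((0, pad), (0, 0))).reshape(n_batches, batch_size, -1)
    f_b = jnp.pad(f, (0, pad)).reshape(n_batches, batch_size)

    def body(carry, batch):
        xb, fb = batch
        # No state is carried between batches; each one is independent.
        return carry, lse_rows(xb, fb, y, g, eps)

    _, out = jax.lax.scan(body, None, (x_b, f_b))
    return out.reshape(-1)[:n]  # drop the padded rows
```

Because `batch_size` determines array shapes, it would need to be marked static when jitting, e.g. `jax.jit(lse_rows_batched, static_argnames="batch_size")`.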

@LaetitiaPapaxanthos there are 2 points I am unsure how you want to handle:

1. Should backward compatibility be retained by still allowing `online=True`? In that case we'd need a default batch size, which can either be fixed or depend on the number of points (in our benchmarks, a value of 1024 seems to work well). UPDATE: `online=True` is the same as `online=1024`.
2. ~~Currently, the batch size is the same for `axis=0` and `axis=1`, but this could be made axis-specific.~~ UPDATE: kept the same batch size for both axes.
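The backward-compatible handling of `online` described in the first point could look roughly like the sketch below. The helper name `resolve_batch_size` and the constant `DEFAULT_BATCH_SIZE` are hypothetical and only illustrate the mapping of `online=True` to a batch size of 1024; the actual validation in the PR may differ.

```python
DEFAULT_BATCH_SIZE = 1024  # value reported to work well in the benchmarks


def resolve_batch_size(online):
    """Map `online` to a batch size, or None for the dense (offline) path."""
    # bool is a subclass of int, so it must be checked before the int branch.
    if isinstance(online, bool):
        return DEFAULT_BATCH_SIZE if online else None
    if online is None:
        return None
    if not isinstance(online, int) or online <= 0:
        raise ValueError(
            f"`online` must be a bool or a positive int, got {online!r}."
        )
    return online
```

Checking `bool` first matters here: otherwise `online=True` would be read as a batch size of 1.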

TODOs:

- add new tests (and, depending on the 1st point above, possibly adjust old tests where `online=True`)
- update docs

closes #20

michalk8 · 2022-03-02T18:28:58Z

There was a bug that caused the coupling/marginals not to match `online=False` (neither the tests here nor running the tutorial notebook caught it for the particular values of `online`). It should be fixed now: I added a test for this plus 2 more for jitting. Tests pass locally (but it would be great to enable CI on PRs).
Performance-wise, it should even be slightly faster, since `apply_lse_kernel` doesn't do any extra work (985.29s for a coupling of shape (131072, 65536) on an Nvidia A100). As for corner cases, there should now be tests for them (batch size of 1, n, m and some prime number).

marcocuturi · 2022-03-02T19:13:45Z

Thanks a lot Michal! this is fantastic.

michalk8 added 12 commits February 26, 2022 21:12

Explicitly batch in apply_lse when online=True

dc419f8

Try lax.scan

71fe848

Fix not being traceable

15a326c

Fix bug - wrong indexing

19679a0

Use shape attribute

2e19fe2

More robustly check batch size

c1c63f2

Use math.ceil to allow jitting

406459c

Fix flatten/unflatten in vmap

2348da5

Correctly fix setting batch size

8dd1262

Fix bug where online didn't match offline

d5bc39d

Add bwcompat online=True using batch size of 1024

813f371

Fix typo in Pointcloud, add tests

309ecfb

marcocuturi merged commit 0909dfe into ott-jax:master Mar 2, 2022

michalk8 pushed a commit that referenced this pull request Jun 27, 2024

Merge pull request #23 from michalk8/feature/apply-lse-batch

f4fceab
