Add quadratic layers and enhance ICNNs, update tutorial #477

Merged
merged 30 commits from fix_bug_quad_layer into ott-jax:main on Dec 20, 2023

Conversation

nvesseron
Contributor

As discussed during the hackathon, we updated the code of the quadratic layers and fixed the bug described in issue #463. We added these layers to the ICNN class and updated the neural_dual tutorial.
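
For orientation, a minimal usage sketch of an ICNN with the quadratic layers described above; the argument names (`dim_data`, `dim_hidden`, `ranks`) are assumptions inferred from this discussion rather than a verbatim excerpt of the merged API.

```python
# Hypothetical usage sketch; argument names are assumptions, not verified
# against the merged code.
import jax
import jax.numpy as jnp
from ott.neural import models

icnn = models.ICNN(
    dim_data=2,           # dimension of the input samples
    dim_hidden=[64, 64],  # widths of the hidden layers
    ranks=3,              # assumed: rank of the low-rank factor in the quadratic layers
)
params = icnn.init(jax.random.PRNGKey(0), jnp.ones((2,)))
```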

Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.

@MUCDK
Contributor

MUCDK commented Dec 1, 2023

Amazing, Nina! @michalk8 can we wait for this PR, and then update the ICNN so that it inherits from the new base classes?

@michalk8 michalk8 added the enhancement New feature or request label Dec 1, 2023

codecov bot commented Dec 1, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (ef3b544) 90.43% compared to head (b702b27) 90.53%.
Report is 1 commit behind head on main.

❗ Current head b702b27 differs from the pull request's most recent head 20bfa12. Consider uploading reports for commit 20bfa12 to get more accurate results.

Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #477      +/-   ##
==========================================
+ Coverage   90.43%   90.53%   +0.09%     
==========================================
  Files          60       60              
  Lines        6546     6551       +5     
  Branches      930      933       +3     
==========================================
+ Hits         5920     5931      +11     
+ Misses        480      472       -8     
- Partials      146      148       +2     
Files                                    Coverage          Δ
src/ott/neural/solvers/neuraldual.py     60.49% <50.00%>   (+2.70%) ⬆️
src/ott/problems/linear/potentials.py    91.66% <66.66%>   (-0.55%) ⬇️
src/ott/neural/models.py                 94.36% <93.18%>   (-0.44%) ⬇️
src/ott/neural/layers.py                 91.30% <90.00%>   (+2.62%) ⬆️

@marcocuturi marcocuturi (Contributor) left a comment

Thanks Nina, starting with a minor review.

I am wondering whether we might want to provide more flexibility when creating the ICNN: not only a sequence of dimensions, but also a sequence of what you called rank (i.e., the rank of the linear operator that is "squared" to obtain a PSD operator on top of the diagonal).

I would also suggest calling rank something a bit different (it is not the rank of the PSD operator, since that takes the diagonal element into account as well); same for d_i.

Maybe:
d_i -> diagonal
A_i -> matrix or lr_matrix
rank -> rank_matrix or rank_lr_matrix
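
For context, here is a minimal sketch of the parameterization discussed above, i.e. a quadratic potential 0.5 (x - b)^T (A A^T + diag(d^2)) (x - b), which is PSD by construction. The class and parameter names are illustrative, not the library's, and the layer merged in this PR may differ.

```python
# Illustrative sketch only (names are hypothetical, not the merged layer).
import flax.linen as nn
import jax
import jax.numpy as jnp


class QuadraticPotential(nn.Module):
  dim_data: int  # dimension of the input x
  rank: int      # rank of the low-rank factor A (not of the full PSD operator)

  @nn.compact
  def __call__(self, x: jnp.ndarray) -> jnp.ndarray:
    A = self.param("A", nn.initializers.normal(0.1), (self.dim_data, self.rank))
    d = self.param("d", nn.initializers.ones, (self.dim_data,))
    b = self.param("b", nn.initializers.zeros, (self.dim_data,))
    z = x - b
    # 0.5 * z^T (A A^T + diag(d**2)) z, computed without forming the full matrix.
    low_rank = jnp.sum((z @ A) ** 2, axis=-1)
    diagonal = jnp.sum((d * z) ** 2, axis=-1)
    return 0.5 * (low_rank + diagonal)


# Example: initialize and evaluate on a batch of 2-D points.
layer = QuadraticPotential(dim_data=2, rank=1)
x = jnp.ones((4, 2))
params = layer.init(jax.random.PRNGKey(0), x)
values = layer.apply(params, x)  # shape (4,)
```

Note that `rank` in this sketch is only the rank of the low-rank factor A; once the diagonal term is added, the full operator A A^T + diag(d^2) is generically full rank, which is exactly the naming concern raised in the comment.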

Resolved review threads on: src/ott/neural/layers.py, src/ott/neural/models.py, src/ott/problems/linear/potentials.py
@michalk8 michalk8 requested a review from marcocuturi December 19, 2023 15:58
@marcocuturi marcocuturi (Contributor) left a comment

Thanks so much Michal! Minor comments, LGTM.

precision: Any = None
kernel_init: Optional[Callable[[PRNGKey, Shape, Dtype], Array]] = None,
bias_init: Callable[[PRNGKey, Shape, Dtype], Array] = nn.initializers.zeros
kernel_init: Callable[[PRNGKey, Shape, Dtype], Array] = DEFAULT_KERNEL_INIT
Contributor:

I think it is counterproductive to use a default initializer with symmetric values for this layer: here, half of the entries will be below 0, and their gradients will likely vanish quite quickly (see e.g. https://openreview.net/pdf?id=pWZ97hUQtQ). Although I am not sure what we could use, initializing by default with the absolute value of the default initializer seems more appropriate. Another legitimate option would be to normalize any kernel matrix so that its row values sum to 1.

Collaborator:

How about, for simplicity, a truncated normal with low=0.0?
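
For concreteness, a minimal sketch of the kind of non-negative initializer suggested here, i.e. a normal truncated to [0, +∞); the function name and scale are illustrative and not what was merged.

```python
# Illustrative sketch of a non-negative kernel initializer (hypothetical name).
import jax
import jax.numpy as jnp


def nonneg_truncated_normal(stddev: float = 0.1):
  """Returns an init fn sampling from a normal truncated to [0, 2 * stddev]."""

  def init(key, shape, dtype=jnp.float32):
    # jax.random.truncated_normal samples in [lower, upper] standard deviations.
    return stddev * jax.random.truncated_normal(key, 0.0, 2.0, shape, dtype)

  return init
```

Such a function could then be plugged in as `kernel_init` for the positive-weight dense layers.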

Several resolved review threads on src/ott/neural/layers.py
z = self.act_fn(z)
z += self.pos_def_potential(x)
return z.squeeze()
source, target = self.gaussian_map_samples
Contributor:

Something I am a bit unsure about: in the current setup, the Gaussian quadratic potential (i.e., just one) is added at the very end, regardless of all the rest. Is there a way to ensure its scale is therefore reasonable / meaningful / large enough compared to the rest of the initialization?

Collaborator:

Good point, will check #90 for answers
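
As one illustrative option for the scale question (not what this PR implements), the quadratic potential could be weighted by a learnable non-negative scalar, which preserves convexity of the sum; the names below are hypothetical.

```python
# Illustrative only; class and attribute names are hypothetical, and this is
# not what the PR implements.
import flax.linen as nn
import jax.numpy as jnp


class ScaledPotentialSum(nn.Module):
  icnn_body: nn.Module       # hypothetical: the ICNN hidden-layer stack
  quad_potential: nn.Module  # hypothetical: the Gaussian/PSD quadratic term

  @nn.compact
  def __call__(self, x: jnp.ndarray) -> jnp.ndarray:
    alpha = self.param("alpha", nn.initializers.ones, ())
    # softplus keeps the weight positive, so adding the (convex) quadratic
    # potential with this weight preserves convexity of the sum.
    return self.icnn_body(x) + nn.softplus(alpha) * self.quad_potential(x)
```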

@michalk8 (Collaborator):

> I am wondering whether we might want to provide more flexibility when creating the ICNN

Added this.

@michalk8 (Collaborator):

I will need to regenerate the notebooks and will merge after that. Thanks @nvesseron for the additions and @marcocuturi for the review.

@michalk8 michalk8 self-requested a review December 20, 2023 18:39
@michalk8 michalk8 (Collaborator) left a comment

Merging now; let's move the discussion about initialization to another issue/PR.

@michalk8 michalk8 merged commit b6cf039 into ott-jax:main Dec 20, 2023
4 of 11 checks passed
michalk8 added a commit that referenced this pull request Jun 27, 2024
* fix a bug when bias is False

* update the PosDefPotentials class

* added icnn adjustments

* neuraldual fix freeze weights

* use relu by default as activation function and rectifier_fn

* updates

* Update neural layers

* Clean ICNN impl.

* Revert changes in the potentials

* Fix D102

* Fix indentation

* Remove `;`

* Use tensordot

* Update docs

* First rounds of test fixing

* Fix rest of the tests

* Revert assertion

* Polish more docs

* Fix docs linter

* Fix links in neuraldual notebook

* Fix links in the rest of the neural docs

* Update docs

* Allow ranks to be a tuple

* Remove note

* Fix MetaMLP

* Rerun neural notebooks

* Fix rendering

---------

Co-authored-by: lucaeyring <[email protected]>
Co-authored-by: Michal Klein <[email protected]>
@nvesseron nvesseron deleted the fix_bug_quad_layer branch August 4, 2024 16:30
Labels: enhancement (New feature or request)

5 participants