Add quadratic layers and enhance ICNNs, update tutorial #477
Conversation
Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.
Amazing Nina! @michalk8 can we wait with this PR, and then update the ICNN such that it inherits from the new base classes?
Codecov Report

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #477      +/-   ##
==========================================
+ Coverage   90.43%   90.53%   +0.09%
==========================================
  Files          60       60
  Lines        6546     6551       +5
  Branches      930      933       +3
==========================================
+ Hits         5920     5931      +11
+ Misses        480      472       -8
- Partials      146      148       +2
thanks Nina, starting with a minor review.
I am wondering whether we may want to provide more flexibility when creating the ICNN: not only a sequence of dimensions, but also a sequence of what you called rank (i.e. the rank of the linear operator that is "squared" to obtain a PSD operator on top of the diagonal). I would also suggest calling rank something a bit different (it is not the rank of the PSD operator, since that takes the diagonal element into account as well); same for d_i. Maybe:
d_i -> diagonal
A_i -> matrix or lr_matrix
rank -> rank_matrix or rank_lr_matrix
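For readers following along, here is a minimal, purely illustrative Flax sketch of the parameterization discussed above: a PSD operator built as a diagonal term plus a low-rank factor "squared", i.e. D + A Aᵀ. This is not the PR's implementation; the module name and the `rank`, `diag_init`, `matrix_init` parameters, as well as the softplus constraint on the diagonal, are assumptions made for the example.

```python
# Illustrative sketch only (not the PR's implementation): a quadratic layer
# whose PSD operator is parameterized as D + A A^T, with D a non-negative
# diagonal and A a (dim x rank) low-rank factor -- "rank" as discussed above.
from typing import Callable

import jax
import jax.numpy as jnp
import flax.linen as nn


class QuadraticPotentialSketch(nn.Module):
  """Evaluates 0.5 * x^T (D + A A^T) x for a batch of points x."""
  dim: int    # input dimension
  rank: int   # width of the low-rank factor A
  diag_init: Callable = nn.initializers.ones
  matrix_init: Callable = nn.initializers.lecun_normal()

  @nn.compact
  def __call__(self, x: jnp.ndarray) -> jnp.ndarray:
    # Non-negative diagonal part D (softplus keeps entries positive).
    raw_diag = self.param("raw_diag", self.diag_init, (self.dim,))
    diag = jax.nn.softplus(raw_diag)
    # Low-rank factor A; A @ A.T is PSD by construction.
    a = self.param("low_rank", self.matrix_init, (self.dim, self.rank))
    quad_diag = 0.5 * jnp.sum(diag * x ** 2, axis=-1)
    quad_lr = 0.5 * jnp.sum((x @ a) ** 2, axis=-1)
    return quad_diag + quad_lr


# Usage: potentials for 8 points in R^5, with a rank-2 low-rank factor.
key = jax.random.PRNGKey(0)
layer = QuadraticPotentialSketch(dim=5, rank=2)
x = jax.random.normal(key, (8, 5))
params = layer.init(key, x)
values = layer.apply(params, x)  # shape (8,)
```

Under such a parameterization, rank only controls the width of A, not the rank of the full operator D + A Aᵀ, which is the distinction the renaming suggestion above is getting at.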
thanks so much Michal! minor comments, LGTM
precision: Any = None
kernel_init: Optional[Callable[[PRNGKey, Shape, Dtype], Array]] = None,
bias_init: Callable[[PRNGKey, Shape, Dtype], Array] = nn.initializers.zeros
kernel_init: Callable[[PRNGKey, Shape, Dtype], Array] = DEFAULT_KERNEL_INIT
I think it is counterproductive to use, for this layer, a default initializer whose values are symmetric around 0. Here this will result in half of the entries being below 0, and their gradients will likely vanish quite quickly. See e.g. https://openreview.net/pdf?id=pWZ97hUQtQ. Although I am not sure what we could use, initializing by default with the absolute value of the default initializer seems more appropriate. Another legitimate option would be to normalize any kernel matrix so that its row values sum to 1.
How about, for simplicity, a truncated normal with low=0.0?
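To make the options above concrete, here is a hedged sketch (assumed names and values, not the initializer the PR finally adopts) of two non-negative kernel initializers in JAX: the absolute value of the default LeCun-normal initializer, and a normal truncated to [0, ∞).

```python
# Illustrative non-negative kernel initializers (sketches, not the PR's final
# choice): (1) absolute value of the default LeCun-normal initializer,
# (2) a normal truncated to non-negative values.
import jax
import jax.numpy as jnp
import flax.linen as nn


def abs_lecun_normal(key, shape, dtype=jnp.float32):
  """Absolute value of a standard (LeCun-normal) initializer."""
  return jnp.abs(nn.initializers.lecun_normal()(key, shape, dtype))


def nonneg_truncated_normal(stddev: float = 0.1):
  """Truncated normal with lower bound 0.0, as suggested above."""

  def init(key, shape, dtype=jnp.float32):
    # jax.random.truncated_normal samples a standard normal restricted to
    # [lower, upper]; rescale by stddev afterwards.
    samples = jax.random.truncated_normal(
        key, lower=0.0, upper=10.0, shape=shape, dtype=dtype
    )
    return stddev * samples

  return init


key = jax.random.PRNGKey(0)
w1 = abs_lecun_normal(key, (4, 3))
w2 = nonneg_truncated_normal()(key, (4, 3))
assert bool((w1 >= 0).all()) and bool((w2 >= 0).all())
```

The third option mentioned above, normalizing each kernel row to sum to 1, could be built the same way by dividing any non-negative draw by its row sums.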
z = self.act_fn(z)
z += self.pos_def_potential(x)
return z.squeeze()
source, target = self.gaussian_map_samples
Something I am a bit unsure about: in the current setup, the Gaussian quadratic potential (i.e. just one) is added at the very end, regardless of all the rest. Is there a way to ensure its scale is reasonable / meaningful / large enough compared to the rest of the initialization?
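One illustrative way to address this concern (purely an assumption for the sake of example, not what the PR implements) would be to blend the network term and the quadratic potential with a learnable weight so that neither dominates at initialization:

```python
# Sketch only (an assumption, not the PR's implementation): blend the ICNN
# output and the quadratic potential with a learnable weight so that neither
# term dominates at initialization.
from typing import Callable

import jax
import jax.numpy as jnp
import flax.linen as nn


class BlendedPotentialSketch(nn.Module):
  icnn_part: Callable[[jnp.ndarray], jnp.ndarray]       # e.g. the ICNN body
  quadratic_part: Callable[[jnp.ndarray], jnp.ndarray]  # e.g. the quadratic potential

  @nn.compact
  def __call__(self, x: jnp.ndarray) -> jnp.ndarray:
    # sigmoid(0) = 0.5, so both terms start with equal weight.
    logit = self.param("blend_logit", nn.initializers.zeros, ())
    alpha = jax.nn.sigmoid(logit)
    return alpha * self.icnn_part(x) + (1.0 - alpha) * self.quadratic_part(x)


# Usage with toy potentials standing in for the real modules.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (8, 3))
model = BlendedPotentialSketch(
    icnn_part=lambda v: jnp.sum(jax.nn.relu(v), axis=-1),
    quadratic_part=lambda v: 0.5 * jnp.sum(v ** 2, axis=-1),
)
params = model.init(key, x)
values = model.apply(params, x)  # shape (8,)
```

Starting the blend at 0.5 gives both terms equal weight at initialization; rescaling the quadratic potential by a learned positive scalar would be a similar alternative.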
Good point, will check #90 for answers
Added this
I will need to regenerate the notebooks, will merge after, thanks @nvesseron for the additions and @marcocuturi for the review.
Merging now, let's move the discussion on initialization to another issue/PR.
* fix a bug when bias is False
* update the PosDefPotentials class
* added icnn adjustments
* neuraldual fix freeze weights
* use relu by default as activation function and rectifier_fn
* updates
* Update neural layers
* Clean ICNN impl.
* Revert changes in the potentials
* Fix D102
* Fix indentation
* Remove `;`
* Use tensordot
* Update docs
* First round of test fixing
* Fix rest of the tests
* Revert assertion
* Polish more docs
* Fix docs linter
* Fix links in neuraldual notebook
* Fix links in the rest of the neural docs
* Update docs
* Allow ranks to be a tuple
* Remove note
* Fix MetaMLP
* Rerun neural notebooks
* Fix rendering

Co-authored-by: lucaeyring <[email protected]>
Co-authored-by: Michal Klein <[email protected]>
As discussed during the hackathon, we updated the code of the quadratic layers and fixed the bug described in issue #463. We added these layers to the ICNN class and updated the neural_dual tutorial.