
In att_flow_layer of bidaf model #25

Open
FayeXXX opened this issue Feb 28, 2020 · 1 comment

Comments


FayeXXX commented Feb 28, 2020

Hi, I have just started learning QA models, and thank you so much for sharing this.
I noticed that the attention you implement is slightly different from the original paper:
on line 141 of model.py
s = self.att_weight_c(c).expand(-1, -1, q_len) + \
    self.att_weight_q(q).permute(0, 2, 1).expand(-1, c_len, -1) + \
    cq
However, the paper uses [h; u; h ◦ u], which is 6d after concatenation and differs from your formulation above.
Does it make a difference?

@kushalj001

This implementation has broken up the learnable parameters. Instead of applying a single 6d trainable weight to the concatenation of the three tensors [h; u; h ◦ u], it uses three separate 2d trainable weights.
I am not sure how much of a difference this makes. I am currently reimplementing the paper using the 6d approach.
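For a single (h, u) pair the two forms are in fact mathematically identical: a dot product with a 6d weight on [h; u; h ◦ u] distributes into three dot products with the 2d slices of that weight. Here is a minimal NumPy sketch of that identity (the dimension `d2` and the random vectors are illustrative, not taken from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)
d2 = 4                          # stands in for the model's 2d hidden size
h = rng.standard_normal(d2)     # one context vector
u = rng.standard_normal(d2)     # one query vector

# Paper form: one 6d weight applied to the concatenation [h; u; h*u]
w = rng.standard_normal(3 * d2)
s_paper = w @ np.concatenate([h, u, h * u])

# Split form: three 2d weights (slices of w), as in att_weight_c /
# att_weight_q / the cq term of the implementation
w_c, w_q, w_cq = w[:d2], w[d2:2 * d2], w[2 * d2:]
s_split = w_c @ h + w_q @ u + w_cq @ (h * u)

assert np.isclose(s_paper, s_split)
```

So any difference between the two implementations should come from initialization or broadcasting details, not from the expressiveness of the similarity function itself.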
