Hi, I have just started learning QA models, and thank you so much for sharing this.
I found that the attention you wrote is a little different from the original paper.
On line 141 of model.py:

```python
s = self.att_weight_c(c).expand(-1, -1, q_len) + \
    self.att_weight_q(q).permute(0, 2, 1).expand(-1, c_len, -1) + \
    cq
```

However, the paper uses [h; u; h ∘ u], which is 6d after concatenation, and that looks different from your computation above.
Does it make a difference?
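
To make the question concrete, the two forms side by side (with $\mathbf{h}_i$ a context vector and $\mathbf{u}_j$ a query vector, each of dimension $2d$):

Paper: $S_{ij} = \mathbf{w}^{\top}[\mathbf{h}_i;\ \mathbf{u}_j;\ \mathbf{h}_i \circ \mathbf{u}_j]$, with a single weight $\mathbf{w} \in \mathbb{R}^{6d}$.

This code: $S_{ij} = \mathbf{w}_c^{\top}\mathbf{h}_i + \mathbf{w}_q^{\top}\mathbf{u}_j + \mathbf{w}_{cq}^{\top}(\mathbf{h}_i \circ \mathbf{u}_j)$, with three weights in $\mathbb{R}^{2d}$ (att_weight_c, att_weight_q, and, presumably, a third linear layer that produces the cq term).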
This implementation has broken the learnable parameters down: instead of using a single 6d trainable weight on the concatenated tensor [h; u; h ∘ u], it uses three separate 2d trainable weights, one per term.
I am not sure how much of a difference this makes. I am currently reimplementing the paper using the 6d approach.
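
As a quick sanity check, here is a minimal sketch (not the repo's code) showing that the paper's single 6d weight, split into three 2d pieces, produces exactly the same score as the three-term sum, so the two forms are equivalent in expressiveness; whether training behaves differently (e.g. because of the separate bias terms in the three linear layers) is a separate question:

```python
import torch

# Minimal sketch: the paper's single 6d similarity weight, split into three
# 2d pieces, reproduces the three-term sum used in this implementation.
torch.manual_seed(0)
d = 4                      # hidden size, so BiLSTM outputs are 2d-dimensional
h = torch.randn(2 * d)     # one context vector h
u = torch.randn(2 * d)     # one query vector u

w = torch.randn(6 * d)             # paper: single weight w in R^{6d}
w_c, w_q, w_cq = w.split(2 * d)    # implementation: three weights in R^{2d}

s_paper = w @ torch.cat([h, u, h * u])          # w^T [h; u; h ∘ u]
s_split = w_c @ h + w_q @ u + w_cq @ (h * u)    # sum of three projections

print(torch.allclose(s_paper, s_split))  # True
```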