Hi, I have just started learning QA models, and thank you so much for sharing this.
I found that the attention you wrote is a little different from the original paper.
On line 141 of model.py:

```python
s = self.att_weight_c(c).expand(-1, -1, q_len) + \
    self.att_weight_q(q).permute(0, 2, 1).expand(-1, c_len, -1) + \
    cq
```

However, the paper uses [h; u; h ∘ u], which is 6d after concatenation, and that looks different from your computation above.
Does it make a difference?
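
To make the question concrete, the two forms side by side (with $\mathbf{h}_i$ a context vector and $\mathbf{u}_j$ a query vector, each of dimension $2d$):

Paper: $S_{ij} = \mathbf{w}^{\top}[\mathbf{h}_i;\ \mathbf{u}_j;\ \mathbf{h}_i \circ \mathbf{u}_j]$, with a single weight $\mathbf{w} \in \mathbb{R}^{6d}$.

This code: $S_{ij} = \mathbf{w}_c^{\top}\mathbf{h}_i + \mathbf{w}_q^{\top}\mathbf{u}_j + \mathbf{w}_{cq}^{\top}(\mathbf{h}_i \circ \mathbf{u}_j)$, with three weights in $\mathbb{R}^{2d}$ (att_weight_c, att_weight_q, and, presumably, a third linear layer that produces the cq term).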
This implementation has broken the learnable parameters down: instead of using a single 6d trainable weight on the concatenated tensor [h; u; h ∘ u], it uses three separate 2d trainable weights, one per term.
I am not sure how much of a difference this makes. I am currently reimplementing the paper using the 6d approach.
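
As a quick sanity check, here is a minimal sketch (not the repo's code) showing that the paper's single 6d weight, split into three 2d pieces, produces exactly the same score as the three-term sum, so the two forms are equivalent in expressiveness; whether training behaves differently (e.g. because of the separate bias terms in the three linear layers) is a separate question:

```python
import torch

# Minimal sketch: the paper's single 6d similarity weight, split into three
# 2d pieces, reproduces the three-term sum used in this implementation.
torch.manual_seed(0)
d = 4                      # hidden size, so BiLSTM outputs are 2d-dimensional
h = torch.randn(2 * d)     # one context vector h
u = torch.randn(2 * d)     # one query vector u

w = torch.randn(6 * d)             # paper: single weight w in R^{6d}
w_c, w_q, w_cq = w.split(2 * d)    # implementation: three weights in R^{2d}

s_paper = w @ torch.cat([h, u, h * u])          # w^T [h; u; h ∘ u]
s_split = w_c @ h + w_q @ u + w_cq @ (h * u)    # sum of three projections

print(torch.allclose(s_paper, s_split))  # True
```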