Question about prepare alignments #22

chynphh · 2020-06-15T15:22:35Z

Hi,
In the prepare_fastspeech.ipynb file, I have a question about this code:

F = torch.mean(torch.max(alignments, dim=-1)[0], dim=-1) 
r, c = torch.argmax(F).item()//4, torch.argmax(F).item()%4
location = torch.max(alignments[r,c], dim=1)[1]

My understanding is that in the first line the tensor shape changes from (layer_num, target_length, source_length) to (layer_num, target_length), and then to (layer_num).
But I don't understand what the "4" means, or why the layer number is used to calculate the location.

If my understanding is wrong, please point it out. Thanks.

Jackson-Kang · 2020-08-20T12:13:09Z

Hello, @chynphh

4 is the number of heads used in multi-head attention.
If you edit the return value of multi-head attention in PyTorch, you can get the attention with shape (layer_num, head_num, target_length, source_length).

Consequently, r and c are indices along the n_layers and n_heads dimensions: r selects the layer and c selects the head.
I hope this comment is helpful to you.

Sincerely,
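
To make the shapes concrete, here is a minimal sketch of the selection step discussed above. The sizes and the random `alignments` tensor are assumptions made for illustration only; in the notebook the tensor would come from the model's attention outputs. `divmod` with `n_heads` is used in place of the hard-coded `//4` and `%4`, which should give the same result when there are 4 heads.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
n_layers, n_heads, target_len, source_len = 6, 4, 50, 12

torch.manual_seed(0)
# Stand-in for the stacked attention maps: (layer, head, target, source).
# Softmax over the source axis so each row sums to 1, like attention weights.
alignments = torch.softmax(
    torch.randn(n_layers, n_heads, target_len, source_len), dim=-1
)

# Max over source -> (layer, head, target); mean over target -> (layer, head).
F = torch.mean(torch.max(alignments, dim=-1)[0], dim=-1)

# argmax without `dim` returns an index into the flattened (layer, head) grid.
flat_idx = torch.argmax(F).item()
r, c = divmod(flat_idx, n_heads)  # r: layer index, c: head index

# For the chosen layer and head, the most-attended source position per target frame.
location = torch.max(alignments[r, c], dim=1)[1]

print(F.shape, (r, c), location.shape)
```

So `F` has one score per (layer, head) pair, and `location` has one source index per target frame for the single head with the highest score.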
