
[DIN] Possible issue in lines 94–97 of models/din.py #176

Closed
cuihu1998 opened this issue May 10, 2021 · 2 comments · Fixed by #184
Labels
question Further information is requested

Comments

@cuihu1998

Is this a mistake in the code? I am running examples/run_din.py.

@cuihu1998 cuihu1998 added the question Further information is requested label May 10, 2021

cuihu1998 commented May 10, 2021

Starting from line 94 of din.py:
sequence_embed_dict = varlen_embedding_lookup(X, self.embedding_dict, self.feature_index, self.sparse_varlen_feature_columns)
sequence_embed_list = get_varlen_pooling_list(sequence_embed_dict, X, self.feature_index, self.sparse_varlen_feature_columns, self.device)

What sequence_embed_dict actually holds is a dict of the Tensors output by the Embedding layers, and it is then passed as the first argument to get_varlen_pooling_list; however, the first argument of get_varlen_pooling_list should be a dict of Embedding layers...
Running examples/run_din.py does not raise an error only because sequence_embed_dict and self.sparse_varlen_feature_columns are both simply empty here: [].
The overall logic seems to be wrong, hence this question.

@cuihu1998 cuihu1998 changed the title [DIN] Why is sequence_embed_dict on line 102 of DIN an empty array? [DIN] Possible issue in lines 94–97 of models/din.py May 10, 2021
zanshuxun added a commit that referenced this issue May 12, 2021
@zanshuxun
Collaborator

Thank you for investigating this so carefully. We will fix this issue in an upcoming release. Best wishes!

The code logic is indeed wrong: the two calls to get_varlen_pooling_list() pass first arguments of different types.

First call:

_, dense_value_list = self.input_from_feature_columns(X, self.dnn_feature_columns, self.embedding_dict)

varlen_sparse_embedding_list = get_varlen_pooling_list(self.embedding_dict, X, self.feature_index,

self.input_from_feature_columns() calls get_varlen_pooling_list() internally. In this call, the values of the first argument, embedding_dict, are embedding layers.

Second call:

sequence_embed_dict = varlen_embedding_lookup(X, self.embedding_dict, self.feature_index,
self.sparse_varlen_feature_columns)
sequence_embed_list = get_varlen_pooling_list(sequence_embed_dict, X, self.feature_index,
self.sparse_varlen_feature_columns, self.device)

When handling variable-length features other than hist features, get_varlen_pooling_list() is called again. In this call, the values of the first argument, embedding_dict, are Tensors. But get_varlen_pooling_list() performs the embedding lookup by index itself before pooling, which amounts to indexing twice, so an error is raised whenever a non-hist VarLenSparseFeat is used. (The bug went undetected because all VarLenSparseFeat features in the example run_din.py are hist features, so this code path never runs on non-empty input.)
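The type mismatch can be reproduced with a minimal sketch. The two helpers below are hypothetical, heavily simplified stand-ins (the real DeepCTR-Torch functions take feature_index and feature-column arguments), but they show why passing a dict of already-looked-up Tensors to a function that performs the lookup itself fails:

```python
import torch
import torch.nn as nn

# One variable-length feature: 5 possible ids, sequences of length 3.
feature_name = "genres"
X = torch.tensor([[1, 2, 0], [3, 4, 2]])             # (batch, seq_len) ids
embedding_dict = {feature_name: nn.Embedding(5, 4)}  # dict of embed *layers*

def varlen_embedding_lookup(X, embedding_dict, name):
    # Performs the index lookup: ids -> embeddings. Returns a dict of Tensors.
    return {name: embedding_dict[name](X)}

def get_varlen_pooling_list(embedding_dict, X, name):
    # Also performs the index lookup itself, so it expects a dict whose
    # values are embedding layers, not Tensors.
    emb = embedding_dict[name](X)  # calling a Tensor here raises TypeError
    return emb.mean(dim=1)         # mean pooling over the sequence axis

# First-call pattern (correct): pass the layer dict, lookup happens once.
pooled = get_varlen_pooling_list(embedding_dict, X, feature_name)
print(pooled.shape)  # torch.Size([2, 4])

# Second-call pattern (buggy): look up first, then pass the Tensor dict,
# so get_varlen_pooling_list tries to index a second time.
sequence_embed_dict = varlen_embedding_lookup(X, embedding_dict, feature_name)
try:
    get_varlen_pooling_list(sequence_embed_dict, X, feature_name)
except TypeError as e:
    print("double lookup fails:", e)
```

Because the example's varlen feature list is empty, the buggy branch simply never executes on real data in run_din.py, which is why the error stayed hidden.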

@shenweichen shenweichen linked a pull request Jun 13, 2021 that will close this issue
@shenweichen shenweichen removed a link to a pull request Jun 13, 2021
shenweichen pushed a commit that referenced this issue Jun 13, 2021
* Fix bugs: #74, #171, #176, #179, #180

* fix Early-stopping Bugs

* update multi-head self-attention in AutoInt

* update dims in difm
@shenweichen shenweichen linked a pull request Jun 14, 2021 that will close this issue