Describe the Problem
I built a model based on de.keras.layers.BasicEmbedding(), and everything worked fine in the training phase.
But when serving the trained SavedModel with TFServing, I ran into two issues:
1. Error message: Input to reshape is a tensor with 60 values, but the requested shape has 228 ....
2. The TFServing process occasionally core dumps.
Resolving
After double-checking my code and data without making any progress, I dug into the TFRA source code and found that the lookup code in shadow_embedding_ops.py could cause a race condition.
That is, line 250 updates ShadowVariable.ids (typed ResourceVariable) every time before the real lookup operation, which is not thread-safe in a multi-threaded scenario. The same applies to all APIs that depend on de.shadow_ops.embedding_lookup().
A similar issue in DynamicEmbedding was fixed in #24, and a quick fix can be borrowed from it.
I have fixed and verified this issue in my environment and am willing to contribute the fix.
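To make the race described above concrete, here is a minimal pure-Python sketch. It is not TFRA code: `SharedIdsLookup`, its `after_write` hook, and `run_interleaved` are hypothetical names invented for illustration. The sketch only assumes the pattern reported in this issue, namely that a lookup first writes the request's ids into shared state and then reads them back. Events force the unlucky interleaving so that it happens on every run.

```python
import threading


class SharedIdsLookup:
    """Toy stand-in (hypothetical) for a lookup that stages ids in shared state."""

    def __init__(self):
        self.ids = []  # shared across requests, overwritten on every call

    def lookup(self, ids, after_write=None):
        self.ids = list(ids)  # step 1: overwrite the shared ids
        if after_write is not None:
            after_write()  # test hook: lets another request run in between
        return [i * 10 for i in self.ids]  # step 2: read the shared ids back


def run_interleaved(table):
    """Force request B to run between request A's write and A's read."""
    a_wrote = threading.Event()
    b_done = threading.Event()
    results = {}

    def request_a():
        def pause():
            a_wrote.set()   # A has written its 60 ids
            b_done.wait()   # wait until B has overwritten them

        results["a"] = table.lookup(range(60), after_write=pause)

    def request_b():
        a_wrote.wait()
        results["b"] = table.lookup(range(228))
        b_done.set()

    threads = [threading.Thread(target=request_a),
               threading.Thread(target=request_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_interleaved(SharedIdsLookup())
# Request A sent 60 ids but gets a result sized for request B's 228 ids.
print(len(results["a"]), len(results["b"]))
```

Under real serving load the interleaving is not forced, which would be consistent with the failure showing up only occasionally.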
Thanks for the feedback, @alionkun. One way to solve this is to use sparse_variable.lookup(keys) instead of embedding_lookup(shadow, ids) in the inference phase or when exporting the inference model. It may be possible to make this internal.
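The idea behind that suggestion can be sketched in plain Python: if the lookup takes its keys as an argument and writes nothing to shared state, concurrent requests cannot see each other's ids. `ToyTable` and `serve_concurrently` below are hypothetical stand-ins, not the TFRA API, and the zero-vector default for missing keys is an assumption made only to keep the example small.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyTable:
    """Hypothetical stand-in for a key-value embedding table."""

    def __init__(self, dim=4):
        self.dim = dim
        self.rows = {}  # key -> embedding vector; only read during serving

    def lookup(self, keys):
        # The keys come from the argument alone; no shared state is written,
        # so each call's output depends only on its own input.
        return [self.rows.get(k, [0.0] * self.dim) for k in keys]


def serve_concurrently(table, batches, workers=8):
    """Run one lookup per batch on a thread pool, preserving batch order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(table.lookup, batches))


table = ToyTable()
batches = [list(range(n)) for n in (60, 228, 60, 228)]
outputs = serve_concurrently(table, batches)
# Every output has exactly as many rows as its own request had keys.
print([len(o) for o in outputs])
```

The trade-off is that the staged-ids variable is bypassed entirely at inference time, so this only fits code paths that do not need that variable after the lookup.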
Code referenced above: recommenders-addons/tensorflow_recommenders_addons/dynamic_embedding/python/ops/shadow_embedding_ops.py, lines 219 to 251 at commit 634b96d.
@Lifann @rhdong Could you please take a look at this?