Bug description
When we export the embedding table of a categorical feature to a dataframe (embedding_table_df()), a dataset (embedding_table_dataset()) or a parquet file (export_embedding_table()), the values in the exported dataframe/parquet do not match the values of the original feature embedding weights.
Steps/Code to reproduce bug
Re-enable the skipped test test_embedding_features_exporting_and_loading_pretrained_initializer() in tests\tf\features\test_embeddings.py
That test will fail because EmbeddingFeatures.embedding_table_dataset() uses cudf.from_dlpack() under the hood, which does not work as expected, as we reported in issue #10754 of the cudf repo.
Expected behavior
The values of the exported embedding table should match the embedding table weights.
This issue is blocked by issue #10754 of the cudf repo.
gabrielspmoreira changed the title from "[BUG] Exporting embedding tables to parquet does not match the weights of the embedding tables" to "[BUG] Exporting embedding tables to parquet does not match the TF variable weights" on Apr 28, 2022
@gabrielspmoreira we may be assuming cudf behaves in a specific way. I believe the order of the list isn't guaranteed unless you explicitly order it. There may be a way to force that.
Sure @EvenOldridge. The cudf team mentioned in my issue that cudf.from_dlpack() expects the tensor to be in column-major (Fortran) order. Based on that, the cudf team opened a PR to raise an exception if the tensor is not in column-major order.
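To make the layout mismatch concrete, here is a minimal pure-Python sketch (no TF or cudf involved). The helper names and the toy 2x3 "embedding table" are hypothetical and only illustrate what happens when a row-major buffer is read as if it were column-major:

```python
def read_row_major(flat, n_rows, n_cols):
    # Element (r, c) lives at offset r * n_cols + c (C order).
    return [[flat[r * n_cols + c] for c in range(n_cols)] for r in range(n_rows)]


def read_column_major(flat, n_rows, n_cols):
    # Element (r, c) lives at offset c * n_rows + r (Fortran order).
    return [[flat[c * n_rows + r] for c in range(n_cols)] for r in range(n_rows)]


# Toy embedding table: 2 categories, embedding dim 3.
weights = [[0.1, 0.2, 0.3],
           [1.1, 1.2, 1.3]]

# Flatten the table in row-major order, as a C-ordered tensor would store it.
flat = [value for row in weights for value in row]

as_row_major = read_row_major(flat, 2, 3)        # matches the original table
as_column_major = read_column_major(flat, 2, 3)  # same values, wrong positions

print(as_row_major)
print(as_column_major)
```

The second result contains exactly the same numbers as the original table, only in the wrong cells, which is consistent with the symptom described in this issue.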
From what I investigated, TF doesn't support encoding tensors in Fortran order (like numpy and cupy do).
So I fixed the issue on our side in this PR by converting from TF to cuPy using DLPack and then creating the cudf dataframe from cuPy.
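For reference, the workaround described above can be sketched roughly as follows. This is not the code from the PR: the function name is hypothetical, and it assumes the tensor lives on a GPU and that TensorFlow, CuPy and cudf are installed. Imports are done inside the function so the snippet can be loaded without those libraries.

```python
def tf_embeddings_to_cudf(embeddings):
    """Convert a 2D TF tensor/variable to a cudf DataFrame via CuPy.

    Hypothetical helper for illustration; assumes `embeddings` is on a GPU.
    """
    import cudf
    import cupy
    import tensorflow as tf

    # Make sure we have a plain tensor (e.g. when given a tf.Variable).
    tensor = tf.convert_to_tensor(embeddings)
    # Export the TF tensor as a DLPack capsule.
    capsule = tf.experimental.dlpack.to_dlpack(tensor)
    # CuPy reads the capsule with the correct (row-major) strides.
    # Note: newer CuPy versions deprecate fromDlpack in favor of from_dlpack.
    embeddings_cupy = cupy.fromDlpack(capsule)
    # Build the dataframe from the CuPy array instead of cudf.from_dlpack().
    return cudf.DataFrame(embeddings_cupy)
```

Going through CuPy avoids handing a row-major buffer directly to cudf.from_dlpack(), which expects column-major input.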