Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix]fix synchronous trainging error for redis backend #360

Merged
merged 1 commit into from
Sep 7, 2023

Conversation

fuhailin
Copy link
Contributor

@fuhailin fuhailin commented Sep 6, 2023

Description

Brief Description of the PR:

When synchronous training, multiple processes write the same file will cause the following JSON decoding error, so I think there is no need to write the params back to the file.

Fixes # (issue)

[1,1]< stderr >:  File "/usr/local/lib/python3.8/dist-packages/tensorflow_recommenders_addons/dynamic_embedding/python/keras/layers/embedding.py", line 179, in __init__
[1,1]< stderr >:    self.params = de.get_variable(parameter_name,
[1,1]< stderr >:  File "/usr/local/lib/python3.8/dist-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/dynamic_embedding_variable.py", line 1237, in get_variable
[1,1]< stderr >:    var_ = Variable(
[1,1]< stderr >:  File "/usr/local/lib/python3.8/dist-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/dynamic_embedding_variable.py", line 618, in __init__
[1,1]< stderr >:    mht = self.kv_creator.create(
[1,1]< stderr >:  File "/usr/local/lib/python3.8/dist-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/dynamic_embedding_creator.py", line 228, in create
[1,1]< stderr >:    return de.RedisTable(key_dtype=self.key_dtype,
[1,1]< stderr >:  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/resource.py", line 104, in __call__
[1,1]< stderr >:    return previous_getter(*args, **kwargs)
[1,1]< stderr >:  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/resource.py", line 99, in < lambda >
[1,1]< stderr >:    previous_getter = lambda *a, **kw: default_resource_creator(None, *a, **kw)
[1,1]< stderr >:  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/resource.py", line 96, in default_resource_creator
[1,1]< stderr >:    obj.__init__(*a, **kw)
[1,1]< stderr >:  File "/usr/local/lib/python3.8/dist-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/redis_table_ops.py", line 216, in __init__
[1,1]< stderr >:    params_load = json.load(f0)
[1,1]< stderr >:  File "/usr/lib/python3.8/json/__init__.py", line 293, in load
[1,1]< stderr >:    return loads(fp.read(),
[1,1]< stderr >:  File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
[1,1]< stderr >:    return _default_decoder.decode(s)
[1,1]< stderr >:  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
[1,1]< stderr >:    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[1,1]< stderr >:  File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
[1,1]< stderr >:    raise JSONDecodeError("Expecting value", s, err.value) from None
[1,1]< stderr >:json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Type of change

  • Bug fix
  • New Tutorial
  • Updated or additional documentation
  • Additional Testing
  • New Feature

Checklist:

  • [x ] I've properly formatted my code according to the guidelines
    • [x ] By running yapf
    • [ x] By running clang-format
  • This PR addresses an already submitted issue for TensorFlow Recommenders-Addons
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works

How Has This Been Tested?

If you're adding a bugfix or new feature please describe the tests that you ran to verify your changes:
*

@fuhailin fuhailin requested a review from rhdong as a code owner September 6, 2023 09:38
Copy link
Collaborator

@MoFHeka MoFHeka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fuhailin fuhailin changed the title fix synchronous trainging error for redis backend [fix]fix synchronous trainging error for redis backend Sep 6, 2023
@rhdong rhdong merged commit 363bdb4 into tensorflow:master Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants