Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feat] Save and load cuckoo hashtable from hdfs #257

Merged
merged 1 commit into from
Jul 18, 2022

Conversation

luliyucoordinate
Copy link
Contributor

Description

Because of the storage mechanism of tensorflow, when the embedding table is large, the worker will consume a lot of memory, and eventually OOM.
This PR provides some methods to save or load files from hdfs for dynamic embedding tables.

Type of change

  • Bug fix
  • New Tutorial
  • Updated or additional documentation
  • Additional Testing
  • New Feature

Checklist:

  • I've properly formatted my code according to the guidelines
    • By running yapf
    • By running clang-format
  • This PR addresses an already submitted issue for TensorFlow Recommenders-Addons
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works

How Has This Been Tested?

In the test file, you need to replace the filepath with the real hdfs path. In addition, you need to add the header file third_party/hadoop/hdfs.h of hadoop in tensorflow.

@luliyucoordinate luliyucoordinate requested a review from rhdong as a code owner June 29, 2022 12:24
@rhdong rhdong requested review from bashimao and Lifann and removed request for bashimao July 1, 2022 01:59
@luliyucoordinate luliyucoordinate force-pushed the export_to_hdfs branch 2 times, most recently from 4c9a513 to cdd9ddf Compare July 8, 2022 10:58
@rhdong rhdong requested review from thorneliu and removed request for Lifann July 11, 2022 12:43
@thorneliu thorneliu self-requested a review July 14, 2022 04:28
@thorneliu thorneliu self-requested a review July 15, 2022 01:32
thorneliu
thorneliu previously approved these changes Jul 15, 2022
@rhdong
Copy link
Member

rhdong commented Jul 18, 2022

Hi @luliyucoordinate, please squash the commits into one, I believe all of them belong to one feature. After that I will merge it, Thank you!

code format and fix bracket issue

[fix] Fixed bug which cause by hiredis updating failed to compile.

add hadoop

fix undefined symbol and skip test hdfs

fix test error

fix variable-sized object may not be initialized

fix build for tf version >= 2.7.0

save to tmp file

convert values to const

load_from_hdfs adopts zero-copy method
Copy link
Member

@rhdong rhdong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rhdong rhdong merged commit 64631be into tensorflow:master Jul 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants