NetworkWriter #2635

Closed
makslevental opened this issue Sep 11, 2019 · 3 comments
@makslevental

I have experiments that run on various machines, and I would like to centralize all TensorBoard logs on one dashboard machine. I imagine this should be possible by simply sending the protobufs over the wire rather than writing them to disk.

@gowthamkpr self-assigned this Sep 11, 2019
@stephanwlee
Contributor

We did consider that, but it leads to other complexities when the network is flaky (should we fall back to the local filesystem while offline? then how do we sync?) and when throughput is insufficient (we don't want to slow down TensorFlow). We are actively thinking about this problem, but it is not currently a priority.

+cc @nfelt.

@nfelt
Contributor

nfelt commented Sep 11, 2019

My recommendation would be using an existing general solution for exposing files across the network, e.g. you might consider things like rsync or sshfs mounts (for rsync you'll want the --inplace option per #349).

We've considered ways to make it easier to use TensorBoard with remote jobs, but in reality making a robust distributed summary writing system is not a particularly simple task, and in the short term it would be better to rely on tools that have already solved the distributed filesystem problem.
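
The rsync suggestion above could be sketched roughly as follows: a small script on the dashboard machine that periodically pulls each worker's log directory into a local directory that TensorBoard then reads. This is only a minimal sketch, not something from the TensorBoard project. The host names, paths, and polling interval are hypothetical, and it assumes `rsync` and passwordless SSH access to the workers are already set up. By default rsync writes a temporary file and renames it over the destination; `--inplace` updates the existing file instead, which is presumably why #349 recommends it for event files that are still growing.

```python
import os
import subprocess
import time
from typing import Dict, List


def build_rsync_command(remote_logdir: str, local_logdir: str) -> List[str]:
    """Build an rsync command that mirrors a remote log directory locally."""
    return [
        "rsync",
        "-a",         # archive mode: recurse and preserve timestamps
        "--inplace",  # update destination files in place (no temp file + rename)
        remote_logdir.rstrip("/") + "/",  # trailing slash: copy the contents
        local_logdir,
    ]


def sync_forever(sources: Dict[str, str], interval_seconds: float = 30.0) -> None:
    """Repeatedly pull every remote log directory into its local mirror."""
    while True:
        for remote, local in sources.items():
            os.makedirs(local, exist_ok=True)
            # check=False: a flaky network should not kill the loop;
            # the next pass simply tries again.
            subprocess.run(build_rsync_command(remote, local), check=False)
        time.sleep(interval_seconds)


if __name__ == "__main__":
    # Hypothetical workers and paths, for illustration only.
    sync_forever(
        {
            "user@worker1:/experiments/logs": "/data/tb_logs/worker1",
            "user@worker2:/experiments/logs": "/data/tb_logs/worker2",
        }
    )
```

With something like this running, `tensorboard --logdir /data/tb_logs` on the dashboard machine would show each worker as its own subdirectory of runs. Because it only polls, the dashboard lags the workers by up to the chosen interval.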

@makslevental
Author

@nfelt @stephanwlee sshfs occurred to me, but it felt hacky. Thanks for the input, though.
