NetworkWriter #2635

makslevental · 2019-09-11T16:03:38Z

I have experiments that run on various machines. I would like to centralize all TensorBoard logs on one dashboard machine. In my imagination this should be possible by simply sending the protobufs over the wire rather than writing to disk.

stephanwlee · 2019-09-11T22:58:21Z

We did consider that, but it leads to other complexities when the network is flaky (should we fall back to the local filesystem while offline? Then how do we sync?) and when throughput is insufficient (we don't want to slow down TensorFlow). We do actively think about this problem, but it is currently not our priority.

+cc @nfelt.

nfelt · 2019-09-11T23:04:17Z

My recommendation would be using an existing general solution for exposing files across the network, e.g. you might consider things like rsync or sshfs mounts (for rsync you'll want the --inplace option per #349).

We've considered ways to make it easier to use TensorBoard with remote jobs, but in reality making a robust distributed summary-writing system is not a particularly simple task, and in the short term it would be better to rely on tools that have already solved the distributed filesystem problem.
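For readers who want to try the rsync route, here is a minimal sketch of a periodic sync loop, not an official TensorBoard feature. It assumes `rsync` is installed on the training machine and that SSH access to the dashboard machine is already configured. The function names, the example paths, and the 30-second default interval are hypothetical choices made for illustration; only the `--archive` and `--inplace` rsync flags are real.

```python
import subprocess
import time
from typing import List


def build_rsync_command(local_logdir: str, remote_target: str) -> List[str]:
    """Build an rsync command that mirrors a local log directory to a remote target.

    The --inplace flag updates destination files in place instead of
    writing a temporary file and renaming it, as recommended in the thread.
    """
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    source = local_logdir.rstrip("/") + "/"
    return ["rsync", "--archive", "--inplace", source, remote_target]


def sync_forever(local_logdir: str, remote_target: str,
                 interval_seconds: float = 30.0) -> None:
    """Re-run rsync at a fixed interval (hypothetical helper, runs until interrupted)."""
    command = build_rsync_command(local_logdir, remote_target)
    while True:
        # check=False: a failed sync (e.g. a flaky network) is simply
        # retried on the next iteration instead of stopping the loop.
        subprocess.run(command, check=False)
        time.sleep(interval_seconds)
```

A call such as `sync_forever("runs/exp1", "user@dashboard:/logs/exp1/")` (hypothetical paths) would run alongside training, with TensorBoard on the dashboard machine pointed at `/logs`. Because training keeps writing to local disk, a network outage only delays what the dashboard shows and does not slow down the job.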

makslevental · 2019-09-12T01:09:32Z

@nfelt @stephanwlee sshfs occurred to me, but it felt hacky. Thanks for the input, though.

gowthamkpr added the type:support label Sep 11, 2019

gowthamkpr self-assigned this Sep 11, 2019

makslevental closed this as completed Sep 12, 2019
