From c3beebfe0bd60dbed8249ed0090b23b0130274df Mon Sep 17 00:00:00 2001
From: Jarek Potiuk
Date: Tue, 8 Aug 2017 01:53:00 +0200
Subject: [PATCH] Fixing the reload_frequency to reload_interval and changed default (#1)

The parameter "reload_frequency" was renamed to "reload_interval" some
time between 0.8 and 1.2.1.

This change makes the naming consistent. Until now there was an
inconsistency: in the configuration files and run_tensorboard.sh it was
still RELOAD_FREQUENCY, whereas in the example-app there was already an
(unused) RELOAD_INTERVAL environment variable. This commit fixes it and
makes it RELOAD_INTERVAL everywhere, together with pinning the version
of tensorboard in the Dockerfile (using latest in such a Dockerfile is
bad practice - in case of such incompatible changes in parameter
values, it might simply silently stop working properly, as it did this
time).

This commit also changes the default value of the RELOAD_INTERVAL
parameter. According to the issue
https://github.com/tensorflow/tensorboard/issues/158
accessing GCS directly seems to incur high costs because of the large
number of GCP API requests. If you have thousands of log files (which
is not a lot), it is very easy to charge your GCP account for millions
of requests every day just by having tensorboard idling and checking
for new data. In our case we got about 4 USD/day for around 3000 files,
which is quite incredible.
---
 Dockerfile         | 2 +-
 README.md          | 8 +++++++-
 example-app.yaml   | 2 +-
 run_tensorboard.sh | 2 +-
 4 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/Dockerfile b/Dockerfile
index 121f1e2..6b361af 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -11,7 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-FROM gcr.io/tensorflow/tensorflow:latest
+FROM gcr.io/tensorflow/tensorflow:1.2.1
 
 # Install gsutil
 WORKDIR /var/gcloud
diff --git a/README.md b/README.md
index dc4c69b..d1f2080 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,13 @@ Setup assumes you have [installed the Google Cloud SDK](https://cloud.google.com
 3. Write an `app.yaml` file with `runtime: custom` to configure your application. Full reference [here](https://cloud.google.com/appengine/docs/flexible/custom-runtimes/configuring-your-app-with-app-yaml). In the `env_variables` section you must specify 3 environment variables:
   * `GCS_PROXY_CONFIG` The fully qualified path to the configuration file you uploaded in step 2.
   * `EVENT_FILE_PATH` The fully qualified GCS Path for your Tensorboard summary files.
-  * `RELOAD_FREQUENCY` Frequency (in seconds) to poll GCS for new summary files.
+  * `RELOAD_INTERVAL` Frequency (in seconds) to poll GCS for new summary files.
+
+Note that reloading data directly from GCS might cause a lot of API requests and incur high costs, so do not use the strategy
+of directly reading from GCS if you use it for more than a few hundreds of log files. This is due to inefficient way GCS direct
+access is working currently as described in https://github.com/tensorflow/tensorboard/issues/158
+
+For more frequent reloads/bigger number of log files, syncing the GCS data to local folder using gsutil rsync and loading the data from local folder seems to be a better strategy.
 
 4. Deploy your GAE app.
 ```
diff --git a/example-app.yaml b/example-app.yaml
index dee1b51..bdaafed 100644
--- a/example-app.yaml
+++ b/example-app.yaml
@@ -8,5 +8,5 @@ automatic_scaling:
   max_num_instances: 1
 env_variables:
   EVENT_FILE_PATH: gs://ml-workshop/output/mnist_hptuning10
-  RELOAD_INTERVAL: 1
+  RELOAD_INTERVAL: 600
   GCS_PROXY_CONFIG: gs://ml-workshop/oauth2_proxy.cfg
diff --git a/run_tensorboard.sh b/run_tensorboard.sh
index d790301..a4541d3 100644
--- a/run_tensorboard.sh
+++ b/run_tensorboard.sh
@@ -14,4 +14,4 @@
 gsutil cp $GCS_PROXY_CONFIG /etc/oauth2_proxy.cfg
 export OAUTH2_PROXY_COOKIE_SECRET=`python -c 'import os,base64; print base64.b64encode(os.urandom(16))'`
 oauth2_proxy -http-address="0.0.0.0:8080" -upstream="http://127.0.0.1:8081" -config=/etc/oauth2_proxy.cfg 2>&1 &
-tensorboard --logdir $EVENT_FILE_PATH --reload_frequency $RELOAD_FREQUENCY --debug --port 8081 --host 127.0.0.1
+tensorboard --logdir $EVENT_FILE_PATH --reload_interval $RELOAD_INTERVAL --debug --port 8081 --host 127.0.0.1
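
Note: the README text in this patch recommends syncing the GCS data to a local folder with gsutil rsync and loading from there, but the patch itself does not implement that. The following is only a minimal sketch of how such a strategy might look, not part of this commit. `LOCAL_LOGDIR`, `SYNC_INTERVAL` and `RUN_TENSORBOARD` are hypothetical names introduced for illustration; `EVENT_FILE_PATH` and `RELOAD_INTERVAL` are the variables from app.yaml.

```shell
#!/bin/bash
# Sketch: mirror GCS event files to local disk and point TensorBoard at the mirror.
# LOCAL_LOGDIR, SYNC_INTERVAL and RUN_TENSORBOARD are hypothetical names,
# not defined anywhere in the patch above.

LOCAL_LOGDIR="${LOCAL_LOGDIR:-/tmp/tensorboard-logs}"
SYNC_INTERVAL="${SYNC_INTERVAL:-600}"

# Copy new or changed event files from GCS into the local mirror, once.
sync_once() {
  mkdir -p "$LOCAL_LOGDIR"
  gsutil -m rsync -r "$EVENT_FILE_PATH" "$LOCAL_LOGDIR"
}

# Repeat the sync forever, sleeping SYNC_INTERVAL seconds between passes.
sync_loop() {
  while true; do
    sync_once || echo "gsutil rsync failed; retrying in ${SYNC_INTERVAL}s" >&2
    sleep "$SYNC_INTERVAL"
  done
}

# Start the services only when RUN_TENSORBOARD=1, so the file can be sourced safely.
if [ "${RUN_TENSORBOARD:-0}" = "1" ]; then
  sync_once
  sync_loop &
  tensorboard --logdir "$LOCAL_LOGDIR" --reload_interval "$RELOAD_INTERVAL" --port 8081 --host 127.0.0.1
fi
```

With this layout TensorBoard only polls the local disk, so the number of GCS requests is governed by how often the rsync pass runs rather than by `RELOAD_INTERVAL`. Whether this actually lowers the bill depends on how many list and read operations each rsync pass issues, which was not measured here.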