tf.summary.text fails keeping summaries #10204

Closed
j-min opened this issue May 26, 2017 · 4 comments
Labels
comp:tensorboard Tensorboard related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower

Comments

@j-min

j-min commented May 26, 2017

I ran into the following issues when using tf.summary.text and viewing the summaries in TensorBoard.

  • The text summaries are shown in random order.
  • Existing summaries are removed seemingly at random, and only a few are shown. (Is there a setting for the maximum number of summaries to keep?)
  • I can usually see only about 5 summaries in TensorBoard, even though I added summaries 100+ times.
  • Other summary types work properly when I set them up as below.
summary_op = tf.summary.merge(summaries) # Other scalar, distribution, histogram summaries
valid_summary_op = tf.summary.merge([valid_sentence_summary]) # text summary with tf.summary.text

I can reproduce this problem in two different environments.

  1. Ubuntu 14.04 / CUDA 8.0 / cuDNN 5.1 / TF 1.1.0rc2 / Bazel 0.4.5 / GPU TITAN X Pascal (using 0 to 4 GPUs)
  2. macOS Sierra / TF 1.1.0rc2 / Bazel 0.4.5 / No GPU

Below is sample code to reproduce this issue.

import tensorflow as tf

text_list = ['this is the first text', 'this is 2nd text', 'this is random text']
id2sent = {id:sent for id, sent in enumerate(text_list)}
sent2id = {sent:id for id, sent in id2sent.items()}

tf.reset_default_graph()    

outer_string = tf.convert_to_tensor('This is string outside inner scope.')
outer_summary = tf.summary.text('outside_summary', outer_string)

with tf.name_scope('validation_sentences') as scope:
    id_list = tf.placeholder(tf.int32, shape=[3], name='sent_ids')

    valid_placeholder = tf.placeholder(tf.string, name='valid_summaries')

    inner_summary = tf.summary.text('sent_summary', valid_placeholder)
    summaries = [outer_summary, inner_summary]
    summary_op = tf.summary.merge(summaries)
        
sess = tf.Session()
summary_writer = tf.summary.FileWriter(logdir='./text_summary', graph=sess.graph)

for step in range(10):

    predicted_sents_ids = sess.run(
        id_list,
        feed_dict={
            id_list: [0, 1, 2]
        })

    # list of string
    predicted_sents = [id2sent[id] for id in predicted_sents_ids]

    valid_summary = sess.run(summary_op, feed_dict={
        valid_placeholder: predicted_sents
    })

    summary_writer.add_summary(valid_summary, global_step=step)
    # summary_writer.flush()
# summary_writer.flush()
# flush() didn't help..

Below is the result in TensorBoard.

[image: TensorBoard screenshot]

@j-min j-min changed the title tf.summary.text losing summaries tf.summary.text fails keeping summaries May 26, 2017
@j-min
Author

j-min commented May 26, 2017

I just found that:

  • In a CPU environment, the summaries are stored correctly when I call summary_writer.flush() after every summary_writer.add_summary() and run the code with python. (The errors were produced in a Jupyter notebook environment.)
    Maybe we can update the documentation about using Writer.flush()?

  • However, in a multi-GPU environment, even with summary_writer.flush(), the order of the summaries is correct but still not all of the summaries are saved.
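
For illustration, the effect of a missing flush() can be reproduced with an ordinary buffered file in plain Python. This is only an analogy, not TensorFlow code: tf.summary.FileWriter queues events and writes them asynchronously (its constructor takes max_queue and flush_secs), and the file name and record format below are made up for the sketch.

```python
import os
import tempfile

# Hypothetical stand-in for an event file; not a real TensorBoard format.
path = os.path.join(tempfile.mkdtemp(), "events.demo")
record = b"step=0 text=hello\n"

# Open with an explicit buffer that is much larger than the record.
writer = open(path, "wb", buffering=8192)
writer.write(record)

# The record is still in the in-process buffer, so a reader sees an empty file.
size_before_flush = os.path.getsize(path)

# flush() pushes the buffered bytes to the file, making them visible to readers.
writer.flush()
size_after_flush = os.path.getsize(path)
writer.close()

print("before flush:", size_before_flush, "bytes")
print("after flush: ", size_after_flush, "bytes")
```

A reader polling the file (as TensorBoard polls the log directory) cannot see records that are still buffered in the writing process, which is consistent with flush() helping in the single-process CPU case.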

I was training an image-captioning model.
During training, I used 4 replica graphs that share variables, and updated the variables synchronously with the gradient averaged across all towers at each step.
Every 100 steps, I fed validation data to a separate validation graph (sharing variables with the training graphs above, but without a loss function or optimizer) and sampled text from it, as above.
I added the validation result summary right after running the training ops.

Moreover, I found that summaries are overwritten by more recent summaries. For example, the summaries added at step 1900 erased the summaries added at step 1800.

screenshot_20170526-155055
screenshot_20170526-155313

A simplified version of the code looks like this.

for step in range(total_steps):
    ....
    sess.run([training_op, lost...])

    # Run validation and write a text summary every 100 steps.
    if step % 100 == 0:
        generated_valid_ids = sess.run(
            validation_graph_inference_result,
            feed_dict={validation_input_placeholder: validation_data})
        valid_sent = id2sent(generated_valid_ids)
        valid_summary = sess.run(
            valid_summary_op,
            feed_dict={valid_summary_placeholder: valid_sent})
        summary_writer.add_summary(valid_summary, global_step=step)
        summary_writer.flush()
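
One detail worth checking in a loop like the one above: in Python, a bare `step % 100` is truthy on every step that is not a multiple of 100, so an "every 100 steps" check needs an explicit `== 0`. The helper names below are hypothetical and exist only to show the difference.

```python
def steps_with_explicit_check(total_steps, every=100):
    """Steps selected by `step % every == 0` (one per `every` steps)."""
    return [step for step in range(total_steps) if step % every == 0]


def steps_with_bare_modulo(total_steps, every=100):
    """Steps selected by a bare `step % every` (all steps except multiples)."""
    return [step for step in range(total_steps) if step % every]


explicit = steps_with_explicit_check(300)
bare = steps_with_bare_modulo(300)
print("explicit check fires on", len(explicit), "steps")
print("bare modulo fires on", len(bare), "steps")
```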

@asimshankar
Contributor

@dandelionmane @jart : Mind taking a look?

@asimshankar asimshankar added stat:awaiting tensorflower Status - Awaiting response from tensorflower comp:tensorboard Tensorboard related issues labels May 30, 2017
@thinxer
Contributor

thinxer commented Jun 1, 2017

On the 'summaries are overwritten' part: TensorBoard downsamples the summaries so that they can be loaded into memory.
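
To make the downsampling idea concrete, here is a minimal sketch of reservoir sampling in plain Python. It is a generic textbook version (Algorithm R), not TensorBoard's actual implementation, and the reservoir size of 5 and the `Reservoir` class name are assumptions chosen to mirror the "around 5 summaries" observation in the report.

```python
import random


class Reservoir:
    """Keeps a uniform random sample of at most `size` items from a stream."""

    def __init__(self, size, seed=0):
        self.size = size
        self.items = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.size:
            # Reservoir not full yet: keep everything.
            self.items.append(item)
        else:
            # Keep the new item with probability size / seen,
            # evicting a uniformly chosen older item.
            j = self._rng.randrange(self.seen)
            if j < self.size:
                self.items[j] = item


reservoir = Reservoir(size=5)
for step in range(100):
    reservoir.add((step, "text at step %d" % step))

kept_steps = sorted(step for step, _ in reservoir.items)
print(len(kept_steps), "of", reservoir.seen, "summaries kept:", kept_steps)
```

With a fixed-size sample, adding a new summary can evict an older one, which would look like recent steps "overwriting" earlier ones in the dashboard. Later TensorBoard releases, as far as I know, expose a `--samples_per_plugin` flag for adjusting how many samples each plugin keeps.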

@chihuahua
Member

chihuahua commented Jun 16, 2017

I have migrated this issue to tensorflow/tensorboard#83 because TensorBoard has moved into a separate repository. Let's continue the discussion there. The logic for writing text summaries will change in v1.3. The previous approach used a construct called the plugin assets manager, which relied on a certain plugins directory existing within each run to identify text summaries.

copybara-service bot pushed a commit that referenced this issue Mar 4, 2024
…Buffer on ROCm.

Imported from GitHub PR openxla/xla#10204

Copybara import of the project:

--
1afdbfb6763cd9c75d5cb97e51beafea3761132f by Zoran Jovanovic <[email protected]>:

Fixed issues with return value from AllocateStridedBuffer on ROCm.

Merging this change closes #10204

PiperOrigin-RevId: 612572013