Detector training not working #58

csmcallister · 2020-04-03T18:11:57Z

The example of fine-tuning the detector in the docs isn't working with the 0.8.0 release, although other examples, like this one, are working.

Downgrading to 0.6.3 got the example working again (intermediate versions, e.g. 0.7.x, were also failing with the same error, which is detailed below).

To reproduce, create an empty Python 3.7.4 conda environment on Windows 10 and install the following:

conda install -c anaconda tensorflow-gpu
pip install keras-ocr
pip install scikit-learn
conda install -c conda-forge shapely

I then copy-pasted that fine-tuning example into train.py and got the following when running it:

python train.py
2020-04-03 13:30:07.403362: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Looking for .\icdar2013\Challenge2_Training_Task12_Images.zip
Downloading .\icdar2013\Challenge2_Training_Task12_Images.zip
Looking for .\icdar2013\Challenge2_Training_Task2_GT.zip
Downloading .\icdar2013\Challenge2_Training_Task2_GT.zip
Looking for C:\Users\scottmcallister\.keras-ocr\craft_mlt_25k.h5

...
...<LOTS OF TENSORFLOW GPU MESSAGES>
...

WARNING:tensorflow:sample_weight modes were coerced from
  ...
    to
  ['...']
WARNING:tensorflow:sample_weight modes were coerced from
  ...
    to
  ['...']
Train for 183 steps, validate for 46 steps
Epoch 1/1000
  1/183 [..............................] - ETA: 1:35WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are:
  1/183 [..............................] - ETA: 2:20Traceback (most recent call last):
  File "train.py", line 67, in <module>
    validation_steps=math.ceil(len(validation) / batch_size)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 1306, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 615, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 497, in _initialize
    *args, **kwds))
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\function.py", line 2389, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\function.py", line 2703, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\function.py", line 2593, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\framework\func_graph.py", line 978, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 439, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 85, in distributed_function
    per_replica_function, args=args)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\distribute\distribute_lib.py", line 763, in experimental_run_v2
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\distribute\distribute_lib.py", line 1819, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\distribute\distribute_lib.py", line 2164, in _call_for_each_replica
    return fn(*args, **kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\autograph\impl\api.py", line 292, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 433, in train_on_batch
    output_loss_metrics=model._output_loss_metrics)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 312, in train_on_batch
    output_loss_metrics=output_loss_metrics))
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 253, in _process_single_batch
    training=training))
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 171, in _model_loss
    reduction=losses_utils.ReductionV2.NONE)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\utils\losses_utils.py", line 107, in compute_weighted_loss
    losses, sample_weight)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\ops\losses\util.py", line 148, in scale_losses_by_sample_weight
    sample_weight = weights_broadcast_ops.broadcast_weights(sample_weight, losses)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\ops\weights_broadcast_ops.py", line 167, in broadcast_weights
    with ops.control_dependencies((assert_broadcastable(weights, values),)):
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\ops\weights_broadcast_ops.py", line 103, in assert_broadcastable
    weights_rank_static, values.shape, weights.shape))
ValueError: weights can not be broadcast to values. values.rank=3. weights.rank=1. values.shape=(None, None, None). weights.shape=(None,).

The source of the error can probably be found here, likely within detection.py. I'd try to track it down myself, but your familiarity with the source would probably make that faster.


faustomorales · 2020-04-04T16:03:38Z

Starting in keras-ocr 0.7.x, the detection batch generators include sample weights. This exception comes from a problem in how TensorFlow 2.x handles sample weights (see tensorflow/tensorflow#30983). You can reproduce the error with the following minimal example.

import tensorflow as tf
import numpy as np

inputs = tf.keras.layers.Input((None, None, 3))
outputs = tf.keras.layers.Conv2D(kernel_size=2, strides=2, filters=2)(inputs)
model = tf.keras.models.Model(inputs=inputs, outputs=outputs)

model.compile(loss='mse', optimizer='Adam')
x = np.ones((1, 320, 320, 3))
y = np.ones((1, 160, 160, 2))
sample_weight = np.ones((1,))

# This raises ValueError: weights can not be broadcast to values.
model.fit(x=x, y=y, sample_weight=sample_weight)

# This works.
model.fit(x=x, y=y)
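
To make the shapes in the error message concrete, here is a rough pure-Python sketch of the rank rule that the broadcast check appears to enforce: weights must be a scalar, or have the same rank as the loss values with every dimension either 1 or equal to the matching values dimension. The helper name weights_broadcastable is made up for illustration and is not part of TensorFlow; this is a simplified reading of the check, not its actual implementation.

```python
def weights_broadcastable(values_shape, weights_shape):
    """Hypothetical helper: True if weights of this shape pass the rank rule."""
    # A scalar weight always broadcasts.
    if len(weights_shape) == 0:
        return True
    # Otherwise the ranks must match exactly ...
    if len(weights_shape) != len(values_shape):
        return False
    # ... and each weight dimension must be 1 or equal to the values dimension.
    return all(w == 1 or w == v for w, v in zip(weights_shape, values_shape))


# The per-pixel MSE in the example above has shape (batch, height, width).
print(weights_broadcastable((1, 160, 160), (1,)))       # rank 1 vs rank 3
print(weights_broadcastable((1, 160, 160), (1, 1, 1)))  # same rank, dims of 1
```

The first call mirrors the failing case in the traceback (values.rank=3, weights.rank=1); the second shows a shape that would satisfy the rule.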

The good news is that this seems to work properly with the most recent release candidate for TF 2.2, so installing tensorflow==2.2.0rc2 should make it work as expected.
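
If you want a training script to fail early instead of deep inside fit, a small version guard along these lines might help. It is only a sketch: the helper names are hypothetical, and the (2, 2) threshold is an assumption taken from the observation in this thread that 2.2.0rc2 works.

```python
def version_tuple(version):
    """Turn a string like '2.2.0rc2' into (2, 2); only major and minor are used."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def has_sample_weight_fix(version, minimum=(2, 2)):
    """Assumption from this thread: TF 2.2 and later handle the sample weights."""
    return version_tuple(version) >= minimum


try:
    import tensorflow as tf
    installed = tf.__version__
except ImportError:
    installed = None  # TensorFlow is not installed in this environment.

if installed is not None and not has_sample_weight_fix(installed):
    print("TensorFlow %s may hit the sample_weight broadcast error" % installed)
```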

Let me know if this resolves the issue for you. Thanks!

csmcallister · 2020-04-07T16:23:44Z

Issue resolved with pip install tensorflow==2.2.0rc2. Thanks for the help!

nightfuryyy · 2020-04-16T15:05:46Z

@faustomorales I installed 2.2.0rc2, but I don't have the GPU version, so I can't train on the GPU. How can I fix this? Thanks so much.

csmcallister · 2020-04-24T21:51:59Z

@nightfuryyy If you've done pip install tensorflow==2.2.0rc2, you'll need to follow TensorFlow's instructions for configuring GPU access. I was on Windows 10, followed their instructions, and got it working fine.
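
One quick way to check whether the GPU setup took effect is to ask TensorFlow which GPUs it can see. This is a minimal sketch, assuming TF 2.1 or later where tf.config.list_physical_devices is available; the visible_gpus helper is a made-up name for illustration.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or [] if there are none."""
    try:
        import tensorflow as tf
    except ImportError:
        return []  # TensorFlow is not installed in this environment.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
print(gpus if gpus else "No GPU visible to TensorFlow")
```

An empty result usually points at the driver or CUDA/cuDNN setup rather than at keras-ocr itself.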

nightfuryyy · 2020-04-25T13:50:35Z

I'm on Linux, and it doesn't work there. I fixed it by downgrading to 0.6.3.

NeighborhoodCoding · 2020-08-18T07:04:57Z

Thanks a lot. I have a similar error, so I'll upgrade TF.

csmcallister closed this as completed Apr 7, 2020

GrigoriiTarasov mentioned this issue Apr 10, 2023

"Tried to convert 'num' to a tensor and failed. Error: None values not supported." #232

Open
