Detector training not working #58

Closed
csmcallister opened this issue Apr 3, 2020 · 6 comments

@csmcallister

The example of fine-tuning the detector in the docs isn't working with the 0.8.0 release, although other examples, like this one, are working.

Downgrading to 0.6.3 got the example working again (intermediate versions, e.g. 0.7.x, were also failing with the same error, which is detailed below).

To reproduce, create an empty Python 3.7.4 conda environment on Windows 10 and install the following:

conda install -c anaconda tensorflow-gpu
pip install keras-ocr
pip install scikit-learn
conda install -c conda-forge shapely

I then copy-pasted that fine-tuning example into train.py and got the following when running it:

python train.py
2020-04-03 13:30:07.403362: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Looking for .\icdar2013\Challenge2_Training_Task12_Images.zip
Downloading .\icdar2013\Challenge2_Training_Task12_Images.zip
Looking for .\icdar2013\Challenge2_Training_Task2_GT.zip
Downloading .\icdar2013\Challenge2_Training_Task2_GT.zip
Looking for C:\Users\scottmcallister\.keras-ocr\craft_mlt_25k.h5

...
...<LOTS OF TENSORFLOW GPU MESSAGES>
...

WARNING:tensorflow:sample_weight modes were coerced from
  ...
    to
  ['...']
WARNING:tensorflow:sample_weight modes were coerced from
  ...
    to
  ['...']
Train for 183 steps, validate for 46 steps
Epoch 1/1000
  1/183 [..............................] - ETA: 1:35
WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are:
  1/183 [..............................] - ETA: 2:20
Traceback (most recent call last):
  File "train.py", line 67, in <module>
    validation_steps=math.ceil(len(validation) / batch_size)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 1306, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 615, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 497, in _initialize
    *args, **kwds))
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\function.py", line 2389, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\function.py", line 2703, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\function.py", line 2593, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\framework\func_graph.py", line 978, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 439, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 85, in distributed_function
    per_replica_function, args=args)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\distribute\distribute_lib.py", line 763, in experimental_run_v2
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\distribute\distribute_lib.py", line 1819, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\distribute\distribute_lib.py", line 2164, in _call_for_each_replica
    return fn(*args, **kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\autograph\impl\api.py", line 292, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 433, in train_on_batch
    output_loss_metrics=model._output_loss_metrics)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 312, in train_on_batch
    output_loss_metrics=output_loss_metrics))
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 253, in _process_single_batch
    training=training))
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 171, in _model_loss
    reduction=losses_utils.ReductionV2.NONE)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\utils\losses_utils.py", line 107, in compute_weighted_loss
    losses, sample_weight)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\ops\losses\util.py", line 148, in scale_losses_by_sample_weight
    sample_weight = weights_broadcast_ops.broadcast_weights(sample_weight, losses)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\ops\weights_broadcast_ops.py", line 167, in broadcast_weights
    with ops.control_dependencies((assert_broadcastable(weights, values),)):
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\ops\weights_broadcast_ops.py", line 103, in assert_broadcastable
    weights_rank_static, values.shape, weights.shape))
ValueError: weights can not be broadcast to values. values.rank=3. weights.rank=1. values.shape=(None, None, None). weights.shape=(None,).

The source of the error can probably be uncovered here, likely within detection.py. I'd try to track it down myself, but your familiarity with the source would probably make that faster.

@faustomorales
Owner

As of keras-ocr 0.7.x, the detection batch generators include sample weights. This exception stems from a problem in how TensorFlow 2.x handles sample weights (see tensorflow/tensorflow#30983). You can reproduce the error with the following minimal example.

import tensorflow as tf
import numpy as np

inputs = tf.keras.layers.Input((None, None, 3))
outputs = tf.keras.layers.Conv2D(kernel_size=2, strides=2, filters=2)(inputs)
model = tf.keras.models.Model(inputs=inputs, outputs=outputs)

model.compile(loss='mse', optimizer='Adam')
x = np.ones((1, 320, 320, 3))
y = np.ones((1, 160, 160, 2))
sample_weight = np.ones((1,))

# This raises ValueError: weights can not be broadcast to values.
model.fit(x=x, y=y, sample_weight=sample_weight)

# This works.
model.fit(x=x, y=y)
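
To illustrate the shape mismatch in the error message (values.rank=3, weights.rank=1), here is a small NumPy analogy. It is only a sketch: it uses NumPy's broadcasting rules rather than TensorFlow's own `assert_broadcastable` check, and the array sizes and the `can_broadcast` helper are made up for illustration.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch, height, width = 2, 4, 4

# Per-pixel loss has rank 3, like values.shape=(None, None, None) in the error.
per_pixel_loss = np.ones((batch, height, width))
# One weight per sample has rank 1, like weights.shape=(None,) in the error.
sample_weight = np.array([0.5, 2.0])


def can_broadcast(weights, values):
    """Return True if `weights` broadcasts to the shape of `values` under NumPy rules."""
    try:
        np.broadcast_to(weights, values.shape)
        return True
    except ValueError:
        return False


# Shape (2,) is aligned against the trailing axis (size 4), so this fails.
naive_ok = can_broadcast(sample_weight, per_pixel_loss)

# Adding singleton axes makes the per-sample weights line up with the batch axis.
reshaped = sample_weight.reshape(batch, 1, 1)
reshaped_ok = can_broadcast(reshaped, per_pixel_loss)
weighted = per_pixel_loss * reshaped

print(naive_ok, reshaped_ok, weighted.shape)
```

The point is that a per-sample weight vector has to be expanded along the spatial axes before it can scale a per-pixel loss.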

The good news is that the most recent release candidate for TF 2.2 seems to handle this properly, so installing tensorflow==2.2.0rc2 should make the example work as expected.

Let me know if this resolves the issue for you. Thanks!

@csmcallister
Author

Issue resolved with pip install tensorflow==2.2.0rc2. Thanks for the help!
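
For anyone who wants to guard a training script against this, a version check along the following lines might help. It is a rough sketch: the helper names are hypothetical, and the 2.2 threshold is an assumption based only on the report in this thread that 2.2.0rc2 works, not on any documented guarantee.

```python
import re


def tf_version_tuple(version):
    """Parse the leading 'major.minor' of a version string such as '2.2.0rc2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def likely_handles_sample_weight(version):
    """Assumption from this thread: TF 2.2 and later accept the detector's sample weights."""
    return tf_version_tuple(version) >= (2, 2)


# In a real script you would pass tf.__version__ instead of a literal.
print(likely_handles_sample_weight("2.2.0rc2"))
print(likely_handles_sample_weight("2.1.0"))
```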

@nightfuryyy

nightfuryyy commented Apr 16, 2020

@faustomorales I installed 2.2.0rc2, but I don't have the GPU version, so I can't train on the GPU. How can I fix this? Thanks so much.

@csmcallister
Author

@nightfuryyy If you've done pip install tensorflow==2.2.0rc2, you'll need to follow TensorFlow's instructions for configuring GPU access. I was on Windows 10, followed their instructions, and got it working fine.
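
A quick way to confirm whether the GPU setup took effect is to ask TensorFlow which GPUs it can see. This is a minimal sketch; `visible_gpus` is a hypothetical helper name, and it assumes a TensorFlow version that provides `tf.config.list_physical_devices`.

```python
def visible_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.config.list_physical_devices("GPU")


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment.")
elif gpus:
    print(f"TensorFlow sees {len(gpus)} GPU(s): {[gpu.name for gpu in gpus]}")
else:
    print("No GPU visible; check the CUDA/cuDNN setup in TensorFlow's GPU guide.")
```

An empty list usually means the driver or CUDA libraries were not found, rather than a problem with keras-ocr itself.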

@nightfuryyy

I'm on Linux and it doesn't work. I fixed it by downgrading to 0.6.3.

@NeighborhoodCoding

Thanks a lot. I have a similar error, so I'll upgrade TF.
