Setting class_weight in model.fit() with tf.data.Dataset causes error #47032

Closed
tensortorch opened this issue Feb 9, 2021 · 19 comments
Labels: comp:keras, stale, stat:awaiting response, TF 2.4, type:bug

Comments

@tensortorch

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04 / Windows 10
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.3.0 / 2.4.1
  • Python version: 3.6 / 3.7
  • CUDA/cuDNN version: 10.1 / none
  • GPU model and memory: RTX2080 / none

Describe the current behavior
When a tf.data.Dataset is used in model.fit(), setting class_weight causes an error.

Describe the expected behavior
No error occurs.

Standalone code to reproduce the issue

from tensorflow import keras
import tensorflow as tf
import numpy as np


def get_model():
    inputs = keras.layers.Input(shape=(10, 10, 3))
    x = keras.layers.Flatten()(inputs)
    outputs = keras.layers.Dense(5)(x)
    model = keras.Model(inputs, outputs)
    model.compile(loss='categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(learning_rate=0.001))
    return model


def map_fun(_):
    dummy_image = np.zeros((10, 10, 3))  
    dummy_label = np.array([0, 0, 1, 0, 0]) 
    return dummy_image, dummy_label


if __name__ == '__main__':
    # dummy dataset
    dataset = tf.data.Dataset.from_tensor_slices([1, 2])  # values are ignored, dummy data generated in map()
    dataset = dataset.map(map_func=lambda x: tf.py_function(map_fun, [x], [tf.uint8, tf.uint8])).batch(2)

    # dummy model
    model = get_model()

    # call fit() without class weights - ok
    model.fit(dataset, epochs=1)

    # define class weights
    class_weight = {idx: weight for (idx, weight) in enumerate([1., 1., 1., 1., 1.])}

    # transform dataset to iterator, call fit() with class weights - ok
    model.fit(dataset.as_numpy_iterator(), class_weight=class_weight, epochs=1)

    # call fit() with class weights on tf.data.Dataset - error
    model.fit(dataset, class_weight=class_weight, epochs=1)

Error message

Traceback (most recent call last):
  File "/data/sandbox/reproduce.py", line 39, in <module>
    model.fit(dataset, class_weight=class_weight, epochs=1)
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1063, in fit
    steps_per_execution=self._steps_per_execution)
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 1122, in __init__
    dataset = dataset.map(_make_class_weight_map_fn(class_weight))
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1695, in map
    return MapDataset(self, map_func, preserve_cardinality=True)
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4045, in __init__
    use_legacy_function=use_legacy_function)
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3371, in __init__
    self._function = wrapper_fn.get_concrete_function()
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2939, in get_concrete_function
    *args, **kwargs)
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2906, in _get_concrete_function_garbage_collected
    graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3075, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3364, in wrapper_fn
    ret = _wrapper_helper(*args)
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3299, in _wrapper_helper
    ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in wrapper
    return converted_call(f, args, kwargs, options=options)
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 532, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 339, in _call_unconverted
    return f(*args, **kwargs)
  File "/data/sandbox/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 1314, in _class_weights_map_fn
    if y.shape.rank > 2:
TypeError: '>' not supported between instances of 'NoneType' and 'int'

Process finished with exit code 1

@amahendrakar
Contributor

@tensortorch,
Please take a look at this comment from a similar issue and check whether it helps. Thanks!

@amahendrakar amahendrakar added comp:keras Keras related issues stat:awaiting response Status - Awaiting response from author TF 2.4 for issues related to TF 2.4 labels Feb 9, 2021
@tensortorch
Author

tensortorch commented Feb 10, 2021

@amahendrakar,
thank you for your suggestion!

It does help since I can make my dataset return a sample weight as a third value, which is what model.fit() does anyway under the hood when class_weight is provided. So one can work around this issue using sample weights instead.
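A minimal sketch of that sample-weight workaround, assuming one-hot targets and a small hand-built dataset with known shapes. The names `add_sample_weight` and `weight_table` and the weight values are illustrative, not taken from the issue. It looks up one weight per sample from the class index and yields it as the third element:

```python
import tensorflow as tf

# Hypothetical class weights for the 5 classes (values chosen for illustration).
class_weight = {0: 1.0, 1: 2.0, 2: 0.5, 3: 1.0, 4: 1.0}
weight_table = tf.constant([class_weight[i] for i in range(5)], dtype=tf.float32)


def add_sample_weight(image, label):
    # Recover the class index from the one-hot label, then look up its weight.
    class_ids = tf.argmax(label, axis=-1)
    sample_weight = tf.gather(weight_table, class_ids)
    return image, label, sample_weight


images = tf.zeros((2, 10, 10, 3))
labels = tf.constant([[0, 0, 1, 0, 0], [0, 1, 0, 0, 0]], dtype=tf.float32)

dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(2)
dataset = dataset.map(add_sample_weight)

_, _, weights = next(iter(dataset))
print(weights.numpy())
```

A dataset built this way can be passed to model.fit() without the class_weight argument, since the weights already travel with each batch.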

However, I believe providing class_weight in model.fit() should still work. The linked comment explains that it is not expected to work for 3+ dimensional targets, but this is not the case here. In fact, the error occurs within the check for target dimensionality, and is caused by the target rank being None for some reason:

    if y.shape.rank > 2:   # <== this is where the error occurs, because y.shape.rank is None
      raise ValueError("`class_weight` not supported for "
                       "3+ dimensional targets.")

Also, it does work for the same inputs when the dataset is converted to an iterator, as my minimal example shows. The error only occurs for a tf.data.Dataset as input value. Here, the DataHandler attempts to add the third output to the dataset by calling map() to convert class_weight to sample_weight:

    if class_weight:
      dataset = dataset.map(_make_class_weight_map_fn(class_weight))

which fails due to the aforementioned error.
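The unknown rank can be observed directly on the dataset, without calling fit(). The following is a rough sketch rather than the Keras code path itself: `make_example` is a hypothetical stand-in for the map function in the report (with float32 outputs), and the comparison at the end only mimics the `y.shape.rank > 2` check.

```python
import numpy as np
import tensorflow as tf


def make_example(_):
    # Stand-in for the reporter's map function: NumPy outputs with fixed shapes.
    image = np.zeros((10, 10, 3), dtype=np.float32)
    label = np.array([0, 0, 1, 0, 0], dtype=np.float32)
    return image, label


dataset = tf.data.Dataset.from_tensor_slices([1, 2])
dataset = dataset.map(
    lambda x: tuple(tf.py_function(make_example, [x], [tf.float32, tf.float32]))
).batch(2)

# tf.py_function does not report static shapes, so the rank is unknown here.
_, label_spec = dataset.element_spec
label_rank = label_spec.shape.rank
print(label_spec.shape, label_rank)

# Mimic the comparison from the traceback to see the same kind of failure.
message = ""
try:
    label_rank > 2
except TypeError as err:
    message = str(err)
print(message)
```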

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Feb 12, 2021
@tensortorch
Author

The workaround for a somewhat related problem also works in this case: an additional call to map() to manually set the tensor shape makes it work.

I previously tried manually converting the outputs of the py_function to tensors and also manually setting their shapes, but it did not work, so the key here is the second call to map() to set the shapes after batch().
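A sketch of what that second map() call might look like, assuming the shapes from the minimal example (images of 10x10x3, 5 classes). `set_shapes` and `make_example` are illustrative names, and `tf.ensure_shape` is one possible way to attach the static shape (calling `set_shape` on the tensors inside the map function would be another):

```python
import numpy as np
import tensorflow as tf


def make_example(_):
    # Stand-in for the py_function body from the report.
    image = np.zeros((10, 10, 3), dtype=np.float32)
    label = np.array([0, 0, 1, 0, 0], dtype=np.float32)
    return image, label


def set_shapes(image, label):
    # Restore the static shapes that tf.py_function dropped; the batch
    # dimension stays None because the batch size may vary.
    image = tf.ensure_shape(image, [None, 10, 10, 3])
    label = tf.ensure_shape(label, [None, 5])
    return image, label


dataset = tf.data.Dataset.from_tensor_slices([1, 2])
dataset = dataset.map(
    lambda x: tuple(tf.py_function(make_example, [x], [tf.float32, tf.float32]))
).batch(2)

# Second map() after batch(), as described above.
fixed = dataset.map(set_shapes)

image_batch, label_batch = next(iter(fixed))
print(fixed.element_spec[1].shape, label_batch.shape)
```

With the label rank known to be 2, the rank comparison from the traceback no longer receives None.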

@amahendrakar
Contributor

@tensortorch,
Thank you for the update. Is this still an issue?

@amahendrakar amahendrakar added the stat:awaiting response Status - Awaiting response from author label Feb 22, 2021
@tensortorch
Author

@amahendrakar
I still think this is an issue. I believe it should not be necessary to call map() a second time - I would expect this to work without the extra steps. The issue might not be with the class weights functionality itself though - maybe rather with the combination of map/py_function.

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Feb 27, 2021
@amahendrakar
Contributor

@jvishnuvardhan,
I was able to reproduce the issue with TF v2.3, TF v2.4 and TF-nightly. Please find the gist of it here. Thanks!

@yuriy-vorontsov

Same error, checked on Python 3.8 with TF v2.2.0, 2.3.0, 2.4.0, and 2.4.1.
Found a quick solution with a custom loss function:

def weighted_categorical_crossentropy( weights ):
    # weights = [ 0.9, 0.05, 0.04, 0.01 ]
    def wcce( y_true, y_pred ):
        tf_weights = tf.constant( weights )
        if not tf.is_tensor( y_pred ):
            y_pred = tf.constant( y_pred )

        y_true = tf.cast( y_true, y_pred.dtype )
        return tf.keras.losses.categorical_crossentropy( y_true, y_pred ) * tf.experimental.numpy.sum( y_true * tf_weights, axis = -1 )
    return wcce

...
config['loss'] = weighted_categorical_crossentropy( config['classWeight'] )
model.compile(
    loss = config['loss'],
    optimizer = config['optimizer'],
    metrics = ['accuracy'],
    run_eagerly = True
)
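To make the arithmetic of this weighted loss concrete, here is a small NumPy-only check for a single sample. It assumes the predictions are already probabilities that sum to 1, and the prediction values are made up for illustration; it is not a replacement for the Keras loss above.

```python
import numpy as np

# Class weights from the comment in the snippet above.
weights = np.array([0.9, 0.05, 0.04, 0.01])

# One sample whose true class is index 2 (one-hot), with made-up predictions.
y_true = np.array([[0.0, 0.0, 1.0, 0.0]])
y_pred = np.array([[0.1, 0.2, 0.6, 0.1]])

# Plain categorical cross-entropy per sample: -sum(y * log(p)).
cross_entropy = -np.sum(y_true * np.log(y_pred), axis=-1)

# The one-hot label selects the weight of the true class.
sample_weight = np.sum(y_true * weights, axis=-1)

weighted_loss = cross_entropy * sample_weight
print(cross_entropy, sample_weight, weighted_loss)
```

Each sample's loss is scaled by the weight of its true class, which is the same effect class_weight is meant to have.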

@jvishnuvardhan jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 10, 2021
@kretes

kretes commented May 21, 2021

This still fails in TF 2.5.0. I've quickly put together another reproduction script: https://gist.github.com/kretes/ca911085b2eb0fa3985894245ce3fd0c
Setting the shape works, but it puts a burden on the user's code.

I suggest renaming the issue to 'Setting class_weight in model.fit() with tf.data.Dataset using py_function causes error', as py_function is the step that produces the unknown shape.
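The role of py_function can be seen by comparing two otherwise equivalent pipelines. This is a sketch under assumptions: `tf_example` and `np_example` are hypothetical map functions producing the same data, one with TensorFlow ops and one through tf.py_function.

```python
import numpy as np
import tensorflow as tf


def tf_example(_):
    # Built from TensorFlow ops, so tf.data can infer static shapes.
    return tf.zeros((10, 10, 3)), tf.one_hot(2, depth=5)


def np_example(_):
    # Same data produced in Python; the shapes are invisible to the graph.
    image = np.zeros((10, 10, 3), dtype=np.float32)
    label = np.array([0, 0, 1, 0, 0], dtype=np.float32)
    return image, label


static_ds = tf.data.Dataset.range(2).map(tf_example).batch(2)
dynamic_ds = tf.data.Dataset.range(2).map(
    lambda i: tuple(tf.py_function(np_example, [i], [tf.float32, tf.float32]))
).batch(2)

static_rank = static_ds.element_spec[1].shape.rank
dynamic_rank = dynamic_ds.element_spec[1].shape.rank
print(static_rank, dynamic_rank)
```

Only the py_function pipeline ends up with an unknown label rank, which is consistent with the proposed retitling.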

@sumanttyagi

Same error, checked on Python 3.8 with TF v2.2.0, 2.3.0, 2.4.0, and 2.4.1.
Found a quick solution with a custom loss function:

def weighted_categorical_crossentropy( weights ):
    # weights = [ 0.9, 0.05, 0.04, 0.01 ]
    def wcce( y_true, y_pred ):
        tf_weights = tf.constant( weights )
        if not tf.is_tensor( y_pred ):
            y_pred = tf.constant( y_pred )

        y_true = tf.cast( y_true, y_pred.dtype )
        return tf.keras.losses.categorical_crossentropy( y_true, y_pred ) * tf.experimental.numpy.sum( y_true * tf_weights, axis = -1 )
    return wcce

...
config['loss'] = weighted_categorical_crossentropy( config['classWeight'] )
model.compile(
    loss = config['loss'],
    optimizer = config['optimizer'],
    metrics = ['accuracy'],
    run_eagerly = True
)

This still shows an error, please help:
tf.keras.losses.categorical_crossentropy( y_true, y_pred ) expects 0 arguments got 2

@sachinprasadhs
Contributor

When I call type(dataset) in your code, it returns tensorflow.python.data.ops.dataset_ops.BatchDataset. For a BatchDataset you don't have to specify y separately; you simply feed a dataset whose elements are tuples containing both (x, y).
See the documentation here, which states:

Target data. Like the input data x, it could be either Numpy array(s) or TensorFlow tensor(s). It should be consistent with x (you cannot have Numpy inputs and tensor targets, or inversely). If x is a dataset, generator, or keras.utils.Sequence instance, y should not be specified (since targets will be obtained from x).

For validation data, you can pass it to fit explicitly, as below:
model.fit(train_dataset, validation_data=val_dataset, batch_size=32, epochs=100)

@sachinprasadhs sachinprasadhs added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Jun 10, 2022
@kretes

kretes commented Jun 13, 2022

@sachinprasadhs you are referring to specifying y separately, but I can't see that in the code in the issue. The same goes for the gist I prepared previously: https://gist.github.com/kretes/ca911085b2eb0fa3985894245ce3fd0c where the dataset yields a tuple of (x, y), and so is in line with the documentation.

Can you try running the gist, e.g. in Colab, to see if it fails on your side, and modify it accordingly so that it passes?

@sachinprasadhs
Contributor

@kretes, train_dataset in model.fit() is a BatchDataset; in this case y should not be specified (since targets will be obtained from train_dataset).

If you still face the issue, could you please open it in the keras-team/keras repo? Thanks!

@kretes

kretes commented Jun 15, 2022

OK, I will move this to the Keras repo. To be clear, I am not passing y separately; it is part of the BatchDataset, as you say.

@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Jun 22, 2022
@tensortorch
Author

@sachinprasadhs as @kretes already mentioned, the fit method in the example is following the documentation and not using a separate y argument. You can also see from the example that the fit call works when the same dataset is transformed to an iterator.

@google-ml-butler google-ml-butler bot removed stat:awaiting response Status - Awaiting response from author stale This label marks the issue/pr stale - to be closed automatically if no activity labels Jun 23, 2022
@sachinprasadhs
Contributor

Development of Keras has moved to a separate repository: https://github.com/keras-team/keras/issues

Please post this issue on the keras-team/keras repo.
To learn more, see:
https://discuss.tensorflow.org/t/keras-project-moved-to-new-repository-in-https-github.aaakk.us.kg-keras-team-keras/1999
Thank you!

@sachinprasadhs sachinprasadhs added the stat:awaiting response Status - Awaiting response from author label Jun 23, 2022
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Jun 30, 2022
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.
