1.61 trainer class_weight issue #74

Closed
filefolder opened this issue Jun 22, 2021 · 9 comments

@filefolder
Contributor

filefolder commented Jun 22, 2021

Hi,

Having trouble with 1.62 / TF 2.5 (compiled from source, no GPU), generator mode, Python 3.6.

This seems to be one or possibly two separate issues. My limited understanding is that class_weights may be deprecated in TF 2.5, or at least they are handled differently. The other concern is that the "attention" expansions (D0, D, P, S) seem to have output shapes of (None, None, *), which seems wrong and later affects the dimensions of the downstream decoding layers as well as the final output.

Learning rate:  0.001
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input (InputLayer)              [(None, 6000, 3)]    0
__________________________________________________________________________________________________
conv1d (Conv1D)                 (None, 6000, 8)      272         input[0][0]
__________________________________________________________________________________________________
max_pooling1d (MaxPooling1D)    (None, 3000, 8)      0           conv1d[0][0]
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               (None, 3000, 16)     1168        max_pooling1d[0][0]
__________________________________________________________________________________________________
max_pooling1d_1 (MaxPooling1D)  (None, 1500, 16)     0           conv1d_1[0][0]
__________________________________________________________________________________________________
conv1d_2 (Conv1D)               (None, 1500, 16)     1808        max_pooling1d_1[0][0]
__________________________________________________________________________________________________
max_pooling1d_2 (MaxPooling1D)  (None, 750, 16)      0           conv1d_2[0][0]
__________________________________________________________________________________________________
conv1d_3 (Conv1D)               (None, 750, 32)      3616        max_pooling1d_2[0][0]
__________________________________________________________________________________________________
max_pooling1d_3 (MaxPooling1D)  (None, 375, 32)      0           conv1d_3[0][0]
__________________________________________________________________________________________________
conv1d_4 (Conv1D)               (None, 375, 32)      5152        max_pooling1d_3[0][0]
__________________________________________________________________________________________________
max_pooling1d_4 (MaxPooling1D)  (None, 188, 32)      0           conv1d_4[0][0]
__________________________________________________________________________________________________
conv1d_5 (Conv1D)               (None, 188, 64)      10304       max_pooling1d_4[0][0]
__________________________________________________________________________________________________
max_pooling1d_5 (MaxPooling1D)  (None, 94, 64)       0           conv1d_5[0][0]
__________________________________________________________________________________________________
conv1d_6 (Conv1D)               (None, 94, 64)       12352       max_pooling1d_5[0][0]
__________________________________________________________________________________________________
max_pooling1d_6 (MaxPooling1D)  (None, 47, 64)       0           conv1d_6[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 47, 64)       256         max_pooling1d_6[0][0]
__________________________________________________________________________________________________
activation (Activation)         (None, 47, 64)       0           batch_normalization[0][0]
__________________________________________________________________________________________________
spatial_dropout1d (SpatialDropo (None, 47, 64)       0           activation[0][0]
__________________________________________________________________________________________________
conv1d_7 (Conv1D)               (None, 47, 64)       12352       spatial_dropout1d[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 47, 64)       256         conv1d_7[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 47, 64)       0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
spatial_dropout1d_1 (SpatialDro (None, 47, 64)       0           activation_1[0][0]
__________________________________________________________________________________________________
conv1d_8 (Conv1D)               (None, 47, 64)       12352       spatial_dropout1d_1[0][0]
__________________________________________________________________________________________________
add (Add)                       (None, 47, 64)       0           max_pooling1d_6[0][0]
                                                                 conv1d_8[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 47, 64)       256         add[0][0]
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 47, 64)       0           batch_normalization_2[0][0]
__________________________________________________________________________________________________
spatial_dropout1d_2 (SpatialDro (None, 47, 64)       0           activation_2[0][0]
__________________________________________________________________________________________________
conv1d_9 (Conv1D)               (None, 47, 64)       12352       spatial_dropout1d_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 47, 64)       256         conv1d_9[0][0]
__________________________________________________________________________________________________
activation_3 (Activation)       (None, 47, 64)       0           batch_normalization_3[0][0]
__________________________________________________________________________________________________
spatial_dropout1d_3 (SpatialDro (None, 47, 64)       0           activation_3[0][0]
__________________________________________________________________________________________________
conv1d_10 (Conv1D)              (None, 47, 64)       12352       spatial_dropout1d_3[0][0]
__________________________________________________________________________________________________
add_1 (Add)                     (None, 47, 64)       0           add[0][0]
                                                                 conv1d_10[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 47, 64)       256         add_1[0][0]
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 47, 64)       0           batch_normalization_4[0][0]
__________________________________________________________________________________________________
spatial_dropout1d_4 (SpatialDro (None, 47, 64)       0           activation_4[0][0]
__________________________________________________________________________________________________
conv1d_11 (Conv1D)              (None, 47, 64)       12352       spatial_dropout1d_4[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 47, 64)       256         conv1d_11[0][0]
__________________________________________________________________________________________________
activation_5 (Activation)       (None, 47, 64)       0           batch_normalization_5[0][0]
__________________________________________________________________________________________________
spatial_dropout1d_5 (SpatialDro (None, 47, 64)       0           activation_5[0][0]
__________________________________________________________________________________________________
conv1d_12 (Conv1D)              (None, 47, 64)       12352       spatial_dropout1d_5[0][0]
__________________________________________________________________________________________________
add_2 (Add)                     (None, 47, 64)       0           add_1[0][0]
                                                                 conv1d_12[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 47, 64)       256         add_2[0][0]
__________________________________________________________________________________________________
activation_6 (Activation)       (None, 47, 64)       0           batch_normalization_6[0][0]
__________________________________________________________________________________________________
spatial_dropout1d_6 (SpatialDro (None, 47, 64)       0           activation_6[0][0]
__________________________________________________________________________________________________
conv1d_13 (Conv1D)              (None, 47, 64)       12352       spatial_dropout1d_6[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 47, 64)       256         conv1d_13[0][0]
__________________________________________________________________________________________________
activation_7 (Activation)       (None, 47, 64)       0           batch_normalization_7[0][0]
__________________________________________________________________________________________________
spatial_dropout1d_7 (SpatialDro (None, 47, 64)       0           activation_7[0][0]
__________________________________________________________________________________________________
conv1d_14 (Conv1D)              (None, 47, 64)       12352       spatial_dropout1d_7[0][0]
__________________________________________________________________________________________________
add_3 (Add)                     (None, 47, 64)       0           add_2[0][0]
                                                                 conv1d_14[0][0]
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 47, 64)       256         add_3[0][0]
__________________________________________________________________________________________________
activation_8 (Activation)       (None, 47, 64)       0           batch_normalization_8[0][0]
__________________________________________________________________________________________________
spatial_dropout1d_8 (SpatialDro (None, 47, 64)       0           activation_8[0][0]
__________________________________________________________________________________________________
conv1d_15 (Conv1D)              (None, 47, 64)       8256        spatial_dropout1d_8[0][0]
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 47, 64)       256         conv1d_15[0][0]
__________________________________________________________________________________________________
activation_9 (Activation)       (None, 47, 64)       0           batch_normalization_9[0][0]
__________________________________________________________________________________________________
spatial_dropout1d_9 (SpatialDro (None, 47, 64)       0           activation_9[0][0]
__________________________________________________________________________________________________
conv1d_16 (Conv1D)              (None, 47, 64)       8256        spatial_dropout1d_9[0][0]
__________________________________________________________________________________________________
add_4 (Add)                     (None, 47, 64)       0           add_3[0][0]
                                                                 conv1d_16[0][0]
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 47, 64)       256         add_4[0][0]
__________________________________________________________________________________________________
activation_10 (Activation)      (None, 47, 64)       0           batch_normalization_10[0][0]
__________________________________________________________________________________________________
spatial_dropout1d_10 (SpatialDr (None, 47, 64)       0           activation_10[0][0]
__________________________________________________________________________________________________
conv1d_17 (Conv1D)              (None, 47, 64)       12352       spatial_dropout1d_10[0][0]
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, 47, 64)       256         conv1d_17[0][0]
__________________________________________________________________________________________________
activation_11 (Activation)      (None, 47, 64)       0           batch_normalization_11[0][0]
__________________________________________________________________________________________________
spatial_dropout1d_11 (SpatialDr (None, 47, 64)       0           activation_11[0][0]
__________________________________________________________________________________________________
conv1d_18 (Conv1D)              (None, 47, 64)       12352       spatial_dropout1d_11[0][0]
__________________________________________________________________________________________________
add_5 (Add)                     (None, 47, 64)       0           add_4[0][0]
                                                                 conv1d_18[0][0]
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, 47, 64)       256         add_5[0][0]
__________________________________________________________________________________________________
activation_12 (Activation)      (None, 47, 64)       0           batch_normalization_12[0][0]
__________________________________________________________________________________________________
spatial_dropout1d_12 (SpatialDr (None, 47, 64)       0           activation_12[0][0]
__________________________________________________________________________________________________
conv1d_19 (Conv1D)              (None, 47, 64)       8256        spatial_dropout1d_12[0][0]
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, 47, 64)       256         conv1d_19[0][0]
__________________________________________________________________________________________________
activation_13 (Activation)      (None, 47, 64)       0           batch_normalization_13[0][0]
__________________________________________________________________________________________________
spatial_dropout1d_13 (SpatialDr (None, 47, 64)       0           activation_13[0][0]
__________________________________________________________________________________________________
conv1d_20 (Conv1D)              (None, 47, 64)       8256        spatial_dropout1d_13[0][0]
__________________________________________________________________________________________________
add_6 (Add)                     (None, 47, 64)       0           add_5[0][0]
                                                                 conv1d_20[0][0]
__________________________________________________________________________________________________
bidirectional (Bidirectional)   (None, 47, 32)       10368       add_6[0][0]
__________________________________________________________________________________________________
conv1d_21 (Conv1D)              (None, 47, 16)       528         bidirectional[0][0]
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, 47, 16)       64          conv1d_21[0][0]
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, 47, 32)       4224        batch_normalization_14[0][0]
__________________________________________________________________________________________________
conv1d_22 (Conv1D)              (None, 47, 16)       528         bidirectional_1[0][0]
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, 47, 16)       64          conv1d_22[0][0]
__________________________________________________________________________________________________
attentionD0 (SeqSelfAttention)  [(None, None, 16), ( 1089        batch_normalization_15[0][0]
__________________________________________________________________________________________________
add_7 (Add)                     (None, 47, 16)       0           batch_normalization_15[0][0]
                                                                 attentionD0[0][0]
__________________________________________________________________________________________________
layer_normalization (LayerNorma (None, 47, 16)       32          add_7[0][0]
__________________________________________________________________________________________________
feed_forward (FeedForward)      (None, 47, 16)       4240        layer_normalization[0][0]
__________________________________________________________________________________________________
add_8 (Add)                     (None, 47, 16)       0           layer_normalization[0][0]
                                                                 feed_forward[0][0]
__________________________________________________________________________________________________
layer_normalization_1 (LayerNor (None, 47, 16)       32          add_8[0][0]
__________________________________________________________________________________________________
attentionD (SeqSelfAttention)   [(None, None, 16), ( 1089        layer_normalization_1[0][0]
__________________________________________________________________________________________________
add_9 (Add)                     (None, 47, 16)       0           layer_normalization_1[0][0]
                                                                 attentionD[0][0]
__________________________________________________________________________________________________
layer_normalization_2 (LayerNor (None, 47, 16)       32          add_9[0][0]
__________________________________________________________________________________________________
feed_forward_1 (FeedForward)    (None, 47, 16)       4240        layer_normalization_2[0][0]
__________________________________________________________________________________________________
add_10 (Add)                    (None, 47, 16)       0           layer_normalization_2[0][0]
                                                                 feed_forward_1[0][0]
__________________________________________________________________________________________________
layer_normalization_3 (LayerNor (None, 47, 16)       32          add_10[0][0]
__________________________________________________________________________________________________
lstm_2 (LSTM)                   (None, 47, 16)       2112        layer_normalization_3[0][0]
__________________________________________________________________________________________________
lstm_3 (LSTM)                   (None, 47, 16)       2112        layer_normalization_3[0][0]
__________________________________________________________________________________________________
attentionP (SeqSelfAttention)   [(None, None, 16), ( 1089        lstm_2[0][0]
__________________________________________________________________________________________________
attentionS (SeqSelfAttention)   [(None, None, 16), ( 1089        lstm_3[0][0]
__________________________________________________________________________________________________
up_sampling1d (UpSampling1D)    (None, 94, 16)       0           layer_normalization_3[0][0]
__________________________________________________________________________________________________
up_sampling1d_7 (UpSampling1D)  (None, None, 16)     0           attentionP[0][0]
__________________________________________________________________________________________________
up_sampling1d_14 (UpSampling1D) (None, None, 16)     0           attentionS[0][0]
__________________________________________________________________________________________________
conv1d_23 (Conv1D)              (None, 94, 64)       3136        up_sampling1d[0][0]
__________________________________________________________________________________________________
conv1d_30 (Conv1D)              (None, None, 64)     3136        up_sampling1d_7[0][0]
__________________________________________________________________________________________________
conv1d_37 (Conv1D)              (None, None, 64)     3136        up_sampling1d_14[0][0]
__________________________________________________________________________________________________
up_sampling1d_1 (UpSampling1D)  (None, 188, 64)      0           conv1d_23[0][0]
__________________________________________________________________________________________________
up_sampling1d_8 (UpSampling1D)  (None, None, 64)     0           conv1d_30[0][0]
__________________________________________________________________________________________________
up_sampling1d_15 (UpSampling1D) (None, None, 64)     0           conv1d_37[0][0]
__________________________________________________________________________________________________
conv1d_24 (Conv1D)              (None, 188, 64)      20544       up_sampling1d_1[0][0]
__________________________________________________________________________________________________
conv1d_31 (Conv1D)              (None, None, 64)     20544       up_sampling1d_8[0][0]
__________________________________________________________________________________________________
conv1d_38 (Conv1D)              (None, None, 64)     20544       up_sampling1d_15[0][0]
__________________________________________________________________________________________________
up_sampling1d_2 (UpSampling1D)  (None, 376, 64)      0           conv1d_24[0][0]
__________________________________________________________________________________________________
up_sampling1d_9 (UpSampling1D)  (None, None, 64)     0           conv1d_31[0][0]
__________________________________________________________________________________________________
up_sampling1d_16 (UpSampling1D) (None, None, 64)     0           conv1d_38[0][0]
__________________________________________________________________________________________________
conv1d_25 (Conv1D)              (None, 376, 32)      10272       up_sampling1d_2[0][0]
__________________________________________________________________________________________________
conv1d_32 (Conv1D)              (None, None, 32)     10272       up_sampling1d_9[0][0]
__________________________________________________________________________________________________
conv1d_39 (Conv1D)              (None, None, 32)     10272       up_sampling1d_16[0][0]
__________________________________________________________________________________________________
up_sampling1d_3 (UpSampling1D)  (None, 752, 32)      0           conv1d_25[0][0]
__________________________________________________________________________________________________
up_sampling1d_10 (UpSampling1D) (None, None, 32)     0           conv1d_32[0][0]
__________________________________________________________________________________________________
up_sampling1d_17 (UpSampling1D) (None, None, 32)     0           conv1d_39[0][0]
__________________________________________________________________________________________________
cropping1d (Cropping1D)         (None, 750, 32)      0           up_sampling1d_3[0][0]
__________________________________________________________________________________________________
cropping1d_1 (Cropping1D)       (None, None, 32)     0           up_sampling1d_10[0][0]
__________________________________________________________________________________________________
cropping1d_2 (Cropping1D)       (None, None, 32)     0           up_sampling1d_17[0][0]
__________________________________________________________________________________________________
conv1d_26 (Conv1D)              (None, 750, 32)      7200        cropping1d[0][0]
__________________________________________________________________________________________________
conv1d_33 (Conv1D)              (None, None, 32)     7200        cropping1d_1[0][0]
__________________________________________________________________________________________________
conv1d_40 (Conv1D)              (None, None, 32)     7200        cropping1d_2[0][0]
__________________________________________________________________________________________________
up_sampling1d_4 (UpSampling1D)  (None, 1500, 32)     0           conv1d_26[0][0]
__________________________________________________________________________________________________
up_sampling1d_11 (UpSampling1D) (None, None, 32)     0           conv1d_33[0][0]
__________________________________________________________________________________________________
up_sampling1d_18 (UpSampling1D) (None, None, 32)     0           conv1d_40[0][0]
__________________________________________________________________________________________________
conv1d_27 (Conv1D)              (None, 1500, 16)     3600        up_sampling1d_4[0][0]
__________________________________________________________________________________________________
conv1d_34 (Conv1D)              (None, None, 16)     3600        up_sampling1d_11[0][0]
__________________________________________________________________________________________________
conv1d_41 (Conv1D)              (None, None, 16)     3600        up_sampling1d_18[0][0]
__________________________________________________________________________________________________
up_sampling1d_5 (UpSampling1D)  (None, 3000, 16)     0           conv1d_27[0][0]
__________________________________________________________________________________________________
up_sampling1d_12 (UpSampling1D) (None, None, 16)     0           conv1d_34[0][0]
__________________________________________________________________________________________________
up_sampling1d_19 (UpSampling1D) (None, None, 16)     0           conv1d_41[0][0]
__________________________________________________________________________________________________
conv1d_28 (Conv1D)              (None, 3000, 16)     2320        up_sampling1d_5[0][0]
__________________________________________________________________________________________________
conv1d_35 (Conv1D)              (None, None, 16)     2320        up_sampling1d_12[0][0]
__________________________________________________________________________________________________
conv1d_42 (Conv1D)              (None, None, 16)     2320        up_sampling1d_19[0][0]
__________________________________________________________________________________________________
up_sampling1d_6 (UpSampling1D)  (None, 6000, 16)     0           conv1d_28[0][0]
__________________________________________________________________________________________________
up_sampling1d_13 (UpSampling1D) (None, None, 16)     0           conv1d_35[0][0]
__________________________________________________________________________________________________
up_sampling1d_20 (UpSampling1D) (None, None, 16)     0           conv1d_42[0][0]
__________________________________________________________________________________________________
conv1d_29 (Conv1D)              (None, 6000, 8)      1416        up_sampling1d_6[0][0]
__________________________________________________________________________________________________
conv1d_36 (Conv1D)              (None, None, 8)      1416        up_sampling1d_13[0][0]
__________________________________________________________________________________________________
conv1d_43 (Conv1D)              (None, None, 8)      1416        up_sampling1d_20[0][0]
__________________________________________________________________________________________________
detector (Conv1D)               (None, 6000, 1)      89          conv1d_29[0][0]
__________________________________________________________________________________________________
picker_P (Conv1D)               (None, None, 1)      89          conv1d_36[0][0]
__________________________________________________________________________________________________
picker_S (Conv1D)               (None, None, 1)      89          conv1d_43[0][0]
==================================================================================================
Total params: 373,495
Trainable params: 371,639
Non-trainable params: 1,856
__________________________________________________________________________________________________
Started training in generator mode ...
Traceback (most recent call last):
  File "./train_model.py", line 59, in <module>
    use_multiprocessing=True)
  File "/home/seisop/.local/lib/python3.6/site-packages/EQTransformer/core/trainer.py", line 352, in trainer
    history, model, start_training, end_training, save_dir, save_models, training_size, validation_size=train(args)
  File "/home/seisop/.local/lib/python3.6/site-packages/EQTransformer/core/trainer.py", line 321, in train
    class_weight={0: 0.11, 1: 0.89})
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1957, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1147, in fit
    steps_per_execution=self._steps_per_execution)
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 1364, in get_data_handler
    return DataHandler(*args, **kwargs)
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 1175, in __init__
    class_weight, distribute)
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 1186, in _configure_dataset_and_inferred_steps
    dataset = dataset.map(_make_class_weight_map_fn(class_weight))
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1925, in map
    return MapDataset(self, map_func, preserve_cardinality=True)
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4487, in __init__
    use_legacy_function=use_legacy_function)
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3712, in __init__
    self._function = fn_factory()
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3135, in get_concrete_function
    *args, **kwargs)
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3100, in _get_concrete_function_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3444, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3289, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 999, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3687, in wrapped_fn
    ret = wrapper_helper(*args)
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3617, in wrapper_helper
    ret = autograph.tf_convert(self._func, ag_ctx)(*nested_args)
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 692, in wrapper
    return converted_call(f, args, kwargs, options=options)
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 382, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 463, in _call_unconverted
    return f(*args, **kwargs)
  File "/home/seisop/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 1402, in _class_weights_map_fn
    raise ValueError("`class_weight` not supported for "
ValueError: `class_weight` not supported for 3+ dimensional targets.

If I remove the class_weight argument from the fit_generator call entirely, the code runs and models are written, but the output dimensions are still mostly (None, None, ...) as above, and the trainer seems to perform poorly, although I have not tested it fully.

Otherwise the picker/predictor seems to work but I have not tested it using new models created in 1.61.

A relevant discussion here: keras-team/keras#3653

And the solution seems to be here although I don't quite understand it fully: https://www.tensorflow.org/tutorials/images/segmentation#optional_imbalanced_classes_and_class_weights
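
As best I can tell, the idea in that tutorial (sketched from memory below, so the function and dataset names are the tutorial's, not EqT's) is to turn the class_weight dict into per-element sample weights by indexing a small weight vector with the integer labels, and to hand those weights to fit alongside the data:

    import tensorflow as tf

    def add_sample_weights(inputs, labels):
        # one weight per class; mirroring EqT's class_weight={0: 0.11, 1: 0.89}
        class_weights = tf.constant([0.11, 0.89])
        class_weights = class_weights / tf.reduce_sum(class_weights)  # normalize (already sums to 1 here)

        # look up the weight for each label element; this needs integer labels,
        # so EqT's continuous 0-1 picker targets would have to be thresholded first
        sample_weights = tf.gather(class_weights, indices=tf.cast(labels, tf.int32))
        return inputs, labels, sample_weights

    # train_dataset.map(add_sample_weights) would then feed (x, y, w) triples to model.fit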

@smousavi05
Owner

@filefolder thanks for sharing this. This is another issue arising from the new changes in Keras and TF 2.5. I have modified other parts of the code to accommodate the new changes, except for the trainer module. I may need some time to modify this part, which is why I haven't uploaded version 1.62 to pip and anaconda yet. Could you use the 1.59 version for now? Except for the use of TF 2, everything else is the same in that version of EqT.

@filefolder
Contributor Author

No problem, 1.59 works fine, although I think we only have the GPU set up for TF 2.5, so I am looking forward to trying it out.

@filefolder
Contributor Author

Revisiting this briefly, it seems there are two issues.

The first is in SeqSelfAttention, where layer dimensions become corrupted, but ONLY when the attention type is 'additive' (multiplicative works). I think I can track it down to this segment of the _call_additive_emission() function:

        # e_{t, t'} = W_a h_{t, t'} + b_a
        if self.use_attention_bias:
            e = K.reshape(K.dot(h, self.Wa) + self.ba, (batch_size, input_len, input_len)) 
        else:
            e = K.reshape(K.dot(h, self.Wa), (batch_size, input_len, input_len))
        return e

as e is returned as Tensor("attentionD0/Reshape_9:0", shape=(None, None, None), dtype=float32). Something seems to be lost in the translation to TF 2.5, but I am not well versed in the syntax, since everything is a tf object rather than an integer.

The second issue is of course converting the class_weights to sample_weights, but I don't understand how they were defined in the first place. Originally the defaults are [.11,.89]. Are there supposed to be three class_weights, one for y1, y2, y3? Or are they a true/false penalty type of thing?

What I have sort of figured out is that you can define sample weights entirely within DataGenerator.__getitem__ and simply return them as a third element, e.g. return ({'input': X}, {'detector': y1, 'picker_P': y2, 'picker_S': y3}, sample_weights), and remove class_weight from the model.fit_generator (now model.fit in TF 2.5) call in the trainer.

Something rudimentary like this seems to work, but I have just guessed at the proper translation between class_weight and sample_weight, and I still don't fully understand how the class_weights were defined in the first place. Here I assume they should correspond to [detector, P, S] and also sum to 1:

    def __getitem__(self, index):
        'Generate one batch of data'
        if self.augmentation:
            indexes = self.indexes[index*self.batch_size//2:(index+1)*self.batch_size//2]
            indexes = np.append(indexes, indexes)
        else:
            indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        list_IDs_temp = [self.list_IDs[k] for k in indexes]

        X, y1, y2, y3 = self.__data_generation(list_IDs_temp)

        # assumed to correspond to [detector, picker_P, picker_S] and to sum to 1
        class_weights = [0.055, 0.445, 0.5]
        # map each label value onto its class weight to get per-sample weights
        sample_weights = np.asarray([np.where(y == 0, class_weights[0],
                                     np.where(y == 1, class_weights[1],
                                     np.where(y == 2, class_weights[2], y))) for y in [y1, y2, y3]])

        return ({'input': X}, {'detector': y1, 'picker_P': y2, 'picker_S': y3}, sample_weights)

Any comments appreciated

@smousavi05
Owner

@filefolder I modified SeqSelfAttention for TF 2.5, which might be what is causing this issue. To track it down, it might be helpful to copy the original one from version 1.59.

The weights in the attention layers are a totally different thing. They are attention weights.

The class weights, as I explained earlier, are defined empirically, and they sum to 1.

@filefolder
Contributor Author

Quick update: the fix for SeqSelfAttention is this.

From

    def _call_additive_emission(self, inputs):
        input_shape = K.shape(inputs)
        batch_size, input_len = input_shape[0], input_shape[1]

to:

    def _call_additive_emission(self, inputs):
        input_shape = K.shape(inputs)
        batch_size = input_shape[0]
        input_len = inputs.get_shape().as_list()[1]

What was happening is that input_len was coming back as None; for whatever reason, K.shape in TF 2.5 no longer returns list values you can just index directly, so you need to use .as_list(). You still need to define batch_size the old way, however, if you want to keep the K.reshape syntax the same at the bottom of the function.
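
Just to illustrate the distinction (a toy example, not EqT code): K.shape gives a runtime shape tensor, while get_shape().as_list() gives the static shape as plain Python values, which is what keeps the reshaped output shape defined.

    import tensorflow as tf
    from tensorflow.keras import backend as K

    x = tf.zeros((4, 47, 16))                # stand-in for the attention layer input

    dyn_len = K.shape(x)[1]                  # a scalar int32 Tensor, not a Python int
    static_len = x.get_shape().as_list()[1]  # a plain Python int: 47

    # inside the model the batch dimension is statically None, so it still has to come
    # from K.shape, but the sequence length (47) is known statically and can be taken
    # from get_shape(), which keeps the reshaped output at (None, 47, ...) instead of
    # (None, None, None)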

Still working on the best way to implement sample weights. My understanding, though, is that class_weights should not be purely empirical; they should be the ratio of the number of label==1 values to the total size of y1 (e.g. len(np.where(y1 == 1)[0])/y1.size). USUALLY this is around .11 but not always, and potentially far less for y2 and y3. The sample_weights approach allows this to be defined dynamically, per batch, so it will be interesting to see what effect that has.

@smousavi05
Owner

@filefolder thanks for the update; that is because of the changes in the new version of TF. They have moved things around.
Could you please add this change to the code base here on GitHub?

From your explanation, I can now guess the possible source of the misunderstanding. What you are referring to are the class weights that are used to compensate for the unbalanced labels in the dataset. What I was explaining earlier were the loss weights that are used for optimization. These are two different things.

I used the class (imbalance) weights here when training the network:

history = model.fit_generator(generator=training_generator,
                              validation_data=validation_generator,
                              use_multiprocessing=args['use_multiprocessing'],
                              workers=multiprocessing.cpu_count(),
                              callbacks=callbacks,
                              epochs=args['epochs'],
                              class_weight={0: 0.11, 1: 0.89})

and the loss weights here when building the network:

inp = Input(shape=args['input_dimention'], name='input') 
model = cred2(nb_filters=[8, 16, 16, 32, 32, 64, 64],
          kernel_size=[11, 9, 7, 7, 5, 5, 3],
          padding=args['padding'],
          activationf =args['activation'],
          cnn_blocks=args['cnn_blocks'],
          BiLSTM_blocks=args['lstm_blocks'],
          drop_rate=args['drop_rate'], 
          loss_weights=args['loss_weights'],
          loss_types=args['loss_types'],
          kernel_regularizer=keras.regularizers.l2(1e-6),
          bias_regularizer=keras.regularizers.l1(1e-4)
           )(inp) 

@filefolder
Contributor Author

Thanks for the clarification, and I see you've just fixed the SeqSelfAttention code (sorry, I was in the field).

Determining the best way to convert the class_weights (.11 / .89) to sample_weights in __getitem__ should be the next priority, and then I think 1.61 should be ready to go. I have some ideas about this but haven't had time to test them.

@filefolder
Contributor Author

filefolder commented Jul 31, 2021

OK, this appears to be working as expected... let me know if this makes sense to you. A perfect translation would be to force class_weights to be [.11, .89], but I am attempting a dynamic approach that changes (slightly) per batch. I'll test it a bit more over the next few days to see if it matches the output of 1.59 and whether the dynamic method out-performs the static version.

    def __getitem__(self, index):
        'Generate one batch of data'
        if self.augmentation:
            indexes = self.indexes[index*self.batch_size//2:(index+1)*self.batch_size//2]
            indexes = np.append(indexes, indexes)
        else:
            indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        list_IDs_temp = [self.list_IDs[k] for k in indexes]

        X, y1, y2, y3 = self.__data_generation(list_IDs_temp)

        # per-batch fraction of "positive" label values for each output
        cw1 = np.count_nonzero(y1 == 1)/y1.size
        cw2 = np.count_nonzero(y2 > 0)/y2.size
        cw3 = np.count_nonzero(y3 > 0)/y3.size

        # [weight for zeros, weight for non-zeros] per output, summing to 1
        class_weights = [[cw1, 1-cw1], [cw2, 1-cw2], [cw3, 1-cw3]]

        # build per-sample weights with the same shape as the labels
        sample_weights = np.array([y1, y2, y3])
        for i, y in enumerate([y1, y2, y3]):
            sample_weights[i][np.where(y > 0)] = class_weights[i][1]
            sample_weights[i][np.where(y == 0)] = class_weights[i][0]

        return (X, [y1, y2, y3], list(sample_weights))  # convert back to a list of arrays
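
For completeness, a sketch of what the matching call in the trainer might then look like (hypothetical; argument names copied from the existing fit_generator call above), simply dropping class_weight so that Keras picks up the third tuple element from __getitem__ as per-sample weights:

history = model.fit(training_generator,
                    validation_data=validation_generator,
                    use_multiprocessing=args['use_multiprocessing'],
                    workers=multiprocessing.cpu_count(),
                    callbacks=callbacks,
                    epochs=args['epochs'])
# no class_weight here; the (X, [y1, y2, y3], sample_weights) tuples returned by
# __getitem__ supply the weighting instead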

@filefolder
Contributor Author

Also noticing a small bug in _document_training that I am unsure how to fix. Since the data in history.npz is shown elsewhere, I've just commented it out for now.

np.save(save_dir+'/history',history)

> AttributeError: Can't pickle local object 'Layer.add_loss.<locals>._tag_callable'
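
For what it's worth, one possible workaround (a sketch; it assumes history is the Keras History object returned by fit) would be to save only the plain metrics dict rather than the whole object, since history.history is just a dict of per-epoch lists and pickles fine:

# save only the per-epoch loss/metric values, not the History object itself
np.save(save_dir+'/history', history.history)

# and to read it back later:
# hist = np.load(save_dir+'/history.npy', allow_pickle=True).item()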
