1.61 trainer class_weight issue #74
@filefolder thanks for sharing this. This is another issue caused by the new changes in Keras and TF 2.5. I have modified other parts of the code to accommodate the new changes, except the trainer module. I may need some time to modify this part, which is why I haven't uploaded version 1.62 to pip and anaconda yet. Could you use the 1.59 version for now? Except for the use of TF 2, everything else is the same in that version of EqT.
No problem, 1.59 works fine, although I think we only have the GPU set up for TF 2.5, so I am looking forward to trying it out.
Revisiting this briefly, it seems there are two issues.

The first is in SeqSelfAttention, where layer dimensions are becoming corrupted, but ONLY when the attention type is 'additive' (multiplicative works). I think I can track it down to the segment of the _call_additive_emission() function where e is computed and returned.

The second issue is of course converting the class_weights to sample_weights, but I don't understand how they were defined in the first place. Originally the defaults are [.11, .89]. Are there supposed to be three class_weights, one each for y1, y2, y3? Or are they a true/false penalty type of thing?

What I have sort of figured out is that you can define sample weights entirely within DataGenerator.__getitem__ and simply return them in addition. Something rudimentary like this seems to work, but I've just sort of guessed at the proper translation between class and sample weights, plus I don't fully understand how the class_weights were defined in the first place. Here I assume they should correspond to [detector, p, s] and also sum to 1.
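A minimal sketch of that idea; the helper name, the output-layer keys ('detector', 'picker_P', 'picker_S'), and the 0.89/0.11 split are assumptions for illustration, not the poster's actual snippet:

```python
import numpy as np

def make_sample_weights(y1, y2, y3, w_pos=0.89, w_neg=0.11):
    """Per-trace weights for each output head: traces that contain any
    label get w_pos, empty traces get w_neg (values assumed from the
    [.11, .89] defaults discussed in this thread)."""
    def _w(y):
        has_label = y.reshape(len(y), -1).max(axis=1) > 0
        return np.where(has_label, w_pos, w_neg).astype('float32')
    return {'detector': _w(y1), 'picker_P': _w(y2), 'picker_S': _w(y3)}

# inside DataGenerator.__getitem__ one would then return, e.g.:
#   return ({'input': X},
#           {'detector': y1, 'picker_P': y2, 'picker_S': y3},
#           make_sample_weights(y1, y2, y3))
```

Whether TF 2.5 wants the third element as a dict keyed by output name or as a list in output order is worth double-checking.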
Any comments appreciated.
@filefolder I have modified SeqSelfAttention for TF 2.5, which might be what causes this issue. To track the issue better, it might be helpful to copy the original one from version 1.59. The weights in the attention layers are a totally different thing: they are attention weights. The class weights, as I explained earlier, are defined empirically and sum to 1.
Quick update: the fix for SeqSelfAttention is a small change to how _call_additive_emission() reads the input shape.
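Roughly, the change looks like the following; this is a sketch based on the keras-self-attention layer that EqT bundles (bias branches omitted), not an exact diff:

```python
from tensorflow.keras import backend as K

# inside the SeqSelfAttention layer class
def _call_additive_emission(self, inputs):
    # keep the batch size dynamic ...
    batch_size = K.shape(inputs)[0]
    # ... but read the sequence length as a static int; leaving it dynamic is
    # what produces the (None, None, ...) output shapes under TF 2.5
    input_len = inputs.shape.as_list()[1]

    # h_{t,t'} = tanh(x_t W_t + x_{t'} W_x + b_h)
    q = K.expand_dims(K.dot(inputs, self.Wt), 2)
    k = K.expand_dims(K.dot(inputs, self.Wx), 1)
    h = K.tanh(q + k + self.bh)

    # e_{t,t'} = W_a h_{t,t'} + b_a, reshaped back to (batch, len, len)
    e = K.reshape(K.dot(h, self.Wa) + self.ba, (batch_size, input_len, input_len))
    return e
```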
What was happening is that input_len was returning None because, for whatever reason, K.shape in TF 2.5 no longer returns list values you can just reference; you need to use .as_list(). You still need to define batch_size the old way, however, if you want to keep the K.reshape syntax the same at the bottom of the same function.

Still working on the best way to implement sample weights. My understanding, though, is that class_weights should not be purely empirical: they should be the ratio of the number of label==1 values to the total size of y1 (e.g. len(np.where(y1 == 1)[0]) / y1.size). USUALLY this is around .11, but not always, and potentially far less for y2 and y3. The sample_weights approach allows this to be defined dynamically, per batch, so it will be interesting to see what effect that has.
@filefolder Thanks for the update; that is because of the changes in the new version of TF, they have moved things around. From your explanation I can now guess where the possible source of the misunderstanding comes from. What you are referring to are the class weights that are used to compensate for the unbalanced labels in the dataset. What I was explaining earlier were the loss weights that are used for optimization. These are two different things. I used the unbalance (class) weights when training the network, in history = model.fit_generator(generator=training_generator, ...), and the loss weights when building the network.
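To spell out the distinction, a sketch of the two call sites; apart from the [.11, .89] class weights quoted in this thread, the argument values are illustrative, not copied from the repo:

```python
# loss weights: set when *compiling* the network, they control how much each
# output head (detector, P picker, S picker) contributes to the total loss
model.compile(optimizer='adam',
              loss=['binary_crossentropy'] * 3,
              loss_weights=[0.05, 0.40, 0.55])      # illustrative values

# class weights: passed when *training*, they compensate for the unbalanced
# 0/1 labels; this is the argument that misbehaves under TF 2.5 (the subject
# of this thread)
history = model.fit_generator(generator=training_generator,
                              validation_data=validation_generator,
                              class_weight={0: 0.11, 1: 0.89},
                              epochs=epochs)
```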
Thanks for the clarification, and I see you've just fixed the SeqSelfAttention code (sorry, I was in the field). Determining the best way to convert the class_weights (.11/.89) to sample_weights in __getitem__ should be the next priority, and then I think 1.61 should be ready to go. I have some ideas about this but haven't had the time to test them.
OK, this appears to be working as expected... let me know if this makes sense to you. A perfect translation would be to force class_weights to be [.11, .89], but I am attempting a dynamic approach that (slightly) changes per batch. I'll try testing it a bit more over the next few days to see if it matches the output of 1.59 and whether the dynamic method possibly outperforms the static version.
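One way to read the dynamic, per-batch idea (my reconstruction, not the poster's code): derive the positive/negative weights from the label fraction of the current batch instead of hard-coding .11/.89:

```python
import numpy as np

def dynamic_sample_weights(y):
    """y: labels for one output head, shape (batch, time, 1)."""
    pos_frac = max(float((y == 1).mean()), 1e-3)      # ~0.11 on a typical batch
    has_label = y.reshape(len(y), -1).max(axis=1) > 0
    # rare positives get the large weight (~0.89), the rest the small one (~0.11)
    return np.where(has_label, 1.0 - pos_frac, pos_frac).astype('float32')
```

Plugged into the __getitem__ return value sketched earlier, this would make the weights track each batch's actual label imbalance.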
Also noticing a small bug in _document_training that I am unsure how to fix. Since the data in history.npz is shown elsewhere, I've just commented it out for now.
Hi,
Having trouble with 1.62 / TF 2.5 (compiled from source / no GPU), generator mode, Python 3.6.

It seems this may be one or possibly two separate issues. My limited understanding is that class_weights may be deprecated in TF 2.5, or at least they are used differently. The other concern is that the "attention" expansions (D0, D, P, S) seem to have output shapes of (None, None, *), which seems wrong and later affects the dimensions of further decoding layers as well as the final output.

If I remove the class_weight call in fit_generator entirely, the code runs and models are written, but the output dimensions are still mostly (None, None, ...) as above, and the trainer seems to perform poorly, although I have not tested it fully.

Otherwise the picker/predictor seems to work, but I have not tested it using new models created in 1.61.
A relevant discussion here: keras-team/keras#3653
And the solution seems to be here although I don't quite understand it fully: https://www.tensorflow.org/tutorials/images/segmentation#optional_imbalanced_classes_and_class_weights
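The trick that tutorial uses is roughly the following, adapted here to a two-class (noise/pick) setting; this assumes hard 0/1 labels, so EqT's partly continuous label traces would need to be thresholded first:

```python
import tensorflow as tf

def add_sample_weights(x, y):
    # class weights for [noise, pick], taken from the [.11, .89] defaults above
    class_weights = tf.constant([0.11, 0.89])
    # index the weight table with the labels to get a per-element weight tensor,
    # and return it as a third element next to (x, y)
    sample_weights = tf.gather(class_weights, indices=tf.cast(y, tf.int32))
    return x, y, sample_weights
```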