predict_generator cannot maintain data order #5048
Comments
If you want to compare predictions versus known outputs, use an …
@patyork I think evaluate_generator is more useful for validation. But for the purpose of 'real' prediction, it is important to know which exact input samples my predictions come from, right?
That's a use case, of course. Still, that's a pretty niche/uncommon issue to need to solve (high speed, large dataset, prediction and saving); most likely too niche for inclusion in the Keras core.
@patyork @iammarvelous I suggest one of the following: …
@LamDang @patyork Agreed. I suggest always forcing workers=1 for predict_generator and evaluate_generator.
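To illustrate why multiple workers break ordering in the first place, here is a small sketch (plain Python, not Keras code) using the standard multiprocessing pool: `imap` hands results back in submission order, while `imap_unordered` hands them back as workers finish, which is the behavior the thread is complaining about.

```python
# Illustration of why parallel workers can reorder results:
# imap preserves submission order, imap_unordered does not.
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(4) as pool:
        ordered = list(pool.imap(square, range(8)))
        unordered = list(pool.imap_unordered(square, range(8)))
    print(ordered)            # always [0, 1, 4, 9, 16, 25, 36, 49]
    print(sorted(unordered))  # same values; arrival order may differ per run
```

With workers=1 there is only one consumer of the queue, so the unordered case cannot occur, at the cost of throughput.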
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
Agreed. This has long worried me. predict_generator appears to be the most efficient way of evaluating a very large collection of images while keeping the GPU at full utilization, but this order ambiguity invites errors. Instead of fixing the ordering, though, the interface could be changed so that the input is a generator that yields (ID, data) pairs, and the output would yield (ID, prediction) pairs.
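The (ID, data) interface proposed above can be sketched in plain Python. Everything here is hypothetical: `tagged_batches`, `fake_predict`, and `predict_with_ids` are illustrative names, and `fake_predict` stands in for a real `model.predict_on_batch` call; the point is only that once each sample carries an ID, the arrival order of results stops mattering.

```python
# Hypothetical sketch of an ID-tagged prediction interface:
# tag every sample before prediction and emit (ID, prediction)
# pairs, so batch reordering can no longer mix up results.
def tagged_batches(ids, samples, batch_size=2):
    """Yield (id_batch, data_batch) pairs."""
    for i in range(0, len(samples), batch_size):
        yield ids[i:i + batch_size], samples[i:i + batch_size]

def fake_predict(batch):
    # Placeholder for model.predict_on_batch(batch).
    return [x * 10 for x in batch]

def predict_with_ids(gen):
    results = {}
    for id_batch, data_batch in gen:
        for sample_id, pred in zip(id_batch, fake_predict(data_batch)):
            results[sample_id] = pred
    return results

preds = predict_with_ids(tagged_batches(["a", "b", "c"], [1, 2, 3]))
# preds == {"a": 10, "b": 20, "c": 30}
```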
I agree, I do not see how to use predict_generator. @fchollet this is a bug, isn't it? (The same goes for evaluate_generator (#6499): I do not get the same results compared to evaluate, and re-running gives slightly different results.)
Could it be that this is caused by … ? In this case rstudio/keras3#149 I had a similar problem (although using R Keras), and the reason was that shuffle was set to TRUE when batch-importing the images, so the image order was different for each repetition. It might also be that I don't get the point of this issue #5048 because I am a greenhorn in this topic; if so, please ignore my posting.
I recently ran into the predict_generator inconsistency too. It looks like quite an old issue.
Has there been a fix for this, please? I have the exact same issue: I am trying to create bottlenecks using a model with the top removed. I need to store the bottlenecks in separate files on disk, but without knowing which input file produced which output bottleneck, the whole thing would end up a mess. Does anyone know a better way to create bottleneck files for individual inputs?
I bumped into this issue when I used the same generator for both …
I had a similar issue where the predictions got slightly worse each time I called …
When loading your test set using .flow_from_dataframe/.flow_from_directory, make sure to disable the shuffle option (shuffle=False); that will force it to keep the original ordering.
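Once shuffling is disabled, the generator yields samples in a fixed, deterministic order (flow_from_directory sorts filenames within each class), so predictions line up with the generator's filename list by index and can simply be zipped together. A minimal sketch with dummy data standing in for the real model output:

```python
# With shuffle=False the generator's order is deterministic, so
# predictions align index-for-index with the filename list.
# Dummy values stand in for model.predict_generator(...) output.
filenames = sorted(["cat/2.png", "cat/1.png", "dog/1.png"])
predictions = [0.9, 0.8, 0.1]  # placeholder prediction per file, in order

by_file = dict(zip(filenames, predictions))
# by_file["cat/1.png"] == 0.9
```

In real Keras code, `filenames` would come from the generator's own filename attribute rather than being built by hand.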
It seems that predict_generator cannot maintain the data order when using multiprocessing. When feeding several batches of test data into predict_generator, the output array does not correspond to the input batch index, so there is no way to tell which output is the prediction for which input, which makes the function unusable for prediction. One possible remedy might be using a priority queue rather than a normal queue to maintain the order. Here is detailed test code.
And here are results.
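The priority-queue remedy suggested in the issue body can be sketched in a few lines of plain Python (a hypothetical reassembly step, not actual Keras internals): if each worker tags its result with the batch index it was given, draining a heap keyed on that index restores the original ordering no matter how the batches finished.

```python
# Hypothetical sketch of the priority-queue remedy: workers finish
# batches out of order, but tagging each result with its batch index
# and draining a heap in index order restores the original ordering.
import heapq

def reassemble(tagged_results):
    """tagged_results: iterable of (batch_index, result) pairs, any order."""
    heap = list(tagged_results)
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

out_of_order = [(2, "batch2"), (0, "batch0"), (1, "batch1")]
print(reassemble(out_of_order))  # ['batch0', 'batch1', 'batch2']
```

Note this only fixes ordering of the collected output; the simpler workaround discussed above (workers=1, or shuffle=False on the generator) avoids the problem at the source.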