
Does mnist bag loader make sure it sees all data? #7

Open
amirhfarzaneh opened this issue Mar 4, 2019 · 2 comments

Comments

@amirhfarzaneh

Hello,

I was wondering whether your MNIST bag loader makes sure that your network sees all samples of the digit 9 against negative samples. In particular, does this line generate, each time, new random indices that the network has not seen yet? To be clearer, my understanding is that your data loader generates random indices based on the bag length, looks up the data at those indices, checks whether the target number (here 9) is among them, and makes a bag out of them. In the next iteration, the random index generator does not take into account the indices it generated in the previous iteration and draws new ones from scratch. So in one epoch of learning, your network is not guaranteed to see all the 9s, and in the worst case it sees only one 9 from the dataset (say the random number generator "accidentally" picks the same index of a 9 every time).
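To make the behaviour described above concrete, here is a minimal Python sketch of a loader that draws each bag's indices independently and uniformly at random, with no memory of earlier bags. It is not the repository's code: `sample_bag_with_replacement`, `target_coverage`, the synthetic label array (6000 of each digit) and the 250-bag epoch are all hypothetical. `target_coverage` measures what fraction of the distinct 9s appear in at least one bag during one pass.

```python
import numpy as np


def sample_bag_with_replacement(labels, bag_length, target=9, rng=None):
    """Hypothetical sketch of the sampling described above (not the repository's code).

    Every call draws fresh indices uniformly at random; nothing remembers
    which indices earlier bags already used.
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.integers(0, len(labels), size=bag_length)
    # The bag is positive if any of its instances is the target digit.
    bag_label = int(np.any(labels[idx] == target))
    return idx, bag_label


def target_coverage(labels, num_bags, bag_length, target=9, seed=0):
    """Fraction of distinct target instances that land in at least one of num_bags bags."""
    rng = np.random.default_rng(seed)
    seen = set()
    for _ in range(num_bags):
        idx, _ = sample_bag_with_replacement(labels, bag_length, target, rng)
        seen.update(int(i) for i in idx[labels[idx] == target])
    return len(seen) / int(np.sum(labels == target))


# Usage with a synthetic stand-in for the MNIST training labels (an assumption):
labels = np.arange(60000) % 10
coverage = target_coverage(labels, num_bags=250, bag_length=10)
print(f"fraction of 9s seen in one pass: {coverage:.3f}")
```

Because each bag is drawn independently, coverage grows only gradually with the number of bags and never reaches 1 by construction, which is the gap the question points at.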

@max-ilse
Collaborator

max-ilse commented Mar 4, 2019

Hi Amir,

You are right: I don't make sure that all the '9's from the MNIST dataset are used. Let's assume I want to generate a bag of length 10. In this case, the line you pointed out draws 10 random indices between 0 and 59999. There are about 6000 '9's in the dataset, so it is very, very unlikely that it picks the same '9' over and over again. I don't think you should be concerned about this possibility.
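A quick back-of-the-envelope check of both points, as a hedged sketch: it assumes every bag index is drawn uniformly from the 60000 training images, that about 6000 of them are '9's, and (hypothetically) 250 bags of length 10 per epoch. The helper names are made up for illustration.

```python
import math


def expected_distinct_targets(num_draws: int, dataset_size: int = 60000,
                              num_targets: int = 6000) -> float:
    """Expected number of distinct '9's hit by num_draws uniform draws with replacement."""
    # One draw misses a particular '9' with probability 1 - 1/dataset_size;
    # num_draws independent draws all miss it with that probability to the power num_draws.
    p_missed = (1.0 - 1.0 / dataset_size) ** num_draws
    return num_targets * (1.0 - p_missed)


def prob_same_target_every_time(k: int, num_targets: int = 6000) -> float:
    """Probability that k draws which all landed on some '9' landed on the same '9'."""
    # Any '9' for the first hit, then probability 1/num_targets for each of the other k - 1.
    return float(num_targets) ** (1 - k)


draws = 250 * 10  # hypothetical: 250 bags of length 10 per epoch
print(expected_distinct_targets(draws))
print(prob_same_target_every_time(3))  # the same '9' three times in a row
```

The two numbers answer different questions: hitting the same '9' repeatedly is vanishingly unlikely, while seeing every '9' within a single epoch is not guaranteed by this kind of sampling.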

@amirhfarzaneh
Author

Thank you for the response; it is clear to me now. So if you wanted the network to see all the examples, how would you approach it? I understand that your bags contain positive '9's and negative '4's. How would you partition 6000 samples of '9's and 6000 samples of '4's into positive and negative bags so that your network sees all of your '9's and '4's?
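One possible answer (an assumption on my part, not something from the repository or this thread) is to sample without replacement: shuffle the '9' and '4' indices once per epoch and cut them into bags so that every index lands in exactly one bag. In the sketch below, `make_covering_bags`, `neg_bag_fraction` and the index ranges are hypothetical; a real loader would map these indices back to MNIST images.

```python
import math

import numpy as np


def make_covering_bags(pos_idx, neg_idx, bag_length=10, neg_bag_fraction=0.5, seed=0):
    """Hypothetical helper: place every given index in exactly one bag.

    Positive bags get at least one positive index plus some negative "filler";
    negative bags get only negatives. Bag sizes are close to, not exactly, bag_length.
    """
    rng = np.random.default_rng(seed)
    pos = rng.permutation(np.asarray(pos_idx, dtype=np.int64))
    neg = rng.permutation(np.asarray(neg_idx, dtype=np.int64))

    if len(pos) == 0:
        # No positives at all: every negative goes into a negative bag.
        neg_only, filler = neg, neg[:0]
    else:
        # Reserve a fraction of the negatives for purely negative bags; the rest pad positive bags.
        n_neg_only = int(round(neg_bag_fraction * len(neg)))
        neg_only, filler = neg[:n_neg_only], neg[n_neg_only:]

    bags, labels = [], []

    if len(pos) > 0:
        # Aim for bags of about bag_length, but never more bags than positives,
        # so np.array_split gives every positive bag at least one positive.
        n_pos_bags = min(len(pos), max(1, math.ceil((len(pos) + len(filler)) / bag_length)))
        for p_chunk, f_chunk in zip(np.array_split(pos, n_pos_bags),
                                    np.array_split(filler, n_pos_bags)):
            bags.append(rng.permutation(np.concatenate([p_chunk, f_chunk])))
            labels.append(1)

    if len(neg_only) > 0:
        n_neg_bags = math.ceil(len(neg_only) / bag_length)
        for chunk in np.array_split(neg_only, n_neg_bags):
            bags.append(chunk)
            labels.append(0)

    return bags, np.array(labels, dtype=np.int64)


# Usage with hypothetical index ranges: 0..5999 stand for the '9's, 6000..11999 for the '4's.
bags, bag_labels = make_covering_bags(np.arange(6000), np.arange(6000, 12000))
```

With these inputs the sketch yields 900 positive bags (six or seven '9's plus three or four '4's each) and 300 negative bags of ten '4's. Passing a different `seed` each epoch gives a fresh partition while still covering every sample once per epoch; `neg_bag_fraction` trades the number of purely negative bags against how many '4's are mixed into the positive bags.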


2 participants