Unify JPL and UMD virtual computing environments #44
Comments
@hannah-rae Was this resolved? |
No, not yet, but it is on my to do list. |
I think the issue might be partly related to how os.listdir or glob.glob lists directories on different machines (both are used when the image loader loads images). When listing the files in the fmnist or planetary test directory, I get different orderings on the UMD cluster vs. my local machine. This causes a labels.csv file to be totally wrong between machines. One fix could be sorting directory contents once they're loaded, and making sure labels correspond to that order. Or, labels files could use the filename/sample id rather than its index. |
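Both fixes suggested above can be sketched in a few lines. This is only an illustration, not DORA's actual loader: the helper names `list_samples` and `read_labels` are hypothetical, and the labels file is assumed to hold headerless `<sample id>,<label>` rows.

```python
import csv
import os


def list_samples(data_dir):
    """Return the file names in data_dir in a deterministic order.

    os.listdir makes no ordering guarantee, so the result is sorted
    explicitly to be identical on every machine.
    """
    return sorted(
        name
        for name in os.listdir(data_dir)
        if os.path.isfile(os.path.join(data_dir, name))
    )


def read_labels(labels_path):
    """Read labels keyed by sample id (here, the file name).

    Assumed layout (hypothetical): one "<sample id>,<label>" row per
    sample, with no header row.
    """
    with open(labels_path, newline="") as f:
        return {row[0]: row[1] for row in csv.reader(f) if row}
```

With labels keyed by file name, a lookup such as `labels[name]` for each `name` in `list_samples(data_dir)` no longer depends on the listing order at all; the sort only keeps run-to-run output comparable.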
@wkiri and I ran into this when running experiments for our DEMUD paper - every https://docs.python.org/3/library/os.html#os.listdir
|
My preference would be for |
What does everyone think of this approach? I changed a few lines in the data loader so that each sample has a string id (for tabular data, I just converted the sample indexes to strings), and then in the results organization I used the data id rather than the data index to make the comparison plot (so you could run a modified experiment with an exhaustive labels file). We would still need to change the combined plot script, but I wanted to get thoughts first. |
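To make the id-based matching concrete, here is a minimal sketch of how a comparison curve could be computed from ids instead of row positions. The function name `cumulative_novel` and the label format (a dict from string id to a 0/1 novelty flag) are assumptions for illustration, not the code in the PR.

```python
def cumulative_novel(selected_ids, labels):
    """Count novel items found so far, after each selection.

    selected_ids: sample ids in the order an algorithm selected them.
    labels: dict mapping string sample id -> 1 if novel, else 0
            (hypothetical format; it may cover more samples than
            were actually selected).
    """
    total = 0
    curve = []
    for sample_id in selected_ids:
        # Ids are normalized to strings so that tabular row numbers
        # and image file names are looked up the same way.
        total += int(labels[str(sample_id)])
        curve.append(total)
    return curve
```

Because each lookup goes through the id, the labels file can be exhaustive while the experiment runs on only a subset of the test data.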
@bdubayah Yes, this looks great! I think the update in |
Yes, it's good to go (aside from the combined plot issues you mentioned). I think the labels files for the experiments will need to be updated as well. |
That's right. I'm updating the planetary experiment label files, but it's a good point that this will trigger updates needed for the other use cases too. |
👍 Should I merge this to master or did you want to include the combined plot script in the same PR? |
@bdubayah Let me commit the updates to that script. It's worth alerting the team that this merge may break compatibility with experiments until folks update their label files, too. |
@bdubayah Ok, it should be ready if you want to take a look. Note that I also changed the y axis to start from 0, since it's possible for an algorithm to not select at least one novel item in the beginning. |
Looks good to me! |
@bdubayah Feel free to PR when ready! |
- Rely on data id instead of row index to match results to labels file, addressing an issue where os.listdir provides inconsistent ordering across machines
- Use data id from labels.csv instead of row number for comparison plots, allowing experiments to be run on a subset of the test data
- Ensure all data ids are strings

Co-authored-by: Kiri Wagstaff <[email protected]>
We are seeing slightly different behavior in DORA runs between the JPL and UMD virtual environments, which probably means that different versions of some Python package(s) are in use. This can likely be resolved by updating setup.py to specify those package versions as well. @hannah-rae noted these issues: