Unify JPL and UMD virtual computing environments #44

wkiri · 2021-07-20T16:40:36Z

We are seeing slightly different behavior in DORA runs between the JPL and UMD virtual environments. This probably means the two environments use different versions of some Python package(s), and it can likely be resolved by updating setup.py to specify those package versions as well.

@hannah-rae noted these issues:

- When running the planetary_rover PNG sample case, the PNG images seem to be read in using a different ordering.
- When analyzing the planetary_rover PNG images, the selections are roughly the same but the scores are different.
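
One way to check the package-version hypothesis is to print the installed versions in each environment and compare the two outputs. The following is only a sketch: the helper name `installed_versions` and the package list passed to it are hypothetical, and it assumes Python 3.8+ (for `importlib.metadata`).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent.

    `packages` is whatever list of distribution names you want to compare
    across environments (hypothetical; not taken from DORA's setup.py).
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Run this in both environments and diff the printed lines.
    for name, version in sorted(installed_versions(["numpy", "scikit-learn"]).items()):
        print(f"{name}=={version}")
```

Any package whose version differs between the two outputs is a candidate for pinning in setup.py.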


wkiri · 2021-08-31T01:15:21Z

@hannah-rae Was this resolved?

hannah-rae · 2021-09-01T16:24:26Z

No, not yet, but it is on my to do list.

bdubayah · 2021-09-17T22:39:47Z

I think the issue might be partly related to how os.listdir or glob.glob list directories on different machines (used when the image loader loads images). When listing the files in the fmnist or planetary test directory, I get different orderings on the UMD cluster vs. my local machine. This causes a labels.csv file to be totally wrong between machines. One fix could be sorting directory contents once they're loaded and making sure labels correspond to that order. Or, labels files could use the filename/sample id rather than its index.

jakehlee · 2021-09-17T22:44:14Z

@wkiri and I ran into this when running experiments for our DEMUD paper. Every glob.glob() or os.listdir() call should be wrapped in sorted(), because the lists/iterators they return are in some arbitrary order determined by the individual filesystem.

https://docs.python.org/3/library/os.html#os.listdir

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order...
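
A minimal sketch of the wrapping described above. The helper name `list_sorted` and its parameters are hypothetical and not part of the DORA code; the point is only that the result of `glob.glob()` is passed through `sorted()` before anything relies on its order.

```python
import glob
import os


def list_sorted(directory, pattern="*"):
    """Return paths in `directory` matching `pattern`, in sorted order.

    glob.glob() makes no ordering guarantee, so sorting the result gives
    the same sequence on every machine for the same set of file names.
    """
    return sorted(glob.glob(os.path.join(directory, pattern)))
```

For example, `list_sorted("path/to/images", "*.png")` returns the PNG paths in lexicographic order regardless of how the filesystem enumerates them. Note that this is plain string sorting, so `img10.png` sorts before `img2.png`.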

wkiri · 2021-09-18T18:06:44Z

My preference would be for labels.csv to use an identifier for each item (as noted by @bdubayah) instead of relying on ordering. In addition to increasing robustness across machines, it would mean we can more easily change the experiment to include/exclude items without having to regenerate every line in this file. This makes sense for individually named items like the images in an image data set. It's less clear how it would work for some of our other data set types. Ideas welcome :)

bdubayah · 2021-09-20T20:28:55Z

What does everyone think of this approach? I changed a few lines in the data loader so that each sample has a string id (for tabular data, I just converted the sample indexes to strings). Then, in the results organization, I used the data id rather than the data index to make the comparison plot (so you could run a modified experiment with an exhaustive labels file). We would still need to change the combined plot script, but I wanted to get thoughts first.
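
To illustrate the idea of matching by id rather than by row index, here is a rough sketch. It is not the code on the branch: the function names are hypothetical, and it assumes a labels file with two columns (sample id, label) and no header row.

```python
import csv
import io


def load_labels(csv_file):
    """Read (sample_id, label) rows into a dict keyed by string id.

    Assumes two columns and no header row (an assumption for this sketch).
    """
    return {str(row[0]): row[1] for row in csv.reader(csv_file) if row}


def match_labels(selected_ids, labels):
    """Return (id, label) pairs for selected ids present in `labels`.

    Ids are compared as strings, so integer indexes from tabular data and
    file names from image data are handled the same way. Ids without a
    label are skipped.
    """
    return [(str(i), labels[str(i)]) for i in selected_ids if str(i) in labels]


if __name__ == "__main__":
    labels = load_labels(io.StringIO("img_b.png,1\nimg_a.png,0\n7,1\n"))
    print(match_labels(["img_a.png", 7, "not_labeled.png"], labels))
```

Because the lookup is keyed on the id, the order in which files are listed on a given machine no longer affects which label is attached to which sample.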

wkiri · 2021-10-14T17:50:11Z

@bdubayah Yes, this looks great!

I think the update in dora_results_organization.py to read string names instead of integers from the labels file should also occur in combined_plot_script.py. It looks like some additional changes are needed to the latter script too. I will work on this. In the meantime, is this branch ready to merge? (issue44-unify-envs)

bdubayah · 2021-10-14T18:21:21Z

Yes, it's good to go (aside from the combined plot issues you mentioned). I think the labels files for the experiments will need to be updated as well.

wkiri · 2021-10-14T18:28:04Z

That's right. I'm updating the planetary experiment label files, but it's a good point that this will trigger updates needed for the other use cases too.

bdubayah · 2021-10-14T18:47:37Z

👍 Should I merge this to master or did you want to include the combined plot script in the same PR?

wkiri · 2021-10-14T18:51:43Z

@bdubayah Let me commit the updates to that script. It's worth alerting the team that this merge may break compatibility with experiments until folks update their label files, too.

wkiri · 2021-10-14T18:59:51Z

@bdubayah Ok, it should be ready if you want to take a look.

Note that I also changed the y-axis to start from 0, since it's possible for an algorithm to select no novel items at the beginning.

bdubayah · 2021-10-14T19:22:51Z

Looks good to me!

wkiri · 2021-10-15T16:07:33Z

@bdubayah Feel free to PR when ready!

- Rely on data id instead of row index to match results to labels file, addressing an issue where os.listdir provides inconsistent ordering across machines
- Use data id from labels.csv instead of row number for comparison plots, allowing experiments to be run on a subset of the test data
- Ensure all data ids are strings

Co-authored-by: Kiri Wagstaff <[email protected]>

wkiri mentioned this issue Jul 20, 2021

Add test data for each use case #2

Closed

hannah-rae self-assigned this Aug 3, 2021

hannah-rae added the experiments label Aug 3, 2021

wkiri mentioned this issue Sep 15, 2021

Add PAE ranking method #13

Open

bdubayah added a commit that referenced this issue Sep 20, 2021

Use data id for comparison plots instead of index #44

36cf109

wkiri added a commit that referenced this issue Oct 14, 2021

Update to allow sample ids (non-int item ids). (#44)

6a8a4b3

bdubayah added a commit that referenced this issue Oct 15, 2021

Use data id for comparison plots instead of index #44

c154fd9

bdubayah pushed a commit that referenced this issue Oct 15, 2021

Update to allow sample ids (non-int item ids). (#44)

ced506b

urebbapr mentioned this issue Oct 16, 2021

Update script which generates DES validation data to output coadd_object_ids #74

Open

bdubayah mentioned this issue Oct 20, 2021

Add unit tests #14

Open

4 tasks
