plot_pairwise_average_fst may change the order of the cohorts when using a cohort dict #540
Conversation
malariagen_data/anoph/fst.py
Outdated
@@ -493,7 +493,9 @@ def plot_pairwise_average_fst(
     **kwargs,
 ):
     # setup df
-    cohort_list = np.unique(fst_df[["cohort1", "cohort2"]].values)
+    cohort_list = pd.unique(
+        [c for cl in fst_df[["cohort1", "cohort2"]].values for c in cl]
The for loops in this line look a little funky. What is it supposed to be doing?
The goal is to get the list of all cohorts in the same order they were defined in the cohort dictionary. plot_pairwise_average_fst does not take the list of cohorts or the dictionary as an input, so we have to access it another way. fst_df[["cohort1", "cohort2"]] is a list of lists, each one containing two cohort names. The last cohort of the dictionary is missing from the cohort1 column and the first is missing from cohort2, hence why both columns are accessed. np.unique is able to take a list of lists and return every unique value, but it sorts them alphabetically, breaking the dictionary's order. pd.unique keeps the order but can't deal with a list of lists. The for loops are thus used to go from a list of lists of cohorts to a flat list of cohorts.
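The ordering difference described above can be sketched with a small hypothetical `fst_df` (the column layout mirrors what the comment describes; the cohort names and dict order B, A, C are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical FST results; cohorts were defined in the order B, A, C
# in the cohort dict, so each pair lists the earlier cohort first.
fst_df = pd.DataFrame({
    "cohort1": ["B", "B", "A"],
    "cohort2": ["A", "C", "C"],
})
pairs = fst_df[["cohort1", "cohort2"]].values  # 2-D array of name pairs

# np.unique accepts the 2-D array but sorts alphabetically,
# losing the dict order:
print(list(np.unique(pairs)))  # ['A', 'B', 'C']

# pd.unique keeps first-seen order but needs 1-D input, hence the
# nested comprehension to flatten the list of lists:
print(list(pd.unique([c for pair in pairs for c in pair])))  # ['B', 'A', 'C']
```

So both calls deduplicate, but only the `pd.unique` route preserves the order in which the cohorts were first defined.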
Thanks @jonbrenas. What about:
pd.unique(fst_df[["cohort1", "cohort2"]].values.flatten())
Nested list comprehensions always bake my noodle!
I think it's equivalent to this:
cohort_pairs = fst_df[["cohort1", "cohort2"]].values
flattened_cohorts = []
for cohort_pair in cohort_pairs:
    for cohort in cohort_pair:
        flattened_cohorts.append(cohort)
Or like Alistair suggested:
list(fst_df[["cohort1", "cohort2"]].values.flatten())
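The three spellings discussed here really are interchangeable; a quick sketch with an invented 2-D array of cohort pairs (the names are illustrative, not from the real data):

```python
import numpy as np
import pandas as pd

# Hypothetical cohort pairs, as fst_df[["cohort1", "cohort2"]].values
# would produce them:
pairs = np.array([["B", "A"], ["B", "C"], ["A", "C"]])

# Nested list comprehension, explicit loops, and .flatten() all
# produce the same flat sequence:
nested = [c for pair in pairs for c in pair]
flat = list(pairs.flatten())
print(nested == flat)  # True; both are ['B', 'A', 'B', 'C', 'A', 'C']

# Either form feeds pd.unique, which keeps first-seen order:
print(list(pd.unique(pairs.flatten())))  # ['B', 'A', 'C']
```

`.flatten()` returns a copy of the array collapsed to one dimension, which is exactly the 1-D input `pd.unique` needs.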
Hi @jonbrenas, just to mention I've updated this branch to bring in some test fixes from master.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@            Coverage Diff             @@
##           master     #540      +/-   ##
==========================================
- Coverage   98.61%   95.89%   -2.73%
==========================================
  Files          38       39       +1
  Lines        3690     3821     +131
==========================================
+ Hits         3639     3664      +25
- Misses         51      157     +106

☔ View full report in Codecov by Sentry.
@jonbrenas I've just re-run the tests here to check they are all good now. It is only the coverage that needs a bit of additional work.
Thanks @ahernank. I am not quite sure what the problem with codecov/project is, though. It looks like there is an indirect change that leaves a line uncovered.
Could we try using flatten() instead of the nested list comprehension, just for readability?
Thanks, @leehart. The bottom-left half of the diagram wasn't being filled anymore. I also changed the list comprehension to .flatten(). (I personally find the list comprehension more readable, but I see I am in the minority ;) )
Cool, thanks @jonbrenas! I see there's suddenly a code coverage failure in CI, which I don't quite understand, and which I think we've experienced before. I wonder if it's the product of something random. I can't seem to re-run it with any success. I think in the past I've just had to add more coverage to an unrelated part of the code to satisfy the threshold, but it's annoying.
I think the problem is with dipcluster.py. It might be that the test doesn't cover some of the new functions.
Yes, I see that. I'm not sure what these figures mean!? From https://docs.codecov.com/docs/coverage-percentages
I guess if we can somehow claw back -2.72% test coverage, it should pass.
Resolves #539. There might be other functions affected in the same way.