plot_pairwise_average_fst may change the order of the cohorts when using a cohort dict #540

jonbrenas · 2024-05-24T09:40:10Z

Resolves #539. There might be other functions affected in the same way.

alimanfoo · 2024-05-24T23:35:51Z

malariagen_data/anoph/fst.py

@@ -493,7 +493,9 @@ def plot_pairwise_average_fst(
        **kwargs,
    ):
        # setup df
-        cohort_list = np.unique(fst_df[["cohort1", "cohort2"]].values)
+        cohort_list = pd.unique(
+            [c for cl in fst_df[["cohort1", "cohort2"]].values for c in cl]


The for loops in this line look a little funky. What is it supposed to be doing?

The goal is to get the list of all cohorts in the same order they were defined in the cohort dictionary. plot_pairwise_average_fst does not take the list of cohorts or the dictionary as an input, so we have to recover the order another way. fst_df[["cohort1", "cohort2"]].values is a 2-D array, in effect a list of lists, each inner list containing two cohort names. The last cohort of the dictionary is missing from the cohort1 column and the first is missing from cohort2, which is why both columns are accessed. np.unique can take a list of lists and return every unique value, but it sorts them alphabetically, breaking the dictionary's order. pd.unique keeps the order of first appearance but can't deal with a list of lists. The for loops are thus used to go from a list of lists of cohorts to a flat list of cohorts.
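To make the difference concrete, here is a minimal sketch. The three cohort names and the toy fst_df are made up for illustration and are not taken from the package's tests; only the two column names come from the diff above.

```python
import numpy as np
import pandas as pd

# Hypothetical pairwise table for cohorts defined in the order
# "west", "east", "central" (names invented for this example).
fst_df = pd.DataFrame(
    {
        "cohort1": ["west", "west", "east"],
        "cohort2": ["east", "central", "central"],
    }
)

# 2-D object array of shape (3, 2): one row per cohort pair.
pairs = fst_df[["cohort1", "cohort2"]].values

# np.unique accepts the 2-D array but returns the values sorted.
sorted_cohorts = list(np.unique(pairs))

# pd.unique needs 1-D input and keeps the order of first appearance.
ordered_cohorts = list(pd.unique(pairs.flatten()))

print(sorted_cohorts)   # ['central', 'east', 'west']
print(ordered_cohorts)  # ['west', 'east', 'central']
```

Note that "central" never appears in cohort1 and "west" never appears in cohort2, so both columns are needed to see every cohort.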

Thanks @jonbrenas. What about:

pd.unique(fst_df[["cohort1", "cohort2"]].values.flatten())

Nested list comprehensions always bake my noodle!

I think it's equivalent to this:

cohort_pairs = fst_df[["cohort1", "cohort2"]].values
flattened_cohorts = []
for cohort_pair in cohort_pairs:
    for cohort in cohort_pair:
        flattened_cohorts.append(cohort)

Or like Alistair suggested:

list(fst_df[["cohort1", "cohort2"]].values.flatten())
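As a quick sanity check that the three spellings discussed in this thread agree, the sketch below runs them side by side. The toy fst_df and its cohort names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical pairwise table (cohort names invented for this example).
fst_df = pd.DataFrame(
    {
        "cohort1": ["west", "west", "east"],
        "cohort2": ["east", "central", "central"],
    }
)
cohort_pairs = fst_df[["cohort1", "cohort2"]].values

# 1. Nested list comprehension, as in the original diff.
nested = [c for cl in cohort_pairs for c in cl]

# 2. The same thing written as explicit loops.
looped = []
for cohort_pair in cohort_pairs:
    for cohort in cohort_pair:
        looped.append(cohort)

# 3. NumPy's flatten(), which walks the array row by row by default.
flattened = list(cohort_pairs.flatten())

print(nested == looped == flattened)  # True
```

All three visit the pairs row by row, so the order of first appearance that pd.unique later relies on is the same whichever form is used.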

alimanfoo · 2024-05-24T23:48:56Z

Hi @jonbrenas, just to mention I've updated this branch to bring in some test fixes from master.

codecov · 2024-05-24T23:56:19Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.89%. Comparing base (dc89c9c) to head (3622b97).
Report is 353 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #540      +/-   ##
==========================================
- Coverage   98.61%   95.89%   -2.73%     
==========================================
  Files          38       39       +1     
  Lines        3690     3821     +131     
==========================================
+ Hits         3639     3664      +25     
- Misses         51      157     +106

ahernank · 2024-05-28T08:51:03Z

@jonbrenas I've just re-run the tests here to check they are all good now. It is only the coverage that needs a bit of additional work.

jonbrenas · 2024-05-28T09:29:52Z

Thanks @ahernank. I am not quite sure what the problem with codecov/project is, though. It looks like there is an indirect change that leaves a line uncovered in dipclust.py but I can't really figure out why that would be.

leehart

Could we try using flatten() instead of nested list comprehension, just for readability?

jonbrenas · 2024-06-18T11:33:09Z

Thanks, @leehart. The bottom-left half of the diagram wasn't being filled anymore. I also changed the list comprehension to .flatten(). (I personally find the list comprehension more readable but I see I am in the minority ;) )

leehart · 2024-06-18T13:13:53Z

Cool, thanks @jonbrenas !

I see there's suddenly a code coverage failure in CI, which I don't quite understand, although I think we've experienced it before. I wonder if it's the product of something random. I can't seem to re-run it with any success. I think in the past I've just had to add more coverage to an unrelated part of the code to satisfy the threshold, but it's annoying.

jonbrenas · 2024-06-18T15:06:15Z

I think the problem is with dipclust.py. It might be that the test doesn't cover some of the new functions.

leehart · 2024-06-18T15:14:50Z

Yes, I see that anoph/dipclust.py is the file with the biggest "Change %", i.e. -49.40%, with anoph/snp_frq.py having -4.79%.

I'm not sure what these figures mean!?

From https://docs.codecov.com/docs/coverage-percentages

Head coverage percent is the line coverage of all your files in your head commit.
Patch coverage percent is the line coverage of all the lines changed in your commit.
Change percent is the percent that the overall project line coverage has changed from base to head.

I guess if we can somehow claw back the 2.72% of test coverage, it should pass.
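For what it's worth, the project-level figures can be reproduced from the Hits and Lines rows of the Codecov diff earlier in this thread. This is a rough sketch that assumes coverage is simply hits divided by lines; how Codecov rounds or truncates the displayed values is not something checked here.

```python
# Hits and Lines taken from the Codecov diff table in this PR.
base_hits, base_lines = 3639, 3690   # master (base)
head_hits, head_lines = 3664, 3821   # this PR (head)

# Assumed definition: line coverage = covered lines / coverable lines.
base_coverage = 100 * base_hits / base_lines
head_coverage = 100 * head_hits / head_lines

# "Change percent": how overall project coverage moved from base to head.
change = head_coverage - base_coverage

print(f"base:   {base_coverage:.4f}%")
print(f"head:   {head_coverage:.4f}%")
print(f"change: {change:+.4f}%")
```

The unrounded change lands between -2.72 and -2.73, which may be why both numbers show up in this thread.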

alimanfoo · 2024-08-09T15:54:54Z

Hi folks, I've folded this fix into some other maintenance I was doing on the plot_pairwise_average_fst() function over in #579. I'll close here and merge in #579 if approved.

Changed the unique function used to avoid sorting

6cae710

jonbrenas requested a review from alimanfoo May 24, 2024 09:40

alimanfoo reviewed May 24, 2024

Merge branch 'master' into 539-no-alphabeticl-sorting-of-keys-fst-plots

8e9037a

jonbrenas and others added 2 commits June 17, 2024 15:16

Merge branch 'master' into 539-no-alphabeticl-sorting-of-keys-fst-plots

eeaa693

Merge branch 'master' into 539-no-alphabeticl-sorting-of-keys-fst-plots

bbb31a5

leehart requested changes Jun 18, 2024

A line was missing and used flatten.

3622b97

jonbrenas requested a review from leehart June 18, 2024 11:34

leehart mentioned this pull request Jun 20, 2024

codecov/project CI check fails but seems to require additional coverage on code unrelated to the PR #554

Open

alimanfoo mentioned this pull request Aug 9, 2024

Fix and improve plotting pairwise average fst #579

Merged

alimanfoo closed this Aug 9, 2024

alimanfoo added the BMGF-068808 label (Work supported by BMGF grant INV-068808, MalariaGEN 2024-2027) Dec 4, 2024
