plot_pairwise_average_fst may change the order of the cohorts when using a cohort dict #540

Closed
Changes from 1 commit
4 changes: 3 additions & 1 deletion malariagen_data/anoph/fst.py
@@ -493,7 +493,9 @@ def plot_pairwise_average_fst(
**kwargs,
):
# setup df
cohort_list = np.unique(fst_df[["cohort1", "cohort2"]].values)
cohort_list = pd.unique(
[c for cl in fst_df[["cohort1", "cohort2"]].values for c in cl]
Member

The for loops in this line look a little funky. What is it supposed to be doing?

Collaborator Author

The goal is to get the list of all cohorts in the same order they were defined in the cohort dictionary. plot_pairwise_average_fst does not take the list of cohorts or the dictionary as an input, so we have to recover the order another way. fst_df[["cohort1", "cohort2"]].values is a 2D array (effectively a list of lists), each row containing two cohort names. The last cohort of the dictionary is missing from the cohort1 column and the first is missing from cohort2, which is why both columns are used. np.unique can take a list of lists and return every unique value, but it sorts them alphabetically, which breaks the dictionary's order. pd.unique keeps the order of first appearance but does not accept a list of lists (2D input). The for loops are therefore used to go from a list of lists of cohorts to a flat list of cohorts.
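A minimal sketch of the ordering issue. The cohort dict, its values and the cohort names are hypothetical, and fst_df is a stripped-down stand-in with only the cohort1 and cohort2 columns; generating the pairs with itertools.combinations is an assumption about how the pairwise rows are ordered. To keep the input one-dimensional, the sketch flattens with NumPy's flatten() before calling pd.unique.

```python
import itertools

import numpy as np
import pandas as pd

# Hypothetical cohort dict; keys are deliberately NOT in alphabetical order.
# Only the key order matters here, the values are placeholders.
cohorts = {"kenya": "placeholder", "ghana": "placeholder", "mali": "placeholder"}

# Hypothetical stand-in for the pairwise Fst table: one row per pair,
# generated in dict order (assumption; the real table has more columns).
pairs = list(itertools.combinations(cohorts.keys(), 2))
fst_df = pd.DataFrame(pairs, columns=["cohort1", "cohort2"])

values = fst_df[["cohort1", "cohort2"]].values  # 2D array, one row per pair

# The last cohort never appears in cohort1, the first never in cohort2.
assert "mali" not in set(fst_df["cohort1"])
assert "kenya" not in set(fst_df["cohort2"])

# Old behaviour: np.unique flattens and sorts, losing the dict order.
sorted_cohorts = np.unique(values)

# New behaviour: flatten row by row, then pd.unique keeps first-seen order.
ordered_cohorts = pd.unique(values.flatten())

print(list(sorted_cohorts))   # ['ghana', 'kenya', 'mali']
print(list(ordered_cohorts))  # ['kenya', 'ghana', 'mali']
```

Because each cohort first appears in the rows in dict order, first-seen order recovers the dict order even though neither column alone contains every cohort.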

Member

Thanks @jonbrenas. What about:

pd.unique(fst_df[["cohort1", "cohort2"]].values.flatten())

Collaborator

Nested list comprehensions always bake my noodle!

I think it's equivalent to this:

cohort_pairs = fst_df[["cohort1", "cohort2"]].values
flattened_cohorts = []
for cohort_pair in cohort_pairs:
	for cohort in cohort_pair:
		flattened_cohorts.append(cohort)

Or like Alistair suggested:

list(fst_df[["cohort1", "cohort2"]].values.flatten())
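To check that the nested comprehension, the explicit loops and flatten() produce the same sequence, here is a small sketch on a hypothetical array of cohort pairs (the cohort names are made up):

```python
import numpy as np

# Hypothetical 2D array of cohort pairs, shaped like
# fst_df[["cohort1", "cohort2"]].values (one row per pair).
cohort_pairs = np.array(
    [["kenya", "ghana"], ["kenya", "mali"], ["ghana", "mali"]], dtype=object
)

# 1. Nested list comprehension from the PR: outer loop over rows, inner over cells.
from_comprehension = [c for cl in cohort_pairs for c in cl]

# 2. The same thing written as explicit loops.
from_loops = []
for cohort_pair in cohort_pairs:
    for cohort in cohort_pair:
        from_loops.append(cohort)

# 3. NumPy flatten: row-major ("C") order by default, so rows are read left to right.
from_flatten = list(cohort_pairs.flatten())

assert from_comprehension == from_loops == from_flatten
print(from_flatten)  # ['kenya', 'ghana', 'kenya', 'mali', 'ghana', 'mali']
```

All three keep the duplicates; only the surrounding pd.unique call removes repeats while preserving first-seen order. ravel() would give the same sequence and returns a view when it can instead of a copy.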

)
# df to fill
fig_df = pd.DataFrame(columns=cohort_list, index=cohort_list)
# fill df from fst_df
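Why the cohort order matters downstream: the square frame takes its rows and columns directly from cohort_list, and filling it with .loc does not reorder them, so the plotted axes would presumably follow cohort_list. A minimal sketch, assuming a hypothetical fst column with made-up values; the actual filling code in fst.py is not shown in this diff.

```python
import pandas as pd

# Hypothetical pairwise table; the "fst" column name and values are illustrative only.
fst_df = pd.DataFrame(
    {
        "cohort1": ["kenya", "kenya", "ghana"],
        "cohort2": ["ghana", "mali", "mali"],
        "fst": [0.01, 0.02, 0.03],
    }
)

# Cohort order taken from first appearance, as in the PR.
cohort_list = pd.unique(fst_df[["cohort1", "cohort2"]].values.flatten())

# Empty square frame whose rows and columns follow cohort_list exactly.
fig_df = pd.DataFrame(columns=cohort_list, index=cohort_list)

# Fill one triangle from the pairwise rows; .loc assignment does not reorder axes.
for row in fst_df.itertuples(index=False):
    fig_df.loc[row.cohort1, row.cohort2] = row.fst

print(list(fig_df.index))  # ['kenya', 'ghana', 'mali']
```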