Compensation (Python API) #340
These are replies to comments sent by @JS3xton via email.
I'm going to assume that this point refers to an example here in the PR comments, showing that the function works. I'll do that soon.
Three things here:
Yeah unfortunately we never got any of this done, so I'm not willing to port this into the Excel UI for now. There's precedent of capabilities only available for Python users: the ellipse gate function, the new reusable density gate you made, and maybe more.
Eh. While I think you're technically right, there is precedent for moving transform-like functionality outside of this module to showcase a specific feature (i.e. the mef module). More generally, I've become skeptical about the need to have a dedicated module for "transformations". I think this came out of our old view that it was worth distinguishing between "channel" units and "a.u.", and therefore having a module that transformed between these two. But having worked with a lot of flow cytometry data, including data from more modern instruments which are stored directly in a.u., I started seeing channel units as an intermediate step that should not be used for anything. If present-day me had to remake FlowCal from scratch, I'd probably have FCSData objects be directly converted to a.u. upon loading, and eliminate the transform module, since we never used it for anything other than the
Sure.
Yeah that's a good question.
Yeah I don't know what I was thinking. We don't change the channel name to "Calibrated FL1" after applying
I think at the end it will come down to a linear transformation anyway, which is what this module implements. And making room for a more sophisticated compensation method is yet another reason to keep |
Maybe the
Ooo boy, I'll have to dig and see if I can find that. I don't remember what all controls I ran either, so I'll get back to you. IF that data set looks good, I agree it could very well replace the current example dataset we just added in the
The I've been reflecting on the
Upon reflection, I currently favor removing
Yeah, I was thinking along the exact same lines. I also agree supporting one replicate should be fine. The user could concatenate data beforehand if they really wanted to. And anything more complicated than that (e.g., taking the mean of means) could be left to the user to figure out on their own. |
It seems that we're on the same page regarding

I have looked at the example files you sent me. Unfortunately, your FL1 intensities were never high enough to significantly bleed into FL3. The following is the autofluorescence control:

And the following is the sfGFP control:

FL3 fluorescence seems to be at around the same place. Therefore, while we can make a technically valid demonstration, it would not illustrate the point of the module very well.

My B. subtilis dataset is only slightly better. This is a sample with low fluorescence in both FL1/FL3:

This is one with high FL1 bleeding slightly into FL3:

I talked to Karl and it seems he might not have the data we need with the appropriate controls. Therefore, the best example dataset so far seems to be my B. subtilis dataset. If you can think of any better data source let me know.
I agree this would be a major change, and I'm in favor of filing a new issue for it to be completed when we have time. I'm also willing to take a crack at it now, but not if you want the first attempt or if you won't be able to consider it quickly.
I feel strongly that That being the case,
Yeah, I'm not surprised to hear that. I didn't compensate that data set, and I don't recall having any problems with it. Could bleed-through and compensation be better demonstrated with |
I think the best course of action for now would be to make
That's a good idea. Do you remember if we ever agreed on what calibrated units were appropriate for FL2? |
Sounds good.
I don't think we ever did. Investigating that: (I have two sources for the MEF Channel Configurations, and they don't always agree—specifically for MEPE and MEAP.) Looks like MEFL aligns well with (and thus appropriately calibrates) FL1, as does MECY with FL3 (as previously determined). If I had to guess the correct MEPE configuration, I would guess MEPE1, which looks like it would calibrate FL2 well. MEPTR also looks like it could be used, though. |
Sorry, I'm not sure I understand the last graph. What's the difference between MEPE1 and MEPE2? I agree that MEPE1 sounds like the best MEF to calibrate against. There seems to be a small but noticeable difference in FL2 signal when sfGFP is present in your example. This is all based on made up FL2 MEF calibration values, so this may change a little when we figure out what values to actually use. Still, it may not be enough. I can try to run my B. subtilis example with FL2 later today and see how it looks. |
That plot, as best I can determine, shows the configuration of the cytometer used by Spherotech to cross-calibrate rainbow calibration beads to beads surface-labeled with a known amount of corresponding fluorophore (e.g., FITC for MEFL, PE for MEPE, etc.). However, I found two different emails listing two different emission filter sets for their PE and APC channels, hence MEPE1 and MEPE2, etc. Our interlab colleagues have embraced MEPE1, though.
It's hard for me to tell because they're not plotted on the same scale. It looks like the FL2 peak shifts from ~60 MEF to ~95 MEF, though. I don't know if that's enough for a compelling compensation demonstration, though, and would have to see the final demonstration. |
Ok I'll use MEPE. Do you happen to have the beads datasheet or the MEPE values corresponding to your example? |
I just emailed it to you. |
I reorganized the module and moved the functions to more closely resemble what we did with

Note that this method can be used in two ways. The "full" version uses a non-fluorescence control and subtracts autofluorescence from the samples to calibrate. This will result in non-fluorescent samples with histograms centered around zero, even with old flow cytometers that don't normally have events with negative fluorescence values. In newer cytometers, I assume it would center non-fluorescent histograms to zero if/when they're not already there. For a detailed justification on why I think this is the correct way to compensate, see here. On the other hand, one can pass

Now for the examples: This is the data from your dual reporter gate, before and after color compensation but without compensating for autofluorescence:

And this is with autofluorescence compensation (the data in the two top panels had the mean of the autofluorescence control subtracted, but no further color compensation was performed):

In both cases, FL2 output is noticeably lower after compensating. I think the second plot is better at showing this difference, though.

As for the B. subtilis data, an apparent increase in FL2 fluorescence is eliminated after compensation. This is what that looks like without autofluorescence compensation.

Furthermore, samples at t=330 and 450 min have a high FL2 fluorescence subpopulation. This is actually what I would expect since FL3, which doesn't suffer from much bleedthrough in this example, has the same long tails:

I would like to hear opinions on which dataset to include as an example. I would prefer if only one example is included with FlowCal, so if we choose the B. subtilis example we should eliminate the previous gate dataset.
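To make the two modes described above concrete, here is a minimal NumPy sketch on synthetic two-channel data. It is not FlowCal's implementation: the function name `make_compensation`, the use of the mean as the statistic, and all numbers are assumptions chosen for illustration.

```python
import numpy as np

def make_compensation(sfc_means, nfc_mean=None):
    """Build a compensation function from control statistics (hypothetical helper).

    sfc_means: (n_fluorophores, n_channels) mean of each single-fluorophore
        control, ordered so that control i belongs to channel i.
    nfc_mean: optional (n_channels,) mean of the non-fluorescent control.
        If None, no autofluorescence is subtracted.
    """
    sfc_means = np.asarray(sfc_means, dtype=float)
    if nfc_mean is None:
        a0 = np.zeros(sfc_means.shape[1])
    else:
        a0 = np.asarray(nfc_mean, dtype=float)
    # Signal of each control above autofluorescence.
    signal = sfc_means - a0
    # A[c, i]: bleedthrough of fluorophore i into channel c, scaled so that
    # each fluorophore's own channel has coefficient 1.
    A = (signal / np.diag(signal)[:, None]).T

    def compensate(data):
        data = np.asarray(data, dtype=float)
        return np.linalg.solve(A, (data - a0).T).T

    return compensate

# Synthetic ground truth (all numbers are made up).
autofluorescence = np.array([10.0, 5.0])
bleedthrough = np.array([[1.0, 0.1],
                         [0.3, 1.0]])            # channels x fluorophores
true_signal = np.array([[100.0, 0.0],
                        [0.0, 200.0],
                        [50.0, 80.0]])           # events x fluorophores
measured = true_signal @ bleedthrough.T + autofluorescence

# Mean of each single-fluorophore control, as the cytometer would report it.
sfc_means = np.array([[1000.0, 0.0],
                      [0.0, 500.0]]) @ bleedthrough.T + autofluorescence

# "Full" mode: autofluorescence subtraction plus color compensation.
compensate_full = make_compensation(sfc_means, nfc_mean=autofluorescence)
compensated = compensate_full(measured)
blank = compensate_full(autofluorescence[None, :])

# Color-only mode: no non-fluorescent control is supplied.
compensate_color_only = make_compensation(sfc_means)
```

In the full mode the non-fluorescent events land at zero, which matches the "histograms centered around zero" behavior described above. In the color-only mode the autofluorescence stays in the data and slightly biases the estimated coefficients.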
Awesome, yeah it looks much better. Some additional thoughts:
Do you think
Interesting. I would try to simplify them. E.g., use fewer data points such that the violins don't overlap (but keep them roughly evenly spaced). And try to achieve a square plot aspect ratio. You might also show the compensated and uncompensated violins on the same plot. E.g., uncompensated in the background in gray and compensated in color in the foreground. That might make the compensation effect more stark. Right now, it's hard to compare them (especially when the scale is not the same between the two plots). Showing the green violins almost feels unnecessary and confusing. If they explain a compensation effect that is greater at low aTc (when green is high) and lower at high aTc (when green is low), then they may be helpful, though. I don't have strong feelings on the use of autofluorescence compensation; I'm generally in favor of whatever is simpler and more widely used (which sounded like not using it?). If I gave you a Min and Max for the transfer function data, then I think it could be used to replace the current |
Also, regarding the examples: Compensation is traditionally shown with a 2D plot. I'm all for pushing the new violin plots, but we might also want to pick the most egregious sample and show a more traditional 2D plot of skewed data before and unskewed data after compensation. |
Yeah I was thinking the same. I'll add this next.
This sounds really confusing to me. From a user's perspective, why would I want to specify more channels than those for which I have controls? And even if I wanted to do this, how do I specify which channels each control corresponds to? Will we use another
I would argue that this is different from
Thanks for pointing that out. I think issues with handling other statistics functions or numpy arrays will come out and be solved when I make the unit tests.
It seems clear enough to me. List comprehensions are, to me, way more readable than broadcasting, which seems to be what you want. And the statements can be further clarified with comments if necessary. I don't know what other "discipline" issues you may have, but I think our lives are gonna be easier if you propose code snippets to replace what's currently there. I can incorporate them after the unit tests, to be sure that the results are identical to what we currently have.
One issue is that the matrix terms in the diagonal are treated differently than the others, which makes a fully matrix-based treatment challenging. I'll see what I can do. I think mentioning units is a good idea though.
Yeah I'm not sure either. I'll leave it as it is for now.
These are all good ideas. The plots I attached were not meant to be the final plots for the tutorials. But I'll come back to these suggestions when I get there.
Yeah I think the point is to show that the biggest correction occurs when FL1 signal is higher. Again, I'll have an opinion on whether to keep it or not when I actually get to the tutorials.
Having the min/max samples may make the compensation effect more obvious without doing autofluorescence compensation. On the other hand, I would prefer if the example showed the most complete version of compensation, which is including autofluorescence compensation. I'll have more to say when I get there, but please do send me the min/max samples. The B. subtilis data has the drawback that the biology is hard to explain, and the results don't match similar published experiments for various very technical reasons. I eventually ran better experiments, but these are not good compensation examples. This is why I would prefer to go with your dataset.
Modern cytometers often have many channels, whereas a user may only require 2 or 3 differently colored fluorophores (with 2 or 3 appropriate controls). If additional channels could be specified trivially and result in "better" compensation, that seems worth supporting to me. (I don't know how much better using an overdetermined system is; it's hard to know without a good example. Intuitively, though, I thought using more information from more channels seemed like it would do a better job. If implemented well, it would also not preclude users from explicitly specifying one channel per fluorophore as you propose; it would be a pure extension of functionality.) The
Normalize the columns of (There may be some weird edge cases where one channel becomes the corresponding channel for multiple fluorophores, i.e., the a=1 coefficient falls multiple times in the same row. I haven't done the math to determine if the resulting compensation fails, nor do I understand the relationship between the channels and the fluorophores in that situation.) I think this normalization would need to be done after extracting the requested channels from
I thought this was basically just gonna require swapping
Yeah, fair enough. I'll wait for other things to resolve and then see if I think anything is worth clarifying.
Ahh, OK sorry, I was getting a little ahead of myself, haha. I look forward to seeing the final plots.
I'm not sure I completely understand you here. I don't think Min and Max should be included in the compensation demonstration because I think they would just make it more confusing. I do think they should be included in the separate |
Oh, and the Min and Max data should already be included for the green violins (which are the gate output). |
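The column normalization and the edge case mentioned in this comment can be sketched as follows. This assumes "normalizing" means dividing each fluorophore's column by its largest entry, so the channel with coefficient 1 is treated as that fluorophore's primary channel. The matrix values and variable names are hypothetical.

```python
import numpy as np

channels = ['FL1', 'FL2', 'FL3']
fluorophores = ['fluor1', 'fluor2']

# Hypothetical unnormalized spillover matrix: rows are channels,
# columns are fluorophores.
spillover = np.array([[42.00, 17.25],
                      [21.00, 51.75],
                      [ 0.42, 69.00]])

# Divide each column by its maximum so the strongest channel has a = 1.
normalized = spillover / spillover.max(axis=0)

# Channel that carries the a = 1 coefficient for each fluorophore.
primary_rows = normalized.argmax(axis=0)
primary = {f: channels[i] for f, i in zip(fluorophores, primary_rows)}

# Edge case: two fluorophores whose a = 1 coefficient falls in the same row.
has_shared_primary = len(set(primary.values())) < len(primary)
```

Whether compensation is still well defined when `has_shared_primary` is true is the open question raised above; the sketch only detects the situation.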
Oh, I think I understand your idea of using more channels than fluorophores now. If that could be implemented that would be cool. Besides the channel selection issue, there's also the issue of what data structure to return. For example, if you're using a flow cytometer with fluorescence channels FL1, FL2, and FL3, you want to use all three for compensation, and you're trying to reconstruct signals from fluorophores GFP and mCherry, what would the output data structure look like? If the algorithm determines that FL1 and FL3 are "better" for GFP and mCherry respectively, should it discard FL2? Should it leave it unmodified? How does the transformation function tell the user which channels contain the GFP and mCherry signals? Even if the math ends up working out, there's a lot to think about from the API level. (btw the overdetermined system can be solved using a pseudoinverse, which directly solves the least-squares problem).
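As a small illustration of the pseudoinverse remark, the sketch below unmixes two fluorophores from three channels on synthetic data. The spillover values, noise level, and variable names are assumptions; only `np.linalg.pinv` and `np.linalg.lstsq` are real NumPy calls.

```python
import numpy as np

# Hypothetical spillover matrix: rows are channels (FL1, FL2, FL3),
# columns are fluorophores (GFP, mCherry).
spillover = np.array([[1.00, 0.02],
                      [0.30, 0.40],
                      [0.01, 1.00]])

rng = np.random.default_rng(0)
true_signal = rng.uniform(0, 1000, size=(500, 2))   # events x fluorophores
noise = rng.normal(0, 1.0, size=(500, 3))           # events x channels
measured = true_signal @ spillover.T + noise

# Pseudoinverse solution of the overdetermined system (3 equations,
# 2 unknowns per event).
f_pinv = (np.linalg.pinv(spillover) @ measured.T).T

# Equivalent least-squares solve. `lstsq` returns a tuple; the solution
# is its first element.
f_lstsq = np.linalg.lstsq(spillover, measured.T, rcond=None)[0].T
```

Both routes give the same estimate here because `spillover` has full column rank. The API questions raised above (which channels to overwrite, how to label the result) are untouched by this sketch.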
This ended up being quite a chore. I had to make a small synthetic data set and basically implement my proposed API (two things I was trying to avoid doing unnecessarily), but I think it's been illuminating. The math underlying my API was a little more complicated than I thought, but it's actually quite doable. The core of it is the following three lines of pseudocode:

# Solve for independent fluorophore signals. `spillover` does NOT need to be
# normalized. Use of `lstsq()` eliminates the need to extract channels from
# `spillover` and `data` to get a square matrix. Should also be more robust to
# noise in real data. `lstsq()` returns a tuple; the solution is its first element.
f_hat = np.linalg.lstsq(spillover, data.T, rcond=None)[0]
# Reconstruct requested channel-fluorophore signals by selectively applying
# `spillover` back to `f_hat`.
for channel,fluorophore in channel_fluorophore_pairs:
    data[:,fluorophore@channel] = spillover.loc[channel,fluorophore] * f_hat[fluorophore,:]

My proposed API can reconstruct all channel-fluorophore signals from a single spillover matrix by selectively reapplying the spillover matrix to

Example 1: Unmixing the contributions of two fluorophores to one channel

One example I would really like to (eventually) support in a clean way is simply unmixing the contributions of two fluorophores to a single channel (here, GFP and mCherry to FL2). With the current constraints on the output object (two new channels cannot replace one old channel), we cannot do this in one command. However, we should be able to do it in two passes:
Using the current implementation:

compensate_transform_fxn1 = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples=[gfp_data,mcher_data],
    channels=['FL2','FL3'])
fl2_gfp = compensate_transform_fxn1(data=multicolor_data)[:,'FL2']
compensate_transform_fxn2 = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples=[mcher_data,gfp_data],
    channels=['FL2','FL1'])
fl2_mcher = compensate_transform_fxn2(data=multicolor_data)[:,'FL2']

Using my proposed API:

compensate_transform_fxn = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples={'gfp'  :gfp_data,
                 'mcher':mcher_data})
fl2_gfp = compensate_transform_fxn(
    data=multicolor_data,
    channel_fluorophore_pairs=[('FL2','gfp'),
                               ('FL3','mcher')])[:,'FL2']
fl2_mcher = compensate_transform_fxn(
    data=multicolor_data,
    channel_fluorophore_pairs=[('FL2','mcher'),
                               ('FL1','gfp')])[:,'FL2']

(Eventually:)

compensate_transform_fxn = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples={'gfp'  :gfp_data,
                 'mcher':mcher_data})
multicolor_data = compensate_transform_fxn(
    data=multicolor_data,
    channel_fluorophore_pairs=[('FL2','gfp'),
                               ('FL2','mcher')],
    compensating_channels=['FL1','FL2','FL3'])
fl2_gfp = multicolor_data[:,'gfp@FL2']
fl2_mcher = multicolor_data[:,'mcher@FL2']

(Note: The last example would require m-to-n compensation given the channels I chose to unmix with; the prior behavior could be reproduced using a second call to

Example 2: Compensating multiple channels to a single fluorophore with crosstalk

Compensate FL1, FL2, and FL3 to
Using the current implementation:

# assume FL1 is the primary channel for fluor1 and FL4 for fluor2
compensate_transform_fxn1 = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples=[fluor1_data,fluor2_data],
    channels=['FL1','FL4'])
fluor1_FL1 = compensate_transform_fxn1(data=multicolor_data)[:,'FL1']
fluor2_FL4 = compensate_transform_fxn1(data=multicolor_data)[:,'FL4']
compensate_transform_fxn2 = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples=[fluor1_data,fluor2_data],
    channels=['FL2','FL4'])
fluor1_FL2 = compensate_transform_fxn2(data=multicolor_data)[:,'FL2']
compensate_transform_fxn3 = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples=[fluor1_data,fluor2_data],
    channels=['FL3','FL4'])
fluor1_FL3 = compensate_transform_fxn3(data=multicolor_data)[:,'FL3']
compensate_transform_fxn4 = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples=[fluor1_data,fluor2_data],
    channels=['FL1','FL3'])
fluor2_FL3 = compensate_transform_fxn4(data=multicolor_data)[:,'FL3']
fluor1_data = (fluor1_FL1,fluor1_FL2,fluor1_FL3)
fluor2_data = (fluor2_FL3,fluor2_FL4)

Using my proposed API:

compensate_transform_fxn = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples={'fluor1':fluor1_data,
                 'fluor2':fluor2_data})
fluor1_data = compensate_transform_fxn(
    data=multicolor_data,
    channel_fluorophore_pairs=[('FL1','fluor1'),
                               ('FL2','fluor1'),
                               ('FL3','fluor1'),
                               ('FL4','fluor2')])[:,['FL1','FL2','FL3']]
fluor2_data = compensate_transform_fxn(
    data=multicolor_data,
    channel_fluorophore_pairs=[('FL1','fluor1'),
                               ('FL2','fluor1'),
                               ('FL3','fluor2'),
                               ('FL4','fluor2')])[:,['FL3','FL4']]

(Note: Use of

(Eventually:)

compensate_transform_fxn = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples={'fluor1':fluor1_data,
                 'fluor2':fluor2_data})
multicolor_data = compensate_transform_fxn(
    data=multicolor_data,
    channel_fluorophore_pairs=[('FL1','fluor1'),
                               ('FL2','fluor1'),
                               ('FL3','fluor1'),
                               ('FL3','fluor2'),
                               ('FL4','fluor2')])
fluor1_data = multicolor_data[:,['fluor1@FL1','fluor1@FL2','fluor1@FL3']]
fluor2_data = multicolor_data[:,['fluor2@FL3','fluor2@FL4']]

Example 3: Separately unmixing two sets of channels

A bit esoteric, but if you wanted to unmix two separate sets of crosstalking fluorophores, the current implementation again requires that you make two transform functions. (Best justification I can think of for this might be if you had an established compensation protocol for two fluorophores that you wanted to preserve and you added two additional orthogonal fluorophores).
Using the current implementation:

compensate_transform_fxn1 = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples=[fluor1_data,fluor2_data],
    channels=['FL1','FL2'])
multicolor_data = compensate_transform_fxn1(data=multicolor_data)
compensate_transform_fxn2 = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples=[fluor3_data,fluor4_data],
    channels=['FL3','FL4'])
multicolor_data = compensate_transform_fxn2(data=multicolor_data)

(Note: This approach uses two 2x2 submatrices of

compensate_transform_fxn = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples={'fluor1':fluor1_data,
                 'fluor2':fluor2_data,
                 'fluor3':fluor3_data,
                 'fluor4':fluor4_data})
multicolor_data = compensate_transform_fxn(
    data=multicolor_data,
    channel_fluorophore_pairs=[('FL1','fluor1'),
                               ('FL2','fluor2')])
multicolor_data = compensate_transform_fxn(
    data=multicolor_data,
    channel_fluorophore_pairs=[('FL3','fluor3'),
                               ('FL4','fluor4')])

Example 4: Swapping the fluorophores for which two channels are compensated

Switching from
Using the current implementation:

compensate_transform_fxn1 = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples=[fluor1_data,fluor2_data],
    channels=['FL1','FL2'])
multicolor_data1 = compensate_transform_fxn1(data=multicolor_data)
compensate_transform_fxn2 = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples=[fluor2_data,fluor1_data],
    channels=['FL1','FL2'])
multicolor_data2 = compensate_transform_fxn2(data=multicolor_data)

Using my proposed API:

compensate_transform_fxn = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples={'fluor1':fluor1_data,
                 'fluor2':fluor2_data})
multicolor_data1 = compensate_transform_fxn(
    data=multicolor_data,
    channel_fluorophore_pairs=[('FL1','fluor1'),
                               ('FL2','fluor2')])
multicolor_data2 = compensate_transform_fxn(
    data=multicolor_data,
    channel_fluorophore_pairs=[('FL1','fluor2'),
                               ('FL2','fluor1')])

Example 5: Compensating a subset of channels (possibly outdated)

At least as currently implemented, compensating a subset of the channels originally specified would require making a new transform function.
Using the current implementation:

compensate_transform_fxn1 = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples=[fluor1_data,fluor2_data,fluor3_data],
    channels=['FL1','FL2','FL3'])
# This would require and subsequently change FL2. It would also likely yield
# slightly different compensated FL1 and FL3 signals.
multicolor_data = compensate_transform_fxn1(data=multicolor_data)
# This would outright fail, I assume.
multicolor_data = compensate_transform_fxn1(data=multicolor_data, channels=['FL1','FL3'])
compensate_transform_fxn2 = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples=[fluor1_data,fluor3_data],
    channels=['FL1','FL3'])
multicolor_data = compensate_transform_fxn2(data=multicolor_data)

Using my proposed API:

compensate_transform_fxn = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples={'fluor1':fluor1_data,
                 'fluor2':fluor2_data,
                 'fluor3':fluor3_data})
multicolor_data = compensate_transform_fxn(
    data=multicolor_data,
    channel_fluorophore_pairs=[('FL1','fluor1'),
                               ('FL3','fluor3')])

Example 6: Common use case

For completeness, here's how I envision the common use case would look.
Using the current implementation:

compensate_transform_fxn = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples=[fluor1_data,fluor2_data,fluor3_data],
    channels=['FL1','FL2','FL3'])
multicolor_data = compensate_transform_fxn(data=multicolor_data)

Using my proposed API:

compensate_transform_fxn = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples={'fluor1':fluor1_data,
                 'fluor2':fluor2_data,
                 'fluor3':fluor3_data})
multicolor_data = compensate_transform_fxn(
    data=multicolor_data,
    channel_fluorophore_pairs=[('FL1','fluor1'),
                               ('FL2','fluor2'),
                               ('FL3','fluor3')])
Thanks for the detailed examples. To be honest, though, I remain skeptical about the general idea of using explicit fluorophore labels. I think the crux of the disagreement is in this statement of yours:
I am not convinced this is a remotely common use case. Trying different channel/fluorophore combinations doesn't seem to be something a user would do frequently, with the exception of an initial exploratory phase in which many calls to
compensate_transform_fxn1 = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples=[fluor1_data,fluor2_data,fluor3_data,fluor4_data],
    comp_channels=['FL1','FL2','FL3','FL4'])
multicolor_data = compensate_transform_fxn1(data=multicolor_data)

And if they're different, whether one is more "correct".

compensate_transform_fxn1 = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples=[fluor1_data,fluor2_data,fluor3_data],
    comp_channels=['FL1','FL2','FL3'])
# The following would do the same as `compensate_transform_fxn1(data=multicolor_data)`,
# but only modify FL1 and FL3 in `multicolor_data`.
multicolor_data = compensate_transform_fxn1(data=multicolor_data, channels=['FL1','FL3'])

Another issue is that, since you didn't show how you're calculating your spillover matrix, I'm not sure what your math is doing exactly. I assumed that, for example, in this case:

compensate_transform_fxn = compensate.get_transform_fxn(
    nfc_sample=None,
    sfc_samples={'fluor1':fluor1_data,
                 'fluor2':fluor2_data,
                 'fluor3':fluor3_data,
                 'fluor4':fluor4_data})
multicolor_data = compensate_transform_fxn(
    data=multicolor_data,
    channel_fluorophore_pairs=[('FL1','fluor1'),
                               ('FL2','fluor2')])
multicolor_data_fl1_only = compensate_transform_fxn(
    data=multicolor_data,
    channel_fluorophore_pairs=[('FL1','fluor1')])

the

The use of fluorophore labels has one additional, potentially very confusing issue in flow cytometers with channel names taken from fluorophores. For example, if channels are FITC, PE, and APC, and you're using fluorophores of the same names, the following would be allowed in your version of the API:

multicolor_data = compensate_transform_fxn(
    data=multicolor_data,
    channel_fluorophore_pairs=[('FITC','PE'),
                               ('PE','APC'),
                               ('APC','FITC')])

which is very confusing imo.

To be clear, I don't think your API version is bad. I think what you have in example 6 is perfectly usable (unless you're using a flow cytometer with fluorophore names used for channels, in which case it becomes kind of confusing). But I'm not convinced that the additional complexity of introducing fluorophore labels is justified for n-to-n compensation, given that the simplest use case will be used significantly more than any other.

I'm kind of torn, however, because I think with (significantly) more time and discussion some of these ideas could work well with m-to-n compensation. I think I'd like to wait for a reply and think about this a little more. (EDIT: changed last paragraph)
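The `channels` argument suggested in this comment (compensate with every channel in `comp_channels`, but overwrite only a subset) could behave roughly like the sketch below. It works on plain NumPy arrays rather than FCSData objects; the function `compensate_subset`, its signature, and the matrix values are hypothetical.

```python
import numpy as np

def compensate_subset(data, all_channels, A, comp_channels, channels=None):
    """Compensate using `comp_channels`, but overwrite only `channels`.

    data: (events, len(all_channels)) array.
    A: square normalized spillover matrix over `comp_channels`.
    """
    if channels is None:
        channels = comp_channels
    missing = set(channels) - set(comp_channels)
    if missing:
        raise ValueError("channels not in comp_channels: {}".format(sorted(missing)))
    out = np.array(data, dtype=float)            # work on a copy
    comp_idx = [all_channels.index(c) for c in comp_channels]
    # The math always uses every compensation channel.
    compensated = np.linalg.solve(A, out[:, comp_idx].T).T
    # Only the requested channels are written back.
    for c in channels:
        out[:, all_channels.index(c)] = compensated[:, comp_channels.index(c)]
    return out

all_channels = ['FSC', 'FL1', 'FL2', 'FL3']
comp_channels = ['FL1', 'FL2', 'FL3']
A = np.array([[1.0, 0.1, 0.0],
              [0.2, 1.0, 0.1],
              [0.0, 0.3, 1.0]])
data = np.array([[500.0, 120.0, 80.0, 40.0],
                 [650.0, 30.0, 200.0, 90.0]])

full = compensate_subset(data, all_channels, A, comp_channels)
partial = compensate_subset(data, all_channels, A, comp_channels,
                            channels=['FL1', 'FL3'])
```

With this behavior, FL1 and FL3 in `partial` are identical to those in `full`, and FL2 keeps its uncompensated values.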
The math underlying compensation describes all of the unmixed signals, signals about which I think I would absolutely be curious. A well-designed API would expose all those signals in an elegant way that does not detract from the common use case. That being said, I agree examination of the non-"primary" signals would probably not be the most common use case.

To address

channels,fluorophores = zip(*channel_fluorophore_pairs)
channels = unique(channels)
fluorophores = unique(fluorophores)
spillover = pd.DataFrame([[statistic_fxn(sfc_sample[:,channel]) for sfc_sample in sfc_samples]
                          for channel in channels],
                         index=channels,
                         columns=fluorophores)
# Example result:
# >>> spillover
#      fluor1  fluor2
# FL1   42.00   17.25
# FL2   21.00   51.75
# FL3    0.42   69.00

(The above would have to be modified to incorporate background subtraction. And

My goal, though, is really as stated at the beginning of this post (elegantly exposing all unmixed signals), and if there's a better API to achieve that, I'm amenable (even if we can't fully realize it in this next FlowCal release). Fluorophore labels, for example, are largely just a kludge glue that bridges the two compensation functions (
And we can continue to wait on m-to-n compensation (sorry, my intuition just says that's gonna be the way superior strategy when you're considering more channels than fluorophores, so I keep bringing it back in, but you're wise to wait for evidence). |
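For reference, a runnable version of the spillover construction discussed in this comment might look like the sketch below, including the background subtraction mentioned as missing. Plain pandas DataFrames stand in for FCSData objects, and `build_spillover` with its parameters is a hypothetical helper, not part of FlowCal.

```python
import numpy as np
import pandas as pd

def build_spillover(sfc_samples, channels, statistic_fxn=np.mean, nfc_sample=None):
    """Build a channels x fluorophores spillover table (hypothetical helper).

    sfc_samples: dict mapping fluorophore label -> DataFrame of events,
        one column per channel.
    nfc_sample: optional non-fluorescent control; its statistic is
        subtracted channel by channel when given.
    """
    background = {
        ch: statistic_fxn(nfc_sample[ch]) if nfc_sample is not None else 0.0
        for ch in channels
    }
    return pd.DataFrame(
        {fluor: [statistic_fxn(sample[ch]) - background[ch] for ch in channels]
         for fluor, sample in sfc_samples.items()},
        index=channels)

# Tiny synthetic controls (made-up numbers).
fluor1 = pd.DataFrame({'FL1': [41.0, 43.0], 'FL2': [20.0, 22.0], 'FL3': [0.40, 0.44]})
fluor2 = pd.DataFrame({'FL1': [17.0, 17.5], 'FL2': [51.5, 52.0], 'FL3': [68.0, 70.0]})
channels = ['FL1', 'FL2', 'FL3']

spillover = build_spillover({'fluor1': fluor1, 'fluor2': fluor2}, channels)

# Same controls with a (synthetic) non-fluorescent control subtracted.
blank = pd.DataFrame({'FL1': [1.0, 1.0], 'FL2': [1.0, 1.0], 'FL3': [0.0, 0.0]})
spillover_bg = build_spillover({'fluor1': fluor1, 'fluor2': fluor2}, channels,
                               nfc_sample=blank)
```

Keying `sfc_samples` by fluorophore label is what lets the table carry named columns, which is the role the labels play in the proposed API.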
Sounds good. I'll move on to unit tests for the current version then. Feel free to open issues for removing
Some other details as I think about the For uncalibrated data, I think we would technically need to check that all detector voltages match among For calibrated data, I don't think detector voltage should matter anymore? And in that case, I guess I would preserve the original detector voltage in Regarding I don't think I also don't think |
Thinking more about this use case (compensating calibrated data), I'm left wondering whether compensation and calibration are commutative. I think calibration of multicolor data should comport with calibration of single-color data if the multicolor data is first compensated (thereby calibrating the fluorescence of a fluorophore as if it were measured in isolation), but I don't know what happens if you compensate after calibration. Making the tutorial and examples should clarify this. If compensation and calibration are not commutative, though, we should probably address that explicitly in the compensation docstring. And we should more seriously consider throwing warnings or errors if |
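One narrow case of the commutativity question can be checked numerically. The sketch below assumes that calibration is a pure per-channel scale factor (no offset, no nonlinearity) and that the spillover matrix is re-estimated from calibrated controls and renormalized. All names and numbers are hypothetical, and the result says nothing about calibrations that are not of this form.

```python
import numpy as np

# Normalized spillover in raw units: rows are channels, columns fluorophores.
S = np.array([[1.0, 0.1],
              [0.3, 1.0]])
# Hypothetical calibration factors (MEF per a.u.) for each channel.
gain = np.array([2.5, 0.8])

rng = np.random.default_rng(1)
f = rng.uniform(0, 1000, size=(100, 2))    # true primary-channel signals
raw = f @ S.T                              # mixed, uncalibrated data

# Order 1: compensate the raw data, then calibrate each channel.
comp_then_cal = np.linalg.solve(S, raw.T).T * gain

# Order 2: calibrate first. Controls are calibrated too, so the spillover
# matrix has each channel row scaled by its gain, and is then renormalized
# so that every fluorophore's own channel has coefficient 1.
S_cal = gain[:, None] * S
S_cal_norm = S_cal / np.diag(S_cal)
cal_then_comp = np.linalg.solve(S_cal_norm, (raw * gain).T).T

orders_agree = np.allclose(comp_then_cal, cal_then_comp)
```

Under these assumptions the two orders agree. Algebraically, the renormalized matrix is D S D^-1 with D = diag(gain), so solving against calibrated data D x gives D S^-1 x, which equals calibrating after compensating.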
To reply to the last few points raised:
I think I've addressed all previous issues with the last few commits. Among these:
I'll wait for comments. |
In the process of merging develop, I reaffirmed my opinion that the gray histogram colormap in the example files doesn't look very good. In particular, the first few histograms are white and not even visible. I know you added a small constant to the inducer levels to fix this, but it seems to me like a better fix would be a better colormap. This is what it looks like if I just change the colormap to I am gonna press strongly again for leaving the default colormap (no colormap specified). I always found the discussion on "perceptually uniform colormaps" a little out of place, and I think it only adds bloat to an example file that's already large. This is what it looks like with the default colormap: I would also be in favor of using viridis just because it doesn't look bad, although if we do that I would like to not have a big comment above talking about perceptually uniform colormaps. I'm leaving the default colors for now until we reach an agreement. |
I really think The updated example plots are confusing now and overwhelmed by the compensation workflow, and the compensation tutorial could be more focused. (The violin plot tutorial also looks bad now and incorrectly shows Min and Max in the last plot.) In particular, I think it's really confusing to show both the input (orange) and output (green) data at every stage of the analysis.
Other minor issues:
I'll continue my review after we've addressed the major issues listed above. |
Solves the Python API portion of #238.
@JS3xton and I discussed a few issues to resolve before this gets accepted. I will transcribe those later (unless you beat me to it).