Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MR FIX legacy data #241

Merged
merged 27 commits into from
Dec 28, 2022
Merged

MR FIX legacy data #241

merged 27 commits into from
Dec 28, 2022

Conversation

coalsont
Copy link
Member

Expose number of wisharts and make a melodic-like mode for icaDim, update example scripts to match.

@coalsont coalsont requested review from glasserm and mharms December 13, 2022 01:50
@mharms
Copy link
Contributor

mharms commented Dec 13, 2022

I'm a bit confused. Is this functionality already in v4.7.0?

@coalsont
Copy link
Member Author

No, this is not in 4.7.0. There was a "bugfix" for identifying problem runs and rerunning them with 1 volume wishart, but it did not include these additional options.

@coalsont coalsont changed the title Mr fix legacy data MR FIX legacy data Dec 13, 2022
@mharms
Copy link
Contributor

mharms commented Dec 13, 2022

Have you done any testing already? We should test with hp = -1, 0, 200, 'pd2' to cover the four main supported "modes" of hp specification.

@coalsont
Copy link
Member Author

I have done some testing on HCP-style and legacy-style data, with the recommended settings (hp=0, and hp=-1 during the concatenated section). I didn't test polynomial detrending.

@mharms
Copy link
Contributor

mharms commented Dec 13, 2022

Can we test hp = 2000, 'pd2' as well?

@coalsont
Copy link
Member Author

coalsont commented Dec 13, 2022

I have tested hp 2000 and pd2, and didn't see any error messages.

@coalsont
Copy link
Member Author

@mharms I have made edits to the comments which should resolve the discussions. Please check them and say whether you are happy with the PR now.

@coalsont
Copy link
Member Author

@glasserm any comments?

Copy link
Contributor

@glasserm glasserm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main thing missing here is the --legacy flag to trigger the legacy behavior and the --processing-mode=(HCPStyleData|LegacyStyleData)] stuff that the other scripts with legacy options have. I don't think we want legacy mode users to have to figure out what things like volWisharts, ciftiWisharts, or icadimMode mean. Such users are already unlikely to be super comfortable with the HCP Pipelines, so we need to keep things simple. Those are fine as expert flags for someone like Takuya or me, but are not a good thing to expose to the typical user.

@coalsont
Copy link
Member Author

We wanted to be able to change the wisharts for other purposes, which is why I made them options. Having more than one option that changes the wisharts would create problems with precedence, and hide additional sets of defaults in the code. This is why I suggested a config file to contain the "legacy" settings (config handling is already built into the new options library), to simplify the three arguments into one, without creating new problems. That would require converting ReApply to newopts, but it would also allow the possibility for hcp_fix_multi_run to pass more of its arguments to ReApply... via a config file saved into the subject dir while running, rather than needing to be copied manually into the ReApply batch file.

Regardless, "--legacy" is not a good option name for triggering this functionality, since we established there could be data other than legacy-style that would benefit from it, and it isn't clear from the name what aspect of the processing it changes. A "--fix-mode=legacy" option might be a better way to name it, allowing easy extension to an NHP mode. But, of course, using the config functionality would allow new settings without editing the code itself. I don't see a reason not to use a config file for this, unless we specifically want the wishart settings and other details to not be exposed in the usage (and not allow users to use other combinations without editing the code).

I am not as clear whether this processing mode has drawbacks considerable enough to disable it without the "legacy-style" processing mode. One end of the scale is slice timing interpolation, which interferes with motion correction. It also raises the question of what settings for the wisharts should require the processing mode check.

@mharms
Copy link
Contributor

mharms commented Dec 19, 2022

@coalsont nicely articulates the unease I was feeling about automatically triggering these new features as part of the other "legacy"-style code.

@coalsont
Copy link
Member Author

coalsont commented Dec 19, 2022

I think even with Matt's idea, there would be separate "--processing-mode=legacy" to enable, and "--fix-mode=legacy" or similar options to actually change the mode. Config files to contain the groups of settings, rather than writing them into the code, just seems more maintainable and appropriate to me.

@coalsont
Copy link
Member Author

I don't think melodiclike prevents fitting multiple wisharts, though the "half spectrum" noise estimate may not be good enough for stable results.

@mharms
Copy link
Contributor

mharms commented Dec 22, 2022

Isn't the point of the fitting to determine the dimensionality? If so, then if melodiclike fixes the dimensionality, then what role do volwisharts and ciftiwisharts play?

@coalsont
Copy link
Member Author

coalsont commented Dec 22, 2022

I think it is true that melodic doesn't fit multiple wisharts. That may be an unimportant detail, and the actual effect of the setting is described in the help info.

@glasserm
Copy link
Contributor

You wouldn't use multiple Wisharts with melodiclike, so that can be an error. Melodic uses a fixed number of eigenvalues to do variance normalization from the rest, that is why this is melodiclike.

@mharms
Copy link
Contributor

mharms commented Dec 22, 2022

I was perhaps conflating the estimate of the dimensionality for the VN (which apparently melodic does with a hard-coded assumption that the back-half of the eigen spectrum is unstructured?), from the estimate of the dimensionality to use for the ICA decomposition itself...
In the iterative approach, are those two different dimensionalities the same, or are they still different?

@glasserm
Copy link
Contributor

  1. Melodiclike is an option for the variance normalization approach. Melodic uses a fixed d=30 for variance normalization (i.e., do variance normalization on the eigenvalues to the right of 30; Line 580 of https://git.fmrib.ox.ac.uk/fsl/melodic/-/blob/master/meldata.cc). I thought this was a bad idea, so that was part of why ica_dim converges to do variance normalization with the number of eigenvalues to the right of the estimated dimensionality. This doesn't work correctly with legacy data; however, so I just use the right-hand half of the eigenvalues, which allows for some dynamic scaling according to number of timepoints (rather than just picking a fixed number like melodic did).
  2. Melodic estimates the spatial degrees of freedom a bit differently than ica_dim. It looks at the 3D data smoothness to estimate the spatial degrees of freedom and creates a Wishart distribution based on that, rather than fitting to the back half of the eigen spectrum as ica_dim does to figure out the Wishart distribution. The 3D smoothness approach will never work for CIFTI data, so we still need to use our ica_dim approach for such data; however, multiple Wishart distributions will also not work well for legacy data with few timepoints, so those should be set to 1.

@mharms
Copy link
Contributor

mharms commented Dec 22, 2022

Thanks @glasserm. That's helpful additional context in your comment above. One other thing might be worth commenting on here (or even including as a comment in functionhighpass... where the default number of Wisharts is set) is why/how did the need for multiple Wisharts come into existence in the first place? (E.g., there isn't by any chance a figure of the fit for the differing number of Wisharts in the supplement of a paper, is there?)

Co-authored-by: Michael Harms <[email protected]>
@glasserm
Copy link
Contributor

I've asked Alex to put that in his paper. I did an OHBM abstract on it in 2020.

@coalsont
Copy link
Member Author

Since the VN step is orthogonal to the wishart modeling step, I don't see a reason to make it an error to combine them in a way we aren't currently expecting. A warning makes sense for combinations that we expect to be suboptimal, but haven't had a reason to try. We already print a bunch of stuff just for enabling the option anyway.

At present, warnings are printed to stdout, with just the addition of the word "WARNING". Do we want to stick with that, or have all warnings go to stderr? We could even test if stderr is a tty, and if not, log to both stderr and stdout.

@coalsont
Copy link
Member Author

On the other hand, maybe the point is that we want to warn users that what we call "melodiclike" is intended for low timepoints, and could check the number of timepoints against some threshold (300?) and warn the user that default mode should work better. This could also suggest that the mode should be named something like "fewtimepoints".

@glasserm
Copy link
Contributor

I don't know exactly how many timepoints is too few. The HCP-YA Emotion task was 176 timepoints and lasted 2.1 minutes and works fine. The legacy data had 90 timepoints and lasted 4.5 minutes and didn't work. There may be an interaction between slow TR and few timepoints as well.

@glasserm
Copy link
Contributor

Warnings could go to both stderr and stdout if not a tty.

@mharms
Copy link
Contributor

mharms commented Dec 23, 2022

I'm fine with either melodiclike or fewtimepoints. Regarding putting WARNINGS into stderr, is that something we do anywhere else? Personally, I think error messages should go to both stdout and stderr (perhaps they already do via our error handling functions?) Under that model, if warnings go to stderr, they should be duplicated in stdout as well. But there seems to be differing opinions online about whether warnings properly belong in stderr in the first place.

@coalsont
Copy link
Member Author

For unixlike tools where the stdout is likely to be parsed by some other command, warnings (and any other stuff besides the parseable output) should absolutely go to stderr. For other tools, stderr is for things you want to make extra sure the user sees (if they redirect just stdout to a file, stderr will still show up in the terminal), so stderr is still a good place for warnings (as long as warnings aren't overused for minor things).

I didn't remember it, but yes our log functions already did put errors into both stderr and stdout, if stderr wasn't a tty, but left warnings on stdout. Checking whether stdout and stderr go to the same place might be better, but I don't know if there is an easy way to do that. To be clear, I am suggesting changing the shlib so all warnings in all pipelines would get printed to the same place(s) as errors.

@mharms
Copy link
Contributor

mharms commented Dec 23, 2022

I don't have a strong opinion. It would probably elevate the likelihood that warnings would be seen, but would also mean that a non-empty stderr file is no longer a sign of a error per se (i.e., the stderr file may be non-empty, even though the processing completed).
This is all further complicated by the additional layer that QuNex adds. QuNex generates just a single log of the pipeline output, so I assume that it is redirecting stdout and stderr into the same file.

@coalsont
Copy link
Member Author

coalsont commented Dec 23, 2022

If we output a warning from the pipelines, it is generally serious enough that it should be looked at as if it could be an error, right? Hopefully no tools are specifically using log file sizes as a completion check.

Yes, that would suggest that at present, qunex logs that have a pipelines-produced error have the message printed twice in the log. I don't see this as a significant problem, given the technical challenge of avoiding it (versus the more important feature of showing it in both .o and .e logs), and would be fine with having warnings do the same.

@glasserm
Copy link
Contributor

I agree.

@mharms
Copy link
Contributor

mharms commented Dec 23, 2022

Will this have any impact on where the various "Warnings" from wb_command end up? I assume those will still be stdout only?

@coalsont
Copy link
Member Author

coalsont commented Dec 23, 2022

No, it does not affect how other commands choose to report warnings. If they choose to use stderr, it will be on stderr.

FYI, it appears all "log messages" from wb_command (even debug, if enabled) go to stderr at present. The only things that go to stdout are things that don't use the logging macros.

@coalsont coalsont merged commit 86d7a8a into master Dec 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants