Option to set IID subsample #56
Conversation
@eurunuela @tsalo @CesarCaballeroGaudes @javiergcas @dowdlelt @n-reddy I have some results from testing this idea and I wanted to share. As a reminder, for this dataset I noticed that most runs estimated a subsampling depth of 2 (to make the remaining sparse data IID), but a bunch of runs had estimates of 1 or 3, which resulted in the estimated number of PCA components being way too high or low. The idea was that, given the acquisition parameters are identical, the spatial smoothness of all runs should be similar, so setting the subsampling depth to 2 for all runs might address the issue.

The attached data are from a study where 25 participants did event-related word/nonword task runs, movie viewing with paced breathing runs, and paced breathing runs. In the figure, each marker is a run and there are 174 runs total. In case it matters, each type of run is a different marker shape. The runs where the program estimated a depth of 2 are green, while estimates of 1 or 3 are blue or red. In the first row, the x-axis is the number of components for the AIC, KIC, and MDL criteria using the estimated subsampling depth, and the y-axis is the number when I set the depth to 2 using this PR (axis scaling is equal, so the green markers, where the depth is 2 in both cases, lie on the unity line). You can see that the number of components gets closer to the typical values from runs where the estimate was 2, but some are still near the extremes. The second row shows the variance explained by the retained PCA components and, again, the values are a bit less extreme with a fixed value. The third row is variance explained vs. number of components when subsampling is set to 2, and the fourth row is the same for the estimated subsample size. Here it's very clear that setting the value to 2 means all the values now seem to fall within a similar overall distribution.

The big minus is that there are still extreme cases with too many retained components (only with AIC) and cases where the total variance explained by the PCA seems a bit low. This makes me think there is an underlying limit to the MAPCA method that we still haven't characterized, and I'd like to figure out a better option (possibly taking advantage of multi-echo info). Still, I think this is an improvement and I'd like to clean up this PR to merge and link it to tedana. A rough sketch of the kind of comparison plot described above is below.
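For reference, here is a minimal sketch of how the estimated-vs-fixed comparison in the first row could be drawn. The arrays are placeholders, not the actual per-run results; in practice they would come from running MAPCA twice per run.

```python
# Hypothetical sketch (not the actual analysis code): compare per-run component
# counts from MAPCA with the estimated subsampling depth vs. the depth fixed to 2.
import numpy as np
import matplotlib.pyplot as plt

# Placeholder arrays; in practice these come from the per-run MAPCA outputs.
n_comps_estimated = np.array([45, 60, 130, 20, 58])  # depth chosen by MAPCA
n_comps_fixed2 = np.array([47, 60, 75, 50, 58])      # depth forced to 2
est_depth = np.array([1, 2, 1, 3, 2])                # the depth MAPCA estimated

colors = {1: "tab:blue", 2: "tab:green", 3: "tab:red"}
fig, ax = plt.subplots(figsize=(5, 5))
for depth in (1, 2, 3):
    sel = est_depth == depth
    ax.scatter(n_comps_estimated[sel], n_comps_fixed2[sel],
               color=colors[depth], label=f"estimated depth = {depth}")
lims = [0, max(n_comps_estimated.max(), n_comps_fixed2.max()) + 10]
ax.plot(lims, lims, "k--", linewidth=1)  # unity line: no change from fixing the depth
ax.set_xlabel("Components (estimated subsampling depth)")
ax.set_ylabel("Components (subsampling depth fixed to 2)")
ax.set_aspect("equal")
ax.legend()
plt.show()
```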
I cleaned up the code and documentation and fixed an existing bug that was causing integration tests to fail (a file name was mistyped). If we decide this option is worth adding, this PR is ready for review/merging.
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #56 +/- ##
==========================================
+ Coverage 90.26% 95.43% +5.17%
==========================================
Files 3 3
Lines 298 307 +9
==========================================
+ Hits 269 293 +24
+ Misses 29 14 -15
☔ View full report in Codecov by Sentry.
Am I right in observing that, for the runs where the autoselection would have chosen 3, setting the value to 2 led to a relatively low amount of variance explained? Something about that jumps out to me. I am also troubled by the large jump from 2 to 3; that's a huge change for a single increment in a parameter. Fixing the value at 2 should likely help folks, but I agree with your "find a better method" comment. I am fully supportive of this idea and will take a closer look at things, but I'm just trying to grapple with the image.
@dowdlelt I see a way that might make sense to me. If we assume that the runs where a sparseness factor of 3 is estimated have slightly higher spatial smoothness, then it would be possible to model those runs with slightly fewer components than the other runs. This is testable. I could make a smoothness estimate for each run, check whether the ones with an estimate of 3 are slightly smoother, and see if there's a more general relationship between spatial smoothness and the number of components that were estimated. I'm not sure I'll be able to get to running that test during the next few days. A rough sketch of the check I have in mind is below.
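Something along these lines, with placeholder file names and depths, and a crude lag-1 spatial autocorrelation proxy rather than a formal FWHM estimate:

```python
# Rough sketch of the proposed per-run smoothness check (placeholder inputs,
# and a simple autocorrelation proxy instead of a proper FWHM estimate).
import nibabel as nib
import numpy as np
from scipy.stats import spearmanr


def smoothness_proxy(fname):
    """Mean lag-1 spatial autocorrelation of the run's mean volume.

    Higher values indicate smoother data; this is only a proxy, not FWHM.
    """
    data = nib.load(fname).get_fdata()
    mean_vol = data.mean(axis=-1)  # collapse the time dimension
    corrs = []
    for axis in range(3):
        a = np.moveaxis(mean_vol, axis, 0)
        x, y = a[:-1].ravel(), a[1:].ravel()
        corrs.append(np.corrcoef(x, y)[0, 1])
    return float(np.mean(corrs))


# Placeholder inputs: one optimally combined file and one MAPCA-estimated depth per run.
run_files = ["run-01_optcom.nii.gz", "run-02_optcom.nii.gz", "run-03_optcom.nii.gz"]
est_depths = [2, 3, 2]

smoothness = [smoothness_proxy(f) for f in run_files]
rho, pval = spearmanr(smoothness, est_depths)
print(f"Spearman rho between smoothness proxy and estimated depth: {rho:.2f} (p={pval:.3f})")
```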
Someone suggested to me that the sparse sampling approach might fail with multi-slice sequences, since there is some dependence between non-neighboring slices. To quickly check this, I ran MAPCA on 206 runs of optimally combined data from 2 participants. The subsampling value was consistently 2 in all runs. This hints that multi-slice acquisition might be a confounding factor. That said, the component estimates ranged from 40 to 120 (out of 150 volumes), so there are other factors as well. I'm leaning more toward saying MAPCA is just not working as intended. This PR might help isolate some problems, but it doesn't solve the underlying issues. I'm torn on merging this PR and giving users this option. Thoughts from others?
I wanted to test it real quick before giving a thumbs up, but I haven't had a chance. I definitely think it is the right thing to do - it's a sneaky parameter, and being able to ask people what it was when they report something on Neurostars is worth it (and suggest setting it to 2).
Hey @dowdlelt, have you had the chance to test it? I lean towards merging this PR.
I've found some good data to play with, but in the meantime, looks good to me. Happy subsampling!
@eurunuela - I don't think you ever actually approved this; it came up in the meeting today. Assuming you approve, we should be good to go.
Sorry @dowdlelt, I do approve. Didn't know we had a meeting!
GitHub confuses me - maybe because I wasn't a requested reviewer, my previous reviews didn't count? Do it live, I say!
(but really, should be good to go)
As discussed at the May 2023 tedana developers call, we decided to add an option to let a user set the IID subsampling value `sub_iid_sp_median`. I think this PR has it working, but I want to make sure everything is correct and properly documented before saying it's ready to merge.

As part of this PR, we'll always estimate the IID subsample, and both the estimated IID subsample and the one actually used (different if the user provided one) will be logged. A minimal usage sketch is below.
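Roughly how this might look from a user's side. This is only a sketch: the keyword name `subsample_depth` and the exact call signature are assumptions for illustration, not the final API exposed by this PR.

```python
# Minimal usage sketch, assuming a MovingAveragePCA-style interface; the keyword
# name `subsample_depth` is a placeholder for whatever this PR ends up exposing.
import nibabel as nib
from mapca import MovingAveragePCA

img = nib.load("sub-01_task-rest_optcom.nii.gz")  # placeholder optimally combined data
mask = nib.load("sub-01_task-rest_mask.nii.gz")   # placeholder brain mask

# Default behavior: the IID subsampling depth (sub_iid_sp_median) is estimated from the data.
pca_auto = MovingAveragePCA(criterion="mdl")
pca_auto.fit_transform(img, mask)

# With this PR: the user can fix the depth (e.g., to 2); the estimated value
# and the value actually used are both logged.
pca_fixed = MovingAveragePCA(criterion="mdl", subsample_depth=2)  # hypothetical keyword
pca_fixed.fit_transform(img, mask)
```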