Option to set IID subsample #56

Merged
handwerkerd merged 8 commits into ME-ICA:main on Aug 10, 2023

Conversation

handwerkerd
Member

As discussed at the May 2023 tedana developers call, we decided to add an option that lets a user set the IID subsampling value (sub_iid_sp_median). I think this PR has it working, but I want to make sure everything is correct and properly documented before saying it's ready to merge.

As part of this PR, we'll always estimate the IID subsample, and both the estimated value and the one actually used (different if the user provides one) will be logged.
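For illustration, here is a minimal sketch of what using the option could look like. This is not the exact API: the subsample_depth keyword and the file names are stand-ins for the example; only sub_iid_sp_median is named in this PR, and MovingAveragePCA / fit(img, mask) are assumed from mapca's existing interface.

```python
# Sketch only: assumes mapca's MovingAveragePCA class and nibabel-loaded images.
import logging

import nibabel as nib
from mapca import MovingAveragePCA

# With logging enabled, both the estimated IID subsample and the value
# actually used should show up in the log (per this PR's description).
logging.basicConfig(level=logging.INFO)

img = nib.load("optcom_bold.nii.gz")   # hypothetical optimally combined data
mask = nib.load("brain_mask.nii.gz")   # hypothetical brain mask

# Default behavior: estimate the subsampling depth from the data.
pca_auto = MovingAveragePCA(criterion="mdl", normalize=True)
pca_auto.fit(img, mask)

# New option (illustrative keyword name): force the subsampling depth to 2.
pca_fixed = MovingAveragePCA(criterion="mdl", normalize=True, subsample_depth=2)
pca_fixed.fit(img, mask)
```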

@handwerkerd
Member Author

@eurunuela @tsalo @CesarCaballeroGaudes @javiergcas @dowdlelt @n-reddy I have some results from testing this idea and I wanted to share. As a reminder, for this dataset, I noticed that most runs estimated a subsampling depth of 2 (to make the remaining sparse data IID), but a bunch of runs had estimates of 1 or 3, which resulted in the estimated number of PCA components being way too high or low. The idea was that, given the acquisition parameters are identical, the spatial smoothness for all runs should be similar, so setting the subsampling depth to 2 for all runs might address the issue.

The attached data is from a study where 25 participants did event-related word/nonword task runs, movie viewing with paced breathing runs, and paced breathing runs. In the figure, each marker is a run and there are 174 runs total. In case it mattered, each type of run is a different marker shape. The runs where the program estimated a depth of 2 are green, while runs with estimates of 1 or 3 are blue or red. For the first row, the x-axis is the number of components for the aic, kic, and mdl criteria if I use the estimated subsampling depth, and the y-axis is the number if I set the depth to 2 using this PR. (Axis scaling is equal, so the green markers, where the depth is 2 for both, fall on the unity line.) You can see that the number of components gets closer to the typical values from runs where the estimate was 2, but some are still near the extremes.

The second row shows the variance explained by the retained components in the PCA and, again, the values are a bit less extreme using a fixed value.

The 3rd row is variance explained vs number of components when subsampling is set to 2, and the 4th row is for the estimated subsample size. Here it's very clear that setting the value to 2 means all the values now seem to fall within a similar overall distribution.

The big minus is that there are still extreme cases with too many retained components (only with aic) and cases where the total variance explained by the PCA seems a bit low. This makes me think that there is an underlying limit to the MAPCA method that we still haven't characterized, and I'd like to figure out a better option (possibly taking advantage of multi-echo info). Still, I think this is an improvement, and I'd like to clean up this PR so it can be merged and linked to tedana.

Thoughts?
[Figure: testing_fixed_subsampling]

@handwerkerd
Member Author

I cleaned up the code and documentation and fixed an existing bug that was causing integration tests to fail (a file name was mistyped). If we decide this option is worth adding, this PR is ready for review/merging.

@handwerkerd handwerkerd marked this pull request as ready for review May 24, 2023 21:41
@codecov-commenter

Codecov Report

Patch coverage: 88.23% and project coverage change: +5.17% 🎉

Comparison is base (45837f9) 90.26% compared to head (8abb34d) 95.43%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #56      +/-   ##
==========================================
+ Coverage   90.26%   95.43%   +5.17%     
==========================================
  Files           3        3              
  Lines         298      307       +9     
==========================================
+ Hits          269      293      +24     
+ Misses         29       14      -15     
Impacted Files Coverage Δ
mapca/mapca.py 95.33% <88.23%> (+8.38%) ⬆️


@dowdlelt

Am I right in observing that, for the runs where the autoselection would have chosen 3, setting the value to 2 led to a relatively low amount of variance explained? Something about that jumps out to me. I am also troubled by the large jump from 2 to 3; that's a huge change for a single increment in a parameter. Fixing the value at 2 should likely help folks, but I agree with your "find a better method" point.

I am fully supportive of this idea and will take a closer look at things, but just trying to grapple with the image.

@handwerkerd
Member Author

@dowdlelt I see a way that might make sense to me. If we assume that the runs where a sparseness factor of 3 is estimated have slightly higher spatial smoothness, then it would be possible to model those runs with slightly fewer components than other runs. This is testable: I could compute a smoothness estimate for each run and check whether the runs with an estimate of 3 are slightly smoother, and whether there's a more general relationship between spatial smoothness and the number of estimated components. I'm not sure I'll be able to get to running that test during the next few days.
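To make the idea concrete, the check could look something like the sketch below, assuming per-run FWHM values (e.g. from AFNI's 3dFWHMx), estimated subsampling depths, and retained component counts have already been collected. The arrays here are placeholders, not real data.

```python
import numpy as np
from scipy import stats

# Placeholder per-run measurements (one entry per run; values are illustrative only).
fwhm = np.array([3.1, 3.4, 2.9, 3.8, 3.6, 3.0])   # smoothness from 3dFWHMx (mm)
depth = np.array([2, 2, 1, 3, 3, 2])               # MAPCA's estimated subsampling depth
n_comps = np.array([55, 60, 120, 35, 40, 58])      # retained PCA components

# Are runs with an estimated depth of 3 smoother than runs with depth 2?
t, p = stats.ttest_ind(fwhm[depth == 3], fwhm[depth == 2], equal_var=False)
print(f"depth 3 vs depth 2 smoothness: t={t:.2f}, p={p:.3f}")

# Is there a more general smoothness -> dimensionality relationship?
r, p = stats.pearsonr(fwhm, n_comps)
print(f"FWHM vs n_components: r={r:.2f}, p={p:.3f}")
```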

@handwerkerd
Member Author

The new figure includes a spatial smoothness estimate for each run (from AFNI's 3dFWHMx):
Rows 1 & 2: number of components vs variance explained (Row 1 was in my previous comment)
Rows 3 & 4: number of components vs FWHM
Rows 5 & 6: variance explained vs FWHM
The odd rows are colored by the estimated subsampling depth (same as in my previous comment) and the even rows are colored by subject number, to see whether this is a subject-specific effect.

The issues definitely appear in some subjects more than others, and there's one subject with a larger FWHM than the rest, but the component estimates in that subject are typical and the estimated subsampling depth was 2.
A subsampling depth of 3 appears in a couple of subjects with higher FWHM values, but not the highest.
A subsampling depth of 1 appears in runs with lower FWHM than the others, but still within the group's range.

There's something here, but I don't think it's as simple as a FWHM->dimensionality estimate mapping.

I don't think this PR fully solves the issue, but I do think it might be useful.
[Figure: comps_fwhm]

@handwerkerd
Member Author

Someone suggested to me that the sparse sampling approach might fail with multi-slice sequences, since there is some dependence between non-neighboring slices. To quickly check this, I ran MAPCA on 206 runs of optimally combined data from 2 participants. The subsampling value was consistently 2 in all runs. This hints that multi-slice acquisition might be a confounding factor. That said, the component estimates ranged from 40-120 (150 volumes), so there are other factors as well.
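In script form, that kind of check is roughly the following sketch (the file layout and the subsampling_ attribute are hypothetical; per this PR, the estimated value is also written to the log, which is another way to collect it).

```python
from glob import glob

import nibabel as nib
from mapca import MovingAveragePCA  # assumes mapca's public class name

mask = nib.load("group_epi_mask.nii.gz")  # hypothetical shared mask

depths = []
for optcom_file in sorted(glob("optcom_runs/*.nii.gz")):  # hypothetical layout
    pca = MovingAveragePCA(criterion="mdl", normalize=True)
    pca.fit(nib.load(optcom_file), mask)
    depths.append(pca.subsampling_)  # hypothetical attribute for the estimated depth

print(set(depths))  # e.g. {2} when the estimate is consistent across runs
```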

I'm leaning more towards saying MA-PCA is just not working as intended. This PR might help isolate some problems, but it doesn't solve the underlying issues. I'm mixed on merging this PR and giving users this option. Thoughts from others?

@dowdlelt

dowdlelt commented Jun 7, 2023

I wanted to test it real quick before giving a thumbs up, but haven't had a chance. I definitely think it is the right thing to do - it's a sneaky parameter, and being able to ask people what it was when they report something on Neurostars is worth it (and to suggest setting it to 2).

@eurunuela
Collaborator

Hey @dowdlelt, have you had the chance to test it?

I lean towards merging this PR.

@dowdlelt dowdlelt left a comment

I've found some good data to play with, but in the meantime, looks good to me. Happy subsampling!

@dowdlelt

dowdlelt commented Aug 3, 2023

@eurunuela - I don't think you ever actually approved this; it came up in the meeting today. Assuming you approve, we should be good to go.

@eurunuela
Collaborator

Sorry @dowdlelt, I do approve.

Didn't know we had a meeting!

@handwerkerd handwerkerd requested a review from dowdlelt August 9, 2023 20:54

@dowdlelt dowdlelt left a comment

GitHub confuses me - maybe because I wasn't requested, my previous reviews didn't count? Do it live, I say!

(but really, should be good to go)

@handwerkerd handwerkerd merged commit 556d013 into ME-ICA:main Aug 10, 2023