From 14d6e645e5f9b868d8dc803abeff2bfa4a0db412 Mon Sep 17 00:00:00 2001
From: Michael Joseph
Date: Thu, 25 Feb 2021 13:14:34 -0500
Subject: [PATCH] update core episodes

---
 _episodes/02-anatomy-of-nifti.md  | 112 ++++++++++++++------
 _episodes/04-open-mri-datasets.md | 169 +++++++++++++++++++++++++++---
 2 files changed, 235 insertions(+), 46 deletions(-)

diff --git a/_episodes/02-anatomy-of-nifti.md b/_episodes/02-anatomy-of-nifti.md
index de6dc2d..a7613d1 100644
--- a/_episodes/02-anatomy-of-nifti.md
+++ b/_episodes/02-anatomy-of-nifti.md
@@ -3,35 +3,41 @@ title: "Anatomy of a NIfTI"
teaching: 20
exercises: 5
questions:
-- ""
+- "How are MRI data represented digitally?"
objectives:
- "Introduce Python data types"
-- Load a NIfTI file into Python and understand how data is stored
-- View and manipulate image data
+- "Load an MRI scan into Python and understand how the data is stored"
+- "Understand what a voxel is"
+- "Introduce MRI coordinate systems"
+- "View and manipulate image data"
keypoints:
- ""
---

-In the last lesson, we introduced the NIfTI. NIfTI is one of the most ubiquitous file formats for storing neuroimaging data. We'll cover a few details to get started working with them. If you're interested in learning more about NIfTI images, we highly recommend [this blog post](https://brainder.org/2012/09/23/the-nifti-file-format/) about the NIfTI format.
+In the last lesson, we introduced the NIfTI format. NIfTI is one of the most ubiquitous file formats for storing neuroimaging data.
+We'll cover a few details to get started working with NIfTI files.
+If you're interested in learning more about NIfTI images, we highly recommend [this blog post](https://brainder.org/2012/09/23/the-nifti-file-format/) about the NIfTI format.

## Reading NIfTI Images

-[NiBabel](https://nipy.org/nibabel/) is a Python package for reading and writing neuroimaging data. To learn more about how NiBabel handles NIfTIs, check out the [Working with NIfTI images](https://nipy.org/nibabel/nifti_images.html) page of the NiBabel documentation.
+[NiBabel](https://nipy.org/nibabel/) is a Python package for reading and writing neuroimaging data.
+To learn more about how NiBabel handles NIfTIs, check out the [Working with NIfTI images](https://nipy.org/nibabel/nifti_images.html) page of the NiBabel documentation, on which this episode is heavily based.

~~~
import nibabel as nib
~~~
{: .language-python}

-First, use the `load()` function to create a NiBabel image object from a NIfTI file. We'll load in an example T1w image from the zip file we just downloaded.
+First, use the `load()` function to create a NiBabel image object from a NIfTI file.
+We'll load in an example T1w image from the zip file we just downloaded.

~~~
t1_img = nib.load("../../../data/someones_anatomy.nii.gz")
~~~
{: .language-python}

-Loading in a NIfTI file with `NiBabel` gives us a special type of data object which encodes all the information in the file. Each bit of information is called an **attribute** in Python's terminology. To see all of these attributes, type `t1_img.` followed by Tab.
-
+Loading in a NIfTI file with `NiBabel` gives us a special type of data object which encodes all the information in the file. Each bit of information is called an **attribute** in Python's terminology.
+To see all of these attributes, type `t1_img.` followed by Tab.
There are three main attributes that we'll discuss today:

### 1. [Header](https://nipy.org/nibabel/nibabel_images.html#the-image-header): contains metadata about the image, such as image dimensions, data type, etc.
@@ -88,8 +94,12 @@ srow_z          : [  0.    0.    2.75 -91.  ]
intent_name     : b''
magic           : b'n+1'
~~~
+{: .output}

-`t1_hdr` is a Python **dictionary**. Dictionaries are containers that hold pairs of objects - keys and values. Let's take a look at all of the keys. Similar to `t1_img` in which attributes can be accessed by typing `t1_img.` followed by Tab, you can do the same with `t1_hdr`. In particular, we'll be using a **method** belonging to `t1_hdr` that will allow you to view the keys associated with it.
+`t1_hdr` is a Python **dictionary**.
+Dictionaries are containers that hold pairs of objects - keys and values. Let's take a look at all of the keys.
+Similar to `t1_img`, whose attributes can be accessed by typing `t1_img.` followed by Tab, you can do the same with `t1_hdr`.
+In particular, we'll be using a **method** belonging to `t1_hdr` that will allow you to view the keys associated with it.

~~~
t1_hdr.keys()
@@ -141,15 +151,18 @@ t1_hdr.keys()
 'intent_name',
 'magic']
~~~
+{: .output}

-Notice that **methods** require you to include () at the end of them whereas **attributes** do not. The key difference between a method and an attribute is:
+Notice that **methods** require you to include `()` at the end of them whereas **attributes** do not.
+The key difference between a method and an attribute is:

* Attributes are stored *values* kept within an object
* Methods are *processes* that we can run using the object. Usually a method takes attributes, performs an operation on them, then returns it for you to use.

When you type in `t1_img.` followed by Tab, you'll see that attributes are highlighted in orange and methods highlighted in blue.

-The output above is a list of **keys** you can use from `t1_hdr` to access **values**. We can access the value stored by a given key by typing:
+The output above is a list of **keys** you can use from `t1_hdr` to access **values**.
+We can access the value stored by a given key by typing:

~~~
t1_hdr['']
@@ -173,7 +186,9 @@ t1_hdr['']

### 2. Data

-As you've seen above, the header contains useful information that gives us information about the properties (metadata) associated with the MR data we've loaded in. Now we'll move in to loading the actual *image data itself*. We can achieve this by using the method called `t1_img.get_fdata()`.
+As you've seen above, the header contains useful information about the properties (metadata) of the MR data we've loaded in.
+Now we'll move on to loading the actual *image data itself*.
+We can achieve this by using the method called `t1_img.get_fdata()`.

~~~
t1_data = t1_img.get_fdata()
~~~
{: .language-python}

@@ -268,6 +283,7 @@ array([[[ 8.79311806,  8.79311806,  8.79311806, ...,  7.73574093,
        [ 8.79311806,  9.1455771 ,  9.1455771 , ...,  8.08819997,
          7.73574093,  7.73574093]]])
~~~
+{: .output}

What type of data is this exactly? We can determine this by calling the `type()` function on `t1_data`.

~~~
type(t1_data)
~~~
{: .language-python}

~~~
numpy.ndarray
~~~
+{: .output}

-The data is a multidimensional **array** representing the image data. In Python, an array is used to store lists of numerical data into something like a table.
+The data is a multidimensional **array** representing the image data.
+In Python, an array is used to store lists of numerical data in something like a table.
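+
+If arrays are new to you, a minimal sketch with a small toy array (made-up numbers, not our MRI data) shows the idea:
+
+~~~
+import numpy as np
+
+# nested lists of numbers become a 2 x 3 table of values
+toy_array = np.array([[1.0, 2.0, 3.0],
+                      [4.0, 5.0, 6.0]])
+print(toy_array)
+~~~
+{: .language-python}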
> ## Check out attributes of the array
-> How can we see the number of dimensions in the `t1_data` array? Once again, all of the attributes of the array can be seen by typing `t1_data.` followed by .
+> How can we see the number of dimensions in the `t1_data` array? Once again, all of the attributes of the array can be seen by typing `t1_data.` followed by Tab.
> ~~~
> t1_data.ndim
> ~~~
> {: .language-python}
@@ -310,12 +328,15 @@
> > ~~~
> > (57, 67, 56)
> > ~~~
+> > {: .output}
> {: .solution}
{: .challenge}

-The 3 numbers given here represent the number of values *along a respective dimension (x,y,z)*. This brain was scanned in 57 slices with a resolution of 67 x 56 voxels per slice. That means there are:
+The 3 numbers given here represent the number of values *along a respective dimension (x,y,z)*.
+This brain was scanned in 57 slices with a resolution of 67 x 56 voxels per slice.
+That means there are:

-$57 * 67 * 56 = 213864$
+`57 * 67 * 56 = 213864`

voxels in total!

~~~
t1_data.dtype
~~~
{: .language-python}

~~~
dtype('float64')
~~~
+{: .output}

This tells us that each element in the array (or voxel) is a floating-point number.
-The data type of an image controls the range of possible intensities. As the number of possible values increases, the amount of space the image takes up in memory also increases.
+The data type of an image controls the range of possible intensities.
+As the number of possible values increases, the amount of space the image takes up in memory also increases.

~~~
import numpy as np
print(np.min(t1_data))
print(np.max(t1_data))
~~~
{: .language-python}

~~~
6.678363800048828
96.55541983246803
~~~
+{: .output}

-For our data, the range of intensity values goes from 0 (black) to more positive digits (whiter).
+For our data, the intensity values range from 6.68 (darkest) to 96.56 (brightest).

-How do we examine what value a particular voxel is? We can inspect the value of a voxel by selecting an index as follows:
+How do we examine what value a particular voxel is?
+We can inspect the value of a voxel by selecting an index as follows:

`data[x,y,z]`

~~~
t1_data[9, 19, 2]
~~~
{: .language-python}

~~~
32.40787395834923
~~~
+{: .output}

-**NOTE**: Python uses **zero-based indexing**. The first item in the array is item 0. The second item is item 1, the third is item 2, etc.
+**NOTE**: Python uses **zero-based indexing**.
+The first item in the array is item 0.
+The second item is item 1, the third is item 2, etc.

-This yields a single value representing the intensity of the signal at a particular voxel! Next we'll see how to not just pull one voxel but a slice or an array of voxels for visualization and analysis!
+This yields a single value representing the intensity of the signal at a particular voxel!
+Next we'll see how to not just pull one voxel but a slice or an array of voxels for visualization and analysis!

## Working with Image Data

-Slicing does exactly what it seems to imply. Giving our 3D volume, we pull out a 2D slice of our data. Here's an example of slicing from left to right (sagittal slicing):
+Slicing does exactly what it seems to imply.
+Given our 3D volume, we pull out a 2D slice of our data.
+Here's an example of slicing from left to right (sagittal slicing):

![](../fig/T1w.gif)

This gif is a series of 2D images or **slices** moving from left to right.

Let's pull the 10th slice in the x axis.

~~~
-x_slice = t1_data[9:, :, :]
+x_slice = t1_data[9, :, :]
~~~
{: .language-python}

-This is similar to the indexing we did before to pull out a single voxel. However, instead of providing a value for each axis, the `:` indicates that we want to grab *all* values from that particular axis.
+This is similar to the indexing we did before to pull out a single voxel.
+However, instead of providing a value for each axis, the `:` indicates that we want to grab *all* values from that particular axis.

> ## Slicing MRI Data
> Now try selecting the 20th slice from the y axis.
@@ -407,7 +439,10 @@ We've been slicing and dicing brain images but we have no idea what they look li

## Visualizing

-We previously inspected the signal intensity of the voxel at coordinates (10,20,3). Let's see what out data looks like when we slice it at this location. We've already indexed the data at each x, y, and z axis. Let's use `matplotlib`.
+We previously inspected the signal intensity of the voxel at coordinates (10,20,3).
+Let's see what our data looks like when we slice it at this location.
+We've already indexed the data at each x, y, and z axis.
+Let's use `matplotlib`.

~~~
import matplotlib.pyplot as plt

Now, we're going to step away from discussing our data and talk about the final important attribute of a NIfTI.

### 3. [Affine](https://nipy.org/nibabel/coordinate_systems.html): tells the position of the image array data in a reference space

-The final important piece of metadata associated with an image file is the **affine matrix**. Below is the affine matrix for our data.
+The final important piece of metadata associated with an image file is the **affine matrix**.
+Below is the affine matrix for our data.

~~~
t1_affine = t1_img.affine
t1_affine
~~~
{: .language-python}

~~~
array([[  2.75,   0.  ,   0.  , -78.  ],
       [  0.  ,   0.  ,   2.75, -91.  ],
       [  0.  ,   0.  ,   0.  ,   1.  ]])
~~~
+{: .output}

To explain this concept, recall that we referred to coordinates in our data as (x,y,z) coordinates such that:

* x is the first dimension of `t1_data`
* y is the second dimension of `t1_data`
* z is the third dimension of `t1_data`

-Although this tells us how to access our data in terms of voxels in a 3D volume, it doesn't tell us much about the actual dimensions in our data (centimetres, right or left, up or down, back or front). The affine matrix allows us to translate between *voxel coordinates* in (x,y,z) and *world space coordinates* in (left/right,bottom/top,back/front). An important thing to note is that in reality in which order you have:
+Although this tells us how to access our data in terms of voxels in a 3D volume, it doesn't tell us much about the actual dimensions in our data (centimetres, right or left, up or down, back or front).
+The affine matrix allows us to translate between *voxel coordinates* in (x,y,z) and *world space coordinates* in (left/right,bottom/top,back/front).
+An important thing to note is that, in reality, the order in which you have:

* left/right
* bottom/top
* back/front

The concept of an affine matrix may seem confusing at first but an example might help.

-Suppose we have two voxels located at the the following coordinates: $(64,100,2)$
+Suppose we have two voxels: voxel 1, whose world coordinates we already know to be `(90.23,0.2,2.15)`, and voxel 2, located at the following voxel coordinates:
+`(64,100,2)`

-And we wanted to know what the distances between these two voxels are in terms of real world distances (millimetres). This information cannot be derived from using voxel coordinates so we turn to the **affine matrix**.
+And we want to know the distance between these two voxels in terms of real-world distances (millimetres).
+This information cannot be derived from using voxel coordinates so we turn to the **affine matrix**.
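+
+As a quick sketch of the mechanics, NiBabel's `apply_affine` helper maps voxel coordinates through an affine (note the example numbers in the prose assume their own affine, so the values printed here from our `t1_affine` will differ):
+
+~~~
+from nibabel.affines import apply_affine
+
+# map voxel coordinates (i, j, k) into world coordinates (x, y, z)
+# using the affine we extracted from our image earlier
+apply_affine(t1_affine, [64, 100, 2])
+~~~
+{: .language-python}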
-Now, the affine matrix we'll be using happens to be encoded in **RAS**. That means once we apply the matrix our coordinates are as follows:
-$$(\text{Right},\text{Anterior},\text{Superior})$$
+Now, the affine matrix we'll be using happens to be encoded in **RAS**.
+That means once we apply the matrix our coordinates are as follows:
+`(Right, Anterior, Superior)`

So increasing a coordinate value in the first dimension corresponds to moving to the right of the person being scanned.

Applying our affine matrix yields the following coordinates:
+
+`(10.25,30.5,9.2)`

This means that:

-* Voxel 1 is $90.23-10.25= 79.98$ in the R axis. Positive values mean move right
-* Voxel 1 is $0.2-30.5= -30.3$ in the A axis. Negative values mean move posterior
-* Voxel 1 is $2.15-9.2= -7.05$ in the S axis. Negatve values mean move inferior
+* Voxel 1 is `90.23-10.25 = 79.98` mm from voxel 2 in the R axis. Positive values mean move right
+* Voxel 1 is `0.2-30.5 = -30.3` mm from voxel 2 in the A axis. Negative values mean move posterior
+* Voxel 1 is `2.15-9.2 = -7.05` mm from voxel 2 in the S axis. Negative values mean move inferior

-This covers the basics of how NIfTI data and metadata are stored and organized in the context of Python. In the next segment we'll talk a bit about an increasingly important component of MR data analysis - data organization. This is a key component to reproducible analysis and so we'll spend a bit of time here.
+This covers the basics of how NIfTI data and metadata are stored and organized in the context of Python.
+In the next segment we'll talk a bit about an increasingly important component of MR data analysis - data organization.
+This is a key component of reproducible analysis, so we'll spend a bit of time on it.

{% include links.md %}
diff --git a/_episodes/04-open-mri-datasets.md b/_episodes/04-open-mri-datasets.md
index 0bf53ba..70d362f 100644
--- a/_episodes/04-open-mri-datasets.md
+++ b/_episodes/04-open-mri-datasets.md
@@ -1,25 +1,170 @@
---
-title: "Open MRI Datasets"
-teaching: 20
-exercises: 5
+title: "Exploring Open MRI Datasets"
+teaching: 30
+exercises: 15
questions:
-- ""
+- "How does standardizing neuroimaging data ease the data exploration process?"
objectives:
-- ""
+- "Understand the BIDS format"
+- "Use PyBIDS to easily explore a BIDS dataset"
keypoints:
-- ""
+- "BIDS is an organizational principle for neuroimaging data"
+- "PyBIDS is a Python-based tool that allows for easy exploration of BIDS-formatted neuroimaging data"
---

-## OpenNeuro
+## Tutorial Dataset

-## Downloading Data
+In this episode, we will be using a subset of a publicly available dataset, **ds000030**, from [openneuro.org](https://openneuro.org/datasets/ds000030). The dataset is structured according to the Brain Imaging Data Structure (BIDS). BIDS is a simple and intuitive way to organize and describe your neuroimaging and behavioural data. Neuroimaging experiments result in complicated data that can be arranged in several different ways. BIDS tackles this problem by suggesting a new standard (based on consensus from multiple researchers across the world) for the arrangement of neuroimaging datasets.

-### Datalad
+Using the same structure for all of your studies will allow you to easily reuse all of your scripts between studies. Additionally, sharing code with other researchers will be much easier.

-### Amazon Web Services (AWS)
+Let's take a look at the `participants.tsv` file to see what the demographics for this dataset look like.
-## Querying a BIDS Dataset +~~~ +import pandas as pd -## Exploring Data +participant_metadata = pd.read_csv('../data/ds000030/participants.tsv', sep='\t') +participant_metadata +~~~ +{: .language-python} + +**OUTPUT:** +
+|   | participant_id | diagnosis | age | gender | bart | bht | dwi | pamenc | pamret | rest | scap | stopsignal | T1w | taskswitch | ScannerSerialNumber | ghost_NoGhost |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| 0 | sub-10159 | CONTROL | 30 | F | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
+| 1 | sub-10171 | CONTROL | 24 | M | 1.0 | 1.0 | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
+| 2 | sub-10189 | CONTROL | 49 | M | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
+| 3 | sub-10193 | CONTROL | 40 | M | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 35343.0 | No_ghost |
+| 4 | sub-10206 | CONTROL | 21 | M | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
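+
+For a quick tally of how many participants fall into each diagnostic group, something like pandas' `value_counts()` method also works:
+
+~~~
+# count the number of participants per diagnosis group
+participant_metadata['diagnosis'].value_counts()
+~~~
+{: .language-python}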
+
+From this table we can easily view unique diagnostic groups:
+
+~~~
+participant_metadata['diagnosis'].unique()
+~~~
+{: .language-python}
+
+~~~
+array(['CONTROL', 'SCHZ', 'BIPOLAR', 'ADHD'], dtype=object)
+~~~
+{: .output}
+
+For this tutorial, we're just going to work with participants that are either CONTROL or SCHZ (`diagnosis`) and have both a T1w (`T1w == 1`) and rest (`rest == 1`) scan. Also, you'll notice some of the T1w scans included in this dataset have a ghosting artifact. We'll need to filter these out as well (`ghost_NoGhost == 'No_ghost'`).
+
+We'll filter this data out like so:
+
+~~~
+participant_metadata = participant_metadata[(participant_metadata.diagnosis.isin(['CONTROL', 'SCHZ'])) &
+                                            (participant_metadata.T1w == 1) &
+                                            (participant_metadata.rest == 1) &
+                                            (participant_metadata.ghost_NoGhost == 'No_ghost')]
+participant_metadata['diagnosis'].unique()
+~~~
+{: .language-python}
+
+~~~
+array(['CONTROL', 'SCHZ'], dtype=object)
+~~~
+{: .output}
+
+We now have a list of participants who are of interest in our analysis. We can then use this list to download participant data with tools such as `aws` or `datalad`. In fact, this is exactly how we set up the list of participants to download for this workshop! Since we've already downloaded the dataset, we can now explore its structure using PyBIDS:
+
+~~~
+import bids.layout
+layout = bids.layout.BIDSLayout('../data/ds000030')
+~~~
+{: .language-python}
+
+The PyBIDS layout object lets you query your BIDS dataset according to a number of parameters by using a `get_*()` method.
+We can get a list of the subjects we've downloaded from the dataset.
+
+~~~
+layout.get_subjects()
+~~~
+{: .language-python}
+
+~~~
+['10171',
+ '10292',
+ '10365',
+ '10438',
+ '10565',
+ '10788',
+ '11106',
+ '11108',
+ '11122',
+ ...
+ '50083']
+~~~
+{: .output}
+
+We can also pull a list of imaging modalities in the dataset:
+
+~~~
+layout.get_modalities()
+~~~
+{: .language-python}
+
+~~~
+['anat', 'func']
+~~~
+{: .output}
+
+As well as tasks and more:
+
+~~~
+# Task fMRI
+print(layout.get_tasks())
+
+# Data types (bold, brainmask, confounds, smoothwm, probtissue, warp...)
+print(layout.get_types())
+~~~
+{: .language-python}
+
+~~~
+['rest']
+
+['bold',
+ 'brainmask',
+ 'confounds',
+ 'description',
+ 'dtissue',
+ 'fsaverage5',
+ 'inflated',
+ 'midthickness',
+ 'participants',
+ 'pial',
+ 'preproc',
+ 'probtissue',
+ 'smoothwm',
+ 'warp']
+~~~
+{: .output}
+
+In addition, we can specify sub-types of each BIDS category:
+
+~~~
+layout.get_types(modality='func')
+~~~
+{: .language-python}
+
+~~~
+['brainmask', 'confounds', 'fsaverage5', 'preproc']
+~~~
+{: .output}
+
+We can use this functionality to pull all our fMRI NIfTI files:
+
+~~~
+layout.get(task='rest', type='bold', extensions='nii.gz', return_type='file')
+~~~
+{: .language-python}
+
+~~~
+TO FILL
+~~~
+{: .output}
+
+Finally, we can convert the data stored in `bids.layout` into a `pandas.DataFrame`:
+
+~~~
+df = layout.as_data_frame()
+df.head()
+~~~
+{: .language-python}
+
+**OUTPUT:**
+|   | path | modality | subject | task | type |
+|---|------|----------|---------|------|------|
+| 0 | ~/Desktop/dc-mri/data... | NaN | NaN | rest | bold |
+| 1 | ~/Desktop/dc-mri/data... | NaN | NaN | NaN | participants |
+| 2 | ~/Desktop/dc-mri/data... | NaN | NaN | NaN | NaN |
+| 3 | ~/Desktop/dc-mri/data... | anat | 10565 | NaN | brainmask |
+| 4 | ~/Desktop/dc-mri/data... | anat | 10565 | NaN | probtissue |
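+
+Since this is a regular `pandas.DataFrame`, the usual filtering tools apply. As a rough sketch, something like the following would keep only the rows describing bold images:
+
+~~~
+# keep only rows whose BIDS type is 'bold'
+df[df['type'] == 'bold'].head()
+~~~
+{: .language-python}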
{% include links.md %}