# Comparing two populations

One last thing we might ask is what makes two populations distinct.
This could be two conditions or even two clusters. This is possible
with the `FindMarkers` function which works almost like `FindAllMarkers`
since it calls the former. Many of the parameters are equivalent,
however `FindMarkers` allows you to perform a differential expression
analysis between two populations which are defined with the `ident.1`
and `ident.2` parameters. By default, these are cell identities present
in the `active.ident` but we can select another variable contained in
the metadata using the `group.by` parameter. So if we want to study the
impact of gender in the cells of the platelet then we would run:

`FindMarkers(objectName, ident.1 = "female", ident.2 = "male", group.by = "sex", subset.ident = "Platelet")`

(if we had a "sex" column in the metadata slot). If we want to study
the impact of sex in all cells then we leave the `subset.ident` parameter
as default (i.e. `NULL`).

Here we will test the difference between our NK and CD8+ T clusters which
were very difficult to differentiate. The results are similar to
`FindAllMarkers` with the difference that there is no `gene` column
because as a differential analysis it will not be possible to have a
gene overexpressed in both NK and CD8+ T cells.

``` r
NK_CD8_diff_markers <- FindMarkers(pbmc_small,
                           ident.1 = "NK",
                           ident.2 = "CD8+ T")

## Merge markers results with biomart annotation
NK_CD8_diff_markers_annotated <- merge(x = NK_CD8_diff_markers,  #First df to merge
                                       y = annotated_hg19,       #Second df to merge
                                       by.x = 0,                 #Column name of first df used for matching lines, 0 for rownames
                                       by.y = "ensembl_gene_id", #Column name of second df used for matching lines
                                       all.x = TRUE)             #Keep all lines from first df even if there is no match with second df

## Filter dataset based on Fold change and p-value adjusted
NK_CD8_diff_markers_annotated_signif <- subset(NK_CD8_diff_markers_annotated,
                                               p_val_adj < 0.05 &
                                                 abs(avg_log2FC) >= 0.25)       #Filter dataframe based on p_val_adj column

## Sorting results by average log2(Fold Change)
NK_CD8_diff_markers_annotated_signif <- NK_CD8_diff_markers_annotated_signif %>%                 #Rearrange df with dplyr package
  arrange(desc(avg_log2FC))                  #Sort lines by descending the column avg_log2FC and by group

## Most DE gene marker for each cluster
kable(NK_CD8_diff_markers_annotated_signif[(c(1:3, (nrow(NK_CD8_diff_markers_annotated_signif)-2):nrow(NK_CD8_diff_markers_annotated_signif))),])
```

|     | Row.names       | p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj | external_gene_name | description                                                                             | gene_biotype   | chromosome_name |
|:--|:-----|--:|----:|--:|--:|----:|:------|:---------------------------|:-----|:-----|
| 1   | ENSG00000163453 |     0 |   7.673621 | 0.551 | 0.006 |  0.00e+00 | IGFBP7             | insulin-like growth factor binding protein 7 \[Source:HGNC Symbol;Acc:5476\]            | protein_coding | 4               |
| 2   | ENSG00000135077 |     0 |   5.656534 | 0.314 | 0.009 |  0.00e+00 | HAVCR2             | hepatitis A virus cellular receptor 2 \[Source:HGNC Symbol;Acc:18437\]                  | protein_coding | 5               |
| 3   | ENSG00000175294 |     0 |   5.513915 | 0.109 | 0.000 |  2.03e-05 | CATSPER1           | cation channel, sperm associated 1 \[Source:HGNC Symbol;Acc:17116\]                     | protein_coding | 11              |
| 224 | ENSG00000167286 |     0 |  -3.971611 | 0.083 | 0.885 |  0.00e+00 | CD3D               | CD3d molecule, delta (CD3-TCR complex) \[Source:HGNC Symbol;Acc:1673\]                  | protein_coding | 11              |
| 225 | ENSG00000137078 |     0 |  -4.379914 | 0.006 | 0.232 |  1.01e-05 | SIT1               | signaling threshold regulating transmembrane adaptor 1 \[Source:HGNC Symbol;Acc:17710\] | protein_coding | 9               |
| 226 | ENSG00000172116 |     0 |  -5.098008 | 0.019 | 0.368 |  0.00e+00 | CD8B               | CD8b molecule \[Source:HGNC Symbol;Acc:1707\]                                           | protein_coding | 2               |

Here are the results for the three most over-expressed genes in NK cells
(`avg_log2FC` positive) and the 3 most over-expressed genes in CD8+ T
cells (`avg_log2FC` negative).

You can totally use the different methods of gene cluster analysis
on this one.