This repository contains code used to create figures for this paper
All RNAseq and single cell data is available using the GEO accession GSE184398. Some data that is not available on GEO or UCSF Data Library has been added to this folder files_used_for_plots
-
Figure 1C
get_flow_score.py
input files are infiles_used_for_plots
-
Figure 1D
correlation_score_from_flow.py
Input files should have this format (example file for Tcell flow vs Score correlation `files_used_for_plots/tcell_flow_and_Score.tsv`).
|flow_Tcell |Score
IPIBLAD032.T1.rna.live | 31.3 | 46.98
Input files can be made from calculating scores with get_score.py and joining the corresponding flow data
`python correlation_score_from_flow.py ../files_used_for_plots/tcell_flow_and_Score.tsv`
-
Figure 1E
cross_whisker_plots_flow.py
input files needed
1. feature scores tsv file output of `get_score.py` (example file in `/files_used_for_plots/tcell_score_percentile_of_percentiles.tsv` 2. flow scores tsv file output of `get_flow_score.py` (example file in `/files_used_for_plots/Flow_score_Sept10_2020_Tcell_percentile_of_percent.tsv`
-
Figures 2A-C, 2E-F, 2J, 2L
3_feature_UMAP_IPI.py
input file needed
1. feature matrix tsv file output (example file in `/files_used_for_plots/feature_matrix.txt for IPI. 2. The feature scores for TCGA using not included but can be generated using TCGA data and `get_score.py``
-
Figures 2H-I, 2K boxplots made using
seaborn.boxplot
with data in/files_used_for_plots/feature_matrix.txt
-
Figure 2M heatmap was made using
seaborn.clustermap
with medians calculated from TPMs in/files_used_for_plots/feature_matrix.txt
-
Figure 2N barplot and bubble plots were made using
seaborn.barplot
andseaborn.scatter
respectively with medians calculated from TPMs in ``/files_used_for_plots/feature_matrix.txt` for IPI the chemokine TPMs for TCGA are not included
-
Figures 3A
3_feature_UMAP_IPI.py
input file needed
1. feature matrix tsv file output (example file in `/files_used_for_plots/feature_matrix.txt for IPI.
-
Figures 3B-C
TCGA_survival_3_features.py
input files needed
Can be found here XENA Browser
1. A folder with TCGA clinical data files for each of these indications ('HNSC','KIRC','SKCM','BLCA','SARC','OV','UCEC','UCS','COAD','LIHC','LUAD','PAAD','GBM') TCGA |_ BLCA | |_ BLCA_clinical_data.tsv | |_COAD | |_ COAD_clinical_data.tsv . . . 2. TCGA archetypes file `/files_used_for_plots/TCGA_archetypes.tsv`
-
Figures 3D-E boxplots made using
seaborn.boxplot
with data in/files_used_for_plots/feature_matrix.txt
-
Figure 4A
get_flow_score.py
-
Figure 4B heatmap was made using
seaborn.clustermap
-
Figures 4C
6_feature_UMAP_IPI.py
input file needed
1. feature matrix tsv file output (example file in `/files_used_for_plots/feature_matrix.txt for IPI.
-
Figures 4D,4G boxplots made using
seaborn.boxplot
with data in/files_used_for_plots/feature_matrix.txt
-
Figures 4E the alluvial plot was made using RAWgraphs 2.0
-
Figure 4F heatmap was made using
seaborn.clustermap
with medians calculated from TPMs in/files_used_for_plots/feature_matrix.txt
-
Figure 5A
get_flow_score.py
-
Figures 5A, 5O, 5R
10_feature_UMAP_IPI.py
input file needed
1. feature matrix tsv file output (example file in `/files_used_for_plots/feature_matrix.txt for IPI.
-
Figures 5B the alluvial plot was made using RAWgraphs 2.0
-
Figures 5D-N boxplots and scatter plots were made using
seaborn.boxplot
andseaborn.scatter
respectively. -
Figures 5P-Q, 5S-T boxplots made using
seaborn.boxplot
with data in/files_used_for_plots/feature_matrix.txt
- Figures 6A-D heatmap and bubble plots were made using
seaborn.clustermap
andseaborn.scatter
respectively with medians of chemokines calculated from TPMs in ``/files_used_for_plots/feature_matrix.txt`. TPMS for other genes can be found on using the GEO accession GSE184398
-
Figures 7A-B
10_feature_UMAP_IPI.py
input file needed
1. feature matrix tsv file output (example file in `/files_used_for_plots/feature_matrix.txt for IPI.
-
Figures 7B-E boxplot, heatmaps and bubble plot were made using
seaborn.boxplot
seaborn.clustermap
andseaborn.scatter
respectively.TPMS for genes can be found on using the GEO accession GSE184398 -
Figures 7F
TCGA_survival_10_features.py
input files needed
Can be found here XENA Browser
1. A folder with TCGA clinical data files for each of these indications ('HNSC','KIRC','SKCM','BLCA','SARC','OV','UCEC','UCS','COAD','LIHC','LUAD','PAAD','GBM') TCGA |_ BLCA | |_ BLCA_clinical_data.tsv | |_COAD | |_ COAD_clinical_data.tsv . . . 2. TCGA archetypes file `/files_used_for_plots/TCGA_archetypes_assigned.tsv`
##Supplementary Figures
-
Figure 1SA
Kmeans clustering using
sklearn.cluster.KMeans
of and dataframe of TPMS of all compartments. These can be found on using the GEO accession GSE184398 -
Figure 1SB
makes a volcano plot from a standard Limma DGE output e.g. logFC | AveExpr | t | P.Value | adj.P.Val | B PLK1 7.374675473 | 1.415522735 | 6.886987802 | 3.92E-06 | 0.050802876 | 3.971754014 ZWINT 5.772193613 | 1.340166868 | 5.872941797 | 2.48E-05 | 0.065626979 | 2.450364751 python volcano_plots.py -h Usage: volcano_plots.py -i <input file name> [options] Make Volcano Plots Options: -h, --help show this help message and exit -i <input file> Path and name of input file -o <output file> Path and names of output files e.g DGE_treg_vs_tcell (a .png,.pdf & .svg file extension will automatically be added) -p <p-value> P-value cutoff DEFAULT 0.005 -t <title> Title in quotes e.g. "Tregs vs Tcells"
-
Figure 1SC
correlation_score_from_flow.py
see figure 1D -
Figure 1SD
get_score.py
python get_score.py -hUsage: get_score.py -o <output file name> -g <gene signature file name> -f <gene expresson tsv> [options] ALL OPTIONS ARE REQUIRED Get gene signature score Options: -h, --help show this help message and exit -t <title> title used for all output files -g <gene sig> path to a file with a list of genes (HUGO Names) in gene signature. -f <gene file> path to normalized gene expression file (TPM or logCPM) with samples in the columns and genes in the rows
-
Figure 1SE
cross_whisker_plots_flow.py
see figure 1E
-
Figure 2SA & 2SB
DBI_frequency_of_clusters.py
input file needed
1. feature matrix tsv file output (example file in `/files_used_for_plots/feature_matrix.txt for IPI.
-
Figures 2SD
3_feature_UMAP_IPI.py
input file needed
1. feature matrix tsv file output (example file in `/files_used_for_plots/feature_matrix.txt for IPI.
-
Figures 2SE-F boxplots and scatter plots were made using
seaborn.boxplot
andseaborn.scatter
respectively.
-
Figures 3SA
3_feature_UMAP_IPI.py
-
Figures 3SB-D
TCGA_survival_3_features.py
input files needed
Can be found here XENA Browser
1. A folder with TCGA clinical data files for each of these indications ('HNSC','KIRC','SKCM','BLCA','SARC','OV','UCEC','UCS','COAD','LIHC','LUAD','PAAD','GBM') TCGA |_ BLCA | |_ BLCA_clinical_data.tsv | |_COAD | |_ COAD_clinical_data.tsv . . . 2. TCGA archetypes file `/files_used_for_plots/TCGA_archetypes.tsv`
-
Figure 4SB
makes a volcano plot from a standard Limma DGE output e.g. logFC | AveExpr | t | P.Value | adj.P.Val | B PLK1 7.374675473 | 1.415522735 | 6.886987802 | 3.92E-06 | 0.050802876 | 3.971754014 ZWINT 5.772193613 | 1.340166868 | 5.872941797 | 2.48E-05 | 0.065626979 | 2.450364751 python volcano_plots.py -h Usage: volcano_plots.py -i <input file name> [options] Make Volcano Plots Options: -h, --help show this help message and exit -i <input file> Path and name of input file -o <output file> Path and names of output files e.g DGE_treg_vs_tcell (a .png,.pdf & .svg file extension will automatically be added) -p <p-value> P-value cutoff DEFAULT 0.005 -t <title> Title in quotes e.g. "Tregs vs Tcells"
-
Figure 4SC, 4Sk
correlation_score_from_flow.py
see figure 1D -
Figures 4SD, 4SF-G, 4SL, 4SP
6_feature_UMAP_IPI.py
input file needed
1. feature matrix tsv file output (example file in `/files_used_for_plots/feature_matrix.txt
-
Figure 4SE
DBI_frequency_of_clusters.py
input file needed
1. feature matrix tsv file output (example file in `/files_used_for_plots/feature_matrix.txt.
-
Figures 4SG-H, 4SI, 4SN-N boxplots and scatter plots were made using
seaborn.boxplot
andseaborn.scatter
respectively.
-
Figure 5SF
correlation_score_from_flow.py
see figure 1D -
Figure 5SG
get_flow_score.py
see figure 1C -
Figure 4SE
DBI_frequency_of_clusters.py
input file needed
1. feature matrix tsv file output (example file in `/files_used_for_plots/feature_matrix.txt.
-
Figures 5SI-J
10_feature_UMAP_IPI.py
input file needed
1. feature matrix tsv file output (example file in `/files_used_for_plots/feature_matrix.txt
-
Figures 6SA, 6SE-F
10_feature_UMAP_IPI.py
input file needed
1. feature matrix tsv file output (example file in `/files_used_for_plots/feature_matrix.txt
-
Figures 6SC-E, 6SG boxplots made using
seaborn.boxplot
with data in/files_used_for_plots/feature_matrix.txt
-
Figures 7SA-K
TCGA_survival_10_features.py
input files needed
Can be found here XENA Browser
1. A folder with TCGA clinical data files for each of these indications ('HNSC','KIRC','SKCM','BLCA','SARC','OV','UCEC','UCS','COAD','LIHC','LUAD','PAAD','GBM') TCGA |_ BLCA | |_ BLCA_clinical_data.tsv | |_COAD | |_ COAD_clinical_data.tsv . . . 2. TCGA archetypes file `/files_used_for_plots/TCGA_archetypes_assigned.tsv`
Bushra Samad (email: Bushra[dot]Samad[at]ucsf[dot]edu) Alexis Combes (email: Alexis[dot]Combes[at]ucsf[dot]edu) Matthew Krummel (email: Matthew[dot]Krummel[at]ucsf[dot]edu)