custom gene set? #2

pvtodorov · 2021-09-29T11:20:37Z

Hi!

I'm trying to figure out how to generate a custom geneset file using MAGMA that could plug into scDRS. I found the following text within the manuscript

We use MAGMA20 502 to compute gene-level association p-values from disease
503 GWAS summary statistics (Box 1, step 1). We use a reference panel based on individuals of European ancestry in the 1000 Genomes Project97 504 . We use a 10-kb window around the gene body to map SNPs to genes. We select the top 1,000 genes based 505 on MAGMA p-values as putative disease genes.

Could you provide an example of the MAGMA inputs and commands that performs this?

KangchengHou · 2021-09-29T16:14:41Z

Thanks for your interest!

I attach a guide to compute MAGMA gene sets from GWAS summary statistics:

# Step 1: download MAGMA software, gene location file, and reference data from
# https://ctg.cncr.nl/software/magma after this step, one should have a folder <MAGMA_DIR>
# with the following files:
# 1) <MAGMA_DIR>/magma 2) <MAGMA_DIR>/g1000_eur.(bed|bim|fam) 3) <MAGMA_DIR>/NCBI37.3.gene.loc

magma_dir=<MAGMA_DIR>

# Step 2: make gene annotation file for MAGMA using the following command, this only needs to be done
# once for different GWAS summary statistics, and the file will be saved to out/step1.genes.annot
mkdir -p out/step1
${magma_dir}/magma \
    --annotate window=10,10 \
    --snp-loc ${magma_dir}/g1000_eur.bim \
    --gene-loc ${magma_dir}/NCBI37.3.gene.loc \
    --out out/step1

# Step 3: run MAGMA using the following command, this takes a GWAS file ${trait}.pval,
# which at least has the columns: SNP, P, N, which corresponds to the SNP id
# (matched to the ${magma_dir}/g1000_eur.bim), p-value, sample size. For example,
# <trait>.pval file looks like
#
# CHR     BP      SNP             P           N
# 1       717587  rs144155419     0.453345    279949
# 1       740284  rs61770167      0.921906    282079
# 1       769223  rs60320384      0.059349    281744
#
# After this step, one should obtain a file out/step2/${trait}.gene.out, and the top genes with
# largest Z-scores can be input to scDRS.

mkdir -p out/step2
${magma_dir}/magma \
    --bfile ${magma_dir}/g1000_eur \
    --pval ${trait}.pval use='SNP,P' ncol='N' \
    --gene-annot out/step1.genes.annot \
    --out out/step2/${trait}

We will update our document correspondingly.

pvtodorov · 2021-09-29T23:51:56Z

Wow! Awesome & thank you for quick response 🚀

jaredslosberg · 2022-01-24T20:00:00Z

Hi Kangcheng. Do you have any documentation on your .sumstats files? Specifically, are the CHISQ and Z columns from the GWAS, or an output of a step within scDRS?

Here is the first few lines of PASS_IBD_deLange2017.sumstats, for example:

SNP A1 A2 N CHISQ Z
rs7899632 A G 59957 3.4299 -1.852
rs3750595 A C 59957 3.3124 1.82
rs10786405 T C 59957 3.4559 1.859
rs3793692 A G 59957 0.119 -0.345
rs737657 A G 59957 0.0912 -0.302
rs17524355 T C 59957 4.02 -2.005
rs7086391 T C 59957 3.7249 -1.93

martinjzhang · 2022-01-24T20:02:46Z

Hi @jaredslosberg

Those are from GWAS and follow the LDSC format https://github.com/bulik/ldsc/wiki/Summary-Statistics-File-Format

Let us know if you have further questions/suggestions.

Best,
Martin

twoneu · 2022-07-27T04:40:04Z

Hi @martinjzhang! Thank you for this awesome package.

Do you have a recommended method for converting the Entrez IDs produced by MAGMA to gene symbols? It seems that the compute-score tool does not recognize the Entrez IDs (First 3 elements for 'TRAIT': [], []) and returns a segmentation fault.

This is what my geneset looks like after following the tutorial above:

KangchengHou · 2022-07-27T04:51:59Z

@twoneu MAGMA's NCBI37.3.gene.loc (e.g., see https://ctg.cncr.nl/software/MAGMA/aux_files/NCBI37.3.zip) has 1st column as entrez IDs and last column as gene symbol.

Also adding that the gene symbols in the .gs file should be consistent with var_names in AnnData.

twoneu · 2022-07-30T13:51:36Z

Hi Kangcheng. Do you have any documentation on your .sumstats files? Specifically, are the CHISQ and Z columns from the GWAS, or an output of a step within scDRS?

Here is the first few lines of PASS_IBD_deLange2017.sumstats, for example:

SNP A1 A2 N CHISQ Z rs7899632 A G 59957 3.4299 -1.852 rs3750595 A C 59957 3.3124 1.82 rs10786405 T C 59957 3.4559 1.859 rs3793692 A G 59957 0.119 -0.345 rs737657 A G 59957 0.0912 -0.302 rs17524355 T C 59957 4.02 -2.005 rs7086391 T C 59957 3.7249 -1.93

Sorry, another question - how would you recommend that we generate genesets using MAGMA for summary statistics like these with no P-value column?

Thank you so much for your time!

KangchengHou · 2022-07-30T16:37:33Z

@twoneu are there other columns containing information needed to derive a p-value? We can help if you provide us information about columns you have. Also see MungeSumstats which help standardize sumstats.

twoneu · 2022-07-30T17:33:41Z

The only columns are SNP, A1, A2, N, Z, and CHISQ (see PASS_IBD_deLange2017.sumstats as an example). Mungesumstats seems to require a P value to run as well.

KangchengHou · 2022-07-31T00:17:23Z

@twoneu You can use the following function to convert z-score to p-value.

import scipy.stats
import numpy as np
# zscore can be a numpy array
pval = 2 * (1 - scipy.stats.norm.cdf(np.abs(zscore)))

You need to manually add a column to the sumstats file.

UPDATE NOTE: My previous reply pval = scdrs.util.zsc2pval(zsc) is for computing one-sided p-value. But in GWAS we typically do two-sided p-value (as in the updated reply).

KangchengHou · 2022-07-31T18:42:23Z

@twoneu please see the updated reply above (i made a mistake earlier: we should compute two-sided p-value instead one-sided p-value to convert GWAS z-score to p-value)

pariaaliour · 2024-02-06T17:16:03Z

Thanks for your interest!

I attach a guide to compute MAGMA gene sets from GWAS summary statistics:

# Step 1: download MAGMA software, gene location file, and reference data from
# https://ctg.cncr.nl/software/magma after this step, one should have a folder <MAGMA_DIR>
# with the following files:
# 1) <MAGMA_DIR>/magma 2) <MAGMA_DIR>/g1000_eur.(bed|bim|fam) 3) <MAGMA_DIR>/NCBI37.3.gene.loc

magma_dir=<MAGMA_DIR>

# Step 2: make gene annotation file for MAGMA using the following command, this only needs to be done
# once for different GWAS summary statistics, and the file will be saved to out/step1.genes.annot
mkdir -p out/step1
${magma_dir}/magma \
    --annotate window=10,10 \
    --snp-loc ${magma_dir}/g1000_eur.bim \
    --gene-loc ${magma_dir}/NCBI37.3.gene.loc \
    --out out/step1

# Step 3: run MAGMA using the following command, this takes a GWAS file ${trait}.pval,
# which at least has the columns: SNP, P, N, which corresponds to the SNP id
# (matched to the ${magma_dir}/g1000_eur.bim), p-value, sample size. For example,
# <trait>.pval file looks like
#
# CHR     BP      SNP             P           N
# 1       717587  rs144155419     0.453345    279949
# 1       740284  rs61770167      0.921906    282079
# 1       769223  rs60320384      0.059349    281744
#
# After this step, one should obtain a file out/step2/${trait}.gene.out, and the top genes with
# largest Z-scores can be input to scDRS.

mkdir -p out/step2
${magma_dir}/magma \
    --bfile ${magma_dir}/g1000_eur \
    --pval ${trait}.pval use='SNP,P' ncol='N' \
    --gene-annot out/step1.genes.annot \
    --out out/step2/${trait}

We will update our document correspondingly.

Thank you so much for this post!
I got a question and appreciate it if you could help me with this; I'm wondering in the second script as I have three files (bim, fam, bed) separately how do I put it here --bfile ${magma_dir}/g1000_eur. I tried to combined them in one file using cat but I keep getting this error: ERROR - parsing arguments: file '/home/paria/scratch/als-project/analysis/ocu_med_sc/rpca/scDRS/MAGMA/g1000_eur.bed' specified in --bfile does not exist or is not a file.
I would like to know how to make the g1000_eur including all three .bed, .bim, .fam.
Many thanks,
Paria

KangchengHou · 2024-02-06T18:03:34Z

Hi,

You should keep the three files as is: g1000_eur.bed / g1000_eur.bim / g1000_eur.fam

--bfile corresponds to a prefix to the three files

pariaaliour · 2024-02-06T21:55:38Z

Hi,

You should keep the three files as is: g1000_eur.bed / g1000_eur.bim / g1000_eur.fam

--bfile corresponds to a prefix to the three files

Perfect, thanks for the swift response! I have one more questions if you don't mind.
First, based on step 2, the --snp-loc is ${magma_dir}/g1000_eur.bim \ but in MAGMA tutorial (https://ctg.cncr.nl/software/MAGMA/doc/manual_v1.10.pdf) it says the --snp-loc could be provided by user. I was wondering if I have a gwas should the --snp-loc in the step2 --bfile in step 3 be the gwas?
Thanks,
Paria

KangchengHou · 2024-02-07T06:17:00Z

Hi,

--snp-loc should be --snp-loc ${magma_dir}/g1000_eur.bim

GWAS should be --pval in step 3

pariaaliour · 2024-02-07T19:44:14Z

Perfect, thank you!

KangchengHou closed this as completed Sep 30, 2021

parkjooyoung99 mentioned this issue Oct 10, 2022

concerning beta or OR when choosing putative disease genes #42

Open

martinjzhang mentioned this issue Nov 11, 2022

compute-score can not complete #46

Closed

zhangpicb mentioned this issue Nov 17, 2023

custom gene set ? #73

Closed

martinjzhang mentioned this issue Aug 3, 2024

fail to get the "1000 top associated genes" based on MAGMA #93

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

custom gene set? #2

custom gene set? #2

pvtodorov commented Sep 29, 2021

KangchengHou commented Sep 29, 2021

pvtodorov commented Sep 29, 2021

jaredslosberg commented Jan 24, 2022

martinjzhang commented Jan 24, 2022

twoneu commented Jul 27, 2022

KangchengHou commented Jul 27, 2022

twoneu commented Jul 30, 2022

KangchengHou commented Jul 30, 2022

twoneu commented Jul 30, 2022 •

edited

Loading

KangchengHou commented Jul 31, 2022 •

edited

Loading

KangchengHou commented Jul 31, 2022

pariaaliour commented Feb 6, 2024

KangchengHou commented Feb 6, 2024

pariaaliour commented Feb 6, 2024

KangchengHou commented Feb 7, 2024

pariaaliour commented Feb 7, 2024

custom gene set? #2

custom gene set? #2

Comments

pvtodorov commented Sep 29, 2021

KangchengHou commented Sep 29, 2021

pvtodorov commented Sep 29, 2021

jaredslosberg commented Jan 24, 2022

martinjzhang commented Jan 24, 2022

twoneu commented Jul 27, 2022

KangchengHou commented Jul 27, 2022

twoneu commented Jul 30, 2022

KangchengHou commented Jul 30, 2022

twoneu commented Jul 30, 2022 • edited Loading

KangchengHou commented Jul 31, 2022 • edited Loading

KangchengHou commented Jul 31, 2022

pariaaliour commented Feb 6, 2024

KangchengHou commented Feb 6, 2024

pariaaliour commented Feb 6, 2024

KangchengHou commented Feb 7, 2024

pariaaliour commented Feb 7, 2024

twoneu commented Jul 30, 2022 •

edited

Loading

KangchengHou commented Jul 31, 2022 •

edited

Loading