-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
custom gene set? #2
Comments
Thanks for your interest! I attach a guide to compute MAGMA gene sets from GWAS summary statistics: # Step 1: download MAGMA software, gene location file, and reference data from
# https://ctg.cncr.nl/software/magma after this step, one should have a folder <MAGMA_DIR>
# with the following files:
# 1) <MAGMA_DIR>/magma 2) <MAGMA_DIR>/g1000_eur.(bed|bim|fam) 3) <MAGMA_DIR>/NCBI37.3.gene.loc
magma_dir=<MAGMA_DIR>
# Step 2: make gene annotation file for MAGMA using the following command, this only needs to be done
# once for different GWAS summary statistics, and the file will be saved to out/step1.genes.annot
mkdir -p out/step1
${magma_dir}/magma \
--annotate window=10,10 \
--snp-loc ${magma_dir}/g1000_eur.bim \
--gene-loc ${magma_dir}/NCBI37.3.gene.loc \
--out out/step1
# Step 3: run MAGMA using the following command, this takes a GWAS file ${trait}.pval,
# which at least has the columns: SNP, P, N, which corresponds to the SNP id
# (matched to the ${magma_dir}/g1000_eur.bim), p-value, sample size. For example,
# <trait>.pval file looks like
#
# CHR BP SNP P N
# 1 717587 rs144155419 0.453345 279949
# 1 740284 rs61770167 0.921906 282079
# 1 769223 rs60320384 0.059349 281744
#
# After this step, one should obtain a file out/step2/${trait}.gene.out, and the top genes with
# largest Z-scores can be input to scDRS.
mkdir -p out/step2
${magma_dir}/magma \
--bfile ${magma_dir}/g1000_eur \
--pval ${trait}.pval use='SNP,P' ncol='N' \
--gene-annot out/step1.genes.annot \
--out out/step2/${trait} We will update our document correspondingly. |
Wow! Awesome & thank you for quick response 🚀 |
Hi Kangcheng. Do you have any documentation on your .sumstats files? Specifically, are the CHISQ and Z columns from the GWAS, or an output of a step within scDRS? Here is the first few lines of PASS_IBD_deLange2017.sumstats, for example: SNP A1 A2 N CHISQ Z |
Those are from GWAS and follow the LDSC format https://github.com/bulik/ldsc/wiki/Summary-Statistics-File-Format Let us know if you have further questions/suggestions. Best, |
Hi @martinjzhang! Thank you for this awesome package. Do you have a recommended method for converting the Entrez IDs produced by MAGMA to gene symbols? It seems that the compute-score tool does not recognize the Entrez IDs ( This is what my geneset looks like after following the tutorial above: |
@twoneu MAGMA's NCBI37.3.gene.loc (e.g., see https://ctg.cncr.nl/software/MAGMA/aux_files/NCBI37.3.zip) has 1st column as entrez IDs and last column as gene symbol. Also adding that the gene symbols in the .gs file should be consistent with var_names in AnnData. |
Sorry, another question - how would you recommend that we generate genesets using MAGMA for summary statistics like these with no P-value column? Thank you so much for your time! |
@twoneu are there other columns containing information needed to derive a p-value? We can help if you provide us information about columns you have. Also see MungeSumstats which help standardize sumstats. |
The only columns are SNP, A1, A2, N, Z, and CHISQ (see PASS_IBD_deLange2017.sumstats as an example). Mungesumstats seems to require a P value to run as well. |
@twoneu You can use the following function to convert z-score to p-value. import scipy.stats
import numpy as np
# zscore can be a numpy array
pval = 2 * (1 - scipy.stats.norm.cdf(np.abs(zscore))) You need to manually add a column to the sumstats file. UPDATE NOTE: My previous reply |
@twoneu please see the updated reply above (i made a mistake earlier: we should compute two-sided p-value instead one-sided p-value to convert GWAS z-score to p-value) |
Thank you so much for this post! |
Hi, You should keep the three files as is: g1000_eur.bed / g1000_eur.bim / g1000_eur.fam
|
Perfect, thanks for the swift response! I have one more questions if you don't mind. |
Hi, --snp-loc should be --snp-loc ${magma_dir}/g1000_eur.bim GWAS should be --pval in step 3 |
Perfect, thank you! |
Hi!
I'm trying to figure out how to generate a custom geneset file using MAGMA that could plug into scDRS. I found the following text within the manuscript
Could you provide an example of the MAGMA inputs and commands that performs this?
The text was updated successfully, but these errors were encountered: