Installation time: approximately 5-10 min, depending on your machine
configuration.
Set up the FUSION program folder:
cd ~/scratch/programs/
wget https://github.com/gusevlab/fusion_twas/archive/master.zip -O fusion.zip
unzip fusion.zip
The following assumes that you already have conda installed. For more
information on conda, see https://docs.conda.io/en/latest/miniconda.html
To create the conda environment, use the yml file provided at
https://github.com/rodrigoduarte88/neuro_rTWAS/blob/main/fusion_final_environment.yml
(download the raw file rather than the GitHub web page):
conda env create --file fusion_final_environment.yml
This yml file contains most of the software and library versions required
to run FUSION/FOCUS.
We still need to install the R library “plink2R”. To do this, first
rename the LAPACK and BLAS libraries in the lib folder of the conda
environment (as detailed here):
cd "$(conda info --base)/envs/fusion_final/lib"   # e.g. /users/rodrigoduarte88/scratch/miniconda3/envs/fusion_final/lib
mv liblapack.so libRlapack.so
mv libblas.so libRblas.so
Then start R within the fusion_final environment and install plink2R
manually with the following command:
conda activate fusion_final
R
devtools::install_github("carbocation/plink2R/plink2R", ref="carbocation-permit-r361")
Now, let’s create the conda environment for FOCUS:
conda create -n focus python=3.7 r-base
conda activate focus
pip install pyfocus --user
pip install mygene --user
pip install rpy2 --user
The data files required include the SNP weights for FOCUS/FUSION and the
1000 Genomes reference panel for the population of interest. Download them
from the King’s College London Research Data Repository (KORDS) at
https://doi.org/10.18742/22179655, then decompress them:
tar zxvf FOCUS_weights.tgz
tar zxvf FUSION_weights.tgz
tar zxvf 1000G_ref_panel.tgz
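Before going further, it can help to confirm that the extracted reference panel is complete. The bash sketch below assumes FUSION-style per-chromosome PLINK files (<prefix><chr>.bed/.bim/.fam for chromosomes 1-22), which is what the --ref_ld_chr ./LDREF_harmonized/1000G.EUR. prefix used later implies; the function name check_ref_panel is hypothetical.

```shell
# Check that a per-chromosome PLINK reference panel is complete.
# Assumed naming (hypothetical helper): <prefix><chr>.bed/.bim/.fam, chr 1-22,
# e.g. prefix ./LDREF_harmonized/1000G.EUR. -> ./LDREF_harmonized/1000G.EUR.22.bim
check_ref_panel() {
  local prefix="$1"
  local missing=0
  local chr ext
  for chr in $(seq 1 22); do
    for ext in bed bim fam; do
      # -s: file exists and is not empty
      if [ ! -s "${prefix}${chr}.${ext}" ]; then
        echo "missing or empty: ${prefix}${chr}.${ext}" >&2
        missing=$((missing + 1))
      fi
    done
  done
  # Non-zero exit status if anything is missing, so the check can gate later steps
  [ "$missing" -eq 0 ]
}
```

For example: check_ref_panel ./LDREF_harmonized/1000G.EUR. && echo "reference panel OK"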
N.B.: The reference panels are annotated with dbSNP 151 (hg19) information.
Your GWAS summary statistics must therefore be annotated with variant IDs (rsIDs) according to dbSNP 151. Use munge_sumstats.py from the ldsc package for pre-filtering. An example of how this was done is available in the scripts at https://github.com/rodrigoduarte88/TWAS_HERVs-SCZ. You can also check the FUSION guidelines for additional instructions.
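Because the weights and reference panels use dbSNP 151 rsIDs, a quick sanity check is how many of your GWAS SNP IDs appear in the reference panel; a low overlap can point to a different dbSNP build or to IDs in another format. A bash sketch, assuming a plain-text summary statistics file with a header row containing a SNP column (as in the FUSION and FOCUS layouts below) and standard PLINK .bim files (column 2 = variant ID); the function name snp_overlap is hypothetical.

```shell
# Count how many GWAS variant IDs also appear in the reference panel .bim files.
# Assumptions: sumstats is plain text with a header containing a SNP column;
# .bim files follow the PLINK layout (column 2 = variant ID).
snp_overlap() {
  local sumstats="$1"
  shift  # remaining arguments: one or more .bim files
  if [ "$#" -eq 0 ]; then
    echo "usage: snp_overlap SUMSTATS BIM [BIM ...]" >&2
    return 2
  fi
  cat "$@" | awk '
    FNR == NR { in_ref[$2] = 1; next }   # first input (stdin): reference IDs
    FNR == 1 {                          # header row of the sumstats file
      for (i = 1; i <= NF; i++) if ($i == "SNP") col = i
      if (!col) { print "no SNP column in header" > "/dev/stderr"; bad = 1; exit 1 }
      next
    }
    { total++; if ($col in in_ref) hit++ }
    END {
      if (bad) exit 1
      printf "%d of %d SNPs found in reference (%.1f%%)\n", hit, total, (total ? 100 * hit / total : 0)
    }
  ' - "$sumstats"
}
```

For example: snp_overlap PGC2.SCZ.sumstats.fusion LDREF_harmonized/1000G.EUR.*.bim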
Summary statistics for FUSION should look like:
SNP A1 A2 Z
rs10 A C -0.501
rs1000000 G A 2.238
rs10000003 A G -1.324
rs10000010 T C -0.082
rs10000013 C A -2.04
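If the summary statistics were pre-filtered with munge_sumstats.py, its gzipped output (<prefix>.sumstats.gz) is assumed here to carry SNP, A1, A2, Z and N columns, so the FUSION file can be derived by keeping the first four. A bash sketch under that assumption; the function name munged_to_fusion is hypothetical, columns are picked by header name, and rows without a Z score are skipped.

```shell
# Convert munge_sumstats.py output (<prefix>.sumstats.gz) to the four-column
# FUSION layout: SNP A1 A2 Z (tab-separated, with header). Hypothetical helper.
munged_to_fusion() {
  local infile="$1" outfile="$2"
  gzip -dc "$infile" | awk 'BEGIN { OFS = "\t" }
    NR == 1 {
      # Map header names to column positions
      for (i = 1; i <= NF; i++) c[$i] = i
      if (!(("SNP" in c) && ("A1" in c) && ("A2" in c) && ("Z" in c))) {
        print "expected SNP, A1, A2 and Z columns" > "/dev/stderr"
        exit 1
      }
      print "SNP", "A1", "A2", "Z"
      next
    }
    # Skip rows with a missing Z score
    $(c["Z"]) != "" && $(c["Z"]) != "NA" { print $(c["SNP"]), $(c["A1"]), $(c["A2"]), $(c["Z"]) }
  ' > "$outfile"
}
```

For example: munged_to_fusion PGC2.SCZ.sumstats.gz PGC2.SCZ.sumstats.fusion (the input name is illustrative). Tabs instead of spaces should not matter, as the example above is whitespace-delimited.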
Summary statistics for FOCUS should look like:
CHR SNP BP A1 A2 Z N
7 rs10 92383888 A C -0.501 58749.13
12 rs1000000 126890980 G A 2.238 58749.13
4 rs10000003 57561647 A G -1.324 58749.13
4 rs10000010 21618674 T C -0.082 58749.13
4 rs10000013 37225069 C A -2.04 58749.13
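The FOCUS layout adds CHR, BP and N to the FUSION columns. One way to fill CHR and BP, sketched here on the assumption that positions should come from the hg19 reference panel itself, is to look each rsID up in the .bim files (PLINK layout: CHR SNP CM BP A1 A2). The function name sumstats_to_focus is hypothetical, the input is assumed to have SNP, A1, A2, Z and N columns (for example, decompressed munge_sumstats.py output), and SNPs absent from the panel are dropped.

```shell
# Build a FOCUS-style table (CHR SNP BP A1 A2 Z N), taking CHR and BP from
# reference .bim files. Hypothetical helper; input needs SNP A1 A2 Z N columns.
sumstats_to_focus() {
  local sumstats="$1"
  shift  # remaining arguments: reference .bim files
  if [ "$#" -eq 0 ]; then
    echo "usage: sumstats_to_focus SUMSTATS BIM [BIM ...]" >&2
    return 2
  fi
  cat "$@" | awk 'BEGIN { OFS = "\t" }
    FNR == NR { chr[$2] = $1; bp[$2] = $4; next }   # .bim: CHR SNP CM BP A1 A2
    FNR == 1 {
      for (i = 1; i <= NF; i++) c[$i] = i
      if (!(("SNP" in c) && ("A1" in c) && ("A2" in c) && ("Z" in c) && ("N" in c))) {
        print "expected SNP, A1, A2, Z and N columns" > "/dev/stderr"
        exit 1
      }
      print "CHR", "SNP", "BP", "A1", "A2", "Z", "N"
      next
    }
    ($(c["SNP"]) in chr) {   # keep only SNPs present in the reference panel
      s = $(c["SNP"])
      print chr[s], s, bp[s], $(c["A1"]), $(c["A2"]), $(c["Z"]), $(c["N"])
    }
  ' - "$sumstats"
}
```

For example: sumstats_to_focus gwas_with_N.txt LDREF_harmonized/1000G.EUR.*.bim > schizophrenia.gwas.focus (gwas_with_N.txt is a hypothetical input name).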
To run FUSION, activate the conda environment, and use the FUSION
weights and linkage disequilibrium reference panel provided.
conda activate fusion_final
Rscript FUSION.assoc_test.R \
--sumstats PGC2.SCZ.sumstats.fusion \
--weights ./wrapped/CMC.pos \
--weights_dir ./wrapped/ \
--ref_ld_chr ./LDREF_harmonized/1000G.EUR. \
--chr 22 \
--out PGC2.SCZ.22.dat
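The command above covers chromosome 22 only, while the combining step further down expects one output file per chromosome (PGC2.SCZ.1.dat, PGC2.SCZ.2.dat, ...). A minimal bash sketch that loops over chromosomes 1-22 with the same options; the function name run_fusion_all_chr and the DRY_RUN switch are hypothetical, and the file names are those of the example.

```shell
# Run FUSION.assoc_test.R for chromosomes 1-22, one output file per chromosome
# (PGC2.SCZ.<chr>.dat). Hypothetical helper: DRY_RUN=1 prints the commands only.
run_fusion_all_chr() {
  local chr
  local -a cmd
  for chr in $(seq 1 22); do
    cmd=(Rscript FUSION.assoc_test.R
      --sumstats PGC2.SCZ.sumstats.fusion
      --weights ./wrapped/CMC.pos
      --weights_dir ./wrapped/
      --ref_ld_chr ./LDREF_harmonized/1000G.EUR.
      --chr "$chr"
      --out "PGC2.SCZ.${chr}.dat")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      # Stop at the first failing chromosome
      "${cmd[@]}" || return 1
    fi
  done
}
```

DRY_RUN=1 run_fusion_all_chr prints the 22 commands; run_fusion_all_chr alone executes them one after another. On a cluster, submitting each chromosome as a separate job may be preferable.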
To run the conditional analysis, follow the instructions provided by the
authors of FUSION. In short, first create a file containing only the
Bonferroni-significant hits, and then run the conditional analysis on it.
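Combine the output files from all chromosomes (the header is taken once, from chromosome 1):
head -1 PGC2.SCZ.1.dat > SCZ_____all_chr.tsv
# match *.dat only, so that PGC2.SCZ.sumstats.fusion is not appended
tail -n +2 -q PGC2.SCZ.*.dat >> SCZ_____all_chr.tsv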
Create a file with Bonferroni-significant hits only:
bonferroni_p="$(bc -l <<< "scale=50; 0.05/8212")"
# 8212 is the number of expressed features in the weights
cat SCZ_____all_chr.tsv | awk -v var="${bonferroni_p}" 'NR == 1 || $20 < var' > SCZ_____all_chr.tsv.Sig
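The awk filter above relies on the p-value being in column 20. A variant that locates the TWAS.P column of the FUSION output by its header name, and drops NA p-values, is sketched below in bash; the function name bonferroni_filter is hypothetical.

```shell
# Keep the header row plus rows whose TWAS.P is below 0.05 / N_FEATURES.
# TWAS.P is found by header name, not by position; "NA"/empty p-values are dropped.
bonferroni_filter() {
  local infile="$1" n_features="$2"
  awk -v n="$n_features" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "TWAS.P") col = i
      if (!col) { print "no TWAS.P column in header" > "/dev/stderr"; exit 1 }
      threshold = 0.05 / n
      print
      next
    }
    $col != "" && $col != "NA" && $col + 0 < threshold
  ' "$infile"
}
```

For example: bonferroni_filter SCZ_____all_chr.tsv 8212 > SCZ_____all_chr.tsv.Sig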
Rscript FUSION.post_process.R \
--sumstats PGC2.SCZ.sumstats.fusion \
--input SCZ_____all_chr.tsv.Sig \
--out SCZ_____all_chr.tsv.Sig.analysis \
--ref_ld_chr ./LDREF_harmonized/1000G.EUR. \
--chr 22 \
--plot --locus_win 100000
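Like the association test, FUSION.post_process.R is run one chromosome at a time (--chr 22 above). A bash sketch that runs it for every chromosome with at least one significant hit, reading the chromosomes from the CHR column of the .Sig file; the function name, the DRY_RUN switch and the per-chromosome output suffix (.chr<N>, so runs do not overwrite each other) are hypothetical.

```shell
# Run FUSION.post_process.R for each chromosome present in the CHR column of
# the significant-hits file. Hypothetical helper: DRY_RUN=1 prints commands only.
run_post_process_sig_chr() {
  local sig="$1" chr chroms
  local -a cmd
  # Unique chromosome numbers, sorted numerically, taken from the CHR column
  chroms="$(awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "CHR") c = i; next }
                 c { print $c }' "$sig" | sort -n -u)"
  for chr in $chroms; do
    cmd=(Rscript FUSION.post_process.R
      --sumstats PGC2.SCZ.sumstats.fusion
      --input "$sig"
      --out "${sig}.analysis.chr${chr}"
      --ref_ld_chr ./LDREF_harmonized/1000G.EUR.
      --chr "$chr"
      --plot --locus_win 100000)
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}" || return 1
    fi
  done
}
```

For example: run_post_process_sig_chr SCZ_____all_chr.tsv.Sig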
To run FOCUS, activate the conda environment, and use the FOCUS
weights and linkage disequilibrium reference panel provided.
conda activate focus
module load mesa-glu/9.0.1-gcc-9.4.0   # this is for CREATE users - loads libGL.so.1
focus finemap schizophrenia.gwas.focus \
LDREF_harmonized/1000G.EUR.22 CMC_brain_focus_database.db \
--chr 22 --plot --p-threshold 5E-08 \
--out SCZ_pgc3_CMC.5e-8.chr.22 --locations 37:EUR
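The FOCUS example also covers chromosome 22 only. A bash sketch looping over chromosomes 1-22 with the same settings, where only the reference panel prefix, --chr and --out change; unlike a stop-on-first-error loop, it keeps going when a chromosome fails and reports the failures at the end. The function name run_focus_all_chr and the DRY_RUN switch are hypothetical.

```shell
# Run FOCUS fine-mapping for chromosomes 1-22 with the settings of the chr 22
# example. Hypothetical helper: DRY_RUN=1 prints the commands only.
run_focus_all_chr() {
  local chr
  local -a cmd failed=()
  for chr in $(seq 1 22); do
    cmd=(focus finemap schizophrenia.gwas.focus
      "LDREF_harmonized/1000G.EUR.${chr}" CMC_brain_focus_database.db
      --chr "$chr" --plot --p-threshold 5E-08
      --out "SCZ_pgc3_CMC.5e-8.chr.${chr}" --locations 37:EUR)
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    elif ! "${cmd[@]}"; then
      failed+=("$chr")   # remember the failure and continue with the next chromosome
    fi
  done
  if [ "${#failed[@]}" -gt 0 ]; then
    echo "FOCUS failed for chromosome(s): ${failed[*]}" >&2
    return 1
  fi
}
```

For example, after conda activate focus: run_focus_all_chr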
For interpretation of the output files, please refer to the instructions
provided by the authors of FOCUS and FUSION. The results contain the gene
and HERV expression signatures associated with genetic susceptibility to
your trait of interest.