Running a retrotranscriptome-wide association study (rTWAS)

1) Set up a conda environment to run FUSION and FOCUS.

Installation time: approximately 5-10 min, depending on your machine configuration.
Set up the FUSION program folder:
cd ~/scratch/programs/
wget https://github.com/gusevlab/fusion_twas/archive/master.zip -O fusion.zip
unzip fusion.zip

The following assumes that you already have conda installed. For more information on conda, see https://docs.conda.io/en/latest/miniconda.html
To create the conda environment for FUSION and FOCUS, please use the yml file provided (see https://github.com/rodrigoduarte88/neuro_rTWAS/blob/main/fusion_final_environment.yml):
conda env create --file fusion_final_environment.yml
This yml file contains most software and library versions required to run focus/fusion.

We will still need to install the R library “plink2R”. To do this, first rename the libraries that plink2R needs in the conda environment folder (as detailed here), adjusting the path below to your own installation:
cd /users/rodrigoduarte88/scratch/miniconda3/envs/fusion_final/lib
mv liblapack.so libRlapack.so
mv libblas.so libRblas.so
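
The path above belongs to one specific user account. As a sketch, assuming the fusion_final environment has been activated first (conda then sets the CONDA_PREFIX variable), the same renaming can be written without a hard-coded path. The function name rename_r_blas_libs is hypothetical and not part of FUSION or conda.

```shell
# Hypothetical helper: rename the BLAS/LAPACK libraries inside a given lib
# directory, skipping any file that is missing or already renamed.
rename_r_blas_libs() {
    lib_dir="$1"
    for pair in "liblapack.so:libRlapack.so" "libblas.so:libRblas.so"; do
        src="${lib_dir}/${pair%%:*}"   # text before the colon
        dst="${lib_dir}/${pair##*:}"   # text after the colon
        if [ -e "$src" ] && [ ! -e "$dst" ]; then
            mv "$src" "$dst"
        fi
    done
}

# Usage (after `conda activate fusion_final`):
# rename_r_blas_libs "${CONDA_PREFIX}/lib"
```

Because the function skips files that have already been renamed, it can be re-run safely.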

Then start R and manually install plink2R using the following commands:
conda activate fusion_final
R
devtools::install_github("carbocation/plink2R/plink2R", ref="carbocation-permit-r361")

Now, let’s create the conda environment for FOCUS:
conda create -n focus python=3.7 r-base
conda activate focus
pip install pyfocus --user
pip install mygene --user
pip install rpy2 --user

2) Download the required files and decompress

These include the SNP weights for FOCUS/FUSION and the 1000 Genomes reference panel for the population of interest. Please download the required files from the King’s College London Research Data Repository (KORDS), at https://doi.org/10.18742/22179655. Then decompress the files:
tar zxvf FOCUS_weights.tgz
tar zxvf FUSION_weights.tgz
tar zxvf 1000G_ref_panel.tgz

N.B.: The reference panels are annotated with dbsnp151/hg19 information.

3) Preprocessing GWAS summary statistics

Your GWAS summary statistics must be annotated with variant IDs according to dbsnp151. Use munge_sumstats.py from the ldsc package for pre-filtering. You can find an example of how this was done in the scripts available from https://github.com/rodrigoduarte88/TWAS_HERVs-SCZ. You can also check the FUSION guidelines for additional instructions.

Summary statistics for FUSION should look like:

SNP     A1      A2      Z
rs10    A       C       -0.501
rs1000000       G       A       2.238
rs10000003      A       G       -1.324
rs10000010      T       C       -0.082
rs10000013      C       A       -2.04

Summary statistics for FOCUS should look like:

CHR     SNP     BP      A1      A2      Z       N
7       rs10    92383888        A       C       -0.501  58749.13
12      rs1000000       126890980       G       A       2.238   58749.13
4       rs10000003      57561647        A       G       -1.324  58749.13
4       rs10000010      21618674        T       C       -0.082  58749.13
4       rs10000013      37225069        C       A       -2.04   58749.13
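
The FOCUS layout shown above contains every column that the FUSION layout needs, so one way to keep the two files consistent is to derive the FUSION table from the FOCUS table. The following is a minimal sketch, assuming a whitespace-delimited file with the header shown above; the function name focus_to_fusion and the file names in the usage line are hypothetical.

```shell
# Hypothetical helper: read a FOCUS-style table on stdin and print only the
# SNP, A1, A2 and Z columns, locating each column by its header name.
focus_to_fusion() {
    awk 'BEGIN { OFS = "\t" }
         NR == 1 {
             for (i = 1; i <= NF; i++) col[$i] = i
             if (!("SNP" in col && "A1" in col && "A2" in col && "Z" in col)) {
                 print "missing a required column (SNP, A1, A2, Z)" > "/dev/stderr"
                 exit 1
             }
         }
         { print $(col["SNP"]), $(col["A1"]), $(col["A2"]), $(col["Z"]) }'
}

# Usage (file names are placeholders):
# focus_to_fusion < my_trait.sumstats.focus > my_trait.sumstats.fusion
```

Looking columns up by name rather than by position means the helper still works if the input columns are ordered differently.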

4) Running FUSION

To run FUSION, activate the conda environment, and use the FUSION weights and linkage disequilibrium reference panel provided.
conda activate fusion_final

Rscript FUSION.assoc_test.R \
--sumstats PGC2.SCZ.sumstats.fusion \
--weights ./wrapped/CMC.pos \
--weights_dir ./wrapped/ \
--ref_ld_chr ./LDREF_harmonized/1000G.EUR. \
--chr 22 \
--out PGC2.SCZ.22.dat
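
The command above analyses chromosome 22 only, whereas the later steps combine results from all chromosomes. As a sketch, assuming the autosomes 1-22 and the same file names as in the example, the test can be repeated in a loop. The function name run_fusion_all_chr and the FUSION_RUNNER variable are hypothetical conveniences, not part of FUSION.

```shell
# Sketch: run the FUSION association test once per autosome (1-22 assumed).
# Setting FUSION_RUNNER=echo prints the commands instead of executing them.
run_fusion_all_chr() {
    runner="${FUSION_RUNNER:-Rscript}"
    chr=1
    while [ "$chr" -le 22 ]; do
        "$runner" FUSION.assoc_test.R \
            --sumstats PGC2.SCZ.sumstats.fusion \
            --weights ./wrapped/CMC.pos \
            --weights_dir ./wrapped/ \
            --ref_ld_chr ./LDREF_harmonized/1000G.EUR. \
            --chr "$chr" \
            --out "PGC2.SCZ.${chr}.dat" || return 1   # stop at the first failure
        chr=$((chr + 1))
    done
}

# Dry run:  FUSION_RUNNER=echo run_fusion_all_chr
# Real run: run_fusion_all_chr
```

The dry run is a quick way to check the paths before committing to the full analysis.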

To run the conditional analysis, follow the instructions provided by the authors of FUSION. First, obtain a file containing only the Bonferroni-significant hits; then run the conditional analysis on that file.

Combine all files from all chromosomes
head -1 PGC2.SCZ.1.dat > SCZ_____all_chr.tsv
tail -n +2 -q PGC2.SCZ.*.dat* >> SCZ_____all_chr.tsv # the .dat* pattern avoids also matching PGC2.SCZ.sumstats.fusion

Create file with significant hits only (Bonferroni)
bonferroni_p=$(bc -l <<< "scale=50; 0.05/8212") # 8212 is the number of expressed features in the weights
awk -v var="${bonferroni_p}" 'NR == 1 || $20 < var' SCZ_____all_chr.tsv > SCZ_____all_chr.tsv.Sig
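
The filter above relies on the p-value being the 20th column. As a sketch, assuming the header names that column TWAS.P (as in the FUSION output format), the column can instead be found by name and rows with missing p-values excluded explicitly. The function name filter_twas_sig is hypothetical.

```shell
# Hypothetical helper: keep the header plus the rows whose TWAS.P is below the
# threshold given as $1. Expects a whitespace-delimited table on stdin.
filter_twas_sig() {
    awk -v thr="$1" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "TWAS.P") p = i
            if (!p) { print "TWAS.P column not found" > "/dev/stderr"; exit 1 }
            print; next
        }
        $p != "NA" && ($p + 0) < (thr + 0)   # force a numeric comparison
    '
}

# Bonferroni threshold for 8212 features: 0.05 / 8212, about 6.09e-06
threshold=$(awk 'BEGIN { printf "%.10g", 0.05 / 8212 }')

# Usage:
# filter_twas_sig "$threshold" < SCZ_____all_chr.tsv > SCZ_____all_chr.tsv.Sig
```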

Rscript FUSION.post_process.R \
--sumstats PGC2.SCZ.sumstats.fusion \
--input SCZ_____all_chr.tsv.Sig \
--out SCZ_____all_chr.tsv.Sig.analysis \
--ref_ld_chr ./LDREF_harmonized/1000G.EUR. \
--chr 22 \
--plot --locus_win 100000
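
The example above post-processes chromosome 22 only. As a sketch, assuming the significant-hits file has a header containing a CHR column (as in the FUSION output format), the chromosomes that carry at least one hit can be listed and then looped over. The function name sig_chromosomes and the per-chromosome suffix added to --out are assumptions for illustration.

```shell
# Hypothetical helper: print the distinct chromosomes found in a FUSION
# results table read from stdin, locating the CHR column by header name.
sig_chromosomes() {
    awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "CHR") c = i; next }
         c { print $c }' | sort -n -u
}

# Usage sketch: one post-processing run per chromosome with a hit.
# for chr in $(sig_chromosomes < SCZ_____all_chr.tsv.Sig); do
#     Rscript FUSION.post_process.R \
#         --sumstats PGC2.SCZ.sumstats.fusion \
#         --input SCZ_____all_chr.tsv.Sig \
#         --out "SCZ_____all_chr.tsv.Sig.analysis.chr${chr}" \
#         --ref_ld_chr ./LDREF_harmonized/1000G.EUR. \
#         --chr "$chr" \
#         --plot --locus_win 100000
# done
```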

5) Running FOCUS

To run FOCUS, activate the conda environment, and use the FOCUS weights and linkage disequilibrium reference panel provided.
conda activate focus
module load mesa-glu/9.0.1-gcc-9.4.0 # this is for CREATE users - loads libGL.so.1

focus finemap schizophrenia.gwas.focus \
LDREF_harmonized/1000G.EUR.22 CMC_brain_focus_database.db \
--chr 22 --plot --p-threshold 5E-08 \
--out SCZ_pgc3_CMC.5e-8.chr.22 --locations 37:EUR

For interpretation of the output files, please use the instructions provided by the authors of FOCUS and FUSION. The results contain gene and HERV expression signatures associated with genetic susceptibility to your trait of interest.