Estimate local ancestry using LAMPLD

What you need:

A system to run bash script (linux-based, mac os or cygwin on ms windows)
R and ggplot2 package
Plink software (http://www.cog-genomics.org/plink/1.9)
SHAPEIT software for phasing (https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#download)
GRCH37 genetic map from HAPMAP project for SHAPEIT (ftp://ftp.ncbi.nlm.nih.gov/hapmap/recombination/2011-01_phaseII_B37/genetic_map_HapMapII_GRCh37.tar.gz)
LAMPLD software (http://lamp.icsi.berkeley.edu/lamp/)
Demo data (https://github.com/asthmacollaboratory/Resources/local_ancestry_LAMPLD)
-- Reference population genotypes (reference data, usually from 1000 Genomes)
-- Sample population genotypes (your data/ sample data)

IMPORTANT: The demo data in this workflow only contains 340 markers and is provided for demonstrating the data formatting steps only. Higher marker density is necessary to run a more accurate estimation on local ancestry. You may also run the demo analysis distributed with the LAMPLD software.

Steps

0. Download sample data and start your analysis in the demo_data directory

1. Merge reference and sample data
See step 1 of https://github.com/asthmacollaboratory/Resources/wiki/Estimate-global-ancestry-using-admixutre

2. Remove AT/CG SNPs

Identify AT/CG SNPs to be removed
../bin/remove_ATGC.sh out.v1.merge

Remove the AT/CG SNPs using plink
fpI=out.v1.merge
plink --bfile $fpI --chr 22 --exclude $fpI.removeATCG.txt --make-bed --out $fpI.xATCG.22

3. Extract reference sample for phasing step
plink --bfile $fpI.xATCG.22 --keep demo.refNew.txt --make-bed --out $fpI.xATCG.ref.22

4. Format genetic map for phasing using shapeit. A copy of the output file of this step genetic_map_GRCh37_chr22.mod.txt is also provided.
for chr in {1..22};
do
cat genetic_map_GRCh37_chr$i.txt | awk '{print "\t""\t"}' > genetic_map_GRCh37_chr$i.mod.txt
done

5. Phasing using SHAPEIT
shapeit -B $fpI.xATCG.ref.22 -M genetic_map_GRCh37_chr22.mod.txt -O $fpI.xATCG.ref.phased.22 -T 10

6. Create LAMPLD input files for ancestral data (one file for each ancestral population)
Rscript ../bin/r_Format_LAMPLD_Ancestrals.R $fpI.xATCG.ref.phased demo.refNew.ceu.txt demo.refNew.yri.txt demo.refNew.nam.txt

7. Make sample genotype data in numeric format using the merged data from step 1 and reference allele from step 6. Use “--keep” flag to format sample genotype data only so ancestral data are not included.
plink --bfile $fpI.xATCG.22 --keep demo.sampleNew.txt --a1-allele $fpI.xATCG.ref.phased.allele.22.txt --recode A --out $fpI.xATCG.sample.22

8. Format the file from step 7 for LAMPLD (sample.hap)
sed '1d' $fpI.xATCG.sample.22.raw > temp1.txt
cut -d' ' -f7- temp1.txt > temp2.txt
sed 's/ //g' temp2.txt > temp3.txt
sed 's/NA/?/g' temp3.txt > $fpI.xATCG.sample.22.hap
rm -rf temp1.txt
rm -rf temp2.txt
rm -rf temp3.txt

9. Extract sample ID using the file from step 7 for formatting LAMPLD output later
cut -d' ' -f1,2 $fpI.xATCG.sample.22.raw | sed '1d' > $fpI.xATCG.sample.s.22.txt

10. Run LAMPLD
nAnc=3
winSize=290
unolanc $nAnc $winSize 15 $fpI.xATCG.ref.phased.pos.22.txt $fpI.xATCG.ref.phased.CEU.22.txt $fpI.xATCG.ref.phased.YRI.22.txt $fpI.xATCG.ref.phased.NAM.22.txt $fpI.xATCG.sample.22.hap $fpI.xATCG.LAMPLD.22

11. Format LAMPLD raw output using perl script available in LAMPLD package

If convertLAMPLDout.pl is not found in your path, replace convertLAMPLDout.pl with the full path.

perl convertLAMPLDout.pl $fpI.xATCG.LAMPLD.22 $fpI.xATCG.LAMPLD.long.22.txt

12. For 3 ancestral populations, replace the population label by numeric local ancestry

In the long format file, each individual is represented by two lines. 0, 1 and 2 represents the ancestral populations used in the LAMPLD command line input. In our example, 0=ceu, 1=yri, 2=nam
sed 's/0/9/g' $fpI.xATCG.LAMPLD.long.22.txt > temp1.txt
sed 's/1/0/g' temp1.txt > temp2.txt
sed 's/2/0/g' temp2.txt > temp3.txt
sed 's/9/1/g' temp3.txt > $fpI.xATCG.LAMPLD.ceu.22.txt
sed 's/1/0/g' $fpI.xATCG.LAMPLD.long.22.txt > temp1.txt
sed 's/2/1/g' temp1.txt > $fpI.xATCG.LAMPLD.nam.22.txt
sed 's/2/0/g' $fpI.xATCG.LAMPLD.long.22.txt > $fpI.xATCG.LAMPLD.yri.22.txt
rm -rf temp*.txt

13. Format the file from step 12 into dosage file for each ancestry
../bin/format_LAMPLD_dosage.sh $fpI

Credits: Bogdan Pasanius' Lab and Donglei Hu

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Estimate local ancestry using LAMPLD

What you need:

Steps

Clone this wiki locally