I'm attempting an association analysis with a very small set of patient exomes (n=10), with no control or parental exomes available. I tried downloading the ExAC VCF of variant sites (http://exac.broadinstitute.org/downloads) and the 1000G integrated call sets (http://ftp.1000genomes.ebi.ac.uk/) and combining each with our pooled patient VCFs, but this has not been successful (I suspect that trying to merge such large VCFs generated by different pipelines is rather naive).
From the primary literature, I gather that it should be possible to use these resources to increase the statistical power of our analysis. My question is: how do I take these large VCFs with many samples and merge them with our patient VCFs, so that the combined VCF can be used downstream with analysis packages (PODKAT, PLINK, etc.)?
The third link would be useful, except that our coverage is fine; what we lack is sample size, which is several orders of magnitude smaller than theirs (gulp)!
– carsweshau Jun 22 '17 at 04:08