8

I have the genotyped data from impute2 output in .gen format (imputed to 1000G P3). The file has genotype posterior probabilities (GP:3 values per variant). I have converted .gen to .vcf using qctools and the .vcf file has GT:GP format. I need to convert the .vcf file with GT:GP format to GT:DS. Genotype dosages are recommended for use in qtltools/fastqtl analysis. However, I cannot find a tool that would keep the .vcf format and convert GP to DS. Any help much appreciated!

Nilufer
  • 81
  • 1
  • 2

3 Answers3

5

You can do this in Hail.

Here's the rough code to do it (0.1 versions).

Setup:

from hail import *
hc = HailContext()

Import the .gen file. VCF works too:

dataset = hc.import_gen(
    'src/test/resources/example.gen', 
    'src/test/resources/example.sample')

Remap the genotype schema and export to VCF:

dataset.annotate_genotypes_expr('g = {GT: g.call(), DS: g.dosage()}')\
    .export_vcf('/tmp/out.vcf.bgz')

Take a look at the getting started page if you want to try it out!

I should note that you may be able to do QTL analyses in Hail, depending on the method you want to use. See blog post here.

Tim
  • 151
  • 3
2

Hm, I didn't know that the plugin existed so I wrote my own script to convert GP to minor allele dosage on github. Maybe someone else will find it useful :) https://github.com/7methylg/VCF-GP-to-DS

Hannah
  • 21
  • 1
0

There's the dosage plugin for bcftools, but it only outputs tab separated values. It would not be too hard to extend the plugin to output a VCF with the DS tag instead, but it has not been done yet. There's a good chance the bcftools devs would respond to a feature request...

In any case, this code:

curl https://raw.githubusercontent.com/samtools/bcftools/develop/test/convert.vcf > convert.vcf
bcftools +dosage convert.vcf > output.tsv
head -2 output.tsv 

has the output:

#[1]CHROM       [2]POS  [3]REF  [4]ALT  [5]NA00001      [6]NA00002      [7]NA00003      [8]NA00004      [9]NA00005      [10]NA00006     [11]NA00007     [12]NA00008    [13]NA00009     [14]NA00010
X       2698560 G       A       0.1     0.0     0.1     0.2     0.3     0.2     0.2     0.2     0.2     0.1

This is using bcftools version 1.3.1.

Here is an excerpt from the bcftools manual for the dosage plugin:

dosage

print genotype dosage. By default the plugin searches for PL, GL and GT, in that order.

winni2k
  • 2,266
  • 11
  • 28