
I have a .vcf file with a partially decoded genome provided by a genotyping service. How can I convert this file into other popular formats that other services accept? I tried plink2, but I could not work out how to use it correctly.

Also, where can I find the most complete list of free DNA services?


Here's part of my DNA raw data with file headers:

##fileformat=VCFv4.2
##source=Genotek
##reference=hg19
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  vh6913
chr1    752721  rs3131972   A   G   .   .   .   GT  1/1
chr1    759036  rs114525117 G   A   .   .   .   GT  0/0
chr1    801536  rs79373928  T   G   .   .   .   GT  0/0
chr1    834830  rs116452738 G   A   .   .   .   GT  0/0
chr1    835092  rs72631887  T   G   .   .   .   GT  0/0
chr1    838555  rs4970383   C   A   .   .   .   GT  0/1
chr1    838665  rs28678693  T   C   .   .   .   GT  0/0
chr1    840753  rs4970382   T   C   .   .   .   GT  1/1
chr1    846808  rs4475691   C   T   .   .   .   GT  0/1
chr1    851390  rs72631889  G   T   .   .   .   GT  0/0
chr1    854250  rs7537756   A   G   .   .   .   GT  0/1
chr1    861808  rs13302982  A   G   .   .   .   GT  1/1
chr1    863130  rs376747791 A   G   .   .   .   GT  0/0
chr1    866893  rs2880024   T   C   .   .   .   GT  0/0
chr1    868404  rs13302914  C   T   .   .   .   GT  1/1
chr1    872952  rs76723341  C   T   .   .   .   GT  0/0
chr1    878331  rs148327885 C   T   .   .   .   GT  0/1
chr1    879911  rs143853699 G   A   .   .   .   GT  0/0
chr1    881627  rs2272757   G   A   .   .   .   GT  0/0
chr1    884767  rs67274836  G   A   .   .   .   GT  0/0
chr1    888659  rs3748597   T   C   .   .   .   GT  1/1
chr1    889238  rs3828049   G   A   .   .   .   GT  0/0
chr1    891277  rs77608078  C   T   .   .   .   GT  0/0
... and more than 636 thousand rows
  • Hi, you probably should be a bit more specific about which formats you need. – Wouter De Coster Feb 22 '21 at 21:41
  • Yes, please edit your question and explain what kind of conversion you need. VCF files normally contain specific variants, so there are very limited options for "converting". I suspect you are thinking of a gVCF file with information for all positions of a genome, but we can't really help without more details. – terdon Feb 27 '21 at 18:47
  • @WouterDeCoster I need the 23andme file format –  Mar 01 '21 at 05:00
  • @terdon I do not understand all these nuances about VCF, I just have a file with a .vcf extension. Should I post part of it as an example? –  Mar 01 '21 at 05:02
  • Yes please. Post an example of your input file and the output you would want from that example. If you tried plink, please also tell us what you tried and how it failed so we don't suggest the same solution. – terdon Mar 01 '21 at 09:11
  • @terdon thanks for your response, I updated the question with data. Unfortunately, I don't know what format 23andme uses, sorry. I thought it was a fairly well-known format. With plink I tried this solution, but it does not seem to work for me; maybe my plink version differs (I use the latest) –  Mar 01 '21 at 13:48
  • OK, that is a VCF file. But you still haven't told us what you want to convert it to. This is the format most commonly used to store variant information. What is it you need? – terdon Mar 01 '21 at 14:02
  • He said he needs to convert it to 23andme format. This is quite easy: bcftools convert -c ID,CHROM,POS,AA -s SampleName -f 23andme-ref.fa --tsv2vcf 23andme.txt -Oz -o out.vcf.gz – user438383 Mar 01 '21 at 14:08
  • @user438383 Your command converts from 23andme to VCF, not the other way around. – Ram RS Mar 01 '21 at 16:40
  • @RamRS yeah, I need vice versa. Could you help? –  Mar 02 '21 at 08:26
  • Isn't this all that's needed? awk 'BEGIN{OFS="\t"} /^[^#]/{sub(/^chr/, "", $1); print $3, $1, $2, $4$5}' sample.vcf > sample.23ame (from a brief look at http://fileformats.archiveteam.org/wiki/23andMe#File_format ) – bug313 Mar 02 '21 at 10:58
  • The awk would work, sure, but one is better off using bcftools query -f – Ram RS Mar 03 '21 at 15:04
  • For completeness, that command would be bcftools query -f '%ID\t%CHROM\t%POS\t%REF%ALT\n' sample.vcf > sample.23andme but you might or might not have to strip off the "chr" part, depending on the further use I guess... – bug313 Mar 05 '21 at 18:55
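
Note that both one-liners in the comments print REF and ALT concatenated, which is not the sample's genotype: a 0/0 call at an A/G site is AA, not AG. Below is a minimal Python sketch that decodes the GT field instead. It assumes single-sample input like the excerpt in the question, keeps only single-base alleles (indels are skipped), and writes the four columns (rsid, chromosome, position, genotype) described on the archiveteam page linked above. The function name `vcf_to_23andme`, the `strip_chr` parameter, and the `--` marker for missing calls are illustrative assumptions, not something this thread specifies.

```python
def vcf_to_23andme(vcf_lines, strip_chr=True):
    """Yield tab-separated rows: rsid, chromosome, position, genotype.

    Sketch only: assumes one sample column and skips sites with
    multi-base alleles (indels), which need special handling.
    """
    for line in vcf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.split()  # accepts tabs or runs of spaces
        chrom, pos, rsid, ref, alt = fields[0:5]
        fmt, sample = fields[8], fields[9]

        alleles = [ref] + alt.split(",")
        if any(len(a) != 1 for a in alleles):
            continue  # not a simple SNV site

        # Locate GT inside the colon-separated FORMAT/sample fields.
        gt = sample.split(":")[fmt.split(":").index("GT")]
        calls = gt.replace("|", "/").split("/")

        if "." in calls:
            genotype = "--"  # assumed no-call marker
        else:
            genotype = "".join(alleles[int(c)] for c in calls)

        if strip_chr and chrom.startswith("chr"):
            chrom = chrom[3:]
        yield "\t".join((rsid, chrom, pos, genotype))


# Small in-memory demo using rows from the question.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  vh6913",
    "chr1    752721  rs3131972   A   G   .   .   .   GT  1/1",
    "chr1    759036  rs114525117 G   A   .   .   .   GT  0/0",
    "chr1    838555  rs4970383   C   A   .   .   .   GT  0/1",
]
for row in vcf_to_23andme(demo):
    print(row)
```

To convert a real file, open it and pass the file object as `vcf_lines`, writing each yielded row to the output file. Whether the receiving service expects the "chr" prefix removed, or a particular header line, is something to check against that service's own documentation.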
