I'm considering purchasing the 'MyGenome' product by Veritas Genetics to analyze my genome for a project. I'd like the data to be in FASTA format, but Veritas only provides VCF data. Is it possible to convert this VCF data into FASTA format?
-
What is "MyGenome" exactly? Will they sequence your entire genome? Only your exome? Specific regions of the exome? – terdon Feb 27 '18 at 16:35
-
@terdon Whole genome 30X on HiSeq X. Link – benn Feb 27 '18 at 16:39
-
Yeah, that doesn't sound like actual whole genome. Or, rather, it sounds like they might sequence the whole genome but then only analyze specific regions of it (which is a reasonable thing to do, I'm not saying they're swindling you). "myGenome is a whole genome *screening* test [...]". If so, the method in b.nota's answer won't accurately reconstruct your genome, but only those regions of your genome that differ from the reference genome in the regions they happened to screen. – terdon Feb 27 '18 at 16:42
-
@terdon depends a bit on how they make the VCF file, don't you think? The interpretation is only based on specific regions, but they should have a file with all variants which OP should get for this VCF -> FASTA exercise. – benn Feb 27 '18 at 16:58
-
@b.nota my guess (and it is only a guess) is that they will only do variant calling for targeted regions. That greatly speeds up the process and reduces the resources needed and also possibly protects them from litigation in some countries. – terdon Feb 27 '18 at 17:03
-
Good chance you guessed right. OP might want to verify with MyGenome first whether all variants are reported in the VCF file. – benn Feb 27 '18 at 17:45
-
Possible duplicate of "How to manipulate a reference FASTA or bam to include variants from a VCF?" – gringer Feb 27 '18 at 20:32
-
It's possible to create "a" FASTA as discussed, but I have my doubts whether that is a useful format for you to work with. But that's not the question, I guess. – Wouter De Coster Feb 28 '18 at 12:18
1 Answer
5
You can try the GATK tool FastaAlternateReferenceMaker:
java -jar GenomeAnalysisTK.jar \
-T FastaAlternateReferenceMaker \
-R reference.fasta \
-o output.fasta \
-L input.intervals \
-V input.vcf \
[--snpmask mask.vcf]
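To make it clearer what this kind of conversion does conceptually, here is a minimal Python sketch. It is not GATK's implementation; the helper names `read_fasta` and `apply_snps` and the toy data are hypothetical, and it assumes only single-base substitutions, taking the first ALT allele and skipping indels. The point it illustrates is that every position not listed in the VCF is simply copied from the reference.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict of {contig_name: sequence}."""
    seqs, name = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {contig: "".join(parts) for contig, parts in seqs.items()}


def apply_snps(reference, vcf_handle):
    """Return a copy of `reference` with single-base VCF variants substituted.

    Simplifications (assumptions of this sketch): only the first ALT allele
    is used, genotypes are ignored, and anything that is not a single-base
    substitution (indels, symbolic alleles, '.') is skipped.
    """
    result = {contig: list(seq) for contig, seq in reference.items()}
    for line in vcf_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        alt = alt.split(",")[0]
        if chrom not in result:
            continue
        if len(ref) != 1 or alt.upper() not in ("A", "C", "G", "T", "N"):
            continue
        idx = int(pos) - 1  # VCF positions are 1-based
        if result[chrom][idx].upper() != ref.upper():
            raise ValueError(f"REF mismatch at {chrom}:{pos}")
        result[chrom][idx] = alt
    return {contig: "".join(bases) for contig, bases in result.items()}


# Toy data: a 12-base reference and two SNPs.
reference = read_fasta(io.StringIO(">chr1\nACGTACGT\nACGT\n"))
vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t3\t.\tG\tT\t.\tPASS\t.\n"
    "chr1\t10\t.\tC\tA,G\t.\tPASS\t.\n"
)
personal = apply_snps(reference, vcf)
print(personal["chr1"])  # ACTTACGTAAGT
```

Only positions 3 and 10 change here; the rest of the output is the reference sequence, which is why the completeness of the VCF matters so much for this exercise.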
benn
-
It might be worth pointing out that this will only reconstruct the OP's genome if the OP has WGS data. If not, this tool will presumably use the reference genome for everything not explicitly mentioned in the file, so it won't be the OP's genome, per se. – terdon Feb 27 '18 at 16:36
-
@b.nota thank you for your answer. Just want to let you know that I've contacted veritas support for more information (still awaiting their response) and will accept your answer once I'm able to confirm that this works. – WagonWheelWilly Mar 08 '18 at 13:35