
My objective is to take a g.vcf.gz file and, for each of its 25-30 unmapped contigs (named like "NW_020192317.1"), extract a subset of ~10k variants, then combine them into one final g.vcf file that includes the header from the original g.vcf.gz file.
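A minimal sketch of the "first ~10k records per contig, keep the header" version of this task, done as a plain-text stream filter. It assumes the unmapped contigs can be recognised by a name pattern such as `^NW_`, and `first_n_per_contig` is a hypothetical helper name, not an existing tool. It keeps every `#` header line unchanged, so the output stays a valid VCF.

```shell
# Hypothetical helper: keep all header lines, plus at most N data records
# from each contig whose name (column 1) matches the given regex.
# Reads an uncompressed VCF on stdin and writes VCF on stdout.
first_n_per_contig() {
    local max_per_contig="$1"   # e.g. 10000
    local contig_regex="$2"     # e.g. '^NW_' (assumed naming of the unmapped contigs)
    awk -F '\t' -v n="$max_per_contig" -v re="$contig_regex" '
        /^#/ { print; next }                  # ## meta lines and the #CHROM line
        $1 ~ re && seen[$1]++ < n { print }   # first n records of each matching contig
    '
}

# Example (bgzipped files can be read with gzip -dc):
#   gzip -dc input.g.vcf.gz | first_n_per_contig 10000 '^NW_' > subset.g.vcf
```

Because a sorted VCF keeps each contig's records together, the output stays in the original order. The whole file is still read once, which can be slow for a large gVCF; on an indexed file, region queries with bcftools avoid that.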

From the post here, I have tried running:

samtools view -bo output.g.vcf -s 123.4 inputfile.g.vcf.gz NW_020192317.1

and I keep receiving the output:

Aborted (core dumped)

I'm not too picky about the number of variants per contig, as long as it is in the thousands; that's why I tried using ".4".
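For reference, in `samtools view -s 123.4` the integer part (123) seeds the random number generator and the fractional part (0.4) is the proportion of records kept. A rough equivalent for VCF text can be sketched with awk's `srand`/`rand`; `sample_fraction` is a hypothetical helper name and the contig regex is an assumption.

```shell
# Hypothetical helper: keep all header lines, plus a seeded random fraction
# of the records on contigs whose name matches the given regex.
sample_fraction() {
    local seed="$1"          # integer seed, e.g. 123
    local fraction="$2"      # proportion of records to keep, e.g. 0.4
    local contig_regex="$3"  # e.g. '^NW_'
    awk -F '\t' -v seed="$seed" -v p="$fraction" -v re="$contig_regex" '
        BEGIN { srand(seed) }                 # same seed -> same selection (same awk)
        /^#/ { print; next }                  # header lines are always kept
        $1 ~ re && rand() < p { print }       # rand() is in [0, 1)
    '
}

# Example:
#   gzip -dc input.g.vcf.gz | sample_fraction 123 0.4 '^NW_' > subset.g.vcf
```

A fixed fraction gives different counts per contig, so a contig with few records may end up well below the thousands; a per-contig cap is the alternative if exact counts matter.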

Alternatively, I tried using the code:

samtools view input.g.vcf.gz "NW_020192317.1:1-2" > output.g.vcf

and I received the same response of "Aborted (core dumped)" with nothing else. I am confused about what is wrong here and why it is throwing an error.

  • You at least need to be using bcftools, not samtools, if you are working with vcf files. samtools is designed for bam/sam, not vcf, and that’s why it’s throwing that error – user438383 Sep 14 '23 at 20:21
  • Do you have a list of contigs of interest? Are those contig names the first field of the VCF? Does this need to be a random selection or can you keep the first 10k variants from each contig? – terdon Sep 15 '23 at 08:29
  • Also, gvcf files contain both variant and non-variant lines. Do you only want variants? – terdon Sep 15 '23 at 13:40
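If only true variant records are wanted, the gVCF reference blocks can be dropped by looking at the ALT column. This sketch assumes the symbolic "any other allele" is written as `<NON_REF>` (GATK style) or `<*>` (bcftools style); other callers may differ, and `variants_only` is a hypothetical helper name.

```shell
# Hypothetical helper: keep header lines, plus records whose ALT column
# (column 5) has at least one allele other than the symbolic <NON_REF>/<*>.
variants_only() {
    awk -F '\t' '
        /^#/ { print; next }
        {
            n = split($5, alts, ",")
            for (i = 1; i <= n; i++) {
                # a real ALT allele means this is a variant site, not a ref block
                if (alts[i] != "<NON_REF>" && alts[i] != "<*>" && alts[i] != ".") { print; next }
            }
        }
    '
}

# Example:
#   gzip -dc input.g.vcf.gz | variants_only > variants.vcf
```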

1 Answer


After some guidance from my advisor, this command allowed me to get the output file I needed.

bcftools view -o newvcfFileName.g.vcf -s 123.2 locationOfTheFile chrom1 chrom2 ... chrom23(mito DNA) --force-samples

I listed the name of every chromosome before --force-samples; I found it simplest to copy and paste the names in.
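Instead of pasting the contig names by hand, the list could be built from the index statistics: `bcftools index -s` prints one tab-separated line per contig (name, length, number of records), and `bcftools view -r` accepts a comma-separated region list (both need a .tbi/.csi index). The sketch below assumes the unmapped contigs match `^NW_`; `contigs_matching` is a hypothetical helper name. Note also that in `bcftools view`, `-s` selects samples by name rather than a subsampling fraction as in `samtools view`, so it does not thin out the records.

```shell
# Hypothetical helper: read "bcftools index -s" output on stdin and print
# the names of contigs matching a regex as one comma-separated line.
contigs_matching() {
    local contig_regex="$1"   # e.g. '^NW_'
    awk -F '\t' -v re="$contig_regex" '$1 ~ re { print $1 }' | paste -sd, -
}

# Example (requires bcftools and an indexed input file):
#   regions=$(bcftools index -s input.g.vcf.gz | contigs_matching '^NW_')
#   bcftools view -r "$regions" -o subset.g.vcf input.g.vcf.gz
```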