For Illumina reads (only)
Prerequisite: download and prepare the .fna and .gff files of the yeast reference genome.
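A minimal sketch of the download step. It assumes the reference is the RefSeq assembly GCF_000146045.2 (R64) used in the indexing command and that the files sit at NCBI's standard FTP path. The function names, the use of wget and the DRY_RUN switch are illustrative assumptions, not part of the original workflow. With DRY_RUN=1 (the default) it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: fetch the yeast R64 reference (.fna and .gff) from NCBI RefSeq.
# Assumption: NCBI layout genomes/all/<GCF|GCA>/<3 digits>/<3 digits>/<3 digits>/<accession>_<assembly name>/
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = only print commands, 0 = execute them

# Print a command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Hypothetical helper: build the NCBI FTP directory URL for an assembly.
# Example: ncbi_dir_url GCF_000146045.2 R64
ncbi_dir_url() {
  local acc="$1" asm="$2"
  local prefix="${acc%%_*}"   # "GCF"
  local digits="${acc#*_}"    # "000146045.2"
  digits="${digits%%.*}"      # "000146045"
  echo "https://ftp.ncbi.nlm.nih.gov/genomes/all/${prefix}/${digits:0:3}/${digits:3:3}/${digits:6:3}/${acc}_${asm}"
}

# Hypothetical helper: download and unpack the genomic .fna and .gff files.
fetch_reference() {
  local acc="$1" asm="$2" url ext
  local base="${acc}_${asm}"
  url="$(ncbi_dir_url "$acc" "$asm")"
  for ext in genomic.fna.gz genomic.gff.gz; do
    run wget -q "${url}/${base}_${ext}"
    run gunzip -f "${base}_${ext}"
  done
}
```

For example, running DRY_RUN=0 fetch_reference GCF_000146045.2 R64 inside the 01. directory downloads and unpacks both files.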
Mapping (in the 02. directory)
The first step is mapping the downloaded reads to the reference genome:
- build an index for the reference with the following command:
bwa index ../01./GCF_000146045.2_R64_genomic.fna
This indexes the reference genome for rapid searching and alignment.
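A sketch of the reference preparation, assuming bwa, samtools and GATK4 are installed. Besides the BWA index, GATK4 tools expect a FASTA index (.fai) and a sequence dictionary (.dict) next to the reference, so those are built here as well. The index_reference function and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: prepare the reference for BWA (mapping) and GATK4 (variant calling).
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = only print commands, 0 = execute them

# Print a command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Hypothetical helper: index_reference REF.fna
index_reference() {
  local ref="$1"
  run bwa index "$ref"                                             # BWA index files (.amb .ann .bwt .pac .sa)
  run samtools faidx "$ref"                                        # FASTA index (.fai), required by GATK4
  run gatk CreateSequenceDictionary -R "$ref" -O "${ref%.*}.dict"  # sequence dictionary, required by GATK4
}
```

For example, DRY_RUN=0 index_reference ../01./GCF_000146045.2_R64_genomic.fna writes all three kinds of index files next to the reference.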
Then do the mapping with bwa mem. A bash shell script is better here, because it loops over all samples so there is no need to map each individual one by one.
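A minimal bash loop for the mapping step. It assumes paired-end reads named SAMPLE_1.fastq.gz and SAMPLE_2.fastq.gz in one directory (a hypothetical naming scheme) and uses samtools to sort and index each BAM. With DRY_RUN=1, the default, it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: map every paired-end sample with bwa mem in one loop.
# Assumption (hypothetical): reads are named <sample>_1.fastq.gz / <sample>_2.fastq.gz.
set -euo pipefail
shopt -s nullglob   # if no files match, the loop simply does nothing

DRY_RUN="${DRY_RUN:-1}"                                   # 1 = only print commands, 0 = execute
REF="${REF:-../01./GCF_000146045.2_R64_genomic.fna}"      # reference indexed with bwa index
THREADS="${THREADS:-4}"

# map_all READS_DIR: one sorted, indexed BAM per sample in the current directory
map_all() {
  local reads_dir="$1" r1 r2 sample
  for r1 in "$reads_dir"/*_1.fastq.gz; do
    r2="${r1%_1.fastq.gz}_2.fastq.gz"          # matching mate file
    sample="$(basename "$r1" _1.fastq.gz)"      # e.g. reads/S1_1.fastq.gz -> S1
    echo "+ bwa mem -t $THREADS $REF $r1 $r2 | samtools sort -@ $THREADS -o $sample.sorted.bam -"
    echo "+ samtools index $sample.sorted.bam"
    if [ "$DRY_RUN" = "0" ]; then
      bwa mem -t "$THREADS" "$REF" "$r1" "$r2" | samtools sort -@ "$THREADS" -o "$sample.sorted.bam" -
      samtools index "$sample.sorted.bam"
    fi
  done
}
```

For example, DRY_RUN=0 map_all ../reads (a hypothetical reads directory) maps every sample found there. Piping bwa mem straight into samtools sort avoids writing a large intermediate SAM file per sample.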
After mapping, GATK4 can be used for variant calling based on the mapping result. The GATK4 variant-calling steps are:
- process the mapping result (add read groups and mark duplicates);
- identify raw variants (per-sample calling followed by joint genotyping of all samples);
- recalibrate base quality scores (GATK4 has no separate indel-realignment step, since HaplotypeCaller reassembles reads locally);
- run a second round of variant calling;
- apply final variant filtering to remove uninformative mutations. An obvious filter is to remove SNPs with identical calls in all the samples.
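A dry-run sketch of these steps with GATK4 command-line tools. It assumes sorted per-sample BAMs named SAMPLE.sorted.bam and a reference that already has .fai and .dict files; file names, read-group values and the filter thresholds are illustrative assumptions. Because no database of known variants is assumed for these isolates, the hard-filtered first-round calls serve as known sites for base quality score recalibration, which is why a second calling round follows.

```shell
#!/usr/bin/env bash
# Sketch of a two-round GATK4 workflow (hypothetical file names; DRY_RUN=1 only prints commands).
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
REF="${REF:-../01./GCF_000146045.2_R64_genomic.fna}"

# Print a command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# 1. Add a read group (GATK requires one) and mark duplicates. Expects SAMPLE.sorted.bam.
preprocess() {
  local s="$1"
  run gatk AddOrReplaceReadGroups -I "$s.sorted.bam" -O "$s.rg.bam" \
    --RGID "$s" --RGLB lib1 --RGPL ILLUMINA --RGPU unit1 --RGSM "$s"
  run gatk MarkDuplicates -I "$s.rg.bam" -O "$s.dedup.bam" -M "$s.dup_metrics.txt"
  run samtools index "$s.dedup.bam"
}

# 2. Per-sample calling in GVCF mode, then joint genotyping.
#    call_and_genotype BAM_SUFFIX ROUND SAMPLE...
call_and_genotype() {
  local suffix="$1" round="$2"; shift 2
  local s vargs=()
  for s in "$@"; do
    run gatk HaplotypeCaller -R "$REF" -I "$s.$suffix.bam" -O "$s.$round.g.vcf.gz" -ERC GVCF
    vargs+=(-V "$s.$round.g.vcf.gz")
  done
  run gatk CombineGVCFs -R "$REF" "${vargs[@]}" -O "cohort.$round.g.vcf.gz"
  run gatk GenotypeGVCFs -R "$REF" -V "cohort.$round.g.vcf.gz" -O "cohort.$round.vcf.gz"
}

# 3. Hard filtering (example thresholds only; tune for your data), then keep PASS sites.
#    hard_filter IN.vcf.gz OUT.vcf.gz
hard_filter() {
  local flagged="${2%.vcf.gz}.flagged.vcf.gz"
  run gatk VariantFiltration -R "$REF" -V "$1" -O "$flagged" \
    --filter-expression "QD < 2.0 || FS > 60.0 || MQ < 40.0" --filter-name "hard_filter"
  run gatk SelectVariants -R "$REF" -V "$flagged" -O "$2" --exclude-filtered
}

# 4. Base quality score recalibration with round-1 calls as known sites.
recalibrate() {
  local s="$1" known="$2"
  run gatk BaseRecalibrator -R "$REF" -I "$s.dedup.bam" --known-sites "$known" -O "$s.recal.table"
  run gatk ApplyBQSR -R "$REF" -I "$s.dedup.bam" --bqsr-recal-file "$s.recal.table" -O "$s.bqsr.bam"
}

# Whole workflow: gatk_workflow SAMPLE...
gatk_workflow() {
  local s
  for s in "$@"; do preprocess "$s"; done
  call_and_genotype dedup round1 "$@"
  hard_filter cohort.round1.vcf.gz cohort.round1.pass.vcf.gz
  for s in "$@"; do recalibrate "$s" cohort.round1.pass.vcf.gz; done
  call_and_genotype bqsr round2 "$@"
  hard_filter cohort.round2.vcf.gz cohort.final.vcf.gz
}
```

For example, DRY_RUN=0 gatk_workflow S1 S2 S3 runs both rounds for three samples and leaves the filtered calls in cohort.final.vcf.gz. Calling each sample in GVCF mode keeps the per-sample step independent, so adding a sample later only requires a new HaplotypeCaller run plus a fresh joint genotyping.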
This gives the final variant calls (SNPs and InDels), and the final VCF file can then be sent to SnpEff for further annotation (an easier way is to use SnpEff in Galaxy with the right reference genome version, e.g. R64-1-1.75).
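For those not using Galaxy, a command-line sketch of the same annotation step. It assumes a local SnpEff installation (the jar path is hypothetical) and the R64-1-1.75 database named above. The -stats option writes an HTML summary that includes variant counts by type and by chromosome.

```shell
#!/usr/bin/env bash
# Sketch: annotate the final VCF with SnpEff on the command line.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"                          # 1 = only print commands, 0 = execute
SNPEFF_JAR="${SNPEFF_JAR:-snpEff/snpEff.jar}"    # hypothetical install path
DB="${DB:-R64-1-1.75}"                           # genome version named in the text

# annotate IN.vcf OUT_PREFIX -> OUT_PREFIX.ann.vcf and OUT_PREFIX.summary.html
annotate() {
  local vcf="$1" out="$2"
  echo "+ java -jar $SNPEFF_JAR download $DB"
  echo "+ java -jar $SNPEFF_JAR -stats $out.summary.html $DB $vcf > $out.ann.vcf"
  if [ "$DRY_RUN" = "0" ]; then
    java -jar "$SNPEFF_JAR" download "$DB"                                  # fetch the database once
    java -jar "$SNPEFF_JAR" -stats "$out.summary.html" "$DB" "$vcf" > "$out.ann.vcf"
  fi
}
```

For example, DRY_RUN=0 annotate final.vcf final writes final.ann.vcf and final.summary.html.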
For the isolates from NCBI, the SnpEff analysis shows 16 chromosomes and one mitochondrial chromosome, and it reports the numbers of insertions and deletions.
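As a quick cross-check of those counts, a minimal awk-based sketch can tally variant types directly from a VCF. It classifies each ALT allele by comparing its length with REF (a simplification: complex alleles are grouped by length), skips missing and symbolic alleles, and counts the distinct chromosomes carrying variants. The function name count_variants is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count SNPs, insertions and deletions (and chromosomes carrying them) in a VCF.
# Plain awk; length-based classification is an approximation for complex alleles.
set -euo pipefail

# count_variants FILE.vcf or FILE.vcf.gz
count_variants() {
  local vcf="$1"
  { if [[ "$vcf" == *.gz ]]; then gzip -dc "$vcf"; else cat "$vcf"; fi; } |
  awk -F'\t' '
    /^#/ { next }                                        # skip header lines
    {
      chrom[$1] = 1                                      # remember each chromosome seen
      n = split($5, alts, ",")                           # ALT may list several alleles
      for (i = 1; i <= n; i++) {
        a = alts[i]
        if (a == "." || a == "*" || a ~ /^</) continue   # skip missing, spanning-deletion, symbolic
        if (length($4) == 1 && length(a) == 1) snp++
        else if (length(a) > length($4)) ins++
        else if (length(a) < length($4)) del++
        else other++                                     # MNPs and same-length substitutions
      }
    }
    END {
      nc = 0
      for (c in chrom) nc++
      printf "chromosomes\t%d\nSNP\t%d\ninsertion\t%d\ndeletion\t%d\nother\t%d\n", nc, snp, ins, del, other
    }'
}
```

For example, count_variants final.vcf prints one tab-separated line per category. Counting per ALT allele means a multi-allelic site such as C to T,CG contributes one SNP and one insertion.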