Most Popular

1500 questions
8
votes
2 answers

How to compare groups using WGS data?

We have whole genome sequencing data for patients (non-cancer, n=60) and for healthy controls (n=20). The sequencing centre has provided us with best-practice bioinformatics analyses, including read mapping (.BAM) and variant calling using GATK…
jessica
  • 81
  • 2
8
votes
2 answers

R packages for data analyses of pooled CRISPR screens

We are designing a CRISPR/Cas9 experiment and thinking about the downstream data analyses. Are there any R packages for analysing raw NGS read count data from pooled genetic screens using CRISPR/Cas9 to disrupt gene expression in a population of…
natasa
  • 89
  • 2
8
votes
1 answer

Why most aligners do not output the "X" CIGAR operation?

As I read the SAM spec, the "X" CIGAR operator represents a mismatch. This seems useful, as we can know where the mismatches are without looking at the reference genome. However, many popular aligners such as BWA do not output "X". Why do they omit…
medbe
  • 847
  • 1
  • 7
  • 9
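
To illustrate why the "X" operator is convenient, here is a minimal sketch that recovers mismatch positions from a CIGAR string alone, with no reference lookup. It assumes the aligner emitted the extended "=" and "X" operators; with a plain "M" CIGAR (as BWA produces) nothing can be recovered this way. The function name `mismatch_positions` and the `start` parameter are hypothetical, chosen for this example.

```python
import re

# One CIGAR element: a length followed by an operator from the SAM spec.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operators that advance the position on the reference sequence.
_CONSUMES_REFERENCE = "MDN=X"


def mismatch_positions(cigar, start=1):
    """Return reference positions covered by 'X' operations.

    `start` is the 1-based leftmost mapping position (the SAM POS field).
    An 'M' operation hides mismatches, so it contributes no positions.
    """
    positions = []
    ref = start
    for length, op in _CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "X":
            positions.extend(range(ref, ref + n))
        if op in _CONSUMES_REFERENCE:
            ref += n
    return positions
```

For example, `mismatch_positions("5=1X4=", start=100)` reports a single mismatch at reference position 105, whereas the equivalent `"10M"` reports none.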
8
votes
3 answers

How should the popular press compare similarity of genomes?

Note this is a question from a lay reader. I've read in the popular press that ~1-4% of the genome of non-African Homo sapiens is inherited from Neanderthals, or that Melanesians derive a similar amount from the Denisovans. This seems like a lot,…
pwray
  • 191
  • 3
8
votes
3 answers

Books on bioinformatics algorithms

I'm looking for a book about bioinformatics algorithms, such as alignment, BLAST search, and variant calling. I'm hoping reading about this subject will give me a deeper understanding of the foundations of bioinformatics, and I'm also interested in…
Ólavur
  • 189
  • 3
8
votes
1 answer

Using a Bash Script to search TaxIDs against NCBI's Taxonomy yields "400 Bad Request" error?

I've been searching TaxIDs against NCBI's Taxonomy DB to get taxonomic lineages for species. I have successfully done this for thousands of TaxIDs that were returned to me by BLAST+ searches in a CSV. (The first column in the CSV was the TaxID for the…
ljs
  • 265
  • 1
  • 5
8
votes
1 answer

Counting repeated kmers sequences that match at least x % of reads sequence

Working on a FASTQ file, I would like to get the occurrences of repeated sequences for all possible kmers of a given length that cover at least 90% of the read's length for the whole data set. Example: for a length 6 with the kmer "ATTGGG" and a…
hilta007
  • 173
  • 6
8
votes
2 answers

Convert bam file to highly compressible bam

I have a large collection of BAM files and I want to post-process each of them into another BAM in which I can query the reads' positions and pairing, insert sizes, MAPQ and other flags, etc., but where I don't need to…
719016
  • 2,324
  • 13
  • 19
8
votes
3 answers

Why does cutadapt remove low quality bases from the ends of reads only?

I use cutadapt to remove low quality bases from my Illumina reads. The algorithm only removes low quality bases from the end until it reaches a good quality base. If there is a bad quality base beyond that, it is not trimmed. Why? Why doesn't the…
Biomagician
  • 2,459
  • 16
  • 30
8
votes
3 answers

Tumor purity/contamination/admixture estimation

Can anyone recommend a good tool for estimating the tumor content given a matched tumor and normal file for DNA NGS whole genome sequencing data or whole exome data? Is it possible to estimate this without a normal sample as well?
8
votes
1 answer

How do kmer counters determine which kmer is 'canonical'?

When counting canonical kmers, i.e. kmers in which both the forward and reverse complement of a sequence are treated as identical, how do kmer counting programs decide which kmer to use as the canonical sequence? Do they all work the same way? To…
conchoecia
  • 3,141
  • 2
  • 16
  • 40
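
As a point of reference for the question above, one common convention is to take the lexicographically smaller of a kmer and its reverse complement as the canonical form. The sketch below shows that convention only; it is not confirmed here to be what any particular kmer counter does, and the helper names are hypothetical.

```python
from collections import Counter

# Complement table for DNA bases; N is mapped to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(kmer):
    """Complement every base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Pick the lexicographically smaller of the kmer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def count_canonical_kmers(seq, k):
    """Count every length-k window of seq under its canonical form."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        counts[canonical(seq[i:i + k])] += 1
    return counts
```

Under this convention `canonical("ATTGGG")` and `canonical("CCCAAT")` both give `"ATTGGG"`, so the two strands are counted together.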
8
votes
7 answers

How to subset a BAM by a list of QNAMEs?

I have a text file 'qnames.txt' with QNAMEs in the following format: EXAMPLE:QNAME1 EXAMPLE:QNAME2 EXAMPLE:QNAME3 EXAMPLE:QNAME4 EXAMPLE:QNAME5 I would like to subset my BAM file.bam via all of these QNAMEs into a new SAM. Naturally, I can do this…
EB2127
  • 1,413
  • 2
  • 10
  • 23
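
One way to picture the task above is as a set-membership filter on the first column of SAM text. The sketch below works on plain SAM lines (for instance, the text produced by decoding a BAM) and uses only the standard library; the function name `filter_sam_by_qname` is hypothetical, and a real workflow would more likely use a dedicated tool.

```python
def filter_sam_by_qname(sam_lines, qnames):
    """Yield header lines and alignments whose QNAME is in `qnames`.

    `sam_lines` is any iterable of SAM text lines; `qnames` is an
    iterable of read names, e.g. the lines of 'qnames.txt'.
    """
    wanted = {name.strip() for name in qnames if name.strip()}
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are kept so the output is still valid SAM.
            yield line
            continue
        # QNAME is the first tab-separated field of an alignment line.
        qname = line.split("\t", 1)[0]
        if qname in wanted:
            yield line
```

Using a set for the names keeps each lookup constant-time, which matters when both the name list and the alignment file are large.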
8
votes
2 answers

Should mitochondrial genes be excluded in scRNA-seq, as ribosomal genes are?

In this answer, it is stated that ribosomal genes should be excluded prior to normalization in scRNA-seq as contaminants. Do mitochondrial genes have to be excluded as well? I plotted the top 50 expressed genes for a specific dataset and they tend…
gc5
  • 1,783
  • 18
  • 32
8
votes
3 answers

How can I count the number of reads that support a variant in a bam file?

I am calling variants from a human sample using bwa mem to align the reads and gatk to call the variants. I'm trying to understand why a specific variant was not called in my sample. I have checked the bam alignments in a GUI viewer and I can see…
terdon
  • 10,071
  • 5
  • 22
  • 48
8
votes
4 answers

Is there a point in recalibration of scores for variant calling?

The GATK variant calling pipeline includes a Base Quality Score Recalibration (BQSR) step, which requires a list of known variants. Recently, some work has been done on reference-free recalibration of scores as well: Lacer and atlas, which is…
Kamil S Jaron
  • 5,542
  • 2
  • 25
  • 59