Most Popular

1500 questions
11
votes
2 answers

How can I calculate gene_length for RPKM calculation from counts data?

I have read counts data and I want to convert them into RPKM values. For this conversion I need the gene length. Does the gene length need to be calculated based on the sum of coding exonic lengths? Or are there any different ways for that? I know…
stack_learner
  • 1,262
  • 14
  • 26
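One common convention for the gene length asked about above is the length of the union of all exons of the gene across its transcripts, with overlapping bases counted once. RPKM is then reads × 10^9 / (gene length in bp × total mapped reads). The minimal sketch below assumes 1-based inclusive exon coordinates. It also approximates "total mapped reads" by the sum of gene counts, which is a simplification. The function names are illustrative only.

```python
from typing import Dict, List, Tuple


def merged_exon_length(exons: List[Tuple[int, int]]) -> int:
    """Length of the union of exon intervals (1-based, inclusive), so bases
    shared by overlapping exons from different transcripts count once."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(exons):
        if cur_end is not None and start <= cur_end + 1:
            # Overlapping or adjacent exon: extend the current merged block.
            cur_end = max(cur_end, end)
        else:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """RPKM = reads * 1e9 / (gene length in bp * total mapped reads).
    Assumption: total mapped reads is approximated by the sum of gene counts."""
    total_mapped = sum(counts.values())
    return {
        gene: n * 1e9 / (lengths_bp[gene] * total_mapped)
        for gene, n in counts.items()
    }
```

For example, exons (1, 100), (50, 150) and (200, 299) merge into 150 + 100 = 250 bp. Using only coding exons instead would give a different, smaller length, so whichever definition is chosen should be applied consistently to all genes.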
11
votes
1 answer

Classifying samples based on marker gene expression

I have a few sets of marker genes that I can use to classify RNA-seq samples with semi-supervised clustering. I would like to automate the process; however, I am struggling to find the ideal algorithm that could generate some kind of score for marker…
GWW
  • 752
  • 4
  • 14
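A simple, swapped-in way to get a per-sample score for each marker set is the mean z-score of its genes. Each sample is then assigned to the highest-scoring set. This is not a semi-supervised clustering method. The sketch assumes the expression values are already normalised and log-transformed, and its function names are hypothetical.

```python
import statistics
from typing import Dict, List


def marker_scores(expr: Dict[str, List[float]],
                  marker_sets: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """Per-sample score for each marker set: the mean z-score of its genes."""
    z = {}
    for gene, values in expr.items():
        mu = statistics.mean(values)
        sd = statistics.pstdev(values)
        # Genes with no variation carry no information; give them z = 0.
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    n_samples = len(next(iter(expr.values())))
    scores = {}
    for label, genes in marker_sets.items():
        present = [g for g in genes if g in z]  # skip markers absent from the matrix
        if not present:
            continue
        scores[label] = [statistics.mean(z[g][i] for g in present)
                         for i in range(n_samples)]
    return scores


def classify(scores: Dict[str, List[float]]) -> List[str]:
    """Assign each sample to the marker set with the highest score."""
    n_samples = len(next(iter(scores.values())))
    return [max(scores, key=lambda label: scores[label][i]) for i in range(n_samples)]
```

The raw scores can also be thresholded so that samples with no clearly elevated marker set are left unassigned, rather than forced into one class.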
11
votes
2 answers

Can I model technical replicates in DESeq2?

I’d normally use collapseReplicates (or do the collapsing upstream) to handle technical replicates. However, in my current RNA-seq experimental design, samples were sequenced twice using different library preparation protocols, leading to marked…
Konrad Rudolph
  • 4,845
  • 14
  • 45
11
votes
2 answers

Difference between BWA-backtrack and BWA-MEM

Many of my colleagues recommend I use BWA-MEM instead of regular old BWA. The problem is I don't understand why and reading the BWA man page doesn't seem to help the matter. What is the difference between BWA and BWA-MEM? And, in which instances…
David Ross
  • 313
  • 2
  • 5
11
votes
1 answer

What are the optimal parameters for docking a large ligand using Hex?

I'm looking to dock a large ligand (~90 kDa) to a slightly larger receptor (~125 kDa) using Hex. If anyone is familiar with docking large structures, are there any recommended parameters for finding the best docking solution? Parameters in…
Te-Yo
  • 303
  • 1
  • 6
10
votes
3 answers

Which sequence alignment tools support codon alignment?

Sometimes it is useful to perform a nucleotide alignment of protein-coding genes based on codons, not on individual nucleotides. For example, for further codon model analysis it is important to have full codons. A widely used approach here is to…
Iakov Davydov
  • 2,695
  • 1
  • 13
  • 34
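The usual trick behind codon alignment is back-translation, which is the idea used by tools such as PAL2NAL. You align the protein translations, then thread each unaligned coding sequence onto its aligned protein, one codon per residue and "---" per gap. The minimal sketch below assumes each CDS translates exactly to its ungapped protein, with no frameshifts; the function name is illustrative.

```python
def codon_align(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned coding sequence onto its aligned protein.

    Each residue consumes the next codon of the CDS; each '-' gap becomes '---'.
    Assumes the CDS translates exactly to the ungapped protein (no frameshifts);
    a trailing stop codon beyond the protein is simply not emitted.
    """
    codons = []
    pos = 0
    for residue in aligned_protein:
        if residue == "-":
            codons.append("---")
            continue
        codon = cds[pos:pos + 3]
        if len(codon) < 3:
            raise ValueError("CDS is shorter than 3 x the ungapped protein length")
        codons.append(codon)
        pos += 3
    return "".join(codons)
```

Because gaps are always inserted in multiples of three, every column triplet in the result is a complete codon, which is what codon models need.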
10
votes
3 answers

Selecting sites from VCF which have an alt AD > 10

I have high-depth variant calls created using HaplotypeCaller with --output_mode EMIT_ALL_SITES. I'm interested in finding all sites (regardless of whether the genotype call is heterozygous or homozygous) where at least one of the alternative alleles has an…
Matt Bashton
  • 1,069
  • 6
  • 16
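In VCF, the AD FORMAT field lists the REF depth first and then one depth per ALT allele. Selecting sites where any alternative allele has AD > 10 therefore means dropping the first value and testing the rest. The minimal sketch below works on one tab-separated record and one sample column. It treats a missing "." depth as 0, which is an assumption, and its function names are hypothetical.

```python
from typing import List


def alt_depths(vcf_line: str, sample_index: int = 0) -> List[int]:
    """Return the per-ALT allele depths (AD minus the REF entry) for one sample."""
    fields = vcf_line.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")
    sample_values = fields[9 + sample_index].split(":")
    if "AD" not in fmt_keys:
        return []
    i = fmt_keys.index("AD")
    if i >= len(sample_values):
        return []  # trailing FORMAT fields may be dropped in the sample column
    # AD is REF depth first, then one value per ALT; '.' (missing) is read as 0.
    ad = [0 if x == "." else int(x) for x in sample_values[i].split(",")]
    return ad[1:]


def has_alt_ad_above(vcf_line: str, threshold: int = 10, sample_index: int = 0) -> bool:
    """True if any alternative allele has depth strictly greater than threshold."""
    return any(d > threshold for d in alt_depths(vcf_line, sample_index))
```

Non-variant sites emitted by EMIT_ALL_SITES have no ALT allele, so only the REF depth is present and the test is false.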
10
votes
3 answers

Pooling data in metagenome assembly

I have 12 human gut microbiome WGS NextSeq read sets (151 bp paired-end). What would be an effective strategy to assemble a metagenome? Let us say I have already filtered the fastq for quality, adapter sequence and host contamination (human, in this…
deepseas
  • 163
  • 1
  • 6
10
votes
4 answers

How do I generate a color-coded tanglegram?

I want to compare two phylogenies and colour the association lines based on some metadata I have. I have been using ape cophyloplot but I have not had any success in getting the lines to colour accurately according to my data (see previous…
AudileF
  • 955
  • 8
  • 25
10
votes
2 answers

How do you generate read-length vs read-quality plot for long-read sequencing data (e.g., MinION)?

How do you generate read-length vs read-quality plot (heat map with histograms in the margin) for long-read sequencing data from the Oxford Nanopore Technologies (ONT) MinION? The MinKNOW software from ONT provides a plot like this during base…
Mark Ebbert
  • 1,354
  • 10
  • 22
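The data behind such a plot is one (length, mean quality) pair per read. A detail that matters is that mean read quality is best computed by averaging per-base error probabilities and converting back to Phred, not by averaging the Phred scores. The minimal sketch below assumes unwrapped four-line FASTQ records with Phred+33 qualities, and the function name is illustrative.

```python
import math
from typing import Iterator, TextIO, Tuple


def read_length_quality(fastq: TextIO) -> Iterator[Tuple[int, float]]:
    """Yield (read length, mean quality) for each record of a 4-line FASTQ.

    Assumes Phred+33 encoding and unwrapped records. The mean is taken over
    per-base error probabilities and converted back to Phred.
    """
    while True:
        header = fastq.readline()
        if not header:
            break
        seq = fastq.readline().strip()
        fastq.readline()  # '+' separator line
        qual = fastq.readline().strip()
        if not qual:
            continue  # skip empty reads
        mean_error = sum(10 ** (-(ord(c) - 33) / 10) for c in qual) / len(qual)
        yield len(seq), -10 * math.log10(mean_error)
```

A read with qualities "!I" (Q0 and Q40) comes out near Q3 rather than Q20, because the single bad base dominates the expected error. The resulting pairs can be drawn as a 2D density, for example with matplotlib's hexbin, and marginal histograms can be added alongside.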
10
votes
1 answer

Intersection of two genomic ranges to keep metadata

I am trying to find the intersection of two genomic ranges (gr1 and gr2) and keep the metadata from one of them: gr1 chrI [1, 100] * | 0.1 chrI [101, 200] * | 0.2 gr2 chrI [50, 150] + | intersect(gr1, gr2) will remove metadata, where subsetByOverlaps(gr1,…
Suvar
  • 203
  • 2
  • 6
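Independently of GenomicRanges, the operation wanted here is to clip each gr1 range to every gr2 range it overlaps and copy gr1's metadata onto the clipped piece. The minimal sketch below uses 1-based inclusive coordinates, as GRanges prints them. It ignores strand because gr1 is "*", and its class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Range:
    chrom: str
    start: int  # 1-based, inclusive, as printed by GRanges
    end: int
    score: Optional[float] = None  # metadata column carried over from gr1


def intersect_keep_metadata(gr1: List[Range], gr2: List[Range]) -> List[Range]:
    """Clip each gr1 range to every overlapping gr2 range, keeping gr1's metadata.
    Strand is ignored. Quadratic in the number of ranges; fine for small inputs."""
    out = []
    for a in gr1:
        for b in gr2:
            if a.chrom != b.chrom:
                continue
            start, end = max(a.start, b.start), min(a.end, b.end)
            if start <= end:  # the two ranges overlap by at least one base
                out.append(Range(a.chrom, start, end, a.score))
    return out
```

With the ranges from the question, the result is chrI [50, 100] with score 0.1 and chrI [101, 150] with score 0.2.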
10
votes
1 answer

How to extract fasta from a blastdb

How can I extract the sequences used to create a BLAST database? This is useful when you download a blastdb from somewhere else, e.g. one of the databases provided by NCBI such as the 16SMicrobial database. Or alternatively, when you want to double…
amblina
  • 332
  • 2
  • 10
10
votes
5 answers

How can we find the distance between all residues in a PDB file?

If we have a PDB structure, how can we find residues physically interacting with each other in space? I know that we must find the distance between residues, and if the distance is less than 5-6 Angstrom, we say that the residues are physically…
Sara
  • 777
  • 1
  • 6
  • 18
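One common definition of the distance between two residues is the minimum distance between any pair of their atoms. The minimal sketch below reads ATOM records by their fixed PDB column positions and reports residue pairs within a cutoff. It assumes a single-model file and ignores HETATM records, alternate locations and insertion codes. The function names are illustrative.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Residue = Tuple[str, int, str]  # (chain ID, residue number, residue name)
Coord = Tuple[float, float, float]


def residue_atoms(pdb_lines: Iterable[str]) -> Dict[Residue, List[Coord]]:
    """Group ATOM coordinates by residue using fixed PDB column positions."""
    residues: Dict[Residue, List[Coord]] = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # HETATM (ligands, water) is skipped in this sketch
        key = (line[21], int(line[22:26]), line[17:20].strip())
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues[key].append(xyz)
    return residues


def residue_contacts(residues: Dict[Residue, List[Coord]],
                     cutoff: float = 5.0) -> List[Tuple[Residue, Residue, float]]:
    """Return residue pairs whose closest atoms lie within cutoff Angstrom."""
    pairs = []
    for (r1, atoms1), (r2, atoms2) in combinations(residues.items(), 2):
        d = min(math.dist(p, q) for p in atoms1 for q in atoms2)
        if d <= cutoff:
            pairs.append((r1, r2, d))
    return pairs
```

Residues that are consecutive in sequence will almost always be reported, so in practice these pairs are often filtered out. The all-pairs loop is quadratic, which is acceptable for single proteins but slow for very large complexes.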
10
votes
5 answers

How can I extract only insertions from a VCF file?

I'm looking to subset a standard VCF file to generate one which only includes insertions (i.e. not indels). I can get part of the way there with: bcftools view -v indels | awk '{if(length($4) == 1) print}' However this wouldn't catch an…
blmoore
  • 366
  • 3
  • 14
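Checking only that REF has length 1 misses insertions written with a longer REF, such as REF AT and ALT AAT. It also gets multi-allelic records wrong. A stricter test for each ALT is that it is longer than REF and that REF is recovered by removing one contiguous block of extra bases after at least one anchor base. The minimal sketch below keeps a record only when every ALT passes. That is one possible policy: splitting multi-allelic records first (for example with bcftools norm -m -any) is an alternative. Symbolic alleles such as <INS> are not handled, and the function names are hypothetical.

```python
BASES = set("ACGTN")


def is_insertion(ref: str, alt: str) -> bool:
    """True if alt equals ref with one contiguous block of bases inserted
    after at least one shared anchor base (VCF-style padding)."""
    ref, alt = ref.upper(), alt.upper()
    if not (set(ref) <= BASES and set(alt) <= BASES) or len(alt) <= len(ref):
        return False  # symbolic/missing alleles, SNPs, MNPs and deletions
    extra = len(alt) - len(ref)
    # Try each split point k: alt = ref[:k] + inserted bases + ref[k:].
    return any(alt[:k] == ref[:k] and alt[k + extra:] == ref[k:]
               for k in range(1, len(ref) + 1))


def keep_record(vcf_line: str) -> bool:
    """Keep a VCF data line only if every ALT allele is an insertion."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    return all(is_insertion(ref, a) for a in alts)
```

Header lines (starting with "#") should be passed through unchanged before this filter is applied to the data lines.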
10
votes
1 answer

Why does this human bam file only have one copy of each chromosome?

As we know, in human DNA one copy of each chromosome comes from the mother and the other from the father, so there are two copies of each chromosome. So, if we extract the exome sequence from this DNA, then each exome…
Lot_to_learn
  • 530
  • 3
  • 14