Most Popular
1500 questions
5
votes
1 answer
Is the optional SAM NM field strictly computable from the MD and CIGAR?
According to the SAM Optional Fields Specification, the NM field is:
Edit distance to the reference, including ambiguous bases but excluding clipping
Assuming both the MD and CIGAR are present, is the edit distance simply the number of characters [A-Z]…
mattm
- 754
- 7
- 19
5
votes
2 answers
How can I remove (non-trivial) duplicates from a VCF file?
This is related to the question I asked here. Consider a VCF file that contains duplicate variants, but where the duplicates aren't simply the same thing in the same notation; instead, one is a subset of the other. For…
terdon
- 10,071
- 5
- 22
- 48
5
votes
2 answers
BAM to gene expression matrix (UMI counts per gene per cell), 10X
I am trying to reproduce some results of a scRNASeq experiment. However, I am new to the server-side aspect of such analyses and am very confused at the moment.
The data provided by the authors of the paper is in .BAM format and from there I wish to…
h3ab74
- 836
- 5
- 14
5
votes
1 answer
Customizing bigWig file
I generate bigWig files using bamCoverage from deeptools, in part for my colleagues to visualize their mapped libraries in the IGV viewer.
A problem is that the displayed track name is apparently the file name, which is not convenient because some…
bli
- 3,130
- 2
- 15
- 36
5
votes
3 answers
Alignment with arbitrary number of mismatches or gaps
I have 23 bp long reads and want to find all possible alignments of them to the human genome (hg19, hg38) for an arbitrary number of mismatches (<7), possibly also small indels. I've read in the literature that people use bowtie2 for this, so I've tried…
Flagon13
- 105
- 5
5
votes
3 answers
How to get fasta alignment file from SAM/BAM file?
I am not talking about a consensus sequence; I know how to get a consensus sequence using mpileup in samtools/bcftools. As I understand it, SAM/BAM files are basically a sequence alignment format, so it's natural to expect a straightforward way of…
Ahmed Abdullah
- 367
- 2
- 8
5
votes
1 answer
Should the cell sorting marker genes be excluded during clustering?
We sort different populations of blood cells using a number of fluorescent flow cytometry markers and then sequence RNA. We want to see what the transcriptome tells us about the similarity and relation between these cells. In my experience on bulk…
Peter
- 2,634
- 15
- 33
5
votes
6 answers
How to extract metadata from NCBI's short read archive (SRA) for a few runs?
I wish to extract metadata from a list of runs on NCBI's short read archive. For instance, I'd like to extract the library name ("HS0798") from the following run…
init_js
- 319
- 2
- 9
5
votes
1 answer
Why do I get so many insertions from Minimap2 on my Nanopore WGS?
I'm starting my analysis of nanopore whole genome sequencing. I start my analysis from this popular GitHub.
The sample I downloaded was WGS for NA12878, so I would assume its alignment to GRCh38 shouldn't be that bad.
But ... I'm getting lots of…
SmallChess
- 2,699
- 3
- 19
- 35
5
votes
2 answers
Calculate the percentage of each unique phylogenetic tree in a BEAST output
I have a nexus formatted BEAST output containing 20,000 phylogenetic trees of seven taxa. Is there any way to get the percentage of each unique phylogenetic tree contained in this output?
I already made an unsuccessful attempt with R.
jvddorpe
- 191
- 5
5
votes
1 answer
Count genomic ranges
I have a set of genomic ranges that are potentially overlapping. I want to count the number of ranges at certain positions using R.
I'm pretty sure there are good solutions, but I seem to be unable to find them.
Solutions like cut or findIntervals…
sargas
- 153
- 3
5
votes
2 answers
Perfect Phylogeny vs Maximum parsimony
I am searching various sources about phylogenetics. I saw some materials about perfect phylogeny and also about phylogenies obtained under a maximum parsimony constraint. They seem very similar to me. Are they the same?
Dandelion
- 153
- 1
- 6
5
votes
0 answers
IUPred definition of long/short form disorder
I need to predict disorder of proteins, and here there is a description of IUPred in its two variants, long and short disorder. This is a manuscript defining the method. I couldn't find a precise definition of short or long disorder.
I need to…
aerijman
- 645
- 5
- 14
5
votes
4 answers
How do you query and explore ENCODE data?
I am looking for a modular way to query data from ENCODE.
For example, I would like to get ChIP-seq or similar tracks for a specific cell line. What's the proper way to do it?
Finally, is there an API to do it?
0x90
- 1,437
- 9
- 18
5
votes
0 answers
What is a sensible forcefield choice for membrane proteins when using PDB2PQR?
I am generating PQR files for a membrane protein that is almost entirely buried in the membrane. The goal is to calculate the electrostatic charge across the surface of the protein. There are no lipids in the structure.
PDB2PQR has predefined…
James
- 409
- 2
- 13