Most Popular
1500 questions
13
votes
1 answer
scRNA-seq multi-dataset integration for small datasets
There have been a few methods proposed for integration (or batch correction) of scRNA-seq datasets, such as Seurat CCA, MNN Correct, Scanorama, and Harmony. The concern is generally about the maximum number of cells that they handle, but I haven't…
burger
- 2,179
- 10
- 21
13
votes
4 answers
What methods are available to find a cutoff value for non-expressed genes in RNA-seq?
I have a gene expression count matrix produced from bulk RNA-seq data. I'd like to find genes that were not expressed in a group of samples and were expressed in another group.
The problem, of course, is that not all effectively non-expressed genes…
Peter
- 2,634
- 15
- 33
13
votes
2 answers
What to use to edit RNA alignments?
I have many alignments from the Rfam database, and I would like to edit them.
I saw that many tools are used for protein sequence alignments, but is there something specific for editing RNA alignments?
e.g. Stockholm Alignment of Pistol (just few…
Peter
- 353
- 1
- 8
13
votes
1 answer
What is the most compact data structure for canonical k-mers with the fastest lookup time?
edit: Results are current as of Dec 4, 2018 13:00 PST.
Background
K-mers have many uses in bioinformatics, and for this reason it would be useful to know the most RAM-efficient and fastest way to work with them programmatically. There have been…
conchoecia
- 3,141
- 2
- 16
- 40
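For readers unfamiliar with the term in the question above, here is a minimal sketch of what "canonical k-mer" usually means. It assumes the common convention that the canonical form is the lexicographically smaller of a k-mer and its reverse complement, and an uppercase ACGT alphabet. The function names are hypothetical, and the plain `dict` is only a baseline for illustration, not an answer to the compactness question.

```python
# Translation table for complementing DNA bases (assumes uppercase A, C, G, T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = reverse_complement(kmer)
    return kmer if kmer <= rc else rc


def count_canonical_kmers(seq: str, k: int) -> dict:
    """Count canonical k-mers in seq using a plain dict (baseline, not memory-optimal)."""
    counts = {}
    for i in range(len(seq) - k + 1):
        key = canonical(seq[i:i + k])
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Because both strands collapse to one key, a lookup structure for canonical k-mers needs to store at most half as many distinct keys as one that stores every k-mer as read.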
13
votes
2 answers
Normalization methods with RNA-Seq ERCC spike in?
ERCC spike-in is a set of synthetic controls developed for RNA-Seq. I'm interested in using it to normalize my RNA-Seq samples. In particular, I'd like to use the spike-ins to remove technical bias and any variation that should not be part of my…
SmallChess
- 2,699
- 3
- 19
- 35
13
votes
4 answers
Library for computing BWT-based alignments
I am writing a software tool to which I would like to add the ability to compute alignments using the efficient Burrows-Wheeler Transform (BWT) approach made popular by tools such as BWA and Bowtie. As far as I can tell, though, both of these tools…
Daniel Standage
- 5,080
- 15
- 50
13
votes
6 answers
Converting gene names from one public database format to another
This is a question from /u/apivan19 on reddit. The original post can be found here.
I have some proteomics data that was given to me with the UniProt gene identifiers in column 1. I've been trying to convert these to normal gene symbols using…
gringer
- 14,012
- 5
- 23
- 79
13
votes
4 answers
How do I efficiently subset a very large line-based file?
This has come up repeatedly recently: I have a very large text file (in the order of several GiB) and I need to perform line-based subsetting for around 10,000 lines. There exist solutions for specific scenarios (e.g. samtools view -s for randomly…
Konrad Rudolph
- 4,845
- 14
- 45
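A minimal sketch of the general problem in the question above, assuming the wanted line numbers are 1-based and the set of them (around 10,000 here) fits comfortably in memory. The function name `subset_lines` is hypothetical, and this is a plain single-pass baseline rather than the approach from any of the answers.

```python
from typing import Iterable, Iterator, TextIO


def subset_lines(handle: TextIO, wanted: Iterable[int]) -> Iterator[str]:
    """Yield the lines of handle whose 1-based line numbers are in wanted.

    The file is streamed once, so memory use depends on the number of
    wanted lines, not on the size of the file.
    """
    wanted_set = set(wanted)
    if not wanted_set:
        return
    last = max(wanted_set)
    for lineno, line in enumerate(handle, start=1):
        if lineno in wanted_set:
            yield line
        if lineno >= last:
            # Nothing past the largest wanted line number is needed.
            break
```

Lines come out in file order regardless of the order of `wanted`, and reading stops as soon as the largest requested line has been seen.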
13
votes
2 answers
How to decide number of neighbors and resolution for Louvain clustering
I am using Louvain clustering (1,2) to cluster cells in scRNAseq data, as implemented by scanpy.
One of the parameters required for this kind of clustering is the number of neighbors used to construct the neighborhood graph of cells (docs).
Larger…
gc5
- 1,783
- 18
- 32
13
votes
1 answer
Is it possible to use SNP heterozygosity as a proxy for Indel heterozygosity?
I have estimated genome-wide heterozygosity levels using maximum likelihood and a classical substitution model implemented in the package atlas. These estimates are far more robust than classical SNP calling and simply counting the number of heterogeneous…
Kamil S Jaron
- 5,542
- 2
- 25
- 59
13
votes
6 answers
Are there any databases of templates for common bioinformatic file formats?
I want some templates of different file formats that I can use to test my scripts and identify possible bugs in my code.
For example, consider nucleotide FASTA, a simple but often abused format: I would want templates to capture regular and…
Chris_Rands
- 3,948
- 12
- 31
13
votes
1 answer
Compare alignment quality of multiple sequencing runs aligned against the same reference genome
I have run Oxford Nanopore Technologies' MinION sequencing on the same DNA sample using three flowcells, each aligned against the same reference genome (E. coli K12 MG1655) using both BWA MEM and GraphMap and stored as BAM files.
How can I…
Scott Gigante
- 2,133
- 1
- 13
- 32
13
votes
1 answer
What's the difference between VCF spec versions 4.1 and 4.2?
What are the key differences between VCF versions 4.1 and 4.2?
It looks like v4.3 contains a changelog (specs available here) but earlier specifications do not.
This biostar post points out one difference: the introduction of Number=R for fields…
blmoore
- 366
- 3
- 14
13
votes
2 answers
Is it possible to perform MinION sequencing offline?
I vaguely remember that the original plan of Oxford Nanopore was to provide cheap sequencers (MinION) but charge for base-calling. For that reason the base-calling was performed in the cloud, and the plan was to make it commercial once the…
Iakov Davydov
- 2,695
- 1
- 13
- 34
13
votes
2 answers
Remapping genomic coordinates to account for indels
I'm interested in obtaining coding sequences of my favourite gene in all individuals from the 1000Genomes (and similar projects). I use GATK to get the right subset of variants, vcf-consensus to map these variants onto the reference genome and…
Greg Slodkowicz
- 232
- 1
- 5