Most Popular
1500 questions
5
votes
5 answers
QC measures for NGS sequencing
What are good means for performing quality control (QC) of NGS reads?
I'm aware of:
FastQC
NGS Screen
Kraken (e.g., for screening against contaminants)
What are other popular means for such QC?
Manuel
- 588
- 4
- 5
5
votes
2 answers
Assign cell types to groups of cells based on their gene expression profiles
I have a large, filtered, normalized scRNA-seq dataset from C. elegans. Rows are genes (10,000) and columns are cells (66,000). Let's say that I got 40 different groups of cells based on their expression profiles; how could I now assign…
Nikita Vlasenko
- 2,558
- 3
- 26
- 38
5
votes
1 answer
How to show only my read on IGV?
I have a large BAM file loaded in IGV. I'd like to visualize a single read alignment. I have the read name, and I don't care about anything else in the file.
IGV has an option "Select by name" (see screenshot). The problem is that IGV doesn't hide the alignments not…
SmallChess
- 2,699
- 3
- 19
- 35
5
votes
1 answer
Why doesn't Biopython AlignIO.read() recognise the 'mauve' format?
In this Biopython tutorial, they describe how to import a multiple sequence alignment in the Mauve format (XMFA, eXtended Multi-FASTA). So I imported the AlignIO module:
from Bio import AlignIO
alignment = AlignIO.read(open("alignment.xmfa"),…
Biomagician
- 2,459
- 16
- 30
5
votes
2 answers
Difference between de novo transcriptome assembly methods
I have been looking around (including reading the original papers) to understand the essential difference between StringTie in non-reference-based (de novo) mode and Trinity de novo assembly. I understand that in the genome-guided nature of…
kaka01
- 111
- 1
- 6
5
votes
0 answers
Map domain names from UniProt bed files to domain accessions
I want to get a BED file mapping human protein domains to the human genome. UniProt actually offers such a thing here. The problem, however, is that the file doesn't include any kind of domain accession, so I have no way of knowing exactly what…
terdon
- 10,071
- 5
- 22
- 48
5
votes
1 answer
Single-cell RNA sequencing (scRNA-seq): filtering cells by transcript counts, how to choose cutoffs?
I am running a notebook with an example for the MAGIC algorithm.
In the data preprocessing step, a filtering operation is required to filter out cells with a small count of transcripts.
My question is twofold:
Why filtering out cells with a small…
gc5
- 1,783
- 18
- 32
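The filtering step described in the question above can be sketched roughly as follows. This is a minimal illustration, not the MAGIC notebook's own code: the function name `filter_cells`, the genes-by-cells orientation, the toy counts, and the cutoff of 10 are all assumptions made for the example.

```python
def filter_cells(counts, cell_ids, min_total):
    """Drop cells whose total transcript count is below min_total.

    counts is assumed to be a genes x cells matrix given as a list of rows.
    Returns the filtered matrix, the kept cell IDs, and the per-cell totals.
    """
    # Sum each column to get the total transcript count per cell.
    totals = [sum(column) for column in zip(*counts)]
    # Indices of cells that meet the (hypothetical) cutoff.
    keep = [j for j, total in enumerate(totals) if total >= min_total]
    filtered = [[row[j] for j in keep] for row in counts]
    return filtered, [cell_ids[j] for j in keep], totals


# Toy data: 3 genes (rows) x 4 cells (columns); values are made up.
counts = [
    [5, 0, 12, 1],
    [3, 1, 7, 0],
    [9, 0, 20, 2],
]
cell_ids = ["cell_a", "cell_b", "cell_c", "cell_d"]

filtered, kept, totals = filter_cells(counts, cell_ids, min_total=10)
print(totals)  # per-cell totals before filtering
print(kept)    # IDs of cells that pass the cutoff
```

Returning the per-cell totals alongside the filtered matrix makes it easy to inspect their distribution before settling on a cutoff, which is the part the question asks about.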
5
votes
3 answers
How to indicate the END of a haplotype block in VCF?
In VCF, I know how to indicate that two genotypes are in the same phase by using consecutive "0|1" and "1|0" genotype fields, for example. However, how do I deal with the case where the first two genotypes are in phase and the second two are in phase,…
Dan
- 612
- 3
- 12
5
votes
1 answer
How to compute the chance of failing to detect a gene given the detection limit of a protocol
In Shapiro et al., when discussing the loss of molecules as a source of error in single-cell sequencing, the authors write:
Another source of error is losses, which can be severe. The detection limit of published protocols is $5$–$10$ molecules of…
gc5
- 1,783
- 18
- 32
5
votes
1 answer
Normalizing RNAseq for PCA and CCA
Usually the expression data is transformed to log space using either RPKM, FPKM or CPM. This is required when looking for differential expression because the data is tested against the normal distribution (limma) or the negative binomial distribution…
llrs
- 4,693
- 1
- 18
- 42
5
votes
1 answer
Filter out outliers of the scRNA-seq (heterogeneous cells)
I am new to data science. I have a dataset of single-cell gene expression from multiple cell types in C. elegans. The dataset is from the paper "Comprehensive single-cell transcriptional profiling of a multicellular organism".
My main question is,…
Nikita Vlasenko
- 2,558
- 3
- 26
- 38
5
votes
1 answer
How do I predict bacterial small non-coding RNA for a specific mRNA?
I am working on Vibrio parahaemolyticus. I have a gene that might be regulated by small non-coding RNAs. How can I predict possible sRNAs that target the transcript of this gene? Most software offers the opposite of what I want, i.e.,…
Mohammad ALKADI
- 51
- 2
5
votes
7 answers
Calculating nucleotide frequency per column
I have some sequences, shown below:
CAGGTAGCC
CCGGTCAGA
AGGGTTTGA
TTGGTGAGG
CAAGTATGA
ACTGTATGC
CTGGTAACC
TATGTACTG
GCTGTGAGA
CAGGTGGGC
TCAGTGAGA
GGGGTGAGT
TGGGTATGT
GAGGTGAGA
CAGGTGGAG
Each line has 9 nucleotides.
Consider it to be 9 columns. I want…
user3138373
- 420
- 1
- 5
- 13
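One way to approach the per-column frequency question above, as a minimal sketch: the excerpt is truncated, so the exact output the asker wants is unknown, and this assumes the goal is the relative frequency of A, C, G and T at each of the 9 positions. The function name `column_frequencies` is hypothetical.

```python
from collections import Counter

# The 15 sequences listed in the question.
sequences = [
    "CAGGTAGCC", "CCGGTCAGA", "AGGGTTTGA", "TTGGTGAGG", "CAAGTATGA",
    "ACTGTATGC", "CTGGTAACC", "TATGTACTG", "GCTGTGAGA", "CAGGTGGGC",
    "TCAGTGAGA", "GGGGTGAGT", "TGGGTATGT", "GAGGTGAGA", "CAGGTGGAG",
]


def column_frequencies(seqs, alphabet="ACGT"):
    """Return one dict per column mapping each nucleotide to its relative frequency."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    n = len(seqs)
    result = []
    # zip(*seqs) transposes the sequences, yielding one tuple per column.
    for column in zip(*seqs):
        counts = Counter(column)
        result.append({base: counts[base] / n for base in alphabet})
    return result


freqs = column_frequencies(sequences)
for position, col in enumerate(freqs, start=1):
    print(position, " ".join(f"{base}={col[base]:.2f}" for base in "ACGT"))
```

Transposing with `zip(*seqs)` avoids explicit index arithmetic; for much larger alignments, a library such as NumPy or Biopython would likely be a better fit.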
5
votes
1 answer
Why is SMALT better for microbial genomics than other mappers?
SMALT seems to be one of the most used read mappers for bacterial data; see, e.g., this query. I am not saying that it is not a great mapper, but I cannot easily see what its main strengths are compared to mappers such as BWA-MEM, Bowtie2, NovoAlign or…
Karel Břinda
- 1,909
- 9
- 19
5
votes
1 answer
Rapid metagenomics classifiers on long read data
I recently used the MinION (Nanopore, 9.4 flow cell, RAD001 kit) to generate a metagenome from environmental samples.
Passed reads weren't brilliant (196, average length 1,594 bp), but working with Centrifuge the classification outputs turned out…
André Soares
- 161
- 1
- 3