Most Popular

1500 questions
5 votes
5 answers

QC measures for NGS sequencing

What are good means for performing quality control (QC) of NGS reads? I'm aware of: FastQC, NGS Screen, Kraken (e.g., for screening against contaminants). What are other popular means for such QC?
Manuel
  • 588
  • 4
  • 5
5 votes
2 answers

Assign cell types to groups of cells based on their gene expression profiles

I have a large, filtered, normalized scRNA-seq dataset from C. elegans. Rows are genes (10 000), columns are cells (66 000). Let's say that I got 40 different groups of cells based on their expression profiles, how could I now assign…
Nikita Vlasenko
  • 2,558
  • 3
  • 26
  • 38
5 votes
1 answer

How to show only my read on IGV?

I have a large BAM file loaded in IGV. I'd like to visualize a single read alignment. I have the read name, and I don't care about anything else in the file. IGV has an option "Select by name" (see screenshot). The problem is that IGV doesn't hide the alignments not…
SmallChess
  • 2,699
  • 3
  • 19
  • 35
5 votes
1 answer

Why doesn't Biopython AlignIO.read() recognise the 'mauve' format?

This Biopython tutorial describes how to import a multiple sequence alignment in the Mauve format (XMFA: extensible multi fasta format). So I imported the AlignIO module: from Bio import AlignIO alignment = AlignIO.read(open("alignment.xmfa"),…
Biomagician
  • 2,459
  • 16
  • 30
5 votes
2 answers

Difference between de novo transcriptome assembly methods

I have been looking around (including reading the original papers) to understand what the essential difference is between StringTie in non-reference-based (de novo) mode and Trinity de novo assembly. I understand that in the genome-guided nature of…
kaka01
  • 111
  • 1
  • 6
5 votes
0 answers

Map domain names from UniProt bed files to domain accessions

I want to get a bed file mapping human protein domains to the human genome. UniProt actually offers such a thing here. The problem, however, is that the file doesn't include any kind of domain accession, so I have no way of knowing exactly what…
terdon
  • 10,071
  • 5
  • 22
  • 48
5 votes
1 answer

Single-cell RNA sequencing (scRNA-seq): filtering cells by transcript counts, how to choose cutoffs?

I am running a notebook with an example for the MAGIC algorithm. In the data preprocessing step, a filtering operation is required to filter out cells with a small count of transcripts. My question is twofold: Why filter out cells with a small…
gc5
  • 1,783
  • 18
  • 32
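The filtering step mentioned in the question above (dropping cells with a small total transcript count) can be sketched as follows. This is only a minimal illustration, not the MAGIC notebook's own code: the function name `filter_cells_by_total_counts`, the genes-by-cells orientation, the toy matrix, and the cutoff value are all assumptions made for the example.

```python
def filter_cells_by_total_counts(counts, min_total):
    """Keep only cells whose total transcript count is at least min_total.

    counts: list of rows, one row per gene, one column per cell
            (assumed orientation; transpose first if cells are rows).
    Returns (totals, kept_indices, filtered_counts).
    """
    n_cells = len(counts[0]) if counts else 0
    # Total transcript count (library size) for each cell.
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    # Indices of cells that pass the (hypothetical) cutoff.
    kept = [j for j, total in enumerate(totals) if total >= min_total]
    # Rebuild the matrix with only the retained columns.
    filtered = [[row[j] for j in kept] for row in counts]
    return totals, kept, filtered


# Toy matrix: 3 genes x 3 cells (made-up numbers).
toy_counts = [
    [0, 5, 10],
    [1, 0, 20],
    [0, 2, 30],
]
totals, kept, filtered = filter_cells_by_total_counts(toy_counts, min_total=5)
print(totals)    # library size per cell
print(kept)      # indices of retained cells
print(filtered)  # matrix restricted to retained cells
```

In practice the cutoff is usually chosen by looking at the distribution of `totals` (for example a histogram) rather than fixed in advance; the value 5 here is purely illustrative.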
5 votes
3 answers

How to indicate the END of a haplotype block in VCF?

In VCF, I know how to indicate that two genotypes are in the same phase by using consecutive "0|1" and "1|0" genotype fields, for example. However, how do I deal with the case that the first two genotypes are in phase and the second two are in phase,…
Dan
  • 612
  • 3
  • 12
5 votes
1 answer

How to compute the chance of failing to detect a gene given the detection limit of a protocol

In Shapiro et al., when discussing loss of molecules as a source of error in single-cell sequencing, it is written that: Another source of error is losses, which can be severe. The detection limit of published protocols is $5$–$10$ molecules of…
gc5
  • 1,783
  • 18
  • 32
5 votes
1 answer

Normalizing RNAseq for PCA and CCA

Usually the expression data is transformed to log space using either RPKM, FPKM or CPM. This is required when looking for differential expression because the data is tested against the normal distribution (limma) or the negative binomial distribution…
llrs
  • 4,693
  • 1
  • 18
  • 42
5 votes
1 answer

Filter out outliers of the scRNA-seq (heterogeneous cells)

I am new to data science. I have a dataset of single-cell gene expression from multiple cell types in C. elegans. The dataset is from the paper "Comprehensive single-cell transcriptional profiling of a multicellular organism". My main question is,…
Nikita Vlasenko
  • 2,558
  • 3
  • 26
  • 38
5 votes
1 answer

How do I predict bacterial small non-coding RNAs for a specific mRNA?

I am working on Vibrio parahaemolyticus. I have a gene that might be regulated by small non-coding RNAs. How can I predict possible sRNAs that target the transcript of this gene? Most software offers the opposite of what I want, i.e.,…
5 votes
7 answers

Calculating nucleotide frequency per column

I have some sequences shown below:
CAGGTAGCC
CCGGTCAGA
AGGGTTTGA
TTGGTGAGG
CAAGTATGA
ACTGTATGC
CTGGTAACC
TATGTACTG
GCTGTGAGA
CAGGTGGGC
TCAGTGAGA
GGGGTGAGT
TGGGTATGT
GAGGTGAGA
CAGGTGGAG
Each line has 9 nucleotides. Consider it to be 9 columns. I want…
user3138373
  • 420
  • 1
  • 5
  • 13
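One possible way to approach the per-column nucleotide frequency question above is sketched below in plain Python. The excerpt is truncated, so the exact output the asker wants is unknown; this sketch assumes relative frequencies of A, C, G and T per column, and the function name `column_frequencies` is made up for illustration.

```python
from collections import Counter


def column_frequencies(sequences, alphabet="ACGT"):
    """Return one dict per column mapping each base to its relative frequency."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    n = len(sequences)
    freqs = []
    # zip(*sequences) yields the sequences column by column.
    for column in zip(*sequences):
        counts = Counter(column)
        freqs.append({base: counts[base] / n for base in alphabet})
    return freqs


# The 15 sequences quoted in the question.
sequences = [
    "CAGGTAGCC", "CCGGTCAGA", "AGGGTTTGA", "TTGGTGAGG", "CAAGTATGA",
    "ACTGTATGC", "CTGGTAACC", "TATGTACTG", "GCTGTGAGA", "CAGGTGGGC",
    "TCAGTGAGA", "GGGGTGAGT", "TGGGTATGT", "GAGGTGAGA", "CAGGTGGAG",
]

freqs = column_frequencies(sequences)
for position, column in enumerate(freqs, start=1):
    row = " ".join(f"{base}={column[base]:.2f}" for base in "ACGT")
    print(position, row)
```

Transposing with `zip(*sequences)` keeps the code short and avoids manual index handling; for very large alignments a NumPy array or a streaming counter per column may be a better fit.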
5 votes
1 answer

Why is SMALT better for microbial genomics than other mappers?

SMALT seems to be one of the most used read mappers for bacterial data, see, e.g., this query. I do not say that it is not a great mapper, but I cannot easily see what are its main strengths compared to mappers such as BWA-MEM, Bowtie2, NovoAlign or…
Karel Břinda
  • 1,909
  • 9
  • 19
5 votes
1 answer

Rapid metagenomics classifiers on long read data

I recently used the MinION (Nanopore, 9.4 flow cell, RAD001 kit) to generate a metagenome out of environmental samples. Passed reads weren't brilliant (196, average length 1,594 bp), but working with Centrifuge the classification outputs turned out…
André Soares
  • 161
  • 1
  • 3