Most Popular

1500 questions
8
votes
1 answer

Sleuth: transcripts with beta close to 0 are considered differentially expressed in a likelihood-ratio test

I'm comparing the results that I obtain when doing a DE analysis with the Wald test and the likelihood-ratio test. One thing that I've noticed is that there are many genes with 'beta' close to zero that are considered differentially expressed…
elsoja
  • 241
  • 1
  • 3
8
votes
2 answers

Melt p-values for CpG sites mapping to the same gene

I have some data I am working with, and I am curious if I am able to combine p-values from a paired t-test for CpG sites in the genome using Fisher's Method to get one p-value for each unique gene. Linked here is the Wikipedia page for Fisher's…
user1309
  • 81
  • 1
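
As a rough illustration of the Fisher's Method question above, here is a minimal sketch that combines per-CpG p-values into one p-value per gene. The gene names, the p-values, and the `fisher_combine` helper are hypothetical and not from the original post. Fisher's method assumes the p-values are independent, which may not hold for neighbouring CpG sites in the same gene, so treat the result with caution.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the upper tail has a closed form: exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i  # (X/2)^i / i!
        total += term
    return min(1.0, math.exp(-half_stat) * total)

# Hypothetical per-gene lists of CpG p-values (illustrative only).
cpg_p_values = {"GENE_A": [0.05, 0.05], "GENE_B": [0.5]}
gene_p = {gene: fisher_combine(ps) for gene, ps in cpg_p_values.items()}
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed-form tail.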
8
votes
5 answers

Merging bed records based on name

I generated a file starting with the following bed lines: $ head -6 /tmp/bed_with_gene_ids.bed I 3746 3909 "WBGene00023193" . - I 3746 3909 "WBGene00023193" . - I 4118 4220 "WBGene00022277" . - I 4118 4358…
bli
  • 3,130
  • 2
  • 15
  • 36
8
votes
4 answers

How to manipulate a reference FASTA or bam to include variants from a VCF?

I have some software which takes fastas as the input. I need to include SNVs and InDels from a VCF into the reference hg38 and then use this. The problem is, I don't know of an algorithmically sound way to do this. Are there any existing software…
ShanZhengYang
  • 1,691
  • 1
  • 14
  • 20
8
votes
2 answers

Subset FASTA file by species name

I have a problem: I've managed to download a massive fasta file of 1500 sequences, but now I want to split them into separate fasta files based on the genus. EDIT The fasta file looks like this: terminase_large.fasta >YP_009300697.1 terminase large…
tahunami
  • 303
  • 2
  • 8
8
votes
1 answer

How GFF3 attributes (9th column) vary from one gene prediction algorithm to another

GFF3 files are in tabular format with 9 fields per line, separated by tabs. The first 8 fields share almost the same data structure, but the 9th field varies a lot depending on feature type and gene prediction algorithm. Presently I am trying to build…
Arijit Panda
  • 285
  • 1
  • 8
8
votes
1 answer

When performing differential expression analysis, should genes with low read counts be removed before or after normalization?

I have RNA seq data which I've quantified using Kallisto. I'd like to use tximport to transform the read count data into input for EdgeR, following the R code supplied in the tximport documentation: cts <- txi$counts normMat <- txi$length normMat…
J0HN_TIT0R
  • 541
  • 1
  • 4
  • 7
8
votes
2 answers

Formula for k-mer coverage

Let $C$ be the base coverage, $R$ the read length, and $K$ the $k$-mer length. Then the $k$-mer coverage $C_k$ can be computed as $C_k = C\cdot(R - K + 1)/R$. Could someone please explain why this equation is valid (I'm mostly confused as to why it…
user44697
  • 263
  • 3
  • 6
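
To make the k-mer coverage formula above concrete, here is a small sketch that evaluates $C_k = C\cdot(R - K + 1)/R$ directly. The function name and the example numbers are illustrative assumptions, not from the original post. The factor $(R - K + 1)$ is the number of $k$-mers a single read of length $R$ contains.

```python
def kmer_coverage(base_coverage, read_length, k):
    """Return C_k = C * (R - K + 1) / R for the given C, R and K."""
    if not 1 <= k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    kmers_per_read = read_length - k + 1  # k-mers contained in one read
    return base_coverage * kmers_per_read / read_length

# Illustrative values: 30x base coverage, 100 bp reads, k = 31.
example = kmer_coverage(30, 100, 31)
```

With $K = 1$ the formula reduces to the base coverage, since each read then contributes exactly $R$ 1-mers.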
8
votes
1 answer

How can I use Nanopore reads to close gaps or resolve repeats in a short-read assembly?

Low coverage MinION reads should be useful to close gaps and resolve repeats left by short-read assemblers. However, I haven't had any success with the software I know about. I'm aware of the following packages, either for scaffolding or closing…
Tom Harrop
  • 203
  • 1
  • 7
8
votes
1 answer

Getting protein FASTA sequence based on keyword with python

I would like to gather protein FASTA sequences from Entrez with python 2.7. I am looking for any proteins that have the keywords: "terminase" and "large" in their name. So far I got this code: from Bio import Entrez Entrez.email =…
tahunami
  • 303
  • 2
  • 8
8
votes
2 answers

Phyre2 vs ITasser, completely different models generated

Does anyone have experience generating pdb structures with Phyre and ITasser online tools? The results generated from each given the same amino acid sequence input are very different and I am wondering whether or not this is a usual experience. I…
Te-Yo
  • 303
  • 1
  • 6
8
votes
2 answers

How is the GT field in a VCF file defined?

As my question in SO was closed and asked to be posted in this forum, I am posting it here. I am not from the bioinformatics domain. However, for the sake of analysis, I am trying to pick up certain basics related to the GT field in the VCF file. I…
The Great
  • 227
  • 2
  • 8
8
votes
2 answers

Understanding some of the computational bottlenecks of Covid-19 research

I am a researcher in high-performance computing (with very little bioinformatics background), and I am trying to understand what are the current biggest computational bottlenecks of software used for research on the topic of Covid-19 (testing,…
Vincent
  • 181
  • 3
8
votes
3 answers

probeset to probeset mappings between Affymetrix arrays

I am interested in identifying mappings between different types of Affymetrix arrays. I am aware that mappings between gene and probeset can be extracted using Ensembl's Biomart database. Ensembl gene ID ENSG00000181019 maps to 1. AFFY…
Prradep
  • 410
  • 4
  • 15
8
votes
1 answer

Coronavirus RNA structures?

Is there anything known about the RNA structures of coronaviruses? More specifically - do they have any interesting known structures in the translatable region, like RRE of HIV or the double loops in flaviviruses? Update Here is a recent development…
Roger V.
  • 381
  • 1
  • 15