Most Popular
1500 questions
9 votes, 2 answers
Duplicate genes with RSEM counts: Which one to choose?
I have Ensembl IDs in the first column and RSEM counts for each sample in the other columns. I converted the Ensembl IDs to gene symbols, and now I see that three genes each appear twice.
Gene S1 S2 S3
COG8 804.07 1475.16 323.80
COG8 …
stack_learner
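Two common ways to resolve duplicated symbols are to sum the rows or to keep only one of them (for example the row with the highest mean count). The sketch below implements both in plain Python; the gene names and counts are hypothetical, and `collapse_duplicates` is an illustrative helper, not part of RSEM or any library.

```python
from collections import defaultdict
from statistics import mean


def collapse_duplicates(rows, strategy="sum"):
    """Collapse rows that share a gene symbol into one row per symbol.

    rows: list of (symbol, [count for each sample]) tuples.
    strategy: "sum" adds duplicate rows sample by sample;
              "max_mean" keeps only the row with the highest mean count.
    """
    grouped = defaultdict(list)
    for symbol, counts in rows:
        grouped[symbol].append(counts)

    collapsed = {}
    for symbol, count_lists in grouped.items():
        if strategy == "sum":
            # zip(*...) lines up the values that belong to the same sample
            collapsed[symbol] = [sum(values) for values in zip(*count_lists)]
        elif strategy == "max_mean":
            collapsed[symbol] = max(count_lists, key=mean)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return collapsed


# Hypothetical counts for illustration only; GENE_A stands in for a duplicated symbol.
rows = [
    ("GENE_A", [10.0, 20.0, 30.0]),
    ("GENE_A", [1.0, 2.0, 3.0]),
    ("GENE_B", [5.0, 5.0, 5.0]),
]
print(collapse_duplicates(rows, "sum"))       # {'GENE_A': [11.0, 22.0, 33.0], 'GENE_B': [5.0, 5.0, 5.0]}
print(collapse_duplicates(rows, "max_mean"))  # {'GENE_A': [10.0, 20.0, 30.0], 'GENE_B': [5.0, 5.0, 5.0]}
```

Summing treats both Ensembl IDs as parts of one gene, while keeping a single row discards the other ID entirely; which is more appropriate depends on why the two IDs map to the same symbol.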
9 votes, 2 answers
How do PCR duplicates arise and why is it important to remove them for NGS analysis?
I am trying to understand PCR duplicates in NGS analyses (specifically whole-genome sequencing). I searched, and the best explanation I found is in this blog.
However, I'm not sure I have understood correctly how PCR duplicates arise, because I cannot see the…
gc5
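Duplicate-marking tools generally treat reads whose 5' ends align to the same position on the same strand as copies of one original fragment; such copies are not independent observations, so leaving them in can inflate the apparent support for, say, a variant. A minimal single-end sketch of that rule follows, with reads as hypothetical tuples rather than BAM records. Real tools such as Picard MarkDuplicates also account for soft clipping, mate positions in paired-end data, and base qualities when choosing which copy to keep; this sketch simply keeps the first read seen.

```python
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    chrom: str
    start: int   # leftmost aligned position (0-based)
    end: int     # rightmost aligned position, exclusive
    strand: str  # "+" or "-"


def five_prime_key(read):
    # The 5' end of a reverse-strand read is its rightmost aligned base.
    pos = read.start if read.strand == "+" else read.end - 1
    return (read.chrom, pos, read.strand)


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    Simplified single-end rule: reads sharing chromosome, 5' position and
    strand are assumed to come from one fragment; the first one seen is kept.
    """
    seen = set()
    duplicates = []
    for read in reads:
        key = five_prime_key(read)
        if key in seen:
            duplicates.append(read.name)
        else:
            seen.add(key)
    return duplicates


# Hypothetical alignments.
reads = [
    Read("r1", "chr1", 100, 200, "+"),
    Read("r2", "chr1", 100, 180, "+"),  # same 5' start as r1 -> duplicate
    Read("r3", "chr1", 120, 200, "-"),  # 5' end at position 199
    Read("r4", "chr1", 150, 200, "-"),  # same 5' end as r3 -> duplicate
    Read("r5", "chr1", 90, 101, "-"),   # 5' end at 100, but on the other strand from r1
]
print(mark_duplicates(reads))  # ['r2', 'r4']
```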
9 votes, 2 answers
5'UTR and 3'UTR annotation in yeast
I am working on a project in which I need to compute several parameters (such as GC content and length) of the 5'UTR and 3'UTR sequences of Saccharomyces cerevisiae genes.
The problem is finding a proper annotation for these regions in yeast. I have…
plat
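Once UTR sequences have been extracted with whichever annotation is chosen, the two parameters mentioned are simple to compute. The sketch below assumes the sequences are already available as strings; the gene names and sequences are hypothetical, and obtaining the annotation itself is not addressed.

```python
def utr_stats(seq):
    """Return (length, GC fraction) for a nucleotide sequence.

    Every character counts toward the length, so N bases lower the GC fraction.
    """
    seq = seq.upper()
    if not seq:
        return 0, 0.0
    gc = seq.count("G") + seq.count("C")
    return len(seq), gc / len(seq)


# Hypothetical UTR sequences keyed by made-up names; in practice these would be
# extracted from the genome using the UTR annotation you settle on.
utrs = {
    "GENE1_5UTR": "ATGCGCAT",
    "GENE1_3UTR": "aattttaaac",
}
for name, seq in utrs.items():
    length, gc = utr_stats(seq)
    print(f"{name}\t{length}\t{gc:.2f}")
```

Because a sequence and its reverse complement have the same length and GC fraction, minus-strand UTRs do not need to be reverse-complemented for these two parameters (they do for position-dependent ones such as motifs).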
9 votes, 3 answers
What is deep sequencing?
People talk about deep sequencing. Is there any way to calculate how deep the sequencing is? What is the optimal depth for reliable data?
I am doing whole-genome sequencing of a virus whose genome is 10 kb long. I got 80000 reads from…
L R Joshi
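Sequencing depth is usually summarized as mean coverage, C = N × L / G (number of reads × read length / genome size), the Lander-Waterman estimate. The sketch below applies it to the numbers in the question; the 150 bp read length is a hypothetical value, since the question does not state it.

```python
def mean_depth(n_reads, read_length, genome_size):
    """Mean sequencing depth (coverage): total sequenced bases / genome size.

    This is the Lander-Waterman estimate C = N * L / G. It assumes every read
    maps to the genome; for paired-end data, count each mate as a read.
    """
    return n_reads * read_length / genome_size


# 80,000 reads and a 10 kb genome come from the question; the 150 bp read
# length is a hypothetical value because the question does not state it.
print(mean_depth(80_000, 150, 10_000))  # 1200.0
```

This is an upper bound on what is actually usable: unmapped reads, duplicates and uneven coverage all reduce the per-base depth observed after alignment, which tools such as `samtools depth` report position by position.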
9 votes, 1 answer
Chimera Alignments
I have a structure with two subunits. I am trying to show movement of the C-terminal subunit upon ligand binding by superposition with another structure from the same strain in the apo form.
I want to superpose the N-terminal subunits (A), note the…
9 votes, 4 answers
Ultimate reproducibility in R?
I'm looking for a convenient and reliable way to make an R analysis reproducible, either at different times or across collaborators.
Listing the package versions or a sessionInfo() output is not very helpful, as it leaves the work of re-creating the…
Peter
9 votes, 0 answers
Run cuffcompare in strand-agnostic mode
Is there a way to run Cufflinks' cuffcompare in a strand-agnostic mode?
I would like to do this because I have some RNA-seq datasets derived from an unstranded run, which should be compared to a reference transcriptome. I think that it makes no…
aechchiki
9 votes, 1 answer
How to count reads in a BAM file per BED interval with bedtools
I recently upgraded to Ubuntu 16.04 (I was still using 12.04), but my bedtools scripts no longer seem to work properly. I can't figure out how to adapt my old commands to the new bedtools. What I want to do is get the number of reads from a…
benn
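Whatever bedtools command is used, its per-interval counts should match a simple overlap rule on BED-style 0-based, half-open coordinates. The plain-Python sketch below spells out that rule so the expected numbers can be checked by hand against bedtools output; the intervals and reads are hypothetical, and this is not how bedtools is implemented (it scans every read for every interval, whereas real tools use sorted or indexed input).

```python
def count_reads_per_interval(intervals, reads):
    """Count the reads that overlap each interval.

    intervals, reads: lists of (chrom, start, end) tuples in BED-style
    0-based, half-open coordinates. A read is counted for an interval when
    the two share at least one base.
    """
    counts = []
    for chrom, start, end in intervals:
        n = sum(
            1
            for r_chrom, r_start, r_end in reads
            if r_chrom == chrom and r_start < end and r_end > start
        )
        counts.append((chrom, start, end, n))
    return counts


# Hypothetical intervals and read alignments. The read chr1:200-300 abuts both
# chr1 intervals without overlapping either, so it is not counted.
intervals = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 50)]
reads = [("chr1", 150, 250), ("chr1", 190, 210), ("chr1", 200, 300), ("chr2", 10, 20)]
for row in count_reads_per_interval(intervals, reads):
    print(*row, sep="\t")
```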
9 votes, 1 answer
FASTQC overrepresented sequences after trimming
I have a set of RNA-seq samples from different experiments (single- or paired-end, depending on the experiment). I ran FASTQC on all the samples and found overrepresented adapter sequences:
I removed the adapters (TruSeq adapters) using Cutadapt (in…
plat
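FastQC's documentation describes its overrepresented-sequences warning as triggered when a single sequence makes up more than 0.1% of the total. To see what remains after trimming, one option is to tabulate the most frequent sequences in the trimmed reads with the same threshold. The sketch below does this on a small hypothetical read set; it mirrors only the threshold, not FastQC's sampling or read-truncation details.

```python
from collections import Counter
from itertools import product


def overrepresented(sequences, threshold=0.001):
    """Return (sequence, count, fraction) for every distinct sequence whose
    share of all reads exceeds `threshold` (0.001 = 0.1%), most common first."""
    total = len(sequences)
    counts = Counter(sequences)
    return [
        (seq, n, n / total)
        for seq, n in counts.most_common()
        if n / total > threshold
    ]


# Hypothetical library of 1,000 reads: 995 distinct 5-mers plus one
# made-up 13-mer that occurs five times.
reads = ["".join(p) for p in product("ACGT", repeat=5)][:995]
reads += ["ACGTTGCAACGTT"] * 5
print(overrepresented(reads))  # [('ACGTTGCAACGTT', 5, 0.005)]
```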
9 votes, 2 answers
Solutions for managing data in a small bioinformatics / 'omics lab?
A different sort of problem: even a small 'omics lab generates a lot of data, raw, intermediate and processed. What (software) solutions exist for managing this data, such that "old" data can be retrieved and checked or re-analysed, even after…
agapow
9 votes, 3 answers
How to convert GFF3 to GTF2
I would like to convert a file in GFF3 format to GTF 2.2 format.
The reason I would like to do this is: I have a set of transcripts assembled by several different programs (using RNA-seq data from different sequencing technologies) and I…
aechchiki
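The core of a GFF3-to-GTF conversion is rewriting column 9: GFF3 links features through `ID`/`Parent` attributes, while GTF expects `gene_id` and `transcript_id` on every line. Dedicated converters (for example gffread) handle the many edge cases; the sketch below only illustrates the attribute rewrite for a simple gene → mRNA → exon hierarchy, on hypothetical records, and does not emit the CDS/start/stop features a full GTF 2.2 file may need.

```python
def parse_gff3_attributes(field):
    """Split a GFF3 column-9 string such as 'ID=tx1;Parent=gene1' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if item:
            key, _, value = item.partition("=")
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_gtf(lines):
    """Rewrite exon records of a simple gene -> mRNA -> exon GFF3 as GTF lines.

    Assumptions: each exon has exactly one Parent (a transcript), each
    transcript record (type 'mRNA' or 'transcript') has an ID and a gene
    Parent, and attribute values need no percent-decoding.
    """
    records = [
        line.rstrip("\n").split("\t")
        for line in lines
        if line.strip() and not line.startswith("#")
    ]
    transcript_to_gene = {}
    for cols in records:
        if cols[2] in ("mRNA", "transcript"):
            attrs = parse_gff3_attributes(cols[8])
            transcript_to_gene[attrs["ID"]] = attrs["Parent"]

    gtf_lines = []
    for cols in records:
        if cols[2] != "exon":
            continue
        transcript = parse_gff3_attributes(cols[8])["Parent"]
        gene = transcript_to_gene[transcript]
        # Columns 1-8 carry over unchanged: both formats use 1-based, inclusive coordinates.
        attributes = f'gene_id "{gene}"; transcript_id "{transcript}";'
        gtf_lines.append("\t".join(cols[:8] + [attributes]))
    return gtf_lines


# Hypothetical GFF3 records.
gff3 = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\tsrc\texon\t100\t300\t.\t+\t.\tID=exon1;Parent=tx1",
    "chr1\tsrc\texon\t700\t900\t.\t+\t.\tID=exon2;Parent=tx1",
]
for line in gff3_to_gtf(gff3):
    print(line)
```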
9 votes, 1 answer
Expected allele frequency distribution of SNVs in real NGS data
I have a large number of aligned ~20x human WGS samples, with all SNVs called by GATK using the standard germline parameter set.
What I need to do is model the allele frequency (AF) of SNVs for different underlying copy numbers. I'd better…
German Demidov
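A common idealized baseline for this kind of model: if k of the CN copies at a locus carry the alternate allele, the expected AF is k / CN, and at depth d the number of alternate reads follows Binomial(d, k / CN). The sketch below tabulates those expectations at roughly the 20x depth mentioned. The binomial assumption is ours; real data are usually more dispersed (reference bias, mapping artefacts, uneven coverage), so an overdispersed model such as the beta-binomial may fit better.

```python
from math import comb


def expected_af(alt_copies, copy_number):
    """Expected alternate-allele fraction when `alt_copies` of the
    `copy_number` copies at a locus carry the variant."""
    return alt_copies / copy_number


def alt_read_pmf(depth, alt_copies, copy_number):
    """P(X = x) for x = 0..depth alternate reads, assuming each read
    independently samples one copy (binomial model)."""
    p = expected_af(alt_copies, copy_number)
    return [comb(depth, x) * p**x * (1 - p) ** (depth - x) for x in range(depth + 1)]


depth = 20  # roughly the ~20x depth mentioned in the question
for cn in (1, 2, 3, 4):
    afs = [round(expected_af(k, cn), 3) for k in range(1, cn + 1)]
    print(f"CN={cn}: expected AFs {afs}")

# Most likely alternate read count for a heterozygous SNV at CN=2.
pmf = alt_read_pmf(depth, 1, 2)
print(max(range(len(pmf)), key=pmf.__getitem__))  # 10
```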
9 votes, 1 answer
How to retrieve logical expressions (KO-based) for reactions from KEGG?
The completeness of a module can easily be checked by looking at the Definition entry associated with the module. For example, in module M00010, it is given as:
Definition K01647 (K01681,K01682) (K00031,K00030)
which can be translated to:
K01647…
Cleb
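Once a module's Definition string has been obtained (fetching it, for example through the KEGG REST API, is not shown), it can be parsed into a boolean expression and evaluated against the KOs present in a genome. The sketch below assumes KEGG's convention as we understand it: a space separates consecutive steps (AND), a comma separates alternatives (OR) and binds more loosely than a space, and parentheses group. The `+` (complex subunit) and `-` (optional component) notations are deliberately rejected, so this is a partial parser, not a complete implementation of KEGG's syntax.

```python
import re

# Subset of module-definition syntax handled here (an assumption, not the full grammar):
# KO identifiers, space = AND, comma = OR, parentheses for grouping.
TOKEN = re.compile(r"K\d{5}|[(),]| +")


def tokenize(definition):
    tokens, pos = [], 0
    definition = definition.strip()
    while pos < len(definition):
        match = TOKEN.match(definition, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at {definition[pos:]!r}")
        token = match.group()
        tokens.append(" " if token.isspace() else token)
        pos = match.end()
    return tokens


def parse(tokens):
    """Build a tree: a KO string, or ('and' | 'or', [children])."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_or():
        # Assumed precedence: "A B,C" means (A AND B) OR C.
        nonlocal pos
        terms = [parse_and()]
        while peek() == ",":
            pos += 1
            terms.append(parse_and())
        return ("or", terms) if len(terms) > 1 else terms[0]

    def parse_and():
        nonlocal pos
        steps = [parse_atom()]
        while peek() == " ":
            pos += 1
            steps.append(parse_atom())
        return ("and", steps) if len(steps) > 1 else steps[0]

    def parse_atom():
        nonlocal pos
        token = peek()
        if token == "(":
            pos += 1
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        if token is not None and token.startswith("K"):
            pos += 1
            return token
        raise ValueError(f"unexpected token {token!r}")

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected trailing tokens {tokens[pos:]}")
    return tree


def evaluate(node, present):
    """True if the KOs in `present` satisfy the expression tree."""
    if isinstance(node, str):
        return node in present
    op, children = node
    results = [evaluate(child, present) for child in children]
    return all(results) if op == "and" else any(results)


tree = parse(tokenize("K01647 (K01681,K01682) (K00031,K00030)"))  # M00010, from the question
print(evaluate(tree, {"K01647", "K01682", "K00031"}))  # True
print(evaluate(tree, {"K01647", "K01681"}))            # False
```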
9 votes, 1 answer
How are MACS2's narrow peak and broad peak algorithms different?
The peak calling tool MACS2 can call peaks in either narrow peak mode (for focused signals like transcription factor ChIP-seq) or broad peak mode (for more diffuse signals, like certain histone modifications).
The algorithm for narrow peak calling is…
Ian Sudbery
9 votes, 2 answers
How to import a large number of .bed, .gff, .vcf, .paf, .sam files into an SQL database?
Are there best practices for loading different bioinformatics file formats such as VCF, BED, GFF, and SAM into SQL databases? I am wondering how people do this efficiently.
All of these formats are tab-separated files, so basically the…
0x90
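Because these formats are tab-separated, one straightforward pattern is to stream rows through a CSV reader, insert them in a single batched transaction, and build indexes only after loading. The sketch below uses Python's built-in `sqlite3` as a stand-in for "an SQL database" and loads only the first four BED columns; the table name, column names and BED content are hypothetical, and VCF, GFF or SAM would need their own column layouts and header handling.

```python
import csv
import io
import sqlite3


def load_bed(conn, handle):
    """Stream the first four BED columns from a text handle into a `bed` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bed "
        "(chrom TEXT, chrom_start INTEGER, chrom_end INTEGER, name TEXT)"
    )

    def rows():
        for fields in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            # Skip blank lines, comments and UCSC track/browser header lines.
            if not fields or fields[0].startswith(("#", "track", "browser")):
                continue
            name = fields[3] if len(fields) > 3 else None
            yield fields[0], int(fields[1]), int(fields[2]), name

    # executemany consumes the generator inside one transaction, which is usually
    # much faster than committing each row separately.
    conn.executemany("INSERT INTO bed VALUES (?, ?, ?, ?)", rows())
    # Build the index after loading rather than updating it on every insert.
    conn.execute("CREATE INDEX IF NOT EXISTS bed_pos ON bed (chrom, chrom_start, chrom_end)")
    conn.commit()


# Hypothetical BED content.
bed_text = "track name=example\nchr1\t100\t200\tpeakA\nchr1\t300\t400\tpeakB\nchr2\t0\t50\tpeakC\n"
conn = sqlite3.connect(":memory:")
load_bed(conn, io.StringIO(bed_text))

# Intervals overlapping chr1:150-350 (0-based, half-open coordinates).
query = "SELECT name FROM bed WHERE chrom = ? AND chrom_start < ? AND chrom_end > ? ORDER BY chrom_start"
print(conn.execute(query, ("chr1", 350, 150)).fetchall())  # [('peakA',), ('peakB',)]
```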