Most Popular
1500 questions
16 votes, 3 answers
What are the advantages and disadvantages between using KEGG or Reactome?
In enrichment analysis, a usual step is to infer the pathways enriched in a list of genes. However, I can't find a discussion about which database is better. Two of the most popular (in my particular environment) are Reactome and KEGG (maybe because…
llrs
- 4,693
- 1
- 18
- 42
16 votes, 2 answers
Difference between CPM and TPM and which one for downstream analysis?
What is the difference between TPM and CPM when dealing with RNA-seq data?
What metrics would you use if you had to perform some downstream analysis other than differential expression, e.g.:
Clustering analysis using the hclust function and then…
novicebioinforesearcher
- 771
- 1
- 6
- 15
16 votes, 3 answers
Designing a lab NGS file database schema
I am the resident Bioinfo Geek in a hospital academic lab that routinely employs NGS as well as CyTOF and other technologies that produce large volumes of data. I am sick of our current "protocol" for metadata collection and association with the final…
Gus
- 346
- 1
- 7
16 votes, 2 answers
Merge hundreds of small BAM files into a single BAM file
I am working with over a million (long) reads, and aligning them to a large genome. I am considering running my alignment jobs in parallel, distributing horizontally across hundreds of nodes rather than trying to run a single job with dozens of…
Scott Gigante
- 2,133
- 1
- 13
- 32
15 votes, 3 answers
How to obtain .bed file with coordinates of all genes
I want to get a .bed file with the genes' names and canonical coordinates; I would also like to have the coordinates of exons. I can get the list from UCSC; however, if I choose UCSC Genes - knownCanonical, I cannot extract the coordinates of exons.…
German Demidov
- 373
- 1
- 2
- 9
15 votes, 1 answer
What is the difference between a Bioinformatics pipeline and workflow?
I want to understand the difference between pipeline systems and workflow engines.
After reading "A Review of Scalable Bioinformatics Pipelines", I had a good overview of current bioinformatics pipelines. After some further research I found that there…
A.Dumas
- 497
- 3
- 9
15 votes, 3 answers
Publicly available genome sequence database for viruses?
As a small introductory project, I want to compare genome sequences of different strains of influenza virus.
What are the publicly available databases of influenza virus gene/genome sequences?
AlwaysTrying44
- 435
- 2
- 9
15 votes, 2 answers
Alignment based vs reference-free (transcriptome analysis)?
I want to focus on transcriptome analysis. We know it's possible to analyze an RNA-Seq experiment based on alignment or k-mers.
Possible alignment workflow:
Align sequence reads with TopHat2
Quantify the gene expression with Cufflinks
Possible…
SmallChess
- 2,699
- 3
- 19
- 35
15 votes, 5 answers
How do I carry out an ancestry/admixture test on a single VCF file?
This is a question from /u/beneficii9 on reddit. The original post can be found here.
Through the Personal Genome Project, I have had my whole genome sequenced by Veritas, and have it in the form of a single VCF file for the whole genome and one…
gringer
- 14,012
- 5
- 23
- 79
15 votes, 1 answer
Why is bwa-mem the standard algorithm when using bwa?
The industry standard for aligning short reads seems to be bwa-mem. However, in my tests I have seen that using bwa backtrack (bwa-aln + bwa-sampe + bwa-samse) performs better. It is slightly slower, but gives significantly better results in terms…
terdon
- 10,071
- 5
- 22
- 48
15 votes, 4 answers
How to subset samples from a VCF file?
I have VCF files (SNPs & indels) for WGS on 100 samples, but I want to only use a specific subset of 10 of the samples. Is there a relatively easy way to pull out only the 10 samples, while still keeping all of the data for the entire genome?
I have…
KLuc
- 171
- 1
- 1
- 5
15 votes, 2 answers
Downloading a reference Genome for Bowtie2
How do I download a reference genome that I can use with bowtie2? Specifically HG19. On UCSC there are a lot of file options.
EMiller
- 483
- 1
- 4
- 11
15 votes, 2 answers
Meaning of BWA-MEM MAPQ scores
Does anyone know what the MAPQ values produced by BWA-MEM mean?
I'm looking for something similar to what Keith Bradnam discovered for TopHat v1.4.1, where he realized that:
0 = maps to 5 or more locations
1 = maps to 3-4 locations
3 = maps to
…
ijoseph
- 253
- 1
- 2
- 8
15 votes, 5 answers
Good / recommended way to archive fastq and bam files?
We have a lot of Illumina-sequenced exome data. Currently we are using spring for its great lossless compression, but we are looking to see whether there is anything better (and preferably open source) that lets us compress our fastq files.
We also want…
Karthik Nair
- 331
- 1
- 7
15 votes, 2 answers
How can I call structural variants (SVs) from paired-end short read resequencing data?
I have a reference genome and now I would like to call structural variants from Illumina paired-end whole genome resequencing data (insert size 700bp).
There are many tools for SV calls (I made an incomplete list of tools below). There is also a…
Kamil S Jaron
- 5,542
- 2
- 25
- 59