Highest Voted Questions - Bioinformatics Stack Exchange

4

votes

2 answers

Targeted NGS, up to 99% of reads have been marked as duplicates

Currently I'm performing a whole analysis (pipeline from *.fastq to *.vcf) of 41 samples (targeted NGS). I rely on GATK best practices, although with some modifications. I decided to use the following tools: #mapping bwa mem (with mem - alternate…

asked Feb 14 '19 at 05:55

Adamm


4

votes

1 answer

Preparing binary matrix data for Scikit classification algorithms

I made this post on regular Stack Overflow but I was told about this awesome feature by @nbryans. I am a researcher (my programming knowledge is limited) conducting analysis on a set of antibiotic (methicillin) resistant and a set of antibiotic…

asked Jun 13 '17 at 15:38

Daniel Harris


4

votes

1 answer

How to quantify similarity of genomes and find differences in a set of S. aureus genomes?

I have around 500 annotated proteomes of different bacterial strains and would like to quantify their similarity (or difference). I found gt genomediff from genometools gives me some scores that I can use to generate nice clusters, but I am not sure…

asked Feb 13 '19 at 03:09

Soerendip


4

votes

1 answer

Error in as.vector(x) : no method for coercing this S4 class to a vector

I tried to run the following code in RStudio. Everything worked fine, except at the last step [write.table(mdat, "recount_mdat.csv")]: when I tried to export the 'mdat', I got the following error: Error in as.vector(x) : no method for coercing this…

asked Feb 12 '19 at 15:41

Priya


4

votes

2 answers

Installing DESeq2 in Ubuntu

I am trying to install DESeq2 on my Ubuntu system with R version 3.5.1. According to the package repository in Bioconductor, the version should be 3.5. > R.version platform x86_64-pc-linux-gnu arch x86_64 os …

asked Feb 10 '19 at 19:01

aerijman


4

votes

1 answer

Gene Ranking - signal to noise ratio used in GSEA-P algorithm?

I'm looking at the Broad Institute's original GSEA-P algorithm R script, which I downloaded here: http://software.broadinstitute.org/gsea/downloads.jsp. I'm trying to adapt their GSEA.1.0.R script to process datasets that have 1 gene expression profile as…

asked Jan 30 '19 at 21:40

lrthistlethwaite


4

votes

2 answers

Plotting coverage of annotation over a collection of regions

I'm trying to plot "meta" coverage of annotation, i.e. features (e.g. gene class) over certain regions. It is similar to read coverage plots over gene body, except my input is two bed files (both in BED6 format) - (A) one containing the regions for…

asked Jan 28 '19 at 12:20

Siddharth


4

votes

4 answers

Database for proteome-wide predictions of protein structures

The accuracy of protein structure predictors has improved quite a lot in recent years. Algorithms such as Rosetta have become robust enough to predict the structures of a large number of proteins. However, I couldn't find any initiative to make a database…

asked Jan 24 '19 at 18:44

user345394


4

votes

1 answer

Prediction of prokaryotic origins of replication (ORI)

I want to predict origins of replication (ORI) on hundreds of prokaryotic genomes. The most straightforward solution would be to use the most commonly used tool, Ori-Finder. It uses integrated gene prediction, analysis of base composition asymmetry,…

asked Jan 17 '19 at 09:41

MrTomRod


4

votes

2 answers

"Sequence Duplication Levels" module still fails after pre-processing Illumina data

Why are the sequence duplication levels still high after trimming with Trimmomatic? I am using the following Trimmomatic operations: HEADCROP = 19, TRAILING = 20, MINLEN = 66. How can I solve this problem? Thank you.

asked Jan 10 '19 at 14:35

yy97


4

votes

1 answer

Can blat use more than one core/CPU to speed up the alignment?

I am using BLAT to align two versions of the genome of C. elegans. I can see in the Activity Monitor of my MacBook Pro (High Sierra) that blat is using 100% of a CPU. However, is this programme able to use more than one core / CPU to speed up the…

asked Jan 06 '19 at 22:02

Biomagician


4

votes

1 answer

Question on nanopore sequencing data processing pipeline (cDNA-PCR)

I recently started analysing nanopore sequencing data. As I was searching for some help on pre-processing of the data, I found your nice pipeline setup created here:…

nanopore

asked Dec 26 '18 at 12:50

Jungwoo Lee


4

votes

2 answers

Why are there missing calls in a VCF file from exome sequencing?

My data is a VCF file generated from an exome sequencing variant calling pipeline. I'm not very familiar with the sequencing and variant calling process. I noticed that there are some missing genotypes, which are recorded as "./." in the GT field. From…

asked Jun 13 '17 at 01:33

Yan


4

votes

1 answer

How to export web NCBI tBLASTn results in table format with many queries?

Context: I'm an MSc student working on writing up my thesis (back home now) from my laptop and, therefore, unfortunately don't have access to a workstation/server capable of doing the tBLASTn search that I wanted to do. As a result I have been trying…

asked Dec 24 '18 at 01:52

user3883


4

votes

1 answer

What kind of "gff" format does bioawk parse?

I was wondering if I could use the gff parsing capability of bioawk to facilitate the parsing of gtf files, and I looked at the following help message: $ bioawk -c help bed: 1:chrom 2:start 3:end 4:name 5:score 6:strand 7:thickstart 8:thickend…

asked Jun 12 '17 at 15:30

bli

